IPD - Lehrstuhl Tichy - Programmiersysteme

DeNom: A Tool to Find Problematic Nominalizations using NLP

  • Conference:

    Conference paper 

  • Authors:

    Mathias Landhäußer
    Sven J. Körner
    Jan Keim
    Walter F. Tichy
    Jennifer Krisch

  • Summary

    Nominalizations in natural language requirements specifications can lead to imprecision. For example, in the phrase "transportation of pallets" it is unclear who transports the pallets, from where to where, and how. Guidelines for requirements specifications therefore recommend avoiding nominalizations. However, not all nominalizations are problematic. We present an industrial-strength text analysis tool called DeNom, which detects problematic nominalizations and reports them to the user for reformulation.

    DeNom uses the Stanford Parser and the Cyc ontology. It classifies nominalizations as problematic or acceptable by first detecting all nominalizations in the specification and then subtracting those that are sufficiently specified within the sentence through word references, attributes, nominal phrase constructions, etc. All remaining nominalizations are incompletely specified and are therefore prone to conceal complex processes. These nominalizations are deemed problematic.
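    The two-step classification described above (detect everything, then subtract the sufficiently specified cases) can be illustrated with a minimal Python sketch. It does not use the Stanford Parser or Cyc: the tiny lexicon, the set of "specifying" dependency labels, the `min_specifiers` threshold, and all function names are hypothetical stand-ins, and the dependency arcs are written by hand.

```python
"""Hypothetical sketch of the two-step idea: flag all nominalizations,
then drop those the sentence already specifies."""
from typing import List, NamedTuple

# ASSUMPTION: a tiny hand-written lexicon stands in for DeNom's use of the
# Cyc ontology; it maps a nominalization to the verb it derives from.
NOMINALIZATION_LEXICON = {
    "transportation": "transport",
    "delivery": "deliver",
    "storage": "store",
    "activation": "activate",
}

# ASSUMPTION: dependency labels treated as "specifying" a nominalization
# (who acts, from where, to where, how). "nmod:of" (the object) is excluded.
SPECIFYING_RELATIONS = {"nmod:agent", "nmod:by", "nmod:from", "nmod:to", "nmod:with", "amod"}


class Dep(NamedTuple):
    head: int        # index of the governing word
    relation: str    # dependency label, e.g. "nmod:from"
    dependent: int   # index of the dependent word


def detect_nominalizations(words: List[str]) -> List[int]:
    """Step 1: indices of every word listed in the lexicon."""
    return [i for i, w in enumerate(words) if w.lower() in NOMINALIZATION_LEXICON]


def is_sufficiently_specified(index: int, deps: List[Dep], min_specifiers: int = 2) -> bool:
    """Hypothetical rule: enough specifying dependents attached to the word."""
    specifiers = {d.dependent for d in deps
                  if d.head == index and d.relation in SPECIFYING_RELATIONS}
    return len(specifiers) >= min_specifiers


def problematic_nominalizations(words: List[str], deps: List[Dep],
                                min_specifiers: int = 2) -> List[str]:
    """Step 2: all nominalizations minus the sufficiently specified ones."""
    return [words[i] for i in detect_nominalizations(words)
            if not is_sufficiently_specified(i, deps, min_specifiers)]


# Hand-written parses; a real pipeline would obtain them from a parser.
vague = ["The", "system", "supports", "the", "transportation", "of", "pallets", "."]
vague_deps = [Dep(4, "nmod:of", 6)]

specific = ["The", "forklift", "performs", "the", "transportation", "of", "pallets",
            "from", "the", "dock", "to", "the", "shelf", "."]
specific_deps = [Dep(4, "nmod:of", 6), Dep(4, "nmod:from", 9), Dep(4, "nmod:to", 12)]

print(problematic_nominalizations(vague, vague_deps))        # ['transportation']
print(problematic_nominalizations(specific, specific_deps))  # []
```

    Raising `min_specifiers` makes the check stricter and flags more nominalizations. DeNom's own criteria (word references, attributes, nominal phrase constructions) are richer than this single count.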

    A thorough evaluation used 10 real-world requirements specifications from Daimler AG consisting of 60,000 words. DeNom identified over 1,100 nominalizations and classified 129 of them as problematic. Only 45 of these were false positives, resulting in a precision of 66%. Recall was 88%. In contrast, a naive nominalization detector without classification would overload the user with 1,100 warnings, a thousand of which would be false positives.
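    The evaluation figures can be cross-checked with a few lines of arithmetic, using precision = TP / (TP + FP) and recall = TP / (TP + FN). This sketch takes the reported counts at face value: 84/129 gives about 65.1%, slightly below the reported 66%, so the published figure may rest on counts not listed here. The totals derived from the 88% recall are estimates, not reported numbers, and the variable names are illustrative.

```python
"""Back-of-the-envelope check of the evaluation figures quoted above."""


def precision(tp: int, fp: int) -> float:
    """Fraction of flagged items that are truly problematic: TP / (TP + FP)."""
    return tp / (tp + fp)


def implied_relevant(tp: int, recall: float) -> float:
    """Total truly problematic items implied by a recall value: TP / recall."""
    return tp / recall


flagged = 129          # nominalizations DeNom classified as problematic
false_positives = 45   # of those, not actually problematic
true_positives = flagged - false_positives  # 84

denom_precision = precision(true_positives, false_positives)
relevant = implied_relevant(true_positives, 0.88)  # estimate, not a reported count

# A naive detector flags every nominalization (about 1,100), so it catches all
# problematic ones and its false positives are roughly 1,100 - relevant.
naive_warnings = 1100
naive_false_positives = naive_warnings - relevant
naive_precision = relevant / naive_warnings

print(f"DeNom precision: {denom_precision:.3f}")
print(f"Implied problematic nominalizations: {relevant:.1f}")
print(f"Naive detector: ~{naive_false_positives:.0f} false positives, "
      f"precision {naive_precision:.3f}")
```

    The estimate of roughly 1,000 false positives for the naive detector agrees with the "thousand" quoted in the summary.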

BibTeX

@inproceedings{Landhaeusser2015DeNom,
author={Mathias Landh{\"a}u{\ss}er and Sven J. K{\"o}rner and Jan Keim and Walter F. Tichy and Jennifer Krisch},
title={DeNom: A Tool to Find Problematic Nominalizations using NLP},
year=2015,
month=aug,
booktitle={Second International Workshop on Artificial Intelligence for Requirements Engineering},
address={Ottawa, Canada},
url={https://ps.ipd.kit.edu/downloads/},
abstract={Nominalizations in natural language requirements specifications can lead to imprecision. For example, in the phrase ``transportation of pallets'' it is unclear who transports the pallets, from where to where, and how. Guidelines for requirements specifications therefore recommend avoiding nominalizations. However, not all nominalizations are problematic. We present an industrial-strength text analysis tool called DeNom, which detects problematic nominalizations and reports them to the user for reformulation.

DeNom uses the Stanford Parser and the Cyc ontology. It classifies nominalizations as problematic or acceptable by first detecting all nominalizations in the specification and then subtracting those that are sufficiently specified within the sentence through word references, attributes, nominal phrase constructions, etc. All remaining nominalizations are incompletely specified and are therefore prone to conceal complex processes. These nominalizations are deemed problematic.

A thorough evaluation used 10 real-world requirements specifications from Daimler AG consisting of 60,000 words. DeNom identified over 1,100 nominalizations and classified 129 of them as problematic. Only 45 of these were false positives, resulting in a precision of 66\%. Recall was 88\%. In contrast, a naive nominalization detector without classification would overload the user with 1,100 warnings, a thousand of which would be false positives.},
pages={9--16},
pptUrl={https://ps.ipd.kit.edu/downloads/},
}