SENSE / Sale

SALE MX - Model Extraction from Natural Language Texts


SALE MX aims to extract UML models from natural language (NL) text. To avoid error-prone natural language processing (NLP), SALE MX starts after a (currently manual) annotation of an NL text. The annotation explicitly marks the semantics of the text, thereby documenting a common understanding of the requirements.

The basis of the entire process is SENSE, the Software Engineer's Natural Language Semantics Encoding. SENSE describes how semantics can be encoded and used to process NL texts. SALE (the SENSE Annotation Language for English) is one possible realization of the SENSE process and provides a set of thematic roles with which you can explicitly encode the semantics of texts. Although designed for English, SALE can also be used for other languages such as German, French, and Hungarian.

SALE also comes with an ANTLR-based compiler that transforms the annotated text into a graph representation, which can be loaded into GrGen.NET. This graph is the internal discourse model of the text and is the central artifact of our process. Graph rewriting rules are then used to evaluate the structure of the semantics. We also use graph rewriting rules to produce an internal graph representation of a UML document, which can be saved as an XMI document for further processing.
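
The pipeline described above (annotated text, discourse graph, rewriting rules, UML output) can be illustrated with a minimal Python sketch. It does not use the SALE compiler, ANTLR, or GrGen.NET, and it is not the project's implementation. The role labels AGENT and THEME, the node types, the two rewrite rules, and the XML layout are all assumptions chosen for illustration; the output is a simplified XML document, not valid XMI.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Graph:
    """A tiny labelled graph: typed nodes plus (source, label, target) edges."""
    nodes: dict = field(default_factory=dict)   # node id -> node type
    edges: list = field(default_factory=list)   # (source id, label, target id)

    def add_node(self, node_id, node_type):
        self.nodes[node_id] = node_type

    def add_edge(self, source, label, target):
        self.edges.append((source, label, target))


def build_discourse_graph():
    """Hand-built graph for the sentence 'A customer places an order.'

    In the real process this graph would come from compiling an annotated
    text. The role names AGENT and THEME are hypothetical placeholders.
    """
    g = Graph()
    g.add_node("places", "Action")
    g.add_node("Customer", "Entity")
    g.add_node("Order", "Entity")
    g.add_edge("places", "AGENT", "Customer")
    g.add_edge("places", "THEME", "Order")
    return g


def rewrite_to_uml(discourse):
    """Apply two toy rewrite rules and return a new UML-like graph.

    Rule 1: every Entity node becomes a Class node.
    Rule 2: an Action with both an AGENT and a THEME edge becomes an
            association from the agent's class to the theme's class,
            named after the action.
    """
    uml = Graph()
    for node_id, node_type in discourse.nodes.items():
        if node_type == "Entity":
            uml.add_node(node_id, "Class")
    for action, node_type in discourse.nodes.items():
        if node_type != "Action":
            continue
        # Collect the outgoing role edges of this action node.
        roles = {label: target
                 for source, label, target in discourse.edges
                 if source == action}
        if "AGENT" in roles and "THEME" in roles:
            uml.add_edge(roles["AGENT"], action, roles["THEME"])
    return uml


def to_xml(uml):
    """Serialize the UML-like graph to a simplified XML string (not valid XMI)."""
    root = ET.Element("model")
    for node_id in uml.nodes:
        ET.SubElement(root, "class", name=node_id)
    for source, label, target in uml.edges:
        ET.SubElement(root, "association", name=label, source=source, target=target)
    return ET.tostring(root, encoding="unicode")


uml_graph = rewrite_to_uml(build_discourse_graph())
xml_text = to_xml(uml_graph)
print(xml_text)
```

The sketch keeps the discourse graph and the UML graph as separate structures so that each rule reads as a match on the source graph followed by a construction step on the target graph. A graph rewriting system such as GrGen.NET expresses this kind of pattern matching and rewriting declaratively rather than with hand-written loops.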

Apart from the annotation process, the system works without user interaction and produces UML diagrams. The annotation process can be time-consuming and is currently the bottleneck of our system. We therefore aim to provide a support tool for annotators and to (pre-)annotate texts automatically.
