Empirical Evaluation of Semi-automated XML Annotation of Text Documents with the GoldenGATE Editor

  • Conference:

    Conference paper 

  • Authors:

    Guido Sautter
    Clemens Böhm
    Frank Padberg
    Walter F. Tichy
     

  • Summary:

    Digitized scientific documents should be marked up according to domain-specific XML schemas, to make maximum use of their content. Such markup allows for advanced, semantics-based access to the document collection. Many NLP applications have been developed to support automated annotation. But NLP results often are not accurate enough and manual corrections are indispensable. We therefore have developed the GoldenGATE editor, a tool that integrates NLP applications and assistance features for manual XML editing. Plain XML editors do not feature such a tight integration: Users have to create the markup manually or move the documents back and forth between the editor and (mostly command line) NLP tools. This paper features the first empirical evaluation of how users benefit from such a tight integration when creating semantically rich digital libraries. We have conducted experiments with humans who had to perform markup tasks on a document collection from a generic domain. The results show clearly that markup editing assistance in tight combination with NLP functionality significantly reduces the user effort in annotating documents. 

  • Year:

    2007 


BibTeX

@inproceedings{978-3-540-74850-2,
author={Guido Sautter and Clemens B{\"o}hm and Frank Padberg and Walter F. Tichy},
title={Empirical Evaluation of Semi-automated XML Annotation of Text Documents with the GoldenGATE Editor},
year=2007,
month=Sep,
booktitle={Research and Advanced Technology for Digital Libraries},
publisher={Springer},
volume={4675},
series={Lecture Notes in Computer Science},
howpublished={Proc. 11th European Conference on Research and Advanced Technology for Digital Libraries, Budapest, Hungary},
abstract={Digitized scientific documents should be marked up according to domain-specific XML schemas, to make maximum use of their content. Such markup allows for advanced, semantics-based access to the document collection. Many NLP applications have been developed to support automated annotation. But NLP results often are not accurate enough and manual corrections are indispensable. We therefore have developed the GoldenGATE editor, a tool that integrates NLP applications and assistance features for manual XML editing. Plain XML editors do not feature such a tight integration: Users have to create the markup manually or move the documents back and forth between the editor and (mostly command line) NLP tools. This paper features the first empirical evaluation of how users benefit from such a tight integration when creating semantically rich digital libraries. We have conducted experiments with humans who had to perform markup tasks on a document collection from a generic domain. The results show clearly that markup editing assistance in tight combination with NLP functionality significantly reduces the user effort in annotating documents.},
pages={357--367},