Influence of Pre-annotation on POS-tagged Corpus Development


1 INIST - Institut de l'information scientifique et technique
2 LIPN - Laboratoire d'Informatique de Paris-Nord
3 ALPAGE - Analyse Linguistique Profonde à Grande Echelle; Large-scale deep linguistic processing (Inria Paris-Rocquencourt, UPD7 - Université Paris Diderot - Paris 7)

Abstract: This article details a series of carefully designed experiments evaluating the influence of automatic pre-annotation on the manual part-of-speech annotation of a corpus, in terms of both quality and time, with particular attention to biases. For this purpose, we manually annotated parts of the Penn Treebank corpus under various experimental setups, either from scratch or using various pre-annotations. These experiments confirm and detail the gain in quality observed in earlier work, while showing that biases do appear and should be taken into account. Finally, they demonstrate that even a not-so-accurate tagger can help improve annotation speed.

Authors: Karën Fort, Benoît Sagot


