A Maximum Entropy Approach to Sentence Boundary Detection of Vietnamese Texts





1 KIWI - Knowledge Information and Web Intelligence, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
2 MSI - Modélisation et Simulation Informatique de systèmes complexes

Abstract: We present, for the first time, a system for detecting sentence boundaries in Vietnamese texts. The system is based on a maximum entropy model, and its training procedure requires no hand-crafted rules, lexicon, or domain-specific information. Given a corpus annotated with sentence boundaries, the model learns to classify each occurrence of a potential end-of-sentence punctuation mark as either a valid or an invalid sentence boundary. On a Vietnamese corpus, the system achieves a recall of about 95%. The approach is implemented in a software tool named vnSentDetector, a plug-in for the open source framework vnToolkit, which is intended as a general framework integrating useful tools for processing Vietnamese texts.
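
The classification step described in the abstract can be illustrated with a minimal sketch. It is not the vnSentDetector implementation: the feature set, the toy sentences, the hyperparameters, and every function name below are assumptions made for illustration. It relies only on the fact that a two-class maximum entropy model is equivalent to logistic regression over the same features, and trains it with plain stochastic gradient ascent.

```python
import math
from collections import defaultdict

CANDIDATES = ".!?"  # assumed set of potential end-of-sentence marks

def candidates(text):
    """Return the index of every potential end-of-sentence mark."""
    return [i for i, ch in enumerate(text) if ch in CANDIDATES]

def features(text, i):
    """Hypothetical contextual features for the candidate at index i."""
    left, right = text[:i].split(), text[i + 1:].split()
    prev_tok = left[-1] if left else ""
    next_tok = right[0] if right else ""
    feats = {"bias": 1.0, "prev=" + prev_tok.lower(): 1.0}
    if prev_tok.isupper():                      # e.g. the abbreviation "TP"
        feats["prev_is_upper"] = 1.0
    if prev_tok.isdigit():                      # e.g. "3" in "3.500"
        feats["prev_is_digit"] = 1.0
    if next_tok[:1].isupper():
        feats["next_is_upper"] = 1.0
    if i + 1 < len(text) and not text[i + 1].isspace():
        feats["no_space_after"] = 1.0
    return feats

def predict_proba(w, feats):
    """P(boundary | context) under the binary maximum entropy model."""
    z = sum(w.get(name, 0.0) * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=500, lr=0.1, l2=0.01):
    """Fit weights by stochastic gradient ascent on the log-likelihood."""
    w = defaultdict(float)
    for _ in range(epochs):
        for feats, label in examples:
            p = predict_proba(w, feats)
            for name, value in feats.items():
                w[name] += lr * ((label - p) * value - l2 * w[name])
    return w

def make_examples(sentences):
    """Join gold sentences and label each candidate as boundary (1) or not (0)."""
    text, gold, offset = " ".join(sentences), set(), 0
    for s in sentences:
        gold.add(offset + len(s) - 1)           # last character of each sentence
        offset += len(s) + 1
    return [(features(text, i), 1 if i in gold else 0) for i in candidates(text)]

def split_sentences(text, w, threshold=0.5):
    """Cut the text after every candidate the model accepts as a boundary."""
    sentences, start = [], 0
    for i in candidates(text):
        if predict_proba(w, features(text, i)) >= threshold:
            sentences.append(text[start:i + 1].strip())
            start = i + 1
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences

# Toy training data invented for this sketch, not taken from the paper's corpus.
TOY_SENTENCES = [
    "Tôi sống ở TP. Hồ Chí Minh.",
    "Giá vé là 3.500 đồng.",
    "TS. Nam đến Hà Nội hôm qua.",
    "Bạn có khỏe không?",
    "Tôi học ở TP. Huế.",
]

weights = train(make_examples(TOY_SENTENCES))
print(split_sentences("Anh Nam ở TP. Huế. Giá vé là 2.000 đồng.", weights))
```

In this sketch, the period after the abbreviation "TP" and the period inside "2.000" are candidates that the model should learn to reject, while the periods that close each sentence are the ones it should accept. A real system would need a far larger annotated corpus and a richer feature set than these few hand-picked cues.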





Authors: Hong Phuong Le, Tuong Vinh Ho

Source: https://hal.archives-ouvertes.fr/






