Expanding lexicons by inducing paradigms and validating attested forms


1 Inria Saclay - Ile de France

Abstract : One of the bottlenecks in Natural Language Processing for a given language is creating a lexicon that covers the language. The morphological lexicon provides two important pieces of information for NLP applications: 1) the normalization of a word, its lemmatization, which allows the application to recognize two variants of the same word; and 2) the part-of-speech roles that the word can play, which allows the application to parse the text, creating relations between the words in a text. Many NLP applications, e.g. Information Retrieval, Classification, Terminology Extraction, etc., depend upon the normalization and parsing information found in lexicons. When words are not present in these lexicons, it is difficult to predict what their proper lemmatizations and parts-of-speech are. In this paper we present a technique for updating a lexicon given an unknown word via induction of paradigms from an existing, but incomplete, lexicon and validation of the paradigm using corpus evidence.
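The abstract does not spell out how paradigms are represented or how corpus evidence is scored, so the following is only a minimal sketch under stated assumptions. It assumes a paradigm is the set of suffixes left after stripping the longest common prefix from a lemma and its forms, and that a candidate analysis of an unknown word is accepted when at least half of its other predicted forms are attested in a corpus vocabulary. The function names `induce_paradigms` and `propose`, the `min_support` threshold, and the toy data are all hypothetical and not taken from the paper.

```python
import os
from collections import Counter


def induce_paradigms(lexicon):
    """Collect suffix-set paradigms from a {lemma: forms} lexicon.

    Assumption: a paradigm is (lemma_suffix, frozenset of suffixes), where the
    suffixes are what remains after removing the longest common prefix.
    """
    paradigms = Counter()
    for lemma, forms in lexicon.items():
        all_forms = sorted(set(forms) | {lemma})
        stem = os.path.commonprefix(all_forms)
        suffixes = frozenset(form[len(stem):] for form in all_forms)
        paradigms[(lemma[len(stem):], suffixes)] += 1
    return paradigms


def propose(word, paradigms, corpus_vocab, min_support=0.5):
    """Rank candidate (support, paradigm_freq, lemma, forms) analyses of `word`.

    A candidate is kept only if the share of its other predicted forms that
    are attested in `corpus_vocab` reaches `min_support` (an assumed cutoff).
    """
    candidates = []
    for (lemma_suffix, suffixes), freq in paradigms.items():
        for suffix in suffixes:
            if not word.endswith(suffix):
                continue
            stem = word[:len(word) - len(suffix)]
            if not stem:
                continue
            forms = {stem + s for s in suffixes}
            others = forms - {word}
            support = len(others & corpus_vocab) / len(others) if others else 0.0
            if support >= min_support:
                candidates.append((support, freq, stem + lemma_suffix, sorted(forms)))
    candidates.sort(reverse=True)
    return candidates


# Toy, made-up data for illustration only.
lexicon = {
    "walk": {"walks", "walked", "walking"},
    "talk": {"talks", "talked", "talking"},
    "bake": {"bakes", "baked", "baking"},
}
corpus_vocab = {"tweet", "tweets", "tweeting", "tweeted"}

best = propose("tweeted", induce_paradigms(lexicon), corpus_vocab)[0]
print(best[2])  # proposed lemma: "tweet"
```

In this toy run the "walk"-style paradigm wins because all of its predicted forms for the stem "tweet" are attested, while the "bake"-style paradigm predicts unattested forms such as "tweete" and falls below the cutoff. A real system would also need to assign parts of speech and handle stem alternations, which this sketch ignores.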

Keywords : natural language processing, lexicography, computational linguistics, dictionary, lexicon

Authors: Gregory Grefenstette, Yan Qu, David Evans

Source: https://hal.archives-ouvertes.fr/

