Variable-Length Sequence Language Model for Large Vocabulary Continuous Dictation Machine





1 PAROLE - Analysis, perception and recognition of speech, INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
2 ORPAILLEUR - Knowledge representation, reasoning, INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications

Abstract: In natural language, some word sequences are very frequent. A classical language model, such as an n-gram model, does not adequately account for these sequences, because it underestimates their probabilities. A better approach is to model word sequences as if they were individual dictionary elements: sequences are treated as additional entries in the word lexicon, over which the language models are computed. In this paper, we present two methods for automatically determining frequent phrases in unlabeled corpora of written sentences. These methods are based on information-theoretic criteria that ensure high statistical consistency. Our models reach a local optimum, since they minimize perplexity. The first procedure relies only on the n-gram language model to extract word sequences. The second relies on a class n-gram model trained on 233 classes derived from the eight grammatical classes of French. Experimental tests, in terms of perplexity and recognition rate, are carried out with a vocabulary of 20,000 words and a corpus of 43 million words extracted from the "Le Monde" newspaper. Our models reduce perplexity by more than 20% compared with n-gram nR3 and multigram models. In terms of recognition rate, our models outperform n-gram and multigram models.
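To make the idea of treating frequent sequences as lexicon entries concrete, here is a minimal sketch. It is not the authors' procedure: it assumes a plain maximum-likelihood unigram model in place of the n-gram and class n-gram models, a greedy pairwise merge, and perplexity measured on the training text itself, normalized by the original word count so that values stay comparable across tokenizations. The names `merge_pair`, `unigram_perplexity`, `learn_sequences`, `min_count`, and `max_merges` are hypothetical.

```python
import math
from collections import Counter


def merge_pair(sentences, pair, joiner="_"):
    """Rewrite every sentence so each occurrence of `pair` becomes one token."""
    merged = []
    for sent in sentences:
        out, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent) and (sent[i], sent[i + 1]) == pair:
                out.append(sent[i] + joiner + sent[i + 1])
                i += 2
            else:
                out.append(sent[i])
                i += 1
        merged.append(out)
    return merged


def unigram_perplexity(sentences, n_words):
    """Perplexity of a maximum-likelihood unigram model on its own training
    text, normalized by the ORIGINAL word count `n_words` (an assumption made
    so that different tokenizations can be compared)."""
    counts = Counter(tok for sent in sentences for tok in sent)
    total = sum(counts.values())
    log_prob = sum(c * math.log(c / total) for c in counts.values())
    return math.exp(-log_prob / n_words)


def learn_sequences(sentences, min_count=2, max_merges=10):
    """Greedily add the adjacent pair whose merge lowers perplexity the most.
    Merged tokens can be merged again, so sequences have variable length."""
    n_words = sum(len(sent) for sent in sentences)
    current = unigram_perplexity(sentences, n_words)
    learned = []
    for _ in range(max_merges):
        bigrams = Counter(p for sent in sentences for p in zip(sent, sent[1:]))
        scored = [
            (unigram_perplexity(merge_pair(sentences, pair), n_words), pair)
            for pair, count in bigrams.items()
            if count >= min_count
        ]
        if not scored:
            break
        best_pp, best = min(scored)
        if best_pp >= current:  # no candidate improves the criterion: stop
            break
        sentences = merge_pair(sentences, best)
        learned.append("_".join(best))
        current = best_pp
    return learned, sentences, current


if __name__ == "__main__":
    corpus = [s.split() for s in
              ["new york is big", "i love new york", "new york is far"]]
    sequences, retokenized, pp = learn_sequences(corpus)
    print(sequences, round(pp, 3))
```

Because the criterion here is evaluated on the training text, it tends to keep merging; the `min_count` threshold is the only guard in this sketch. A realistic setup would likely score candidates on held-out data and use a smoothed n-gram model.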

Keywords: speech recognition, n-gram, sequence model, language model, speech, sequence, language





Authors: Imed Zitouni, Jean-François Mari, Kamel Smaïli, Jean-Paul Haton

Source: https://hal.archives-ouvertes.fr/






