Adaptive Latency for Part-of-Speech Tagging in Incremental Text-to-Speech Synthesis

1 GIPSA-CRISSP - CRISSP GIPSA-DPC - Département Parole et Cognition

Abstract : Incremental text-to-speech systems aim at synthesizing a text "on the fly", while the user is typing a sentence. In this context, this article addresses the problem of part-of-speech (POS) tagging, i.e. assigning a lexical category to each word, which is a critical step for accurate grapheme-to-phoneme conversion and prosody estimation. Here, the main challenge is to estimate the POS of a given word without knowing its "right context", i.e. the following words, which are not available yet. To address this issue, we propose a method based on a set of decision trees that estimate online whether a given POS tag is likely to be modified when more right-contextual information becomes available. In such a case, the synthesis is delayed until POS stability is guaranteed. As a result, the synthetic voice is delivered in word chunks of variable length. Objective evaluation on French shows that the proposed method is able to estimate POS tags with more than 92% accuracy compared to a non-incremental system while minimizing the synthesis latency (between 1 and 4 words). A perceptual evaluation (ranking test) is then carried out in the context of HMM-based speech synthesis. Experimental results show that the word grouping resulting from the proposed method is rated more acceptable than word-by-word incremental synthesis.
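The delay-until-stable idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the decision trees are replaced by a hypothetical hand-written predicate (`toy_is_stable`), the function name `chunk_by_stability`, the tag set, and the example sentence are all assumptions made for illustration, and the release rule is simplified to "flush the buffer as soon as the newest word looks stable". The cap of 4 words only mirrors the latency range quoted in the abstract.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Tagged = Tuple[str, str]  # (word, provisional POS tag)


def chunk_by_stability(
    tagged_words: Iterable[Tagged],
    is_stable: Callable[[List[Tagged], int], bool],
    max_latency: int = 4,
) -> Iterator[List[str]]:
    """Release buffered words as a chunk once the newest word's tag is
    judged stable, or once max_latency words are waiting (assumed cap)."""
    seen: List[Tagged] = []
    start = 0  # index of the first word not yet sent to synthesis
    for item in tagged_words:
        seen.append(item)
        pending = len(seen) - start
        if is_stable(seen, len(seen) - 1) or pending >= max_latency:
            yield [word for word, _ in seen[start:]]
            start = len(seen)
    if start < len(seen):  # end of input: flush whatever is still waiting
        yield [word for word, _ in seen[start:]]


# Hypothetical stand-in for the decision trees: tags in this set are
# treated as likely to change once more right context is available.
UNSTABLE_TAGS = {"DET", "ADJ"}


def toy_is_stable(seen: List[Tagged], index: int) -> bool:
    return seen[index][1] not in UNSTABLE_TAGS


# Illustrative input; the provisional tags are made up for the example.
SENTENCE: List[Tagged] = [
    ("la", "DET"), ("petite", "ADJ"), ("brise", "NOUN"),
    ("la", "DET"), ("glace", "NOUN"),
]

print(list(chunk_by_stability(SENTENCE, toy_is_stable)))
```

With this toy predicate, words tagged DET or ADJ are held back until a word judged stable arrives, so the output is a sequence of variable-length chunks rather than single words. A real system would re-tag the buffered words each time new context arrives and query a trained classifier instead of a fixed tag set.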

Keywords : part-of-speech, classification, TTS, natural language processing, incremental speech synthesis

Authors: Maël Pouget - Olha Nahorna - Thomas Hueber - Gérard Bailly


