DTIC ADP014020: Development and Evaluation of Audio-Visual ASR: A Study on Connected Digit Recognition





Author: Defense Technical Information Center

Source: https://archive.org/


Introduction



UNCLASSIFIED

Defense Technical Information Center Compilation Part Notice

ADP014020

TITLE: Development and Evaluation of Audio-Visual ASR: A Study on Connected Digit Recognition

DISTRIBUTION: Approved for public release, distribution unlimited

This paper is part of the following report:
TITLE: Multi-modal Speech Recognition Workshop 2002
To order the complete compilation report, use: ADA415344

The component part is provided here to allow users access to individually authored sections of proceedings, annals, symposia, etc. However, the component should be considered within the context of the overall compilation report and not as a stand-alone technical report. The following component part numbers comprise the compilation report: ADP014015 thru ADP014027

UNCLASSIFIED

Development and Evaluation of Audio-Visual ASR: A Study on Connected Digit Recognition

Michael T. Chan
Rockwell Scientific Company
1049 Camino Dos Rios
Thousand Oaks, CA 91360
E-mail: mtchan@rwsc.com

Abstract

We present our findings from audio-visual speech recognition experiments for connected digit recognition in noisy environments.
We derive hybrid (geometric- and appearance-based) visual lip features using a real-time lip tracking algorithm that we proposed previously.
Using a small single-speaker corpus modeled after the TIDIGITS database, we build whole-word HMMs using both single-stream and 2-stream modeling strategies.
For the 2-stream HMM method, we use stream-dependent weights to adjust the relative contributions of the two feature streams based on the acoustic SNR level.
The 2-stream HMM consistently gave the lowest WER, with an error reduction of 83% at the -3 dB SNR level compared to the acoustic-only baseline.
A visual-only ASR WER of 6.85% was also achieved. A real-time system prototype was developed for concept demonstration.

The first is model-based or geometric-based. Examples of such features are the width and height of the mouth (and their temporal derivatives) that can be estimated from the images...
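The abstract's 2-stream HMM combines audio and visual observation streams with stream-dependent exponents (weights) chosen according to the acoustic SNR. A minimal sketch of this weighted log-likelihood fusion is shown below; the linear SNR-to-weight mapping and the `snr_low`/`snr_high` bounds are illustrative assumptions, not the paper's actual weight schedule.

```python
def fused_log_likelihood(log_p_audio, log_p_video, snr_db,
                         snr_low=-3.0, snr_high=20.0):
    """Combine per-stream HMM log-likelihoods with an SNR-dependent
    audio weight. In a 2-stream HMM this weighting is applied to the
    per-state, per-frame stream log-likelihoods during decoding.

    The mapping below (linear in SNR, clipped to [0, 1]) is a
    hypothetical schedule for illustration only.
    """
    # Map SNR to an audio weight in [0, 1]; noisier audio -> lower weight.
    t = (snr_db - snr_low) / (snr_high - snr_low)
    w_audio = min(1.0, max(0.0, t))
    w_video = 1.0 - w_audio  # weights constrained to sum to 1

    # Weighted combination: equivalent to raising each stream's
    # likelihood to its weight and multiplying.
    return w_audio * log_p_audio + w_video * log_p_video
```

At very low SNR the visual stream dominates (consistent with the large error reduction reported at -3 dB), while at high SNR the fusion reduces to the acoustic-only score.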





