Modelling phonetic reduction in a corpus of spoken English using Random Forests and Mixed-Effects Regression

Keywords: Random Forests, Phonetic Reduction, Phonetics, Linguistics, Mixed-Effects Regression

Author: Dilts, Philip C

Supervisor and department: Baayen, R. Harald (Linguistics); Tucker, Benjamin V. (Linguistics)

Examining committee member and department: Arppe, Antti (Linguistics); Gahl, Susanne (Linguistics, University of California, Berkeley); Kondrak, Grzegorz (Computing Science)

Department: Department of Linguistics


Date accepted: 2013-09-30T15:09:05Z

Graduation date: 2013-11

Degree: Doctor of Philosophy

Degree level: Doctoral

Abstract: In this thesis, phonetic reduction in the Buckeye Corpus (Pitt et al. 2005) of conversational speech is modelled using advanced statistical techniques. Two measures of phonetic reduction are modelled: reduction in the duration of words and deletion of segments from words. Statistical modelling techniques are used to predict how much of each type of reduction is observed in the corpus. Predictor variables are selected from a number of broad classes, including demographic, phonetic, predictability, syntactic, semantic, and pragmatic variables. The broad scope of these variables leads to a generalizable picture of the factors leading to reduction in spontaneous speech. Two modelling techniques with complementary properties are applied to the modelling task: Random Forest (RF) models (Breiman 2001) and Linear Mixed-Effects Regression (LMER) models. RF models can be used to model complex interactions and highly collinear predictor variables much more easily than LMER models can. Conversely, LMER models allow each word form and speaker to differ in their response to reduction-predicting variables. LMER models can also easily incorporate predictor variables composed of a large number of unordered categories. Both of these properties of LMER models are effectively impossible to incorporate into current RF models on the scale required for the present study. Results relating to the variables or combinations of variables that correlate with reduction or improve model prediction are described. Possible explanations for the results and implications for the nature of the processes underlying reduction during spontaneous speech are explored. Results relating to the modelling process are also discussed. In particular, random forest modelling indicated that several potential interactions between variables were overlooked in initial LMER modelling. When these interactions were included in a second round of LMER modelling, several were found to improve prediction significantly. The results of the present study may lead to improvements in speech recognition and speech production technologies. The results also suggest that random forests can be used to improve regression models of language data.

Language: English

DOI: doi:10.7939/R3KH0F719

Rights: Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.


