Investigating the Suitability of Implementing the e-rater® Scoring Engine in a Large-Scale English Language Testing Program. Research Report. ETS RR-13-36







ETS Research Report Series, Dec 2013

In this research, we investigated the suitability of implementing "e-rater"® automated essay scoring in a high-stakes, large-scale English language testing program. We examined the effectiveness of generic scoring and two variants of prompt-based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement between the automated and the human score and relations with criterion variables. Results showed that the sample size was generally not sufficient for prompt-specific scoring. For the generic scoring model, automated scores agreed with human raters as strongly as, or more strongly than, human raters agreed with one another for more than 97% of the prompts. Substituting e-rater for the second human rater had no practically important impact on test takers' scores at either the item or the total test score level. However, neither the automated scoring models nor the human raters performed invariantly across all prompts or across different test countries/territories. Further investigation pointed to homogeneity in the examinee population, possibly nested within test countries/territories, as one potential cause of this lack of invariance. Among other limitations, findings may not be generalizable beyond the examinee population investigated in this study.
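The abstract centers on agreement between automated and human scores. In automated-scoring research, such agreement is often summarized with quadratic-weighted kappa, which penalizes large score discrepancies more heavily than small ones. As a minimal sketch of that statistic (the function name and the illustrative 1–4 score scale are assumptions for illustration, not details from the report):

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Quadratic-weighted kappa between two integer score vectors on a fixed scale.

    1.0 = perfect agreement; 0.0 = chance-level agreement.
    """
    n = len(rater_a)
    k = max_score - min_score + 1
    # Observed joint score distribution (proportions)
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1.0 / n
    # Marginal distributions; their product gives the chance-expected distribution
    marg_a = [sum(row) for row in observed]
    marg_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2  # quadratic disagreement weight
            num += w * observed[i][j]
            den += w * marg_a[i] * marg_b[j]
    return 1.0 - num / den

# Example on a hypothetical 1-4 scale: adjacent (off-by-one) agreement
print(quadratic_weighted_kappa([1, 2, 3, 4], [2, 1, 4, 3], 1, 4))  # 0.6
```

In practice, studies like this one typically report kappa alongside exact/adjacent agreement rates and correlations, since each statistic captures a different facet of human–machine consistency.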

Descriptors: Computer Assisted Testing, Computer Software, Scoring, Language Tests, Prompting, English (Second Language), Second Language Learning, Evaluators, Essays, Models, Test Items, Scores, Generalization, Writing Tests, Sample Size, Testing Programs, Comparative Analysis, Interrater Reliability, Participant Characteristics, Correlation, Simulation, Item Analysis, Foreign Countries, Regression (Statistics)

Educational Testing Service. Rosedale Road, MS19-R Princeton, NJ 08541. Tel: 609-921-9000; Fax: 609-734-5410; e-mail: RDweb[at]ets.org; Web site: https://www.ets.org/research/policy_research_reports/ets





Authors: Zhang, Mo; Breyer, F. Jay; Lorenz, Florian

Source: https://eric.ed.gov/?q=a&ft=on&ff1=dtySince_1992&pg=3223&id=EJ1109947






