Comparison of Automatic and Expert Teachers’ Rating of Computerized English Listening-Speaking Test

Cao Linlin


Using Many-Facet Rasch analysis, this study examines rating differences between one automatic computer rater and five expert teacher raters in scoring 119 students on a computerized English listening-speaking test. Results indicate that both the automatic rater and the teacher raters demonstrate good inter-rater reliability, though the automatic rater shows lower intra-rater reliability than the college teacher and high school teacher raters under the stringent infit limits. Neither the automatic rater nor the human raters exhibit central tendency or random effects. This research provides evidence for the automatic rating reform of the computerized English listening-speaking test (CELST) in the Guangdong NMET and encourages the application of the Many-Facet Rasch Model (MFRM) in actual score monitoring.

This work is licensed under a Creative Commons Attribution 4.0 License.