Automated Classification of Written Proficiency Levels on the CEFR-Scale through Complexity Contours and RNNs

Open Access
Authors
  • M. Ströbel
Publication date 2021
Host editors
  • J. Burstein
  • A. Horbach
  • E. Kochmar
  • R. Laarmann-Quante
  • C. Leacock
  • N. Madnani
  • I. Pilán
  • H. Yannakoudakis
  • T. Zesch
Book title Innovative Use of NLP for Building Educational Applications
Book subtitle EACL 2021 : proceedings of the 16th workshop : April 20, 2021
ISBN (electronic)
  • 9781954085114
Event 16th Workshop on Innovative Use of NLP for Building Educational Applications
Pages (from-to) 199-209
Publisher Stroudsburg, PA: Association for Computational Linguistics
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract
Automatically predicting the level of second language (L2) learner proficiency is an emerging topic of interest and research based on machine learning approaches to language learning and development. The key to the present paper is the combined use of what we refer to as ‘complexity contours’, a series of measurements of indices of L2 proficiency obtained by a computational tool that implements a sliding window technique, and recurrent neural network (RNN) classifiers that adequately capture the sequential information in those contours. We used the EF-Cambridge Open Language Database (Geertzen et al. 2013) with its labelled Common European Framework of Reference (CEFR) levels (Council of Europe 2018) to predict six classes of L2 proficiency levels (A1, A2, B1, B2, C1, C2) in the assessment of writing skills. Our experiments demonstrate that an RNN classifier trained on complexity contours achieves higher classification accuracy than one trained on text-average complexity scores. In a secondary experiment, we determined the relative importance of features from four distinct categories through a sensitivity-based pruning technique. Our approach makes an important contribution to the field of automated identification of language proficiency levels, more specifically, to the increasing efforts towards the empirical validation of CEFR levels.
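The sliding-window idea behind a 'complexity contour' can be illustrated with a minimal sketch. It is not the computational tool used in the paper: the function names, the window size, and the toy measure (mean word length) are assumptions chosen only to show how a per-window sequence differs from a single text-average score. The RNN classifier is not shown.

```python
from statistics import mean


def mean_word_length(sentences):
    """Toy complexity measure (an assumption, not one of the paper's indices)."""
    words = [w for s in sentences for w in s.split()]
    return mean(len(w) for w in words) if words else 0.0


def complexity_contour(sentences, window=3, measure=mean_word_length):
    """Apply `measure` to each window of `window` consecutive sentences.

    Returns one score per window position, i.e. a sequence (the contour)
    rather than a single text-level average.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(sentences) <= window:
        # Text shorter than one window: the contour collapses to one value.
        return [measure(sentences)]
    return [
        measure(sentences[i:i + window])
        for i in range(len(sentences) - window + 1)
    ]


# Hypothetical three-sentence learner text.
text = ["a bb", "ccc dddd", "eeeee ffffff"]
contour = complexity_contour(text, window=2)   # sequence input for a sequence model
text_average = mean_word_length(text)          # single-score baseline
```

In this sketch the contour keeps the rise in word length across the text, whereas the text average reduces it to one number. This is the kind of sequential information the abstract says an RNN classifier can exploit.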
Document type Conference contribution
Note With supplementary material
Language English
Published at https://aclanthology.org/2021.bea-1.21