Applying automatically parsed corpora to the study of language variation

Open Access
Authors
Publication date 2014
Host editors
  • J. Tsujii
  • J. Hajic
Book title COLING 2014: the 25th International Conference on Computational Linguistics
Book subtitle proceedings of COLING 2014: technical papers: August 23-29, 2014, Dublin, Ireland
ISBN
  • 9781941643266
Event COLING 2014
Pages (from-to) 1974-1984
Publisher Stroudsburg, PA: Association for Computational Linguistics
Organisations
  • Faculty of Humanities (FGw) - Amsterdam Institute for Humanities Research (AIHR) - Amsterdam Center for Language and Communication (ACLC)
Abstract
In this work, we discuss the benefits of using automatically parsed corpora to study language variation. The study of language variation is an area of linguistics in which quantitative methods have been particularly successful. We argue that the large datasets that can be obtained using automatic annotation can help drive further research in this direction, providing sufficient data for the increasingly complex models used to describe variation. We demonstrate this by replicating and extending a previous quantitative variation study that used manually and semi-automatically annotated data.
We show that while the study cannot be replicated completely due to limitations of the existing automatic annotation, we can draw at least the same conclusions as the original study. In addition, we demonstrate the flexibility of this method by extending the findings to related linguistic constructions and to another domain of text, using additional data.
Document type Conference contribution
Language English
Published at http://www.aclweb.org/anthology/C14-1186
Downloads
C14-1186 (Final published version)