The Denoised Web Treebank: Evaluating Dependency Parsing under Noisy Input Conditions

Open Access
Authors
Publication date 2016
Host editors
  • N. Calzolari
  • K. Choukri
  • T. Declerck
  • S. Goggi
  • M. Grobelnik
  • B. Maegaard
  • J. Mariani
  • H. Mazo
  • A. Moreno
  • J. Odijk
  • S. Piperidis
Book title LREC 2016 : Tenth International Conference on Language Resources and Evaluation
Book subtitle May 23-28, 2016, Grand Hotel Bernardin Conference Center, Portorož, Slovenia
ISBN (electronic)
  • 9782951740891
Event Language Resources and Evaluation Conference (LREC 2016)
Pages (from-to) 649-653
Publisher Paris: European Language Resources Association (ELRA)
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract We introduce the Denoised Web Treebank: a treebank including a normalization layer and a corresponding evaluation metric for dependency parsing of noisy text, such as Tweets. This benchmark enables the evaluation of parser robustness as well as text normalization methods, including normalization as machine translation and unsupervised lexical normalization, directly on syntactic trees. Experiments show that text normalization together with a combination of domain-specific and generic part-of-speech taggers can lead to a significant improvement in parsing accuracy on this test set.
Document type Conference contribution
Language English
Published at http://www.lrec-conf.org/proceedings/lrec2016/summaries/86.html
Other links http://www.lrec-conf.org/proceedings/lrec2016/index.html