Learning Structural Dependencies of Words in the Zipfian Tail

Authors
Publication date 2011
Host editors
  • H. Bunt
  • J. Nivre
  • Ö. Çetinoğlu
Book title Proceedings of the 12th International Conference on Parsing Technologies
Book subtitle IWPT 2011 : October 5-7, 2011, Dublin City University
ISBN
  • 9781932432046
Event IWPT 2011
Pages (from-to) 80-91
Publisher New Brunswick, NJ: Association for Computational Linguistics
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract
Using semi-supervised EM, we learn fine-grained but sparse lexical parameters of a generative parsing model (a PCFG) initially estimated over the Penn Treebank. Our lexical parameters employ supertags, which encode complex structural information at the pre-terminal level, and are particularly sparse in labeled data; our goal is to learn these for words that are unseen or rare in the labeled data. In order to guide estimation from unlabeled data, we incorporate both structural and lexical priors from the labeled data. We get a large error reduction in parsing ambiguous structures associated with unseen verbs, the most important case of learning lexico-structural dependencies. We also obtain a statistically significant improvement in labeled bracketing score of the treebank PCFG, the first successful improvement via semi-supervised EM of a generative structured model already trained over large labeled data.
Document type Conference contribution
Language English
Published at http://www.aclweb.org/anthology/W/W11/W11-2911.pdf
Other links http://www.aclweb.org/anthology/sigparse.html#2011_1