Accurate Unlexicalized Parsing for Modern Hebrew

R. Tsarfaty; K. Sima'an

doi:https://doi.org/10.1007/978-3-540-74628-7_8

Authors	R. Tsarfaty; K. Sima'an
Publication date	2007
Host editors	V. Matoušek; P. Mautner
Book title	Text, Speech and Dialogue
Book subtitle	10th International Conference, TSD 2007, Pilsen, Czech Republic, September 3-7, 2007: proceedings
ISBN	9783540746270
ISBN (electronic)	9783540746287
Series	Lecture Notes in Computer Science
Event	Text, Speech and Dialogue, 10th International Conference
Pages (from-to)	39-47
Publisher	Berlin: Springer
Organisations	Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract	Many state-of-the-art statistical parsers for English can be viewed as Probabilistic Context-Free Grammars (PCFGs) acquired from treebanks consisting of phrase-structure trees enriched with a variety of contextual, derivational (e.g., markovization) and lexical information. In this paper we empirically investigate the applicability and adequacy of the unlexicalized variety of such parsing models to Modern Hebrew (MH), a Semitic language that differs in structure and characteristics from English. We show that, contrary to experience with parsing the WSJ, the markovized, head-driven unlexicalized variety does not necessarily outperform plain PCFGs for Semitic languages. We demonstrate that, on the MH treebank, enriching unlexicalized PCFGs with morphologically marked agreement features percolated up the parse tree (e.g., definiteness) outperforms both plain PCFGs and a simple head-driven variation. We further show that an (unlexicalized) head-driven variety enriched with the same features achieves even better performance. We conclude that morphologically rich languages introduce an additional dimension of parametrization that is orthogonal to the horizontal/vertical dimensions proposed before [1], and that its contribution is essential and complementary.
Document type	Conference contribution
Language	English
Published at	https://doi.org/10.1007/978-3-540-74628-7_8 (Final published version)