How different are language models and word clouds?

R. Kaptein; D. Hiemstra; J. Kamps

DOI: https://doi.org/10.1007/978-3-642-12275-0_48

Authors	R. Kaptein; D. Hiemstra; J. Kamps
Publication date	2010
Host editors	C. Gurrin; Y. He; G. Kazai; U. Kruschwitz; S. Little; T. Roelleke; S. Rüger; K. van Rijsbergen
Book title	Advances in Information Retrieval
Book subtitle	32nd European Conference on IR Research, ECIR 2010, Milton Keynes, UK, March 28-31, 2010: proceedings
ISBN	9783642122743
ISBN (electronic)	9783642122750
Series	Lecture Notes in Computer Science
Event	32nd European Conference on IR Research (ECIR 2010), Milton Keynes, UK
Pages (from-to)	556-568
Publisher	Berlin: Springer
Organisations	Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract	Word clouds are a summarised representation of a document’s text, similar to tag clouds which summarise the tags assigned to documents. Word clouds are similar to language models in the sense that they represent a document by its word distribution. In this paper we investigate the differences between word cloud and language modelling approaches, and specifically whether effective language modelling techniques also improve word clouds. We evaluate the quality of the language model using a system evaluation test bed, and evaluate the quality of the resulting word cloud with a user study. Our experiments show that different language modelling techniques can be applied to improve a standard word cloud that uses a TF weighting scheme in combination with stopword removal. Including bigrams in the word clouds and a parsimonious term weighting scheme are the most effective in both the system evaluation and the user study.
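The abstract contrasts a baseline word cloud (TF weighting with stopword removal) with language-modelling refinements, namely bigrams and parsimonious term weighting. As an illustration only, the Python sketch below shows one way these weightings could be computed. It follows the general EM formulation commonly used for parsimonious language models, where probability mass that a background collection model already explains is moved away from common terms. The stopword list, the tokeniser, the mixing weight lam, the pruning threshold, and all function names are assumptions made for this example. They are not the paper's implementation or settings.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption; the paper's actual list is not given here).
STOPWORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "of", "on",
             "that", "the", "to", "with"}


def extract_terms(text, use_bigrams=False):
    """Count unigrams after stopword removal, optionally adding bigrams
    whose two words are both non-stopwords (the TF baseline)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    terms = [t for t in tokens if t not in STOPWORDS]
    if use_bigrams:
        terms += [f"{a} {b}" for a, b in zip(tokens, tokens[1:])
                  if a not in STOPWORDS and b not in STOPWORDS]
    return Counter(terms)


def collection_model(texts, use_bigrams=False):
    """Maximum-likelihood background model P(t|C) over a set of texts.
    Use the same use_bigrams setting as for the document."""
    counts = Counter()
    for text in texts:
        counts.update(extract_terms(text, use_bigrams))
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def parsimonious_model(doc_tf, p_coll, lam=0.1, threshold=1e-4, iterations=50):
    """EM estimate of a parsimonious document model P(t|D).
    lam (document-model weight) and threshold are hypothetical defaults."""
    total = sum(doc_tf.values())
    p_doc = {t: c / total for t, c in doc_tf.items()}  # start from TF (MLE)
    for _ in range(iterations):
        # E-step: expected count of each term generated by the document model
        # rather than by the background model.
        e = {}
        for t, tf in doc_tf.items():
            pd = p_doc.get(t, 0.0)
            if pd > 0.0:
                pc = p_coll.get(t, 0.0)
                e[t] = tf * lam * pd / ((1 - lam) * pc + lam * pd)
        # M-step: normalise, drop terms below the threshold, renormalise.
        norm = sum(e.values())
        kept = {t: v / norm for t, v in e.items() if v / norm >= threshold}
        mass = sum(kept.values())
        p_doc = {t: v / mass for t, v in kept.items()}
    return p_doc


def top_terms(weights, k=10):
    """The k highest-weighted terms, i.e. the words that would appear in the cloud."""
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)[:k]
```

For example, top_terms(parsimonious_model(extract_terms(doc, True), collection_model(background, True)), k=25) would give the 25 terms of a parsimonious cloud that includes bigrams, where doc and background are placeholder texts. Building the background model with the same use_bigrams setting matters: a term absent from p_coll faces no background competition and keeps its full count in the E-step.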
Document type	Conference contribution
Language	English
Published at	https://doi.org/10.1007/978-3-642-12275-0_48 (Final published version)
