Linking Wikipedia to the web

Authors
Publication date 2010
Host editors
  • H.-H. Chen
  • E.N. Efthimiadis
  • J. Savoy
  • F. Crestani
  • S. Marchand-Maillet
Book title SIGIR 2010: proceedings: 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval: Geneva, Switzerland, July 19-23, 2010
ISBN
  • 9781450301534
Event 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2010), Geneva, Switzerland
Pages (from-to) 839-840
Publisher New York, NY: Association for Computing Machinery
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract
We investigate the task of finding links from Wikipedia pages to external web pages. Such external links significantly extend the information in Wikipedia with information from the Web at large, while retaining the encyclopedic organization of Wikipedia. We use a language modeling approach to create full-text and anchor text runs, and experiment with different document priors. In addition, we explore whether the social bookmarking site Delicious can be exploited to further improve performance. We have constructed a test collection of 53 topics, which are Wikipedia pages on different entities. Our findings are that the anchor text index is a very effective method to retrieve home pages. URL class and anchor text length priors, and their combination, lead to the best results. Using Delicious on its own does not lead to very good results, but it does contain valuable information. Combining the best anchor text run and the Delicious run leads to further improvements.
Document type Conference contribution
Language English
Published at https://doi.org/10.1145/1835449.1835642