Linking Wikipedia to the web
| Authors | |
|---|---|
| Publication date | 2010 |
| Host editors | |
| Book title | SIGIR 2010: proceedings: 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval: Geneva, Switzerland, July 19-23, 2010 |
| ISBN | |
| Event | 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2010), Geneva, Switzerland |
| Pages (from-to) | 839-840 |
| Publisher | New York, NY: Association for Computing Machinery |
| Organisations | |
| Abstract | We investigate the task of finding links from Wikipedia pages to external web pages. Such external links significantly extend the information in Wikipedia with information from the Web at large, while retaining the encyclopedic organization of Wikipedia. We use a language modeling approach to create full-text and anchor-text runs, and experiment with different document priors. In addition, we explore whether the social bookmarking site Delicious can be exploited to further improve performance. We have constructed a test collection of 53 topics, which are Wikipedia pages on different entities. We find that the anchor-text index is a very effective way to retrieve home pages. URL class and anchor-text length priors, and their combination, lead to the best results. Using Delicious on its own does not give very good results, but it does contain valuable information. Combining the best anchor-text run with the Delicious run leads to further improvements. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.1145/1835449.1835642 |
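The abstract names three ingredients: language-model retrieval, document priors (such as a URL-class prior), and the combination of runs. The Python sketch below shows one common way to wire these together. It is not the authors' implementation. Dirichlet smoothing and its `mu` value, the dot-based file-name heuristic in `url_class`, and the min-max normalized linear fusion in `combine` are all assumptions, and every function name is hypothetical. The root/subroot/path/file classes follow the URL typing commonly used for home-page finding.

```python
import math
from collections import Counter
from urllib.parse import urlparse


def url_class(url):
    """Assign a URL to root, subroot, path or file.
    Heuristic (assumption): a last segment containing a dot is a file name."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if segments and segments[-1].lower() in ("index.html", "index.htm"):
        segments = segments[:-1]  # a trailing index page counts as its directory
    elif segments and "." in segments[-1]:
        return "file"
    if not segments:
        return "root"
    if len(segments) == 1:
        return "subroot"
    return "path"


def log_query_likelihood(query, doc, coll_tf, coll_len, mu=2500.0):
    """log P(q|d) with Dirichlet smoothing (assumed; the abstract names no method)."""
    tf = Counter(doc)
    score = 0.0
    for term in query:
        p_c = coll_tf.get(term, 0) / coll_len
        if p_c == 0.0:
            continue  # term unseen in the collection: skip rather than take log(0)
        score += math.log((tf[term] + mu * p_c) / (len(doc) + mu))
    return score


def score(query, doc, coll_tf, coll_len, prior):
    """Rank-equivalent log P(d|q) = log P(d) + log P(q|d) for a document prior P(d)."""
    return math.log(prior) + log_query_likelihood(query, doc, coll_tf, coll_len)


def combine(run_a, run_b, weight=0.5):
    """Weighted sum of min-max normalized scores (hypothetical fusion scheme).
    Assumes both runs are non-empty; a document missing from a run gets 0 there."""
    def norm(run):
        lo, hi = min(run.values()), max(run.values())
        span = (hi - lo) or 1.0
        return {d: (s - lo) / span for d, s in run.items()}

    a, b = norm(run_a), norm(run_b)
    return {d: weight * a.get(d, 0.0) + (1 - weight) * b.get(d, 0.0)
            for d in set(a) | set(b)}
```

Here `prior` would be, for example, the estimated probability that a page of a given URL class (or anchor-text length) is relevant, estimated on training topics. No prior values or fusion weights from the paper are implied.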
