The impact of linkage methods in hierarchical clustering for active learning to rank

Authors
Publication date 2017
Book title SIGIR'17 : proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
Book subtitle August 7-11, 2017, Shinjuku, Tokyo, Japan
ISBN (electronic)
  • 9781450350228
Event 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2017
Pages (from-to) 941-944
Number of pages 4
Publisher New York, NY: Association for Computing Machinery
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract

Document ranking is a central problem in many areas, including information retrieval and recommendation. The goal of learning to rank is to automatically create ranking models from training data. The performance of ranking models is strongly affected by the quality and quantity of training data. Collecting large-scale training samples with relevance labels involves human labor, which is time-consuming and expensive. Selective sampling and active learning techniques have been developed and proven effective in addressing this problem. However, most active methods do not scale well and need to rebuild the model after selected samples are added to the previous training set. We propose a sampling method which selects a set of instances and labels the full set only once before training the ranking model. Our method is based on hierarchical agglomerative clustering (average linkage) and we also report the performance of other linkage criteria that measure the distance between two clusters of query-document pairs. Another difference from previous hierarchical clustering is that we cluster the instances belonging to the same query, which usually outperforms the baselines.
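
The abstract's main ingredients (agglomerative clustering run separately for each query, with interchangeable linkage criteria) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the Euclidean distance over feature vectors, and the rule of labeling the instance nearest each cluster centroid are all assumptions made for illustration; the abstract does not say how instances are picked from clusters.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+

# Each linkage criterion reduces the pairwise distances between the members
# of two clusters to a single cluster-to-cluster distance.
LINKAGES = {
    "single": min,                            # closest pair
    "complete": max,                          # farthest pair
    "average": lambda ds: sum(ds) / len(ds),  # mean over all pairs
}


def agglomerate(points, n_clusters, linkage="average"):
    """Merge the two closest clusters until n_clusters remain.

    Returns a list of clusters, each a sorted list of indices into points.
    Naive implementation, intended only for small inputs.
    """
    combine = LINKAGES[linkage]
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(pair):
        a, b = pair
        return combine(
            [dist(points[i], points[j]) for i in clusters[a] for j in clusters[b]]
        )

    while len(clusters) > n_clusters:
        # combinations yields a < b, so deleting index b keeps index a valid.
        a, b = min(combinations(range(len(clusters)), 2), key=cluster_distance)
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


def select_for_labeling(features_by_query, per_query, linkage="average"):
    """Cluster each query's documents separately and pick one per cluster.

    ASSUMPTION: the representative is the instance nearest the cluster
    centroid; this selection rule is hypothetical, not taken from the paper.
    per_query is expected to be at least 1.
    """
    selected = {}
    for query, points in features_by_query.items():
        k = min(per_query, len(points))
        picks = []
        for cluster in agglomerate(points, k, linkage):
            members = [points[i] for i in cluster]
            centroid = tuple(sum(col) / len(members) for col in zip(*members))
            picks.append(min(cluster, key=lambda i: dist(points[i], centroid)))
        selected[query] = sorted(picks)
    return selected


# Toy feature vectors for query-document pairs (made-up values).
features_by_query = {
    "q1": [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.2, 5.0), (5.1, 5.3)],
    "q2": [(1.0, 1.0)],
}
to_label = select_for_labeling(features_by_query, per_query=2)
```

Here `to_label` maps each query to the indices of the documents that would be sent for labeling in a single batch, so the ranking model is trained only once afterwards. Passing `linkage="single"` or `linkage="complete"` swaps the criterion without changing anything else, which is the kind of comparison the abstract describes.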

Document type Conference contribution
Language English
Published at https://doi.org/10.1145/3077136.3080684
Other links https://www.scopus.com/pages/publications/85029390421