A Unified Framework for Learned Sparse Retrieval

Open Access
Authors
Publication date 2023
Host editors
  • J. Kamps
  • L. Goeuriot
  • F. Crestani
  • M. Maistro
  • H. Joho
  • B. Davis
  • C. Gurrin
  • U. Kruschwitz
  • A. Caputo
Book title Advances in Information Retrieval
Book subtitle 45th European Conference on Information Retrieval, ECIR 2023, Dublin, Ireland, April 2–6, 2023 : proceedings
ISBN
  • 9783031282409
ISBN (electronic)
  • 9783031282416
Series Lecture Notes in Computer Science
Event 45th European Conference on Information Retrieval
Volume | Issue number III
Pages (from-to) 101-116
Publisher Cham: Springer
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Learned sparse retrieval (LSR) is a family of first-stage retrieval methods that are trained to generate sparse lexical representations of queries and documents for use with an inverted index. Many LSR methods have been recently introduced, with Splade models achieving state-of-the-art performance on MSMarco. Despite similarities in their model architectures, many LSR methods show substantial differences in effectiveness and efficiency. Differences in the experimental setups and configurations used make it difficult to compare the methods and derive insights. In this work, we analyze existing LSR methods and identify key components to establish an LSR framework that unifies all LSR methods under the same perspective. We then reproduce all prominent methods using a common codebase and re-train them in the same environment, which allows us to quantify how components of the framework affect effectiveness and efficiency. We find that (1) including document term weighting is most important for a method’s effectiveness, (2) including query weighting has a small positive impact, and (3) document expansion and query expansion have a cancellation effect. As a result, we show how removing query expansion from a state-of-the-art model can reduce latency significantly while maintaining effectiveness on MSMarco and TripClick benchmarks.
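As a rough illustration of the retrieval step the abstract describes, the sketch below scores sparse query and document representations through an inverted index using a weighted dot product. It is a minimal toy under stated assumptions, not the paper's codebase: the function names (`build_inverted_index`, `search`), the document ids, and all term weights are hypothetical and hand-written, whereas an actual LSR method would produce the weights with trained query and document encoders.

```python
from collections import defaultdict


def build_inverted_index(doc_vectors):
    """Map each term to a postings list of (doc_id, weight) pairs."""
    index = defaultdict(list)
    for doc_id, vector in doc_vectors.items():
        for term, weight in vector.items():
            if weight > 0:  # only non-zero weights are stored in a sparse index
                index[term].append((doc_id, weight))
    return index


def search(index, query_vector, k=10):
    """Score documents by the dot product of query and document term weights."""
    scores = defaultdict(float)
    for term, query_weight in query_vector.items():
        # Each query term costs one postings-list traversal.
        for doc_id, doc_weight in index.get(term, []):
            scores[doc_id] += query_weight * doc_weight
    # Highest score first; ties broken by doc_id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical, hand-written sparse representations (not model output).
docs = {
    "d1": {"sparse": 1.5, "retrieval": 1.0, "index": 0.5},
    "d2": {"dense": 1.25, "retrieval": 1.0},
}
index = build_inverted_index(docs)

# With query term weighting: each query term carries its own weight.
weighted_query = {"sparse": 2.0, "retrieval": 0.5}
# Without query term weighting: every query term counts equally.
binary_query = {"sparse": 1.0, "retrieval": 1.0}

results = search(index, weighted_query)
binary_results = search(index, binary_query)
```

In this toy setup, every query term triggers one postings-list traversal, so a query with fewer terms touches fewer postings. That is at least consistent with the abstract's observation that removing query expansion can reduce latency, although the sketch itself measures nothing.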
Document type Conference contribution
Language English
Published at
  • https://doi.org/10.48550/arXiv.2303.13416
  • https://doi.org/10.1007/978-3-031-28241-6_7
Downloads
2303.13416v1 (Accepted author manuscript)