# Transformers for search: Retrieval, robustness, and refusal
| Authors | |
|---|---|
| Supervisors | |
| Co-supervisors | |
| Award date | 13-02-2026 |
| ISBN | |
| Number of pages | 102 |
| Organisations | |
| Document type | PhD thesis |
| Language | English |

## Abstract
Access to information has never been easier, quicker, or more fragile. As language models increasingly dominate search and question answering, the boundary between retrieving information and generating it has blurred. Systems designed to find relevant documents now routinely produce complete answers and explanations, stitched together from model memory and retrieved evidence. While retrieval-augmented models make it easy to answer complex questions, this convenience often hides an underlying fragility: these systems work best when test conditions resemble their training data, and they often fail when they do not.
Modern information retrieval systems typically follow a pipeline architecture in which a retriever selects candidate documents and a generator produces an answer conditioned on them, so retrieval and generation are tightly coupled. Two requirements are critical for reliable performance: generalizability, where retrievers remain effective across new datasets, domains, and languages; and grounded answers, where generators base their outputs on retrieved evidence and abstain from answering when that evidence is missing.

This thesis studies these requirements together. It examines how training data augmentation and negative sampling shape dense retrievers under distribution shift, proposes methods that improve robustness across domains and languages, and investigates how small, open-source language models can be trained to reason over retrieved evidence and refuse to answer when evidence is insufficient. Finally, it emphasizes accessibility through an open-source library, Simple Transformers, which lowers the barrier to building and reproducing transformer-based retrieval and question answering systems.
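As a rough illustration of the retrieve-then-generate pipeline with refusal described above, here is a minimal, self-contained Python sketch. Everything in it is hypothetical: the bag-of-words scorer stands in for a dense encoder, the fixed score threshold stands in for the evidence checks the thesis studies, and the corpus is toy data.

```python
import math
import re
from collections import Counter

# Toy corpus standing in for a retrieval index; all data here is illustrative.
CORPUS = [
    "Dense retrievers encode queries and documents into a shared vector space.",
    "Negative sampling picks hard non-relevant documents during training.",
    "Retrieval-augmented generation conditions an answer on retrieved evidence.",
]

def embed(text):
    """Crude bag-of-words vector; a real system would use a dense encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, k=2):
    """Score every document against the query and return the top k."""
    q = embed(query)
    return sorted(((cosine(q, embed(doc)), doc) for doc in CORPUS), reverse=True)[:k]

def answer(query, evidence_threshold=0.25):
    """Refuse unless the best retrieval score clears an evidence threshold."""
    best_score, best_doc = retrieve(query)[0]
    if best_score < evidence_threshold:
        return "Insufficient evidence; refusing to answer."
    # A generator would condition on the retrieved evidence here; we just cite it.
    return f"Answer grounded in: {best_doc!r}"

print(answer("How does negative sampling pick documents?"))  # grounded answer
print(answer("Who won the 1998 World Cup?"))                 # refusal
```

The Simple Transformers library named in the abstract (assuming it refers to the simpletransformers Python package) exposes task-level model classes. A hedged usage sketch follows, with made-up data rather than anything from the thesis: fine-tuning a SQuAD-2.0-style question answering model, where `is_impossible` marks unanswerable questions and thus expresses abstention at the data level.

```python
from simpletransformers.question_answering import QuestionAnsweringModel

context = "Dense retrievers encode queries and documents into a shared vector space."

# SQuAD-2.0-style training data; is_impossible=True marks unanswerable questions.
# All examples here are illustrative.
train_data = [
    {
        "context": context,
        "qas": [
            {
                "id": "0",
                "question": "What do dense retrievers encode?",
                "answers": [{"text": "queries and documents", "answer_start": 24}],
                "is_impossible": False,
            },
            {
                "id": "1",
                "question": "Who won the 1998 World Cup?",
                "answers": [],
                "is_impossible": True,
            },
        ],
    }
]

# A small pretrained checkpoint; weights are downloaded on first run.
model = QuestionAnsweringModel("bert", "bert-base-cased", use_cuda=False)
model.train_model(train_data)

answers, probabilities = model.predict(
    [{"context": context, "qas": [{"id": "q1", "question": "What do dense retrievers encode?"}]}]
)
print(answers)
```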
