Taking a step back: Measuring and mitigating bias in language models

Open Access
Authors
Supervisors
Award date 29-04-2026
ISBN
  • 9789493539150
Series ILLC Dissertation Series, DS-2026-07
Number of pages 178
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
  • Faculty of Humanities (FGw)
Abstract
Language models increasingly shape how people access information, make decisions, and understand social issues. Although often presented as neutral tools, they can reflect and reinforce social biases, influencing stereotypes and judgments about different groups. When a model depicts scientists as male or relies on biased clinical data, it does more than mirror patterns in its training data: it can normalize these associations across millions of interactions. In response, bias benchmarks and mitigation techniques have proliferated, yet it often remains unclear what benchmark scores actually measure, and mitigation stays limited when the underlying mechanisms are poorly understood.
This thesis investigates how to rigorously measure, understand, and mitigate representational bias in language models—systematic patterns in how models encode and use information about social groups. Addressing this requires valid, reliable measurement in realistic contexts and insight into where biased associations arise.
The thesis approaches this problem from four directions. First, it develops a framework that treats bias as a latent construct, explaining why metrics disagree and establishing criteria for valid evaluation. Second, a clinical pilot study identifies recurring failure patterns, showing that realistic scenarios reveal issues missed by simplified benchmarks. Third, interpretability methods trace how bias emerges during training, showing that gender information can become increasingly localized. Fourth, targeted interventions show that modifying specific components can reduce bias while preserving performance.
Together, these contributions show that reducing bias requires validated measurement, realistic evaluation, and attention to internal model mechanisms, rather than reliance on isolated benchmarks, while recognizing that these approaches alone are insufficient in practice.
Document type PhD thesis
Language English
Downloads
Thesis (complete) (Embargo up to 2028-04-29)
Chapter 4: Scenario-grounded bias evaluation in clinical decision support (Embargo up to 2028-04-29)
Appendix for chapter 4: Qualitative failure examples (Embargo up to 2028-04-29)