Taking a step back: Measuring and mitigating bias in language models

Open Access
Authors
Supervisors
Award date 29-04-2026
ISBN
  • 9789493539150
Series ILLC Dissertation Series, DS-2026-07
Number of pages 178
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
  • Faculty of Humanities (FGw)
Abstract
Language models increasingly shape how people access information, make decisions, and understand social issues. Although often presented as neutral tools, they can reflect and reinforce social biases, influencing stereotypes and judgments about different groups. When a model depicts scientists as male or relies on biased clinical data, it does more than mirror patterns in its training data: it can normalize these associations across millions of interactions. In response, bias benchmarks and mitigation techniques have proliferated, yet it often remains unclear what benchmark scores actually measure, and mitigation stays limited when the underlying mechanisms are poorly understood.
This thesis investigates how to rigorously measure, understand, and mitigate representational bias in language models—systematic patterns in how models encode and use information about social groups. Addressing this requires valid, reliable measurement in realistic contexts and insight into where biased associations arise.
The thesis approaches this problem from four directions. First, it develops a framework that treats bias as a latent construct, explaining why metrics disagree and establishing criteria for valid evaluation. Second, a clinical pilot study identifies recurring failure patterns, showing that realistic scenarios reveal issues missed by simplified benchmarks. Third, interpretability methods trace how bias emerges during training, showing that gender information can become increasingly localized. Fourth, targeted interventions show that modifying specific components can reduce bias while preserving performance.
Together, these contributions show that reducing bias requires validated measurement, realistic evaluation, and attention to internal model mechanisms, rather than reliance on isolated benchmarks, while recognizing that these approaches alone are insufficient in practice.
Document type PhD thesis
Language English
Downloads
Thesis (complete) (Embargo up to 2028-04-29)
Chapter 4: Scenario-grounded bias evaluation in clinical decision support (Embargo up to 2028-04-29)
Appendix for chapter 4: Qualitative failure examples (Embargo up to 2028-04-29)