A bias towards neutrality? How LLM guardrail sensitivity affects classification

Open Access
Authors
Publication date 2025
Journal Communication and Change
Article number 13
Volume | Issue 1
Number of pages 18
Organisations
  • Faculty of Humanities (FGw) - Amsterdam Institute for Humanities Research (AIHR) - Amsterdam School for Cultural Analysis (ASCA)
Abstract
The advent of generative AI platforms and large language models (LLMs) such as ChatGPT has prompted scholarly work in two seemingly disconnected directions: automated classification and bias detection. Here these two strands of work are brought together to address one of the larger challenges facing social scientific research with AI platforms: the effects of LLM safety guardrails on the quality of LLM data labelling. The piece briefly reviews the literature on classification and bias, particularly their conjunction, which has been termed the safety/helpfulness trade-off. We then turn to findings from research that explores the effects of guardrails on labelling. Overall, we find that the greater the bias mitigation, the more neutralising the sentiment exhibited by LLMs in their classification and labelling. By way of conclusion, we discuss the implications of this bias towards neutrality as an analytical flattening that accompanies the automation of knowledge making.
Document type Article
Language English
Published at https://doi.org/10.1007/s44382-025-00013-0