The Silent Saboteur: Imperceptible Adversarial Attacks against Black-Box Retrieval-Augmented Generation Systems
| Authors | |
|---|---|
| Publication date | 2025 |
| Host editors | |
| Book title | The 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025): Findings of the Association for Computational Linguistics: ACL 2025 |
| Book subtitle | ACL 2025: July 27-August 1, 2025 |
| ISBN (electronic) | |
| Event | 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025 |
| Pages (from-to) | 13935-13952 |
| Number of pages | 18 |
| Publisher | Kerrville, TX: Association for Computational Linguistics |
| Organisations | |
| Abstract | We explore adversarial attacks against retrieval-augmented generation (RAG) systems to identify their vulnerabilities. We focus on generating human-imperceptible adversarial examples and introduce a novel imperceptible retrieve-to-generate attack against RAG. This task aims to find imperceptible perturbations that retrieve a target document, originally excluded from the initial top-k candidate set, in order to influence the final answer generation. To address this task, we propose ReGENT, a reinforcement learning-based framework that tracks interactions between the attacker and the target RAG and continuously refines attack strategies based on relevance-generation-naturalness rewards. Experiments on newly constructed factual and non-factual question-answering benchmarks demonstrate that ReGENT significantly outperforms existing attack methods in misleading RAG systems with small imperceptible text perturbations. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.18653/v1/2025.findings-acl.717 |
| Other links | https://www.scopus.com/pages/publications/105028560721 |
| Downloads | 2025.findings-acl.717 (Final published version) |
| Permalink to this page | |
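The abstract mentions that ReGENT refines attack strategies using combined relevance-generation-naturalness rewards. The paper's actual reward design is not given on this page; the sketch below only illustrates the general idea of scoring candidate perturbations with a weighted combination of three such signals. The weights, function names, and dummy scorers are all illustrative assumptions, not details from the paper.

```python
def combined_reward(relevance: float, generation: float, naturalness: float,
                    w_rel: float = 1.0, w_gen: float = 1.0,
                    w_nat: float = 1.0) -> float:
    # Weighted sum of three reward signals; the equal default weights
    # are an assumption for illustration, not values from the paper.
    return w_rel * relevance + w_gen * generation + w_nat * naturalness


def pick_best_perturbation(candidates, score_fns):
    # Greedily select the candidate perturbation with the highest
    # combined reward. `score_fns` maps each signal name to a
    # hypothetical scoring callable (e.g., a retriever-similarity
    # score, an answer-influence score, a fluency score).
    def total(candidate):
        return combined_reward(score_fns["relevance"](candidate),
                               score_fns["generation"](candidate),
                               score_fns["naturalness"](candidate))
    return max(candidates, key=total)
```

With stand-in scorers, `pick_best_perturbation(["perturbation A", "perturbation B"], fns)` returns whichever candidate the weighted sum favors; a full RL setup would instead use such a reward to update the attacker's policy over many interactions with the target RAG system.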
