The Silent Saboteur: Imperceptible Adversarial Attacks against Black-Box Retrieval-Augmented Generation Systems
| Authors | |
|---|---|
| Publication date | 2025 |
| Host editors | |
| Book title | The 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025): Findings of the Association for Computational Linguistics: ACL 2025 |
| Book subtitle | ACL 2025: July 27-August 1, 2025 |
| ISBN (electronic) | |
| Event | 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025 |
| Pages (from-to) | 13935-13952 |
| Number of pages | 18 |
| Publisher | Kerrville, TX: Association for Computational Linguistics |
| Organisations | |
| Abstract | We explore adversarial attacks against retrieval-augmented generation (RAG) systems to identify their vulnerabilities. We focus on generating human-imperceptible adversarial examples and introduce a novel imperceptible retrieve-to-generate attack against RAG. This task aims to find imperceptible perturbations that retrieve a target document, originally excluded from the initial top-k candidate set, in order to influence the final answer generation. To address this task, we propose ReGENT, a reinforcement learning-based framework that tracks interactions between the attacker and the target RAG and continuously refines attack strategies based on relevance-generation-naturalness rewards. Experiments on newly constructed factual and non-factual question-answering benchmarks demonstrate that ReGENT significantly outperforms existing attack methods in misleading RAG systems with small imperceptible text perturbations. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.18653/v1/2025.findings-acl.717 |
| Other links | https://www.scopus.com/pages/publications/105028560721 |
| Downloads | 2025.findings-acl.717 (Final published version) |
| Permalink to this page | |
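The abstract mentions that ReGENT refines attack strategies using combined relevance-generation-naturalness rewards. The paper's actual reward design is not given on this page; the sketch below only illustrates the general idea of scoring candidate perturbations with a weighted combination of three such signals. The weights, function names, and dummy scorers are all illustrative assumptions, not details from the paper.

```python
def combined_reward(relevance: float, generation: float, naturalness: float,
                    w_rel: float = 1.0, w_gen: float = 1.0,
                    w_nat: float = 1.0) -> float:
    # Weighted sum of three reward signals; the equal default weights
    # are an assumption for illustration, not values from the paper.
    return w_rel * relevance + w_gen * generation + w_nat * naturalness


def pick_best_perturbation(candidates, score_fns):
    # Greedily select the candidate perturbation with the highest
    # combined reward. `score_fns` maps each signal name to a
    # hypothetical scoring callable (e.g., a retriever-similarity
    # score, an answer-influence score, a fluency score).
    def total(candidate):
        return combined_reward(score_fns["relevance"](candidate),
                               score_fns["generation"](candidate),
                               score_fns["naturalness"](candidate))
    return max(candidates, key=total)
```

With stand-in scorers, `pick_best_perturbation(["perturbation A", "perturbation B"], fns)` returns whichever candidate the weighted sum favors; a full RL setup would instead use such a reward to update the attacker's policy over many interactions with the target RAG system.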
