PRADA: Practical Black-Box Adversarial Attacks against Neural Ranking Models

C. Wu; R. Zhang; J. Guo; M. de Rijke; Y. Fan; X. Cheng

DOI: https://doi.org/10.1145/3576923

Authors	C. Wu, R. Zhang, J. Guo, M. de Rijke, Y. Fan, X. Cheng
Publication date	10-2023
Journal	ACM Transactions on Information Systems
Article number	89
Volume | Issue number	41 | 4
Number of pages	27
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	Neural ranking models (NRMs) have shown remarkable success in recent years, especially with pre-trained language models. However, deep neural models are notorious for their vulnerability to adversarial examples. Adversarial attacks may become a new type of web spamming technique given our increased reliance on neural information retrieval models. Therefore, it is important to study potential adversarial attacks to identify vulnerabilities of NRMs before they are deployed. In this article, we introduce the Word Substitution Ranking Attack (WSRA) task against NRMs, which aims at promoting a target document in rankings by adding adversarial perturbations to its text. We focus on the decision-based black-box attack setting, where the attackers cannot directly access model information and can only query the target model to obtain the rank positions within a partial retrieved list. This attack setting is realistic for real-world search engines. We propose a novel Pseudo Relevance-based ADversarial ranking Attack method (PRADA) that learns a surrogate model based on Pseudo Relevance Feedback (PRF) to generate gradients for finding the adversarial perturbations. Experiments on two web search benchmark datasets show that PRADA can outperform existing attack strategies and successfully fool the NRM with small, indiscernible perturbations of text.
Document type	Article
Language	English
Published at	https://doi.org/10.1145/3576923
Other links	https://github.com/wuchen95/PRADA
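The decision-based black-box setting of the WSRA task can be illustrated with a minimal sketch, under clearly hypothetical assumptions. It is not PRADA: the abstract describes PRADA as using gradients from a surrogate model trained with pseudo relevance feedback to find perturbations, whereas this sketch substitutes a naive greedy search that queries the target ranker for every candidate word. The rank-only oracle, the toy term-overlap scorer standing in for an NRM, the synonym table, the `budget` limit on changed words, and the function names `make_toy_oracle` and `greedy_word_substitution` are all illustrative inventions.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical rank-only oracle: it hides the ranker's scores and reveals only
# each document's 1-based position in the returned top-k list (None if the
# document falls outside that partial list).
RankOracle = Callable[[str, List[str]], List[Optional[int]]]


def make_toy_oracle(score: Callable[[str, str], float], k: int) -> RankOracle:
    """Wrap a hidden scoring function so that callers observe ranks only."""
    def oracle(query: str, docs: List[str]) -> List[Optional[int]]:
        # sorted() is stable, so tied documents keep their input order.
        order = sorted(range(len(docs)), key=lambda i: -score(query, docs[i]))
        ranks: List[Optional[int]] = [None] * len(docs)
        for pos, i in enumerate(order[:k], start=1):
            ranks[i] = pos
        return ranks
    return oracle


def greedy_word_substitution(oracle: RankOracle, query: str, docs: List[str],
                             target: int, synonyms: Dict[str, List[str]],
                             budget: int) -> str:
    """Greedily swap words in docs[target], keeping a swap only if the
    observed rank strictly improves; at most `budget` words are changed
    (a crude stand-in for a small-perturbation constraint)."""
    def rank_of(text: str) -> float:
        trial = docs[:target] + [text] + docs[target + 1:]
        r = oracle(query, trial)[target]
        return float("inf") if r is None else float(r)

    words = docs[target].split()
    best = rank_of(" ".join(words))
    changed = 0
    for pos in range(len(words)):
        if changed >= budget:
            break
        for cand in synonyms.get(words[pos], []):
            trial_words = words[:pos] + [cand] + words[pos + 1:]
            r = rank_of(" ".join(trial_words))
            if r < best:  # accept the first candidate that improves the rank
                words, best = trial_words, r
                changed += 1
                break
    return " ".join(words)


def overlap(query: str, doc: str) -> float:
    """Toy stand-in for a neural ranker: counts shared query terms."""
    return float(len(set(query.split()) & set(doc.split())))


query = "cheap hotel paris"
docs = ["cheap flights to paris", "paris travel guide",
        "budget hotel deals", "car rental offers"]
oracle = make_toy_oracle(overlap, k=3)
synonyms = {"budget": ["cheap", "economical"], "deals": ["offers"]}

print(oracle(query, docs))  # [1, 2, 3, None]
adv = greedy_word_substitution(oracle, query, docs, target=2,
                               synonyms=synonyms, budget=1)
print(adv, oracle(query, docs[:2] + [adv] + docs[3:])[2])  # cheap hotel deals 2
```

In this toy run, one substitution ("budget" to "cheap") moves the target document from rank 3 to rank 2 while the attacker only ever sees rank positions. The brute-force loop issues one oracle query per candidate word; a method such as PRADA instead relies on a surrogate model's gradients to decide which words to perturb.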
