A Taxonomy, Dataset and Benchmark for Detecting and Classifying Malevolent Dialogue Responses

Authors	Y. Zhang, P. Ren, M. de Rijke
Publication date	12-2021
Journal	Journal of the Association for Information Science and Technology
Volume | Issue number	72 | 12
Pages (from-to)	1477-1497
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	Conversational interfaces are increasingly popular as a way of connecting people to information. With the increased generative capacity of corpus-based conversational agents comes the need to classify and filter out malevolent responses that are inappropriate in terms of content and dialogue acts. Previous studies on the topic of detecting and classifying inappropriate content are mostly focused on a specific category of malevolence or on single sentences instead of an entire dialogue. We make three contributions to advance research on the malevolent dialogue response detection and classification (MDRDC) task. First, we define the task and present a hierarchical malevolent dialogue taxonomy. Second, we create a labeled multiturn dialogue data set and formulate the MDRDC task as a hierarchical classification task. Last, we apply state-of-the-art text classification methods to the MDRDC task, and report on experiments aimed at assessing the performance of these approaches.
Document type	Article
Language	English
Published at	https://doi.org/10.1002/asi.24496 (Final published version)
Other links	https://github.com/repozhang/malevolent_dialogue

UvA-DARE