A Taxonomy, Dataset and Benchmark for Detecting and Classifying Malevolent Dialogue Responses

Authors
Publication date 12-2021
Journal Journal of the Association for Information Science and Technology
Volume | Issue number 72 | 12
Pages (from-to) 1477-1497
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Conversational interfaces are increasingly popular as a way of connecting people to information. With the increased generative capacity of corpus-based conversational agents comes the need to classify and filter out malevolent responses that are inappropriate in terms of content and dialogue acts. Previous studies on the topic of detecting and classifying inappropriate content are mostly focused on a specific category of malevolence or on single sentences instead of an entire dialogue. We make three contributions to advance research on the malevolent dialogue response detection and classification (MDRDC) task. First, we define the task and present a hierarchical malevolent dialogue taxonomy. Second, we create a labeled multiturn dialogue data set and formulate the MDRDC task as a hierarchical classification task. Last, we apply state-of-the-art text classification methods to the MDRDC task, and report on experiments aimed at assessing the performance of these approaches.
Document type Article
Language English
Published at https://doi.org/10.1002/asi.24496
Other links https://github.com/repozhang/malevolent_dialogue