A Song of (Dis)agreement: Evaluating the Evaluation of Explainable Artificial Intelligence in Natural Language Processing

Open Access
Authors
Publication date 2022
Host editors
  • S. Schlobach
  • M. Pérez-Ortiz
  • M. Tielman
Book title HHAI2022: Augmenting Human Intellect
Book subtitle Proceedings of the 1st International Conference on Hybrid Human-Artificial Intelligence
ISBN
  • 9781643683089
ISBN (electronic)
  • 9781643683096
Series Frontiers in Artificial Intelligence and Applications
Event 1st International Conference on Hybrid Human-Artificial Intelligence, HHAI 2022
Pages (from-to) 60-78
Number of pages 19
Publisher Amsterdam: IOS Press
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract

There has been significant debate in the NLP community about whether attention weights can serve as an explanation: a mechanism for interpreting how important each input token is to a particular prediction. The validity of 'attention as explanation' has so far been evaluated by computing the rank correlation between attention-based explanations and existing feature attribution explanations, using LSTM-based models. In our work, we (i) compare the rank correlation between five more recent feature attribution methods and two attention-based methods on two types of NLP tasks, and (ii) extend this analysis to transformer-based models. We find that attention-based explanations do not correlate strongly with any of the recent feature attribution methods, regardless of the model or task. Furthermore, none of the tested explanations correlate strongly with one another for the transformer-based model, leading us to question the underlying assumption that the validity of attention-based explanations should be measured by how well they correlate with existing feature attribution explanation methods. After conducting experiments on five datasets with two different models, we argue that the community should stop using rank correlation as an evaluation metric for attention-based explanations. We suggest that researchers and practitioners instead test various explanation methods and employ a human-in-the-loop process to determine whether the explanations align with human intuition for the particular use case at hand.
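
As an illustration of the evaluation the abstract describes, the sketch below computes a rank correlation (Kendall's tau, via scipy) between two per-token importance vectors for the same input sentence. The scores and variable names are illustrative assumptions for this record, not values or code from the paper itself.

    # A minimal sketch of the rank-correlation evaluation described in the
    # abstract, assuming two importance vectors over the same input tokens.
    # The scores below are made up for illustration, not taken from the paper.
    from scipy.stats import kendalltau

    # Hypothetical per-token importance scores for one five-token sentence:
    attention_scores = [0.42, 0.05, 0.31, 0.10, 0.12]  # e.g. an attention-based explanation
    attribution_scores = [0.38, 0.20, 0.07, 0.25, 0.10]  # e.g. a feature attribution method

    # Kendall's tau compares only the *orderings* of the two vectors, so the
    # relative ranking of tokens matters, not the magnitudes of the scores.
    tau, p_value = kendalltau(attention_scores, attribution_scores)
    print(f"Kendall's tau = {tau:.3f} (p = {p_value:.3f})")

A tau near 1 would indicate that the two methods rank token importance similarly; the paper's argument is that, in practice, such correlations come out weak, which calls the metric itself into question.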

Document type Conference contribution
Language English
Published at https://doi.org/10.3233/FAIA220190
Other links https://www.scopus.com/pages/publications/85138627018