MIRAGE: A Metrics lIbrary for Rating hAllucinations in Generated tExt

Open Access
Authors
  • Benjamin Vendeville
  • Liana Ermakova
  • Pierre De Loor
  • Jaap Kamps
Publication date 2025
Book title CIKM'25
Book subtitle Proceedings of the 34th ACM International Conference on Information and Knowledge Management: November 10-14, 2025, Seoul, Republic of Korea
ISBN (electronic)
  • 9798400720406
Event 34th ACM International Conference on Information and Knowledge Management, CIKM 2025
Pages (from-to) 6539-6543
Number of pages 5
Publisher New York, NY: Association for Computing Machinery
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract

Errors in natural language generation, so-called hallucinations, remain a critical challenge, particularly in high-stakes domains such as healthcare or science communication. While several automatic metrics have been proposed to detect and quantify hallucinations, such as FactCC, QAGS, FEQA, and FactAcc, these metrics are often unavailable, difficult to reproduce, or incompatible with modern development workflows. We introduce MIRAGE, an open-source Python library designed to address these limitations. MIRAGE re-implements key hallucination evaluation metrics in a unified library built on the Hugging Face framework, offering modularity, reproducibility, and standardized inputs and outputs. By adhering to FAIR principles, MIRAGE promotes reproducibility, accelerates experimentation, and supports the development of future hallucination metrics. We validate MIRAGE by re-evaluating existing metrics on benchmark datasets, demonstrating comparable performance while significantly improving usability and transparency.

Document type Conference contribution
Language English
Published at https://doi.org/10.1145/3746252.3761644
Other links https://www.scopus.com/pages/publications/105023153112