Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation

Open Access
Publication date 2025
Host editors
  • Wanxiang Che
  • Joyce Nabende
  • Ekaterina Shutova
  • Mohammad Taher Pilehvar
Book title The 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025): Findings of the Association for Computational Linguistics: ACL 2025
Book subtitle ACL 2025: July 27–August 1, 2025
ISBN (electronic)
  • 9798891762565
Event 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
Pages (from-to) 22453-22472
Number of pages 20
Publisher Kerrville, TX: Association for Computational Linguistics
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract

This paper introduces Unilogit, a novel self-distillation method for machine unlearning in Large Language Models. Unilogit addresses the challenge of selectively forgetting specific information while maintaining overall model utility, a critical task for compliance with data privacy regulations such as the GDPR. Unlike prior methods that rely on static hyperparameters or the starting model's outputs, Unilogit dynamically adjusts target logits to achieve a uniform probability for the target token, leveraging the current model's outputs for more accurate self-distillation targets. This approach not only eliminates the need for additional hyperparameters but also improves the model's approximation of the golden targets. Extensive experiments on public benchmarks and an in-house e-commerce dataset demonstrate Unilogit's superior performance in balancing forget and retain objectives, outperforming state-of-the-art methods such as NPO and UnDIAL. Our analysis further reveals Unilogit's robustness across various scenarios, highlighting its practical applicability and effectiveness for machine unlearning.
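The core mechanism described above, constructing a distillation target in which the forget token's probability is pushed to uniform (1/|V|) while the current model's remaining logits are kept, can be sketched as follows. This is a minimal illustration based only on the abstract's description, not the paper's reference implementation; the function name `unilogit_targets` and the closed-form solve for the target logit are assumptions.

```python
import math

import torch
import torch.nn.functional as F


def unilogit_targets(logits: torch.Tensor, target_ids: torch.Tensor) -> torch.Tensor:
    """Sketch of a uniform-target logit construction (assumed form).

    Given the current model's `logits` of shape (batch, vocab) and the
    forget-token ids `target_ids` of shape (batch,), return target logits in
    which the forget token's softmax probability equals 1/|V| and all other
    logits are left as the current model produced them.
    """
    vocab = logits.size(-1)
    # Log-sum-exp over the non-target logits, computed stably by masking the
    # target position out with -inf before the reduction.
    masked = logits.clone()
    masked.scatter_(-1, target_ids.unsqueeze(-1), float("-inf"))
    log_rest = torch.logsumexp(masked, dim=-1)  # log sum_{j != y} exp(z_j)
    # Solve softmax(z)_y = 1/V for the target logit z_y:
    #   exp(z_y) / (exp(z_y) + S_rest) = 1/V  =>  z_y = log(S_rest) - log(V - 1)
    new_target_logit = log_rest - math.log(vocab - 1)
    targets = logits.clone()
    targets.scatter_(-1, target_ids.unsqueeze(-1), new_target_logit.unsqueeze(-1))
    return targets
```

In such a scheme the unlearning loss would then be a KL divergence between the model's distribution and `softmax(unilogit_targets(...))`, recomputed from the *current* model at each step rather than from the frozen starting model, which is what makes the targets dynamic and hyperparameter-free in the sense the abstract describes.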

Document type Conference contribution
Language English
Published at https://doi.org/10.18653/v1/2025.findings-acl.1154
Other links https://www.scopus.com/pages/publications/105028591870
Downloads
2025.findings-acl.1154 (Final published version)