Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation

Open Access
Publication date 2025
Host editors
  • Wanxiang Che
  • Joyce Nabende
  • Ekaterina Shutova
  • Mohammad Taher Pilehvar
Book title The 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025): Findings of the Association for Computational Linguistics: ACL 2025
Book subtitle ACL 2025: July 27–August 1, 2025
ISBN (electronic)
  • 9798891762565
Event 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
Pages (from-to) 22453-22472
Number of pages 20
Publisher Kerrville, TX: Association for Computational Linguistics
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract

This paper introduces Unilogit, a novel self-distillation method for machine unlearning in Large Language Models. Unilogit addresses the challenge of selectively forgetting specific information while maintaining overall model utility, a critical task for compliance with data privacy regulations such as the GDPR. Unlike prior methods that rely on static hyperparameters or the starting model's outputs, Unilogit dynamically adjusts target logits to achieve a uniform probability for the target token, leveraging the current model's outputs for more accurate self-distillation targets. This approach not only eliminates the need for additional hyperparameters but also improves the model's approximation of the golden targets. Extensive experiments on public benchmarks and an in-house e-commerce dataset demonstrate Unilogit's superior performance in balancing forget and retain objectives, outperforming state-of-the-art methods such as NPO and UnDIAL. Our analysis further reveals Unilogit's robustness across various scenarios, highlighting its practical applicability and effectiveness for machine unlearning.
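The core mechanism described above, constructing a distillation target in which the forget token's probability is pushed to uniform (1/|V|) while the current model's remaining logits are kept, can be sketched as follows. This is a minimal illustration based only on the abstract's description, not the paper's reference implementation; the function name `unilogit_targets` and the closed-form solve for the target logit are assumptions.

```python
import math

import torch
import torch.nn.functional as F


def unilogit_targets(logits: torch.Tensor, target_ids: torch.Tensor) -> torch.Tensor:
    """Sketch of a uniform-target logit construction (assumed form).

    Given the current model's `logits` of shape (batch, vocab) and the
    forget-token ids `target_ids` of shape (batch,), return target logits in
    which the forget token's softmax probability equals 1/|V| and all other
    logits are left as the current model produced them.
    """
    vocab = logits.size(-1)
    # Log-sum-exp over the non-target logits, computed stably by masking the
    # target position out with -inf before the reduction.
    masked = logits.clone()
    masked.scatter_(-1, target_ids.unsqueeze(-1), float("-inf"))
    log_rest = torch.logsumexp(masked, dim=-1)  # log sum_{j != y} exp(z_j)
    # Solve softmax(z)_y = 1/V for the target logit z_y:
    #   exp(z_y) / (exp(z_y) + S_rest) = 1/V  =>  z_y = log(S_rest) - log(V - 1)
    new_target_logit = log_rest - math.log(vocab - 1)
    targets = logits.clone()
    targets.scatter_(-1, target_ids.unsqueeze(-1), new_target_logit.unsqueeze(-1))
    return targets
```

In such a scheme the unlearning loss would then be a KL divergence between the model's distribution and `softmax(unilogit_targets(...))`, recomputed from the *current* model at each step rather than from the frozen starting model, which is what makes the targets dynamic and hyperparameter-free in the sense the abstract describes.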

Document type Conference contribution
Language English
Published at https://doi.org/10.18653/v1/2025.findings-acl.1154
Other links https://www.scopus.com/pages/publications/105028591870
Downloads
2025.findings-acl.1154 (Final published version)