ClusComp: A Simple Paradigm for Model Compression and Efficient Finetuning

Open Access
Authors
Publication date 2025
Host editors
  • Wanxiang Che
  • Joyce Nabende
  • Ekaterina Shutova
  • Mohammad Taher Pilehvar
Book title The 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025): Findings of the Association for Computational Linguistics: ACL 2025
Book subtitle ACL 2025: July 27-August 1, 2025
ISBN (electronic)
  • 9798891762565
Event 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
Pages (from-to) 24779-24804
Number of pages 26
Publisher Kerrville, TX: Association for Computational Linguistics
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract

As large language models (LLMs) scale, model compression is crucial for edge deployment and accessibility. Weight-only quantization reduces model size but suffers from performance degradation at low bit-widths. Moreover, standard finetuning is incompatible with quantized models, and alternative methods often fall short of full finetuning. In this paper, we propose ClusComp, a simple yet effective compression paradigm that clusters weight matrices into codebooks and finetunes them block-by-block. ClusComp (1) achieves superior performance in 2-4-bit quantization, (2) pushes compression to 1 bit while outperforming ultra-low-bit methods with minimal finetuning, and (3) enables efficient finetuning, even surpassing existing quantization-based approaches and rivaling full FP16 finetuning. Notably, ClusComp supports compression and finetuning of 70B LLMs on a single A6000-48GB GPU.
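To make the paradigm concrete, below is a minimal, self-contained sketch of codebook-based weight compression in the spirit the abstract describes: a weight matrix is split into small sub-vectors, the sub-vectors are clustered with k-means, and each group of weights is then stored as an index into a shared codebook. The group size, codebook size, and plain Lloyd-iteration clustering here are illustrative assumptions, not the paper's exact recipe, and the block-by-block codebook finetuning step is omitted.

import numpy as np

def kmeans(vectors, k, iters=20, seed=0):
    # Plain Lloyd's k-means over the rows of `vectors` (an illustrative
    # stand-in for whatever clustering the paper actually uses).
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign every sub-vector to its nearest centroid.
        dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        idx = dists.argmin(1)
        # Move each centroid to the mean of its assigned sub-vectors.
        for j in range(k):
            members = vectors[idx == j]
            if len(members):
                codebook[j] = members.mean(0)
    return codebook, idx

def compress(W, group=4, k=256):
    # Split the weight matrix into `group`-dimensional sub-vectors and
    # cluster them; the matrix is then stored as (codebook, indices).
    subvecs = W.reshape(-1, group)
    codebook, idx = kmeans(subvecs, k)
    return codebook, idx.astype(np.uint16), W.shape

def decompress(codebook, idx, shape):
    # Reconstruct the weights by looking each index up in the codebook.
    return codebook[idx].reshape(shape)

group, k = 4, 256  # toy settings, chosen only for this demonstration
W = np.random.randn(256, 256).astype(np.float32)
codebook, idx, shape = compress(W, group=group, k=k)
W_hat = decompress(codebook, idx, shape)
# Effective storage: log2(k) bits per index (one index per `group` weights),
# plus the codebook itself kept in 16-bit floats.
bits_per_weight = (idx.size * np.log2(k) + codebook.size * 16) / W.size
mse = ((W - W_hat) ** 2).mean()
print(f"~{bits_per_weight:.2f} bits/weight, reconstruction MSE {mse:.4f}")

With these toy settings, each group of 4 weights costs 8 index bits, i.e. roughly 2.25 bits per weight once the fp16 codebook is included, which is the low-bit regime the abstract targets. Because the trainable parameters reduce to the (small) codebooks, finetuning them block-by-block keeps the memory footprint far below full finetuning; the reported results come from the authors' method, not this sketch.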

Document type Conference contribution
Language English
DOI https://doi.org/10.18653/v1/2025.findings-acl.1272
Other links https://www.scopus.com/pages/publications/105028610349