ClusComp: A Simple Paradigm for Model Compression and Efficient Finetuning

Open Access
Authors
Publication date 2025
Host editors
  • Wanxiang Che
  • Joyce Nabende
  • Ekaterina Shutova
  • Mohammad Taher Pilehvar
Book title The 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025): Findings of the Association for Computational Linguistics: ACL 2025
Book subtitle ACL 2025: July 27-August 1, 2025
ISBN (electronic)
  • 9798891762565
Event 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
Pages (from-to) 24779-24804
Number of pages 26
Publisher Kerrville, TX: Association for Computational Linguistics
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract

As large language models (LLMs) scale, model compression is crucial for edge deployment and accessibility. Weight-only quantization reduces model size but suffers from performance degradation at low bit-widths. Moreover, standard finetuning is incompatible with quantized models, and alternative methods often fall short of full finetuning. In this paper, we propose ClusComp, a simple yet effective compression paradigm that clusters weight matrices into codebooks and finetunes them block-by-block. ClusComp (1) achieves superior performance in 2-4-bit quantization, (2) pushes compression to 1 bit while outperforming ultra-low-bit methods with minimal finetuning, and (3) enables efficient finetuning, even surpassing existing quantization-based approaches and rivaling full FP16 finetuning. Notably, ClusComp supports compression and finetuning of 70B LLMs on a single A6000-48GB GPU.
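To make the paradigm concrete, below is a minimal, self-contained sketch of codebook-based weight compression in the spirit the abstract describes: a weight matrix is split into small sub-vectors, the sub-vectors are clustered with k-means, and each group of weights is then stored as an index into a shared codebook. The group size, codebook size, and plain Lloyd-iteration clustering here are illustrative assumptions, not the paper's exact recipe, and the block-by-block codebook finetuning step is omitted.

import numpy as np

def kmeans(vectors, k, iters=20, seed=0):
    # Plain Lloyd's k-means over the rows of `vectors` (an illustrative
    # stand-in for whatever clustering the paper actually uses).
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign every sub-vector to its nearest centroid.
        dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        idx = dists.argmin(1)
        # Move each centroid to the mean of its assigned sub-vectors.
        for j in range(k):
            members = vectors[idx == j]
            if len(members):
                codebook[j] = members.mean(0)
    return codebook, idx

def compress(W, group=4, k=256):
    # Split the weight matrix into `group`-dimensional sub-vectors and
    # cluster them; the matrix is then stored as (codebook, indices).
    subvecs = W.reshape(-1, group)
    codebook, idx = kmeans(subvecs, k)
    return codebook, idx.astype(np.uint16), W.shape

def decompress(codebook, idx, shape):
    # Reconstruct the weights by looking each index up in the codebook.
    return codebook[idx].reshape(shape)

group, k = 4, 256  # toy settings, chosen only for this demonstration
W = np.random.randn(256, 256).astype(np.float32)
codebook, idx, shape = compress(W, group=group, k=k)
W_hat = decompress(codebook, idx, shape)
# Effective storage: log2(k) bits per index (one index per `group` weights),
# plus the codebook itself kept in 16-bit floats.
bits_per_weight = (idx.size * np.log2(k) + codebook.size * 16) / W.size
mse = ((W - W_hat) ** 2).mean()
print(f"~{bits_per_weight:.2f} bits/weight, reconstruction MSE {mse:.4f}")

With these toy settings, each group of 4 weights costs 8 index bits, i.e. roughly 2.25 bits per weight once the fp16 codebook is included, which is the low-bit regime the abstract targets. Because the trainable parameters reduce to the (small) codebooks, finetuning them block-by-block keeps the memory footprint far below full finetuning; the reported results come from the authors' method, not this sketch.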

Document type Conference contribution
Language English
DOI https://doi.org/10.18653/v1/2025.findings-acl.1272
Other links https://www.scopus.com/pages/publications/105028610349