SSH: Sparse Spectrum Adaptation via Discrete Hartley Transformation

Open Access
Authors
Publication date 2025
Host editors
  • Luis Chiruzzo
  • Alan Ritter
  • Lu Wang
Book title Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Proceedings of the Conference
Book subtitle NAACL 2025: April 29–May 4, 2025
ISBN (electronic)
  • 9798891761896
Event 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics
Volume | Issue number 1
Pages (from-to) 10400–10415
Publisher Kerrville, TX: Association for Computational Linguistics
Organisations
  • Faculty of Economics and Business (FEB) - Amsterdam Business School Research Institute (ABS-RI)
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Low-rank adaptation (LoRA) has proven effective at reducing the number of trainable parameters when fine-tuning large language models (LLMs). However, it still faces computational and memory challenges when scaling to larger models or addressing more complex task adaptation.

In this work, we introduce Sparse Spectrum Adaptation via Discrete Hartley Transformation (SSH), a novel approach that significantly reduces the number of trainable parameters while enhancing model performance. SSH selects the most informative spectral components across all layers, guided by the discrete Hartley transformation (DHT) of the initial weights. A lightweight inverse DHT then projects the sparse spectrum back into the spatial domain to form the weight update.

Extensive experiments on both single-modality tasks (such as language understanding and generation) and multi-modality tasks (such as video-text understanding) demonstrate that SSH outperforms existing parameter-efficient fine-tuning (PEFT) methods while achieving substantial reductions in computational cost and memory requirements. For instance, during instruction tuning on the LLaMA3.1 8B model, SSH achieves higher accuracy with only 0.048M trainable parameters, compared to LoRA's 33.5M, while reducing computational cost by up to 55% relative to FourierFT.
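To make the mechanism in the abstract concrete, the following NumPy sketch illustrates the core idea as described there: take the DHT of a frozen pretrained weight, keep the k largest-magnitude spectral positions, train only those k coefficients, and map them back to a weight update with the inverse DHT. This is a minimal illustration under assumed details, not the authors' implementation; the names (dht2, idht2, delta_w, theta), the shapes, and the choice of k are hypothetical.

```python
import numpy as np

def dht2(x):
    # 2-D discrete Hartley transform via the identity
    # H(x) = Re(FFT(x)) - Im(FFT(x)), since cas(t) = cos(t) + sin(t).
    F = np.fft.fft2(x)
    return F.real - F.imag

def idht2(h):
    # The DHT is self-inverse up to a 1/(m*n) scale factor.
    m, n = h.shape
    return dht2(h) / (m * n)

# Hypothetical shapes and selection budget, for illustration only.
rng = np.random.default_rng(0)
W0 = rng.standard_normal((64, 64))   # frozen pretrained weight
k = 32                               # number of trainable spectral entries

# Select the k largest-magnitude spectral positions, guided by the
# DHT of the initial weight (as the abstract describes).
spectrum = dht2(W0)
flat_idx = np.argsort(np.abs(spectrum).ravel())[-k:]
rows, cols = np.unravel_index(flat_idx, W0.shape)

# Only these k coefficients are trained; zero init gives delta_w = 0,
# so the adapted model starts identical to the pretrained one.
theta = np.zeros(k)

def delta_w(theta):
    # Lightweight inverse DHT: project the sparse spectrum
    # back into the spatial domain to form the weight update.
    S = np.zeros_like(W0)
    S[rows, cols] = theta
    return idht2(S)

# Effective weight used in the forward pass.
W = W0 + delta_w(theta)
```

In such a setup only theta would receive gradients: with k = 32 coefficients against the 4096 entries of the full 64x64 update, fewer than 1% of the parameters are trained, which is the source of the parameter savings the abstract reports.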
Document type Conference contribution
Language English
Published at https://doi.org/10.18653/v1/2025.naacl-long.522