Gradient Weight-normalized Low-rank Projection for Efficient LLM Training

Open Access
Authors
Publication date 2025
Host editors
  • T. Walsh
  • J. Shah
  • Z. Kolter
Book title Proceedings of the 39th Annual AAAI Conference on Artificial Intelligence
Book subtitle February 25-March 4, 2025, Philadelphia, Pennsylvania, USA
ISBN
  • 9781577358978
Event 39th Annual AAAI Conference on Artificial Intelligence
Volume 39 | Issue number 23
Pages (from-to) 24123-24131
Number of pages 9
Publisher Washington, DC: AAAI Press
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
  • Faculty of Law (FdR)
  • Faculty of Economics and Business (FEB) - Amsterdam Business School Research Institute (ABS-RI)
Abstract
Large Language Models (LLMs) have shown remarkable performance across various tasks, but the escalating demands on computational resources pose significant challenges, particularly when full fine-tuning is used extensively for downstream tasks. To address this, parameter-efficient fine-tuning (PEFT) methods have been developed, but they often underperform compared to full fine-tuning and struggle with memory efficiency. In this work, we introduce Gradient Weight-Normalized Low-Rank Projection (GradNormLoRP), a novel approach that enhances both parameter and memory efficiency while maintaining performance comparable to full fine-tuning. GradNormLoRP normalizes the weight matrix to improve gradient conditioning, facilitating better convergence during optimization. Additionally, it applies low-rank approximations to the weight and gradient matrices, significantly reducing memory usage during training. Extensive experiments demonstrate that our 8-bit GradNormLoRP reduces optimizer memory usage by up to 89.5% and enables the pre-training of large LLMs, such as LLaMA 7B, on consumer-level GPUs like the NVIDIA RTX 4090, without additional inference costs. Moreover, GradNormLoRP outperforms existing low-rank methods in fine-tuning tasks. For instance, when fine-tuning the RoBERTa model on all GLUE tasks with a rank of 8, GradNormLoRP achieves an average score of 80.65, surpassing LoRA's score of 79.23. These results underscore GradNormLoRP as a promising alternative for efficient LLM pre-training and fine-tuning.
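
The abstract describes two ingredients: reparameterizing the weight matrix via weight normalization to improve gradient conditioning, and projecting the gradient into a low-rank subspace so that optimizer state is kept in a much smaller space. Below is a minimal sketch of how these two steps can be combined for a single linear layer, assuming a PyTorch-style weight normalization and an SVD-based gradient projection; the names, hyperparameters, and update rule are illustrative assumptions, not the authors' implementation, and the low-rank factorization of the weights themselves is omitted for brevity.

    # Minimal sketch (not the paper's code): weight normalization plus a
    # low-rank gradient projection, in the spirit of the abstract.
    import torch

    torch.manual_seed(0)

    out_dim, in_dim, rank = 64, 128, 8
    # Weight reparameterized as W = g * V / ||V|| (row-wise weight normalization),
    # which tends to improve gradient conditioning.
    V = torch.randn(out_dim, in_dim, requires_grad=True)
    g = torch.ones(out_dim, 1, requires_grad=True)

    def normalized_weight(V, g):
        return g * V / V.norm(dim=1, keepdim=True)

    # Toy regression loss on random data.
    x = torch.randn(32, in_dim)
    y = torch.randn(32, out_dim)
    loss = ((x @ normalized_weight(V, g).T - y) ** 2).mean()
    loss.backward()

    # Low-rank projection of the gradient: keep only a rank-r subspace
    # (here from an SVD of the gradient), work in that compact space,
    # then project back before applying the weight update.
    U, _, _ = torch.linalg.svd(V.grad, full_matrices=False)
    P = U[:, :rank]                       # (out_dim, rank) projection basis
    low_rank_grad = P.T @ V.grad          # compact (rank, in_dim) representation
    full_rank_update = P @ low_rank_grad  # projected back to (out_dim, in_dim)

    with torch.no_grad():
        V -= 1e-2 * full_rank_update
        g -= 1e-2 * g.grad

In such a scheme the optimizer would only need to maintain state of shape (rank, in_dim) rather than (out_dim, in_dim), which is where the optimizer-memory savings reported in the abstract would come from.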
Document type Conference contribution
Language English
Published at https://doi.org/10.1609/aaai.v39i23.34587