Reward-Guided Speculative Decoding for Efficient LLM Reasoning

Open Access
Authors
  • Christof Monz
  • Silvio Savarese
  • Doyen Sahoo
  • Caiming Xiong
Publication date 2025
Journal Proceedings of Machine Learning Research
Event 42nd International Conference on Machine Learning, ICML 2025
Volume | Issue number 267
Pages (from-to) 37555-37572
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
We introduce Reward-Guided Speculative Decoding (RSD), a novel framework aimed at improving the efficiency of inference in large language models (LLMs). RSD combines a lightweight draft model with a more powerful target model, incorporating a controlled bias to prioritize high-reward outputs, in contrast to existing speculative decoding methods that enforce strict unbiasedness. RSD employs a process reward model to evaluate intermediate decoding steps and dynamically decide whether to invoke the target model, optimizing the trade-off between computational cost and output quality. We theoretically demonstrate that a threshold-based mixture strategy achieves an optimal balance between resource utilization and performance. Extensive evaluations on challenging reasoning benchmarks, including Olympiad-level tasks, show that RSD delivers significant efficiency gains over decoding with the target model only (up to 4.4× fewer FLOPs), while achieving significantly better accuracy than parallel decoding methods on average (up to +3.5). These results highlight RSD as a robust and cost-effective approach for deploying LLMs in resource-intensive scenarios. The code is available at https://github.com/BaohaoLiao/RSD.
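Illustrative sketch
The threshold-based decision described in the abstract can be pictured with a minimal sketch. It is based on the abstract alone and is not the authors' implementation (see the linked repository for that). The function `rsd_generate`, the callables `draft_step`, `target_step`, and `reward`, and the parameters `threshold`, `max_steps`, and `stop_marker` are all hypothetical names chosen for illustration. The sketch assumes that each reasoning step is a string, that the process reward model returns a scalar score, and that a step scoring below the threshold is regenerated by the target model.

```python
from typing import Callable, Tuple


def rsd_generate(
    prompt: str,
    draft_step: Callable[[str], str],     # hypothetical: cheap draft model, proposes next step
    target_step: Callable[[str], str],    # hypothetical: expensive target model, proposes next step
    reward: Callable[[str, str], float],  # hypothetical: process reward model score for (context, step)
    threshold: float = 0.7,               # assumed acceptance threshold on the reward
    max_steps: int = 8,
    stop_marker: str = "<eos>",
) -> Tuple[str, int]:
    """Return the generated text and the number of target-model calls."""
    context = prompt
    target_calls = 0
    for _ in range(max_steps):
        # The draft model always proposes the next step first.
        step = draft_step(context)
        # Invoke the target model only when the draft step scores below the threshold.
        if reward(context, step) < threshold:
            step = target_step(context)
            target_calls += 1
        context += step
        if stop_marker in step:
            break
    return context, target_calls
```

In this sketch, raising `threshold` sends more steps to the target model (higher cost, output closer to target-only decoding), while lowering it keeps more draft steps (lower cost). That is the cost and quality trade-off the abstract refers to.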
Document type Article
Note Proceedings of the 42nd International Conference on Machine Learning, 13-19 July 2025, Vancouver Convention Center, Vancouver, Canada
Language English
Published at https://proceedings.mlr.press/v267/liao25f.html
Other links https://www.scopus.com/pages/publications/105023639033