Reward-Guided Speculative Decoding for Efficient LLM Reasoning

Baohao Liao; Yuhui Xu; Hanze Dong; Junnan Li; Christof Monz; Silvio Savarese; Doyen Sahoo; Caiming Xiong

Authors	Baohao Liao; Yuhui Xu; Hanze Dong; Junnan Li; Christof Monz; Silvio Savarese; Doyen Sahoo; Caiming Xiong
Publication date	2025
Journal	Proceedings of Machine Learning Research
Event	42nd International Conference on Machine Learning, ICML 2025
Volume | Issue number	267
Pages (from-to)	37555-37572
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	We introduce Reward-Guided Speculative Decoding (RSD), a novel framework aimed at improving the efficiency of inference in large language models (LLMs). RSD synergistically combines a lightweight draft model with a more powerful target model, incorporating a controlled bias to prioritize high-reward outputs, in contrast to existing speculative decoding methods that enforce strict unbiasedness. RSD employs a process reward model to evaluate intermediate decoding steps and dynamically decide whether to invoke the target model, optimizing the trade-off between computational cost and output quality. We theoretically demonstrate that a threshold-based mixture strategy achieves an optimal balance between resource utilization and performance. Extensive evaluations on challenging reasoning benchmarks, including Olympiad-level tasks, show that RSD delivers significant efficiency gains over decoding with the target model only (up to 4.4× fewer FLOPs), while achieving significantly better accuracy than parallel decoding methods on average (up to +3.5). These results highlight RSD as a robust and cost-effective approach for deploying LLMs in resource-intensive scenarios. The code is available at https://github.com/BaohaoLiao/RSD.
Document type	Article
Note	Proceedings of the 42nd International Conference on Machine Learning, 13-19 July 2025, Vancouver Convention Center, Vancouver, Canada
Language	English
Published at	https://proceedings.mlr.press/v267/liao25f.html
Other links	https://www.scopus.com/pages/publications/105023639033
Downloads	Reward-Guided Speculative Decoding for Efficient LLM (Final published version)
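
The threshold-based decision described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). The function names `draft_step`, `target_step`, and `reward`, the step-level granularity, the threshold value 0.7, and the `<eos>` stop marker are all assumptions made for illustration. The sketch only shows the general idea: keep a draft step when the process reward model scores it at or above a threshold, and otherwise invoke the target model for that step.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces: a step function maps the context so far to the
# next reasoning step; a reward function scores a candidate step in context.
StepFn = Callable[[List[str]], str]
RewardFn = Callable[[List[str], str], float]


def rsd_decode(
    prompt: str,
    draft_step: StepFn,
    target_step: StepFn,
    reward: RewardFn,
    threshold: float = 0.7,  # assumed value, not taken from the paper
    max_steps: int = 8,
    stop: str = "<eos>",     # assumed stop marker
) -> Tuple[List[str], int]:
    """Return the generated steps and the number of target-model calls."""
    context = [prompt]
    target_calls = 0
    for _ in range(max_steps):
        # The cheap draft model always proposes the next step.
        candidate = draft_step(context)
        # Fall back to the target model only when the reward is too low.
        if reward(context, candidate) < threshold:
            candidate = target_step(context)
            target_calls += 1
        context.append(candidate)
        if candidate == stop:
            break
    return context[1:], target_calls


# Toy stand-ins for the three models, used only to exercise the control flow.
def toy_draft(ctx: List[str]) -> str:
    return "<eos>" if len(ctx) >= 4 else f"draft-{len(ctx)}"


def toy_target(ctx: List[str]) -> str:
    return f"target-{len(ctx)}"


def toy_reward(ctx: List[str], step: str) -> float:
    # Pretend the draft model is weak on its second step only.
    return 0.2 if step == "draft-2" else 0.9


steps, calls = rsd_decode("Q", toy_draft, toy_target, toy_reward)
print(steps, calls)
```

In this toy run only the low-reward second step is regenerated by the target model, which is the source of the compute savings the abstract describes: the expensive model runs only on steps the reward model flags.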