Reward-Guided Speculative Decoding for Efficient LLM Reasoning
| Authors | |
|---|---|
| Publication date | 2025 |
| Journal | Proceedings of Machine Learning Research |
| Event | 42nd International Conference on Machine Learning, ICML 2025 |
| Volume | 267 |
| Pages (from-to) | 37555-37572 |
| Organisations | |
Abstract
We introduce Reward-Guided Speculative Decoding (RSD), a novel framework for improving the inference efficiency of large language models (LLMs). RSD combines a lightweight draft model with a more powerful target model and deliberately introduces a controlled bias toward high-reward outputs, in contrast to existing speculative decoding methods, which enforce strict unbiasedness. RSD uses a process reward model to evaluate intermediate decoding steps and dynamically decides whether to invoke the target model, optimizing the trade-off between computational cost and output quality. We show theoretically that a threshold-based mixture strategy achieves an optimal balance between resource utilization and performance. Extensive evaluations on challenging reasoning benchmarks, including Olympiad-level tasks, show that RSD delivers significant efficiency gains compared with decoding with the target model alone (up to 4.4× fewer FLOPs), while achieving significantly better accuracy on average than parallel decoding methods (up to +3.5). These results highlight RSD as a robust and cost-effective approach for deploying LLMs in resource-intensive scenarios. The code is available at https://github.com/BaohaoLiao/RSD.
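The step-level decision rule described in the abstract can be illustrated with a minimal Python sketch. It assumes that generation proceeds one reasoning step at a time and that the mixture weight is a hard threshold on the process reward model's score: a draft step is kept if its reward is at least `threshold`; otherwise the target model regenerates that step. The callables `draft`, `target`, `prm` and `is_final`, the helper `estimated_cost`, and the per-step cost units are hypothetical stand-ins. They are not the API of the linked repository.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not the RSD repository's API):
# a generator maps the current list of reasoning steps to the next step,
# and a process reward model scores a candidate step given that prefix.
StepGenerator = Callable[[List[str]], str]
RewardModel = Callable[[List[str], str], float]


@dataclass
class DecodeStats:
    draft_steps: int = 0   # steps where the draft proposal was accepted
    target_steps: int = 0  # steps regenerated by the target model


def reward_guided_decode(
    prefix: List[str],
    draft: StepGenerator,
    target: StepGenerator,
    prm: RewardModel,
    threshold: float,
    max_steps: int,
    is_final: Callable[[str], bool],
) -> Tuple[List[str], DecodeStats]:
    """Threshold-based draft/target mixture at the reasoning-step level (sketch)."""
    steps = list(prefix)
    stats = DecodeStats()
    for _ in range(max_steps):
        candidate = draft(steps)  # cheap proposal from the draft model
        if prm(steps, candidate) >= threshold:
            stats.draft_steps += 1  # high reward: keep the draft step
        else:
            candidate = target(steps)  # low reward: fall back to the target model
            stats.target_steps += 1
        steps.append(candidate)
        if is_final(candidate):  # stop once a terminating step is produced
            break
    return steps, stats


def estimated_cost(stats: DecodeStats, c_draft: float, c_prm: float, c_target: float) -> float:
    """Illustrative cost: every step pays for a draft proposal and a PRM score;
    rejected steps additionally pay for a target-model generation."""
    total_steps = stats.draft_steps + stats.target_steps
    return total_steps * (c_draft + c_prm) + stats.target_steps * c_target


if __name__ == "__main__":
    # Toy stand-ins: the "PRM" rewards drafts only when the prefix length is even.
    steps, stats = reward_guided_decode(
        ["question"],
        draft=lambda s: f"draft-{len(s)}",
        target=lambda s: f"target-{len(s)}",
        prm=lambda s, c: 0.9 if len(s) % 2 == 0 else 0.2,
        threshold=0.5,
        max_steps=4,
        is_final=lambda step: False,
    )
    print(steps)  # ['question', 'target-1', 'draft-2', 'target-3', 'draft-4']
    print(stats)  # DecodeStats(draft_steps=2, target_steps=2)
    # Hypothetical units: draft = 1, PRM = 1, target = 10 per step.
    print(estimated_cost(stats, 1.0, 1.0, 10.0))  # 28.0, versus 40.0 for 4 target-only steps
```

In this toy accounting, raising the acceptance rate lowers the cost relative to target-only decoding. The threshold controls how much low-reward draft output is allowed through, which is the cost-quality trade-off the abstract refers to.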
| Document type | Article |
|---|---|
| Note | Proceedings of the 42nd International Conference on Machine Learning, 13-19 July 2025, Vancouver Convention Center, Vancouver, Canada |
| Language | English |
| Published at | https://proceedings.mlr.press/v267/liao25f.html |
| Other links | https://www.scopus.com/pages/publications/105023639033 |
| Downloads | Reward-Guided Speculative Decoding for Efficient LLM (Final published version) |