Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling".
Updated Feb 19, 2025 · Python
A comprehensive collection of process reward models.
Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".
Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness.