DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments

This is the official repository for DeepResearcher.

📝 Introduction

DeepResearcher is the first comprehensive framework for end-to-end training of LLM-based deep research agents through scaling reinforcement learning (RL) in real-world environments with authentic web search interactions. Our qualitative analysis reveals emergent cognitive behaviors from end-to-end RL training, including the ability to formulate plans, cross-validate information from multiple sources, engage in self-reflection to redirect research, and maintain honesty when unable to find definitive answers.

       

🤖 Model

DeepResearcher is now available on the Hugging Face Hub:

| Model Name | HF Checkpoint | Size |
| --- | --- | --- |
| DeepResearcher-7b | 🤗 GAIR/DeepResearcher-7b | 7B |
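
To fetch the checkpoint locally before training or evaluation, something like the following sketch may help. It assumes the `huggingface_hub` package is installed; the `download_checkpoint` helper and the `./checkpoints/DeepResearcher-7b` target directory are illustrative names and not part of this repo.

```python
from pathlib import Path

# Repository id as listed above.
MODEL_ID = "GAIR/DeepResearcher-7b"


def download_checkpoint(local_dir: str = "./checkpoints/DeepResearcher-7b") -> Path:
    """Download the checkpoint files and return the local directory.

    `local_dir` is a hypothetical location chosen for this example.
    """
    # Imported lazily so this module can be imported without huggingface_hub.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=MODEL_ID, local_dir=local_dir)
    return Path(path)


if __name__ == "__main__":
    print(download_checkpoint())
```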

🏆 Performance

Extensive experiments on open-domain research tasks show that DeepResearcher improves by up to 28.9 points over prompt-engineering-based baselines and by up to 7.2 points over RAG-based RL agents. These results indicate that end-to-end training in real-world web environments is not merely an implementation detail but a fundamental requirement for developing robust research capabilities aligned with real-world applications.

🚀 Get Started

Package Installation

Install the required dependencies by running the following commands:

git clone https://github.com/GAIR-NLP/DeepResearcher.git 
conda create -n deepresearcher python=3.10 
conda activate deepresearcher
cd DeepResearcher
pip3 install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu124
pip3 install flash-attn --no-build-isolation
pip3 install -e .
pip3 install -r requirements.txt
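
As a quick sanity check after installation, the sketch below reports the installed versions of the two packages installed explicitly above. It uses only the standard library; the `installed_versions` helper is an illustrative name, and the check does not cover the packages pulled in by `requirements.txt`.

```python
from importlib import metadata

# Distributions installed explicitly by the commands above.
REQUIRED = ("torch", "flash-attn")


def installed_versions(names=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```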

Start ray before training and inference

We use Ray to train the model. Before starting Ray, set PET_NODE_RANK (this is required even if you have only one node). On the head node, run:

export PET_NODE_RANK=0
ray start --head
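
For multi-node setups, a launcher script can guard against a missing PET_NODE_RANK before calling `ray start`. The sketch below is one possible approach and rests on assumptions: that non-head nodes use a rank greater than 0 and join the cluster with `ray start --address=<host:port>`, and that the head address is passed through a hypothetical `RAY_HEAD_ADDRESS` environment variable. None of these conventions is confirmed by this repo.

```python
import os
import subprocess


def ray_start_command(env, head_address=None):
    """Build the `ray start` command for this node.

    Raises RuntimeError if PET_NODE_RANK is unset, since it must be set
    even on a single node. `head_address` ("host:port") is only needed on
    non-head nodes and is an assumption of this sketch.
    """
    rank = env.get("PET_NODE_RANK")
    if rank is None:
        raise RuntimeError("PET_NODE_RANK must be set before starting ray")
    if int(rank) == 0:
        return ["ray", "start", "--head"]
    if head_address is None:
        raise ValueError("non-head nodes need the head node address")
    return ["ray", "start", f"--address={head_address}"]


if __name__ == "__main__":
    # RAY_HEAD_ADDRESS is a hypothetical variable name used only in this example.
    command = ray_start_command(os.environ, os.environ.get("RAY_HEAD_ADDRESS"))
    subprocess.run(command, check=True)
```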

Run backend handler

Follow these steps to launch the server handler:

  1. Set serper_api_key or azure_bing_search_subscription_key, along with search_engine, in ./scrl/handler/config.yaml
  2. Add your qwen-plus API key in ./scrl/handler/server_handler.py
client = OpenAI(
    api_key="sk-xxx",
    base_url="xxxx"
)
  3. Start the server handler:
 python ./scrl/handler/server_handler.py
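
If you would rather not hard-code the key in server_handler.py, one option is to read it from the environment. The sketch below is an illustration only: `QWEN_API_KEY` and `QWEN_BASE_URL` are hypothetical variable names, and `load_client_settings` is not a function provided by this repo.

```python
import os


def load_client_settings(env=os.environ):
    """Return keyword arguments for the OpenAI client, read from the environment.

    QWEN_API_KEY and QWEN_BASE_URL are hypothetical variable names.
    """
    names = ("QWEN_API_KEY", "QWEN_BASE_URL")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return {"api_key": env["QWEN_API_KEY"], "base_url": env["QWEN_BASE_URL"]}


# Usage inside server_handler.py (requires the `openai` package):
#   from openai import OpenAI
#   client = OpenAI(**load_client_settings())
```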

After launching all server handlers, update server_url_list in ./scrl/handler/config.yaml on your training host node, then run:

 python ./scrl/handler/handler.py

Training model

Use the following command to train the model:

 bash train_grpo.sh

Evaluate

Use the following command to generate rollouts:

 bash evaluate.sh

You can find the rollout file at ./outputs/{project_name}/{experiment_name}/rollout/rollout_step_0.json. Rename it and copy it to ./evaluate/{experiment_name}_result.json
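
The rename-and-copy step can be scripted. The following sketch uses the two paths given above; the `stage_rollout` helper, its `root` argument, and the `step` parameter (for rollout files other than step 0) are assumptions made for illustration.

```python
import shutil
from pathlib import Path


def stage_rollout(project_name, experiment_name, root=".", step=0):
    """Copy the rollout file to ./evaluate/{experiment_name}_result.json.

    `root` is the repository root; `step` selects rollout_step_{step}.json.
    Returns the destination path.
    """
    root = Path(root)
    src = (root / "outputs" / project_name / experiment_name
           / "rollout" / f"rollout_step_{step}.json")
    dst = root / "evaluate" / f"{experiment_name}_result.json"
    if not src.is_file():
        raise FileNotFoundError(f"rollout file not found: {src}")
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dst)
    return dst
```

For example, `stage_rollout("my_project", "my_experiment")` run from the repository root would produce ./evaluate/my_experiment_result.json, where both names are placeholders for your own values.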

Then, run the following command:

 python ./evaluate/cacluate_metrics.py {experiment_name}

You can check the score in ./evaluate/{experiment_name}_score.json

🙏 Acknowledgement

DeepResearcher is inspired by DeepSeek-R1, and its implementation is based on veRL and Search-R1. We deeply appreciate the contributions of these teams to open-source research and development.

✍️ Citation

Please cite this repo if you find its model, code, or conclusions helpful.

@misc{zheng2025deepresearcherscalingdeepresearch,
      title={DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments}, 
      author={Yuxiang Zheng and Dayuan Fu and Xiangkun Hu and Xiaojie Cai and Lyumanshan Ye and Pengrui Lu and Pengfei Liu},
      year={2025},
      eprint={2504.03160},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2504.03160}, 
}
