This is the official repository for DeepResearcher.
DeepResearcher is the first comprehensive framework for end-to-end training of LLM-based deep research agents through scaling reinforcement learning (RL) in real-world environments with authentic web search interactions.
DeepResearcher is now available on the Hugging Face Hub:
| Model Name | HF Checkpoint | Size |
| --- | --- | --- |
| DeepResearcher-7b | [🤗 GAIR/DeepResearcher-7b](https://huggingface.co/GAIR/DeepResearcher-7b) | 7B |
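
The checkpoint can be loaded like any standard Hugging Face causal LM. The snippet below is a minimal sketch using the generic `transformers` API; it is not taken from this repo, and the prompt and generation settings are illustrative only.

```python
# Minimal sketch (not from this repo): load the released checkpoint with
# Hugging Face transformers. Prompt and generation settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "GAIR/DeepResearcher-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",  # requires `accelerate`
)

prompt = "What year was the first successful powered flight?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```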
Extensive experiments on open-domain research tasks demonstrate that DeepResearcher achieves substantial improvements of up to 28.9 points over prompt engineering-based baselines and up to 7.2 points over RAG-based RL agents. Our qualitative analysis reveals emergent cognitive behaviors from end-to-end RL training, including the ability to formulate plans, cross-validate information from multiple sources, engage in self-reflection to redirect research, and maintain honesty when unable to find definitive answers. Our results highlight that end-to-end training in real-world web environments is not merely an implementation detail but a fundamental requirement for developing robust research capabilities aligned with real-world applications.
To get started with this repo, install the required dependencies by running the following commands:

```bash
git clone https://github.com/GAIR-NLP/DeepResearcher.git
conda create -n deepresearcher python=3.10
conda activate deepresearcher
cd DeepResearcher
pip3 install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu124
pip3 install flash-attn --no-build-isolation
pip3 install -e .
pip3 install -r requirements.txt
```
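
As an optional sanity check (not part of the official setup), you can confirm that the core dependencies import cleanly and that CUDA is visible:

```python
# Optional sanity check (not part of the official setup): verify that the
# dependencies installed above import cleanly and CUDA is visible.
import torch
import flash_attn

print("torch:", torch.__version__)              # expect 2.4.0
print("cuda available:", torch.cuda.is_available())
print("flash-attn:", flash_attn.__version__)
```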
We use Ray to train the model. Before starting Ray, you must set `PET_NODE_RANK` (this is required even if you only have one node).

On the head node, run:

```bash
export PET_NODE_RANK=0
ray start --head
```
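
If you want to confirm that the cluster started above is reachable before launching training, the following optional Python snippet (not part of this repo) attaches to the running cluster and prints its aggregate resources:

```python
# Optional sketch (not part of this repo): attach to the Ray cluster started
# above and print the resources it can see.
import ray

ray.init(address="auto")        # connect to the cluster started by `ray start --head`
print(ray.cluster_resources())  # total CPUs/GPUs visible to Ray
ray.shutdown()
```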
Run the following steps to launch the server handler:

- Set `serper_api_key` or `azure_bing_search_subscription_key`, and `search_engine`, in `./scrl/handler/config.yaml`.
- Add your `qwen-plus` API key in `./scrl/handler/server_handler.py`:

```python
# Fill in your own API key and the base URL of your qwen-plus endpoint.
client = OpenAI(
    api_key="sk-xxx",
    base_url="xxxx"
)
```

- Start the server handler:

```bash
python ./scrl/handler/server_handler.py
```
After launching all server handlers, update `server_url_list` in `./scrl/handler/config.yaml` on your training host node, and then run:

```bash
python ./scrl/handler/handler.py
```
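
For reference, the snippet below is a hypothetical sketch for checking that the keys this README asks you to edit are present; the actual schema of `./scrl/handler/config.yaml` is defined by the repo, and only the key names mentioned above are used here.

```python
# Hypothetical sketch (the real schema is defined by the repo): check that the
# keys this README asks you to edit exist in ./scrl/handler/config.yaml.
import yaml

with open("./scrl/handler/config.yaml") as f:
    cfg = yaml.safe_load(f)

print("search_engine:", cfg.get("search_engine"))
print("serper_api_key set:", bool(cfg.get("serper_api_key")))
print("server_url_list:", cfg.get("server_url_list"))  # should list your launched server handlers
```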
Use the following command to train the model:

```bash
bash train_grpo.sh
```
Use the following command to generate rollouts:

```bash
bash evaluate.sh
```
You can find the rollout file at `./outputs/{project_name}/{experiment_name}/rollout/rollout_step_0.json`.

Rename and copy it to `./evaluate/{experiment_name}_result.json`, then run:

```bash
python ./evaluate/cacluate_metrics.py {experiment_name}
```

The scores are written to `./evaluate/{experiment_name}_score.json`.
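
If you prefer to script the copy-and-score steps above, here is a small sketch of that workflow (this helper is not part of the repo; `project_name` and `experiment_name` are placeholders for your own run):

```python
# Small sketch of the evaluation workflow described above; not part of the repo.
# project_name and experiment_name are placeholders for your own run.
import json
import shutil
import subprocess

project_name = "my_project"        # placeholder
experiment_name = "my_experiment"  # placeholder

rollout = f"./outputs/{project_name}/{experiment_name}/rollout/rollout_step_0.json"
result = f"./evaluate/{experiment_name}_result.json"
shutil.copy(rollout, result)

subprocess.run(["python", "./evaluate/cacluate_metrics.py", experiment_name], check=True)

with open(f"./evaluate/{experiment_name}_score.json") as f:
    print(json.load(f))
```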
DeepResearcher is inspired by DeepSeek-R1, with its implementation based on veRL and Search-r1. We deeply appreciate the contributions of these teams to open-source research and development.
Please cite this repo if the model, code, or conclusions in it are helpful to you.
```bibtex
@misc{zheng2025deepresearcherscalingdeepresearch,
      title={DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments},
      author={Yuxiang Zheng and Dayuan Fu and Xiangkun Hu and Xiaojie Cai and Lyumanshan Ye and Pengrui Lu and Pengfei Liu},
      year={2025},
      eprint={2504.03160},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2504.03160},
}
```