WideSeek-R1#
WideSeek-R1 is a lead-agent and subagent framework trained with multi-agent reinforcement learning (MARL) for broad information-seeking tasks. It combines scalable orchestration and parallel execution through a shared LLM, isolated agent contexts, and specialized tools.
On the WideSearch benchmark, WideSeek-R1-4B reaches an item F1 score of
40.0%. This is comparable to single-agent DeepSeek-R1-671B while continuing
to improve as the number of parallel subagents increases.
For the full method and results, see the WideSeek-R1 publication, the project page, the paper on arXiv, and the example code in RLinf.
Installation#
For the base environment, follow the RLinf installation guide.
We recommend the prebuilt Docker image:
docker pull rlinf/rlinf:math-rlinf0.2-torch2.6.0-sglang0.4.6.post5-vllm0.8.5-megatron0.13.0-te2.1
If you prefer a local environment, install the agentic stack:
bash requirements/install.sh agentic
Our startup scripts and configuration files are located in examples/agent/wideseek_r1.
examples/agent/wideseek_r1/configcontains the YAML configuration files for training and evaluation.examples/agent/tools/search_local_server_qdrantprovides the search engine implementation used by offline tools.examples/agent/wideseek_r1/run_train.shandexamples/agent/wideseek_r1/run_eval.share the main entry points for training and evaluation, respectively.
Tool Backends#
WideSeek-R1 supports two tool backends:
Offline Mode for training and standard QA evaluation.
Online Mode for WideSearch evaluation.
See Tool Setup for the full configuration workflow.
Quick Start#
Before running either training or evaluation, start the judge model server. WideSeek-R1 uses an LLM judge to provide more reliable feedback than exact-match scoring alone.
Judge Model#
The default setup uses Qwen3-30B-A3B-Instruct-2507 as the judge model.
Start the judge server with SGLang:
python3 -m sglang.launch_server \
--model-path /PATH/TO/Qwen3-30B-A3B-Instruct-2507 \
--host 0.0.0.0 \
--log-level info \
--context-length 32768 \
--dp 8
In the main experiments, the judge model was served on 8 H100 GPUs. You can
reduce or increase --dp based on your available hardware and throughput
requirements.
Then obtain the host IP address, for example:
hostname -I
Use that IP address in the YAML configuration through the following fields. The default port is 30000.
agentloop:
llm_ip: LLM_JUDGE_IP
llm_port: LLM_JUDGE_PORT
you can test it by:
python rlinf/agents/wideseek_r1/utils/sglang_client.py --llm-ip LLM_JUDGE_IP
Multi-node#
Since multi-agent generation incurs substantial time overhead, training and evaluation on a single machine with eight GPUs can significantly slow down experiments; therefore, WideSeek-R1 supports multi-node training and evaluation. Please refer to the documentation Multi-node Training.
Next Steps#
For tool configuration, see Tool Setup.
For the full training procedure, see Training.
For the full evaluation procedure, see Evaluation.