WideSeek-R1#
WideSeek-R1 is a lead-agent and subagent framework trained with multi-agent reinforcement learning (MARL) for broad information-seeking tasks. It combines scalable orchestration and parallel execution through a shared LLM, isolated agent contexts, and specialized tools.
On the WideSearch benchmark, WideSeek-R1-4B reaches an item F1 score of
40.0%. This is comparable to single-agent DeepSeek-R1-671B while continuing
to improve as the number of parallel subagents increases.
For the full method and results, see the WideSeek-R1 publication, the project page, the paper on arXiv, and the example code in RLinf.
Installation#
For the base environment, follow the RLinf installation guide.
We recommend the prebuilt Docker image:
docker pull rlinf/rlinf:math-rlinf0.2-torch2.6.0-sglang0.4.6.post5-vllm0.8.5-megatron0.13.0-te2.1
If you prefer a local environment, install the agentic stack:
bash requirements/install.sh agentic
Our startup scripts and configuration files are located in examples/agent/wideseek_r1.
examples/agent/wideseek_r1/configcontains the YAML configuration files for training and evaluation.examples/agent/tools/search_local_server_qdrantprovides the search engine implementation used by offline tools.examples/agent/wideseek_r1/run_train.shandexamples/agent/wideseek_r1/run_eval.share the main entry points for training and evaluation, respectively.
Tool Backends#
WideSeek-R1 supports two tool backends:
Offline Mode for training and standard QA evaluation.
Online Mode for WideSearch evaluation.
See Tool Setup for the full configuration workflow.
Quick Start#
Before running either training or evaluation, start the judge model server. WideSeek-R1 uses an LLM judge to provide more reliable feedback than exact-match scoring alone.
Judge Model#
The default setup uses Qwen3-30B-A3B-Instruct-2507 as the judge model.
Start the judge server with SGLang:
python3 -m sglang.launch_server \
--model-path /PATH/TO/Qwen3-30B-A3B-Instruct-2507 \
--host 0.0.0.0 \
--log-level info \
--context-length 32768 \
--dp 8
In the main experiments, the judge model was served on 8 H100 GPUs. You can
reduce or increase --dp based on your available hardware and throughput
requirements.
Then obtain the host IP address, for example:
hostname -I
Use that IP address in the YAML configuration through the following fields. The default port is 30000.
agentloop:
llm_ip: LLM_JUDGE_IP
llm_port: LLM_JUDGE_PORT
you can test it by:
python rlinf/agents/wideseek_r1/utils/sglang_client.py --llm-ip LLM_JUDGE_IP
Using RLinf Built-in Rollout Engine as Judge#
Alternatively, you can use RLinf’s built-in rollout engine as the judge instead of an external server. This approach runs the judge LLM within the RLinf framework, which can be more convenient for local development and testing.
To use the built-in rollout engine as judge, set the following configuration in your YAML file:
agentloop:
use_local_judge: true # Enable local judge within RLinf framework
Then configure the rollout_judge section with your desired model and settings:
rollout_judge:
group_name: "RolloutJudgeGroup"
gpu_memory_utilization: 0.5
model:
model_type: qwen3
model_path: /PATH/TO/YOUR/JUDGE/MODEL # Replace with actual path
precision: fp16
rollout_backend: sglang
tensor_parallel_size: 1
pipeline_parallel_size: 1
max_running_requests: 64
Example configuration files using the built-in judge can be found in:
examples/agent/wideseek_r1/config/train_qwen3_hybrid_local_judge.yaml
examples/agent/wideseek_r1/config/eval_qwen3_widesearch_local_judge.yaml
When using the built-in judge, you don’t need to start a separate judge server. The judge model will be loaded and managed by RLinf’s rollout engine.
Multi-node#
Since multi-agent generation incurs substantial time overhead, training and evaluation on a single machine with eight GPUs can significantly slow down experiments; therefore, WideSeek-R1 supports multi-node training and evaluation. Please refer to the documentation Multi-node Training.
Next Steps#
For tool configuration, see Tool Setup.
For the full training procedure, see Training.
For the full evaluation procedure, see Evaluation.