WideSeek-R1#

WideSeek-R1 is a lead-agent and subagent framework trained with multi-agent reinforcement learning (MARL) for broad information-seeking tasks. It combines scalable orchestration and parallel execution through a shared LLM, isolated agent contexts, and specialized tools.

On the WideSearch benchmark, WideSeek-R1-4B reaches an item F1 score of 40.0%. This is comparable to single-agent DeepSeek-R1-671B while continuing to improve as the number of parallel subagents increases.

For the full method and results, see the WideSeek-R1 publication, the project page, the paper on arXiv, and the example code in RLinf.

Overview#

Use this guide as the entry point for WideSeek-R1 setup, training, and evaluation.

Model

Qwen3-4B and Qwen3-series dense models

Algorithm

Multi-agent RL for broad information seeking

Tools

Online web search or offline Qdrant retrieval

Hardware

Single-node quick start or multi-node scaling

Installation#

For the base environment, follow the RLinf installation guide.

We recommend the prebuilt Docker image:

docker pull rlinf/rlinf:agentic-rlinf0.3-torch2.6.0-sglang0.4.6.post5-vllm0.8.5-megatron0.13.0-te2.1

If you prefer a local environment, install the agentic stack:

bash requirements/install.sh agentic

Our startup scripts and configuration files are located in examples/agent/wideseek_r1.

Path	Role
`examples/agent/wideseek_r1/config`	YAML configuration files for training and evaluation.
`examples/agent/tools/search_local_server_qdrant`	Search engine implementation used by offline tools.
`examples/agent/wideseek_r1/run_train.sh` / `examples/agent/wideseek_r1/run_eval.sh`	Main entry points for training and evaluation.

Tool Backends#

WideSeek-R1 supports two tool backends:

Backend	Use it for
Offline tools	Training and standard QA evaluation.
Online tools	WideSearch evaluation.

See Tool Setup for the full configuration workflow.

Run It#

Before running either training or evaluation, start the judge model server. WideSeek-R1 uses an LLM judge to provide more reliable feedback than exact-match scoring alone.

Judge Model#

The default setup uses Qwen3-30B-A3B-Instruct-2507 as the judge model.

Start the judge server with SGLang:

python3 -m sglang.launch_server \
   --model-path /PATH/TO/Qwen3-30B-A3B-Instruct-2507 \
   --host 0.0.0.0 \
   --log-level info \
   --context-length 32768 \
   --dp 8

In the main experiments, the judge model was served on 8 H100 GPUs. You can reduce or increase --dp based on your available hardware and throughput requirements.

Then obtain the host IP address, for example:

hostname -I

Use that IP address in the YAML configuration through the following fields. The default port is 30000.

agentloop:
  llm_ip: LLM_JUDGE_IP
  llm_port: LLM_JUDGE_PORT

you can test it by:

python rlinf/agents/wideseek_r1/utils/sglang_client.py --llm-ip LLM_JUDGE_IP

Using RLinf Built-in Rollout Engine as Judge#

Alternatively, you can use RLinf’s built-in rollout engine as the judge instead of an external server. This approach runs the judge LLM within the RLinf framework, which can be more convenient for local development and testing.

To use the built-in rollout engine as judge, set the following configuration in your YAML file:

agentloop:
  use_local_judge: true  # Enable local judge within RLinf framework

Then configure the rollout_judge section with your desired model and settings:

rollout_judge:
  group_name: "RolloutJudgeGroup"
  gpu_memory_utilization: 0.5
  model:
    model_type: qwen3
    model_path: /PATH/TO/YOUR/JUDGE/MODEL  # Replace with actual path
    precision: fp16
  rollout_backend: sglang
  tensor_parallel_size: 1
  pipeline_parallel_size: 1
  max_running_requests: 64

Example configuration files using the built-in judge can be found in:

Config	Purpose
`examples/agent/wideseek_r1/config/train_qwen3_hybrid_local_judge.yaml`	Train with the local judge.
`examples/agent/wideseek_r1/config/eval_qwen3_widesearch_local_judge.yaml`	Evaluate WideSearch with the local judge.

When using the built-in judge, you don’t need to start a separate judge server. The judge model will be loaded and managed by RLinf’s rollout engine.

Multi-node#

Since multi-agent generation incurs substantial time overhead, training and evaluation on a single machine with eight GPUs can significantly slow down experiments; therefore, WideSeek-R1 supports multi-node training and evaluation. Please refer to Multi-Node Ray Cluster Setup.

Next Steps#

Page	Next step
Tool Setup	Configure offline and online tool backends.
Training	Run the full training procedure.
Evaluation	Run the full evaluation procedure.