WideSeek-R1#

WideSeek-R1 is a lead-agent and subagent framework trained with multi-agent reinforcement learning (MARL) for broad information-seeking tasks. It combines scalable orchestration and parallel execution through a shared LLM, isolated agent contexts, and specialized tools.

On the WideSearch benchmark, WideSeek-R1-4B reaches an item F1 score of 40.0%. This is comparable to single-agent DeepSeek-R1-671B while continuing to improve as the number of parallel subagents increases.

For the full method and results, see the WideSeek-R1 publication, the project page, the paper on arXiv, and the example code in RLinf.

Installation#

For the base environment, follow the RLinf installation guide.

We recommend the prebuilt Docker image:

docker pull rlinf/rlinf:math-rlinf0.2-torch2.6.0-sglang0.4.6.post5-vllm0.8.5-megatron0.13.0-te2.1

If you prefer a local environment, install the agentic stack:

bash requirements/install.sh agentic

Our startup scripts and configuration files are located in examples/agent/wideseek_r1.

  • examples/agent/wideseek_r1/config contains the YAML configuration files for training and evaluation.

  • examples/agent/tools/search_local_server_qdrant provides the search engine implementation used by offline tools.

  • examples/agent/wideseek_r1/run_train.sh and examples/agent/wideseek_r1/run_eval.sh are the main entry points for training and evaluation, respectively.

Tool Backends#

WideSeek-R1 supports two tool backends:

See Tool Setup for the full configuration workflow.

Quick Start#

Before running either training or evaluation, start the judge model server. WideSeek-R1 uses an LLM judge to provide more reliable feedback than exact-match scoring alone.

Judge Model#

The default setup uses Qwen3-30B-A3B-Instruct-2507 as the judge model.

Start the judge server with SGLang:

python3 -m sglang.launch_server \
   --model-path /PATH/TO/Qwen3-30B-A3B-Instruct-2507 \
   --host 0.0.0.0 \
   --log-level info \
   --context-length 32768 \
   --dp 8

In the main experiments, the judge model was served on 8 H100 GPUs. You can reduce or increase --dp based on your available hardware and throughput requirements.

Then obtain the host IP address, for example:

hostname -I

Use that IP address in the YAML configuration through the following fields. The default port is 30000.

agentloop:
  llm_ip: LLM_JUDGE_IP
  llm_port: LLM_JUDGE_PORT

you can test it by:

python rlinf/agents/wideseek_r1/utils/sglang_client.py --llm-ip LLM_JUDGE_IP

Using RLinf Built-in Rollout Engine as Judge#

Alternatively, you can use RLinf’s built-in rollout engine as the judge instead of an external server. This approach runs the judge LLM within the RLinf framework, which can be more convenient for local development and testing.

To use the built-in rollout engine as judge, set the following configuration in your YAML file:

agentloop:
  use_local_judge: true  # Enable local judge within RLinf framework

Then configure the rollout_judge section with your desired model and settings:

rollout_judge:
  group_name: "RolloutJudgeGroup"
  gpu_memory_utilization: 0.5
  model:
    model_type: qwen3
    model_path: /PATH/TO/YOUR/JUDGE/MODEL  # Replace with actual path
    precision: fp16
  rollout_backend: sglang
  tensor_parallel_size: 1
  pipeline_parallel_size: 1
  max_running_requests: 64

Example configuration files using the built-in judge can be found in:

  • examples/agent/wideseek_r1/config/train_qwen3_hybrid_local_judge.yaml

  • examples/agent/wideseek_r1/config/eval_qwen3_widesearch_local_judge.yaml

When using the built-in judge, you don’t need to start a separate judge server. The judge model will be loaded and managed by RLinf’s rollout engine.

Multi-node#

Since multi-agent generation incurs substantial time overhead, training and evaluation on a single machine with eight GPUs can significantly slow down experiments; therefore, WideSeek-R1 supports multi-node training and evaluation. Please refer to the documentation Multi-node Training.

Next Steps#