WideSeek-R1#

WideSeek-R1 is a lead-agent and subagent framework trained with multi-agent reinforcement learning (MARL) for broad information-seeking tasks. It combines scalable orchestration and parallel execution through a shared LLM, isolated agent contexts, and specialized tools.

On the WideSearch benchmark, WideSeek-R1-4B reaches an item F1 score of 40.0%. This is comparable to single-agent DeepSeek-R1-671B while continuing to improve as the number of parallel subagents increases.

For the full method and results, see the WideSeek-R1 publication, the project page, the paper on arXiv, and the example code in RLinf.

Installation#

For the base environment, follow the RLinf installation guide.

We recommend the prebuilt Docker image:

docker pull rlinf/rlinf:math-rlinf0.2-torch2.6.0-sglang0.4.6.post5-vllm0.8.5-megatron0.13.0-te2.1

If you prefer a local environment, install the agentic stack:

bash requirements/install.sh agentic

Our startup scripts and configuration files are located in examples/agent/wideseek_r1.

  • examples/agent/wideseek_r1/config contains the YAML configuration files for training and evaluation.

  • examples/agent/tools/search_local_server_qdrant provides the search engine implementation used by offline tools.

  • examples/agent/wideseek_r1/run_train.sh and examples/agent/wideseek_r1/run_eval.sh are the main entry points for training and evaluation, respectively.

Tool Backends#

WideSeek-R1 supports two tool backends:

See Tool Setup for the full configuration workflow.

Quick Start#

Before running either training or evaluation, start the judge model server. WideSeek-R1 uses an LLM judge to provide more reliable feedback than exact-match scoring alone.

Judge Model#

The default setup uses Qwen3-30B-A3B-Instruct-2507 as the judge model.

Start the judge server with SGLang:

python3 -m sglang.launch_server \
   --model-path /PATH/TO/Qwen3-30B-A3B-Instruct-2507 \
   --host 0.0.0.0 \
   --log-level info \
   --context-length 32768 \
   --dp 8

In the main experiments, the judge model was served on 8 H100 GPUs. You can reduce or increase --dp based on your available hardware and throughput requirements.

Then obtain the host IP address, for example:

hostname -I

Use that IP address in the YAML configuration through the following fields. The default port is 30000.

agentloop:
  llm_ip: LLM_JUDGE_IP
  llm_port: LLM_JUDGE_PORT

you can test it by:

python rlinf/agents/wideseek_r1/utils/sglang_client.py --llm-ip LLM_JUDGE_IP

Multi-node#

Since multi-agent generation incurs substantial time overhead, training and evaluation on a single machine with eight GPUs can significantly slow down experiments; therefore, WideSeek-R1 supports multi-node training and evaluation. Please refer to the documentation Multi-node Training.

Next Steps#