WideSeek-R1 Training

This page describes how to reproduce WideSeek-R1 training in RLinf.

The reference configuration uses Qwen3-4B, but the current pipeline is also compatible with other dense models in the Qwen3 family.

Overview

Use this page to prepare the model, data, and config for WideSeek-R1 training.

Model: Qwen3-4B
Algorithm: Hybrid multi-agent RL
Data: hybrid_20k.jsonl from WideSeek-R1-train-data
Tools: Offline retrieval and judge model server

Prerequisites

Before launching training, make sure the following components are ready:

The RLinf environment is installed. See Installation.
The judge model server is running. See WideSeek-R1.
The offline retrieval tools are configured. See Tool Setup.

Download the Base Model

The main experiments use Qwen3-4B.

After downloading the model, update examples/agent/wideseek_r1/config/train_qwen3_hybrid.yaml with the local model path:

rollout:
  model:
    model_type: qwen3
    model_path: /PATH/TO/MODEL
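
Before editing the config, it can help to confirm that the download finished and the directory looks like a model checkpoint. The sketch below assumes a Hugging Face-style checkpoint directory that contains a config.json file; the function name missing_checkpoint_files is hypothetical and is not part of RLinf.

```python
from pathlib import Path


def missing_checkpoint_files(model_path, required=("config.json",)):
    """Return the names in `required` that are absent from `model_path`.

    Assumption: the checkpoint uses the Hugging Face directory layout,
    where config.json sits at the top level of the model directory.
    """
    root = Path(model_path)
    if not root.is_dir():
        # A missing directory means every required file is missing.
        return list(required)
    return [name for name in required if not (root / name).is_file()]
```

An empty list suggests the directory is usable as rollout.model.model_path; a non-empty list names the files to look for first.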

Download the Training Data

WideSeek-R1 training uses a 20k hybrid dataset that combines broad information-seeking data with standard QA data. The dataset is available on Hugging Face:

WideSeek-R1-train-data

The main experiments use hybrid_20k.jsonl from WideSeek-R1-train-data.

A separate WideSeek-R1-Corpus artifact is also available for users who want to inspect or reuse the public corpus resources. It is not the hybrid_20k.jsonl training file used by the main experiments.

After downloading the data, update examples/agent/wideseek_r1/config/train_qwen3_hybrid.yaml with the dataset path:

data:
  train_data_paths: /PATH/TO/TRAIN/DATASET/hybrid_20k.jsonl
  is_hybrid: True
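
A truncated or corrupted download is easier to catch before training starts. The following minimal sketch only checks that every non-blank line of a JSONL file parses as JSON and counts the records; it makes no assumption about the field names inside each record. The helper name count_jsonl_records is hypothetical.

```python
import json


def count_jsonl_records(path):
    """Count JSON records in a JSONL file; raise ValueError on a malformed line."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                # Report the file and line so the bad record is easy to find.
                raise ValueError(f"{path}:{line_no}: invalid JSON ({exc.msg})") from exc
            count += 1
    return count
```

Calling count_jsonl_records on the path set in data.train_data_paths gives a quick record count to compare against the dataset card.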

About `is_hybrid` and `is_markdown`

is_hybrid indicates whether the training set mixes WideSearch-style data and standard QA data.

Set is_hybrid: True for the provided hybrid training set.
If you train on a single-source dataset such as width_20k or depth_20k, set is_hybrid: False.
When is_hybrid is False, make sure data.is_markdown matches the dataset format you use (True for width_20k, False for depth_20k).
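
The rules above can be written down as a small lookup. This is an illustrative sketch, not RLinf code: the function name data_flags is hypothetical, and the table covers only the three dataset names mentioned on this page. For hybrid_20k the page does not state a value for is_markdown, so the sketch leaves that key out.

```python
def data_flags(dataset_name):
    """Map a dataset name to the data.* flags described on this page."""
    flags = {
        # Hybrid set: mixes WideSearch-style data and standard QA data.
        "hybrid_20k": {"is_hybrid": True},
        # Single-source sets: is_markdown must match the dataset format.
        "width_20k": {"is_hybrid": False, "is_markdown": True},
        "depth_20k": {"is_hybrid": False, "is_markdown": False},
    }
    if dataset_name not in flags:
        raise KeyError(f"no flag rule known for dataset {dataset_name!r}")
    return dict(flags[dataset_name])  # return a copy so callers can modify it
```

For example, data_flags("width_20k") returns the two values to place under the data section of the YAML config.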

Run It

Before starting training, verify all of the following:

rollout.model.model_path points to the downloaded base model.
data.train_data_paths points to the training dataset.
agentloop.llm_ip is set correctly.
Offline tools are configured and reachable. See Tool Setup.
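
Part of this checklist can be automated. The sketch below assumes the YAML config has already been loaded into a plain nested dict (for example with PyYAML's yaml.safe_load) and flags any of the three keys above that is missing, empty, or still holds a /PATH/TO placeholder. The names lookup, REQUIRED_KEYS, and unset_keys are hypothetical, and the check does not test whether the tools or the address are actually reachable.

```python
def lookup(config, dotted_key):
    """Return config[a][b][c] for "a.b.c", or None if any level is missing."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


# Keys named in the checklist on this page.
REQUIRED_KEYS = (
    "rollout.model.model_path",
    "data.train_data_paths",
    "agentloop.llm_ip",
)


def unset_keys(config, placeholder="/PATH/TO"):
    """List required keys that are missing, empty, or still a placeholder."""
    problems = []
    for key in REQUIRED_KEYS:
        value = lookup(config, key)
        if value is None or value == "":
            problems.append(key)
        elif isinstance(value, str) and placeholder in value:
            problems.append(key)
    return problems
```

An empty result means the three keys have been filled in; it says nothing about whether the values are correct.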

Then run:

bash examples/agent/wideseek_r1/run_train.sh train_qwen3_hybrid

Visualization and Results

Training outputs are written to:

${runner.output_dir}/${runner.experiment_name}

You can inspect the TensorBoard files in that directory to monitor training metrics.
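
To confirm that a run is writing metrics, one option is to list the TensorBoard event files under the output directory. The sketch below assumes the standard TensorBoard file naming, where event files start with events.out.tfevents; the exact subdirectory layout inside the run directory is not specified on this page, so the search is recursive. The function name find_event_files is hypothetical.

```python
from pathlib import Path


def find_event_files(output_dir, experiment_name):
    """Return TensorBoard event files under <output_dir>/<experiment_name>, sorted."""
    run_dir = Path(output_dir) / experiment_name
    # Search recursively because the logger may write into a subdirectory.
    return sorted(run_dir.rglob("events.out.tfevents.*"))
```

Pass the values of runner.output_dir and runner.experiment_name from the config. The same run directory can then be given to tensorboard --logdir.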

Notes

WideSeek-R1 supports both single-agent and multi-agent execution. Switch between them with agentloop.workflow in the YAML config:

mas: multi-agent training.
sa: single-agent training.

The single-agent mode is designed to be comparable to ASearcher.