Training#
This page describes how to reproduce WideSeek-R1 training in RLinf.
The reference configuration uses Qwen3-4B, but the current pipeline is also
compatible with other dense models in the Qwen3 family.
Prerequisites#
Before launching training, make sure the following components are ready:
The RLinf environment is installed. See Installation.
The judge model server is running. See WideSeek-R1.
The offline retrieval tools are configured. See Tool Setup.
Download the Base Model#
The main experiments use Qwen3-4B.
After downloading the model, update
examples/agent/wideseek_r1/config/train_qwen3_hybrid.yaml
with the local model path:
rollout:
  model:
    model_type: qwen3
    model_path: /PATH/TO/MODEL
Download the Training Data#
WideSeek-R1 training uses a 20k hybrid dataset that combines broad information-seeking data with standard QA data. The dataset is available on Hugging Face:
The main experiments use hybrid_20k.jsonl from
WideSeek-R1-train-data.
A separate WideSeek-R1-Corpus
artifact is also available for users who want to inspect or reuse the public
corpus resources. It is not the hybrid_20k.jsonl training file used by the
main experiments.
After downloading the data, update
examples/agent/wideseek_r1/config/train_qwen3_hybrid.yaml
with the dataset path:
data:
  train_data_paths: /PATH/TO/TRAIN/DATASET/hybrid_20k.jsonl
  is_hybrid: True
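Before pointing data.train_data_paths at the downloaded file, it can help to confirm that the file parses as JSON Lines. The sketch below is a minimal check and not part of RLinf; the helper name peek_jsonl is hypothetical, and it makes no assumption about which fields each record contains.

```python
import json
from pathlib import Path


def peek_jsonl(path, n=3):
    """Return the first n non-empty records of a JSON Lines file.

    Hypothetical helper for a quick sanity check; raises json.JSONDecodeError
    if a line is not valid JSON.
    """
    records = []
    with Path(path).open("r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            records.append(json.loads(line))
            if len(records) >= n:
                break
    return records
```

Calling peek_jsonl("/PATH/TO/TRAIN/DATASET/hybrid_20k.jsonl") and printing the result shows the first few records, so you can inspect their structure before launching a run.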
About is_hybrid and is_markdown#
is_hybrid indicates whether the training set mixes WideSearch-style data and
standard QA data.
Set is_hybrid: True for the provided hybrid training set.
If you train on a single-source dataset such as width_20k or depth_20k, set is_hybrid: False.
When is_hybrid is False, make sure data.is_markdown matches the dataset format you use (True for width_20k, False for depth_20k).
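The rules above can be restated as a small lookup. This is only an illustrative sketch: the function name data_flags is hypothetical, and is_markdown is left as None for the hybrid set because this page does not prescribe a value for it in that case.

```python
# Flag values restated from this page. None means "not specified here".
_KNOWN_DATASETS = {
    "hybrid_20k": {"is_hybrid": True, "is_markdown": None},
    "width_20k": {"is_hybrid": False, "is_markdown": True},
    "depth_20k": {"is_hybrid": False, "is_markdown": False},
}


def data_flags(dataset_name):
    """Return the data.* flags this page recommends for a dataset name.

    Accepts either the bare name ("width_20k") or a file name
    ("width_20k.jsonl"). Raises ValueError for names this page does not cover.
    """
    key = dataset_name
    if key.endswith(".jsonl"):
        key = key[: -len(".jsonl")]
    if key not in _KNOWN_DATASETS:
        raise ValueError(f"no recommendation on this page for {dataset_name!r}")
    return dict(_KNOWN_DATASETS[key])  # copy so callers cannot mutate the table
```

For example, data_flags("depth_20k") gives is_hybrid False and is_markdown False, matching the last rule above.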
Launch Training#
Before starting training, verify all of the following:
rollout.model.model_path points to the downloaded base model.
data.train_data_paths points to the training dataset.
agentloop.llm_ip is set correctly.
Offline tools are configured and reachable. See Tool Setup.
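The first three checks can be automated once the YAML has been loaded into a plain dictionary (for example with a YAML parser of your choice). The sketch below is not an RLinf utility; prelaunch_problems is a hypothetical name, and the code assumes train_data_paths may be either a single string, as shown on this page, or a list of strings. It does not test whether the judge server or the offline tools are reachable.

```python
import os


def _get(cfg, dotted_key):
    """Look up 'a.b.c' in nested dicts; return None if any level is missing."""
    node = cfg
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def prelaunch_problems(cfg):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    model_path = _get(cfg, "rollout.model.model_path")
    if not model_path or not os.path.exists(model_path):
        problems.append(f"rollout.model.model_path not found: {model_path!r}")

    data_paths = _get(cfg, "data.train_data_paths")
    if isinstance(data_paths, str):
        data_paths = [data_paths]  # assumption: a single path is allowed
    if not data_paths:
        problems.append("data.train_data_paths is not set")
    else:
        for p in data_paths:
            if not os.path.isfile(p):
                problems.append(f"training file not found: {p!r}")

    if not _get(cfg, "agentloop.llm_ip"):
        problems.append("agentloop.llm_ip is not set")

    return problems
```

Printing the returned list before launching gives a quick summary of anything that still needs to be filled in.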
Then run:
bash examples/agent/wideseek_r1/run_train.sh train_qwen3_hybrid
Outputs#
Training outputs are written to:
${runner.output_dir}/${runner.experiment_name}
You can inspect the TensorBoard files in that directory to monitor training metrics.
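To confirm that a run is writing metrics, you can look for TensorBoard event files under the output directory. The sketch below is a hypothetical helper, not part of RLinf. It assumes only that event files follow TensorBoard's usual events.out.tfevents.* naming, and it searches recursively because the exact subdirectory layout is not described here.

```python
from pathlib import Path


def find_event_files(output_dir, experiment_name):
    """List TensorBoard event files under <output_dir>/<experiment_name>.

    The two arguments correspond to runner.output_dir and
    runner.experiment_name in the YAML config.
    """
    run_dir = Path(output_dir) / experiment_name
    # Recursive search: the nesting below run_dir is an unknown here.
    return sorted(run_dir.rglob("events.out.tfevents.*"))
```

If the list is non-empty, pointing TensorBoard at the same directory with tensorboard --logdir should display the logged metrics.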
Notes#
WideSeek-R1 supports both single-agent and multi-agent execution. Switch between
them with agentloop.workflow in the YAML config:
mas: multi-agent training.
sa: single-agent training.
The single-agent mode is designed to be comparable to ASearcher.