Tool Setup
WideSeek-R1 provides two search backends:
- online mode, for live web search and webpage access.
- offline mode, for retrieval against a local Qdrant-based knowledge base.
In the standard workflow, offline tools are typically used for training and standard QA evaluation, while online tools are used for WideSearch evaluation.
Online Mode
Online mode uses Serper for web search and Jina AI for webpage access.
API Keys
Export the required API keys before running training or evaluation:
export SERPER_API_KEY=your_serper_api_key
export JINA_API_KEY=your_jina_api_key
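Before starting a long training or evaluation run, it can help to fail fast when a key is missing. The helper below is a minimal, hypothetical sketch (missing_api_keys is not part of WideSeek-R1); it only checks that the two variables named above are set and non-empty, not that the keys are valid.

```python
import os
from typing import Mapping

# The two variables the online tools expect (names taken from this guide).
REQUIRED_ONLINE_KEYS = ("SERPER_API_KEY", "JINA_API_KEY")


def missing_api_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required key names that are unset or blank in `env`.

    Hypothetical helper for illustration; WideSeek-R1 does not ship it.
    """
    return [name for name in REQUIRED_ONLINE_KEYS if not env.get(name, "").strip()]
```

For example, calling missing_api_keys() in a launcher script and aborting when the returned list is non-empty surfaces a forgotten export before any GPU time is spent.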
Configuration
In the YAML config under examples/agent/wideseek_r1/config, set:
tools:
  online: True
  use_jina: True
  enable_cache: True
  cache_file: "./webpage_cache.json"
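This guide does not describe how the webpage cache is stored. As a rough illustration of what enable_cache and cache_file control, here is a minimal sketch that assumes a flat JSON object mapping each URL to its fetched text. JsonWebpageCache and the fetch callback are hypothetical names, and the real WideSeek-R1 cache format may differ.

```python
import json
from pathlib import Path
from typing import Callable


class JsonWebpageCache:
    """Hypothetical URL -> page-text cache persisted to a JSON file.

    Illustrates the idea behind enable_cache / cache_file only; the
    actual WideSeek-R1 implementation may store data differently.
    """

    def __init__(self, cache_file: str, enabled: bool = True) -> None:
        self.path = Path(cache_file)
        self.enabled = enabled
        # Reuse pages cached by earlier runs if the file already exists.
        self.entries: dict[str, str] = (
            json.loads(self.path.read_text(encoding="utf-8"))
            if enabled and self.path.exists()
            else {}
        )

    def get_or_fetch(self, url: str, fetch: Callable[[str], str]) -> str:
        """Return cached text for `url`, calling `fetch` only on a miss."""
        if not self.enabled:
            return fetch(url)
        if url not in self.entries:
            self.entries[url] = fetch(url)
            # Persist after every miss so an interrupted run keeps its pages.
            self.path.write_text(json.dumps(self.entries), encoding="utf-8")
        return self.entries[url]
```

Writing the file on every miss keeps the sketch simple and robust to interruptions at the cost of extra I/O; a production cache would more likely batch writes.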
Offline Mode
Offline mode uses a local Qdrant retrieval service together with a local corpus and webpage store.
Prerequisites
After completing the base setup from the installation guide, install the Qdrant client:
uv pip install qdrant-client==1.16.2
Download the Corpus and Retriever
Prepare the following assets:
The corpus package includes:
- wiki_corpus.jsonl for retrieval snippets.
- wiki_webpages.jsonl for webpage content lookup.
- qdrant/ containing the Qdrant collection files.
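After downloading, a quick sanity check can confirm the package unpacked as listed above. The sketch below is hypothetical (check_corpus_dir and peek_jsonl are illustrative names); since this guide does not document the record fields, peek_jsonl simply returns whatever the first JSON line contains so you can inspect it.

```python
import json
from pathlib import Path

# Entries listed in the corpus package description above.
EXPECTED_ENTRIES = ("wiki_corpus.jsonl", "wiki_webpages.jsonl", "qdrant")


def check_corpus_dir(corpus_dir: str) -> list[str]:
    """Return the expected entries that are missing from `corpus_dir`."""
    root = Path(corpus_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


def peek_jsonl(path: str) -> dict:
    """Parse and return the first non-blank JSON record of a .jsonl file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                return json.loads(line)
    raise ValueError(f"{path} contains no records")
```

For example, check_corpus_dir("/PATH/TO/Wiki-2018-Corpus") should return an empty list once all three entries are in place.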
Launch the Retrieval Service
Start Qdrant in the corpus directory:
cd /PATH/TO/Wiki-2018-Corpus/qdrant
./qdrant
This process must stay alive; running it inside tmux is recommended.
Get the host IP address for the Qdrant service:
hostname -I
Edit examples/agent/tools/search_local_server_qdrant/launch_local_server.sh and update these variables:
- WIKI2018_DIR: /PATH/TO/Wiki-2018-Corpus
- retriever_path: /PATH/TO/e5-model
- qdrant_url: for example, http://<host_ip>:6333
- qdrant_collection_name: set it to wiki_collection_m32_cef512.
- qdrant_search_param: set it to {"hnsw_ef":256}.
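The values above can be derived mechanically. The hypothetical sketch below turns the output of hostname -I into a qdrant_url, builds the URL of Qdrant's REST endpoint GET /collections/{collection_name} (useful for confirming the collection is loaded), and serializes the search parameter as compact JSON. It assumes the first address printed is the IPv4 address you want, which may not hold on multi-homed or IPv6-first machines.

```python
import json


def qdrant_url_from_hostname_output(output: str, port: int = 6333) -> str:
    """Build a qdrant_url from the text printed by `hostname -I`.

    `hostname -I` prints space-separated addresses; this sketch takes the
    first one and assumes it is an IPv4 address reachable from the server.
    """
    addresses = output.split()
    if not addresses:
        raise ValueError("hostname -I printed no addresses")
    return f"http://{addresses[0]}:{port}"


def collection_info_url(qdrant_url: str, collection: str) -> str:
    """URL of Qdrant's REST endpoint GET /collections/{collection_name}."""
    return f"{qdrant_url.rstrip('/')}/collections/{collection}"


# The qdrant_search_param value from this guide, as compact JSON.
QDRANT_SEARCH_PARAM = json.dumps({"hnsw_ef": 256}, separators=(",", ":"))
```

Fetching collection_info_url(url, "wiki_collection_m32_cef512") with curl or a browser once Qdrant is running is a quick way to confirm the collection name matches before launching the retrieval service.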
Start the retrieval service:
bash examples/agent/tools/search_local_server_qdrant/launch_local_server.sh
We recommend running this retrieval service on the same machine as training or
evaluation to avoid unnecessary network latency. If you run it elsewhere,
configure tools.search.server_addr accordingly. The default address is
localhost:8000.
The retrieval service listens on port 8000 by default and exposes:
- POST /retrieve for vector retrieval.
- POST /access for webpage content lookup.
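To call the two endpoints by hand, a JSON POST with the standard library is enough. This guide does not document the request bodies, so the payload field names below (query, top_k, url) are placeholders, not the service's real schema; check the server code launched by launch_local_server.sh before relying on them. build_post is likewise a hypothetical helper.

```python
import json
import urllib.request


def build_post(server_addr: str, endpoint: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request to the retrieval service.

    `server_addr` is host:port, as in tools.search.server_addr
    (default "localhost:8000"). The payload schema is an assumption.
    """
    url = f"http://{server_addr}/{endpoint.lstrip('/')}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder payloads: field names are illustrative, not the documented API.
retrieve_req = build_post("localhost:8000", "/retrieve", {"query": "Who wrote Hamlet?", "top_k": 3})
access_req = build_post("localhost:8000", "/access", {"url": "https://en.wikipedia.org/wiki/Hamlet"})

# Sending a request requires the service to be running:
# with urllib.request.urlopen(retrieve_req, timeout=10) as resp:
#     print(json.loads(resp.read()))
```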
Because Qdrant retrieval runs on CPU, only the E5 retriever model consumes GPU memory after the service starts.
Configuration
In your YAML config, set:
tools:
  online: False
If the retrieval service is not running on the local machine, also set:
tools:
  search:
    server_addr: "HOST:8000"
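Once the YAML is loaded into a dict (for example with yaml.safe_load, which parses True/False as booleans), the settings above reduce to a small decision. The sketch below is hypothetical (describe_tool_backend is not a WideSeek-R1 function); it requires tools.online to be present rather than guessing a default, and falls back to the documented localhost:8000 only for server_addr.

```python
DEFAULT_SERVER_ADDR = "localhost:8000"  # default documented in this guide


def describe_tool_backend(config: dict) -> str:
    """Summarize which search backend a parsed YAML config selects.

    Raises KeyError if tools.online is missing, since this guide does not
    state its default.
    """
    tools = config["tools"]
    if tools["online"]:
        return "online: Serper search + Jina webpage access"
    search = tools.get("search") or {}
    server_addr = search.get("server_addr", DEFAULT_SERVER_ADDR)
    return f"offline: local retrieval service at {server_addr}"
```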
Test the Tools
You can test the WideSeek-R1 tool worker directly.
Online mode:
python rlinf/agents/wideseek_r1/tools.py --is_online true
Offline mode:
python rlinf/agents/wideseek_r1/tools.py --is_online false
The online test requires SERPER_API_KEY and JINA_API_KEY.
The offline test requires the local retrieval service to be reachable at the
configured server_addr.
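If the offline test fails, a plain TCP check helps separate "nothing is listening" from errors inside the service. The hypothetical helper below only confirms that something accepts connections on host:port; it does not verify that the listener is the retrieval service.

```python
import socket


def is_reachable(server_addr: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to `server_addr` (host:port) succeeds.

    Only proves that a process is listening on the port, not that it is
    the WideSeek-R1 retrieval service.
    """
    host, _, port = server_addr.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, is_reachable("localhost:8000") returning False before the offline test usually means the retrieval service is not running or tools.search.server_addr points at the wrong host.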