Tool Setup#

WideSeek-R1 provides two search backends:

- Online mode for live web search and webpage access.
- Offline mode for retrieval against a local Qdrant-based knowledge base.

In the standard workflow, the offline tools are used for training and for standard QA evaluation, while the online tools are used for WideSearch evaluation.

Overview#

Configure one tool backend before launching WideSeek-R1 training or evaluation.

- Online: Serper search and Jina webpage access
- Offline: local Qdrant retrieval over Wiki-2018
- Config: tools.online, use_jina, and cache settings
- Test: rlinf/agents/wideseek_r1/tools.py

Online Mode#

Online mode uses Serper for web search and Jina AI for webpage access.

API Keys#

Export the required API keys before running training or evaluation:

```
export SERPER_API_KEY=your_serper_api_key
export JINA_API_KEY=your_jina_api_key
```
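A missing key usually only shows up once the first tool call fails, so a small preflight check can save a wasted launch. The sketch below is not part of WideSeek-R1: the helper name `missing_api_keys` is hypothetical, and only the two variable names come from the export commands above.

```python
import os

# Variable names taken from the export commands above.
REQUIRED_KEYS = ("SERPER_API_KEY", "JINA_API_KEY")


def missing_api_keys(env=None):
    """Return the required keys that are unset or blank in `env`.

    `env` defaults to the current process environment; passing a dict
    makes the check easy to exercise without touching real variables.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]
```

Calling `missing_api_keys()` in a launcher script and aborting when the returned list is non-empty gives an early, readable error instead of a failed search request.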

Configuration#

In the YAML config under examples/agent/wideseek_r1/config, set:

```
tools:
  online: True
  use_jina: True
  enable_cache: True
  cache_file: "./webpage_cache.json"
```
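This page does not describe the cache format, so the following only illustrates what a file-backed webpage cache of this kind typically does. It assumes a flat JSON object mapping each URL to its page text, which may not match what WideSeek-R1 actually writes to cache_file. The class name `WebpageCache` and the `fetch` callback are hypothetical.

```python
import json
import os


class WebpageCache:
    """Illustrative URL -> text cache persisted as one JSON file (assumed format)."""

    def __init__(self, cache_file="./webpage_cache.json"):
        self.cache_file = cache_file
        self.pages = {}
        # Reuse pages stored by an earlier run, if the file exists.
        if os.path.exists(cache_file):
            with open(cache_file, "r", encoding="utf-8") as f:
                self.pages = json.load(f)

    def get_or_fetch(self, url, fetch):
        """Return the cached page for `url`, calling `fetch(url)` only on a miss."""
        if url not in self.pages:
            self.pages[url] = fetch(url)
            self._save()
        return self.pages[url]

    def _save(self):
        # Write to a temporary file first so an interrupted run cannot truncate the cache.
        tmp = self.cache_file + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.pages, f, ensure_ascii=False)
        os.replace(tmp, self.cache_file)
```

The point of such a cache is that repeated evaluation runs over the same pages do not spend webpage-access quota again.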

Offline Mode#

Offline mode uses a local Qdrant retrieval service together with a local corpus and webpage store.

Prerequisites#

After completing the base setup from the installation guide, install the Qdrant client:

```
uv pip install qdrant-client==1.16.2
```

Download the Corpus and Retriever#

Prepare the following assets:

- The Wiki-2018 corpus package.
- The E5 retriever model.

The corpus package includes:

- wiki_corpus.jsonl for retrieval snippets.
- wiki_webpages.jsonl for webpage content lookup.
- qdrant/ containing the Qdrant collection files.

Launch the Retrieval Service#

Start Qdrant in the corpus directory:
```
cd /PATH/TO/Wiki-2018-Corpus/qdrant
./qdrant
```

This process must stay alive, so running it inside tmux is recommended.

Get the host IP address for the Qdrant service:
```
hostname -I
```
Edit examples/agent/tools/search_local_server_qdrant/launch_local_server.sh and update these variables:
- WIKI2018_DIR: /PATH/TO/Wiki-2018-Corpus
- retriever_path: /PATH/TO/e5-model
- qdrant_url: for example http://<host_ip>:6333
- qdrant_collection_name: set it to wiki_collection_m32_cef512.
- qdrant_search_param: set it to {"hnsw_ef":256}.
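Before starting the retrieval service, it can help to confirm that Qdrant is up and that the collection named above was loaded. The sketch below uses only the standard library and Qdrant's REST endpoint for collection info (`GET /collections/<name>`). The helper names are hypothetical and not part of the repository.

```python
import urllib.error
import urllib.request


def collection_info_url(qdrant_url, collection_name):
    """Build the Qdrant REST URL that describes one collection."""
    return f"{qdrant_url.rstrip('/')}/collections/{collection_name}"


def collection_exists(qdrant_url, collection_name, timeout=5.0):
    """Return True if Qdrant answers 200 for the collection, False on 404.

    A connection failure (Qdrant not running, wrong host or port) is not
    swallowed: urllib raises URLError so the cause stays visible.
    """
    url = collection_info_url(qdrant_url, collection_name)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise
```

For example, `collection_exists("http://<host_ip>:6333", "wiki_collection_m32_cef512")` should return True once Qdrant has been started from the corpus directory.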

Start the retrieval service:

```
bash examples/agent/tools/search_local_server_qdrant/launch_local_server.sh
```

We recommend running this retrieval service on the same machine as training or evaluation to avoid unnecessary network latency. If you run it elsewhere, configure tools.search.server_addr accordingly. The default address is localhost:8000.

The retrieval service listens on port 8000 by default and exposes:

- POST /retrieve for vector retrieval.
- POST /access for webpage content lookup.
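The two endpoints can be exercised by hand with a few lines of Python. The sketch below assumes the service accepts and returns JSON; the request and response schemas are not documented on this page, so the payload is left entirely to the caller and the helper names are hypothetical.

```python
import json
import urllib.request


def build_post(server_addr, path, payload):
    """Build a JSON POST request for the retrieval service without sending it."""
    return urllib.request.Request(
        url=f"http://{server_addr}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_service(server_addr, path, payload, timeout=30.0):
    """Send the request and decode the response body as JSON (assumed format)."""
    request = build_post(server_addr, path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the default address this becomes `call_service("localhost:8000", "/retrieve", payload)`, where the fields of `payload` have to be taken from the server code under examples/agent/tools/search_local_server_qdrant.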

Because Qdrant retrieval runs on CPU, only the E5 retriever model consumes GPU memory after the service starts.

Configuration#

In your YAML config, set:

```
tools:
  online: False
```

If the retrieval service is not running on the local machine, also set:

```
tools:
  search:
    server_addr: "HOST:8000"
```

Test the Tools#

You can test the WideSeek-R1 tool worker directly.

Online mode:

```
python rlinf/agents/wideseek_r1/tools.py --is_online true
```

Offline mode:

```
python rlinf/agents/wideseek_r1/tools.py --is_online false
```

The online test requires SERPER_API_KEY and JINA_API_KEY.

The offline test requires the local retrieval service to be reachable at the configured server_addr.
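A quick way to check reachability before running the offline test is to open a TCP connection to the configured address. This is a minimal sketch with hypothetical helper names. It assumes server_addr has the host:port form shown on this page and does not handle IPv6 literals.

```python
import socket


def parse_server_addr(server_addr, default_port=8000):
    """Split 'host:port' into (host, port); use the default port if none is given."""
    host, sep, port = server_addr.rpartition(":")
    if not sep:
        return server_addr, default_port
    return host, int(port)


def is_reachable(server_addr, timeout=3.0):
    """Return True if a TCP connection to server_addr can be opened."""
    host, port = parse_server_addr(server_addr)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

A successful connection only shows that something is listening on the port, not that the retriever has finished loading, so the tool test above remains the real check.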