Tool Setup#

WideSeek-R1 provides two search backends:

  • online mode for live web search and webpage access.

  • offline mode for retrieval against a local Qdrant-based knowledge base.

In the standard workflow, offline tools are typically used for training and standard QA evaluation, while online tools are used for WideSearch evaluation.

Online Mode#

Online mode uses Serper for web search and Jina AI for webpage access.

API Keys#

Export the required API keys before running training or evaluation:

export SERPER_API_KEY=your_serper_api_key
export JINA_API_KEY=your_jina_api_key

Configuration#

In the YAML config under examples/agent/wideseek_r1/config, set:

tools:
  online: True
  use_jina: True
  enable_cache: True
  cache_file: "./webpage_cache.json"

Offline Mode#

Offline mode uses a local Qdrant retrieval service together with a local corpus and webpage store.

Prerequisites#

After completing the base setup from the installation guide, install the Qdrant client:

uv pip install qdrant-client==1.16.2

Download the Corpus and Retriever#

Prepare the following assets:

The corpus package includes:

  • wiki_corpus.jsonl for retrieval snippets.

  • wiki_webpages.jsonl for webpage content lookup.

  • qdrant/ containing the Qdrant collection files.

Launch the Retrieval Service#

  1. Start Qdrant in the corpus directory:

    cd /PATH/TO/Wiki-2018-Corpus/qdrant
    ./qdrant
    

    This process must stay alive. Running it inside tmux is recommended.

  2. Get the host IP address for the Qdrant service:

    hostname -I
    
  3. Edit examples/agent/tools/search_local_server_qdrant/launch_local_server.sh and update these variables:

    • WIKI2018_DIR: /PATH/TO/Wiki-2018-Corpus

    • retriever_path: /PATH/TO/e5-model

    • qdrant_url: for example http://<host_ip>:6333

    • qdrant_collection_name: set it to wiki_collection_m32_cef512.

    • qdrant_search_param: set it to {"hnsw_ef":256}.

  4. Start the retrieval service:

    bash examples/agent/tools/search_local_server_qdrant/launch_local_server.sh
    

We recommend running this retrieval service on the same machine as training or evaluation to avoid unnecessary network latency. If you run it elsewhere, configure tools.search.server_addr accordingly. The default address is localhost:8000.

The retrieval service listens on port 8000 by default and exposes:

  • POST /retrieve for vector retrieval.

  • POST /access for webpage content lookup.

Because Qdrant retrieval runs on CPU, only the E5 retriever model consumes GPU memory after the service starts.

Configuration#

In your YAML config, set:

tools:
  online: False

If the retrieval service is not running on the local machine, also set:

tools:
  search:
    server_addr: "HOST:8000"

Test the Tools#

You can test the WideSeek-R1 tool worker directly.

Online mode:

python rlinf/agents/wideseek_r1/tools.py --is_online true

Offline mode:

python rlinf/agents/wideseek_r1/tools.py --is_online false

The online test requires SERPER_API_KEY and JINA_API_KEY.

The offline test requires the local retrieval service to be reachable at the configured server_addr.