AgentLightning RL Training (calc_x)#

calc_x is an Agent Lightning example in RLinf for training a math-solving agent. The agent reads a question, produces its reasoning and a final answer, and then receives a reward that drives the RL update.

Overview#

Use this recipe to train a calculator-backed math agent with Agent Lightning and RLinf’s distributed trainer.

Model: Qwen2.5-1.5B-Instruct
Algorithm: Multi-turn agent RL
Tools: MCP calculator and AutoGen agent chat
Hardware: One node with at least one 40 GB GPU

Installation#

For the base RLinf environment, see RLinf Installation.

Install dependencies for this example:

pip install "agentlightning==0.3.0" "autogen-agentchat" "autogen-ext[openai]" "mcp>=1.10.0" "mcp-server-calculator"
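After installing, it can help to confirm that each package resolved in the active environment. The following is a minimal sketch, not part of the example itself: it uses only the standard library's `importlib.metadata`, the distribution names are copied from the pip command above, and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata

# Distribution names copied from the pip command above.
REQUIRED = [
    "agentlightning",
    "autogen-agentchat",
    "autogen-ext",
    "mcp",
    "mcp-server-calculator",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any line reporting `NOT INSTALLED` suggests the pip command ran against a different environment than the one used for training.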

Data Preparation#

Download the calc_x dataset from Google Drive using the download link, then extract it.

Run It#

Go to the example directory:

cd /path/to/RLinf/examples/agent/agentlightning/calc_x

Choose the config you want to run, then set the model and dataset paths in that config file. The training command below uses config/qwen2.5-1.5b-enginehttp-multiturn.yaml, so edit these fields there:

rollout:
  model:
    model_path: /path/to/model/Qwen2.5-1.5B-Instruct

data:
  train_data_paths: ["/path/to/train.parquet"]
  val_data_paths: ["/path/to/test.parquet"]
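Before launching, a quick pre-flight check can catch a mistyped path early. This is a sketch under assumptions: `missing_paths` is a hypothetical helper, and the paths listed are the placeholders from the config snippet above, to be replaced with your own.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk."""
    return [str(p) for p in paths if not Path(p).exists()]


if __name__ == "__main__":
    # Placeholder paths from the config snippet above; replace with real ones.
    configured = [
        "/path/to/model/Qwen2.5-1.5B-Instruct",
        "/path/to/train.parquet",
        "/path/to/test.parquet",
    ]
    for path in missing_paths(configured):
        print(f"missing: {path}")
```

The check only tests existence; it does not validate the parquet contents or the model files.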

Start training:

bash run_calc_x.sh qwen2.5-1.5b-enginehttp-multiturn

You can also keep the config file unchanged and pass Hydra overrides on the command line:

bash run_calc_x.sh qwen2.5-1.5b-enginehttp-multiturn \
  rollout.model.model_path=/path/to/Qwen2.5-1.5B-Instruct \
  data.train_data_paths='["/path/to/train.parquet"]' \
  data.val_data_paths='["/path/to/test.parquet"]'
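If you launch runs from a script, the same overrides can be assembled programmatically. The sketch below is an illustration rather than part of RLinf: `hydra_overrides` and `build_command` are hypothetical helpers, and it assumes list values should be rendered in the bracketed, double-quoted form shown in the command above.

```python
import json
import shlex


def hydra_overrides(values):
    """Render a dict as key=value override strings; lists become ["a","b"]."""
    args = []
    for key, value in values.items():
        if isinstance(value, (list, tuple)):
            rendered = json.dumps(list(value), separators=(",", ":"))
        else:
            rendered = str(value)
        args.append(f"{key}={rendered}")
    return args


def build_command(config_name, values):
    """Build the argv list for run_calc_x.sh with the given overrides."""
    return ["bash", "run_calc_x.sh", config_name, *hydra_overrides(values)]


if __name__ == "__main__":
    cmd = build_command(
        "qwen2.5-1.5b-enginehttp-multiturn",
        {
            "rollout.model.model_path": "/path/to/Qwen2.5-1.5B-Instruct",
            "data.train_data_paths": ["/path/to/train.parquet"],
            "data.val_data_paths": ["/path/to/test.parquet"],
        },
    )
    # shlex.join adds the shell quoting needed for the bracketed list values.
    print(shlex.join(cmd))
```

Passing the argv list to `subprocess.run(cmd)` avoids shell quoting altogether; the printed form is for pasting into a terminal.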

To train with trajectory-level advantages instead, use the matching trajectory config and set the same paths there or via overrides:

bash run_calc_x.sh qwen2.5-1.5b-enginehttp-trajectory

Visualization and Results#

Example training / metric curves from a calc_x run (logged metrics may vary by config and seed):

Figure: AgentLightning calc_x training curves

Standalone Evaluation#

To evaluate an HF checkpoint without training, set rollout.model.model_path in the matching *_eval.yaml config, then run the corresponding command:

bash run_calc_x.sh qwen2.5-1.5b-enginehttp-multiturn_eval
bash run_calc_x.sh qwen2.5-1.5b-enginehttp-trajectory_eval