AgentLightning RL Training (calc_x)#
calc_x is an AgentLightning example in RLinf for training a math-solving agent.
The agent reads a question, produces reasoning and an answer, and then receives feedback for RL updates.
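The feedback step can be pictured as a scalar reward computed from the agent's final answer. The sketch below is a toy illustration only: `extract_final_number` and `toy_reward` are hypothetical helpers, not part of RLinf or AgentLightning, and the reward actually used by calc_x may be computed differently.

```python
import math
import re
from typing import Optional


def extract_final_number(text: str) -> Optional[float]:
    """Return the last number that appears in the text, or None if there is none."""
    # Drop thousands separators so "1,234.5" is read as a single number.
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None


def toy_reward(model_output: str, ground_truth: str) -> float:
    """Hypothetical reward: 1.0 if the last number in the output matches the target."""
    predicted = extract_final_number(model_output)
    try:
        expected = float(ground_truth)
    except ValueError:
        return 0.0  # non-numeric targets are out of scope for this sketch
    if predicted is None:
        return 0.0
    close = math.isclose(predicted, expected, rel_tol=1e-4, abs_tol=1e-6)
    return 1.0 if close else 0.0
```

A binary reward like this is the simplest signal an RL update can consume; partial credit or format penalties would be separate design choices.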
Overview#
Use this recipe to train a calculator-backed math agent with Agent Lightning and RLinf’s distributed trainer.
Model: Qwen2.5-1.5B-Instruct
Training mode: Multi-turn agent RL
Tools: MCP calculator and AutoGen agent chat
Hardware: One node with at least one 40 GB GPU
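To confirm the hardware requirement before launching, one option is to read the total memory of each GPU from `nvidia-smi`. The following is a minimal sketch, not part of RLinf: the helper names are hypothetical, and the 40,000 MiB threshold is an assumed nominal cutoff, since the total reported for a 40 GB card may differ slightly by driver.

```python
import subprocess
from typing import List

# Standard nvidia-smi query: prints one total-memory value (in MiB) per GPU.
NVIDIA_SMI_QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_mib(smi_output: str) -> List[int]:
    """Turn the one-value-per-line output into a list of integers (MiB)."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def has_large_enough_gpu(memory_mib: List[int], min_mib: int = 40_000) -> bool:
    """True if at least one GPU reports min_mib or more (assumed threshold)."""
    return any(m >= min_mib for m in memory_mib)


def query_gpu_memory_mib() -> List[int]:
    """Run nvidia-smi and return total memory per GPU; raises if the call fails."""
    result = subprocess.run(
        NVIDIA_SMI_QUERY, capture_output=True, text=True, check=True
    )
    return parse_memory_mib(result.stdout)
```

On a GPU node, `has_large_enough_gpu(query_gpu_memory_mib())` gives a quick yes/no answer.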
Installation#
For the base RLinf environment, see RLinf Installation.
Install dependencies for this example:
pip install "agentlightning==0.3.0" "autogen-agentchat" "autogen-ext[openai]" "mcp>=1.10.0" "mcp-server-calculator"
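After installing, it can help to confirm which versions were actually resolved. The sketch below uses only the standard library; the distribution names are taken from the `pip install` command above, and `installed_versions` is a hypothetical helper rather than anything shipped with RLinf.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Distribution names as they appear in the pip command for this example.
CALC_X_DISTRIBUTIONS = [
    "agentlightning",
    "autogen-agentchat",
    "autogen-ext",
    "mcp",
    "mcp-server-calculator",
]


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def print_report(names: Iterable[str] = CALC_X_DISTRIBUTIONS) -> None:
    """Print one line per distribution so missing packages stand out."""
    for name, ver in installed_versions(names).items():
        print(f"{name}: {ver if ver is not None else 'NOT INSTALLED'}")
```

Calling `print_report()` in the training environment lists each package with its version or flags it as missing.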
Data Preparation#
Download the calc_x dataset from Google Drive (see the download link here) and extract it to get the Parquet files used for training and validation.
Run It#
Go to the example directory:
cd /path/to/RLinf/examples/agent/agentlightning/calc_x
Choose the config you want to run, then set the model and dataset paths in that file. For example, the training command below uses config/qwen2.5-1.5b-enginehttp-multiturn.yaml, where the relevant keys are:
rollout:
  model:
    model_path: /path/to/model/Qwen2.5-1.5B-Instruct
data:
  train_data_paths: ["/path/to/train.parquet"]
  val_data_paths: ["/path/to/test.parquet"]
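A wrong path in the config is an easy mistake to make, so a quick pre-flight check may save time. This is a minimal sketch with a hypothetical helper, `missing_paths`; the paths passed to it are whatever was set in the config.

```python
from pathlib import Path
from typing import Iterable, List


def missing_paths(paths: Iterable[str]) -> List[str]:
    """Return the subset of paths that do not exist on this machine."""
    return [p for p in paths if not Path(p).expanduser().exists()]


def check_or_raise(paths: Iterable[str]) -> None:
    """Raise FileNotFoundError listing every configured path that is missing."""
    missing = missing_paths(paths)
    if missing:
        raise FileNotFoundError("Missing configured paths: " + ", ".join(missing))
```

For example, `check_or_raise([model_path, *train_data_paths, *val_data_paths])` fails fast with a list of the paths that need fixing.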
Start training:
bash run_calc_x.sh qwen2.5-1.5b-enginehttp-multiturn
You can also keep the config file unchanged and pass Hydra overrides on the command line:
bash run_calc_x.sh qwen2.5-1.5b-enginehttp-multiturn \
rollout.model.model_path=/path/to/Qwen2.5-1.5B-Instruct \
data.train_data_paths='["/path/to/train.parquet"]' \
data.val_data_paths='["/path/to/test.parquet"]'
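When launching several runs, the same overrides can be generated from a dictionary instead of being typed by hand. The sketch below is an illustration under assumptions: `to_overrides` and `build_command` are hypothetical helpers, and list values are rendered as JSON to mirror the bracketed form used in the command above.

```python
import json
import shlex
from typing import Any, Dict, List


def to_overrides(values: Dict[str, Any]) -> List[str]:
    """Render a flat {dotted.key: value} mapping as key=value override strings."""
    overrides = []
    for key, value in values.items():
        if isinstance(value, (list, tuple)):
            rendered = json.dumps(list(value))  # e.g. ["/path/to/train.parquet"]
        else:
            rendered = str(value)
        overrides.append(f"{key}={rendered}")
    return overrides


def build_command(config_name: str, values: Dict[str, Any]) -> str:
    """Build a shell-quoted command line for run_calc_x.sh."""
    return shlex.join(["bash", "run_calc_x.sh", config_name, *to_overrides(values)])


command = build_command(
    "qwen2.5-1.5b-enginehttp-multiturn",
    {
        "rollout.model.model_path": "/path/to/Qwen2.5-1.5B-Instruct",
        "data.train_data_paths": ["/path/to/train.parquet"],
        "data.val_data_paths": ["/path/to/test.parquet"],
    },
)
print(command)
```

`shlex.join` adds the shell quoting, so the brackets and double quotes inside list values reach the script unchanged.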
To train with trajectory-level advantages instead, use the matching trajectory config and set the same paths there or via overrides:
bash run_calc_x.sh qwen2.5-1.5b-enginehttp-trajectory
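As a rough picture of what "trajectory-level" can mean, the sketch below computes one advantage per trajectory from its final reward and shares it across every turn of that trajectory. This is a generic group-normalized illustration with hypothetical helper names, assumed here for explanation only; it is not taken from RLinf, whose actual advantage computation may differ.

```python
from statistics import mean, pstdev
from typing import List


def trajectory_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize one scalar reward per trajectory within a group of rollouts."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


def broadcast_to_turns(advantage: float, num_turns: int) -> List[float]:
    """Give every turn of a trajectory the same trajectory-level advantage."""
    return [advantage] * num_turns


# Four rollouts of the same question, rewarded 1.0 when the answer was correct.
group_rewards = [1.0, 0.0, 1.0, 0.0]
advantages = trajectory_advantages(group_rewards)
per_turn = broadcast_to_turns(advantages[0], num_turns=3)
```

Under this scheme, successful trajectories receive a positive advantage on all of their turns and failed ones a negative advantage.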
Visualization and Results#
Example training and metric curves from a calc_x run (logged metrics may vary with config and seed):
[Figure: AgentLightning calc_x training curves]
Standalone Evaluation#
For HF evaluation, set rollout.model.model_path in the matching *_eval.yaml, then run the script with that eval config. Examples:
bash run_calc_x.sh qwen2.5-1.5b-enginehttp-multiturn_eval
bash run_calc_x.sh qwen2.5-1.5b-enginehttp-trajectory_eval