
Launch & Scale

Use these guides when you need to launch runs that span more than a single machine or to connect RLinf to physical hardware.

Guide                       What you get
Multi-Node Training         Launch a run across multiple nodes.
Heterogeneous Clusters      Configure mixed hardware and node groups.
Cloud-Edge Collaboration    Split inference and training across cloud and edge.
Real-World Robots           Run RL on physical robot hardware.


© Copyright 2025 RLinf Project.
