Requirements#

The following configuration has been extensively tested.

Hardware#

Component

Configuration

GPU

8xH100 per node

CPU

192 cores per node

Memory

1.8TB per node

Network

NVLink + RoCE / IB 3.2 Tbps

Storage

1TB local storage for single-node experiments
10TB shared storage (NAS) for distributed experiments

Software#

Component

Version

Operating System

Ubuntu 22.04

NVIDIA Driver

535.183.06

CUDA

12.4

Docker

26.0.0

NVIDIA Container Toolkit

1.17.8