Requirements#
The following configuration has been extensively tested.
Hardware#
Component |
Configuration |
|---|---|
GPU |
8xH100 per node |
CPU |
192 cores per node |
Memory |
1.8TB per node |
Network |
NVLink + RoCE / IB 3.2 Tbps |
Storage |
1TB local storage for single-node experiments
10TB shared storage (NAS) for distributed experiments
|
Software#
Component |
Version |
|---|---|
Operating System |
Ubuntu 22.04 |
NVIDIA Driver |
535.183.06 |
CUDA |
12.4 |
Docker |
26.0.0 |
NVIDIA Container Toolkit |
1.17.8 |