@@ -31,7 +31,14 @@ This guide is loosely based on the
 [to the instructions](https://github.com/oracle-quickstart/oci-hpc-oke/tree/main#instructions-for-deploying-an-oke-cluster-with-gpus-and-rdma-connectivity),
 importing one of the images and creating a GPU partition with BM.GPU.H100.8 nodes.
 
-The configuration here assumes a minimum of 16 BM.GPU.H100.8 nodes.
+The configuration here assumes a minimum of 1 BM.GPU.H100.8 node for
+training with 8B parameters, and a minimum of 8 BM.GPU.H100.8 nodes for 70B
+parameters.
+
+If another shape is used, the NCCL and MPI parameters in the Kubernetes
+[configuration map](./files/training/templates/mpi.yaml) should be adapted