RHOAI Deploy

Simplified deployment of Red Hat OpenShift AI (RHOAI) platform using GitOps.

Overview

This repository provides a streamlined approach to deploying the RHOAI platform:

Platform: RHOAI operator and dependencies (NFD, Kueue, NVIDIA GPU Operator)
GitOps: ArgoCD-based deployment automation

Note: These deployment steps are specifically for RHOAI 3.x. RHOAI 3.x requires OpenShift 4.19+ and uses the fast-3.x subscription channel.

For GPU infrastructure (MachineSets), see the openshift-infra repository.

Prerequisites

OpenShift 4.19+ with cluster-admin access
oc CLI installed and configured
GPU nodes (optional, for model serving and training)
- See openshift-infra for GPU node deployment

Prerequisites Check:

# Verify OpenShift version (4.19+ required)
oc version

# Verify cluster-admin access
oc whoami
oc auth can-i create namespace

Deployment Steps

Step 1: Install OpenShift GitOps (2-3 minutes)

./bootstrap.sh

Note: If GitOps is already installed (e.g., from deploying another repository), the bootstrap script will detect it and skip installation.

Step 2: Deploy GPU Nodes (Optional, 10-15 minutes)

GPU node deployment is managed in a separate repository. See the openshift-infra repository for:

Multi-GPU instance type support (g4dn, g6)
Automated deployment scripts
Cost optimization guidance

# Clone openshift-infra repo
git clone https://github.com/dandawg/openshift-infra.git
cd openshift-infra

# Deploy GPU nodes (see openshift-infra README for details)
INSTANCE_TYPE=g6.2xlarge ./infra/gpu-machineset/aws/deploy.sh

# Return to rhoai-deploy
cd ../rhoai-deploy

Step 3: Deploy RHOAI Platform (5-10 minutes)

Option A: Deploy as one Application (Recommended)

# Single command deploys RHOAI + dependencies + GPU Operator
oc apply -f gitops/platform/rhoai-platform.yaml

Option B: Step-by-step

# 1. Deploy RHOAI dependencies (NFD + Kueue)
oc apply -f gitops/platform/rhoai-dependencies.yaml

# Wait for dependencies
oc wait --for=condition=Ready \
  pod -l app=nfd-master -n openshift-nfd --timeout=300s
oc wait --for=condition=Ready \
  pod -l control-plane=controller-manager \
  -n openshift-kueue --timeout=300s

# 2. Deploy NVIDIA GPU Operator (required for GPU support)
oc apply -f gitops/platform/nvidia-gpu-operator.yaml

# Wait for GPU operator (3-5 minutes, only if GPU nodes exist)
oc wait --for=condition=Ready \
  pod -l app=gpu-operator -n nvidia-gpu-operator --timeout=300s

# 3. Deploy RHOAI Operator
oc apply -f gitops/platform/rhoai-operator.yaml

Verification

Check ArgoCD Applications

# List all applications
oc get applications -n openshift-gitops

# Check specific application status
oc describe application <name> -n openshift-gitops

# Watch applications sync
watch oc get applications -n openshift-gitops

Check RHOAI Status

# Check RHOAI operator
oc get csv -n redhat-ods-operator

# Check DataScienceCluster
oc get datasciencecluster

# Check RHOAI pods
oc get pods -n redhat-ods-applications

# Get RHOAI Dashboard URL
RHOAI_URL=$(oc get route rhods-dashboard \
  -n redhat-ods-applications -o jsonpath='{.spec.host}')
echo "RHOAI Dashboard: https://${RHOAI_URL}"

Check GPU Status (if GPU nodes deployed)

# List GPU nodes
oc get nodes -l nvidia.com/gpu.present=true

# Check GPU allocatable resources
oc describe node <gpu-node-name> | grep -A 5 "Allocatable:"

# Check NVIDIA GPU Operator pods
oc get pods -n nvidia-gpu-operator

# Verify GPU device plugin is running
oc get daemonset nvidia-device-plugin-daemonset -n nvidia-gpu-operator

Repository Structure

rhoai-deploy/
├── README.md              # This file
├── bootstrap.sh           # GitOps installer script
├── bootstrap/             # GitOps operator manifests
│   └── gitops-operator/
├── gitops/               # ArgoCD Application manifests
│   └── platform/        # RHOAI, GPU Operator, dependencies
└── platform/            # Platform component definitions
    ├── gitops-operator/ # OpenShift GitOps/ArgoCD (legacy)
    ├── rhoai-operator/  # RHOAI with dependencies (NFD, Kueue)
    └── nvidia-gpu-operator/  # NVIDIA GPU Operator

Components

Platform Layer

RHOAI Operator - Red Hat OpenShift AI platform

Dashboard, Workbenches, Model Serving, Pipelines
Requires OpenShift 4.19+ for RHOAI 3.x
Currently using fast-3.x subscription channel

RHOAI Dependencies

Node Feature Discovery (NFD) - Hardware feature detection
Red Hat Build for Kueue - Job queuing and resource management

NVIDIA GPU Operator - GPU infrastructure

GPU drivers, CUDA runtime, device plugin
DCGM monitoring and metrics

See platform/rhoai-operator/README.md and platform/nvidia-gpu-operator/README.md for details.

GPU Infrastructure

For GPU node provisioning and management, see the openshift-infra repository, which provides:

Multi-GPU instance type support (g4dn, g6)
Automated deployment scripts
Cost optimization guidance
GitOps-ready manifests

Customization

Forking the Repository

If you fork this repository, update the repoURL in all GitOps manifests:

# Update all ArgoCD Application manifests
find gitops/ -name "*.yaml" -type f -exec sed -i '' \
  's|repoURL: .*|repoURL: https://github.com/YOUR-ORG/rhoai-deploy|g' {} \;

GPU Instance Configuration

For GPU instance configuration, see the openshift-infra repository.

Troubleshooting

GPU Scheduling Issues

Problem: Pods fail to schedule with error: 0/N nodes are available: X Insufficient nvidia.com/gpu, Y node(s) didn't match Pod's node affinity/selector

Root Cause: NVIDIA GPU Operator not deployed or GPU device plugin not running.

Solution:

# 1. Verify GPU Operator is deployed
oc get applications -n openshift-gitops | grep nvidia-gpu-operator

# If not deployed, deploy it:
oc apply -f gitops/platform/nvidia-gpu-operator.yaml

# 2. Check GPU Operator pods
oc get pods -n nvidia-gpu-operator

# 3. Verify GPU device plugin is running on GPU nodes
oc get daemonset nvidia-device-plugin-daemonset -n nvidia-gpu-operator

# 4. Check GPU resources are advertised
oc get nodes -o json | jq '.items[] | {name: .metadata.name, gpuAllocatable: .status.allocatable["nvidia.com/gpu"]}'

# 5. If GPU nodes exist but show 0 allocatable GPUs, restart device plugin
oc rollout restart daemonset/nvidia-device-plugin-daemonset -n nvidia-gpu-operator

GPU Nodes Issues

For GPU node provisioning and troubleshooting, see the openshift-infra repository.

# Check GPU nodes exist
oc get nodes -l nvidia.com/gpu.present=true

# Check GPU node labels
oc get nodes --show-labels | grep gpu

RHOAI Not Ready

# Check operator logs
oc logs -l name=rhods-operator -n redhat-ods-operator

# Check DataScienceCluster status
oc describe datasciencecluster default-dsc

# Check RHOAI component pods
oc get pods -n redhat-ods-applications
oc get pods -n redhat-ods-monitoring

ArgoCD Application OutOfSync

# Force sync
oc patch application <name> -n openshift-gitops \
  --type merge -p '{"operation":{"sync":{}}}'

# Check sync status
oc get application <name> -n openshift-gitops -o yaml

Resources

Next Steps

After successful deployment:

Explore RHOAI Dashboard
- Access the dashboard using the URL from verification steps
- Login with your OpenShift credentials
- Create a workbench for data science work
- Access Jupyter notebooks
Deploy AI Models (see rhoai-app-demos repository)
- Download models to storage
- Configure model serving with KServe
- Test inference endpoints
Build Applications (see rhoai-app-demos repository)
- Deploy AnythingLLM for RAG applications
- Set up n8n for workflow automation
- Create custom AI applications

Related Repositories

openshift-infra - GPU infrastructure and MachineSets
rhoai-app-demos - Application-level demos and examples
rhoai-model-serving - Model serving configurations

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
bootstrap/gitops-operator		bootstrap/gitops-operator
gitops/platform		gitops/platform
platform		platform
.gitignore		.gitignore
README.md		README.md
bootstrap.sh		bootstrap.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RHOAI Deploy

Overview

Prerequisites

Deployment Steps

Step 1: Install OpenShift GitOps (2-3 minutes)

Step 2: Deploy GPU Nodes (Optional, 10-15 minutes)

Step 3: Deploy RHOAI Platform (5-10 minutes)

Verification

Check ArgoCD Applications

Check RHOAI Status

Check GPU Status (if GPU nodes deployed)

Repository Structure

Components

Platform Layer

GPU Infrastructure

Customization

Forking the Repository

GPU Instance Configuration

Troubleshooting

GPU Scheduling Issues

GPU Nodes Issues

RHOAI Not Ready

ArgoCD Application OutOfSync

Resources

Next Steps

Related Repositories

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

RHOAI Deploy

Overview

Prerequisites

Deployment Steps

Step 1: Install OpenShift GitOps (2-3 minutes)

Step 2: Deploy GPU Nodes (Optional, 10-15 minutes)

Step 3: Deploy RHOAI Platform (5-10 minutes)

Verification

Check ArgoCD Applications

Check RHOAI Status

Check GPU Status (if GPU nodes deployed)

Repository Structure

Components

Platform Layer

GPU Infrastructure

Customization

Forking the Repository

GPU Instance Configuration

Troubleshooting

GPU Scheduling Issues

GPU Nodes Issues

RHOAI Not Ready

ArgoCD Application OutOfSync

Resources

Next Steps

Related Repositories

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages