Closed
Labels
bug: Something isn't working
Describe the bug
StorageClass Not Applied to PVC in Helm Deployment
Description
Hi team,
I am preparing a Terraform stack to deploy production-stack on EKS clusters, and the chart is not behaving as expected. The PVC should use the existing gp2 StorageClass, but after every deployment its storageClassName is null. Could you provide a fix for this bug?
Workaround: I am currently patching the PVC by hand, which defeats the purpose of automation.
Note: There is no default StorageClass in my cluster.
Helm Values
# This file is a Go template. Variables passed from Terraform are accessed with .VarName
servingEngineSpec:
  enableEngine: true
  runtimeClassName: "" # Use default runtime for CPU
  nodeSelector:
    workload-type: cpu
    node-group: cpu-pool
  containerSecurityContext:
    privileged: true
  modelSpec:
    - name: "tinyllama-cpu"
      repository: "public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo"
      tag: "v0.8.5.post1"
      modelURL: "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
      replicaCount: 1
      requestCPU: 1
      requestMemory: "2Gi"
      requestGPU: 0
      limitCPU: "2"
      limitMemory: "4Gi"
      pvcStorage: "10Gi"
      storageClassName: "gp2" # <------------- Storage Class
      vllmConfig:
        dtype: "bfloat16"
        extraArgs:
          - "--device"
          - "cpu"
      env:
        - name: VLLM_CPU_KVCACHE_SPACE
          value: "1"
        - name: VLLM_CPU_OMP_THREADS_BIND
          value: "0-2"
        - name: HUGGING_FACE_HUB_TOKEN
          valueFrom:
            secretKeyRef:
              name: hf-token-secret
              key: token
routerSpec:
  enableRouter: true
  routingLogic: "roundrobin"
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "1"
      memory: "2Gi"
Error Message
pod has unbound immediate PersistentVolumeClaims
To Reproduce
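Use an EKS cluster with the ebs-csi-driver and an existing StorageClass of type gp2.
Deploy the chart with the Helm values shared above (either with plain helm or through Terraform).
Inspect the created PVC: its storageClassName is null and the pod stays unscheduled.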
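For reference, here is a minimal sketch of a StorageClass matching this setup. It assumes the gp2 class is provisioned by the EBS CSI driver; the actual gp2 class on a given EKS cluster may instead use the in-tree provisioner. The class is intentionally not annotated as the cluster default, which mirrors the situation described in this issue.

```yaml
# Sketch only: a gp2 StorageClass backed by the EBS CSI driver (assumption).
# No "storageclass.kubernetes.io/is-default-class" annotation is set, so a PVC
# whose storageClassName is null will NOT be provisioned from this class.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2
provisioner: ebs.csi.aws.com
parameters:
  type: gp2   # EBS volume type
```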
Expected behavior
- The PVC should be created with the specified storageClassName: "gp2".
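A sketch of the PVC I would expect the chart to render for the tinyllama-cpu model is shown below. The metadata name and access mode are hypothetical placeholders, not taken from the chart; only the storage request and the storageClassName come from the Helm values above.

```yaml
# Sketch of the expected PVC; names marked hypothetical are placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: tinyllama-cpu-storage-claim   # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce                   # assumption: single-node EBS volume
  resources:
    requests:
      storage: 10Gi                   # from pvcStorage
  storageClassName: gp2               # from storageClassName; currently rendered as null
```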
Additional context
- There is no default StorageClass in the cluster.
- The issue occurs consistently with the provided Helm values.
- The whole deployment is stuck because the storageClassName value is not applied to the PVC.
- This forces users to manually patch a PVC that is supposed to be managed by the production-stack Helm chart.