bug: StorageClass Not Applied to PVC in Helm Deployment #594

@brokedba

Describe the bug

StorageClass Not Applied to PVC in Helm Deployment

Description

Hi team,

I am preparing a Terraform stack to deploy production-stack on EKS clusters, and the chart does not apply the storage class set in my values. The existing storage class gp2 should be used, but after every deployment the PVC has a null storageClassName. Could you please provide a fix for this bug?
Workaround: I currently set the storage class on the PVC manually, which defeats the purpose of automation.

Note: There is no default StorageClass in my cluster.
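For anyone reproducing this, the "no default StorageClass" condition can be confirmed from the cluster. The following is a minimal sketch, assuming the parsed output of `kubectl get storageclass -o json` is passed in as a dict; the helper name `default_storage_classes` is hypothetical and not part of production-stack or kubectl.

```python
from typing import List

# Annotations Kubernetes uses to mark a StorageClass as the cluster default.
DEFAULT_ANNOTATIONS = (
    "storageclass.kubernetes.io/is-default-class",
    "storageclass.beta.kubernetes.io/is-default-class",  # older, deprecated form
)


def default_storage_classes(sc_list: dict) -> List[str]:
    """Return the names of StorageClasses annotated as default.

    `sc_list` is assumed to be the parsed JSON from
    `kubectl get storageclass -o json` (hypothetical helper, for illustration).
    """
    names = []
    for item in sc_list.get("items", []):
        meta = item.get("metadata", {})
        annotations = meta.get("annotations") or {}
        if any(annotations.get(key) == "true" for key in DEFAULT_ANNOTATIONS):
            names.append(meta.get("name", ""))
    return names
```

An empty result matches the situation described here: a PVC created without a storageClassName has no default class to fall back on, so it stays unbound.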

Helm Values

# This file is a Go template. Variables passed from Terraform are accessed with .VarName
servingEngineSpec:
  enableEngine: true
  runtimeClassName: ""  # Use default runtime for CPU
  nodeSelector:
    workload-type: cpu
    node-group: cpu-pool
  containerSecurityContext:
    privileged: true
  modelSpec:
  - name: "tinyllama-cpu"
    repository: "public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo"
    tag: "v0.8.5.post1"
    modelURL: "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
    replicaCount: 1
    requestCPU: 1
    requestMemory: "2Gi"
    requestGPU: 0
    limitCPU: "2"
    limitMemory: "4Gi"
    pvcStorage: "10Gi"
    storageClassName: "gp2"  # <------------- Storage Class
    vllmConfig:
      dtype: "bfloat16"
      extraArgs:
        - "--device"
        - "cpu"
    env:
      - name: VLLM_CPU_KVCACHE_SPACE
        value: "1"
      - name: VLLM_CPU_OMP_THREADS_BIND
        value: "0-2"
      - name: HUGGING_FACE_HUB_TOKEN
        valueFrom:
          secretKeyRef:
            name: hf-token-secret
            key: token

routerSpec:
  enableRouter: true
  routingLogic: "roundrobin"
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "1"
      memory: "2Gi"

Error Message

pod has unbound immediate PersistentVolumeClaims

To Reproduce

Use an EKS cluster with the ebs-csi-driver and a gp2 StorageClass.

Deploy the chart with the Helm values shared above (through plain Helm or Terraform).

Expected behavior

  • The PVC should be created with the specified storageClassName: "gp2".
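The expected behavior can be checked after deployment. This is a minimal sketch, assuming the parsed output of `kubectl get pvc -n <namespace> -o json` is passed in as a dict; the helper name `pvcs_with_unexpected_storage_class` is hypothetical and not part of the chart.

```python
from typing import Dict, Optional


def pvcs_with_unexpected_storage_class(
    pvc_list: dict, expected: str
) -> Dict[str, Optional[str]]:
    """Map each mismatching PVC name to the storageClassName it actually has.

    `pvc_list` is assumed to be the parsed JSON from `kubectl get pvc -o json`.
    A value of None means the field is missing or null on the PVC, which is
    the symptom reported in this issue.
    """
    mismatches = {}
    for item in pvc_list.get("items", []):
        name = item.get("metadata", {}).get("name", "")
        actual = item.get("spec", {}).get("storageClassName")
        if actual != expected:
            mismatches[name] = actual
    return mismatches
```

With the values in this issue, calling it with `expected="gp2"` should return an empty dict once the chart applies the storage class; today the PVC shows up with a None value.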

Additional context

  • There is no default StorageClass in the cluster.
  • The issue occurs consistently with the provided Helm values.
  • The whole deployment is stuck because the storageClassName value is not applied to the PVC.
  • This forces users to patch the PVC, which is supposed to be managed by the production-stack Helm chart.
