The replicas are dropped when adding new disks into an existing JBOD volume #1946

@boqu

Description

When adding a new disk to an existing JBOD storage policy, the operator incorrectly treats the missing PVC for the new volume as data loss and executes SYSTEM DROP REPLICA, which removes the replica's ZooKeeper
state (including log_ptr).

Since we set None for both replica and shard in schemaPolicy, the operator won't attempt any recovery afterwards.

Root Cause

It seems the root cause is that the PVC reconciliation flow misclassifies a new volume as data loss:

  // stsReconcileOpts, migrateTableOpts = w.hostPVCsDataVolumeMissedDetectedOptions(host)
  stsReconcileOpts, migrateTableOpts = w.hostPVCsDataLossDetectedOptions(host)

See

// stsReconcileOpts, migrateTableOpts = w.hostPVCsDataVolumeMissedDetectedOptions(host)

Any idea why we don't use hostPVCsDataVolumeMissedDetectedOptions here? Is there an edge case it doesn't handle?
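To make the expected behaviour concrete, here is a rough sketch of how a missing PVC for a newly added volume could be told apart from real data loss. Everything in it (`pvcVerdict`, `classifyMissingPVCs`, the idea of comparing against the previous spec's volume list) is hypothetical and only for illustration. It is not the operator's actual API or logic.

```go
package main

import "fmt"

// pvcVerdict is a hypothetical classification, not a type from the operator.
type pvcVerdict int

const (
	pvcOK        pvcVerdict = iota // every desired PVC exists
	pvcNewVolume                   // missing PVCs belong only to volumes added by this spec change
	pvcDataLoss                    // a PVC that the previous spec already declared is gone
)

// toSet turns a slice of names into a lookup set.
func toSet(items []string) map[string]bool {
	set := make(map[string]bool, len(items))
	for _, item := range items {
		set[item] = true
	}
	return set
}

// classifyMissingPVCs compares the volumes declared by the previous spec,
// the volumes declared by the new spec, and the PVCs that actually exist.
// Assumption: the caller can supply the previous spec's volume names.
func classifyMissingPVCs(previous, desired, existing []string) pvcVerdict {
	prev := toSet(previous)
	have := toSet(existing)
	verdict := pvcOK
	for _, name := range desired {
		if have[name] {
			continue
		}
		if prev[name] {
			// The volume was declared before and its PVC has vanished.
			return pvcDataLoss
		}
		// The volume is new in this spec, so a missing PVC is expected.
		verdict = pvcNewVolume
	}
	return verdict
}

func main() {
	previous := []string{"disk1"}
	desired := []string{"disk1", "disk2"} // disk2 was just added to the JBOD volume
	existing := []string{"disk1"}
	fmt.Println(classifyMissingPVCs(previous, desired, existing) == pvcNewVolume) // prints "true"
}
```

Under this sketch only the `pvcDataLoss` case would justify SYSTEM DROP REPLICA. The `pvcNewVolume` case would just need the new PVC to be created.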

Steps to Reproduce

  1. Deploy a ClickHouseInstallation with a JBOD storage policy containing one or more disks
  2. Add a new disk to the JBOD volume in the CHI spec
  3. Observe operator logs showing SYSTEM DROP REPLICA being executed
  4. Verify ZooKeeper state (log_ptr, etc.) is removed for the affected replicas
