Description
Normal day-to-day usage of make deploy/undeploy cycles for bpfman-operator development causes exponential growth in mount entries within the cluster control plane, eventually rendering the cluster unusable.
Cycle 1 - Current mount count: 80
Cycle 2 - Current mount count: 82
Cycle 3 - Current mount count: 86
Cycle 4 - Current mount count: 95
Cycle 5 - Current mount count: 111
Cycle 6 - Current mount count: 143
Cycle 7 - Current mount count: 207
Cycle 8 - Current mount count: 335
Cycle 9 - Current mount count: 591
Cycle 10 - Current mount count: 1103
Cycle 11 - Current mount count: 2127
Cycle 12 - Current mount count: 4175
Cycle 13 - Current mount count: 8271
Cycle 14 - Current mount count: 16463
Cycle 15 - Current mount count: 32847
The growth follows an exponential pattern, approximately doubling each cycle after cycle 7. This indicates mount points are not being properly cleaned up during undeploy operations.
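The reported counts actually fit a simple closed form: from cycle 4 onward, each count equals 2^n + 79, consistent with a leak that doubles every cycle on top of a fixed ~79-entry baseline. A quick sanity check against the numbers above (an observation about the reported data, not a proven model of the leak):

```python
# Mount counts per cycle, as reported above.
counts = {
    1: 80, 2: 82, 3: 86, 4: 95, 5: 111, 6: 143, 7: 207, 8: 335,
    9: 591, 10: 1103, 11: 2127, 12: 4175, 13: 8271, 14: 16463, 15: 32847,
}

# From cycle 4 onward the counts match 2**n + 79 exactly, i.e. the
# leaked entries double each cycle on top of a fixed baseline.
for n in range(4, 16):
    assert counts[n] == 2 ** n + 79
print("all cycles from 4 onward match 2**n + 79")
```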
Impact
- Development Workflow Disruption: Clusters become unusable after routine deploy/undeploy cycles
- System Instability: High mount counts can cause kernel resource exhaustion
- Deployment Failures: Eventually pods fail to enter running state within timeout periods
- Performance Degradation: Mount table operations become increasingly slow
Reproduction Steps
- Create a Kind cluster:
  make setup-kind
- Monitor mount points in real-time (optional):
  while :; do echo -n "$(date): "; docker exec bpfman-deployment-control-plane wc -l /proc/mounts; sleep 5; done
- Perform normal development cycles, repeating:
  make deploy
  make undeploy
- Observe mount count growth with each cycle
- After ~15 cycles, deployments time out waiting for pods to become ready
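The cycle loop above can be automated so each run records its mount count. A minimal sketch, assuming the `make deploy`/`make undeploy` targets and the default Kind node name from this report (adjust `NODE` for your environment):

```python
"""Automate deploy/undeploy cycles and record the node's mount count."""
import subprocess

NODE = "bpfman-deployment-control-plane"  # default Kind control-plane node


def mount_count(mounts_text: str) -> int:
    """Count mount entries, given the text of /proc/mounts."""
    return len(mounts_text.splitlines())


def node_mounts(node: str = NODE) -> str:
    """Read /proc/mounts from inside the Kind node container."""
    out = subprocess.run(
        ["docker", "exec", node, "cat", "/proc/mounts"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout


def run_cycles(n: int = 15) -> list[int]:
    """Run n deploy/undeploy cycles, printing the mount count after each."""
    counts = []
    for cycle in range(1, n + 1):
        subprocess.run(["make", "deploy"], check=True)
        subprocess.run(["make", "undeploy"], check=True)
        counts.append(mount_count(node_mounts()))
        print(f"Cycle {cycle} - Current mount count: {counts[-1]}")
    return counts
```

With a non-leaking undeploy, `run_cycles()` should print a roughly flat series rather than the doubling sequence shown above.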
Environment
- Platform: KIND cluster on Linux (or OpenShift)
- Cluster: bpfman-deployment (default KIND cluster name)
- Namespace: bpfman
- Development Cycles: 15 deploy/undeploy cycles over ~9 minutes
Note: This issue is also reproducible on OpenShift clusters, where remediation requires rebooting affected nodes.
Expected Behaviour
Mount entries should return to baseline levels after each undeploy operation, with minimal growth over time.
Actual Behaviour
Mount entries accumulate exponentially and are never fully cleaned up during undeploy operations, eventually breaking the development cluster.
This issue significantly impacts the development workflow as clusters become unusable after routine development cycles. After ~15 deploy/undeploy cycles, the cluster reaches system limits (32,767 mount entries) and deployments begin failing.
Production Impact: On OpenShift clusters, this issue manifests similarly and requires node reboots to remediate, making it a critical issue for production deployments.
The issue likely stems from incomplete cleanup during the undeploy process, where mount points created during deployment are not properly removed.
Suggested Investigation
An initial avenue of investigation would be the CSI driver, as it manages volume mounts and may not be properly cleaning up mount points during undeploy operations. The exponential growth pattern suggests mount points are being created but never removed.
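One way to confirm and localise the leak is to group `/proc/mounts` entries by mount target: leaked volume mounts typically show the same target path mounted many times over. A small sketch of that check (the sample data below is illustrative, not captured from the actual cluster):

```python
from collections import Counter


def duplicate_mounts(mounts_text: str, min_count: int = 2) -> list[tuple[str, int]]:
    """Return (mount target, occurrences) pairs seen min_count or more times.

    mounts_text is the content of /proc/mounts; the second whitespace-
    separated field on each line is the mount point.
    """
    targets = [line.split()[1] for line in mounts_text.splitlines() if line.strip()]
    return [(t, c) for t, c in Counter(targets).most_common() if c >= min_count]


# Illustrative sample: one target mounted three times, a leak signature.
sample = (
    "overlay / overlay rw 0 0\n"
    "tmpfs /run tmpfs rw 0 0\n"
    "tmpfs /var/lib/kubelet/pods/x/volumes/csi/sock tmpfs rw 0 0\n"
    "tmpfs /var/lib/kubelet/pods/x/volumes/csi/sock tmpfs rw 0 0\n"
    "tmpfs /var/lib/kubelet/pods/x/volumes/csi/sock tmpfs rw 0 0\n"
)
print(duplicate_mounts(sample))
# [('/var/lib/kubelet/pods/x/volumes/csi/sock', 3)]
```

Running the same check against the affected node (e.g. via `docker exec <node> cat /proc/mounts`) should reveal which paths are accumulating, pointing at the component that mounts them.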