A common issue that Kubernetes operators run into is how to balance the requirements of long-running workloads with the need to periodically cycle through worker nodes (e.g. for a Kubernetes upgrade or configuration change). StatefulSet workloads (Kafka, ZooKeeper, RabbitMQ, Redis, etc.) may be sensitive to rescheduling, or application developers may simply deploy applications with pod disruption budgets that do not allow pod eviction at all. k8s-shredder is a tool that allows the operator to automatically try to soft-evict and reschedule workloads on "parked nodes" over a set period of time before hard-deleting pods. While doing so, it exposes metrics that application stewards can monitor, allowing them to take whatever action is needed to reschedule their applications off the nodes slated for replacement.
You can find more about node parking here.
- Allows teams running stateful apps to move their workloads off parked nodes at will, independent of the node lifecycle
- Optimizes cloud costs by dynamically purging unschedulable worker nodes (parked nodes)
- Notifies clients (via metrics) that they are running workloads on parked nodes so that they can take appropriate action
You can deploy k8s-shredder in your cluster by using the helm chart.
Then, during a cluster upgrade, while rotating the worker nodes, you will need to label the nodes and the non-DaemonSet pods they contain with labels similar to these:
```
shredder.ethos.adobe.net/parked-by=k8s-shredder
shredder.ethos.adobe.net/parked-node-expires-on=1750164865.36373
shredder.ethos.adobe.net/upgrade-status=parked
```
You can either park the nodes yourself (e.g. label the nodes and pods) with your own cluster management tooling, or you can configure k8s-shredder to park nodes for you based on Karpenter NodeClaim status or the presence of a specific label on the node.
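For illustration, a parked node might look roughly like the excerpt below once it has been labeled, cordoned, and tainted; the node name is hypothetical, and the same three labels are also applied to the node's non-DaemonSet pods.

```yaml
# Illustrative excerpt of a parked node (node name is hypothetical).
apiVersion: v1
kind: Node
metadata:
  name: worker-abc123
  labels:
    shredder.ethos.adobe.net/parked-by: "k8s-shredder"
    shredder.ethos.adobe.net/parked-node-expires-on: "1750164865.36373"
    shredder.ethos.adobe.net/upgrade-status: "parked"
spec:
  unschedulable: true               # cordoned
  taints:
    - key: shredder.ethos.adobe.net/upgrade-status
      value: parked
      effect: NoSchedule            # matches the default ParkedNodeTaint
```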
The following options can be used to customize the k8s-shredder controller:
| Name | Default Value | Description |
|---|---|---|
| EvictionLoopInterval | 60s | How often to run the eviction loop process |
| ParkedNodeTTL | 60m | Time a node can be parked before starting force eviction process |
| RollingRestartThreshold | 0.5 | Fraction of ParkedNodeTTL that must elapse before starting the rollout restart process |
| UpgradeStatusLabel | "shredder.ethos.adobe.net/upgrade-status" | Label used for identifying parked nodes |
| ExpiresOnLabel | "shredder.ethos.adobe.net/parked-node-expires-on" | Label used for identifying the TTL for parked nodes |
| NamespacePrefixSkipInitialEviction | "" | For pods in namespaces with this prefix, proceed directly to a rollout restart without waiting for the RollingRestartThreshold |
| RestartedAtAnnotation | "shredder.ethos.adobe.net/restartedAt" | Annotation name used to mark a controller object for rollout restart |
| AllowEvictionLabel | "shredder.ethos.adobe.net/allow-eviction" | Label used to skip evicting pods that have explicitly set this label to false |
| ToBeDeletedTaint | "ToBeDeletedByClusterAutoscaler" | Node taint used for skipping a subset of parked nodes that are already handled by cluster-autoscaler |
| ArgoRolloutsAPIVersion | "v1alpha1" | API version from argoproj.io API group to be used while handling Argo Rollouts objects |
| EnableKarpenterDriftDetection | false | Controls whether to scan for drifted Karpenter NodeClaims and automatically label their nodes |
| EnableKarpenterDisruptionDetection | false | Controls whether to scan for disrupted Karpenter NodeClaims and automatically label their nodes |
| ParkedByLabel | "shredder.ethos.adobe.net/parked-by" | Label used to identify which component parked the node |
| ParkedNodeTaint | "shredder.ethos.adobe.net/upgrade-status=parked:NoSchedule" | Taint to apply to parked nodes in format key=value:effect |
| EnableNodeLabelDetection | false | Controls whether to scan for nodes with specific labels and automatically park them |
| NodeLabelsToDetect | [] | List of node labels to detect. Supports both key-only and key=value formats |
| MaxParkedNodes | "0" | Maximum number of nodes that can be parked simultaneously. Can be an integer (e.g., "5") or percentage (e.g., "20%"). Set to "0" (default) for no limit. |
| ExtraParkingLabels | {} | (Optional) Map of extra labels to apply to nodes and pods during parking. Example: { "example.com/owner": "infrastructure" } |
| EvictionSafetyCheck | true | Controls whether to perform safety checks before force eviction. If true, nodes will be unparked if pods don't have required parking labels. |
| ParkingReasonLabel | "shredder.ethos.adobe.net/parked-reason" | Label used to track why a node or pod was parked (values: node-label, karpenter-drifted, karpenter-disrupted) |
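As a rough sketch, tuning a few of these options might look like the following; this assumes the options map one-to-one onto YAML keys in the controller's configuration (or the Helm chart's values), so check the chart for the exact structure before using it.

```yaml
# Hypothetical k8s-shredder configuration sketch; verify key placement against
# the Helm chart's values file.
EvictionLoopInterval: 60s      # run the eviction loop every minute
ParkedNodeTTL: 2h              # give workloads two hours before force eviction
RollingRestartThreshold: 0.5   # start rollout restarts halfway through the TTL
MaxParkedNodes: "20%"          # never park more than 20% of the cluster at once
ExtraParkingLabels:
  example.com/owner: "infrastructure"   # example label, not a default
```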
k8s-shredder will periodically run eviction loops, based on the configured EvictionLoopInterval, trying to clean up all the pods from the parked nodes. Once all the pods are cleaned up, [cluster-autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) or Karpenter should chime in and recycle the parked node.
The diagram below describes a simple flow of how k8s-shredder handles StatefulSet applications:
k8s-shredder includes an optional feature for automatic detection of drifted Karpenter NodeClaims. This feature is disabled by default, but can be enabled by setting EnableKarpenterDriftDetection to true. When enabled, at the beginning of each eviction loop, the controller will:
- Scan the Kubernetes cluster for Karpenter NodeClaims that are marked as "Drifted"
- Identify the nodes associated with these drifted NodeClaims
- Automatically process these nodes by:
  - Labeling nodes and their non-DaemonSet pods with:
    - UpgradeStatusLabel (set to "parked")
    - ExpiresOnLabel (set to current time + ParkedNodeTTL)
    - ParkedByLabel (set to "k8s-shredder")
    - Any labels specified in ExtraParkingLabels
  - Cordoning the nodes to prevent new pod scheduling
  - Tainting the nodes with the configured ParkedNodeTaint
This integration allows k8s-shredder to automatically handle node lifecycle management for clusters using Karpenter, ensuring that drifted nodes are properly marked for eviction and eventual replacement.
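For reference, the drift scan looks for NodeClaims whose status reports a "Drifted" condition, roughly like the excerpt below. The apiVersion and exact condition layout depend on the Karpenter release you run, so treat this as an assumption to verify against your cluster.

```yaml
# Illustrative Karpenter NodeClaim status excerpt; names and field layout may
# differ between Karpenter releases.
apiVersion: karpenter.sh/v1
kind: NodeClaim
metadata:
  name: default-abc12          # hypothetical NodeClaim name
status:
  nodeName: worker-abc123      # the node that k8s-shredder would park
  conditions:
    - type: Drifted
      status: "True"
```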
k8s-shredder includes an optional feature for automatic detection of disrupted Karpenter NodeClaims. This feature is disabled by default, but can be enabled by setting EnableKarpenterDisruptionDetection to true. When enabled, at the beginning of each eviction loop, the controller will:
- Scan the Kubernetes cluster for Karpenter NodeClaims that are marked as disrupted (e.g., "Disrupting", "Terminating", "Empty", "Underutilized")
- Identify the nodes associated with these disrupted NodeClaims
- Automatically process these nodes by:
  - Labeling nodes and their non-DaemonSet pods with:
    - UpgradeStatusLabel (set to "parked")
    - ExpiresOnLabel (set to current time + ParkedNodeTTL)
    - ParkedByLabel (set to "k8s-shredder")
    - Any labels specified in ExtraParkingLabels
  - Cordoning the nodes to prevent new pod scheduling
  - Tainting the nodes with the configured ParkedNodeTaint
This integration ensures that nodes undergoing disruption as part of bin-packing operations have all pods evicted in a reasonable amount of time, preventing them from getting stuck due to blocking Pod Disruption Budgets (PDBs). It complements the drift detection feature by handling nodes that are actively being disrupted by Karpenter's consolidation and optimization processes.
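To turn on both Karpenter integrations, the relevant settings would look roughly like this; the sketch assumes the options are plain YAML keys in the controller configuration.

```yaml
# Sketch: enable both Karpenter-based detections. Nodes they find are parked
# for ParkedNodeTTL, cordoned, and tainted with ParkedNodeTaint.
EnableKarpenterDriftDetection: true
EnableKarpenterDisruptionDetection: true
ParkedNodeTaint: "shredder.ethos.adobe.net/upgrade-status=parked:NoSchedule"
```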
k8s-shredder includes optional automatic detection of nodes with specific labels. This feature is disabled by default but can be enabled by setting EnableNodeLabelDetection to true. When enabled, at the beginning of each eviction loop, the application will:
- Scan the Kubernetes cluster for nodes that match any of the configured label selectors in NodeLabelsToDetect
- Support both key-only ("maintenance") and key=value ("upgrade=required") label formats
- Automatically process matching nodes by:
  - Labeling nodes and their non-DaemonSet pods with:
    - UpgradeStatusLabel (set to "parked")
    - ExpiresOnLabel (set to current time + ParkedNodeTTL)
    - ParkedByLabel (set to "k8s-shredder")
    - Any labels specified in ExtraParkingLabels
  - Cordoning the nodes to prevent new pod scheduling
  - Tainting the nodes with the configured ParkedNodeTaint
This integration allows k8s-shredder to automatically handle node lifecycle management based on custom labeling strategies, enabling teams to mark nodes for parking using their own operational workflows and labels. For example, this can be used in conjunction with AKS cluster upgrades.
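A minimal sketch of enabling label-based parking, again assuming the options are exposed as YAML configuration keys; the label keys themselves are examples, not defaults.

```yaml
# Sketch: park any node carrying the "maintenance" label or the
# "upgrade=required" label (example keys, choose your own).
EnableNodeLabelDetection: true
NodeLabelsToDetect:
  - "maintenance"          # key-only format
  - "upgrade=required"     # key=value format
```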
k8s-shredder supports limiting the maximum number of nodes that can be parked simultaneously using the MaxParkedNodes configuration option. This feature helps prevent overwhelming the cluster with too many parked nodes at once, which could impact application availability.
MaxParkedNodes can be specified as either:
- Integer value (e.g., "5"): Absolute maximum number of nodes that can be parked
- Percentage value (e.g., "20%"): Maximum percentage of total cluster nodes that can be parked (calculated dynamically each cycle)
When MaxParkedNodes is set to a non-zero value:
- Parse the limit: The configuration is parsed to determine the actual limit
  - For percentages: limit = (percentage / 100) * totalNodes (rounded down)
  - For integers: limit = configured value
- Count parked nodes: k8s-shredder counts the number of currently parked nodes
- Calculate available slots: availableSlots = limit - currentlyParked
- Sort by age: Eligible nodes are sorted by creation timestamp (oldest first) to ensure a predictable parking order
- Limit parking: If the number of eligible nodes exceeds the available slots, only the oldest availableSlots nodes are parked
- Skip if full: If no slots are available (currentlyParked >= limit), parking is skipped for that eviction interval (see the worked example below)
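As a worked example of the calculation above, with illustrative numbers:

```yaml
# 10-node cluster, 1 node already parked, percentage-based limit.
MaxParkedNodes: "20%"
# limit           = floor(20 / 100 * 10) = 2
# currentlyParked = 1
# availableSlots  = 2 - 1 = 1
# => only the single oldest eligible node is parked this cycle; the rest wait
#    for the next eviction interval.
```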
Examples:
- MaxParkedNodes: "0" (default): No limit, all eligible nodes are parked
- MaxParkedNodes: "5": Maximum 5 nodes can be parked at any time
- MaxParkedNodes: "20%": Maximum 20% of total cluster nodes can be parked (e.g., 2 nodes in a 10-node cluster)
- Invalid values (e.g., "-1", "invalid"): Treated as 0 (no limit) with a warning logged
Percentage Benefits:
- Dynamic scaling: Limit automatically adjusts as cluster size changes
- Proportional safety: Maintains a consistent percentage of available capacity regardless of cluster size
- Auto-scaling friendly: Works well with cluster auto-scaling by recalculating limits each cycle
Predictable Parking Order:
Eligible nodes are always sorted by creation timestamp (oldest first), regardless of whether MaxParkedNodes is set. This ensures:
- Consistent behavior: The same nodes will be parked first across multiple eviction cycles
- Fair rotation: Oldest nodes are prioritized for replacement during rolling upgrades
- Predictable capacity: You can anticipate which nodes will be parked next when slots become available
- Deterministic ordering: Even when parking all eligible nodes (no limit), they are processed in a predictable order
This sorting behavior applies to both Karpenter drift detection and node label detection features. When multiple nodes are eligible for parking:
- With no limit (MaxParkedNodes: "0"): All nodes are parked in order from oldest to newest
- With a limit: Only the oldest nodes up to the limit are parked; newer nodes wait for the next eviction interval
Use cases:
- Gradual node replacement: Control the pace of node cycling during cluster upgrades
- Resource management: Prevent excessive resource pressure from too many parked nodes
- Application stability: Ensure applications have sufficient capacity during node transitions
- Cost optimization: Balance between node replacement speed and cluster stability
- Auto-scaling clusters: Use percentage-based limits to maintain consistent safety margins as cluster size changes
The ExtraParkingLabels option allows you to specify a map of additional Kubernetes labels that will be applied to all nodes and pods during the parking process. This is useful for custom automation, monitoring, or compliance workflows.
Configuration:
```yaml
ExtraParkingLabels:
  example.com/owner: "infrastructure"
  example.com/maintenance: "true"
  example.com/upgrade-batch: "batch-1"
```
Use cases:
- Team ownership: Mark parked nodes with team ownership labels for accountability
- Maintenance tracking: Add labels to track maintenance windows or upgrade batches
- Compliance: Apply labels required by compliance or governance policies
- Monitoring: Enable custom alerting or monitoring based on parking labels
- Automation: Trigger external automation workflows based on parking labels
Behavior:
- Labels are applied to both nodes and their non-DaemonSet pods during parking
- Labels are removed during the unparking process (if EvictionSafetyCheck triggers unparking)
- If not set or empty, no extra labels are applied
- Labels are applied in addition to the standard parking labels (UpgradeStatusLabel, ExpiresOnLabel, ParkedByLabel)
The EvictionSafetyCheck feature provides an additional safety mechanism to prevent force eviction of pods that weren't properly prepared for parking. When enabled (default: true), k8s-shredder performs a safety check before force evicting pods from expired parked nodes.
How it works:
- Before force eviction: When a node's TTL expires and force eviction is about to begin, k8s-shredder checks all non-DaemonSet and non-static pods on the node
- Required labels check: Each pod must have:
  - UpgradeStatusLabel set to "parked"
  - ExpiresOnLabel present with any value
- Safety decision:
  - If all pods have the required labels → proceed with force eviction
  - If any pod is missing required labels → unpark the node instead of force evicting
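For example, a pod that passes the safety check carries both labels, roughly like the excerpt below (the pod name is hypothetical):

```yaml
# Illustrative pod metadata that satisfies the safety check.
apiVersion: v1
kind: Pod
metadata:
  name: kafka-0    # hypothetical pod running on the expired parked node
  labels:
    shredder.ethos.adobe.net/upgrade-status: "parked"
    shredder.ethos.adobe.net/parked-node-expires-on: "1750164865.36373"
```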
Unparking process: When safety check fails, k8s-shredder automatically unparks the node by:
- Removing ExpiresOnLabel and ExtraParkingLabels from nodes and pods
- Removing the ParkedNodeTaint
- Uncordoning the node (making it schedulable again)
- Setting UpgradeStatusLabel to "unparked" on nodes and pods
- Setting ParkedByLabel to the configured ParkedByValue
Use cases:
- Safety during manual parking: If nodes are manually parked but pods weren't properly labeled
- Partial parking failures: When parking automation fails to label all pods
- Emergency recovery: Provides a safe way to recover from parking mistakes
- Compliance: Ensures only properly prepared workloads are force evicted
Configuration:
```yaml
EvictionSafetyCheck: true   # Enable safety checks (default)
EvictionSafetyCheck: false  # Disable safety checks (force eviction always proceeds)
```
Logging: When safety checks fail, k8s-shredder logs detailed information about which pods are missing required labels, helping operators understand why the node was unparked instead of force evicted.
k8s-shredder automatically tracks why nodes and pods were parked by applying a configurable parking reason label. This feature helps operators understand the source of parking actions and enables better monitoring and debugging.
Configuration:
```yaml
ParkingReasonLabel: "shredder.ethos.adobe.net/parked-reason" # Default label name
```
Parking Reason Values:
- node-label: Node was parked due to node label detection
- karpenter-drifted: Node was parked due to Karpenter drift detection
- karpenter-disrupted: Node was parked due to Karpenter disruption detection
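For instance, a node parked via the label-detection path would end up carrying labels along these lines (illustrative excerpt):

```yaml
# Illustrative labels on a node parked through node label detection.
metadata:
  labels:
    shredder.ethos.adobe.net/upgrade-status: "parked"
    shredder.ethos.adobe.net/parked-by: "k8s-shredder"
    shredder.ethos.adobe.net/parked-reason: "node-label"
```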
Behavior:
- The parking reason label is applied to both nodes and their non-DaemonSet pods during parking
- The label is automatically removed during the unparking process (e.g., when safety checks fail)
- The label value corresponds to the detection method that triggered the parking action
- This label works alongside other parking labels and doesn't interfere with existing functionality
Use cases:
- Monitoring: Track which detection method is most active in your cluster
- Debugging: Understand why specific nodes were parked
- Automation: Trigger different workflows based on parking reason
- Compliance: Audit parking actions and their sources
k8s-shredder exposes comprehensive metrics for monitoring its operation. You can find detailed information about all available metrics in the metrics documentation.
See RELEASE.md.

