Is there an existing issue for this?
Problem Statement
Using KTF outside CI can leak GKE clusters when people forget to delete them after use. To avoid unnecessary cloud spend, we'd like clusters to be ephemeral by default unless marked otherwise.
Proposed Solution
- Add a GKE cluster label indicating a deletion date.
- Default this label to three days after creation.
- Provide a flag allowing users to override the default.
- Add a scheduled task to delete clusters past their expiration.
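As a rough illustration of the first three bullets, the sketch below computes the label to attach at creation time. The label key `ktf-delete-after`, the function name, and the `lifetime_days` parameter (standing in for the override flag) are assumptions made for this example, not existing KTF names. It uses a plain `YYYY-MM-DD` date because GKE label values only allow lowercase letters, digits, underscores, and dashes, so a full timestamp with colons would be rejected.

```python
from datetime import date, timedelta

# Hypothetical label key; the proposal does not name one.
EXPIRY_LABEL = "ktf-delete-after"
# Default lifetime from the proposal: three days after creation.
DEFAULT_LIFETIME_DAYS = 3


def expiry_labels(created: date, lifetime_days: int = DEFAULT_LIFETIME_DAYS) -> dict[str, str]:
    """Return the label to attach to a cluster created on `created`."""
    expires = created + timedelta(days=lifetime_days)
    # ISO dates contain only digits and dashes, which GKE label values accept.
    return {EXPIRY_LABEL: expires.isoformat()}


if __name__ == "__main__":
    # Default lifetime, then a user-supplied override.
    print(expiry_labels(date(2024, 1, 1)))
    print(expiry_labels(date(2024, 1, 1), lifetime_days=10))
```

Storing the absolute deletion date, rather than a creation date plus a duration, would let the cleanup task decide with a single comparison and no knowledge of the default.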
Additional information
Elsewhere we run cleanup jobs as part of CI in projects that use KTF. There isn't really a great CI location for this scheduled deletion, since this is intended for clusters created outside CI. I don't think it makes sense to run this in the KTF repo itself.
GCP provides Cloud Scheduler (https://cloud.google.com/scheduler), with 3 free jobs a month ($0.10/job/month after that). We should use it as a project-agnostic deletion method.
We may want a "never" option, but it's simpler to use only dates as label values, and we should probably discourage indefinite-lifetime clusters anyway. Setting the duration to 9999 or something similarly ridiculous, or manually removing the label, should be sufficient if you really, really want to avoid the cleanup job.
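To show how both opt-outs could fall out of date-only labels without a special "never" value, here is a minimal sketch of the selection step a cleanup job might run. It assumes a hypothetical label key `ktf-delete-after` holding a `YYYY-MM-DD` date, and it takes cluster labels as a plain dict rather than calling any GCP API; listing and deleting clusters is left out.

```python
from datetime import date

# Hypothetical label key; the proposal does not name one.
EXPIRY_LABEL = "ktf-delete-after"


def clusters_to_delete(clusters: dict[str, dict[str, str]], today: date) -> list[str]:
    """Return names of clusters whose expiry date is strictly before `today`.

    `clusters` maps a cluster name to its labels.
    """
    expired = []
    for name, labels in clusters.items():
        value = labels.get(EXPIRY_LABEL)
        if value is None:
            # Label removed by hand: the job leaves the cluster alone.
            continue
        if date.fromisoformat(value) < today:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    clusters = {
        "stale": {EXPIRY_LABEL: "2024-01-04"},
        "far-future": {EXPIRY_LABEL: "2051-05-01"},  # e.g. a very long duration
        "unlabeled": {},
    }
    print(clusters_to_delete(clusters, date(2024, 1, 10)))
```

Under these assumptions a very long duration simply yields a date the job will not reach for decades, and a missing label is skipped, so the cleanup logic needs no extra branch for indefinite clusters.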
Acceptance Criteria