
chore(slo): Refactor create, update and reset SLO operations #221206

Open. kdelemme wants to merge 9 commits into main.

@kdelemme kdelemme commented May 21, 2025

Summary

resolves #221205

This PR does three things:

  1. Refactors the create service by introducing private methods to improve readability, and removes several unnecessary await calls.
  2. Refactors the update service by parallelizing most operations (deleting the older resources and installing the new ones), which should reduce the overall latency of this API.
  3. Refactors the reset service by parallelizing most operations and by no longer waiting for the previous data to be deleted before installing the reset SLO. That deletion can take a long time and cause the API to time out. This is achieved by targeting the delete by query at the previous SLO revision, while the new SLO uses a bumped revision.
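The parallelization described in points 2 and 3 can be illustrated with a minimal sketch. The type and function names below are hypothetical and are not taken from the PR; the sketch only contrasts awaiting independent operations one by one with starting them together via `Promise.all`.

```typescript
// Illustrative only: these names are not taken from the PR.
type Operation = () => Promise<void>;

// Before: every step waits for the previous one, even when the steps are independent.
async function runSequentially(operations: Operation[]): Promise<void> {
  for (const operation of operations) {
    await operation();
  }
}

// After: independent steps are started together. The returned promise resolves once
// all of them have finished and rejects as soon as one of them fails.
async function runInParallel(operations: Operation[]): Promise<void> {
  await Promise.all(operations.map((operation) => operation()));
}
```

With the parallel variant, the critical path is roughly the duration of the slowest operation rather than the sum of all durations. This only holds for operations that do not depend on each other's results.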

Comparison

| API | Main | Branch |
| --- | --- | --- |
| reset (critical path) | [image: reset API critical path main] | [image: reset API critical path updated] |
| reset | [image: reset API main] | [image: reset API updated] |
| update (critical path) | [image: update API critical path main] | [image: update API critical path updated] |
| update | [image: update API main] | [image: update API updated] |

  • Requests are parallelized whenever possible.
  • Deleting the previous data on reset is no longer blocking, which is a significant gain when the SLO has been running for more than a few months.
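The non-blocking deletion on reset can be sketched as follows. This is an assumption-laden illustration, not the PR's code: `Slo`, `ResetDependencies`, `installResources`, `deleteDataForRevision` and `logError` are hypothetical names. The point is that the cleanup targets only the previous revision, so it can run in the background without touching the data written for the bumped revision.

```typescript
// All names below are hypothetical; the PR's actual code is not shown in this conversation.
interface Slo {
  id: string;
  revision: number;
}

interface ResetDependencies {
  // Installs the transforms and other resources for the given SLO revision.
  installResources(slo: Slo): Promise<void>;
  // Deletes the data of one specific revision, for example with a delete by query
  // filtered on the SLO id and the revision.
  deleteDataForRevision(sloId: string, revision: number): Promise<void>;
  logError(message: string): void;
}

async function resetSlo(slo: Slo, deps: ResetDependencies): Promise<Slo> {
  const previousRevision = slo.revision;
  const resetSloDefinition: Slo = { ...slo, revision: previousRevision + 1 };

  // Not awaited: the cleanup of the previous revision runs in the background.
  // Failures are logged instead of failing the reset request.
  void deps
    .deleteDataForRevision(slo.id, previousRevision)
    .catch((error: unknown) =>
      deps.logError(`Cleanup of revision ${previousRevision} failed: ${String(error)}`)
    );

  // Only the installation of the new revision is on the critical path.
  await deps.installResources(resetSloDefinition);
  return resetSloDefinition;
}
```

The trade-off in this sketch is that a failed cleanup no longer fails the request, so stale data from the previous revision could linger until it is removed by other means.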

@kdelemme kdelemme added the release_note:skip Skip the PR/issue when compiling release notes label May 21, 2025
@kdelemme kdelemme requested a review from a team as a code owner May 21, 2025 22:01
@kdelemme kdelemme added the Team:obs-ux-management (Observability Management User Experience Team), backport:version (Backport to applied version labels), v9.1.0 and v8.19.0 labels May 21, 2025
@elasticmachine

Pinging @elastic/obs-ux-management-team (Team:obs-ux-management)

@github-actions github-actions bot added the author:obs-ux-management PRs authored by the obs ux management team label May 21, 2025
Comment on lines -102 to -104
rollbackOperations.push(() => this.transformManager.stop(rollupTransformId));
rollbackOperations.push(() => this.summaryTransformManager.stop(summaryTransformId));

Stopping the transforms is no longer necessary, since uninstalling proceeds regardless of the transform state.
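For context, the removed lines register undo steps in a `rollbackOperations` list. How that list is consumed is not shown in this conversation, so the following is only a generic sketch of the pattern under stated assumptions: `runWithRollback` and the `Step` shape are hypothetical, and undoing in reverse order is an assumed convention.

```typescript
// Hypothetical sketch of a rollback list; not the PR's implementation.
type Rollback = () => Promise<void>;

interface Step {
  run: () => Promise<void>;
  undo: Rollback;
}

async function runWithRollback(steps: Step[]): Promise<void> {
  const rollbackOperations: Rollback[] = [];
  try {
    for (const step of steps) {
      await step.run();
      // Register the undo only after the step has succeeded.
      rollbackOperations.push(step.undo);
    }
  } catch (error) {
    // Undo the completed steps in reverse order. A failing undo is ignored
    // so that the remaining undo steps still run.
    for (const undo of rollbackOperations.reverse()) {
      try {
        await undo();
      } catch (rollbackError) {
        // Intentionally ignored in this sketch.
      }
    }
    throw error;
  }
}
```

If an uninstall step succeeds whether or not the transform is running, as the comment above states, then a separate stop step in the rollback list adds a round trip without adding safety.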

@shahzad31

@kdelemme can you please post the APM instrumented view for these routes, for a before/after comparison?

@kdelemme

I'll try to find the time later this week. It will be very similar to the changes made to the create service a while back.

@kdelemme kdelemme force-pushed the kde/slo/reset-bump-revision branch from 58aa6c8 to 92fc92d Compare May 26, 2025 13:10
@elasticmachine

elasticmachine commented May 26, 2025

⏳ Build in-progress, with failures

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #102 / Serverless Observability - Deployment-agnostic API integration tests SLO Reset SLOs resets the related resources
  • [job] [logs] FTR Configs #119 / Stateful Observability - Deployment-agnostic API integration tests SLO Reset SLOs resets the related resources

History

@kdelemme kdelemme self-assigned this May 28, 2025
Development

Successfully merging this pull request may close these issues.

[SLO] Improve reset and update operation duration
3 participants