Commit 1877e2f

Apply suggestions from code review
Co-authored-by: Mike Morris <[email protected]>
1 parent 39b4b0f commit 1877e2f

3 files changed: +9 -7 lines changed

geps/gep-1731/metadata.yaml

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ relationships:
 extends: {}
 extendedBy:
 - number: 3388
-name: HTTPRoute Retry Budget
+name: Retry Budgets
 # seeAlso indicates other GEPs that are relevant in some way without being
 # covered by an existing relationship.
 seeAlso:

geps/gep-3388/index.md

Lines changed: 7 additions & 5 deletions
@@ -1,4 +1,4 @@
-# GEP-3388: HTTP Retry Budget
+# GEP-3388: Retry Budgets

 * Issue: [#3388](https://github.com/kubernetes-sigs/gateway-api/issues/3388)
 * Status: Provisional
@@ -23,10 +23,9 @@ To allow configuration of a "retry budget" across all endpoints of a destination
 * To allow specifying inclusion of specific HTTP status codes and responses within the retry budget spec.
 * To allow specification of more than one retry budget for a given service, or for specific subsets of its traffic.

-
 ## Introduction

-Multiple data plane proxies offer optional configuration for budgeted retries, in order to create a dynamic limit on the amount of a service's active request that is being retried across its clients. In the case of Linkerd, retry budgets are the default retry policy configuration for HTTP retries within the [ServiceProfile CRD](https://linkerd.io/2.12/reference/service-profiles/), with static max retries being a [fairly recent addition](https://linkerd.io/2024/08/13/announcing-linkerd-2.16/).
+Multiple data plane proxies offer optional configuration for budgeted retries, in order to create a dynamic limit on the amount of a service's active request load that is comprised of retries from across its clients. In the case of Linkerd, retry budgets are the default retry policy configuration for HTTP retries within the [ServiceProfile CRD](https://linkerd.io/2.12/reference/service-profiles/), with static max retries being a [fairly recent addition](https://linkerd.io/2024/08/13/announcing-linkerd-2.16/).

 Configuring a limit for client retries is an important factor in building a resilient system, allowing requests to be successfully retried during periods of intermittent failure. But too many client-side retries can also exacerbate consistent failures and slow down recovery, quickly overwhelming a failing system and leading to cascading failures such as retry storms. Configuring a sane limit for max client-side retries is often challenging in complex systems. Allowing an application developer (Ana) to configure a dynamic "retry budget" reduces the risk of a high number of retries across clients. It allows a service to perform as expected in both times of high & low request load, as well as both during periods of intermittent & consistent failures.

@@ -76,7 +75,7 @@ Configuring a retry budget through a Policy Attachment may produce some confusio

 Discrepancies in the semantics of retry budget behavior and configuration options between Envoy and Linkerd may require a change in either implementation to accommodate the Gateway API specification. While Envoy's `min_retry_concurrency` setting may behave similarly in practice to Linkerd's `minRetriesPerSecond`, they are not directly equivalent.

-A version of Linkerd's `ttl` parameter may also need to be implemented within Envoy.
+The implementation of a version of Linkerd's `ttl` parameter within Envoy might be a path towards reconciling the behavior of these implementations, as it could allow Envoy to express a `budget_percent` and minimum number of permissible retries over a period of time rather than by tracking active and pending connections. It is not currently clear which of these models is preferable, but being able to specify a budget as requests over a window of time seems like it might offer more predictable behavior.

 ## API


@@ -102,11 +101,14 @@ TODO

 ## Other considerations

-TODO
+* Is it worth allowing the budget to be expressed as a `Fraction` similar to `HTTPRequestMirrorFilter` as described in [GEP-3171](https://gateway-api.sigs.k8s.io/geps/gep-3171/), or is a percentage sufficient for this use case? (Expressing a sub-1% budget for retries seems less necessary than for mirroring or redirecting traffic at significant scale.)
+* As there isn't anything inherently specific to HTTP requests in either known implementation, a retry budget policy on a target Service could likely be applicable to GRPCRoute as well as HTTPRoute requests.
+* While retry budgets are commonly associated with service mesh uses cases to handle many distributed clients, a retry budget policy may also be desirable for north/south implementations of Gateway API to prioritize new inbound requests and minimize tail latency during periods of service instability.

 ## References

 * <https://gateway-api.sigs.k8s.io/geps/gep-1731/>
+* <https://finagle.github.io/blog/2016/02/08/retry-budgets/>
 * <https://linkerd.io/2019/02/22/how-we-designed-retries-in-linkerd-2-2/>
 * <https://linkerd.io/2.11/tasks/configuring-retries/>
 * <https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/cluster/v3/circuit_breaker.proto#config-cluster-v3-circuitbreakers-thresholds-retrybudget>

geps/gep-3388/metadata.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 apiVersion: internal.gateway.networking.k8s.io/v1alpha1
 kind: GEPDetails
 number: 3388
-name: HTTP Retry Budget
+name: Retry Budgets
 status: Provisional
 # Any authors who contribute to the GEP in any way should be listed here using
 # their Github handle.
