|
| 1 | +# GEP-3793: Default Gateways |
| 2 | + |
| 3 | +* Issue: [#3793](https://github.com/kubernetes-sigs/gateway-api/issues/3793) |
| 4 | +* Status: Provisional |
| 5 | + |
| 6 | +(See [status definitions](../overview.md#gep-states).) |
| 7 | + |
| 8 | +## User Story |
| 9 | + |
| 10 | +**[Ana] wants a concept of a default Gateway.** |
| 11 | + |
| 12 | +Gateway API currently requires every north/south Route object to explicitly |
| 13 | +specify its parent Gateway. This is helpful in that it removes ambiguity, but |
| 14 | +it also leaves [Ana] constantly configuring something that she probably |
| 15 | +doesn't care much about: in a great many cases, Ana just wants to create a |
| 16 | +Route that "works from the outside world", and she really doesn't care what |
| 17 | +the Gateway is called. |
| 18 | + |
| 19 | +Therefore, Ana would like a way to be able to rely on a default Gateway that |
| 20 | +she doesn't have to explicitly name, and can simply trust to exist. |
| 21 | + |
| 22 | +[Ana]: https://gateway-api.sigs.k8s.io/concepts/roles-and-personas/#ana |
| 23 | + |
| 24 | +## Goals |
| 25 | + |
| 26 | +- Give Ana a way to use Gateway API without having to explicitly specify a |
| 27 | + Gateway for every Route, ideally without mutating Routes. |
| 28 | + |
| 29 | +- Give Ana an easy way to determine which Gateway is the default, and which of |
| 30 | + her Routes are bound to it. |
| 31 | + |
| 32 | +- Continue supporting multiple Gateways in a cluster, while allowing exactly |
| 33 | + one of them to be the default Gateway. |
| 34 | + |
| 35 | +- Allow [Chihiro] to retain control over which Gateway is the default, so that |
| 36 | + they can ensure that it meets their requirements for security, performance, |
| 37 | + and other operational concerns. |
| 38 | + |
| 39 | +- Allow Chihiro to choose not to provide a default Gateway. |
| 40 | + |
| 41 | +- Allow Chihiro to rename, reconfigure, or replace the default Gateway at |
| 42 | + runtime. |
| 43 | + |
| 44 | + - If Chihiro renames the default Gateway, Routes using the default Gateway |
| 45 | + MUST remain bound to the new default Gateway. Ana shouldn't need to go |
| 46 | + recreate all her Routes just because Chihiro is being indecisive. |
| 47 | + |
| 48 | + - Determine how (or if) to signal changes in functionality if the default |
| 49 | + Gateway implementation is changed. For example, suppose that Chihiro |
| 50 | + switches the default Gateway from an implementation that supports the |
| 51 | + `HTTPRoutePhaseOfTheMoon` filter to an implementation that does not. |
| 52 | + |
| 53 | + (Note that this problem is not unique to default Gateways; it affects |
| 54 | + explicitly-named Gateways as well.) |
| 55 | + |
| 56 | +- Allow Chihiro to control which Routes may bind to the default Gateway, and |
| 57 | + to enumerate which Routes are currently bound to the default Gateway. |
| 58 | + |
| 59 | +- Support easy interoperation with common CI/CD and GitOps workflows. |
| 60 | + |
| 61 | +- Define how (or if) listener and Gateway merging applies to a default |
| 62 | + Gateway. |
| 63 | + |
| 64 | +## Non-Goals |
| 65 | + |
| 66 | +- Support multiple "default" Gateways in a single cluster. If Ana has to make |
| 67 | + a choice about which Gateway she wants to use, she'll need to be explicit |
| 68 | + about that. |
| 69 | + |
| 70 | + Loosening this restriction later is a possibility. For example, we may later |
| 71 | + want to consider allowing a default Gateway per namespace, or a default |
| 72 | + Gateway per implementation running in a cluster. However, these examples are |
| 73 | + not in scope for this GEP, in order to have a fighting chance of getting |
| 74 | + functionality into Gateway API 1.4. |
| 75 | + |
| 76 | +- Allow Ana to override Chihiro's choice for the default Gateway for a given |
| 77 | + Route without explicitly specifying the Gateway. |
| 78 | + |
| 79 | +- Require that every possible routing use case be met by a Route using the |
| 80 | + default Gateway. There will be a great many situations that require Ana to |
| 81 | + explicitly choose a Gateway; the existence of a default Gateway is not a |
| 82 | + guarantee that it will be correct for any given use case. |
| 83 | + |
| 84 | +- Allow for "default Gateway" functionality without a Gateway controller |
| 85 | + installed. Just as with any other Gateway, a default Gateway requires an |
| 86 | + implementation to be installed. |
| 87 | + |
| 88 | +## Overview |
| 89 | + |
| 90 | +Gateway API currently requires every north/south Route object to explicitly |
| 91 | +specify its parent Gateway. This is a wonderful example of a fundamental |
| 92 | +tension in Gateway API: |
| 93 | + |
| 94 | +- [Chihiro] and [Ian] value _explicit definition_ of everything, because it |
| 95 | + makes it easier for them to reason about the system and ensure that it meets |
| 96 | + the standards they set for it. |
| 97 | + |
| 98 | +- [Ana], on the other hand, values _simplicity_ and _ease of use_, because |
| 99 | + she just wants to get her job done without having to think about every little |
| 100 | + detail. |
| 101 | + |
| 102 | +At present, Gateway API is heavily weighted towards the point of view of |
| 103 | +Chihiro and Ian. This causes friction for Ana: for example, she can't write |
| 104 | +examples or documentation for her colleagues (or her counterparts at other |
| 105 | +companies) without telling them that they'll need to be sure to edit the |
| 106 | +Gateway name in every Route. Nor can she write a Helm chart that includes a |
| 107 | +Route without requiring the person using the chart to know the specific name |
| 108 | +for the Gateway to use. |
| 109 | + |
| 110 | +The root cause of this friction is a difference in perspective: to Chihiro and |
| 111 | +Ian, the Gateway is a first-class thing that they think about regularly, while |
| 112 | +to Ana, it's an implementation detail that she doesn't care about. Neither |
| 113 | +point of view is wrong, but they are in tension with each other. |
| 114 | + |
| 115 | +### Prior Art |
| 116 | + |
| 117 | +This is very much not a new problem: there are many other systems out there |
| 118 | +where being unambiguous is crucial, but where being completely explicit is a |
| 119 | +burden. One of the simplest examples is the humble URL, where the port number |
| 120 | +is not always explicit, but it _is_ always unambiguous. Requiring everyone to |
| 121 | +type `:80` or `:443` at the end of the host portion of every URL wouldn't |
| 122 | +actually help anyone, though allowing it to be specified explicitly when |
| 123 | +needed definitely does help people. |
| 124 | + |
| 125 | +The Ingress resource, of course, is another example of prior art: it permitted |
| 126 | +specifying a default IngressClass, allowing users to create Ingress resources |
| 127 | +that didn't specify the IngressClass explicitly. As with a great many things |
| 128 | +in the Ingress API, this caused problems: |
| 129 | + |
| 130 | +1. Ingress never defined how conflicts between multiple Ingress resources |
| 131 | + should be handled. Many (most?) implementations merged conflicting |
| 132 | + resources, which is arguably the worst possible choice. |
| 133 | + |
| 134 | +2. Ingress also never defined a way to allow users to see which IngressClass |
| 135 | + was being used by a given Ingress resource, which made it difficult for |
| 136 | + users to understand what was going on if they were using the default |
| 137 | + IngressClass. |
| 138 | + |
| 139 | +(Oddly enough, Ingress' general lack of attention to separation of concerns |
| 140 | +wasn't really one of the problems here, since IngressClass was a separate |
| 141 | +resource.) |
| 142 | + |
| 143 | +It's rare to find systems that are completely explicit or completely implicit: |
| 144 | +in practice, the trick is to find a usable balance between explicitness and |
| 145 | +simplicity, while managing ambiguity. |
| 146 | + |
| 147 | +### Debugging and Visibility |
| 148 | + |
| 149 | +It's also worth noting that visibility is critical when debugging: if Ana |
| 150 | +can't tell which Gateway is being used by a given Route, then her ability to |
| 151 | +troubleshoot problems is _severely_ hampered. Of course, one of the major |
| 152 | +strengths of Gateway API is that it _does_ provide visibility into what's |
| 153 | +going on in the `status` stanzas of its resources: every Route already has a |
| 154 | +`status` showing exactly which Gateways it is bound to. Making certain that |
| 155 | +Ana has easy access to this information, and that it's clear enough for her to |
| 156 | +understand, is important for many more reasons than just default |
| 157 | +Gateways. |
| 158 | + |
| 159 | +[Chihiro]: https://gateway-api.sigs.k8s.io/concepts/roles-and-personas/#chihiro |
| 160 | +[Ian]: https://gateway-api.sigs.k8s.io/concepts/roles-and-personas/#ian |
| 161 | + |
| 162 | +## API |
| 163 | + |
| 164 | +Most of the API work for this GEP is TBD at this point. The challenge is to |
| 165 | +find a way to allow Ana to use Routes without requiring her to specify the |
| 166 | +Gateway explicitly, while still allowing Chihiro and Ian to retain control |
| 167 | +over the Gateway and its configuration. |
| 168 | + |
| 169 | +An additional concern is CD tools and GitOps workflows. In very broad terms, |
| 170 | +these tools function by applying manifests from a Git repository to a |
| 171 | +Kubernetes cluster, and then monitoring the cluster for changes. If a tool |
| 172 | +like Argo CD or Flux detects a change to a resource in the cluster, it will |
| 173 | +attempt to reconcile that change with the manifest in the Git repository -- |
| 174 | +which means that changes to the `spec` of an HTTPRoute that are made by code |
| 175 | +running in the cluster, rather than by a user with a Git commit, can |
| 176 | +potentially trip up these tools. |
| 177 | + |
| 178 | +These tools generally ignore strict additions: if a field in `spec` is not |
| 179 | +present in the manifest in Git, but is added by code running in the cluster, |
| 180 | +the tools know to ignore it. So, for example, if `spec.parentRefs` is not |
| 181 | +present at all in the manifest in Git, CD tools can probably tolerate having a |
| 182 | +Gateway controller write a new `parentRefs` stanza to the resource. |
| 183 | + |
| 184 | +There has been (much!) [discussion] about whether the ideal API for this |
| 185 | +feature will mutate the `parentRefs` of a Route using a default Gateway to |
| 186 | +reflect the Gateway chosen, or whether it should not, relying instead on the |
| 187 | +`status` stanza to carry this information. This is obviously a key point that |
| 188 | +will need resolution before this GEP can graduate. |
| 189 | + |
| 190 | +[discussion]: https://github.com/kubernetes-sigs/gateway-api/pull/3852#discussion_r2140117567 |
| 191 | + |
| 192 | +### Gateway for Ingress (North/South) |
| 193 | + |
| 194 | +### Gateway for Mesh (East/West) |
| 195 | + |
| 196 | +## Conformance Details |
| 197 | + |
| 198 | +### Feature Names |
| 199 | + |
| 200 | +The default-gateway features will be named `HTTPRouteDefaultGateway` and |
| 201 | +`GRPCRouteDefaultGateway`. It is unlikely that an implementation would support |
| 202 | +one of these Route types without the other, but `GatewayDefaultGateway` does |
| 203 | +not seem like a good name for a single combined feature. |
| 204 | + |
| 205 | +### Conformance tests |
| 206 | + |
| 207 | +## Alternatives |
| 208 | + |
| 209 | +A possible alternative API design is to modify the behavior of Listeners or |
| 210 | +ListenerSets; rather than having a "default Gateway", perhaps we would have |
| 211 | +"[default Listeners]". One challenge here is that the Route `status` doesn't |
| 212 | +currently expose information about which Listener is being used, though it |
| 213 | +does show which Gateway is being used. |
| 214 | + |
| 215 | +[default Listeners]: https://github.com/kubernetes-sigs/gateway-api/pull/3852#discussion_r2149056246 |
| 216 | + |
| 217 | +## References |