Adding GEP-3539: Gateway API to Expose Pods on Cluster-Internal IP Address (ClusterIP Gateway) #3608
Changes from all commits
6a061ca
e876ced
5176c6d
741292c
72865bd
8dab84a
@@ -0,0 +1,18 @@
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: example-cluster-ip-gateway
spec:
  addresses:
  - value: 10.12.0.15
  gatewayClassName: cluster-ip
  listeners:
  - name: example-service
    protocol: TCP
    port: 8080
    allowedRoutes:
      namespaces:
        from: Same
      kinds:
      - kind: TCPRoute
      - kind: CustomRoute
@@ -0,0 +1,6 @@
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: cluster-ip
spec:
  controllerName: "networking.k8s.io/cluster-ip-controller"
@@ -0,0 +1,250 @@
# GEP-3539: ClusterIP Gateway - Gateway API to Expose Pods on Cluster-Internal IP Address

Review comment: This might have started out as "ClusterIP Gateways" but at this point it's really more like "Service-equivalent functionality via Gateway API".

* Issue: [#3539](https://github.com/kubernetes-sigs/gateway-api/issues/3539)
* Status: Provisional

## TLDR

Gateway API enables advanced traffic routing and can be used to expose a
logical set of pods on a single IP address within a cluster. It can be seen
as the next-generation ClusterIP, providing more flexibility and composability
than the Service API. This comes at the cost of some additional configuration
and management burden.

## Goals

* Define how Gateway API can be used to provide ClusterIP Service-style behavior

  Review comment: Beyond the fact that it's not just ClusterIP, I think there are at least 3 use cases hiding in that sentence. The GEP talks about case 2 some, but it doesn't really explain why we'd want to do that (other than via the link to Tim's KubeCon lightning talk).

* Propose a DNS layout and record format for ClusterIP Gateways
* Extend the use of Gateway API to provide functionality equivalent to NodePort and
  LoadBalancer Services

## Non-Goals

* Make significant changes to Gateway API
* Provide a path for existing ClusterIP Services in a cluster to migrate to the
  Gateway API model

## API Changes

* EndpointSelector is recognized as a backend
* DNS record format for ClusterIP Gateways

## Introduction

Gateway API provides a generic and composable model for defining L4 and L7
routing in Kubernetes. Put simply, it describes how to get traffic into pods.
A ClusterIP Service provides similar functionality: an ingress point for routing traffic
into pods. As Gateway API has evolved, there have been discussions about whether
it can substitute for the increasingly complex and overloaded Service API. This
document describes what that could look like in practice, focusing on ClusterIP,
with brief commentary on how the design can be extended to accommodate
LoadBalancer and NodePort Services.

## Overview

Gateway API can be thought of as decomposing the Service API into separable components:
the Gateway resource defines the ClusterIP address and listener configuration, the
GatewayClass resource holds implementation specifics and common configuration, and
Route resources route traffic to backends.

### Limitations of Service API

Beyond the previously discussed concerns about Service API maintainability, evolvability,
and complexity (see https://www.youtube.com/watch?v=Oslwx3hj2Eg), we ran into
additional practical concerns that made the Service API insufficient for our needs.

Service IPs can only be assigned out of the ServiceCIDR range configured for the API server.
While Kubernetes 1.31 added a Beta feature (GA in 1.33) that allows Service IP ranges to be extended,
there are use cases where multi-NIC pods (pods with multiple network interfaces) require
the flexibility to specify different ServiceCIDR ranges for the ClusterIP Services
corresponding to each of the different networks. Strict traffic splitting and network
isolation requirements demand non-overlapping ServiceCIDR ranges for per-network ClusterIP
Service groups. Because Service definition and IP address allocation are tightly
coupled in the API server, it is not possible to achieve this model with the current Service API
without resorting to inelegant and kludgy implementations.
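
As a rough illustration of the per-network model described above, the sketch below shows two Gateways whose static addresses come from separate, non-overlapping ranges. The Gateway names, the `10.12.0.0/24` and `10.20.0.0/24` ranges, and the reuse of a single `cluster-ip` GatewayClass for both networks are assumptions made for this example, not part of the proposal.

```yaml
# Hypothetical: one ClusterIP-style Gateway per pod network.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: frontend-net-a        # hypothetical name
spec:
  gatewayClassName: cluster-ip
  addresses:
  - value: 10.12.0.15         # assumed to belong to network A (10.12.0.0/24)
  listeners:
  - name: app
    protocol: TCP
    port: 8080
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: frontend-net-b        # hypothetical name
spec:
  gatewayClassName: cluster-ip
  addresses:
  - value: 10.20.0.15         # assumed to belong to network B (10.20.0.0/24)
  listeners:
  - name: app
    protocol: TCP
    port: 8080
```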

Gateway API also satisfies, in a user-friendly way, the need for advanced
routing and load-balancing capabilities that enable canary rollouts, weighted traffic
distribution, and isolation of access and configuration.
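
For instance, a canary rollout with a 90/10 split could be sketched with the existing `weight` field on backendRefs. The route name and the Service names `app-stable` and `app-canary` are hypothetical; the parent is the `example-cluster-ip-gateway` Gateway from this proposal's examples. Each backend receives traffic in proportion to its weight relative to the sum of all weights in the rule.

```yaml
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TCPRoute
metadata:
  name: canary-route              # hypothetical name
spec:
  parentRefs:
  - name: example-cluster-ip-gateway
    sectionName: example-service  # listener name from the example Gateway
  rules:
  - backendRefs:
    - name: app-stable            # hypothetical Service
      port: 8080
      weight: 90
    - name: app-canary            # hypothetical Service
      port: 8080
      weight: 10
```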

### Service Model to Gateway API Model

![image](images/3539-service-api-gateway-api.png "image_tooltip")

### EndpointSelector as Backend

A Route can forward traffic to the endpoints selected via selector rules defined in EndpointSelector.

While Service is the default resource kind of the referent in backendRef, EndpointSelector is
suggested as an example of a custom resource that implementations could provide to attach pods (or
potentially other resource kinds) directly to a Route via backendRef.

```yaml
apiVersion: networking.gke.io/v1alpha1
kind: EndpointSelector
metadata:
  name: front-end-pods
spec:
  kind: Pod
  selector:
  - key: app
    value: frontend
    operator: In
```

Review comment: FWIW, I can imagine a path toward maybe making this a regular core feature. I am sure that it would be tricky but I don't think it's impossible. Eg. Define a Service with selector foo=bar. That triggers us to create a PodSelector for foo=bar. That triggers the endpoints controller(s) to do their thing. Same as we do with IP.

Review comment: Interesting thought. For starters at least, there seemed to be agreement on having a GEP for EndpointSelector as the next step.

Review comment: As always, Gateway proves something is a good idea, then core steals the spotlight.

Review comment: FWIW NetworkPolicies also contain selectors that need to be resolved to Pods, and we've occasionally talked about how nice it would be if the selector-to-pod mapping could be handled centrally, rather than every NP impl needing to implement that itself, often doing it redundantly on every node. I guess in theory, we could do that with EndpointSlice even, since kube-proxy will ignore EndpointSlices that don't have a label pointing back to a Service, so we could just have another set of EndpointSlices for NetworkPolicies... (EndpointSlice has a bunch of fields that are wrong for NetworkPolicy but most of them are optional and could just be left unset...) Though this also reminds me of my theory that EndpointSlice should have been a gRPC API rather than an object stored in etcd. The EndpointSlice controller can re-derive the entire (controller-generated) EndpointSlice state from Services and Pods at any time, and it needs to keep all that state in memory while it's running anyway. So it should just serve that information out to the controllers that need it (kube-proxy, gateways) in an efficient use-case-specific form (kind of like the original kpng idea) rather than writing it all out to etcd. (Alternate version: move

Review comment (on `name: front-end-pods`): probably want this to work the same way EndpointSlice does, where the

The EndpointSelector object is defined as follows. It allows the user to specify which endpoints
should be targeted for the Route.

```yaml
apiVersion: networking.gke.io/v1alpha1
kind: EndpointSelector
metadata:
  name: front-end-pods
spec:
  kind: Pod
  selector:
  - key: app
    value: frontend
    operator: In
```

Review comment: Would it make sense to have a config field so we can have implementation specific parameters?

Review comment: Yes, I think something like config would be needed. How the traffic will be distributed seems to be more of a Route level config than EndpointSelector level config. e.g. BackendRefs already have a weight field today. publishNotReadyAddresses could be one more thing that could go here

Review comment: The

Not sure. To me, this is a combination of both Route and Backend (Service, EndpointSelector...). The Route steers traffic to backends via some characteristics (L7 (HTTP...), L3/L4 (IPs, Ports, protocols)) and the backend (Service, EndpointSelector...) defines how to distribute it (Load-Balance it over a set of IPs).

Yes, to me

To allow more granular control over traffic routing, there have been discussions around adding
support for using Kubernetes resources besides Service (or external endpoints) directly as backendRefs.
Gateway API allows for this flexibility, so having a generic EndpointSelector resource supported as a
backendRef would be a good evolutionary step.

### User Journey

The infrastructure provider supplies a GatewayClass corresponding to the type of service-like behavior to
be supported.

Below is an example of a GatewayClass for ClusterIP support:
```yaml
{% include 'standard/clusterip-gateway/clusterip-gatewayclass.yaml' %}
```

The user then creates a Gateway to configure and enable the desired behavior:
```yaml
{% include 'standard/clusterip-gateway/clusterip-gateway.yaml' %}
```

By default, IP address(es) are assigned from a pool specified by a CIDR block, unless a static IP is
configured in the _addresses_ field as shown above. The CIDR block may be configured using a custom resource.
Subject to further discussion, it may make sense to have a GatewayCIDR resource available upstream to
specify an IP address range for Gateway IP allocation.
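
No GatewayCIDR resource exists today. Purely as a sketch of what such a resource might look like, the example below borrows the `spec.cidrs` shape of the existing ServiceCIDR resource. The API group, version, resource name, and every field shown are assumptions; the range is chosen only so that it contains the static address `10.12.0.15` used in the example Gateway.

```yaml
# Hypothetical resource: neither this kind nor this API group exists upstream.
apiVersion: networking.example.com/v1alpha1
kind: GatewayCIDR
metadata:
  name: cluster-ip-gateways   # hypothetical name
spec:
  cidrs:
  - 10.12.0.0/24              # assumed pool for Gateway IP allocation
```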

Finally, the Route and EndpointSelector resources must be created to set up the backend
pods for the configured ClusterIP.
```yaml
# apiVersion is omitted here because it depends on the Route kind chosen.
kind: [TCPRoute|CustomRoute]  # placeholder: pick one Route kind
metadata:
  name: service-route
spec:
  # Illustrative Route-level settings; not part of the TCPRoute spec today.
  config:
    sessionAffinity: false
  parentRefs:
  - name: example-cluster-ip-gateway
  rules:
  - backendRefs:
    - kind: EndpointSelector
      port: 8080
      name: exampleapp-app-pods
---
apiVersion: gateway.networking.k8s.io/v1alpha1
kind: EndpointSelector
metadata:
  name: exampleapp-app-pods
spec:
  selector:
  - key: app
    value: exampleapp
    operator: In
```

### Backends on Listeners

As seen above, Gateway API requires at least three custom resources to be defined, which introduces some complexity.
GEP-1713 proposes the addition of a ListenerSet resource to allow sets of listeners to attach to a Gateway.
As part of the discussions around this topic, the idea of adding backendRefs directly to listeners has come
up. Allowing backendRefs directly on listeners eliminates the need for Route objects in simple
cases. More complex traffic splitting and advanced load balancing cases can still use Route attachments via
allowedRoutes.
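
Gateway listeners have no backendRefs field today. If the idea above were adopted, a simple case might collapse into a single Gateway along these lines. The placement of `backendRefs` and its fields are hypothetical and are shown only to illustrate the shape of the idea; the resource names are reused from the examples in this proposal.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: example-cluster-ip-gateway
spec:
  gatewayClassName: cluster-ip
  listeners:
  - name: example-service
    protocol: TCP
    port: 8080
    # Hypothetical field: not part of the Gateway API listener spec today.
    backendRefs:
    - kind: EndpointSelector
      name: exampleapp-app-pods
      port: 8080
```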

### DNS

ClusterIP Gateways in the cluster need consistent DNS names so that a ClusterIP can be looked up by
name rather than by IP address. DNS A and/or AAAA records need to be created when Kubernetes publishes
information about Gateways, in a manner similar to ClusterIP Service creation. The DNS nameservers
in pods’ /etc/resolv.conf need to be programmed accordingly by kubelet.

The proposed record name format is:

```
<name of gateway>.<gateway-namespace>.gw.cluster.local
```

This results in the following search option entries in Pods’ /etc/resolv.conf:
```
search <ns>.gw.cluster.local gw.cluster.local cluster.local
```
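
For experimentation before any kubelet support exists, a similar search path can be approximated for a single pod through the existing `dnsConfig.searches` field, which Kubernetes merges into the search domains generated by the pod's DNS policy. The sketch below assumes a pod in the `default` namespace; the pod name and image are hypothetical, and it presumes a DNS provider that actually serves records under `gw.cluster.local`.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dns-demo                # hypothetical name
  namespace: default
spec:
  dnsPolicy: ClusterFirst
  dnsConfig:
    searches:
    - default.gw.cluster.local  # <ns>.gw.cluster.local for this pod's namespace
    - gw.cluster.local
  containers:
  - name: app
    image: registry.example.com/app:latest   # hypothetical image
```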

### Cross-namespace References

Gateway API allows Routes in different namespaces to attach to a Gateway.

When modeling ClusterIP service networking, the simplest recommendation might be to keep the Gateway and its Routes
within the same namespace. While cross-namespace routing would work and allow for richer functionality,
it may make certain cases tricky to support. One specific example is pod DNS resolution
of the following format:

```
pod-ipv4-address.gateway-name.my-namespace.gw.cluster-domain.example
```

If the Gateway and Routes (and hence the backing pods) are in different namespaces, it becomes ambiguous
whether and how to support this pod DNS resolution format.

## LoadBalancer and NodePort Services

Extending the concept to LoadBalancer and NodePort Services follows a similar pattern. The idea
is to have a GatewayClass corresponding to each type of service networking behavior that needs to be modeled
and supported.
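
Following the pattern of the `cluster-ip` GatewayClass, the other two behaviors could each get their own class. The class names `node-port` and `load-balancer` follow the naming used in this section; the `controllerName` values are hypothetical placeholders patterned on the ClusterIP example and do not refer to existing controllers.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: node-port
spec:
  controllerName: "networking.k8s.io/node-port-controller"      # hypothetical
---
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: load-balancer
spec:
  controllerName: "networking.k8s.io/load-balancer-controller"  # hypothetical
```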

![image](images/3539-service-api-gateway-api-1.png "image_tooltip")

Note that Gateway API allows flexibility and a clear separation of concerns, so one does not need to
configure cluster-ip and node-port when configuring a load-balancer.

For completeness, however, the case shown below demonstrates how load balancer functionality analogous to
the LoadBalancer Service API can be achieved using Gateway API.

Review comment: the example from the image uses load-balancer as the class. The cloud providers usually have a few variants of LBs and preferably these would have their own classes but these would be somewhat unique since the cloud provider controller would act on it but also kubeproxy, cilium and others would need to do some setup on their side. Maybe we could set something on the GatewayClass that would indicate it's a L4 LB class so that the node networking part has to treat it as a LB?

Review comment: Linking a similar discussion: #3608 (comment)

![image](images/3539-service-api-gateway-api-2.png "image_tooltip")

## Additional Service API Features

Services natively provide additional features, some of which are listed below (the list is not exhaustive).
Gateway API can be extended to provide some of these features natively, while others may be left up to
implementations.

| Feature | Service API options | Gateway API possibilities |
|---|---|---|
| sessionAffinity | ClientIP <br /> None | Route level |
| allocateLoadBalancerNodePorts | True <br /> False | Not supported? (No need for LB Gateway type to also create NodePort) |
| externalIPs | List of externalIPs for service | Not supported? |
| externalTrafficPolicy | Local <br /> Cluster | Supported for LB Gateways only, Route level |
| internalTrafficPolicy | Local <br /> Cluster | Supported for ClusterIP Gateways only, Route level |
| ipFamily | IPv4 <br /> IPv6 | Route level |
| publishNotReadyAddresses | True <br /> False | Route or EndpointSelector level |
| ClusterIP (headless service) | IPAddress <br /> None | GatewayClass definition for Headless Service type |
| externalName | External name reference <br /> (e.g. DNS CNAME) | GatewayClass definition for ExternalName Service type |

Review comment: Not mentioned here:

## References

* [Original Doc](https://docs.google.com/document/d/1N-C-dBHfyfwkKufknwKTDLAw4AP2BnJlnmx0dB-cC4U/edit)

## Open Questions and Followups

1) How do we take this forward?
    * "Gateway as new-and-improved Service" -- A parallel, orthogonally-extensible API for cleaner and more advanced Service functionality
    * "Gateway as backend for Service" -- An underlying implementation for Service functionality, allowing the simpler UX provided by the Service API to remain unchanged for end users while allowing advanced users to work with Gateway API resources directly
2) Decouple DNS from this topic -- Headless, externalName, and other DNS functionality may be discussed in the context of a separate API/Object that represents DNS concepts
3) Should there be GatewayClasses with official, reserved names (e.g. clusterip) so that all agents, CNIs, and providers know to provide a standard implementation for them?
4) Standardization of an EndpointSelector type of resource
5) Discuss topology-aware routing as a generic feature. Features like internal/externalTrafficPolicy can then be appropriately adapted and provided as part of topology-aware routing
@@ -0,0 +1,7 @@
apiVersion: internal.gateway.networking.k8s.io/v1alpha1
kind: GEPDetails
number: 3539
name: ClusterIP Gateway - Gateway API to Expose Pods on Cluster-Internal IP Address
status: Provisional
authors:
  - ptrivedi