Skip to content

GEP 91: Update API #3881

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

arkodg
Copy link
Contributor

@arkodg arkodg commented Jun 25, 2025

Relates to #3760 (comment)

What type of PR is this?

Does this PR introduce a user-facing change?:

NONE

Relates to kubernetes-sigs#3760 (comment)

Signed-off-by: Arko Dasgupta <[email protected]>
@k8s-ci-robot k8s-ci-robot added the release-note-none Denotes a PR that doesn't merit a release note. label Jun 25, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: arkodg
Once this PR has been reviewed and has the lgtm label, please assign thockin for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/gep PRs related to Gateway Enhancement Proposal(GEP) size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 25, 2025
@arkodg arkodg marked this pull request as draft June 25, 2025 03:56
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 25, 2025
@@ -570,6 +577,8 @@ type GatewayTLSConfig struct {
// that requests a user to specify the client certificate.
// The maximum depth of a certificate chain accepted in verification is Implementation specific.
//
// Setting any field will override the equivalent field setting that was applied at the Gateway level.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a couple of comments:

  1. Regarding "Setting any fields": The current phrasing implies merging the Frontend Validation struct from both gateway and listener levels. Given that this struct contains fields whose values depend on each other, I propose allowing overrides at the struct level. Therefore, I would change the line to "Setting this field."
  2. Regarding overriding Frontend validation config on the listener level: This does not resolve the HTTP Coalescing issue. To address this, we should restrict listener overrides. Two potential approaches are:
    a) Restrict the use of the same TLS config for listeners sharing the same port.
    b) Allow overrides only for listeners without a hostname set. We can extend this solution with a mechanism to inherit the config for all listeners with the same port from the parent (which is a listener without hostname set). This would provide clear signals to customers about the coalescing issue.

Let me know what you think about this proposal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for 1. there's a case where you may want to configure the CACert at the Gateway level, but update the mode at the Listener level, which is why i phrased it this way.

  1. https://gateway-api.sigs.k8s.io/geps/gep-3567/ takes a more lenient approach and proposes adding a OverlappingTLSConfig condition, imo we should do the same here, either piggyback off that condition or highlight via a new condition

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@youngnick ccing you here for 1. remember you bringing up this use case for per field overrides during the meeting in kubecon

@robscott ccing you here for 2. and to get some guidance on whether we want to use the same status condition or add an extra one

Copy link
Contributor Author

@arkodg arkodg Jul 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rethinking for this case, this should affect the overlapping listeners and set Listener Accepted condition to False, with a ClientCertificateVerificationBypassed reason or something similar, wdyt ?

@@ -636,10 +662,47 @@ type FrontendTLSValidation struct {
// "RefNotPermitted" reason.
//
// +kubebuilder:validation:MaxItems=8
// +kubebuilder:validation:MinItems=1
// +kubebuilder:validation:MinItems=0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate why this change is needed? What is the use case not to pass any certificates?
From our customers we can see that there is a need to pass one or more certificates and to differentiate between trust anchors and allowlisted certificates. Since Allowlisted certificates are not technically CA certificates we can either rename existing field or introduce a new field for allowlisted certificates then sending an empty list of CAcertificates makes sense to me.
Also we should update the “Support: Core” line to reflect those changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is for the case when the CA cert is applied at the gateway level, and a mode is applied at the listener level

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

relates to #3881 (comment)

@@ -27,7 +27,9 @@ This use case has been highlighted in the [TLS Configuration GEP][] under segmen
* This new field is separate from the existing [BackendTLSPolicy][] configuration. [BackendTLSPolicy][] controls TLS certificate validation for connections *from* the
Gateway to the backend service. This proposal adds the ability to validate the TLS certificate presented by the *client* connecting to the Gateway (the
frontend). These two validation mechanisms operate independently and can be used simultaneously.
* Also introduce a `ObjectReference` structure that can be used to specify `caCertificateRefs` references.
* Introduce a `ObjectReference` structure that can be used to specify `caCertificateRefs` references.
* Add `frontendValidation` to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not finished.

// GatewayTLSConfig describes a TLS configuration that can be applied to a Gateway.
type GatewayTLSConfig struct {
// FrontendValidation holds configuration information for validating the frontend (client).
// Setting this field will require clients to send a client certificate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is inconsistent with struct changes. Please see my comment about certificates arrey.

* Also introduce a `ObjectReference` structure that can be used to specify `caCertificateRefs` references.
* Introduce a `ObjectReference` structure that can be used to specify `caCertificateRefs` references.
* Add `frontendValidation` to
* Introduce a `tls` field within the Gateway Spec to allow for a common TLS configuration to apply across all listeners.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please include information that “the newly introduced struct currently contains only client certificate validation configuration, but may be extended in the future."

// that requests a user to specify the client certificate.
// The maximum depth of a certificate chain accepted in verification is Implementation specific.
//
// Each field may be overidden by an equivalent setting applied at the Listener level.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get why we feel the need to have a Gateway level setting here.

Why not allow every setting in Listener at the top level? FrontendValidation doesn't seem unique at all. If anything, it seems less likely to be common across listeners.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see its for Add a top level field for frontendValidation to avoid the auth bypass that may occur with HTTP/2 connection coalescing highlighted in https://gateway-api.sigs.k8s.io/geps/gep-3567/#interaction-with-client-cert-validation'. Given you can override it at the listener level, we didn't actually solve this problem...

IMO this is better solved with status or external validation. We can ship a sample Validation Admission Policy to deny inconsistent config between listeners, for example

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, Moving the frontend validation configuration to gateway was intended to resolve the HTTP coalescing issue. However, this may not be necessary if we can provide users with clear guidelines for configuring TLS settings for listeners.
What do you think about allowing overrides only for listeners without a hostname set? This would enable all listeners on the same port (regardless of hostname) to inherit the configuration, thus resolving the coalescing issue while keeping TLS settings at the listener level.

Signed-off-by: Arko Dasgupta <[email protected]>
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 3, 2025
@arkodg arkodg marked this pull request as ready for review July 3, 2025 18:56
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 3, 2025
@k8s-ci-robot k8s-ci-robot requested review from mlavacca and robscott July 3, 2025 18:56
@arkodg arkodg requested a review from kl52752 July 3, 2025 18:56
@k8s-ci-robot
Copy link
Contributor

@arkodg: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-gateway-api-crds-validation-3 64b4751 link true /test pull-gateway-api-crds-validation-3
pull-gateway-api-test 64b4751 link true /test pull-gateway-api-test
pull-gateway-api-crds-validation-2 64b4751 link true /test pull-gateway-api-crds-validation-2
pull-gateway-api-crds-validation-5 64b4751 link true /test pull-gateway-api-crds-validation-5
pull-gateway-api-crds-validation-1 64b4751 link true /test pull-gateway-api-crds-validation-1
pull-gateway-api-crds-validation-4 64b4751 link true /test pull-gateway-api-crds-validation-4
pull-gateway-api-verify 64b4751 link true /test pull-gateway-api-verify

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

// presented during the handshake and validated
// using CA certificates defined in CACertificateRefs.
//
// Defaults to RequireAndVerify.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In previous comments we discussed 3 options for cert validation:

  • AllowInvalid
  • AllowMissing
  • RequireValid
    In the current proposal the validation settings are too granular, looking at the fact that they are in Core support.
    IMO we could merge Missing and Invalid into 1 option because in both cases request validation is moved to backend.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this eliminates VerifyIfGiven

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, because I don't understand what would be the use case for all 4 ceses here.
Why do you want a case where the server doesn't require a certificate but if the client provides the certificate then it needs to be valid?
From our customers we can see that they need 2 usecases:
Require and validate the certificate
Allow for invalid and missing certificates
There are only two use cases because, in the second case, whether the certificate is missing or invalid is irrelevant; this scenario necessitates an additional validation at the backend level. Can you provide more context on Use Cases for all listed values?

@arkodg
Copy link
Contributor Author

arkodg commented Jul 15, 2025

hey @candita @shaneutt here's a summary of the discussion from today's meeting

I'm referring to the proposal in this PR as Proposal 1 and @kl52752's proposal as Proposal 2

TLDR

Aspect Proposal 1 Proposal 2
Summary Adds FrontendValidation at Gateway and Listener levels. Adds FrontendValidation at Gateway level with an additional optional port field to specify per port configuration.
Security Implementation should correctly detect listeners with overlapping TLS Config as recommended by GEP-3567 and reject overlapping listeners. Connection reuse issues are fully addressed.
UX - Per Listener Configuration is consistent with current Server TLS configuration.
- Additional top level Gateway configuration that's common for all listeners.
- Also supports Listener-level field overrides, useful in cases where mode is different across listeners but the CA trust bundle is the same.
TLS configuration might be confusing to customers because it is detached from the Listeners. However, for HTTPS, TLS configuration is tied to the connection, and the connection can extend beyond the Listener's boundaries which is reflected with the proposal.
Impact on other listeners Validating new listeners against existing ones to reject listeners with overlapping TLS config could disrupt the current system for customers if changes are not tested in a pre-prod environment. Adding new listeners will not break existing setup. Changes in the TLS configuration might break current listeners.
Hostname configuration Permits configuration per hostname which some implementations support (highlighted in the Prior Art section). Does not allow per hostname validation.

cc @robscott @youngnick

//
// +optional
// +kubebuilder:default=RequireAndVerify
Mode *FrontendValidationModeType `json:"mode,omitempty"`
Copy link
Contributor

@candita candita Jul 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mode is a bad name for this concept. Can you make it more specific? ValidationMode or ValidationPrecision?

Copy link
Contributor Author

@arkodg arkodg Jul 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tls.frontendValidation.validationMode repeats validation twice

@shaneutt shaneutt dismissed their stale review July 16, 2025 17:58

unblock

@shaneutt shaneutt self-requested a review July 16, 2025 17:58
Copy link
Member

@shaneutt shaneutt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It occurs to me that multiple in-use versions of Gateway API will be affected for some time, and so there may be a "long tail" kind of situation here with implementations lagging behind for potentially a year or more?

As such, should we consider expanding our documentation to better describe this problem, and better warn implementations as an immediate first step prior to the proposal and changes?

@robscott
Copy link
Member

Thanks to @arkodg and @kl52752 for the great summary of the options we have here!

IMO, this is one of the highest risk areas of Gateway API. If we're not very careful, implementations will expose vulnerabilities when trying to implement this part of the API. With that in mind, my strong preference is for "Proposal 2" as described above.

Essentially, the only way to safely implement proposal 1 is to require implementations to inspect each cert, keep track of the SANs, and ensure that client cert validation is always consistent across listeners that have certs with overlapping SANs.

I'd strongly prefer avoiding requiring implementations to inspect individual certs like this, and I'm very nervous about the wide number of ways an implementation could get proposal 1 wrong and lead to some painful CVEs.

Although proposal 2 offers a less powerful API, it is far safer, and does not prevent us from adding further granularity in the future if needed. So my vote here would be strongly in favor of proposal 2.

Open to what others think here.

/cc @howardjohn @mlavacca @mikemorris @shaneutt

@youngnick
Copy link
Contributor

I also vote for proposal 2 here, and am happy to LGTM something that adds that (I suspect this will all happen overnight while I am asleep).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/gep PRs related to Gateway Enhancement Proposal(GEP) release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants