Skip to content

Enhancements to the BackendTLSPolicy GEP #3835

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

howardjohn
Copy link
Contributor

What type of PR is this?
/kind gep

What this PR does / why we need it:

This PR makes the minimum viable tweaks to the BackendTLSPolicy GEP and Go types to support the use cases defined in #3554.

The primary limitation with the API is that TLS origination is a client side decision that is being delegated entirely to the service producer.
This breaks a variety of important use cases, such as:

  • A Service being accessed in different ways, some of which require TLS while others do not.
  • Disagreement between service producer and consumer on how to behave (e.g. producer says "insecure skip verify" while the consumer doesn't want to use this insecure method).
  • Using TLS when the service producer is unable or unwilling to create a BackendTLSPolicy.

However, I fully acknowledge and agree that in many cases its good enough, and far simpler, for the service producer to tell clients how to connect to it.

To remedy this, I propose the following:

  • Keep the API exactly the same for the most part.
  • Allow a route-level override (i.e. owned by the consumer) that takes precedence over a Service-attached policy.
  • Add a new 'Mode' into the API spec, primarily to allow a route override to disable TLS entirely when a Service policy enables it.

This change is entirely forward-compatible from a raw CRD schema perspective.
However, the semantics of the API are changed.
A policy attached to a Service no longer strictly applies to to HTTPRoute and instead is used in any context the service is called from a conformant Gateway API implementation (mesh or otherwise).

In recognition of the tight timelines of this I am more than happy to discuss this F2F with anyone that is interested (in addition to the weekly call), and will be able to fully commit to writing the conformance tests needed for the tweaks needed here, as well as the current gaps in the testing that are currently unstaffed for the GA.

Which issue(s) this PR fixes:

Fixes #3516

Does this PR introduce a user-facing change?:

TODO

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/gep PRs related to Gateway Enhancement Proposal(GEP) cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 5, 2025
@k8s-ci-robot k8s-ci-robot requested review from candita and mlavacca June 5, 2025 02:59
@shaneutt shaneutt moved this to Review in Release v1.4.0 Jun 5, 2025
@shaneutt shaneutt added this to the v1.4.0 milestone Jun 5, 2025
@shaneutt shaneutt requested a review from robscott June 5, 2025 13:00
@shaneutt shaneutt self-assigned this Jun 5, 2025
@shaneutt shaneutt self-requested a review June 5, 2025 13:00
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 18, 2025
@@ -5,61 +5,53 @@

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@candita as a primary stakeholder here: is it your interpretation that the semantic changes here are ultimately compatible with the implementations who have already adopted the current API?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shaneutt no that's not my interpretation. Furthermore, the language changes are not something that can be inserted or reviewed in a hurry. I recommend changing the GEP as little as possible instead of overhauling it, so that it can be reviewed more easily.

@howardjohn howardjohn force-pushed the btls/new-attachment branch from 4686992 to 266e22a Compare June 18, 2025 20:47
@howardjohn
Copy link
Contributor Author

I've addressed all the comments (that had a clear resolution) and reverted all the Go changes which can be in a followup. Thanks again for all the reviews.

@youngnick
Copy link
Contributor

Okay, I've spoken to a bunch of folks, and here's the update:

Firstly, this one is not over deadline to be merged, as it's not provisional. So we have a bit of time to get the right thing happening.

The plan from here is to make the changes to this existing GEP minimal, handling the producer-default case (which is another way to describe the original case that includes re-encrypt, and only covers the BackendTLSPolicy in the same namespace as its target).

Then, we, as a project, can open a new GEP that covers extending BackendTLSPolicy to handle the consumer-override, configure-the-Gateway-client-settings-it-will-accept mode. This mode covers the BackendTLSPolicy being in the same namespace as the Gateway it configures as a client.

We then have the possibility, if we can get the conformance tests in, to move the first use case forward to Standard, and add the second case to Experimental either in this version or v1.5 (I think this version would be better).

So, we have a plan for the next week or two, which should enable US folks to enjoy their long weekend, and we can reconvene about this next week.

@howardjohn howardjohn force-pushed the btls/new-attachment branch from 266e22a to 8162a90 Compare June 20, 2025 14:49
@howardjohn
Copy link
Contributor Author

Per guidance from @youngnick and others I have sent 2 new commits: 1 reverts everything I did before, and the last one adds back the 'minimum viable changes'. I kept the git history for the rest in case it is useful.

Copy link
Member

@robscott robscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @howardjohn! I think this is a good improvement. Will leave more detailed review for the follow up that's been moved now into #3876.

6. Controlling certificates used by more than one workload (#6 in [Gateway API TLS Use Cases](#references))
7. Client certificate settings used in TLS **from external clients to the
Listener** (#7 in [Gateway API TLS Use Cases](#references))
8. Providing a mechanism for the cluster operator to override gateway to backend TLS settings.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this has now effectively been split off into a separate GEP (#3876)

This includes the intent to originate TLS on the connection to the backend, as well as various properties of the TLS connection:
* If using mutual TLS, the client certificate to use during the TLS handshake
* How to verify the peer certificate (trusted certificates, trusted subject names, etc).
* Server Name indication

## Purpose - why do we want to do this?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that the majority of changes are in #3876, I think this can be resolved.

3. The solution should consider client certificate settings used in the TLS handshake **from
Gateway to backend**, such as server name indication, trusted certificates,
and CA certificates.
2. Both the application developer persona, and gateway operator persona will have control over TLS settings.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and gateway operator persona

IIUC this moved to a separate GEP. please correct me if misunderstood. so should we move this to long term goal?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will mention this is a future goal thanks!

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: howardjohn, karthikbox, LiorLieberman, robscott, shaneutt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/gep PRs related to Gateway Enhancement Proposal(GEP) release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Status: Review
Development

Successfully merging this pull request may close these issues.

BackendTLSPolicy's Service attachment is problematic Standardizing Behavior for Invalid BackendTLSPolicy