Skip to content

Conversation

@Inode1
Copy link
Contributor

@Inode1 Inode1 commented Jun 13, 2025

What type of PR is this?

What this PR does / why we need it:

Add support for determining route stat names in BackendTrafficPolicy

Supported: xRoute
Supported Formatters: %ROUTE_KIND%, %ROUTE_NAME%, %ROUTE_NAMESPACE%, %ROUTE_RULE_NAME%, %ROUTE_RULE_NUMBER%

Which issue(s) this PR fixes:

Fixes #6190

Release Notes: Yes

@Inode1 Inode1 requested a review from a team as a code owner June 13, 2025 22:09
@Inode1 Inode1 force-pushed the feat-per-route-metrics branch from 559f768 to 1484995 Compare June 14, 2025 19:41
@Inode1 Inode1 marked this pull request as draft June 14, 2025 19:43
@Inode1 Inode1 force-pushed the feat-per-route-metrics branch 3 times, most recently from 828420c to 04cd1df Compare June 14, 2025 20:28
@codecov
Copy link

codecov bot commented Jun 14, 2025

Codecov Report

❌ Patch coverage is 88.88889% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.15%. Comparing base (a9fa022) to head (1e6d51f).
⚠️ Report is 6 commits behind head on main.

Files with missing lines Patch % Lines
internal/gatewayapi/backendtrafficpolicy.go 83.33% 3 Missing and 2 partials ⚠️
api/v1alpha1/validation/envoyproxy_validate.go 90.90% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6310      +/-   ##
==========================================
- Coverage   72.31%   71.15%   -1.16%     
==========================================
  Files         231      274      +43     
  Lines       34084    34857     +773     
==========================================
+ Hits        24647    24802     +155     
- Misses       7664     8260     +596     
- Partials     1773     1795      +22     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Inode1
Copy link
Contributor Author

Inode1 commented Jun 14, 2025

/retest

@Inode1 Inode1 marked this pull request as ready for review June 14, 2025 21:39
@arkodg
Copy link
Contributor

arkodg commented Jun 16, 2025

seeing the API being added to BTP, is the use case to enable this only on some routes or all ? If all is the most common use case, , then lets start with a field in EnvoyProxy and later one in BTP

@Inode1
Copy link
Contributor Author

Inode1 commented Jun 17, 2025

seeing the API being added to BTP, is the use case to enable this only on some routes or all ? If all is the most common use case, , then lets start with a field in EnvoyProxy and later one in BTP

Hi, I thought we agreed to add stats only in BTP), i.e., only for a specific route using the template. I don't quite understand how the settings of EnvoyProxy and BTP will work together. It seems that in the discussion it was decided that this should be opt-in, as enabling it for all routes through EnvoyProxy could lead to a large number of metrics, which might affect the performance of Envoy (and it's unclear how to disable the option at the EnvoyProxy level for a specific route). Additionally, with BTP, you can specify more specific things, such as the name of the route (routeStatPrefix: name=/route), as well as more concrete business-related aspects, like tenant, for example.

@arkodg
Copy link
Contributor

arkodg commented Jun 17, 2025

okay, lets start with BTP first

// %ROUTE_RULE_NAME%: name of the Gateway API xRoute section
// %ROUTE_RULE_NUMBER%: sequence number of the Gateway API xRoute resource
// Example: %ROUTE_KIND%/%ROUTE_NAMESPACE%/%ROUTE_NAME%/rule/%ROUTE_RULE_NUMBER% => httproute/my-ns/my-route/rule/0
//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//
// Disabled by default.
//

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apply

type BackendMetrics struct {
// RouteStatName defines the value of the Route stat_prefix, determining how the route stats are named.
// For more details, see envoy docs: https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/route/v3/route_components.proto#config-route-v3-route
// The supported operators for this pattern are:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

routes can be attached to multiple gateways/parentRefs which will need to be represented here in case the user doesnt want to aggregate them into 1, and prefers per parent per route metrics
https://gateway-api.sigs.k8s.io/reference/spec/#httproutespec

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand your point, I thought this function was supposed to set the stats for the route in the gateways.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/envoyproxy/gateway/blob/main/internal/gatewayapi/testdata/httproute-with-multiple-gateways-from-different-ns.out.yaml should help understand how 1 route can attach to multiple gateways
and this introduces two uses cases for metrics granularity that is mentioned in #6310 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to check and validate this somehow?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/envoyproxy/gateway/blob/main/internal/gatewayapi/testdata/httproute-with-multiple-gateways-from-different-ns.out.yaml should help understand how 1 route can attach to multiple gateways and this introduces two uses cases for metrics granularity that is mentioned in #6310 (comment)

It seems I understand what this is about. In the function https://github.com/envoyproxy/gateway/blob/26f9b20a7d5f3060f841787e677c0ed127adc759/internal/gatewayapi/backendtrafficpolicy.go#L664, there is processing for the Gateway, but here we have the IR, and we need to somehow map it to a resource to get the template parameters. We could do this via "resources", but that would be expensive — we'd have to go through all resources for every route, which is O(n²). Maybe we should reconsider the metadata field for all routes?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should have access to it indirectly via the IR listener name

func irListenerName(listener *ListenerContext) string {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thinking out loud, it may help to start off with an API within EnvoyProxy which is easier to build out its not tied to policy translation

// Example: %ROUTE_KIND%/%ROUTE_NAMESPACE%/%ROUTE_NAME%/rule/%ROUTE_RULE_NUMBER% => httproute/my-ns/my-route/rule/0
//
// +optional
RouteStatName string `json:"routeStatName,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

must be a ptr since its optional

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, but can you explain why it's like that? controllergen automatically makes string fields with omitempty optional. I understand that we might want to distinguish between a field being unset and being set to an empty "" in order to apply some default, but that's not our case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to generate additional stats if it's unset, so making it a ptr makes it opt in

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah "", may not always means empty, sometimes it could be a valid value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

@@ -188,6 +188,10 @@ xdsIR:
name: tcproute/default/tcproute-1/rule/-1/backend/0
protocol: TCP
weight: 1
metadata:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its expensive to add this from a mem usage perspective, why this this needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I explained below why the metadata is needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we do it conditionally only if the feature is enabled? Maybe that also related to Arko's other comment about EnvoyProxy level feature as opposed to BTP.

// settings of upstream connection
BackendConnection *BackendConnection `json:"backendConnection,omitempty" yaml:"backendConnection,omitempty"`
// DNS is used to configure how DNS resolution is handled for the route
DNS *DNS `json:"dns,omitempty" yaml:"dns,omitempty"`
// Metrics defines the schema for metric stat configuration.
Metrics *egv1a1.BackendMetrics `json:"metrics,omitempty" yaml:"metrics,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my suggestion would be to convert BackendMetrics into RouteStat name in the gateway api layer similar to what ClusterStatName is doing, add add that as a sub field here

Copy link
Contributor Author

@Inode1 Inode1 Jun 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point, but Routes are processed before BTP is handled — see this line. So at the Route processing stage, I can't create RouteStats, and instead I store all the necessary route information in metadata. I could generate RouteStats in this place, but I still need additional route-related data like SectionName and RouteIndex, which are only available during Route processing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recommend using a StatName field in HTTPRoute to address this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recommend using a StatName field in HTTPRoute to address this

Sorry, I didn't understand you. Should StatName already contain the fully parsed and built stat_prefix field, or will it contain information about the route (SectionName, RuleIndex)? If it's the former, then we can do it that way, but we'll have to drop SectionName and RuleIndex—in that case, the metadata won't be needed either. Are you okay with that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should have access to RULE_NAME and RULE_NUMBER during route processing
but yes, any values coming from the policy cannot be used (today, we can do better in the future with some post processing if really needed)

@Inode1 Inode1 force-pushed the feat-per-route-metrics branch 2 times, most recently from 44d329f to 26f9b20 Compare June 19, 2025 06:55
@Inode1 Inode1 marked this pull request as draft June 19, 2025 08:48
@Inode1 Inode1 force-pushed the feat-per-route-metrics branch 2 times, most recently from 733185c to 47a2d99 Compare June 22, 2025 19:57
@Inode1 Inode1 marked this pull request as ready for review June 22, 2025 20:19
@Inode1 Inode1 requested a review from arkodg June 24, 2025 12:45
@Inode1
Copy link
Contributor Author

Inode1 commented Jun 30, 2025

/retest

@Inode1
Copy link
Contributor Author

Inode1 commented Jun 30, 2025

@arkodg PTAL

@arkodg arkodg requested a review from guydc June 30, 2025 20:32
@arkodg arkodg added this to the v1.5.0-rc.1 Release milestone Jun 30, 2025
@@ -430,6 +431,16 @@ func (t *Translator) translateBackendTrafficPolicyForRouteWithMerge(
return fmt.Errorf("error merging policies: %w", err)
}

if policy.Spec.Telemetry != nil && policy.Spec.Telemetry.Metrics != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like this needs to be removed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup

@@ -739,6 +739,8 @@ type HTTPClientTimeout struct {
type HTTPRoute struct {
// Name of the HTTPRoute
Name string `json:"name" yaml:"name"`
// StatName is the name of the route used for statistics and metrics.
StatName string `json:"statName,omitempty" yaml:"statName,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be a ptr since its optional

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@arkodg arkodg modified the milestones: v1.6.0 Milestone, Backlog Oct 29, 2025
@arkodg
Copy link
Contributor

arkodg commented Oct 29, 2025

hey @Inode1 do you have a use case for using %ROUTE_RULE_NUMBER% instead of %ROUTE_SECTION_NAME% which is already available in the IR today ? my preference would be to avoid adding RuleIndex to the IR unless we absolutely need it

@Inode1 Inode1 force-pushed the feat-per-route-metrics branch 7 times, most recently from ad3f911 to 62eecee Compare November 2, 2025 17:53
@Inode1
Copy link
Contributor Author

Inode1 commented Nov 3, 2025

hey @Inode1 do you have a use case for using %ROUTE_RULE_NUMBER% instead of %ROUTE_SECTION_NAME% which is already available in the IR today ? my preference would be to avoid adding RuleIndex to the IR unless we absolutely need it

No, to be honest, I don't need formatters tags at all, just a custom stats name string).

Done, delete ROUTE_RULE_NUMBER

@arkodg
Copy link
Contributor

arkodg commented Nov 3, 2025

thanks @Inode1, will take a look at this PR next week, after v1.6 is out

@@ -955,6 +973,7 @@ func (t *Translator) translateBackendTrafficPolicyForGateway(
setIfNil(&r.TCPKeepalive, tf.TCPKeepalive)
setIfNil(&r.Timeout, tf.Timeout)
setIfNil(&r.DNS, tf.DNS)
r.StatName = buildRouteStatName(routeStatName, r.Metadata)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldnt this be setIfNil ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r.StatName is a value, not a pointer

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the PR sets setIfNil(&r.StatName, buildRouteStatName(routeStatName, r.Metadata)) in applyTrafficFeatureToRoute

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, you are right. Done

@@ -1002,6 +1021,7 @@ func (t *Translator) translateBackendTrafficPolicyForGateway(
continue
}

r.StatName = buildRouteStatName(routeStatName, r.Metadata)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldnt this use setIfNIl ?

@Inode1 Inode1 force-pushed the feat-per-route-metrics branch from dfea4b8 to 62439e0 Compare November 17, 2025 19:32
Copy link
Contributor

@arkodg arkodg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks

@arkodg arkodg requested review from a team November 17, 2025 21:27
@zirain zirain force-pushed the feat-per-route-metrics branch from 62439e0 to f7ae8bc Compare November 17, 2025 23:38
@zirain
Copy link
Member

zirain commented Nov 18, 2025

@Inode1 can you fix the DCO?

i.makarychev and others added 6 commits November 18, 2025 20:23
Signed-off-by: i.makarychev <[email protected]>
Signed-off-by: i.makarychev <[email protected]>
Signed-off-by: i.makarychev <[email protected]>
Signed-off-by: ivan <ivan>
Signed-off-by: i.makarychev <[email protected]>
Signed-off-by: i.makarychev <[email protected]>
Signed-off-by: i.makarychev <[email protected]>
Signed-off-by: i.makarychev <[email protected]>
Signed-off-by: i.makarychev <[email protected]>
@Inode1 Inode1 force-pushed the feat-per-route-metrics branch from f7ae8bc to 1e6d51f Compare November 18, 2025 17:23
@arkodg arkodg merged commit b086308 into envoyproxy:main Nov 18, 2025
35 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support for route-level downstream metrics in Envoy Gateway

5 participants