fix(storage): support FQDN trailing dot in storage backends via custom transport for Kubernetes ndots=5#6641
Conversation
8cbf259 to
d017862
Compare
Kubernetes ndots=5 On Kubernetes (ndots=5), storage hostnames trigger up to 11 spurious DNS lookups per API call. Users can configure a trailing dot on their storage endpoint/bucket to signal a fully qualified domain name (FQDN), which causes kube-dns to skip local search entirely. This adds fqdnTransport (http.RoundTripper) to all three storage backends: - Keeps the trailing dot in the URL so DNS resolves using the FQDN - Strips the trailing dot from the Host header only, since Azure (400), S3 (404) and GCS (301) all reject a trailing dot in the Host header The dot is NOT stripped from cfg.Endpoint/cfg.BucketName upfront — that was the previous (incorrect) approach which lost the DNS benefit. Fixes grafana#1726
d017862 to
bade102
Compare
|
CI / RUN Unit Tests (test-with-cover-others) check: - The CI failure in TestBlockbuilder_flushingFails is a flaky temp directory cleanup issue in modules/blockbuilder — unrelated to this PR which only touches tempodb/backend/{azure,s3,gcs}. No files in modules/blockbuilder were modified. |
electron0zero
left a comment
There was a problem hiding this comment.
the core change looks reasonable but there are lots of unrelated and stray changes like comments are removed, etc.
Sidenote: It does look like LLMs were used in producing this PR. LLM usage is okay but please review the code before you create the PR and ensure that it makes sense. sending unreviewed LLM output creates extra burden on the maintainers.
| maxRetries = 1 | ||
| ) | ||
|
|
||
| // fqdnTransport is a custom http.RoundTripper that strips the trailing dot |
There was a problem hiding this comment.
can this comment be less verbose and more focused
|
|
||
| // Use cfg.Endpoint as-is (trailing dot preserved) so the URL carries the | ||
| // FQDN for correct DNS resolution. The fqdnTransport above strips the dot | ||
| // from the Host header before the request is sent to Azure. |
There was a problem hiding this comment.
not sure if this comment makes sense here? there is not code change here?
|
|
||
| return accountKey | ||
| } | ||
| } No newline at end of file |
| endpoint: "blob.core.cloudapi.de", | ||
| expectedURL: "https://devstoreaccount1.blob.core.cloudapi.de/traces", | ||
| }, | ||
| // FQDN test cases for Kubernetes ndots=5 support (issue #1726). |
| "Host header must have trailing dot stripped before sending to server") | ||
| }) | ||
| } | ||
| } No newline at end of file |
| httpHandler: func(t *testing.T) http.HandlerFunc { | ||
| return func(w http.ResponseWriter, r *http.Request) { | ||
| if r.Method == "GET" { | ||
| _, _ = w.Write([]byte(` |
There was a problem hiding this comment.
this diff seems to be unrelated? can you explain this diff?
| { | ||
| "kind": "storage#objects", | ||
| "items": [{ | ||
| "kind": "storage#object", |
There was a problem hiding this comment.
same here and other places across this test.
| wrapped http.RoundTripper | ||
| } | ||
|
|
||
| func (t *fqdnTransport) RoundTrip(req *http.Request) (*http.Response, error) { |
There was a problem hiding this comment.
this code is duplicated across all the backend? I think it's better to move it into a common place and reuse across the backends.
| } | ||
|
|
||
| func (rw *readerWriter) DeleteVersioned(ctx context.Context, name string, keypath backend.KeyPath, version backend.Version) error { | ||
| // Note there is a potential data race here because S3 does not support conditional headers. If |
There was a problem hiding this comment.
also here? why was this comment removed?
| SecretKey: flagext.SecretWithValue("test"), | ||
| Bucket: "blerg", | ||
| Insecure: true, | ||
| Endpoint: server.URL[7:], // [7:] -> strip http:// |
There was a problem hiding this comment.
why was this comment removed???
What this PR does
Fixes #1726
Adds a custom
fqdnTransport(http.RoundTripper) to all three storagebackends (Azure, S3, GCS) that strips the trailing dot from the
Hostheader only, while keeping the trailing dot in the URL so that DNS
resolution uses the fully qualified domain name.
Problem
Kubernetes sets
ndots=5in every Pod's/etc/resolv.conf. Any hostnamewith fewer than 5 dots (e.g.
blob.core.windows.net) is searched locallyfirst before a real DNS lookup, causing:
Solution
A custom
fqdnTransportwraps the existing HTTP transport stack on allthree backends. On every request it:
blob.core.windows.net.→ kube-dns recognises the FQDN and skipslocal search entirely
Hostheader only → the cloudAPI receives a valid host header with no trailing dot
This is the correct split —
req.URL.Hostdrives DNS resolution whilereq.Hostdrives theHost:header sent over the wire.endpoint_suffixendpointbucket_nameChanges
tempodb/backend/azure/azure_helpers.gofqdnTransport;cfg.Endpointpassed as-is to URL (dot retained)tempodb/backend/s3/s3.gofqdnTransport;cfg.Endpointpassed as-is to minio (dot retained)tempodb/backend/gcs/gcs.gofqdnTransport;cfg.BucketNamepassed as-is to GCS client (dot retained)tempodb/backend/azure/azure_helpers_test.gotempodb/backend/s3/s3_test.gotempodb/backend/gcs/gcs_test.goExample config
Backward compatibility
fqdnTransport.RoundTripis a no-op when the Host header has no trailing dotTesting
fix(storage): support FQDN trailing dot via custom transport for Kubernetes ndots=5
On Kubernetes (ndots=5), storage hostnames trigger up to 11 spurious
DNS lookups per API call. Users can configure a trailing dot on their
storage endpoint/bucket to signal a fully qualified domain name (FQDN),
which causes kube-dns to skip local search entirely.
This adds fqdnTransport (http.RoundTripper) to all three storage backends:
S3 (404) and GCS (301) all reject a trailing dot in the Host header
The dot is NOT stripped from cfg.Endpoint/cfg.BucketName upfront —
that was the previous (incorrect) approach which lost the DNS benefit.
Fixes #1726