fix(storage): support FQDN trailing dot in storage backends via custom transport for Kubernetes ndots=5 #6641

Open
priyendang wants to merge 1 commit into grafana:main from priyendang:fix/issue-1726-fqdn-trailing-dot

priyendang commented Mar 10, 2026

What this PR does

Fixes #1726

Adds a custom fqdnTransport (http.RoundTripper) to all three storage
backends (Azure, S3, GCS) that strips the trailing dot from the Host
header only, while keeping the trailing dot in the URL so that DNS
resolution uses the fully qualified domain name.


Problem

By default, Kubernetes sets ndots:5 in a Pod's /etc/resolv.conf. Any
hostname with fewer than 5 dots (e.g. blob.core.windows.net) is first
tried against each domain in the Pod's DNS search list before it is
looked up as an absolute name, causing:

  • Up to 11 failed DNS lookups per storage API call
  • Added read/write latency on every Tempo operation
  • Pressure on the conntrack table and extra load on kube-dns
  • DNS lookup failures at scale (see also: Azure DNS Lookup Failures #1462)
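
To make the search-list behaviour concrete, here is a minimal sketch of the order in which a stub resolver tries names for a given search list and ndots value. The function name candidateNames and the search list (a Pod in a namespace called "tempo") are assumptions for illustration only; they do not come from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// candidateNames returns the names a stub resolver would try, in order,
// for a lookup of name given a resolv.conf search list and ndots value.
// Hypothetical helper for illustration; not part of Tempo.
func candidateNames(name string, search []string, ndots int) []string {
	// A trailing dot marks the name as fully qualified: the search list is skipped.
	if strings.HasSuffix(name, ".") {
		return []string{name}
	}
	absolute := name + "."
	var expanded []string
	for _, domain := range search {
		expanded = append(expanded, name+"."+domain+".")
	}
	// Names with at least ndots dots are tried as-is before the search list.
	if strings.Count(name, ".") >= ndots {
		return append([]string{absolute}, expanded...)
	}
	// Otherwise every search domain is tried first and the bare name last.
	return append(expanded, absolute)
}

func main() {
	// Assumed search list for a Pod in a namespace named "tempo".
	search := []string{"tempo.svc.cluster.local", "svc.cluster.local", "cluster.local"}
	for _, n := range []string{"blob.core.windows.net", "blob.core.windows.net."} {
		fmt.Println(n, "->", candidateNames(n, search, 5))
	}
}
```

With this assumed search list, the name without the trailing dot yields four candidates and the name with the trailing dot yields one. Real resolvers may also issue separate A and AAAA queries per candidate, so the number of lookups on the wire can be higher than the number of names.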

Solution

A custom fqdnTransport wraps the existing HTTP transport stack on all
three backends. On every request it:

  1. Keeps the trailing dot in the URL → the HTTP client dials using
    blob.core.windows.net. → the DNS resolver treats the name as fully
    qualified and skips the search list entirely
  2. Strips the trailing dot from the Host header only → the cloud
    API receives a valid host header with no trailing dot

This split works because, in Go's net/http client, req.URL.Host
determines the address that is dialed (and therefore resolved), while
req.Host, when set, overrides the Host: header sent over the wire.
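
A minimal sketch of such a transport follows. It is not the code from this PR: only the fqdnTransport name and its wrapped field appear in the diff excerpts, while the recorder type and roundTripHosts helper are hypothetical stand-ins so the example runs without a network. It also assumes the host carries no port.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// fqdnTransport wraps another RoundTripper and removes a trailing dot from
// the Host header while leaving req.URL.Host (the dial target) untouched.
type fqdnTransport struct {
	wrapped http.RoundTripper
}

func (t *fqdnTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	host := req.Host
	if host == "" {
		host = req.URL.Host // net/http falls back to URL.Host for the header
	}
	// Assumption: no port in host; "host.:9000" would pass through unchanged.
	if !strings.HasSuffix(host, ".") {
		return t.wrapped.RoundTrip(req) // nothing to strip
	}
	// A RoundTripper must not modify the caller's request, so work on a clone.
	clone := req.Clone(req.Context())
	clone.Host = strings.TrimSuffix(host, ".")
	return t.wrapped.RoundTrip(clone)
}

// recorder is a hypothetical inner transport that captures what it receives.
type recorder struct{ urlHost, hostHeader string }

func (r *recorder) RoundTrip(req *http.Request) (*http.Response, error) {
	r.urlHost, r.hostHeader = req.URL.Host, req.Host
	return &http.Response{StatusCode: http.StatusOK, Body: http.NoBody, Request: req}, nil
}

// roundTripHosts sends one GET through fqdnTransport and reports the URL host
// and Host header seen by the inner transport.
func roundTripHosts(rawURL string) (urlHost, hostHeader string, err error) {
	rec := &recorder{}
	req, err := http.NewRequest(http.MethodGet, rawURL, nil)
	if err != nil {
		return "", "", err
	}
	resp, err := (&fqdnTransport{wrapped: rec}).RoundTrip(req)
	if err != nil {
		return "", "", err
	}
	resp.Body.Close()
	return rec.urlHost, rec.hostHeader, nil
}

func main() {
	u, h, err := roundTripHosts("https://account.blob.core.windows.net./traces")
	fmt.Println("URL host:", u, "Host header:", h, "err:", err)
}
```

Cloning the request rather than mutating it follows the http.RoundTripper contract, which asks implementations not to modify the incoming request.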

| Backend | API rejects trailing dot in Host with | Config field |
|---|---|---|
| Azure | HTTP 400 | endpoint_suffix |
| S3 | HTTP 404 | endpoint |
| GCS | HTTP 301 redirect | bucket_name |

Changes

| File | What changed |
|---|---|
| tempodb/backend/azure/azure_helpers.go | Added fqdnTransport; cfg.Endpoint passed as-is to URL (dot retained) |
| tempodb/backend/s3/s3.go | Added fqdnTransport; cfg.Endpoint passed as-is to minio (dot retained) |
| tempodb/backend/gcs/gcs.go | Added fqdnTransport; cfg.BucketName passed as-is to GCS client (dot retained) |
| tempodb/backend/azure/azure_helpers_test.go | Tests verifying Host header has dot stripped; URL retains dot |
| tempodb/backend/s3/s3_test.go | Tests verifying fqdnTransport strips dot from Host header only |
| tempodb/backend/gcs/gcs_test.go | Tests verifying fqdnTransport strips dot from Host header only |

Example config

# Azure
storage:
  trace:
    backend: azure
    azure:
      endpoint_suffix: blob.core.windows.net.   # trailing dot = FQDN

# S3
storage:
  trace:
    backend: s3
    s3:
      endpoint: s3.amazonaws.com.               # trailing dot = FQDN

# GCS
storage:
  trace:
    backend: gcs
    gcs:
      bucket_name: my-tempo-traces.             # trailing dot = FQDN

Backward compatibility

  • fqdnTransport.RoundTrip is a no-op when the Host header has no trailing dot
  • Existing users without trailing dots are completely unaffected
  • No config changes required for non-Kubernetes deployments
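
The no-op claim can be illustrated with a small host-normalising helper. This is a sketch under assumptions, not the PR's code: stripHostDot is a hypothetical name, and the handling of an optional port (for example a self-hosted S3-compatible endpoint on a custom port) is something the description above does not mention.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// stripHostDot is a hypothetical helper: it removes one trailing dot from the
// host part of a "host" or "host:port" value and returns other input unchanged.
func stripHostDot(hostport string) string {
	host, port, err := net.SplitHostPort(hostport)
	if err != nil {
		// No port present (or not parsable as host:port): treat the whole
		// value as the host.
		return strings.TrimSuffix(hostport, ".")
	}
	return net.JoinHostPort(strings.TrimSuffix(host, "."), port)
}

func main() {
	for _, h := range []string{
		"s3.amazonaws.com",      // no dot: unchanged
		"s3.amazonaws.com.",     // trailing dot removed
		"minio.internal.:9000",  // dot removed, port kept
		"[::1]:9000",            // IPv6 literal: unchanged
	} {
		fmt.Printf("%q -> %q\n", h, stripHostDot(h))
	}
}
```

Inputs without a trailing dot come back unchanged, which is the property the backward-compatibility bullets rely on.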

Testing

go test -short ./tempodb/backend/azure/...   # ok
go test -short ./tempodb/backend/s3/...      # ok
go test -short ./tempodb/backend/gcs/...     # ok

### New Commit Message (amend):

fix(storage): support FQDN trailing dot via custom transport for Kubernetes ndots=5

On Kubernetes (ndots=5), storage hostnames trigger up to 11 spurious
DNS lookups per API call. Users can configure a trailing dot on their
storage endpoint/bucket to signal a fully qualified domain name (FQDN),
which causes the DNS resolver to skip the search list entirely.

This adds fqdnTransport (http.RoundTripper) to all three storage backends:

  • Keeps the trailing dot in the URL so DNS resolves using the FQDN
  • Strips the trailing dot from the Host header only, since Azure (400),
    S3 (404) and GCS (301) all reject a trailing dot in the Host header

The dot is NOT stripped from cfg.Endpoint/cfg.BucketName upfront —
that was the previous (incorrect) approach which lost the DNS benefit.

Fixes #1726


cla-assistant bot commented Mar 10, 2026

CLA assistant check
All committers have signed the CLA.

priyendang marked this pull request as draft March 10, 2026 01:29
priyendang force-pushed the fix/issue-1726-fqdn-trailing-dot branch from 8cbf259 to d017862 March 11, 2026 22:18
priyendang changed the title from "fix(storage): strip trailing dot from backend hostnames to support FQDN on Kubernetes" to "fix(storage): support FQDN trailing dot in storage backends via custom transport for Kubernetes ndots=5" Mar 11, 2026
priyendang force-pushed the fix/issue-1726-fqdn-trailing-dot branch from d017862 to bade102 March 11, 2026 22:30
priyendang marked this pull request as ready for review March 11, 2026 22:30

priyendang (Author) commented:

Regarding the failing CI / Run Unit Tests (test-with-cover-others) check: the failure in TestBlockbuilder_flushingFails is a flaky temp-directory cleanup issue in modules/blockbuilder and is unrelated to this PR, which only touches tempodb/backend/{azure,s3,gcs}. No files in modules/blockbuilder were modified.

electron0zero (Member) left a comment

The core change looks reasonable, but there are a lot of unrelated and stray changes, such as removed comments.

Side note: it does look like LLMs were used in producing this PR. LLM usage is okay, but please review the code before you create the PR and ensure that it makes sense. Sending unreviewed LLM output creates an extra burden on the maintainers.

maxRetries = 1
)

// fqdnTransport is a custom http.RoundTripper that strips the trailing dot

Can this comment be less verbose and more focused?


// Use cfg.Endpoint as-is (trailing dot preserved) so the URL carries the
// FQDN for correct DNS resolution. The fqdnTransport above strips the dot
// from the Host header before the request is sent to Azure.

Not sure this comment makes sense here; there is no code change here.


return accountKey
}
} No newline at end of file

The file is missing its trailing newline (\n).

endpoint: "blob.core.cloudapi.de",
expectedURL: "https://devstoreaccount1.blob.core.cloudapi.de/traces",
},
// FQDN test cases for Kubernetes ndots=5 support (issue #1726).

less verbose comment?

"Host header must have trailing dot stripped before sending to server")
})
}
} No newline at end of file

The file is missing its trailing newline (\n).

httpHandler: func(t *testing.T) http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
if r.Method == "GET" {
_, _ = w.Write([]byte(`

This diff seems to be unrelated. Can you explain it?

{
"kind": "storage#objects",
"items": [{
"kind": "storage#object",

same here and other places across this test.

wrapped http.RoundTripper
}

func (t *fqdnTransport) RoundTrip(req *http.Request) (*http.Response, error) {

This code is duplicated across all the backends. I think it's better to move it into a common place and reuse it across the backends.

}

func (rw *readerWriter) DeleteVersioned(ctx context.Context, name string, keypath backend.KeyPath, version backend.Version) error {
// Note there is a potential data race here because S3 does not support conditional headers. If

Also here: why was this comment removed?

SecretKey: flagext.SecretWithValue("test"),
Bucket: "blerg",
Insecure: true,
Endpoint: server.URL[7:], // [7:] -> strip http://

why was this comment removed???


Development

Successfully merging this pull request may close these issues.

Use fully qualifed domain name for storage backends when running on Kubernetes

2 participants