Commit 1cbaf0e

docs: add eg best practices (#59)
1 parent b5fc58f commit 1cbaf0e

3 files changed: +310 -0 lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -142,6 +142,7 @@ The `docs/` folder contains deep‑dive guidance, reference architectures, and o
 - [Argo CD Best Practices](docs/argocd-best-practices.md) – HA setup, custom labels, Redis/Valkey guidance, Gateway exposure, metrics scraping.
 - [Observability (Metrics, Logs, Traces)](docs/observability.md) – Thanos federation model, Elastic logging, Jaeger design, mTLS patterns, retention & ILM.
 - [Traffic Management (Gateway API, TLS, DNS)](docs/traffic-management.md) – Envoy Gateway deployment, certificate issuance flows, DNS automation, sync wave ordering.
+- [Envoy Gateway Best Practices](docs/envoy-gateway-best-practices.md) – Production patterns, certificate management, observability, route management, migration strategies.
 - [Policy & Compliance](docs/compliance.md) – Kyverno audit→enforce ladder, Checkov shift‑left scanning, exception handling strategy.
 - [Elasticsearch Best Practices](docs/elastic-best-practices.md) – Node role segregation, mTLS external access, heap sizing, ILM, GC & recovery tuning.

docs/README.md

Lines changed: 1 addition & 0 deletions
@@ -35,6 +35,7 @@ This directory contains comprehensive documentation for the Kubernetes platform.

 ### Traffic Management
 - [Traffic Management Guide](traffic-management.md) - Envoy Gateway, TLS certificates, DNS automation
+- [Envoy Gateway Best Practices](envoy-gateway-best-practices.md) - Production patterns, certificate management, observability, route management

 ### GitOps & Operations
 - [Argo CD Best Practices](argocd-best-practices.md) - HA setup, custom labels, Redis, Gateway exposure, metrics
Lines changed: 308 additions & 0 deletions
@@ -0,0 +1,308 @@
# Envoy Gateway Best Practices

This document outlines best practices for using Envoy Gateway in production environments, covering architecture patterns, certificate management, observability, and operational considerations.

## Table of Contents

- [Gateway Architecture Patterns](#gateway-architecture-patterns)
- [Certificate Management](#certificate-management)
- [HTTP to HTTPS Redirect](#http-to-https-redirect)
- [Observability & Monitoring](#observability--monitoring)
- [Route Management Strategy](#route-management-strategy)
- [Security Considerations](#security-considerations)
- [Troubleshooting](#troubleshooting)

## Gateway Architecture Patterns

### Shared Gateways vs. Dedicated Gateways

When migrating from NGINX Ingress or designing a new platform, consider using **shared Gateways** rather than creating dedicated Gateways per application.

**Benefits of Shared Gateways:**
- **Simplified management**: Single Gateway resource to manage per cluster/environment
- **Consistent TLS configuration**: All routes inherit the same certificate and TLS settings
- **Reduced resource overhead**: One Gateway instance handles multiple routes
- **Easier DNS management**: Single DNS entry per Gateway
- **Centralized policy application**: Apply rate limiting, WAF, and other policies at Gateway level

**When to Use Dedicated Gateways:**
- Different TLS requirements per application
- Isolation requirements (separate Gateway for sensitive workloads)
- Different listener configurations (ports, protocols)

### Merged Gateways

Envoy Gateway supports **Merged Gateways** for cases where several Gateway resources are needed but a single network load balancer should serve all of them. This is useful when:
- Multiple teams need separate Gateway resources for organizational boundaries
- Different Gateway configurations are required but share the same infrastructure
- You want to maintain separation while optimizing resource usage

With Merged Gateways, Envoy Gateway merges the Gateway resources that belong to the same GatewayClass into a single Envoy proxy deployment, reducing resource overhead while maintaining logical separation. Because the merged Gateways share one proxy, their listeners must not conflict (for example, two listeners with the same port and hostname).
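
As a minimal sketch of how merging is typically switched on: the `mergeGateways` field lives on an `EnvoyProxy` resource, which a GatewayClass references through `parametersRef`. The resource names `merged-proxy-config` and `envoy-merged` are illustrative assumptions, not names used by this platform.

```yaml
# EnvoyProxy configuration that enables merging (name is hypothetical)
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: merged-proxy-config
  namespace: envoy-gateway-system
spec:
  mergeGateways: true
---
# GatewayClass that points at the EnvoyProxy above (name is hypothetical).
# Every Gateway using this class is served by one shared Envoy deployment.
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: envoy-merged
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: merged-proxy-config
    namespace: envoy-gateway-system
```

Gateways opt in simply by setting `gatewayClassName` to the merged class; Gateways on other classes keep their own proxy deployments.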

**Recommended Pattern:**
```yaml
# One Gateway per environment/cluster
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: shared-gateway
  namespace: envoy-gateway-system
spec:
  gatewayClassName: envoy
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    hostname: "*.example.com"
    tls:
      mode: Terminate
      certificateRefs:
      - name: wildcard-cert
        namespace: envoy-gateway-system
    # By default only routes in the Gateway's own namespace can attach;
    # allow other namespaces so application teams can attach their routes.
    allowedRoutes:
      namespaces:
        from: All
```

Applications then reference this Gateway via `parentRefs` in their HTTPRoute resources.
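
A sketch of what such an application route could look like. The namespace `my-app`, the hostname, and the backend Service name and port are placeholders, and the example assumes the Gateway's listener permits routes from the application's namespace.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-app-route
  namespace: my-app                      # hypothetical application namespace
spec:
  parentRefs:
  - name: shared-gateway
    namespace: envoy-gateway-system      # the Gateway lives in another namespace
    sectionName: https                   # attach to the HTTPS listener only
  hostnames:
  - "my-app.example.com"                 # must fall under the listener's *.example.com
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: my-app-service               # hypothetical Service in the same namespace
      port: 8080
```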

## Certificate Management

### Prefer Certificate CRs Over Annotations

While cert-manager supports inline annotations on Gateway resources, **prefer using Certificate Custom Resources** for better lifecycle management.

**Why Certificate CRs:**
- **Better observability**: Certificate status conditions show issuance progress
- **Clearer Git diffs**: Certificate resources are explicit and version-controlled
- **Easier troubleshooting**: Certificate events and status are visible in kubectl
- **Renewal visibility**: Certificate renewal windows and status are trackable

**Pattern:**
```yaml
# 1. Create Certificate CR
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-example-com
  namespace: envoy-gateway-system
spec:
  secretName: wildcard-example-com-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
  - "*.example.com"
  - "example.com"
---
# 2. Reference in Gateway
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: shared-gateway
  namespace: envoy-gateway-system
spec:
  gatewayClassName: envoy
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    tls:
      mode: Terminate
      certificateRefs:
      - name: wildcard-example-com-tls # References the secret created by Certificate CR
        namespace: envoy-gateway-system
```

### Certificate Replication with Reflector

When certificates need to be available in multiple namespaces, use [kubernetes-reflector](https://github.com/emberstack/kubernetes-reflector) to replicate secrets across namespaces.

Reflector is a Kubernetes controller that can replicate secrets, configmaps, and certificates to multiple namespaces. This avoids creating multiple Certificate CRs and ensures all namespaces use the same certificate.

For detailed usage and configuration, see the [kubernetes-reflector documentation](https://github.com/emberstack/kubernetes-reflector).
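
One way to wire this up, sketched under the assumption that Reflector is installed in the cluster: cert-manager's `secretTemplate` copies annotations onto the generated Secret, and Reflector acts on them. The target namespaces `team-a` and `team-b` are hypothetical.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-example-com
  namespace: envoy-gateway-system
spec:
  secretName: wildcard-example-com-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
  - "*.example.com"
  # Annotations below are applied to the generated Secret, where Reflector reads them
  secretTemplate:
    annotations:
      # Permit reflection, restricted to the listed namespaces (hypothetical names)
      reflector.v1.k8s.emberstack.com/reflection-allowed: "true"
      reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces: "team-a,team-b"
      # Create and keep the mirrored Secrets in sync automatically
      reflector.v1.k8s.emberstack.com/reflection-auto-enabled: "true"
      reflector.v1.k8s.emberstack.com/reflection-auto-namespaces: "team-a,team-b"
```

Keeping the namespace allow-list narrow limits how widely the private key is copied.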

## HTTP to HTTPS Redirect

### Enforce HTTPS by Default

All HTTP traffic should be redirected to HTTPS for security. Start by exposing both an HTTP and an HTTPS listener on the Gateway:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: shared-gateway
spec:
  gatewayClassName: envoy
  listeners:
  - name: http
    protocol: HTTP
    port: 80
  - name: https
    protocol: HTTPS
    port: 443
    hostname: "*.example.com"
    tls:
      mode: Terminate
      certificateRefs:
      - name: wildcard-cert
```

Then create an HTTPRoute that redirects all HTTP traffic to HTTPS (no hostname restriction):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: http-to-https-redirect
spec:
  parentRefs:
  - name: shared-gateway
    sectionName: http
  rules:
  - filters:
    - type: RequestRedirect
      requestRedirect:
        scheme: https
        statusCode: 301
```

This redirects all HTTP traffic to HTTPS regardless of hostname. Application routes should set `sectionName: https` in their own `parentRefs`; a route that omits `sectionName` attaches to every compatible listener, including `http`, and would then serve plain HTTP alongside the redirect.

**Alternative:** Use Envoy Gateway's built-in redirect capability if available in your version.

## Observability & Monitoring

### Prometheus Service Discovery Limitation

**Important:** Prometheus does not natively recognize Gateway API route resources (HTTPRoute, GRPCRoute) for service discovery; its Kubernetes service discovery offers roles such as `service`, `pod`, `endpoints`, and `ingress`, but none for routes. This affects tools like Blackbox Exporter whose probe targets come from Prometheus service discovery.

**Solution: Annotate Services Instead**

For Blackbox Exporter probes, annotate the **Service resources** that back your routes so that Prometheus can discover them:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app-service
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/blackbox: "true"
    prometheus.io/blackbox-module: "http_2xx"
    prometheus.io/path: "/health"
    prometheus.io/port: "8080"
spec:
  ports:
  - port: 8080
    targetPort: 8080
```

Prometheus discovers these annotated Services and probes them through Blackbox Exporter, regardless of whether they're exposed via Gateway API or traditional Ingress. The `prometheus.io/*` annotations are a convention rather than a built-in feature: they only take effect if the Prometheus scrape configuration contains relabeling rules that read them.

**Best Practice:**
- Add annotations to Services in application charts
- Use consistent annotation patterns for auto-discovery
- Document annotation requirements in application chart READMEs
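
For illustration, a Prometheus scrape job that honours the annotations above might look like the following sketch. The job name and the Blackbox Exporter address `blackbox-exporter.monitoring.svc:9115` are assumptions; adapt them to the actual deployment.

```yaml
scrape_configs:
- job_name: kubernetes-services-blackbox   # hypothetical job name
  metrics_path: /probe
  kubernetes_sd_configs:
  - role: service
  relabel_configs:
  # Keep only Services annotated with prometheus.io/blackbox: "true"
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_blackbox]
    action: keep
    regex: "true"
  # Pass the module from prometheus.io/blackbox-module to the exporter
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_blackbox_module]
    target_label: __param_module
  # Build the probe URL from the Service address and prometheus.io/path
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_path]
    regex: (.+);(.*)
    replacement: http://$1$2
    target_label: __param_target
  # Record the probed URL as the instance label
  - source_labels: [__param_target]
    target_label: instance
  # Send the scrape itself to Blackbox Exporter (assumed address)
  - target_label: __address__
    replacement: blackbox-exporter.monitoring.svc:9115
```

Note that this probes each Service from inside the cluster; it does not exercise the path through the Gateway.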

### Metrics and Tracing

- **Envoy metrics**: Exposed by the Envoy proxy pods in Prometheus format, by default on port 19001 at `/stats/prometheus`. Port 19000 is the Envoy admin interface, which is used for debugging rather than scraping. Verify both ports against your Envoy Gateway version.
- **Custom Resource Monitoring**: Configure kube-state-metrics custom resource state monitoring for Gateway API resources (Gateway, HTTPRoute, GRPCRoute) to track resource state and status
- **Distributed tracing**: Configure OpenTelemetry in Envoy Gateway for request tracing
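
A hedged sketch of enabling OpenTelemetry tracing through the `EnvoyProxy` resource follows. The collector Service `otel-collector` in namespace `monitoring` and the resource name are assumptions, and the exact provider fields have changed between Envoy Gateway releases (older versions use `host` and `port` instead of `backendRefs`), so check the API reference for the installed version.

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: proxy-telemetry                  # hypothetical name
  namespace: envoy-gateway-system
spec:
  telemetry:
    tracing:
      samplingRate: 10                   # percentage of requests to sample
      provider:
        type: OpenTelemetry
        backendRefs:
        - name: otel-collector           # assumed OTLP collector Service
          namespace: monitoring
          port: 4317                     # OTLP gRPC port
```

The configuration only takes effect once the `EnvoyProxy` resource is referenced from a GatewayClass via `parametersRef`.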

## Route Management Strategy

### Separation of Concerns

**Current Pattern:**
- **Gateway resources**: Managed in `envoy-gateway` chart (platform team)
- **HTTPRoute/GRPCRoute resources**: Managed in application charts (application teams)

**Challenge:**
Adding specific listeners or Gateway configurations for a service requires changes to both:
1. The `envoy-gateway` chart (to add listener)
2. The application chart (to create route)

This creates coordination overhead and potential conflicts.

### Future: ListenerSet Functionality

**ListenerSet** (proposed in [GEP-1713](https://gateway-api.sigs.k8s.io/geps/gep-1713/), not yet a stable API) will allow application teams to define listeners without modifying the Gateway resource.

For the complete specification and YAML examples, see [GEP-1713: ListenerSet](https://gateway-api.sigs.k8s.io/geps/gep-1713/#yaml).

**Current Workaround:**
- Use shared Gateway with standard listeners (80, 443)
- Route differentiation via hostnames and paths
- For special requirements, coordinate Gateway changes through platform team

**Best Practice:**
- Prefer hostname/path-based routing over custom listeners
- Document any custom listener requirements
- Plan for ListenerSet adoption when available

## Security Considerations

### TLS Configuration

- **TLS 1.2 minimum**: Configure minimum TLS version
- **Strong cipher suites**: Use modern cipher suites only
- **Certificate rotation**: Ensure certificates auto-renew via cert-manager
- **mTLS for east-west**: Configure mTLS for service-to-service communication
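
In Envoy Gateway, the minimum TLS version and cipher suites for client connections are set with a `ClientTrafficPolicy`. The following is a sketch, assuming the shared Gateway from earlier; the policy name and the cipher list are illustrative, and newer releases prefer the plural `targetRefs` field over `targetRef`.

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: ClientTrafficPolicy
metadata:
  name: tls-hardening                    # hypothetical name
  namespace: envoy-gateway-system
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: shared-gateway
  tls:
    minVersion: "1.2"
    # Cipher list applies to TLS 1.2; example selection, adjust to your policy
    ciphers:
    - ECDHE-ECDSA-AES128-GCM-SHA256
    - ECDHE-RSA-AES128-GCM-SHA256
    - ECDHE-ECDSA-AES256-GCM-SHA384
    - ECDHE-RSA-AES256-GCM-SHA384
```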

### Rate Limiting

Apply rate limiting at Gateway or Route level using a `BackendTrafficPolicy`:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: rate-limit-policy
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: my-app-route
  rateLimit:
    # Global limits are shared across all Envoy replicas and require the
    # rate limit service (backed by Redis) to be enabled in Envoy Gateway.
    # Use "type: Local" with a "local:" block for per-replica limits.
    type: Global
    global:
      rules:
      - limit:
          requests: 100
          unit: Minute
```

### Network Policies

- Restrict Gateway access to only necessary namespaces
- Use NetworkPolicies to limit east-west traffic
- Isolate Gateway control plane from workloads
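
As an example of limiting east-west traffic, the sketch below admits ingress to an application namespace only from the Envoy proxy pods' namespace. The namespace `my-app` and port 8080 are placeholders, and the policy assumes the cluster's CNI enforces NetworkPolicies and that the proxies run in `envoy-gateway-system`.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-gateway               # hypothetical name
  namespace: my-app                      # hypothetical application namespace
spec:
  podSelector: {}                        # applies to every pod in the namespace
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          # Label set automatically by Kubernetes on every namespace
          kubernetes.io/metadata.name: envoy-gateway-system
    ports:
    - protocol: TCP
      port: 8080
```

Because the policy selects all pods, any other ingress the application needs (for example, metrics scraping) must be allowed by additional rules.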

## Troubleshooting

### Common Issues

| Issue | Cause | Solution |
|-------|-------|----------|
| Route not accessible | Gateway not ready | Check Gateway status: `kubectl get gateway` |
| TLS handshake fails | Certificate not issued | Check Certificate CR status: `kubectl get certificate` |
| 404 errors | Hostname mismatch | Verify HTTPRoute hostnames match Gateway hostname |
| DNS not resolving | External-DNS not syncing | Check external-dns logs and Gateway annotations |
| Metrics missing | Service not annotated | Add Prometheus annotations to Service resources |

### Debugging Commands

```bash
# Check Gateway status
kubectl get gateway -A

# Check HTTPRoute status
kubectl get httproute -A

# View Envoy configuration via the admin interface (port 19000).
# The proxy image may not include curl, so forward the port in one
# terminal and query it from your workstation in another.
kubectl port-forward -n envoy-gateway-system <envoy-pod> 19000:19000
curl -s localhost:19000/config_dump

# Check certificate status
kubectl get certificate -A

# View Gateway API events across all namespaces
kubectl get events -A --field-selector involvedObject.kind=Gateway
```

## References

- [Kubernetes Gateway API Specification](https://gateway-api.sigs.k8s.io/)
- [Envoy Gateway Documentation](https://gateway.envoyproxy.io/)
- [Traffic Management Guide](traffic-management.md) - Platform-specific traffic management patterns
- Repository: `charts/envoy-gateway/`
