| 1 | +# Envoy Gateway Best Practices |
| 2 | + |
| 3 | +This document outlines best practices for using Envoy Gateway in production environments, covering architecture patterns, certificate management, observability, and operational considerations. |
| 4 | + |
| 5 | +## Table of Contents |
| 6 | + |
| 7 | +- [Gateway Architecture Patterns](#gateway-architecture-patterns) |
| 8 | +- [Certificate Management](#certificate-management) |
| 9 | +- [HTTP to HTTPS Redirect](#http-to-https-redirect) |
| 10 | +- [Observability & Monitoring](#observability--monitoring) |
| 11 | +- [Route Management Strategy](#route-management-strategy) |
| 12 | +- [Security Considerations](#security-considerations) |
| 13 | +- [Troubleshooting](#troubleshooting) |
| 14 | + |
| 15 | +## Gateway Architecture Patterns |
| 16 | + |
| 17 | +### Shared Gateways vs. Dedicated Gateways |
| 18 | + |
| 19 | +When migrating from NGINX Ingress or designing a new platform, consider using **shared Gateways** rather than creating dedicated Gateways per application. |
| 20 | + |
| 21 | +**Benefits of Shared Gateways:** |
| 22 | +- **Simplified management**: Single Gateway resource to manage per cluster/environment |
| 23 | +- **Consistent TLS configuration**: All routes inherit the same certificate and TLS settings |
| 24 | +- **Reduced resource overhead**: One Gateway instance handles multiple routes |
| 25 | +- **Easier DNS management**: Single DNS entry per Gateway |
| 26 | +- **Centralized policy application**: Apply rate limiting, WAF, and other policies at Gateway level |
| 27 | + |
| 28 | +**When to Use Dedicated Gateways:** |
| 29 | +- Different TLS requirements per application |
| 30 | +- Isolation requirements (separate Gateway for sensitive workloads) |
| 31 | +- Different listener configurations (ports, protocols) |
| 32 | + |
| 33 | +### Merged Gateways |
| 34 | + |
| 35 | +Envoy Gateway supports **Merged Gateways** for cases where separate Gateway resources are needed but a single network load balancer should serve all of them. This is useful when: |
| 36 | +- Multiple teams need separate Gateway resources for organizational boundaries |
| 37 | +- Different Gateway configurations are required but share the same infrastructure |
| 38 | +- You want to maintain separation while optimizing resource usage |
| 39 | + |
| 40 | +With Merged Gateways, Envoy Gateway merges multiple Gateway resources into a single Envoy proxy instance, reducing resource overhead while maintaining logical separation. |
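| | + |
| | +As a minimal sketch of how merging is typically switched on: it is enabled through the `mergeGateways` field of an `EnvoyProxy` resource that the GatewayClass references via `parametersRef`. The resource names `merged-proxy-config` and `envoy-merged` are hypothetical placeholders, and the field should be checked against the API reference for your Envoy Gateway version. |
| | + |
| | +```yaml |
| | +# EnvoyProxy configuration that turns on gateway merging (name is a placeholder) |
| | +apiVersion: gateway.envoyproxy.io/v1alpha1 |
| | +kind: EnvoyProxy |
| | +metadata: |
| | +  name: merged-proxy-config |
| | +  namespace: envoy-gateway-system |
| | +spec: |
| | +  mergeGateways: true |
| | +--- |
| | +# GatewayClass that points at the EnvoyProxy above; Gateways using this class share one proxy fleet |
| | +apiVersion: gateway.networking.k8s.io/v1 |
| | +kind: GatewayClass |
| | +metadata: |
| | +  name: envoy-merged |
| | +spec: |
| | +  controllerName: gateway.envoyproxy.io/gatewayclass-controller |
| | +  parametersRef: |
| | +    group: gateway.envoyproxy.io |
| | +    kind: EnvoyProxy |
| | +    name: merged-proxy-config |
| | +    namespace: envoy-gateway-system |
| | +``` |
| | + |
| | +Listeners across merged Gateways must not collide on the same port, protocol, and hostname combination, so teams still need to agree on who owns which hostname. |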
| 41 | + |
| 42 | +**Recommended Pattern:** |
| 43 | +```yaml |
| 44 | +# One Gateway per environment/cluster |
| 45 | +apiVersion: gateway.networking.k8s.io/v1 |
| 46 | +kind: Gateway |
| 47 | +metadata: |
| 48 | +  name: shared-gateway |
| 49 | +  namespace: envoy-gateway-system |
| 50 | +spec: |
| 51 | +  gatewayClassName: envoy |
| 52 | +  listeners: |
| 53 | +  - name: https |
| 54 | +    protocol: HTTPS |
| 55 | +    port: 443 |
| 56 | +    hostname: "*.example.com" |
| 57 | +    tls: |
| 58 | +      mode: Terminate |
| 59 | +      certificateRefs: |
| 60 | +      - name: wildcard-cert |
| 61 | +        namespace: envoy-gateway-system |
| 62 | +``` |
| 63 | + |
| 64 | +Applications then reference this Gateway via `parentRefs` in their HTTPRoute resources. |
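| | + |
| | +A route attaching to the shared Gateway might look like the sketch below. The namespace `my-app`, the hostname `my-app.example.com`, and the backend `my-app-service` on port 8080 are illustrative assumptions. Note that a listener only accepts routes from its own namespace by default, so this assumes the `https` listener sets `allowedRoutes.namespaces.from` to `All` or `Selector`. |
| | + |
| | +```yaml |
| | +apiVersion: gateway.networking.k8s.io/v1 |
| | +kind: HTTPRoute |
| | +metadata: |
| | +  name: my-app-route |
| | +  namespace: my-app          # application namespace (assumed) |
| | +spec: |
| | +  parentRefs: |
| | +  - name: shared-gateway |
| | +    namespace: envoy-gateway-system |
| | +    sectionName: https       # attach to the HTTPS listener only |
| | +  hostnames: |
| | +  - "my-app.example.com"     # must fall under the listener's *.example.com |
| | +  rules: |
| | +  - matches: |
| | +    - path: |
| | +        type: PathPrefix |
| | +        value: / |
| | +    backendRefs: |
| | +    - name: my-app-service |
| | +      port: 8080 |
| | +``` |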
| 65 | + |
| 66 | +## Certificate Management |
| 67 | + |
| 68 | +### Prefer Certificate CRs Over Annotations |
| 69 | + |
| 70 | +While cert-manager supports inline annotations on Gateway resources, **prefer using Certificate Custom Resources** for better lifecycle management. |
| 71 | + |
| 72 | +**Why Certificate CRs:** |
| 73 | +- **Better observability**: Certificate status conditions show issuance progress |
| 74 | +- **Clearer Git diffs**: Certificate resources are explicit and version-controlled |
| 75 | +- **Easier troubleshooting**: Certificate events and status are visible in kubectl |
| 76 | +- **Renewal visibility**: Certificate renewal windows and status are trackable |
| 77 | + |
| 78 | +**Pattern:** |
| 79 | +```yaml |
| 80 | +# 1. Create Certificate CR |
| 81 | +apiVersion: cert-manager.io/v1 |
| 82 | +kind: Certificate |
| 83 | +metadata: |
| 84 | +  name: wildcard-example-com |
| 85 | +  namespace: envoy-gateway-system |
| 86 | +spec: |
| 87 | +  secretName: wildcard-example-com-tls |
| 88 | +  issuerRef: |
| 89 | +    name: letsencrypt-prod |
| 90 | +    kind: ClusterIssuer |
| 91 | +  dnsNames: |
| 92 | +  - "*.example.com" |
| 93 | +  - "example.com" |
| 94 | +--- |
| 95 | +# 2. Reference in Gateway |
| 96 | +apiVersion: gateway.networking.k8s.io/v1 |
| 97 | +kind: Gateway |
| 98 | +spec: |
| 99 | +  listeners: |
| 100 | +  - name: https |
| 101 | +    tls: |
| 102 | +      certificateRefs: |
| 103 | +      - name: wildcard-example-com-tls # References the Secret created by the Certificate CR |
| 104 | +        namespace: envoy-gateway-system |
| 105 | +``` |
| 106 | + |
| 107 | +### Certificate Replication with Reflector |
| 108 | + |
| 109 | +When certificates need to be available in multiple namespaces, use [kubernetes-reflector](https://github.com/emberstack/kubernetes-reflector) to replicate secrets across namespaces. |
| 110 | + |
| 111 | +Reflector is a Kubernetes controller that can replicate secrets, configmaps, and certificates to multiple namespaces. This avoids creating multiple Certificate CRs and ensures all namespaces use the same certificate. |
| 112 | + |
| 113 | +For detailed usage and configuration, see the [kubernetes-reflector documentation](https://github.com/emberstack/kubernetes-reflector). |
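| | + |
| | +As a rough sketch, Reflector is usually driven by annotations on the source Secret, which cert-manager can set through `spec.secretTemplate` on the Certificate. The target namespaces `team-a` and `team-b` are hypothetical; verify the annotation keys against the Reflector README for the version you deploy. |
| | + |
| | +```yaml |
| | +apiVersion: cert-manager.io/v1 |
| | +kind: Certificate |
| | +metadata: |
| | +  name: wildcard-example-com |
| | +  namespace: envoy-gateway-system |
| | +spec: |
| | +  secretName: wildcard-example-com-tls |
| | +  secretTemplate: |
| | +    annotations: |
| | +      # Permit reflection of the generated Secret, limited to the listed namespaces |
| | +      reflector.v1.k8s.emberstack.com/reflection-allowed: "true" |
| | +      reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces: "team-a,team-b" |
| | +      # Create and keep the mirrored copies up to date automatically |
| | +      reflector.v1.k8s.emberstack.com/reflection-auto-enabled: "true" |
| | +      reflector.v1.k8s.emberstack.com/reflection-auto-namespaces: "team-a,team-b" |
| | +  issuerRef: |
| | +    name: letsencrypt-prod |
| | +    kind: ClusterIssuer |
| | +  dnsNames: |
| | +  - "*.example.com" |
| | +  - "example.com" |
| | +``` |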
| 114 | + |
| 115 | +## HTTP to HTTPS Redirect |
| 116 | + |
| 117 | +### Enforce HTTPS by Default |
| 118 | + |
| 119 | +All HTTP traffic should be redirected to HTTPS for security. First, expose an HTTP listener alongside the HTTPS listener on the Gateway: |
| 120 | + |
| 121 | +```yaml |
| 122 | +apiVersion: gateway.networking.k8s.io/v1 |
| 123 | +kind: Gateway |
| 124 | +metadata: |
| 125 | +  name: shared-gateway |
| 126 | +spec: |
| 127 | +  listeners: |
| 128 | +  - name: http |
| 129 | +    protocol: HTTP |
| 130 | +    port: 80 |
| 131 | +  - name: https |
| 132 | +    protocol: HTTPS |
| 133 | +    port: 443 |
| 134 | +    hostname: "*.example.com" |
| 135 | +    tls: |
| 136 | +      mode: Terminate |
| 137 | +      certificateRefs: |
| 138 | +      - name: wildcard-cert |
| 139 | +``` |
| 140 | + |
| 141 | +Then create an HTTPRoute that redirects all HTTP traffic to HTTPS (no hostname restriction): |
| 142 | + |
| 143 | +```yaml |
| 144 | +apiVersion: gateway.networking.k8s.io/v1 |
| 145 | +kind: HTTPRoute |
| 146 | +metadata: |
| 147 | +  name: http-to-https-redirect |
| 148 | +spec: |
| 149 | +  parentRefs: |
| 150 | +  - name: shared-gateway |
| 151 | +    sectionName: http |
| 152 | +  rules: |
| 153 | +  - filters: |
| 154 | +    - type: RequestRedirect |
| 155 | +      requestRedirect: |
| 156 | +        scheme: https |
| 157 | +        statusCode: 301 |
| 158 | +``` |
| 159 | + |
| 160 | +This redirects all HTTP traffic to HTTPS regardless of hostname. |
| 161 | + |
| 162 | +**Alternative:** Use Envoy Gateway's built-in redirect capability if available in your version. |
| 163 | + |
| 164 | +## Observability & Monitoring |
| 165 | + |
| 166 | +### Prometheus Service Discovery Limitation |
| 167 | + |
| 168 | +**Important:** Prometheus does not natively recognize Gateway API route resources (HTTPRoute, GRPCRoute) for service discovery. This affects tools like Blackbox Exporter that rely on Prometheus service discovery. |
| 169 | + |
| 170 | +**Solution: Annotate Services Instead** |
| 171 | + |
| 172 | +For Blackbox Exporter to discover endpoints, annotate the **Service resources** that back your routes: |
| 173 | + |
| 174 | +```yaml |
| 175 | +apiVersion: v1 |
| 176 | +kind: Service |
| 177 | +metadata: |
| 178 | +  name: my-app-service |
| 179 | +  annotations: |
| 180 | +    prometheus.io/scrape: "true" |
| 181 | +    prometheus.io/blackbox: "true" |
| 182 | +    prometheus.io/blackbox-module: "http_2xx" |
| 183 | +    prometheus.io/path: "/health" |
| 184 | +    prometheus.io/port: "8080" |
| 185 | +spec: |
| 186 | +  ports: |
| 187 | +  - port: 8080 |
| 188 | +    targetPort: 8080 |
| 189 | +``` |
| 190 | + |
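| 191 | +Prometheus discovers these annotated Services through its `service` discovery role and hands them to Blackbox Exporter for probing, provided the scrape configuration contains relabeling rules that honor the annotations. This works regardless of whether the Services are exposed via Gateway API or traditional Ingress. |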
| 192 | + |
| 193 | +**Best Practice:** |
| 194 | +- Add annotations to Services in application charts |
| 195 | +- Use consistent annotation patterns for auto-discovery |
| 196 | +- Document annotation requirements in application chart READMEs |
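| | + |
| | +The `prometheus.io/blackbox*` annotations are a convention rather than something Prometheus interprets on its own, so the scrape configuration has to map them onto probe targets. One way this could look is sketched below; the job name and the exporter address `blackbox-exporter.monitoring.svc:9115` are assumptions for illustration. |
| | + |
| | +```yaml |
| | +scrape_configs: |
| | +- job_name: blackbox-services |
| | +  metrics_path: /probe |
| | +  params: |
| | +    module: [http_2xx]       # default module, overridden per Service below |
| | +  kubernetes_sd_configs: |
| | +  - role: service |
| | +  relabel_configs: |
| | +  # Keep only Services annotated with prometheus.io/blackbox: "true" |
| | +  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_blackbox] |
| | +    action: keep |
| | +    regex: "true" |
| | +  # Use the module named in prometheus.io/blackbox-module when present |
| | +  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_blackbox_module] |
| | +    regex: (.+) |
| | +    target_label: __param_module |
| | +  # Pass the discovered Service address to the exporter as the probe target |
| | +  - source_labels: [__address__] |
| | +    target_label: __param_target |
| | +  - source_labels: [__param_target] |
| | +    target_label: instance |
| | +  # Send the scrape itself to Blackbox Exporter (address is an assumption) |
| | +  - target_label: __address__ |
| | +    replacement: blackbox-exporter.monitoring.svc:9115 |
| | +``` |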
| 197 | + |
| 198 | +### Metrics and Tracing |
| 199 | + |
| 200 | +- **Envoy metrics**: Exposed by the Envoy proxy pods in Prometheus format at `/stats/prometheus` (port 19001 by default); port 19000 is the Envoy admin interface |
| 201 | +- **Custom Resource Monitoring**: Configure kube-state-metrics custom resource state monitoring for Gateway API resources (Gateway, HTTPRoute, GRPCRoute) to track resource state and status |
| 202 | +- **Distributed tracing**: Configure OpenTelemetry in Envoy Gateway for request tracing |
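| | + |
| | +For the tracing item, a minimal sketch of an `EnvoyProxy` telemetry section is shown below. It assumes an OpenTelemetry Collector Service named `otel-collector` in a `monitoring` namespace listening on OTLP gRPC port 4317; both names are hypothetical. Older Envoy Gateway releases configured the provider with `host` and `port` instead of `backendRefs`, so check the API reference for your version. |
| | + |
| | +```yaml |
| | +apiVersion: gateway.envoyproxy.io/v1alpha1 |
| | +kind: EnvoyProxy |
| | +metadata: |
| | +  name: tracing-proxy-config   # placeholder name |
| | +  namespace: envoy-gateway-system |
| | +spec: |
| | +  telemetry: |
| | +    tracing: |
| | +      samplingRate: 10         # percentage of requests to trace |
| | +      provider: |
| | +        type: OpenTelemetry |
| | +        backendRefs: |
| | +        - name: otel-collector |
| | +          namespace: monitoring |
| | +          port: 4317 |
| | +``` |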
| 203 | + |
| 204 | +## Route Management Strategy |
| 205 | + |
| 206 | +### Separation of Concerns |
| 207 | + |
| 208 | +**Current Pattern:** |
| 209 | +- **Gateway resources**: Managed in `envoy-gateway` chart (platform team) |
| 210 | +- **HTTPRoute/GRPCRoute resources**: Managed in application charts (application teams) |
| 211 | + |
| 212 | +**Challenge:** |
| 213 | +Adding specific listeners or Gateway configurations for a service requires changes to both: |
| 214 | +1. The `envoy-gateway` chart (to add listener) |
| 215 | +2. The application chart (to create route) |
| 216 | + |
| 217 | +This creates coordination overhead and potential conflicts. |
| 218 | + |
| 219 | +### Future: ListenerSet Functionality |
| 220 | + |
| 221 | +**ListenerSet** (coming soon, see [GEP-1713](https://gateway-api.sigs.k8s.io/geps/gep-1713/)) will allow application teams to define listeners without modifying the Gateway resource. |
| 222 | + |
| 223 | +For the complete specification and YAML examples, see [GEP-1713: ListenerSet](https://gateway-api.sigs.k8s.io/geps/gep-1713/#yaml). |
| 224 | + |
| 225 | +**Current Workaround:** |
| 226 | +- Use shared Gateway with standard listeners (80, 443) |
| 227 | +- Route differentiation via hostnames and paths |
| 228 | +- For special requirements, coordinate Gateway changes through platform team |
| 229 | + |
| 230 | +**Best Practice:** |
| 231 | +- Prefer hostname/path-based routing over custom listeners |
| 232 | +- Document any custom listener requirements |
| 233 | +- Plan for ListenerSet adoption when available |
| 234 | + |
| 235 | +## Security Considerations |
| 236 | + |
| 237 | +### TLS Configuration |
| 238 | + |
| 239 | +- **TLS 1.2 minimum**: Configure minimum TLS version |
| 240 | +- **Strong cipher suites**: Use modern cipher suites only |
| 241 | +- **Certificate rotation**: Ensure certificates auto-renew via cert-manager |
| 242 | +- **mTLS for east-west**: Configure mTLS for service-to-service communication |
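| | + |
| | +In Envoy Gateway, the minimum TLS version and cipher list for client connections are typically set with a `ClientTrafficPolicy` attached to the Gateway. The following is a sketch under the assumption that the target is the `shared-gateway` from earlier; the policy name and the cipher selection are illustrative, not a vetted baseline. |
| | + |
| | +```yaml |
| | +apiVersion: gateway.envoyproxy.io/v1alpha1 |
| | +kind: ClientTrafficPolicy |
| | +metadata: |
| | +  name: tls-settings           # placeholder name |
| | +  namespace: envoy-gateway-system |
| | +spec: |
| | +  targetRefs: |
| | +  - group: gateway.networking.k8s.io |
| | +    kind: Gateway |
| | +    name: shared-gateway |
| | +  tls: |
| | +    minVersion: "1.2" |
| | +    ciphers:                   # applies to TLS 1.2; example selection only |
| | +    - ECDHE-ECDSA-AES128-GCM-SHA256 |
| | +    - ECDHE-RSA-AES128-GCM-SHA256 |
| | +    - ECDHE-ECDSA-CHACHA20-POLY1305 |
| | +    - ECDHE-RSA-CHACHA20-POLY1305 |
| | +``` |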
| 243 | + |
| 244 | +### Rate Limiting |
| 245 | + |
| 246 | +Apply rate limiting at the Gateway or route level with a `BackendTrafficPolicy`: |
| 247 | + |
| 248 | +```yaml |
| 249 | +apiVersion: gateway.envoyproxy.io/v1alpha1 |
| 250 | +kind: BackendTrafficPolicy |
| 251 | +metadata: |
| 252 | +  name: rate-limit-policy |
| 253 | +spec: |
| 254 | +  targetRefs: |
| 255 | +  - group: gateway.networking.k8s.io |
| 256 | +    kind: HTTPRoute |
| 257 | +    name: my-app-route |
| 258 | +  rateLimit: |
| | +    type: Local  # enforced per Envoy proxy instance; Global needs a rate limit service |
| | +    local: |
| 259 | +      rules: |
| 260 | +      - limit: |
| 261 | +          requests: 100 |
| 262 | +          unit: Minute |
| 263 | +``` |
| 264 | + |
| 265 | +### Network Policies |
| 266 | + |
| 267 | +- Restrict Gateway access to only necessary namespaces |
| 268 | +- Use NetworkPolicies to limit east-west traffic |
| 269 | +- Isolate Gateway control plane from workloads |
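| | + |
| | +To illustrate the first two points, the sketch below admits traffic to an application's pods only from the Gateway namespace. The namespace `my-app`, the label `app: my-app`, and port 8080 are assumptions; the `kubernetes.io/metadata.name` label is set automatically on namespaces by Kubernetes. |
| | + |
| | +```yaml |
| | +apiVersion: networking.k8s.io/v1 |
| | +kind: NetworkPolicy |
| | +metadata: |
| | +  name: allow-from-gateway     # placeholder name |
| | +  namespace: my-app |
| | +spec: |
| | +  podSelector: |
| | +    matchLabels: |
| | +      app: my-app |
| | +  policyTypes: |
| | +  - Ingress |
| | +  ingress: |
| | +  # Only pods in the Gateway namespace (the Envoy proxies) may reach the app port |
| | +  - from: |
| | +    - namespaceSelector: |
| | +        matchLabels: |
| | +          kubernetes.io/metadata.name: envoy-gateway-system |
| | +    ports: |
| | +    - protocol: TCP |
| | +      port: 8080 |
| | +``` |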
| 270 | + |
| 271 | +## Troubleshooting |
| 272 | + |
| 273 | +### Common Issues |
| 274 | + |
| 275 | +| Issue | Cause | Solution | |
| 276 | +|-------|-------|----------| |
| 277 | +| Route not accessible | Gateway not ready | Check Gateway status: `kubectl get gateway` | |
| 278 | +| TLS handshake fails | Certificate not issued | Check Certificate CR status: `kubectl get certificate` | |
| 279 | +| 404 errors | Hostname mismatch | Verify that HTTPRoute hostnames match the Gateway listener hostname | |
| 280 | +| DNS not resolving | External-DNS not syncing | Check external-dns logs and Gateway annotations | |
| 281 | +| Metrics missing | Service not annotated | Add Prometheus annotations to Service resources | |
| 282 | + |
| 283 | +### Debugging Commands |
| 284 | + |
| 285 | +```bash |
| 286 | +# Check Gateway status |
| 287 | +kubectl get gateway -A |
| 288 | + |
| 289 | +# Check HTTPRoute status |
| 290 | +kubectl get httproute -A |
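| 291 | + |
| 292 | +# View Envoy configuration through the admin interface (port-forward, since the proxy image may not ship curl) |
| 293 | +kubectl port-forward -n envoy-gateway-system <envoy-pod> 19000:19000 & |
| | +curl -s localhost:19000/config_dump |
| 294 | + |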
| 295 | +# Check certificate status |
| 296 | +kubectl get certificate -A |
| 297 | + |
| 298 | +# View Gateway API events |
| 299 | +kubectl get events --field-selector involvedObject.kind=Gateway |
| 300 | +``` |
| 301 | + |
| 302 | +## References |
| 303 | + |
| 304 | +- [Kubernetes Gateway API Specification](https://gateway-api.sigs.k8s.io/) |
| 305 | +- [Envoy Gateway Documentation](https://gateway.envoyproxy.io/) |
| 306 | +- [Traffic Management Guide](traffic-management.md) - Platform-specific traffic management patterns |
| 307 | +- Repository: `charts/envoy-gateway/` |
| 308 | + |