Version: 1.0 Last Updated: 2025-11-13 Status: Production Ready Audience: Platform Engineers, SRE, Developers
Complete guide to Istio traffic management capabilities including intelligent routing, circuit breaking, retries, timeouts, fault injection, and canary deployments for the Kagenti AI Agent Platform.
- Overview
- Traffic Routing
- Resiliency Patterns
- Canary Deployments
- Fault Injection
- Traffic Mirroring
- Request Routing
- Timeout and Retry
- Circuit Breaking
- Load Balancing
- Troubleshooting
- Best Practices
- Alternatives
- Next Steps
- References
Purpose: Configure advanced traffic management capabilities in Istio to achieve intelligent routing, high availability, and gradual rollouts for AI agents and platform services.
What You Get:
- ✅ Intelligent request routing (header-based, path-based, weight-based)
- ✅ Automatic retries with exponential backoff
- ✅ Request timeouts to prevent cascading failures
- ✅ Circuit breaking to isolate failing services
- ✅ Canary deployments for gradual rollouts
- ✅ Fault injection for chaos engineering
- ✅ Traffic mirroring for testing
- ✅ Load balancing algorithms (round-robin, least-conn, consistent hashing)
Key Principle: Istio's traffic management operates at Layer 7 (HTTP), enabling intelligent routing decisions based on request content, unlike traditional load balancers that operate at Layer 4 (TCP).
Source: Based on Istio Traffic Management, Istio Best Practices
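To see what "Layer 7" buys you, compare the two decision inputs: an L4 balancer sees only the TCP connection tuple, while an L7 router can branch on the parsed HTTP request. A toy illustration (the path prefixes mirror the HTTPRoute example later in this guide):

```python
def l4_route(connection):
    # Layer 4: only the TCP tuple (src/dst IP and port) is visible, so every
    # connection to the same port must go to the same pool
    return "shared-pool"

def l7_route(request):
    # Layer 7: the parsed HTTP request is visible, so routing can branch on
    # path (or headers, method, ...)
    if request["path"].startswith("/research"):
        return "research-agent"
    if request["path"].startswith("/code"):
        return "code-agent"
    return "orchestrator-agent"

print(l7_route({"path": "/research/task/42"}))  # research-agent
```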
The Kagenti platform uses Gateway API HTTPRoute as its primary routing mechanism, with Istio VirtualService for advanced features.
Comparison:
| Feature | HTTPRoute (Gateway API) | VirtualService (Istio) |
|---|---|---|
| Standard | Kubernetes SIG standard | Istio-specific |
| Use Case | External ingress routing | Service-to-service routing |
| Features | Basic routing, rewrites, redirects | Advanced: retries, timeouts, fault injection |
| Future | ✅ Recommended for new features | Maintained; still required for mesh-internal features |
Recommendation: Use HTTPRoute for external ingress, VirtualService for internal service mesh routing.
Source: Gateway API Overview, Istio Traffic Management
Route traffic based on URL path:
```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: agent-routes
  namespace: team1
spec:
  parentRefs:
    - name: kagenti-gateway
      namespace: istio-system
  hostnames:
    - "agents.localtest.me"
  rules:
    # Route /research to research-agent
    - matches:
        - path:
            type: PathPrefix
            value: /research
      backendRefs:
        - name: research-agent
          port: 8080
    # Route /code to code-agent
    - matches:
        - path:
            type: PathPrefix
            value: /code
      backendRefs:
        - name: code-agent
          port: 8080
    # Route /orchestrate to orchestrator
    - matches:
        - path:
            type: PathPrefix
            value: /orchestrate
      backendRefs:
        - name: orchestrator-agent
          port: 8080
```

Source: HTTPRoute Path Matching
Route traffic based on HTTP headers (e.g., API version):
```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: versioned-api-routes
  namespace: team1
spec:
  parentRefs:
    - name: kagenti-gateway
      namespace: istio-system
  hostnames:
    - "api.localtest.me"
  rules:
    # Route v2 API requests to new backend
    - matches:
        - headers:
            - name: api-version
              value: v2
      backendRefs:
        - name: research-agent-v2
          port: 8080
    # Default to v1 API
    - backendRefs:
        - name: research-agent
          port: 8080
```

Use Case: API versioning, A/B testing, feature flags
Source: HTTPRoute Header Matching
Route traffic to different versions based on weight:
```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: research-agent-vs
  namespace: team1
spec:
  hosts:
    - research-agent.team1.svc.cluster.local
  http:
    - match:
        - headers:
            x-user-type:
              exact: "beta-tester"
      route:
        - destination:
            host: research-agent.team1.svc.cluster.local
            subset: v2
          weight: 100
    # Default traffic goes to v1
    - route:
        - destination:
            host: research-agent.team1.svc.cluster.local
            subset: v1
          weight: 100
```

DestinationRule (defines subsets):
```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: research-agent-dr
  namespace: team1
spec:
  host: research-agent.team1.svc.cluster.local
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
```

Source: Istio VirtualService
Configure retries for transient failures:
```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: research-agent-retry
  namespace: team1
spec:
  hosts:
    - research-agent.team1.svc.cluster.local
  http:
    - route:
        - destination:
            host: research-agent.team1.svc.cluster.local
      retries:
        attempts: 3          # Retry up to 3 times
        perTryTimeout: 2s    # Timeout per attempt
        retryOn: 5xx,reset,refused-stream,retriable-4xx
```

Retry Conditions:
- `5xx`: server errors (500, 502, 503, 504)
- `reset`: connection reset
- `refused-stream`: HTTP/2 REFUSED_STREAM received
- `retriable-4xx`: retriable 4xx responses (currently only 409 Conflict)
Source: Istio Retries
Prevent long-running requests from blocking resources:
```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: research-agent-timeout
  namespace: team1
spec:
  hosts:
    - research-agent.team1.svc.cluster.local
  http:
    - route:
        - destination:
            host: research-agent.team1.svc.cluster.local
      timeout: 10s   # Total timeout for the request
```

Use Case: Prevent slow AI inference from blocking other requests
Source: Istio Timeouts
Limit concurrent connections to prevent overwhelming backend:
```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: research-agent-circuit-breaker
  namespace: team1
spec:
  host: research-agent.team1.svc.cluster.local
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100            # Max TCP connections
      http:
        http1MaxPendingRequests: 10    # Max pending HTTP/1.1 requests
        http2MaxRequests: 100          # Max concurrent HTTP/2 requests
        maxRequestsPerConnection: 2    # Max requests per connection (HTTP/1.1)
```

Why:
- Prevents resource exhaustion: Limits concurrent connections
- Protects backend: Prevents overload during traffic spikes
- Fast fail: Returns 503 immediately when circuit is open
Source: Istio Circuit Breaking
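The fast-fail behavior can be modeled in a few lines: a bounded semaphore caps concurrent requests, and anything over the cap is rejected immediately instead of queuing. This is a simplified sketch of the idea, not Envoy's actual implementation:

```python
import threading

class ConnectionLimiter:
    """Rejects requests over the cap immediately, mimicking circuit-breaker fast-fail."""

    def __init__(self, max_requests: int):
        self._slots = threading.BoundedSemaphore(max_requests)

    def try_acquire(self) -> bool:
        # Non-blocking: returns False (a 503 in Envoy terms) when the pool is full
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        self._slots.release()

limiter = ConnectionLimiter(max_requests=2)
results = [limiter.try_acquire() for _ in range(3)]
print(results)  # [True, True, False] — third request is rejected instantly
```

The key property is that rejection takes microseconds rather than tying up a connection slot while the backend struggles.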
Automatically remove failing instances from load balancing pool:
```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: research-agent-outlier
  namespace: team1
spec:
  host: research-agent.team1.svc.cluster.local
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 5    # Eject after 5 consecutive 5xx errors
      interval: 30s              # Check every 30 seconds
      baseEjectionTime: 30s      # Eject for at least 30 seconds
      maxEjectionPercent: 50     # Eject at most 50% of instances
      minHealthPercent: 25       # Keep at least 25% of instances in the pool
```

How It Works:
- Istio tracks errors per backend instance
- After 5 consecutive errors, instance is ejected from pool
- Instance is ejected for 30 seconds (increases with repeated ejections)
- After ejection time, instance is re-added to pool
Source: Istio Outlier Detection
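Point 3 above (ejection time grows with repeated ejections) follows Envoy's rule: the ejection duration is the base time multiplied by the number of times the host has been ejected, capped by a maximum (300s by default). The arithmetic, as a sketch:

```python
def ejection_duration(base_seconds: int, times_ejected: int,
                      max_seconds: int = 300) -> int:
    """Envoy-style outlier ejection: duration grows linearly with each
    ejection, capped at max_ejection_time (300s by default)."""
    return min(base_seconds * times_ejected, max_seconds)

print(ejection_duration(30, 1))   # 30  — first ejection uses baseEjectionTime
print(ejection_duration(30, 3))   # 90  — repeat offenders stay out longer
print(ejection_duration(30, 20))  # 300 — capped at the maximum
```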
Incrementally shift traffic from v1 to v2:
Step 1: Deploy v2 (0% traffic)
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: research-agent-v2
  namespace: team1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: research-agent
      version: v2
  template:
    metadata:
      labels:
        app: research-agent
        version: v2
    spec:
      containers:
        - name: agent
          image: localhost:5000/research-agent:v0.0.16
```

Step 2: Route 10% traffic to v2
```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: research-agent-canary
  namespace: team1
spec:
  hosts:
    - research-agent.team1.svc.cluster.local
  http:
    - route:
        - destination:
            host: research-agent.team1.svc.cluster.local
            subset: v1
          weight: 90   # 90% to v1
        - destination:
            host: research-agent.team1.svc.cluster.local
            subset: v2
          weight: 10   # 10% to v2 (canary)
```

Step 3: Monitor metrics
```bash
# Check error rate for v2
kubectl exec -n observability deploy/prometheus -- \
  promtool query instant 'http://localhost:9090' \
  'rate(istio_requests_total{destination_version="v2",response_code=~"5.."}[5m])'

# Check latency for v2
kubectl exec -n observability deploy/prometheus -- \
  promtool query instant 'http://localhost:9090' \
  'histogram_quantile(0.99, rate(istio_request_duration_milliseconds_bucket{destination_version="v2"}[5m]))'
```

Step 4: Gradually increase traffic (50%)
```yaml
http:
  - route:
      - destination:
          host: research-agent.team1.svc.cluster.local
          subset: v1
        weight: 50
      - destination:
          host: research-agent.team1.svc.cluster.local
          subset: v2
        weight: 50
```

Step 5: Complete migration (100% to v2)
```yaml
http:
  - route:
      - destination:
          host: research-agent.team1.svc.cluster.local
          subset: v2
        weight: 100
```

Step 6: Decommission v1

```bash
kubectl delete deployment research-agent-v1 -n team1
```

Source: Istio Traffic Shifting
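As an illustration of what the weights mean, here is a toy model of weighted routing: each request draws a number in [0, 100) and goes to the first subset whose cumulative weight covers it. (Envoy's actual algorithm is a weighted scheduler, not per-request random draws; this only demonstrates the resulting traffic proportions.)

```python
import random

def pick_subset(weights: dict, rng: random.Random) -> str:
    """Route one request according to subset weights that sum to 100."""
    point, cumulative = rng.uniform(0, 100), 0.0
    for subset, weight in weights.items():
        cumulative += weight
        if point < cumulative:
            return subset
    return subset  # fallback for floating-point edge cases

rng = random.Random(42)  # seeded so the demo is repeatable
counts = {"v1": 0, "v2": 0}
for _ in range(10_000):
    counts[pick_subset({"v1": 90, "v2": 10}, rng)] += 1
print(counts)  # roughly 9000 requests to v1, 1000 to v2
```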
Inject latency to test timeout handling:
```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: research-agent-delay
  namespace: team1
spec:
  hosts:
    - research-agent.team1.svc.cluster.local
  http:
    - fault:
        delay:
          percentage:
            value: 10.0    # Inject delay for 10% of requests
          fixedDelay: 5s   # Delay by 5 seconds
      route:
        - destination:
            host: research-agent.team1.svc.cluster.local
```

Use Case: Test how the orchestrator handles slow agent responses
Source: Istio Fault Injection
Inject HTTP errors to test error handling:
```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: research-agent-abort
  namespace: team1
spec:
  hosts:
    - research-agent.team1.svc.cluster.local
  http:
    - fault:
        abort:
          percentage:
            value: 5.0       # Inject error for 5% of requests
          httpStatus: 503    # Return HTTP 503
      route:
        - destination:
            host: research-agent.team1.svc.cluster.local
```

Use Case: Test retry logic and circuit breaker behavior
Send copy of production traffic to test environment without affecting users:
```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: research-agent-mirror
  namespace: team1
spec:
  hosts:
    - research-agent.team1.svc.cluster.local
  http:
    - route:
        - destination:
            host: research-agent.team1.svc.cluster.local
            subset: v1
          weight: 100
      mirror:
        host: research-agent.team1.svc.cluster.local
        subset: v2-test      # Mirror to test version
      mirrorPercentage:
        value: 10.0          # Mirror 10% of traffic
```

How It Works:
- Primary request goes to v1 (production)
- Copy of 10% of requests goes to v2-test
- v2-test response is ignored (fire-and-forget)
- Users see only v1 response
Use Case: Test new agent version with real traffic without risk
Source: Istio Traffic Mirroring
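The fire-and-forget semantics can be sketched as follows: the primary call's result is returned to the caller, while the mirror call runs on a background executor and its result is discarded. This is a conceptual model, not how Envoy implements mirroring:

```python
from concurrent.futures import ThreadPoolExecutor

_mirror_pool = ThreadPoolExecutor(max_workers=4)

def handle(request, primary, mirror):
    """Return the primary backend's response; copy the request to the mirror
    as fire-and-forget (its response and any error are ignored)."""
    _mirror_pool.submit(mirror, request)
    return primary(request)  # only this response reaches the user

mirrored = []
response = handle(
    {"path": "/research"},
    primary=lambda r: ("v1", 200),
    mirror=lambda r: mirrored.append(r),
)
_mirror_pool.shutdown(wait=True)  # demo only: wait so we can inspect the mirror
print(response)  # ('v1', 200)
print(mirrored)  # [{'path': '/research'}]
```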
Rewrite request path before routing:
```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: agent-rewrite
  namespace: team1
spec:
  parentRefs:
    - name: kagenti-gateway
      namespace: istio-system
  hostnames:
    - "api.localtest.me"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1/research
      filters:
        - type: URLRewrite
          urlRewrite:
            path:
              type: ReplacePrefixMatch
              replacePrefixMatch: /a2a/task   # Rewrite /v1/research to /a2a/task
      backendRefs:
        - name: research-agent
          port: 8080
```

Use Case: API versioning without changing the agent implementation
Source: HTTPRoute URL Rewrite
Add, modify, or remove request headers:
```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: agent-headers
  namespace: team1
spec:
  parentRefs:
    - name: kagenti-gateway
      namespace: istio-system
  rules:
    - filters:
        - type: RequestHeaderModifier
          requestHeaderModifier:
            add:
              # Header values are static strings; per-request IDs are not
              # templated here (Envoy adds x-request-id automatically)
              - name: X-Agent-Platform
                value: kagenti
            set:
              - name: X-Forwarded-Proto
                value: https
            remove:
              - X-Internal-Debug   # Remove internal headers
      backendRefs:
        - name: research-agent
          port: 8080
```

Source: HTTPRoute Header Filters
```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: research-agent-resilient
  namespace: team1
spec:
  hosts:
    - research-agent.team1.svc.cluster.local
  http:
    - route:
        - destination:
            host: research-agent.team1.svc.cluster.local
      timeout: 15s          # Total timeout including retries
      retries:
        attempts: 3
        perTryTimeout: 5s   # 5s per attempt (3 attempts = max 15s)
        retryOn: 5xx,reset,refused-stream
```

Calculation:
- Total timeout: 15s
- Per-try timeout: 5s
- Attempts: 3
- Maximum time: min(timeout, perTryTimeout * attempts) = min(15s, 15s) = 15s
Source: Istio Timeout and Retry
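The calculation above can be written down directly: the effective deadline is whichever is smaller, the route timeout or the sum of per-try timeouts (retry back-off delays, which add a little extra, are ignored here):

```python
def worst_case_seconds(timeout: float, per_try_timeout: float, attempts: int) -> float:
    """Upper bound on request duration for an Istio route with retries
    (back-off between attempts is ignored for simplicity)."""
    return min(timeout, per_try_timeout * attempts)

print(worst_case_seconds(timeout=15, per_try_timeout=5, attempts=3))  # 15
print(worst_case_seconds(timeout=8, per_try_timeout=5, attempts=3))   # 8 — route timeout wins
```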
```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: research-agent-advanced-cb
  namespace: team1
spec:
  host: research-agent.team1.svc.cluster.local
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
        connectTimeout: 30s
      http:
        http1MaxPendingRequests: 10
        http2MaxRequests: 100
        maxRequestsPerConnection: 2
        h2UpgradePolicy: UPGRADE      # Allow HTTP/1.1 to HTTP/2 upgrade
    outlierDetection:
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
      minHealthPercent: 25
      consecutiveGatewayErrors: 3     # Eject after 3 gateway errors (502, 503, 504)
      consecutive5xxErrors: 5         # Eject after 5 consecutive 5xx errors
```

Source: Istio Connection Pool
Round Robin (default):
```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: research-agent-lb-rr
  namespace: team1
spec:
  host: research-agent.team1.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      simple: ROUND_ROBIN
```

Least Requests (fewest active requests):

```yaml
trafficPolicy:
  loadBalancer:
    simple: LEAST_REQUEST   # LEAST_CONN is a deprecated alias
```

Random:

```yaml
trafficPolicy:
  loadBalancer:
    simple: RANDOM
```

Consistent Hash (session affinity):

```yaml
trafficPolicy:
  loadBalancer:
    consistentHash:
      httpHeaderName: "x-user-id"   # Hash based on user ID header
```

Consistent Hash (cookie-based):

```yaml
trafficPolicy:
  loadBalancer:
    consistentHash:
      httpCookie:
        name: session-id
        ttl: 3600s
```

Source: Istio Load Balancing
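To make the consistent-hash idea concrete, here is a deliberately simplified model: hash the header value and map it onto a backend, so the same `x-user-id` always lands on the same pod while the pool is stable. (Envoy actually uses ring hashing or Maglev, which also minimizes reshuffling when pods come and go; the plain modulo below does not.)

```python
import hashlib

def pick_backend(header_value: str, backends: list) -> str:
    """Stable header-based backend selection (toy stand-in for consistent hashing)."""
    digest = hashlib.sha256(header_value.encode()).hexdigest()
    return backends[int(digest, 16) % len(backends)]

pods = ["research-agent-0", "research-agent-1", "research-agent-2"]
first = pick_backend("user-123", pods)
second = pick_backend("user-123", pods)
print(first == second)  # True: the same user always hits the same pod
```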
Symptoms: 5xx errors increase after canary deployment
Diagnosis:
```bash
# Check error rate by version
kubectl exec -n observability deploy/prometheus -- \
  promtool query instant 'http://localhost:9090' \
  'sum(rate(istio_requests_total{response_code=~"5.."}[5m])) by (destination_version)'

# Check that the destination rule is applied
kubectl get destinationrule research-agent-dr -n team1 -o yaml
```

Root Cause: v2 pods not ready, receiving traffic before initialization completes
Solution:
```yaml
# Add a readiness probe to the v2 deployment
spec:
  template:
    spec:
      containers:
        - name: agent
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
```

Source: Kubernetes Probes
Symptoms: Backend overloaded despite circuit breaker configuration
Diagnosis:
```bash
# Check connection pool metrics
kubectl exec -n observability deploy/prometheus -- \
  promtool query instant 'http://localhost:9090' \
  'istio_tcp_connections_opened_total{destination_service="research-agent.team1.svc.cluster.local"}'

# Check circuit breaker stats
istioctl proxy-config clusters deploy/orchestrator-agent -n team1 | grep research-agent
```

Root Cause: Connection pool limits set higher than the backend can handle
Solution:
```yaml
# Lower connection limits
trafficPolicy:
  connectionPool:
    http:
      http2MaxRequests: 50   # Reduced from 100
```

Symptoms: AI agent processes the same request multiple times
Diagnosis:
```bash
# Check retry metrics
kubectl exec -n observability deploy/prometheus -- \
  promtool query instant 'http://localhost:9090' \
  'istio_requests_total{response_flags=~".*R.*"}'
```

Root Cause: Non-idempotent operations retried on transient errors
Solution:
```yaml
# Retry only failures that occur before the request reaches the application
# (Istio does not filter retries by HTTP method, so narrow the conditions instead)
retries:
  attempts: 3
  retryOn: connect-failure,refused-stream
  retryRemoteLocalities: false
```

Or implement idempotency keys in the agent:
```python
from flask import Flask, jsonify, request

app = Flask(__name__)
_responses = {}  # naive in-memory cache; use a shared store (e.g. Redis) in production

@app.route('/a2a/task', methods=['POST'])
def create_task():
    key = request.headers.get('X-Idempotency-Key')
    if key and key in _responses:          # duplicate delivery: return cached response
        return _responses[key]
    response = jsonify(status="accepted")  # ... process task ...
    if key:
        _responses[key] = response
    return response
```

✅ Good: Set timeout for agent-to-agent calls
```yaml
http:
  - route:
      - destination:
          host: research-agent.team1.svc.cluster.local
      timeout: 30s   # LLM inference can be slow
```

❌ Avoid: No timeout (blocks forever on hang)
```yaml
http:
  - route:
      - destination:
          host: research-agent.team1.svc.cluster.local
      # No timeout configured
```

Why: Prevents cascading failures when a downstream service hangs
✅ Good: Circuit breaker with outlier detection
```yaml
trafficPolicy:
  connectionPool:
    http:
      http2MaxRequests: 100
  outlierDetection:
    consecutive5xxErrors: 5
    baseEjectionTime: 30s
```

❌ Avoid: No circuit breaker

```yaml
# No traffic policy configured
```

Why: Protects the backend from overload and prevents resource exhaustion
✅ Good: Gradual traffic shift with monitoring
```yaml
http:
  - route:
      - destination:
          host: research-agent.team1.svc.cluster.local
          subset: v1
        weight: 90
      - destination:
          host: research-agent.team1.svc.cluster.local
          subset: v2
        weight: 10   # Start with 10%
```

❌ Avoid: Big bang deployment (100% at once)

```bash
kubectl set image deploy/research-agent agent=research-agent:v2
# All traffic immediately goes to v2
```

Why: Limits the blast radius if v2 has issues
✅ Good: Test retry logic before production
```yaml
# In a test environment
fault:
  abort:
    percentage:
      value: 20.0
    httpStatus: 503
```

❌ Avoid: Assume retries work without testing

```yaml
retries:
  attempts: 3
# Never tested with real failures
```

Why: Discovers retry configuration issues before they reach production
Process:
- Install Linkerd control plane
- Annotate pods for injection
- Use ServiceProfile for retries/timeouts
Pros:
- ✅ Simpler than Istio (smaller resource footprint)
- ✅ Automatic retries for all HTTP requests
- ✅ Great observability out-of-box
Cons:
- ❌ Less feature-rich than Istio
- ❌ Gateway API support is newer and narrower than Istio's
- ❌ Smaller ecosystem
When to Use: Resource-constrained environments, need simplicity
Source: Linkerd Traffic Management
Process:
- Deploy Consul agents
- Configure service intentions
- Use L7 traffic management
Pros:
- ✅ Strong service discovery
- ✅ Multi-datacenter support
- ✅ Integrated with HashiCorp stack
Cons:
- ❌ Requires Consul infrastructure
- ❌ Less Kubernetes-native than Istio
- ❌ Steeper learning curve
When to Use: Already using HashiCorp stack, multi-DC requirements
Source: Consul Service Mesh
Process:
- Add Resilience4j dependency to agent code
- Configure circuit breaker, retry in code
- No service mesh required
Pros:
- ✅ No infrastructure overhead
- ✅ Fine-grained control
- ✅ Language-specific optimizations
Cons:
- ❌ Must implement in every microservice
- ❌ No unified observability
- ❌ Requires code changes for updates
When to Use: Single-language environment, minimal infrastructure
Source: Resilience4j
- Istio Traffic Management Concepts - Deep dive into Istio routing
- Gateway API Guide - HTTPRoute configuration
- Istio Service Mesh Guide - Istio installation and basics
- Implement Gradual Rollouts
  - Benefit: Safer deployments with automated rollback
  - Effort: 2-3 days (set up Flagger or Argo Rollouts)
  - Priority: High
- Add Traffic Mirroring for Testing
  - Benefit: Test new agent versions with real traffic
  - Effort: 1 day (configure VirtualService)
  - Priority: Medium
- Implement Service-Level Objectives (SLOs)
  - Benefit: Quantify service reliability
  - Effort: 3-5 days (define SLIs, implement monitoring)
  - Priority: High
- Flagger: Automated canary deployments with metrics analysis
- Argo Rollouts: Progressive delivery for Kubernetes
- Istio Traffic Management: istio.io/latest/docs/concepts/traffic-management
- Istio Best Practices: istio.io/latest/docs/ops/best-practices/traffic-management
- Gateway API: gateway-api.sigs.k8s.io
- VirtualService Reference: istio.io/latest/docs/reference/config/networking/virtual-service
- DestinationRule Reference: istio.io/latest/docs/reference/config/networking/destination-rule
- Istio Traffic Management Tasks: istio.io/latest/docs/tasks/traffic-management
- Circuit Breaking: istio.io/latest/docs/tasks/traffic-management/circuit-breaking
- Fault Injection: istio.io/latest/docs/tasks/traffic-management/fault-injection
- Traffic Mirroring: istio.io/latest/docs/tasks/traffic-management/mirroring
- Istio Service Mesh Guide - Installation and configuration
- Gateway API Guide - HTTPRoute setup
- Prometheus Metrics - Monitoring traffic management
- Istio Version: 1.20+
- Gateway API Version: v1
- Kubernetes Version: 1.28+
Last Updated: 2025-11-13 Document Version: 1.0 Maintained By: Platform Engineering Team License: Apache 2.0