Lightweight Kubernetes metrics collector for Cased observability. Runs as a DaemonSet on every node, collecting container metrics, Kubernetes events, and optionally HTTP traffic via eBPF.

Install with the script:

    curl -fsSL https://raw.githubusercontent.com/cased/cased-agent/main/install.sh | bash -s -- \
      --api-key YOUR_CASED_API_KEY \
      --cluster-id prod

Or with Helm:

    helm install cased-agent oci://ghcr.io/cased/charts/cased-agent \
      --namespace cased-system \
      --create-namespace \
      --set apiKey=YOUR_CASED_API_KEY \
      --set clusterId=prod

Or with kubectl:

    kubectl apply -f https://raw.githubusercontent.com/cased/cased-agent/main/deploy/manifests/install.yaml
    kubectl -n cased-system create secret generic cased-agent --from-literal=api-key=YOUR_API_KEY
    kubectl -n cased-system set env daemonset/cased-agent CASED_CLUSTER_ID=prod

Node metrics:

- CPU usage (user, system, idle, iowait)
- Memory (total, used, available, cached, swap)
- Network I/O (bytes, packets, errors per interface)

Container metrics:

- CPU usage percentage
- CPU throttling (throttle percent, throttled time)
- Memory usage, limit, percentage
- Memory breakdown (RSS, cache, swap)
- Network throughput (rx/tx bytes per second)
- Disk I/O (read/write bytes per second)

Kubernetes events:

- OOM kills
- Pod evictions
- Failed scheduling
- CrashLoopBackOff
- All Warning events

HTTP traffic (eBPF, optional):

- Request count
- Error rate (4xx/5xx)
- Latency percentiles (P50, P95, P99)
- Per-path breakdowns

Traces (OpenTelemetry receiver, optional):

- Span count by service
- Trace error rate
- Duration percentiles (P50, P95, P99)

Metadata:

- Cluster ID
- Node name
- Namespace
- Pod name and UID
- Container name
- Labels
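
The identifying fields listed above (cluster ID, node name, namespace, pod name and UID, container name, labels) can be pictured as tags carried by each metric sample. The following is only an illustrative sketch: the class names, field names, and the resulting dictionary layout are assumptions and do not describe the agent's actual wire format.

```python
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class MetricTags:
    """Identifying metadata for a sample (field names are hypothetical)."""
    cluster_id: str
    node_name: str
    namespace: str = ""
    pod_name: str = ""
    pod_uid: str = ""
    container_name: str = ""
    labels: dict = field(default_factory=dict)


@dataclass(frozen=True)
class MetricSample:
    """One measurement plus the tags that say where it came from."""
    name: str
    value: float
    unit: str
    tags: MetricTags


sample = MetricSample(
    name="container.cpu.usage_percent",
    value=42.5,
    unit="percent",
    tags=MetricTags(
        cluster_id="prod",
        node_name="node-1",          # placeholder values for illustration
        namespace="default",
        pod_name="web-0",
        pod_uid="hypothetical-uid",
        container_name="web",
        labels={"app": "web"},
    ),
)

# asdict() recursively converts nested dataclasses into plain dictionaries.
record = asdict(sample)
```

Node-level samples would leave the pod and container fields empty, which is why they default to empty strings in this sketch.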

| Flag | Env Var | Default | Description |
|---|---|---|---|
| `--endpoint` | `CASED_API_ENDPOINT` | `https://app.cased.com` | Cased API endpoint |
| `--api-key` | `CASED_API_KEY` | - | API key (required) |
| `--cluster-id` | `CASED_CLUSTER_ID` | - | Cluster identifier (required) |
| `--node-name` | `NODE_NAME` | - | Node name |
| `--interval` | - | `15s` | Collection interval |
| `--batch-size` | - | `100` | Max metrics per batch |
| `--enable-ebpf` | `ENABLE_EBPF` | `false` | Enable eBPF HTTP tracing |
| `--enable-otel` | `ENABLE_OTEL` | `false` | Enable OpenTelemetry receiver |
| `--otel-port` | - | `4318` | Port for OTLP HTTP receiver |
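
To make the relationship between flags, environment variables, and defaults concrete, here is a minimal sketch of how such a table could be resolved. It assumes a precedence of flag, then environment variable, then default, which the table does not state. The function name `resolve_config` is hypothetical, every flag takes an explicit value here (including the boolean ones), and values are kept as strings.

```python
import argparse
import os

# (flag, env var or None, default or None), copied from the table above.
OPTIONS = [
    ("--endpoint", "CASED_API_ENDPOINT", "https://app.cased.com"),
    ("--api-key", "CASED_API_KEY", None),
    ("--cluster-id", "CASED_CLUSTER_ID", None),
    ("--node-name", "NODE_NAME", None),
    ("--interval", None, "15s"),
    ("--batch-size", None, "100"),
    ("--enable-ebpf", "ENABLE_EBPF", "false"),
    ("--enable-otel", "ENABLE_OTEL", "false"),
    ("--otel-port", None, "4318"),
]
REQUIRED = ("api_key", "cluster_id")


def resolve_config(argv, env=None):
    """Resolve each option as flag > env var > default (assumed precedence)."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="cased-agent-sketch")
    for flag, _, _ in OPTIONS:
        parser.add_argument(flag, default=None)
    args = vars(parser.parse_args(argv))

    config = {}
    for flag, env_var, default in OPTIONS:
        key = flag.lstrip("-").replace("-", "_")  # "--api-key" -> "api_key"
        value = args[key]
        if value is None and env_var is not None:
            value = env.get(env_var)
        config[key] = default if value is None else value

    missing = [key for key in REQUIRED if not config[key]]
    if missing:
        raise ValueError("missing required options: " + ", ".join(missing))
    return config


config = resolve_config(
    ["--cluster-id", "staging"],
    env={"CASED_API_KEY": "YOUR_CASED_API_KEY", "CASED_CLUSTER_ID": "prod"},
)
```

In this example the flag value `staging` overrides `CASED_CLUSTER_ID=prod`, the API key comes from the environment, and `interval` falls back to `15s`.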

    ┌─────────────────────────────────────────────────────────────────────┐
    │                           Kubernetes Node                           │
    │                                                                     │
    │  ┌───────────────────────────────────────────────────────────────┐  │
    │  │                          cased-agent                          │  │
    │  │                                                               │  │
    │  │     ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │  │
    │  │     │   /proc/*    │  │   /sys/fs/   │  │  Kubernetes  │      │  │
    │  │     │    (node)    │  │   cgroup/*   │  │     API      │      │  │
    │  │     └──────┬───────┘  └──────┬───────┘  └──────┬───────┘      │  │
    │  │            │                 │                 │              │  │
    │  │     ┌──────┴─────────────────┴─────────────────┴───────┐      │  │
    │  │     │                  Core Collector                  │      │  │
    │  │     │      CPU, Memory, Network, Disk, K8s Events      │      │  │
    │  │     └────────────────────────┬─────────────────────────┘      │  │
    │  │                              │                                │  │
    │  │            ┌─────────────────┼─────────────────┐              │  │
    │  │            │                 │                 │              │  │
    │  │            ▼                 ▼                 ▼              │  │
    │  │     ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │  │
    │  │     │     eBPF     │  │     OTel     │  │   Batch &    │      │  │
    │  │     │  HTTP Trace  │  │   Receiver   │  │     Send     │      │  │
    │  │     │  (optional)  │  │    :4318     │  │              │      │  │
    │  │     └──────────────┘  └──────────────┘  └──────┬───────┘      │  │
    │  │                                                │              │  │
    │  └────────────────────────────────────────────────┼──────────────┘  │
    │                                                   │                 │
    └───────────────────────────────────────────────────┼─────────────────┘
                                                        │ HTTPS POST
                                                        ▼
                                               ┌─────────────────┐
                                               │    Cased API    │
                                               │    /api/v1/     │
                                               │    telemetry/   │
                                               │    metrics      │
                                               └─────────────────┘

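
The "Batch & Send" stage in the diagram can be sketched roughly as follows: collected metrics are split into batches (100 per batch by default, per `--batch-size`) and each batch becomes one HTTPS POST to `/api/v1/telemetry/metrics`. The JSON body shape, the `Authorization` header, and the helper names `make_batches` and `build_request` are assumptions for illustration. The real request format is not documented here.

```python
import json
import urllib.request


def make_batches(metrics, batch_size=100):
    """Split a list of metrics into lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    return [metrics[i:i + batch_size] for i in range(0, len(metrics), batch_size)]


def build_request(endpoint, api_key, cluster_id, batch):
    """Build (but do not send) one POST request for a batch."""
    # Assumed payload and auth scheme; adjust to the actual API contract.
    body = json.dumps({"cluster_id": cluster_id, "metrics": batch}).encode("utf-8")
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/api/v1/telemetry/metrics",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
    )


metrics = [
    {"name": "container.memory.usage", "value": float(i), "unit": "bytes"}
    for i in range(250)
]
batches = make_batches(metrics)
requests_to_send = [
    build_request("https://app.cased.com", "YOUR_CASED_API_KEY", "prod", batch)
    for batch in batches
]
# Sending is left out of this sketch; urllib.request.urlopen(request, timeout=10)
# would perform the actual HTTPS POST for each request.
```

With 250 metrics and the default batch size, this produces three requests carrying 100, 100, and 50 metrics.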

To send traces to the agent's OTel receiver, configure your application's OTLP exporter:

    # Environment variables for OTLP
    export OTEL_EXPORTER_OTLP_ENDPOINT=http://cased-agent:4318
    export OTEL_EXPORTER_OTLP_PROTOCOL=http/json

Or in code:

    # Python example
    from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

    exporter = OTLPSpanExporter(endpoint="http://cased-agent:4318/v1/traces")

To run the agent locally with sample workloads:

    # Start with sample workloads
    docker compose up --build

    # View agent logs
    docker compose logs agent -f

The compose file includes:

- agent: The metrics collector
- workload: CPU/memory stress test
- web: Sample HTTP server with random errors
- traffic: Generates HTTP traffic to the web service

To build from source:

    # Build (without eBPF)
    CGO_ENABLED=0 go build -o cased-agent .

    # Build with eBPF support (Linux only)
    clang -O2 -g -target bpf -c ebpf/http_trace.c -o ebpf/http_trace.o
    CGO_ENABLED=1 go build -o cased-agent .

| Metric | Unit | Description |
|---|---|---|
| `container.cpu.usage_percent` | percent | CPU utilization |
| `container.cpu.throttle_percent` | percent | CPU throttling rate |
| `container.cpu.throttled_time` | ms/sec | Time spent throttled |
| `container.memory.usage` | bytes | Current memory usage |
| `container.memory.limit` | bytes | Memory limit |
| `container.memory.usage_percent` | percent | Memory utilization |
| `container.memory.rss` | bytes | Resident Set Size |
| `container.memory.cache` | bytes | Page cache |
| `container.memory.swap` | bytes | Swap usage |
| `container.disk.read_bytes_per_sec` | bytes/sec | Disk read throughput |
| `container.disk.write_bytes_per_sec` | bytes/sec | Disk write throughput |
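
As an illustration of how the two throttling metrics could be derived, the sketch below takes two snapshots of a cgroup v2 `cpu.stat` file and compares them. It assumes that throttle percent means throttled periods divided by elapsed periods, and that throttled time is reported as milliseconds throttled per second of wall time. The agent's actual formulas are not documented here, and the function names are hypothetical.

```python
def parse_cpu_stat(text):
    """Parse cgroup v2 cpu.stat content into a dict of integer counters."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttle_metrics(prev, curr, interval_seconds):
    """Derive throttling metrics from two cpu.stat snapshots."""
    periods = curr["nr_periods"] - prev["nr_periods"]
    throttled = curr["nr_throttled"] - prev["nr_throttled"]
    throttled_usec = curr["throttled_usec"] - prev["throttled_usec"]
    # Share of CFS enforcement periods in which the container was throttled.
    percent = 100.0 * throttled / periods if periods > 0 else 0.0
    # Milliseconds spent throttled per second of wall-clock time.
    ms_per_sec = (throttled_usec / 1000.0) / interval_seconds
    return {
        "container.cpu.throttle_percent": percent,
        "container.cpu.throttled_time": ms_per_sec,
    }


# Two made-up snapshots taken 15 seconds apart (the default interval).
previous = parse_cpu_stat("nr_periods 1000\nnr_throttled 100\nthrottled_usec 2000000\n")
current = parse_cpu_stat("nr_periods 1150\nnr_throttled 130\nthrottled_usec 2450000\n")
result = throttle_metrics(previous, current, interval_seconds=15)
```

Here 30 of 150 periods were throttled, giving 20 percent, and 450 ms of throttled time over 15 seconds gives 30 ms/sec.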

| Metric | Unit | Description |
|---|---|---|
| `http.request_count` | count | Request count |
| `http.error_rate` | percent | 4xx/5xx rate |
| `http.latency_avg` | ms | Average latency |
| `http.latency_p50` | ms | P50 latency |
| `http.latency_p95` | ms | P95 latency |
| `http.latency_p99` | ms | P99 latency |
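
The HTTP metrics above are aggregates over the requests observed in one collection interval. A rough sketch of that aggregation follows, using the nearest-rank method for percentiles. The percentile method and the function names are assumptions, and the agent may compute these differently (for example with histograms).

```python
def percentile(sorted_values, p):
    """Nearest-rank percentile of a sorted, non-empty list (integer p, 1..100)."""
    # Integer ceiling of p * n / 100 avoids floating-point rounding surprises.
    rank = (p * len(sorted_values) + 99) // 100
    return sorted_values[max(rank, 1) - 1]


def summarize(requests):
    """Aggregate a list of (status_code, latency_ms) tuples."""
    if not requests:
        return {"http.request_count": 0}
    latencies = sorted(latency for _, latency in requests)
    errors = sum(1 for status, _ in requests if status >= 400)
    return {
        "http.request_count": len(requests),
        "http.error_rate": 100.0 * errors / len(requests),
        "http.latency_avg": sum(latencies) / len(latencies),
        "http.latency_p50": percentile(latencies, 50),
        "http.latency_p95": percentile(latencies, 95),
        "http.latency_p99": percentile(latencies, 99),
    }


# 100 made-up requests with latencies 1..100 ms: 90 OK, 5 x 404, 5 x 500.
observed = (
    [(200, float(ms)) for ms in range(1, 91)]
    + [(404, float(ms)) for ms in range(91, 96)]
    + [(500, float(ms)) for ms in range(96, 101)]
)
summary = summarize(observed)
```

The same aggregation would apply to the span duration metrics if span durations were substituted for request latencies.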

| Metric | Unit | Description |
|---|---|---|
| `trace.span_count` | count | Spans received |
| `trace.error_rate` | percent | Error span rate |
| `trace.duration_avg` | ms | Average span duration |
| `trace.duration_p50` | ms | P50 duration |
| `trace.duration_p95` | ms | P95 duration |
| `trace.duration_p99` | ms | P99 duration |

| Metric | Unit | Description |
|---|---|---|
| `k8s.event_count` | count | Events by type/reason |
| `k8s.warning_events` | count | Warning events |
| `k8s.oom_kills` | count | OOM kill events |
| `k8s.evictions` | count | Pod evictions |
| `k8s.failed_scheduling` | count | Scheduling failures |
| `k8s.crashloop_backoff` | count | CrashLoopBackOff events |
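
One plausible way to turn Kubernetes events into the counters above is to bucket them by event `type` and `reason`. The mapping below is an assumption: `OOMKilling`, `Evicted`, `FailedScheduling`, and `BackOff` are reason strings commonly emitted by Kubernetes components, but which reasons the agent actually matches is not documented here. Note that `BackOff` is also used for image pull back-offs, so a real implementation would likely need to inspect the event message as well.

```python
from collections import Counter

# Assumed mapping from event reason to metric name (illustrative only).
REASON_TO_METRIC = {
    "OOMKilling": "k8s.oom_kills",
    "Evicted": "k8s.evictions",
    "FailedScheduling": "k8s.failed_scheduling",
    "BackOff": "k8s.crashloop_backoff",
}


def count_events(events):
    """Count events overall, by Warning type, and by selected reasons."""
    counts = Counter()
    for event in events:
        counts["k8s.event_count"] += 1
        if event.get("type") == "Warning":
            counts["k8s.warning_events"] += 1
        metric = REASON_TO_METRIC.get(event.get("reason"))
        if metric is not None:
            counts[metric] += 1
    return dict(counts)


# Made-up events shaped like the type/reason fields of a Kubernetes Event.
events = [
    {"type": "Normal", "reason": "Scheduled"},
    {"type": "Warning", "reason": "FailedScheduling"},
    {"type": "Warning", "reason": "BackOff"},
    {"type": "Warning", "reason": "BackOff"},
    {"type": "Warning", "reason": "Evicted"},
]
event_counts = count_events(events)
```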

To uninstall:

    kubectl delete -f https://raw.githubusercontent.com/cased/cased-agent/main/deploy/manifests/install.yaml

Or with Helm:

    helm uninstall cased-agent -n cased-system