| 1 | +# Data Attributes - Usage and Examples |
| 2 | + |
| 3 | +This document provides additional context and examples for the `data` attribute group. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +Traditional observability focuses on the health of the "vessel" (service, container, database). This specification introduces the Data Attribute Group, which focuses on the "cargo." By tagging resources and spans with data-specific metadata, we enable: |
| 8 | + |
| 9 | +* **Automated Retention**: Systems can dynamically set snapshot lifetimes based on `data.category`. |
| 10 | +* **Security Guardrails**: Monitoring for sensitive data movement across trust boundaries. |
| 11 | +* **Compliance Mapping**: Real-time visibility into which services handle GDPR, HIPAA, or PCI-DSS governed data. |
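
As a rough illustration of the Automated Retention point, a retention controller might map `data.category` values to snapshot lifetimes. The category names below come from the examples in this document; the retention periods and the `snapshot_lifetime` helper are hypothetical placeholders, not values defined by the specification.

```python
from datetime import timedelta

# Hypothetical retention policy: these periods are placeholders, not spec values.
RETENTION_BY_CATEGORY = {
    "pii": timedelta(days=30),
    "financial": timedelta(days=7 * 365),
}
DEFAULT_RETENTION = timedelta(days=90)


def snapshot_lifetime(resource_attributes: dict[str, str]) -> timedelta:
    """Pick a snapshot lifetime from the resource's data.category attribute."""
    category = resource_attributes.get("data.category", "")
    return RETENTION_BY_CATEGORY.get(category, DEFAULT_RETENTION)


print(snapshot_lifetime({"data.category": "pii"}).days)  # 30
```

Falling back to a default keeps untagged resources from being silently exempt from retention.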
| 12 | + |
| 13 | +## Example Configurations |
| 14 | + |
| 15 | +### 1. Kubernetes Resource Metadata |
| 16 | + |
| 17 | +Infrastructure components like PersistentVolumeClaims (PVCs) act as the source of truth for the data sensitivity level. |
| 18 | + |
| 19 | +```yaml |
| 20 | +apiVersion: v1 |
| 21 | +kind: PersistentVolumeClaim |
| 22 | +metadata: |
| 23 | +  name: customer-data-pvc |
| 24 | +  labels: |
| 25 | +    # Defining the sensitivity of the "cargo" at rest |
| 26 | +    data-classification: "restricted" |
| 27 | +    data-category: "pii" |
| 28 | +    compliance-scope: "gdpr" |
| 29 | +spec: |
| 30 | +  accessModes: |
| 31 | +    - ReadWriteOnce |
| 32 | +  resources: |
| 33 | +    requests: |
| 34 | +      storage: 10Gi |
| 35 | +``` |
| 36 | + |
| 37 | +### 2. OpenTelemetry Resource Attributes |
| 38 | + |
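| 39 | +At startup, a service can declare the sensitivity and category of the data it handles (or of its primary datastore) through the standard `OTEL_RESOURCE_ATTRIBUTES` environment variable. The SDK attaches these resource attributes to the telemetry the service emits. |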
| 40 | + |
| 41 | +```bash |
| 42 | +# Tag the service resource; the OpenTelemetry SDK reads this variable at startup |
| 43 | +export OTEL_RESOURCE_ATTRIBUTES="data.sensitivity=restricted,data.category=financial" |
| 44 | +``` |
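
To make the variable's format concrete, the following stdlib-only sketch parses the same comma-separated `key=value` list into a dictionary. It approximates what an SDK does for illustration only; `parse_resource_attributes` is a hypothetical helper, and a real service should rely on the SDK's own resource detection.

```python
import os
from urllib.parse import unquote


def parse_resource_attributes(raw: str) -> dict[str, str]:
    """Parse 'k1=v1,k2=v2' into a dict, skipping malformed items."""
    attributes = {}
    for item in raw.split(","):
        key, sep, value = item.partition("=")
        if sep and key.strip():
            # Values may be percent-encoded, so decode them.
            attributes[key.strip()] = unquote(value.strip())
    return attributes


# Fall back to the example value from this document when the variable is unset.
raw = os.environ.get(
    "OTEL_RESOURCE_ATTRIBUTES",
    "data.sensitivity=restricted,data.category=financial",
)
print(parse_resource_attributes(raw))
```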
| 45 | + |
| 46 | +### 3. Programmatic Context Transfer (Pseudo-code) |
| 47 | + |
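| 48 | +To propagate sensitivity across a pipeline, a service sets the attribute in OpenTelemetry Baggage. With a baggage propagator configured (the W3C `baggage` header is part of the default propagator set), instrumented clients carry it in the headers of downstream RPC calls. |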
| 49 | + |
| 50 | +```python |
| 51 | +from opentelemetry import baggage, context |
| 52 | + |
| 53 | +# 1. Extract sensitivity from the datastore metadata (e.g., K8s label) |
| 54 | +store_sensitivity = "restricted" |
| 55 | + |
| 56 | +# 2. Inject into the current request context as "Baggage" |
| 57 | +ctx = baggage.set_baggage("data.sensitivity", store_sensitivity) |
| 58 | + |
| 59 | +# 3. Attach the context so instrumented clients propagate it (W3C `baggage` header by default) |
| 60 | +token = context.attach(ctx)  # attach() returns a token, not a context manager |
| 61 | +call_downstream_service()  # placeholder RPC to 'DownstreamService'; carries data.sensitivity |
| 62 | +context.detach(token)  # restore the previous context |
| 63 | +``` |
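
On the receiving side, a service or gateway can inspect the propagated value before handling a request. In an instrumented service this would normally be read with `baggage.get_baggage("data.sensitivity")`; the stdlib-only sketch below instead parses a raw `baggage` header so the mechanics are visible. The `ALLOWED_SENSITIVITY` levels and both helper functions are hypothetical; only `restricted` appears as a value in this document.

```python
from urllib.parse import unquote

# Hypothetical policy: sensitivity levels this service is allowed to receive.
ALLOWED_SENSITIVITY = {"public", "internal"}


def parse_baggage_header(header: str) -> dict[str, str]:
    """Parse a W3C baggage header value, ignoring per-member properties."""
    entries = {}
    for member in header.split(","):
        key_value = member.split(";", 1)[0]  # drop ';property' suffixes
        key, sep, value = key_value.partition("=")
        if sep and key.strip():
            entries[key.strip()] = unquote(value.strip())
    return entries


def crosses_trust_boundary(headers: dict[str, str]) -> bool:
    """Return True when the incoming sensitivity level is not allowed here."""
    baggage_entries = parse_baggage_header(headers.get("baggage", ""))
    sensitivity = baggage_entries.get("data.sensitivity")
    return sensitivity is not None and sensitivity not in ALLOWED_SENSITIVITY


print(crosses_trust_boundary({"baggage": "data.sensitivity=restricted"}))  # True
```

Requests without the attribute pass this check; a stricter deployment might treat a missing value as a violation instead.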
| 64 | + |
| 65 | +## Security & Governance Considerations |
| 66 | + |
| 67 | +1. **Low Cardinality**: To avoid performance degradation in metrics backends (such as Prometheus), do not use unique identifiers (e.g., a `user_id`) as `data.*` attribute values. |
| 68 | +2. **No Actual Data**: Never place actual PII (e.g., an email address) inside these attributes. They are for metadata only. |
| 69 | +3. **Mandatory Review**: Any new value added to `data.category` must be approved by the Data Governance committee to ensure consistent reporting. |
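
The three rules above could be enforced with a small pre-export check. This is a minimal sketch under stated assumptions: the `APPROVED_VALUES` allowlist stands in for whatever the governance process approves, the email pattern is a deliberately crude heuristic, and `validate_data_attributes` is a hypothetical helper rather than part of any OpenTelemetry API.

```python
import re

# Hypothetical allowlists: the real approved values are owned by governance.
APPROVED_VALUES = {
    "data.sensitivity": {"restricted"},
    "data.category": {"pii", "financial"},
}
# Crude heuristic for "looks like an email address".
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def validate_data_attributes(attributes: dict[str, str]) -> list[str]:
    """Return a list of rule violations found in data.* attributes."""
    problems = []
    for key, value in attributes.items():
        if not key.startswith("data."):
            continue
        if EMAIL_PATTERN.search(value):
            problems.append(f"{key}: value looks like an email address")
        elif key in APPROVED_VALUES and value not in APPROVED_VALUES[key]:
            problems.append(f"{key}: '{value}' is not an approved value")
    return problems


print(validate_data_attributes({"data.category": "jane@example.com"}))
```

Because every value must come from a fixed allowlist, the same check also keeps cardinality bounded.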