Commit 3c72528

Authored and committed by Ayushi Asthana
Introduce data attribute group for governance and security (#3088)
1 parent dd90cbe commit 3c72528

9 files changed

Lines changed: 197 additions & 0 deletions

.chloggen/add-data-attributes.yaml

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
change_type: "enhancement"
component: "data"
note: "Introduce `data.category` and `data.sensitivity` attributes to support data governance and security use cases."
issues: [0] # Placeholder PR number
subtext: |
  These attributes allow for classifying data based on its semantic nature and sensitivity level,
  enabling automated security workflows, compliance mapping, and improved observability of sensitive data.

.github/ISSUE_TEMPLATE/bug_report.yaml

Lines changed: 1 addition & 0 deletions
@@ -39,6 +39,7 @@ body:
 - area:container
 - area:cpu
 - area:cpython
+- area:data
 - area:db
 - area:deployment
 - area:destination

.github/ISSUE_TEMPLATE/change_proposal.yaml

Lines changed: 1 addition & 0 deletions
@@ -31,6 +31,7 @@ body:
 - area:container
 - area:cpu
 - area:cpython
+- area:data
 - area:db
 - area:deployment
 - area:destination

.github/ISSUE_TEMPLATE/new-conventions.yaml

Lines changed: 1 addition & 0 deletions
@@ -42,6 +42,7 @@ body:
 - area:container
 - area:cpu
 - area:cpython
+- area:data
 - area:db
 - area:deployment
 - area:destination

docs/non-normative/README.md

Lines changed: 2 additions & 0 deletions
@@ -6,3 +6,5 @@ linkTitle: Non-normative
 
 The pages in this section are **non-normative**, most are supplementary
 guidelines.
+
+- [Data Attributes - Usage and Examples](data-attributes.md)
Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
# Data Attributes - Usage and Examples

This document provides additional context and examples for the `data` attribute group.

## Overview

Traditional observability focuses on the health of the "vessel" (service, container, database). The `data` attribute group focuses on the "cargo": the information those components store and process. By tagging resources and spans with data-specific metadata, we enable:

* **Automated Retention**: Systems can set snapshot lifetimes dynamically based on `data.category`.
* **Security Guardrails**: Monitoring can flag sensitive data, identified by `data.sensitivity`, as it moves across trust boundaries.
* **Compliance Mapping**: Real-time visibility into which services handle data governed by GDPR, HIPAA, or PCI-DSS.
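
The retention and guardrail ideas above can be sketched in a few lines of Python. This is a minimal illustration, not part of the convention: the ordering of sensitivity levels (least to most sensitive, following the descriptions of the well-known values), the retention periods, and the names `Sensitivity`, `snapshot_retention_days`, and `may_cross_boundary` are all hypothetical.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    # Assumed ordering of the well-known data.sensitivity values, least to most sensitive.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical retention periods (days) keyed by data.category; illustrative values only.
RETENTION_DAYS = {
    "public": 7,
    "internal": 30,
    "pii": 90,
    "financial": 365,
    "health": 365,
}
DEFAULT_RETENTION_DAYS = 30  # used for custom or missing categories


def snapshot_retention_days(attributes: dict) -> int:
    """Choose a snapshot lifetime from the data.category attribute."""
    return RETENTION_DAYS.get(attributes.get("data.category"), DEFAULT_RETENTION_DAYS)


def may_cross_boundary(attributes: dict, boundary_max: Sensitivity) -> bool:
    """Allow a transfer only if data.sensitivity is at or below the boundary's ceiling.

    Missing or custom sensitivity values are treated as RESTRICTED (fail closed).
    """
    value = str(attributes.get("data.sensitivity", ""))
    level = Sensitivity.__members__.get(value.upper(), Sensitivity.RESTRICTED)
    return level <= boundary_max


attrs = {"data.category": "pii", "data.sensitivity": "confidential"}
print(snapshot_retention_days(attrs))  # prints 90
print(may_cross_boundary(attrs, Sensitivity.INTERNAL))  # prints False
```

Failing closed on unknown sensitivity values is a deliberate choice here: because the registry allows custom values, a guardrail that cannot interpret a value should treat it as the most sensitive level rather than let the data through.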

## Example Configurations

### 1. Kubernetes Resource Metadata

Infrastructure components such as PersistentVolumeClaims (PVCs) can act as the source of truth for the sensitivity of the data they hold.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: customer-data-pvc
  labels:
    # Defining the sensitivity of the "cargo" at rest
    data-classification: "restricted"
    data-category: "pii"
    compliance-scope: "gdpr"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```
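
Nothing in the convention defines how PVC labels become `data.*` attributes. One possible approach, sketched below under that assumption, is a small translation table applied by whatever component reads the labels (for example, a deployment script). The label-to-attribute mapping and the helper names are hypothetical, the sketch assumes label values contain no commas or `=` characters, and `compliance-scope` is dropped because it has no `data.*` counterpart.

```python
# Hypothetical mapping from the PVC label keys used above to data.* attribute keys.
# These label keys are this example's own convention, not a Kubernetes or OpenTelemetry standard.
LABEL_TO_ATTRIBUTE = {
    "data-classification": "data.sensitivity",
    "data-category": "data.category",
}


def data_attributes_from_labels(labels: dict[str, str]) -> dict[str, str]:
    """Translate PVC labels into data.* attributes, ignoring unrelated labels."""
    return {
        LABEL_TO_ATTRIBUTE[key]: value
        for key, value in labels.items()
        if key in LABEL_TO_ATTRIBUTE
    }


def to_resource_attributes_env(attributes: dict[str, str]) -> str:
    """Render attributes in the key=value,key=value form used by OTEL_RESOURCE_ATTRIBUTES."""
    return ",".join(f"{key}={value}" for key, value in sorted(attributes.items()))


pvc_labels = {
    "data-classification": "restricted",
    "data-category": "pii",
    "compliance-scope": "gdpr",
}
attrs = data_attributes_from_labels(pvc_labels)
print(to_resource_attributes_env(attrs))  # prints data.category=pii,data.sensitivity=restricted
```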

### 2. OpenTelemetry Resource Attributes

When a service starts, it can be configured through the `OTEL_RESOURCE_ATTRIBUTES` environment variable to advertise the sensitivity and category of the data it handles (for example, that of its primary datastore).

```bash
# Attach data.* attributes to this service's resource so they accompany all of its telemetry
export OTEL_RESOURCE_ATTRIBUTES="data.sensitivity=restricted,data.category=financial"
```
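
To check what a service actually advertises, the variable can be parsed and its `data.*` values compared with the well-known lists in the registry. This sketch assumes simple comma-separated `key=value` pairs and percent-decodes values; the helper names are hypothetical, and your SDK's own resource detector remains the authoritative parser.

```python
import os
from urllib.parse import unquote

# Well-known values from the data.* registry; anything else is a custom value.
WELL_KNOWN = {
    "data.category": {"pii", "financial", "health", "internal", "public"},
    "data.sensitivity": {"public", "internal", "confidential", "restricted"},
}


def parse_resource_attributes(raw: str) -> dict[str, str]:
    """Parse 'k1=v1,k2=v2' into a dict, percent-decoding values."""
    attributes = {}
    for pair in raw.split(","):
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            continue  # skip malformed entries instead of failing
        attributes[key.strip()] = unquote(value.strip())
    return attributes


def custom_data_values(attributes: dict[str, str]) -> dict[str, str]:
    """Return data.* attributes whose values are not in the well-known lists."""
    return {
        key: value
        for key, value in attributes.items()
        if key in WELL_KNOWN and value not in WELL_KNOWN[key]
    }


raw = os.environ.get(
    "OTEL_RESOURCE_ATTRIBUTES",
    "data.sensitivity=restricted,data.category=financial",
)
attrs = parse_resource_attributes(raw)
print(attrs)
print(custom_data_values(attrs))
```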

### 3. Programmatic Context Transfer (Python)

To propagate sensitivity across a pipeline, a service adds the attribute to Baggage. Instrumented clients then inject it, through the configured propagators (W3C Trace Context and Baggage by default), into the `baggage` header of downstream RPC calls.

```python
from opentelemetry import baggage, context
from opentelemetry.propagate import inject


def call_downstream_service():
    # Hypothetical stand-in for an instrumented RPC client.
    headers = {}
    inject(headers)  # adds the current Baggage as a `baggage` header
    return headers


# 1. Extract sensitivity from the datastore metadata (e.g., K8s label)
store_sensitivity = "restricted"

# 2. Add it to a new context as Baggage
ctx = baggage.set_baggage("data.sensitivity", store_sensitivity)

# 3. attach() returns a token, not a context manager; detach when done
token = context.attach(ctx)
try:
    call_downstream_service()  # headers carry data.sensitivity=restricted
finally:
    context.detach(token)
```
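
On the receiving side, an OpenTelemetry SDK normally extracts Baggage for you (for example with `opentelemetry.propagate.extract` followed by `baggage.get_baggage`). The standard-library sketch below only illustrates what arrives on the wire in the W3C `baggage` header and how a service might fail closed when `data.sensitivity` is missing. The helper names and the `restricted` default are assumptions, and the parser ignores member properties and other edge cases of the W3C Baggage format.

```python
from urllib.parse import unquote


def parse_baggage_header(header: str) -> dict[str, str]:
    """Parse a W3C `baggage` header value into {key: value}, ignoring member properties."""
    entries = {}
    for member in header.split(","):
        key_value = member.split(";", 1)[0]  # drop ";property" metadata
        key, sep, value = key_value.partition("=")
        if sep and key.strip():
            entries[key.strip()] = unquote(value.strip())
    return entries


def sensitivity_from_headers(headers: dict[str, str], default: str = "restricted") -> str:
    """Read data.sensitivity from incoming headers, failing closed when it is absent."""
    # HTTP header names are case-insensitive, so match "baggage" in any case.
    raw = next((v for k, v in headers.items() if k.lower() == "baggage"), "")
    return parse_baggage_header(raw).get("data.sensitivity", default)


incoming = {"baggage": "data.sensitivity=restricted,region=eu;ttl=60"}
print(sensitivity_from_headers(incoming))  # prints restricted
```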

## Security & Governance Considerations

1. **Low Cardinality**: To avoid performance degradation in metrics backends (such as Prometheus), never put unique identifiers (e.g., `user_id`) in `data.*` attributes.
2. **No Actual Data**: Never place actual PII (e.g., an email address) in these attributes; they carry classification metadata only.
3. **Mandatory Review**: Any custom value added to `data.category` must be approved by your organization's data governance committee to keep reporting consistent.
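
The first two rules, and the review trigger in the third, can be partially automated. The sketch below flags `data.*` values that look like real data (email addresses, long digit runs such as account numbers) or that fall outside the well-known lists and therefore need review. The regular expressions are rough, hypothetical heuristics, not a complete PII detector, and `check_data_attribute` is an illustrative name.

```python
import re

# Well-known values from the data.* registry; custom values are allowed but flagged for review.
WELL_KNOWN = {
    "data.category": {"pii", "financial", "health", "internal", "public"},
    "data.sensitivity": {"public", "internal", "confidential", "restricted"},
}

# Rough heuristics for values that look like actual data rather than classification metadata.
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
LONG_NUMBER = re.compile(r"\d{6,}")  # e.g., account or card numbers


def check_data_attribute(key: str, value: str) -> list[str]:
    """Return the problems found with one attribute (empty list if none or not data.*)."""
    problems = []
    if not key.startswith("data."):
        return problems
    if EMAIL.search(value) or LONG_NUMBER.search(value):
        problems.append("value looks like actual data, not metadata")
    allowed = WELL_KNOWN.get(key)
    if allowed is not None and value not in allowed:
        problems.append("custom value: needs governance review")
    return problems


print(check_data_attribute("data.category", "pii"))
print(check_data_attribute("data.category", "jane@example.com"))
```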

docs/registry/attributes/README.md

Lines changed: 1 addition & 0 deletions
@@ -48,6 +48,7 @@ Currently, the following namespaces exist:
 - [Container](container.md)
 - [CPU](cpu.md)
 - [CPython](cpython.md)
+- [Data](data.md)
 - [DB](db.md)
 - [Deployment](deployment.md)
 - [Destination](destination.md)

docs/registry/attributes/data.md

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
<!-- NOTE: THIS FILE IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
<!-- see templates/registry/markdown/attribute_namespace.md.j2 -->

# Data

## Data Attributes

A data attribute group represents a logical or physical collection of information (e.g., a database table, an S3 object, or a message queue) being monitored for security and governance.

**Attributes:**

| Key | Stability | Value Type | Description | Example Values |
| --- | --- | --- | --- | --- |
| <a id="data-category" href="#data-category">`data.category`</a> | ![Development](https://img.shields.io/badge/-development-blue) | string | A taxonomy identifier describing the semantic nature of the data. [1] | `pii`; `financial`; `health`; `internal`; `public` |
| <a id="data-sensitivity" href="#data-sensitivity">`data.sensitivity`</a> | ![Development](https://img.shields.io/badge/-development-blue) | string | A classification level indicating the potential impact of unauthorized disclosure. [2] | `public`; `internal`; `confidential`; `restricted` |

**[1] `data.category`:** It is used for regulatory compliance mapping and filtering in governance dashboards.

**[2] `data.sensitivity`:** It drives automated security workflows like alerting and encryption.

---

`data.category` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
| --- | --- | --- |
| `financial` | Financial data (e.g., credit card numbers, bank account info). | ![Development](https://img.shields.io/badge/-development-blue) |
| `health` | Health data (e.g., medical records, PHI). | ![Development](https://img.shields.io/badge/-development-blue) |
| `internal` | Internal-only data. | ![Development](https://img.shields.io/badge/-development-blue) |
| `pii` | Personally Identifiable Information. | ![Development](https://img.shields.io/badge/-development-blue) |
| `public` | Publicly available data. | ![Development](https://img.shields.io/badge/-development-blue) |

---

`data.sensitivity` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
| --- | --- | --- |
| `confidential` | Sensitive data that requires protection from unauthorized access. | ![Development](https://img.shields.io/badge/-development-blue) |
| `internal` | Data intended for internal use within the organization. | ![Development](https://img.shields.io/badge/-development-blue) |
| `public` | Data that is safe for public disclosure. | ![Development](https://img.shields.io/badge/-development-blue) |
| `restricted` | Highly sensitive data that requires strict access controls and protection. | ![Development](https://img.shields.io/badge/-development-blue) |

model/data/registry.yaml

Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
groups:
  - id: registry.data
    type: attribute_group
    display_name: Data Attributes
    brief: >
      A data attribute group represents a logical or physical collection of information (e.g., a database table, an S3 object, or a message queue) being monitored for security and governance.
    note: >
      In a security-aware infrastructure, context transfer is the mechanism that propagates metadata, such as `data.sensitivity`, across a distributed request chain using OpenTelemetry Baggage.
      This allows every service in a pipeline to be aware of the "cargo" (the data) it is currently processing, enabling dynamic guardrails such as automated encryption, audit logging, or real-time access blocking at trust boundaries.
    attributes:
      - id: data.category
        type:
          members:
            - id: pii
              value: 'pii'
              brief: >
                Personally Identifiable Information.
              stability: development
            - id: financial
              value: 'financial'
              brief: >
                Financial data (e.g., credit card numbers, bank account info).
              stability: development
            - id: health
              value: 'health'
              brief: >
                Health data (e.g., medical records, PHI).
              stability: development
            - id: internal
              value: 'internal'
              brief: >
                Internal-only data.
              stability: development
            - id: public
              value: 'public'
              brief: >
                Publicly available data.
              stability: development
        stability: development
        brief: >
          A taxonomy identifier describing the semantic nature of the data.
        note: >
          It is used for regulatory compliance mapping and filtering in governance dashboards.
        examples: ["pii", "financial", "health", "internal", "public"]
      - id: data.sensitivity
        type:
          members:
            - id: public
              value: 'public'
              brief: >
                Data that is safe for public disclosure.
              stability: development
            - id: internal
              value: 'internal'
              brief: >
                Data intended for internal use within the organization.
              stability: development
            - id: confidential
              value: 'confidential'
              brief: >
                Sensitive data that requires protection from unauthorized access.
              stability: development
            - id: restricted
              value: 'restricted'
              brief: >
                Highly sensitive data that requires strict access controls and protection.
              stability: development
        stability: development
        brief: >
          A classification level indicating the potential impact of unauthorized disclosure.
        note: >
          It drives automated security workflows like alerting and encryption.
        examples: ["public", "internal", "confidential", "restricted"]
