Introduce data attribute group in semantic-conventions for governance and security by Ayushi-12 · Pull Request #3644 · open-telemetry/semantic-conventions

Ayushi-12 · 2026-04-22T10:24:46Z

Fixes #

Changes

This change introduces the 'data' attribute group to OpenTelemetry semantic conventions to support data governance and security use cases.
Key changes:

Added 'data.category' to classify data by semantic nature (pii, financial, etc.)
Added 'data.sensitivity' to classify data by impact level (restricted, internal, etc.)
Included a non-normative guide with usage examples for Kubernetes, Resource Attributes, and Context Propagation (Baggage).
These attributes enable automated security workflows like dynamic data redaction, compliance mapping (GDPR/PCI), and sensitive data movement monitoring.

Prototypes (in draft):
open-telemetry/opentelemetry-demo#3215
open-telemetry/opentelemetry-demo#3210

Supporting documents:
Introduce "data" attribute group in OTEL

Important

Acceptance of pull requests is subject to the triage process as described in Issue and PR Triage Management.
PRs that do not follow the guidance above may be automatically rejected and closed.

Merge requirement checklist

CONTRIBUTING.md guidelines followed.
Change log entry added, according to the guidelines in When to add a changelog entry.
- If your PR does not need a change log, start the PR title with [chore]
Links to the prototypes or existing instrumentations (when adding or changing conventions)

linux-foundation-easycla · 2026-04-22T10:24:51Z

The committers listed above are authorized under a signed CLA.

✅ login: Aneurysm9 / name: Anthony Mirabella (2cdc91a)
✅ login: Ayushi-12 / name: Ayushi Asthana (1b67bcf, 2cdc91a, 4407831, ecdd165)

github-actions · 2026-04-22T10:25:06Z

This PR contains changes to area(s) that do not have an active SIG/project and will be auto-closed:

data

Such changes may be rejected or put on hold until a new SIG/project is established.

Please refer to the Semantic Convention Areas document to see the current active SIGs and to learn how to kick-start a new one.



KalleOlaviNiemitalo · 2026-04-22T15:11:27Z

+
+1. **Low Cardinality**: To avoid performance degradation in metrics backends (like Prometheus), do not use unique IDs (e.g., `user_id`) in `data.*` attributes.
+2. **No Actual Data**: Never place actual PII (e.g., an email address) inside these attributes. They are for metadata only.
+3. **Mandatory Review**: Any new value added to `data.category` must be approved by the Data Governance committee to ensure consistent reporting.


Is a "Data Governance committee" being formed in the OpenTelemetry Project, or does this refer to something else? https://opentelemetry.io/community/members/ lists a "Governance Committee" and a "Technical Committee", but no such committee yet.

I think this section can be removed. I meant it more as a guideline for users who might want to add custom values to the `data.category` field, but referencing a "committee" is confusing.

Proposing a mandatory review as a guideline seems counterintuitive.
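The first two guidelines in the quoted diff (low cardinality, no actual data) lend themselves to automated checks. Below is a rough Python sketch of such a lint over batches of attribute dictionaries. It is not part of the PR: the email regex is deliberately crude, and the distinct-value threshold of 20 is an arbitrary assumption.

```python
import re
from collections import defaultdict
from typing import Iterable, Mapping

# Crude pattern for illustration only; real PII detection is much harder.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def lint_data_attributes(
    records: Iterable[Mapping[str, object]], max_distinct: int = 20
) -> list:
    """Report data.* values that look like raw PII, and data.* keys whose
    distinct-value count suggests high cardinality."""
    problems = []
    distinct = defaultdict(set)
    for attrs in records:
        for key, value in attrs.items():
            if not key.startswith("data."):
                continue  # only the data.* group is checked
            text = str(value)
            if EMAIL_RE.search(text):
                problems.append(f"{key}: value looks like an email address")
            distinct[key].add(text)
    # Cardinality is judged over the whole batch, after all records are seen.
    for key in sorted(distinct):
        if len(distinct[key]) > max_distinct:
            problems.append(
                f"{key}: {len(distinct[key])} distinct values (limit {max_distinct})"
            )
    return problems
```

A check like this could run in CI against recorded telemetry, which keeps the guidelines advisory rather than requiring a review body.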

Aneurysm9 · 2026-04-23T16:09:56Z

+
+---
+
+`data.category` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.


Is there an existing taxonomy we can or should use here? More generally, does this need to be enumerated now?

One of my concerns is that either `internal` or `public` would seem to apply to all data, so one of them MUST be used at all times. That leaves no room for any other category, since this is a `string` and not a `[]string`. There is also overlap between `pii` and `financial` or `health`.

I agree with your point; so far in my research I have not found a taxonomy that is clean and mutually exclusive.
Should we remove the enum for now and let community usage drive the addition of enum values (while continuing to research standardized terminology)?

Microsoft has lists of sensitivity labels and information types for use with SQL Server, at https://github.com/Azure-Samples/sql-data-classification/blob/main/sql_information_protection_default.json, referenced from https://learn.microsoft.com/en-us/sql/relational-databases/security/sql-data-discovery-and-classification?view=sql-server-ver17. I don't know if the same taxonomy is used in other Microsoft products or even by other vendors.

Their Tabular Data Stream (TDS) 7.4 protocol can return this information from SQL Server to the client along with a result set; see [MS-TDS] section 2.2.7.5 DATACLASSIFICATION. This makes me wonder whether these attributes could be useful in SQL Server database client spans. Because these attributes are defined as plain strings rather than arrays, I suppose the instrumentation would have to choose only the most sensitive label and the most secret information type if different columns of a result set have different labels.
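The "choose only the most sensitive label" step could look roughly like the Python sketch below. The label names and their ranking are assumptions made for illustration; they are not taken from the Microsoft JSON file or from the TDS specification, and the function is hypothetical rather than part of any instrumentation.

```python
from typing import Iterable, Optional

# Assumed ranking from least to most sensitive; the labels are illustrative
# and not taken from any particular vendor taxonomy.
SENSITIVITY_RANK = {
    "public": 0,
    "general": 1,
    "confidential": 2,
    "highly_confidential": 3,
}


def most_sensitive(column_labels: Iterable[Optional[str]]) -> Optional[str]:
    """Collapse per-column labels into the single string value that a
    data.sensitivity attribute can hold, keeping the highest-ranked one."""
    known = [label for label in column_labels if label in SENSITIVITY_RANK]
    if not known:
        return None  # no recognized label on any column
    return max(known, key=lambda label: SENSITIVITY_RANK[label])
```

Silently dropping unrecognized labels is one design choice; a more conservative instrumentation might treat an unknown label as the most sensitive instead. The same approach would apply to picking a single information type.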

Aneurysm9 · 2026-04-23T16:15:03Z

+
+---
+
+`data.sensitivity` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.


I'm not sure this should be a formal enumeration, for some of the same reasons mentioned about data.category. I'd expect most organizations to have their own data classification schemes that may or may not align with this and that may be more coarsely- or finely-grained.

Ack. I can remove the enum from both these fields for now. We can come back to it with more data.


thompson-tomo · 2026-04-24T04:17:11Z

+
+```bash
+# Broadly tagging the service resource in the collection pipeline
+export OTEL_RESOURCE_ATTRIBUTES="data.sensitivity=restricted,data.category=financial"


How does this map to declarative config?

I imagine it would be like

file_format: "1.0-rc.1"
resource:
  attributes_list: "data.sensitivity=restricted,data.category=financial"

or

file_format: "1.0-rc.1"
resource:
  attributes:
    - name: data.sensitivity
      value: restricted
    - name: data.category
      value: financial

https://github.com/open-telemetry/opentelemetry-configuration/blob/v1.0.0-rc.1/examples/kitchen-sink.yaml shows an example of both.
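The two declarative forms carry the same key/value pairs. As a rough Python sketch of that equivalence, the hypothetical function below turns an `attributes_list`-style string (the same `key1=value1,key2=value2` format as `OTEL_RESOURCE_ATTRIBUTES`) into the name/value entries of the `attributes` form. It assumes values may be percent-encoded, which is my reading of the SDK environment variable specification, and decodes them with `urllib.parse.unquote`. It is not the parser any SDK actually uses, and raising on a malformed pair is an arbitrary choice.

```python
from urllib.parse import unquote


def parse_attributes_list(raw: str) -> list:
    """Convert a 'k1=v1,k2=v2' string into [{'name': k1, 'value': v1}, ...]."""
    result = []
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty entries such as a trailing comma
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {pair!r}")
        # Percent-decode the value; keys are used as-is.
        result.append({"name": key.strip(), "value": unquote(value.strip())})
    return result
```

For the example above, `parse_attributes_list("data.sensitivity=restricted,data.category=financial")` yields the two entries of the second YAML form (only `name` and `value`; the sketch does not emit a `type` key).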

thompson-tomo

It is hard to envision how the resource attribute scenario could be used reliably. Also, resource attributes should be associated with an entity.

thompson-tomo · 2026-04-27T13:00:21Z

+
+### 2. OpenTelemetry Resource Attributes
+
+When a service starts, it can be configured to broadcast its data handling capabilities or the sensitivity of its primary datastore via environment variables.


I am not following here. This talks about broadcasting its capabilities (plural). Shouldn't the resource attribute then be an array, so that a service can report that it handles PII, health data, etc.? Tools could then raise an alert when the attribute on a telemetry signal, such as a span, is not a registered capability.
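A rough Python sketch of the check described in this comment, assuming (hypothetically) that the resource-level `data.category` were an array of the categories a service declares it handles, while spans keep a single string value. Attributes are modeled as plain dictionaries, not as any SDK type, and the function name is made up for illustration.

```python
from typing import Iterable, Mapping


def unregistered_categories(
    resource: Mapping[str, object], spans: Iterable[Mapping[str, object]]
) -> list:
    """Return the data.category values seen on spans that the service
    did not declare on its resource."""
    declared = resource.get("data.category", ())
    if isinstance(declared, str):  # tolerate the single-string form in the PR
        declared = (declared,)
    allowed = {str(category) for category in declared}
    flagged = {
        str(span["data.category"])
        for span in spans
        if "data.category" in span and str(span["data.category"]) not in allowed
    }
    return sorted(flagged)
```

For a resource declaring `["pii", "financial"]`, a span tagged `health` would be returned as an unregistered category, which is the kind of signal an alerting tool could act on.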

thompson-tomo · 2026-04-27T13:11:46Z

+
+### 2. OpenTelemetry Resource Attributes
+
+When a service starts, it can be configured to broadcast its data handling capabilities or the sensitivity of its primary datastore via environment variables.


The data store use case will be difficult because a client may use multiple data stores. The scenario I see as workable is one where the data store (i.e., the database) reports its own sensitivity and we can trace from client to server.
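A minimal Python sketch of the client-to-server scenario, assuming the data store itself is instrumented and emits SERVER spans whose resource carries `data.sensitivity` (many databases do not emit spans at all). The `Span` dataclass is a hypothetical stand-in for exported span data, not an SDK type.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    span_id: str
    parent_span_id: Optional[str]
    kind: str  # e.g. "CLIENT" or "SERVER"
    resource: Dict[str, str] = field(default_factory=dict)


def sensitivity_by_client_span(spans: List[Span]) -> Dict[str, str]:
    """Map each CLIENT span id to the data.sensitivity reported on the
    resource of the SERVER span it called (e.g., a database)."""
    by_id = {s.span_id: s for s in spans}
    result: Dict[str, str] = {}
    for s in spans:
        if s.kind != "SERVER" or s.parent_span_id is None:
            continue
        parent = by_id.get(s.parent_span_id)
        label = s.resource.get("data.sensitivity")
        if parent is not None and parent.kind == "CLIENT" and label is not None:
            result[parent.span_id] = label
    return result
```

Because the label is resolved per CLIENT span, a client that talks to several data stores gets a separate value for each call instead of a single resource-wide value.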

Ayushi-12 requested review from a team as code owners April 22, 2026 10:24

github-project-automation Bot added this to Semantic Conventions Triage Apr 22, 2026

github-project-automation Bot moved this to Untriaged in Semantic Conventions Triage Apr 22, 2026

github-actions Bot added the enhancement New feature or request label Apr 22, 2026

Ayushi-12 changed the title ~~Introduce data attribute group in semantic-conventions for governance and security (#3088)~~ Introduce data attribute group in semantic-conventions for governance and security Apr 22, 2026

github-actions Bot added the triage:rejected:declined label Apr 22, 2026

github-actions Bot closed this Apr 22, 2026

jsuereth added triage:accepted:ready-with-sig and removed triage:rejected:declined labels Apr 22, 2026

jsuereth reopened this Apr 22, 2026

Introduce data attribute group for governance and security (open-telemetry#3088)

4407831

Ayushi-12 force-pushed the main branch from 3c72528 to 4407831 Compare April 22, 2026 13:23

KalleOlaviNiemitalo reviewed Apr 22, 2026


Ayushi-12 mentioned this pull request Apr 23, 2026

[chore] Assign area:data to Service and Deployment SIG #3645 (open, 3 tasks)

Remove mandatory review for data.category attributes

1b67bcf

proposing a mandatory review as a guideline seems counterintuitive.

Ayushi-12 requested a review from KalleOlaviNiemitalo April 23, 2026 06:56

Aneurysm9 reviewed Apr 23, 2026


Ayushi-12 and others added 2 commits April 27, 2026 13:02

Apply suggestions from code review

2cdc91a

Co-authored-by: Anthony Mirabella <a9@aneurysm9.com>

Remove enums from data.category and data.sensitivity

ecdd165

Ayushi-12 requested a review from Aneurysm9 April 27, 2026 08:31

thompson-tomo reviewed Apr 27, 2026


lmolkova moved this from Untriaged to Awaiting codeowners approval in Semantic Conventions Triage Apr 27, 2026

thompson-tomo reviewed Apr 27, 2026



6 participants
