Introduce data attribute group in semantic-conventions for governance and security#3644

Open
Ayushi-12 wants to merge 4 commits into open-telemetry:main from Ayushi-12:main

Conversation

@Ayushi-12

Fixes #

Changes

This change introduces the 'data' attribute group to OpenTelemetry semantic conventions to support data governance and security use cases.
Key changes:

  • Added 'data.category' to classify data by semantic nature (pii, financial, etc.)
  • Added 'data.sensitivity' to classify data by impact level (restricted, internal, etc.)
  • Included a non-normative guide with usage examples for Kubernetes, Resource Attributes, and Context Propagation (Baggage).

These attributes enable automated security workflows such as dynamic data redaction, compliance mapping (GDPR/PCI), and sensitive data movement monitoring.
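
As a rough illustration of the dynamic redaction workflow mentioned above, the sketch below masks attribute values when a record carries a restricted `data.sensitivity`. The function `redact_attributes`, the `REDACT_LEVELS` policy set, and the `[REDACTED]` placeholder are hypothetical names chosen for this example; they are not part of this PR or of any OpenTelemetry API.

```python
# Hypothetical sketch, not an OpenTelemetry API.
REDACT_LEVELS = {"restricted"}  # assumed policy: sensitivity values that trigger redaction
METADATA_PREFIX = "data."       # the classification attributes themselves are kept as-is


def redact_attributes(attributes: dict) -> dict:
    """Return a copy of attributes with values masked when the record is sensitive."""
    if attributes.get("data.sensitivity") not in REDACT_LEVELS:
        return dict(attributes)
    return {
        key: value if key.startswith(METADATA_PREFIX) else "[REDACTED]"
        for key, value in attributes.items()
    }
```

In a real pipeline this kind of logic would more likely live in a collector processor; the point here is only that a single low-cardinality label is enough to drive the decision.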

Prototypes (in draft):
open-telemetry/opentelemetry-demo#3215
open-telemetry/opentelemetry-demo#3210

Supporting documents -
Introduce "data" attribute group in OTEL

Important

Pull request acceptance is subject to the triage process as described in Issue and PR Triage Management.
PRs that do not follow the guidance above may be automatically rejected and closed.

Merge requirement checklist

  • CONTRIBUTING.md guidelines followed.
  • Change log entry added, according to the guidelines in When to add a changelog entry.
    • If your PR does not need a change log, start the PR title with [chore]
  • Links to the prototypes or existing instrumentations (when adding or changing conventions)

@linux-foundation-easycla

linux-foundation-easycla Bot commented Apr 22, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

@github-actions github-actions Bot added the enhancement (New feature or request) label Apr 22, 2026
@Ayushi-12 Ayushi-12 changed the title from "Introduce data attribute group in semantic-conventions for governance and security (#3088)" to "Introduce data attribute group in semantic-conventions for governance and security" Apr 22, 2026
@github-actions

This PR contains changes to area(s) that do not have an active SIG/project and will be auto-closed:

  • data

Such changes may be rejected or put on hold until a new SIG/project is established.

Please refer to the Semantic Convention Areas document to see the current active SIGs and also to learn how to kick start a new one.

@github-actions github-actions Bot closed this Apr 22, 2026
Comment thread docs/non-normative/data-attributes.md Outdated

1. **Low Cardinality**: To avoid performance degradation in metrics backends (like Prometheus), do not use unique IDs (e.g., `user_id`) in `data.*` attributes.
2. **No Actual Data**: Never place actual PII (e.g., an email address) inside these attributes. They are for metadata only.
3. **Mandatory Review**: Any new value added to `data.category` must be approved by the Data Governance committee to ensure consistent reporting.

Is a "Data Governance committee" being formed in the OpenTelemetry Project, or does this refer to something else? https://opentelemetry.io/community/members/ lists "Governance Committee" and "Technical Committee" but not yet that.

Author

I think this section can be removed. I meant it more as a guideline for users who might want to add custom values to the `data.category` field, but it is confusing when it references a "committee".

Proposing a mandatory review as a guideline seems counterintuitive.
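
The first two guidelines quoted in this thread (low cardinality, no actual data) lend themselves to an automated check. The following is a minimal sketch, assuming attributes arrive as plain dictionaries; `check_data_attributes`, the `max_distinct` threshold, and the email heuristic are all assumptions made for illustration and are not defined by the PR.

```python
import re
from collections import defaultdict

# Crude heuristic for "actual data"; a real check would cover more than emails.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def check_data_attributes(records, max_distinct=10):
    """Return a list of problems found in the data.* attributes of a batch of records."""
    problems = []
    seen = defaultdict(set)  # attribute name -> distinct values observed
    for attributes in records:
        for key, value in attributes.items():
            if not key.startswith("data."):
                continue
            seen[key].add(value)
            if EMAIL_PATTERN.search(str(value)):
                problems.append(f"{key}: value looks like an email address")
    for key, values in seen.items():
        if len(values) > max_distinct:
            problems.append(f"{key}: {len(values)} distinct values (limit {max_distinct})")
    return problems
```

A check like this could run in CI or in a test suite over sampled telemetry; the threshold of 10 is arbitrary and would need tuning per backend.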
Comment thread .chloggen/add-data-attributes.yaml Outdated
Comment thread docs/registry/attributes/data.md Outdated

---

`data.category` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.
Member

Is there an existing taxonomy we can/should use here? Maybe more generally does this need to be enumerated now?

One of my concerns is that either internal or public would seem to apply to all data, and thus one of them MUST be used at all times. That doesn't leave room for any other category, since this is a string and not a []string. There's also overlap between pii and financial or health.

Author

I agree with your point. So far in my research I could not find any specific taxonomy that is clean and mutually exclusive.
Should we remove the enum for now and let community usage drive the addition of enum values (while continuing the research for standardized terminology)?

Microsoft has lists of sensitivity labels and information types for use with SQL Server, at https://github.com/Azure-Samples/sql-data-classification/blob/main/sql_information_protection_default.json, referenced from https://learn.microsoft.com/en-us/sql/relational-databases/security/sql-data-discovery-and-classification?view=sql-server-ver17. I don't know if the same taxonomy is used in other Microsoft products or even by other vendors.

Their Tabular Data Stream 7.4 protocol is able to return this information from SQL Server to the client along with a result set; see [MS-TDS] section 2.2.7.5 DATACLASSIFICATION. Which makes me wonder whether these attributes could be of any use in SQL Server database client spans. Because these attributes are defined as plain strings rather than arrays, I suppose the instrumentation would have to choose only the most sensitive label and the most secret information type if different columns of a result set have different labels.
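
To make the "choose only the most sensitive label" idea concrete, here is a small sketch. The ordering in `SENSITIVITY_ORDER` is an assumption for illustration (neither the PR nor the Microsoft taxonomy is being quoted here), and `most_sensitive` is a hypothetical helper, not part of any instrumentation library.

```python
# Assumed ordering from least to most sensitive; not defined by the PR.
SENSITIVITY_ORDER = ["public", "internal", "confidential", "restricted"]


def most_sensitive(labels):
    """Return the highest-ranked known label, or None if no label is recognised."""
    known = [label for label in labels if label in SENSITIVITY_ORDER]
    if not known:
        return None
    return max(known, key=SENSITIVITY_ORDER.index)
```

An instrumentation could apply this over the per-column labels of a result set before setting a single-valued `data.sensitivity`. Note that collapsing to one value loses information, which is the trade-off a string (rather than array) attribute implies.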

Comment thread docs/registry/attributes/data.md Outdated

---

`data.sensitivity` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.
Member

I'm not sure this should be a formal enumeration, for some of the same reasons mentioned about data.category. I'd expect most organizations to have their own data classification schemes that may or may not align with this and that may be more coarsely- or finely-grained.

Author

Ack. I can remove the enum from both these fields for now. We can come back to it with more data.

Ayushi-12 and others added 2 commits April 27, 2026 13:02
@Ayushi-12 Ayushi-12 requested a review from Aneurysm9 April 27, 2026 08:31

```bash
# Broadly tagging the service resource in the collection pipeline
export OTEL_RESOURCE_ATTRIBUTES="data.sensitivity=restricted,data.category=financial"
```
Contributor

How does this map to declarative config?

@KalleOlaviNiemitalo KalleOlaviNiemitalo Apr 27, 2026

I imagine it would be like

```yaml
file_format: "1.0-rc.1"
resource:
  attributes_list: "data.sensitivity=restricted,data.category=financial"
```

or

```yaml
file_format: "1.0-rc.1"
resource:
  attributes:
    - name: data.sensitivity
      value: restricted
    - name: data.category
      value: financial
```

https://github.com/open-telemetry/opentelemetry-configuration/blob/v1.0.0-rc.1/examples/kitchen-sink.yaml shows an example of both.
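
The relationship between the two forms can be sketched as a simple conversion from the comma-separated string into name/value entries. `attributes_list_to_entries` is a hypothetical helper written for this illustration; it ignores any percent-encoding of values and is not how an SDK is required to parse the setting.

```python
def attributes_list_to_entries(attributes_list: str) -> list:
    """Split a "k1=v1,k2=v2" string into [{"name": k1, "value": v1}, ...]."""
    entries = []
    for pair in attributes_list.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty segments such as a trailing comma
        name, separator, value = pair.partition("=")
        if not separator:
            continue  # skip malformed segments that have no "="
        entries.append({"name": name.strip(), "value": value.strip()})
    return entries
```

Applied to the string from the environment variable example, this yields the same two entries as the `attributes` form.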

@lmolkova lmolkova moved this from Untriaged to Awaiting codeowners approval in Semantic Conventions Triage Apr 27, 2026
Contributor

@thompson-tomo thompson-tomo left a comment

It is hard to envision how the resource attribute scenario could be used reliably. Also, resource entries should be associated with an entity.


### 2. OpenTelemetry Resource Attributes

When a service starts, it can be configured to broadcast its data handling capabilities or the sensitivity of its primary datastore via environment variables.
Contributor

I am not following here. This talks about broadcasting its capabilities (plural), so shouldn't the resource attribute be an array, so that a service can report that it can handle PII, health, etc.? Tools can then raise alerts if an attribute on a telemetry signal (e.g. a span) is not a registered capability.
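
A minimal sketch of the check described in this comment, assuming the resource carried an array-valued attribute. The name `data.categories` and the function `unregistered_category` are hypothetical; the PR as submitted defines only the single-valued `data.category`.

```python
def unregistered_category(resource_attributes: dict, span_attributes: dict):
    """Return the span's data.category if the resource did not register it, else None."""
    # "data.categories" is an assumed array-valued resource attribute.
    registered = set(resource_attributes.get("data.categories", []))
    category = span_attributes.get("data.category")
    if category is None or category in registered:
        return None
    return category
```

A tool could raise an alert whenever this returns a value, for example when a service registered for `pii` and `health` emits a span tagged `financial`.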


### 2. OpenTelemetry Resource Attributes

When a service starts, it can be configured to broadcast its data handling capabilities or the sensitivity of its primary datastore via environment variables.
Contributor

The data store use case will be difficult due to the potential for multiple data stores being used by a client. The scenario where I see it as workable is if the data store (i.e. the db) reports its sensitivity and we can trace from client to server.

Projects

Status: Awaiting codeowners approval

6 participants