Merge pull request #50 from chaoss/contribution-signaling
Preference Signaling efforts and participation information
# Preference Signaling for AI Training: Where Our Community Can Contribute

*Based on Stefano Maffulli's "[The Landscape of Preference Signaling for AI Training](https://www.maffulli.net/the-landscape-of-preference-signaling-for-ai-training/)," April 2026.* Below is a list of initiatives and ways to connect, contribute, and review.

---

## Why This Matters

The open web and software commons were built without anticipating that everything created would be used to train AI systems. A range of initiatives is now working to establish reciprocity between AI developers and the communities whose work powers them.

This document maps the current landscape of preference signaling initiatives, drawing on the mapping in Maffulli's paper, and notes where participation is possible.

---

## The Initiatives

---
### 1. IETF AI Preferences (AiPref) Working Group

**What it is:** A working group building the shared vocabulary and technical standards that all other initiatives reference, forming the foundational layer of preference signaling.

**Governance:** Open standards body (IETF), with a community-run, standards-track process.

**Status:** Active. An RFC is expected in late 2026 or early 2027.

**Where to participate:**
- [Working Group home](https://ietf-wg-aipref.github.io/)
- [IETF Datatracker](https://datatracker.ietf.org/wg/aipref/about/)
- [GitHub repository](https://github.com/ietf-wg-aipref/drafts)
- Mailing list: [ai-control@ietf.org](mailto:ai-control@ietf.org) / [Archive](https://mailarchive.ietf.org/arch/browse/ai-control/) / [Subscribe](https://www.ietf.org/mailman/listinfo/ai-control/)
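To make the shared vocabulary concrete: the AiPref working drafts propose attaching preferences through a `Content-Usage` line in robots.txt or an HTTP header, with short category labels set to `y` or `n`. The following is a minimal, non-authoritative parser for such a value. It assumes the simple `label=y` / `label=n` form seen in draft examples; the labels (`train-ai`, `train-genai`, `search`, `bots`) and the `PARENT` fallback table are illustrative assumptions about the drafts at the time of writing, and both may change before an RFC is published.

```python
from typing import Dict, Optional

# Illustrative assumption: when a narrower category is not stated, the
# preference of its broader parent category applies. Label names follow the
# AiPref working drafts at the time of writing and may change.
PARENT: Dict[str, Optional[str]] = {
    "train-genai": "train-ai",
    "train-ai": "bots",
    "search": "bots",
    "bots": None,
}


def parse_content_usage(value: str) -> Dict[str, bool]:
    """Parse a simplified value such as 'train-ai=n, search=y'.

    Not a full structured-field parser: members that are empty, lack '=',
    or carry a value other than y/n are skipped.
    """
    prefs: Dict[str, bool] = {}
    for member in value.split(","):
        label, sep, flag = member.strip().partition("=")
        flag = flag.strip().lower()
        if sep and label and flag in ("y", "n"):
            prefs[label.strip().lower()] = flag == "y"
    return prefs


def preference_for(prefs: Dict[str, bool], category: str) -> Optional[bool]:
    """Return the stated preference for a category, walking up to parents.

    None means no preference was expressed for the category or any parent.
    """
    current: Optional[str] = category
    while current is not None:
        if current in prefs:
            return prefs[current]
        current = PARENT.get(current)
    return None


if __name__ == "__main__":
    # e.g. the value of a robots.txt line "Content-Usage: train-ai=n, search=y"
    prefs = parse_content_usage("train-ai=n, search=y")
    print(preference_for(prefs, "train-genai"))  # False (inherited from train-ai)
    print(preference_for(prefs, "search"))       # True
    print(preference_for(prefs, "bots"))         # None (nothing stated)
```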
---
### 2. TDM-AI Protocol

**What it is:** Cryptographically binds creator preferences to content using content-derived fingerprints (International Standard Content Code, or ISCC, identifiers). Because the identifier is computed from the content itself, preferences travel with the content even if metadata is stripped or files are moved.

**Governance:** Developed by [Liccium B.V.](https://liccium.com/), a Dutch rights management company. The [CommonsDB](https://openfuture.eu/project/commonsdb/) registry implementation is EU co-funded.

**Status:** Active development; the CommonsDB registry is due in July 2026.

**Where to participate:**
- [TDM-AI documentation](https://docs.tdmai.org/)
- [GitHub repository](https://github.com/liccium/tdmai)
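The property that lets TDM-AI preferences survive stripped metadata is that the lookup key is derived from the content bytes, not from the filename or embedded metadata. The sketch below illustrates only that lookup pattern, under loudly stated simplifications: a plain SHA-256 digest stands in for an ISCC code, and the in-memory `PreferenceRegistry`, the `MediaFile` type, and the `"no-ai-training"` preference string are hypothetical, not the TDM-AI or CommonsDB API. Unlike SHA-256, ISCC content codes are designed to be similarity-preserving, so a real registry can also match re-encoded or lightly modified copies; the cryptographic binding step is omitted here.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional


def content_id(content: bytes) -> str:
    """Stand-in for an ISCC code: SHA-256 over the content bytes only.

    Filenames and metadata are deliberately excluded, so they cannot affect
    the identifier. Unlike ISCC, any single-byte change yields a new ID.
    """
    return hashlib.sha256(content).hexdigest()


@dataclass
class MediaFile:
    name: str
    metadata: Dict[str, str]
    content: bytes


@dataclass
class PreferenceRegistry:
    """Hypothetical in-memory registry: content ID -> declared preference."""
    entries: Dict[str, str] = field(default_factory=dict)

    def declare(self, f: MediaFile, preference: str) -> str:
        cid = content_id(f.content)
        self.entries[cid] = preference
        return cid

    def lookup(self, f: MediaFile) -> Optional[str]:
        return self.entries.get(content_id(f.content))


if __name__ == "__main__":
    registry = PreferenceRegistry()
    original = MediaFile("photo.jpg", {"creator": "A. Artist"}, b"\x89fake-image-bytes")
    registry.declare(original, "no-ai-training")

    # Renamed copy with metadata stripped: same content bytes, same ID.
    copy = MediaFile("IMG_0001.jpg", {}, original.content)
    print(registry.lookup(copy))  # no-ai-training
```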
---
### 3. Really Simple Licensing (RSL)

**What it is:** A machine-readable licensing standard built on RSS that adds compensation models (pay-per-crawl, pay-per-inference) and content protection. It includes the RSL Collective for smaller publishers and creators.

**Governance:** The standard is maintained by the RSL Technical Steering Committee, whose members represent major publishing and technology companies. The [RSL Collective](https://rslcollective.org/) is a nonprofit collective rights organization, free to join.

**Status:** RSL 1.0 was published in December 2025.

**Where to participate:**
- [RSL Standard](https://rslstandard.org/)
- [GitHub repository](https://github.com/rslstandard/rsl)
- [RSL Collective](https://rslcollective.org/) (nonprofit, free to join)
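Before a crawler can honor an RSL license, it has to find the license document. The RSL standard describes advertising a site's license file through a `License:` directive in robots.txt; this sketch assumes that directive name and uses a hypothetical example URL, so both should be checked against the published specification. It covers only the discovery step, collecting license URLs from a robots.txt body; fetching and interpreting the license itself (permitted uses, pay-per-crawl or pay-per-inference terms) is out of scope.

```python
from typing import List


def license_urls(robots_txt: str) -> List[str]:
    """Collect URLs from 'License:' lines in a robots.txt body.

    Assumes an RSL-style 'License: <url>' directive. The field name is
    matched case-insensitively, as with other robots.txt fields, and
    anything after '#' is treated as a comment and dropped.
    """
    urls: List[str] = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        name, sep, value = line.partition(":")  # split on the first colon only
        if sep and name.strip().lower() == "license" and value.strip():
            urls.append(value.strip())
    return urls


if __name__ == "__main__":
    robots = """User-agent: *
Allow: /
License: https://example.com/license.xml  # hypothetical URL
"""
    print(license_urls(robots))  # ['https://example.com/license.xml']
```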
---
### 4. Creative Commons Signals

**What it is:** A framework for data stewards (organizations managing large content collections, such as Common Crawl, libraries, and data cooperatives) to express reciprocity conditions for AI reuse. It defines four signal types: Credit, Direct Contribution, Ecosystem Contribution, and Open.

**Governance:** Led by [Creative Commons](https://creativecommons.org/), a nonprofit organization.

**Status:** Draft proposal; community input is being sought via GitHub.

**Where to participate:**
- [CC Signals overview](https://creativecommons.org/ai-and-the-commons/cc-signals/)
- [GitHub repository and discussions](https://github.com/creativecommons/cc-signals)
---
### 5. CodeCommons / AI Preference Attachment for Software

**What it is:** A project that builds on the [Software Heritage](https://www.softwareheritage.org/) archive to create ethical AI training datasets. It proposes embedding AI preferences directly in file headers so they travel with code when it is copied or vendored into other projects.

**Governance:** Funded by the French government through the France 2030 program, with academic and institutional partners in France and Italy.

**Status:** Under active development.

**Where to participate:**
- [CodeCommons project](https://codecommons.org/)
- [Software Heritage GitLab](https://gitlab.softwareheritage.org/)
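Attaching preferences to file headers means the signal lives inside the file, so it survives copying and vendoring much as an SPDX license-identifier comment does. The exact header syntax CodeCommons proposes is not specified in this document, so the `AI-Preference:` tag below is purely hypothetical, and its value reuses a simplified `label=y|n` form for illustration. The sketch scans only the first few lines of a source file, assuming the tag appears in a leading comment.

```python
import re
from typing import Optional

# Hypothetical header tag; CodeCommons' actual syntax may differ.
# Accepts '#', '//', '/*' and ' * ' comment prefixes and an optional closing '*/'.
HEADER_TAG = re.compile(
    r"^\s*(?:#|//|/\*|\*)\s*AI-Preference:\s*(?P<value>.+?)\s*(?:\*/)?\s*$"
)


def header_preference(source: str, max_lines: int = 20) -> Optional[str]:
    """Return the value of an 'AI-Preference:' comment near the top of a file."""
    for line in source.splitlines()[:max_lines]:
        match = HEADER_TAG.match(line)
        if match:
            return match.group("value")
    return None


if __name__ == "__main__":
    vendored = """// Copied from an upstream project
// AI-Preference: train-ai=n
int add(int a, int b) { return a + b; }
"""
    print(header_preference(vendored))  # train-ai=n
```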
---
### 6. Cloudflare AI Crawl Control

**What it is:** A service that gives website owners visibility into and control over AI crawler access, with options to block, allow, or charge for access at the network edge.

**Governance:** A commercial product developed and controlled by [Cloudflare](https://www.cloudflare.com/).

**Status:** Launched in July 2025.

**Where to follow:**
- [Cloudflare AI Crawl Control](https://www.cloudflare.com/ai-crawl-control/)
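Conceptually, edge-level control reduces to a per-crawler policy decision made before a request reaches the origin server. The sketch below is not Cloudflare's API or configuration format; it is a hypothetical policy table and decision function showing how block, allow, and charge outcomes could map onto HTTP status codes. Answering an unpaid charge decision with `402 Payment Required` mirrors how Cloudflare has described its pay-per-crawl option, but the crawler names, prices, and the `decide` function are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    CHARGE = "charge"


@dataclass(frozen=True)
class Rule:
    action: Action
    price_usd: float = 0.0  # only meaningful for CHARGE


# Hypothetical per-crawler policy set by a site owner.
POLICY: Dict[str, Rule] = {
    "ExampleSearchBot": Rule(Action.ALLOW),
    "ExampleTrainingBot": Rule(Action.CHARGE, price_usd=0.01),
}
DEFAULT = Rule(Action.BLOCK)  # crawlers not listed in POLICY are blocked


def decide(crawler: str, max_price_usd: Optional[float] = None) -> int:
    """Return the HTTP status an edge proxy might send for this crawler."""
    rule = POLICY.get(crawler, DEFAULT)
    if rule.action is Action.ALLOW:
        return 200
    if rule.action is Action.CHARGE:
        # Serve only if the crawler has declared it will pay at least the price.
        if max_price_usd is not None and max_price_usd >= rule.price_usd:
            return 200
        return 402  # Payment Required
    return 403  # Forbidden
```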
---
## Cross-Cutting Themes

A few shared challenges across all six initiatives, drawn from Maffulli's analysis:

**The acquisition layer:** Most signals operate at the moment of crawling. What happens to data after it is downloaded, and how preferences follow it through training and use, remains an open problem across all initiatives.

**Attribution is evolving:** Attribution in the AI era is likely to attach to datasets rather than individual works. What meaningful attribution looks like in practice is still being defined.

**Balancing access and protection:** Restricting AI crawler access can unintentionally affect beneficial uses alongside harmful ones.

**Global representation:** These initiatives are currently concentrated in Western Europe and North America.

---

*Contributions, corrections, and additions to this document are welcome via pull request.*
