Description
Problem
The doc/data-templates.adoc file is currently a TODO placeholder (only 4 lines), but it's linked from the README.adoc as a "How-to/recipe" guide (line 27).
New contributors who click this link expecting guidance on creating UDB data files find only "TODO", which creates a poor user experience and slows down the contribution process.
Impact
Current state:
- New contributors have no guidance on UDB data structure
- No examples of how to create CSR definitions
- No examples of how to create instruction definitions
- No examples of how to create extension definitions
- Slows down onboarding and contribution process
Affected users:
- New contributors trying to add CSRs
- Developers migrating from riscv-opcodes
- Anyone trying to understand UDB data format
Proposed Solution
Complete doc/data-templates.adoc with comprehensive documentation including:
1. CSR Definition Template
- Real example based on a simple CSR (using mstatus fields as reference)
- Explanation of each field (location, long_name, description, type(), reset_value())
- How to handle RV32 vs RV64 differences (location_rv32 vs location_rv64)
- How to specify extension dependencies using definedBy
- Examples of different field types (RO, RW, WARL, etc.)
2. Instruction Definition Template
- Basic instruction structure
- Encoding specification
- Assembly format
- Link to existing examples in spec/std/isa/inst/
3. Extension Definition Template
- How to define a new extension
- Required fields and their meanings
- Link to existing examples in spec/std/isa/ext/
4. Common Patterns
- Config-dependent fields
- Multi-XLEN support (RV32/RV64/RV128)
- Parameter definitions
- IDL function usage
5. Validation Steps
- How to validate new data files
- Running schema validation
- Running tests
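To make items 1 and 5 more concrete, the kind of check the guide could walk through might look like the sketch below. It is illustrative only: the key names are the ones listed above, but the dictionary layout, the `REQUIRED_FIELD_KEYS` set, and the helpers `missing_keys` and `pick_location` are hypothetical and do not reflect UDB's actual schema or validation tooling.

```python
# Hypothetical sketch: key names come from the outline above; the record
# layout and helper names are assumptions, not UDB's real schema or tools.

REQUIRED_FIELD_KEYS = {"long_name", "description", "type", "reset_value"}


def missing_keys(field: dict) -> set:
    """Return the required keys that a CSR field record does not provide."""
    missing = REQUIRED_FIELD_KEYS - set(field)
    # A field needs either a single location or a per-XLEN pair.
    has_single = "location" in field
    has_pair = "location_rv32" in field and "location_rv64" in field
    if not (has_single or has_pair):
        missing.add("location")
    return missing


def pick_location(field: dict, xlen: int):
    """Choose the bit location that applies for the given XLEN (32 or 64)."""
    if xlen not in (32, 64):
        raise ValueError(f"unsupported XLEN: {xlen}")
    if "location" in field:
        return field["location"]
    return field[f"location_rv{xlen}"]


# Made-up field whose position differs between RV32 and RV64.
example_field = {
    "long_name": "Example field",
    "description": "Illustrative only.",
    "type": "RO",
    "reset_value": 0,
    "location_rv32": 31,
    "location_rv64": 63,
}

if __name__ == "__main__":
    print(missing_keys(example_field))
    print(pick_location(example_field, 32))
    print(pick_location(example_field, 64))
```

The final document would replace this with the project's real schema validation and test commands; the sketch only shows the two ideas a new contributor has to grasp, namely which keys a field needs and how a per-XLEN location is chosen.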
Context
I am a prospective LFX Spring '26 mentee for the "AI-assisted extraction of architectural parameters from RISC-V specifications" project.
I've studied the UDB structure extensively while developing an LLM-based parameter extraction proof-of-concept:
- Analyzed mstatus.yaml (627 lines, 22 fields)
- Studied misa.yaml and parameter definitions
- Tested LLM extraction achieving 98% accuracy
I believe I can create comprehensive, beginner-friendly documentation based on my understanding of the UDB data structure.
Proposed Changes
- Complete doc/data-templates.adoc with ~200-300 lines of documentation
- Include real, working examples from existing UDB data
- Add clear explanations suitable for new contributors
- Link to relevant schema documentation
- Include validation workflow
Acceptance Criteria
- File contains real, working examples (not just placeholders)
- Examples are well-commented and explained
- Covers CSR, instruction, and extension templates at minimum
- Includes validation steps
- Links to relevant schema documentation (schemas.adoc, idl.adoc)
- Renders correctly as AsciiDoc
- Helps new contributors understand UDB data structure
Questions
- Should I include examples for all data types (CSR, instruction, extension, parameter) or focus on CSRs first?
- Should I include IDL function examples or keep it simple?
- Any specific CSR you'd prefer as the main example besides mstatus?
I'm happy to adjust the scope based on maintainer feedback.