
docs: complete data-templates.adoc with CSR and instruction examples #1343

@saurabh12nxf

Description


Problem

The doc/data-templates.adoc file is currently a four-line TODO placeholder, but README.adoc links to it as a "How-to/recipe" guide (line 27).

New contributors who follow this link expecting guidance on creating UDB data files find only "TODO". That makes for a poor first experience and slows down contributions.

Impact

Current state:

  • New contributors have no guidance on UDB data structure
  • No examples of how to create CSR definitions
  • No examples of how to create instruction definitions
  • No examples of how to create extension definitions
  • Slows down onboarding and the contribution process

Affected users:

  • New contributors trying to add CSRs
  • Developers migrating from riscv-opcodes
  • Anyone trying to understand UDB data format

Proposed Solution

Complete doc/data-templates.adoc with comprehensive documentation including:

1. CSR Definition Template

  • Real example based on a simple CSR (using mstatus fields as reference)
  • Explanation of each field (location, long_name, description, type(), reset_value())
  • How to handle RV32 vs RV64 differences (location_rv32 vs location_rv64)
  • How to specify extension dependencies using definedBy
  • Examples of different field types (RO, RW, WARL, etc.)
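To make the RV32/RV64 point concrete, here is a minimal Python sketch of the rule a CSR field's location has to satisfy. It assumes a field gives either a single location (a bit number such as 3, or an "msb-lsb" range such as "12-11") or a location_rv32/location_rv64 pair, and that the bits must fit within XLEN. The dict shape and the helper names parse_location and field_location are hypothetical, chosen for illustration; the CSR schema in the repository is the authority on the real format.

```python
# Hypothetical sketch: checks the location rules for one CSR field,
# represented as a plain dict (roughly what a parsed YAML field could look like).


def parse_location(loc):
    """Turn "12-11" into (12, 11) and 3 (or "3") into (3, 3)."""
    text = str(loc)
    if "-" in text:
        msb, lsb = (int(part) for part in text.split("-"))
    else:
        msb = lsb = int(text)
    if msb < lsb:
        raise ValueError(f"msb {msb} is below lsb {lsb}")
    return msb, lsb


def field_location(field, xlen):
    """Return (msb, lsb) of the field for the given XLEN (32 or 64)."""
    has_common = "location" in field
    has_split = "location_rv32" in field or "location_rv64" in field
    # Assumed rule: exactly one of the two styles must be used.
    if has_common == has_split:
        raise ValueError("use either 'location' or 'location_rv32'/'location_rv64'")
    key = "location" if has_common else f"location_rv{xlen}"
    if key not in field:
        raise ValueError(f"missing {key}")
    msb, lsb = parse_location(field[key])
    if msb >= xlen:
        raise ValueError(f"bit {msb} does not fit in a {xlen}-bit CSR")
    return msb, lsb


mpp = {"location": "12-11"}                       # mstatus.MPP, same bits on RV32 and RV64
sd = {"location_rv32": 31, "location_rv64": 63}   # mstatus.SD is always the top bit
print(field_location(mpp, 64))                    # (12, 11)
print(field_location(sd, 32), field_location(sd, 64))  # (31, 31) (63, 63)
```

The split form only earns its place when a field actually moves with XLEN, as SD does; fields at fixed positions can keep a single location.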

2. Instruction Definition Template

  • Basic instruction structure
  • Encoding specification
  • Assembly format
  • Link to existing examples in spec/std/isa/inst/
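For the encoding bullet, a short Python sketch of how a match string constrains an instruction word may help. It assumes encodings are written as a 32-character string, most significant bit first, with fixed bits as 0/1 and operand bits as "-"; the files in spec/std/isa/inst/ should be checked for the exact format UDB uses. The helpers matches and encode_i_type are hypothetical; the ADDI pattern follows the base ISA (opcode 0010011, funct3 000).

```python
# Hypothetical sketch: does an instruction word fit a match string?


def matches(pattern, word):
    """True if a 32-bit instruction word fits a match pattern.

    pattern[0] is bit 31 and pattern[31] is bit 0; '-' marks a bit that
    belongs to an operand field and may take any value.
    """
    if len(pattern) != 32:
        raise ValueError("pattern must be 32 characters")
    for i, ch in enumerate(pattern):
        bit = (word >> (31 - i)) & 1
        if ch != "-" and int(ch) != bit:
            return False
    return True


def encode_i_type(opcode, funct3, rd, rs1, imm):
    """Pack an I-type instruction: imm[11:0] | rs1 | funct3 | rd | opcode."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


# imm[11:0] and rs1 (17 bits), funct3, rd (5 bits), opcode
ADDI = "-" * 17 + "000" + "-" * 5 + "0010011"

word = encode_i_type(0b0010011, 0b000, rd=1, rs1=0, imm=5)  # addi x1, x0, 5
print(hex(word), matches(ADDI, word))  # 0x500093 True
```

Only the fixed bits decide which instruction a word is; the "-" positions are where the operand fields described by the assembly format live.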

3. Extension Definition Template

  • How to define a new extension
  • Required fields and their meanings
  • Link to existing examples in spec/std/isa/ext/

4. Common Patterns

  • Config-dependent fields
  • Multi-XLEN support (RV32/RV64/RV128)
  • Parameter definitions
  • IDL function usage

5. Validation Steps

  • How to validate new data files
  • Running schema validation
  • Running tests
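As a rough illustration of what schema validation catches, the sketch below checks that a parsed data file carries the keys expected for its kind. The "kind" key and the REQUIRED_KEYS sets are assumptions made up for this example. The JSON schemas documented in schemas.adoc define the real requirements, and real validation should go through the repository's own tooling rather than a hand-written check like this.

```python
# Hypothetical sketch: the required keys below are illustrative only;
# the repository's JSON schemas are the real source of truth.
REQUIRED_KEYS = {
    "csr": {"name", "long_name", "address", "length", "definedBy", "fields"},
    "instruction": {"name", "long_name", "definedBy", "assembly", "encoding"},
    "extension": {"name", "long_name", "versions"},
}


def missing_keys(doc):
    """Return a sorted list of problems for one parsed data file."""
    kind = doc.get("kind")
    if kind not in REQUIRED_KEYS:
        return [f"unknown kind: {kind!r}"]
    # set.difference accepts any iterable; iterating a dict yields its keys
    return sorted(REQUIRED_KEYS[kind].difference(doc))


print(missing_keys({"kind": "csr", "name": "mstatus"}))
# ['address', 'definedBy', 'fields', 'length', 'long_name']
```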

Context

I am a prospective LFX Spring '26 mentee for the "AI-assisted extraction of architectural parameters from RISC-V specifications" project.

I've studied the UDB structure extensively while developing an LLM-based parameter extraction proof-of-concept:

  • Analyzed mstatus.yaml (627 lines, 22 fields)
  • Studied misa.yaml and parameter definitions
  • Tested LLM extraction achieving 98% accuracy

I believe I can create comprehensive, beginner-friendly documentation based on my understanding of the UDB data structure.

Proposed Changes

  • Complete doc/data-templates.adoc with ~200-300 lines of documentation
  • Include real, working examples from existing UDB data
  • Add clear explanations suitable for new contributors
  • Link to relevant schema documentation
  • Include validation workflow

Acceptance Criteria

  • File contains real, working examples (not just placeholders)
  • Examples are well-commented and explained
  • Covers CSR, instruction, and extension templates at minimum
  • Includes validation steps
  • Links to relevant schema documentation (schemas.adoc, idl.adoc)
  • Renders correctly as AsciiDoc
  • Helps new contributors understand UDB data structure

Questions

  1. Should I include examples for all data types (CSR, instruction, extension, parameter) or focus on CSRs first?
  2. Should I include IDL function examples or keep it simple?
  3. Any specific CSR you'd prefer as the main example besides mstatus?

I'm happy to adjust the scope based on maintainer feedback.
