| 1 | +# Surelog Preprocessor Architecture |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +Surelog's preprocessor is not a simple text-replacement engine. It is a **full ANTLR-based parser** with its own grammar, data structures, and persistent cache. This architecture lets it track file locations, macro expansions, and includes accurately throughout preprocessing, information that later compilation stages depend on. |
| 6 | + |
| 7 | +## Preprocessing Strategy |
| 8 | + |
| 9 | +The preprocessor performs the following transformations: |
| 10 | +- **Include files are replaced by their contents**: When an `` `include `` directive is encountered, the entire file content is inserted in place of the directive |
| 11 | +- **Macro definitions are carved out**: Macro definition directives are removed from the preprocessed output but replaced with **equivalent whitespace** to preserve line numbering |
| 12 | +- **Whitespace preservation**: Macro-heavy files can end up with large runs of whitespace and blank lines. This is intentional: it greatly simplifies reconstructing file/line information, especially when macros and includes are nested (imbricated) in complex ways |
| 13 | + |
| 14 | +This whitespace-preserving strategy ensures that line numbers in the preprocessed output maintain a direct correspondence with the original source, making it straightforward to map errors and debug information back to the correct locations even through multiple levels of macro expansion and file inclusion. |
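The carving-out step can be illustrated with a minimal sketch. It assumes the directive's character range is already known; `blankRange` and `lineOf` are hypothetical helpers written for this illustration and are not part of Surelog.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not Surelog code): overwrite [begin, end) with spaces
// but keep every newline, so all following text stays on the same line.
std::string blankRange(std::string text, std::size_t begin, std::size_t end) {
  assert(begin <= end);
  end = std::min(end, text.size());
  for (std::size_t i = begin; i < end; ++i) {
    if (text[i] != '\n') text[i] = ' ';
  }
  return text;
}

// 1-based line number of the first occurrence of `needle`, or 0 if absent.
std::size_t lineOf(const std::string& text, const std::string& needle) {
  const std::size_t pos = text.find(needle);
  if (pos == std::string::npos) return 0;
  const auto last = text.begin() + static_cast<std::ptrdiff_t>(pos);
  return 1 + static_cast<std::size_t>(std::count(text.begin(), last, '\n'));
}
```

Blanking a two-line macro definition this way leaves the code after it on the same line as before, which is the property the whitespace-preserving strategy relies on.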
| 15 | + |
| 16 | +## Key Components |
| 17 | + |
| 18 | +### 1. ANTLR Grammar and Parser |
| 19 | + |
| 20 | +The preprocessor uses a dedicated ANTLR 4 grammar separate from the main SystemVerilog parser: |
| 21 | + |
| 22 | +- **Grammar Files**: |
| 23 | + - [`grammar/SV3_1aPpParser.g4`](grammar/SV3_1aPpParser.g4) - Preprocessor parser grammar |
| 24 | + - [`grammar/SV3_1aPpLexer.g4`](grammar/SV3_1aPpLexer.g4) - Preprocessor lexer grammar |
| 25 | + |
| 26 | +The parser grammar defines rules for all SystemVerilog preprocessor directives including: |
| 27 | +- `ifdef_directive`, `ifndef_directive`, `else_directive`, `elsif_directive`, `endif_directive` |
| 28 | +- `include_directive` |
| 29 | +- `macro_definition`, `macro_usage` |
| 30 | +- `undef_directive`, `resetall_directive` |
| 31 | +- `timescale_directive`, `line_directive` |
| 32 | +- And many more preprocessor-specific constructs |
| 33 | + |
| 34 | +### 2. IncludeFileInfo Data Structure |
| 35 | + |
| 36 | +The [`IncludeFileInfo`](include/Surelog/SourceCompile/IncludeFileInfo.h) class is the core data structure for maintaining file and line tracking information: |
| 37 | + |
| 38 | +```cpp |
| 39 | +class IncludeFileInfo { |
| 40 | +public: |
| 41 | + enum class Context : uint32_t { NONE = 0, INCLUDE = 1, MACRO = 2 }; |
| 42 | + enum class Action : uint32_t { NONE = 0, PUSH = 1, POP = 2 }; |
| 43 | + |
| 44 | + // Tracks: |
| 45 | + // - Context (include file or macro expansion) |
| 46 | + // - Section start line in preprocessed output |
| 47 | + // - Original file location (line, column) |
| 48 | + // - Push/Pop actions for nested contexts |
| 49 | + // - Opening/closing indices for paired directives |
| 50 | +}; |
| 51 | +``` |
| 52 | + |
| 53 | +Key fields: |
| 54 | +- `m_context`: Whether this is an `INCLUDE` file or `MACRO` expansion |
| 55 | +- `m_sectionStartLine`: Line number in the preprocessed output |
| 56 | +- `m_sectionFileId`/`m_sectionSymbolId`: References to the included file or macro |
| 57 | +- `m_originalStartLine`/`m_originalStartColumn`: Original source location |
| 58 | +- `m_action`: `PUSH` when entering a context, `POP` when exiting |
| 59 | +- `m_indexOpening`/`m_indexClosing`: Paired indices for matching push/pop operations |
| 60 | + |
| 61 | +### 3. SV3_1aPpTreeShapeListener |
| 62 | + |
| 63 | +The [`SV3_1aPpTreeShapeListener`](src/SourceCompile/SV3_1aPpTreeShapeListener.cpp) class (derived from `SV3_1aPpTreeListenerHelper`) is the ANTLR tree listener that processes the parse tree and maintains the `IncludeFileInfo` records. |
| 64 | + |
| 65 | +Key responsibilities: |
| 66 | + |
| 67 | +#### Include File Processing |
| 68 | +When processing `` `include `` directives ([lines 309-388](src/SourceCompile/SV3_1aPpTreeShapeListener.cpp#L309-L388)): |
| 69 | +```cpp |
| 70 | +// Records PUSH action when entering include file |
| 71 | +openingIndex = m_pp->getSourceFile()->addIncludeFileInfo( |
| 72 | + IncludeFileInfo::Context::INCLUDE, |
| 73 | + 1, // sectionStartLine |
| 74 | + symbolId, // include file symbol |
| 75 | + fileId, // include file path |
| 76 | + // ... original location info |
| 77 | + IncludeFileInfo::Action::PUSH); |
| 78 | + |
| 79 | +// Records POP action when exiting include file |
| 80 | +closingIndex = m_pp->getSourceFile()->addIncludeFileInfo( |
| 81 | + IncludeFileInfo::Context::INCLUDE, |
| 82 | + // ... location info |
| 83 | + IncludeFileInfo::Action::POP, |
| 84 | + openingIndex, 0); |
| 85 | +``` |
| 86 | + |
| 87 | +#### Macro Expansion Tracking |
| 88 | +When processing macro expansions ([lines 503-594](src/SourceCompile/SV3_1aPpTreeShapeListener.cpp#L503-L594)): |
| 89 | +```cpp |
| 90 | +// Records PUSH when entering macro expansion |
| 91 | +openingIndex = m_pp->getSourceFile()->addIncludeFileInfo( |
| 92 | + IncludeFileInfo::Context::MACRO, |
| 93 | + macroInf->m_startLine, |
| 94 | + BadSymbolId, |
| 95 | + macroInf->m_fileId, |
| 96 | + // ... location info |
| 97 | + IncludeFileInfo::Action::PUSH); |
| 98 | + |
| 99 | +// Records POP when exiting macro expansion |
| 100 | +closingIndex = m_pp->getSourceFile()->addIncludeFileInfo( |
| 101 | + IncludeFileInfo::Context::MACRO, |
| 102 | + // ... location info |
| 103 | + IncludeFileInfo::Action::POP, |
| 104 | + openingIndex, 0); |
| 105 | +``` |
| 106 | + |
| 107 | +#### Nested Context Tracking |
| 108 | +The same `IncludeFileInfo` mechanism handles complex nested scenarios, including: |
| 109 | +- **Macro expansions that contain `` `include `` directives**: When a macro body contains an `` `include `` statement, the preprocessor tracks both the macro expansion context (MACRO) and the subsequent file inclusion context (INCLUDE) |
| 110 | +- **Include files that define and use macros**: Files brought in via `` `include `` can define macros that are then expanded, creating nested INCLUDE→MACRO contexts |
| 111 | +- **Nested macro expansions**: When macros expand to other macro invocations, multiple MACRO contexts are pushed and popped in sequence |
| 112 | + |
| 113 | +This unified tracking system ensures that any combination of macro expansions and file includes maintains accurate source location information through arbitrary levels of nesting. |
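The push/pop pairing can be sketched as follows. `Record`, `push`, `pop`, and `isWellNested` are simplified, hypothetical stand-ins for this illustration. Back-patching the opener's closing index is an assumption about how the pairs get linked, not a copy of Surelog's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stack>
#include <vector>

// Simplified stand-in for IncludeFileInfo; names are illustrative only.
enum class Context : std::uint32_t { NONE = 0, INCLUDE = 1, MACRO = 2 };
enum class Action : std::uint32_t { NONE = 0, PUSH = 1, POP = 2 };

struct Record {
  Context context = Context::NONE;
  Action action = Action::NONE;
  std::int32_t indexOpening = -1;  // on a POP: index of the matching PUSH
  std::int32_t indexClosing = -1;  // on a PUSH: index of the matching POP
};

// Append a PUSH record and return its index.
std::int32_t push(std::vector<Record>& records, Context context) {
  Record r;
  r.context = context;
  r.action = Action::PUSH;
  records.push_back(r);
  return static_cast<std::int32_t>(records.size()) - 1;
}

// Append a POP record linked to `openingIndex` and back-patch the opener.
std::int32_t pop(std::vector<Record>& records, Context context,
                 std::int32_t openingIndex) {
  assert(openingIndex >= 0 &&
         static_cast<std::size_t>(openingIndex) < records.size());
  Record r;
  r.context = context;
  r.action = Action::POP;
  r.indexOpening = openingIndex;
  records.push_back(r);
  const std::int32_t closingIndex =
      static_cast<std::int32_t>(records.size()) - 1;
  records[static_cast<std::size_t>(openingIndex)].indexClosing = closingIndex;
  return closingIndex;
}

// True when every POP closes the innermost open PUSH of the same context.
bool isWellNested(const std::vector<Record>& records) {
  std::stack<std::int32_t> open;
  for (std::size_t i = 0; i < records.size(); ++i) {
    const Record& r = records[i];
    if (r.action == Action::PUSH) {
      open.push(static_cast<std::int32_t>(i));
    } else if (r.action == Action::POP) {
      if (open.empty() || open.top() != r.indexOpening) return false;
      const Record& opener = records[static_cast<std::size_t>(open.top())];
      if (opener.context != r.context) return false;
      open.pop();
    }
  }
  return open.empty();
}
```

For an include file that expands a macro, the sequence is push INCLUDE, push MACRO, pop MACRO, pop INCLUDE. The check fails if the INCLUDE is popped while the MACRO is still open.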
| 114 | + |
| 115 | +#### Debug Output |
| 116 | +The `-d incl` command-line option outputs the entire `IncludeFileInfo` data structure to stderr for debugging purposes. This is implemented in [`PreprocessFile::reportIncludeInfo()`](src/SourceCompile/PreprocessFile.cpp#L659-L677) and called from [`CompileSourceFile`](src/SourceCompile/CompileSourceFile.cpp#L244-L245): |
| 117 | + |
| 118 | +```cpp |
| 119 | +if (m_commandLineParser->getDebugIncludeFileInfo()) |
| 120 | + std::cerr << m_pp->reportIncludeInfo(); |
| 121 | +``` |
| 122 | + |
| 123 | +The debug output shows: |
| 124 | +- Context type ("inc" for INCLUDE, "mac" for MACRO) |
| 125 | +- Original source location (line,column:endLine,endColumn) |
| 126 | +- Action ("in" for PUSH, "out" for POP) |
| 127 | +- File paths and symbols involved |
| 128 | +- Opening/closing index pairs |
| 129 | + |
| 130 | +This provides a complete trace of all preprocessor context changes, invaluable for debugging complex preprocessing issues. |
| 131 | + |
| 132 | +### 4. Cache Persistence Mechanism |
| 133 | + |
| 134 | +The preprocessor output and associated `IncludeFileInfo` records are persisted to disk using Cap'n Proto serialization: |
| 135 | + |
| 136 | +#### PPCache Class |
| 137 | +[`PPCache`](src/Cache/PPCache.cpp) handles saving and restoring preprocessor state: |
| 138 | + |
| 139 | +- **Saving** ([`PreprocessFile::saveCache()`](src/SourceCompile/PreprocessFile.cpp#L1289-L1306)): |
| 140 | + ```cpp |
| 141 | + void PreprocessFile::saveCache() { |
| 142 | + if (!m_usingCachedVersion) { |
| 143 | + PPCache cache(this); |
| 144 | + cache.save(); |
| 145 | + } |
| 146 | + } |
| 147 | + ``` |
| 148 | + |
| 149 | +- **Restoring** ([lines 362-365](src/SourceCompile/PreprocessFile.cpp#L362-L365)): |
| 150 | + ```cpp |
| 151 | + PPCache cache(this); |
| 152 | + if (cache.restore(clp->lowMem() || clp->noCacheHash())) { |
| 153 | + m_usingCachedVersion = true; |
| 154 | + } |
| 155 | + ``` |
| 156 | + |
| 157 | +#### IncludeFileInfo Serialization |
| 158 | +The [`cacheIncludeFileInfos()`](src/Cache/PPCache.cpp#L419-L438) method serializes the `IncludeFileInfo` vector: |
| 159 | +```cpp |
| 160 | +void PPCache::cacheIncludeFileInfos(::PPCache::Builder builder, |
| 161 | + SymbolTable& targetSymbols, |
| 162 | + const SymbolTable& sourceSymbols) { |
| 163 | + const std::vector<IncludeFileInfo>& sourceIncludeFileInfos = |
| 164 | + m_pp->getIncludeFileInfo(); |
| 165 | + |
| 166 | + // Serialize each IncludeFileInfo record |
| 167 | + for (const IncludeFileInfo& sourceIncludeFileInfo : sourceIncludeFileInfos) { |
| 168 | + // Copy symbol IDs and path IDs to cache |
| 169 | + // Store all context, action, and location information |
| 170 | + } |
| 171 | +} |
| 172 | +``` |
| 173 | + |
| 174 | +## Data Flow |
| 175 | + |
| 176 | +1. **Parsing Phase**: ANTLR parses the input file using the preprocessor grammar |
| 177 | +2. **Tree Walking**: `SV3_1aPpTreeShapeListener` walks the parse tree: |
| 178 | + - Processes each preprocessor directive |
| 179 | + - Creates `IncludeFileInfo` records for includes and macros |
| 180 | + - Maintains push/pop pairs for proper nesting |
| 181 | +3. **Output Generation**: Preprocessed text is generated alongside the `IncludeFileInfo` records |
| 182 | +4. **Caching**: Both the preprocessed text and `IncludeFileInfo` are serialized to the cache directory |
| 183 | +5. **Parser Integration**: When the main parser processes the preprocessed file, it uses the `IncludeFileInfo` records to: |
| 184 | + - Map locations in preprocessed text back to original source files |
| 185 | + - Provide accurate error reporting with original file locations |
| 186 | + - Support debugging and source navigation |
| 187 | + |
| 188 | +## Benefits of This Architecture |
| 189 | + |
| 190 | +1. **Accurate Location Tracking**: Every token in the preprocessed output can be mapped back to its original source location |
| 191 | +2. **Performance**: Caching preprocessed files avoids redundant preprocessing |
| 192 | +3. **Error Reporting**: Error messages reference the original source files, not the preprocessed text |
| 193 | +4. **Debugging Support**: Tools can navigate from preprocessed code back to original sources |
| 194 | +5. **Incremental Compilation**: Cached preprocessor output enables faster incremental builds |
| 195 | + |
| 196 | +## Integration with Main Parser |
| 197 | + |
| 198 | +The main SystemVerilog parser ([`grammar/SV3_1aParser.g4`](grammar/SV3_1aParser.g4)) consumes the preprocessed output along with the `IncludeFileInfo` records. This allows it to: |
| 199 | +- Report errors at original source locations |
| 200 | +- Build accurate ASTs with proper file/line information |
| 201 | +- Support IDE features like "go to definition" across include boundaries |
| 202 | + |
| 203 | +## Command Line Options |
| 204 | + |
| 205 | +The preprocessor behavior can be controlled through various command-line options: |
| 206 | + |
| 207 | +### Include Paths and Defines |
| 208 | +- **`+incdir+<dir>`**: Add include search paths for `` `include `` directives |
| 209 | +- **`+define+<name>=<value>`**: Define preprocessor macros from the command line |
| 210 | +- **`-D <name>=<value>`**: Alternative syntax for defining macros |
| 211 | + |
| 212 | +### Preprocessor Output |
| 213 | +- **`-writepp`**: Write preprocessed output files to `slpp_all/` or `slpp_unit/` directory |
| 214 | +- **`-writeppfile <file>`**: Write preprocessed output to a specific file |
| 215 | +- **`-lineoffsetascomments`**: Add line offset information as comments in preprocessed output |
| 216 | + |
| 217 | +### Caching Control |
| 218 | +- **`-nocache`**: Disable reading cached preprocessor output (forces re-preprocessing) |
| 219 | +- **`-createcache`**: Create cache for precompiled packages |
| 220 | +- **`-cachedir <dir>`**: Specify cache directory location |
| 221 | + |
| 222 | +### Filtering Options |
| 223 | +- **`-filterprotectedregions`**: Filter out synthesis pragma protected regions |
| 224 | +- **`-filtercomments`**: Remove comments from preprocessed output |
| 225 | +- **`-filterdirectives`**: Filter out simple preprocessor directives |
| 226 | + |
| 227 | +### Debug Options |
| 228 | +- **`-d incl`**: Output `IncludeFileInfo` debug trace to stderr |
| 229 | +- **`-verbose`**: Provide verbose preprocessing information |
| 230 | +- **`-profile`**: Generate profiling information for preprocessing |
| 231 | + |
| 232 | +### Special Modes |
| 233 | +- **`-fileunit`**: Compile each file as independent unit (affects preprocessing scope) |
| 234 | +- **`-nobuiltin`**: Skip preprocessing of builtin SystemVerilog classes |
| 235 | +- **`-parseonly`**: Parse only, reloading the saved preprocessor database |
| 236 | + |
| 237 | +## Example Usage |
| 238 | + |
| 239 | +When preprocessing a file with includes and macros: |
| 240 | +```verilog |
| 241 | +// main.sv |
| 242 | +`include "defines.svh" |
| 243 | +module top; |
| 244 | + `MY_MACRO(arg1, arg2) |
| 245 | +endmodule |
| 246 | +``` |
| 247 | + |
| 248 | +The preprocessor generates: |
| 249 | +1. Preprocessed text with expanded includes and macros |
| 250 | +2. `IncludeFileInfo` records tracking: |
| 251 | + - PUSH when entering "defines.svh" |
| 252 | + - POP when exiting "defines.svh" |
| 253 | + - PUSH when expanding MY_MACRO |
| 254 | + - POP when macro expansion completes |
| 255 | + |
| 256 | +This information is cached and used by subsequent compilation stages to maintain accurate source tracking throughout the compilation pipeline. |