
Commit 65d5fdd

Merge pull request #4067 from alainmarcel/alainmarcel-patch-1
Preprocessor explanation
2 parents 524d88d + d27547b commit 65d5fdd

PREPROCESSOR.md

Lines changed: 256 additions & 0 deletions
# Surelog Preprocessor Architecture

## Overview

Surelog's preprocessor is not a simple text-replacement engine. It is a **full ANTLR-based parser** with its own grammar, data structures, and persistent caching mechanism. This architecture enables accurate tracking of file locations, macro expansions, and includes throughout the preprocessing phase, and later compilation stages depend on that information.

## Preprocessing Strategy

The preprocessor performs the following transformations:
- **Include files are replaced by their contents**: When an `` `include`` directive is encountered, the entire file content is inserted in place of the directive
- **Macro definitions are carved out**: Macro definition directives are removed from the preprocessed output and replaced with **equivalent whitespace** to preserve line numbering
- **Whitespace preservation**: This approach can leave large amounts of whitespace and blank lines in macro-heavy files, but this is intentional: it greatly simplifies the reconstruction of file/line information, especially when macros and includes are nested in complex ways

This whitespace-preserving strategy ensures that line numbers in the preprocessed output maintain a direct correspondence with the original source, making it straightforward to map errors and debug information back to the correct locations even through multiple levels of macro expansion and file inclusion.
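
The carving-out step can be illustrated with a minimal sketch. It is not Surelog's implementation: `blankOutRange` and `countNewlines` are hypothetical helpers, and the character range occupied by the directive is assumed to be known already. The sketch only shows that overwriting a directive with spaces, while keeping its newlines, leaves every later line at the same line number.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Surelog): overwrite the half-open range
// [begin, end) with spaces but keep every line break, so the number of lines
// in the text does not change.
std::string blankOutRange(std::string text, std::size_t begin,
                          std::size_t end) {
  if (end > text.size()) end = text.size();
  for (std::size_t i = begin; i < end; ++i) {
    if (text[i] != '\n' && text[i] != '\r') text[i] = ' ';
  }
  return text;
}

// Counts newline characters; used to check that line numbering is preserved.
std::size_t countNewlines(const std::string& text) {
  std::size_t count = 0;
  for (char c : text) {
    if (c == '\n') ++count;
  }
  return count;
}
```

Applied to a two-line `` `define`` followed by a `wire` declaration, the declaration stays on its original line because both newlines of the directive survive the blanking.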
15+
16+
## Key Components
17+
18+
### 1. ANTLR Grammar and Parser
19+
20+
The preprocessor uses a dedicated ANTLR 4 grammar separate from the main SystemVerilog parser:
21+
22+
- **Grammar Files**:
23+
- [`grammar/SV3_1aPpParser.g4`](grammar/SV3_1aPpParser.g4) - Preprocessor parser grammar
24+
- [`grammar/SV3_1aPpLexer.g4`](grammar/SV3_1aPpLexer.g4) - Preprocessor lexer grammar
25+
26+
The parser grammar defines rules for all SystemVerilog preprocessor directives including:
27+
- `ifdef_directive`, `ifndef_directive`, `else_directive`, `elsif_directive`, `endif_directive`
28+
- `include_directive`
29+
- `macro_definition`, `macro_usage`
30+
- `undef_directive`, `resetall_directive`
31+
- `timescale_directive`, `line_directive`
32+
- And many more preprocessor-specific constructs

### 2. IncludeFileInfo Data Structure

The [`IncludeFileInfo`](include/Surelog/SourceCompile/IncludeFileInfo.h) class is the core data structure for maintaining file and line tracking information:

```cpp
class IncludeFileInfo {
 public:
  enum class Context : uint32_t { NONE = 0, INCLUDE = 1, MACRO = 2 };
  enum class Action : uint32_t { NONE = 0, PUSH = 1, POP = 2 };

  // Tracks:
  // - Context (include file or macro expansion)
  // - Section start line in preprocessed output
  // - Original file location (line, column)
  // - Push/Pop actions for nested contexts
  // - Opening/closing indices for paired directives
};
```

Key fields:
- `m_context`: Whether this is an `INCLUDE` file or `MACRO` expansion
- `m_sectionStartLine`: Line number in the preprocessed output
- `m_sectionFileId`/`m_sectionSymbolId`: References to the included file or macro
- `m_originalStartLine`/`m_originalStartColumn`: Original source location
- `m_action`: `PUSH` when entering a context, `POP` when exiting
- `m_indexOpening`/`m_indexClosing`: Paired indices for matching push/pop operations
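
To illustrate how `m_indexOpening` and `m_indexClosing` could pair up, here is a simplified sketch. `Record` and `RecordLog` are hypothetical stand-ins rather than Surelog classes, and all location fields are omitted. The sketch assumes that a POP record stores the index of its PUSH and that the PUSH is then updated with the index of its POP; the document does not spell out this back-link.

```cpp
#include <cstdint>
#include <vector>

enum class Context : uint32_t { NONE = 0, INCLUDE = 1, MACRO = 2 };
enum class Action : uint32_t { NONE = 0, PUSH = 1, POP = 2 };

// Simplified, hypothetical stand-in for IncludeFileInfo.
struct Record {
  Context context = Context::NONE;
  Action action = Action::NONE;
  int32_t indexOpening = -1;  // on a POP: index of the matching PUSH
  int32_t indexClosing = -1;  // on a PUSH: index of the matching POP
};

// Hypothetical append-only log of records.
class RecordLog {
 public:
  // Appends a PUSH record and returns its index.
  int32_t push(Context context) {
    Record r;
    r.context = context;
    r.action = Action::PUSH;
    m_records.push_back(r);
    return static_cast<int32_t>(m_records.size()) - 1;
  }

  // Appends a POP record linked to openingIndex, then back-patches the
  // matching PUSH so the pair can be followed in either direction.
  int32_t pop(Context context, int32_t openingIndex) {
    Record r;
    r.context = context;
    r.action = Action::POP;
    r.indexOpening = openingIndex;
    m_records.push_back(r);
    const int32_t closingIndex = static_cast<int32_t>(m_records.size()) - 1;
    if (openingIndex >= 0 && openingIndex < closingIndex) {
      m_records[openingIndex].indexClosing = closingIndex;
    }
    return closingIndex;
  }

  const std::vector<Record>& records() const { return m_records; }

 private:
  std::vector<Record> m_records;
};
```

With both indices stored, a consumer can jump from the start of an included section to its end (or back) without scanning the records in between.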

### 3. SV3_1aPpTreeShapeListener

The [`SV3_1aPpTreeShapeListener`](src/SourceCompile/SV3_1aPpTreeShapeListener.cpp) class (derived from `SV3_1aPpTreeListenerHelper`) is the ANTLR tree listener that processes the parse tree and maintains the `IncludeFileInfo` records.

Key responsibilities:

#### Include File Processing
When processing `` `include`` directives ([lines 309-388](src/SourceCompile/SV3_1aPpTreeShapeListener.cpp#L309-L388)):
```cpp
// Records PUSH action when entering include file
openingIndex = m_pp->getSourceFile()->addIncludeFileInfo(
    IncludeFileInfo::Context::INCLUDE,
    1,         // sectionStartLine
    symbolId,  // include file symbol
    fileId,    // include file path
    // ... original location info
    IncludeFileInfo::Action::PUSH);

// Records POP action when exiting include file
closingIndex = m_pp->getSourceFile()->addIncludeFileInfo(
    IncludeFileInfo::Context::INCLUDE,
    // ... location info
    IncludeFileInfo::Action::POP,
    openingIndex, 0);
```

#### Macro Expansion Tracking
When processing macro expansions ([lines 503-594](src/SourceCompile/SV3_1aPpTreeShapeListener.cpp#L503-L594)):
```cpp
// Records PUSH when entering macro expansion
openingIndex = m_pp->getSourceFile()->addIncludeFileInfo(
    IncludeFileInfo::Context::MACRO,
    macroInf->m_startLine,
    BadSymbolId,
    macroInf->m_fileId,
    // ... location info
    IncludeFileInfo::Action::PUSH);

// Records POP when exiting macro expansion
closingIndex = m_pp->getSourceFile()->addIncludeFileInfo(
    IncludeFileInfo::Context::MACRO,
    // ... location info
    IncludeFileInfo::Action::POP,
    openingIndex, 0);
```

#### Nested Context Tracking
The same `IncludeFileInfo` mechanism handles complex nested scenarios, including:
- **Macro expansions that contain `` `include`` directives**: When a macro body contains an `` `include`` statement, the preprocessor tracks both the macro expansion context (MACRO) and the subsequent file inclusion context (INCLUDE)
- **Include files that define and use macros**: Files brought in via `` `include`` can define macros that are then expanded, creating nested INCLUDE→MACRO contexts
- **Recursive macro expansions**: When macros expand to other macro invocations, multiple MACRO contexts are pushed and popped in sequence

This unified tracking system ensures that any combination of macro expansions and file includes maintains accurate source location information through arbitrary levels of nesting.
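
The nesting discipline described above can be expressed as a small consistency check. The sketch below uses a hypothetical `Record` type that keeps only the context, the action, and the opening index. It assumes that records are emitted in strictly nested order, which the description implies but does not state as an invariant, and `isWellNested` is not a Surelog function.

```cpp
#include <cstdint>
#include <vector>

enum class Context : uint32_t { NONE = 0, INCLUDE = 1, MACRO = 2 };
enum class Action : uint32_t { NONE = 0, PUSH = 1, POP = 2 };

// Hypothetical, reduced view of one tracking record.
struct Record {
  Context context;
  Action action;
  int32_t indexOpening;  // for a POP: index of the matching PUSH, else -1
};

// Returns true when every POP closes the most recently opened PUSH of the
// same context and no context is left open at the end.
bool isWellNested(const std::vector<Record>& records) {
  std::vector<int32_t> open;  // indices of PUSH records not yet closed
  for (int32_t i = 0; i < static_cast<int32_t>(records.size()); ++i) {
    const Record& r = records[i];
    if (r.action == Action::PUSH) {
      open.push_back(i);
    } else if (r.action == Action::POP) {
      if (open.empty() || open.back() != r.indexOpening) return false;
      if (records[open.back()].context != r.context) return false;
      open.pop_back();
    }
  }
  return open.empty();
}
```

A macro whose body contains an `` `include`` would produce the sequence MACRO push, INCLUDE push, INCLUDE pop, MACRO pop under this model, and the check accepts it; swapping the two POP records makes it fail.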

#### Debug Output
The `-d incl` command-line option outputs the entire `IncludeFileInfo` data structure to stderr for debugging purposes. This is implemented in [`PreprocessFile::reportIncludeInfo()`](src/SourceCompile/PreprocessFile.cpp#L659-L677) and called from [`CompileSourceFile`](src/SourceCompile/CompileSourceFile.cpp#L244-L245):

```cpp
if (m_commandLineParser->getDebugIncludeFileInfo())
  std::cerr << m_pp->reportIncludeInfo();
```

The debug output shows:
- Context type ("inc" for INCLUDE, "mac" for MACRO)
- Original source location (line,column:endLine,endColumn)
- Action ("in" for PUSH, "out" for POP)
- File paths and symbols involved
- Opening/closing index pairs

This provides a complete trace of all preprocessor context changes, invaluable for debugging complex preprocessing issues.

### 4. Cache Persistence Mechanism

The preprocessor output and associated `IncludeFileInfo` records are persisted to disk using Cap'n Proto serialization.

#### PPCache Class
[`PPCache`](src/Cache/PPCache.cpp) handles saving and restoring preprocessor state:

- **Saving** ([`PreprocessFile::saveCache()`](src/SourceCompile/PreprocessFile.cpp#L1289-L1306)):
  ```cpp
  void PreprocessFile::saveCache() {
    if (!m_usingCachedVersion) {
      PPCache cache(this);
      cache.save();
    }
  }
  ```

- **Restoring** ([lines 362-365](src/SourceCompile/PreprocessFile.cpp#L362-L365)):
  ```cpp
  PPCache cache(this);
  if (cache.restore(clp->lowMem() || clp->noCacheHash())) {
    m_usingCachedVersion = true;
  }
  ```
157+
#### IncludeFileInfo Serialization
158+
The [`cacheIncludeFileInfos()`](src/Cache/PPCache.cpp#L419-L438) method serializes the `IncludeFileInfo` vector:
159+
```cpp
160+
void PPCache::cacheIncludeFileInfos(::PPCache::Builder builder,
161+
SymbolTable& targetSymbols,
162+
const SymbolTable& sourceSymbols) {
163+
const std::vector<IncludeFileInfo>& sourceIncludeFileInfos =
164+
m_pp->getIncludeFileInfo();
165+
166+
// Serialize each IncludeFileInfo record
167+
for (const IncludeFileInfo& sourceIncludeFileInfo : sourceIncludeFileInfos) {
168+
// Copy symbol IDs and path IDs to cache
169+
// Store all context, action, and location information
170+
}
171+
}
172+
```

## Data Flow

1. **Parsing Phase**: ANTLR parses the input file using the preprocessor grammar
2. **Tree Walking**: `SV3_1aPpTreeShapeListener` walks the parse tree:
   - Processes each preprocessor directive
   - Creates `IncludeFileInfo` records for includes and macros
   - Maintains push/pop pairs for proper nesting
3. **Output Generation**: Preprocessed text is generated alongside the `IncludeFileInfo` records
4. **Caching**: Both the preprocessed text and `IncludeFileInfo` are serialized to the cache directory
5. **Parser Integration**: When the main parser processes the preprocessed file, it uses the `IncludeFileInfo` records to:
   - Map locations in preprocessed text back to original source files
   - Provide accurate error reporting with original file locations
   - Support debugging and source navigation
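
As a rough illustration of the location mapping in step 5, the sketch below resolves a line of preprocessed text to a file and line in the original sources. It uses a deliberately simplified, line-granular model that covers includes only. `Event`, `Frame`, `Location`, and `resolve` are hypothetical names, the event fields are assumptions made for this example, and this is not Surelog's actual algorithm.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical event, ordered by ppLine (line in the preprocessed text).
struct Event {
  bool push;          // true: entering an include, false: leaving it
  uint32_t ppLine;    // first preprocessed line affected by the event
  std::string file;   // PUSH only: the included file
  uint32_t origLine;  // PUSH: first line of the included file;
                      // POP: line of the parent file where output resumes
};

struct Location {
  std::string file;
  uint32_t line;
};

// One open file: preprocessed line ppBase corresponds to line origBase.
struct Frame {
  std::string file;
  uint32_t ppBase;
  uint32_t origBase;
};

// Replays all events up to ppLine and converts the remaining offset into a
// line of whichever file is open at that point. Events must be sorted.
Location resolve(const std::string& topFile, const std::vector<Event>& events,
                 uint32_t ppLine) {
  std::vector<Frame> stack;
  stack.push_back(Frame{topFile, 1, 1});
  for (const Event& e : events) {
    if (e.ppLine > ppLine) break;
    if (e.push) {
      stack.push_back(Frame{e.file, e.ppLine, e.origLine});
    } else if (stack.size() > 1) {
      stack.pop_back();
      stack.back().ppBase = e.ppLine;
      stack.back().origBase = e.origLine;
    }
  }
  const Frame& top = stack.back();
  return Location{top.file, top.origBase + (ppLine - top.ppBase)};
}
```

For example, suppose `main.sv` includes a three-line `defines.svh` on its line 2. Under this model the included text occupies preprocessed lines 2 to 4, so preprocessed line 3 resolves to `defines.svh` line 2, and preprocessed line 5 resolves to `main.sv` line 3.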

## Benefits of This Architecture

1. **Accurate Location Tracking**: Every token in the preprocessed output can be mapped back to its original source location
2. **Performance**: Caching preprocessed files avoids redundant preprocessing
3. **Error Reporting**: Error messages reference the original source files, not the preprocessed text
4. **Debugging Support**: Tools can navigate from preprocessed code back to original sources
5. **Incremental Compilation**: Cached preprocessor output enables faster incremental builds

## Integration with Main Parser

The main SystemVerilog parser ([`grammar/SV3_1aParser.g4`](grammar/SV3_1aParser.g4)) consumes the preprocessed output along with the `IncludeFileInfo` records. This allows it to:
- Report errors at original source locations
- Build accurate ASTs with proper file/line information
- Support IDE features like "go to definition" across include boundaries

## Command Line Options

The preprocessor behavior can be controlled through various command-line options:

### Include Paths and Defines
- **`+incdir+<dir>`**: Add include search paths for `` `include`` directives
- **`+define+<name>=<value>`**: Define preprocessor macros from command line
- **`-D <name>=<value>`**: Alternative syntax for defining macros

### Preprocessor Output
- **`-writepp`**: Write preprocessed output files to `slpp_all/` or `slpp_unit/` directory
- **`-writeppfile <file>`**: Write preprocessed output to a specific file
- **`-lineoffsetascomments`**: Add line offset information as comments in preprocessed output

### Caching Control
- **`-nocache`**: Disable reading cached preprocessor output (forces re-preprocessing)
- **`-createcache`**: Create cache for precompiled packages
- **`-cachedir <dir>`**: Specify cache directory location

### Filtering Options
- **`-filterprotectedregions`**: Filter out synthesis pragma protected regions
- **`-filtercomments`**: Remove comments from preprocessed output
- **`-filterdirectives`**: Filter out simple preprocessor directives

### Debug Options
- **`-d incl`**: Output `IncludeFileInfo` debug trace to stderr
- **`-verbose`**: Provide verbose preprocessing information
- **`-profile`**: Generate profiling information for preprocessing

### Special Modes
- **`-fileunit`**: Compile each file as an independent unit (affects preprocessing scope)
- **`-nobuiltin`**: Skip preprocessing of builtin SystemVerilog classes
- **`-parseonly`**: Only parse; reload the saved preprocessor database
237+
## Example Usage
238+
239+
When preprocessing a file with includes and macros:
240+
```verilog
241+
// main.sv
242+
`include "defines.svh"
243+
module top;
244+
`MY_MACRO(arg1, arg2)
245+
endmodule
246+
```
247+
248+
The preprocessor generates:
249+
1. Preprocessed text with expanded includes and macros
250+
2. `IncludeFileInfo` records tracking:
251+
- PUSH when entering "defines.svh"
252+
- POP when exiting "defines.svh"
253+
- PUSH when expanding MY_MACRO
254+
- POP when macro expansion completes
255+
256+
This information is cached and used by subsequent compilation stages to maintain accurate source tracking throughout the compilation pipeline.
