Skip to content

Commit cebfe56

Browse files
authored
Merge pull request #3383 from smowton/smowton/docs/goto-model-creation
Improve GOTO programs high-level documentation
2 parents 4bca13b + 008c580 commit cebfe56

File tree

1 file changed

+83
-58
lines changed

1 file changed

+83
-58
lines changed

src/goto-programs/README.md

Lines changed: 83 additions & 58 deletions
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,15 @@ used depends on whether it is internal or being presented to the user.
3030
The comment lines are generated from the `source_location` field
3131
of the `instructiont` structure.
3232

33+
GOTO programs are commonly produced using the `initialize_goto_model` function,
34+
which combines the complete process from command-line arguments specifying
35+
source code files, through parsing and generating a symbol table, to finally
36+
producing GOTO functions.
37+
38+
Alternatively `lazy_goto_modelt` can be used, which facilitates producing GOTO
39+
functions on demand. See \ref section-lazy-conversion for more details on this
40+
method.
41+
3342
\section goto_data_structures Data Structures
3443

3544
A \ref goto_functionst object contains a set of GOTO programs. Note the
@@ -86,23 +95,19 @@ digraph G {
8695
At this stage, CBMC constructs a goto-program from a symbol table.
8796
Each symbol in the symbol table with function type (that is, the symbol's `type`
8897
field contains a \ref code_typet) will be converted to a corresponding GOTO
89-
program.
90-
91-
It does not use the parse tree or the source file at all for this step. This
92-
may seem surprising, because the symbols are stored in a hash table and
93-
therefore have no intrinsic order; nevertheless, every \ref symbolt is
94-
associated with a \ref source_locationt, allowing CBMC to figure out the
95-
lexical order.
96-
97-
The structure of what is informally called a goto-program follows. The
98-
entire target program is converted to a single \ref goto_functionst
99-
object. The goto functions contains a set of \ref goto_programt objects;
100-
each of these correspond to a "function" or "method" in the target
101-
program. Each goto_program contains a list of
102-
\ref goto_programt::instructiont "instructions"; each
103-
instruction contains an associated statement---these are subtypes of
104-
\ref codet. Each instruction also contains a "target", which will be
105-
empty for now.
98+
program. The parse tree and source file are not used at all for this step.
99+
100+
The conversion happens in two phases:
101+
102+
1. Function `goto_convertt::convert` turns each `codet` in the symbol table
103+
into corresponding GOTO instructions.
104+
2. `goto_convertt::finish_gotos` and others populate the `GOTO` and `CATCH`
105+
instructions' `targets` members, pointing to their possible successors.
106+
`DEAD` instructions are also added when GOTO instructions branch out of one
107+
or more lexical blocks.
108+
109+
Here is an example of a simple C program being converted into GOTO code (note
110+
the `targets` member of each instruction isn't populated yet):
106111

107112
\dot
108113
digraph G{
@@ -241,39 +246,10 @@ void foo(){}"];
241246
}
242247
\enddot
243248

244-
This is not the final form of the goto-functions, since the lists of
245-
instructions will be 'normalized' in the next step (Instrumentation),
246-
which removes some instructions and adds targets to others.
247-
248-
---
249-
\section instrumentation Instrumentation
250-
251-
In the \ref goto-programs directory.
252-
253-
**Key classes:**
254-
* goto_programt
255-
* goto_functionst
256-
* \ref goto_programt::instructiont
257-
258-
\dot
259-
digraph G {
260-
node [shape=box];
261-
rankdir="LR";
262-
1 [shape=none, label=""];
263-
2 [label="goto conversion"];
264-
3 [shape=none, label=""];
265-
1 -> 2 [label="goto-programs, goto-functions, symbol table"];
266-
2 -> 3 [label="transformed goto-programs"];
267-
}
268-
\enddot
269-
270-
This stage applies several transformations to the goto-programs from the
271-
previous stage:
272-
273-
* The diagram in the previous stage showed a goto_programt with four
274-
instructions, but real programs usually yield instruction lists that
275-
are littered with \ref code_skipt "skip" statements. The
276-
instrumentation stage removes the majority of these.
249+
The `codet` representation of method bodies that is stored in the symbol table
250+
resembles C code quite strongly: it uses lexical variable scoping and has
251+
complex control-flow constructs such as `code_whilet` and `code_ifthenelset`.
252+
In converting to a GOTO program this structure is changed in several ways:
277253

278254
* Variable lifespan implied by \ref code_declt instructions and lexical scopes
279255
described by \ref code_blockt nodes is replaced by `DECL` and corresponding
@@ -322,14 +298,63 @@ previous stage:
322298
instruction. Each goto_programt should have precisely one such
323299
instruction.
324300

325-
This stage concludes the *analysis-independent* program transformations.
326-
327-
\subsection goto-program-example-section Example:
328-
329-
\subsubsection goto-program-example-1-section Unsigned mult (unsigned a, unsigned b) { int acc, i; for (i = 0; i < b; i++) acc += a; return acc; }
330-
331-
To be documented.
332-
301+
\section section-goto-transforms Subsequent Transformation
302+
303+
It is normal for a program that calls `goto_convert` to immediately pass the
304+
GOTO functions through one or more transformations. For example, the ubiquitous
305+
`remove_returns` transformation turns the method return sequence generated by
306+
`goto_convert` (something like `RETURN x; GOTO end;`) into
307+
`function_name#return_value = x; GOTO end;`, and a callsite
308+
`y = function_name();` into `function_name(); y = function_name#return_value;`.
309+
310+
Some subsequent parts of the CBMC pipeline assume that one or more of these
311+
transforms has been applied, making them effectively mandatory in that case.
312+
313+
Common transformations include:
314+
315+
* Removing superfluous SKIP instructions (`remove_skip`)
316+
* Converting indirect function calls into dispatch tables over direct calls
317+
(`remove_virtual_functions` and `remove_function_pointers`)
318+
* Adding checks for null-pointer dereferences, arithmetic overflow and other
319+
undefined behaviour (`goto_check`)
320+
* Adding code coverage assertions (`instrument_cover_goals`).
321+
322+
By convention these post-convert transformations are applied by a driver program
323+
in a function named `process_goto_program` or `process_goto_functions`; examples
324+
of these can be found in `cbmc_parse_optionst`, `goto_analyzer_parse_optionst`
325+
and `goto_instrument_parse_optionst` among others.
326+
327+
\section section-lazy-conversion Lazy GOTO Conversion
328+
329+
The conventional source-code-to-GOTO-program pipeline is horizontally
330+
structured: all source-file information is converted into a symbol table, then
331+
each symbol-table method is converted into a GOTO program, then each transform
332+
is run over the whole GOTO-model in turn. This is fine when the GOTO model
333+
consumer will use most or all of the results of conversion, but if not then
334+
lazy conversion may be beneficial.
335+
336+
Lazy conversion is coordinated by `lazy_goto_modelt`. Early source code parsing
337+
proceeds as usual, but while the same symbol table symbols are created as using
338+
`initialize_goto_model`, method bodies in the symbol table are not populated.
339+
`lazy_goto_modelt::get_goto_function` then allows each function to be converted
340+
on demand, passing individually through symbol-table `codet` production, GOTO
341+
conversion and driver-program transformation.
342+
343+
`goto_symext` is able to use such a lazy GOTO model in place of the usual
344+
`goto_modelt`, enabling it to create GOTO programs on demand, and thus avoid
345+
spending time and memory generating GOTO programs that symex determines cannot
346+
in fact be reached. This is particularly useful when the source program contains
347+
many indirect calls (virtual function calls, or calls via function pointer), as
348+
the `remove_virtual_functions` and `remove_function_pointers` passes treat these
349+
very conservatively, producing large dispatch tables of possible callees which
350+
symex often finds to be largely unreachable.
351+
352+
JBMC exhibits typical usage of `lazy_goto_modelt` when it is called with
353+
`--symex-driven-lazy-loading`. It defines a function
354+
`jbmc_parse_optionst::process_goto_function` which performs driver-program-
355+
specific post-GOTO-convert transformation. The lazy GOTO model is then passed to
356+
`goto_symext`, which will trigger symbol-table population, GOTO conversion
357+
and post-processing as needed.
333358

334359
\section section-goto-binary Binary Representation
335360

0 commit comments

Comments
 (0)