@@ -30,6 +30,15 @@ used depends on whether it is internal or being presented to the user.
30
30
The comment lines are generated from the ` source_location ` field
31
31
of the ` instructiont ` structure.
32
32
33
+ GOTO programs are commonly produced using the ` initialize_goto_model ` function,
34
+ which combines the complete process from command-line arguments specifying
35
+ source code files, through parsing and generating a symbol table, to finally
36
+ producing GOTO functions.
37
+
38
+ Alternatively ` lazy_goto_modelt ` can be used, which facilitates producing GOTO
39
+ functions on demand. See \ref section-lazy-conversion for more details on this
40
+ method.
41
+
33
42
\section goto_data_structures Data Structures
34
43
35
44
A \ref goto_functionst object contains a set of GOTO programs. Note the
@@ -86,23 +95,19 @@ digraph G {
86
95
At this stage, CBMC constructs a goto-program from a symbol table.
87
96
Each symbol in the symbol table with function type (that is, the symbol's ` type `
88
97
field contains a \ref code_typet) will be converted to a corresponding GOTO
89
- program.
90
-
91
- It does not use the parse tree or the source file at all for this step. This
92
- may seem surprising, because the symbols are stored in a hash table and
93
- therefore have no intrinsic order; nevertheless, every \ref symbolt is
94
- associated with a \ref source_locationt, allowing CBMC to figure out the
95
- lexical order.
96
-
97
- The structure of what is informally called a goto-program follows. The
98
- entire target program is converted to a single \ref goto_functionst
99
- object. The goto functions contains a set of \ref goto_programt objects;
100
- each of these correspond to a "function" or "method" in the target
101
- program. Each goto_program contains a list of
102
- \ref goto_programt::instructiont "instructions"; each
103
- instruction contains an associated statement---these are subtypes of
104
- \ref codet. Each instruction also contains a "target", which will be
105
- empty for now.
98
+ program. The parse tree and source file are not used at all for this step.
99
+
100
+ The conversion happens in two phases:
101
+
102
+ 1 . Function ` goto_convertt::convert ` turns each ` codet ` in the symbol table
103
+ into corresponding GOTO instructions.
104
+ 2 . ` goto_convertt::finish_gotos ` and others populate the ` GOTO ` and ` CATCH `
105
+ instructions' ` targets ` members, pointing to their possible successors.
106
+ ` DEAD ` instructions are also added when GOTO instructions branch out of one
107
+ or more lexical blocks.
108
+
109
+ Here is an example of a simple C program being converted into GOTO code (note
110
+ the ` targets ` member of each instruction isn't populated yet):
106
111
107
112
\dot
108
113
digraph G{
@@ -241,39 +246,10 @@ void foo(){}"];
241
246
}
242
247
\enddot
243
248
244
- This is not the final form of the goto-functions, since the lists of
245
- instructions will be 'normalized' in the next step (Instrumentation),
246
- which removes some instructions and adds targets to others.
247
-
248
- ---
249
- \section instrumentation Instrumentation
250
-
251
- In the \ref goto-programs directory.
252
-
253
- ** Key classes:**
254
- * goto_programt
255
- * goto_functionst
256
- * \ref goto_programt::instructiont
257
-
258
- \dot
259
- digraph G {
260
- node [ shape=box] ;
261
- rankdir="LR";
262
- 1 [ shape=none, label=""] ;
263
- 2 [ label="goto conversion"] ;
264
- 3 [ shape=none, label=""] ;
265
- 1 -> 2 [ label="goto-programs, goto-functions, symbol table"] ;
266
- 2 -> 3 [ label="transformed goto-programs"] ;
267
- }
268
- \enddot
269
-
270
- This stage applies several transformations to the goto-programs from the
271
- previous stage:
272
-
273
- * The diagram in the previous stage showed a goto_programt with four
274
- instructions, but real programs usually yield instruction lists that
275
- are littered with \ref code_skipt "skip" statements. The
276
- instrumentation stage removes the majority of these.
249
+ The ` codet ` representation of method bodies that is stored in the symbol table
250
+ resembles C code quite strongly: it uses lexical variable scoping and has
251
+ complex control-flow constructs such as ` code_whilet ` and ` code_ifthenelset ` .
252
+ In converting to a GOTO program this structure is changed in several ways:
277
253
278
254
* Variable lifespan implied by \ref code_declt instructions and lexical scopes
279
255
described by \ref code_blockt nodes is replaced by ` DECL ` and corresponding
@@ -322,14 +298,63 @@ previous stage:
322
298
instruction. Each goto_programt should have precisely one such
323
299
instruction.
324
300
325
- This stage concludes the * analysis-independent* program transformations.
326
-
327
- \subsection goto-program-example-section Example:
328
-
329
- \subsubsection goto-program-example-1-section Unsigned mult (unsigned a, unsigned b) { int acc, i; for (i = 0; i < b; i++) acc += a; return acc; }
330
-
331
- To be documented.
332
-
301
+ \section section-goto-transforms Subsequent Transformation
302
+
303
+ It is normal for a program that calls ` goto_convert ` to immediately pass the
304
+ GOTO functions through one or more transformations. For example, the ubiquitous
305
+ ` remove_returns ` transformation turns the method return sequence generated by
306
+ ` goto_convert ` (something like ` RETURN x; GOTO end; ` ) into
307
+ ` function_name#return_value = x; GOTO end; ` , and a callsite
308
+ ` y = function_name(); ` into ` function_name(); y = function_name#return_value; ` .
309
+
310
+ Some subsequent parts of the CBMC pipeline assume that one or more of these
311
+ transforms has been applied, making them effectively mandatory in that case.
312
+
313
+ Common transformations include:
314
+
315
+ * Removing superfluous SKIP instructions (` remove_skip ` )
316
+ * Converting indirect function calls into dispatch tables over direct calls
317
+ (` remove_virtual_functions ` and ` remove_function_pointers ` )
318
+ * Adding checks for null-pointer dereferences, arithmetic overflow and other
319
+ undefined behaviour (` goto_check ` )
320
+ * Adding code coverage assertions (` instrument_cover_goals ` ).
321
+
322
+ By convention these post-convert transformations are applied by a driver program
323
+ in a function named ` process_goto_program ` or ` process_goto_functions ` ; examples
324
+ of these can be found in ` cbmc_parse_optionst ` , ` goto_analyzer_parse_optionst `
325
+ and ` goto_instrument_parse_optionst ` among others.
326
+
327
+ \section section-lazy-conversion Lazy GOTO Conversion
328
+
329
+ The conventional source-code-to-GOTO-program pipeline is horizontally
330
+ structured: all source-file information is converted into a symbol table, then
331
+ each symbol-table method is converted into a GOTO program, then each transform
332
+ is run over the whole GOTO-model in turn. This is fine when the GOTO model
333
+ consumer will use most or all of the results of conversion, but if not then
334
+ lazy conversion may be beneficial.
335
+
336
+ Lazy conversion is coordinated by ` lazy_goto_modelt ` . Early source code parsing
337
+ proceeds as usual, but while the same symbol table symbols are created as using
338
+ ` initialize_goto_model ` , method bodies in the symbol table are not populated.
339
+ ` lazy_goto_modelt::get_goto_function ` then allows each function to be converted
340
+ on demand, passing individually through symbol-table ` codet ` production, GOTO
341
+ conversion and driver-program transformation.
342
+
343
+ ` goto_symext ` is able to use such a lazy GOTO model in place of the usual
344
+ ` goto_modelt ` , enabling it to create GOTO programs on demand, and thus avoid
345
+ spending time and memory generating GOTO programs that symex determines cannot
346
+ in fact be reached. This is particularly useful when the source program contains
347
+ many indirect calls (virtual function calls, or calls via function pointer), as
348
+ the ` remove_virtual_functions ` and ` remove_function_pointers ` passes treat these
349
+ very conservatively, producing large dispatch tables of possible callees which
350
+ symex often finds to be largely unreachable.
351
+
352
+ JBMC exhibits typical usage of ` lazy_goto_modelt ` when it is called with
353
+ ` --symex-driven-lazy-loading ` . It defines a function
354
+ ` jbmc_parse_optionst::process_goto_function ` which performs driver-program-
355
+ specific post-GOTO-convert transformation. The lazy GOTO model is then passed to
356
+ ` goto_symext ` , which will trigger symbol-table population, GOTO conversion
357
+ and post-processing as needed.
333
358
334
359
\section section-goto-binary Binary Representation
335
360
0 commit comments