Value set: lift offset from numeric constants to expressions #8647

tautschnig · 2025-05-30T13:30:20Z

We can safely track arbitrary expressions as pointer offsets rather than limit ourselves to just constant offsets (and then treating all other expressions as "unknown").
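As a rough illustration of the difference, here is a toy C++ sketch of the byte offset of &array[idx]. It is not CBMC code. indext, offset_constant_only and offset_expression are hypothetical names, and offset expressions are plain strings rather than exprt. When only constants are tracked, a symbolic index produces "unknown". When expressions are tracked, the same index produces a symbolic offset.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical toy index: either a literal constant or a named variable.
struct indext
{
  std::optional<long> constant; // set if the index is a literal, e.g. 2
  std::string symbol;           // used otherwise, e.g. "i"
};

// Constants-only tracking: a non-constant index makes the offset unknown,
// represented here by std::nullopt.
std::optional<long> offset_constant_only(const indext &idx, long elem_size)
{
  assert(elem_size > 0);
  if(idx.constant)
    return *idx.constant * elem_size;
  return std::nullopt;
}

// Expression tracking: a non-constant index yields the symbolic offset
// "symbol * elem_size" instead of unknown. Strings stand in for expressions;
// the result stays optional because an offset may still be unknown in general.
std::optional<std::string> offset_expression(const indext &idx, long elem_size)
{
  assert(elem_size > 0);
  if(idx.constant)
    return std::to_string(*idx.constant * elem_size);
  return idx.symbol + " * " + std::to_string(elem_size);
}
```

For example, with indext i{std::nullopt, "i"} and 4-byte elements, offset_constant_only(i, 4) is empty, while offset_expression(i, 4) holds "i * 4".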

Each commit message has a non-empty body, explaining why the change was made.
n/a Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
n/a My commit message includes data points confirming performance improvements (if claimed).
My PR is restricted to a single feature or bugfix.
n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

codecov · 2025-06-03T09:11:52Z

Codecov Report

Attention: Patch coverage is 98.73418% with 2 lines in your changes missing coverage. Please review.

Project coverage is 80.39%. Comparing base (eef9677) to head (056be83).
Report is 5 commits behind head on develop.

Files with missing lines	Patch %	Lines
src/pointer-analysis/value_set.cpp	97.05%	2 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #8647      +/-   ##
===========================================
+ Coverage    80.36%   80.39%   +0.02%     
===========================================
  Files         1688     1688              
  Lines       207067   207129      +62     
  Branches        73       73              
===========================================
+ Hits        166418   166516      +98     
+ Misses       40649    40613      -36


src/goto-symex/goto_symex_state.cpp

remi-delmas-3000 · 2025-06-18T19:47:08Z

src/goto-symex/shadow_memory_util.cpp

@@ -981,7 +981,7 @@ normalize(const object_descriptor_exprt &expr, const namespacet &ns)
  {
    return expr;
  }
-  if(expr.offset().id() == ID_unknown)
+  if(!expr.offset().is_constant())


Can we get a high-level description of the normal form we're trying to reach?

To root object + constant offset, as object can be an arbitrary access path into the object. Pointer-equality checks then become trivial; maybe simplify_expr has become good enough in the meantime.
Anyway, this seems orthogonal to this PR.
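The following toy C++ sketch makes the intended normal form concrete by folding an access path into a root object plus a constant byte offset. It is not CBMC's object_descriptor_exprt, and all names are hypothetical. Once two accesses are in this form, checking whether they denote the same address reduces to comparing the root and the offset. If any index is not a constant, the sketch gives up and returns std::nullopt. That corresponds to the non-constant case guarded by the is_constant() check in the diff above, although how CBMC handles that case is not shown here.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical step in an access path such as s.inner.y or s.arr[2].
struct stept
{
  enum class kindt
  {
    MEMBER,
    INDEX
  };
  kindt kind;
  long member_offset = 0;    // MEMBER: byte offset of the selected field
  std::optional<long> index; // INDEX: the index, if it is a constant
  long element_size = 0;     // INDEX: size of one array element in bytes
};

struct access_patht
{
  std::string root;         // root object, e.g. "s"
  std::vector<stept> steps; // path from the root to the accessed location
};

// Normal form: root object plus a constant byte offset.
struct normal_formt
{
  std::string root;
  long offset;
  bool operator==(const normal_formt &other) const
  {
    return root == other.root && offset == other.offset;
  }
};

// Sums the constant contributions of all steps; returns std::nullopt as soon
// as an index is not a constant, since then no constant offset exists.
std::optional<normal_formt> normalize(const access_patht &path)
{
  long offset = 0;
  for(const stept &s : path.steps)
  {
    if(s.kind == stept::kindt::MEMBER)
      offset += s.member_offset;
    else
    {
      if(!s.index)
        return std::nullopt;
      assert(s.element_size > 0);
      offset += *s.index * s.element_size;
    }
  }
  return normal_formt{path.root, offset};
}
```

For instance, s.inner.y with inner at byte 8 of s and y at byte 4 of inner normalizes to (s, 12), the same normal form as the byte access ((char *)&s)[12], so comparing the two pointers becomes a plain equality check.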

remi-delmas-3000 · 2025-06-18T19:52:21Z

src/pointer-analysis/value_set.cpp

@@ -184,7 +183,7 @@ void value_sett::output(std::ostream &out, const std::string &indent) const
        stream << "<" << format(o) << ", ";

        if(o_it->second)
-          stream << *o_it->second;
+          stream << format(*o_it->second);


Now we have to print an expression instead of a mere integer.

Yes, but why is that a concern?

remi-delmas-3000

I have one remaining question: Now that we have symbolic offsets for pointer expressions instead of just "constants" or "unknown", is there any way to use that to compute more precise results in get_value_set_rec? I know it lets us be more precise when modelling assignments, but I don't understand why we don't have a similar gain in precision when computing dereferences/traversing value_sets.

Other question: now that the value set representation "knows" that an expression array[i] has an offset of the form i * sizeof(T), could we try to take into account extra constraints about i during the value set traversal? Let's say we're trying to resolve array[i] in the context of a basic loop invariant 0 <= i && i <= len(array): knowing range constraints about i, we could maybe avoid injecting values representing OOB accesses in the value set for array[i]?
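A minimal sketch of the suggested range check follows. It assumes an interval [lo, hi] for i is known and that the offset has the form i * elem_size. always_in_bounds and intervalt are hypothetical names, not an existing value_sett API. Note that under the inclusive bound i <= len(array), i == len(array) is still one past the end. A check like this could therefore only rule out out-of-bounds entries under a bound such as i < len(array).

```cpp
#include <cassert>

// Hypothetical closed interval [lo, hi] of possible values of index i.
struct intervalt
{
  long lo;
  long hi;
};

// Returns true if reading elem_size bytes at offset i * elem_size stays within
// an object of object_size bytes for every i in the interval. Only then could
// an analysis skip adding out-of-bounds entries for array[i].
bool always_in_bounds(const intervalt &i, long elem_size, long object_size)
{
  assert(elem_size > 0 && object_size >= 0);
  if(i.lo > i.hi)
    return true; // empty interval: no access can happen
  const long min_offset = i.lo * elem_size;
  const long max_end = (i.hi + 1) * elem_size; // one past the last byte read
  return min_offset >= 0 && max_end <= object_size;
}
```

For an array of 10 four-byte elements (40 bytes), the interval [0, 9] passes. The interval [0, 10], which is what 0 <= i && i <= len(array) gives, does not.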

peterschrammel

It seems we are lacking tests in 4 places here. Given that this is anything but trivial, it would be great to find some test cases that trigger these.

peterschrammel · 2025-06-19T20:55:10Z


src/pointer-analysis/value_set.cpp

tautschnig · 2025-06-20T20:08:21Z

> I have one remaining question: Now that we have symbolic offsets for pointer expressions instead of just "constants" or "unknown", is there any way to use that to compute more precise results in get_value_set_rec? I know it lets us be more precise when modelling assignments, but I don't understand why we don't have a similar gain in precision when computing dereferences/traversing value_sets.

I have already seen this happen, although it isn't necessarily very obvious unless one starts examining the formula that symex produces. #8653 is a consequence of my observations: I was surprised to still find "unknown" where I had expected a known offset.

> Other question: now that the value set representation "knows" that an expression array[i] has an offset of the form i * sizeof(T), could we try to take into account extra constraints about i during the value set traversal? Let's say we're trying to resolve array[i] in the context of a basic loop invariant 0 <= i && i <= len(array): knowing range constraints about i, we could maybe avoid injecting values representing OOB accesses in the value set for array[i]?

I'm not sure we even create those OOB values here.


This release adds aarch64 va_list support (via diffblue#8572), which makes all tests pass on aarch64 Linux. We reworked expression simplification during symbolic execution (via diffblue#8642, diffblue#8647, diffblue#8627) to produce smaller and quicker-to-solve formulae for scenarios seen by our users.

tautschnig assigned peterschrammel May 30, 2025

tautschnig requested review from martin-cs and peterschrammel as code owners May 30, 2025 13:30

tautschnig self-assigned this May 30, 2025

tautschnig mentioned this pull request Jun 2, 2025

Huge SMT file and slow proof for simple array function #8617

Closed

tautschnig force-pushed the value-set-offset branch from 9764fbe to 5d136c4 June 3, 2025 08:42

tautschnig requested a review from kroening as a code owner June 3, 2025 08:42

tautschnig force-pushed the value-set-offset branch from 5d136c4 to 41811ba June 3, 2025 09:59

tautschnig assigned kroening and unassigned tautschnig Jun 3, 2025

remi-delmas-3000 reviewed Jun 18, 2025

src/goto-symex/goto_symex_state.cpp

remi-delmas-3000 reviewed Jun 18, 2025

remi-delmas-3000 approved these changes Jun 18, 2025

peterschrammel approved these changes Jun 19, 2025

tautschnig assigned tautschnig and unassigned kroening and peterschrammel Jun 20, 2025

tautschnig force-pushed the value-set-offset branch from 41811ba to 96844b5 June 20, 2025 22:13

tautschnig added 4 commits June 24, 2025 09:20

Ensure test only fails as originally designed

2ce536c

The test will otherwise exhibit undefined behaviour under CBMC 6 settings (where malloc may fail). This, in turn, can cause the expected patterns not to show up in the trace.
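The failure mode described here can be illustrated with a small, hypothetical C++ sketch. The actual regression test is not shown in this thread. If malloc may return a null pointer, the program has to check the result before writing through it. Otherwise a null-pointer dereference, rather than the intended assertion, decides what the trace shows. sum_first is an illustrative name only. In a C test the same guard applies with malloc and free.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical example: allocate n ints, fill them with 0..n-1 and sum them.
// Returns -1 if the allocation fails, so no null pointer is ever dereferenced.
int sum_first(std::size_t n)
{
  assert(n > 0);
  int *p = static_cast<int *>(std::malloc(n * sizeof(int)));
  if(p == nullptr)
    return -1; // allocation failed: stop before any undefined behaviour
  for(std::size_t k = 0; k < n; ++k)
    p[k] = static_cast<int>(k);
  int sum = 0;
  for(std::size_t k = 0; k < n; ++k)
    sum += p[k];
  std::free(p);
  return sum;
}
```

With the guard in place, the function's behaviour is defined whether or not the allocation succeeds. Without it, a failing malloc would lead straight into undefined behaviour.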

Additional unit tests for value_sett

1f2fd5a

Several branches cannot easily be tested by regression tests as existing front-ends will never create some of the expressions.

value_sett: replace uses of throw by invariants

7fc006e

None of these should be triggered by user-provided code.
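The distinction behind this commit can be sketched as follows. Exceptions are meant for conditions that user input can trigger. An invariant records an internal assumption, and violating it indicates a bug in the tool itself. The snippet uses a hypothetical TOY_INVARIANT macro and helper that only approximate CBMC's own invariant macros in util/invariant.h.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for an invariant macro: unlike assert, it is never
// compiled out, and a violation aborts instead of throwing to a caller.
#define TOY_INVARIANT(condition, reason)                          \
  do                                                              \
  {                                                               \
    if(!(condition))                                              \
    {                                                             \
      std::fprintf(stderr, "Invariant violated: %s\n", (reason)); \
      std::abort();                                               \
    }                                                             \
  } while(false)

// Hypothetical helper that only expects "member" or "index" expressions.
// A throw-based version might have done: throw "unexpected expression";
// here an unexpected id is treated as an internal error instead.
int step_kind(const std::string &expr_id)
{
  TOY_INVARIANT(
    expr_id == "member" || expr_id == "index", "unexpected expression id");
  return expr_id == "member" ? 0 : 1;
}
```

Only the expected inputs are exercised here. Passing any other id would abort the process, which is the intended outcome for a condition that front-end-generated code should never reach.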

Value set: lift offset from numeric constants to expressions

056be83

We can safely track arbitrary expressions as pointer offsets rather than limit ourselves to just constant offsets (and then treating all other expressions as "unknown").

tautschnig force-pushed the value-set-offset branch from 96844b5 to 056be83 June 24, 2025 11:23

tautschnig merged commit 0beaf25 into diffblue:develop Jun 24, 2025
40 checks passed

tautschnig deleted the value-set-offset branch June 24, 2025 12:50

tautschnig mentioned this pull request Jun 25, 2025

Release CBMC 6.7.0 #8667

Merged

4 tasks
