Value set: lift offset from numeric constants to expressions #8647

Merged
merged 4 commits into from
Jun 24, 2025

tautschnig
Collaborator

We can safely track arbitrary expressions as pointer offsets rather than limit ourselves to just constant offsets (and then treating all other expressions as "unknown").

  • Each commit message has a non-empty body, explaining why the change was made.
  • n/a Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • n/a My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.
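
As a rough illustration of the change described above, the following sketch contrasts an offset representation that keeps only constants with one that keeps the whole expression. The types `constant_offsett` and `symbolic_offsett` and both functions are hypothetical names made up for this sketch; they are not CBMC classes, and the string merely stands in for what would be an expression tree.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <variant>

// Hypothetical toy model; none of these names exist in CBMC.
struct constant_offsett
{
  long long value;
};

struct symbolic_offsett
{
  std::string expr; // stand-in for an expression such as "i * 4"
};

using offsett = std::variant<constant_offsett, symbolic_offsett>;

// Constant-only scheme: anything that is not a numeric constant
// degrades to "unknown", modelled here as an empty optional.
std::optional<long long> to_constant_only(const offsett &o)
{
  if(const auto *c = std::get_if<constant_offsett>(&o))
    return c->value;
  return std::nullopt;
}

// Expression scheme: the offset is kept and printed whatever its form.
std::string describe_offset(const offsett &o)
{
  if(const auto *c = std::get_if<constant_offsett>(&o))
    return std::to_string(c->value);
  const auto &s = std::get<symbolic_offsett>(o);
  assert(!s.expr.empty()); // a symbolic offset must carry an expression
  return s.expr;
}
```

Under the constant-only scheme, `i * 4` and any other non-constant offset collapse into the same "unknown" value, whereas the expression scheme keeps them distinguishable.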

codecov bot commented Jun 3, 2025

Codecov Report

Attention: Patch coverage is 98.73418% with 2 lines in your changes missing coverage. Please review.

Project coverage is 80.39%. Comparing base (eef9677) to head (056be83).
Report is 5 commits behind head on develop.

Files with missing lines | Patch % | Lines
src/pointer-analysis/value_set.cpp | 97.05% | 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8647      +/-   ##
===========================================
+ Coverage    80.36%   80.39%   +0.02%     
===========================================
  Files         1688     1688              
  Lines       207067   207129      +62     
  Branches        73       73              
===========================================
+ Hits        166418   166516      +98     
+ Misses       40649    40613      -36     

@tautschnig tautschnig assigned kroening and unassigned tautschnig Jun 3, 2025
@@ -981,7 +981,7 @@ normalize(const object_descriptor_exprt &expr, const namespacet &ns)
 {
   return expr;
 }
-if(expr.offset().id() == ID_unknown)
+if(!expr.offset().is_constant())
Collaborator

@remi-delmas-3000 remi-delmas-3000 Jun 18, 2025

Can we get a high-level description of the normal form we're trying to reach?

Member

The normal form is root object + constant offset, since the object can be an arbitrary access path into the root object. Pointer equality checks then become trivial, although simplify_expr may have become good enough in the meantime.
Anyway, this seems orthogonal to this PR.
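
To make the "root object + constant offset" normal form concrete, here is a minimal sketch under stated assumptions: an access path is modelled as a root object name plus a list of constant byte offsets, one per member or index step. `access_patht`, `normalize_path` and `same_address` are hypothetical names for this illustration and do not reflect how `normalize` is implemented in CBMC. Symbolic offsets are outside the scope of the sketch, in line with the diff above, which returns early when the offset is not a constant.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: an access path such as s.b[1] is a root object
// plus one constant byte offset per member/index step.
struct access_patht
{
  std::string root;
  std::vector<long long> step_offsets;
};

// Normal form: (root object, single constant byte offset).
using normal_formt = std::pair<std::string, long long>;

normal_formt normalize_path(const access_patht &p)
{
  assert(!p.root.empty()); // every path must start at some root object
  long long total = 0;
  for(const long long step : p.step_offsets)
    total += step;
  return {p.root, total};
}

// Once both sides are in normal form, pointer equality is a plain
// comparison of root and offset.
bool same_address(const access_patht &a, const access_patht &b)
{
  return normalize_path(a) == normalize_path(b);
}
```

For example, assuming a 4-byte `int` and `struct S { int a; int b[4]; } s;`, the path `s.b[1]` contributes steps 4 and 4 and normalizes to `(s, 8)`, the same as a path that addresses byte 8 of `s` directly.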

@@ -184,7 +183,7 @@ void value_sett::output(std::ostream &out, const std::string &indent) const
 stream << "<" << format(o) << ", ";

 if(o_it->second)
-  stream << *o_it->second;
+  stream << format(*o_it->second);
Collaborator

@remi-delmas-3000 remi-delmas-3000 Jun 18, 2025

Now we have to print an expression instead of a mere integer.

Collaborator Author

Yes, but why is that a concern?

Collaborator

@remi-delmas-3000 remi-delmas-3000 left a comment

I have one remaining question: now that we have symbolic offsets for pointer expressions instead of just "constants" or "unknown", is there any way to use that to compute more precise results in get_value_set_rec? I know it lets us be more precise when modelling assignments, but I don't understand why we don't have a similar gain in precision when computing dereferences/traversing value_sets.

Other question: now that the value set representation "knows" that an expression array[i] has an offset of the form i * sizeof(T), could we try to take into account extra constraints about i during the value set traversal? Let's say we're trying to resolve array[i] in the context of a basic loop invariant 0 <= i && i <= len(array). Knowing range constraints about i, we could maybe avoid injecting values representing OOB accesses in the value set for array[i]?
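
The idea in the second question can be sketched with simple interval arithmetic: given a range for `i` and an element size, derive the range of byte offsets `i * sizeof(T)` and check whether every access stays inside the array. This is only an illustration of the reasoning, not something the value set analysis is known to do; `intervalt`, `offset_range` and `always_in_bounds` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical inclusive integer interval [lo, hi].
struct intervalt
{
  long long lo;
  long long hi;
};

// Range of byte offsets of array[i] when i ranges over idx.
intervalt offset_range(intervalt idx, std::size_t elem_size)
{
  assert(idx.lo <= idx.hi); // the index interval must be non-empty
  const long long s = static_cast<long long>(elem_size);
  return {idx.lo * s, idx.hi * s};
}

// True if reading elem_size bytes at every possible offset stays
// within an array of len elements.
bool always_in_bounds(intervalt idx, std::size_t elem_size, std::size_t len)
{
  const intervalt off = offset_range(idx, elem_size);
  const long long s = static_cast<long long>(elem_size);
  const long long size = static_cast<long long>(len) * s;
  return off.lo >= 0 && off.hi + s <= size;
}
```

With 4-byte elements and `len == 10`, the range `0 <= i <= 9` is always in bounds. The invariant as written, `0 <= i && i <= len(array)`, also admits `i == len`, which is one past the end, so a dereference of `array[i]` would still need the stricter bound `i < len`.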

Member

@peterschrammel peterschrammel left a comment

It seems we are lacking tests in 4 places here. Given that this is anything but trivial, it would be great to find some test cases that trigger these.

@tautschnig
Collaborator Author

> I have one remaining question: now that we have symbolic offsets for pointer expressions instead of just "constants" or "unknown", is there any way to use that to compute more precise results in get_value_set_rec? I know it lets us be more precise when modelling assignments, but I don't understand why we don't have a similar gain in precision when computing dereferences/traversing value_sets.

I have already seen this happen, although it isn't necessarily very obvious unless one starts examining the formula that symex produces. #8653 is a consequence of my observations: I was surprised to still find "unknown" when I had expected a known offset.

> Other question: now that the value set representation "knows" that an expression array[i] has an offset of the form i * sizeof(T), could we try to take into account extra constraints about i during the value set traversal? Let's say we're trying to resolve array[i] in the context of a basic loop invariant 0 <= i && i <= len(array). Knowing range constraints about i, we could maybe avoid injecting values representing OOB accesses in the value set for array[i]?

I'm not sure we even do create those OOB values here?

The test will otherwise exhibit undefined behaviour under CBMC 6
settings (where malloc may fail). This, in turn, can cause the expected
patterns not to show up in the trace.

Several branches cannot easily be tested by regression tests as
existing front-ends will never create some of the expressions.
None of these should be triggered by user-provided code.

We can safely track arbitrary expressions as pointer offsets rather than
limit ourselves to just constant offsets (and then treating all other
expressions as "unknown").
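
The commit message about malloc above does not show the test change itself. As a hedged sketch of the general pattern, one common way to avoid dereferencing a possibly-null allocation result is an explicit check before the first use. `store_and_load` is a hypothetical function written for this illustration, not code from the PR.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>

// Hypothetical example: returns the stored value, or an empty optional
// if the allocation failed.
std::optional<int> store_and_load(int value)
{
  int *p = static_cast<int *>(std::malloc(sizeof(int)));
  if(p == nullptr)
    return std::nullopt; // malloc may fail: never dereference a null result

  *p = value;
  assert(*p == value); // reachable only on the path where p is non-null
  const int result = *p;
  std::free(p);
  return result;
}
```

Without the null check, the write through `p` would be undefined behaviour on the path where the allocation fails, which is the situation the commit message describes.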
@tautschnig tautschnig merged commit 0beaf25 into diffblue:develop Jun 24, 2025
40 checks passed
@tautschnig tautschnig deleted the value-set-offset branch June 24, 2025 12:50
tautschnig added a commit to tautschnig/cbmc that referenced this pull request Jun 25, 2025
This release adds aarch64 va_list support (via diffblue#8572), which makes all
tests pass on aarch64 Linux. We reworked expression simplification
during symbolic execution (via diffblue#8642, diffblue#8647, diffblue#8627) to produce smaller
and quicker-to-solve formulae for scenarios seen by our users.
@tautschnig tautschnig mentioned this pull request Jun 25, 2025
4 tasks