[CI] The Big Beautiful Build #3186

AlexandreSinger · 2025-07-03T23:23:04Z

The CI for VTR was doing far more work than it needed to, which led to
long CI run times of around 1.5 hours on average. Overall, the CI used
12.5 hours of compute and occupied more than 20 GitHub runners at a
time, which hurt the concurrency of runs.

This PR reduces compute by building VTR only once for general builds and
reusing that build wherever it is needed, which brings the CI compute
down to 7.5 hours. In the process, I also added dependency chains to try
to schedule the runs efficiently and keep the number of active GitHub
runners below 10 (our current maximum).
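
To illustrate why "build once, then fan out" plus a runner cap trades a little wall-clock time for far fewer concurrent machines, here is a small simulation. It is a hypothetical sketch, not the VTR workflow: the job names and durations are made up, and it assumes a greedy scheduler that starts any job whose dependencies have finished whenever a runner is free (GitHub Actions' real scheduling may differ).

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    minutes: int
    needs: list = field(default_factory=list)  # names of jobs that must finish first


def simulate(jobs, max_runners):
    """Greedy list scheduling: whenever a runner is free, start the next job
    (in input order) whose dependencies have all finished.
    Returns (total wall-clock minutes, peak number of busy runners)."""
    done, started = set(), set()
    running = []  # min-heap of (finish_time, job_name)
    now, peak = 0, 0
    while len(done) < len(jobs):
        for job in jobs:
            if len(running) >= max_runners:
                break
            if job.name not in started and all(d in done for d in job.needs):
                heapq.heappush(running, (now + job.minutes, job.name))
                started.add(job.name)
        peak = max(peak, len(running))
        if not running:
            raise ValueError("unsatisfiable dependency (cycle or unknown job)")
        # Jump to the next completion and retire every job finishing then.
        now, name = heapq.heappop(running)
        done.add(name)
        while running and running[0][0] == now:
            done.add(heapq.heappop(running)[1])
    return now, peak


# Hypothetical workflow: one shared build feeds three test jobs.
jobs = [
    Job("build", 20),
    Job("sanitized_build", 20),
    Job("test_a", 10, ["build"]),
    Job("test_b", 10, ["build"]),
    Job("test_c", 10, ["build"]),
]
print(simulate(jobs, max_runners=2))
print(simulate(jobs, max_runners=10))
```

With two runners the toy workflow finishes in 40 minutes and never uses more than two machines; with ten runners it finishes in 30 minutes but peaks at three. The test jobs reuse the single `build` job instead of each compiling VTR themselves, which is where the compute saving comes from.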

I also fixed a bug in the way we have been using CCache. We were using
the same cache for all builds, which works fine for some projects, but
for ours it causes tons of cache misses when, for example, a gcc-11
build's cache is used for a Clang-18 build. This PR makes each build's
cache unique, which improves cache hit rates. I have found that when the
cache hit rate is perfect (i.e. the build is unchanged since the last
run), the CI uses less than 3 hours of compute and the test portion
takes only 20 minutes. In that case, building the container actually
becomes the tall pole, since it often takes a little over 30 minutes.
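
One way to make each build's cache unique is to derive the cache key from everything that changes the compiled objects. The sketch below is hypothetical (the function name, key layout and OS/compiler strings are assumptions, not what this PR does): it returns an exact per-commit key plus a per-configuration prefix, intended for a prefix-matching fallback such as the `restore-keys` input of `actions/cache`.

```python
import hashlib


def ccache_keys(os_name, compiler, build_type, commit_sha, extra_flags=()):
    """Return (exact_key, restore_prefix) for one build configuration.

    Every input that changes the object files (OS image, compiler, build
    type, extra CMake flags) goes into the prefix, so a gcc-11 cache can
    never be restored into a clang-18 build. The commit SHA is appended
    so each run saves a fresh entry under the same prefix."""
    config = "|".join([os_name, compiler, build_type, *sorted(extra_flags)])
    config_id = hashlib.sha256(config.encode()).hexdigest()[:12]
    prefix = f"ccache-{os_name}-{compiler}-{build_type}-{config_id}-"
    return prefix + commit_sha, prefix


# Hypothetical usage: two compilers on the same commit get disjoint caches.
gcc_key, gcc_prefix = ccache_keys("ubuntu-24.04", "gcc-11", "release", "abc123")
clang_key, clang_prefix = ccache_keys("ubuntu-24.04", "clang-18", "release", "abc123")
print(gcc_key)
print(clang_key)
```

A later commit with the same configuration produces a new exact key but the same prefix, so it can still warm-start from the most recent matching cache, while a different compiler never matches.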

Addendum: I also moved the Coverity scan into the weekly CI run, since it has never tripped in my time working on VTR and it adds a fixed 30 minutes to every CI run we do.

AlexandreSinger · 2025-07-04T17:49:04Z

Some data:

Prior to this change (with cache hits):

The average CI run time is 1.5 hours
The total compute for the test workflow was 12.5 hours (with containers taking 0.5 hours)
The slowest test was BuildVariations, which took 47 minutes.

After this change (assuming the cache is not hit at all, which is exceedingly rare):

The CI run time is 1 hour
The total compute for the test workflow is 7.5 hours (with containers still taking 0.5 hours)
The slowest test is still BuildVariations, which takes 54 minutes without a cache hit.

However, it is more likely that the cache will be hit by a prior run (even one from a different branch). When the cache is hit perfectly (by a prior run on the same branch):

The CI run time is a little over 30 minutes
The total compute for the test workflow is 3 hours (with containers still taking 0.5 hours)
The slowest test is now the sanitized build, which takes 20 minutes; however, the container build takes 30 minutes.

I will need to merge this into master to get an average (I need a field test), but I expect the average CI run time to be around 30-45 minutes (2x faster!).
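
The ratios behind these figures can be checked directly from the numbers above. Note that the "before" figures were measured with cache hits while the "after" figures are the no-hit and perfect-hit cases, so these are rough comparisons, and the perfect-hit speedup is an upper bound because that run takes "a little over" 30 minutes.

```python
def speedup(before_min, after_min):
    """Wall-clock speedup factor (before / after)."""
    return before_min / after_min


def compute_saving(before_h, after_h):
    """Fraction of total compute hours saved."""
    return 1 - after_h / before_h


BEFORE_WALL_MIN, BEFORE_COMPUTE_H = 90, 12.5  # 1.5 hours, 12.5 hours of compute

scenarios = {
    "no cache hits": (60, 7.5),
    "perfect cache hits": (30, 3.0),  # wall time is "a little over 30 minutes"
}
for name, (wall_min, compute_h) in scenarios.items():
    print(f"{name}: {speedup(BEFORE_WALL_MIN, wall_min):.1f}x wall-clock, "
          f"{compute_saving(BEFORE_COMPUTE_H, compute_h):.0%} less compute")

# Expected field average of 30-45 minutes.
low, high = speedup(BEFORE_WALL_MIN, 45), speedup(BEFORE_WALL_MIN, 30)
print(f"expected speedup: {low:.1f}x to {high:.1f}x")
```

So "2x faster" corresponds to the 45-minute end of the expected range; a 30-minute average would be closer to 3x.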

Future work is to speed up the container build if possible (maybe we can use caching within it, but I am not sure).

AlexandreSinger · 2025-07-04T17:51:07Z

@amin1377 @AmirhosseinPoolad @vaughnbetz TL;DR, I think I was able to reduce the end-to-end run time of the CI to around 40 minutes (from 1.5 hours) and immensely reduce the number of concurrent machines required. Please review when you have a moment. I would like to merge this into master and see how it ends up working in the field.

AlexandreSinger · 2025-07-04T17:53:42Z

Its dependency chain also looks cooler now:

[Image: new CI workflow dependency graph]

Vs. just being flat:

[Image: previous flat CI workflow graph]

vaughnbetz · 2025-07-04T19:23:41Z

Looks great -- thanks @AlexandreSinger !
I wonder if the Coverity scan is working. Coverity is basically super-warnings (a static code checker). We're using their free version, but if it never trips it may mean something is broken in the link to it and we aren't really running it.
Coverity isn't essential; with the warning levels turned way up on a lot of compilers we probably overlap most of what it checks already. If it isn't really running we should turn it off.

vaughnbetz · 2025-07-04T19:24:53Z

Are the four pending tests ones that will never fire given the refactoring? If so, they should be removed from the test requirements.

AlexandreSinger · 2025-07-04T21:50:12Z

Hi Vaughn, yes! They were renamed. I have removed them from the branch protection. Once things stabilize I will update the branch protection rules to include the new tests.

I merged this in to get the CI moved over sooner. We can decide what to do about the Coverity scan later. I think keeping it around as a weekly run is fine, but I agree: if it's not doing anything, it should be removed.

vaughnbetz · 2025-07-04T22:58:59Z

Great, thanks.

github-actions bot added the infra (Project Infrastructure) and lang-shell (Shell scripts (bash etc.)) labels Jul 3, 2025

AlexandreSinger force-pushed the feature-ci-build-cleanup branch 3 times, most recently from c87e68b to 420e318, July 4, 2025 00:26

github-actions bot added the scripts (Utility & Infrastructure scripts) label Jul 4, 2025

AlexandreSinger force-pushed the feature-ci-build-cleanup branch 2 times, most recently from c9c4d75 to c931c29, July 4, 2025 15:58

AlexandreSinger force-pushed the feature-ci-build-cleanup branch from 243a32b to 9620cce, July 4, 2025 17:41

AlexandreSinger changed the title from [WIP][CI] Combined VTR Builds to [CI] The Big Beautiful Build Jul 4, 2025

AlexandreSinger requested review from amin1377, vaughnbetz and AmirhosseinPoolad July 4, 2025 17:49

AlexandreSinger merged commit a2da057 into master Jul 4, 2025
30 checks passed

AlexandreSinger deleted the feature-ci-build-cleanup branch July 4, 2025 21:49
