Apr speedup #779
Conversation
…t option. This produces a slightly different answer than before because the minimum function returns a different number. After testing, I think this version is actually the correct one. It only slightly changes the distance at which the particles are placed, not their direction or properties.
produces the same results as before if adjusted_split = .false. and nangle is held constant
This reverts commit 219ff68.
This reverts commit 72eeb0a.
Summary of Changes
Hello @becnealon, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request delivers substantial performance enhancements to the Adaptive Particle Refinement (APR) routines. The core "update_apr" logic has been optimized by introducing OpenMP for parallel processing of particle splitting and merging decisions. Furthermore, the "splitpart" functionality has undergone a significant refactoring, extracting specific splitting and geodesic integration methods into a new module ("get_apr_level.f90"). This modular approach improves code clarity, simplifies future modifications, and ensures robust handling of various physical scenarios, including General Relativity, while also adding a crucial check for EOS compatibility.
Code Review
This pull request refactors the Adaptive Particle Refinement (APR) routines to improve performance, primarily through parallelization using OpenMP. The changes are extensive, touching particle splitting and merging logic. While the refactoring is a significant improvement in terms of code structure and potential speedup, I've identified a critical bug related to array sizing that could lead to memory corruption, a high-severity race condition, and a few medium-severity issues related to maintainability and performance that should be addressed.
integer :: should_split(apr_max*npart), idx_split(apr_max*npart), scan_array(apr_max*npart)
integer :: idx_merge(apr_max*npart), should_merge(apr_max*npart), apr_last
real :: get_apr_in(3), rneighs(2*npart), xi, yi, zi, dx, dy, dz, rmin_local
The automatic arrays should_split, idx_split, scan_array, idx_merge, should_merge, and rneighs are sized based on the value of npart at the subroutine's entry. However, npart is an intent(inout) parameter that is increased within the particle splitting logic. This will lead to out-of-bounds memory access when npartold or npart become larger than the initial npart value in subsequent iterations of the loops. This is a critical bug that can cause crashes or incorrect behavior.
These arrays should be sized with maxp (which is available from the dim module) to prevent buffer overflows, or be declared as allocatable and sized appropriately before use.
integer :: should_split(maxp), idx_split(maxp), scan_array(maxp)
integer :: idx_merge(maxp), should_merge(maxp), apr_last
real :: get_apr_in(3), rneighs(maxp), xi, yi, zi, dx, dy, dz, rmin_local
The above is true: these arrays technically should be of size maxp. However, in general they are about 10% of maxp, which makes sizing them at maxp wasteful and costly, since maxp becomes 8x the original maxp when APR is in use. If someone is splitting enough particles to hit this limit, they are basically splitting their entire simulation, which is not a useful application of APR. I find the above a fair practical limit that doesn't add to runtime. Let me know if you want anything different.
Local arrays should never be declared like this; if you really want large temporary arrays you should explicitly allocate and deallocate them, i.e. "declared as allocatable and sized appropriately before use" (see the sketch below).
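For concreteness, a minimal sketch of that allocatable alternative (array names and the apr_max*npart sizing are taken from the diff above; this is illustrative, not the final implementation):

integer, allocatable :: should_split(:), idx_split(:), scan_array(:)
integer, allocatable :: idx_merge(:), should_merge(:)
real,    allocatable :: rneighs(:)

! allocate once on entry, before npart is modified by splitting
allocate(should_split(apr_max*npart),idx_split(apr_max*npart),scan_array(apr_max*npart))
allocate(idx_merge(apr_max*npart),should_merge(apr_max*npart),rneighs(2*npart))
! ... splitting/merging work ...
deallocate(should_split,idx_split,scan_array,idx_merge,should_merge,rneighs)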
!$omp do schedule(dynamic)
do ii = 1,idx_len
   mm = idx_split(ii) ! original particle that should be split
   xi = xyzh(1,mm)
   yi = xyzh(2,mm)
   zi = xyzh(3,mm)

   rmin_local = huge(1.0)

   do j = 1,npartold
      if (j == mm) cycle
      dx = xi - xyzh(1,j)
      dy = yi - xyzh(2,j)
      dz = zi - xyzh(3,j)
      rmin_local = min(rmin_local,dx*dx + dy*dy + dz*dz)
   enddo
   rneighs(ii) = sqrt(rmin_local)
enddo
!$omp end do
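For context, an !$omp do like the one above must sit inside a parallel region, which the diff does not show. As a hedged sketch only (assuming the variables listed are the complete set of per-iteration temporaries), the region would need every loop-local variable declared private to avoid data races:

!$omp parallel default(none) &
!$omp shared(idx_len,idx_split,xyzh,npartold,rneighs) &
!$omp private(ii,mm,xi,yi,zi,j,dx,dy,dz,rmin_local)
!$omp do schedule(dynamic)
! ... loop body as above ...
!$omp end do
!$omp end parallel

With default(none), any variable missing from the shared/private lists fails at compile time, which is a cheap way to catch the kind of race condition flagged in the review.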
The brute-force nearest neighbor search within this parallelized loop has a complexity of O(n_to_split * npartold), which can be a significant performance bottleneck if n_to_split is large. The comment in the old closest_neigh subroutine indicated that it needed replacing.
While parallelizing the existing algorithm provides a speedup, consider implementing a more efficient algorithm, such as a tree-based search (e.g., using the existing k-d tree infrastructure), to further improve performance. This would change the complexity from O(N^2) to O(N log N).
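To illustrate why restricting the search neighbourhood changes the scaling, here is a self-contained sketch. It uses a cell-linked list rather than Phantom's k-d tree (a simpler structure built on the same idea); all names, sizes, and the uniform [0,1) domain are illustrative assumptions, not Phantom code:

program cell_list_sketch
 implicit none
 integer, parameter :: npart = 10000, ncx = 20   ! illustrative sizes
 real    :: xyz(3,npart), xmin(3), dcell, dx(3), r2, rmin2
 integer :: head(ncx,ncx,ncx), next(npart)
 integer :: i, j, ic(3), kx, ky, kz

 call random_number(xyz)        ! fake positions in [0,1)
 xmin = 0. ; dcell = 1./ncx

 ! build the linked list: head(cell) holds the first particle in that
 ! cell, next(i) chains to the remaining particles
 head = 0
 do i = 1,npart
    ic = min(int((xyz(:,i) - xmin)/dcell) + 1, ncx)
    next(i) = head(ic(1),ic(2),ic(3))
    head(ic(1),ic(2),ic(3)) = i
 enddo

 ! nearest-neighbour distance for particle 1: only the 27 surrounding
 ! cells are searched instead of all npart particles
 i = 1
 ic = min(int((xyz(:,i) - xmin)/dcell) + 1, ncx)
 rmin2 = huge(1.0)
 do kx = max(ic(1)-1,1),min(ic(1)+1,ncx)
    do ky = max(ic(2)-1,1),min(ic(2)+1,ncx)
       do kz = max(ic(3)-1,1),min(ic(3)+1,ncx)
          j = head(kx,ky,kz)
          do while (j > 0)
             if (j /= i) then
                dx = xyz(:,i) - xyz(:,j)
                r2 = dot_product(dx,dx)
                rmin2 = min(rmin2,r2)
             endif
             j = next(j)
          enddo
       enddo
    enddo
 enddo
 ! caveat: if the 27 cells hold no neighbour, a real implementation
 ! would widen the search; omitted here for brevity
 print*,'closest neighbour distance ~',sqrt(rmin2)
end program cell_list_sketch

Each query then touches O(1) cells instead of all npartold particles, so the cost of finding rneighs for every split candidate drops from O(n_to_split * npartold) to roughly O(n_to_split).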
@danieljprice let me know what is best here ...
would suggest we go ahead and merge for now; implementing a tree-based search here is a problem for another PR
Description:
Editing the APR routines to increase speed - previously most of these routines were serial (by necessity). These have been tidied up, parallelised where possible, and made more compact in an attempt to make the code more efficient. Speedup on my tests is about 20%. This is in response to #766.
Components modified:
Type of change:
Testing:
I tested against a reference calculation from the previous version of the APR routines. Because particles are placed slightly differently than before, the results are not identical; however, the important conserved quantities track the same. It passed the APR section of the testsuite.
Did you run the bots? no
Did you update relevant documentation in the docs directory? N/A
Did you add comments such that the purpose of the code is understandable? hope so
Is there a unit test that could be added for this feature/bug? no
Important notes: