Conversation

@becnealon (Collaborator) commented Jan 5, 2026

Description:
Edits the APR routines to increase speed. Previously most of these routines were serial (by necessity); they have been tidied up, parallelised where possible, and made more compact to improve efficiency. The speedup in my tests is about 20%. This is in response to #766.

Components modified:

  • Main code (src/main)

Type of change:

  • Performance improvements
  • Code cleanup / refactor

Testing:
I tested against a reference calculation from the previous version of the APR routines. Because particles are placed slightly differently than before, the results are not identical; however, the important conserved quantities track the same. The changes pass the APR section of the test suite.

Did you run the bots? no

Did you update relevant documentation in the docs directory? N/A

Did you add comments such that the purpose of the code is understandable? hope so

Is there a unit test that could be added for this feature/bug? no

Important notes:

  • A previous commit removed rand() in splitpart to ensure that results are always exactly reproducible. It was replaced with a pseudo-random number generated from a counter. This is one of the reasons to keep the loop that calls splitpart serial (it also probably doesn't iterate over many particles each time).
  • I tried a few different approaches to see which produced the largest speedup; they are all documented as previous commits on this branch. The pushed version gave the best speedup of 1.2x on an N=10^6 circumbinary disc with apr_max=3 centred on the stars.
  • I have not tested this with GR. I moved the GR split routines (a simple copy and paste), but I would appreciate it if @Fitz-Hu could either test it or send me a file that I can test with.
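As a hedged illustration of the counter-based approach described above (a minimal Python sketch, not the Fortran in this PR; the name counter_rand and the splitmix64-style constants are my own choices), a reproducible pseudo-random value can be derived purely from an integer counter:

```python
def counter_rand(counter: int) -> float:
    """Deterministic pseudo-random number in [0, 1) derived from an
    integer counter (splitmix64-style integer hash). The same counter
    always yields the same value, so particle splitting is exactly
    reproducible regardless of call order or thread count."""
    mask = (1 << 64) - 1
    z = (counter * 0x9E3779B97F4A7C15) & mask        # golden-ratio step
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & mask
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & mask
    z ^= z >> 31
    return z / 2.0**64
```

Re-running a split with the same counter value reproduces the same displacement, which a stateful call to rand() cannot guarantee.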

@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @becnealon, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers substantial performance enhancements to the Adaptive Particle Refinement (APR) routines. The core "update_apr" logic has been optimized by introducing OpenMP for parallel processing of particle splitting and merging decisions. Furthermore, the "splitpart" functionality has undergone a significant refactoring, extracting specific splitting and geodesic integration methods into a new module ("get_apr_level.f90"). This modular approach improves code clarity, simplifies future modifications, and ensures robust handling of various physical scenarios, including General Relativity, while also adding a crucial check for EOS compatibility.

Highlights

  • Performance Optimization: The APR routines have been significantly optimized, leading to an approximate 20% speedup in the author's tests.
  • OpenMP Parallelization: OpenMP directives have been introduced in "update_apr" to parallelize the identification of particles for splitting and merging, improving computational efficiency.
  • Modular splitpart Logic: The complex "splitpart" subroutine has been refactored and its specific splitting strategies (e.g., for GR, different "split_dir" values) have been moved into dedicated subroutines within "get_apr_level.f90", enhancing code organization and maintainability.
  • Geodesic Integration Relocation: The "integrate_geodesic" and "integrate_geodesic_gr" subroutines, previously in "apr.f90", have been moved to "get_apr_level.f90" to support the modular "splitpart" implementation.
  • Error Handling for EOS: A new fatal error check ensures that "split_dir == 2" is not used with an incompatible Equation of State (EOS) choice ("maxvxyzu < 4").
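The flag-and-compact pattern implied by the should_split/scan_array/idx_split names in the diff can be sketched as follows (plain Python standing in for the Fortran/OpenMP; the function name is hypothetical). Each particle is flagged independently, and an exclusive prefix scan then gives every flagged particle a unique output slot, so the compaction step has no write conflicts:

```python
from itertools import accumulate

def compact_indices(should_split):
    """Flag-and-compact: turn a per-particle 0/1 flag array into a dense
    index list. The flagging pass is trivially parallel (each particle
    is independent); the exclusive prefix scan assigns each flagged
    particle its slot in idx_split without any write conflicts."""
    # exclusive prefix sum: scan_array[i] = number of flags before i
    scan_array = [0] + list(accumulate(should_split))[:-1]
    n_split = scan_array[-1] + should_split[-1] if should_split else 0
    idx_split = [0] * n_split
    for i, flag in enumerate(should_split):   # parallel-safe: unique slots
        if flag:
            idx_split[scan_array[i]] = i
    return idx_split
```

For example, compact_indices([0, 1, 1, 0, 1]) returns [1, 2, 4]: the original indices of the flagged particles, packed densely.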


@gemini-code-assist bot left a comment


Code Review

This pull request refactors the Adaptive Particle Refinement (APR) routines to improve performance, primarily through parallelization using OpenMP. The changes are extensive, touching particle splitting and merging logic. While the refactoring is a significant improvement in terms of code structure and potential speedup, I've identified a critical bug related to array sizing that could lead to memory corruption, a high-severity race condition, and a few medium-severity issues related to maintainability and performance that should be addressed.

Comment on lines +163 to +165
integer :: should_split(apr_max*npart), idx_split(apr_max*npart), scan_array(apr_max*npart)
integer :: idx_merge(apr_max*npart),should_merge(apr_max*npart),apr_last
real :: get_apr_in(3),rneighs(2*npart),xi,yi,zi,dx,dy,dz,rmin_local
@gemini-code-assist (Contributor) commented:

critical

The automatic arrays should_split, idx_split, scan_array, idx_merge, should_merge, and rneighs are sized based on the value of npart at the subroutine's entry. However, npart is an intent(inout) parameter that is increased within the particle splitting logic. This will lead to out-of-bounds memory access when npartold or npart become larger than the initial npart value in subsequent iterations of the loops. This is a critical bug that can cause crashes or incorrect behavior.

These arrays should be sized with maxp (which is available from the dim module) to prevent buffer overflows, or be declared as allocatable and sized appropriately before use.

integer :: should_split(maxp), idx_split(maxp), scan_array(maxp)
integer :: idx_merge(maxp), should_merge(maxp), apr_last
real :: get_apr_in(3), rneighs(maxp), xi, yi, zi, dx, dy, dz, rmin_local

@becnealon (Collaborator, Author) commented Jan 5, 2026:

The above is true; these arrays technically should be maxp large. However, in general they are only about 10% of maxp, so sizing them at maxp is wasteful and costly, since maxp becomes 8x the original maxp when APR is in use. If someone is splitting enough particles to hit this limit they are basically splitting their entire simulation, which is not a useful application of APR. I find the above is a fair practical limit that doesn't add to runtime. Let me know if you want anything different.

@danieljprice (Owner) commented Jan 8, 2026:

Local arrays should never be declared like this; if you really want large temporary arrays you should explicitly allocate and deallocate them, i.e. declare them as allocatable and size them appropriately before use.

Comment on lines +284 to +302
!$omp do schedule(dynamic)
do ii = 1,idx_len
mm = idx_split(ii) ! original particle that should be split
xi = xyzh(1,mm)
yi = xyzh(2,mm)
zi = xyzh(3,mm)

rmin_local = huge(1.0)

do j = 1,npartold
if (j == mm) cycle
dx = xi - xyzh(1,j)
dy = yi - xyzh(2,j)
dz = zi - xyzh(3,j)
rmin_local = min(rmin_local,dx*dx + dy*dy + dz*dz)
enddo
rneighs(ii) = sqrt(rmin_local)
enddo
!$omp end do
@gemini-code-assist (Contributor) commented:

medium

The brute-force nearest neighbor search within this parallelized loop has a complexity of O(n_to_split * npartold), which can be a significant performance bottleneck if n_to_split is large. The comment in the old closest_neigh subroutine indicated that it needed replacing.

While parallelizing the existing algorithm provides a speedup, consider implementing a more efficient algorithm, such as a tree-based search (e.g., using the existing k-d tree infrastructure), to further improve performance. This would change the complexity from O(N^2) to O(N log N).
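To illustrate the point, here is a toy cell-list sketch in Python (not Phantom's k-d tree infrastructure; the names and the uniform-grid approach are illustrative only). Hashing particles into cells of fixed side length bounds each nearest-neighbour query to nearby shells, so the per-split cost no longer scales with npartold for roughly uniform distributions:

```python
import math
from collections import defaultdict

def brute_force_nn(points, i):
    """O(N) per query: scan all other particles, as in the quoted loop."""
    return min(math.dist(points[i], points[j])
               for j in range(len(points)) if j != i)

def build_grid(points, cell):
    """Hash each particle into a cubic cell of side `cell` (built once)."""
    grid = defaultdict(list)
    for j, (x, y, z) in enumerate(points):
        grid[(int(x // cell), int(y // cell), int(z // cell))].append(j)
    return grid

def grid_nn(grid, points, i, cell):
    """Search outward shell by shell; stop once no closer particle can
    exist outside the searched region (any point beyond shell r is at
    least r*cell away from the query particle)."""
    xi, yi, zi = points[i]
    ci = (int(xi // cell), int(yi // cell), int(zi // cell))
    best, r = math.inf, 0
    while True:
        for dx in range(-r, r + 1):
            for dy in range(-r, r + 1):
                for dz in range(-r, r + 1):
                    if max(abs(dx), abs(dy), abs(dz)) != r:
                        continue  # visit only the outermost shell
                    for j in grid.get((ci[0] + dx, ci[1] + dy, ci[2] + dz), ()):
                        if j != i:
                            best = min(best, math.dist(points[i], points[j]))
        if best <= r * cell:
            return best
        r += 1
```

A k-d tree achieves a comparable bound even for strongly clustered particle distributions, where a uniform grid degrades; both agree with the brute-force result exactly.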

@becnealon (Collaborator, Author) commented:

@danieljprice let me know what is best here ...

@danieljprice (Owner) commented:

I would suggest going ahead with the merge for now; implementing a tree-based search here is a problem for another PR.
