Apr speedup #779
Conversation
…t option. This produces a slightly different answer than before because the minimum function returns a different number. After testing, I think this version is actually the correct one. It only slightly changes the distance at which the particles are placed, not their direction or properties.
produces the same results as before if adjusted_split = .false. and nangle is held constant
This reverts commit 219ff68.
This reverts commit 72eeb0a.
Summary of Changes
Hello @becnealon, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request delivers substantial performance enhancements to the Adaptive Particle Refinement (APR) routines. The core "update_apr" logic has been optimized by introducing OpenMP for parallel processing of particle splitting and merging decisions. Furthermore, the "splitpart" functionality has undergone a significant refactoring, extracting specific splitting and geodesic integration methods into a new module ("get_apr_level.f90"). This modular approach improves code clarity, simplifies future modifications, and ensures robust handling of various physical scenarios, including General Relativity, while also adding a crucial check for EOS compatibility.
Code Review
This pull request refactors the Adaptive Particle Refinement (APR) routines to improve performance, primarily through parallelization using OpenMP. The changes are extensive, touching particle splitting and merging logic. While the refactoring is a significant improvement in terms of code structure and potential speedup, I've identified a critical bug related to array sizing that could lead to memory corruption, a high-severity race condition, and a few medium-severity issues related to maintainability and performance that should be addressed.
integer :: should_split(apr_max*npart), idx_split(apr_max*npart), scan_array(apr_max*npart)
integer :: idx_merge(apr_max*npart), should_merge(apr_max*npart), apr_last
real :: get_apr_in(3), rneighs(2*npart), xi, yi, zi, dx, dy, dz, rmin_local
The automatic arrays should_split, idx_split, scan_array, idx_merge, should_merge, and rneighs are sized based on the value of npart at the subroutine's entry. However, npart is an intent(inout) parameter that is increased within the particle splitting logic. This will lead to out-of-bounds memory access when npartold or npart become larger than the initial npart value in subsequent iterations of the loops. This is a critical bug that can cause crashes or incorrect behavior.
These arrays should be sized with maxp (which is available from the dim module) to prevent buffer overflows, or be declared as allocatable and sized appropriately before use.
integer :: should_split(maxp), idx_split(maxp), scan_array(maxp)
integer :: idx_merge(maxp), should_merge(maxp), apr_last
real :: get_apr_in(3), rneighs(maxp), xi, yi, zi, dx, dy, dz, rmin_local
The above is true: these arrays technically should be of size maxp. However, in general they are about 10% of maxp, which makes sizing them at maxp wasteful and costly, since maxp becomes 8x the original maxp when APR is in use. If someone is splitting enough particles to hit this limit, they are basically splitting their entire simulation, which is not a useful application of APR. I find the above a fair practical limit that doesn't add to runtime. Let me know if you want anything different.
Local arrays should never be declared like this; if you really want large temporary arrays you should explicitly allocate and deallocate them, i.e. "declared as allocatable and sized appropriately before use" (see the sketch below).
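For concreteness, a minimal sketch of that allocatable alternative (array names and the apr_max*npart sizing are taken from the diff above; this is illustrative, not the final implementation):

integer, allocatable :: should_split(:), idx_split(:), scan_array(:)
integer, allocatable :: idx_merge(:), should_merge(:)
real,    allocatable :: rneighs(:)

! allocate once on entry, before npart is modified by splitting
allocate(should_split(apr_max*npart),idx_split(apr_max*npart),scan_array(apr_max*npart))
allocate(idx_merge(apr_max*npart),should_merge(apr_max*npart),rneighs(2*npart))
! ... splitting/merging work ...
deallocate(should_split,idx_split,scan_array,idx_merge,should_merge,rneighs)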
!$omp do schedule(dynamic)
do ii = 1,idx_len
   mm = idx_split(ii) ! original particle that should be split
   xi = xyzh(1,mm)
   yi = xyzh(2,mm)
   zi = xyzh(3,mm)

   rmin_local = huge(1.0)

   do j = 1,npartold
      if (j == mm) cycle
      dx = xi - xyzh(1,j)
      dy = yi - xyzh(2,j)
      dz = zi - xyzh(3,j)
      rmin_local = min(rmin_local,dx*dx + dy*dy + dz*dz)
   enddo
   rneighs(ii) = sqrt(rmin_local)
enddo
!$omp end do
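For context, an !$omp do like the one above must sit inside a parallel region, which the diff does not show. As a hedged sketch only (assuming the variables listed are the complete set of per-iteration temporaries), the region would need every loop-local variable declared private to avoid data races:

!$omp parallel default(none) &
!$omp shared(idx_len,idx_split,xyzh,npartold,rneighs) &
!$omp private(ii,mm,xi,yi,zi,j,dx,dy,dz,rmin_local)
!$omp do schedule(dynamic)
! ... loop body as above ...
!$omp end do
!$omp end parallel

With default(none), any variable missing from the shared/private lists fails at compile time, which is a cheap way to catch the kind of race condition flagged in the review.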
The brute-force nearest neighbor search within this parallelized loop has a complexity of O(n_to_split * npartold), which can be a significant performance bottleneck if n_to_split is large. The comment in the old closest_neigh subroutine indicated that it needed replacing.
While parallelizing the existing algorithm provides a speedup, consider implementing a more efficient algorithm, such as a tree-based search (e.g., using the existing k-d tree infrastructure), to further improve performance. This would change the complexity from O(N^2) to O(N log N).
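To illustrate why restricting the search neighbourhood changes the scaling, here is a self-contained sketch. It uses a cell-linked list rather than Phantom's k-d tree (a simpler structure built on the same idea); all names, sizes, and the uniform [0,1) domain are illustrative assumptions, not Phantom code:

program cell_list_sketch
 implicit none
 integer, parameter :: npart = 10000, ncx = 20   ! illustrative sizes
 real    :: xyz(3,npart), xmin(3), dcell, dx(3), r2, rmin2
 integer :: head(ncx,ncx,ncx), next(npart)
 integer :: i, j, ic(3), kx, ky, kz

 call random_number(xyz)        ! fake positions in [0,1)
 xmin = 0. ; dcell = 1./ncx

 ! build the linked list: head(cell) holds the first particle in that
 ! cell, next(i) chains to the remaining particles
 head = 0
 do i = 1,npart
    ic = min(int((xyz(:,i) - xmin)/dcell) + 1, ncx)
    next(i) = head(ic(1),ic(2),ic(3))
    head(ic(1),ic(2),ic(3)) = i
 enddo

 ! nearest-neighbour distance for particle 1: only the 27 surrounding
 ! cells are searched instead of all npart particles
 i = 1
 ic = min(int((xyz(:,i) - xmin)/dcell) + 1, ncx)
 rmin2 = huge(1.0)
 do kx = max(ic(1)-1,1),min(ic(1)+1,ncx)
    do ky = max(ic(2)-1,1),min(ic(2)+1,ncx)
       do kz = max(ic(3)-1,1),min(ic(3)+1,ncx)
          j = head(kx,ky,kz)
          do while (j > 0)
             if (j /= i) then
                dx = xyz(:,i) - xyz(:,j)
                r2 = dot_product(dx,dx)
                rmin2 = min(rmin2,r2)
             endif
             j = next(j)
          enddo
       enddo
    enddo
 enddo
 ! caveat: if the 27 cells hold no neighbour, a real implementation
 ! would widen the search; omitted here for brevity
 print*,'closest neighbour distance ~',sqrt(rmin2)
end program cell_list_sketch

Each query then touches O(1) cells instead of all npartold particles, so the cost of finding rneighs for every split candidate drops from O(n_to_split * npartold) to roughly O(n_to_split).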
@danieljprice let me know what is best here ...
would suggest we go ahead and merge for now; implementing a tree-based search here is a problem for another PR
Description:
Editing the APR routines to increase speed - previously most of these routines were serial (by necessity). These have been tidied up, parallelised where possible, and made more compact in an attempt to make the code more efficient. Speedup on my tests is about 20%. This is in response to #766.
Components modified:
Type of change:
Testing:
I tested against a reference calculation from the previous version of the APR routines. Because particles are placed slightly differently than before, the results are not identical; however, the important conserved quantities track the same. It passed the APR section of the testsuite.
Did you run the bots? no
Did you update relevant documentation in the docs directory? N/A
Did you add comments such that the purpose of the code is understandable? hope so
Is there a unit test that could be added for this feature/bug? no
Important notes: