[Draft] OpenACC port of halo exchanges #1355
Open
abishekg7 wants to merge 146 commits into MPAS-Dev:develop
56ed028 to a8fda92
mgduda reviewed on Sep 24, 2025
    use mpas_derived_types, only : domain_type, mpas_halo_group, MPAS_HALO_REAL, MPAS_LOG_CRIT
    use mpas_pool_routines, only : mpas_pool_get_array
    use mpas_log, only : mpas_log_write
    use mpas_timer, only : mpas_timer_start, mpas_timer_stop
Contributor
Don't forget to update the Makefile to add a dependency on mpas_timer.o for the mpas_halo.o target.
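A minimal sketch of the dependency rule this comment asks for, assuming the mpas_halo.o target lists one object file per module named in the use statements above. The prerequisite list already in the MPAS Makefile may differ; only mpas_timer.o is the addition requested here.

```make
# Assumed prerequisite list, derived from the use statements in the diff.
# mpas_timer.o is the new dependency; the other entries are illustrative.
mpas_halo.o: mpas_derived_types.o mpas_pool_routines.o mpas_log.o mpas_timer.o
```

Without the extra prerequisite, a parallel build could try to compile mpas_halo.F before the mpas_timer module file exists.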
d0c1431 to 1e08917
…rinsky scheme on Cartesian planes. This also fixes periodicity on those planes for the coefficients.
Added unit test for deformation coefficients on Cartesian-plane meshes.
The algorithm for the 2D Smagorinsky eddy viscosity coefficients is implemented in this module. Others to follow.
(2) Changes in atm_compute_dyn_tend to call dissipation module subroutine to compute 2D Smagorinsky eddy viscosity.
so that compile time specification of Nvertlevels and maxEdges is enabled.
Made string available in atm_compute_dyn_tend and set up logic to allow for different dissipation options that now include the les models.
subroutine atm_compute_dyn_tend to subroutines in mpas_atm_dissipation.F. The results are no longer bit-for-bit with the modified code because we have rearranged the order of the processes in the vertical momentum equation to accommodate doing the horizontal and vertical dissipation for w together.
This is config_init_case = 10
…liations. Added a new 3D Smagorinsky eddy viscosity computation and vertical mixing for the dynamics variables. All code compiles but is not tested.
This commit introduces changes to ensure that building with -DCURVATURE still produces the correct results, compared to the nvhpc CPU reference. This involves removing the data movement of the reconstructed zonal and meridional velocities in the atm_compute_dyn_tend_work subroutine and instead using copyin for the same fields in mpas_atm_pre_dynamics_h2d. This commit also removes the ACC data transfer timers for the atm_compute_dyn_tend_work subroutine, as we only have create/delete statements.
Modifying the existing OpenACC data movements of lbc_* variables in the mpas_atm_pre_dynamics_h2d and mpas_atm_post_dynamics_d2h subroutines to be contingent on config_apply_lbcs being true. This avoids unexpected behavior when lbc_* variables are uninitialized.
This commit does work and matches the previous results!
NOTE: The last commit was successful!
Last commit had differences from the baseline. It's either this, or the change dropping 'update device(group % sendBuf(:))' in the last commit.
Last commit still had answer differences
This should make the dependency analysis easier on the compiler. NOTE: The last commit succeeded and had no diffs after 1 timestep compared to a reference run!
…o force GPUDirect MPI NOTE: The last commit ran successfully and matched previous 1 step results
…r variables Last run failed with CUDA_ERROR_ILLEGAL_ADDRESS, I think keeping these on the GPU would help!
Last commit gave me some big differences, let's see if this helps. If this helps, then that means I wasn't using GPU-aware MPI routines like I thought...
…calls instead Last commit still had answer differences. NOTE: This commit does too
Introducing a new namelist option under development, config_gpu_aware_mpi, which controls whether an OpenACC run of MPAS on GPUs uses GPU-aware MPI or performs a device<->host update of variables around the call to a purely CPU-based halo exchange. Note: This feature is not available when config_halo_exch_method is set to 'mpas_dmpar'.
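The two code paths this option selects between can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: every name except config_gpu_aware_mpi is hypothetical, the host exchange is a serial stand-in with no MPI calls, and the GPU-aware branch is left as a placeholder.

```fortran
! Sketch only: all names except config_gpu_aware_mpi are hypothetical.
module halo_sketch
   implicit none
   private
   public :: exchange_halo

contains

   ! Stand-in for a halo exchange that only touches host memory.
   ! It fills a one-cell periodic halo instead of calling MPI.
   subroutine host_halo_exchange(field)
      real, intent(inout) :: field(:)
      integer :: n

      n = size(field)
      field(1) = field(n - 1)   ! left halo cell from right interior edge
      field(n) = field(2)       ! right halo cell from left interior edge
   end subroutine host_halo_exchange

   subroutine exchange_halo(field, config_gpu_aware_mpi)
      real, intent(inout) :: field(:)
      logical, intent(in) :: config_gpu_aware_mpi

      if (config_gpu_aware_mpi) then
         ! GPU-aware path: the field stays on the device and the exchange
         ! would run there. Not sketched here.
      else
         ! Fallback path: bring the field to the host, exchange, push it back.
         !$acc update host(field)
         call host_halo_exchange(field)
         !$acc update device(field)
      end if
   end subroutine exchange_halo

end module halo_sketch

program demo
   use halo_sketch, only : exchange_halo
   implicit none
   real :: field(6)

   field = [0.0, 1.0, 2.0, 3.0, 4.0, 0.0]
   !$acc enter data copyin(field)
   call exchange_halo(field, .false.)
   !$acc exit data copyout(field)
   print *, field
end program demo
```

Compiled without OpenACC support, the directives are treated as comments and the fallback path reduces to the plain host exchange, which makes the control flow easy to check on a CPU.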
1e08917 to eff87cf
This PR enables execution of halo exchanges on GPUs via OpenACC directives. It uses #1315 as the base branch, so #1315 must be merged before this PR can be merged.
The packing and unpacking code around the halo exchanges uses !$acc parallel regions. The actual MPI_Isend and MPI_Irecv operations use CUDA-aware MPI, by wrapping these calls within