[Draft] OpenACC port of halo exchanges #1355

Open
abishekg7 wants to merge 146 commits into MPAS-Dev:develop from abishekg7:framework/acc_halo_exch

abishekg7 (Collaborator) commented Aug 7, 2025

This PR enables execution of halo exchanges on GPUs via OpenACC directives. It uses #1315 as its base branch, so #1315 must be merged before this PR can be merged.

The packing and unpacking code around the halo exchanges uses !$acc parallel regions.

The MPI_Isend and MPI_Irecv calls themselves use CUDA-aware MPI; they are wrapped in a host_data region so that MPI is passed the device addresses of the buffers:

!$acc host_data use_device(pointer_to_buffer)
! MPI_Isend / MPI_Irecv calls on pointer_to_buffer go here
!$acc end host_data
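
For reviewers less familiar with the pattern, here is a minimal standalone sketch of packing on the device and then passing device addresses to MPI through host_data. It is not the code in this PR: the program name, the buffer length n, and the ring-neighbor setup are hypothetical, and it assumes an MPI library built with CUDA-aware (GPU-aware) support plus an OpenACC compiler. Without OpenACC enabled, the directives are treated as comments and the same code runs as a plain host exchange.

```fortran
program host_data_halo_sketch
   use mpi
   implicit none

   integer, parameter :: n = 1024      ! hypothetical buffer length
   real, allocatable :: sendBuf(:), recvBuf(:)
   integer :: requests(2)
   integer :: ierr, rank, nprocs, left, right, i

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
   call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

   ! Hypothetical ring topology: send to the right, receive from the left
   right = mod(rank + 1, nprocs)
   left  = mod(rank - 1 + nprocs, nprocs)

   allocate(sendBuf(n), recvBuf(n))
   recvBuf(:) = -1.0

   ! Keep both buffers resident on the device for the whole exchange;
   ! recvBuf is copied back to the host when the data region ends
   !$acc data create(sendBuf) copy(recvBuf)

   ! "Pack" the send buffer on the device
   !$acc parallel loop default(present)
   do i = 1, n
      sendBuf(i) = real(rank)
   end do

   ! Inside host_data, sendBuf and recvBuf refer to their device addresses,
   ! so a GPU-aware MPI can transfer directly from and to device memory
   !$acc host_data use_device(sendBuf, recvBuf)
   call MPI_Irecv(recvBuf, n, MPI_REAL, left, 0, MPI_COMM_WORLD, requests(1), ierr)
   call MPI_Isend(sendBuf, n, MPI_REAL, right, 0, MPI_COMM_WORLD, requests(2), ierr)
   !$acc end host_data

   ! Buffers must not be reused or unpacked until both requests complete
   call MPI_Waitall(2, requests, MPI_STATUSES_IGNORE, ierr)

   !$acc end data

   if (rank == 0) print *, 'rank 0 received value ', recvBuf(1), ' from rank ', left

   deallocate(sendBuf, recvBuf)
   call MPI_Finalize(ierr)
end program host_data_halo_sketch
```

The MPI_Waitall sits outside the host_data region on purpose: only the calls that take the buffer arguments need the device addresses.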

abishekg7 force-pushed the framework/acc_halo_exch branch 2 times, most recently from 56ed028 to a8fda92 (August 15, 2025 01:07)
use mpas_derived_types, only : domain_type, mpas_halo_group, MPAS_HALO_REAL, MPAS_LOG_CRIT
use mpas_pool_routines, only : mpas_pool_get_array
use mpas_log, only : mpas_log_write
use mpas_timer, only : mpas_timer_start, mpas_timer_stop
Contributor commented:

Don't forget to update the Makefile to add a dependency on mpas_timer.o for the mpas_halo.o target.

abishekg7 force-pushed the framework/acc_halo_exch branch from d0c1431 to 1e08917 (October 13, 2025 22:50)
Jimy Dudhia and others added 25 commits January 20, 2026 17:13
…rinsky scheme on Cartesian planes.

This also fixes periodicity on those planes for the coefficients.
Added unit test for deformation coefficients on Cartesian-plane meshes.
The algorithm for the 2D Smagorinsky eddy viscosity coefficients is
implemented in this module.  Others to follow.
(2) Changes in atm_compute_dyn_tend to call dissipation module subroutine
to compute 2D Smagorinsky eddy viscosity.
so that compile time specification of Nvertlevels and maxEdges is enabled.
Made string available in atm_compute_dyn_tend and set up logic to
allow for different dissipation options that now include the les models.
subroutine atm_compute_dyn_tend to subroutines in mpas_atm_dissipation.F.
The results are no longer bit-for-bit with the modified code because we have
re-arranged the order of the processes in the vertical momentum equation to
accommodate doing the horizontal and vertical dissipation for w together.
…liations.

Added a new 3D Smagorinsky eddy viscosity computation and vertical mixing for the
dynamics variables.  All code compiles but has not been tested.
abishekg7 and others added 28 commits February 20, 2026 12:02
This commit introduces changes to ensure that building with -DCURVATURE still
produces the correct results, compared to the nvhpc cpu reference. This
involves removing the data movement of the reconstructed zonal and meridional
velocities in the atm_compute_dyn_tend_work subroutine and instead using
copyin for the same fields in mpas_atm_pre_dynamics_h2d.

This commit also removes the ACC data transfer timers for the
atm_compute_dyn_tend_work subroutine, as we only have create/delete
statements there.
Modifying the existing OpenACC data movements of lbc_* variables in the
mpas_atm_pre_dynamics_h2d and mpas_atm_post_dynamics_d2h subroutines to be
contingent on config_apply_lbcs being true. This avoids unexpected behavior
when lbc_* variables are uninitialized.
This commit does work and matches the previous results!
NOTE: The last commit was successful!
Last commit had differences from the baseline. It's either this, or the
change dropping 'update device(group % sendBuf(:))' in the last commit
Last commit still had answer differences
This should make the dependency analysis easier on the compiler.

NOTE: The last commit succeeded and had no diffs after 1 timestep
compared to a reference run!
…o force GPUDirect MPI

NOTE: The last commit ran successfully and matched previous 1 step
results
…r variables

Last run failed with CUDA_ERROR_ILLEGAL_ADDRESS, I think keeping these
on the GPU would help!
Last commit gave me some big differences, let's see if this helps.

If this helps, then that means I wasn't using GPU-aware MPI routines
like I thought...
…calls instead

Last commit still had answer differences.

NOTE: This commit does too
Introduces a new namelist option, config_gpu_aware_mpi (still under development),
which controls whether an OpenACC run of MPAS on GPUs uses GPU-aware MPI or
instead performs device<->host updates of variables around a call to a purely
CPU-based halo exchange.

Note: This option is not available when config_halo_exch_method is set
to 'mpas_dmpar'.
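
The two paths that commit message describes can be illustrated roughly as follows. This is a sketch under stated assumptions, not the PR's implementation: the module, the subroutine, and the gpu_aware_mpi dummy argument are hypothetical stand-ins for whatever reads config_gpu_aware_mpi, the buffers are assumed to be present on the device already (created by the caller), and the actual code may update model fields rather than exchange buffers on the non-GPU-aware path.

```fortran
module halo_switch_sketch
   use mpi
   implicit none

contains

   ! Hypothetical helper: exchange one buffer pair with a pair of neighbors.
   ! Assumes sendBuf and recvBuf are already present on the device.
   subroutine exchange_with_neighbors(sendBuf, recvBuf, n, dest, src, gpu_aware_mpi)
      integer, intent(in)    :: n, dest, src
      real,    intent(inout) :: sendBuf(n), recvBuf(n)
      logical, intent(in)    :: gpu_aware_mpi   ! stand-in for config_gpu_aware_mpi
      integer :: requests(2), ierr

      if (gpu_aware_mpi) then
         ! GPU-aware path: pass device addresses straight to MPI
         !$acc host_data use_device(sendBuf, recvBuf)
         call MPI_Irecv(recvBuf, n, MPI_REAL, src, 0, MPI_COMM_WORLD, requests(1), ierr)
         call MPI_Isend(sendBuf, n, MPI_REAL, dest, 0, MPI_COMM_WORLD, requests(2), ierr)
         !$acc end host_data
         call MPI_Waitall(2, requests, MPI_STATUSES_IGNORE, ierr)
      else
         ! Fallback path: stage through host memory around a CPU-only exchange
         !$acc update host(sendBuf)
         call MPI_Irecv(recvBuf, n, MPI_REAL, src, 0, MPI_COMM_WORLD, requests(1), ierr)
         call MPI_Isend(sendBuf, n, MPI_REAL, dest, 0, MPI_COMM_WORLD, requests(2), ierr)
         call MPI_Waitall(2, requests, MPI_STATUSES_IGNORE, ierr)
         !$acc update device(recvBuf)
      end if
   end subroutine exchange_with_neighbors

end module halo_switch_sketch
```

The fallback costs two extra host/device transfers per exchange, but it works with an MPI library that cannot accept device pointers.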
abishekg7 force-pushed the framework/acc_halo_exch branch from 1e08917 to eff87cf (February 27, 2026 06:04)
abishekg7 marked this pull request as ready for review (February 27, 2026 06:06)
4 participants