Gas Cooling on GPU #185

Open
wants to merge 65 commits into
base: master

spencerw
Member

@spencerw spencerw commented Oct 7, 2024

Four of the cooling modules (boley, cosmo, metal and h2) use a stiff ODE solver (StiffStep), which is a significant bottleneck during updateuDot. This PR introduces CudaStiffStep, a GPU version of the solver. With the CUDA flag enabled, the ODE integration now runs in parallel for all particles on a given TreePiece. The parameter 'nGpuGasMinParts' can be used to direct TreePieces with small particle counts to do the integration on the CPU instead.
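As a rough illustration of the 'nGpuGasMinParts' threshold described above, the per-TreePiece decision could look like the sketch below. Only the parameter name comes from this PR. The enum, the function name, the `cudaEnabled` argument and the choice of `>=` for the comparison are assumptions made for the example, not the actual implementation.

```cpp
#include <cstddef>

// Hypothetical sketch: every name here except nGpuGasMinParts is illustrative.
enum class StiffBackend { Cpu, Gpu };

// Decide where one TreePiece integrates its gas particles.
// cudaEnabled stands in for the build-time CUDA flag.
inline StiffBackend chooseStiffBackend(std::size_t nGasParts,
                                       std::size_t nGpuGasMinParts,
                                       bool cudaEnabled) {
    if (!cudaEnabled) {
        return StiffBackend::Cpu;  // no GPU path compiled in
    }
    // Assumption: a piece with at least nGpuGasMinParts gas particles uses the GPU;
    // smaller pieces stay on the CPU.
    return (nGasParts >= nGpuGasMinParts) ? StiffBackend::Gpu : StiffBackend::Cpu;
}
```

With a threshold of 100, a TreePiece holding 10 gas particles would integrate on the CPU and one holding 1000 would use the GPU path.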

This required making a few significant structural changes to the code:

  1. To minimize code duplication, __device__ __host__ specifiers have been added to many of the cooling subroutines. An empty .cu file has been added to each of the cooling modules, which allows the existing C code to be used on the GPU. When the CUDA flag is enabled, these new .cu files are compiled separately for the host and device using the '-dc' flag and then linked together in a separate step at the end of the cuda.mk file. Note also that the clDerivs function for cosmo cooling uses RootFind, which required making some of the routines from stiff.c accessible from the device as well.

  2. The parallel nature of CudaStiffStep requires a separate clDerivsData and Stiff struct (along with space for the associated deep pointers) for each gas particle, on both the host and the device. Originally, this data was allocated by the TreePieces from within the cooling subroutines. Allocation is now handled by the DataManager (allocCoolParticleBlock), which assigns blocks of pre-allocated host and device memory to the TreePieces (setCoolPtrs). If the gas particle count increases significantly, a larger block of memory is re-allocated.
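The block allocation in item 2 can be pictured as one pool that is grown only when needed and then carved into per-TreePiece slices. The sketch below is host-only and hypothetical throughout: `ToyWorkspace`, `ToyCoolPool`, `sliceOffsets` and the 20% headroom are assumptions for illustration, and `std::vector` stands in for whatever host and device allocations allocCoolParticleBlock actually performs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-particle solver workspace.
struct ToyWorkspace {
    double y[4];
    double scratch[4];
};

class ToyCoolPool {
public:
    // Make sure the block can hold nParticles workspaces.
    // Returns true if the block was re-allocated, in which case every
    // previously handed-out slice pointer is stale and must be re-assigned.
    bool reserve(std::size_t nParticles) {
        if (nParticles <= block_.size()) return false;
        // Assumption: grow with 20% headroom so small increases do not re-allocate.
        block_.resize(nParticles + nParticles / 5);
        return true;
    }

    // Pointer to the slice that starts at the given particle offset.
    ToyWorkspace *slice(std::size_t offset) { return block_.data() + offset; }

    std::size_t capacity() const { return block_.size(); }

private:
    std::vector<ToyWorkspace> block_;
};

// Starting offset of each piece's slice: an exclusive prefix sum of the counts.
inline std::vector<std::size_t> sliceOffsets(const std::vector<std::size_t> &counts) {
    std::vector<std::size_t> offsets(counts.size());
    std::size_t running = 0;
    for (std::size_t i = 0; i < counts.size(); ++i) {
        offsets[i] = running;
        running += counts[i];
    }
    return offsets;
}
```

In this sketch a caller would compute the offsets from the per-piece gas particle counts, call `reserve` with the total, and hand each piece `slice(offsets[i])`, repeating the assignment whenever `reserve` reports a re-allocation.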

A test suite 'test_cooling' for the different cooling modules is also included.

spencerw added 30 commits May 14, 2024 21:10
Free all device data at end of simulation
Member Author

spencerw commented Oct 7, 2024

Note that 'grackle' and 'planet' cooling still need to be tested and updated. Although they don't make use of StiffStep, changes to updateuDot and the way memory is managed have created some incompatibilities.

A few tweaks also appear to be needed to get this to work without the CUDA or cooling flags.
