This code uses CUDA to parallelize nested loops of the type:
    for(int ix=0; ix<Nx; ix++) {
      for(int iy=0; iy<Ny[ix]; iy++) {
        NestedLoopFunction(ix, iy, ...);
        ...
      }
    }
where Ny[] is an array in CUDA global memory. In order to use it:
- Install CUB from https://nvlabs.github.io/cub/
- Add the files nested_loop.cu, nested_loop.h, Ny_th.h, and cuda_error.h to your code
- Put the body of the nested loop in a device function (NestedLoopFunction in this example) and adapt the call to this function in nested_loop.cu as needed
- At the beginning of your program, call NestedLoop::Init();
- Run the nested loop with the command NestedLoop::Run(Nx, d_Ny); where d_Ny is the array Ny[ix] stored in CUDA global memory
See the example in the file example.cu
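
To illustrate what parallelizing such a loop involves, the sketch below shows one common way to turn the irregular (ix, iy) iteration space into a flat range that can be split evenly across threads. It is a host-side C++ reference and is not taken from nested_loop.cu. The helper names BuildOffsets and FlatToPair are hypothetical, and Ny is assumed to hold non-negative int values.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Exclusive prefix sum of Ny (hypothetical helper, not part of nested_loop.h).
// offset[ix] is the flat index of the pair (ix, 0);
// offset[Nx] is the total number of (ix, iy) pairs.
std::vector<int> BuildOffsets(const std::vector<int>& Ny) {
  std::vector<int> offset(Ny.size() + 1, 0);
  std::partial_sum(Ny.begin(), Ny.end(), offset.begin() + 1);
  return offset;
}

// Map a flat index k in [0, offset.back()) back to its (ix, iy) pair
// (hypothetical helper). Rows with Ny[ix] == 0 are skipped automatically.
std::pair<int, int> FlatToPair(const std::vector<int>& offset, int k) {
  assert(k >= 0 && k < offset.back());
  // Binary search for the first offset strictly greater than k;
  // the row just before it owns flat index k.
  auto it = std::upper_bound(offset.begin(), offset.end(), k);
  int ix = static_cast<int>(it - offset.begin()) - 1;
  int iy = k - offset[ix];
  return {ix, iy};
}
```

For Ny = {3, 0, 2}, BuildOffsets returns {0, 3, 3, 5}, so there are five work items in total. In a GPU version of this idea, each thread would take one flat index k and apply the same lookup, so a row with many iterations does not leave the other threads idle.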