Kernel digit_last_wdc has race conditions #75
[This is a dummy PR to report a bug]
The code section below, from kernel digit_last_wdc, reads and writes shared memory across different threads without any synchronization. In the current code, there is no guarantee that the earlier shared-memory write in "CALC_LEVEL_SMALL" happens before the later shared-memory read. You can see this race warning by running "cuda-memcheck --tool racecheck".
The program may appear to work fine with specific CUDA compiler versions or GPU architectures, but there is no guarantee that it will keep working in the future. So, I suggest adding __syncwarp(); like below. See [1][2] for more details.
[1] https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#synchronization-functions
[2] https://devblogs.nvidia.com/using-cuda-warp-level-primitives/
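To illustrate the kind of hazard described above, here is a minimal standalone sketch. It is not the actual digit_last_wdc code: the kernel name warp_sum, the buffer buf, and the launch configuration (one block of exactly 32 threads) are all assumptions made for the example. It shows a warp-level reduction in shared memory where each step reads values that other lanes wrote in the previous step, so a __syncwarp() is placed between the write and the following read.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration only (not the digit_last_wdc code).
// Assumes it is launched with exactly one warp: <<<1, 32>>>.
__global__ void warp_sum(const int *in, int *out)
{
    __shared__ int buf[32];
    const unsigned lane = threadIdx.x;

    buf[lane] = in[lane];
    __syncwarp();  // make every lane's write visible before any lane reads

    for (unsigned offset = 16; offset > 0; offset >>= 1) {
        if (lane < offset) {
            // Reads buf[lane + offset], which another lane wrote in the
            // previous step (or in the initial store above).
            buf[lane] += buf[lane + offset];
        }
        // Without this barrier, the next step may read buf[] before the
        // writes of this step have completed: the race that racecheck reports.
        __syncwarp();
    }

    if (lane == 0) {
        *out = buf[0];
    }
}

int main()
{
    int h_in[32];
    for (int i = 0; i < 32; ++i) {
        h_in[i] = i;  // 0 + 1 + ... + 31 = 496
    }

    int *d_in = nullptr;
    int *d_out = nullptr;
    if (cudaMalloc(&d_in, sizeof(h_in)) != cudaSuccess ||
        cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    warp_sum<<<1, 32>>>(d_in, d_out);

    int h_out = 0;
    cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("sum = %d\n", h_out);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

On a CUDA 9 or later toolchain, this should print a sum of 496. Removing the __syncwarp() calls and running the binary under "cuda-memcheck --tool racecheck" is one way to see the same class of warning on a small case.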