Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kernel digit_last_wdc has race conditions #75

Open
wants to merge 1 commit into
base: linux
Choose a base branch
from
Open

Kernel digit_last_wdc has race conditions #75

wants to merge 1 commit into from

Conversation

fiigii
Copy link

@fiigii fiigii commented Jan 28, 2020

[This is a dummy PR to report a bug]

The code section below from kernel digit_last_wdc reads and writes shared memory across different threads without any synchronization.
In the current code, there is no guarantee that the former shared writing in "CALC_LEVEL_SMALL" happens before the later shared reading. You can see this race warning using "cuda-memcheck --tool racecheck"

You may see this program works fine with specific CUDA compiler versions or GPU architectures, but there is no guarantee that works well in the future. So, I suggest adding __syncwarp(); like below. See [1][2] for more details.

if (lane % 16 == 0)
{
	u32 plvl;
	if (lane == 0) plvl = buck[__byte_perm(pair, 0, 0x4510)].hash[1];
	else plvl = buck[__byte_perm(pair, 0, 0x4532)].hash[1];
	slotsmall* bucks = eq->treessmall[1][PACKER::get_bucketid(plvl, RB, SM)];
	u32 slot1 = PACKER::get_slot1(plvl, RB, SM);
	u32 slot0 = PACKER::get_slot0(plvl, slot1, RB, SM);
	levels[lane] = bucks[slot1].hash[2];
	levels[lane + 8] = bucks[slot0].hash[2];
}

__syncwarp();  // suggested change

if (lane % 8 == 0)
	CALC_LEVEL_SMALL(0, lane, lane + 4, 3);

__syncwarp();  // suggested change

if (lane % 4 == 0)
	CALC_LEVEL_SMALL(2, lane, lane + 2, 3);
		
__syncwarp(); // suggested change

if (lane % 2 == 0)
	CALC_LEVEL(0, lane, lane + 1, 4);
		
__syncwarp(); // suggested change

u32 ind[16];

u32 f1 = levels[lane];

[1] https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#synchronization-functions
[2] https://devblogs.nvidia.com/using-cuda-warp-level-primitives/

@fiigii fiigii requested a review from tpruvot January 28, 2020 21:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant