bugs for DeviceGetHandleByUUID #41

Open
jiaozhentian opened this issue Apr 6, 2022 · 3 comments


@jiaozhentian

jiaozhentian commented Apr 6, 2022

I wrote some code to query GPU status, which gets a device handle by UUID.
Sometimes the call returns ERROR_NOT_FOUND (6), but not every time.
Here is my code:

	ret := nvml.Init()
	if ret != nvml.SUCCESS {
		log.Printf("Failed to initialize NVML: %s\n", nvml.ErrorString(ret))
	}
	defer func() {
		ret := nvml.Shutdown()
		if ret != nvml.SUCCESS {
			log.Printf("Failed to shut down NVML: %s\n", nvml.ErrorString(ret))
		}
	}()
	device, ret := nvml.DeviceGetHandleByUUID(gpu_uuid)
	for ret != nvml.SUCCESS {
		log.Printf("Failed to get device handle: %s\n", nvml.ErrorString(ret))
		ret = nvml.Shutdown()
		time.Sleep(time.Second * 5)
		ret = nvml.Init()
		time.Sleep(time.Second * 1)
		device, ret = nvml.DeviceGetHandleByUUID(gpu_uuid)
	}
	memory, ret := device.GetMemoryInfo()
	if ret != nvml.SUCCESS {
		log.Printf("Failed to get device memory info: %s\n", nvml.ErrorString(ret))
	}

I tried to work around it by restarting the NVML connection in code (Shutdown followed by Init), but the lookup still fails.
However, once the function has returned and I keep the debug session running, passing in a UUID over gRPC makes DeviceGetHandleByUUID work normally, which is weird.
Can anyone help me fix this bug?

@jiaozhentian
Author

I tried using nvml.DeviceGetHandleBySerial instead of nvml.DeviceGetHandleByUUID and it works reliably. I have no idea why the UUID lookup sometimes fails.
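
Another way to avoid the UUID lookup entirely might be to enumerate the devices and compare UUIDs by hand. This is a rough sketch, not a confirmed fix: `findIndexByUUID` is a hypothetical helper, the UUID strings are made up, and the two callbacks are injected so it runs without a GPU. In real code they would wrap nvml.DeviceGetCount, and nvml.DeviceGetHandleByIndex followed by Device.GetUUID. Trimming whitespace from the requested UUID is an assumption about one way a value arriving over gRPC could differ from what the driver reports.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// findIndexByUUID walks device indices 0..n-1 and returns the first index
// whose UUID equals want (after trimming surrounding whitespace from want).
// It returns -1 and an error if no device matches.
func findIndexByUUID(count func() (int, error), uuidAt func(i int) (string, error), want string) (int, error) {
	n, err := count()
	if err != nil {
		return -1, fmt.Errorf("counting devices: %w", err)
	}
	want = strings.TrimSpace(want)
	for i := 0; i < n; i++ {
		uuid, err := uuidAt(i)
		if err != nil {
			continue // skip devices that cannot be queried
		}
		if uuid == want {
			return i, nil
		}
	}
	return -1, errors.New("no device with UUID " + want)
}

func main() {
	// Made-up UUIDs standing in for what the driver would report.
	uuids := []string{"GPU-aaaa", "GPU-bbbb"}
	count := func() (int, error) { return len(uuids), nil }
	uuidAt := func(i int) (string, error) { return uuids[i], nil }

	idx, err := findIndexByUUID(count, uuidAt, " GPU-bbbb\n")
	fmt.Println(idx, err)
}
```

The returned index could then be passed to nvml.DeviceGetHandleByIndex to obtain the handle.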

@elezar
Member

elezar commented Aug 29, 2022

@jiaozhentian it may be that @klueska addressed this in #48. Would you be able to try with the latest version?

@alexbagirov
Contributor

Hi. This problem still persists. Do you know any workarounds?
