I wrote some code to get the status of a GPU, looking up the device handle by UUID.
But sometimes the lookup fails with ERROR_NOT_FOUND (6); it does not happen every time.
Here is my code:
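import (
	"log"
	"time"

	"github.com/NVIDIA/go-nvml/pkg/nvml"
)

func getGPUStatus(gpu_uuid string) {
	// Initialize NVML once; stop here if that fails, since every later call would fail too.
	ret := nvml.Init()
	if ret != nvml.SUCCESS {
		log.Printf("Failed to initialize NVML: %s\n", nvml.ErrorString(ret))
		return
	}
	defer func() {
		ret := nvml.Shutdown()
		if ret != nvml.SUCCESS {
			log.Printf("Failed to shut down NVML: %s\n", nvml.ErrorString(ret))
		}
	}()

	// Workaround attempt: on failure, restart NVML and retry the lookup (this did not help).
	device, ret := nvml.DeviceGetHandleByUUID(gpu_uuid)
	for ret != nvml.SUCCESS {
		log.Printf("Failed to get device handle: %s\n", nvml.ErrorString(ret))
		nvml.Shutdown()
		time.Sleep(time.Second * 5)
		nvml.Init()
		time.Sleep(time.Second * 1)
		device, ret = nvml.DeviceGetHandleByUUID(gpu_uuid)
	}

	memory, ret := device.GetMemoryInfo()
	if ret != nvml.SUCCESS {
		log.Printf("Failed to get device memory info: %s\n", nvml.ErrorString(ret))
		return
	}
	log.Printf("Memory: total=%d used=%d free=%d\n", memory.Total, memory.Used, memory.Free)
}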
I tried to work around it by restarting NVML (Shutdown, then Init) in the code, but I still get the same error.
However, if I keep the debugger running after the function returns and send it a UUID over gRPC again, DeviceGetHandleByUUID works normally, which is weird.
Can anyone help me fix this bug?
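Because the same lookup succeeds when the UUID is sent again over gRPC, one hypothesis worth ruling out (an assumption, not a confirmed cause) is that the string reaching DeviceGetHandleByUUID on the failing calls is not byte-for-byte identical, e.g. it carries a trailing newline or NUL. Logging it with `%q` makes such characters visible. The sketch below trims those characters and compares against a list of known UUIDs; in real code that list could come from enumerating devices with `nvml.DeviceGetCount`, `nvml.DeviceGetHandleByIndex`, and `Device.GetUUID`. The helper names `normalizeUUID` and `findUUIDIndex` are hypothetical.

```go
package main

import "strings"

// normalizeUUID strips surrounding whitespace and NUL bytes that can sneak
// into a UUID read from a file, a C buffer, or a gRPC message. It does not
// change case, so the result can still be passed to NVML as-is.
func normalizeUUID(s string) string {
	return strings.Trim(s, " \t\r\n\x00")
}

// findUUIDIndex returns the index of want within have (e.g. the UUIDs of all
// devices, in index order), comparing trimmed strings case-insensitively.
// It returns -1 when no entry matches.
func findUUIDIndex(want string, have []string) int {
	w := normalizeUUID(want)
	for i, h := range have {
		if strings.EqualFold(normalizeUUID(h), w) {
			return i
		}
	}
	return -1
}
```

If `findUUIDIndex` matches only after trimming, the input was the problem; if it never matches, the device list NVML sees at that moment differs from what the caller expects.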
I tried nvml.DeviceGetHandleBySerial instead of nvml.DeviceGetHandleByUUID and it works smoothly; I have no idea why the UUID lookup fails sometimes.