The library cudnn_cnn_infer64_8.dll is not used on Windows, but libcudnn_cnn_infer.so.8 is used on Linux. This seems to cause a visible difference in NPS (nodes per second) on the same GPU.
e.g. Ubuntu 18.04:
GPU: RTX 2070 OC
isready
info string onnx file: model/chess/model-1.23453-0.572-0537-bsize-1.onnx
info string deserialize engine: model/chess/model-bsize1-fp16-0.trt
info string inputDims: (1, 39, 8, 8)
info string valueOutputDims: (1, 1)
info string policyOutputDims: (1, 4864)
info string No auxiliary outputs detected.
info string onnx file: model/chess/model-1.23453-0.572-0537-bsize-16.onnx
info string deserialize engine: model/chess/model-bsize16-fp16-0.trt
info string onnx file: model/chess/model-1.23453-0.572-0537-bsize-16.onnx
info string deserialize engine: model/chess/model-bsize16-fp16-0.trt
info string inputDims: (16, 39, 8, 8)
info string valueOutputDims: (16, 1)
info string policyOutputDims: (16, 4864)
info string No auxiliary outputs detected.
readyok
go infinite
info string create new tree
info string run mcts search
info depth 17 seldepth 28 multipv 1 score cp 47 nodes 18522 nps 18485 tbhits 0 time 1002 pv d2d4 g8f6 c2c4 e7e6 g1f3 b7b6 g2g3 c8a6 b2b3 f8b4 c1d2 b4e7 f1g2 c7c6 d2c3 d7d5 b1d2
info depth 19 seldepth 31 multipv 1 score cp 47 nodes 38347 nps 19154 tbhits 0 time 2002 pv d2d4 g8f6 c2c4 e7e6 g1f3 b7b6 g2g3 c8a6 b2b3 f8b4 c1d2 b4e7 f1g2 c7c6 d2c3 d7d5 b1d2 b8d7 e1g1
info depth 19 seldepth 37 multipv 1 score cp 47 nodes 57007 nps 18990 tbhits 0 time 3002 pv d2d4 g8f6 c2c4 e7e6 g1f3 b7b6 g2g3 c8a6 b2b3 f8b4 c1d2 b4e7 f1g2 c7c6 d2c3 d7d5 b1d2 b8d7 e1g1
GPU utilization: 91%
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.27.04    Driver Version: 460.27.04    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 207...  Off  | 00000000:01:00.0 Off |                  N/A |
|  0%   48C    P2   152W / 215W |    677MiB /  7982MiB |     91%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:0B:00.0  On |                  N/A |
|  0%   54C    P2    56W / 250W |    441MiB / 11177MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
e.g. Windows 10:
GPU: RTX 2070 OC
isready
info string onnx file: model/chess/model-1.23453-0.572-0537-bsize-1.onnx
info string deserialize engine: model/chess/model-bsize1-fp16-0.trt
info string inputDims: (1, 39, 8, 8)
info string valueOutputDims: (1, 1)
info string policyOutputDims: (1, 4864)
info string No auxiliary outputs detected.
info string onnx file: model/chess/model-1.23453-0.572-0537-bsize-16.onnx
info string deserialize engine: model/chess/model-bsize16-fp16-0.trt
info string onnx file: model/chess/model-1.23453-0.572-0537-bsize-16.onnx
info string deserialize engine: model/chess/model-bsize16-fp16-0.trt
info string inputDims: (16, 39, 8, 8)
info string valueOutputDims: (16, 1)
info string policyOutputDims: (16, 4864)
info string No auxiliary outputs detected.
readyok
go infinite
info string create new tree
info string run mcts search
info depth 17 seldepth 28 multipv 1 score cp 47 nodes 16500 nps 16369 tbhits 0 time 1008 pv d2d4 g8f6 c2c4 e7e6 g1f3 b7b6 g2g3 c8a6 b2b3 f8b4 c1d2 b4e7 f1g2 c7c6 d2c3 d7d5 b1d2
info depth 19 seldepth 31 multipv 1 score cp 47 nodes 33367 nps 16584 tbhits 0 time 2012 pv d2d4 g8f6 c2c4 e7e6 g1f3 b7b6 g2g3 c8a6 b2b3 f8b4 c1d2 b4e7 f1g2 c7c6 d2c3 d7d5 b1d2 b8d7 e1g1
info depth 19 seldepth 33 multipv 1 score cp 47 nodes 50400 nps 16617 tbhits 0 time 3033 pv d2d4 g8f6 c2c4 e7e6 g1f3 b7b6 g2g3 c8a6 b2b3 f8b4 c1d2 b4e7 f1g2 c7c6 d2c3 d7d5 b1d2 b8d7 e1g1
GPU utilization: 85%
C:\Windows\System32\DriverStore\FileRepository\nv_dispui.inf_amd64_c1f8f32cc9af9677>nvidia-smi
Fri Apr 9 17:37:37 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 461.33       Driver Version: 461.33       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 207... WDDM  | 00000000:01:00.0 Off |                  N/A |
| 29%   59C    P2   141W / 215W |    845MiB /  8192MiB |     85%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108... WDDM  | 00000000:0B:00.0  On |                  N/A |
|  0%   38C    P8    17W / 250W |    692MiB / 11264MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
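To put a number on the gap, the `nps` fields of the two UCI logs can be averaged and compared. The following is only a rough sketch: the helper name `nps_values` is made up for illustration, the log lines are abbreviated copies of the output above (the `pv` part is dropped), and three samples per platform are too few for a rigorous benchmark.

```python
import re
from statistics import mean


def nps_values(uci_log: str) -> list:
    """Return every integer that follows an 'nps' token in UCI info output."""
    return [int(v) for v in re.findall(r"\bnps (\d+)", uci_log)]


# Abbreviated from the Ubuntu 18.04 log (pv omitted).
linux_log = """
info depth 17 seldepth 28 multipv 1 score cp 47 nodes 18522 nps 18485 tbhits 0 time 1002
info depth 19 seldepth 31 multipv 1 score cp 47 nodes 38347 nps 19154 tbhits 0 time 2002
info depth 19 seldepth 37 multipv 1 score cp 47 nodes 57007 nps 18990 tbhits 0 time 3002
"""

# Abbreviated from the Windows 10 log (pv omitted).
windows_log = """
info depth 17 seldepth 28 multipv 1 score cp 47 nodes 16500 nps 16369 tbhits 0 time 1008
info depth 19 seldepth 31 multipv 1 score cp 47 nodes 33367 nps 16584 tbhits 0 time 2012
info depth 19 seldepth 33 multipv 1 score cp 47 nodes 50400 nps 16617 tbhits 0 time 3033
"""

linux_nps = mean(nps_values(linux_log))
windows_nps = mean(nps_values(windows_log))

# Relative shortfall of Windows compared with Linux.
slowdown = 1 - windows_nps / linux_nps
print(f"Linux mean NPS:   {linux_nps:.0f}")
print(f"Windows mean NPS: {windows_nps:.0f}")
print(f"Windows is {slowdown:.1%} slower")  # prints "Windows is 12.5% slower"
```

With these samples, Windows averages roughly 16.5k NPS against roughly 18.9k NPS on Linux.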
I was able to link the binary to cudnn_cnn_infer64_8.dll, but unfortunately this did not seem to help.
Adding optimization options such as /O2 (Maximize Speed), /GL (Whole program optimization), and /LTCG (Link-time code generation) did not result in an NPS improvement either.
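One way to double-check which cuDNN libraries a running engine process has actually mapped on Linux is to read `/proc/<pid>/maps`. The sketch below assumes a Linux system; the function names are hypothetical and the sample text is invented purely to show the file format, it is not taken from the engine. On Windows a separate tool that lists a process's loaded modules would be needed.

```python
import posixpath


def cudnn_libs_in_maps(maps_text: str) -> set:
    """Collect basenames of mapped files whose name starts with 'libcudnn'.

    Each /proc/<pid>/maps line has the form:
    address perms offset dev inode [pathname]
    """
    libs = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)  # pathname, if present, is the 6th field
        if len(fields) == 6:
            name = posixpath.basename(fields[5].strip())
            if name.startswith("libcudnn"):
                libs.add(name)
    return libs


def cudnn_libs_of_process(pid: int) -> set:
    """Read /proc/<pid>/maps (Linux only) and return the cuDNN libraries in it."""
    with open(f"/proc/{pid}/maps") as f:
        return cudnn_libs_in_maps(f.read())


# Invented sample in /proc/<pid>/maps format, for illustration only.
sample_maps = (
    "7f0000000000-7f0000100000 r-xp 00000000 08:01 123 "
    "/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0\n"
    "7f0000200000-7f0000300000 r--p 00000000 08:01 124 "
    "/usr/lib/x86_64-linux-gnu/libc-2.27.so\n"
    "7f0000400000-7f0000500000 rw-p 00000000 00:00 0 \n"
)
print(sorted(cudnn_libs_in_maps(sample_maps)))
```

Calling `cudnn_libs_of_process(pid)` with the engine's process ID while a search is running would show whether `libcudnn_cnn_infer` is among the mapped libraries. Note that the mapped file name usually carries the full version suffix rather than just `.so.8`.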