
GTX Titan runtime error #4

Open
acharara opened this issue Feb 20, 2017 · 2 comments

@acharara (Collaborator)

A runtime error occurs when running TRMM or TRSM on a GTX Titan device.
The error reports either a cuBLAS error or an invalid memory access.
No definite trigger sequence has been identified; the error mostly happens with single precision and complex precisions.

@acharara (Collaborator, Author)

Thorough error checking was added in #5 to pin down this problem.
The error happens when synchronizing with a stream, and only on some Titan devices.
Not resolved yet; needs further investigation.

@egonzalf (Contributor)

Just a reminder of the error:

$ nvidia-smi
Tue Feb 28 15:31:20 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48                 Driver Version: 367.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TITAN   On   | 0000:02:00.0     Off |                  N/A |
| 30%   34C    P8    23W / 250W |      0MiB /  6080MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TITAN   On   | 0000:03:00.0     Off |                  N/A |
| 30%   29C    P8    12W / 250W |      0MiB /  6082MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K20c          On   | 0000:83:00.0     Off |                  Off |
| 30%   27C    P8    17W / 225W |      0MiB /  5060MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K20c          On   | 0000:84:00.0     Off |                    0 |
| 30%   26C    P8    15W / 225W |      0MiB /  4742MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
$ CUDA_VISIBLE_DEVICES=1,0 ./kblas-test-l3-parallel.py 
cd .; mkdir -p kblas-test-log
running: ./testing/bin/test_strmm ... 
 running: ./testing/bin/test_dtrmm ... 
./testing/bin/test_strmm done
running: ./testing/bin/test_ctrmm ... 
CUDA runtime error: an illegal memory access was encountered (77) in test_trmm at test_trmm.ch:167
./testing/bin/test_dtrmm done
running: ./testing/bin/test_ztrmm ... 
CUDA runtime error: an illegal memory access was encountered (77) in get_elapsed_time at testing_Xtr_common.h:334
CUDA runtime error: an illegal memory access was encountered (77) in get_elapsed_time at testing_Xtr_common.h:334
./testing/bin/test_ztrmm done
running: ./testing/bin/test_dtrmm_cpu ... 

@acharara acharara added the bug label Apr 25, 2017