CUDA error: device-side assert triggered #5

Open
ijorquera opened this issue Dec 27, 2022 · 2 comments

ijorquera commented Dec 27, 2022

Hello, thanks for the project!

I was trying to run the FAIR benchmark using the validation and test set, but in both cases I get the following error:

Traceback (most recent call last):
  File "/TRUST/test.py", line 136, in <module>
    trust.test(return_params=True)
  File "/TRUST/lib/model.py", line 249, in test
    'scene': scene_images[visind],
RuntimeError: CUDA error: device-side assert triggered

There are also a few lines like this one:

/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [19,0,0], thread: [93,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
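For readers unfamiliar with this assertion, the condition it checks can be sketched in plain Python. This is only an illustration of the bounds rule quoted in the message, not PyTorch's implementation; `in_bounds`, `checked_get`, and the sample data are hypothetical names. Because CUDA kernels run asynchronously, the Python traceback may not point at the exact failing line; running with the environment variable `CUDA_LAUNCH_BLOCKING=1`, or on the CPU, usually gives a more precise location.

```python
def in_bounds(index: int, size: int) -> bool:
    """Mirror the asserted condition: index >= -size and index < size."""
    return -size <= index < size


def checked_get(items, index):
    # On the GPU a failed check trips a device-side assert;
    # in this CPU sketch we raise a regular IndexError instead.
    if not in_bounds(index, len(items)):
        raise IndexError(f"index {index} out of bounds for size {len(items)}")
    return items[index]


samples = ["a", "b", "c"]        # hypothetical data with 3 entries
first = checked_get(samples, 0)   # valid: first element
last = checked_get(samples, -1)   # valid: negative indices count from the end

try:
    checked_get(samples, 3)       # invalid: one past the end
    failed = False
except IndexError:
    failed = True
```

An index equal to the size of the dimension is already out of range, which is the typical symptom when a loop assumes more elements than a tensor actually holds.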

I also checked the test_outputs folder: for each folder in the FAIR benchmark val/test set, there was one folder containing an albedo texture JPG and an NPY file. There was also another folder (test_images_vis) with images showing the albedo JPG, the albedo applied to the 3D model, the light shader, the original input, and the 3D model with the texture and the light shader.

The output folder makes me think that everything ran correctly, but I still get the error above every time I run the benchmark.

Any help regarding the error will be highly appreciated! Thanks again.

Edit:

The project was installed on a machine with CUDA 11.3 and PyTorch 1.10.2, since the RTX 30xx series does not support CUDA 10.1.

jacksoncsy commented Feb 1, 2023

Hello, this is caused by a bug in the code: when it loops through the data, the last batch does not always contain as many samples as the batch size, so the loop below indexes past the end of that batch.

TRUST/lib/model.py, lines 244 to 256 in 5e44ab5:

    for col_num in range(self.config.batch_size):
        visind = np.arange(col_num, col_num+1)  # self.config.batch_size )
        if self.config.test_data == 'benchmark_val':
            visdict = {
                'scene': scene_images[visind],
                'inputs': images[visind],
                'predicted_images': predicted_images_alpha[visind],
                'albedo_images': predicted_albedo_images[visind],
                'albedo': albedo[visind],
                'pred_lightprobe': (predicted_shading * self.lightprobe_albedo_images)[visind],
            }
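One possible way to avoid the out-of-range index is to loop over the number of samples actually present in the current batch instead of the configured batch size. The following is a minimal sketch of that idea using plain Python lists in place of tensors; it is not the maintainers' fix, and `batches`, `visualise_batch`, and the sample count of 21 are assumptions made for illustration. With real tensors the equivalent bound would be something like `images.shape[0]`.

```python
def batches(items, batch_size):
    """Yield consecutive batches; the last one may be shorter."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def visualise_batch(scene_images, configured_batch_size):
    # Bound the loop by the samples actually in this batch,
    # not by the configured batch size.
    n = min(configured_batch_size, len(scene_images))
    return [{'scene': scene_images[col_num]} for col_num in range(n)]


data = list(range(21))  # hypothetical: 21 samples, batch size 16
outputs = [visualise_batch(batch, 16) for batch in batches(data, 16)]
```

Here the second batch holds only 5 samples, and the loop stops after 5 iterations rather than running to 16.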

@cafermutluozkan

Hello @ijorquera, I got the same error.
As a workaround, I did the following:
I kept only 16 lines in the test_crop_files.txt file under TRUST/FAIR_benchmark/test_set and then ran the test.
I kept 16 lines because batch_size = 16 in test.py.
python test.py --test_folder '/you_path/Trust/TRUST/data' --test_split test
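The same workaround can be generalised to any multiple of the batch size rather than exactly 16 lines. This is a rough sketch under that assumption; `trim_to_full_batches` and the file names are hypothetical, and note that trimming drops the trailing samples, so the benchmark is no longer evaluated on the full set.

```python
def trim_to_full_batches(lines, batch_size):
    """Keep the largest prefix whose length is a multiple of batch_size."""
    usable = (len(lines) // batch_size) * batch_size
    return lines[:usable]


# Hypothetical file list with 37 entries; with batch_size = 16 only 32 remain.
file_list = [f"image_{i:03d}.jpg" for i in range(37)]
trimmed = trim_to_full_batches(file_list, 16)
```

If every sample needs to be evaluated, fixing the loop bound in the model code is the better option, since this approach only hides the short final batch.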
