mask volumes with 4D data: OOM error #41

Open
luiztauffer opened this issue Sep 19, 2024 · 9 comments

luiztauffer commented Sep 19, 2024

Short Java error:

java.lang.OutOfMemoryError: GC overhead limit exceeded

Short Python traceback:

Cell In[8], line 1
----> 1 voluseg.step3_mask_volumes(parameters)

File /mnt/shared_storage/Github/voluseg/voluseg/_steps/step3.py:159, in mask_volumes(parameters)
    156     volume_accum.add(volume)
    158 if p.parallel_volume:
--> 159     evenly_parallelize(p.volume_names[timepoints]).foreach(add_volume)
    160 else:
    161     for name_volume in p.volume_names[timepoints]:

some references:

luiztauffer (Collaborator, Author) commented:

Weirdly, this error stopped happening after I restarted the Spark local cluster. But it's good to have it here for reference, in case it happens again.


luiztauffer commented Sep 20, 2024

Reopening because this error is happening consistently for the 4D dataset, both on my local machine and on remote machines running with Docker.

Spark keeps running into memory issues at that point in the code; we should probably improve that operation.

log_file.log

luiztauffer reopened this Sep 20, 2024
luiztauffer changed the title from "mask volumes wwith 4D data: OOM error" to "mask volumes with 4D data: OOM error" Sep 20, 2024
luiztauffer (Collaborator, Author) commented:

Setting parallel_volume=False seems to avoid the problem... but this might be inefficient?
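
For reference, a minimal sketch of this workaround, assuming `parameters` is the same dictionary passed to `voluseg.step3_mask_volumes` and that `parallel_volume` is a plain boolean key in it (as the traceback's `p.parallel_volume` suggests):

```python
# Sketch of the sequential workaround; the parameter key name is taken from
# the traceback and may differ from the actual parameter dictionary.
import voluseg

parameters = voluseg.parameter_dictionary()  # default parameter set
# ... set input/output directories and the other required options here ...

parameters["parallel_volume"] = False  # process volumes sequentially in step 3

voluseg.step3_mask_volumes(parameters)
```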

luiztauffer (Collaborator, Author) commented:

Maybe related: [screenshot attached]

luiztauffer (Collaborator, Author) commented:

A similar error happens at step 5 (clean_cells). Similarly, the error is avoided by setting parallel_clean=False.

Should we consider changing the default values of parallel_volume and parallel_clean to False? @mikarubi

mikarubi (Owner) commented:

So, just to clarify -- this is an out-of-memory error, correct? In general, we expect people to start with a lot of RAM for these analyses, so I am inclined to keep these on (so that the jobs run faster without people needing to manually turn them on). Is it possible, at all, to catch this error and return a more meaningful error message to the user? That would probably be ideal.
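
One possible shape for catching it, assuming the failure propagates back to Python as a Py4JJavaError rather than killing the JVM (a sketch; the wrapper name and its placement inside step3.py are hypothetical):

```python
# Hypothetical wrapper around the parallel branch in step3.py; `rdd` and
# `func` stand in for evenly_parallelize(...) and add_volume from the traceback.
from py4j.protocol import Py4JJavaError


def run_parallel_or_explain(rdd, func):
    try:
        rdd.foreach(func)
    except Py4JJavaError as error:
        # Translate Spark's Java OOM into an actionable Python-side message.
        if "OutOfMemoryError" in str(error.java_exception):
            raise MemoryError(
                "Spark ran out of memory while masking volumes. "
                "Increase Spark driver/executor memory, or set "
                "parallel_volume=False to process volumes sequentially."
            ) from error
        raise
```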

luiztauffer (Collaborator, Author) commented:

The error is possibly due to the worker subprocesses exceeding the memory allocated to them. One possible solution would be to configure Spark to increase this limit.
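
For example, something along these lines before voluseg creates or reuses its Spark context (a sketch; the memory values are placeholders, whether voluseg picks up an externally created SparkSession is an assumption, and in local mode spark.driver.memory only takes effect if set before the JVM starts, e.g. via spark-submit):

```python
# Sketch: raise Spark's memory limits for the local cluster.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[*]")
    .config("spark.driver.memory", "16g")        # driver-side heap
    .config("spark.executor.memory", "8g")       # per-executor heap
    .config("spark.driver.maxResultSize", "4g")  # cap on results collected to the driver
    .getOrCreate()
)
```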

mikarubi (Owner) commented:

Ok, looking at this again.

  • The divide-by-zero warning is not a problem; it just reflects either missing fluorescence data or an ill-posed segmentation problem (which voluseg subsequently corrects for). We shouldn't worry about fixing it, and could just suppress it.
  • The out-of-memory error is most likely due to the size of the dataset. If it's possible to catch this error and issue a descriptive message to the user (either advising them to increase memory or to set the parallel parameters to False), that would probably be enough.
  • As a possible addition, we could do a back-of-the-envelope calculation to check whether the requested memory will be enough for the job, and issue a warning if we think it won't (a rough sketch follows below).
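
A rough sketch of that back-of-the-envelope check (all names, the safety factor, and the use of psutil are illustrative, not part of voluseg's current API; the volume shape and dtype would come from the actual input files):

```python
# Illustrative memory pre-check.
import warnings

import numpy as np
import psutil


def warn_if_memory_tight(volume_shape, dtype, n_parallel_tasks, safety_factor=2.0):
    """Warn if the estimated working set may exceed the currently available RAM."""
    bytes_per_volume = int(np.prod(volume_shape)) * np.dtype(dtype).itemsize
    estimated = bytes_per_volume * n_parallel_tasks * safety_factor
    available = psutil.virtual_memory().available
    if estimated > available:
        warnings.warn(
            f"Estimated memory need (~{estimated / 1e9:.1f} GB) exceeds available "
            f"RAM (~{available / 1e9:.1f} GB); consider adding memory or setting "
            "parallel_volume=False."
        )
```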
