mask volumes with 4D data: OOM error #41

Open
luiztauffer opened this issue Sep 19, 2024 · 9 comments

luiztauffer commented Sep 19, 2024

Short Java error:

java.lang.OutOfMemoryError: GC overhead limit exceeded

Short Python traceback:

Cell In[8], line 1
----> 1 voluseg.step3_mask_volumes(parameters)

File /mnt/shared_storage/Github/voluseg/voluseg/_steps/step3.py:159, in mask_volumes(parameters)
    156     volume_accum.add(volume)
    158 if p.parallel_volume:
--> 159     evenly_parallelize(p.volume_names[timepoints]).foreach(add_volume)
    160 else:
    161     for name_volume in p.volume_names[timepoints]:

some references:

luiztauffer (Collaborator, Author) commented:

Weirdly, this error stopped happening after I restarted the Spark local cluster. But it's good to have it here for reference, in case it happens again.


luiztauffer commented Sep 20, 2024

Reopening because this error is happening consistently for the 4D dataset, both on my local machine and on remote machines running with Docker.

Spark keeps running into memory issues at that point in the code; we should probably improve that operation.

log_file.log

luiztauffer reopened this Sep 20, 2024
luiztauffer changed the title from "mask volumes wwith 4D data: OOM error" to "mask volumes with 4D data: OOM error" Sep 20, 2024
luiztauffer (Collaborator, Author) commented:

Setting parallel_volume=False seems to avoid the problem... but this might be inefficient?
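
For reference, a minimal sketch of this workaround, assuming `parameters` is the same dictionary passed to `voluseg.step3_mask_volumes` and that `parallel_volume` is a plain boolean key in it (as the traceback's `p.parallel_volume` suggests):

```python
# Sketch of the sequential workaround; the parameter key name is taken from
# the traceback and may differ from the actual parameter dictionary.
import voluseg

parameters = voluseg.parameter_dictionary()  # default parameter set
# ... set input/output directories and the other required options here ...

parameters["parallel_volume"] = False  # process volumes sequentially in step 3

voluseg.step3_mask_volumes(parameters)
```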

luiztauffer (Collaborator, Author) commented:

Maybe related: [screenshot attached]

luiztauffer (Collaborator, Author) commented:

A similar error happens at step 5 (clean_cells). Similarly, the error is avoided by setting parallel_clean=False.

Should we consider changing the default values of parallel_volume and parallel_clean to False? @mikarubi

mikarubi (Owner) commented:

So, just to clarify -- this is an out-of-memory error, correct? In general, we expect people to start with a lot of RAM for these analyses, so I am inclined to keep these on (so that the jobs run faster without people needing to manually turn them on). Is it possible, at all, to catch this error and return a more meaningful error message to the user? That would probably be ideal.
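
One possible shape for catching it, assuming the failure propagates back to Python as a Py4JJavaError rather than killing the JVM (a sketch; the wrapper name and its placement inside step3.py are hypothetical):

```python
# Hypothetical wrapper around the parallel branch in step3.py; `rdd` and
# `func` stand in for evenly_parallelize(...) and add_volume from the traceback.
from py4j.protocol import Py4JJavaError


def run_parallel_or_explain(rdd, func):
    try:
        rdd.foreach(func)
    except Py4JJavaError as error:
        # Translate Spark's Java OOM into an actionable Python-side message.
        if "OutOfMemoryError" in str(error.java_exception):
            raise MemoryError(
                "Spark ran out of memory while masking volumes. "
                "Increase Spark driver/executor memory, or set "
                "parallel_volume=False to process volumes sequentially."
            ) from error
        raise
```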

luiztauffer (Collaborator, Author) commented:

The error is possibly due to the worker subprocesses exceeding the memory allocated to them. One possible solution would be to configure Spark to increase this limit.
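
For example, something along these lines before voluseg creates or reuses its Spark context (a sketch; the memory values are placeholders, whether voluseg picks up an externally created SparkSession is an assumption, and in local mode spark.driver.memory only takes effect if set before the JVM starts, e.g. via spark-submit):

```python
# Sketch: raise Spark's memory limits for the local cluster.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[*]")
    .config("spark.driver.memory", "16g")        # driver-side heap
    .config("spark.executor.memory", "8g")       # per-executor heap
    .config("spark.driver.maxResultSize", "4g")  # cap on results collected to the driver
    .getOrCreate()
)
```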

mikarubi (Owner) commented:

Ok, looking at this again.

  • The divide-by-zero warning is not a problem; it just reflects either missing fluorescence data or an ill-posed segmentation problem (which voluseg subsequently corrects for). We shouldn't worry about fixing it, and could just suppress it.
  • The out-of-memory error is most likely due to the size of the dataset. If it's possible to catch this error and issue a descriptive message to the user (either advising them to increase memory or to set the parallel parameters to False), that would probably be enough.
  • As a possible addition, we could do a back-of-the-envelope calculation to check whether the requested memory will be enough for the job, and issue a warning if we think it won't (a rough sketch follows below).
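
A rough sketch of that back-of-the-envelope check (all names, the safety factor, and the use of psutil are illustrative, not part of voluseg's current API; the volume shape and dtype would come from the actual input files):

```python
# Illustrative memory pre-check.
import warnings

import numpy as np
import psutil


def warn_if_memory_tight(volume_shape, dtype, n_parallel_tasks, safety_factor=2.0):
    """Warn if the estimated working set may exceed the currently available RAM."""
    bytes_per_volume = int(np.prod(volume_shape)) * np.dtype(dtype).itemsize
    estimated = bytes_per_volume * n_parallel_tasks * safety_factor
    available = psutil.virtual_memory().available
    if estimated > available:
        warnings.warn(
            f"Estimated memory need (~{estimated / 1e9:.1f} GB) exceeds available "
            f"RAM (~{available / 1e9:.1f} GB); consider adding memory or setting "
            "parallel_volume=False."
        )
```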
