[FeatureRequest] Dataset Metrics #85
To move this forward, we have to think of a way to lazily calculate dataset metrics. The only approach I can currently think of is to save intermediate values to disk and reload them afterwards (although this may slow things down due to I/O bottlenecks). A (probably) good hybrid solution places some requirements on the metric for good performance (it should also work without them, just slower): the metric must be dividable into two sub-steps, where one sub-step works on a per-sample/per-batch basis and reduces the amount of temporary data, and thus the I/O bottleneck. The (reduced) temporary data is then stored on disk and loaded all together after all predictions are done (and the intermediate, now freed, memory can be reallocated for the metric calculation). Do you know any metrics fulfilling these requirements, or a better solution for lazy dataset metric calculation at all, @mibaumgartner?
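The two sub-steps described above could be sketched roughly like this (a minimal illustration with MSE as the metric; all names such as `DiskBackedMetric`, `update`, and `compute` are hypothetical, not part of the codebase):

```python
# Sketch of the two-step idea: a per-batch reduction shrinks each batch
# to a tiny intermediate (here: squared-error sum and sample count) that
# is spilled to disk; a final step reloads and aggregates everything.
import os
import pickle
import tempfile


class DiskBackedMetric:
    """Lazily computes dataset MSE via tiny per-batch reductions."""

    def __init__(self):
        self._dir = tempfile.mkdtemp(prefix="metric_cache_")
        self._n_parts = 0

    def update(self, preds, targets):
        # Step 1: per-batch reduction -> only two numbers are written to
        # disk, not the raw predictions, keeping the I/O bottleneck small.
        sq_err = sum((p - t) ** 2 for p, t in zip(preds, targets))
        part = {"sq_err": sq_err, "count": len(preds)}
        path = os.path.join(self._dir, f"part_{self._n_parts}.pkl")
        with open(path, "wb") as f:
            pickle.dump(part, f)
        self._n_parts += 1

    def compute(self):
        # Step 2: after all predictions are done, reload and aggregate.
        total_err, total_n = 0.0, 0
        for i in range(self._n_parts):
            with open(os.path.join(self._dir, f"part_{i}.pkl"), "rb") as f:
                part = pickle.load(f)
            total_err += part["sq_err"]
            total_n += part["count"]
        return total_err / total_n


metric = DiskBackedMetric()
metric.update([1.0, 2.0], [1.0, 4.0])  # batch 1: squared errors 0 and 4
metric.update([3.0], [0.0])            # batch 2: squared error 9
print(metric.compute())                # (0 + 4 + 9) / 3
```

Any metric expressible as an aggregation over small per-batch sufficient statistics (sums, counts, confusion-matrix entries) would fit this pattern; metrics needing all raw outputs at once would not.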
My original goal was to compute the AUROC on the validation dataset. Memory consumption of the classification results isn't that high, so it might not be necessary to introduce a caching system. I think there are two options for the general lazy case:
I like the first approach, but regarding the outputs: that highly depends on the task. If you're doing something like GANs or segmentation and have image-like outputs, you can go OOM very fast. If you're doing this after training, there might be more RAM available, although in most cases that doesn't change whether it fits or not.
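For the classification case mentioned above, each sample reduces to a single score and label, which is why keeping everything in RAM is usually fine. A minimal sketch of accumulating per batch and computing AUROC once at the end (the rank-sum formulation is standard; the helper name `auroc` is hypothetical):

```python
# Each sample contributes only (score, label), so the whole validation
# set reduces to two small lists regardless of input/output image size.
def auroc(scores, labels):
    """AUROC via the rank-sum (Mann-Whitney U) formulation, with tie handling."""
    # Assign average 1-based ranks; tied scores share the mean rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Accumulate lightweight results per batch, compute once at the end:
scores, labels = [], []
for batch_scores, batch_labels in [([0.9, 0.2], [1, 0]), ([0.6, 0.4], [1, 0])]:
    scores.extend(batch_scores)  # one float per sample
    labels.extend(batch_labels)  # one int per sample
print(auroc(scores, labels))     # scores perfectly separate the classes here
```

The contrast with image-like outputs is exactly the OOM concern raised above: for segmentation or GANs the per-sample output is a full array, so the same accumulate-in-RAM strategy stops scaling.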
I like the first approach, too :)
Go ahead and implement it
Find a way to lazily calculate dataset metrics, since they currently break the intention of lazy datasets (see #66)