Feature/improve metric calculations #45
Conversation
… on each individual slice
I didn't get to testing this yet, but I did look through the code. See my minor comments below.
"train_metrics": ["dice_score"], | ||
"train_metric_confidence_levels": [0.25, 0.5, 0.75], | ||
"test_metrics": ["dice_score", "sensitivity", "specificity", "hausdorff95"], | ||
"test_metric_confidence_levels": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9] |
All these should be documented in the README
metrics: Iterable[str],
confidence_levels: Iterable[float],
image_ids: Iterable[str],
slices_per_image: int,
This will be problematic for the Medical Segmentation Decathlon data. There, images for the same segmentation task can have different numbers of slices.
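One option would be to pass a per-image mapping instead of a single int. A sketch, with placeholder names since the enclosing function is not shown in the diff:

```python
from typing import Dict, Iterable

def track_metrics(  # placeholder name; the actual enclosing function is not shown here
    metrics: Iterable[str],
    confidence_levels: Iterable[float],
    image_ids: Iterable[str],
    slices_per_image: Dict[str, int],  # per-image slice counts instead of a single int
) -> None:
    for image_id in image_ids:
        num_slices = slices_per_image[image_id]  # may differ between images
        ...
```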
src/datasets/dataset_hooks.py
""" | ||
|
||
@abstractmethod | ||
def slices_per_image(self, **kwargs): |
Missing return type annotation.
See my other comment about this being a single int.
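With the per-image mapping suggested in my other comment, the annotated signature could look like this (a sketch; the class name is assumed from src/datasets/dataset_hooks.py, and the Dict return type is my suggestion rather than the current code):

```python
from abc import ABC, abstractmethod
from typing import Dict

class DatasetHooks(ABC):  # class name assumed from the file name
    @abstractmethod
    def slices_per_image(self, **kwargs) -> Dict[str, int]:
        """Returns the number of slices for each image, keyed by image ID."""
```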
@@ -109,15 +112,20 @@ def run_active_learning_pipeline(
    else:
        raise ValueError("Invalid data_module name.")

    if checkpoint_dir is not None:
        checkpoint_dir = os.path.join(checkpoint_dir, f"{wandb_logger.experiment.id}")
We should use a similar naming scheme for the prediction directory.
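Something like this would mirror the checkpoint directory handling (`prediction_dir` is an assumed variable name, not taken from the PR):

```python
# Mirrors the checkpoint_dir handling above; "prediction_dir" is an assumed name.
if prediction_dir is not None:
    prediction_dir = os.path.join(prediction_dir, f"{wandb_logger.experiment.id}")
```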
Everything looks good to me now. Great job!
Closes #32, #37, #38
May also contribute to #25, #43, #44
This PR implements the following features:
- `train_metrics`, `train_metric_confidence_levels`, `test_metrics`, and `test_metric_confidence_levels` were added to specify which metrics should be tracked for each training step and which should only be tracked for the best model
- `model_selection_criterion` was introduced to specify which metric should be used to select the best model from each training run

With the changes from this PR, a training epoch on the BraTS 2018 dataset takes approximately three minutes on an A100 GPU, and the model evaluation at the end of training takes approximately five minutes. During each training epoch, the Dice score is computed for three confidence levels (0.25, 0.5, 0.75) on both the training and the validation set. During model evaluation, the Dice score, sensitivity, specificity, and the Hausdorff distance are computed for nine confidence levels (0.1 - 0.9) on the validation set using the best model.
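For reference, here is how the new options could look combined in one configuration (a sketch in Python dict form; the four metric settings are taken from the config diff above, while the `model_selection_criterion` value is only an assumed example):

```python
# Sketch of the new configuration options combined. The four metric settings
# are taken from the config diff in this PR; the model_selection_criterion
# value is only an assumed example.
config = {
    "train_metrics": ["dice_score"],
    "train_metric_confidence_levels": [0.25, 0.5, 0.75],
    "test_metrics": ["dice_score", "sensitivity", "specificity", "hausdorff95"],
    "test_metric_confidence_levels": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    "model_selection_criterion": "dice_score",  # assumed example value
}
```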
To ensure that the metric tracking is working in all cases, I would like to extend the test cases in a follow-up PR.
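For example, a minimal sketch of the kind of test I have in mind (the `dice_score` function here is a self-contained stand-in, not the implementation from this repository):

```python
import torch

def dice_score(prediction: torch.Tensor, target: torch.Tensor, confidence_level: float) -> float:
    # Binarize the soft prediction at the given confidence level.
    binarized = (prediction >= confidence_level).float()
    intersection = (binarized * target).sum()
    return (2 * intersection / (binarized.sum() + target.sum())).item()

def test_dice_score_is_one_for_perfect_prediction():
    target = torch.tensor([[0.0, 1.0], [1.0, 0.0]])
    # The score should be perfect at every configured confidence level.
    for confidence_level in [0.25, 0.5, 0.75]:
        assert dice_score(target, target, confidence_level) == 1.0
```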