Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardization of Initial Weights (SIW) Plugin implementation #527

Draft
wants to merge 11 commits into
base: master
Choose a base branch
from
Draft

Conversation

EdenBelouadah
Copy link
Collaborator

Hello,
I implemented the SIW approach from https://arxiv.org/pdf/2008.13710.pdf as a plugin.
I just need you to check if my implementation respects the rules of Avalanche.
I also added an example on how to run it on SplitCIFAR100.

SIW is based on three components:

  • Freeze past class weights across incremental states
  • Standardize last layer weights
  • Rectify past class prediction scores using model's confidence in each inc. state.

If the implementation seems okay to you, I will run full experiments to check if I can reproduce the results of the paper before to submit the definitive PR.

Thank you! :)

@coveralls
Copy link

Pull Request Test Coverage Report for Build 742476754

  • 16 of 47 (34.04%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.2%) to 75.578%

Changes Missing Coverage Covered Lines Changed/Added Lines %
avalanche/training/plugins/siw.py 15 46 32.61%
Totals Coverage Status
Change from base Build 742253830: -0.2%
Covered Lines: 6607
Relevant Lines: 8742

💛 - Coveralls

@AndreaCossu
Copy link
Collaborator

Thanks @EdenBelouadah ! Just a couple of comments, then I'll leave to @AntonioCarta the final check:

  1. It would be better to let the user specify how to access the fully connected layer of the model, instead of assuming model.fc (e.g. like in the CWRStarPlugin).
  2. Just out of curiosity, is there a reason you are making inputs a Variable in the line 86 of plugin?
  3. @AntonioCarta I think the direct calls to model() are fine for now. Maybe we will need to adapt them after you added the new model API.
  4. In the example, you can use Naive instead of BaseStrategy to better align with the existing examples. The computation is equivalent.

Apart from that, I don't have any other comments! Well done 😄

@AntonioCarta
Copy link
Collaborator

Thanks @EdenBelouadah!

I have a couple of comments:

  • after_eval_exp may be called multiple times, even for the same experience, so you may apply the standardization multiple times. It shouldn't be a problem for SIW because after the first time you have zero mean and unit variance.
  • can you make num_workers optional by using a default to 0?
  • Usually we provide an equivalent strategy for every plugin. The strategy is just a wrapper for the plugin. Can you add it? Just look ay training.strategies.strategy_wrappers and add it there. Remember to add the strategy to __all__.
  • can you add a test in tests/test_strategies.py? You can copy test_naive and change the strategy to use SIW.

@AntonioCarta I think the direct calls to model() are fine for now. Maybe we will need to adapt them after you added the new model API.

This is fine, we can change it later if needed.

Let me know if you are able to reproduce the results.

@EdenBelouadah
Copy link
Collaborator Author

Thank you very much @AndreaCossu and @AntonioCarta for you remarks. They are indeed helpful! I will integrate them this week and let you know later about the results reproduction.

@EdenBelouadah
Copy link
Collaborator Author

EdenBelouadah commented Apr 17, 2021

Hello again,
I included all your improvements into the code. Thank you again for your help! However, I still have some issues to fix before to push the final PR.

  1. I have an error in my unit test. How can I run the unittest related to SIW only? using python -m unittest discover -v runs the test on all the strategies and this makes me lose time to debug the code.

2- At which point can I specify the lr_scheduler ? I need it to replicate my paper's results.

3- Is there any way in Avalanche to use hyper-parameters for the first non-incremental state different from those used in incremental states?

Please tell me if there are still some issues with the code.
I just need to get a little bit familiar with the evaluation module to run the final tests.

Thank you very much

@vlomonaco vlomonaco marked this pull request as draft April 20, 2021 08:47
@AntonioCarta
Copy link
Collaborator

I have an error in my unit test. How can I run the unittest related to SIW only? using python -m unittest discover -v runs the test on all the strategies and this makes me lose time to debug the code.

If you use pycharm you have the ability to run tests separately (also in debug mode).

2- At which point can I specify the lr_scheduler ? I need it to replicate my paper's results.

I did a quick implementation of a wrapper for the lr scheduler. I added a print inside after_training_epoch, please that the results are correct because I tested it quickly.

class LRSchedulerPlugin(StrategyPlugin):
    def __init__(self, lr_scheduler):
        super().__init__()
        self.lr_scheduler = lr_scheduler

    def after_training_epoch(self, strategy: 'BaseStrategy', **kwargs):
        self.lr_scheduler.step()
        lr = strategy.optimizer.param_groups[0]['lr']
        print(f"\nlr = {lr}")


if __name__ == '__main__':
    scenario = SplitMNIST(n_experiences=5)
    model = SimpleMLP()
    opt = SGD(model.parameters(), lr=0.001)
    schp = LRSchedulerPlugin(StepLR(opt, step_size=1, gamma=0.9))
    strategy = Naive(model, opt, CrossEntropyLoss(),
                     train_mb_size=512, plugins=[schp])

    for exp in scenario.train_stream:
        strategy.train(exp)

3- Is there any way in Avalanche to use hyper-parameters for the first non-incremental state different from those used in incremental states?

what hyperparameters? can you explain more in detail?

@EdenBelouadah
Copy link
Collaborator Author

EdenBelouadah commented Apr 20, 2021

  1. Thank you @AntonioCarta for your answers. I did not check the scheduler yet, I will let you know when I do it.
  2. When I refer to hyper-parameters, I mean the lr, batch_size, patience (for the ReduceLROnPlateau) ....
  3. I found the error that causes the failure in unit tests. It's because SIW is not supposed to be evaluated on unseen experiences. Like it's invoqued in run_strategy function results.append(cl_strategy.eval(scenario.test_stream[:])). In fact, I compute the model's confidence at the end of each training experience and I update the self.confidences list. When we evaluate on unseen experiences, we try to access to inexistant element in the list. Should I write another run_strategy function for SIW? (In fact, I don't even understand why in run_strategy, you're evaluating on all experiences).
    When I replace the line above with results.append(cl_strategy.eval(scenario.test_stream[:i+1])), where i is the number of the current experience in the training stream, the unit test passes without problems.

Thank you

@AntonioCarta
Copy link
Collaborator

When I refer to hyper-parameters, I mean the lr, batch_size, patience (for the ReduceLROnPlateau) ....

You could do something like this to change the hyper-parameters dynamically:

class SetSIWHyperparams(StrategyPlugin):
    def __init__(self, lr_scheduler):
        super().__init__()
        self.lr_scheduler = lr_scheduler

    def before_training_exp(self, strategy: 'BaseStrategy', **kwargs):
        if self.experience.current_experience == 0:  # first experience
            self.train_batch_size = 32  # set batch size
            self.optimizer = ...
        else self.experience.current_experience == 0:  # incremental update
            self.train_batch_size = 64  # set batch size
            self.optimizer = ...

and add it as a plugin to your strategy. Other customized parameters can be similarly modified at runtime,

When I replace the line above with results.append(cl_strategy.eval(scenario.test_stream[:i+1])), where i is the number of the current experience in the training stream, the unit test passes without problems.

This is fine. You could leave run_strategy as is and modify the test only for SIW.
Maybe you should also raise an explicit error when the user try to test on unseen classes? A more explicit error with an explanation will help users which may not be familiar with the internal implementation. Also, remember to add this detail in the docstring.

@ContinualAI-bot
Copy link
Collaborator

Oh no! It seems there are some PEP8 errors! 😕
Don't worry, you can fix them! 💪
Here's a report about the errors and where you can find them:

examples/siw_cifar100.py:40:81: E501 line too long (84 > 80 characters)
examples/siw_cifar100.py:47:81: E501 line too long (84 > 80 characters)
examples/siw_cifar100.py:66:81: E501 line too long (94 > 80 characters)
examples/siw_cifar100_2.py:37:81: E501 line too long (81 > 80 characters)
examples/siw_cifar100_2.py:43:81: E501 line too long (84 > 80 characters)
examples/siw_cifar100_2.py:50:81: E501 line too long (84 > 80 characters)
examples/siw_cifar100_2.py:75:81: E501 line too long (103 > 80 characters)
examples/siw_cifar100_2.py:76:81: E501 line too long (103 > 80 characters)
examples/siw_cifar100_2.py:77:81: E501 line too long (103 > 80 characters)
examples/siw_cifar100_2.py:78:81: E501 line too long (103 > 80 characters)
examples/siw_cifar100_2.py:79:81: E501 line too long (103 > 80 characters)
examples/siw_cifar100_2.py:80:81: E501 line too long (103 > 80 characters)
examples/siw_cifar100_2.py:81:81: E501 line too long (103 > 80 characters)
examples/siw_cifar100_2.py:82:81: E501 line too long (103 > 80 characters)
examples/siw_cifar100_2.py:83:81: E501 line too long (103 > 80 characters)
examples/siw_cifar100_2.py:84:81: E501 line too long (105 > 80 characters)
examples/siw_cifar100_2.py:86:81: E501 line too long (103 > 80 characters)
examples/siw_cifar100_2.py:87:81: E501 line too long (103 > 80 characters)
examples/siw_cifar100_2.py:88:81: E501 line too long (103 > 80 characters)
examples/siw_cifar100_2.py:89:81: E501 line too long (103 > 80 characters)
examples/siw_cifar100_2.py:90:81: E501 line too long (103 > 80 characters)
examples/siw_cifar100_2.py:91:81: E501 line too long (103 > 80 characters)
examples/siw_cifar100_2.py:92:81: E501 line too long (103 > 80 characters)
examples/siw_cifar100_2.py:93:81: E501 line too long (103 > 80 characters)
examples/siw_cifar100_2.py:94:81: E501 line too long (103 > 80 characters)
examples/siw_cifar100_2.py:95:81: E501 line too long (105 > 80 characters)
26      E501 line too long (84 > 80 characters)

@ContinualAI-bot
Copy link
Collaborator

Oh no! It seems there are some PEP8 errors! 😕
Don't worry, you can fix them! 💪
Here's a report about the errors and where you can find them:

examples/siw_cifar100.py:40:81: E501 line too long (84 > 80 characters)
examples/siw_cifar100.py:47:81: E501 line too long (84 > 80 characters)
examples/siw_cifar100.py:66:81: E501 line too long (94 > 80 characters)
3       E501 line too long (84 > 80 characters)

BELOUADAH Eden added 2 commits April 22, 2021 11:37
@EdenBelouadah
Copy link
Collaborator Author

EdenBelouadah commented Apr 24, 2021

Hello @AntonioCarta ,
I have two main issues:
1- the unittests are failing because of the si_lambda parameter in synaptic intelligence, and not for SIW (knowing that I didn't touch the code of SI).


Traceback (most recent call last):
  File "/home/runner/work/avalanche/avalanche/tests/test_strategies.py", line 293, in test_synaptic_intelligence
    train_epochs=1, train_mb_size=10, eval_mb_size=10)
TypeError: __init__() got an unexpected keyword argument 'si_lambda'

2- To ckeck why I cannot reproduce the results of my paper, I found that the problem is in the vanilla fine tuning itself. I removed the SIW plugin and I removed also my image lists. Instead, I just use the Naive Strategy with the already-prepared SplitCifar benchmark and the results during testing are weird. During training, it seems that the model is learning (accuracy on the new task ~80%), but during testing, the accuracy on the current task drops until ~20% while it's nearly ~0 for older tasks.

Here's the code:


model = torchvision.models.resnet18(num_classes=100).to(device)

# log to text file
text_logger = TextLogger(open('/scratch_global/eden/avalanche/log_0.txt', 'a'))

# print to stdout
interactive_logger = InteractiveLogger()

eval_plugin = EvaluationPlugin(
accuracy_metrics(minibatch=False, epoch=True, experience=True, stream=True),
loggers=[interactive_logger, text_logger]
)

optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9)
criterion = CrossEntropyLoss()
strategy = Naive(model, optimizer, criterion,
                 device=device, train_epochs=100, evaluator=eval_plugin)

normalize = transforms.Normalize(mean=[0.5356, 0.4898, 0.4255],
                                     std=[0.2007, 0.1999, 0.1992])

train_transform = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        normalize])

test_transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        normalize])

# scenario
scenario = SplitCIFAR100(n_experiences=10, return_task_id=False,
                             fixed_class_order=range(0, 100), train_transform=train_transform,
                             eval_transform=test_transform)

# TRAINING LOOP
print('Starting experiment...')
results = []
for i, experience in enumerate(scenario.train_stream):
    print("Start of experience: ", experience.current_experience)
    strategy.train(experience)
    print('Training completed')
    print('Computing accuracy on the test set')
    res = strategy.eval(scenario.test_stream[:i+1])
    results.append(res)
print('Results = ' + str(results))

and here's the testing accuracy at each experience:

Top1_Acc_Stream/eval_phase/test_stream = 0.1860
Top1_Acc_Stream/eval_phase/test_stream = 0.1600
Top1_Acc_Stream/eval_phase/test_stream = 0.1243
Top1_Acc_Stream/eval_phase/test_stream = 0.0712
Top1_Acc_Stream/eval_phase/test_stream = 0.0608
Top1_Acc_Stream/eval_phase/test_stream = 0.0472
Top1_Acc_Stream/eval_phase/test_stream = 0.0657
Top1_Acc_Stream/eval_phase/test_stream = 0.0270
Top1_Acc_Stream/eval_phase/test_stream = 0.0343
Top1_Acc_Stream/eval_phase/test_stream = 0.0284

Thank you again for you help and patience.

@AntonioCarta
Copy link
Collaborator

Top1_Acc_Stream/eval_phase/test_stream = 0.1860
Top1_Acc_Stream/eval_phase/test_stream = 0.1600
Top1_Acc_Stream/eval_phase/test_stream = 0.1243
Top1_Acc_Stream/eval_phase/test_stream = 0.0712
Top1_Acc_Stream/eval_phase/test_stream = 0.0608
Top1_Acc_Stream/eval_phase/test_stream = 0.0472
Top1_Acc_Stream/eval_phase/test_stream = 0.0657
Top1_Acc_Stream/eval_phase/test_stream = 0.0270
Top1_Acc_Stream/eval_phase/test_stream = 0.0343
Top1_Acc_Stream/eval_phase/test_stream = 0.0284

This is suspiciously low. I run your example and I see that the accuracy on the current experience drops. For example on Exp. 9 you get

Top1_Acc_Exp/eval_phase/test_stream/Task000/Exp009 = 0.1730

While it should get a much higher value, forgetting all the previous experiences (Naive). Notice that the values on the training set are ok (above 80%), so it could also be overfitting? I never experimented with CIFAR100 so I can't say for sure but it seems too extreme to me.

I will try to reproduce this problem on a smaller dataset.

@AntonioCarta
Copy link
Collaborator

AntonioCarta commented Apr 26, 2021

@EdenBelouadah I did a quick experiment on CIFAR10 (10 epochs, train_mb_size 512), The results seem ok to me:

> Eval on experience 4 (Task 0) from test stream ended.
        Top1_Acc_Exp/eval_phase/test_stream/Task000/Exp004 = 0.6760
-- >> End of eval phase << --
        Top1_Acc_Stream/eval_phase/test_stream = 0.1352

The model learns correctly on the last experience and forgets catastrophically the previous ones. The Naive strategy seems to work as expected. Probably your error has to do with your hyperparameter's choice. Can you check this?

Are you using the last version of Avalanche? Can you pull from master to ensure this?

@EdenBelouadah
Copy link
Collaborator Author

EdenBelouadah commented Apr 26, 2021

Hello,
1- Yes, I'm using the last version of Avalanche at each time. Here are the last 2 commits displayed by git log -2 :

commit d49e28ad86d2e72f0b1c53a3091451e6e6de63c5 (upstream/master)
Merge: c8e404e2 b10f6759
Author: Antonio Carta <w-32@hotmail.it>
Date:   Fri Apr 23 13:43:21 2021 +0200

    Merge pull request #553 from Mattdl/master
    
    Replay with StoragePolicies and CoPE strategy

commit c8e404e2171d070ffd34bdca6fbc30ab2857ebf4
Merge: ec9c5744 9b5739da
Author: Lorenzo Pellegrini <43009885+lrzpellegrini@users.noreply.github.com>
Date:   Fri Apr 23 12:29:11 2021 +0200

    Merge pull request #555 from imirzadeh/patch-2
    
    Fix wrong normalization statistics for Fashion MNIST

2- Also, me too I suspected that it was an overfitting. Therefore, I changed the hyperparameters and I used exactly those used in my paper, but it seems now to be an underfitting (which is weird for me). The training accuracy on the first task does not exceed 10%. I think now we're sure that the problem is in the optimization of the Naive strategy. I will search for other hyperparameters and tell you if the problem is solved.

3- Can you please run a unit test on the Synaptic Intelligence and tell me if you have the same bug as me? if yes, can you please correct the bug so it does not affect the PR of SIW?

Thanks a lot.

@AntonioCarta
Copy link
Collaborator

2- Also, me too I suspected that it was an overfitting. Therefore, I changed the hyperparameters and I used exactly those used in my paper, but it seems now to be an underfitting (which is weird for me). The training accuracy on the first task does not exceed 10%. I think now we're sure that the problem is in the optimization of the Naive strategy. I will search for other hyperparameters and tell you if the problem is solved.

Are you able to obtain good results without using the Avalanche's Naive strategy? If you can provide me a minimal script that gets good results using Avalanche's scenarios, but not Avalanche's strategies I can try to investigate this problem more in detail.

3- Can you please run a unit test on the Synaptic Intelligence and tell me if you have the same bug as me? if yes, can you please correct the bug so it does not affect the PR of SIW?

I see that in strategy_wrappers.py Synaptic Intelligence has two __init__. You probably copy-paster it by mistake.

@AntonioCarta AntonioCarta mentioned this pull request May 12, 2021
32 tasks
@vlomonaco vlomonaco added the Training Related to the Training module label Jun 6, 2021
@vlomonaco
Copy link
Member

Hi @EdenBelouadah how is this progressing? Let us know if you encountered any problem and want some help!

@vlomonaco vlomonaco mentioned this pull request Jun 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Training Related to the Training module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants