[Fix] Implement better wd_ban_list handling #282
Conversation
Hi, thanks for the contribution!

> `wd_ban_list` logic is only checking for the actual, fully-qualified parameter names

Yes. It was originally intended to exclude parameters whose names appear on the blacklist. So, as you mentioned above, if you have a layer norm layer called 'asdf' and don't put the exact parameter name into `wd_ban_list`, for example, it won't be excluded. I added `LayerNorm.bias` and `LayerNorm.weight` to the default `wd_ban_list` to align with the usage in the Transformers library.

Your idea of also adding module names (e.g. `LayerNorm`) to the exclusion criteria sounds good to me, because we usually ban based on the type of module (see the sketch below).
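To make that concrete, here is a minimal sketch of the name-based matching being described. The substring-style check mirrors the common Transformers pattern and is an assumption, not the library's literal code; the layer name 'asdf' is the hypothetical example from above:

```python
import torch.nn as nn

# Hypothetical model: the layer norm is stored under the attribute name 'asdf'.
model = nn.Sequential()
model.add_module('linear', nn.Linear(8, 8))
model.add_module('asdf', nn.LayerNorm(8))

wd_ban_list = ('bias', 'LayerNorm.bias', 'LayerNorm.weight')

# Name-based matching: the layer norm's parameters are named 'asdf.weight'
# and 'asdf.bias', so 'asdf.weight' matches no entry and keeps its weight
# decay; only the plain 'bias' entry catches 'asdf.bias'.
for name, _ in model.named_parameters():
    banned = any(ban in name for ban in wd_ban_list)
    print(f"{name}: {'banned' if banned else 'gets weight decay'}")
```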
Your code looks good to me! Could you please run `make format` & `make check` by any chance? Or I can handle it later.
I just pushed a new commit with a few fixes. However, there is one error I was not able to fix: if you run [...]. I'm not super familiar with [...].

It's okay, I can handle the lint stuff. Anyway, thanks for the contributions!
Problem (Why?)

The `wd_ban_list` argument for `get_optimizer_parameters()` is somewhat misleading. Looking at it, you would expect any of the default arguments' name formats to work correctly. However, that is not the case. From this list, only `bias` is "detected" and "banned" correctly. Neither `LayerNorm.bias` nor `LayerNorm.weight` is detected, so neither of those parameters has its `weight_decay` set to 0. I even tested `LayerNorm`, and that doesn't work, either. The snippet below reproduces the problem.
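A minimal reproduction, assuming the `get_optimizer_parameters(model, weight_decay, wd_ban_list)` signature and the param-group return format described in this PR (check your installed version):

```python
import torch.nn as nn
from pytorch_optimizer import get_optimizer_parameters

# The LayerNorm lives under the attribute name 'norm', so its parameters are
# named 'norm.weight' and 'norm.bias'. No parameter name ever contains the
# string 'LayerNorm', so the default entries 'LayerNorm.bias' and
# 'LayerNorm.weight' cannot match anything.
model = nn.Sequential()
model.add_module('fc', nn.Linear(8, 8))
model.add_module('norm', nn.LayerNorm(8))

groups = get_optimizer_parameters(model, weight_decay=1e-2)
for group in groups:
    print(group['weight_decay'], len(group['params']))
# 'norm.weight' lands in the weight-decay group, even though LayerNorm
# weights are exactly what the defaults appear to exclude.
```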
Solution (What/How?)

The reason this fails is that the `wd_ban_list` logic only checks the actual, fully-qualified parameter names; it does NOT check the class name of each `nn.Module`, as `pytorch_optimizer`'s default arguments and tests would imply.

I implemented a more complete method for handling the `wd_ban_list`. Now, we check both the "true names" and the `nn.Module` class names, roughly as sketched below.
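Conceptually, the combined check looks something like this (a hypothetical helper for illustration, not the literal patch):

```python
import torch.nn as nn

def build_banned_names(model: nn.Module, wd_ban_list) -> set:
    """Collect parameter names excluded by full-name OR module-class matching."""
    banned = set()
    for module_name, module in model.named_modules():
        for param_name, _ in module.named_parameters(recurse=False):
            full_name = f'{module_name}.{param_name}' if module_name else param_name
            if any(ban in full_name for ban in wd_ban_list):
                banned.add(full_name)  # "true name" match
            elif module.__class__.__name__ in wd_ban_list:
                banned.add(full_name)  # module-class match, e.g. 'LayerNorm'
    return banned

model = nn.Sequential()
model.add_module('fc', nn.Linear(8, 8))
model.add_module('norm', nn.LayerNorm(8))

print(sorted(build_banned_names(model, ('bias', 'LayerNorm'))))
# -> ['fc.bias', 'norm.bias', 'norm.weight']
```

This keeps the existing name-based behavior intact while letting an entry like `LayerNorm` ban every parameter of any module of that type, regardless of what the attribute is called.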
Notes

I've been using this patch in my own code for several weeks now; it seems to work great! Let me know if there is anything you would change.