Releases: gpauloski/kfac-pytorch
v0.4.1
v0.4.0
Complete refactor of kfac-pytorch
See Pull Requests #38, #40, #41, and #42.
DevOps changes
- `kfac` requires `torch>=1.8` and Python `>=3.7`
- `tox` used for testing environments and automation
- `pre-commit` updated. Major changes include preferring single quotes, `mypy`, and `flake8` plugins
- Switch to `setup.cfg` for package metadata and `tox`/`flake8`/`mypy`/`coverage` configuration
- Add `requirements-dev.txt` that contains all dependencies needed to run the test suite
Code quality and testing
- Complete type annotations for all code
- Passes `mypy`
- Passes `flake8`
- Separated testing utilities and unit tests into `testing/` and `tests/`, respectively
- Expansive unit testing suite that achieves 100% code coverage
- New testing utilities include wrappers for simulating distributed environments and small test models
- Added end-to-end training tests (a sketch of the loss-decrease check follows this list)
  - Small unit test (included in `pytest`) that checks loss decreases when training with K-FAC
  - MNIST integration test (not run with `pytest`) that verifies training with K-FAC achieves higher accuracy
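For illustration, a minimal sketch of what such a loss-decrease check might look like is below. It is not the actual test from `tests/`; the `KFACPreconditioner` import path and the `factor_update_steps`/`inv_update_steps` parameter names are assumptions based on the new API described in the next section.

```python
import torch

from kfac.preconditioner import KFACPreconditioner  # assumed import path


def test_loss_decreases_with_kfac() -> None:
    torch.manual_seed(0)
    model = torch.nn.Sequential(
        torch.nn.Linear(10, 16),
        torch.nn.ReLU(),
        torch.nn.Linear(16, 1),
    )
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    # Assumed parameter names; update factors and inverses every step for a fast test
    preconditioner = KFACPreconditioner(model, factor_update_steps=1, inv_update_steps=1)
    criterion = torch.nn.MSELoss()

    x = torch.randn(64, 10)
    y = torch.randn(64, 1)

    losses = []
    for _ in range(20):
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        preconditioner.step()  # precondition gradients before the optimizer update
        optimizer.step()
        losses.append(loss.item())

    # Training with K-FAC preconditioning should reduce the loss on this toy problem
    assert losses[-1] < losses[0]
```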
`kfac` package improvements
- KFAC layers separated from PyTorch module wrappers
  - `KFACBaseLayer` handles general K-FAC computations and communications for an arbitrary layer
  - `ModuleHelper` implementations provide a unified interface for interacting with supported PyTorch modules
    - Provides methods that return the size of the factors for the layer so the factor sizes can be determined prior to training
    - Provides methods for getting the current gradients, updating the gradients, and computing the factors from the intermediate data
  - Each `KFACBaseLayer` instance is passed a `ModuleHelper` instance corresponding to the module in the model being preconditioned
- Removed broken LSTM/RNN/Embedding layer support
- Module registration utilities moved out of the preconditioner class and into the `kfac.layers.register` module
- Replaced the `comm` module with the `distributed` module that provides a more exhaustive set of distributed communication utilities
  - All communication ops now return futures of their results to allow more aggressive asynchronous communication
  - Added allreduce bucketing for factor allreduce (closes #32)
  - Added `get_rank` and `get_world_size` methods to enable K-FAC training when `torch.distributed` is not initialized (a minimal sketch of this fallback appears after this list)
- Enum types moved to `enums` module for convenience with type annotations
- `KFACBaseLayer` is now agnostic of its placement
  - I.e., the `KFACBaseLayer` expects some other object to correctly execute its operations according to some placement strategy.
  - This change was made to allow other preconditioner implementations to use the math/communication operations provided by the `KFACBaseLayer` without being beholden to some placement strategy.
- Created the `BaseKFACPreconditioner` which provides the minimal set of functionality for preconditioning with K-FAC
  - Provides state dict saving/loading, a `step()` method, hook registration to `KFACBaseLayer`s, and some small bookkeeping functionality
  - The `BaseKFACPreconditioner` takes as input already registered `KFACBaseLayer`s and an initialized `WorkAssignment` object.
  - This change was made to factor out the strategy-specific details from the core preconditioning functions with the goal of having preconditioner implementations that interact more closely with other frameworks such as DeepSpeed
  - Added `reset_batch()` to clear the staged factors for the batch in the case of a bad batch of data (e.g., if the gradients overflowed)
  - `memory_usage()` includes the intermediate factors accumulated for the current batch
  - `state_dict` now includes K-FAC hyperparameters and steps in addition to factors
- Added `KFACPreconditioner`, a subclass of `BaseKFACPreconditioner`, that implements the full functionality described in the KAISA paper (see the usage sketch after this list)
- New `WorkAssignment` interface that provides a schematic for the methods needed by `BaseKFACPreconditioner` to determine where to perform computations and communications
  - Added the `KAISAAssignment` implementation that provides the KAISA gradient worker fraction-based strategy
- K-FAC hyperparameter schedule changes
  - Old inflexible `KFACParamScheduler` replaced with a `LambdaParamScheduler` modeled on PyTorch's `LambdaLR` scheduler
  - `BaseKFACPreconditioner` can be passed functions that return the current K-FAC hyperparameters rather than static float values (see the usage sketch after this list)
- All printing done via `logging`, and `BaseKFACPreconditioner` takes an optional `loglevel` parameter (closes #33)
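To make the new API concrete, here is a rough usage sketch. The import path, the `factor_update_steps`/`inv_update_steps`/`lr` parameter names, and the callable hyperparameter are assumptions based on the notes above rather than a definitive reference; see `examples/README.md` for the maintained examples.

```python
import logging

import torch

from kfac.preconditioner import KFACPreconditioner  # assumed import path

model = torch.nn.Linear(32, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = torch.nn.CrossEntropyLoss()

# Hyperparameters may be passed as callables of the K-FAC step
# (LambdaParamScheduler-style) instead of static floats; names are assumptions.
preconditioner = KFACPreconditioner(
    model,
    factor_update_steps=10,
    inv_update_steps=100,
    lr=lambda step: 0.1 * (0.95 ** (step // 100)),  # callable hyperparameter
    loglevel=logging.INFO,  # optional logging level (closes #33)
)

# Toy data so the sketch is self-contained
data = torch.randn(8, 32)
target = torch.randint(0, 10, (8,))

for _ in range(5):
    optimizer.zero_grad()
    loss = criterion(model(data), target)
    loss.backward()
    preconditioner.step()  # precondition gradients in place before the optimizer update
    optimizer.step()
```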
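And a minimal sketch, assuming nothing beyond `torch.distributed`, of the kind of fallback behavior the new `get_rank`/`get_world_size` utilities in the `distributed` module provide when no process group has been initialized:

```python
import torch.distributed as dist


def get_rank() -> int:
    """Rank of this process, or 0 if torch.distributed is not initialized."""
    if dist.is_available() and dist.is_initialized():
        return dist.get_rank()
    return 0


def get_world_size() -> int:
    """Number of processes, or 1 if torch.distributed is not initialized."""
    if dist.is_available() and dist.is_initialized():
        return dist.get_world_size()
    return 1
```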
Example script changes
- Added `examples/requirements.txt`
- Usage instructions for examples moved to `examples/README.md`
- Updated examples to use the new `kfac` API
- Examples are now properly type annotated
- Removed non-working language model example
Other changes + future goals
- Removed a lot of content from the README that should eventually be moved to a wiki
  - Previously, the README was quite verbose and made it difficult to find the important content
- Updated README examples, publications, and development instructions
- Future changes include:
  - GitHub Actions for running code formatting, unit tests, and integration tests
  - Issue/PR templates
  - Badges in the README
  - A wiki