
Commit

Updated the HISTORY and release notes in preparation for v1.2.0. Added a Planned Additions section to the README.
MitchMedeiros committed Sep 6, 2024
1 parent cd4e2a0 commit 0a7b9a9
Showing 3 changed files with 93 additions and 14 deletions.
HISTORY.md: 30 additions, 0 deletions
@@ -1,3 +1,33 @@
## v1.2.0 (2024-08-02)

[GitHub release](https://github.com/MitchMedeiros/MLCompare/tag/v1.2.0)

### Pipelines
- Created a `data_pipeline` function for performing only data retrieval and processing
- Expanded the generated model performance metrics and added a required argument to `full_pipeline` for specifying whether the pipeline is being used for regression or classification tasks
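
A minimal, standard-library sketch of what a required task-type argument can control. It assumes the argument selects which family of metrics is computed. The names `evaluate`, `regression_metrics`, and `classification_metrics` and the chosen metrics are hypothetical illustrations, not MLCompare's API. Only the `"regression"` value comes from this commit, specifically the README call `mlcompare.full_pipeline(datasets, models, "regression")`.

```python
from statistics import mean


def regression_metrics(y_true: list[float], y_pred: list[float]) -> dict[str, float]:
    """Mean squared error, mean absolute error, and R^2 for a regression task."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return {
        "mse": ss_res / len(errors),
        "mae": mean(abs(e) for e in errors),
        # R^2 is undefined when every true value is identical (ss_tot == 0).
        "r2": 1 - ss_res / ss_tot if ss_tot else float("nan"),
    }


def classification_metrics(y_true: list, y_pred: list) -> dict[str, float]:
    """Accuracy for a classification task."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return {"accuracy": correct / len(y_true)}


def evaluate(task_type: str, y_true: list, y_pred: list) -> dict[str, float]:
    """Pick the metric family from an explicit task type instead of guessing it."""
    if task_type == "regression":
        return regression_metrics(y_true, y_pred)
    if task_type == "classification":
        return classification_metrics(y_true, y_pred)
    raise ValueError(f"task_type must be 'regression' or 'classification', got {task_type!r}")
```

An explicit argument avoids inferring the task from the target column, because integer labels can be either class IDs or a numeric regression target. This sketch raises an error on any other value instead of falling back to a default.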

### DatasetProcessor
- Refactored the class to store the train-test split data for easier processing
- Added a `handle_nan` method which can drop, forward-fill, and backward-fill missing values
- Added label encoding, ordinal encoding, and target encoding methods
- Added several scaling and transformation methods from sklearn: StandardScaler, MinMaxScaler, MaxAbsScaler, RobustScaler, PowerTransformer, QuantileTransformer, and Normalizer
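
The following plain-list sketch covers two techniques from the list above. The first is the three missing-value strategies named for `handle_nan`: drop, forward-fill, and backward-fill. The second is a mean target encoding that is fitted on training rows only. The semantics are assumptions: missing values are represented as `None`, and categories not seen during fitting map to a fallback value. This is not the `DatasetProcessor` implementation, and `fit_target_encoding` and `apply_target_encoding` are hypothetical names.

```python
from collections.abc import Hashable
from statistics import mean
from typing import Optional


def handle_nan(values: list[Optional[float]], method: str) -> list[Optional[float]]:
    """Drop, forward-fill, or backward-fill missing (None) entries in one column."""
    if method == "drop":
        return [v for v in values if v is not None]
    if method == "ffill":
        filled: list[Optional[float]] = []
        last: Optional[float] = None
        for v in values:
            if v is not None:
                last = v
            # Leading gaps stay None: there is no earlier value to carry forward.
            filled.append(last)
        return filled
    if method == "bfill":
        # Backward-fill is a forward-fill over the reversed column.
        return handle_nan(values[::-1], "ffill")[::-1]
    raise ValueError(f"method must be 'drop', 'ffill', or 'bfill', got {method!r}")


def fit_target_encoding(categories: list[Hashable], targets: list[float]) -> dict[Hashable, float]:
    """Map each category to the mean target observed in the training rows."""
    groups: dict[Hashable, list[float]] = {}
    for category, target in zip(categories, targets):
        groups.setdefault(category, []).append(target)
    return {category: mean(ys) for category, ys in groups.items()}


def apply_target_encoding(
    categories: list[Hashable], encoding: dict[Hashable, float], default: float
) -> list[float]:
    """Encode rows with a fitted mapping; unseen categories get `default`."""
    return [encoding.get(category, default) for category in categories]


# Fit on the training split only, then reuse the mapping for the test split.
train_cats, train_y = ["a", "b", "a", "c"], [1.0, 0.0, 3.0, 5.0]
encoding = fit_target_encoding(train_cats, train_y)
test_encoded = apply_target_encoding(["a", "d"], encoding, default=mean(train_y))
```

Fitting the encoding on the training split and only applying it to the test split keeps test-set targets out of the features. Storing the train-test split, as the refactor above describes, plausibly makes this bookkeeping easier.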

### Documentation
- Created a new homepage
- Updated the layout of the API Reference page
- Added content to the Release Notes page
- Improved various docstrings
- Made multiple updates to the README including adding a "Planned Additions" section

### Other
- Added a `ResultsWriter` class, responsible for directory and file naming and creation throughout pipelines
- Implemented directory and file name incrementing to prevent overwrites
- Changed the default directory name to use the current timestamp to ensure uniqueness
- Improved how saving model results is handled
- Removed the `DataProcessor` class in favor of pipelines
- Migrated several high-level functions being used within pipelines to a new module: `processing.py`
- Improved unit test coverage
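
This sketch combines two of the naming behaviors listed above: a timestamp-based default directory name and increment-on-collision to prevent overwrites. It assumes a `name-1`, `name-2`, ... suffix scheme. The `mlcompare-results-` prefix, the timestamp format, and the helper names are hypothetical, so the actual `ResultsWriter` naming may differ.

```python
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional


def default_dir_name(now: Optional[datetime] = None) -> str:
    """Build a timestamp-based directory name (hypothetical prefix and format)."""
    now = now if now is not None else datetime.now()
    return now.strftime("mlcompare-results-%Y-%m-%d-%H-%M-%S")


def increment_path(path: Path) -> Path:
    """Return `path` if unused, else the first free `stem-N.suffix` variant (N = 1, 2, ...)."""
    if not path.exists():
        return path
    counter = 1
    while True:
        candidate = path.with_name(f"{path.stem}-{counter}{path.suffix}")
        if not candidate.exists():
            return candidate
        counter += 1


if __name__ == "__main__":
    # Demo in a throwaway directory: a second request for the same name gets "-1".
    with tempfile.TemporaryDirectory() as tmp:
        first = increment_path(Path(tmp) / default_dir_name())
        first.mkdir()
        second = increment_path(Path(tmp) / first.name)
        print(first.name, second.name)
```

Checking `exists()` and then creating the path is not atomic. If several pipelines may write to the same parent directory at once, calling `mkdir(exist_ok=False)` and retrying on `FileExistsError` closes that gap.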

## v1.1.0 (2024-08-02)

[GitHub release](https://github.com/MitchMedeiros/MLCompare/tag/v1.1.0)
README.md: 28 additions, 8 deletions
@@ -32,12 +32,6 @@

<br>

-<div align="center">
-<b>This library is still in early development. Expect many more features to come :D</b>
-</div>
-
-<br>
-
MLCompare is a Python package for running model comparison pipelines, with the aim of being both simple and flexible. It supports multiple popular ML libraries, retrieval from multiple online dataset repositories, common data processing steps, and results visualization. Additionally, it allows for using your own models and datasets within the pipelines.

<table align="center">
@@ -132,7 +126,33 @@ models = [
}
]

-mlcompare.full_pipeline(datasets, models)
+mlcompare.full_pipeline(datasets, models, "regression")
```

-In the case of the XGBoost model we passed in our own parameter values rather than using the defaults.
+In the case of the XGBoost model, some non-default parameter values were used.

<h2>Planned Additions</h2>

<h3>Version 1.3</h3>
<ul>
<li>LightGBM support</li>
<li>CatBoost support</li>
<li>Model results graphing and visualization</li>
<li>Improved documentation</li>
<li>Support for presplit data</li>
</ul>

<h3>Version 1.4</h3>
<ul>
<li>PyTorch support</li>
<li>TensorFlow support</li>
<li>Additional dataset sources</li>
<li>Built-in model and dataset collections for quick testing of similar model types/datasets</li>
<li>Optional pipeline caching</li>
<li>Optional trained model saving</li>
</ul>

<h3>Version 1.5</h3>
<ul>
<li>S3 support</li>
</ul>
docs/source/release_notes/index.rst: 35 additions, 6 deletions
@@ -4,22 +4,51 @@ Release Notes
This is the list of changes to MLCompare between each release. For full details,
see the `commit logs <https://github.com/MitchMedeiros/MLCompare/commits/>`_.

Version 1.2.0
-------------

### Pipelines
- Created a `data_pipeline` function for performing only data retrieval and processing
- Expanded the generated model performance metrics and added a required argument to `full_pipeline` for specifying whether the pipeline is being used for regression or classification tasks

### DatasetProcessor
- Refactored the class to store the train-test split data for easier processing
- Added a `handle_nan` method which can drop, forward-fill, and backward-fill missing values
- Added label encoding, ordinal encoding, and target encoding methods
- Added several scaling and transformation methods from sklearn: StandardScaler, MinMaxScaler, MaxAbsScaler, RobustScaler, PowerTransformer, QuantileTransformer, and Normalizer

### Documentation
- Created a new homepage
- Updated the layout of the API Reference page
- Added content to the Release Notes page
- Improved various docstrings
- Made multiple updates to the README including adding a "Planned Additions" section

### Other
- Added a `ResultsWriter` class, responsible for directory and file naming and creation throughout pipelines
- Implemented directory and file name incrementing to prevent overwrites
- Changed the default directory name to use the current timestamp to ensure uniqueness
- Improved how saving model results is handled
- Removed the `DataProcessor` class in favor of pipelines
- Migrated several high-level functions being used within pipelines to a new module: `processing.py`
- Improved unit test coverage

Version 1.1.0
-------------

-- Refactored DatasetProcessor, moving save_directory from a class attribute to a method argument
-- Added type validation to several methods within DatasetProcessor
-- Updated docstrings for the dataset_processor module
-- Updated unit tests for DatasetProcessor
+- Refactored `DatasetProcessor`, moving `save_directory` from a class attribute to a method argument
+- Added type validation to several methods within `DatasetProcessor`
+- Updated docstrings for the `dataset_processor` module
+- Updated unit tests for `DatasetProcessor`
- Added optimal device selection for PyTorch models as default behavior
- Corrected a logging issue with model processing

Version 1.0.1
-------------

-- Updated the project versioning to dynamically use the version in mlcompare/__init__.py
+- Updated the project versioning to dynamically use the version in `mlcompare/__init__.py`
- Modified the package attributes displayed on PyPI, including adding links to the documentation
-- Added the link to the documentation to the library __init__
+- Added a link to the documentation to the library `__init__`
- Created a GitHub action for publishing newly tagged versions to PyPI

Version 1.0.0
