Release 0.6 (#349)
bbengfort authored Mar 19, 2018
1 parent 29867f5 commit 74e116c
Showing 247 changed files with 7,603 additions and 1,205 deletions.
74 changes: 34 additions & 40 deletions .gitignore
@@ -49,61 +49,68 @@ coverage.xml
*.mo
*.pot

# Django stuff:
*.log

# Sphinx documentation
docs/_build/

# PyBuilder
target/

#Ipython Notebook
# Jupyter Notebook
.ipynb_checkpoints

# Making sure the team plays well together
venv*
# pyenv
.python-version

# IDE/editor droppings
*.swp
*.swo

# OS droppings
.DS_Store
spad.py

# Created by https://www.gitignore.io/api/pycharm
# dotenv
.env

# virtualenv
.venv
venv/
ENV/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

### PyCharm ###
# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio and Webstorm
# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839
# mkdocs documentation
/site

# User-specific stuff:
# mypy
.mypy_cache/

# PyTest
.pytest_cache

# PyCharm
.idea/workspace.xml
.idea/tasks.xml
.idea/dictionaries
.idea/vcs.xml
.idea/jsLibraryMappings.xml

# Sensitive or high-churn files:
.idea/dataSources.ids
.idea/dataSources.xml
.idea/dataSources.local.xml
.idea/sqlDataSources.xml
.idea/dynamic.xml
.idea/uiDesigner.xml

# Gradle:
.idea/gradle.xml
.idea/libraries

# Mongo Explorer plugin:
.idea/mongoSettings.xml

## File-based project format:
*.iws

## Plugin-specific files:

# IntelliJ
/out/

# mpeltonen/sbt-idea plugin
.idea_modules/
.idea

# JIRA plugin
atlassian-ide-plugin.xml
@@ -114,18 +121,5 @@ crashlytics.properties
crashlytics-build.properties
fabric.properties

### PyCharm Patch ###
# Comment Reason: https://github.com/joeblau/gitignore.io/issues/186#issuecomment-215987721

# *.iml
# modules.xml
# .idea/misc.xml
# *.ipr

.idea

# VisualTestCase Outputs
/tests/actual_images/*

# Data downloaded from Yellowbrick
data/
10 changes: 5 additions & 5 deletions .travis.yml
@@ -1,14 +1,13 @@
language: python
python:
- '2.7'
- '3.5'
- '3.6'

before_install:
- sudo apt-get build-dep python-scipy
- pip install scipy
- pip install nose coverage mock
- pip install coveralls requests
- pip install -r tests/requirements.txt
- python -c 'import nltk; nltk.download("popular");'
- pip install coveralls

install: pip install -r requirements.txt

@@ -20,7 +19,8 @@ notifications:
email:
recipients:
- bbengfort@districtdatalabs.com
- tojeda@districtdatalabs.com
- rbilbro@districtdatalabs.com
- nathan.danielsen@gmail.com
- tojeda@districtdatalabs.com
on_success: change
on_failure: always
45 changes: 28 additions & 17 deletions CONTRIBUTING.md
@@ -8,7 +8,7 @@ For more on the development path, goals, and motivations behind Yellowbrick, che

Yellowbrick is an open source project that is supported by a community who will gratefully and humbly accept any contributions you might make to the project. Large or small, any contribution makes a big difference; and if you've never contributed to an open source project before, we hope you will start with Yellowbrick!

Principally, Yellowbrick development is about the addition and creation of *visualizers* --- objects that learn from data and create a visual representation of the data or model. Visualizers integrate with Scikit-Learn estimators, transformers, and pipelines for specific purposes and as a result, can be simple to build and deploy. The most common contribution is therefore a new visualizer for a specific model or model family. We'll discuss in detail how to build visualizers later.
Principally, Yellowbrick development is about the addition and creation of *visualizers* --- objects that learn from data and create a visual representation of the data or model. Visualizers integrate with scikit-learn estimators, transformers, and pipelines for specific purposes and as a result, can be simple to build and deploy. The most common contribution is therefore a new visualizer for a specific model or model family. We'll discuss in detail how to build visualizers later.

Beyond creating visualizers, there are many ways to contribute:

@@ -74,7 +74,14 @@ Once forked, use the following steps to get your development environment set up
$ pip install -r requirements.txt
```

Note that there may be other dependencies required for development and testing, you can simply install them with `pip`.
Note that there may be other dependencies required for development and testing; you can install them with `pip`. For example, to install
the additional dependencies for building the documentation or running the
test suite, use the `requirements.txt` files in those directories:

```
$ pip install -r tests/requirements.txt
$ pip install -r docs/requirements.txt
```
4. Switch to the develop branch.
@@ -124,23 +131,23 @@ Head back to Waffle and checkout another issue!
In this section, we'll discuss the basics of developing visualizers. This is, of course, a big topic, but hopefully these simple tips and tricks will help it make sense.
One thing that is necessary is a good understanding of Scikit-Learn and Matplotlib. Because our API is intended to integrate with Scikit-Learn, a good start is to review ["APIs of Scikit-Learn objects"](http://scikit-learn.org/stable/developers/contributing.html#apis-of-scikit-learn-objects) and ["rolling your own estimator"](http://scikit-learn.org/stable/developers/contributing.html#rolling-your-own-estimator). In terms of matplotlib, check out [Nicolas P. Rougier's Matplotlib tutorial](https://www.labri.fr/perso/nrougier/teaching/matplotlib/).
One thing that is necessary is a good understanding of scikit-learn and Matplotlib. Because our API is intended to integrate with scikit-learn, a good start is to review ["APIs of scikit-learn objects"](http://scikit-learn.org/stable/developers/contributing.html#apis-of-scikit-learn-objects) and ["rolling your own estimator"](http://scikit-learn.org/stable/developers/contributing.html#rolling-your-own-estimator). In terms of matplotlib, check out [Nicolas P. Rougier's Matplotlib tutorial](https://www.labri.fr/perso/nrougier/teaching/matplotlib/).
### Visualizer API
There are two basic types of Visualizers:
- **Feature Visualizers** are high-dimensional data visualizations that are essentially transformers.
- **Score Visualizers** wrap a Scikit-Learn regressor, classifier, or clusterer and visualize the behavior or performance of the model on test data.
- **Score Visualizers** wrap a scikit-learn regressor, classifier, or clusterer and visualize the behavior or performance of the model on test data.
These two basic types of visualizers map well to the two basic objects in Scikit-Learn:
These two basic types of visualizers map well to the two basic objects in scikit-learn:
- **Transformers** take input data and return a new data set.
- **Estimators** are fit to training data and can make predictions.
The Scikit-Learn API is object oriented, and estimators and transformers are initialized with parameters by instantiating their class. Hyperparameters can also be set using the `set_attrs()` method and retrieved with the corresponding `get_attrs()` method. All Scikit-Learn estimators have a `fit(X, y=None)` method that accepts a two dimensional data array, `X`, and optionally a vector `y` of target values. The `fit()` method trains the estimator, making it ready to transform data or make predictions. Transformers have an associated `transform(X)` method that returns a new dataset, `Xprime` and models have a `predict(X)` method that returns a vector of predictions, `yhat`. Models also have a `score(X, y)` method that evaluate the performance of the model.
The scikit-learn API is object oriented, and estimators and transformers are initialized with parameters by instantiating their class. Hyperparameters can also be set using the `set_params()` method and retrieved with the corresponding `get_params()` method. All scikit-learn estimators have a `fit(X, y=None)` method that accepts a two-dimensional data array, `X`, and optionally a vector `y` of target values. The `fit()` method trains the estimator, making it ready to transform data or make predictions. Transformers have an associated `transform(X)` method that returns a new dataset, `Xprime`, and models have a `predict(X)` method that returns a vector of predictions, `yhat`. Models also have a `score(X, y)` method that evaluates the performance of the model.
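For readers new to these conventions, here is a minimal, dependency-free sketch of a transformer that follows them. `MeanCenterer` is a hypothetical class, not part of Yellowbrick or scikit-learn; its hand-written `get_params()` and `set_params()` only imitate the accessors that scikit-learn normally provides through `sklearn.base.BaseEstimator`, and plain lists stand in for NumPy arrays.

```python
class MeanCenterer:
    """Hypothetical transformer following scikit-learn conventions (sketch only)."""

    def __init__(self, with_mean=True):
        # Hyperparameters are stored unchanged as attributes in __init__.
        self.with_mean = with_mean

    def get_params(self, deep=True):
        # Imitates BaseEstimator.get_params: map hyperparameter names to values.
        return {"with_mean": self.with_mean}

    def set_params(self, **params):
        # Imitates BaseEstimator.set_params: reject unknown names, return self.
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError("invalid parameter {!r}".format(name))
            setattr(self, name, value)
        return self

    def fit(self, X, y=None):
        # Learn per-column means from a 2D list of rows; learned attributes
        # end with an underscore by convention. y is accepted but ignored.
        n_cols = len(X[0])
        self.means_ = [sum(row[j] for row in X) / len(X) for j in range(n_cols)]
        return self

    def transform(self, X):
        # Return a new dataset (Xprime) instead of modifying X in place.
        if not self.with_mean:
            return [list(row) for row in X]
        return [[v - m for v, m in zip(row, self.means_)] for row in X]
```

Because `fit()` returns `self`, calls chain as in `MeanCenterer().fit(X).transform(X)`; for `X = [[1.0, 2.0], [3.0, 6.0]]` that yields `[[-1.0, -2.0], [1.0, 2.0]]`.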
Visualizers interact with Scikit-Learn objects by intersecting with them at the methods defined above. Specifically, visualizers perform actions related to `fit()`, `transform()`, `predict()`, and `score()` then call a `draw()` method which initializes the underlying figure associated with the visualizer. The user calls the visualizer's `poof()` method, which in turn calls a `finalize()` method on the visualizer to draw legends, titles, etc. and then `poof()` renders the figure. The Visualizer API is therefore:
Visualizers interact with scikit-learn objects by intersecting with them at the methods defined above. Specifically, visualizers perform actions related to `fit()`, `transform()`, `predict()`, and `score()`, then call a `draw()` method, which initializes the underlying figure associated with the visualizer. The user calls the visualizer's `poof()` method, which in turn calls a `finalize()` method on the visualizer to draw legends, titles, etc., and then `poof()` renders the figure. The Visualizer API is therefore:
- `draw()`: add visual elements to the underlying axes object
- `finalize()`: prepare the figure for rendering, adding final touches such as legends, titles, axis labels, etc.
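The call order described above (`fit()` triggers `draw()`; `poof()` triggers `finalize()` and then renders) can be sketched without Matplotlib. In this hypothetical example, `StubAxes` only records calls in place of a real Matplotlib `Axes`, `LineVisualizerSketch` is not Yellowbrick's `Visualizer` base class, and rendering is reduced to setting a flag.

```python
class StubAxes:
    """Hypothetical stand-in for a Matplotlib Axes that records what was drawn."""

    def __init__(self):
        self.calls = []

    def plot(self, *args, **kwargs):
        self.calls.append(("plot", args))

    def set_title(self, title):
        self.calls.append(("set_title", title))

    def legend(self):
        self.calls.append(("legend",))


class LineVisualizerSketch:
    """Hypothetical visualizer showing the fit -> draw and poof -> finalize flow."""

    def __init__(self, ax=None):
        self.ax = ax if ax is not None else StubAxes()
        self.rendered = False

    def fit(self, X, y=None):
        # Intersect with the scikit-learn API: learn from the data, then draw.
        self.draw(X)
        return self

    def draw(self, X):
        # Add data-dependent visual elements to the underlying axes object.
        for row in X:
            self.ax.plot(row)
        return self.ax

    def finalize(self):
        # Final touches that do not depend on the data: title, legend, labels.
        self.ax.set_title("My Visualizer")
        self.ax.legend()

    def poof(self):
        # Finalize first, then "render"; a real visualizer would show or save
        # the figure here, while this sketch only sets a flag.
        self.finalize()
        self.rendered = True
```

After `LineVisualizerSketch().fit([[1, 2, 3], [4, 5, 6]])` the stub axes holds two `plot` calls; calling `poof()` then appends `set_title` and `legend`, keeping data-dependent drawing separate from the final touches.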
@@ -172,7 +179,7 @@ class MyVisualizer(Visualizer):
self.set_title("My Visualizer")
```

This simple visualizer simply draws a line graph for some input dataset X, intersecting with the Scikit-Learn API at the `fit()` method. A user would use this visualizer in the typical style::
This simple visualizer draws a line graph for some input dataset X, intersecting with the scikit-learn API at the `fit()` method. A user would use this visualizer in the typical style:

```python
visualizer = MyVisualizer()
@@ -184,11 +191,13 @@ Score visualizers work on the same principle but accept an additional required `

### Testing

The test package mirrors the yellowbrick package in structure and also contains several helper methods and base functionality. To add a test to your visualizer, find the corresponding file to add the test case, or create a new test file in the same place you added your code.
The test package mirrors the `yellowbrick` package in structure and also contains several helper methods and base functionality. To add a test to your visualizer, find the corresponding file to add the test case, or create a new test file in the same place you added your code.

Visual tests are notoriously difficult to create --- how do you test a visualization or figure? Moreover, testing Scikit-Learn models with real data can consume a lot of memory. Therefore the primary test you should create is simply to test your visualizer from end to end and make sure that no exceptions occur. To assist with this, we have two primary helpers, `VisualTestCase` and `DatasetMixin`. Create your unittest as follows::
Visual tests are notoriously difficult to create --- how do you test a visualization or figure? Moreover, testing scikit-learn models with real data can consume a lot of memory. Therefore, the primary test you should create is simply to test your visualizer from end to end and make sure that no exceptions occur. To assist with this, we have two primary helpers, `VisualTestCase` and `DatasetMixin`. Create your unit test as follows:

```python
import pytest

from tests.base import VisualTestCase
from tests.dataset import DatasetMixin

@@ -212,26 +221,28 @@ class MyVisualizerTests(VisualTestCase, DatasetMixin):
visualizer.fit(X)
visualizer.poof()
except Exception as e:
self.fail("my visualizer didn't work")
pytest.fail("my visualizer didn't work")
```
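The end-to-end smoke-test pattern in this hunk (run `fit()` and `poof()`, and fail the test if anything raises) can be tried in isolation with only the standard library. In the sketch below, `ToyVisualizer` is a hypothetical stand-in for the visualizer under test, not a Yellowbrick class, and `unittest`'s `self.fail` plays the role that `pytest.fail` plays in the snippet above. Since pytest also collects `unittest.TestCase` subclasses, running `pytest` on a file containing this class would pick it up as well.

```python
import unittest


class ToyVisualizer:
    """Hypothetical stand-in for the visualizer under test."""

    def fit(self, X, y=None):
        if not X:
            raise ValueError("X must contain at least one instance")
        self.n_instances_ = len(X)
        return self

    def poof(self):
        # A real visualizer would finalize and render a figure here.
        return self


class ToyVisualizerSmokeTests(unittest.TestCase):

    def test_end_to_end(self):
        # Smoke test: fitting and poofing must complete without raising.
        try:
            ToyVisualizer().fit([[1, 2], [3, 4]]).poof()
        except Exception as e:
            self.fail("visualizer raised {!r}".format(e))

    def test_empty_input_is_rejected(self):
        # Bad input should fail loudly rather than draw an empty figure.
        with self.assertRaises(ValueError):
            ToyVisualizer().fit([])
```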

The entire test suite can be run as follows:

```
$ make test
$ pytest
```

You can also run your own test file as follows:

```
$ nosetests tests/test_your_visualizer.py
$ pytest tests/test_your_visualizer.py
```

The Makefile uses the nosetest runner and testing suite as well as the coverage library, so make sure you have those dependencies installed! The `DatasetMixin` also requires requests.py to fetch data from our Amazon S3 account.
The Makefile uses the pytest runner and testing suite as well as the coverage library, so make sure you have those dependencies installed! The `DatasetMixin` also requires the [requests](http://docs.python-requests.org/en/master/) library to fetch data from our Amazon S3 account.

**Note**: Advanced developers can use our _image comparison tests_ to assert that a generated image matches a baseline image. Read more about this in our [testing documentation](http://www.scikit-yb.org/en/latest/contributing.html#testing).

### Documentation

The initial documentation for your visualizer will be a well structured docstring. Yellowbrick uses Sphinx to build documentation, therefore docstrings should be written in reStructuredText in numpydoc format (similar to Scikit-Learn). The primary location of your docstring should be right under the class definition, here is an example::
The initial documentation for your visualizer will be a well-structured docstring. Yellowbrick uses Sphinx to build documentation, so docstrings should be written in reStructuredText in numpydoc format (similar to scikit-learn). The primary location of your docstring should be right under the class definition. Here is an example:

```python
class MyVisualizer(Visualizer):
@@ -245,7 +256,7 @@ class MyVisualizer(Visualizer):
Parameters
----------
model : a Scikit-Learn regressor
model : a scikit-learn regressor
Should be an instance of a regressor, and specifically one whose name
ends with "CV"; otherwise a YellowbrickTypeError exception will be raised
on instantiation. To use non-CV regressors see:
@@ -273,7 +284,7 @@ class MyVisualizer(Visualizer):
"""
```

You should also add your example to the `examples` directory of the documentation when you have the chance, as well as create a demonstration in a notebook in the `examples` directory of the repository.
This is a very good start to producing a high-quality visualizer, but unless it is part of the documentation on our website, it will not be visible. For details on including documentation in the `docs` directory, see the [Contributing Documentation](http://www.scikit-yb.org/en/latest/contributing.html#documentation) section in the larger contributing guide.

## Throughput

6 changes: 4 additions & 2 deletions DESCRIPTION.rst
@@ -13,7 +13,7 @@ Yellowbrick is a suite of visual analysis and diagnostic tools designed to facil

Visualizers allow users to steer the model selection process, building intuition around feature engineering, algorithm selection, and hyperparameter tuning. For example, visualizers can help diagnose common problems surrounding model complexity and bias, heteroscedasticity, underfit and overtraining, or class balance issues. By applying visualizers to the model selection workflow, Yellowbrick allows you to steer predictive models to more successful results, faster.

Please see the full documentation at: http://scikit-yb.org/
Please see the full documentation at http://scikit-yb.org/, particularly the `quick start guide <http://www.scikit-yb.org/en/latest/quickstart.html>`_.

Visualizers
-----------
@@ -30,12 +30,14 @@ Feature Visualization
- **Parallel Coordinates**: horizontal visualization of instances
- **Radial Visualization**: separation of instances around a circular plot
- **PCA Projection**: projection of instances based on principal components
- **Feature Importances**: rank features based on their in-model performance
- **Scatter and Joint Plots**: direct data visualization with feature selection

Classification Visualization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- **Class Balance**: see how the distribution of classes affects the model
- **Class Prediction Error**: shows error and support in classification
- **Classification Report**: visual representation of precision, recall, and F1
- **ROC/AUC Curves**: receiver operating characteristics and area under the curve
- **Confusion Matrices**: visual description of class decision making
@@ -61,5 +63,5 @@ Text Visualization

... and more! Visualizers are being added all the time; be sure to check the examples_ (or even the develop_ branch) and feel free to contribute your ideas for new Visualizers!

.. _examples: http://www.scikit-yb.org/en/latest/examples/examples.html
.. _examples: http://www.scikit-yb.org/en/latest/api/index.html
.. _develop: https://github.com/districtdatalabs/yellowbrick/tree/develop
3 changes: 2 additions & 1 deletion MAINTAINERS.md
@@ -13,9 +13,9 @@ For everyone who has [contributed](https://github.com/DistrictDataLabs/yellowbri
This is a list of the primary project maintainers. Feel free to @ message them in issues and converse with them directly.

- [bbengfort](https://github.com/bbengfort)
- [ndanielsen](https://github.com/ndanielsen)
- [NealHumphrey](https://github.com/NealHumphrey)
- [jkeung](https://github.com/jkeung)
- [ndanielsen](https://github.com/ndanielsen)

## Core Contributors

@@ -27,3 +27,4 @@ This is a list of the core-contributors of the project. Core contributors set th
- [tuulihill](https://github.com/tuulihill)
- [balavenkatesan](https://github.com/balavenkatesan)
- [morganmendis](https://github.com/morganmendis)
- [lwgray](https://github.com/lwgray)
11 changes: 7 additions & 4 deletions Makefile
@@ -4,11 +4,9 @@ SHELL := /bin/bash
# Set important Paths
PROJECT := yellowbrick
LOCALPATH := $(CURDIR)/$(PROJECT)
PYTHONPATH := $(LOCALPATH)/
PYTHON_BIN := $(VIRTUAL_ENV)/bin

# Export targets not associated with files
.PHONY: test coverage pip virtualenv clean publish uml build deploy
.PHONY: test coverage pip clean publish uml build deploy install

# Clean build files
clean:
@@ -19,14 +17,15 @@ clean:
-rm -rf build
-rm -rf dist
-rm -rf $(PROJECT).egg-info
-rm -rf .eggs
-rm -rf site
-rm -rf classes_$(PROJECT).png
-rm -rf packages_$(PROJECT).png
-rm -rf docs/_build

# Targets for testing
test:
$(PYTHON_BIN)/nosetests -v --with-coverage --cover-package=$(PROJECT) --cover-inclusive --cover-erase tests
python setup.py test

# Publish to gh-pages
publish:
@@ -40,6 +39,10 @@ uml:
build:
python setup.py sdist bdist_wheel

# Install the package from source
install:
python setup.py install

# Deploy to PyPI
deploy:
python setup.py register