Release 0.6 (#349)

DistrictDataLabs · Mar 19, 2018 · commit 74e116c · 1 parent 29867f5

Showing 247 changed files with 7,603 additions and 1,205 deletions.
diff --git a/.gitignore b/.gitignore
@@ -49,61 +49,68 @@ coverage.xml
 *.mo
 *.pot
 
-# Django stuff:
-*.log
-
 # Sphinx documentation
 docs/_build/
 
 # PyBuilder
 target/
 
-#Ipython Notebook
+# Jupyter Notebook
 .ipynb_checkpoints
 
-# Making sure the team plays well together
-venv*
+# pyenv
+.python-version
+
+# IDE/editor droppings
+*.swp
+*.swo
+
+# OS droppings
 .DS_Store
-spad.py
 
-# Created by https://www.gitignore.io/api/pycharm
+# dotenv
+.env
+
+# virtualenv
+.venv
+venv/
+ENV/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
 
-### PyCharm ###
-# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio and Webstorm
-# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839
+# mkdocs documentation
+/site
 
-# User-specific stuff:
+# mypy
+.mypy_cache/
+
+# PyTest
+.pytest_cache
+
+# PyCharm
 .idea/workspace.xml
 .idea/tasks.xml
 .idea/dictionaries
 .idea/vcs.xml
 .idea/jsLibraryMappings.xml
-
-# Sensitive or high-churn files:
 .idea/dataSources.ids
 .idea/dataSources.xml
 .idea/dataSources.local.xml
 .idea/sqlDataSources.xml
 .idea/dynamic.xml
 .idea/uiDesigner.xml
-
-# Gradle:
 .idea/gradle.xml
 .idea/libraries
-
-# Mongo Explorer plugin:
 .idea/mongoSettings.xml
-
-## File-based project format:
 *.iws
-
-## Plugin-specific files:
-
-# IntelliJ
 /out/
-
-# mpeltonen/sbt-idea plugin
 .idea_modules/
+.idea
 
 # JIRA plugin
 atlassian-ide-plugin.xml
@@ -114,18 +121,5 @@ crashlytics.properties
 crashlytics-build.properties
 fabric.properties
 
-### PyCharm Patch ###
-# Comment Reason: https://github.com/joeblau/gitignore.io/issues/186#issuecomment-215987721
-
-# *.iml
-# modules.xml
-# .idea/misc.xml
-# *.ipr
-
-.idea
-
-# VisualTestCase Outputs
-/tests/actual_images/*
-
-# Data downloaded from Yellowbrick 
+# Data downloaded from Yellowbrick
 data/
diff --git a/.travis.yml b/.travis.yml
@@ -1,14 +1,13 @@
 language: python
 python:
   - '2.7'
-  - '3.5'
   - '3.6'
 
 before_install:
   - sudo apt-get build-dep python-scipy
-  - pip install scipy
-  - pip install nose coverage mock
-  - pip install coveralls requests
+  - pip install -r tests/requirements.txt
+  - python -c 'import nltk; nltk.download("popular");'
+  - pip install coveralls
 
 install: pip install -r requirements.txt
 
@@ -20,7 +19,8 @@ notifications:
   email:
     recipients:
       - bbengfort@districtdatalabs.com
-      - tojeda@districtdatalabs.com
       - rbilbro@districtdatalabs.com
+      - nathan.danielsen@gmail.com
+      - tojeda@districtdatalabs.com
     on_success: change
     on_failure: always
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -8,7 +8,7 @@ For more on the development path, goals, and motivations behind Yellowbrick, che
 
 Yellowbrick is an open source project that is supported by a community who will gratefully and humbly accept any contributions you might make to the project. Large or small, any contribution makes a big difference; and if you've never contributed to an open source project before, we hope you will start with Yellowbrick!
 
-Principally, Yellowbrick development is about the addition and creation of *visualizers* --- objects that learn from data and create a visual representation of the data or model. Visualizers integrate with Scikit-Learn estimators, transformers, and pipelines for specific purposes and as a result, can be simple to build and deploy. The most common contribution is therefore a new visualizer for a specific model or model family. We'll discuss in detail how to build visualizers later.
+Principally, Yellowbrick development is about the addition and creation of *visualizers* --- objects that learn from data and create a visual representation of the data or model. Visualizers integrate with scikit-learn estimators, transformers, and pipelines for specific purposes and as a result, can be simple to build and deploy. The most common contribution is therefore a new visualizer for a specific model or model family. We'll discuss in detail how to build visualizers later.
 
 Beyond creating visualizers, there are many ways to contribute:
 
@@ -74,7 +74,14 @@ Once forked, use the following steps to get your development environment set up
     $ pip install -r requirements.txt
    ```
 
-    Note that there may be other dependencies required for development and testing, you can simply install them with `pip`.
+    Note that there may be other dependencies required for development and testing; you can simply install them with `pip`. For example, to install
+    the additional dependencies for building the documentation or running the
+    test suite, use the `requirements.txt` files in those directories:
+
+    ```
+    $ pip install -r tests/requirements.txt
+    $ pip install -r docs/requirements.txt
+    ```
 
 4. Switch to the develop branch.
 
@@ -124,23 +131,23 @@ Head back to Waffle and checkout another issue!
 
 In this section, we'll discuss the basics of developing visualizers. This, of course, is a big topic, but hopefully these simple tips and tricks will help it make sense.
 
-One thing that is necessary is a good understanding of Scikit-Learn and Matplotlib. Because our API is intended to integrate with Scikit-Learn, a good start is to review ["APIs of Scikit-Learn objects"](http://scikit-learn.org/stable/developers/contributing.html#apis-of-scikit-learn-objects) and ["rolling your own estimator"](http://scikit-learn.org/stable/developers/contributing.html#rolling-your-own-estimator). In terms of matplotlib, check out [Nicolas P. Rougier's Matplotlib tutorial](https://www.labri.fr/perso/nrougier/teaching/matplotlib/).
+A good understanding of scikit-learn and Matplotlib is essential. Because our API is intended to integrate with scikit-learn, a good start is to review ["APIs of scikit-learn objects"](http://scikit-learn.org/stable/developers/contributing.html#apis-of-scikit-learn-objects) and ["rolling your own estimator"](http://scikit-learn.org/stable/developers/contributing.html#rolling-your-own-estimator). In terms of Matplotlib, check out [Nicolas P. Rougier's Matplotlib tutorial](https://www.labri.fr/perso/nrougier/teaching/matplotlib/).
 
 ### Visualizer API
 
 There are two basic types of Visualizers:
 
 - **Feature Visualizers** are high-dimensional data visualizations that are essentially transformers.
-- **Score Visualizers** wrap a Scikit-Learn regressor, classifier, or clusterer and visualize the behavior or performance of the model on test data.
+- **Score Visualizers** wrap a scikit-learn regressor, classifier, or clusterer and visualize the behavior or performance of the model on test data.
 
-These two basic types of visualizers map well to the two basic objects in Scikit-Learn:
+These two basic types of visualizers map well to the two basic objects in scikit-learn:
 
 - **Transformers** take input data and return a new data set.
 - **Estimators** are fit to training data and can make predictions.
 
-The Scikit-Learn API is object oriented, and estimators and transformers are initialized with parameters by instantiating their class. Hyperparameters can also be set using the `set_attrs()` method and retrieved with the corresponding `get_attrs()` method. All Scikit-Learn estimators have a `fit(X, y=None)` method that accepts a two dimensional data array, `X`, and optionally a vector `y` of target values. The `fit()` method trains the estimator, making it ready to transform data or make predictions. Transformers have an associated `transform(X)` method that returns a new dataset, `Xprime` and models have a `predict(X)` method that returns a vector of predictions, `yhat`. Models also have a `score(X, y)` method that evaluate the performance of the model.
+The scikit-learn API is object-oriented, and estimators and transformers are initialized with parameters by instantiating their class. Hyperparameters can also be set using the `set_params()` method and retrieved with the corresponding `get_params()` method. All scikit-learn estimators have a `fit(X, y=None)` method that accepts a two-dimensional data array, `X`, and optionally a vector `y` of target values. The `fit()` method trains the estimator, making it ready to transform data or make predictions. Transformers have an associated `transform(X)` method that returns a new dataset, `Xprime`, and models have a `predict(X)` method that returns a vector of predictions, `yhat`. Models also have a `score(X, y)` method that evaluates the performance of the model.
 
-Visualizers interact with Scikit-Learn objects by intersecting with them at the methods defined above. Specifically, visualizers perform actions related to `fit()`, `transform()`, `predict()`, and `score()` then call a `draw()` method which initializes the underlying figure associated with the visualizer. The user calls the visualizer's `poof()` method, which in turn calls a `finalize()` method on the visualizer to draw legends, titles, etc. and then `poof()` renders the figure. The Visualizer API is therefore:
+Visualizers interact with scikit-learn objects by intersecting with them at the methods defined above. Specifically, visualizers perform actions related to `fit()`, `transform()`, `predict()`, and `score()`, then call a `draw()` method which initializes the underlying figure associated with the visualizer. The user calls the visualizer's `poof()` method, which in turn calls a `finalize()` method on the visualizer to draw legends, titles, etc., and then `poof()` renders the figure. The Visualizer API is therefore:
 
 - `draw()`: add visual elements to the underlying axes object
 - `finalize()`: prepare the figure for rendering, adding final touches such as legends, titles, axis labels, etc.
@@ -172,7 +179,7 @@ class MyVisualizer(Visualizer):
         self.set_title("My Visualizer")
 ```
 
-This simple visualizer simply draws a line graph for some input dataset X, intersecting with the Scikit-Learn API at the `fit()` method. A user would use this visualizer in the typical style::
+This visualizer simply draws a line graph for some input dataset `X`, intersecting with the scikit-learn API at the `fit()` method. A user would use this visualizer in the typical style:
 
 ```python
 visualizer = MyVisualizer()
@@ -184,11 +191,13 @@ Score visualizers work on the same principle but accept an additional required `
 
 ### Testing
 
-The test package mirrors the yellowbrick package in structure and also contains several helper methods and base functionality. To add a test to your visualizer, find the corresponding file to add the test case, or create a new test file in the same place you added your code.
+The test package mirrors the `yellowbrick` package in structure and also contains several helper methods and base functionality. To add a test to your visualizer, find the corresponding file to add the test case, or create a new test file in the same place you added your code.
 
-Visual tests are notoriously difficult to create --- how do you test a visualization or figure? Moreover, testing Scikit-Learn models with real data can consume a lot of memory. Therefore the primary test you should create is simply to test your visualizer from end to end and make sure that no exceptions occur. To assist with this, we have two primary helpers, `VisualTestCase` and `DatasetMixin`. Create your unittest as follows::
+Visual tests are notoriously difficult to create --- how do you test a visualization or figure? Moreover, testing scikit-learn models with real data can consume a lot of memory. Therefore, the primary test you should create is simply to test your visualizer from end to end and make sure that no exceptions occur. To assist with this, we have two primary helpers, `VisualTestCase` and `DatasetMixin`. Create your unit test as follows:
 
 ```python
+import pytest
+
 from tests.base import VisualTestCase
 from tests.dataset import DatasetMixin
 
@@ -212,26 +221,28 @@ class MyVisualizerTests(VisualTestCase, DatasetMixin):
             visualizer.fit(X)
             visualizer.poof()
         except Exception as e:
-            self.fail("my visualizer didn't work")
+            pytest.fail("my visualizer didn't work: {}".format(e))
 ```
 
 The entire test suite can be run as follows:
 
 ```
-$ make test
+$ pytest
 ```
 
 You can also run your own test file as follows:
 
 ```
-$ nosetests tests/test_your_visualizer.py
+$ pytest tests/test_your_visualizer.py
 ```
 
-The Makefile uses the nosetest runner and testing suite as well as the coverage library, so make sure you have those dependencies installed! The `DatasetMixin` also requires requests.py to fetch data from our Amazon S3 account.
+The Makefile uses the pytest runner and testing suite as well as the coverage library, so make sure you have those dependencies installed! The `DatasetMixin` also requires [requests](http://docs.python-requests.org/en/master/) to fetch data from our Amazon S3 account.
+
+**Note**: Advanced developers can use our _image comparison tests_ to assert that a generated image matches a baseline image. Read more about this in our [testing documentation](http://www.scikit-yb.org/en/latest/contributing.html#testing).
 
 ### Documentation
 
-The initial documentation for your visualizer will be a well structured docstring. Yellowbrick uses Sphinx to build documentation, therefore docstrings should be written in reStructuredText in numpydoc format (similar to Scikit-Learn). The primary location of your docstring should be right under the class definition, here is an example::
+The initial documentation for your visualizer will be a well-structured docstring. Yellowbrick uses Sphinx to build documentation, so docstrings should be written in reStructuredText in numpydoc format (similar to scikit-learn). The primary location of your docstring should be right under the class definition; here is an example:
 
 ```python
 class MyVisualizer(Visualizer):
@@ -245,7 +256,7 @@ class MyVisualizer(Visualizer):
     Parameters
     ----------
 
-    model : a Scikit-Learn regressor
+    model : a scikit-learn regressor
         Should be an instance of a regressor, and specifically one whose name
         ends with "CV", otherwise it will raise a YellowbrickTypeError exception
         on instantiation. To use non-CV regressors see:
@@ -273,7 +284,7 @@ class MyVisualizer(Visualizer):
     """
 ```
 
-You should also add your example to the `examples` directory of the documentation when you have the chance, as well as create a demonstration in a notebook in the `examples` directory of the repository.
+This is a very good start to producing a high-quality visualizer, but unless it is part of the documentation on our website, it will not be visible. For details on including documentation in the `docs` directory see the [Contributing Documentation](http://www.scikit-yb.org/en/latest/contributing.html#documentation) section in the larger contributing guide.
 
 ## Throughput
 
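The scikit-learn estimator contract described in the CONTRIBUTING.md hunks above (hyperparameters set at instantiation, `fit()`, `predict()`, `score()`) can be illustrated with a small, dependency-free sketch. `ConstantRegressor` is a hypothetical toy, not part of Yellowbrick or scikit-learn. scikit-learn itself names the hyperparameter accessors `get_params()` and `set_params()`; real estimators would normally inherit them from `sklearn.base.BaseEstimator` rather than hand-writing them as done here.

```python
# Dependency-free sketch of the scikit-learn estimator contract.
# ASSUMPTION: ConstantRegressor is a hypothetical toy model; real estimators
# usually inherit get_params()/set_params() from sklearn.base.BaseEstimator.


class ConstantRegressor(object):
    """Predicts one constant: the mean or median of the training targets."""

    def __init__(self, strategy="mean"):
        self.strategy = strategy  # hyperparameter set at instantiation

    def get_params(self, deep=True):
        return {"strategy": self.strategy}

    def set_params(self, **params):
        for name, value in params.items():
            if name not in self.get_params():
                raise ValueError("invalid parameter: {}".format(name))
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        # X is unused by this toy model; only the targets y are learned from.
        values = sorted(y)
        n = len(values)
        if self.strategy == "mean":
            self.constant_ = sum(values) / float(n)
        elif self.strategy == "median":
            mid = n // 2
            if n % 2:
                self.constant_ = float(values[mid])
            else:
                self.constant_ = (values[mid - 1] + values[mid]) / 2.0
        else:
            raise ValueError("unknown strategy: {}".format(self.strategy))
        return self  # fit() returns the fitted estimator

    def predict(self, X):
        # yhat: one (constant) prediction per row of X.
        return [self.constant_ for _ in X]

    def score(self, X, y):
        # Coefficient of determination (R^2), as used by scikit-learn regressors.
        yhat = self.predict(X)
        mean_y = sum(y) / float(len(y))
        ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, yhat))
        ss_tot = sum((yi - mean_y) ** 2 for yi in y)
        return 1.0 - ss_res / ss_tot
```

Returning `self` from `fit()` and `set_params()` mirrors scikit-learn and is what makes chained calls such as `ConstantRegressor().fit(X, y).predict(X)` possible.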

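The Visualizer API from the same file (`fit()` leading to `draw()`, and `poof()` calling `finalize()` before rendering) can be sketched the same way. `SketchVisualizer` and `MeanLineVisualizer` are hypothetical classes for illustration only: instead of drawing on Matplotlib axes, they append a description of each step to a `calls` list so the call order can be inspected and tested.

```python
# Dependency-free sketch of the Visualizer lifecycle.
# ASSUMPTION: SketchVisualizer and MeanLineVisualizer are hypothetical
# illustration classes, not Yellowbrick's API; they log each step to
# self.calls instead of drawing with Matplotlib.


class SketchVisualizer(object):
    """Stand-in base class: fit() -> draw(), poof() -> finalize() -> render."""

    def __init__(self, title=None):
        self.title = title
        self.calls = []  # ordered log of the lifecycle hooks that ran

    def fit(self, X, y=None):
        # Subclasses learn from X first, then this base method draws.
        self.draw()
        return self  # returning self allows visualizer.fit(X).poof()

    def draw(self):
        raise NotImplementedError("subclasses add visual elements here")

    def finalize(self):
        # Final touches such as titles, legends, and axis labels.
        self.calls.append("finalize: title={!r}".format(self.title))

    def poof(self):
        # poof() lets the visualizer finalize, then "renders" the figure.
        self.finalize()
        self.calls.append("render")


class MeanLineVisualizer(SketchVisualizer):
    """Hypothetical feature visualizer that 'plots' each column's mean."""

    def fit(self, X, y=None):
        n_rows = float(len(X))
        # zip(*X) walks the columns of a two-dimensional list of rows.
        self.means_ = [sum(column) / n_rows for column in zip(*X)]
        return super(MeanLineVisualizer, self).fit(X, y)

    def draw(self):
        self.calls.append("draw: line through {}".format(self.means_))

    def finalize(self):
        if self.title is None:
            self.title = "Mean Line"  # default title when none was given
        super(MeanLineVisualizer, self).finalize()


if __name__ == "__main__":
    visualizer = MeanLineVisualizer()
    visualizer.fit([[1, 2], [3, 4]]).poof()
    print(visualizer.calls)
```

A real visualizer would add elements to its Matplotlib axes in `draw()` and display the figure in `poof()`; the recorded strings only stand in for those side effects.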
diff --git a/DESCRIPTION.rst b/DESCRIPTION.rst
@@ -13,7 +13,7 @@ Yellowbrick is a suite of visual analysis and diagnostic tools designed to facil
 
 Visualizers allow users to steer the model selection process, building intuition around feature engineering, algorithm selection, and hyperparameter tuning. For example, visualizers can help diagnose common problems surrounding model complexity and bias, heteroscedasticity, underfit and overtraining, or class balance issues. By applying visualizers to the model selection workflow, Yellowbrick allows you to steer predictive models to more successful results, faster.
 
-Please see the full documentation at: http://scikit-yb.org/
+Please see the full documentation at http://scikit-yb.org/, particularly the `quick start guide <http://www.scikit-yb.org/en/latest/quickstart.html>`_.
 
 Visualizers
 -----------
@@ -30,12 +30,14 @@ Feature Visualization
 - **Parallel Coordinates**: horizontal visualization of instances
 - **Radial Visualization**: separation of instances around a circular plot
 - **PCA Projection**: projection of instances based on principal components
+- **Feature Importances**: rank features based on their in-model performance
 - **Scatter and Joint Plots**: direct data visualization with feature selection
 
 Classification Visualization
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 - **Class Balance**: see how the distribution of classes affects the model
+- **Class Prediction Error**: shows error and support in classification
 - **Classification Report**: visual representation of precision, recall, and F1
 - **ROC/AUC Curves**: receiver operating characteristic and area under the curve
 - **Confusion Matrices**: visual description of class decision making
@@ -61,5 +63,5 @@ Text Visualization
 
 ... and more! Visualizers are being added all the time; be sure to check the examples_ (or even the develop_ branch) and feel free to contribute your ideas for new Visualizers!
 
-.. _examples: http://www.scikit-yb.org/en/latest/examples/examples.html
+.. _examples: http://www.scikit-yb.org/en/latest/api/index.html
 .. _develop: https://github.com/districtdatalabs/yellowbrick/tree/develop
diff --git a/MAINTAINERS.md b/MAINTAINERS.md
@@ -13,9 +13,9 @@ For everyone who has [contributed](https://github.com/DistrictDataLabs/yellowbri
 This is a list of the primary project maintainers. Feel free to @ message them in issues and converse with them directly.
 
 - [bbengfort](https://github.com/bbengfort)
+- [ndanielsen](https://github.com/ndanielsen)
 - [NealHumphrey](https://github.com/NealHumphrey)
 - [jkeung](https://github.com/jkeung)
-- [ndanielsen](https://github.com/ndanielsen)
 
 ## Core Contributors
 
@@ -27,3 +27,4 @@ This is a list of the core-contributors of the project. Core contributors set th
 - [tuulihill](https://github.com/tuulihill)
 - [balavenkatesan](https://github.com/balavenkatesan)
 - [morganmendis](https://github.com/morganmendis)
+- [lwgray](https://github.com/lwgray)
diff --git a/Makefile b/Makefile
@@ -4,11 +4,9 @@ SHELL := /bin/bash
 # Set important Paths
 PROJECT := yellowbrick
 LOCALPATH := $(CURDIR)/$(PROJECT)
-PYTHONPATH := $(LOCALPATH)/
-PYTHON_BIN := $(VIRTUAL_ENV)/bin
 
 # Export targets not associated with files
-.PHONY: test coverage pip virtualenv clean publish uml build deploy
+.PHONY: test coverage pip clean publish uml build deploy install
 
 # Clean build files
 clean:
@@ -19,14 +17,15 @@ clean:
 	-rm -rf build
 	-rm -rf dist
 	-rm -rf $(PROJECT).egg-info
+	-rm -rf .eggs
 	-rm -rf site
 	-rm -rf classes_$(PROJECT).png
 	-rm -rf packages_$(PROJECT).png
 	-rm -rf docs/_build
 
 # Targets for testing
 test:
-	$(PYTHON_BIN)/nosetests -v --with-coverage --cover-package=$(PROJECT) --cover-inclusive --cover-erase tests
+	python setup.py test
 
 # Publish to gh-pages
 publish:
@@ -40,6 +39,10 @@ uml:
 build:
 	python setup.py sdist bdist_wheel
 
+# Install the package from source
+install:
+	python setup.py install
+
 # Deploy to PyPI
 deploy:
 	python setup.py register