Outdpik is an open-source Python package that provides several methods for outlier detection. It aims to be the fundamental high-level package for this purpose, and it also offers visualization methods for outlier analysis.
Here are just a few of the things that outdpik does well:
- Supports NumPy arrays and pandas DataFrames
- Multiple outlier detection techniques that can be combined
- Powerful visualizations
- Flexible selection of one or more columns for the analysis
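To illustrate what combining detection techniques can mean, here is a minimal NumPy sketch that is independent of outdpik. It flags values with two common rules, Tukey's IQR fences and the z-score, and combines the two masks by intersection or union. The function names, default thresholds (`k=1.5`, `threshold=3.0`), and the `"and"`/`"or"` modes are assumptions for illustration only, not outdpik's API.

```python
import numpy as np


def iqr_outliers(x: np.ndarray, k: float = 1.5) -> np.ndarray:
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


def zscore_outliers(x: np.ndarray, threshold: float = 3.0) -> np.ndarray:
    """Boolean mask of values whose absolute z-score exceeds the threshold."""
    std = x.std()
    if std == 0:
        # A constant column has no z-score outliers.
        return np.zeros(x.shape, dtype=bool)
    return np.abs((x - x.mean()) / std) > threshold


def combined_outliers(x, mode: str = "and") -> np.ndarray:
    """Hypothetical helper: 'and' keeps values both rules flag, 'or' keeps either."""
    x = np.asarray(x, dtype=float)
    by_iqr, by_z = iqr_outliers(x), zscore_outliers(x)
    if mode == "and":
        return by_iqr & by_z
    if mode == "or":
        return by_iqr | by_z
    raise ValueError("mode must be 'and' or 'or'")


data = [10, 11, 12, 9, 10, 11, 10, 12, 9, 11, 10, 10, 11, 12, 9, 10, 11, 10, 9, 60]
print(np.flatnonzero(combined_outliers(data)))  # indices of rows both rules flag
```

Intersection ("and") is the conservative choice, since a value must be suspicious under both rules; union ("or") trades more false positives for fewer missed outliers.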
The source code is currently hosted on GitHub at: https://github.com/DanielPuentee/outdpik
The latest released version can be installed from the Python Package Index (PyPI):

```sh
# PyPI
pip install outdpik
```
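Examples of configuring and running outdpik:

```python
# Assumption: the package exposes an `outdpik` class; check the official documentation.
from outdpik import outdpik

outdp = outdpik()
```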
Next, detect the outliers. The result is a dictionary that maps each numeric feature to its outlier instances:

```python
# df is the pandas DataFrame (or NumPy array) to analyze
outliers_dict = outdp.outliers(df=df, cols="all")
```
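The detection rule behind `outdp.outliers` is not described here. As a rough illustration of the returned structure only, the following sketch builds a comparable dictionary with pandas alone, assuming Tukey's IQR fences as the rule. `outliers_by_column` is a hypothetical helper, not part of outdpik, and the sample DataFrame is made up for the example.

```python
import pandas as pd


def outliers_by_column(df: pd.DataFrame, cols="all", k: float = 1.5) -> dict:
    """Hypothetical helper: map each numeric column to the rows outside Tukey's fences."""
    numeric = df.select_dtypes(include="number")  # non-numeric columns are skipped
    if isinstance(cols, str) and cols == "all":
        selected = numeric
    else:
        selected = numeric[[cols] if isinstance(cols, str) else list(cols)]

    result = {}
    for name, series in selected.items():
        q1, q3 = series.quantile([0.25, 0.75])
        iqr = q3 - q1
        mask = (series < q1 - k * iqr) | (series > q3 + k * iqr)
        # Keep the original index so flagged rows can be located in df.
        result[name] = series[mask]
    return result


df = pd.DataFrame({
    "x": [1, 2, 3, 2, 1, 2, 40],
    "y": [5.0, 5.5, 6.0, 5.5, 5.0, 6.0, 5.5],
    "label": list("abcdefg"),
})
print(outliers_by_column(df))
```

Returning the flagged values as a `Series` (rather than bare numbers) keeps each outlier tied to its row label, which makes it easy to inspect or drop those rows later.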
Plotting the outliers of a single column:

```python
outdp.plot_outliers(df=df, col="x")
```
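The exact chart that `plot_outliers` draws is not specified above. A minimal matplotlib sketch of a similar idea, assuming a boolean mask that marks the outlier rows is already available, scatters the column against its row position and highlights the flagged points in red. `plot_column_outliers` is a hypothetical helper, not outdpik's implementation.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np


def plot_column_outliers(values, mask, name="x"):
    """Hypothetical helper: scatter a column, drawing flagged rows in red."""
    values = np.asarray(values, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    rows = np.arange(values.size)

    fig, ax = plt.subplots()
    ax.scatter(rows[~mask], values[~mask], label="inliers")
    ax.scatter(rows[mask], values[mask], color="red", label="outliers")
    ax.set_xlabel("row")
    ax.set_ylabel(name)
    ax.legend()
    return fig, ax


fig, ax = plot_column_outliers([1, 2, 3, 40], [False, False, False, True])
fig.savefig("outliers_x.png")  # or plt.show() with an interactive backend
plt.close(fig)
```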
outdpik depends on the following packages:
- pandas - Provides fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive
- NumPy - Adds support for large, multi-dimensional arrays, matrices and high-level mathematical functions to operate on these arrays
- SciPy - Includes modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ODE solvers, and more
- matplotlib - Comprehensive library for creating static, animated, and interactive visualizations in Python
- seaborn - Provides a high-level interface for drawing attractive statistical graphics
This project is licensed under the terms of a GNU license; see the LICENSE file for details.
The official documentation is hosted on: https://outdpik.readthedocs.io/en/latest/
Want to contribute? Great! Open a discussion in this repository on GitHub and we will answer as soon as possible.