Merge pull request #556 from ironmussa/develop
Road to 2.2.6
FavioVazquez authored Jun 11, 2019
2 parents 46856fb + e44ac44 commit aa8ae1f
Showing 54 changed files with 5,783 additions and 2,293 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -47,3 +47,5 @@ handyspark \.ipynb
handyspark1\.ipynb

examples/new-api-sandbox\.ipynb

examples/new-api-optimus-jars\.ipynb
29 changes: 0 additions & 29 deletions .pypirc

This file was deleted.

1 change: 0 additions & 1 deletion .python-version

This file was deleted.

43 changes: 0 additions & 43 deletions Dockerfile

This file was deleted.

1 change: 1 addition & 0 deletions MANIFEST.in
@@ -1,3 +1,4 @@
include optimus/templates/*
include optimus/profiler/templates/*
include optimus/css/*
include optimus/jars/*
33 changes: 26 additions & 7 deletions README.md
@@ -1,3 +1,4 @@

[![Logo Optimus](images/logoOptimus.png)](https://hioptimus.com)


@@ -225,21 +226,25 @@ Let's load a "big" dataset
df = op.load.csv("https://raw.githubusercontent.com/ironmussa/Optimus/master/examples/data/Meteorite_Landings.csv").h_repartition()
```

### Numeric

```python
op.profiler.run(df, "name", infer=False)
op.profiler.run(df, "mass (g)", infer=False)
```

![](images/profiler_numeric.png)

```python
op.profiler.run(df, "name", infer=False)
```

![](images/profiler.png)

For date data types, Optimus can give you extra information
```python
op.profiler.run(df, "year", infer=True)
```

```python
```
![](images/profiler1.png)

## Plots
@@ -251,15 +256,29 @@ df = df.rows.drop_na(["age","fare"])
```

```python
df.plot.scatter(["fare", "age"], buckets=30)
df.plot.hist("fare", output="image", path="images/hist.png")
```

```python
df.plot.frequency("age", output="image", path="images/frequency.png")
```

```python
df.plot.box("age")
df.plot.scatter(["fare", "age"], buckets=30, output="image", path="images/scatter.png")
```

```python
df.plot.box("age", output="image", path="images/box.png")
```
```python
df.plot.correlation(["age","fare","survived"])
```
### Using other plotting libraries


Optimus has a tiny API so you can use any plotting library. For example, you can use `df.cols.scatter()`, `df.cols.frequency()`, `df.cols.boxplot()` or `df.cols.hist()` to output JSON that you can then reshape for the plotting library of your choice.
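
As a rough sketch of that workflow, the snippet below reshapes a histogram payload into the three lists a bar-style plot needs. The JSON shape (a list of buckets with `lower`, `upper` and `count` keys) and the helper name `hist_to_bars` are assumptions made for illustration, not the documented output of `df.cols.hist()`; adjust the keys to whatever your Optimus version actually returns.

```python
import json

# Hypothetical payload: the real output of df.cols.hist() may use different keys.
payload = json.dumps([
    {"lower": 0.0, "upper": 10.0, "count": 4},
    {"lower": 10.0, "upper": 20.0, "count": 7},
    {"lower": 20.0, "upper": 30.0, "count": 2},
])


def hist_to_bars(raw):
    """Turn a JSON list of buckets into (centers, widths, heights) lists."""
    buckets = json.loads(raw)
    # Bar position: midpoint of each bucket.
    centers = [(b["lower"] + b["upper"]) / 2 for b in buckets]
    # Bar width: size of each bucket.
    widths = [b["upper"] - b["lower"] for b in buckets]
    # Bar height: number of rows that fell into the bucket.
    heights = [b["count"] for b in buckets]
    return centers, widths, heights


centers, widths, heights = hist_to_bars(payload)
print(centers, widths, heights)
```

The three lists map directly onto a bar chart call such as matplotlib's `plt.bar(centers, heights, width=widths)`.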


## Outliers


@@ -282,9 +301,9 @@ df.outliers.iqr("age").drop().table()



```
```python
df.outliers.z_score("age", threshold=2).drop()
df.outliers.modified_z_score("age", threshold = 2 ).drop()
df.outliers.modified_z_score("age", threshold = 2).drop()
df.outliers.mad("age", threshold = 2).drop()
```
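
To make the `threshold` argument more concrete, here is a minimal pure-Python sketch of what a z-score filter and a modified z-score filter typically compute. It is not Optimus's implementation: the function names, the use of the population standard deviation, and the conventional 0.6745 scaling constant are assumptions chosen for illustration, and Optimus may compute these statistics differently on a Spark DataFrame.

```python
from statistics import mean, median, pstdev


def z_score_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        return []  # all values identical, nothing can be an outlier
    return [v for v in values if abs(v - mu) / sigma > threshold]


def modified_z_score_outliers(values, threshold=2.0):
    """Return the values whose absolute modified z-score exceeds the threshold.

    Modified z-score: 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation from the median.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # more than half the values equal the median
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Made-up sample data, not taken from the Titanic dataset used in the README.
ages = [22, 25, 27, 29, 30, 31, 33, 35, 95]
z_flagged = z_score_outliers(ages, threshold=2)
mz_flagged = modified_z_score_outliers(ages, threshold=2)
print(z_flagged, mz_flagged)
```

Because the modified z-score is built on the median rather than the mean, a single extreme value distorts it less, which is the usual reason to prefer it on small or skewed columns.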

2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -60,7 +60,7 @@
# The short X.Y version.
version = '2.2'
# The full version, including alpha/beta/rc tags.
release = '2.2.51'
release = '2.2.6'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.