This repository has been archived by the owner on Dec 11, 2023. It is now read-only.

Getting ready for 1.1.2 final
Francesc Alted committed Feb 1, 2017
1 parent ed5d23c commit b621fff
Showing 4 changed files with 54 additions and 57 deletions.
98 changes: 47 additions & 51 deletions ANNOUNCE.rst
@@ -1,40 +1,27 @@
======================
Announcing bcolz 1.1.0
Announcing bcolz 1.1.1
======================

What's new
==========

This release brings quite a lot of changes. After format stabilization
in 1.0, the focus is now in fine-tune many operations (specially queries
in ctables), as well as widening the available computational engines.
This is a maintenance release that brings quite a lot of improvements.
Here are the highlights:

Highlights:
- C-Blosc updated to 1.11.2.

* Much improved performance of ctable.where() and ctable.whereblocks().
Now bcolz is getting closer than ever to fundamental memory limits
during queries (see the updated benchmarks in the data containers
tutorial below).
- Added a new `defaults_ctx` context so that users can select defaults
easily without changing global behaviour. For example::

* Better support for Dask; i.e. GIL is released during Blosc operation
when bcolz is called from a multithreaded app (like Dask). Also, Dask
can be used as another virtual machine for evaluating expressions (so
now it is possible to use it during queries too).
  with bcolz.defaults_ctx(vm="python", cparams=bcolz.cparams(clevel=0)):
      cout = bcolz.eval("(x + 1) < 0")

* New ctable.fetchwhere() method for getting the rows fulfilling some
condition in one go.
- Fixed a crash occurring in `ctable.todataframe()` when both `columns`
and `orient='columns'` were specified. PR #311. Thanks to Peter
Quackenbush.

* New quantize filter for allowing lossy compression of floating point
data.

* It is possible to create ctables with more than 255 columns now.
Thanks to Skipper Seabold.

* The defaults during carray creation are scalars now. That allows to
create highly dimensional data containers more efficiently.

* carray object does implement the __array__() special method now. With
this, interoperability with numpy arrays is easier and faster.
- Use `pkg_resources.parse_version()` to test for the version of packages.
Fixes #322 (PY27 bcolz with dask unicode error).
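
As a minimal sketch of the kind of check this enables (the call site and
the NumPy minimum shown here are made up for illustration)::

    from pkg_resources import parse_version
    import numpy as np

    # parse_version() is more robust to unusual version strings than the
    # old distutils.version.LooseVersion() approach it replaces
    if parse_version(np.__version__) < parse_version("1.8.0"):
        raise ImportError("this sketch assumes NumPy >= 1.8.0")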

For a more detailed change log, see:

@@ -51,34 +38,39 @@ specially chapters 3 (in-memory containers) and 4 (on-disk containers).
What it is
==========

*bcolz* provides columnar and compressed data containers that can live
either on-disk or in-memory. Column storage allows for efficiently
querying tables with a large number of columns. It also allows for
cheap addition and removal of column. In addition, bcolz objects are
compressed by default for reducing memory/disk I/O needs. The
compression process is carried out internally by Blosc, an
extremely fast meta-compressor that is optimized for binary data. Lastly,
high-performance iterators (like ``iter()``, ``where()``) for querying
the objects are provided.

bcolz can use numexpr internally so as to accelerate many vector and
query operations (although it can use pure NumPy for doing so too).
numexpr optimizes the memory usage and use several cores for doing the
computations, so it is blazing fast. Moreover, since the carray/ctable
containers can be disk-based, and it is possible to use them for
seamlessly performing out-of-memory computations.

bcolz has minimal dependencies (NumPy), comes with an exhaustive test
suite and fully supports both 32-bit and 64-bit platforms. Also, it is
typically tested on both UNIX and Windows operating systems.

Together, bcolz and the Blosc compressor, are finally fulfilling the
promise of accelerating memory I/O, at least for some real scenarios:
*bcolz* provides **columnar and compressed** data containers that can
live either on-disk or in-memory. The compression is carried out
transparently by Blosc, an ultra fast meta-compressor that is optimized
for binary data. Compression is active by default.

Column storage allows for efficiently querying tables with a large
number of columns. It also allows for cheap addition and removal of
columns. Lastly, high-performance iterators (like ``iter()``,
``where()``) for querying the objects are provided.
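
As a rough sketch (the array sizes and column names below are made up
for illustration), creating a ctable, adding and dropping a column, and
running a ``where()`` query could look like::

    import numpy as np
    import bcolz

    N = 1000 * 1000
    # compression is on by default; pass cparams to tune it if needed
    ct = bcolz.ctable(columns=[np.arange(N), np.linspace(0, 1, N)],
                      names=['a', 'b'])
    ct.addcol(np.random.rand(N), name='c')   # cheap: only one column is touched
    ct.delcol('c')                           # removing a column is cheap too
    # where() yields only the rows fulfilling the condition
    hits = [row.a for row in ct.where('(a < 10) & (b < 0.5)')]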

bcolz can use different backends internally (currently numexpr,
Python/NumPy or dask) so as to accelerate many vector and query
operations. Moreover, since the carray/ctable containers can be
disk-based, it is possible to use them for seamlessly performing
out-of-memory computations.
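
For instance, a sketch of a disk-based container and an explicit choice
of computational backend (the directory name is made up; ``rootdir`` and
``vm`` are assumed to behave as in recent bcolz releases)::

    import numpy as np
    import bcolz

    # a disk-based carray: data lives under 'x.bcolz/' instead of in RAM
    x = bcolz.carray(np.arange(1e7), rootdir='x.bcolz', mode='w')
    # choose the evaluation backend explicitly; 'numexpr' and 'python' are
    # built in, while 'dask' is only available if dask is installed
    y = bcolz.eval('(x + 1) < 3', vm='numexpr')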

NumPy is used as the standard way to feed and retrieve data from bcolz
internal containers, but bcolz also comes with support for
high-performance import/export facilities to/from `HDF5/PyTables tables
<http://www.pytables.org>`_ and `pandas dataframes
<http://pandas.pydata.org>`_.
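
For example, a hypothetical round trip through pandas and HDF5 (file and
node names are made up; the ``fromdataframe``/``todataframe`` and
``fromhdf5``/``tohdf5`` helpers are the facilities referred to here, and
the ``columns``/``orient='columns'`` combination is the one fixed in this
release)::

    import pandas as pd
    import bcolz

    df = pd.DataFrame({'a': [1, 2, 3], 'b': [0.1, 0.2, 0.3]})
    ct = bcolz.ctable.fromdataframe(df)                      # pandas -> bcolz
    df2 = ct.todataframe(columns=['a'], orient='columns')    # bcolz -> pandas
    ct.tohdf5('data.h5', nodepath='/ct', mode='w')           # bcolz -> HDF5 (needs PyTables)
    ct2 = bcolz.ctable.fromhdf5('data.h5', nodepath='/ct')   # HDF5 -> bcolz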

Have a look at how bcolz and the Blosc compressor make better use of
memory without significant overhead, at least for some real scenarios:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

Example users of bcolz are Visualfabriq (http://www.visualfabriq.com/),
and Quantopian (https://www.quantopian.com/):
bcolz has minimal dependencies (NumPy is the only strict requirement),
comes with an exhaustive test suite, and is meant to be used in
production. Example users of bcolz are Visualfabriq
(http://www.visualfabriq.com/), Quantopian (https://www.quantopian.com/)
and scikit-allel:

* Visualfabriq:

@@ -90,6 +82,10 @@ and Quantopian (https://www.quantopian.com/):
* Using compressed data containers for faster backtesting at scale:
* https://quantopian.github.io/talks/NeedForSpeed/slides.html

* scikit-allel:

* Exploratory analysis of large scale genetic variation data.
* https://github.com/cggh/scikit-allel


Resources
2 changes: 1 addition & 1 deletion LICENSES/BCOLZ.txt
@@ -3,7 +3,7 @@ Copyright Notice and Statement for bcolz Software Library and Utilities:
Copyright (c) 2010-2011 by Francesc Alted
Copyright (c) 2012 by Continuum Analytics
Copyright (c) 2013 by Francesc Alted
Copyright (c) 2014-2016 by Francesc Alted and the bcolz contributors
Copyright (c) 2014-2017 by Francesc Alted and the bcolz contributors
All rights reserved.

Redistribution and use in source and binary forms, with or without
9 changes: 5 additions & 4 deletions RELEASE_NOTES.rst
@@ -10,13 +10,18 @@ Changes from 1.1.0 to 1.1.1
- Double-check the value of a column that is being overwritten. Fixes
#307.

- Use `pkg_resources.parse_version()` to test for the version of packages.
  Fixes #322.

- Now all the columns in a ctable are enforced to be a carray instance
in order to simplify the internal logic for handling columns.

- Now, the cparams are preserved during column replacement, e.g.:

`ct['f0'] = x + 1`

will continue to use the same cparams as the original column (see the
sketch at the end of this list).

- C-Blosc updated to 1.11.2.

- Added a new `defaults_ctx` context so that users can select defaults
@@ -29,10 +34,6 @@ easily without changing global behaviour. For example::
and `orient='columns'` were specified. PR #311. Thanks to Peter
Quackenbush.

- Replaced `distutils.version.LooseVersion()` by
`pkg_resources.parse_version()` becuase it is more resistant to
versioning schemas.
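
A quick way to exercise the cparams-preservation behaviour mentioned
above (sketch only; the column name, the input array and the clevel
value are made up)::

    import numpy as np
    import bcolz

    x = np.arange(10)
    ct = bcolz.ctable(columns=[x], names=['f0'],
                      cparams=bcolz.cparams(clevel=9))
    ct['f0'] = x + 1                     # replace the column
    assert ct['f0'].cparams.clevel == 9  # cparams survive the replacement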


Changes from 1.0.0 to 1.1.0
===========================
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -51,7 +51,7 @@

# General information about the project.
project = u'bcolz'
copyright = u'2010-2016 Francesc Alted and the bcolz contributors'
copyright = u'2010-2017 Francesc Alted and the bcolz contributors'

# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
