From b621fff3ff67ea593f4d13b61fa12a1c2a962b7e Mon Sep 17 00:00:00 2001
From: Francesc Alted
Date: Wed, 1 Feb 2017 12:29:44 +0100
Subject: [PATCH] Getting ready for 1.1.2 final

---
 ANNOUNCE.rst       | 98 ++++++++++++++++++++++------------------
 LICENSES/BCOLZ.txt |  2 +-
 RELEASE_NOTES.rst  |  9 +++--
 docs/conf.py       |  2 +-
 4 files changed, 54 insertions(+), 57 deletions(-)

diff --git a/ANNOUNCE.rst b/ANNOUNCE.rst
index 7b1a8fe1..4ca74757 100644
--- a/ANNOUNCE.rst
+++ b/ANNOUNCE.rst
@@ -1,40 +1,27 @@
 ======================
-Announcing bcolz 1.1.0
+Announcing bcolz 1.1.1
 ======================

 What's new
 ==========

-This release brings quite a lot of changes. After format stabilization
-in 1.0, the focus is now in fine-tune many operations (specially queries
-in ctables), as well as widening the available computational engines.
+This is a maintenance release that brings quite a lot of improvements.
+Here are the highlights:

-Highlights:
+- C-Blosc updated to 1.11.2.

-* Much improved performance of ctable.where() and ctable.whereblocks().
-  Now bcolz is getting closer than ever to fundamental memory limits
-  during queries (see the updated benchmarks in the data containers
-  tutorial below).
+- Added a new `defaults_ctx` context so that users can select defaults
+  easily without changing global behaviour. For example::

-* Better support for Dask; i.e. GIL is released during Blosc operation
-  when bcolz is called from a multithreaded app (like Dask). Also, Dask
-  can be used as another virtual machine for evaluating expressions (so
-  now it is possible to use it during queries too).
+    with bcolz.defaults_ctx(vm="python", cparams=bcolz.cparams(clevel=0)):
+        cout = bcolz.eval("(x + 1) < 0")

-* New ctable.fetchwhere() method for getting the rows fulfilling some
-  condition in one go.
+- Fixed a crash occurring in `ctable.todataframe()` when both `columns`
+  and `orient='columns'` were specified. PR #311. Thanks to Peter
+  Quackenbush.

-* New quantize filter for allowing lossy compression of floating point
-  data.
-
-* It is possible to create ctables with more than 255 columns now.
-  Thanks to Skipper Seabold.
-
-* The defaults during carray creation are scalars now. That allows to
-  create highly dimensional data containers more efficiently.
-
-* carray object does implement the __array__() special method now. With
-  this, interoperability with numpy arrays is easier and faster.
+- Use `pkg_resources.parse_version()` to test for version of packages.
+  Fixes #322 (PY27 bcolz with dask unicode error).

 For a more detailed change log, see:

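As a slightly fuller sketch of the `defaults_ctx` pattern highlighted above (the carray ``x`` and its size are invented purely for illustration; the global defaults are left untouched once the ``with`` block exits)::

    import numpy as np
    import bcolz

    # A small compressed carray for the expression to operate on.
    x = bcolz.carray(np.linspace(-1, 1, 100000))

    # Inside the context, expressions are evaluated with the Python/NumPy
    # virtual machine and the output is left uncompressed (clevel=0).
    with bcolz.defaults_ctx(vm="python", cparams=bcolz.cparams(clevel=0)):
        cout = bcolz.eval("(x + 1) < 0")

    # Outside the block the usual defaults apply again.
    print(cout)
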
@@ -51,34 +38,39 @@ specially chapters 3 (in-memory containers) and 4 (on-disk containers).
 What it is
 ==========

-*bcolz* provides columnar and compressed data containers that can live
-either on-disk or in-memory. Column storage allows for efficiently
-querying tables with a large number of columns. It also allows for
-cheap addition and removal of column. In addition, bcolz objects are
-compressed by default for reducing memory/disk I/O needs. The
-compression process is carried out internally by Blosc, an
-extremely fast meta-compressor that is optimized for binary data. Lastly,
-high-performance iterators (like ``iter()``, ``where()``) for querying
-the objects are provided.
-
-bcolz can use numexpr internally so as to accelerate many vector and
-query operations (although it can use pure NumPy for doing so too).
-numexpr optimizes the memory usage and use several cores for doing the
-computations, so it is blazing fast. Moreover, since the carray/ctable
-containers can be disk-based, and it is possible to use them for
-seamlessly performing out-of-memory computations.
-
-bcolz has minimal dependencies (NumPy), comes with an exhaustive test
-suite and fully supports both 32-bit and 64-bit platforms. Also, it is
-typically tested on both UNIX and Windows operating systems.
-
-Together, bcolz and the Blosc compressor, are finally fulfilling the
-promise of accelerating memory I/O, at least for some real scenarios:
+*bcolz* provides **columnar and compressed** data containers that can
+live either on-disk or in-memory. The compression is carried out
+transparently by Blosc, an ultra-fast meta-compressor that is optimized
+for binary data. Compression is active by default.
+
+Column storage allows for efficiently querying tables with a large
+number of columns. It also allows for cheap addition and removal of
+columns. Lastly, high-performance iterators (like ``iter()``,
+``where()``) for querying the objects are provided.
+
+bcolz can use different backends internally (currently numexpr,
+Python/NumPy or dask) so as to accelerate many vector and query
+operations (although it can use pure NumPy for doing so too). Moreover,
+since the carray/ctable containers can be disk-based, it is possible to
+use them for seamlessly performing out-of-memory computations.
+
+While NumPy is used as the standard way to feed and retrieve data from
+bcolz internal containers, it also comes with support for
+high-performance import/export facilities to/from `HDF5/PyTables tables
+<http://www.pytables.org>`_ and `pandas dataframes
+<http://pandas.pydata.org>`_.
+
+Have a look at how bcolz and the Blosc compressor make better use of
+the memory without a significant overhead, at least for some real
+scenarios:

 http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

-Example users of bcolz are Visualfabriq (http://www.visualfabriq.com/),
-and Quantopian (https://www.quantopian.com/):
+bcolz has minimal dependencies (NumPy is the only strict requisite),
+comes with an exhaustive test suite, and is meant to be used in
+production. Example users of bcolz are Visualfabriq
+(http://www.visualfabriq.com/), Quantopian (https://www.quantopian.com/)
+and scikit-allel:

 * Visualfabriq:

@@ -90,6 +82,10 @@ and Quantopian (https://www.quantopian.com/):

 * Using compressed data containers for faster backtesting at scale:

   * https://quantopian.github.io/talks/NeedForSpeed/slides.html
+* scikit-allel:
+
+  * Exploratory analysis of large scale genetic variation data.
+  * https://github.com/cggh/scikit-allel

 Resources
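To give a feel for the container and query model described in the new text above, here is a minimal sketch (the column names, sizes and query expression are made up for the example; passing a ``rootdir`` argument would make the very same container disk-based)::

    import numpy as np
    import bcolz

    N = 1000 * 1000

    # A compressed, in-memory ctable with two columns.
    ct = bcolz.ctable(columns=[np.arange(N), np.random.rand(N)],
                      names=["i", "x"])

    # Queries are evaluated column-wise and only touch the columns that
    # the expression references.
    hits = [row.i for row in ct.where("(x < 0.0001) & (i > 10)")]
    print(len(hits), ct["x"].cparams)
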
diff --git a/LICENSES/BCOLZ.txt b/LICENSES/BCOLZ.txt
index 7b06d38a..11327189 100644
--- a/LICENSES/BCOLZ.txt
+++ b/LICENSES/BCOLZ.txt
@@ -3,7 +3,7 @@ Copyright Notice and Statement for bcolz Software Library and Utilities:
 Copyright (c) 2010-2011 by Francesc Alted
 Copyright (c) 2012 by Continuum Analytics
 Copyright (c) 2013 by Francesc Alted
-Copyright (c) 2014-2016 by Francesc Alted and the bcolz contributors
+Copyright (c) 2014-2017 by Francesc Alted and the bcolz contributors
 All rights reserved.

 Redistribution and use in source and binary forms, with or without
diff --git a/RELEASE_NOTES.rst b/RELEASE_NOTES.rst
index 50b6f8e6..757ec17f 100644
--- a/RELEASE_NOTES.rst
+++ b/RELEASE_NOTES.rst
@@ -10,6 +10,9 @@ Changes from 1.1.0 to 1.1.1

 - Double-check the value of a column that is being overwritten. Fixes
   #307.

+- Use `pkg_resources.parse_version()` to test for version of packages.
+  Fixes #322.
+
 - Now all the columns in a ctable are enforced to be a carray instance
   in order to simplify the internal logic for handling columns.

@@ -17,6 +20,8 @@ Changes from 1.1.0 to 1.1.1

   `ct['f0'] = x + 1`

+  will continue to use the same cparams as the original column.
+
 - C-Blosc updated to 1.11.2.

 - Added a new `defaults_ctx` context so that users can select defaults
@@ -29,10 +34,6 @@ Changes from 1.1.0 to 1.1.1
   and `orient='columns'` were specified. PR #311. Thanks to Peter
   Quackenbush.

-- Replaced `distutils.version.LooseVersion()` by
-  `pkg_resources.parse_version()` becuase it is more resistant to
-  versioning schemas.
-
 Changes from 1.0.0 to 1.1.0
 ===========================

diff --git a/docs/conf.py b/docs/conf.py
index 8de53d2f..056e0de5 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -51,7 +51,7 @@

 # General information about the project.
 project = u'bcolz'
-copyright = u'2010-2016 Francesc Alted and the bcolz contributors'
+copyright = u'2010-2017 Francesc Alted and the bcolz contributors'

 # The version info for the project you're documenting, acts as replacement for
 # |version| and |release|, also used in various other places throughout the
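As an illustration of the column-overwrite behaviour described in the release notes above, a small sketch (the column name, values and compression settings are arbitrary)::

    import numpy as np
    import bcolz

    x = np.arange(10)
    ct = bcolz.ctable(columns=[x], names=["f0"],
                      cparams=bcolz.cparams(clevel=9, cname="lz4"))

    # Overwriting an existing column keeps the cparams of the original
    # column, as noted in the 1.1.1 release notes.
    ct["f0"] = x + 1
    print(ct["f0"].cparams)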
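The switch to `pkg_resources.parse_version()` mentioned in the release notes is about comparing version strings robustly; a tiny, generic sketch of the difference (not code taken from bcolz itself)::

    from pkg_resources import parse_version

    # Plain string comparison orders multi-digit components incorrectly.
    assert not ("1.10.0" > "1.9.0")

    # parse_version() compares release segments numerically.
    assert parse_version("1.10.0") > parse_version("1.9.0")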