August 30th, 2017

Matthew Rocklin: Dask Release 0.15.2


This work is supported by Anaconda Inc.
and the Data Driven Discovery Initiative from the Moore
Foundation.

I’m pleased to announce the release of Dask version 0.15.2. This release
contains stability enhancements and bug fixes. This blogpost outlines
notable changes since the 0.15.0 release on June 11th.

You can conda install Dask:

conda install dask

or pip install from PyPI:

pip install dask[complete] --upgrade

Conda packages are available both on the defaults and conda-forge channels.
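
After installing, one quick way to confirm which version is active in the current environment is to ask Dask from Python. This is only a minimal sketch; the helper name installed_dask_version is illustrative and not part of Dask.

```python
import dask


def installed_dask_version():
    """Return the version string of the Dask package that Python imports."""
    return dask.__version__


# Print the version so it can be compared with the release being installed.
print(installed_dask_version())
```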

Full changelogs are available here:

Some notable changes follow.

New dask-core and dask conda packages

On conda there are now three relevant Dask packages:

  1. dask-core: Package that includes only the core Dask package. This has no dependencies other than the standard library. It is primarily intended for downstream libraries that depend on certain parts of Dask.
  2. distributed: Dask’s distributed scheduler, which depends on Tornado, cloudpickle, and other libraries.
  3. dask: Metapackage that includes dask-core, distributed, and all relevant libraries like NumPy, Pandas, Bokeh, etc. This is the package intended for users to install.

This organization is designed to let downstream libraries depend only on
the parts of Dask that they need, while keeping the default behavior for
users all-inclusive.

Downstream libraries may want to change conda dependencies from dask to
dask-core. They will then need to be careful to include the necessary
libraries (like numpy or cloudpickle) based on their user community.
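
A downstream library that switches to dask-core might check for its optional dependencies at import time and report anything missing. The sketch below uses only the standard library; the helper names has_module and missing_optional_deps are hypothetical, and the default list of modules is just an example taken from the paragraph above.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


def missing_optional_deps(required=("numpy", "cloudpickle")):
    """Return the names in `required` that are not installed."""
    return [name for name in required if not has_module(name)]


# Example: report what a user would still need to install.
missing = missing_optional_deps()
if missing:
    print("Missing optional dependencies:", ", ".join(missing))
else:
    print("All optional dependencies are available.")
```

A library could call a check like this before importing the parts of Dask that need those packages, and raise a clear error message instead of a bare ImportError.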

Improved Deployment

Due to increased deployment on Docker and other systems with complex networking
rules, dask-worker processes now include separate --contact-address and
--listen-address keywords. These specify the address that a worker
advertises and the address on which it listens, respectively. This is
especially helpful when the address at which others can reach a worker differs
from the address that the worker sees on its own host.

# --contact-address: contact me at 192.168.0.100:9000
# --listen-address: I listen on this host
dask-worker scheduler-address:8786 \
    --contact-address 192.168.0.100:9000 \
    --listen-address 172.142.0.100:9000

Additionally, other services like the HTTP and Bokeh servers now respect the
hosts provided by the --listen-address or --host keywords and will not be
visible outside of the specified network.
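
The difference between a listen address and a contact address can be illustrated with a plain socket. This is a conceptual sketch using only the standard library, not how Dask implements its workers; the function name start_listener and the dictionary keys are made up for illustration.

```python
import socket


def start_listener(listen_host, contact_address):
    """Bind a TCP socket on listen_host and describe how peers should reach it.

    The socket listens on a local interface (the OS picks a free port),
    while contact_address is the separate address handed out to peers,
    for example one that a Docker port mapping forwards to this socket.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((listen_host, 0))
    sock.listen()
    host, port = sock.getsockname()
    info = {
        "listening_on": f"{host}:{port}",
        "advertised_as": contact_address,
    }
    return sock, info


sock, info = start_listener("127.0.0.1", "192.168.0.100:9000")
print(info)
sock.close()
```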

Avoid memory, file descriptor, and process leaks

There were a few occasions where Dask would leak resources in complex
situations. Many of these have now been cleaned up. We’re grateful to all
those who were able to provide very detailed case studies that demonstrated
these issues and even more grateful to those who participated in resolving
them.

There is undoubtedly more work to do here and we look forward to future
collaboration.

Array and DataFrame APIs

As usual, Dask array and dataframe have a new set of functions that fill out
their API relative to NumPy and Pandas.

See the full APIs for further reference:

Deprecations

dask.distributed.Executor is now officially deprecated; users should use
dask.distributed.Client instead. Previously, Executor was kept as an alias.

Removed Bag.concat; users should use Bag.flatten instead.

Removed magic tuple unpacking in Bag.map like bag.map(lambda x, y: x + y).
Users should unpack manually instead.
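
The two Bag changes above might look like this in practice. This is a small sketch that assumes a recent Dask where compute accepts a scheduler keyword; the synchronous scheduler is chosen here only to keep the example single-process, and the function name add_pair is illustrative.

```python
import dask.bag as db

# Bag.flatten (the replacement for the removed Bag.concat) joins nested
# sequences into one flat bag.
nested = db.from_sequence([[1, 2], [3]])
flat = nested.flatten().compute(scheduler="synchronous")


# Bag.map passes each element as a single argument, so a tuple element
# has to be unpacked by hand inside the function.
def add_pair(pair):
    x, y = pair
    return x + y


pairs = db.from_sequence([(1, 2), (3, 4)])
sums = pairs.map(add_pair).compute(scheduler="synchronous")

print(flat)
print(sums)
```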

Julia

Developers from Invenia have been building Julia workers and clients that
operate with the Dask.distributed scheduler. They have been helpful in raising
issues necessary to ensure cross-language support.

Acknowledgements

The following people contributed to the dask/dask repository since the 0.15.0
release on June 11th:

  • Bogdan
  • Elliott Sales de Andrade
  • Bruce Merry
  • Erik Welch
  • Fabian Keller
  • James Bourbeau
  • Jeff Reback
  • Jim Crist
  • John A Kirkham
  • Luke Canavan
  • Mark Dunne
  • Martin Durant
  • Matthew Rocklin
  • Olivier Grisel
  • Søren Fuglede Jørgensen
  • Stephan Hoyer
  • Tom Augspurger
  • Yu Feng

The following people contributed to the dask/distributed repository since the
1.17.1 release on June 14th:

  • Antoine Pitrou
  • Dan Brown
  • Elliott Sales de Andrade
  • Eric Davies
  • Erik Welch
  • Evan Welch
  • John A Kirkham
  • Jim Crist
  • James Bourbeau
  • Jeremiah Lowin
  • Julius Neuffer
  • Martin Durant
  • Matthew Rocklin
  • Paul Anton Letnes
  • Peter Waller
  • Sohaib Iftikhar
  • Tom Augspurger

Additionally, we’re happy to announce that John Kirkham
(@jakirkham) has accepted commit rights to the
Dask organization and become a core contributor. John has been active throughout
the Dask project, and particularly active in Dask.array.
