Description
From the Pangeo working meeting discussion with @mgrover1 @jmunroe @norlandrhagen
Here's an outline for an intermediate tutorial on Dask chunking, specifically for Xarray users.
Motivation: why care about chunk size?
- demonstrate the relation between chunk size and computation time / number of tasks with a simple example? (see the sketch after this list)
- maybe even memory usage
- https://tutorial.dask.org/02_array.html#Choosing-good-chunk-sizes
- https://docs.dask.org/en/stable/array-chunks.html
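A minimal sketch of what such an example could look like (the array size and chunk shapes are made up, not from the discussion): the same array chunked coarsely vs. finely, comparing task counts and optionally timing a reduction.

```python
import dask.array as da

# The same ~512 MB array, chunked two different ways.
coarse = da.random.random((8000, 8000), chunks=(4000, 4000))  # 4 chunks of ~128 MB
fine = da.random.random((8000, 8000), chunks=(100, 100))      # 6400 chunks of ~80 kB

# More chunks -> more tasks in the graph -> more scheduler overhead.
print(len(coarse.mean().__dask_graph__()))  # a handful of tasks
print(len(fine.mean().__dask_graph__()))    # many thousands of tasks

# Timing the same reduction makes the overhead of tiny chunks visible, while
# chunks that are too large can instead exceed worker memory:
# %timeit coarse.mean().compute()
# %timeit fine.mean().compute()
```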
Keeping track
- monitoring chunk sizes and number of tasks throughout the pipeline using the repr (see the sketch after this list)
- use some images
- while output blocks may be small (say after a big reduction), intermediate blocks need not be.
- So keep monitoring chunk sizes (and task counts) throughout the pipeline.
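A minimal sketch of the kind of check this list describes (variable names and sizes are invented): print chunk sizes and task counts at each stage of a pipeline, not just at the end.

```python
import dask.array as da
import xarray as xr

# A made-up pipeline on a synthetic chunked Dataset.
ds = xr.Dataset(
    {
        "air": (
            ("time", "lat", "lon"),
            da.random.random((1000, 180, 360), chunks=(10, 180, 360)),
        )
    }
)

anomaly = ds["air"] - ds["air"].mean("time")  # intermediate: still full-size blocks
result = anomaly.std(("lat", "lon"))          # output: tiny, but built from big blocks

for name, obj in [("input", ds["air"]), ("anomaly", anomaly), ("result", result)]:
    # .chunks shows block sizes; the graph length is the task count the repr reports
    print(name, obj.chunks, len(obj.__dask_graph__()))
```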
Why is it important to choose appropriate chunks early in the pipeline?
- Demonstrate that rechunking is not cheap in most cases
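A hedged sketch of one way to demonstrate this (array and chunk shapes are illustrative): flipping the chunking of an existing array is an all-to-all operation, so the graph grows dramatically and a lot of data has to move between chunks.

```python
import dask.array as da

x = da.random.random((10000, 10000), chunks=(10000, 100))  # tall, skinny chunks
y = x.rechunk((100, 10000))                                 # short, wide chunks

# Every output chunk needs a piece of every input chunk, so the rechunk alone
# adds far more tasks than the original array had.
print(len(x.__dask_graph__()))  # ~100 tasks
print(len(y.__dask_graph__()))  # many thousands of tasks
```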
Specify chunks when reading data
- Avoid `chunks="auto"`.
- Specifying `chunks` during data read: `open_dataset`, `open_mfdataset` (see the sketch after this list)
- Analysis vs storage chunks:
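A minimal sketch of what this section could show (file paths, variable names, and chunk sizes are hypothetical):

```python
import xarray as xr

# Explicit chunks instead of chunks="auto"; -1 keeps a dimension in one chunk.
ds = xr.open_dataset(
    "air_temperature.nc",
    chunks={"time": 100, "lat": -1, "lon": -1},
)

# The same idea for many files; ideally the analysis chunks line up with the
# on-disk (storage) chunking so each read touches whole storage chunks.
ds_multi = xr.open_mfdataset(
    "data/*.nc",
    chunks={"time": 100},
    parallel=True,  # open the files in parallel with dask
)
```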
Activity
scottyhq commented on Jan 31, 2023
Agreed this would be great to document thoroughly. See also this relevant issue + discussion in rioxarray corteva/rioxarray#253
djhoese commented on Feb 1, 2023
This would be huge! This comes up often in Satpy where users want to process satellite images on their local machine but they only have 8GB or 16GB of memory. If someone can make a good diagram showing chunks being processed by a worker thread/process and how changing the size of all chunks or number of workers contributes to the overall memory usage that would be such a help when explaining this to users.
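Not a diagram, but a rough sketch (nothing here is from the thread) of the rule of thumb such a diagram would illustrate: peak memory scales roughly with n_workers x threads_per_worker x a few chunks in flight x chunk size, so on an 8-16 GB laptop you can cap the number of workers and per-worker memory and/or shrink the chunks.

```python
from dask.distributed import Client, LocalCluster

# Hypothetical settings for a 16 GB laptop: 2 workers x 2 threads, each worker
# capped at 3 GB, so only a handful of chunks are held in memory at once.
cluster = LocalCluster(n_workers=2, threads_per_worker=2, memory_limit="3GB")
client = Client(cluster)
```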
dcherian commented on Feb 2, 2023
forgot to cc @rybchuk!