Commit a4d8013

Merge branch 'main' into string-arguments-for-codecs
# Conflicts:
#	tests/test_array.py
2 parents 0abc569 + 2911be8 commit a4d8013

120 files changed (+7855, -10693 lines changed)

.github/CONTRIBUTING.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 Contributing
 ============
 
-Please see the [project documentation](https://zarr.readthedocs.io/en/stable/contributing.html) for information about contributing to Zarr.
+Please see the [project documentation](https://zarr.readthedocs.io/en/stable/developers/contributing.html) for information about contributing to Zarr.

.github/workflows/hypothesis.yaml

Lines changed: 9 additions & 1 deletion
@@ -25,12 +25,19 @@ jobs:
 
 strategy:
 matrix:
-python-version: ['3.11']
+python-version: ['3.12']
 numpy-version: ['2.2']
 dependency-set: ["optional"]
 
 steps:
 - uses: actions/checkout@v4
+- name: Set HYPOTHESIS_PROFILE based on trigger
+run: |
+if [[ "${{ github.event_name }}" == "schedule" || "${{ github.event_name }}" == "workflow_dispatch" ]]; then
+echo "HYPOTHESIS_PROFILE=nightly" >> $GITHUB_ENV
+else
+echo "HYPOTHESIS_PROFILE=ci" >> $GITHUB_ENV
+fi
 - name: Set up Python
 uses: actions/setup-python@v5
 with:
@@ -58,6 +65,7 @@ jobs:
 if: success()
 id: status
 run: |
+echo "Using Hypothesis profile: $HYPOTHESIS_PROFILE"
 hatch env run --env test.py${{ matrix.python-version }}-${{ matrix.numpy-version }}-${{ matrix.dependency-set }} run-hypothesis
 
 # explicitly save the cache so it gets updated, also do this even if it fails.
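
The new workflow step chooses the Hypothesis profile from the GitHub event name. The sketch below mirrors that shell conditional in Python so the branching can be read or unit-tested on its own. `select_hypothesis_profile` and `NIGHTLY_EVENTS` are hypothetical names and are not part of the workflow or of zarr.

```python
# Hypothetical helper mirroring the workflow's shell conditional; not part of zarr.
NIGHTLY_EVENTS = frozenset({"schedule", "workflow_dispatch"})


def select_hypothesis_profile(event_name: str) -> str:
    """Return the Hypothesis profile name the workflow step would export.

    Scheduled and manually dispatched runs get the "nightly" profile;
    every other trigger (push, pull_request, ...) gets "ci".
    """
    return "nightly" if event_name in NIGHTLY_EVENTS else "ci"


for event in ("schedule", "workflow_dispatch", "push", "pull_request"):
    print(f"{event}: HYPOTHESIS_PROFILE={select_hypothesis_profile(event)}")
```

This diff does not show how the test suite turns the exported name into settings. One plausible route is Hypothesis's `settings.load_profile`, but treat that link as an assumption.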

.github/workflows/test.yml

Lines changed: 3 additions & 1 deletion
@@ -61,6 +61,8 @@ jobs:
 hatch env create test.py${{ matrix.python-version }}-${{ matrix.numpy-version }}-${{ matrix.dependency-set }}
 hatch env run -e test.py${{ matrix.python-version }}-${{ matrix.numpy-version }}-${{ matrix.dependency-set }} list-env
 - name: Run Tests
+env:
+HYPOTHESIS_PROFILE: ci
 run: |
 hatch env run --env test.py${{ matrix.python-version }}-${{ matrix.numpy-version }}-${{ matrix.dependency-set }} run-coverage
 - name: Upload coverage
@@ -102,7 +104,7 @@ jobs:
 hatch env run -e ${{ matrix.dependency-set }} list-env
 - name: Run Tests
 run: |
-hatch env run --env ${{ matrix.dependency-set }} run
+hatch env run --env ${{ matrix.dependency-set }} run-coverage
 - name: Upload coverage
 uses: codecov/codecov-action@v5
 with:

changes/2774.feature.rst

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Add `zarr.storage.FsspecStore.from_mapper()` so that `zarr.open()` supports stores of type `fsspec.mapping.FSMap`.

changes/2871.feature.rst

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+Added public API for Buffer ABCs and implementations.
+
+Use :mod:`zarr.buffer` to access buffer implementations, and
+:mod:`zarr.abc.buffer` for the interface to implement new buffer types.
+
+Users previously importing buffer from ``zarr.core.buffer`` should update their
+imports to use :mod:`zarr.buffer`. As a reminder, all of ``zarr.core`` is
+considered a private API that's not covered by zarr-python's versioning policy.

changes/2874.feature.rst

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+Adds zarr-specific data type classes. This replaces the internal use of numpy data types for zarr
+v2 and a fixed set of string enums for zarr v3. This change is largely internal, but it does
+change the type of the ``dtype`` and ``data_type`` fields on the ``ArrayV2Metadata`` and
+``ArrayV3Metadata`` classes. It also changes the JSON metadata representation of the
+variable-length string data type, but the old metadata representation can still be
+used when reading arrays. The logic for automatically choosing the chunk encoding for a given data
+type has also changed, and this necessitated changes to the ``config`` API.
+
+For more on this new feature, see the `documentation </user-guide/data_types.html>`_
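
This change alters the `info` output in docs/user-guide/arrays.rst. `DataType.int32` becomes `Int32(endianness='little')`, and `DataType.uint8` becomes `UInt8()`. The new reprs read like constructor calls, and the toy sketch below uses frozen dataclasses to reproduce that shape. `Int32`, `UInt8` and `numpy_str` here are hypothetical stand-ins, not zarr's classes. The idea that `UInt8` has no endianness field because a single byte has no byte order is an inference, not something this commit states.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class Int32:
    """Toy stand-in for a multi-byte data type class (hypothetical, not zarr's code)."""

    # Byte order matters for multi-byte types, so it is part of the type's identity.
    endianness: Literal["little", "big"] = "little"

    def numpy_str(self) -> str:
        # NumPy-style type string: '<i4' for little-endian, '>i4' for big-endian.
        return ("<" if self.endianness == "little" else ">") + "i4"


@dataclass(frozen=True)
class UInt8:
    """Toy stand-in for a single-byte type: byte order is irrelevant, so no fields."""

    def numpy_str(self) -> str:
        # '|' marks "byte order not applicable" in NumPy type strings.
        return "|u1"


print(repr(Int32()))
print(repr(UInt8()))
print(Int32(endianness="big").numpy_str())
```

Because the dataclasses are frozen, instances are hashable and compare by value. So `Int32() == Int32(endianness='little')` holds, which is convenient when data types are stored as metadata fields.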

changes/3127.bugfix.rst

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+When `zarr.save` has an argument `path=some/path/` and multiple arrays in `args`, the path resulted in `some/path/some/path` due to using the `path`
+argument twice while building the array path. This is now fixed.

changes/3128.bugfix.rst

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Fix `zarr.open` default for argument `mode` when `store` is `read_only`
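
The sketch below shows the defaulting rule this fix implies. It assumes that `zarr.open` otherwise falls back to append mode `'a'`, which cannot work against a read-only store. `FakeStore` and `resolve_open_mode` are hypothetical names, not zarr API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeStore:
    """Hypothetical stand-in that exposes only a read_only flag."""

    read_only: bool = False


def resolve_open_mode(store: object, mode: Optional[str] = None) -> str:
    """Pick the mode for an open() call when the caller did not pass one (sketch)."""
    if mode is not None:
        # An explicit mode always wins.
        return mode
    if getattr(store, "read_only", False):
        # Appending to a read-only store would fail, so default to read mode.
        return "r"
    # Assumed general default: append (create if missing, otherwise open for writing).
    return "a"


print(resolve_open_mode(FakeStore(read_only=True)))
print(resolve_open_mode(FakeStore()))
```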

changes/3130.feature.rst

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Port more stateful testing actions from `Icechunk <https://icechunk.io>`_.

changes/3138.feature.rst

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Adds a `with_read_only` convenience method to the `Store` abstract base class (raises `NotImplementedError`) and implementations to the `MemoryStore`, `ObjectStore`, `LocalStore`, and `FsspecStore` classes.
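
The sketch below illustrates the pattern this entry describes with toy classes rather than zarr's real `Store` hierarchy. The base class declares `with_read_only` and raises `NotImplementedError`, and a concrete store returns a new store over the same data with a different read-only flag. `ToyStore`, `ToyMemoryStore`, their `get`/`set` methods, and the `read_only=False` default are all assumptions made for illustration.

```python
from typing import Dict, Optional


class ToyStore:
    """Hypothetical minimal base class illustrating the with_read_only pattern."""

    def __init__(self, read_only: bool = False) -> None:
        self._read_only = read_only

    @property
    def read_only(self) -> bool:
        return self._read_only

    def with_read_only(self, read_only: bool = False) -> "ToyStore":
        # The base class only declares the method; concrete stores opt in.
        raise NotImplementedError


class ToyMemoryStore(ToyStore):
    """Hypothetical in-memory store backed by a plain dict."""

    def __init__(self, data: Optional[Dict[str, bytes]] = None, read_only: bool = False) -> None:
        super().__init__(read_only=read_only)
        self._data: Dict[str, bytes] = {} if data is None else data

    def with_read_only(self, read_only: bool = False) -> "ToyMemoryStore":
        # A new store object over the *same* dict, differing only in the flag.
        return ToyMemoryStore(self._data, read_only=read_only)

    def set(self, key: str, value: bytes) -> None:
        if self.read_only:
            raise ValueError("store is read-only")
        self._data[key] = value

    def get(self, key: str) -> bytes:
        return self._data[key]


writable = ToyMemoryStore()
writable.set("zarr.json", b"{}")
reader = writable.with_read_only(True)
print(reader.read_only, reader.get("zarr.json"))
```

Sharing the underlying data, rather than copying it, means writes made through the writable handle are visible through the read-only one. This sketch chose that behavior deliberately; check each real store's documentation for its exact semantics.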

docs/conf.py

Lines changed: 0 additions & 1 deletion
@@ -106,7 +106,6 @@ def skip_submodules(
 "installation": "user-guide/installation.html",
 "api": "api/zarr/index",
 "release": "release-notes.html",
-"release-notes": "release-notes.html"
 }
 
 # The language for content autogenerated by Sphinx. Refer to documentation

docs/user-guide/arrays.rst

Lines changed: 7 additions & 7 deletions
@@ -182,7 +182,7 @@ which can be used to print useful diagnostics, e.g.::
 >>> z.info
 Type : Array
 Zarr format : 3
-Data type : DataType.int32
+Data type : Int32(endianness='little')
 Fill value : 0
 Shape : (10000, 10000)
 Chunk shape : (1000, 1000)
@@ -200,7 +200,7 @@ prints additional diagnostics, e.g.::
 >>> z.info_complete()
 Type : Array
 Zarr format : 3
-Data type : DataType.int32
+Data type : Int32(endianness='little')
 Fill value : 0
 Shape : (10000, 10000)
 Chunk shape : (1000, 1000)
@@ -248,7 +248,7 @@ built-in delta filter::
 The default compressor can be changed by setting the value of the using Zarr's
 :ref:`user-guide-config`, e.g.::
 
->>> with zarr.config.set({'array.v2_default_compressor.numeric': {'id': 'blosc'}}):
+>>> with zarr.config.set({'array.v2_default_compressor.default': {'id': 'blosc'}}):
 ... z = zarr.create_array(store={}, shape=(100000000,), chunks=(1000000,), dtype='int32', zarr_format=2)
 >>> z.filters
 ()
@@ -288,7 +288,7 @@ Here is an example using a delta filter with the Blosc compressor::
 >>> z.info
 Type : Array
 Zarr format : 3
-Data type : DataType.int32
+Data type : Int32(endianness='little')
 Fill value : 0
 Shape : (10000, 10000)
 Chunk shape : (1000, 1000)
@@ -603,7 +603,7 @@ Sharded arrays can be created by providing the ``shards`` parameter to :func:`za
 >>> a.info_complete()
 Type : Array
 Zarr format : 3
-Data type : DataType.uint8
+Data type : UInt8()
 Fill value : 0
 Shape : (10000, 10000)
 Shard shape : (1000, 1000)
@@ -612,10 +612,10 @@ Sharded arrays can be created by providing the ``shards`` parameter to :func:`za
 Read-only : False
 Store type : LocalStore
 Filters : ()
-Serializer : BytesCodec(endian=<Endian.little: 'little'>)
+Serializer : BytesCodec(endian=None)
 Compressors : (ZstdCodec(level=0, checksum=False),)
 No. bytes : 100000000 (95.4M)
-No. bytes stored : 3981552
+No. bytes stored : 3981473
 Storage ratio : 25.1
 Shards Initialized : 100
 
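
The sharded `info_complete()` figures in this hunk can be cross-checked with a few lines of arithmetic. This is a reader-side check, not zarr code. Reading the `M` suffix as mebibytes (2**20 bytes) is an inference from the numbers. The shard count matching `Shards Initialized` assumes that every shard was written, as the example appears to do.

```python
from math import prod

shape = (10_000, 10_000)          # "Shape"
shard_shape = (1_000, 1_000)      # "Shard shape"
itemsize = 1                      # uint8: one byte per element
nbytes = prod(shape) * itemsize   # should match "No. bytes"
nbytes_stored_old = 3_981_552     # "No. bytes stored" before this change
nbytes_stored_new = 3_981_473     # "No. bytes stored" after this change

ratio_old = nbytes / nbytes_stored_old
ratio_new = nbytes / nbytes_stored_new
human_size = f"{nbytes / 2**20:.1f}M"  # assumes "M" means MiB

# Shards per dimension use ceiling division, so partial edge shards would count too.
n_shards = prod(-(-dim // shard) for dim, shard in zip(shape, shard_shape))

print(f"No. bytes     : {nbytes} ({human_size})")
print(f"Storage ratio : {ratio_old:.1f} -> {ratio_new:.1f}")
print(f"Shards        : {n_shards}")
```

Both stored sizes round to a storage ratio of 25.1. That is why the `Storage ratio` line is unchanged in the diff even though `No. bytes stored` moved.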

docs/user-guide/config.rst

Lines changed: 25 additions & 34 deletions
@@ -43,39 +43,30 @@ This is the current default configuration::
 
 >>> zarr.config.pprint()
 {'array': {'order': 'C',
-'v2_default_compressor': {'bytes': {'checksum': False,
-'id': 'zstd',
-'level': 0},
-'numeric': {'checksum': False,
-'id': 'zstd',
-'level': 0},
-'string': {'checksum': False,
+'v2_default_compressor': {'default': {'checksum': False,
 'id': 'zstd',
-'level': 0}},
-'v2_default_filters': {'bytes': [{'id': 'vlen-bytes'}],
-'numeric': None,
-'raw': None,
-'string': [{'id': 'vlen-utf8'}]},
-'v3_default_compressors': {'bytes': [{'configuration': {'checksum': False,
-'level': 0},
-'name': 'zstd'}],
-'numeric': [{'configuration': {'checksum': False,
+'level': 0},
+'variable-length-string': {'checksum': False,
+'id': 'zstd',
+'level': 0}},
+'v2_default_filters': {'default': None,
+'variable-length-string': [{'id': 'vlen-utf8'}]},
+'v3_default_compressors': {'default': [{'configuration': {'checksum': False,
 'level': 0},
 'name': 'zstd'}],
-'string': [{'configuration': {'checksum': False,
-'level': 0},
-'name': 'zstd'}]},
-'v3_default_filters': {'bytes': [], 'numeric': [], 'string': []},
-'v3_default_serializer': {'bytes': {'name': 'vlen-bytes'},
-'numeric': {'configuration': {'endian': 'little'},
-'name': 'bytes'},
-'string': {'name': 'vlen-utf8'}},
-'write_empty_chunks': False},
-'async': {'concurrency': 10, 'timeout': None},
-'buffer': 'zarr.core.buffer.cpu.Buffer',
-'codec_pipeline': {'batch_size': 1,
-'path': 'zarr.core.codec_pipeline.BatchedCodecPipeline'},
-'codecs': {'blosc': 'zarr.codecs.blosc.BloscCodec',
+'variable-length-string': [{'configuration': {'checksum': False,
+'level': 0},
+'name': 'zstd'}]},
+'v3_default_filters': {'default': [], 'variable-length-string': []},
+'v3_default_serializer': {'default': {'configuration': {'endian': 'little'},
+'name': 'bytes'},
+'variable-length-string': {'name': 'vlen-utf8'}},
+'write_empty_chunks': False},
+'async': {'concurrency': 10, 'timeout': None},
+'buffer': 'zarr.buffer.cpu.Buffer',
+'codec_pipeline': {'batch_size': 1,
+'path': 'zarr.core.codec_pipeline.BatchedCodecPipeline'},
+'codecs': {'blosc': 'zarr.codecs.blosc.BloscCodec',
 'bytes': 'zarr.codecs.bytes.BytesCodec',
 'crc32c': 'zarr.codecs.crc32c_.Crc32cCodec',
 'endian': 'zarr.codecs.bytes.BytesCodec',
@@ -85,7 +76,7 @@ This is the current default configuration::
 'vlen-bytes': 'zarr.codecs.vlen_utf8.VLenBytesCodec',
 'vlen-utf8': 'zarr.codecs.vlen_utf8.VLenUTF8Codec',
 'zstd': 'zarr.codecs.zstd.ZstdCodec'},
-'default_zarr_format': 3,
-'json_indent': 2,
-'ndbuffer': 'zarr.core.buffer.cpu.NDBuffer',
-'threading': {'max_workers': None}}
+'default_zarr_format': 3,
+'json_indent': 2,
+'ndbuffer': 'zarr.buffer.cpu.NDBuffer',
+'threading': {'max_workers': None}}
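
In the new default configuration, the per-kind keys (`bytes`, `numeric`, `raw`, `string`) collapse into two entries, `default` and `variable-length-string`. The sketch below shows how a lookup against that layout might pick the v3 filters, serializer and compressors for one data type. The dict literal is copied from the new values in this hunk. `pick_v3_defaults` and its `is_variable_length_string` flag are hypothetical and are not zarr's internal logic.

```python
from typing import Any, Dict

# Subset of the new default configuration, copied from this hunk.
ARRAY_CONFIG: Dict[str, Any] = {
    "v3_default_filters": {"default": [], "variable-length-string": []},
    "v3_default_serializer": {
        "default": {"configuration": {"endian": "little"}, "name": "bytes"},
        "variable-length-string": {"name": "vlen-utf8"},
    },
    "v3_default_compressors": {
        "default": [{"configuration": {"checksum": False, "level": 0}, "name": "zstd"}],
        "variable-length-string": [
            {"configuration": {"checksum": False, "level": 0}, "name": "zstd"}
        ],
    },
}


def pick_v3_defaults(is_variable_length_string: bool) -> Dict[str, Any]:
    """Return filters/serializer/compressors for one data type (hypothetical sketch)."""
    key = "variable-length-string" if is_variable_length_string else "default"
    # Strip the "v3_default_" prefix so the result reads filters/serializer/compressors.
    return {
        section.removeprefix("v3_default_"): entries[key]
        for section, entries in ARRAY_CONFIG.items()
    }


print(pick_v3_defaults(True)["serializer"])
print(pick_v3_defaults(False)["serializer"])
```

Code that overrode one of the removed keys has to target the new key instead. The updated arrays.rst example in this commit does exactly that, replacing `'array.v2_default_compressor.numeric'` with `'array.v2_default_compressor.default'` in its `zarr.config.set` call.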

docs/user-guide/consolidated_metadata.rst

Lines changed: 23 additions & 3 deletions
@@ -47,7 +47,7 @@ that can be used.:
 >>> from pprint import pprint
 >>> pprint(dict(sorted(consolidated_metadata.items())))
 {'a': ArrayV3Metadata(shape=(1,),
-data_type=<DataType.float64: 'float64'>,
+data_type=Float64(endianness='little'),
 chunk_grid=RegularChunkGrid(chunk_shape=(1,)),
 chunk_key_encoding=DefaultChunkKeyEncoding(name='default',
 separator='/'),
@@ -60,7 +60,7 @@ that can be used.:
 node_type='array',
 storage_transformers=()),
 'b': ArrayV3Metadata(shape=(2, 2),
-data_type=<DataType.float64: 'float64'>,
+data_type=Float64(endianness='little'),
 chunk_grid=RegularChunkGrid(chunk_shape=(2, 2)),
 chunk_key_encoding=DefaultChunkKeyEncoding(name='default',
 separator='/'),
@@ -73,7 +73,7 @@ that can be used.:
 node_type='array',
 storage_transformers=()),
 'c': ArrayV3Metadata(shape=(3, 3, 3),
-data_type=<DataType.float64: 'float64'>,
+data_type=Float64(endianness='little'),
 chunk_grid=RegularChunkGrid(chunk_shape=(3, 3, 3)),
 chunk_key_encoding=DefaultChunkKeyEncoding(name='default',
 separator='/'),
@@ -114,3 +114,23 @@ removed, or modified, consolidated metadata may not be desirable.
 metadata.
 
 .. _Consolidated Metadata: https://github.com/zarr-developers/zarr-specs/pull/309
+
+Stores Without Support for Consolidated Metadata
+------------------------------------------------
+
+Some stores may want to opt out of the consolidated metadata mechanism. This
+may be for several reasons like:
+
+* They want to maintain read-write consistency, which is challenging with
+  consolidated metadata.
+* They have their own consolidated metadata mechanism.
+* They offer good enough performance without need for consolidation.
+
+This type of store can declare it doesn't want consolidation by implementing
+`Store.supports_consolidated_metadata` and returning `False`. For stores that don't support
+consolidation, Zarr will:
+
+* Raise an error on `consolidate_metadata` calls, maintaining the store in
+  its unconsolidated state.
+* Raise an error in `AsyncGroup.open(..., use_consolidated=True)`
+* Not use consolidated metadata in `AsyncGroup.open(..., use_consolidated=None)`
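
The toy sketch below encodes the three rules listed for stores that opt out of consolidation. `ToyStore`, `should_try_consolidated` and `consolidate` are hypothetical names. Modelling `supports_consolidated_metadata` as a property and raising `ValueError` are assumptions; the new docs say only that "an error" is raised.

```python
from typing import Optional


class ToyStore:
    """Hypothetical store exposing only the consolidation capability flag."""

    def __init__(self, supports_consolidated: bool = True) -> None:
        self._supports_consolidated = supports_consolidated

    @property
    def supports_consolidated_metadata(self) -> bool:
        return self._supports_consolidated


def should_try_consolidated(store: ToyStore, use_consolidated: Optional[bool]) -> bool:
    """Decide whether opening a group should look for consolidated metadata."""
    if store.supports_consolidated_metadata:
        # None means "use consolidated metadata if it is present".
        return use_consolidated is not False
    if use_consolidated is True:
        # Explicitly requesting consolidation on an opted-out store is an error.
        raise ValueError("store does not support consolidated metadata")
    # None (or False): silently fall back to reading per-node metadata.
    return False


def consolidate(store: ToyStore) -> None:
    """Guard that a consolidate-metadata call would apply before writing anything."""
    if not store.supports_consolidated_metadata:
        # Refusing up front leaves the store in its unconsolidated state.
        raise ValueError("store does not support consolidated metadata")
    # A real implementation would gather and write the consolidated metadata here.


opted_out = ToyStore(supports_consolidated=False)
print(should_try_consolidated(opted_out, None))
print(should_try_consolidated(ToyStore(), None))
```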
