Add compressed image support for QEMU driver #789

evakhoni · 2025-12-29T13:07:14Z

Closes #737

Summary

Adds transparent decompression support for compressed disk images when flashing to the QEMU driver. Compressed images (.gz, .xz, .bz2, .zstd) are automatically detected and decompressed on the fly.

Changes

packages/jumpstarter/jumpstarter/streams/encoding.py

Added FileSignature dataclass to represent compression format signatures
Added COMPRESSION_SIGNATURES tuple with file signatures for gzip, xz, bz2, and zstd
Added detect_compression_from_signature() function for auto-detection
Added create_decompressor() helper function
Added AutoDecompressIterator async iterator that wraps a byte stream and transparently decompresses if needed

packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py

Modified QemuFlasher.flash() to wrap the source stream with AutoDecompressIterator

How it works

When flash() is called, the first 8 bytes are buffered
File signature detection matches against known compression formats
If compressed: a decompressor is created and chunks are decompressed on the fly
If uncompressed: data passes through unchanged
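
The detection step can be sketched with the formats' published magic numbers. This is a minimal illustration rather than the PR's code: `SIGNATURES`, `HEADER_SIZE`, and `detect` are hypothetical stand-ins for `COMPRESSION_SIGNATURES`, `SIGNATURE_BUFFER_SIZE`, and `detect_compression_from_signature()`.

```python
import bz2
import gzip
import lzma
from typing import Optional

# Magic numbers from the public format specifications.
# The names here are illustrative, not the identifiers used in the PR.
SIGNATURES = (
    (b"\x1f\x8b", "gzip"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"BZh", "bz2"),
    (b"\x28\xb5\x2f\xfd", "zstd"),
)
HEADER_SIZE = 8  # Enough to cover the longest signature (xz, 6 bytes)


def detect(header: bytes) -> Optional[str]:
    """Return the format whose signature prefixes `header`, else None."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


# Usage: inspect only the first HEADER_SIZE bytes of each payload.
samples = {
    "gzip": gzip.compress(b"x"),
    "xz": lzma.compress(b"x"),
    "bz2": bz2.compress(b"x"),
}
detected = {name: detect(blob[:HEADER_SIZE]) for name, blob in samples.items()}
```

Because `bytes.startswith()` returns `False` when the buffer is shorter than the signature, empty or very short inputs fall through to the uncompressed path.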

Summary by CodeRabbit

New Features
- Flash operations now transparently accept and decompress images in gzip, xz, bz2, and zstd formats, allowing compressed files to be flashed without manual pre-processing.
Documentation
- Flash behavior and supported decompression formats are documented in the flash operation help text.
Tests
- Added comprehensive tests covering compression detection and automatic decompression behavior across formats and stream scenarios.

netlify · 2025-12-29T13:07:20Z

✅ Deploy Preview for jumpstarter-docs ready!

Name	Link
🔨 Latest commit	`ed90de4`
🔍 Latest deploy log	https://app.netlify.com/projects/jumpstarter-docs/deploys/6952b8914837b8000802e3ae
😎 Deploy Preview	https://deploy-preview-789--jumpstarter-docs.netlify.app

coderabbitai · 2025-12-29T13:07:24Z

📝 Walkthrough

Walkthrough

Adds file-signature based compression detection and an async AutoDecompressIterator (gzip, xz, bz2, zstd), and integrates it into QemuFlasher.flash so input image streams are transparently decompressed before writing to the target.

Changes

Cohort / File(s)	Summary
Compression Detection & Decompression Infrastructure `packages/jumpstarter/jumpstarter/streams/encoding.py`	Added `FileSignature` dataclass, `COMPRESSION_SIGNATURES`, `SIGNATURE_BUFFER_SIZE`, and `detect_compression_from_signature()`. Added `create_decompressor()` factory and implemented `AutoDecompressIterator` (async iterator) that buffers initial bytes, detects format, and yields decompressed or passthrough data for gzip/xz/bz2/zstd.
QEMU Driver Integration `packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py`	Wrapped the input stream in `AutoDecompressIterator` inside `QemuFlasher.flash()` and added a docstring describing decompression behavior. No public signatures changed.
Tests `packages/jumpstarter/jumpstarter/streams/encoding_test.py`	Added tests for signature detection and AutoDecompressIterator (gzip, xz, bz2, zstd, passthrough, empty/small chunks, large data); adjusted zstd import handling.

Sequence Diagram(s)

sequenceDiagram
    participant Qemu as QemuFlasher.flash()
    participant Stream as Input AsyncIterator
    participant AutoDec as AutoDecompressIterator
    participant Factory as create_decompressor()
    participant Target as Target Stream/Writer

    Qemu->>AutoDec: wrap(Stream)
    Note over AutoDec: buffer initial bytes (SIGNATURE_BUFFER_SIZE)
    AutoDec->>AutoDec: inspect buffered signature

    alt compression detected
        AutoDec->>Factory: request decompressor(type)
        Factory-->>AutoDec: decompressor
        AutoDec->>AutoDec: decompress buffered + incoming chunks
        AutoDec-->>Target: yield decompressed chunks
    else no compression
        AutoDec-->>Target: yield original chunks (passthrough)
    end

    Note over Target: write to device/image

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

mangelajo

Poem

🐰 I sniffed the bits that hide and bind,

I peeked the header, found the kind.
I hop, I buffer, then unwind—
Gzip, xz, bz2, zstd, all resigned.
Streams flow light; the flash runs kind.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 55.17% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Add compressed image support for QEMU driver' directly and clearly describes the main change: adding support for compressed images in the QEMU driver.
Linked Issues check	✅ Passed	The PR implements transparent decompression for gzip, xz, bzip2, and zstd formats through FileSignature detection and AutoDecompressIterator, meeting the core coding requirement of supporting .gz and .xz packed images for the QEMU driver.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to implementing compressed image support: decompression utilities in encoding.py, AutoDecompressIterator integration in QEMU driver, and comprehensive tests for the new functionality.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

packages/jumpstarter/jumpstarter/streams/encoding.py (1)
62-72: Consider adding explicit handling for unmatched compression types.

The function relies on exhaustive pattern matching but lacks a default case. If a new Compression enum value is added without updating this function, it would silently return None.
🔎 Proposed improvement
 def create_decompressor(compression: Compression) -> Any:
     """Create a decompressor object for the given compression type."""
     match compression:
         case Compression.GZIP:
             return zlib.decompressobj(wbits=47)  # Auto-detect gzip/zlib
         case Compression.XZ:
             return lzma.LZMADecompressor()
         case Compression.BZ2:
             return bz2.BZ2Decompressor()
         case Compression.ZSTD:
             return zstd.ZstdDecompressor()
+        case _:
+            raise ValueError(f"Unsupported compression type: {compression}")

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b05be84 and fa7c213.

📒 Files selected for processing (2)

packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py
packages/jumpstarter/jumpstarter/streams/encoding.py

🧰 Additional context used

📓 Path-based instructions (3)

**/*.py

📄 CodeRabbit inference engine (.cursor/rules/project-structure.mdc)

Ruff should be used for code formatting and linting, excluding jumpstarter-protocol package

Files:

packages/jumpstarter/jumpstarter/streams/encoding.py
packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py

packages/jumpstarter-driver-*/jumpstarter_driver_*/driver.py

📄 CodeRabbit inference engine (.cursor/rules/creating-new-drivers.mdc)

Driver class names should be in CamelCase and be descriptive with appropriate suffixes based on functionality: Power drivers should end with *Power, Network drivers with *Network, Flasher drivers with *Flasher, Console drivers with *Console, Server drivers with *Server

Files:

packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py

packages/jumpstarter-driver-**/jumpstarter_driver_**/*.py

📄 CodeRabbit inference engine (.cursor/rules/creating-new-drivers.mdc)

Driver implementations should follow existing code style validated with make lint (fix with make lint-fix), perform static type checking with make ty-pkg-${package_name}, add comprehensive tests, and verify all tests pass with make test-pkg-${package_name} or make test

Files:

packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py

🧠 Learnings (1)

📚 Learning: 2025-11-27T09:58:41.875Z

Learnt from: CR
Repo: jumpstarter-dev/jumpstarter PR: 0
File: .cursor/rules/creating-new-drivers.mdc:0-0
Timestamp: 2025-11-27T09:58:41.875Z
Learning: Applies to packages/jumpstarter-driver-**/jumpstarter_driver_**/*.py : Driver implementations should follow existing code style validated with `make lint` (fix with `make lint-fix`), perform static type checking with `make ty-pkg-${package_name}`, add comprehensive tests, and verify all tests pass with `make test-pkg-${package_name}` or `make test`

Applied to files:

packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py

🧬 Code graph analysis (2)

packages/jumpstarter/jumpstarter/streams/encoding.py (1)

packages/jumpstarter-driver-iscsi/jumpstarter_driver_iscsi/driver.py (1)

decompress (318-357)

packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py (1)

packages/jumpstarter/jumpstarter/streams/encoding.py (1)

AutoDecompressIterator (155-216)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)

GitHub Check: Redirect rules - jumpstarter-docs
GitHub Check: Header rules - jumpstarter-docs
GitHub Check: Pages changed - jumpstarter-docs
GitHub Check: build
GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
GitHub Check: e2e
GitHub Check: pytest-matrix (macos-15, 3.13)
GitHub Check: pytest-matrix (macos-15, 3.12)
GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)

🔇 Additional comments (4)

packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py (1)

44-55: LGTM! Clean integration of auto-decompression.

The flash method correctly wraps the source stream with AutoDecompressIterator and the docstring accurately describes the new decompression capability. The implementation maintains the existing streaming behavior while adding transparent decompression support.

Minor style consideration: res.__aiter__() works but aiter(res) is the idiomatic built-in (Python 3.10+). Not required to change if you prefer explicitness here.

packages/jumpstarter/jumpstarter/streams/encoding.py (3)

26-44: LGTM! Well-structured signature definitions.

The FileSignature dataclass and COMPRESSION_SIGNATURES tuple are clean and correctly define the magic bytes for each compression format. The frozen dataclass is appropriate since signatures are immutable.

47-59: LGTM! Straightforward signature detection.

The function correctly iterates through signatures and uses startswith() for matching. Returning None for unknown formats is the right approach.

13-16: No changes needed. The backports-zstd>=1.1.0 dependency is correctly declared in pyproject.toml with the version constraint python_full_version < '3.14', matching the conditional import at lines 13-16.

packages/jumpstarter/jumpstarter/streams/encoding.py

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (1)

packages/jumpstarter/jumpstarter/streams/encoding.py (1)
200-202: Missing flush when stream exhausted during detection phase.

The flush logic at lines 209-214 only runs when StopAsyncIteration is caught in the main read loop. However, if the source stream is exhausted during _detect_compression() (small compressed file < 8 bytes), the decompressor is never flushed here.

Scenario:

Small gzip file fits entirely in buffer during detection

_exhausted = True, decompressor created

Buffer is decompressed and returned

Next call: buffer empty, _exhausted is True → raises immediately without flushing

This could cause data loss for trailing decompressed bytes.
🔎 Proposed fix
         # Stream exhausted
         if self._exhausted:
+            # Flush any remaining data from decompressor
+            if self._decompressor is not None and hasattr(self._decompressor, "flush"):
+                remaining = self._decompressor.flush()
+                self._decompressor = None
+                if remaining:
+                    return remaining
             raise StopAsyncIteration
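
The `hasattr` guard in the proposed fix matters because, among the standard-library decompressors, only the zlib object exposes `flush()`; `lzma.LZMADecompressor` and `bz2.BZ2Decompressor` do not. A minimal sketch of a format-agnostic helper follows, where `finish` is a hypothetical name rather than anything in the PR:

```python
import bz2
import gzip
import lzma
import zlib


def finish(decompressor) -> bytes:
    """Return bytes still buffered in the decompressor, if it supports flush()."""
    flush = getattr(decompressor, "flush", None)
    return flush() if callable(flush) else b""


# Usage: the same call works for every decompressor type.
gz = zlib.decompressobj(wbits=47)
result = gz.decompress(gzip.compress(b"abc")) + finish(gz)
```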

🧹 Nitpick comments (1)

packages/jumpstarter/jumpstarter/streams/encoding.py (1)
62-72: Consider adding a default case for defensive coding.

The match statement covers all current Compression enum values, but lacks a fallback. If the enum is ever extended without updating this function, it would silently return None.
🔎 Proposed fix
 def create_decompressor(compression: Compression) -> Any:
     """Create a decompressor object for the given compression type."""
     match compression:
         case Compression.GZIP:
             return zlib.decompressobj(wbits=47)  # Auto-detect gzip/zlib
         case Compression.XZ:
             return lzma.LZMADecompressor()
         case Compression.BZ2:
             return bz2.BZ2Decompressor()
         case Compression.ZSTD:
             return zstd.ZstdDecompressor()
+        case _:
+            raise ValueError(f"Unsupported compression type: {compression}")

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between fa7c213 and 3bda0f6.

📒 Files selected for processing (1)

packages/jumpstarter/jumpstarter/streams/encoding.py

🔇 Additional comments (6)

packages/jumpstarter/jumpstarter/streams/encoding.py (6)

26-31: LGTM!

The FileSignature frozen dataclass is a clean, immutable value object for associating magic bytes with compression types.

34-44: LGTM!

Magic bytes are correct for all four compression formats, and SIGNATURE_BUFFER_SIZE = 8 adequately covers the longest signature (xz at 6 bytes).

47-59: LGTM!

The detection logic is clean and handles edge cases gracefully—startswith correctly handles buffers shorter than the signature.

154-185: LGTM!

The detection phase correctly buffers initial bytes and handles early stream exhaustion. The class design with clear internal state fields is well-structured.

207-219: Flush logic for main loop is correctly implemented.

The handling here properly flushes the decompressor, guards against double-flush by setting _decompressor = None, and returns remaining bytes before raising StopAsyncIteration. Good implementation.

71-72: The ZstdDecompressor() usage is compatible across both backports.zstd and Python 3.14's compression.zstd. The backports.zstd package is explicitly designed as a backport of compression.zstd (PEP 784) and exposes the same API, including the ZstdDecompressor class with identical decompress(data) method signature for incremental decompression. Both libraries support the same streaming decompression behavior used in this code and in CompressedStream.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

packages/jumpstarter/jumpstarter/streams/encoding_test.py (2)
153-155: Consider a more idiomatic empty async generator.

The if False: yield pattern works but is unconventional. A clearer alternative would be:
async def empty_iter():
    return
    yield  # Unreachable, but makes this an async generator
100-166: Consider adding tests for malformed compressed data.

The current test suite comprehensively covers happy paths and basic edge cases. Consider adding tests for error handling scenarios:

Malformed/corrupted compressed data (e.g., valid signature but invalid compression stream)

Truncated compressed data (stream ends mid-decompression)

Data with valid signature but incorrect compression format

These tests would verify that the AutoDecompressIterator handles decompression errors gracefully and fails with appropriate error messages.
Example test cases to add
async def test_malformed_gzip(self):
    """Malformed gzip data should raise an error."""
    # Valid gzip signature followed by random data
    malformed = b"\x1f\x8b\x08" + os.urandom(20)
    with pytest.raises(Exception):  # Adjust exception type as needed
        await self._decompress_and_check(malformed, b"", chunk_size=16)

async def test_truncated_compressed_data(self):
    """Truncated compressed stream should raise an error."""
    original = b"hello world" * 100
    compressed = gzip.compress(original)
    truncated = compressed[:len(compressed) // 2]  # Cut in half
    with pytest.raises(Exception):  # Adjust exception type as needed
        await self._decompress_and_check(truncated, original, chunk_size=16)
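
One caveat for the truncation case: zlib's streaming decompressor does not raise on truncated input by itself. It returns whatever it could decode, and its `eof` attribute stays `False` until the end-of-stream marker has been seen. A minimal sketch of a strict check follows, assuming a hypothetical helper `decompress_strict` that is not part of the PR:

```python
import gzip
import zlib


def decompress_strict(data: bytes) -> bytes:
    """Decompress gzip data, rejecting streams that end before their trailer."""
    decompressor = zlib.decompressobj(wbits=47)  # 32 + 15: auto-detect gzip/zlib
    output = decompressor.decompress(data) + decompressor.flush()
    if not decompressor.eof:
        # No exception was raised above; truncation is only visible via `eof`.
        raise ValueError("compressed stream ended before its end-of-stream marker")
    return output


original = b"hello world" * 100
compressed = gzip.compress(original)
truncated = compressed[: len(compressed) // 2]
```

Whether `AutoDecompressIterator` performs an equivalent check is not shown in this thread, so the exception type expected by a truncation test would need to match the actual implementation.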

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 3bda0f6 and ed90de4.

📒 Files selected for processing (1)

packages/jumpstarter/jumpstarter/streams/encoding_test.py

🧬 Code graph analysis (1)

packages/jumpstarter/jumpstarter/streams/encoding_test.py (1)

packages/jumpstarter/jumpstarter/streams/encoding.py (4)

AutoDecompressIterator (155-222)

Compression (19-23)

compress_stream (124-151)

detect_compression_from_signature (47-59)

🔇 Additional comments (2)

packages/jumpstarter/jumpstarter/streams/encoding_test.py (2)

49-54: LGTM!

The helper function provides a clean abstraction for retrieving compression signatures in tests.

57-98: LGTM!

Comprehensive test coverage for signature detection including all compression formats and edge cases (empty input, truncated signatures, uncompressed data, and real compressed data).

evakhoni · 2025-12-30T12:17:54Z

cc @mangelajo @bennyz @bkhizgiy for review. thanks!

bkhizgiy · 2025-12-30T14:37:56Z

packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py

            async with self.resource(source) as res:
-                async for chunk in res:
+                # Wrap with auto-decompression to handle .gz, .xz, .bz2, .zstd files
+                async for chunk in AutoDecompressIterator(source=res.__aiter__()):


I think we can use res directly here, without the __aiter__() call. That would be clearer and more consistent: res is already an async iterator, so there's no need to call it explicitly.

bkhizgiy · 2025-12-30T14:47:28Z

packages/jumpstarter/jumpstarter/streams/encoding.py

+            data = self._buffer
+            self._buffer = b""
+            if self._decompressor is not None:
+                return self._decompressor.decompress(data)


I would consider improving the error handling a bit. The decompress() method is called without a try/except block, so decompression failures propagate as raw exceptions whose technical messages aren't user-friendly and can make the underlying issue hard to understand. I would cover the main cases and raise a more intuitive error for the user.
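
A minimal sketch of what such error handling could look like, covering the standard-library decompressors only. The exception class `DecompressionError` and the helper `decompress_chunk` are hypothetical names, not part of the PR:

```python
import bz2
import gzip
import lzma
import zlib


class DecompressionError(RuntimeError):
    """Hypothetical user-facing error for corrupted or invalid compressed data."""


# Exceptions raised by the stdlib decompressors on bad input:
# zlib.error (zlib/gzip), lzma.LZMAError (xz), OSError (bz2),
# EOFError (data passed after the end of the stream).
LOW_LEVEL_ERRORS = (zlib.error, lzma.LZMAError, OSError, EOFError)


def decompress_chunk(decompressor, data: bytes, format_name: str) -> bytes:
    """Decompress one chunk, translating low-level errors into a clearer one."""
    try:
        return decompressor.decompress(data)
    except LOW_LEVEL_ERRORS as exc:
        raise DecompressionError(
            f"Failed to decompress {format_name} image: the data appears to be "
            f"corrupted or is not valid {format_name} ({exc})"
        ) from exc


# Usage: valid data passes through the helper unchanged.
ok = decompress_chunk(zlib.decompressobj(wbits=47), gzip.compress(b"hi"), "gzip")
```

Chaining with `from exc` keeps the original traceback available for debugging while the message shown to the user names the format and the likely cause.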

Commits

decompress by signature (fa7c213)
flush decompressor at end-of-stream (3bda0f6)
unit tests (ed90de4)

bkhizgiy self-requested a review December 30, 2025 11:06
