Conversation

@evakhoni
Contributor

@evakhoni evakhoni commented Dec 29, 2025

Closes #737

Summary

Adds transparent decompression support for compressed disk images when flashing to the QEMU driver. Compressed images (.gz, .xz, .bz2, .zstd) are automatically detected and decompressed on the fly.

Changes

packages/jumpstarter/jumpstarter/streams/encoding.py

  • Added FileSignature dataclass to represent compression format signatures
  • Added COMPRESSION_SIGNATURES tuple with file signatures for gzip, xz, bz2, and zstd
  • Added detect_compression_from_signature() function for auto-detection
  • Added create_decompressor() helper function
  • Added AutoDecompressIterator async iterator that wraps a byte stream and transparently decompresses if needed

packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py

  • Modified QemuFlasher.flash() to wrap the source stream with AutoDecompressIterator

How it works

  1. When flash() is called, the first 8 bytes are buffered
  2. File signature detection matches against known compression formats
  3. If compressed: a decompressor is created and chunks are decompressed on the fly
  4. If uncompressed: data passes through unchanged
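
The four steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `SIGNATURES`, `SNIFF_SIZE`, `sniff`, `auto_decompress`, `chunks`, and `collect` are hypothetical names, it is written as an async generator rather than the class-based iterator described above, and zstd is left out because it needs Python 3.14 or a backport.

```python
import asyncio
import bz2
import gzip
import lzma
import zlib
from collections.abc import AsyncIterator

# Hypothetical signature table: magic bytes -> decompressor factory.
SIGNATURES = (
    (b"\x1f\x8b", lambda: zlib.decompressobj(wbits=47)),  # gzip
    (b"\xfd7zXZ\x00", lzma.LZMADecompressor),             # xz
    (b"BZh", bz2.BZ2Decompressor),                        # bz2
)
SNIFF_SIZE = 8  # mirrors the "first 8 bytes" in step 1


def sniff(head: bytes):
    """Return a fresh decompressor for a recognized signature, else None."""
    for magic, factory in SIGNATURES:
        if head.startswith(magic):
            return factory()
    return None


async def auto_decompress(source: AsyncIterator[bytes]) -> AsyncIterator[bytes]:
    # Step 1: buffer until at least SNIFF_SIZE bytes are available (or EOF).
    head = b""
    async for chunk in source:
        head += chunk
        if len(head) >= SNIFF_SIZE:
            break
    # Step 2: match the buffered bytes against the known signatures.
    decompressor = sniff(head)
    if decompressor is None:
        # Step 4: no signature matched, so pass the data through unchanged.
        if head:
            yield head
        async for chunk in source:
            yield chunk
        return
    # Step 3: decompress the buffered bytes, then each remaining chunk.
    out = decompressor.decompress(head)
    if out:
        yield out
    async for chunk in source:
        out = decompressor.decompress(chunk)
        if out:
            yield out
    # Only zlib objects have flush(); lzma and bz2 decompressors do not.
    if hasattr(decompressor, "flush"):
        tail = decompressor.flush()
        if tail:
            yield tail


async def chunks(data: bytes, size: int):
    """Hypothetical stand-in for a source stream that yields small chunks."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


async def collect(data: bytes, size: int = 5) -> bytes:
    return b"".join([c async for c in auto_decompress(chunks(data, size))])
```

Calling `asyncio.run(collect(gzip.compress(payload)))` returns the original payload, and uncompressed input comes back byte for byte. The sketch relies on `source` being an iterator that can be resumed after the detection loop breaks out of it.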

Summary by CodeRabbit

  • New Features
    • Flash operations now transparently accept and decompress images in gzip, xz, bz2, and zstd formats, allowing compressed files to be flashed without manual pre-processing.
  • Documentation
    • Flash behavior and supported decompression formats are documented in the flash operation help text.
  • Tests
    • Added comprehensive tests covering compression detection and automatic decompression behavior across formats and stream scenarios.

@netlify

netlify bot commented Dec 29, 2025

Deploy Preview for jumpstarter-docs ready!

Name Link
🔨 Latest commit ed90de4
🔍 Latest deploy log https://app.netlify.com/projects/jumpstarter-docs/deploys/6952b8914837b8000802e3ae
😎 Deploy Preview https://deploy-preview-789--jumpstarter-docs.netlify.app

@coderabbitai
Contributor

coderabbitai bot commented Dec 29, 2025

📝 Walkthrough

Adds file-signature based compression detection and an async AutoDecompressIterator (gzip, xz, bz2, zstd), and integrates it into QemuFlasher.flash so input image streams are transparently decompressed before writing to the target.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Compression Detection & Decompression Infrastructure (packages/jumpstarter/jumpstarter/streams/encoding.py) | Added FileSignature dataclass, COMPRESSION_SIGNATURES, SIGNATURE_BUFFER_SIZE, and detect_compression_from_signature(). Added create_decompressor() factory and implemented AutoDecompressIterator (async iterator) that buffers initial bytes, detects format, and yields decompressed or passthrough data for gzip/xz/bz2/zstd. |
| QEMU Driver Integration (packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py) | Wrapped the input stream in AutoDecompressIterator inside QemuFlasher.flash() and added a docstring describing decompression behavior. No public signatures changed. |
| Tests (packages/jumpstarter/jumpstarter/streams/encoding_test.py) | Added tests for signature detection and AutoDecompressIterator (gzip, xz, bz2, zstd, passthrough, empty/small chunks, large data); adjusted zstd import handling. |

Sequence Diagram(s)

sequenceDiagram
    participant Qemu as QemuFlasher.flash()
    participant Stream as Input AsyncIterator
    participant AutoDec as AutoDecompressIterator
    participant Factory as create_decompressor()
    participant Target as Target Stream/Writer

    Qemu->>AutoDec: wrap(Stream)
    Note over AutoDec: buffer initial bytes (SIGNATURE_BUFFER_SIZE)
    AutoDec->>AutoDec: inspect buffered signature

    alt compression detected
        AutoDec->>Factory: request decompressor(type)
        Factory-->>AutoDec: decompressor
        AutoDec->>AutoDec: decompress buffered + incoming chunks
        AutoDec-->>Target: yield decompressed chunks
    else no compression
        AutoDec-->>Target: yield original chunks (passthrough)
    end

    Note over Target: write to device/image

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • mangelajo

Poem

🐰 I sniffed the bits that hide and bind,

I peeked the header, found the kind.
I hop, I buffer, then unwind—
Gzip, xz, bz2, zstd, all resigned.
Streams flow light; the flash runs kind.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 55.17% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Add compressed image support for QEMU driver' directly and clearly describes the main change: adding support for compressed images in the QEMU driver. |
| Linked Issues check | ✅ Passed | The PR implements transparent decompression for gzip, xz, bzip2, and zstd formats through FileSignature detection and AutoDecompressIterator, meeting the core coding requirement of supporting .gz and .xz packed images for the QEMU driver. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to implementing compressed image support: decompression utilities in encoding.py, AutoDecompressIterator integration in QEMU driver, and comprehensive tests for the new functionality. |

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
packages/jumpstarter/jumpstarter/streams/encoding.py (1)

62-72: Consider adding explicit handling for unmatched compression types.

The function relies on exhaustive pattern matching but lacks a default case. If a new Compression enum value is added without updating this function, it would silently return None.

🔎 Proposed improvement
 def create_decompressor(compression: Compression) -> Any:
     """Create a decompressor object for the given compression type."""
     match compression:
         case Compression.GZIP:
             return zlib.decompressobj(wbits=47)  # Auto-detect gzip/zlib
         case Compression.XZ:
             return lzma.LZMADecompressor()
         case Compression.BZ2:
             return bz2.BZ2Decompressor()
         case Compression.ZSTD:
             return zstd.ZstdDecompressor()
+        case _:
+            raise ValueError(f"Unsupported compression type: {compression}")
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b05be84 and fa7c213.

📒 Files selected for processing (2)
  • packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py
  • packages/jumpstarter/jumpstarter/streams/encoding.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/project-structure.mdc)

Ruff should be used for code formatting and linting, excluding jumpstarter-protocol package

Files:

  • packages/jumpstarter/jumpstarter/streams/encoding.py
  • packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py
packages/jumpstarter-driver-*/jumpstarter_driver_*/driver.py

📄 CodeRabbit inference engine (.cursor/rules/creating-new-drivers.mdc)

Driver class names should be in CamelCase and be descriptive with appropriate suffixes based on functionality: Power drivers should end with *Power, Network drivers with *Network, Flasher drivers with *Flasher, Console drivers with *Console, Server drivers with *Server

Files:

  • packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py
packages/jumpstarter-driver-**/jumpstarter_driver_**/*.py

📄 CodeRabbit inference engine (.cursor/rules/creating-new-drivers.mdc)

Driver implementations should follow existing code style validated with make lint (fix with make lint-fix), perform static type checking with make ty-pkg-${package_name}, add comprehensive tests, and verify all tests pass with make test-pkg-${package_name} or make test

Files:

  • packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py
🧠 Learnings (1)
📚 Learning: 2025-11-27T09:58:41.875Z
Learnt from: CR
Repo: jumpstarter-dev/jumpstarter PR: 0
File: .cursor/rules/creating-new-drivers.mdc:0-0
Timestamp: 2025-11-27T09:58:41.875Z
Learning: Applies to packages/jumpstarter-driver-**/jumpstarter_driver_**/*.py : Driver implementations should follow existing code style validated with `make lint` (fix with `make lint-fix`), perform static type checking with `make ty-pkg-${package_name}`, add comprehensive tests, and verify all tests pass with `make test-pkg-${package_name}` or `make test`

Applied to files:

  • packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py
🧬 Code graph analysis (2)
packages/jumpstarter/jumpstarter/streams/encoding.py (1)
packages/jumpstarter-driver-iscsi/jumpstarter_driver_iscsi/driver.py (1)
  • decompress (318-357)
packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py (1)
packages/jumpstarter/jumpstarter/streams/encoding.py (1)
  • AutoDecompressIterator (155-216)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Redirect rules - jumpstarter-docs
  • GitHub Check: Header rules - jumpstarter-docs
  • GitHub Check: Pages changed - jumpstarter-docs
  • GitHub Check: build
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
  • GitHub Check: e2e
  • GitHub Check: pytest-matrix (macos-15, 3.13)
  • GitHub Check: pytest-matrix (macos-15, 3.12)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
🔇 Additional comments (4)
packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py (1)

44-55: LGTM! Clean integration of auto-decompression.

The flash method correctly wraps the source stream with AutoDecompressIterator and the docstring accurately describes the new decompression capability. The implementation maintains the existing streaming behavior while adding transparent decompression support.

Minor style consideration: res.__aiter__() works but aiter(res) is the idiomatic built-in (Python 3.10+). Not required to change if you prefer explicitness here.

packages/jumpstarter/jumpstarter/streams/encoding.py (3)

26-44: LGTM! Well-structured signature definitions.

The FileSignature dataclass and COMPRESSION_SIGNATURES tuple are clean and correctly define the magic bytes for each compression format. The frozen dataclass is appropriate since signatures are immutable.


47-59: LGTM! Straightforward signature detection.

The function correctly iterates through signatures and uses startswith() for matching. Returning None for unknown formats is the right approach.


13-16: No changes needed. The backports-zstd>=1.1.0 dependency is correctly declared in pyproject.toml with the version constraint python_full_version < '3.14', matching the conditional import at lines 13-16.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
packages/jumpstarter/jumpstarter/streams/encoding.py (1)

200-202: Missing flush when stream exhausted during detection phase.

The flush logic at lines 209-214 only runs when StopAsyncIteration is caught in the main read loop. However, if the source stream is exhausted during _detect_compression() (small compressed file < 8 bytes), the decompressor is never flushed here.

Scenario:

  1. Small gzip file fits entirely in buffer during detection
  2. _exhausted = True, decompressor created
  3. Buffer is decompressed and returned
  4. Next call: buffer empty, _exhausted is True → raises immediately without flushing

This could cause data loss for trailing decompressed bytes.

🔎 Proposed fix
         # Stream exhausted
         if self._exhausted:
+            # Flush any remaining data from decompressor
+            if self._decompressor is not None and hasattr(self._decompressor, "flush"):
+                remaining = self._decompressor.flush()
+                self._decompressor = None
+                if remaining:
+                    return remaining
             raise StopAsyncIteration
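
The `hasattr` guard in the proposed fix is needed because the standard-library decompressor objects do not share one interface: zlib decompression objects have `flush()`, while `lzma.LZMADecompressor` and `bz2.BZ2Decompressor` do not. A minimal check, with `finish` as a hypothetical helper name:

```python
import bz2
import gzip
import lzma
import zlib


def finish(decompressor) -> bytes:
    """Return trailing output, tolerating decompressors without flush()."""
    flush = getattr(decompressor, "flush", None)
    return flush() if callable(flush) else b""


payload = b"small image" * 10

# A small gzip file handed over in one piece, as in the scenario above.
gz = zlib.decompressobj(wbits=47)
gz_out = gz.decompress(gzip.compress(payload)) + finish(gz)

# lzma has no flush(); finish() simply contributes nothing.
xz = lzma.LZMADecompressor()
xz_out = xz.decompress(lzma.compress(payload)) + finish(xz)
```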
🧹 Nitpick comments (1)
packages/jumpstarter/jumpstarter/streams/encoding.py (1)

62-72: Consider adding a default case for defensive coding.

The match statement covers all current Compression enum values, but lacks a fallback. If the enum is ever extended without updating this function, it would silently return None.

🔎 Proposed fix
 def create_decompressor(compression: Compression) -> Any:
     """Create a decompressor object for the given compression type."""
     match compression:
         case Compression.GZIP:
             return zlib.decompressobj(wbits=47)  # Auto-detect gzip/zlib
         case Compression.XZ:
             return lzma.LZMADecompressor()
         case Compression.BZ2:
             return bz2.BZ2Decompressor()
         case Compression.ZSTD:
             return zstd.ZstdDecompressor()
+        case _:
+            raise ValueError(f"Unsupported compression type: {compression}")
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between fa7c213 and 3bda0f6.

📒 Files selected for processing (1)
  • packages/jumpstarter/jumpstarter/streams/encoding.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/project-structure.mdc)

Ruff should be used for code formatting and linting, excluding jumpstarter-protocol package

Files:

  • packages/jumpstarter/jumpstarter/streams/encoding.py
🧬 Code graph analysis (1)
packages/jumpstarter/jumpstarter/streams/encoding.py (1)
packages/jumpstarter-driver-iscsi/jumpstarter_driver_iscsi/driver.py (1)
  • decompress (318-357)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Redirect rules - jumpstarter-docs
  • GitHub Check: Header rules - jumpstarter-docs
  • GitHub Check: Pages changed - jumpstarter-docs
  • GitHub Check: build
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
  • GitHub Check: pytest-matrix (macos-15, 3.13)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
  • GitHub Check: pytest-matrix (macos-15, 3.12)
  • GitHub Check: pytest-matrix (macos-15, 3.11)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
  • GitHub Check: e2e
🔇 Additional comments (6)
packages/jumpstarter/jumpstarter/streams/encoding.py (6)

26-31: LGTM!

The FileSignature frozen dataclass is a clean, immutable value object for associating magic bytes with compression types.
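
As a rough picture of what a frozen value object buys here: the class below is a hypothetical `Signature` with assumed field names, not the PR's `FileSignature` definition.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Signature:  # hypothetical; the real FileSignature fields may differ
    magic: bytes
    name: str


GZIP = Signature(magic=b"\x1f\x8b", name="gzip")

try:
    GZIP.magic = b"\x00\x00"  # frozen instances reject attribute assignment
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Frozen dataclasses with the default `eq=True` are also hashable, so instances can be stored in sets or used as dictionary keys.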


34-44: LGTM!

Magic bytes are correct for all four compression formats, and SIGNATURE_BUFFER_SIZE = 8 adequately covers the longest signature (xz at 6 bytes).
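
The leading bytes can be checked against the standard-library compressors. The `MAGIC` table below lists the commonly documented signatures and is an assumption about what the PR stores (for example, it may match more than two bytes for gzip); zstd is listed but not exercised because it is outside the standard library before Python 3.14.

```python
import bz2
import gzip
import lzma

# Commonly documented magic bytes (assumed, not copied from the PR).
MAGIC = {
    "gzip": b"\x1f\x8b",
    "xz": b"\xfd7zXZ\x00",
    "bz2": b"BZh",
    "zstd": b"\x28\xb5\x2f\xfd",
}
BUFFER_SIZE = 8  # value of SIGNATURE_BUFFER_SIZE quoted in the review

# Real output from the stdlib compressors, to compare the prefixes against.
samples = {
    "gzip": gzip.compress(b"x"),
    "xz": lzma.compress(b"x"),
    "bz2": bz2.compress(b"x"),
}
matches = {name: data.startswith(MAGIC[name]) for name, data in samples.items()}
longest = max(len(m) for m in MAGIC.values())
```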


47-59: LGTM!

The detection logic is clean and handles edge cases gracefully—startswith correctly handles buffers shorter than the signature.
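
A short illustration of the short-buffer behavior mentioned above, using a hypothetical `looks_like_xz` helper: `bytes.startswith` returns `False` when the buffer is shorter than the prefix, so no length check is needed.

```python
XZ_MAGIC = b"\xfd7zXZ\x00"


def looks_like_xz(buf: bytes) -> bool:
    # A buffer shorter than the prefix can never start with it.
    return buf.startswith(XZ_MAGIC)


empty = looks_like_xz(b"")
partial = looks_like_xz(XZ_MAGIC[:3])       # truncated signature
full = looks_like_xz(XZ_MAGIC + b"payload")
```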


154-185: LGTM!

The detection phase correctly buffers initial bytes and handles early stream exhaustion. The class design with clear internal state fields is well-structured.


207-219: Flush logic for main loop is correctly implemented.

The handling here properly flushes the decompressor, guards against double-flush by setting _decompressor = None, and returns remaining bytes before raising StopAsyncIteration. Good implementation.


71-72: The ZstdDecompressor() usage is compatible across both backports.zstd and Python 3.14's compression.zstd. The backports.zstd package is explicitly designed as a backport of compression.zstd (PEP 784) and exposes the same API, including the ZstdDecompressor class with identical decompress(data) method signature for incremental decompression. Both libraries support the same streaming decompression behavior used in this code and in CompressedStream.
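
A version-gated import of this kind might look like the sketch below. It is not the code at lines 13-16 of encoding.py, which is not shown in this thread; the `ImportError` fallback to `None` and the `roundtrip` helper are added only so the snippet runs on machines without zstd support.

```python
import sys

try:
    if sys.version_info >= (3, 14):
        from compression import zstd  # standard library since 3.14 (PEP 784)
    else:
        from backports import zstd  # assumes backports.zstd is installed
except ImportError:
    zstd = None  # illustration only: keep the snippet importable anywhere


def roundtrip(data: bytes) -> bytes:
    """Compress and decompress with whichever zstd module was imported."""
    if zstd is None:
        return data  # nothing to exercise without a zstd module
    return zstd.ZstdDecompressor().decompress(zstd.compress(data))
```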

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
packages/jumpstarter/jumpstarter/streams/encoding_test.py (2)

153-155: Consider a more idiomatic empty async generator.

The if False: yield pattern works but is unconventional. A clearer alternative would be:

async def empty_iter():
    return
    yield  # Make it a generator

Or, looping over an empty tuple so that the yield is never reached:

async def empty_iter():
    for _ in ():
        yield
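
A quick check of how these shapes behave. The function names are hypothetical. The presence of `yield` anywhere in the body is what makes the function an async generator; whether it produces items depends on whether the `yield` is ever reached. A reachable bare `yield` produces a single `None`, so placing `yield` after `pass` does not give an empty generator.

```python
import asyncio


async def empty_iter():
    return
    yield  # unreachable, but makes this an async generator function


async def one_none():
    pass
    yield  # reachable: this generator produces one None


async def collect(agen) -> list:
    return [item async for item in agen]


empty_items = asyncio.run(collect(empty_iter()))
none_items = asyncio.run(collect(one_none()))
```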

100-166: Consider adding tests for malformed compressed data.

The current test suite comprehensively covers happy paths and basic edge cases. Consider adding tests for error handling scenarios:

  • Malformed/corrupted compressed data (e.g., valid signature but invalid compression stream)
  • Truncated compressed data (stream ends mid-decompression)
  • Data with valid signature but incorrect compression format

These tests would verify that the AutoDecompressIterator handles decompression errors gracefully and fails with appropriate error messages.

Example test cases to add
async def test_malformed_gzip(self):
    """Malformed gzip data should raise an error."""
    # Valid gzip signature followed by random data
    malformed = b"\x1f\x8b\x08" + os.urandom(20)
    with pytest.raises(Exception):  # Adjust exception type as needed
        await self._decompress_and_check(malformed, b"", chunk_size=16)

async def test_truncated_compressed_data(self):
    """Truncated compressed stream should raise an error."""
    original = b"hello world" * 100
    compressed = gzip.compress(original)
    truncated = compressed[:len(compressed) // 2]  # Cut in half
    with pytest.raises(Exception):  # Adjust exception type as needed
        await self._decompress_and_check(truncated, original, chunk_size=16)
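
One caveat when choosing the expected exception for such tests, shown with plain `zlib` rather than the PR's iterator: a streaming zlib decompression object does not raise on truncated input. It returns the partial output and leaves `eof` set to `False`. A corrupted checksum, by contrast, raises `zlib.error`. The helper name `stream_decompress` is hypothetical.

```python
import gzip
import zlib

original = b"hello world" * 100
compressed = gzip.compress(original)


def stream_decompress(data: bytes) -> tuple[bytes, bool]:
    """Decompress with a streaming zlib object; return (output, reached_eof)."""
    d = zlib.decompressobj(wbits=47)
    out = d.decompress(data) + d.flush()
    return out, d.eof


# Complete stream: full output and eof is True.
full_out, full_eof = stream_decompress(compressed)

# Truncated stream: no exception, but eof stays False.
cut_out, cut_eof = stream_decompress(compressed[: len(compressed) // 2])

# Corrupt the stored CRC32 (first byte of the 8-byte gzip trailer).
bad = bytearray(compressed)
bad[-8] ^= 0xFF
try:
    stream_decompress(bytes(bad))
    crc_error = False
except zlib.error:
    crc_error = True
```

If the iterator under test is built on the same primitive, detecting truncation would need an explicit `eof` check at end of stream rather than relying on an exception from `decompress()`.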
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 3bda0f6 and ed90de4.

📒 Files selected for processing (1)
  • packages/jumpstarter/jumpstarter/streams/encoding_test.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/project-structure.mdc)

Ruff should be used for code formatting and linting, excluding jumpstarter-protocol package

Files:

  • packages/jumpstarter/jumpstarter/streams/encoding_test.py
🧬 Code graph analysis (1)
packages/jumpstarter/jumpstarter/streams/encoding_test.py (1)
packages/jumpstarter/jumpstarter/streams/encoding.py (4)
  • AutoDecompressIterator (155-222)
  • Compression (19-23)
  • compress_stream (124-151)
  • detect_compression_from_signature (47-59)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Redirect rules - jumpstarter-docs
  • GitHub Check: Header rules - jumpstarter-docs
  • GitHub Check: Pages changed - jumpstarter-docs
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
  • GitHub Check: pytest-matrix (macos-15, 3.12)
  • GitHub Check: pytest-matrix (macos-15, 3.13)
  • GitHub Check: pytest-matrix (macos-15, 3.11)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
  • GitHub Check: e2e
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
  • GitHub Check: build
🔇 Additional comments (2)
packages/jumpstarter/jumpstarter/streams/encoding_test.py (2)

49-54: LGTM!

The helper function provides a clean abstraction for retrieving compression signatures in tests.


57-98: LGTM!

Comprehensive test coverage for signature detection including all compression formats and edge cases (empty input, truncated signatures, uncompressed data, and real compressed data).

@bkhizgiy bkhizgiy self-requested a review December 30, 2025 11:06
@evakhoni
Contributor Author

cc @mangelajo @bennyz @bkhizgiy for review. thanks!

 async with self.resource(source) as res:
-    async for chunk in res:
+    # Wrap with auto-decompression to handle .gz, .xz, .bz2, .zstd files
+    async for chunk in AutoDecompressIterator(source=res.__aiter__()):
Contributor

I think we can use res directly here, without the __aiter__() call. That would be clearer and more consistent: res is already an async iterator, so there is no need to call __aiter__() explicitly.

data = self._buffer
self._buffer = b""
if self._decompressor is not None:
    return self._decompressor.decompress(data)
Contributor

@bkhizgiy bkhizgiy Dec 30, 2025

I would consider improving the error handling here. decompress() is called without a try/except block, so decompression failures propagate as raw exceptions with low-level messages that are not user-friendly and can make the underlying problem hard to understand. I would cover the main failure cases and raise a more intuitive error for the user.
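
One possible shape for this, as a sketch only: `DecompressionError` and `safe_decompress` are hypothetical names, and the list of caught exceptions covers what the standard-library decompressors raise (`zlib.error`, `lzma.LZMAError`, and `OSError` from bz2); a zstd module would need its own exception type added.

```python
import bz2
import lzma
import zlib


class DecompressionError(Exception):
    """Hypothetical user-facing error for failed image decompression."""


# Low-level errors raised by the stdlib decompressors on bad input.
_LOW_LEVEL_ERRORS = (zlib.error, lzma.LZMAError, OSError, EOFError)


def safe_decompress(decompressor, data: bytes, fmt: str) -> bytes:
    """Call decompress() and translate low-level failures into one error type."""
    try:
        return decompressor.decompress(data)
    except _LOW_LEVEL_ERRORS as exc:
        raise DecompressionError(
            f"Failed to decompress {fmt} image: the data appears to be "
            f"corrupted or is not valid {fmt} ({exc})"
        ) from exc


# A valid gzip header followed by an invalid deflate block type.
bad_gzip = b"\x1f\x8b\x08\x00" + b"\x00" * 6 + b"\xff" * 16
# A valid bz2 stream header followed by a bad block marker.
bad_bz2 = b"BZh9" + b"\x00" * 32
```

Chaining with `from exc` keeps the original exception available in the traceback for debugging while the message shown to the user names the format and the likely cause.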

Development

Successfully merging this pull request may close these issues.

qemu driver support for .gz .xz packed images

2 participants