-
Notifications
You must be signed in to change notification settings - Fork 21
Add compressed image support for QEMU driver #789
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for jumpstarter-docs ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
📝 WalkthroughWalkthroughAdds file-signature based compression detection and an async AutoDecompressIterator (gzip, xz, bz2, zstd), and integrates it into QemuFlasher.flash so input image streams are transparently decompressed before writing to the target. Changes
Sequence Diagram(s)sequenceDiagram
participant Qemu as QemuFlasher.flash()
participant Stream as Input AsyncIterator
participant AutoDec as AutoDecompressIterator
participant Factory as create_decompressor()
participant Target as Target Stream/Writer
Qemu->>AutoDec: wrap(Stream)
Note over AutoDec: buffer initial bytes (SIGNATURE_BUFFER_SIZE)
AutoDec->>AutoDec: inspect buffered signature
alt compression detected
AutoDec->>Factory: request decompressor(type)
Factory-->>AutoDec: decompressor
AutoDec->>AutoDec: decompress buffered + incoming chunks
AutoDec-->>Target: yield decompressed chunks
else no compression
AutoDec-->>Target: yield original chunks (passthrough)
end
Note over Target: write to device/image
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/jumpstarter/jumpstarter/streams/encoding.py (1)
62-72: Consider adding explicit handling for unmatched compression types.The function relies on exhaustive pattern matching but lacks a default case. If a new
Compressionenum value is added without updating this function, it would silently returnNone.🔎 Proposed improvement
def create_decompressor(compression: Compression) -> Any: """Create a decompressor object for the given compression type.""" match compression: case Compression.GZIP: return zlib.decompressobj(wbits=47) # Auto-detect gzip/zlib case Compression.XZ: return lzma.LZMADecompressor() case Compression.BZ2: return bz2.BZ2Decompressor() case Compression.ZSTD: return zstd.ZstdDecompressor() + case _: + raise ValueError(f"Unsupported compression type: {compression}")
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.pypackages/jumpstarter/jumpstarter/streams/encoding.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/project-structure.mdc)
Ruff should be used for code formatting and linting, excluding
jumpstarter-protocolpackage
Files:
packages/jumpstarter/jumpstarter/streams/encoding.pypackages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py
packages/jumpstarter-driver-*/jumpstarter_driver_*/driver.py
📄 CodeRabbit inference engine (.cursor/rules/creating-new-drivers.mdc)
Driver class names should be in CamelCase and be descriptive with appropriate suffixes based on functionality: Power drivers should end with
*Power, Network drivers with*Network, Flasher drivers with*Flasher, Console drivers with*Console, Server drivers with*Server
Files:
packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py
packages/jumpstarter-driver-**/jumpstarter_driver_**/*.py
📄 CodeRabbit inference engine (.cursor/rules/creating-new-drivers.mdc)
Driver implementations should follow existing code style validated with
make lint(fix withmake lint-fix), perform static type checking withmake ty-pkg-${package_name}, add comprehensive tests, and verify all tests pass withmake test-pkg-${package_name}ormake test
Files:
packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py
🧠 Learnings (1)
📚 Learning: 2025-11-27T09:58:41.875Z
Learnt from: CR
Repo: jumpstarter-dev/jumpstarter PR: 0
File: .cursor/rules/creating-new-drivers.mdc:0-0
Timestamp: 2025-11-27T09:58:41.875Z
Learning: Applies to packages/jumpstarter-driver-**/jumpstarter_driver_**/*.py : Driver implementations should follow existing code style validated with `make lint` (fix with `make lint-fix`), perform static type checking with `make ty-pkg-${package_name}`, add comprehensive tests, and verify all tests pass with `make test-pkg-${package_name}` or `make test`
Applied to files:
packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py
🧬 Code graph analysis (2)
packages/jumpstarter/jumpstarter/streams/encoding.py (1)
packages/jumpstarter-driver-iscsi/jumpstarter_driver_iscsi/driver.py (1)
decompress(318-357)
packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py (1)
packages/jumpstarter/jumpstarter/streams/encoding.py (1)
AutoDecompressIterator(155-216)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Redirect rules - jumpstarter-docs
- GitHub Check: Header rules - jumpstarter-docs
- GitHub Check: Pages changed - jumpstarter-docs
- GitHub Check: build
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
- GitHub Check: e2e
- GitHub Check: pytest-matrix (macos-15, 3.13)
- GitHub Check: pytest-matrix (macos-15, 3.12)
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
🔇 Additional comments (4)
packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py (1)
44-55: LGTM! Clean integration of auto-decompression.The flash method correctly wraps the source stream with
AutoDecompressIteratorand the docstring accurately describes the new decompression capability. The implementation maintains the existing streaming behavior while adding transparent decompression support.Minor style consideration:
res.__aiter__()works butaiter(res)is the idiomatic built-in (Python 3.10+). Not required to change if you prefer explicitness here.packages/jumpstarter/jumpstarter/streams/encoding.py (3)
26-44: LGTM! Well-structured signature definitions.The
FileSignaturedataclass andCOMPRESSION_SIGNATUREStuple are clean and correctly define the magic bytes for each compression format. The frozen dataclass is appropriate since signatures are immutable.
47-59: LGTM! Straightforward signature detection.The function correctly iterates through signatures and uses
startswith()for matching. ReturningNonefor unknown formats is the right approach.
13-16: No changes needed. Thebackports-zstd>=1.1.0dependency is correctly declared inpyproject.tomlwith the version constraintpython_full_version < '3.14', matching the conditional import at lines 13-16.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
packages/jumpstarter/jumpstarter/streams/encoding.py (1)
200-202: Missing flush when stream exhausted during detection phase.The flush logic at lines 209-214 only runs when
StopAsyncIterationis caught in the main read loop. However, if the source stream is exhausted during_detect_compression()(small compressed file < 8 bytes), the decompressor is never flushed here.Scenario:
- Small gzip file fits entirely in buffer during detection
_exhausted = True, decompressor created- Buffer is decompressed and returned
- Next call: buffer empty,
_exhaustedis True → raises immediately without flushingThis could cause data loss for trailing decompressed bytes.
🔎 Proposed fix
# Stream exhausted if self._exhausted: + # Flush any remaining data from decompressor + if self._decompressor is not None and hasattr(self._decompressor, "flush"): + remaining = self._decompressor.flush() + self._decompressor = None + if remaining: + return remaining raise StopAsyncIteration
🧹 Nitpick comments (1)
packages/jumpstarter/jumpstarter/streams/encoding.py (1)
62-72: Consider adding a default case for defensive coding.The match statement covers all current
Compressionenum values, but lacks a fallback. If the enum is ever extended without updating this function, it would silently returnNone.🔎 Proposed fix
def create_decompressor(compression: Compression) -> Any: """Create a decompressor object for the given compression type.""" match compression: case Compression.GZIP: return zlib.decompressobj(wbits=47) # Auto-detect gzip/zlib case Compression.XZ: return lzma.LZMADecompressor() case Compression.BZ2: return bz2.BZ2Decompressor() case Compression.ZSTD: return zstd.ZstdDecompressor() + case _: + raise ValueError(f"Unsupported compression type: {compression}")
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
packages/jumpstarter/jumpstarter/streams/encoding.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/project-structure.mdc)
Ruff should be used for code formatting and linting, excluding
jumpstarter-protocolpackage
Files:
packages/jumpstarter/jumpstarter/streams/encoding.py
🧬 Code graph analysis (1)
packages/jumpstarter/jumpstarter/streams/encoding.py (1)
packages/jumpstarter-driver-iscsi/jumpstarter_driver_iscsi/driver.py (1)
decompress(318-357)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Redirect rules - jumpstarter-docs
- GitHub Check: Header rules - jumpstarter-docs
- GitHub Check: Pages changed - jumpstarter-docs
- GitHub Check: build
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
- GitHub Check: pytest-matrix (macos-15, 3.13)
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
- GitHub Check: pytest-matrix (macos-15, 3.12)
- GitHub Check: pytest-matrix (macos-15, 3.11)
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
- GitHub Check: e2e
🔇 Additional comments (6)
packages/jumpstarter/jumpstarter/streams/encoding.py (6)
26-31: LGTM!The
FileSignaturefrozen dataclass is a clean, immutable value object for associating magic bytes with compression types.
34-44: LGTM!Magic bytes are correct for all four compression formats, and
SIGNATURE_BUFFER_SIZE = 8adequately covers the longest signature (xz at 6 bytes).
47-59: LGTM!The detection logic is clean and handles edge cases gracefully—
startswithcorrectly handles buffers shorter than the signature.
154-185: LGTM!The detection phase correctly buffers initial bytes and handles early stream exhaustion. The class design with clear internal state fields is well-structured.
207-219: Flush logic for main loop is correctly implemented.The handling here properly flushes the decompressor, guards against double-flush by setting
_decompressor = None, and returns remaining bytes before raisingStopAsyncIteration. Good implementation.
71-72: TheZstdDecompressor()usage is compatible across bothbackports.zstdand Python 3.14'scompression.zstd. Thebackports.zstdpackage is explicitly designed as a backport ofcompression.zstd(PEP 784) and exposes the same API, including theZstdDecompressorclass with identicaldecompress(data)method signature for incremental decompression. Both libraries support the same streaming decompression behavior used in this code and inCompressedStream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
packages/jumpstarter/jumpstarter/streams/encoding_test.py (2)
153-155: Consider a more idiomatic empty async generator.The
if False: yieldpattern works but is unconventional. A clearer alternative would be:async def empty_iter(): return yield # Make it a generatorOr simply:
async def empty_iter(): pass yield
100-166: Consider adding tests for malformed compressed data.The current test suite comprehensively covers happy paths and basic edge cases. Consider adding tests for error handling scenarios:
- Malformed/corrupted compressed data (e.g., valid signature but invalid compression stream)
- Truncated compressed data (stream ends mid-decompression)
- Data with valid signature but incorrect compression format
These tests would verify that the
AutoDecompressIteratorhandles decompression errors gracefully and fails with appropriate error messages.Example test cases to add
async def test_malformed_gzip(self): """Malformed gzip data should raise an error.""" # Valid gzip signature followed by random data malformed = b"\x1f\x8b\x08" + os.urandom(20) with pytest.raises(Exception): # Adjust exception type as needed await self._decompress_and_check(malformed, b"", chunk_size=16) async def test_truncated_compressed_data(self): """Truncated compressed stream should raise an error.""" original = b"hello world" * 100 compressed = gzip.compress(original) truncated = compressed[:len(compressed) // 2] # Cut in half with pytest.raises(Exception): # Adjust exception type as needed await self._decompress_and_check(truncated, original, chunk_size=16)
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
packages/jumpstarter/jumpstarter/streams/encoding_test.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/project-structure.mdc)
Ruff should be used for code formatting and linting, excluding
jumpstarter-protocolpackage
Files:
packages/jumpstarter/jumpstarter/streams/encoding_test.py
🧬 Code graph analysis (1)
packages/jumpstarter/jumpstarter/streams/encoding_test.py (1)
packages/jumpstarter/jumpstarter/streams/encoding.py (4)
AutoDecompressIterator(155-222)Compression(19-23)compress_stream(124-151)detect_compression_from_signature(47-59)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Redirect rules - jumpstarter-docs
- GitHub Check: Header rules - jumpstarter-docs
- GitHub Check: Pages changed - jumpstarter-docs
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
- GitHub Check: pytest-matrix (macos-15, 3.12)
- GitHub Check: pytest-matrix (macos-15, 3.13)
- GitHub Check: pytest-matrix (macos-15, 3.11)
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
- GitHub Check: e2e
- GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
- GitHub Check: build
🔇 Additional comments (2)
packages/jumpstarter/jumpstarter/streams/encoding_test.py (2)
49-54: LGTM!The helper function provides a clean abstraction for retrieving compression signatures in tests.
57-98: LGTM!Comprehensive test coverage for signature detection including all compression formats and edge cases (empty input, truncated signatures, uncompressed data, and real compressed data).
|
cc @mangelajo @bennyz @bkhizgiy for review. thanks! |
| async with self.resource(source) as res: | ||
| async for chunk in res: | ||
| # Wrap with auto-decompression to handle .gz, .xz, .bz2, .zstd files | ||
| async for chunk in AutoDecompressIterator(source=res.__aiter__()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can use res directly here, without the __aiter__, I think it's more clearer and consistent since it's already an async iterator, so there's no need to set it explicitly.
| data = self._buffer | ||
| self._buffer = b"" | ||
| if self._decompressor is not None: | ||
| return self._decompressor.decompress(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would consider improving a bit the error handling, the decompress() method is called without try/except blocks, which means decompression failures will propagate as raw exceptions with technical error messages that aren't user-friendly and may be a bit unclear when trying to understand the issue. I would cover the main cases and add a more intuitive error for the user.
Closes #737
Summary
Adds transparent decompression support for compressed disk images when flashing to the QEMU driver. Compressed images (
.gz,.xz,.bz2,.zstd) are automatically detected and decompressed on the fly.Changes
packages/jumpstarter/jumpstarter/streams/encoding.pyFileSignaturedataclass to represent compression format signaturesCOMPRESSION_SIGNATUREStuple with file signatures for gzip, xz, bz2, and zstddetect_compression_from_signature()function for auto-detectioncreate_decompressor()helper functionAutoDecompressIteratorasync iterator that wraps a byte stream and transparently decompresses if neededpackages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.pyQemuFlasher.flash()to wrap the source stream withAutoDecompressIteratorHow it works
flash()is called, the first 8 bytes are bufferedSummary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.