
Commit 9a2db87

feat: bytemuck::Pod + FORMAT_VERSION (v0.4.1)
Unblock UCFP's persistence story without structural breakage.

- pub const FORMAT_VERSION: u32 = 1; at crate root
- ImageFingerprint::format_version() / MultiHashFingerprint::format_version() accessors
- bytemuck::Pod + Zeroable derives on both fingerprint types with #[repr(C)]
- Compile-time layout assertions (168 / 536 bytes): accidental size drift fails the build
- Copy derive added (required by Pod); doc note flags the implicit-copy trade-off

Enables zero-copy persistence: let bytes: &[u8] = bytemuck::cast_slice(&fingerprints);

UCFP's index can now mmap fingerprint blobs without a serde roundtrip.
1 parent c9a9b51 commit 9a2db87

9 files changed

Lines changed: 237 additions & 15 deletions


CHANGELOG.md

Lines changed: 31 additions & 1 deletion
@@ -7,6 +7,35 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
77

88
## [Unreleased]
99

10+
## [0.4.1] - 2026-04-27
11+
12+
### Added
13+
14+
- **`pub const FORMAT_VERSION: u32 = 1`** at crate root, plus `ImageFingerprint::format_version()` and `MultiHashFingerprint::format_version()` accessors. Persist alongside fingerprint bytes (or in a sidecar manifest) and refuse comparison across mismatched versions to guard against algorithm-version drift. The constant only changes when the algorithm output changes; layout drift is caught at compile time independently.
15+
16+
- **`bytemuck::Pod + Zeroable` derives** on `ImageFingerprint` and `MultiHashFingerprint` with `#[repr(C)]`. Enables zero-copy persistence:
17+
18+
```rust
19+
use imgfprint::MultiHashFingerprint;
20+
let bytes: &[u8] = bytemuck::cast_slice(&fingerprints);
21+
// ...write to disk / mmap / send over the wire...
22+
let back: &[MultiHashFingerprint] = bytemuck::cast_slice(bytes);
23+
```
24+
25+
Lets UCFP's `index` crate store millions of fingerprints without serde-roundtrip overhead.
26+
27+
- **Compile-time layout assertions**: `const _` size checks that `ImageFingerprint` is exactly 168 bytes and `MultiHashFingerprint` is exactly 536 bytes. Any accidental change that alters either size (an added or removed field, or new padding) fails the build, so size drift can't silently break consumers relying on `bytemuck::cast_slice`. A field reorder that keeps the total size unchanged is not caught by a size check.
28+
29+
### Changed
30+
31+
- **`ImageFingerprint` and `MultiHashFingerprint` now derive `Copy`** (required by `bytemuck::Pod`). Trade-off: by-value uses now copy 168 / 536 bytes implicitly, with no `.clone()` call to flag them; prefer `&Fingerprint` borrows in hot loops where this matters. Purely additive: nothing that compiled before fails to compile now.
32+
33+
- **`#[repr(C)]`** added to both fingerprint types for stable cross-version binary layout. The sizes are unchanged (the fields were already padding-free at 168 / 536 bytes under the default `repr(Rust)`); what `repr(C)` adds is a guaranteed field order and offsets, which `repr(Rust)` leaves unspecified.
34+
35+
### Notes for UCFP integrators
36+
37+
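Storing fingerprints: cast a `&[MultiHashFingerprint]` to `&[u8]` with `bytemuck::cast_slice` before writing to your index. Persist the `FORMAT_VERSION` constant alongside (in your manifest or as a sidecar header). On read, verify the version matches before casting back. Casting back with `bytemuck::cast_slice` panics unless the byte slice is 8-byte aligned and its length is a multiple of the element size (168 or 536 bytes); for buffers whose alignment you don't control, use `bytemuck::try_cast_slice` and handle the error. `#[repr(C)]` together with the size assertions means that, within a `FORMAT_VERSION`, the byte representation is identical across builds and across platforms with the same endianness (the `u64` fields are stored in native byte order).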
38+
1039
## [0.4.0] - 2026-04-27
1140

1241
### Added
@@ -268,7 +297,8 @@ Per-algorithm DCT/grid/hash-bit reconfiguration (different `dct_size`, `block_gr
268297
- Semantic embeddings via external providers
269298
- Local ONNX inference (optional feature)
270299

271-
[Unreleased]: https://github.com/themankindproject/imgfprint-rs/compare/v0.4.0...HEAD
300+
[Unreleased]: https://github.com/themankindproject/imgfprint-rs/compare/v0.4.1...HEAD
301+
[0.4.1]: https://github.com/themankindproject/imgfprint-rs/compare/v0.4.0...v0.4.1
272302
[0.4.0]: https://github.com/themankindproject/imgfprint-rs/compare/v0.3.3...v0.4.0
273303
[0.3.3]: https://github.com/themankindproject/imgfprint-rs/compare/v0.3.2...v0.3.3
274304
[0.3.2]: https://github.com/themankindproject/imgfprint-rs/compare/v0.3.1...v0.3.2
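The integrator notes in the CHANGELOG entry above describe persisting `FORMAT_VERSION` next to the fingerprint bytes and refusing to read across versions. The following is a minimal std-only sketch of that header check, not imgfprint API: `FORMAT_VERSION` here is a local stand-in for the crate constant, and `write_blob`, `read_blob`, `BlobError`, and the 8-byte header layout are hypothetical. The payload is treated as opaque bytes; in the real crate it would come from `bytemuck::cast_slice`.

```rust
/// Hypothetical local stand-in for `imgfprint::FORMAT_VERSION`.
const FORMAT_VERSION: u32 = 1;
/// Bytes per persisted record; 168 matches `ImageFingerprint` in 0.4.1.
const RECORD_SIZE: usize = 168;
/// Hypothetical header: u32 version (little-endian) + 4 reserved zero bytes,
/// so the payload starts at an 8-byte offset within the blob.
const HEADER_LEN: usize = 8;

#[derive(Debug, PartialEq)]
pub enum BlobError {
    /// Fewer bytes than the header itself.
    TooShort,
    /// Written under a different `FORMAT_VERSION`; refuse to compare.
    VersionMismatch { stored: u32, expected: u32 },
    /// Payload length is not a whole number of records.
    RaggedPayload(usize),
}

/// Prepend the version header to an opaque payload of fingerprint bytes.
pub fn write_blob(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN + payload.len());
    out.extend_from_slice(&FORMAT_VERSION.to_le_bytes());
    out.extend_from_slice(&[0u8; 4]); // reserved; keeps the payload 8-byte offset
    out.extend_from_slice(payload);
    out
}

/// Check the header and return the payload, still as raw bytes.
pub fn read_blob(blob: &[u8]) -> Result<&[u8], BlobError> {
    if blob.len() < HEADER_LEN {
        return Err(BlobError::TooShort);
    }
    let stored = u32::from_le_bytes([blob[0], blob[1], blob[2], blob[3]]);
    if stored != FORMAT_VERSION {
        return Err(BlobError::VersionMismatch { stored, expected: FORMAT_VERSION });
    }
    let payload = &blob[HEADER_LEN..];
    if payload.len() % RECORD_SIZE != 0 {
        return Err(BlobError::RaggedPayload(payload.len()));
    }
    Ok(payload)
}
```

The 8-byte header is a deliberate choice for in-place casting: if the buffer itself starts 8-byte aligned (an mmap'd file is page-aligned), a 4-byte header would leave the payload misaligned for the 8-byte-aligned fingerprint types, and `bytemuck::cast_slice` would panic.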

Cargo.lock

Lines changed: 16 additions & 1 deletion

Cargo.toml

Lines changed: 2 additions & 1 deletion
@@ -1,6 +1,6 @@
11
[package]
22
name = "imgfprint"
3-
version = "0.4.0"
3+
version = "0.4.1"
44
edition = "2021"
55
description = "High-performance, deterministic image fingerprinting library"
66
license = "MIT"
@@ -32,6 +32,7 @@ rayon = { version = "1.10", optional = true }
3232
tract-onnx = { version = "0.22.1", optional = true }
3333
tracing = { version = "0.1", optional = true }
3434
subtle = "2.6"
35+
bytemuck = { version = "1.18", features = ["derive"] }
3536

3637
[features]
3738
default = ["serde", "parallel"]

README.md

Lines changed: 3 additions & 3 deletions
@@ -48,7 +48,7 @@ Perfect for:
4848

4949
```toml
5050
[dependencies]
51-
imgfprint = "0.4.0"
51+
imgfprint = "0.4.1"
5252
```
5353

5454
### Feature Flags
@@ -63,13 +63,13 @@ imgfprint = "0.4.0"
6363
Minimal build (no parallel processing):
6464
```toml
6565
[dependencies]
66-
imgfprint = { version = "0.4.0", default-features = false }
66+
imgfprint = { version = "0.4.1", default-features = false }
6767
```
6868

6969
With local embeddings (requires ONNX model):
7070
```toml
7171
[dependencies]
72-
imgfprint = { version = "0.4.0", features = ["local-embedding"] }
72+
imgfprint = { version = "0.4.1", features = ["local-embedding"] }
7373
```
7474

7575
## Quick Start

USAGE.md

Lines changed: 6 additions & 6 deletions
@@ -30,7 +30,7 @@ Add the dependency to your `Cargo.toml`:
3030

3131
```toml
3232
[dependencies]
33-
imgfprint = "0.4.0"
33+
imgfprint = "0.4.1"
3434
```
3535

3636
### Basic Example (Multi-Algorithm)
@@ -852,7 +852,7 @@ With the `local-embedding` feature:
852852

853853
```toml
854854
[dependencies]
855-
imgfprint = { version = "0.4.0", features = ["local-embedding"] }
855+
imgfprint = { version = "0.4.1", features = ["local-embedding"] }
856856
```
857857

858858
```rust
@@ -1011,13 +1011,13 @@ Configure the library for your needs:
10111011
```toml
10121012
[dependencies]
10131013
# Minimal build (no parallel processing)
1014-
imgfprint = { version = "0.4.0", default-features = false }
1014+
imgfprint = { version = "0.4.1", default-features = false }
10151015

10161016
# Default (serialization + parallel processing)
1017-
imgfprint = "0.4.0"
1017+
imgfprint = "0.4.1"
10181018

10191019
# With local ONNX inference
1020-
imgfprint = { version = "0.4.0", features = ["local-embedding"] }
1020+
imgfprint = { version = "0.4.1", features = ["local-embedding"] }
10211021
```
10221022

10231023
### Available Features
@@ -1035,7 +1035,7 @@ Enable the `tracing` feature to add performance instrumentation:
10351035

10361036
```toml
10371037
[dependencies]
1038-
imgfprint = { version = "0.4.0", features = ["tracing"] }
1038+
imgfprint = { version = "0.4.1", features = ["tracing"] }
10391039
```
10401040

10411041
```rust

future.md

Lines changed: 34 additions & 0 deletions
@@ -13,6 +13,40 @@ Legend:
1313

1414
---
1515

16+
## 0. Next Up: 0.4.1 (UCFP-unblocking patch)
17+
18+
A small, additive patch that unblocks UCFP's persistence story without
19+
the structural breakage that 0.5.0 needs.
20+
21+
| # | Item | Priority | Effort | Status |
22+
|---|------|----------|--------|--------|
23+
| 0.1 | **`pub const FORMAT_VERSION: u32`** at crate root + `MultiHashFingerprint::format_version()` accessor returning the same constant | P0 | S | planned |
24+
| 0.2 | **`bytemuck::Pod + Zeroable`** derives on `ImageFingerprint` / `MultiHashFingerprint` with `#[repr(C)]` for stable layout | P0 | S | planned |
25+
| 0.3 | **Zero-copy persistence example**: `bytemuck::cast_slice(&fingerprints)` producing `&[u8]` for UCFP's mmap'd index | P0 | S | planned |
26+
| 0.4 | **Layout-stability test**: `static_assertions::const_assert_eq!(size_of::<MultiHashFingerprint>(), 536)` so any accidental layout drift is caught at compile time | P1 | S | planned |
27+
28+
Why this set: UCFP's `index` crate stores millions of fingerprints. Today
29+
serializing them goes through serde-roundtrip (slow, allocates).
30+
`bytemuck::cast_slice` gives a zero-copy `&[u8]` view, mmap-friendly. The
31+
`FORMAT_VERSION` constant lets UCFP refuse cross-version compares without
32+
embedding a version field in every fingerprint (which would change layout
33+
and balloon storage).
34+
35+
Design choice: using full `Pod + Zeroable`. The earlier plan was to use
36+
`NoUninit + AnyBitPattern + Zeroable` to avoid `Copy`, but in current
37+
`bytemuck` (1.18+) those traits also require `Copy`. So `Copy` is
38+
unavoidable if we want any `cast_slice` capability. Trade-off accepted:
39+
by-value uses copy 168 / 536 bytes implicitly; `&Fingerprint` borrows in hot
40+
loops avoid this. If we ever want to drop `Copy`, we'd switch to the
41+
`zerocopy` crate (different trait shape) — left as a future option.
42+
43+
Not in this patch:
44+
- Removing the redundant per-algorithm `exact: [u8; 32]` field from inner
45+
`ImageFingerprint`s (saves 96 bytes per `MultiHashFingerprint`). It's a
46+
layout break and changes `Hash` derivation; deferred to 0.5.0.
47+
48+
---
49+
1650
## 1. Configurability Completeness
1751

1852
The 0.4.0 release tuned every weight and decode-time guard. What's left
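The design note in `future.md` accepts `Copy` as a trade-off. A minimal std-only sketch of what that means in practice, where `Big` is a hypothetical stand-in with the same 536-byte size and 8-byte alignment as `MultiHashFingerprint` and the `xor_*` helpers are illustrative only. With `Copy`, `xor_by_value(*fp)` compiles with no `.clone()`, so each call may copy the whole struct (whether the optimizer elides that copy is not guaranteed); the borrowing version passes a pointer.

```rust
/// Hypothetical 536-byte stand-in: 32 bytes + 63 × u64 = 536, align 8.
#[derive(Clone, Copy)]
#[repr(C)]
pub struct Big {
    pub exact: [u8; 32],
    pub words: [u64; 63],
}

/// Takes the value: the caller hands over a (possibly copied) 536-byte struct.
pub fn xor_by_value(fp: Big) -> u64 {
    fp.words.iter().fold(0, |acc, w| acc ^ w)
}

/// Takes a borrow: only a pointer crosses the call boundary.
pub fn xor_by_ref(fp: &Big) -> u64 {
    fp.words.iter().fold(0, |acc, w| acc ^ w)
}

/// Both loops compute the same result. Because `Big: Copy`, `*fp` silently
/// duplicates the struct; without `Copy` the compiler would reject moving
/// out of the borrow and force an explicit `.clone()`.
pub fn xor_both(fps: &[Big]) -> (u64, u64) {
    let mut by_value = 0;
    let mut by_ref = 0;
    for fp in fps {
        by_value ^= xor_by_value(*fp); // implicit copy of all 536 bytes
        by_ref ^= xor_by_ref(fp); // no copy
    }
    (by_value, by_ref)
}
```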

src/core/fingerprint.rs

Lines changed: 126 additions & 2 deletions
@@ -83,9 +83,27 @@ impl Default for MultiHashConfig {
8383
/// Fingerprints are deterministic and comparable across platforms. The structure
8484
/// includes exact hashing for identical detection and perceptual hashing for
8585
/// similarity detection with resistance to resizing, compression, and cropping.
86+
///
87+
/// # Binary layout
88+
///
89+
/// `#[repr(C)]` with no padding bytes (168 bytes total: 32 + 8 + 128). Implements
90+
/// [`bytemuck::Pod`] / [`bytemuck::Zeroable`] so a `&[ImageFingerprint]` can be
91+
/// zero-copy cast to `&[u8]` for mmap-based persistence:
92+
///
93+
/// ```rust
94+
/// use imgfprint::ImageFingerprint;
95+
/// # fn ex(fps: &[ImageFingerprint]) -> &[u8] {
96+
/// bytemuck::cast_slice(fps)
97+
/// # }
98+
/// ```
99+
///
100+
/// `Copy` is derived because `bytemuck::Pod` requires it; the trade-off is
101+
/// that by-value uses implicitly copy 168 bytes. Prefer borrowing
102+
/// (`&ImageFingerprint`) in hot loops where this matters.
86103
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
87104
#[cfg_attr(feature = "serde", serde(deny_unknown_fields))]
88-
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
105+
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, bytemuck::Pod, bytemuck::Zeroable)]
106+
#[repr(C)]
89107
pub struct ImageFingerprint {
90108
pub(crate) exact: [u8; 32],
91109
pub(crate) global_hash: u64,
@@ -112,6 +130,17 @@ impl ImageFingerprint {
112130
&self.exact
113131
}
114132

133+
/// Returns the on-disk format version this fingerprint was computed under.
134+
///
135+
/// Equal to [`crate::FORMAT_VERSION`]. Persist alongside fingerprint bytes
136+
/// (or in a sidecar manifest) and refuse comparison across mismatched
137+
/// versions to guard against algorithm-version drift.
138+
#[inline]
139+
#[must_use]
140+
pub const fn format_version() -> u32 {
141+
crate::FORMAT_VERSION
142+
}
143+
115144
/// Returns the global perceptual hash from the center 32x32 region.
116145
///
117146
/// This hash captures the overall structure of the image and is robust
@@ -187,16 +216,44 @@ impl ImageFingerprint {
187216
///
188217
/// Provides enhanced similarity detection by combining results from multiple
189218
/// hash algorithms with weighted combination for improved accuracy.
219+
///
220+
/// # Binary layout
221+
///
222+
/// `#[repr(C)]` with no padding bytes (536 bytes total: 32 + 3 × 168). Implements
223+
/// [`bytemuck::Pod`] / [`bytemuck::Zeroable`] for zero-copy cast to `&[u8]`.
224+
/// See [`ImageFingerprint`] for an example.
225+
///
226+
/// Binary size is enforced at compile time via a `const _` size assertion;
227+
/// any accidental change to the type's size fails the build.
228+
///
229+
/// `Copy` is derived for `bytemuck::Pod` compatibility; by-value uses implicitly
230+
/// copy 536 bytes. Prefer borrowing (`&MultiHashFingerprint`) in hot loops.
190231
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
191232
#[cfg_attr(feature = "serde", serde(deny_unknown_fields))]
192-
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
233+
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, bytemuck::Pod, bytemuck::Zeroable)]
234+
#[repr(C)]
193235
pub struct MultiHashFingerprint {
194236
pub(crate) exact: [u8; 32],
195237
pub(crate) ahash: ImageFingerprint,
196238
pub(crate) phash: ImageFingerprint,
197239
pub(crate) dhash: ImageFingerprint,
198240
}
199241

242+
// Layout-stability gate. If anyone accidentally introduces padding or reorders
243+
// fields in a way that changes the binary size, the build fails here. UCFP and
244+
// any other consumer relying on bytemuck::cast_slice would otherwise get
245+
// silently broken artefacts.
246+
const _: () = {
247+
assert!(
248+
core::mem::size_of::<ImageFingerprint>() == 168,
249+
"ImageFingerprint binary layout drifted"
250+
);
251+
assert!(
252+
core::mem::size_of::<MultiHashFingerprint>() == 536,
253+
"MultiHashFingerprint binary layout drifted"
254+
);
255+
};
256+
200257
impl MultiHashFingerprint {
201258
pub(crate) fn new(
202259
exact: [u8; 32],
@@ -219,6 +276,17 @@ impl MultiHashFingerprint {
219276
&self.exact
220277
}
221278

279+
/// Returns the on-disk format version this fingerprint was computed under.
280+
///
281+
/// Equal to [`crate::FORMAT_VERSION`]. Persist alongside fingerprint bytes
282+
/// (or in a sidecar manifest) and refuse comparison across mismatched
283+
/// versions to guard against algorithm-version drift.
284+
#[inline]
285+
#[must_use]
286+
pub const fn format_version() -> u32 {
287+
crate::FORMAT_VERSION
288+
}
289+
222290
/// Returns the AHash-based fingerprint.
223291
#[inline]
224292
#[must_use]
@@ -467,4 +535,60 @@ mod tests {
467535
// Keeps `fp()` referenced so the helper test util doesn't bit-rot.
468536
let _ = fp(0x1234, 0xABCD);
469537
}
538+
539+
#[test]
540+
fn format_version_is_one() {
541+
assert_eq!(crate::FORMAT_VERSION, 1);
542+
assert_eq!(ImageFingerprint::format_version(), 1);
543+
assert_eq!(MultiHashFingerprint::format_version(), 1);
544+
}
545+
546+
#[test]
547+
fn image_fingerprint_layout_is_stable() {
548+
assert_eq!(core::mem::size_of::<ImageFingerprint>(), 168);
549+
assert_eq!(core::mem::align_of::<ImageFingerprint>(), 8);
550+
}
551+
552+
#[test]
553+
fn multi_hash_fingerprint_layout_is_stable() {
554+
assert_eq!(core::mem::size_of::<MultiHashFingerprint>(), 536);
555+
assert_eq!(core::mem::align_of::<MultiHashFingerprint>(), 8);
556+
}
557+
558+
#[test]
559+
fn image_fingerprint_cast_slice_roundtrips() {
560+
let fps = vec![
561+
ImageFingerprint::new([1u8; 32], 0xAAAA_BBBB_CCCC_DDDD, [0x1234; 16]),
562+
ImageFingerprint::new([2u8; 32], 0xDEAD_BEEF_CAFE_BABE, [0xFEDC; 16]),
563+
ImageFingerprint::new([3u8; 32], 0, [0; 16]),
564+
];
565+
let bytes: &[u8] = bytemuck::cast_slice(&fps);
566+
assert_eq!(bytes.len(), 3 * 168);
567+
568+
let back: &[ImageFingerprint] = bytemuck::cast_slice(bytes);
569+
assert_eq!(back.len(), fps.len());
570+
assert_eq!(back, &fps[..]);
571+
}
572+
573+
#[test]
574+
fn multi_hash_fingerprint_cast_slice_roundtrips() {
575+
let fps = vec![
576+
multi([1u8; 32], 0x1111, 0x2222, 0x3333),
577+
multi([2u8; 32], 0xAAAA, 0xBBBB, 0xCCCC),
578+
];
579+
let bytes: &[u8] = bytemuck::cast_slice(&fps);
580+
assert_eq!(bytes.len(), 2 * 536);
581+
582+
let back: &[MultiHashFingerprint] = bytemuck::cast_slice(bytes);
583+
assert_eq!(back.len(), fps.len());
584+
assert_eq!(back, &fps[..]);
585+
}
586+
587+
#[test]
588+
fn fingerprint_zeroed_is_valid() {
589+
// Zeroable means an all-zero bit pattern is a valid value of the type.
590+
let z: MultiHashFingerprint = bytemuck::Zeroable::zeroed();
591+
assert_eq!(*z.exact_hash(), [0u8; 32]);
592+
assert_eq!(z.ahash().global_hash(), 0);
593+
}
470594
}
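The `const _` gate in this diff checks sizes only, so a field reorder that keeps `ImageFingerprint` at 168 bytes would still compile. A sketch that also pins field offsets with `std::mem::offset_of!` (stable since Rust 1.77), written against hypothetical mirror structs rather than the real types; the third field's name `block_hashes` is an assumption, since the diff does not show it.

```rust
use std::mem::{align_of, offset_of, size_of};

/// Hypothetical mirror of `ImageFingerprint` (32 + 8 + 128 = 168 bytes).
/// `block_hashes` is an assumed name for the third field.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct ImageFp {
    pub exact: [u8; 32],
    pub global_hash: u64,
    pub block_hashes: [u64; 16],
}

/// Hypothetical mirror of `MultiHashFingerprint` (32 + 3 × 168 = 536 bytes).
#[repr(C)]
#[derive(Clone, Copy)]
pub struct MultiFp {
    pub exact: [u8; 32],
    pub ahash: ImageFp,
    pub phash: ImageFp,
    pub dhash: ImageFp,
}

// Size and alignment: catch padding and added/removed fields.
const _: () = assert!(size_of::<ImageFp>() == 168);
const _: () = assert!(align_of::<ImageFp>() == 8);
const _: () = assert!(size_of::<MultiFp>() == 536);

// Offsets: also catch reorders that leave the total size unchanged.
const _: () = assert!(offset_of!(ImageFp, exact) == 0);
const _: () = assert!(offset_of!(ImageFp, global_hash) == 32);
const _: () = assert!(offset_of!(ImageFp, block_hashes) == 40);
const _: () = assert!(offset_of!(MultiFp, ahash) == 32);
const _: () = assert!(offset_of!(MultiFp, phash) == 200);
const _: () = assert!(offset_of!(MultiFp, dhash) == 368);
```

For example, moving `global_hash` ahead of `exact` keeps `size_of::<ImageFp>()` at 168 but shifts `exact` to offset 8, so the `offset_of!(ImageFp, exact) == 0` check fails the build.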

src/core/fingerprinter.rs

Lines changed: 1 addition & 1 deletion
@@ -1077,7 +1077,7 @@ mod tests {
10771077

10781078
let mut single_set = HashSet::new();
10791079
let single = ImageFingerprinter::fingerprint_with(&img1, HashAlgorithm::DHash).unwrap();
1080-
single_set.insert(single.clone());
1080+
single_set.insert(single);
10811081
single_set.insert(single);
10821082
assert_eq!(single_set.len(), 1);
10831083
}
