feat: add support for u64 MARF blob offsets#6949

Open
francesco-stacks wants to merge 24 commits into stacks-network:develop from francesco-stacks:feat/marf-u64-offset-pointers

Conversation

@francesco-stacks
Contributor

@francesco-stacks francesco-stacks commented Mar 3, 2026

Description

Summary

This PR adds support for 64-bit trie pointers while keeping the current 32-bit format.

I hit this while building a mainnet snapshot: the squashed MARF trie for Clarity state was around 5GB, so some offsets went past u32::MAX and crashed. This PR is a prerequisite for the squash work, split out so we can align on the approach first.

I looked at three options:

  • Add 64-bit pointers (this PR)

    • straightforward fix for >4GB offsets
    • pointers get larger (~40% bigger encoded TriePtr, but not a problem for "squashed MARFs")
    • The idea is that archival nodes stay u32, while squashed nodes use u64
  • Split snapshot into multiple smaller tries

    • doable, but a bigger MARF redesign and more complexity
  • Store snapshot in a plain key/value DB

    • simpler storage model, but loses trie structure benefits
    • more duplicated key bytes
    • we lose current Merkle proof mechanism
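To make the failure mode concrete, here is a minimal sketch (illustrative names only, not the actual MARF types) of why a >4GB blob offset cannot be narrowed to u32 without loss:

```rust
// Sketch: why >4GB blob offsets break a u32-based pointer format.
// `narrow_offset` is a hypothetical helper, not a MARF function.
fn narrow_offset(offset: u64) -> Option<u32> {
    // try_from fails cleanly instead of wrapping like `as u32` would
    u32::try_from(offset).ok()
}

fn main() {
    let small: u64 = 1 << 20; // 1 MiB: fits in u32
    let large: u64 = 5 * 1024 * 1024 * 1024; // ~5 GiB: past u32::MAX
    assert_eq!(narrow_offset(small), Some(1 << 20));
    assert_eq!(narrow_offset(large), None);
    // A plain cast would silently truncate and corrupt the pointer:
    assert_ne!(u64::from(large as u32), large);
}
```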

Applicable issues

  • fixes #

Additional info (benefits, drawbacks, caveats)

Checklist

  • Test coverage for new or modified code paths
  • For new Clarity features or consensus changes, add property tests (see docs/property-testing.md)
  • Changelog is updated
  • Required documentation changes (e.g., rpc/openapi.yaml for RPC endpoints, event-dispatcher.md for new events)
  • New clarity functions have corresponding PR in clarity-benchmarking repo

@codecov

codecov bot commented Mar 3, 2026

Codecov Report

❌ Patch coverage is 62.88089% with 268 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.58%. Comparing base (2847ab5) to head (b1462c8).
⚠️ Report is 2 commits behind head on develop.

Files with missing lines Patch % Lines
stackslib/src/chainstate/stacks/index/test/node.rs 37.77% 112 Missing ⚠️
...lib/src/chainstate/stacks/index/test/node_patch.rs 15.68% 86 Missing ⚠️
stackslib/src/chainstate/stacks/index/node.rs 84.61% 30 Missing ⚠️
stackslib/src/chainstate/stacks/index/storage.rs 84.56% 23 Missing ⚠️
stackslib/src/chainstate/stacks/index/bits.rs 88.13% 7 Missing ⚠️
stackslib/src/chainstate/stacks/index/trie_sql.rs 62.50% 6 Missing ⚠️
...ckslib/src/chainstate/stacks/index/test/storage.rs 81.25% 3 Missing ⚠️
stackslib/src/chainstate/stacks/index/test/file.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #6949      +/-   ##
===========================================
+ Coverage    54.68%   60.58%   +5.90%     
===========================================
  Files          412      412              
  Lines       218662   219172     +510     
  Branches       338      338              
===========================================
+ Hits        119571   132790   +13219     
+ Misses       99091    86382   -12709     
Files with missing lines Coverage Δ
stackslib/src/chainstate/stacks/index/file.rs 88.66% <100.00%> (+8.12%) ⬆️
stackslib/src/chainstate/stacks/index/test/file.rs 57.89% <0.00%> (+57.89%) ⬆️
...ckslib/src/chainstate/stacks/index/test/storage.rs 83.18% <81.25%> (+0.94%) ⬆️
stackslib/src/chainstate/stacks/index/trie_sql.rs 85.40% <62.50%> (+5.35%) ⬆️
stackslib/src/chainstate/stacks/index/bits.rs 70.08% <88.13%> (-0.51%) ⬇️
stackslib/src/chainstate/stacks/index/storage.rs 80.31% <84.56%> (-0.12%) ⬇️
stackslib/src/chainstate/stacks/index/node.rs 83.78% <84.61%> (+0.49%) ⬆️
...lib/src/chainstate/stacks/index/test/node_patch.rs 21.32% <15.68%> (-8.22%) ⬇️
stackslib/src/chainstate/stacks/index/test/node.rs 22.49% <37.77%> (+8.82%) ⬆️

... and 266 files with indirect coverage changes



@francesco-stacks francesco-stacks marked this pull request as ready for review March 3, 2026 11:46
@francesco-stacks francesco-stacks self-assigned this Mar 3, 2026
@francesco-stacks francesco-stacks force-pushed the feat/marf-u64-offset-pointers branch from af686ee to b97d711 Compare March 3, 2026 11:59
@francesco-stacks francesco-stacks force-pushed the feat/marf-u64-offset-pointers branch from b97d711 to 9bb657e Compare March 3, 2026 12:02
@francesco-stacks francesco-stacks changed the title feat: add support for u64 children pointers offsets feat: add support for u64 MARF blob offsets Mar 3, 2026
@federico-stacks
Contributor

Thanks for putting this together.


Regarding the alternatives, I don’t have additional options to propose beyond the ones already listed, as we discussed these approaches over the past few days.


I do have a thought about the current u64-based solution, though. Do we expect the long-term trajectory to have only snapshotted nodes? If so, and considering that snapshotted nodes should require significantly less disk space, it might be reasonable to “spend” a bit more space and move toward a single u64 representation. During the transition we could still support the current u32/u64 duality, but eventually simplify the model by standardizing on u64.

Contributor

@cylewitruk-stacks cylewitruk-stacks left a comment


Have you considered using variable-length encoding instead? I.e., variable-length for the on-disk representation, while still using u64 on the type itself?

The tradeoff is marginally more CPU work at the serialization boundary, but you'd pack a ~5GB offset into 5 bytes instead of 8, and lower offsets would shave off even more:

  • 1 byte: up to 2^7 - 1 = 127
  • 2 bytes: up to 2^14 - 1 = 16,383
  • 3 bytes: up to 2^21 - 1 = 2,097,151
  • 4 bytes: up to 2^28 - 1 = 268,435,455
  • 5 bytes: up to 2^35 - 1 = 34,359,738,367

I have a u64-based LEB128 implementation, which we use in sBTC here, that was optimized for explicit safety guarantees; a more performance/alloc-optimized impl would be equally simple.
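A minimal unsigned LEB128 sketch (not the sBTC implementation referenced above) illustrating the size behavior in the table: 7 payload bits per byte, high bit set on every byte except the last.

```rust
// Encode an unsigned integer as LEB128: 7 bits per byte, MSB = "more bytes".
fn leb128_encode(mut v: u64) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (v & 0x7f) as u8;
        v >>= 7;
        if v == 0 {
            out.push(byte);
            break;
        }
        out.push(byte | 0x80); // continuation bit
    }
    out
}

// Decode; returns None if the buffer ends before the terminating byte.
fn leb128_decode(bytes: &[u8]) -> Option<u64> {
    let mut v: u64 = 0;
    for (i, b) in bytes.iter().enumerate() {
        v |= u64::from(b & 0x7f) << (7 * i);
        if b & 0x80 == 0 {
            return Some(v);
        }
    }
    None
}

fn main() {
    // A ~5GB offset fits in 5 bytes instead of 8.
    let offset: u64 = 5 * 1024 * 1024 * 1024;
    let enc = leb128_encode(offset);
    assert_eq!(enc.len(), 5);
    assert_eq!(leb128_decode(&enc), Some(offset));
    // Small offsets shrink to a single byte.
    assert_eq!(leb128_encode(127).len(), 1);
}
```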

@jcnelson
Member

jcnelson commented Mar 9, 2026

Summarizing notes from a call @francesco-stacks and I had:

  • We can use bit 0x20 in the trie node ID to indicate that the ptr field is a u64, instead of a u32.
  • We can use bit 0x10 in the trie node ID to indicate that the node is part of the "compressed chainstate trie"
  • In the future, we can replace the compressed chainstate trie with a Sqlite DB, with two tables: one that contains (block ID, key, value) triples, and one that contains (block ID, root hash) pairs. This would allow us to incrementally compress the chainstate online by dropping the oldest trie and inserting its key/value pairs into this DB. We could compute a Merkle root over its key/value pairs (as we do now), and then compute a Merkle skip-list over its root and its ancestors. The root hash of this skip-list would be stored along with the block ID. This would allow us to serve Merkle proofs-of-inclusion for key/value pairs in the compressed chain state -- if the key/value pair is in the compressed chain state, we would serve a Merkle skip-list proof that links the chain tip to the ancestral trie, and then a Merkle tree proof that proves the key/value pair's existence in the ancestral trie.
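The control-bit scheme from these notes can be sketched as follows; the flag values (0x20, 0x10) come from the call notes above, but the constant and helper names are illustrative:

```rust
// Sketch of the trie node ID control bits described above.
// Bit values follow the call notes; names are hypothetical.
const PTR_U64_FLAG: u8 = 0x20; // the ptr field is a u64 instead of a u32
const COMPRESSED_FLAG: u8 = 0x10; // node is part of the compressed chainstate trie

fn is_u64_ptr(encoded_id: u8) -> bool {
    encoded_id & PTR_U64_FLAG != 0
}

fn is_compressed(encoded_id: u8) -> bool {
    encoded_id & COMPRESSED_FLAG != 0
}

fn main() {
    let base_id: u8 = 0x04; // some base node type ID
    let wide = base_id | PTR_U64_FLAG;
    assert!(is_u64_ptr(wide));
    assert!(!is_u64_ptr(base_id));
    assert!(is_compressed(base_id | COMPRESSED_FLAG));
    assert!(!is_compressed(base_id));
}
```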

if is_u64_ptr(encoded_id) {
    w.write_all(&self.ptr().to_be_bytes())?;
} else {
    w.write_all(&(self.ptr() as u32).to_be_bytes())?;
Member


Instead of doing a cast to u32, could you do u32::try_from(self.ptr()).expect("unreachable") so we deliberately crash if the u64 control bit is unset but the ptr can't fit in a u32?

Contributor


Even better imo would be to add an Error::IntegerConversion variant and return an Err instead of expecting since the method is fallible anyway? Here and elsewhere.

Contributor Author


Both great points. I moved to try_from and From wherever possible, and used the OverFlowError that was already available.
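A minimal sketch of the fallible narrowing this thread converged on; `EncodeError` here is a stand-in for the crate's existing overflow error variant, not the real type:

```rust
use std::io::Write;

// Hypothetical error type standing in for the existing overflow error.
#[derive(Debug, PartialEq)]
enum EncodeError {
    Overflow(u64),
    Io,
}

// Write the pointer wide or narrow; return an error instead of casting,
// so an unset u64 control bit with an oversized ptr can't silently truncate.
fn write_ptr<W: Write>(w: &mut W, ptr: u64, is_u64: bool) -> Result<(), EncodeError> {
    if is_u64 {
        w.write_all(&ptr.to_be_bytes()).map_err(|_| EncodeError::Io)
    } else {
        let narrow = u32::try_from(ptr).map_err(|_| EncodeError::Overflow(ptr))?;
        w.write_all(&narrow.to_be_bytes()).map_err(|_| EncodeError::Io)
    }
}

fn main() {
    let mut buf = Vec::new();
    assert!(write_ptr(&mut buf, 42, false).is_ok());
    assert_eq!(buf.len(), 4); // narrow encoding: 4 bytes
    let big = u64::from(u32::MAX) + 1;
    assert_eq!(
        write_ptr(&mut Vec::new(), big, false),
        Err(EncodeError::Overflow(big))
    );
    assert!(write_ptr(&mut Vec::new(), big, true).is_ok());
}
```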

if is_u64_ptr(encoded_id) {
    w.write_all(&self.ptr().to_be_bytes())?;
} else {
    w.write_all(&(self.ptr() as u32).to_be_bytes())?;
Member


Same here -- let's deliberately panic instead of casting and potentially silently corrupting.

Contributor Author


I return an error now

    let back_block = u32::from_be_bytes([bytes[10], bytes[11], bytes[12], bytes[13]]);
    (ptr, back_block)
} else {
    let ptr = u32::from_be_bytes([bytes[2], bytes[3], bytes[4], bytes[5]]) as u64;
Member


In general, we try to avoid casting. Can you use u64::from() here?

Contributor Author


Done. (Sorry, I spread the porting to From and try_from across different commits.)
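For illustration, the preference for u64::from over an as cast can be sketched with a hypothetical decode helper mirroring the quoted path; u64::from is infallible widening, so it can never truncate the way a narrowing cast might:

```rust
// Hypothetical helper mirroring the quoted decode path: read a big-endian
// u32 offset from bytes [2..6] and widen it losslessly to u64.
fn decode_u32_offset(bytes: &[u8]) -> u64 {
    let ptr = u32::from_be_bytes([bytes[2], bytes[3], bytes[4], bytes[5]]);
    // `u64::from` only widens; unlike `as`, it cannot silently drop bits.
    u64::from(ptr)
}

fn main() {
    let mut bytes = [0u8; 6];
    bytes[2..6].copy_from_slice(&0xDEAD_BEEF_u32.to_be_bytes());
    assert_eq!(decode_u32_offset(&bytes), 0xDEAD_BEEF_u64);
}
```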

@coveralls

coveralls commented Mar 24, 2026

Pull Request Test Coverage Report for Build 23788516993

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 690 of 826 (83.54%) changed or added relevant lines in 9 files are covered.
  • 1288 unchanged lines in 48 files lost coverage.
  • Overall coverage decreased (-0.02%) to 85.691%

Changes Missing Coverage Covered Lines Changed/Added Lines %
stackslib/src/chainstate/stacks/index/test/file.rs 20 21 95.24%
stackslib/src/chainstate/stacks/index/storage.rs 189 211 89.57%
stackslib/src/chainstate/stacks/index/test/storage.rs 105 218 48.17%
Files with Coverage Reduction New Missed Lines %
stackslib/src/burnchains/mod.rs 1 83.9%
stackslib/src/chainstate/stacks/index/node.rs 1 87.87%
stackslib/src/clarity_vm/clarity.rs 1 93.7%
stackslib/src/net/download/nakamoto/download_state_machine.rs 1 90.57%
stackslib/src/net/relay.rs 1 74.54%
stacks-node/src/nakamoto_node/signer_coordinator.rs 1 86.18%
stacks-signer/src/client/stacks_client.rs 1 86.19%
clarity/src/vm/functions/define.rs 2 99.34%
stacks-common/src/deps_common/bitcoin/util/hash.rs 2 81.45%
stackslib/src/net/dns.rs 2 91.59%
Totals Coverage Status
Change from base Build 23400582460: -0.02%
Covered Lines: 187118
Relevant Lines: 218364


@francesco-stacks francesco-stacks marked this pull request as draft March 24, 2026 17:46

Copilot AI left a comment


Pull request overview

This PR extends MARF trie storage to support >4GB trie blob offsets by adding a mixed-width u32/u64 pointer encoding (using a 0x20 control bit) while preserving the existing 32-bit format, and adds a read-only schema-compatibility-check mode.

Changes:

  • Add mixed u32/u64 TriePtr encoding/decoding (incl. compressed forms) and update byte-length calculations accordingly.
  • Update trie dump/load and file-backed reads to use u64 offsets, including a multi-pass layout stabilization for dumps.
  • Extend MARF DB migration to support a “readonly compatibility check only” mode, and add tests for new pointer/patch behaviors.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
stackslib/src/chainstate/stacks/index/node.rs Introduces u64-pointer control bit (0x20), updates TriePtr to u64, and adjusts encoding/size logic.
stackslib/src/chainstate/stacks/index/bits.rs Updates pointer list sizing and parsing to handle mixed-width pointers; adjusts seeks accordingly.
stackslib/src/chainstate/stacks/index/storage.rs Propagates u64 offsets through in-memory/dump logic; adds multi-pass dump layout stabilization; wires readonly migration behavior.
stackslib/src/chainstate/stacks/index/file.rs Removes u32 casts when seeking to trie offsets (now u64).
stackslib/src/chainstate/stacks/index/trie_sql.rs Adds readonly mode to migration function for compatibility checking without mutating schema.
stackslib/src/chainstate/stacks/index/test/* Updates tests for u64 pointers; adds patch serialization tests and dump-compression behavior tests (some ignored due to size).
changelog.d/marf-u64-offset-pointers.added Adds changelog fragment for u64 pointer offset support.


Comment on lines +1249 to +1275
        // Normal nodes with forward ptrs need re-measurement; patch
        // nodes have a fixed size independent of pointer encoding.
        let fwd = has_fwd && patch_node_opt.is_none();
        if let Some((hash_bytes, patch)) = patch_node_opt.take() {
            node_data.push(DumpPtr::Patch(pointer, hash_bytes, patch));
        } else {
            node_data.push(DumpPtr::Normal(pointer));
        }
        offsets.push(ptr as u32);
        has_forward_ptrs.push(fwd);
    }

    assert_eq!(offsets.len(), node_data.len());
    // step 2: repeatedly lay out nodes until serialized offsets stabilize
    // The first 32 bytes are reserved for the parent block hash,
    // and the next 4 bytes for the local block identifier.
    let mut end_offset = BLOCK_HEADER_HASH_ENCODED_SIZE as u64 + 4;
    let mut offsets = vec![];
    // Cached byte lengths: leaf / pointer-free / patch node sizes are
    // constant across passes, so we only recompute non-leaf nodes.
    let mut byte_lens = node_data
        .iter()
        .map(|dp| {
            let byte_len = if let Some(patch) = dp.patch() {
                TRIEHASH_ENCODED_SIZE + patch.size()
            } else {
                let (node, _) = self.get_nodetype(dp.ptr())?;
                get_node_byte_len_compressed(node)
            };

Copilot AI Mar 25, 2026


In dump_compressed_consume, patch node sizes are treated as fixed across layout passes (fwd = has_fwd && patch_node_opt.is_none()), but TrieNodePatch::size() depends on TriePtr::compressed_size(), which can widen from u32->u64 once the child offsets exceed u32::MAX. This can cause incorrect offsets and corrupted serialized output for large tries. Consider marking patch nodes as needing re-measurement as well and recomputing TRIEHASH_ENCODED_SIZE + patch.size() each pass (or recomputing byte lengths for all nodes each pass) so the convergence loop accounts for pointer-width changes inside patches.
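The layout-stabilization idea under discussion can be shown in miniature. This toy model (illustrative names, one encoded child offset per node; not the actual dump_compressed_consume logic) demonstrates why layout must iterate to a fixed point when pointer width depends on offset magnitude, which in turn depends on node sizes:

```rust
// Toy pointer-width rule: 4 bytes below u32::MAX, 8 bytes above.
fn ptr_width(offset: u64) -> u64 {
    if offset <= u64::from(u32::MAX) { 4 } else { 8 }
}

// Each node is modeled as a fixed payload plus one encoded child offset.
// Offsets determine widths, widths determine offsets: iterate until the
// assignment stops changing (a fixed point).
fn layout(payloads: &[u64], header: u64) -> Vec<u64> {
    let mut offsets = vec![0u64; payloads.len()];
    loop {
        let mut cur = header;
        let mut next = Vec::with_capacity(payloads.len());
        for (i, payload) in payloads.iter().enumerate() {
            next.push(cur);
            // Node size widens if the offset it encodes needs u64.
            cur += payload + ptr_width(offsets[i]);
        }
        if next == offsets {
            return offsets; // sizes and offsets now agree
        }
        offsets = next;
    }
}

fn main() {
    // 32-byte parent hash + 4-byte block identifier = 36-byte header.
    let offsets = layout(&[100, 200, 300], 36);
    assert_eq!(offsets, vec![36, 140, 344]);
}
```

The point raised above is that every size that can change with pointer width (including patch nodes) must be recomputed inside this loop, or the fixed point it converges to is wrong.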


Comment on lines +1216 to +1226
let node_forward = node
    .ptrs()
    .iter()
    .filter(|p| !p.is_empty() && !is_backptr(p.id))
    .map(|p| p.chr());
let diff_forward = patch_node
    .ptr_diff
    .iter()
    .filter(|p| !p.is_empty() && !is_backptr(p.id))
    .map(|p| p.chr());
assert!(node_forward.eq(diff_forward));

Copilot AI Mar 25, 2026


The assert!(node_forward.eq(diff_forward)); in dump_compressed_consume will panic the process on mismatch. Since this code runs in non-test builds, it would be safer to convert this into a handled error (e.g., Err(Error::CorruptionError(...))) or at least debug_assert! so a malformed patch/node mismatch can't crash a node in production.

Suggested change
let node_forward = node
    .ptrs()
    .iter()
    .filter(|p| !p.is_empty() && !is_backptr(p.id))
    .map(|p| p.chr());
let diff_forward = patch_node
    .ptr_diff
    .iter()
    .filter(|p| !p.is_empty() && !is_backptr(p.id))
    .map(|p| p.chr());
assert!(node_forward.eq(diff_forward));
let node_forward: Vec<_> = node
    .ptrs()
    .iter()
    .filter(|p| !p.is_empty() && !is_backptr(p.id))
    .map(|p| p.chr())
    .collect();
let diff_forward: Vec<_> = patch_node
    .ptr_diff
    .iter()
    .filter(|p| !p.is_empty() && !is_backptr(p.id))
    .map(|p| p.chr())
    .collect();
if node_forward != diff_forward {
    return Err(Error::CorruptionError(
        "Forward child pointers of node and patch diff do not match".into(),
    ));
}

/// In-RAM trie storage.
/// Used by TrieFileStorage to buffer the next trie being built.
///
/// Pointer in `TrieRAM` are index-based, not disk-offset-based:

Copilot AI Mar 25, 2026


Doc comment grammar: "Pointer in TrieRAM are" should be "Pointers in TrieRAM are".

Suggested change
/// Pointer in `TrieRAM` are index-based, not disk-offset-based:
/// Pointers in `TrieRAM` are index-based, not disk-offset-based:


@francesco-stacks francesco-stacks marked this pull request as ready for review March 26, 2026 21:58
Contributor

@federico-stacks federico-stacks left a comment


Implementation looks fine to me. Just added a minor consideration about the TrieRAM API.

Comment on lines +383 to 389
/// Pointers in `TrieRAM` are index-based, not disk-offset-based:
/// `TriePtr::ptr()` is treated as an in-memory node index into `data`, and
/// traversal/indexing paths are intentionally bounded to `u32`.
/// Large `u64` byte offsets are only materialized when serializing this trie
/// to persistent storage (see `dump_consume`/`write_trie_indirect`).
#[derive(Clone)]
pub struct TrieRAM<T: MarfTrieId> {
Contributor


I noticed a small API asymmetry in TrieRAM: some methods accept u64 indices (write_nodetype, last_ptr, read_nodetype), while others still take u32 (write_node_hash, get_nodetype).

The documentation mentions that TrieRAM indices are intentionally bounded to u32, but this invariant isn’t currently reflected in the type signatures. There’s no practical impact, since in-memory node counts won’t approach u32::MAX, but it does introduce a bit of inconsistency in the API.

Would it make sense to standardize on one type, either in this PR or a follow-up (or not at all)?

