Skip to content

Lazy trees v2 #13225

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Lazy trees v2 #13225

wants to merge 1 commit into from

Conversation

edolstra
Copy link
Member

@edolstra edolstra commented May 17, 2025

Motivation

This adds a setting lazy-trees that causes flake inputs to be "mounted" as virtual filesystems on top of /nix/store as random "virtual" store paths. Only when the store path is actually used as a dependency of a store derivation do we materialize ("devirtualize") the input by copying it to its content-addressed location in the store.

String contexts determine when devirtualization happens. One wrinkle is that there are cases where we had store paths without proper contexts, in particular when the user does toString <path> (where <path> is a source tree in the Nix store) and passes the result to a derivation. This usage was always broken, since it can result in derivations that lack correct references. But to ensure that we don't change evaluation results, we introduce a new type of context that results in devirtualization but not in store references. We also now print a warning about this.

Supersedes/incorporates #12432 and #12915. It replaces #6530. The big difference with #6530 is that the latter treated each source tree as its own root filesystem, which caused a lot of backward compatibility issues that required ugly hacks. With the new approach, lazy source trees appear under /nix/store, just using a virtual store path that doesn't exist in the real filesystem.

TODO: run the test suite with lazy-trees = true.

Context


Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

@github-actions github-actions bot added new-cli Relating to the "nix" command with-tests Issues related to testing. PRs with tests have some priority store Issues and pull requests concerning the Nix store fetching Networking with the outside (non-Nix) world, input locking labels May 17, 2025
@edolstra edolstra mentioned this pull request May 17, 2025
9 tasks
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/determinate-nix-3-5-introducing-lazy-trees/64350/7

@Mic92 Mic92 added this to Nix team May 18, 2025
@github-project-automation github-project-automation bot moved this to To triage in Nix team May 18, 2025
@Mic92

This comment was marked as resolved.

This adds a setting 'lazy-trees' that causes flake inputs to be
"mounted" as virtual filesystems on top of /nix/store as random
"virtual" store paths. Only when the store path is actually used as a
dependency of a store derivation do we materialize ("devirtualize")
the input by copying it to its content-addressed location in the
store.

String contexts determine when devirtualization happens. One wrinkle
is that there are cases where we had store paths without proper
contexts, in particular when the user does `toString <path>` (where
<path> is a source tree in the Nix store) and passes the result to a
derivation. This usage was always broken, since it can result in
derivations that lack correct references. But to ensure that we don't
change evaluation results, we introduce a new type of context that
results in devirtualization but not in store references. We also now
print a warning about this.
@Mic92
Copy link
Member

Mic92 commented May 20, 2025

Running the test suite uncovered a couple of failures: https://github.com/NixOS/nix/actions/runs/15130602489/job/42530714771?pr=13235

@Mic92
Copy link
Member

Mic92 commented May 20, 2025

Particularly nixpkgsLibTests

Copy link
Member

@roberth roberth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using randomness or even fingerprints for placeholders makes the language non-deterministic and impure.

Some things we can do, in approximate order of increasing cost/benefit:

With those solutions in mind, I don't think we have to ruin the language semantics with any kind of "entropy", including fingerprints.
Path equality and ordering (as observable to users) must remain identical, and we have plenty of options for making it lazier without sacrificing determinism or compatibility.

@Mic92
Copy link
Member

Mic92 commented May 23, 2025

@roberth I understand your concerns but I think it would be hard to make improvments if changes are not allowed incrementally. I think having large pull requests lingering also makes impacts negatively other pull requests where we are too afraid to apply large scale refactorings. Would you be fine if the settings would be replaced with a compile-time flag?

@Aleksanaa
Copy link
Member

How about guard behind an experimental flag and let more people use it first?

@Mic92
Copy link
Member

Mic92 commented May 23, 2025

How about guard behind an experimental flag and let more people use it first?

It's already a flag not enabled by default, so that should be covered. Commonly we use experimental flags for new features that might change over time. This flag should not introduce any new features so, but potentially bugs, so I don't think it needs to be experimental in this case. The documentation of the flag should potentially mention the current randomness however.

@roberth
Copy link
Member

roberth commented May 23, 2025

@Mic92 Incremental improvements are allowed, but this is not an improvement.

If any contributor, including Nix team members, wants to make breaking changes to the language, they should discuss that with the team first, and it is unlikely to result in an actual change to the language, as Nix promises reproducible behavior on old expressions.

Large PRs are ok if they're refactors, especially if they're previously agreed upon. However, this PR is not a refactor. It changes the semantics of the Nix language significantly, and for worse.

compile-time flag

What would be the purpose, except enabling a bug? Any kind of flag, compile time or otherwise, has a real cost in terms of maintenance. Also we shouldn't be seducing users into bad workarounds. We've seen how that ends - or, I guess, doesn't end.

I appreciate that you try to find a middle ground @Mic92, but this is not a negotiation. We don't balance out insufficient work with personal authority or upvotes. Good work gets merged.

fetchers::Input & input, const fetchers::Input & originalInput, ref<SourceAccessor> accessor, bool requireLockable)
{
auto storePath = settings.lazyTrees
? StorePath::random(input.getName())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For context this is the line the concern is about.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Path equality and ordering (as observable to users) must remain identical, and we have plenty of options for making it lazier without sacrificing determinism or compatibility.

Concretely, an easy incremental solution would be to just hash the source without copying, and eat the cost of that. It may still avoid the more expensive copying operation.
By having the correct string in the first place, we're not exposing the wrong string in the first place.

More optimizations can be applied later, with the usual condition that they don't regress the language semantics.
(e.g. the list I commented in my review, or something else)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For debugging, behind a flag, it might be interesting to put an X instead of the last Nix32 hash character, indicating that some code path didn't realise its string context as it should have, but other than that, I see no use for string replacements because of the whole language semantics thing. We can't have an observable wrong string, so there won't be anything to replace beyond what we're doing for ca-derivations.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be beneficial to use a "dummy" hash like all a's here, instead of something random? The names are all coming from the flake inputs, right? So we would end up with them being unique anyway, unless I am misunderstanding this section, which is well possible. We would simply have to keep the dummy hash stable to maintain compatibility, then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concretely, an easy incremental solution would be to just hash the source without copying, and eat the cost of that. It may still avoid the more expensive copying operation. By having the correct string in the first place, we're not exposing the wrong string in the first place.

Seems still quite slow to the 2s I get with the current implementation:

% time find $HOME/git/nixpkgs -type f | xargs cat >/dev/null
find $HOME/git/nixpkgs -type f  0.18s user 1.36s system 8% cpu 18.286 total
xargs cat > /dev/null  0.37s user 8.25s system 43% cpu 19.634 total

And nixpkgs is not even the biggest repository. For people doing data-science or game development, asset sizes can be way beyond what we have in nixpkgs (i.e. 40G for datasets/game assets).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. This one seems faster

% time git ls-files | xargs cat >/dev/null
noglob hub ls-files  0.02s user 0.02s system 5% cpu 0.625 total
xargs cat > /dev/null  0.09s user 0.60s system 97% cpu 0.710 total
time git ls-files | xargs sha256sum >/dev/null
noglob hub ls-files  0.02s user 0.02s system 6% cpu 0.587 total
xargs sha256sum > /dev/null  0.31s user 0.33s system 96% cpu 0.658 total

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. But for people that don't have enough RAM we are back to 18s at least:

$ echo 3 > /proc/sys/vm/drop_caches
$ time git ls-files | xargs sha256sum >/dev/null
noglob hub ls-files  0.03s user 0.04s system 0% cpu 15.883 total
xargs sha256sum > /dev/null  1.25s user 5.13s system 35% cpu 17.926 total

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be beneficial to use a "dummy" hash like all a's here, instead of something random? The names are all coming from the flake inputs, right? So we would end up with them being unique anyway, unless I am misunderstanding this section, which is well possible.

Even if we were to preserve that information, it would not have compatible equality and ordering. Those are observable behaviors in the language and they should not change compared to older Nix versions.
As an example, you may take flake inputs, or any other fetchTree outPaths, and use them to generate a JSON file that contains those paths as keys. Then they must appear in lexicographical order of the rendered hashes, as they currently do.

eat the cost of hashing

slow

We shouldn't have to hash it more often than we currently do, so it should be fine if we manage to make sure that's actually the case. That's probably worth the effort because we'll need a copy-like behavior anyway for the unfortunate expressions that rely in it in one way or another.
Hashing before copying does present opportunities:

  • avoid copying altogether if the store path for it already exists
  • resume evaluation before all the files have been copied, improving concurrency, reducing wall clock time

We'll have to see whether this PR will be a performance regression without applying the other optimizations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fetching Networking with the outside (non-Nix) world, input locking new-cli Relating to the "nix" command store Issues and pull requests concerning the Nix store with-tests Issues related to testing. PRs with tests have some priority
Projects
Status: To triage
Development

Successfully merging this pull request may close these issues.

6 participants