
wasmparser: merge VisitSimdOperator back into VisitOperator #2184


Open — wants to merge 1 commit into base: main

Conversation


@nagisa nagisa commented May 9, 2025

The simd feature remains in place, but it no longer affects the API the way it used to. Instead, the parts of the wasmparser implementation pertaining to SIMD get disabled, which reduces the compilation-time and artifact-size impact somewhat, though less than in the original approach.

The results on my machine are roughly:

  • origin/main simd: 4.26s producing 30M .rlib
  • origin/main !simd: 3.49s producing 23M .rlib
  • this simd: 4.19s producing a 26M .rlib
  • this !simd: 3.96s producing a 25M .rlib

I'm not quite sure why the rlib size decreased so significantly when the SIMD feature is enabled (I guess less inlining is occurring?), and there is indeed a 0.5s hit on compilation time from this PR.

Are those half a second worth a clunkier API? I'm not convinced :)

Fixes #2181
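To illustrate the shape of the change described above, here is a hedged sketch (hypothetical names and signatures, not wasmparser's actual definitions): the trait exposes its SIMD methods unconditionally, so downstream code is unaffected by the feature, while only the implementation behind them is swapped out when `simd` is disabled.

```rust
// Sketch only: illustrative trait/method names, not wasmparser's real API.
trait VisitOperator {
    fn visit_i32_add(&mut self) -> Result<(), String>;
    // Part of the API regardless of crate features.
    fn visit_v128_load(&mut self) -> Result<(), String>;
}

struct Validator;

impl VisitOperator for Validator {
    fn visit_i32_add(&mut self) -> Result<(), String> {
        Ok(())
    }

    // Only the implementation is feature-gated, not the trait surface.
    #[cfg(feature = "simd")]
    fn visit_v128_load(&mut self) -> Result<(), String> {
        Ok(()) // real SIMD validation would live here
    }

    #[cfg(not(feature = "simd"))]
    fn visit_v128_load(&mut self) -> Result<(), String> {
        Err("SIMD support was compiled out".to_string())
    }
}
```

With this shape, a generic consumer can be bounded on a single trait, which is what issue #2181 asks for.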


nagisa commented May 9, 2025

cc @Robbepop

@nagisa force-pushed the merges-simd-visitor branch from f2f7b31 to 45d4974 on May 9, 2025 at 12:37

Robbepop commented May 9, 2025

Are those half a second worth a clunkier API? I'm not convinced :)

Since the API has to be mirrored on the user side, the cost is significantly more than 0.5s. In Wasmi's case, the difference between SIMD and non-SIMD builds is very much worth the effort.


nagisa commented May 9, 2025

How bad is the impact if you implement all of those methods with a plain call to a #[cold] method that returns "disabled"/panics when SIMD is disabled?
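A minimal sketch of that idea (hypothetical names, not wasmparser's API): the SIMD methods get default bodies that forward to a single shared `#[cold]` helper, so implementors without SIMD support never have to mention those methods at all.

```rust
// Sketch only: the helper returns an error; a real implementation
// might panic!() instead, as suggested above.
#[derive(Debug, PartialEq)]
enum Outcome {
    Handled,
    SimdDisabled,
}

// One cold, never-inlined fallback shared by every SIMD method keeps
// the codegen cost of the defaults small.
#[cold]
#[inline(never)]
fn simd_disabled() -> Outcome {
    Outcome::SimdDisabled
}

trait VisitOperator {
    fn visit_i32_add(&mut self) -> Outcome;

    // Default body: implementors that don't care about SIMD write nothing.
    fn visit_v128_load(&mut self) -> Outcome {
        simd_disabled()
    }
}

struct Plain;

impl VisitOperator for Plain {
    fn visit_i32_add(&mut self) -> Outcome {
        Outcome::Handled
    }
}
```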


Robbepop commented May 9, 2025

How bad is the impact if you implement all of those methods with a plain call to a #[cold] method that returns "disabled"/panics when SIMD is disabled?

That's basically what I did before the change to an optional simd crate feature (via macro), and as mentioned earlier, the impact was large enough to justify the change from the perspective of users such as Wasmi.

To provide you with concrete numbers for Wasmi:

  • cargo build -p wasmi: ~4.8s
  • cargo build -p wasmi --features simd: ~7.3s

Given that most Wasmi users do not need Wasm SIMD, this is a huge win; the overhead would be unnecessary if SIMD support were baked in.

@nagisa force-pushed the merges-simd-visitor branch from 45d4974 to 6c9ef22 on May 9, 2025 at 14:35
@alexcrichton alexcrichton left a comment


@Robbepop I'm curious if you can detail the numbers a bit more? This is part of what I was afraid of with the original split: maintaining such a split is pretty cumbersome, and IMO it is only worth it if it has few downsides. Handling the split has enough downsides already, @nagisa's use case seems reasonable to me to support, and it's easily supported with the split we have today.

For example, are you measuring the total compile time of wasmi from scratch? Those numbers are small enough that 2.5s doesn't seem to me like it would really move the needle. You say this is a huge win, but can you clarify how? In my experience fresh builds happen rarely enough, or in situations where 2.5s doesn't really matter, but what are you thinking of? Is there some other operation of which 2.5s is a significant portion, such that reducing the overall time would be beneficial?

Personally, I prefer the split in this PR, where the feature gate only affects runtime behavior and snips out the validator and other portions where possible. I think API-wise this is a much better way to carry conditional features. I'll also point out that your measurements weren't against this style of API, so the exact discrepancy may be quite different with this API.

@@ -924,7 +915,7 @@ macro_rules! define_for_each_simd_operator {
}
};
}
#[cfg(feature = "simd")]

_for_each_operator_group!(define_for_each_simd_operator);

I think this could get dropped in theory with this change, right? The idea being that the macro would always have all the instructions?


Robbepop commented May 10, 2025

Those numbers are small enough that 2.5s doesn't seem to me like it would really move the needle. You say this is a huge win, but can you clarify how? In my experience fresh builds happen rarely enough, or in situations where 2.5s doesn't really matter, but what are you thinking of? Is there some other operation of which 2.5s is a significant portion, such that reducing the overall time would be beneficial?

I feel like we are operating from different angles. Wasmi's niche, compared to other Wasm runtimes such as Wasmer and Wasmtime, is being as lightweight as possible. So while a 2.5s compile-time reduction is insignificant for those other runtimes, it certainly is not for Wasmi, which needs it to remain a viable alternative for certain users.

I thought it was possible for wasmparser to provide an implementation for a wide variety of users, yet I consistently find myself in the unfortunate position of being the "annoying party" that hinders wasmparser feature development or progress because it is not aligned with Wasmi's somewhat unique or niche goals, and I really do feel sorry about this.

tl;dr: maybe it would be better for everyone involved if Wasmi forked wasmparser again, since Wasmi's goals seem to be too unaligned with others'. I would still contribute to wasmparser where valuable to upstream users.

@alexcrichton
Member

I agree wasmi/wasmtime have different constraints, but I also want to clarify that, IMO, that's a good thing. Neither should necessarily take precedence over the other where possible, and the goal is to satisfy as broad a set of use cases as we can. In that sense, while I do want to acknowledge that you're swimming upstream a bit (insofar as wasmparser doesn't automatically fit your use case exactly), you most definitely should not feel sorry for feature requests that risk coming at a cost to other embeddings. While I do think it's possible for a change to go "too far", I don't think we're necessarily there yet.

I'm trying to, for myself at least, explore the design space here. I don't know much about wasmi's constraints so I'm leaning on you to tell me things. I try to infer what I can but I'm unable to infer why ~2s of more build time would have such a large impact, hence my probing for more information. Apologies if that comes off as something along the lines of "surely this use case isn't that important" as that's most definitely not what I mean! To me 2s depends a lot on context. If Cargo slept 2 seconds every time I typed cargo build I'd be very sad as an example.

Any large-ish change in service of one embedding vs. another always needs to be balanced. For example, I realize that not all embeddings are interested in supporting the component model, which is why I've tried to ensure that everything is separated so it can be turned off at any time. I do find it tough to satisfy the two needs of "I want to support everything" in Wasmtime vs. "I want to support only a subset" in wasmi (at least for simd-less builds) from the API-design perspective of this crate. I think it's worth iterating on designs where possible, though; for example, the one proposed by @nagisa here is, I think, objectively better from an API-design standpoint, but there's still the open question of whether it solves the build-time/build-size issues the same way the current solution does.

@Robbepop would you be up for testing out wasmi against a change like this? Basically comparing the simd-less build of today with the simd-less build of a PR like this?

@Robbepop
Collaborator

I too highly prefer a common uniform solution such as wasmparser over maintaining my own fork. Yet, I completely understand that Wasmi's needs are kind of niche, which sometimes makes finding abstractions and solutions that satisfy all users extremely difficult and time consuming. So I felt sorry because Wasmi's needs seem to cause so many difficulties.

Though I want to make clear that I am also very thankful that wasmparser still tries its best to support even those niche use cases, e.g. simd and component-model features, the btree-based collections, visitor-trait-based dispatch, etc. I am looking forward to the trait-based WasmFeatures, as it sounds very promising for solving many performance issues in a generic way.

I agree that the API proposed by @nagisa is better than the current one from a usability perspective, and I would love for wasmparser to change (back) to it if it did not regress compile times.

I try to infer what I can but I'm unable to infer why ~2s of more build time would have such a large impact, hence my probing for more information.

Again, I am very sorry about the sparse information from my side. The only answer I can give is that there were Wasmi users in the past who criticized Wasmi for not being lightweight enough. The reality is that Wasmi competes not only with JIT-based Wasm runtimes such as Wasmtime and Wasmer but mostly with other Wasm interpreters and "singlepass" JITs such as WAMR's fast interpreter, Wasmer's singlepass JIT, Wasm3, etc. The other interpreters especially are even more lightweight than Wasmi, and I have been trying for some time, more or less successfully, to bring the numbers down to be more competitive in that regard. E.g. Wasmi used to compile in 8s (debug) or 15s (release) and now compiles in 5s (debug) or 10s (release). With these numbers in mind, I hope 2.5s of overhead seems big again. The same argument applies to artifact size, though I have not invested as heavily in that area so far.

would you be up for testing out wasmi against a change like this? Basically comparing the simd-less build of today with the simd-less build of a PR like this?

Yes, I am very willing to do more tests. Though, I am currently at RustWeek so not sure how much time I will have this week. Great talk by the way!


nagisa commented May 13, 2025

FWIW I just went with an unconditional features = ["simd"] and bounded my JoinVisitor trait impl on VisitOperator + VisitSimdOperator. Doing so does not really regress anything in particular compared to the situation I had with the version of wasmparser I'm migrating from, other than that users now get to implement not one but two traits.
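A hedged sketch of that workaround, with illustrative trait and method names rather than wasmparser's actual definitions: a generic wrapper bounded on both traits works, but every concrete visitor must now implement two traits.

```rust
// Sketch only: hypothetical names mirroring the split-trait shape.
trait VisitOperator {
    fn visit_i32_add(&mut self) -> u64;
}

trait VisitSimdOperator: VisitOperator {
    fn visit_v128_load(&mut self) -> u64;
}

// The wrapper is only usable with visitors implementing both traits,
// which is what forces downstream users to implement two traits.
struct JoinVisitor<V>(V);

impl<V: VisitOperator + VisitSimdOperator> JoinVisitor<V> {
    fn visit_both(&mut self) -> u64 {
        self.0.visit_i32_add() + self.0.visit_v128_load()
    }
}

// A toy visitor that counts how many operators it has seen.
struct Counting(u64);

impl VisitOperator for Counting {
    fn visit_i32_add(&mut self) -> u64 {
        self.0 += 1;
        1
    }
}

impl VisitSimdOperator for Counting {
    fn visit_v128_load(&mut self) -> u64 {
        self.0 += 1;
        2
    }
}
```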

I'm not currently blocked on anything in particular and there is no rush from my side to get some resolution here. I'm happy to make myself available to help drive some sort of an improvement to the current situation, but at the same time I'm fine if we decide that we should leave the situation as-is for the time being.

@alexcrichton
Member

With these numbers in mind, 2.5s overhead seems big again I hope. Same argumentation applies to artifact size, though I have not as heavily invested in that area so far.

Thanks for the explanation, that makes sense!

Yes, I am very willing to do more tests. Though, I am currently at RustWeek so not sure how much time I will have this week. Great talk by the way!

Heh, thanks! Ok, and with what @nagisa said as well, let's put this on hold until @Robbepop gets a chance to evaluate it. Sounds like there's no rush though, so don't sweat it too much.

Successfully merging this pull request may close these issues.

Impossible to implement VisitOperator::simd_visitor generically