wasmparser: merge VisitSimdOperator back into VisitOperator #2184
base: main
Conversation
cc @Robbepop
Force-pushed from f2f7b31 to 45d4974
Since the API has to be mirrored on the user side, it is significantly more than 0.5s. In Wasmi's case the difference between SIMD and non-SIMD is very much worth the effort.
How bad is the impact if you implement all of those methods with a plain call to a […]?
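For illustration, a minimal sketch of the "forward every SIMD method to one fallback" approach the question above describes. The `MyVisitor` type, the `unsupported` helper, and the two hand-written stubs are hypothetical; in wasmparser the full set of visit methods would be generated by a macro rather than written out by hand:

```rust
// Hypothetical sketch: every SIMD visit method calls a single shared fallback.
struct MyVisitor;

impl MyVisitor {
    // One fallback shared by all SIMD visit methods.
    fn unsupported(&mut self, name: &str) -> Result<(), String> {
        Err(format!("SIMD operator `{name}` is not supported"))
    }

    // Two representative stubs; a macro would generate the full list.
    fn visit_v128_load(&mut self) -> Result<(), String> {
        self.unsupported("v128.load")
    }

    fn visit_i8x16_add(&mut self) -> Result<(), String> {
        self.unsupported("i8x16.add")
    }
}

fn main() {
    let mut v = MyVisitor;
    // Both SIMD opcodes hit the same fallback path.
    assert!(v.visit_v128_load().is_err());
    assert!(v.visit_i8x16_add().is_err());
}
```

The stubs themselves are cheap; the cost being discussed below comes from the methods still having to exist and be mirrored in downstream visitors.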
That's basically what I did before the change to use an optional […].

To provide you with concrete numbers for Wasmi: […]
Given that most Wasmi users do not have a need for Wasm […].
The `simd` feature remains in place, but now does not affect the API the way it used to. Rather, parts of the implementation pertaining to SIMD in wasmparser get disabled, reducing the compilation time and artifact size impact slightly, though less than in the original approach. The results on my machine are roughly:

* origin/main simd: 4.26s producing a 30M .rlib
* origin/main !simd: 3.49s producing a 23M .rlib
* this simd: 4.19s producing a 26M .rlib
* this !simd: 3.96s producing a 25M .rlib

I'm not quite sure why the rlib size decreased so significantly when the SIMD feature is enabled (I guess less inlining is occurring?), and there is indeed a 0.5s hit on compilation time from this PR. Are those half a second worth a clunkier API? I'm not convinced :)
Force-pushed from 45d4974 to 6c9ef22
@Robbepop I'm curious if you can detail the numbers a bit more? This is part of what I was afraid of with the original split: maintaining such a split is pretty cumbersome and IMO is only worth it if it has few downsides. Handling the split has enough downsides already, and @nagisa's use case seems reasonable to me to support, and it's easily supported with the split we have today.

For example, are you measuring total compile time of wasmi from nothing? Those numbers are small enough that a 2.5s difference doesn't seem to me like it would really move the needle. You say this is a huge win, but can you clarify how? In my experience fresh builds happen rarely enough, or in situations where 2.5s doesn't really matter, but what are you thinking of? Is there some other operation of which 2.5s is a significant portion, such that reducing the overall time would be beneficial?
Personally I prefer the split in this PR, where the feature gate only affects runtime and snips out the validator and other portions where possible. I think API-wise this is a much better way to carry conditional features. I'll also point out that your measurements weren't against this style of API, so the exact discrepancy may be quite different with this API.
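To make the distinction concrete, here is an illustrative-only sketch of "the feature gate affects the internals, not the API": the method always exists, but its SIMD-specific body is compiled out when the `simd` feature is off. The `OperatorSink` type and its method body are made up for this example and are not wasmparser's actual internals; only the `simd` feature name comes from the discussion above:

```rust
// Illustrative sketch: the public method is always present, while the
// SIMD-specific work is conditionally compiled behind the `simd` feature.
pub struct OperatorSink;

impl OperatorSink {
    pub fn visit_v128_load(&mut self) -> Result<(), &'static str> {
        #[cfg(feature = "simd")]
        {
            // Real decoding/validation work would live here.
            Ok(())
        }
        #[cfg(not(feature = "simd"))]
        {
            Err("SIMD support was compiled out of this build")
        }
    }
}
```

The API stays uniform for every embedder, and builds without the feature still avoid compiling (and shipping) the SIMD-specific bodies.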
@@ -924,7 +915,7 @@ macro_rules! define_for_each_simd_operator {
        }
    };
}

#[cfg(feature = "simd")]
_for_each_operator_group!(define_for_each_simd_operator);
I think this could get dropped in theory with this change, right? The idea being that the macro would always have all the instructions?
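A small self-contained example of the macro-with-callback pattern being discussed, in which the operator-listing macro always expands the full instruction list (SIMD included) and no `#[cfg(feature = "simd")]` is needed on the definition itself. The macro and operator names here are invented for illustration and are not wasmparser's real `_for_each_operator_group` machinery:

```rust
// Invented names: a tiny "for each operator" macro that always lists every
// instruction and hands them to a caller-supplied callback macro.
macro_rules! for_each_op {
    ($mac:ident) => {
        $mac! {
            visit_i32_add => "i32.add";
            visit_v128_load => "v128.load"; // SIMD op is always listed
        }
    };
}

// Callback macro: generates one stub function per listed operator.
macro_rules! define_stubs {
    ($($name:ident => $wat:expr;)*) => {
        $(fn $name() { println!("visiting {}", $wat); })*
    };
}

for_each_op!(define_stubs);

fn main() {
    visit_i32_add();
    visit_v128_load();
}
```

Any per-feature trimming would then happen at the use sites (or not at all), rather than by gating the macro definition.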
I feel like we are operating from different angles. For Wasmi, being as lightweight as possible is the niche in which it finds its use compared to other Wasm runtimes such as Wasmer and Wasmtime. So while a 2.5s compile time reduction is insignificant for those other Wasm runtimes, it certainly is not for Wasmi if it is to be a viable alternative for certain users. I thought it was possible for […].

tl;dr: maybe it would be better for everyone involved if Wasmi forks wasmparser.
I agree wasmi/wasmtime have different constraints, but I also want to clarify that, IMO, that's a good thing. Neither should necessarily take precedence over the other where possible, and the goal is to satisfy as broad a set of use cases as we can. In that sense, while I do want to acknowledge that you're swimming upstream a bit (insofar as wasmparser doesn't automatically fit your use case exactly), you most definitely should not feel sorry for feature requests that risk coming at a cost to other embeddings. While I do think it's possible to have a change that's "too far", I don't think we're necessarily there yet.

I'm trying to, for myself at least, explore the design space here. I don't know much about wasmi's constraints so I'm leaning on you to tell me things. I try to infer what I can, but I'm unable to infer why ~2s of extra build time would have such a large impact, hence my probing for more information. Apologies if that comes off as something along the lines of "surely this use case isn't that important", as that's most definitely not what I mean! To me 2s depends a lot on context. If Cargo slept 2 seconds every time I typed […].

Any large-ish change in service of one embedding vs another always needs to be balanced. For example, I realize that not all embeddings are interested in supporting the component model, which is why I've tried to ensure that everything is separated so it can be turned off at any time. I do find it tough to satisfy the two needs of "I want to support everything" in Wasmtime vs "I want to support only a subset" in wasmi (at least for simd-less builds) from an API-design perspective of this crate. I think it's worth iterating on designs though where possible, so for example the one proposed by @nagisa here I think is relatively-objectively better from an API-design standpoint, but there's still the open question of whether it solves the build time/build size issues in the same way the current solution does.

@Robbepop would you be up for testing out wasmi against a change like this? Basically comparing the simd-less build of today with the simd-less build of a PR like this?
I too highly prefer a common uniform solution such as […].

Though I want to make clear that I am also very thankful that […].

I agree that the API proposed by @nagisa is better than the current one from a usability perspective, and I would love […].
Again, I am very sorry about the sparse information from my side. The only answer I can give is that there were Wasmi users in the past who criticized Wasmi for not being lightweight enough. The reality is that Wasmi competes not only with JIT-based Wasm runtimes such as Wasmtime and Wasmer, but mostly with other Wasm interpreters and "singlepass" JITs such as WAMR's fast interpreter, Wasmer's singlepass JIT, Wasm3, etc. The other interpreters in particular are even more lightweight than Wasmi, and I have been trying for some time, with more or less success, to bring the numbers down to be more competitive in that regard. E.g. Wasmi used to compile in 8s (debug) or 15s (release) and now compiles in 5s (debug) or 10s (release). With these numbers in mind, I hope a 2.5s overhead looks significant again. The same argument applies to artifact size, though I have not invested as heavily in that area so far.
Yes, I am very willing to do more tests, though I am currently at RustWeek so I'm not sure how much time I will have this week. Great talk, by the way!
FWIW I just went with unconditional […].

I'm not currently blocked on anything in particular and there is no rush from my side to get some resolution here. I'm happy to make myself available to help drive some sort of improvement to the current situation, but at the same time I'm fine if we decide to leave the situation as-is for the time being.
Thanks for the explanation, and makes sense!
Heh, thanks! Ok, and with what @nagisa said as well, let's put this on hold until @Robbepop gets a chance to evaluate it. Sounds like there's no rush though, so don't sweat it too much.
Fixes #2181