refactor(manifest): use cargo_metadata for parsing and discovery#135
refactor(manifest): use cargo_metadata for parsing and discovery#135gme-muriuki wants to merge 13 commits into
Conversation
| .map_err(|err| anyhow!("Failed to parse battery pack '{}': {}", crate_name, err))?; | ||
| let version = spec.version.clone(); // !quite sure? | ||
|
|
||
| (Some(version), spec) |
There was a problem hiding this comment.
I think we still want to return None here, generally you would not want to pin a version when using a local path
| .collect::<Vec<_>>() | ||
| .join(",\n"); | ||
|
|
||
| let workspace = format!("[workspace]\nresolver \"2\"\nmembers = [\n{member_toml}\n]\n"); |
There was a problem hiding this comment.
This is not valid TOML, it should be resolver = "2". Are your unit tests passing locally? CI is red earlier due to formatting (we should probably split that out into separate jobs to be less annoying...)
There was a problem hiding this comment.
Ooh, I see. 🤦🤦...
There was a problem hiding this comment.
Sure, formatting drives me crazy all the time in CI.
Mostly because sth is broke in my editor such that it doesn't run cargo fmtautomatically on save. Unless I run it manually. Quite sad actually.
There was a problem hiding this comment.
Could always add a global git pre-commit hook that runs cargo fmt (or non-ops if not in a cargo workspace).
I'd probably fix the editor though ^^
Or change editors if is truly difficult to get it working right.
| } | ||
| } | ||
| /// Runs `cargo metadata` against the given manifest and returns the spec for the matching package. | ||
| pub fn parse_battery_pack_from_path(manifest_path: &Path) -> Result<BatteryPackSpec, Error> { |
There was a problem hiding this comment.
We should canonicalize the path here, we are mixing relative paths from various inputs but cargo_metadata just does absolute, so our path equality checks will fail.
Let's also make sure we have a unit test guarding this case.
| .filter(|(key, value)| { | ||
| !(optional_dep_names.contains(key.as_str()) | ||
| && value.len() == 1 | ||
| && value[0] == format!("dep:{}", key)) |
There was a problem hiding this comment.
++ To this change to strip the auto-gen features.
This made me notice an existing bug. Suppose a bp author specifies:
[features]
indicators = ["dep:indicatif", "dep:console"]
Currently we look for exactly "indicatif" during feature resolution so "dep:indicatif" wouldn't match and thus would error out. The fix would be to strip the dep: prefix in add_feature_crates and validate_features.
Up to you if you care to scope creep this PR into fixing it, while you're anyway working on the parser, glad to review it. Maybe up for cutting a follow up issue if you don't have time to add that in here?
There was a problem hiding this comment.
I don't think it involves a lot of code changes so, I'm gonna do it here.
There was a problem hiding this comment.
I think it makes total sense to handle all the case such as:
[features]
indicators = ["indicatif/foo", "console/?foo"]don't you think?
There was a problem hiding this comment.
Yeah I'd agree with that. Ideally we could just use cargo provided resolution rather than hand rolling it, but if that is a larger change, making the hand rolled cover all these cases to start with is good.
| // ============================================================================ | ||
|
|
||
| /// Parse a battery pack's Cargo.toml into a `BatteryPackSpec`. | ||
| pub fn parse_battery_pack(manifest_str: &str) -> Result<BatteryPackSpec, Error> { |
There was a problem hiding this comment.
I'm on board with this refactor to accept the &Path directly, but let's label the commit as refactor(manifest)! so that it shows up in our changelog as breaking (and we'll bump semver)
There was a problem hiding this comment.
I did not know that actually. Thank you.
| } | ||
| // -- find the package whose manifest path matches the requested one -- | ||
| let manifest_utf8 = Utf8Path::from_path(manifest_path) | ||
| .ok_or_else(|| Error::Metadata("manifest path is not UTF-8".into()))?; |
There was a problem hiding this comment.
worth flushing the path here as well, maybe? ie roughly:
.ok_or_else(|| Error::Metadata(
format!("manifest path is not valid UTF-8: {}", manifest_path.display()).into()
))?;
| Ok(packs) | ||
| /// Discover battery packs reachable from a path. | ||
| /// | ||
| /// `path` may be a workspace root or any crate withing a workspace; |
There was a problem hiding this comment.
| /// `path` may be a workspace root or any crate withing a workspace; | |
| /// `path` may be a workspace root or any crate within a workspace; |
| DependencyKind::Normal => DepKind::Normal, | ||
| DependencyKind::Development => DepKind::Dev, | ||
| DependencyKind::Build => DepKind::Build, | ||
| _ => continue, // or unreachable!(). |
There was a problem hiding this comment.
this is fine, let's say // skip unknown kind fields
We could add a debug log here maybe
| (manifest_dir.to_path_buf(), false) | ||
| } | ||
| Err(e) => return Err(e), | ||
| Err(e) => { |
There was a problem hiding this comment.
surprised that this passes rustfmt?
| .collect::<BTreeMap<_, BTreeSet<_>>>(); | ||
|
|
||
| // -- side-read TOML for [package.metadata.battery-pack].hidden + battery.templates -- | ||
| let (hidden, templates) = read_bp_metadata_tables(&pkg.manifest_path)?; |
There was a problem hiding this comment.
Side-read [package.metadata.battery-pack] and [package.metadata.battery.templates] from TOML directly -- cargo_metadata exposes these as serde_json::Value which loses the original shape.
I don't think we care about that here, no? Since we are anyway not writing the toml back out into user space files. We only are parsing its contents, which shouldn't be lossy except for eg formatting and comments?
But the [package.metadata.battery-pack].hidden + battery.templates should propagate properly?
IE, seems like we could skip the extra disk read and just use the cargo_metadata output directly?
There was a problem hiding this comment.
Yeah. Totally possible. I'll fix that too.
| let spec = bphelper_manifest::parse_battery_pack(&content) | ||
| .map_err(|e| anyhow::anyhow!("failed to parse {}: {e}", cargo_toml.display()))?; | ||
| let spec = parse_battery_pack_from_path(&cargo_toml) | ||
| .map_err(|err| anyhow!("failed to parse {}: {}", cargo_toml.display(), err))?; |
There was a problem hiding this comment.
I've noticed that a lot of places have a pattern of map_err(|err| anyhow!("something {}: {}", smth, err)). Have you considered making that a function to abstract some logic away?
There was a problem hiding this comment.
Wait, i don't quite follow. What do you mean?
There was a problem hiding this comment.
You can also use .context() to help with that:
https://docs.rs/anyhow/latest/anyhow/trait.Context.html
| [features] | ||
| indicators = ["dep:indicatif"] | ||
| "#; | ||
|
|
There was a problem hiding this comment.
for these, consider using snapbox or something similar. I'm not quite on board with toml embedded right into the test code like this.
There was a problem hiding this comment.
I see embedded toml everywhere in the codebase. I'm not quite sure whether I should change for those test that I've written, or all the tests.
There was a problem hiding this comment.
Anyway, I could do the changes as follow-up PR.
There was a problem hiding this comment.
++ to using snapbox to manage snapshots instead of making point assertions like contains_key. Or rather, in addition to, we can keep the point assertion as well.
And it looks like we use indoc! to manage these embedded inputs.
This test is pretty much the same as we want, I think:
jlizen
left a comment
There was a problem hiding this comment.
(I think you are still working through feedback because noticed a couple more things)
| target | ||
| WIP.md | ||
| .claude/settings.local.json | ||
| .claude/ No newline at end of file |
There was a problem hiding this comment.
let's only keep commands and reject all else, ie make sure when we rebase against #134 we take that gitignore
| let cache_file = cache_dir.join(format!("{}_spec.toml", crate_name)); | ||
| let _ = fs::write(&cache_file, &manifest_content); | ||
| if fs::create_dir_all(&cache_dir).is_ok() | ||
| && let Ok(json) = serde_json::to_string(&spec) |
There was a problem hiding this comment.
would be nice to have a debug log in case of deserialize failures
ba2fb89 to
5061a1f
Compare
jlizen
left a comment
There was a problem hiding this comment.
Good refactors, a few more small things @gme-muriuki
| [features] | ||
| indicators = ["dep:indicatif"] | ||
| "#; | ||
|
|
There was a problem hiding this comment.
++ to using snapbox to manage snapshots instead of making point assertions like contains_key. Or rather, in addition to, we can keep the point assertion as well.
And it looks like we use indoc! to manage these embedded inputs.
This test is pretty much the same as we want, I think:
| @@ -1018,8 +956,108 @@ fn validate_templates_on_disk( | |||
|
|
|||
| #[cfg(test)] | |||
There was a problem hiding this comment.
Can we also have a regression test for glob members ((members = ["crates/*"]))? That would have broke old parser but should work now.
| #[test] | ||
| // [verify cli.source.discover] workspace case — member crate discovers siblings | ||
| fn discover_from_crate_root_finds_workspace() { | ||
| fn discover_battery_packs_finds_workspace() { |
There was a problem hiding this comment.
can we also have a negative test that non-battery-pack workspace members are excluded
There was a problem hiding this comment.
Totally possible. I'll do that too.
| .map_err(|err| anyhow!("Failed to parse battery pack '{}': {}", crate_name, err))?; | ||
| let version = spec.version.clone(); // !quite sure? | ||
|
|
||
| (Some(version), spec) |
bphelper-manifest had its own hand-rolled TOML parser
(`parse_battery_pack(&str)` + `RawManifest`/`RawPackage`/`RawDep`) and
walked workspace members literally via `members.iter().join(path)`.
That layer was blind to glob members, `default-members`, `exclude`,
and -- most visibly to users -- workspace dependency inheritance:
`snapbox = { workspace = true }` resolved to an empty version, so
`cargo bp add --path` wrote `snapbox = ""` to the user's Cargo.toml
(issue battery-pack-rs#96).
Replace the parser with `cargo_metadata::MetadataCommand`, which
shells out to the real `cargo metadata` and returns the same view
Cargo itself uses.
- Delete `parse_battery_pack(&str)`, `parse_dep_section`,
`parse_single_dep`, and the supporting `Raw*` deserialization types.
- Add `parse_battery_pack_from_path(&Path)` as the single-pack entry
point; migrate the 16 call sites in bphelper-cli/bphelper-build.
- Collapse `discover_from_crate_root` into `discover_battery_packs`:
`cargo metadata --manifest-path` walks up to the workspace root for
us, and a no-`[workspace]` crate is treated as a 1-member workspace,
so the two functions ended up identical.
- Extract `load_metadata` for the `MetadataCommand` boilerplate shared
by discover + single-pack paths.
- Side-read TOML in `read_bp_metadata_tables` for the bp-specific
`[package.metadata.battery-pack]` and
`[package.metadata.battery.templates]` tables; cargo_metadata exposes
these only as `serde_json::Value` and round-tripping through that
loses the original shape.
- Switch the completion cache from raw Cargo.toml snippets to
JSON-serialized `BatteryPackSpec`; cargo metadata can't operate on
isolated manifests, so the old `<name>_spec.toml` format is no longer
parseable.
Closes battery-pack-rs#13. Closes battery-pack-rs#96.
- read [package.metadata.battery-pack] from cargo_metadata JSON (pkg.metadata) instead of re-reading manifest TOML; drop read_bp_metadata_tables, add serde_json - strip implicit `^` from dep versions so output matches `cargo add` (`"1"` not `"^1"`) - parse feature crate refs (`dep:foo`, `foo/bar`, `foo?/bar`) via crate_from_feature_reference in validate + resolve paths - canonicalize manifest path in parse_battery_pack_from_path so non-canonical input (e.g. `../pack/Cargo.toml`) matches - warn + skip dependencies with unrecognized kind - tests: route parse_test through parse_battery_pack_from_path; build unknown-crate-in-feature specs by hand (cargo metadata rejects them at parse); give docgen fixtures a src/lib.rs target; fix `resolver = "2"` typo - tidy: collapse validate error arm
- The autocomplete spec cache swallowed every failure (create_dir_all,
serde_json::to_string, fs::write).
- Route write through cache_spec_for_completion, which stays
best-effort but logs a `warning:` at each failure point instead of
discarding it silently.
b13837c to
d2b9541
Compare
Sorry, totally slipped my mind |
- Replace map_err(|err| anyhow!("..:{err}")) with .context() /
.with_context() so source errors chain instead of flattening into a
single string.
Drop now-unused anyhow! imports
Remove the pinning of a dependency version when the dependency is local
in add_batter_pack().
- Add two discovery tests: non-battery-pack members are excluded, and
glob member path (members = ["crates/*]) are expanded.
- Wrap every TOML input in indoc! and convert resolve/discover tests
to snapbox snapshot for readable output.
| /// - `"foo?/bar"` | ||
| fn crate_from_feature_reference(s: &str) -> &str { | ||
| let crt = s.strip_prefix("dep:").unwrap_or(s); | ||
| let crt = crt.split('/').next().unwrap_or(crt); |
There was a problem hiding this comment.
This behavior will drop any optional features, ie this:
[dependencies]
serde = { version = "1", optional = true }
[features]
fancy = ["serde/derive"]
Resolves to:
serde = "1"
instead of:
serde = { version = "1", features = ["derive"] }
This seems like a regression to me, I think we want the latter.
I would probably handle this via a small local parser like so:
enum FeatureRef<'a> {
Dep(&'a str),
Feature(&'a str),
DepFeature {
dep: &'a str,
feature: &'a str,
weak: bool,
},
}
(though maybe @epage has better advice)
There was a problem hiding this comment.
We could defer to cargo entirely and own as minimum parsing as possible.
There was a problem hiding this comment.
Thinking of going with the typed-enum direction you suggested, but lifting the parse out of the resolver into manifest-load time -- so BatteryPackSpec.features stores Vec<FeatureRef> instead of Vec<String>.
Variants would be Dep, Feature, and DepFeature {dep, feature, weak }.
Validation moves up with it: bad refs become parse errors with location, instead of silent drops at resolve time.
The resolver becomes a small match -- for DepFeature it unions the per-dep feature into the result's feature set, which is the actual fix.
One question I want your take on:
- I'm planning to treat weak (foo?/bar) and strong (foo/bar) the same in the resolver. The reasoning is we're a recommender, not a runtime resolver -- weak/strong only affects whether cargo auto-activates the dep at runtime, which is the downstream consumer's concern. The weak flag would still be a FeatureRef so a future emitter could mirror it.
Does that match your mental model, or would you rather drop the weak ref entirely (closer to cargo's runtime behaviour) at resolve time?
For the broader "what about future cargo semantic changes?" worry, I'm thinking of adding an oracle test harness behind an oracle cargo feature: per-fixture x feature combo, assert our resolver names the same direct deps as cargo metadata --features . Run it with CI. That way if cargo's rules shift, the test fails before users hit it.
Open to alternatives if you've seen this done another way.
cc @epage
There was a problem hiding this comment.
Thinking of going with the typed-enum direction you suggested, but lifting the parse out of the resolver into manifest-load time -- so BatteryPackSpec.features stores Vec instead of Vec.
Variants would be Dep, Feature, and DepFeature {dep, feature, weak }.
Seems good
I'm planning to treat weak (foo?/bar) and strong (foo/bar) the same in the resolver. The reasoning is we're a recommender, not a runtime resolver -- weak/strong only affects whether cargo auto-activates the dep at runtime, which is the downstream consumer's concern. The weak flag would still be a FeatureRef so a future emitter could mirror it.
This seems fine to treat them identically for feature propagation, but we should be careful not to activate dependencies if there is only a weak syntax and nothing else is directly activating the feature
For the broader "what about future cargo semantic changes?" worry, I'm thinking of adding an oracle test harness behind an oracle cargo feature: per-fixture x feature combo, assert our resolver names the same direct deps as cargo metadata --features . Run it with CI. That way if cargo's rules shift, the test fails before users hit it.
An oracle seems nice but we would want to be careful about complexity explosion or flakiness. Probably I'd suggest using a path dependency in fixtures to avoid network usage. And keep the fixture cases relatively few and targeted (weak deps, dep: syntax, feature-of-a-dep, etc).
There was a problem hiding this comment.
Makes sense 👍. Working on it, will be through mid next week.
…into fix/issue-13-cargo-metadata
A spec section for `[features]` value-list entries: the four reference forms (`foo`, `dep:foo`, `foo/bar`,`foo?/bar`), how the recommender resolves them, what validation rejects, and the oracle invariant against `cargo metadata`. Subsequent commits implement and verify it.
`[features]` entries lived in `BTreeSet<String>` and the recommender
hand-walked the strings. That worked for bare `foo` but quietly
mishandled `dep:foo`, strong `foo/bar`, and weak `foo?/bar`.
This commit pulls the grammar into a real type.
* A new `FeatureRef` enum in `feature_ref` with variants
`Feature(name)`, `Dep(name)`, and `DepFeature {dep, feature, weak}`.
`FeatureRef::parse` is the single entry point; `FromStr`, `Display`,
`Serialize`, and `Deserialize` round-trip through the canonical Cargo
string form. `Ord` goes through `Display` so `BTreeSet<FeatureRef>`
iterates in printed order -- diff-stable and matching what users see.
* `BatteryPackSpec.features` is now `BTreeMap<String,
BTreeSet<FeatureRef>>`. `resolve_crates` / `add_feature_crates` are
rewritten around the parsed variants: bare `Feature(name` recurses
into a local feature when one exists (with a cycle guard) and
otherwise falls back to dep lookup -- the implicit-feature rule;
`Dep(name)` activates the dep directly; strong `DepFeature` activates
the dep and adds the per-dep feature; weak is deferred to a final
pass that only fires when the dep was activated by somethinge else.
* Callers in `bphelper_cli` and `bphelper_build` switched to the new
shape via `dep_name()`; user visible output is unchanged.
* New fixtures under `tests/fixtures/` -- `feature-syntax-battery-pack`
covers every reference form, `optional-feature-battery-pack` pins the
strong-dep-feature regression, `mixed-kinds-battery-pack` tangles
dev/build/optional Normal deps for the `dev-build-always' invariant.
Hidden `_stubs/` keep them self-contained
Pins the resolver to cargo's own behavior. For every `(fixture, combo)` in `CASES`, the test shells out to `cargo metadata` and asserts the direct-dep set matches what our resolver returns. * `tests/fixture_oracle.rs` gated behind a new `oracle` cargo feature on `bphelper-manifest`, off by default because it needs the registry on first run. * A dedicated `oracle` job in `.github/workflows/rust.yml` runs it on every PR.
jlizen
left a comment
There was a problem hiding this comment.
I think we should remove resolve_all_visible() as a public API, I think I found all the usages but clearly it is a maintenance footgun.
I also think we are missing some e2e tests. The unit tests are comprehensive but we have holes where this feature resolution gap would have passed through.
Suggested tests:
add --no-default-features -F fancypreserves per-dep featuresadd --all-featuresalso preserves per-dep features- Nested refs like
bundle = ["fancy"]expand correctly
| if let Some(members) = bp_spec.features.get(feat_name) { | ||
| for c in members { | ||
| for fref in members { | ||
| let c = fref.dep_name(); |
There was a problem hiding this comment.
This needs to pass through the new resolver that non-interactive mode now uses. Otherwise, it skips feature-gated features
|
I'm in a bit of a situation here. Should the Context: the manifest crate exposes Right now I tried fixing this by calling the validate() inside the function. It broke So there are two different consumers with different contracts:
Should I split them into two entry points: /// lenient, what bphelper-build uses
pub fn parse_battery_pack_from_path(path: &Path) -> Result<BatteryPackSpec, Error> {
...
}/// parse + validate
pub fn parse_and_validate_battery_pack(path: &Path) -> Result<BatteryPackSpec, Error> {
...
spec.validate()?;
} |
I don't believe that that is necessary. As I understand it to be, parsing and validation are different things. Just as well, I can conjure up cases where you wouldn't need to validate: for example if you knew that the spec in question was for a battery pack. I think it'd be fine if you just put a doc comment in |
Interesting. This is what I have in the code as of now. If that's the right call, I'll commit to that then. @jlizen What do you think? |
Typed resolver boundary:
- ActiveFeatures::{All, Subset} replaces BTreeSet<String> + "all"
sentinel
- From<&BTreeSet<String>> centralises sentinel translation
- resolve_for_features now filters hidden crates in one place
Error surface tightened:
- Error::Metadata holds cargo_metadata::Error directly (not Box<dyn
Error>); new MetadataDecode variant covers [package.metadata] decode
failures
- FeatureRefParse carries the underlying FeatureParseError as #[source]
Resolver internals:
- add_dep stops double-cloning spec-features on insert
Tests: hidden+All filtering, feature-cycle validation, "all"-literal
sentinel coercion.
parse_battery_pack_from_path keeps its lenient contract (no eager
validate) so bphelper-build still works inside scaffolded user crates;
contract documented.
jlizen
left a comment
There was a problem hiding this comment.
Lenient parse vs two entrypoints
I think a second entrypoint JUST to add name validation is probably overkill. But, we aren't currently calling validate() on install or sync, we could add that into load_installed_bp_spec. AIUI it mostly would affect cargo bp add --path (which would currently allow a non-*-battery-pack name).
(And then for show/list, I don't think we want validation, so lenient is probably the right default?)
We should add a regression test for the CLI capturing that case and that it is appropriately rejected.
| /// [impl format.hidden.effect] | ||
| pub fn resolve_for_features(&self, active: &ActiveFeatures) -> BTreeMap<String, CrateSpec> { | ||
| let expanded: Vec<&str> = match active { | ||
| ActiveFeatures::All => self.features.keys().map(String::as_str).collect(), |
There was a problem hiding this comment.
We have an untested edge case where an optional dependency that has no explicit feature, will be dropped here. This is contrary to cargo since cargo injects the implicit feature for that dependency. (Since we strip those extra features before this method is called). This means our oracle will not match the output.
This is contrary to the older resolve_all_visible() handling.
I don't have a strong opinion about what the right way to handle it is, I don't mind this new behavior, but we should have an oracle case covering it and some brief documentation.
| pub description: Option<String>, | ||
| } | ||
|
|
||
| /// Active feature selection at the resolver boundary. |
There was a problem hiding this comment.
I suggested just pulling off the bandaid and doing a state format version bump and switching to tagged representation using serde, encapsulating the sentinel here doesn't buy us very much
Co-authored-by: Jess Izen <44884346+jlizen@users.noreply.github.com>
Problem
bhelper-manifestused a hand-rolled TOML parser that walked workspace members via literal path joins.This missed glob members, default-members, exclude, and workspace dependency inheritance - mostly visibly,
snapbox = {workspace = true}resolved to an empty version, so cargo bp add --path wrotesnapbox = ""(#96).Fix
Changes:
parse_battery_pack_from_path(&Path)as the single-pack entry point; migrate 16 call sites.discover_battery_packs-- cargo metadata --manifest-path walks to the workspace root, and a no-[workspace] crate is a 1-member workspace, so both paths are identical.[package.metadata.battery-pack]and[package.metadata.battery.templates]from TOML directly -- cargo_metadata exposes these as serde_json::Value which loses the original shape.Closes (#13) and (#96)