Skip to content

Conversation

@nagilson
Copy link
Member

@nagilson nagilson commented Jan 13, 2026

context

for #52090
copilot failed attempt #52338

Issue: The Workload Manifest Reader can fail with ManifestFromWorkloadSetNotFound when calling GetManifests when package managers clean up manifests from older sdk versions that were still needed by the newer version workloads.

❗ This PR is better reviewed with hiding whitespace changes.

Manual Testing Loop Demo

./artifacts/bin/redist/Debug/dotnet/dotnet exec ./artifacts/bin/redist/Debug/dotnet/sdk/10.0.103-dev/dotnet.dll --debug workload install macos
rm -rf  /workspaces/sdk2/sdk/artifacts/bin/redist/Debug/dotnet/sdk-manifests/10.0.100/microsoft.net.workload.emscripten.current/10.0.102/

./artifacts/bin/redist/Debug/dotnet/dotnet exec ./artifacts/bin/redist/Debug/dotnet/sdk/10.0.103-dev/dotnet.dll --debug workload --info

./artifacts/bin/redist/Debug/dotnet/dotnet exec ./artifacts/bin/redist/Debug/dotnet/sdk/10.0.103-dev/dotnet.dll --debug workload repair
image image

What's this do

I considered removing the workload set entirely during workload manifest search if the manifests didn't exist and it was corrupt but this would prevent the repair from working and still require user interaction.

Removing the error is not sufficient because then we never recover from the bad state or detect the bad state.

So, now we have a separate class which tries to repair missing manifests before we start the manifest lookup. The errors for missing manifests are disabled in the recorder because the recorder will fail to initialize if the manifests are corrupted and it doesn't have the data available to repair or recover from that state. We do the sub-manifest repair because the manifests need to be in a good state for the rest of repair to have the correct data to function. The repairer being hooked into the Installer means commands such as repair will get the hook to do so, but commands such as --info never initialize an installer through the factory, so they instead print a prettier error.

Notes

I tested and confirmed:
repair, restore, update are fixed.
--info, workload --info, workload list show the small error.

Plan: (this commit is step 1 of the plan but idk if it works completely yet)

- Workload Manifest Reader will always fail with `ManifestFromWorkloadSetNotFound` when calling GetManifests.

I considered removing the workload set entirely during workload set search if the manifests didn't exist and it was corrupt but this would prevent the repair from working and still require user interaction.

Removing the error is not sufficient because then we never recover from the bad state or detect the bad state.

So,

1. Add a function to repair so it can repair missing manifests, because before it called ReinstallWorkloadsBasedOnCurrentManifests which calls install workloads, which calls install packs, which needs to read the manifests to know the packs, so repair would fail.

2. Once repair is able to fix the issue, resolve the other commands to be able to fix the issue by A - detecting corrupt issue, and then calling the repair subcomponent to install missing manifests

3. Make sure repair isn't cyclical and doesn't try to call the sub repair option (should be ok, because we don't get to the missing manifests code until we've already repaired the manifests)

4. Add boolean for manifest reader or whatever to print during info, version or list that there was a corrupt workload set  likely from pkg manager but still allow repairing it (or decide if we don't want to do that.)
workload restore is not setup for the injection so didn't add it there but the code is shared between the two so testing like this should be decent
…rkload command so it can be shared"

This reverts commit a3959ba.
… also fixed

this prevents needing to inherit or modify many command files at once
workload history recorder tries to get manifests and it does this before file baesd installer is created.
…ble to repair the manifests - the repair family commands should be able repair
the install state is empty on disk - this is not safe to rely on. Also if we refresh manifests we will create a dupe manifest
@nagilson nagilson changed the title workload x can recover from corrupt workload sets workload repair can recover from corrupt workload sets Jan 15, 2026
@nagilson nagilson marked this pull request as ready for review January 15, 2026 00:51
@nagilson nagilson requested a review from a team January 15, 2026 00:52
Copy link
Member Author

@nagilson nagilson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's some polish needed if this isnt urgent:

  1. The error string should likely contain the workload id and not just the version
  2. We should try to dedupe the message if possible, despite separate assemblies
  3. The MissingManifests logic is also implemented in two approaches, which has potential for consolidation
  4. It may not be required to attach in the factory method and we should reconsider if we can do that at the IInstaller level
  5. The sdkFeatureBand parameter is unnecessary in the CorruptionRepairer

@nagilson
Copy link
Member Author

The test failure seems to be due to a change in the new, more specific error message presenting in some cases instead of an existing error message.

@marcpopMSFT
Copy link
Member

I tested in codespaces. I can confirm that if I delete any workload manifest (after installing), I will get errors during --info, workload --info, and build. I can also confirm that repair, restore, and update all fix the problem.

The error experiences themselves could probably be cleaned up a bit as they give full exception stacks in 2/3 of those flows (workload --info is the best Ux). But this is sufficient for unblocking the customer. Need to fix the tests and review.

// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

#nullable disable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Probably shouldn't be making new files with #nullable disable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants