Commit 6db7b21

fix(npm): stale metadata cache issue (#6101)
**What's the problem this PR addresses?**

We keep the package metadata in a cache. To avoid missing newly released packages, we only accept the cached metadata if (1) the request asks for an exact semver version (not a range), and (2) the requested version is found inside the cached metadata. In theory, whenever a dependency asks for a version we didn't cache, we assume something new got published and refetch the metadata.

However, to prevent fetching the package metadata many times for many different versions or ranges, we also have an in-memory metadata cache where we store the metadata once it has been extracted from either the disk or the network. Because the disk cache and the memory cache were one and the same, resolving two versions of the same package could corrupt the memory cache when one version existed in the cached metadata but the other didn't: the first package would pass the "is this version inside the cached metadata" check, get stored in the in-memory cache, and be reused for further resolutions, even though those resolutions would have failed the same check.

Fixes #5989

**How did you fix it?**

I separated the in-memory cache into two buckets: the disk cache and the network cache. This ensures that the disk cache gets properly ignored when retrieving versions we don't know about, rather than being mistakenly assumed to be what the network fetched.

**Checklist**

- [x] I have read the [Contributing Guide](https://yarnpkg.com/advanced/contributing).
- [x] I have set the packages that need to be released for my changes to be effective.
- [x] I will check that all automated PR checks pass before the PR gets reviewed.
1 parent 3d99d6b commit 6db7b21
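The failure mode described above, and the effect of splitting the in-memory cache into two buckets, can be modeled in a few lines of standalone TypeScript. Everything below (the metadata shapes, the `resolveWith…` helpers) is an illustrative sketch, not Yarn's actual implementation:

```typescript
// Minimal model of the bug: a single in-memory bucket conflates metadata that
// came from the disk cache with metadata fetched from the network.
type Metadata = {versions: Record<string, true>};

const diskCopy: Metadata = {versions: {"1.0.0": true}};                    // stale: only knows 1.0.0
const registryCopy: Metadata = {versions: {"1.0.0": true, "2.0.0": true}}; // what the network would return

function resolveWithSingleBucket(requested: string[]): string[] {
  const memory = new Map<string, Metadata>();
  const results: string[] = [];
  for (const version of requested) {
    let metadata = memory.get(`no-deps`);
    if (!metadata) {
      // First resolution: the disk copy passes the "is this version inside the
      // cached metadata" check, so it gets stored in the shared bucket...
      metadata = diskCopy.versions[version] ? diskCopy : registryCopy;
      memory.set(`no-deps`, metadata);
    }
    // ...and later resolutions reuse it even when their version is missing.
    results.push(metadata.versions[version] ? `ok` : `missing`);
  }
  return results;
}

function resolveWithTwoBuckets(requested: string[]): string[] {
  const diskBucket = new Map<string, Metadata>();
  const networkBucket = new Map<string, Metadata>();
  const results: string[] = [];
  for (const version of requested) {
    diskBucket.set(`no-deps`, diskCopy);
    const cached = diskBucket.get(`no-deps`)!;
    if (cached.versions[version]) {
      results.push(`ok`); // exact version known on disk: safe to reuse
    } else {
      // Unknown version: the disk bucket is ignored and the network bucket is used.
      networkBucket.set(`no-deps`, registryCopy);
      results.push(networkBucket.get(`no-deps`)!.versions[version] ? `ok` : `missing`);
    }
  }
  return results;
}

console.log(resolveWithSingleBucket([`1.0.0`, `2.0.0`])); // corrupted: [ 'ok', 'missing' ]
console.log(resolveWithTwoBuckets([`1.0.0`, `2.0.0`]));   // fixed: [ 'ok', 'ok' ]
```

The single-bucket version reports `2.0.0` as missing even though the registry has it, which is exactly the corruption the PR fixes.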

File tree

3 files changed: +202 -79 lines changed


.yarn/versions/0fa4c1b4.yml

Lines changed: 24 additions & 0 deletions
```yaml
releases:
  "@yarnpkg/cli": patch
  "@yarnpkg/plugin-npm": patch

declined:
  - "@yarnpkg/plugin-compat"
  - "@yarnpkg/plugin-constraints"
  - "@yarnpkg/plugin-dlx"
  - "@yarnpkg/plugin-essentials"
  - "@yarnpkg/plugin-init"
  - "@yarnpkg/plugin-interactive-tools"
  - "@yarnpkg/plugin-nm"
  - "@yarnpkg/plugin-npm-cli"
  - "@yarnpkg/plugin-pack"
  - "@yarnpkg/plugin-patch"
  - "@yarnpkg/plugin-pnp"
  - "@yarnpkg/plugin-pnpm"
  - "@yarnpkg/plugin-stage"
  - "@yarnpkg/plugin-typescript"
  - "@yarnpkg/plugin-version"
  - "@yarnpkg/plugin-workspace-tools"
  - "@yarnpkg/builder"
  - "@yarnpkg/core"
  - "@yarnpkg/doctor"
```
Lines changed: 59 additions & 0 deletions
```typescript
import {Filename, ppath, xfs} from '@yarnpkg/fslib';
import {tests} from 'pkg-tests-core';

describe(`Features`, () => {
  describe(`Resolution cache`, () => {
    test(
      `it should use a cache metadata when resolving fixed versions`,
      makeTemporaryEnv({}, {
        enableGlobalCache: false,
      }, async ({path, run, source}) => {
        await run(`add`, `no-deps@1.0.0`);

        await xfs.removePromise(ppath.join(path, Filename.lockfile));

        // We now hide any version other than 1.0.0 from the registry. If the
        // install passes, it means that Yarn read the metadata from the cache rather
        // than the registry, as we wanted.

        await tests.setPackageWhitelist(new Map([
          [`no-deps`, new Set([`1.0.0`])],
        ]), async () => {
          await run(`install`);
        });
      }),
    );

    test(
      `it should properly separate the disk metadata cache from the network metadata cache`,
      makeTemporaryEnv({}, {
        enableGlobalCache: false,
      }, async ({path, run, source}) => {
        await tests.setPackageWhitelist(new Map([
          [`no-deps`, new Set([`1.0.0`])],
        ]), async () => {
          await run(`add`, `no-deps@1.0.0`);
        });

        await xfs.removePromise(ppath.join(path, Filename.lockfile));

        // At this point, no-deps has been added into the metadata cache, but only
        // with the 1.0.0 version. The metadata cache isn't aware of other versions.

        // Now, we need a way to force the resolution cache to be used before resolving
        // a version that it isn't aware of. To that end, we create a package.json with
        // a dependency on one-fixed-dep@2, and we run 'yarn add no-deps@1.0.0'. This
        // ensures that Yarn will run getCandidate on no-deps@1.0.0 first (because it's
        // required before adding it to the package.json), and no-deps@2.0.0 later.

        await xfs.writeFilePromise(ppath.join(path, Filename.manifest), JSON.stringify({
          dependencies: {
            [`one-fixed-dep`]: `2.0.0`,
          },
        }));

        await run(`add`, `no-deps@1.0.0`);
      }),
    );
  });
});
```
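Both the disk and network buckets in this change are populated through `miscUtils.getFactoryWithDefault`, which memoizes the *promise* of a lookup, not just its result. A generic standalone sketch of that pattern (the real helper lives in `@yarnpkg/core` and its signature may differ; names here are illustrative):

```typescript
// The first caller for a key stores its pending promise in the map, so any
// concurrent caller shares the same in-flight lookup instead of triggering a
// second disk read or network fetch.
async function getFactoryWithDefault<K, T>(
  map: Map<K, Promise<T>>,
  key: K,
  factory: () => Promise<T>,
): Promise<T> {
  let entry = map.get(key);
  if (typeof entry === `undefined`) {
    entry = factory(); // store the promise itself, not the resolved value
    map.set(key, entry);
  }
  return await entry;
}

const cache = new Map<string, Promise<string>>();
let reads = 0;

const fakeReadFromDisk = async () => {
  reads += 1;
  return `metadata`;
};

// Two concurrent callers, a single underlying read:
Promise.all([
  getFactoryWithDefault(cache, `/cache/no-deps.json`, fakeReadFromDisk),
  getFactoryWithDefault(cache, `/cache/no-deps.json`, fakeReadFromDisk),
]).then(([a, b]) => {
  console.log(a, b, reads); // metadata metadata 1
});
```

Keying the two buckets by `identPath` separately is what guarantees a disk result can never masquerade as a network result for the same package.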

packages/plugin-npm/sources/npmHttpUtils.ts

Lines changed: 119 additions & 79 deletions
```diff
@@ -1,13 +1,13 @@
-import {Configuration, Ident, formatUtils, httpUtils, nodeUtils, StreamReport, structUtils, IdentHash, hashUtils, Project, miscUtils, Cache} from '@yarnpkg/core';
+import {Configuration, Ident, formatUtils, httpUtils, nodeUtils, StreamReport, structUtils, hashUtils, Project, miscUtils, Cache} from '@yarnpkg/core';
 import {MessageName, ReportError} from '@yarnpkg/core';
 import {Filename, PortablePath, ppath, xfs} from '@yarnpkg/fslib';
 import {prompt} from 'enquirer';
 import pick from 'lodash/pick';
 import semver from 'semver';
 
 import {Hooks} from './index';
 import * as npmConfigUtils from './npmConfigUtils';
 import {MapLike} from './npmConfigUtils';
 
 export enum AuthType {
   NO_AUTH,
```
```diff
@@ -80,74 +80,31 @@ export type GetPackageMetadataOptions = Omit<Options, 'ident' | 'configuration'>
 // - an in-memory cache, to avoid hitting the disk and the network more than once per process for each package
 // - an on-disk cache, for exact version matches and to avoid refetching the metadata if the resource hasn't changed on the server
 
-const PACKAGE_METADATA_CACHE = new Map<IdentHash, Promise<PackageMetadata> | PackageMetadata>();
-
-/**
- * Caches and returns the package metadata for the given ident.
- *
- * Note: This function only caches and returns specific fields from the metadata.
- * If you need other fields, use the uncached {@link get} or consider whether it would make more sense to extract
- * the fields from the on-disk packages using the linkers or from the fetch results using the fetchers.
- */
-export async function getPackageMetadata(ident: Ident, {cache, project, registry, headers, version, ...rest}: GetPackageMetadataOptions): Promise<PackageMetadata> {
-  return await miscUtils.getFactoryWithDefault(PACKAGE_METADATA_CACHE, ident.identHash, async () => {
-    const {configuration} = project;
-
-    registry = normalizeRegistry(configuration, {ident, registry});
-
-    const registryFolder = getRegistryFolder(configuration, registry);
-    const identPath = ppath.join(registryFolder, `${structUtils.slugifyIdent(ident)}.json`);
+const PACKAGE_DISK_METADATA_CACHE = new Map<PortablePath, Promise<CachedMetadata | null>>();
+const PACKAGE_NETWORK_METADATA_CACHE = new Map<PortablePath, Promise<CachedMetadata | null>>();
 
+async function loadPackageMetadataInfoFromDisk(identPath: PortablePath) {
+  return await miscUtils.getFactoryWithDefault(PACKAGE_DISK_METADATA_CACHE, identPath, async () => {
     let cached: CachedMetadata | null = null;
 
-    // We bypass the on-disk cache for security reasons if the lockfile needs to be refreshed,
-    // since most likely the user is trying to validate the metadata using hardened mode.
-    if (!project.lockfileNeedsRefresh) {
-      try {
-        cached = await xfs.readJsonPromise(identPath) as CachedMetadata;
-      } catch {}
-
-      if (cached) {
-        if (typeof version !== `undefined` && typeof cached.metadata.versions[version] !== `undefined`)
-          return cached.metadata;
-
-        if (configuration.get(`enableOfflineMode`)) {
-          const copy = structuredClone(cached.metadata);
-          const deleted = new Set();
-
-          if (cache) {
-            for (const version of Object.keys(copy.versions)) {
-              const locator = structUtils.makeLocator(ident, `npm:${version}`);
-              const mirrorPath = cache.getLocatorMirrorPath(locator);
-
-              if (!mirrorPath || !xfs.existsSync(mirrorPath)) {
-                delete copy.versions[version];
-                deleted.add(version);
-              }
-            }
-
-            const latest = copy[`dist-tags`].latest;
-            if (deleted.has(latest)) {
-              const allVersions = Object.keys(cached.metadata.versions)
-                .sort(semver.compare);
-
-              let latestIndex = allVersions.indexOf(latest);
-              while (deleted.has(allVersions[latestIndex]) && latestIndex >= 0)
-                latestIndex -= 1;
+    try {
+      cached = await xfs.readJsonPromise(identPath) as CachedMetadata;
+    } catch {}
 
-              if (latestIndex >= 0) {
-                copy[`dist-tags`].latest = allVersions[latestIndex];
-              } else {
-                delete copy[`dist-tags`].latest;
-              }
-            }
-          }
+    return cached;
+  });
+}
 
-          return copy;
-        }
-      }
-    }
+type LoadPackageMetadataInfoFromNetworkOptions = {
+  configuration: Configuration;
+  cached: CachedMetadata | null;
+  registry: string;
+  headers?: {[key: string]: string | undefined};
+  version?: string;
+};
 
+async function loadPackageMetadataInfoFromNetwork(identPath: PortablePath, ident: Ident, {configuration, cached, registry, headers, version, ...rest}: LoadPackageMetadataInfoFromNetworkOptions) {
+  return await miscUtils.getFactoryWithDefault(PACKAGE_NETWORK_METADATA_CACHE, identPath, async () => {
     return await get(getIdentUrl(ident), {
       ...rest,
       customErrorMessage: customPackageError,
```
```diff
@@ -175,22 +132,28 @@ export async function getPackageMetadata(ident: Ident, {cache, project, registry
 
     const packageMetadata = pickPackageMetadata(JSON.parse(response.body.toString()));
 
-    PACKAGE_METADATA_CACHE.set(ident.identHash, packageMetadata);
-
     const metadata: CachedMetadata = {
       metadata: packageMetadata,
       etag: response.headers.etag,
       lastModified: response.headers[`last-modified`],
     };
 
-    // We append the PID because it is guaranteed that this code is only run once per process for a given ident
-    const identPathTemp = `${identPath}-${process.pid}.tmp` as PortablePath;
+    PACKAGE_DISK_METADATA_CACHE.set(identPath, Promise.resolve(metadata));
 
-    await xfs.mkdirPromise(registryFolder, {recursive: true});
-    await xfs.writeJsonPromise(identPathTemp, metadata, {compact: true});
+    // We don't need the cache in this process anymore (since we stored everything in both memory caches),
+    // so we can run the part that writes the cache to disk in the background.
+    Promise.resolve().then(async () => {
+      // We append the PID because it is guaranteed that this code is only run once per process for a given ident
+      const identPathTemp = `${identPath}-${process.pid}.tmp` as PortablePath;
 
-    // Doing a rename is important to ensure the cache is atomic
-    await xfs.renamePromise(identPathTemp, identPath);
+      await xfs.mkdirPromise(ppath.dirname(identPathTemp), {recursive: true});
+      await xfs.writeJsonPromise(identPathTemp, metadata, {compact: true});
+
+      // Doing a rename is important to ensure the cache is atomic
+      await xfs.renamePromise(identPathTemp, identPath);
+    }).catch(() => {
+      // It's not dramatic if the cache can't be written, so we just ignore the error
+    });
 
     return {
       ...response,
```
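The background write in this hunk uses a classic write-to-temp-then-rename scheme: on POSIX filesystems `rename()` is atomic, so readers either see the old cache file or the complete new one, never a torn write. A standalone sketch of the same pattern with Node's `fs` (the paths and the `writeJsonAtomically` helper are illustrative, not Yarn's API):

```typescript
import * as fs from 'fs';
import * as path from 'path';
import * as os from 'os';

function writeJsonAtomically(target: string, data: unknown): void {
  // A per-process suffix avoids collisions between concurrent writers.
  const tmp = `${target}-${process.pid}.tmp`;

  fs.mkdirSync(path.dirname(tmp), {recursive: true});
  fs.writeFileSync(tmp, JSON.stringify(data));

  // The rename is what makes the update atomic: the target path always
  // points at a fully written file.
  fs.renameSync(tmp, target);
}

const target = path.join(os.tmpdir(), `metadata-cache`, `no-deps.json`);
writeJsonAtomically(target, {versions: {"1.0.0": {}}});
console.log(fs.existsSync(target)); // true
```

Deferring this write behind `Promise.resolve().then(...)` works here precisely because both in-memory buckets are already populated, so nothing in the current process depends on the disk copy landing.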
```diff
@@ -201,6 +164,83 @@ export async function getPackageMetadata(ident: Ident, {cache, project, registry
   });
 }
 
+/**
+ * Caches and returns the package metadata for the given ident.
+ *
+ * Note: This function only caches and returns specific fields from the metadata.
+ * If you need other fields, use the uncached {@link get} or consider whether it would make more sense to extract
+ * the fields from the on-disk packages using the linkers or from the fetch results using the fetchers.
+ */
+export async function getPackageMetadata(ident: Ident, {cache, project, registry, headers, version, ...rest}: GetPackageMetadataOptions): Promise<PackageMetadata> {
+  const {configuration} = project;
+
+  registry = normalizeRegistry(configuration, {ident, registry});
+
+  const registryFolder = getRegistryFolder(configuration, registry);
+  const identPath = ppath.join(registryFolder, `${structUtils.slugifyIdent(ident)}.json`);
+
+  let cached: CachedMetadata | null = null;
+
+  // We bypass the on-disk cache for security reasons if the lockfile needs to be refreshed,
+  // since most likely the user is trying to validate the metadata using hardened mode.
+  if (!project.lockfileNeedsRefresh) {
+    cached = await loadPackageMetadataInfoFromDisk(identPath);
+
+    if (cached) {
+      if (typeof version !== `undefined` && typeof cached.metadata.versions[version] !== `undefined`)
+        return cached.metadata;
+
+      // If in offline mode, we change the metadata to pretend that the only versions available
+      // on the registry are the ones currently stored in our cache. This is to avoid the resolver
+      // to try to resolve to a version that we wouldn't be able to download.
+      if (configuration.get(`enableOfflineMode`)) {
+        const copy = structuredClone(cached.metadata);
+        const deleted = new Set();
+
+        if (cache) {
+          for (const version of Object.keys(copy.versions)) {
+            const locator = structUtils.makeLocator(ident, `npm:${version}`);
+            const mirrorPath = cache.getLocatorMirrorPath(locator);
+
+            if (!mirrorPath || !xfs.existsSync(mirrorPath)) {
+              delete copy.versions[version];
+              deleted.add(version);
+            }
+          }
+
+          const latest = copy[`dist-tags`].latest;
+          if (deleted.has(latest)) {
+            const allVersions = Object.keys(cached.metadata.versions)
+              .sort(semver.compare);
+
+            let latestIndex = allVersions.indexOf(latest);
+            while (deleted.has(allVersions[latestIndex]) && latestIndex >= 0)
+              latestIndex -= 1;
+
+            if (latestIndex >= 0) {
+              copy[`dist-tags`].latest = allVersions[latestIndex];
+            } else {
+              delete copy[`dist-tags`].latest;
+            }
+          }
+        }
+
+        return copy;
+      }
+    }
+  }
+
+  return await loadPackageMetadataInfoFromNetwork(identPath, ident, {
+    ...rest,
+    configuration,
+    cached,
+    registry,
+    headers,
+    version,
+  });
+}
+
 type CachedMetadata = {
   metadata: PackageMetadata;
   etag?: string;
```
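The offline-mode branch above prunes cached versions whose tarballs are missing from the mirror, then repairs `dist-tags.latest` by walking the sorted version list backwards to the newest surviving version. A standalone model of just that repair step (names are illustrative; the bounds check is ordered before the array access here, and the semver sort is assumed to have already happened):

```typescript
type DistTags = {latest?: string};

// Given the full sorted version list and the set of versions that were pruned,
// point `latest` at the newest version that survived, or drop it entirely if
// nothing survived.
function repairLatest(allVersions: string[], deleted: Set<string>, distTags: DistTags): void {
  if (!distTags.latest || !deleted.has(distTags.latest))
    return; // latest still available: nothing to do

  let latestIndex = allVersions.indexOf(distTags.latest);
  while (latestIndex >= 0 && deleted.has(allVersions[latestIndex]))
    latestIndex -= 1;

  if (latestIndex >= 0) {
    distTags.latest = allVersions[latestIndex];
  } else {
    delete distTags.latest;
  }
}

const distTags: DistTags = {latest: `3.0.0`};
repairLatest([`1.0.0`, `2.0.0`, `3.0.0`], new Set([`2.0.0`, `3.0.0`]), distTags);
console.log(distTags.latest); // 1.0.0
```

Keeping `latest` consistent matters because resolvers treat it as the default candidate; pointing it at a pruned version would make offline installs fail even when older tarballs are available.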
