fix(docs-mcp): recursively crawl and register nested llms.txt resources#2317

Open
hzy wants to merge 3 commits into main from fix/docs-mcp-recursion
Conversation

@hzy
Collaborator

@hzy hzy commented Mar 6, 2026

This PR fixes an issue where nested documentation resources (linked from sub-indexes like api/llms.txt) were not being registered by the MCP server, causing 'Resource not found' errors.

Changes:

  • Implemented recursive crawling of llms.txt files in main.ts.
  • Added logic to fetch and parse nested index files and register their linked resources.
  • Added HTTP status checks and improved error handling.
  • Added changeset.
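
The changes listed above amount to a depth-first crawl with a shared visited set. A minimal sketch of that idea follows, assuming a hypothetical `loadText` callback in place of `fetch` and a simplified Markdown link regex. Neither is taken from `main.ts`, and registration is reduced to collecting leaf URLs.

```typescript
// Hypothetical loader: resolves to the document text, or undefined on failure.
type LoadText = (url: string) => Promise<string | undefined>;

// Simplified Markdown link matcher for [label](target).
// This is an assumption, not the parser used in main.ts.
function extractLinks(markdown: string): string[] {
  const links: string[] = [];
  const pattern = /\[[^\]]+\]\(([^)\s]+)\)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    const target = match[1];
    if (target !== undefined) links.push(target);
  }
  return links;
}

async function crawl(
  indexUrl: string,
  loadText: LoadText,
  visited: Set<string> = new Set(),
  registered: string[] = [],
): Promise<string[]> {
  if (visited.has(indexUrl)) return registered; // cycle or repeat: skip
  visited.add(indexUrl);

  const markdown = await loadText(indexUrl);
  if (markdown === undefined) return registered; // real code would log and continue

  for (const href of extractLinks(markdown)) {
    const url = new URL(href, indexUrl).href; // resolve against the current index
    if (url.endsWith('llms.txt')) {
      await crawl(url, loadText, visited, registered); // nested index: recurse
    } else if (!registered.includes(url)) {
      registered.push(url); // leaf resource: stand-in for registration
    }
  }
  return registered;
}
```

Because the visited set is shared across recursive calls, an index that links back to its parent is loaded only once.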

Summary by CodeRabbit

  • Bug Fixes

    • Enables recursive crawling and registration of nested documentation indices.
    • Prevents duplicate processing of already-seen documentation sources.
    • Improves fetching error handling with clearer logging and graceful continuation.
  • Chores

    • Added a changeset entry for a patch release.

Copilot AI review requested due to automatic review settings March 6, 2026 16:20
@changeset-bot

changeset-bot bot commented Mar 6, 2026

🦋 Changeset detected

Latest commit: 6877936

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
| Name | Type |
|---|---|
| @lynx-js/docs-mcp-server | Patch |


@cla-assistant

cla-assistant bot commented Mar 6, 2026

CLA assistant check
All committers have signed the CLA.

@coderabbitai
Contributor

coderabbitai bot commented Mar 6, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7e6fd764-4cd1-436e-89e8-ba7933f36a57

📥 Commits

Reviewing files that changed from the base of the PR and between 7b4ac27 and 6877936.

📒 Files selected for processing (1)
  • packages/mcp-servers/docs-mcp-server/main.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/mcp-servers/docs-mcp-server/main.ts

Walkthrough

Converts the docs MCP server resource registrar into an async crawler that recursively discovers and registers nested llms.txt markdown indexes, tracks visited URLs to avoid cycles, replaces forEach with for...of for awaitable control flow, and adds guarded fetch/error logging; also adds a changeset entry.
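
The switch from `forEach` to `for...of` matters because `forEach` discards the promises returned by an async callback. The sketch below illustrates the difference with a hypothetical `register` step standing in for the real registration logic; nothing here is taken from `main.ts`.

```typescript
// Stand-in for an awaited registration step (hypothetical; no network involved).
async function register(link: string, log: string[]): Promise<void> {
  await Promise.resolve();
  log.push(link);
}

// forEach ignores the promises returned by the async callback,
// so 'done' is recorded before any link has been registered.
async function registerWithForEach(links: string[], log: string[]): Promise<void> {
  links.forEach(async (link) => {
    await register(link, log);
  });
  log.push('done');
}

// for...of lets each await finish before the loop moves on,
// so 'done' is recorded only after every link.
async function registerWithForOf(links: string[], log: string[]): Promise<void> {
  for (const link of links) {
    await register(link, log);
  }
  log.push('done');
}
```

With `for...of`, a recursive `await` inside the loop body completes before the next link is processed, which is what a sequential crawl needs.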

Changes

| Cohort / File(s) | Summary |
|---|---|
| Changeset Entry: `.changeset/fix-recursive-docs-mcp.md` | Adds a changeset documenting a patch release and the fix for recursive crawling/registration of nested llms.txt resources. |
| Docs MCP Server: `packages/mcp-servers/docs-mcp-server/main.ts` | Renames synchronous `registerResources` → async `crawlAndRegisterResources`; adds `visited: Set<string>` parameter and initialization, switches link traversal to for...of, implements special-case handling for nested llms.txt (fetch, register as `lynx-docs://...`, mark visited, recurse), updates non-llms.txt registration to await guarded fetch and handle non-OK responses, updates main to use the new async crawler. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • colinaaa

Poem

🐰 I hop through links both near and far,

I sniff each llms.txt like a guiding star.
I mark what's new and skip what's been read,
log a tumble, then bound onward instead.
Recursive paths — a rabbit's thread.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: implementing recursive crawling and registration of nested llms.txt resources, which directly matches the PR's core objective and the code modifications. |


Contributor

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
packages/mcp-servers/docs-mcp-server/main.ts (1)

122-130: Consider making the resource factory consistent with non-nested resources.

The factory here returns cached nestedMarkdown captured at registration time, while regular resources (lines 157-171) use an async factory that fetches fresh content on each read. This creates behavioral inconsistency: nested index resources return startup-time content, while other resources reflect current server content.

If this caching is intentional (avoiding redundant fetches for stable index files), consider adding a brief comment to document the design decision.
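
The two behaviors being contrasted can be sketched as follows. The `ReadResult` shape mirrors the `contents` object quoted in this review; the `cachedReader` and `freshReader` names and the `load` callback are hypothetical and do not appear in `main.ts`.

```typescript
interface ReadResult {
  contents: { uri: string; text: string; mimeType: string }[];
}

// Startup-cached: the text is captured once, when the resource is registered.
function cachedReader(uri: string, text: string): () => ReadResult {
  return () => ({ contents: [{ uri, text, mimeType: 'text/markdown' }] });
}

// Fresh-on-read: the hypothetical `load` callback runs on every read,
// so the result reflects whatever the source returns at that moment.
function freshReader(
  uri: string,
  load: () => Promise<string>,
): () => Promise<ReadResult> {
  return async () => ({
    contents: [{ uri, text: await load(), mimeType: 'text/markdown' }],
  });
}
```

The cached form avoids a second request for an index that was already fetched during the crawl, at the cost of serving startup-time content until the server restarts.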

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/mcp-servers/docs-mcp-server/main.ts` around lines 122 - 130, The
resource factory currently returns the cached nestedMarkdown captured at
registration (the factory returning () => ({ contents: [{ uri:
`lynx-docs://${strippedUrl}`, text: nestedMarkdown, mimeType: 'text/markdown' }]
})), which is inconsistent with the other resource factories that are async and
fetch fresh content on each read; either change this factory to an async factory
that computes/fetches the current nested markdown on each invocation (e.g.,
async () => ({ contents: [{ uri: `lynx-docs://${strippedUrl}`, text: await
computeNestedMarkdown(...), mimeType: 'text/markdown' }] })) to match the
behavior of the resources at lines 157-171, or if startup caching is
intentional, add a short comment above this factory referencing nestedMarkdown
and explaining that it is intentionally captured at registration to avoid
repeated fetches.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e04dd5dd-f09e-45c9-80f1-5621108d9111

📥 Commits

Reviewing files that changed from the base of the PR and between 4daa4d9 and 25e7b16.

📒 Files selected for processing (2)
  • .changeset/fix-recursive-docs-mcp.md
  • packages/mcp-servers/docs-mcp-server/main.ts

Contributor

Copilot AI left a comment

Pull request overview

This PR updates the docs MCP server to recursively discover and register resources referenced by nested llms.txt index files, preventing “Resource not found” errors when documentation is organized under sub-indexes.

Changes:

  • Implement recursive crawling of llms.txt links to register nested indexes and their referenced resources.
  • Add HTTP status handling for nested index fetches and for resource fetches during reads.
  • Add a changeset to publish a patch release for @lynx-js/docs-mcp-server.
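
The two kinds of HTTP status handling mentioned above can be sketched roughly as below. The error messages follow the ones quoted in the diff excerpts in this review; the `FetchLike` type, the function names, and the injected `log` callback are assumptions made so the example stays self-contained.

```typescript
// Minimal response shape used here; a standard fetch Response has these members.
interface TextResponse {
  ok: boolean;
  status: number;
  statusText: string;
  text(): Promise<string>;
}
type FetchLike = (url: string) => Promise<TextResponse>;

// Crawl-time variant: report the failure and let the caller continue.
async function fetchIndexText(
  url: string,
  fetchImpl: FetchLike,
  log: (message: string) => void,
): Promise<string | undefined> {
  try {
    const response = await fetchImpl(url);
    if (!response.ok) {
      log(`Failed to fetch nested index ${url}: ${response.status} ${response.statusText}`);
      return undefined;
    }
    return await response.text();
  } catch (error) {
    log(`Failed to fetch nested index ${url}: ${String(error)}`);
    return undefined;
  }
}

// Read-time variant: surface the failure to whoever requested the resource.
async function fetchResourceText(url: string, fetchImpl: FetchLike): Promise<string> {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Failed to fetch resource ${url}: ${response.status} ${response.statusText}`);
  }
  return response.text();
}
```

Injecting `fetchImpl` keeps the sketch testable without network access; in a runtime that provides a global `fetch`, that function could be passed in directly.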

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| packages/mcp-servers/docs-mcp-server/main.ts | Adds recursive crawling/registration of nested llms.txt resources and improves fetch error handling. |
| .changeset/fix-recursive-docs-mcp.md | Patch changeset entry for the docs MCP server. |

Comment on lines +99 to +113
if (strippedUrl.endsWith('llms.txt')) {
  if (visited.has(strippedUrl)) {
    continue;
  }

  debug(`Recursively fetching index: ${link.url}`);
  try {
    const response = await fetch(link.url);
    if (!response.ok) {
      debug(`Failed to fetch nested index ${link.url}: ${response.status} ${response.statusText}`);
      continue;
    }
    const nestedMarkdown = await response.text();
    visited.add(strippedUrl);

Copilot AI Mar 6, 2026

In the recursive llms.txt branch, visited is only updated after a successful fetch+read. If the same nested index appears in multiple index files and consistently fails (404/network), the server will re-fetch it for every occurrence, potentially causing long startup times and noisy logs. Consider adding the strippedUrl to visited before attempting the fetch (or tracking a separate failed/inProgress set) so each nested index is attempted at most once per startup run.
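
The suggestion above, marking a URL as visited before the fetch so that a failing index is attempted only once, can be sketched like this. The `fetchIndexOnce` name and the `loadText` callback are hypothetical stand-ins for the fetch and status check in `main.ts`.

```typescript
// Hypothetical loader standing in for fetch + status check; undefined means failure.
type LoadText = (url: string) => Promise<string | undefined>;

async function fetchIndexOnce(
  url: string,
  visited: Set<string>,
  loadText: LoadText,
): Promise<string | undefined> {
  if (visited.has(url)) return undefined; // already attempted, successfully or not
  visited.add(url); // mark before the fetch so a failure is not retried
  try {
    return await loadText(url);
  } catch {
    return undefined; // network error: real code would log and continue
  }
}
```

With this ordering, an index that returns 404 and is linked from several parents costs one request per startup instead of one per occurrence.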

async () => {
  const response = await fetch(link.url);
  if (!response.ok) {
     throw new Error(`Failed to fetch resource ${link.url}: ${response.status} ${response.statusText}`);
Copilot AI Mar 6, 2026

There’s an extra leading space before throw new Error(...) which makes the indentation inconsistent with the surrounding block and may fail formatting checks (e.g., Prettier). Align this line’s indentation with the rest of the block.

Suggested change
-     throw new Error(`Failed to fetch resource ${link.url}: ${response.status} ${response.statusText}`);
+    throw new Error(`Failed to fetch resource ${link.url}: ${response.status} ${response.statusText}`);
@codecov

codecov bot commented Mar 6, 2026

❌ 1 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 4729 | 1 | 4728 | 123 |
View the top 3 failed test(s) by shortest run time
tests/reactlynx.spec.ts::reactlynx3 tests › apis › api-dispose
Stack Traces | 8.74s run time
reactlynx.spec.ts:1368:5 api-dispose
tests/reactlynx.spec.ts::reactlynx3 tests › elements › scroll-view › basic-element-scroll-view-event-scrollend
Stack Traces | 17.2s run time
reactlynx.spec.ts:2781:7 basic-element-scroll-view-event-scrollend
tests/reactlynx.spec.ts::reactlynx3 tests › elements › x-viewpager-ng › basic-element-x-viewpager-ng-exposure
Stack Traces | 19.5s run time
reactlynx.spec.ts:3001:7 basic-element-x-viewpager-ng-exposure


@relativeci

relativeci bot commented Mar 6, 2026

Web Explorer

#8807 Bundle Size — 748.66KiB (0%).

6877936(current) vs cffd86f main#8805(baseline)

Bundle metrics (no changes)

| Metric | Current (#8807) | Baseline (#8805) | Change |
|---|---|---|---|
| Initial JS | 44.27KiB | 44.27KiB | No change |
| Initial CSS | 2.16KiB | 2.16KiB | No change |
| Cache Invalidation | 0% | 0% | No change |
| Chunks | 8 | 8 | No change |
| Assets | 10 | 10 | No change |
| Modules | 149 | 149 | No change |
| Duplicate Modules | 11 | 11 | No change |
| Duplicate Code | 35.01% | 35.01% | No change |
| Packages | 3 | 3 | No change |
| Duplicate Packages | 0 | 0 | No change |

Bundle size by type (no changes)

| Type | Current (#8807) | Baseline (#8805) | Change |
|---|---|---|---|
| Other | 401.63KiB | 401.63KiB | No change |
| JS | 344.87KiB | 344.87KiB | No change |
| CSS | 2.16KiB | 2.16KiB | No change |


@hzy hzy force-pushed the fix/docs-mcp-recursion branch from 25e7b16 to 80a7953 Compare March 9, 2026 10:01
@hzy hzy force-pushed the fix/docs-mcp-recursion branch from 80a7953 to 7b4ac27 Compare March 9, 2026 10:01
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/mcp-servers/docs-mcp-server/main.ts`:
- Around line 49-54: The crawler in crawlAndRegisterResources and related blocks
resolves nested link.url against the original root baseURL instead of the
current llms.txt location, so relative links like ./foo.md or ../bar/llms.txt
break; fix by resolving each link against the current resource's URL before
using or recursing: compute a resolved URL using new URL(link.url,
currentResourceBase) where currentResourceBase is the URL of the llms.txt (or
the full URL you just fetched/parsed) rather than the passed-in root baseURL,
use that resolved URL for fetching/registering and pass its origin/path (or the
resolved URL) as the base for recursive calls (update occurrences in
crawlAndRegisterResources and the other blocks mentioned: 99-138, 157-171,
222-228).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2b2081ec-7d5c-4d04-8e48-15dd4bbf79a3

📥 Commits

Reviewing files that changed from the base of the PR and between 25e7b16 and 7b4ac27.

📒 Files selected for processing (2)
  • .changeset/fix-recursive-docs-mcp.md
  • packages/mcp-servers/docs-mcp-server/main.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • .changeset/fix-recursive-docs-mcp.md

Comment on lines +49 to 54
async function crawlAndRegisterResources(
  baseURL: string,
  mcpServer: McpServer,
  fromMarkdownText: string,
  visited: Set<string> = new Set(),
) {
Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

BASE_URL="${BASE_URL:-https://lynxjs.org/next}"
ROOT_URL="${BASE_URL%/}/llms.txt"

python - <<'PY' "$ROOT_URL"
import re
import sys
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

root_url = sys.argv[1]

def read(url: str) -> str:
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")

def markdown_links(md: str):
    return re.findall(r'\[[^\]]+\]\(([^)]+)\)', md)

root_md = read(root_url)
nested_indexes = []

for href in markdown_links(root_md):
    absolute = urljoin(root_url, href)
    if urlparse(absolute).path.endswith("llms.txt"):
        nested_indexes.append(absolute)

print(f"Root index: {root_url}")
print(f"Nested indexes found: {len(nested_indexes)}")

for nested_url in nested_indexes:
    nested_md = read(nested_url)
    relative_links = []
    for href in markdown_links(nested_md):
        parsed = urlparse(href)
        if not parsed.scheme and not href.startswith("//"):
            relative_links.append(href)

    print(f"\n{nested_url}")
    print(f"  relative links: {len(relative_links)}")
    for href in relative_links[:10]:
        print(f"    - {href}")
PY

Repository: lynx-family/lynx-stack

Length of output: 2682


🏁 Script executed:

head -c 0 /dev/null  # test
wc -l packages/mcp-servers/docs-mcp-server/main.ts

Repository: lynx-family/lynx-stack

Length of output: 114


🏁 Script executed:

sed -n '49,54p' packages/mcp-servers/docs-mcp-server/main.ts
sed -n '99,138p' packages/mcp-servers/docs-mcp-server/main.ts
sed -n '157,171p' packages/mcp-servers/docs-mcp-server/main.ts
sed -n '222,228p' packages/mcp-servers/docs-mcp-server/main.ts

Repository: lynx-family/lynx-stack

Length of output: 1892


Resolve nested links against the current llms.txt, not the root URL.

After the first recursion level, the crawler still consumes link.url as-is and keeps passing the root baseURL forward. If a nested index contains relative links (./foo.md, ../bar/llms.txt), those links are never resolved against the nested index's own location, so the resources they point to still won't be registered or readable.

💡 Proposed fix
 async function crawlAndRegisterResources(
   baseURL: string,
+  currentIndexUrl: string,
   mcpServer: McpServer,
   fromMarkdownText: string,
   visited: Set<string> = new Set(),
 ) {
@@
-      const u = new URL(link.url);
+      const u = new URL(link.url, currentIndexUrl);
@@
-        const response = await fetch(link.url);
+        const nestedIndexUrl = new URL(link.url, currentIndexUrl);
+        const response = await fetch(nestedIndexUrl);
@@
         await crawlAndRegisterResources(
           baseURL,
+          nestedIndexUrl.href,
           mcpServer,
           nestedMarkdown,
           visited,
         );
@@
-        const response = await fetch(link.url);
+        const resourceUrl = new URL(link.url, currentIndexUrl);
+        const response = await fetch(resourceUrl);
         if (!response.ok) {
-           throw new Error(`Failed to fetch resource ${link.url}: ${response.status} ${response.statusText}`);
+          throw new Error(`Failed to fetch resource ${resourceUrl}: ${response.status} ${response.statusText}`);
         }
@@
   await crawlAndRegisterResources(
     baseUrl,
+    ROOT_DOC_URL,
     mcpServer,
     ROOT_DOC_MARKDOWN,
     visited,
   );

Also applies to: 99-138, 157-171, 222-228
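
The effect of the second argument to `new URL` can be illustrated in isolation. In the sketch below the `resolveLink` helper and the `example.com` URLs are illustrative only and are not part of `main.ts`.

```typescript
// Resolve a link found inside an index against that index's own URL.
function resolveLink(href: string, currentIndexUrl: string): string {
  return new URL(href, currentIndexUrl).href;
}

const rootIndex = 'https://example.com/next/llms.txt'; // illustrative URLs
const nestedIndex = 'https://example.com/next/api/llms.txt';

// Resolved against the nested index, relative links stay under /next/api/.
const sibling = resolveLink('./foo.md', nestedIndex);
// → https://example.com/next/api/foo.md
const parentIndex = resolveLink('../bar/llms.txt', nestedIndex);
// → https://example.com/next/bar/llms.txt

// Resolved against the root index instead, the same link lands elsewhere.
const wrongBase = resolveLink('./foo.md', rootIndex);
// → https://example.com/next/foo.md

// Absolute links are unaffected by the base.
const absolute = resolveLink('https://example.com/other.md', nestedIndex);
// → https://example.com/other.md
```

Passing the resolved nested index URL down each recursive call keeps the base correct at every level.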

@codspeed-hq

codspeed-hq bot commented Mar 9, 2026

Merging this PR will degrade performance by 14.57%

❌ 1 regressed benchmark
✅ 80 untouched benchmarks
⏩ 21 skipped benchmarks (see footnote 1)

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|
| transform 1000 view elements | 40.4 ms | 47.3 ms | -14.57% |

Comparing fix/docs-mcp-recursion (6877936) with main (fd0cc6e) (see footnote 2)

Footnotes

  1. 21 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

  2. No successful run was found on main (cffd86f) during the generation of this report, so fd0cc6e was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@relativeci

relativeci bot commented Apr 13, 2026

React MTF Example

#366 Bundle Size — 206.12KiB (0%).

6877936(current) vs cffd86f main#364(baseline)

Bundle metrics (no changes)

| Metric | Current (#366) | Baseline (#364) | Change |
|---|---|---|---|
| Initial JS | 0B | 0B | No change |
| Initial CSS | 0B | 0B | No change |
| Cache Invalidation | 0% | 0% | No change |
| Chunks | 0 | 0 | No change |
| Assets | 3 | 3 | No change |
| Modules | 173 | 173 | No change |
| Duplicate Modules | 67 | 67 | No change |
| Duplicate Code | 45.79% | 45.79% | No change |
| Packages | 2 | 2 | No change |
| Duplicate Packages | 0 | 0 | No change |

Bundle size by type (no changes)

| Type | Current (#366) | Baseline (#364) | Change |
|---|---|---|---|
| IMG | 111.23KiB | 111.23KiB | No change |
| Other | 94.89KiB | 94.89KiB | No change |


@relativeci

relativeci bot commented Apr 13, 2026

React External

#351 Bundle Size — 591.76KiB (0%).

6877936(current) vs cffd86f main#349(baseline)

Bundle metrics (no changes)

| Metric | Current (#351) | Baseline (#349) | Change |
|---|---|---|---|
| Initial JS | 0B | 0B | No change |
| Initial CSS | 0B | 0B | No change |
| Cache Invalidation | 0% | 30.88% | Change |
| Chunks | 0 | 0 | No change |
| Assets | 3 | 3 | No change |
| Modules | 17 | 17 | No change |
| Duplicate Modules | 5 | 5 | No change |
| Duplicate Code | 8.59% | 8.59% | No change |
| Packages | 0 | 0 | No change |
| Duplicate Packages | 0 | 0 | No change |

Bundle size by type (no changes)

| Type | Current (#351) | Baseline (#349) | Change |
|---|---|---|---|
| Other | 591.76KiB | 591.76KiB | No change |


@relativeci

relativeci bot commented Apr 13, 2026

React Example

#7233 Bundle Size — 236.83KiB (0%).

6877936(current) vs cffd86f main#7231(baseline)

Bundle metrics (no changes)

| Metric | Current (#7233) | Baseline (#7231) | Change |
|---|---|---|---|
| Initial JS | 0B | 0B | No change |
| Initial CSS | 0B | 0B | No change |
| Cache Invalidation | 0% | 0% | No change |
| Chunks | 0 | 0 | No change |
| Assets | 4 | 4 | No change |
| Modules | 179 | 179 | No change |
| Duplicate Modules | 70 | 70 | No change |
| Duplicate Code | 46.13% | 46.13% | No change |
| Packages | 2 | 2 | No change |
| Duplicate Packages | 0 | 0 | No change |

Bundle size by type (no changes)

| Type | Current (#7233) | Baseline (#7231) | Change |
|---|---|---|---|
| IMG | 145.76KiB | 145.76KiB | No change |
| Other | 91.07KiB | 91.07KiB | No change |

