Update Jan 31 #150

Merged
Deodat-Lawson merged 20 commits into stable from main on Jan 31, 2026

Conversation

@Deodat-Lawson
Owner

No description provided.

…ent function. Added new interfaces for PDF text items and content, updated pagerender type, and implemented chunk insertion and OCR metadata updates in the database.
- Lazy load determineDocumentRouting in processDocument.ts
- This prevents HuggingFace Transformers and onnxruntime-node from being traced
- Add serverExternalPackages for heavy dependencies
- Add outputFileTracingExcludes for Inngest route
- Add vercel.json with function configuration

Reduces bundle from 453MB to under 250MB limit
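
The lazy-loading change in the first bullet can be sketched as follows. This is a minimal illustration, not the code from the PR: `lazy` is a hypothetical helper, and the loader returns a stub in place of the real dynamic `import()` of the module that exports `determineDocumentRouting` (that module path is not shown here). The heavy module is only requested when the getter is first called, and the same promise is reused afterwards.

```typescript
// Hypothetical helper: defers an expensive async load until first use
// and reuses the same promise on later calls.
function lazy<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = load();
    }
    return cached;
  };
}

// Stand-in for a dynamic import of the routing module (assumption: the
// real loader would be `() => import(...)` with the project's own path).
let loadCount = 0;
const getRouting = lazy(async () => {
  loadCount += 1;
  return {
    // Stub with an illustrative signature, not the project's function.
    determineDocumentRouting: (mimeType: string): string =>
      mimeType === "application/pdf" ? "pdf" : "other",
  };
});
```

At the call site this would read roughly `const { determineDocumentRouting } = await getRouting();`, so nothing is loaded at module import time.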
- Set TRANSFORMERS_BACKEND=wasm environment variable
- Configure USE_ONNX_NODE=false to prevent Node.js runtime
- This ensures onnxruntime-node (404MB) is never required
- Lazy loading + WASM backend = minimal bundle size
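
For reference, `serverExternalPackages` and `outputFileTracingExcludes`, named in the commit messages above, are Next.js config options. A sketch of how they might be combined is below. The package names, the `/api/inngest` route key, and the glob are assumptions, since the actual config from this PR is not shown. In older Next.js versions `outputFileTracingExcludes` lives under `experimental`.

```typescript
// next.config.ts (sketch; every value below is an assumption)
const nextConfig = {
  // Opt these packages out of server bundling so they are loaded
  // with a native Node.js require instead.
  serverExternalPackages: ["onnxruntime-node", "@huggingface/transformers"],
  // Exclude matching files from the traced output of a single route.
  outputFileTracingExcludes: {
    "/api/inngest": ["./node_modules/onnxruntime-node/**/*"],
  },
};

export default nextConfig;
```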
…esign

Knowledge Base Redesign - RLM-Ready Hierarchical Document Storage
@vercel

vercel Bot commented Jan 31, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| pdr-ai-v2 | Ready | Preview, Comment | Jan 31, 2026 7:22am |

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7678ca57ad

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +80 to +84
const { jobId, eventIds } = await triggerDocumentProcessing(
  documentUrl,
  documentName,
  companyId,
  userId,

P1: Insert OCR job before dispatching Inngest event

The event is dispatched here before the ocr_jobs row exists. If the Inngest worker starts quickly, the completion update in processDocument (step‑e) will run against a missing row and no status change will be persisted, leaving the job stuck as queued when clients poll /api/uploadDocument?jobId. Insert the job record before sending the event or make the completion update an upsert/create-on-miss.

Useful? React with 👍 / 👎.
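
Both remedies suggested in this comment can be sketched together. The example is a minimal in-memory model, not the project's code: the `ocrJobs` map stands in for the `ocr_jobs` table, `dispatch` stands in for `triggerDocumentProcessing`, and it assumes the caller can choose the job id before dispatching (in the PR the id is returned by `triggerDocumentProcessing`).

```typescript
type JobStatus = "queued" | "completed";

interface OcrJob {
  id: string;
  status: JobStatus;
}

// Hypothetical in-memory stand-in for the ocr_jobs table.
const ocrJobs = new Map<string, OcrJob>();

// Remedy 1: write the queued row first, then dispatch the event.
function enqueueDocument(jobId: string, dispatch: (jobId: string) => void): void {
  ocrJobs.set(jobId, { id: jobId, status: "queued" });
  dispatch(jobId);
}

// Remedy 2: make the completion write an upsert, so it still
// persists a status when the row is missing.
function completeJob(jobId: string): void {
  ocrJobs.set(jobId, { id: jobId, status: "completed" });
}

// A worker that finishes immediately, modelling the race described above.
enqueueDocument("job-1", (jobId) => completeJob(jobId));
```

With a real database the same ordering applies: await the insert before sending the event, and keep the upsert as a safety net.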

Comment on lines +156 to +159
await step.run("step-e-storage", async () => {
  if (vectorizedChunks.length === 0) {
    console.log("[Step E] No chunks to store");
    return;

P2: Record failure/complete status when no chunks are produced

When vectorizedChunks is empty, the function returns early and skips updating document and ocr_jobs. Any document that yields no chunks (e.g., OCR returns empty text, parsing errors, or a truly blank PDF) will never transition out of queued, which makes status polling hang indefinitely. Consider marking the job failed or completed-with-zero-chunks and updating the document metadata even in the empty case.

Useful? React with 👍 / 👎.
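
One way to avoid the early return is to derive the final status from the chunk count and always write it. A minimal sketch, in which the `completed_empty` status name and the `JobUpdate` shape are assumptions rather than the project's schema:

```typescript
type FinalStatus = "completed" | "completed_empty";

interface JobUpdate {
  jobId: string;
  status: FinalStatus;
  chunkCount: number;
}

// Always produce a terminal update, including when no chunks were generated.
function buildJobUpdate(jobId: string, chunks: readonly unknown[]): JobUpdate {
  return {
    jobId,
    status: chunks.length === 0 ? "completed_empty" : "completed",
    chunkCount: chunks.length,
  };
}
```

Inside the storage step, the chunk insert would be skipped for an empty list, but the update from `buildJobUpdate` would still be persisted so that polling sees a terminal state.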

Comment on lines +314 to +318
pages.push({
  pageNumber: 1,
  textBlocks: [data.text],
  tables: []
});

P2: Avoid tagging all native-PDF content as page 1

Native PDF parsing collapses the entire document into a single page and hardcodes pageNumber: 1. For multi-page PDFs this makes every chunk appear to come from page 1, so recommendedPages/citations become incorrect even though the document has multiple pages. If per-page splitting isn’t available, consider setting pageNumber to null or approximating page ranges rather than marking everything as page 1.

Useful? React with 👍 / 👎.
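
The `pageNumber: null` fallback suggested here could look like the sketch below. The `PageContent` interface is inferred from the fields in the snippet above, and the `perPageText` parameter is hypothetical: it represents per-page text if the parser can supply it.

```typescript
interface PageContent {
  pageNumber: number | null; // null means "page unknown", not page 1
  textBlocks: string[];
  tables: unknown[];
}

// Use real page numbers when per-page text is available; otherwise
// keep the whole document as one block with an unknown page.
function toPages(fullText: string, perPageText?: string[]): PageContent[] {
  if (perPageText !== undefined && perPageText.length > 0) {
    return perPageText.map((text, index) => ({
      pageNumber: index + 1,
      textBlocks: [text],
      tables: [],
    }));
  }
  return [{ pageNumber: null, textBlocks: [fullText], tables: [] }];
}
```

Downstream code that builds recommendedPages or citations would then need to handle a `null` page explicitly instead of reporting page 1.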

@Deodat-Lawson Deodat-Lawson merged commit 3b1fd6a into stable Jan 31, 2026
5 checks passed