Update Jan 31 by Deodat-Lawson · Pull Request #150 · Deodat-Lawson/LaunchStack

Deodat-Lawson · 2026-01-31T07:03:36Z

No description provided.

- Lazy load determineDocumentRouting in processDocument.ts
- This prevents HuggingFace Transformers and onnxruntime-node from being traced
- Add serverExternalPackages for heavy dependencies
- Add outputFileTracingExcludes for Inngest route
- Add vercel.json with function configuration

Reduces bundle from 453MB to under the 250MB limit.
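The lazy-loading idea in this commit message can be sketched roughly as follows. This is an illustration under assumptions, not the code from the PR: the `lazy` helper, the `RoutingModule` shape, the `determineDocumentRouting` signature, and the `./documentRouting` path mentioned in the comment are all hypothetical. In a real build the factory would be a dynamic `import()`, so the heavy dependency is only resolved on first use.

```typescript
// Memoizing lazy loader: the factory runs on the first call only, and every
// later call reuses the same promise.
function lazy<T>(factory: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = factory();
    }
    return cached;
  };
}

// Hypothetical shape of the routing module; the real signature may differ.
type RoutingModule = {
  determineDocumentRouting: (mimeType: string) => "native" | "ocr";
};

let loadCount = 0;

// Stand-in for `() => import("./documentRouting")` (hypothetical path).
// A stub is used here so the sketch runs without the heavy dependency.
const loadRouting = lazy<RoutingModule>(async () => {
  loadCount += 1;
  const stub: RoutingModule = {
    determineDocumentRouting: (mimeType: string) =>
      mimeType === "application/pdf" ? "native" : "ocr",
  };
  return stub;
});

// The module is requested inside the function body, not at file top level,
// so importing this file does not pull in the heavy dependency.
async function processDocument(mimeType: string): Promise<"native" | "ocr"> {
  const { determineDocumentRouting } = await loadRouting();
  return determineDocumentRouting(mimeType);
}
```

Calling `processDocument` several times triggers the factory once; until the first call, nothing is loaded.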

vercel · 2026-01-31T07:03:40Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
pdr-ai-v2	Ready	Preview, Comment	Jan 31, 2026 7:22am

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7678ca57ad

chatgpt-codex-connector · 2026-01-31T07:09:57Z

+      const { jobId, eventIds } = await triggerDocumentProcessing(
+        documentUrl,
+        documentName,
+        companyId,
+        userId,


Insert OCR job before dispatching Inngest event

The event is dispatched here before the ocr_jobs row exists. If the Inngest worker starts quickly, the completion update in processDocument (step‑e) will run against a missing row and no status change will be persisted, leaving the job stuck as queued when clients poll /api/uploadDocument?jobId. Insert the job record before sending the event or make the completion update an upsert/create-on-miss.
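The ordering issue described in this comment can be illustrated with a small in-memory sketch. Everything here is an assumption made for illustration: the `ocrJobs` map stands in for the ocr_jobs table, `pendingEvents` for the Inngest event queue, and the function names are hypothetical. Real database and event calls would be asynchronous and awaited in the same order.

```typescript
type JobStatus = "queued" | "completed" | "failed";

interface OcrJob {
  id: string;
  status: JobStatus;
}

// In-memory stand-ins for the ocr_jobs table and the event queue.
const ocrJobs = new Map<string, OcrJob>();
const pendingEvents: string[] = [];

// Plain update: like an UPDATE that matches zero rows, it changes nothing
// when the job row does not exist yet.
function completeJob(jobId: string): boolean {
  const job = ocrJobs.get(jobId);
  if (job === undefined) {
    return false; // status change is silently lost
  }
  job.status = "completed";
  return true;
}

// Option 1: persist the job row first, then dispatch the event.
function enqueueDocument(jobId: string): void {
  ocrJobs.set(jobId, { id: jobId, status: "queued" });
  pendingEvents.push(jobId);
}

// Option 2: make the completion write create the row when it is missing.
function completeJobUpsert(jobId: string): void {
  ocrJobs.set(jobId, { id: jobId, status: "completed" });
}
```

Either option avoids a job that stays queued forever; inserting first also lets clients poll for the job immediately after upload.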

chatgpt-codex-connector · 2026-01-31T07:09:57Z

+    await step.run("step-e-storage", async () => {
+      if (vectorizedChunks.length === 0) {
+        console.log("[Step E] No chunks to store");
+        return;


Record failure/complete status when no chunks are produced

When vectorizedChunks is empty, the function returns early and skips updating document and ocr_jobs. Any document that yields no chunks (e.g., OCR returns empty text, parsing errors, or a truly blank PDF) will never transition out of queued, which makes status polling hang indefinitely. Consider marking the job failed or completed-with-zero-chunks and updating the document metadata even in the empty case.
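One way to handle the empty case is to always write a terminal status, whether or not chunks were produced. The sketch below assumes an in-memory `jobs` map in place of the real document and ocr_jobs updates; `storeChunks`, `JobRecord`, and the `failOnEmpty` flag are hypothetical names, and whether an empty result counts as failed or completed is left as a choice.

```typescript
type JobStatus = "queued" | "completed" | "failed";

interface JobRecord {
  status: JobStatus;
  chunkCount: number;
  message?: string;
}

// In-memory stand-in for the job status table.
const jobs = new Map<string, JobRecord>();

// Always records a terminal status, including when there is nothing to store.
function storeChunks(
  jobId: string,
  chunks: readonly string[],
  failOnEmpty = false,
): JobRecord {
  let record: JobRecord;
  if (chunks.length === 0) {
    record = failOnEmpty
      ? { status: "failed", chunkCount: 0, message: "no chunks produced" }
      : { status: "completed", chunkCount: 0, message: "completed with zero chunks" };
  } else {
    // The real chunk insertion would happen here before the status write.
    record = { status: "completed", chunkCount: chunks.length };
  }
  jobs.set(jobId, record);
  return record;
}
```

With this shape, a poller sees `completed` or `failed` for a blank PDF instead of waiting on `queued`.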

chatgpt-codex-connector · 2026-01-31T07:09:57Z

+    pages.push({
+      pageNumber: 1,
+      textBlocks: [data.text],
+      tables: []
+    });


Avoid tagging all native-PDF content as page 1

Native PDF parsing collapses the entire document into a single page and hardcodes pageNumber: 1. For multi-page PDFs this makes every chunk appear to come from page 1, so recommendedPages/citations become incorrect even though the document has multiple pages. If per-page splitting isn’t available, consider setting pageNumber to null or approximating page ranges rather than marking everything as page 1.
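A minimal sketch of the suggested fallback, assuming a `ParsedPage` shape that mirrors the snippet above but allows a null page number. The `toPages` helper and its optional `perPageText` argument are hypothetical; whether the parser in use can supply per-page text is not stated in the PR.

```typescript
interface ParsedPage {
  pageNumber: number | null; // null means "page unknown"
  textBlocks: string[];
  tables: unknown[];
}

// Uses real page numbers when per-page text is available; otherwise returns
// one page with pageNumber null rather than claiming everything is page 1.
function toPages(fullText: string, perPageText?: readonly string[]): ParsedPage[] {
  if (perPageText !== undefined && perPageText.length > 0) {
    return perPageText.map((text, index) => ({
      pageNumber: index + 1,
      textBlocks: [text],
      tables: [],
    }));
  }
  return [{ pageNumber: null, textBlocks: [fullText], tables: [] }];
}
```

Downstream citation code can then treat a null `pageNumber` as "no page reference" instead of reporting page 1.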

Deodat-Lawson added 17 commits January 22, 2026 20:04

saving work

d34206c

Enhance PDF parsing types and implement chunk storage in processDocum…

3552511

…ent function. Added new interfaces for PDF text items and content, updated pagerender type, and implemented chunk insertion and OCR metadata updates in the database.

implement new upload document pipeline

8da88d4

optimize bundle size

9a7b74e

Force HuggingFace Transformers to use WASM backend

7099fc3

- Set TRANSFORMERS_BACKEND=wasm environment variable
- Configure USE_ONNX_NODE=false to prevent Node.js runtime
- This ensures onnxruntime-node (404MB) is never required
- Lazy loading + WASM backend = minimal bundle size

replaced onnxruntime node with web

81c32fd

removed vercel json

79b0e29

Merge pull request #147 from Deodat-Lawson/feature/improve-ocr-workflow

d41c180

Feature/improve ocr workflow

saving work

6a3f1e7

updated document layering and readme

9896d2c

saving work

732883f

fixed lint error

2fb011c

Merge branch 'main' into feature/knowledge-base-redesign

e74f01c

updated package lock json

7183588

Merge branch 'feature/knowledge-base-redesign' of https://github.com/…

2e4b623

…Deodat-Lawson/pdr_ai_v2 into feature/knowledge-base-redesign

Merge pull request #148 from Deodat-Lawson/feature/knowledge-base-red…

7678ca5

…esign Knowledge Base Redesign - RLM-Ready Hierarchical Document Storage

chatgpt-codex-connector Bot reviewed Jan 31, 2026


Deodat-Lawson added 2 commits January 31, 2026 02:17

fixes tests

dddf6c4

Merge branch 'main' of https://github.com/Deodat-Lawson/pdr_ai_v2

4e30878

vercel Bot deployed to Preview January 31, 2026 07:19 View deployment

fixes tests

bd91c99

vercel Bot deployed to Preview January 31, 2026 07:22 View deployment

Deodat-Lawson merged commit 3b1fd6a into stable Jan 31, 2026
5 checks passed

Deodat-Lawson merged 20 commits into stable from main
