Design Document: AI Trend Search Engine #198

@kien-ship-it

Overview

The AI Trend Search Engine is a new module within the Marketing Engine service that accepts a natural language prompt and company context, searches the web for recent news and events, and returns 5 structured results with citations. It runs as an Inngest background job and is designed as a self-contained module that can later be invoked by AI agents.

The pipeline follows four stages:

Input Validation → Query Planning (LLM) → Web Search (Tavily) → Content Synthesis (LLM) → Return

Persistence is handled by the caller (Inngest job), not the core pipeline. This keeps the module stateless and reusable.

The module lives under src/server/trend-search/ as a standalone directory, with an Inngest function in src/server/inngest/functions/ and an API route in src/app/api/trend-search/.

Architecture

sequenceDiagram
    participant Client
    participant API as API Route<br/>/api/trend-search
    participant Inngest as Inngest Job
    participant QP as Query Planner<br/>(LLM)
    participant WS as Web Search<br/>(Tavily)
    participant CS as Content Synthesizer<br/>(LLM)
    participant DB as PostgreSQL

    Client->>API: POST { query, companyContext, categories? }
    API->>API: Validate input (Zod)
    API->>Inngest: Dispatch "trend-search/run.requested"
    API-->>Client: 202 { jobId, status: "queued" }

    Inngest->>QP: Step 1: Plan queries
    QP-->>Inngest: sub-queries[]

    loop For each sub-query
        Inngest->>WS: Step 2: Execute search
        WS-->>Inngest: raw results[]
    end

    Inngest->>CS: Step 3: Synthesize results
    CS-->>Inngest: SearchResult[5]

    Inngest->>DB: Step 4: Persist results
    DB-->>Inngest: saved

    Note over Client,DB: Client polls GET /api/trend-search/[jobId] for results

The module integrates with the existing architecture:

  • Services Layer: Lives within the Marketing Engine boundary
  • Tools Layer: Uses Web Search (Tavily) and LLM capabilities (OpenAI via LangChain)
  • Physical Layer: Persists to PostgreSQL, runs via Inngest

Components and Interfaces

1. Input Types (src/server/trend-search/types.ts)

import { z } from "zod";

export const SearchCategoryEnum = z.enum([
  "fashion",
  "finance",
  "business",
  "tech",
]);
export type SearchCategory = z.infer<typeof SearchCategoryEnum>;

export const TrendSearchInputSchema = z.object({
  query: z.string().min(1).max(1000),
  companyContext: z.string().min(1).max(2000),
  categories: z.array(SearchCategoryEnum).optional(),
});
export type TrendSearchInput = z.infer<typeof TrendSearchInputSchema>;

export interface SearchResult {
  sourceUrl: string;
  summary: string;
  description: string;
}

export interface TrendSearchOutput {
  results: SearchResult[];
  metadata: {
    query: string;
    companyContext: string;
    categories: SearchCategory[];
    createdAt: string;
  };
}

export type TrendSearchJobStatus =
  | "queued"
  | "planning"
  | "searching"
  | "synthesizing"
  | "completed"
  | "failed";

export interface TrendSearchJobRecord {
  id: string;
  companyId: bigint;
  userId: string;
  status: TrendSearchJobStatus;
  input: TrendSearchInput;
  output: TrendSearchOutput | null;
  errorMessage: string | null;
  createdAt: Date;
  completedAt: Date | null;
}

// Inngest event payload
export const TrendSearchEventDataSchema = z.object({
  jobId: z.string(),
  companyId: z.string(), // serialized as string for Inngest
  userId: z.string(),
  query: z.string(),
  companyContext: z.string(),
  categories: z.array(SearchCategoryEnum).optional(),
});
export type TrendSearchEventData = z.infer<typeof TrendSearchEventDataSchema>;
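
Because companyId is a bigint in the database but Inngest event payloads must be JSON-safe, the dispatcher serializes it to a string (per the comment in the schema above) and the Inngest function converts it back. A minimal sketch of that round-trip, with illustrative helper names (`toEventData`/`fromEventData` are not part of the module) and types trimmed to what the example needs:

```typescript
// Sketch: bigint-safe serialization of the Inngest event payload.
// Helper names and the trimmed local types are illustrative assumptions.
interface EventData {
  jobId: string;
  companyId: string; // bigint serialized as string for JSON transport
  userId: string;
  query: string;
  companyContext: string;
  categories?: string[];
}

function toEventData(
  jobId: string,
  companyId: bigint,
  userId: string,
  input: { query: string; companyContext: string; categories?: string[] }
): EventData {
  return {
    jobId,
    companyId: companyId.toString(), // bigint -> string at the API boundary
    userId,
    query: input.query,
    companyContext: input.companyContext,
    categories: input.categories,
  };
}

function fromEventData(data: EventData) {
  // string -> bigint on the consuming side (the Inngest function)
  return { ...data, companyId: BigInt(data.companyId) };
}
```

This keeps the event payload pure JSON while preserving the bigint type at both ends, which is also what Property 12 (input serialization round-trip) exercises.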

2. Query Planner (src/server/trend-search/query-planner.ts)

Responsible for taking the user's prompt and company context and generating optimized sub-queries for Tavily.

interface PlannedQuery {
  searchQuery: string;
  category: SearchCategory;
  rationale: string;
}

async function planQueries(
  query: string,
  companyContext: string,
  categories?: SearchCategory[]
): Promise<PlannedQuery[]>;

Implementation approach:

  • Uses OpenAI (via LangChain ChatOpenAI) with a structured output prompt
  • System prompt instructs the LLM to generate 3-5 sub-queries focused on recent news/events
  • If categories are not provided, the LLM infers them from the query and company context
  • Each sub-query includes category-specific terms and company-relevant keywords
  • Output is parsed via Zod schema for type safety
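
The last bullet (parsing LLM output for type safety) can be sketched as a defensive parser that rejects structurally invalid responses before they reach the search step. The real module would express these checks as a Zod schema; this dependency-free version writes them out by hand, so the function name and error wording are illustrative:

```typescript
// Sketch: validating raw (untrusted) LLM output into PlannedQuery[].
// In the module this is a Zod schema; here the checks are hand-written
// so the example is self-contained.
const VALID_CATEGORIES = ["fashion", "finance", "business", "tech"] as const;
type Category = (typeof VALID_CATEGORIES)[number];

interface PlannedQuery {
  searchQuery: string;
  category: Category;
  rationale: string;
}

function parsePlannedQueries(raw: unknown): PlannedQuery[] {
  if (!Array.isArray(raw) || raw.length === 0) {
    throw new Error("LLM output must be a non-empty array of planned queries");
  }
  return raw.map((item, i) => {
    const q = item as Partial<PlannedQuery>;
    if (typeof q.searchQuery !== "string" || q.searchQuery.trim() === "") {
      throw new Error(`planned query ${i}: missing or empty searchQuery`);
    }
    if (!VALID_CATEGORIES.includes(q.category as Category)) {
      throw new Error(`planned query ${i}: invalid category "${String(q.category)}"`);
    }
    return {
      searchQuery: q.searchQuery,
      category: q.category as Category,
      rationale: typeof q.rationale === "string" ? q.rationale : "",
    };
  });
}
```

Failing fast here is what lets the "LLM returns malformed output" row of the error-handling strategy trigger an Inngest step retry instead of propagating bad data downstream.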

3. Web Search Executor (src/server/trend-search/web-search.ts)

Executes sub-queries against Tavily and collects raw results.

interface RawSearchResult {
  url: string;
  title: string;
  content: string;
  score: number;
  publishedDate?: string;
}

async function executeSearch(
  subQueries: PlannedQuery[]
): Promise<RawSearchResult[]>;

Implementation approach:

  • Uses @langchain/community TavilySearchResults tool or direct Tavily API
  • Configures Tavily with search_depth: "advanced" and topic: "news" to focus on news/events
  • Executes sub-queries sequentially (Inngest step per query for retry isolation)
  • Retries each sub-query up to 2 times on failure
  • Deduplicates results by URL across sub-queries
  • Returns combined raw results for synthesis
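
The URL-dedup bullet can be sketched as a pure helper. Keeping the highest-scoring entry when the same URL surfaces under multiple sub-queries is an assumption of this sketch; the design only requires deduplication:

```typescript
interface RawSearchResult {
  url: string;
  title: string;
  content: string;
  score: number;
  publishedDate?: string;
}

// Sketch: deduplicate combined results by URL across sub-queries,
// keeping the highest Tavily relevance score for each URL.
function dedupeByUrl(results: RawSearchResult[]): RawSearchResult[] {
  const byUrl = new Map<string, RawSearchResult>();
  for (const r of results) {
    const existing = byUrl.get(r.url);
    if (!existing || r.score > existing.score) {
      byUrl.set(r.url, r);
    }
  }
  return [...byUrl.values()];
}
```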

4. Content Synthesizer (src/server/trend-search/synthesizer.ts)

Takes raw search results and produces exactly 5 structured results.

async function synthesizeResults(
  rawResults: RawSearchResult[],
  query: string,
  companyContext: string,
  categories: SearchCategory[]
): Promise<SearchResult[]>;

Implementation approach:

  • Uses OpenAI with a structured output prompt
  • System prompt instructs the LLM to:
    • Select the 5 most relevant results to the company context
    • Rank by relevance (most relevant first)
    • Generate a concise summary (1-2 sentences) and detailed description for each
    • Preserve the original source URL
  • If fewer than 5 distinct results are available, pads with placeholder entries (sourceUrl: "", summary: "Insufficient results")
  • Output validated via Zod schema
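
The exactly-5 guarantee with placeholder padding can be sketched as a pure post-processing step applied after the LLM's selection. The placeholder summary matches the design; the function name and the placeholder description text are illustrative:

```typescript
interface SearchResult {
  sourceUrl: string;
  summary: string;
  description: string;
}

// Sketch: guarantee exactly 5 results, padding with placeholders when
// fewer than 5 distinct candidates survived search and selection.
function padToFive(selected: SearchResult[]): SearchResult[] {
  const top = selected.slice(0, 5); // never return more than 5
  while (top.length < 5) {
    top.push({
      sourceUrl: "",
      summary: "Insufficient results",
      description:
        "Fewer than 5 distinct results were found for this search.", // illustrative wording
    });
  }
  return top;
}
```

Running this after Zod validation of the LLM output is what makes Property 7 (exactly 5 results) hold even in the degenerate all-sub-queries-empty case from the error-handling table.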

5. Inngest Function (src/server/inngest/functions/trendSearch.ts)

Orchestrates the pipeline as a multi-step Inngest function.

export const trendSearchJob = inngest.createFunction(
  {
    id: "trend-search-run",
    name: "AI Trend Search Pipeline",
    retries: 3,
  },
  { event: "trend-search/run.requested" },
  async ({ event, step }) => {
    // Step 1: Plan queries
    // Step 2: Execute web searches
    // Step 3: Synthesize results
    // Step 4: Persist to DB
  }
);

6. API Routes

POST /api/trend-search — Initiate a search

  • Validates input with Zod
  • Creates a job record in DB with status "queued"
  • Dispatches Inngest event
  • Returns 202 { jobId, status: "queued" }

GET /api/trend-search/[jobId] — Poll for results

  • Looks up job by ID, scoped to company_id
  • Returns current status and results if completed

GET /api/trend-search — List past searches

  • Returns paginated list of past searches for the company

7. Module Entry Point (src/server/trend-search/index.ts)

Exposes the public interface for programmatic invocation (agent-callable in the future).

export async function runTrendSearch(
  input: TrendSearchInput
): Promise<TrendSearchOutput>;

This function runs the pipeline directly (without Inngest, without DB) for synchronous invocation by agents or any caller. It accepts only the search input and returns the result — no side effects. Persistence is the responsibility of the caller.

The Inngest function is the only caller that persists results to the DB. This keeps runTrendSearch() stateless and reusable across contexts (agents, tests, scripts) without requiring DB access.
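
One way to keep runTrendSearch() stateless and testable is to wire the three stages together through injected dependencies. The `deps` parameter in this sketch is an assumption for illustration (the design only fixes the single-argument signature), and the types are trimmed to what the composition needs:

```typescript
// Sketch: composing the pipeline with injected stage functions so the
// entry point stays side-effect free. The `deps` parameter is an
// illustrative assumption, not part of the documented signature.
interface Deps {
  planQueries: (
    query: string,
    companyContext: string,
    categories?: string[]
  ) => Promise<{ searchQuery: string; category: string }[]>;
  executeSearch: (
    queries: { searchQuery: string; category: string }[]
  ) => Promise<{ url: string; title: string; content: string; score: number }[]>;
  synthesizeResults: (
    raw: { url: string }[],
    query: string,
    companyContext: string,
    categories: string[]
  ) => Promise<{ sourceUrl: string; summary: string; description: string }[]>;
}

async function runTrendSearch(
  input: { query: string; companyContext: string; categories?: string[] },
  deps: Deps
) {
  const planned = await deps.planQueries(input.query, input.companyContext, input.categories);
  const raw = await deps.executeSearch(planned);
  // If no categories were specified, fall back to those the planner inferred.
  const categories = input.categories ?? [...new Set(planned.map((p) => p.category))];
  const results = await deps.synthesizeResults(raw, input.query, input.companyContext, categories);
  return {
    results,
    metadata: {
      query: input.query,
      companyContext: input.companyContext,
      categories,
      createdAt: new Date().toISOString(),
    },
  };
}
```

With this shape, the Inngest function can call the same stages as individual steps (for retry isolation), while agents, tests, and scripts call the composed function with real or stubbed dependencies.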

Data Models

New Table: trend_search_jobs

CREATE TABLE trend_search_jobs (
  id VARCHAR(256) PRIMARY KEY,
  company_id BIGINT NOT NULL REFERENCES company(id) ON DELETE CASCADE,
  user_id VARCHAR(256) NOT NULL,
  status VARCHAR(50) NOT NULL DEFAULT 'queued',
  query TEXT NOT NULL,
  company_context TEXT NOT NULL,
  categories JSONB,
  results JSONB,
  error_message TEXT,
  created_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
  completed_at TIMESTAMPTZ,
  updated_at TIMESTAMPTZ
);

CREATE INDEX trend_search_jobs_company_id_idx ON trend_search_jobs(company_id);
CREATE INDEX trend_search_jobs_status_idx ON trend_search_jobs(status);
CREATE INDEX trend_search_jobs_company_status_idx ON trend_search_jobs(company_id, status);

Drizzle schema definition in src/server/db/schema/trend-search.ts:

import { sql } from "drizzle-orm";
import { bigint, index, jsonb, text, timestamp, varchar } from "drizzle-orm/pg-core";
import { pgTable } from "./helpers";
import { company } from "./base";
import type { SearchCategory, SearchResult } from "~/server/trend-search/types";

export const trendSearchJobs = pgTable(
  "trend_search_jobs",
  {
    id: varchar("id", { length: 256 }).primaryKey(),
    companyId: bigint("company_id", { mode: "bigint" })
      .notNull()
      .references(() => company.id, { onDelete: "cascade" }),
    userId: varchar("user_id", { length: 256 }).notNull(),
    status: varchar("status", {
      length: 50,
      enum: ["queued", "planning", "searching", "synthesizing", "completed", "failed"],
    }).notNull().default("queued"),
    query: text("query").notNull(),
    companyContext: text("company_context").notNull(),
    categories: jsonb("categories").$type<SearchCategory[]>(),
    results: jsonb("results").$type<SearchResult[]>(),
    errorMessage: text("error_message"),
    createdAt: timestamp("created_at", { withTimezone: true })
      .default(sql`CURRENT_TIMESTAMP`)
      .notNull(),
    completedAt: timestamp("completed_at", { withTimezone: true }),
    updatedAt: timestamp("updated_at", { withTimezone: true }).$onUpdate(() => new Date()),
  },
  (table) => ({
    companyIdIdx: index("trend_search_jobs_company_id_idx").on(table.companyId),
    statusIdx: index("trend_search_jobs_status_idx").on(table.status),
    companyStatusIdx: index("trend_search_jobs_company_status_idx").on(table.companyId, table.status),
  })
);

Correctness Properties

A property is a characteristic or behavior that should hold true across all valid executions of a system — essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.

Property 1: Valid input creates a job

For any valid Search_Query (non-empty, ≤1000 chars) and valid Company_Context (non-empty, ≤2000 chars), submitting a trend search should create a job record with status "queued" and return a job ID.

Validates: Requirements 1.1

Property 2: Invalid input is rejected

For any Search_Query composed entirely of whitespace characters, or for any Company_Context exceeding 2000 characters, the Trend_Search_Engine should reject the request with a validation error and not create a job record.

Validates: Requirements 1.4, 1.5

Property 3: Category inference produces valid categories

For any valid Search_Query and Company_Context where no Search_Categories are specified, the Query_Planner should return planned queries where every category is a member of the valid SearchCategory enum (fashion, finance, business, tech).

Validates: Requirements 1.2

Property 4: Specified categories are preserved in planned queries

For any set of specified Search_Categories, all planned queries produced by the Query_Planner should only reference categories from the specified set.

Validates: Requirements 1.3

Property 5: Query planner always produces sub-queries

For any valid Search_Query and Company_Context, the Query_Planner should return at least one PlannedQuery.

Validates: Requirements 2.1

Property 6: Every sub-query triggers a search call

For any list of PlannedQueries, the web search executor should invoke the search provider exactly once per sub-query (before retries).

Validates: Requirements 3.1

Property 7: Synthesizer output structure

For any set of at least 5 raw search results, the Content_Synthesizer should produce exactly 5 SearchResult objects, each containing a non-empty sourceUrl, a non-empty summary, and a non-empty description.

Validates: Requirements 4.1, 4.2

Property 8: Source URL traceability

For any output from the Content_Synthesizer, every sourceUrl in the results should be present in the set of URLs from the raw input results.

Validates: Requirements 4.3

Property 9: Persistence round-trip

For any completed trend search, persisting the Result_Set and then retrieving it by job ID should return an equivalent Result_Set, including the original query, company context, categories, and a timestamp.

Validates: Requirements 5.1, 5.2, 5.3

Property 10: Company data isolation

For any two distinct company IDs, trend search results persisted under company A should never appear when querying results for company B.

Validates: Requirements 5.4

Property 11: Successful pipeline sets completed status

For any trend search pipeline that completes all steps without error, the job record status should be "completed" and the completedAt timestamp should be non-null.

Validates: Requirements 6.4

Property 12: Input serialization round-trip

For any valid TrendSearchInput, serializing it to the Inngest event JSON payload and then deserializing it back should produce an object equal to the original input.

Validates: Requirements 8.1, 8.2

Error Handling

Error scenarios and handling strategies:

  • Empty/whitespace query: Zod validation rejects at the API layer; returns 400 with error details
  • Company context too long: Zod validation rejects at the API layer; returns 400 with error details
  • Tavily API unavailable: retry up to 2 times per sub-query (Inngest step retry); log and continue with the remaining sub-queries
  • Tavily returns 0 results for a sub-query: log a warning and continue with the other sub-queries
  • All sub-queries return 0 results: the synthesizer returns 5 placeholder results with an empty sourceUrl and an "Insufficient results" summary
  • LLM (Query Planner) fails: Inngest step retry (up to 3 retries); if all fail, mark the job as failed
  • LLM (Synthesizer) fails: Inngest step retry (up to 3 retries); if all fail, mark the job as failed
  • LLM returns malformed output: Zod validation on the LLM output catches structural issues; retry the step
  • DB write fails: Inngest step retry; if the failure persists, mark the job as failed with an error message
  • Unauthorized request: Clerk auth middleware returns 401 before reaching the handler
  • Job not found on poll: return 404
  • Job belongs to a different company: return 404 (do not leak existence)

Testing Strategy

Property-Based Testing

Use fast-check as the property-based testing library (already compatible with the Jest setup in this project).

Each correctness property maps to a single property-based test with a minimum of 100 iterations. Tests should be tagged with the property reference.

Tag format: Feature: ai-trend-search-engine, Property {N}: {title}

Key property tests:

  • Input validation properties (P1, P2): Generate random valid/invalid inputs and verify acceptance/rejection
  • Query planner properties (P3, P4, P5): Mock LLM responses, generate random category combinations, verify structural constraints
  • Synthesizer properties (P7, P8): Generate random raw result sets, mock LLM synthesis, verify output structure and URL traceability
  • Persistence properties (P9, P10): Use test DB, generate random results, verify round-trip and isolation
  • Serialization property (P12): Generate random TrendSearchInput objects, verify serialize/deserialize round-trip
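
Property 12 can be smoke-tested even before fast-check is wired in. This hand-rolled sketch substitutes Math.random for fast-check arbitraries, so every name in it is an illustrative stand-in rather than the suggested test style:

```typescript
// Sketch: a hand-rolled randomized check of Property 12 (serialization
// round-trip). In the real suite, fast-check arbitraries would replace
// randInput() and fc.assert would replace the loop.
const CATEGORIES = ["fashion", "finance", "business", "tech"] as const;

interface TrendSearchInput {
  query: string;
  companyContext: string;
  categories?: string[];
}

// Random lowercase string within the schema's length bounds.
function randString(maxLen: number): string {
  const len = 1 + Math.floor(Math.random() * maxLen);
  return Array.from({ length: len }, () =>
    String.fromCharCode(97 + Math.floor(Math.random() * 26))
  ).join("");
}

function randInput(): TrendSearchInput {
  const input: TrendSearchInput = {
    query: randString(1000),
    companyContext: randString(2000),
  };
  if (Math.random() < 0.5) {
    input.categories = CATEGORIES.filter(() => Math.random() < 0.5);
  }
  return input;
}

// Property 12: JSON serialize/deserialize must be lossless.
function checkRoundTrip(iterations = 100): void {
  for (let i = 0; i < iterations; i++) {
    const input = randInput();
    const restored = JSON.parse(JSON.stringify(input)) as TrendSearchInput;
    if (JSON.stringify(restored) !== JSON.stringify(input)) {
      throw new Error(`round-trip mismatch at iteration ${i}`);
    }
  }
}
```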

Unit Testing

Unit tests complement property tests for specific examples and edge cases:

  • Tavily returning 0 results for one sub-query (edge case from 3.3)
  • Tavily failing and retry behavior (edge case from 3.4)
  • Fewer than 5 raw results triggering placeholder padding (edge case from 4.5)
  • Job failure status when pipeline step fails (edge case from 6.3)
  • Zod schema validation for specific malformed inputs

Integration Testing

  • End-to-end test with mocked Tavily and LLM: submit search → poll for results → verify output
  • DB integration: verify Drizzle schema, migrations, and query scoping
