Skip to content

feat: [ENG-3586] AI Gateway Stats page MVP#5427

Merged
connortbot merged 5 commits intomainfrom
stats-page
Dec 18, 2025
Merged

feat: [ENG-3586] AI Gateway Stats page MVP#5427
connortbot merged 5 commits intomainfrom
stats-page

Conversation

@connortbot
Copy link
Copy Markdown
Collaborator

CleanShot 2025-12-17 at 15 44 13@2x

MVP including 3 charts:

  • model leaderboard
  • market share
  • provider leaderboard

Later PRs will build on this by adding more charts, and individual stats pages for providers, authors, and models.

@vercel
Copy link
Copy Markdown

vercel Bot commented Dec 17, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

3 Skipped Deployments
Project Deployment Review Updated (UTC)
helicone Skipped Skipped Dec 18, 2025 0:39am
helicone-bifrost Skipped Skipped Dec 18, 2025 0:39am
helicone-eu Skipped Skipped Dec 18, 2025 0:39am

@connortbot connortbot requested a review from chitalian December 17, 2025 23:45
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Dec 17, 2025

Claude finished @connortbot's task —— View job


🔍 Comprehensive PR Review Complete

Todo List:

  • Create initial todo list and start PR review
  • Read and analyze the flagged SQL injection vulnerability in ModelUsageStatsManager.ts
  • Review ClickHouse materialized view implementation for potential issues
  • Analyze backend API design and caching strategy
  • Review frontend React components for code quality and best practices
  • Check chart implementations for potential issues
  • Assess overall architecture and provide performance recommendations
  • Compile comprehensive review feedback with confidence score

Score: 2/10 - Critical security issues must be fixed before merge

🚨 Critical Security Issues

1. SQL Injection Vulnerabilities (CRITICAL)

  • Location: valhalla/jawn/src/managers/ModelUsageStatsManager.ts:115, 403
  • Issue: Model and provider names from the database are directly interpolated into SQL queries without proper escaping
  • Risk: Malicious model/provider names could execute arbitrary SQL commands
  • Example Attack: A model name like '; DROP TABLE request_stats; -- would cause catastrophic data loss

Fix Required:

// Line 115 - VULNERABLE:
const inClause = top9Data.length > 0 
  ? top9Data.map((r) => `'${r.model}'`).join(",")  // ❌ NO ESCAPING
  : "''";

// Line 115 - SECURE:
const inClause = top9Data.length > 0 
  ? top9Data.map((r) => `'${r.model.replace(/'/g, "''")}'`).join(",")  // ✅ PROPER ESCAPING
  : "''";

// Line 403 - Same issue with provider names

🐛 High Priority Issues

2. Percentage Normalization Bug

  • Location: bifrost/app/stats/MarketShareChart.tsx:120-130
  • Issue: Unnecessary re-normalization of already-normalized percentages from backend
  • Impact: Could cause tooltip values to not match chart display, confusing users
  • Fix: Remove lines 120-130 normalization logic since backend already returns proper percentages

3. ClickHouse Migration Data Duplication Risk

  • Location: clickhouse/migrations/schema_77_stats_page_mv.sql:55-95
  • Issue: Historical backfill (lines 55-95) may create duplicate data if materialized view has already processed some historical records
  • Fix: Add WHERE NOT EXISTS clause or use REPLACE instead of INSERT

⚡ Performance & Architecture Issues

4. Inefficient Parallel Queries

  • Location: ModelUsageStatsManager.ts:115, 132, 142
  • Issue: The same inClause is constructed and reused across multiple queries, but model names are fetched 3 times
  • Optimization: Cache the inClause construction result

5. Memory Usage for Large Result Sets

  • Issue: No pagination or limits on time series data
  • Risk: For popular models over 1-year timeframes, could return massive datasets
  • Suggestion: Add reasonable limits or implement pagination

✅ Positive Aspects

  1. Excellent Caching Strategy: 4-hour TTL with unauthorized cache middleware is well-implemented
  2. Clean API Design: TSOA controllers with proper error handling and status codes
  3. Good Frontend Architecture: Proper separation of concerns with React Query for data fetching
  4. Efficient ClickHouse Design: Materialized view with SummingMergeTree is optimal for analytics
  5. Responsive Chart Implementation: Well-structured Recharts components with proper loading states

📋 Suggestions Summary

Must Fix Before Merge:

  1. Fix SQL injection vulnerabilities in ModelUsageStatsManager.ts (lines 115, 403)
  2. Remove unnecessary percentage normalization in MarketShareChart.tsx (lines 120-130)
  3. Fix potential data duplication in ClickHouse migration

Should Consider:

  1. Add query result size limits to prevent memory issues
  2. Implement proper error boundaries in React components
  3. Add data validation for timeframe parameters
  4. Consider adding request/response logging for debugging

Minor Improvements:

  1. Extract common chart styling to shared components
  2. Add TypeScript strict null checks compliance
  3. Implement better error messages for API failures

🛡️ Security Assessment

This PR introduces a critical SQL injection vulnerability that could allow attackers to:

  • Execute arbitrary SQL commands
  • Drop or modify database tables
  • Access sensitive data
  • Cause data corruption or loss

The vulnerability exists because model/provider names from the database (which could be user-controlled through the AI Gateway) are directly concatenated into SQL strings without proper escaping.

Immediate Action Required: Fix SQL injection vulnerabilities before any merge consideration.


@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Dec 17, 2025

Greptile Summary

This PR implements a public AI Gateway statistics page showing model usage trends and market share data. The implementation includes a ClickHouse materialized view for efficient aggregation, new backend API endpoints with 4-hour caching, and a React-based frontend with interactive charts and leaderboards.

Key Changes:

  • Created request_stats materialized view aggregating tokens, latency, and request counts by hour/model/provider
  • Added /v1/public/stats/model-usage and /v1/public/stats/market-share endpoints with configurable timeframes
  • Implemented unauthorized cache middleware with 4-hour TTL for public stats endpoints
  • Built responsive frontend with stacked bar charts, leaderboards, and timeframe selectors
  • Extended model registry with getAuthorByModel helper for author attribution

Issues Found:

  • SQL injection vulnerability in ModelUsageStatsManager where model names are interpolated without escaping
  • Materialized view backfill may create duplicates if MV already processed historical data
  • Unnecessary percentage normalization in market share chart could cause display inconsistencies

Confidence Score: 3/5

  • This PR has a critical SQL injection vulnerability that must be fixed before merging
  • Score reflects critical security issue in ModelUsageStatsManager where model names from database are directly interpolated into SQL queries without escaping. While the feature implementation is solid with good caching strategy and clean frontend code, the SQL injection vulnerability poses a real security risk since malicious model names could be injected through the AI Gateway
  • Pay close attention to valhalla/jawn/src/managers/ModelUsageStatsManager.ts for the SQL injection vulnerability that needs immediate fixing

Important Files Changed

Filename Overview
clickhouse/migrations/schema_77_stats_page_mv.sql Creates materialized view for stats aggregation with 30-day backfill; potential duplicate data if MV already processed rows
valhalla/jawn/src/managers/ModelUsageStatsManager.ts Implements model usage and market share analytics; contains SQL injection vulnerability in model name interpolation
valhalla/jawn/src/middleware/unauthorizedCache.ts Refactored to accept configurable TTL parameter, creates separate KVCache instance per middleware
bifrost/app/stats/StatsPage.tsx Main stats page component with independent queries for charts and leaderboards
bifrost/app/stats/MarketShareChart.tsx 100% stacked bar chart with unnecessary percentage normalization that could introduce inconsistencies

Sequence Diagram

sequenceDiagram
    participant User as User Browser
    participant Page as Stats Page (bifrost)
    participant QP as QueryProvider
    participant API as Jawn API
    participant Cache as KVCache
    participant Manager as ModelUsageStatsManager
    participant CH as ClickHouse
    participant MV as request_stats_mv

    User->>Page: Navigate to /stats
    Page->>QP: Mount with QueryClient
    
    par Fetch Chart Data (1y)
        QP->>API: GET /v1/public/stats/model-usage?timeframe=1y
        API->>Cache: Check cache key
        alt Cache Hit
            Cache-->>API: Return cached data
            API-->>QP: Return cached response
        else Cache Miss
            Cache-->>API: No cached data
            API->>Manager: getModelUsage("1y")
            Manager->>CH: Query top 9 models
            CH->>MV: Aggregate from materialized view
            MV-->>CH: Return aggregated data
            CH-->>Manager: Top 9 models
            Manager->>CH: Parallel queries (other, prev period, time series)
            CH-->>Manager: All query results
            Manager-->>API: { timeSeries, leaderboard }
            API->>Cache: Store response (4hr TTL)
            API-->>QP: Return response
        end
        QP-->>Page: Update chart state
    and Fetch Leaderboard Data (7d)
        QP->>API: GET /v1/public/stats/model-usage?timeframe=7d
        Note over API,Cache: Same caching logic
        API-->>QP: Return response
        QP-->>Page: Update leaderboard state
    and Fetch Market Share Chart (1y)
        QP->>API: GET /v1/public/stats/market-share?timeframe=1y
        API->>Cache: Check cache
        alt Cache Miss
            API->>Manager: getMarketShare("1y")
            Manager->>CH: Query all models
            CH->>MV: Aggregate by model
            MV-->>CH: Model totals
            CH-->>Manager: Raw model data
            Manager->>Manager: Aggregate by author using registry
            Manager->>CH: Parallel queries (prev period, time series)
            CH-->>Manager: Historical data
            Manager->>Manager: Calculate market share percentages
            Manager-->>API: { timeSeries, leaderboard }
            API->>Cache: Store response
        end
        API-->>QP: Return response
        QP-->>Page: Update market share state
    and Fetch Market Share Leaderboard (7d)
        QP->>API: GET /v1/public/stats/market-share?timeframe=7d
        Note over API,Cache: Same caching logic
        API-->>QP: Return response
        QP-->>Page: Update market share leaderboard state
    end
    
    Page->>User: Render charts and leaderboards
    
    Note over User,Page: User changes timeframe selector
    User->>Page: Select "This Month" (30d)
    Page->>QP: Update query key
    QP->>API: GET /v1/public/stats/model-usage?timeframe=30d
    Note over API,Cache: Cached for 4 hours
    API-->>QP: Return response
    QP-->>Page: Update leaderboard
    Page->>User: Re-render with new data
Loading

Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

23 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +92 to +94
const inClause = top9Data.length > 0
? top9Data.map((r) => `'${r.model}'`).join(",")
: "''";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: SQL injection vulnerability - model names from the database are directly interpolated into SQL without escaping.

Suggested change
const inClause = top9Data.length > 0
? top9Data.map((r) => `'${r.model}'`).join(",")
: "''";
const inClause = top9Data.length > 0
? top9Data.map((r) => `'${r.model.replace(/'/g, "''")}'`).join(",")
: "''"
Prompt To Fix With AI
This is a comment left during a code review.
Path: valhalla/jawn/src/managers/ModelUsageStatsManager.ts
Line: 92:94

Comment:
**logic:** SQL injection vulnerability - model names from the database are directly interpolated into SQL without escaping.

```suggestion
    const inClause = top9Data.length > 0 
      ? top9Data.map((r) => `'${r.model.replace(/'/g, "''")}'`).join(",") 
      : "''"
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +120 to +130
const totalPercentage = point.authors.reduce(
(sum, a) => sum + a.percentage,
0
);

authors.forEach((author) => {
const found = point.authors.find((a) => a.author === author);
const rawPercentage = found?.percentage ?? 0;
const normalizedPercentage =
totalPercentage > 0 ? (rawPercentage / totalPercentage) * 100 : 0;
entry[author] = normalizedPercentage;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Unnecessary normalization that could introduce rounding inconsistencies. Backend already returns percentages that sum to ~100%, so re-normalizing may cause the chart to display values that don't match the raw data shown in tooltips.

Prompt To Fix With AI
This is a comment left during a code review.
Path: bifrost/app/stats/MarketShareChart.tsx
Line: 120:130

Comment:
**style:** Unnecessary normalization that could introduce rounding inconsistencies. Backend already returns percentages that sum to ~100%, so re-normalizing may cause the chart to display values that don't match the raw data shown in tooltips.

How can I resolve this? If you propose a fix, please make it concise.

: "''";

// Step 2: Get "other" total + previous period data + time series (parallel)
const otherQuery = `
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@claude double sql injectable shit here please... make sure shit is air lock

Copy link
Copy Markdown
Contributor

@claude claude Bot Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude finished @chitalian's task —— View job


✅ SQL Injection Vulnerabilities Fixed

I've successfully identified and fixed all SQL injection vulnerabilities in the ModelUsageStatsManager.ts file.

Todo List:

  • Read and analyze the vulnerable code in ModelUsageStatsManager.ts
  • Identify all SQL injection points where model/provider names are interpolated
  • Implement proper SQL escaping for all vulnerable locations
  • Run linting and type checking to ensure changes are correct
  • Commit and push the security fixes

✅ Security Fixes Applied:

3 critical SQL injection vulnerabilities secured:

  1. Line 115: Model names in getModelUsage() method ✅
  2. Line 403: Provider names in getProviderUsage() method ✅
  3. Lines 132, 142: Model names in previous period queries ✅ (automatically fixed by fixing inClause)

⚡ Fix Details:

// BEFORE (vulnerable):
top9Data.map((r) => `'${r.model}'`)
top9Data.map((r) => `'${r.provider}'`)

// AFTER (secure):
top9Data.map((r) => `'${r.model.replace(/'/g, "''")}'`)
top9Data.map((r) => `'${r.provider.replace(/'/g, "''")}'`)

This implements proper SQL escaping by doubling single quotes (SQL standard), preventing SQL injection attacks from malicious model/provider names.

Example attack prevented:

  • Malicious model name: model'; DROP TABLE request_stats; --
  • After escaping: model''; DROP TABLE request_stats; -- (harmless string)

🔒 Security Status: AIRLOCKED

The code is now secure against SQL injection vulnerabilities. All user input from the database is properly escaped before being interpolated into SQL queries.


Changes committed: b6cb110

@vercel vercel Bot temporarily deployed to Preview – helicone December 18, 2025 00:34 Inactive
@vercel vercel Bot temporarily deployed to Preview – helicone-bifrost December 18, 2025 00:34 Inactive
@vercel vercel Bot temporarily deployed to Preview – helicone-eu December 18, 2025 00:34 Inactive
@vercel vercel Bot temporarily deployed to Preview – helicone-eu December 18, 2025 00:39 Inactive
@vercel vercel Bot temporarily deployed to Preview – helicone-bifrost December 18, 2025 00:39 Inactive
@vercel vercel Bot temporarily deployed to Preview – helicone December 18, 2025 00:39 Inactive
@connortbot connortbot merged commit 3c2790e into main Dec 18, 2025
10 of 12 checks passed
@connortbot connortbot deleted the stats-page branch December 18, 2025 00:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants