Competitor Price Scraper Workflow

A Motia workflow that implements a full RAG (Retrieval-Augmented Generation) system for analyzing competitor pricing data.

Architecture

[Webhook API] → [Text Processor] → [Embeddings Generator]
                                           ↓
              [RAG Agent] ← [Vector Store (Supabase)]
                   ↓
           [Google Sheets Logger]
                   ↓ (on error)
           [Slack Alert]

Features

Webhook Endpoint: Receives competitor pricing data via HTTP POST
Text Processing: Chunks data with configurable size and overlap
Embeddings: Generates embeddings using OpenAI's text-embedding-3-small
Vector Storage: Stores and queries embeddings in Supabase with pgvector
RAG Analysis: Uses Anthropic Claude for intelligent price analysis
Window Memory: Maintains conversation context across analyses
Google Sheets Logging: Optional logging of analysis results
Error Handling: Slack notifications for workflow errors

Setup

1. Environment Variables

Create a .env file with the following variables:

# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key_here

# Anthropic Configuration
ANTHROPIC_API_KEY=your_anthropic_api_key_here

# Supabase Configuration
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_ANON_KEY=your_supabase_anon_key_here

# Google Sheets Configuration (optional if ENABLE_SHEETS_LOGGING=false)
GOOGLE_SHEETS_CLIENT_EMAIL=your-service-account@project.iam.gserviceaccount.com
GOOGLE_SHEETS_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\nyour_private_key_here\n-----END PRIVATE KEY-----"
GOOGLE_SHEETS_SPREADSHEET_ID=your_spreadsheet_id_here

# Slack Configuration
SLACK_BOT_TOKEN=xoxb-your-slack-bot-token

# Feature Flags
ENABLE_SHEETS_LOGGING=false  # Set to true to enable Google Sheets logging

2. Supabase Setup

Run these SQL commands in your Supabase SQL editor:

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create the vector table
CREATE TABLE IF NOT EXISTS competitor_price_scraper (
  id TEXT PRIMARY KEY,
  content TEXT NOT NULL,
  embedding vector(1536),
  metadata JSONB,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Create index for similarity search
CREATE INDEX IF NOT EXISTS competitor_price_scraper_embedding_idx 
ON competitor_price_scraper 
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- Create RPC function for similarity search
CREATE OR REPLACE FUNCTION match_documents(
  query_embedding vector(1536),
  match_threshold float,
  match_count int,
  table_name text
)
RETURNS TABLE (
  id text,
  content text,
  embedding vector(1536),
  metadata jsonb,
  similarity float
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  EXECUTE format('
    SELECT 
      id,
      content,
      embedding,
      metadata,
      1 - (embedding <=> %L) as similarity
    FROM %I
    WHERE 1 - (embedding <=> %L) > %L
    ORDER BY embedding <=> %L
    LIMIT %L
  ', query_embedding, table_name, query_embedding, match_threshold, query_embedding, match_count);
END;
$$;

3. Install Dependencies

npm install

4. Run the Workflow

npm run dev

API Usage

Webhook Endpoint

Send competitor price data to the webhook:

POST /competitor-price-scraper

# With structured data
{
  "competitorName": "Competitor A",
  "products": [
    {
      "name": "Product 1",
      "price": 99.99,
      "currency": "USD",
      "url": "https://competitor.com/product1"
    }
  ],
  "timestamp": "2024-01-20T10:00:00Z"
}

# With raw data
{
  "competitorName": "Competitor B",
  "rawData": "Raw HTML or text content containing pricing information...",
  "timestamp": "2024-01-20T10:00:00Z"
}

Workflow Steps

1. Webhook (`webhook.step.ts`)

Receives competitor data via HTTP POST
Validates input schema
Emits event for text processing

2. Text Processor (`text_processor.step.ts`)

Splits raw data into 400-character chunks with 40-char overlap
Processes both raw text and structured product data
Maintains metadata for each chunk

3. Embeddings Generator (`embeddings.step.ts`)

Generates embeddings using OpenAI's API
Processes chunks in batches for efficiency
Handles API errors gracefully

4. Vector Store (`vector_store.step.ts`)

Stores embeddings in Supabase with pgvector
Supports both insert and query operations
Maintains document metadata

5. RAG Agent (`rag_agent.step.ts`)

Queries vector store for relevant context
Uses Anthropic Claude for analysis
Maintains conversation memory
Generates structured insights

6. Sheets Logger (`sheets_logger.step.ts`)

Logs analysis results to Google Sheets
Can be disabled via environment variable
Non-critical failures don't break workflow

7. Slack Alert (`slack_alert.step.ts`)

Sends error notifications to #alerts channel
Includes formatted error details
Handles both errors and warnings

Custom UI Components

Each step includes a custom Workbench UI component for better visualization:

Webhook: Shows API endpoint with lightning icon
Other steps: EventNode components with descriptive text

Testing

Test Mode: Set ENABLE_SHEETS_LOGGING=false to disable Google Sheets
Send Test Data: Use curl or Postman to send test competitor data
Monitor Logs: Check Motia logs for step execution
Check Slack: Verify error notifications work

Troubleshooting

Embedding Errors: Check OpenAI API key and quota
Vector Store Errors: Verify Supabase setup and credentials
Slack Errors: Ensure bot has access to #alerts channel
Memory Issues: Memory persists across workflow runs

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Competitor Price Scraper Workflow

Architecture

Features

Setup

1. Environment Variables

2. Supabase Setup

3. Install Dependencies

4. Run the Workflow

API Usage

Webhook Endpoint

Workflow Steps

1. Webhook (`webhook.step.ts`)

2. Text Processor (`text_processor.step.ts`)

3. Embeddings Generator (`embeddings.step.ts`)

4. Vector Store (`vector_store.step.ts`)

5. RAG Agent (`rag_agent.step.ts`)

6. Sheets Logger (`sheets_logger.step.ts`)

7. Slack Alert (`slack_alert.step.ts`)

Custom UI Components

Testing

Troubleshooting

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

Competitor Price Scraper Workflow

Architecture

Features

Setup

1. Environment Variables

2. Supabase Setup

3. Install Dependencies

4. Run the Workflow

API Usage

Webhook Endpoint

Workflow Steps

1. Webhook (webhook.step.ts)

2. Text Processor (text_processor.step.ts)

3. Embeddings Generator (embeddings.step.ts)

4. Vector Store (vector_store.step.ts)

5. RAG Agent (rag_agent.step.ts)

6. Sheets Logger (sheets_logger.step.ts)

7. Slack Alert (slack_alert.step.ts)

Custom UI Components

Testing

Troubleshooting

1. Webhook (`webhook.step.ts`)

2. Text Processor (`text_processor.step.ts`)

3. Embeddings Generator (`embeddings.step.ts`)

4. Vector Store (`vector_store.step.ts`)

5. RAG Agent (`rag_agent.step.ts`)

6. Sheets Logger (`sheets_logger.step.ts`)

7. Slack Alert (`slack_alert.step.ts`)