
A professional, self-controlling AI assistant that handles live calls and manages your calendar with zero intervention. It leverages cutting-edge voice processing and seamless calendar integration to understand caller intent, verify your schedule, and autonomously book or reschedule meetings in real-time. Optimized for efficiency.


Gitkbc/Neural_Conversational_Telephony_Pipeline


Neural Conversational Telephony Pipeline (NCTP)


Author: Chaitanya Kalbhairav
Department: CSE (AI), Vishwakarma Institute of Technology, Pune, India
Email: [email protected]


Overview

The Neural Conversational Telephony Pipeline (NCTP) is a low-latency, autonomous voice AI system designed for real-time telephony conversations. It integrates streaming Automatic Speech Recognition (ASR), Large Language Model (LLM) reasoning, and ultra-fast Text-to-Speech (TTS) to achieve human-like conversational flow with minimal perceptible delay.

The system uses commercial services such as Twilio/LiveKit, Deepgram, OpenAI GPT, and Cartesia TTS, orchestrated via FastAPI, and leverages MongoDB Atlas for memory and context retrieval.

Additionally, a research paper detailing the architecture, methodology, performance analysis, and future directions is included in this repository.


Features

  • Real-time bidirectional voice interactions over telephony networks
  • Ultra-low response latency (< 800 ms average; P95 < 1000 ms)
  • Streaming ASR with Deepgram for continuous transcription
  • GPT-powered agent for contextual reasoning, intent recognition, and tool execution
  • High-speed RAG context retrieval using MongoDB Atlas
  • Streaming TTS with Cartesia for immediate audio output
  • Modular, containerized microservices for horizontal scalability
  • Accompanying research paper for academic reference
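The latency targets listed above (average below 800 ms, P95 below 1000 ms) can be checked against measured response times. The following is a minimal sketch, not the project's own benchmarking code: the function names and the sample values are hypothetical, and the nearest-rank method is one of several common ways to define a percentile.

```python
from statistics import mean


def p95(samples_ms):
    """Return the nearest-rank 95th percentile of the latency samples."""
    if not samples_ms:
        raise ValueError("at least one latency sample is required")
    ordered = sorted(samples_ms)
    # Integer form of ceil(0.95 * n), which avoids floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_latency_targets(samples_ms, avg_target_ms=800, p95_target_ms=1000):
    """Check both targets: average and P95 must be strictly below the limits."""
    return mean(samples_ms) < avg_target_ms and p95(samples_ms) < p95_target_ms


# Hypothetical end-to-end response times in milliseconds (illustrative only).
samples = [620, 710, 680, 750, 790, 640, 905, 730, 700, 660]
print(mean(samples))                   # 718.5
print(p95(samples))                    # 905
print(meets_latency_targets(samples))  # True
```

With only ten samples the nearest-rank P95 is simply the slowest response, so a meaningful check needs a much larger sample set.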

Architecture

The system uses a microservice-based architecture built on the “Smart Endpoints, Dumb Pipes” principle, with each stage handled by a dedicated component:

  1. Media Gateway (Twilio/LiveKit) – Handles voice ingress/egress via low-latency WebRTC streaming
  2. Deepgram ASR – Streaming speech-to-text with chunked audio processing (~100 ms buffers)
  3. Orchestration (FastAPI + LiveKit Agents) – Manages conversation state and streaming flow
  4. GPT Agent (OpenAI) – Contextual reasoning, tool execution, and response generation
  5. MongoDB Atlas – Session memory and RAG context retrieval for fast, grounded reasoning
  6. Cartesia TTS – Ultra-low-latency text-to-speech streaming

Refer to Figure 1 in the research paper for a detailed visual overview.
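The streaming flow through the stages listed above can be sketched with `asyncio` queues acting as the "dumb pipes" between stages. This is an illustrative sketch only: every stage is a local stub, no Deepgram, OpenAI, Cartesia, Twilio, or LiveKit API is called, the MongoDB memory step is omitted, and the audio format (8 kHz, 8-bit) is an assumption chosen to make the ~100 ms chunk size concrete.

```python
import asyncio

SAMPLE_RATE_HZ = 8000   # assumption: 8 kHz telephony audio
BYTES_PER_SAMPLE = 1    # assumption: 8-bit samples
CHUNK_MS = 100          # ~100 ms buffers, as described for the ASR stage
CHUNK_BYTES = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHUNK_MS // 1000

_DONE = object()  # sentinel that marks the end of a stream


async def media_gateway(audio, out_q):
    """Stub gateway: split inbound audio into ~100 ms chunks."""
    for start in range(0, len(audio), CHUNK_BYTES):
        await out_q.put(audio[start:start + CHUNK_BYTES])
    await out_q.put(_DONE)


async def asr_stage(in_q, out_q):
    """Stub ASR: emit one placeholder transcript per audio chunk."""
    count = 0
    while (chunk := await in_q.get()) is not _DONE:
        count += 1
        await out_q.put(f"partial transcript {count}")
    await out_q.put(_DONE)


async def agent_stage(in_q, out_q):
    """Stub agent: turn each transcript into a placeholder reply."""
    while (text := await in_q.get()) is not _DONE:
        await out_q.put(f"reply to: {text}")
    await out_q.put(_DONE)


async def tts_stage(in_q, spoken):
    """Stub TTS: encode each reply as bytes standing in for audio frames."""
    while (reply := await in_q.get()) is not _DONE:
        spoken.append(reply.encode("utf-8"))


async def run_call(audio):
    """Run all stages concurrently so each chunk flows through as it arrives."""
    audio_q, text_q, reply_q = (asyncio.Queue(maxsize=8) for _ in range(3))
    spoken = []
    await asyncio.gather(
        media_gateway(audio, audio_q),
        asr_stage(audio_q, text_q),
        agent_stage(text_q, reply_q),
        tts_stage(reply_q, spoken),
    )
    return spoken


frames = asyncio.run(run_call(bytes(2000)))  # 2000 bytes -> 3 chunks of up to 800
print(len(frames))  # 3
```

Because every stage runs concurrently and the queues are bounded, later stages start working before earlier ones finish, and a slow stage applies back-pressure instead of letting buffers grow without limit.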


Requirements

  • Python 3.11+
  • FastAPI
  • MongoDB Atlas account (for RAG context)
  • Twilio/LiveKit account (for telephony streaming)
  • Deepgram account (for ASR)
  • OpenAI account (for the GPT agent)
  • Cartesia account (for TTS)
  • Node.js (for the Next.js frontend)
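Before starting the services, it can help to confirm that the interpreter version and service credentials are in place. The following is a minimal pre-flight sketch; the environment variable names are assumptions made for illustration and are not taken from this repository, so adjust them to match the actual configuration.

```python
import os
import sys

# Hypothetical variable names; the repository may use different ones.
REQUIRED_ENV_VARS = (
    "MONGODB_URI",
    "LIVEKIT_API_KEY",
    "DEEPGRAM_API_KEY",
    "OPENAI_API_KEY",
    "CARTESIA_API_KEY",
)


def python_ok(version_info=sys.version_info):
    """Return True if the interpreter is Python 3.11 or newer."""
    return tuple(version_info[:2]) >= (3, 11)


def missing_env_vars(env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def preflight(env=None, version_info=sys.version_info):
    """Collect human-readable problems; an empty list means ready to run."""
    problems = []
    if not python_ok(version_info):
        problems.append("Python 3.11+ is required")
    problems.extend(f"missing environment variable: {name}"
                    for name in missing_env_vars(env))
    return problems


for problem in preflight():
    print(problem)
```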

Installation

# Clone the repository
git clone https://github.com/<your-username>/NCTP.git
cd NCTP

# Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install backend dependencies
pip install -r requirements.txt

# Navigate to frontend and install dependencies
cd frontend
npm install
  
