Neural Conversational Telephony Pipeline (NCTP)

Author: Chaitanya Kalbhairav
Department: CSE (AI), Vishwakarma Institute of Technology, Pune, India
Email: [email protected]

Research Paper: https://drive.google.com/file/d/13clEW0UfqT2kV2ssnedwLidx83AVvtXD/view?usp=sharing
Video Link: Watch Video
Info: Additional Info

Overview

The Neural Conversational Telephony Pipeline (NCTP) is a low-latency, autonomous voice AI system designed for real-time telephony conversations. It integrates streaming Automatic Speech Recognition (ASR), Large Language Model (LLM) reasoning, and ultra-fast Text-to-Speech (TTS) to achieve human-like conversational flow with minimal perceptible delay.

The system uses commercial services such as Twilio/LiveKit, Deepgram, OpenAI GPT, and Cartesia TTS, orchestrated via FastAPI, and leverages MongoDB Atlas for memory and context retrieval.

Additionally, a research paper detailing the architecture, methodology, performance analysis, and future directions is included in this repository.
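
As a rough illustration of the streaming flow described in this overview, the sketch below chains three stages with asyncio queues so that each stage can start work as soon as the previous one emits output. The stage functions are hypothetical stubs, not the Deepgram, OpenAI, or Cartesia clients, and the function names are assumptions made for this example only.

```python
import asyncio

# Hypothetical stand-ins for the real ASR, LLM, and TTS services.
# None is used as an end-of-stream marker on every queue.

async def asr_stage(audio_in: asyncio.Queue, text_out: asyncio.Queue) -> None:
    """Stub ASR: treat each audio chunk as already-transcribed UTF-8 text."""
    while (chunk := await audio_in.get()) is not None:
        await text_out.put(chunk.decode("utf-8"))
    await text_out.put(None)  # propagate end-of-stream downstream

async def llm_stage(text_in: asyncio.Queue, reply_out: asyncio.Queue) -> None:
    """Stub LLM: echo the utterance back as the agent's reply."""
    while (utterance := await text_in.get()) is not None:
        await reply_out.put(f"You said: {utterance}")
    await reply_out.put(None)

async def tts_stage(reply_in: asyncio.Queue, audio_out: list[bytes]) -> None:
    """Stub TTS: encode the reply text as bytes in place of synthesized audio."""
    while (reply := await reply_in.get()) is not None:
        audio_out.append(reply.encode("utf-8"))

async def run_pipeline(chunks: list[bytes]) -> list[bytes]:
    """Run all three stages concurrently over the given input chunks."""
    audio_q, text_q, reply_q = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    spoken: list[bytes] = []
    for chunk in chunks:
        audio_q.put_nowait(chunk)
    audio_q.put_nowait(None)
    await asyncio.gather(
        asr_stage(audio_q, text_q),
        llm_stage(text_q, reply_q),
        tts_stage(reply_q, spoken),
    )
    return spoken

if __name__ == "__main__":
    print(asyncio.run(run_pipeline([b"hello", b"book a table"])))
```

Because the stages only communicate through queues, any one of them could in principle be swapped for a real streaming client without changing the others.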

Features

Real-time bidirectional voice interactions over telephony networks
Ultra-low response latency (< 800 ms average; P95 < 1000 ms)
Streaming ASR with Deepgram for continuous transcription
GPT-powered agent for contextual reasoning, intent recognition, and tool execution
High-speed RAG context retrieval using MongoDB Atlas
Streaming TTS with Cartesia for immediate audio output
Modular, containerized microservices for horizontal scalability
Accompanying research paper for academic reference
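
The latency targets above (average below 800 ms, P95 below 1000 ms) can be checked against recorded response times with a few lines of Python. This is a minimal sketch: the function names are assumptions, the nearest-rank percentile method is one of several common definitions, and the sample values are synthetic numbers chosen for illustration, not measurements from NCTP.

```python
import math
import statistics

def latency_summary(samples_ms: list[float]) -> dict[str, float]:
    """Return the mean and the nearest-rank P95 of latencies in milliseconds."""
    ordered = sorted(samples_ms)
    rank = math.ceil(0.95 * len(ordered))  # nearest-rank percentile, 1-indexed
    return {"mean_ms": statistics.fmean(ordered), "p95_ms": ordered[rank - 1]}

def meets_targets(samples_ms: list[float],
                  mean_target_ms: float = 800.0,
                  p95_target_ms: float = 1000.0) -> bool:
    """Check both targets from the feature list against the samples."""
    summary = latency_summary(samples_ms)
    return (summary["mean_ms"] < mean_target_ms
            and summary["p95_ms"] < p95_target_ms)

if __name__ == "__main__":
    # Synthetic example values, not real measurements.
    samples = [620, 700, 710, 750, 780, 790, 800, 820, 850, 980]
    print(latency_summary(samples))
    print(meets_targets(samples))
```

Note that an empty sample list raises `statistics.StatisticsError`, so callers should collect at least one measurement first.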

Architecture

The system uses a microservice-based architecture built on the “Smart Endpoints, Dumb Pipes” principle:

Media Gateway (Twilio/LiveKit) – Handles voice ingress/egress via low-latency WebRTC streaming
Deepgram ASR – Streaming speech-to-text with chunked audio processing (~100 ms buffers)
Orchestration (FastAPI + LiveKit Agents) – Manages conversation state and streaming flow
GPT Agent (OpenAI) – Contextual reasoning, tool execution, and response generation
MongoDB Atlas – Session memory and RAG context retrieval for fast, grounded reasoning
Cartesia TTS – Ultra-low-latency text-to-speech streaming

Refer to Figure 1 in the research paper for a detailed visual overview.
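
To make the ~100 ms ASR buffers concrete, the sketch below computes the buffer size in bytes and splits an audio stream into chunks of that length. It assumes 8 kHz, 8-bit mu-law mono audio, a common telephony format; the encoding actually used by this project is not stated in this README, and the helper names are hypothetical.

```python
from collections.abc import Iterator

SAMPLE_RATE_HZ = 8000   # assumption: narrowband telephony audio
BYTES_PER_SAMPLE = 1    # assumption: 8-bit mu-law, mono
CHUNK_MS = 100          # the ~100 ms buffer length from the architecture list

def chunk_size_bytes(sample_rate_hz: int = SAMPLE_RATE_HZ,
                     bytes_per_sample: int = BYTES_PER_SAMPLE,
                     chunk_ms: int = CHUNK_MS) -> int:
    """Number of bytes that hold chunk_ms milliseconds of audio."""
    return sample_rate_hz * bytes_per_sample * chunk_ms // 1000

def iter_chunks(audio: bytes, size: int) -> Iterator[bytes]:
    """Yield consecutive chunks of at most `size` bytes; the last may be shorter."""
    for start in range(0, len(audio), size):
        yield audio[start:start + size]

if __name__ == "__main__":
    size = chunk_size_bytes()
    one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)  # one second of zero bytes
    print(size, len(list(iter_chunks(one_second, size))))
```

Under these assumptions each buffer is 800 bytes, so one second of audio produces ten chunks. Smaller buffers reduce the delay before the first partial transcript but increase the number of messages sent to the ASR service.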

Requirements

Python 3.11+
FastAPI
MongoDB Atlas account (for RAG context)
Twilio/LiveKit account (for telephony streaming)
Deepgram account (for ASR)
Cartesia account (for TTS)
Node.js (for the Next.js frontend)
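
Since several of these requirements are hosted accounts, it can help to verify that credentials are present before starting the backend. The sketch below is one possible approach; the environment variable names are hypothetical and may not match the names this repository actually reads.

```python
import os
from collections.abc import Mapping

# Hypothetical variable names; adjust to whatever the backend expects.
REQUIRED_VARS = (
    "DEEPGRAM_API_KEY",
    "OPENAI_API_KEY",
    "CARTESIA_API_KEY",
    "MONGODB_URI",
)

def missing_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings: " + ", ".join(missing))
    else:
        print("All required settings are present.")
```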

Installation

# Clone the repository
git clone https://github.com/<your-username>/NCTP.git
cd NCTP

# Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install backend dependencies
pip install -r requirements.txt

# Navigate to frontend and install dependencies
cd frontend
npm install
