Real-Time Conversational Engine for AI Voice Agents

Designed a finite-state conversational engine orchestrating telephony, speech recognition, language models, tool execution, and speech synthesis for real-time AI phone calls.

Overview

Built the core conversational service powering AI voice agents. The system managed the entire lifecycle of a phone conversation, routing audio between telephony providers, speech-to-text services, language models, text-to-speech providers, and internal backend services while maintaining conversational state in real time.

Problem

Human conversations are unpredictable. Callers interrupt, change topics, request human assistance, or refuse further contact. The challenge was designing a low-latency system capable of handling these behaviors while collecting structured information and coordinating multiple AI providers reliably.

Approach

Modeled conversations as a finite-state machine with explicit states such as listening, thinking, speaking, and waiting. Each state managed its own interactions with speech recognition, language models, text-to-speech providers, and backend services. Built a provider-agnostic architecture allowing different STT, TTS, and LLM vendors to be configured without changing the core orchestration logic.
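As an illustration only, the state machine and provider-agnostic design described above might be sketched as follows. This is not the production code: the event names, the transition table, the `Conversation` class, and the `Fake*` providers are all hypothetical, and the real engine handled streaming audio and concurrency that this synchronous sketch omits.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Protocol


class State(Enum):
    LISTENING = auto()
    THINKING = auto()
    SPEAKING = auto()
    WAITING = auto()
    ENDED = auto()


class Event(Enum):  # hypothetical event names, chosen for illustration
    UTTERANCE_FINAL = auto()  # STT produced a final transcript
    REPLY_READY = auto()      # LLM produced a reply
    TOOL_STARTED = auto()     # a backend/tool call is in flight
    TOOL_FINISHED = auto()
    PLAYBACK_DONE = auto()    # synthesized audio finished playing
    BARGE_IN = auto()         # caller interrupted the agent
    HANGUP = auto()


# Assumed transition table: (current state, event) -> next state.
TRANSITIONS = {
    (State.LISTENING, Event.UTTERANCE_FINAL): State.THINKING,
    (State.THINKING, Event.REPLY_READY): State.SPEAKING,
    (State.THINKING, Event.TOOL_STARTED): State.WAITING,
    (State.WAITING, Event.TOOL_FINISHED): State.THINKING,
    (State.SPEAKING, Event.PLAYBACK_DONE): State.LISTENING,
    (State.SPEAKING, Event.BARGE_IN): State.LISTENING,
}


# Vendor-neutral interfaces: any STT, LLM, or TTS adapter that matches
# these method signatures can be plugged in without touching the engine.
class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LanguageModel(Protocol):
    def reply(self, history: list[str]) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class Conversation:
    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech
    state: State = State.LISTENING
    history: list[str] = field(default_factory=list)

    def dispatch(self, event: Event) -> State:
        """Apply one event; reject transitions the table does not allow."""
        if event is Event.HANGUP:  # a hangup ends the call from any state
            self.state = State.ENDED
            return self.state
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"{event.name} not allowed in {self.state.name}")
        self.state = TRANSITIONS[key]
        return self.state

    def handle_audio(self, audio: bytes) -> bytes:
        """Run one simplified turn: transcribe, generate a reply, synthesize."""
        text = self.stt.transcribe(audio)
        self.history.append(f"caller: {text}")
        self.dispatch(Event.UTTERANCE_FINAL)
        answer = self.llm.reply(self.history)
        self.history.append(f"agent: {answer}")
        self.dispatch(Event.REPLY_READY)
        return self.tts.synthesize(answer)


# Stand-in providers so the sketch runs without any vendor SDK.
class FakeSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode()


class FakeLLM:
    def reply(self, history: list[str]) -> str:
        return "You said: " + history[-1].split(": ", 1)[1]


class FakeTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode()
```

Swapping a vendor in this sketch means passing a different adapter to `Conversation`; the transition table and `dispatch` stay unchanged, which is the decoupling the approach aims for.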

Outcomes

  • Enabled real-time AI phone conversations across multiple providers.
  • Supported dynamic conversation flows instead of rigid scripted interactions.
  • Automated structured information collection through conversational guidance.
  • Handled escalation paths such as human handoff and do-not-contact requests.
  • Decoupled conversational logic from telephony and AI provider implementations.
  • Created a reusable foundation for future voice-agent capabilities.
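
To make the escalation and information-collection outcomes concrete, here is a minimal sketch of how a turn could be routed. Everything in it is an assumption for illustration: the trigger phrases, the `classify` and `plan_turn` helpers, the `Intake` class, and the returned action strings are hypothetical, and plain keyword matching stands in for whatever intent detection the real system used.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Action(Enum):
    CONTINUE = "continue"
    HANDOFF_TO_HUMAN = "handoff_to_human"
    DO_NOT_CONTACT = "do_not_contact"


# Hypothetical trigger phrases; placeholders for a real intent classifier.
HANDOFF_PHRASES = ("speak to a human", "real person", "talk to an agent")
DNC_PHRASES = ("stop calling", "do not call", "remove me from your list")


def classify(transcript: str) -> Action:
    """Map a caller utterance to an escalation action."""
    text = transcript.lower()
    # Do-not-contact is checked first so it wins if both kinds of phrase appear.
    if any(phrase in text for phrase in DNC_PHRASES):
        return Action.DO_NOT_CONTACT
    if any(phrase in text for phrase in HANDOFF_PHRASES):
        return Action.HANDOFF_TO_HUMAN
    return Action.CONTINUE


@dataclass
class Intake:
    """Tracks which structured fields have been collected so far."""
    required: tuple[str, ...]
    collected: dict[str, str] = field(default_factory=dict)

    def next_missing(self) -> Optional[str]:
        for name in self.required:
            if name not in self.collected:
                return name
        return None


def plan_turn(transcript: str, intake: Intake) -> str:
    """Decide what the agent does next for this utterance."""
    action = classify(transcript)
    if action is not Action.CONTINUE:
        return f"escalate:{action.value}"
    missing = intake.next_missing()
    return f"ask:{missing}" if missing else "wrap_up"
```

For example, with `Intake(required=("name", "callback_time"))`, an ordinary utterance yields `ask:name` until that field is filled, while "Please stop calling me" short-circuits to `escalate:do_not_contact` regardless of what remains to be collected.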