Speechmatics Flow

Speechmatics Flow is a speech-to-speech API for building real-time conversational AI agents. Announced in July 2024, it combines Speechmatics' speech recognition with an LLM and text-to-speech into a single API connection, removing the need to stitch together separate transcription, inference, and synthesis services. Flow handles the real-time challenges of two-way voice conversations, including turn detection, interruption management, and multi-speaker isolation.

General
Announced: July 30, 2024
Developer: Speechmatics
Type: Voice agent API (speech-to-speech)
License: Commercial API
Documentation: docs.speechmatics.com/voice-agents-flow
GitHub: speechmatics/speechmatics-flow

Core Features

  • End-to-end speech-to-speech pipeline: STT, LLM, and TTS over a single API connection.
  • Smart turn detection: uses a small language model (SLM) to decide when a speaker's turn has ended, reducing false triggers.
  • Interruption handling: ignores unintentional interruptions and handles intentional ones gracefully.
  • Speaker locking: isolates a target speaker and filters out background voices in multi-speaker environments.
  • Function calling: connects agents to external tools, APIs, databases, and validation services.
  • Internet search: agents can query live web data (weather, news) during conversations.
  • 55+ language support: same multilingual coverage as Speechmatics' STT API.
  • Conversation moderation: real-time transcript analysis to flag or filter content.
  • Flexible deployment: private SaaS cloud or on-premises.
  • Security: ISO/IEC 27001:2022 certified, GDPR compliant.

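Function calling, as listed above, generally works by having the agent emit a structured request that names a tool and its arguments; the client runs the tool and sends the result back into the conversation. The sketch below shows only that client-side dispatch step. It assumes a hypothetical request shape of `{"name": ..., "arguments": {...}}`, which is not Flow's actual wire format, and `get_weather` is a placeholder tool.

```python
import json
from typing import Any, Callable, Dict

# Registry mapping tool names to local Python callables (hypothetical tools).
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that registers a function under a tool name."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = fn
        return fn
    return register


@tool("get_weather")
def get_weather(city: str) -> Dict[str, Any]:
    # Placeholder: a real agent would call an external weather API here.
    return {"city": city, "forecast": "sunny"}


def dispatch(request_json: str) -> str:
    """Run the tool named in a request and return a JSON-encoded result.

    Assumes a hypothetical request shape: {"name": str, "arguments": dict}.
    Errors are returned as data so the agent can tell the user what went wrong.
    """
    request = json.loads(request_json)
    fn = TOOLS.get(request.get("name", ""))
    if fn is None:
        return json.dumps({"error": f"unknown tool: {request.get('name')}"})
    try:
        result = fn(**request.get("arguments", {}))
    except TypeError as exc:  # e.g. wrong or missing arguments
        return json.dumps({"error": str(exc)})
    return json.dumps({"result": result})


print(dispatch('{"name": "get_weather", "arguments": {"city": "Cambridge"}}'))
# → {"result": {"city": "Cambridge", "forecast": "sunny"}}
```

Returning errors as JSON rather than raising keeps the voice conversation alive when the model requests an unknown tool or omits an argument.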
Pricing

Free: up to 50 hours/month
Enterprise: custom pricing; contact Speechmatics

Ecosystem and Integrations

  • Works with Vapi for no-code voice agent deployments.
  • Works with LiveKit for WebRTC-based real-time infrastructure.
  • Works with Pipecat for open-source voice pipeline orchestration.
  • Supports contact center, healthcare, drive-thru, educational assistant, and smart device use cases.

Get started with the free tier (50 hours/month) or contact Speechmatics for enterprise access at flow-help@speechmatics.com.

Speechmatics Flow AI technology Hackathon projects

Discover innovative solutions crafted with Speechmatics Flow AI technology, developed by our community members during our engaging hackathons.

Dementia ASR Screening

Overview

The Dementia ASR Screening & Multi-Factorial Risk Stratification Application is an advanced digital health portal designed to bridge the gap between AI-driven speech biomarkers and modifiable clinical risk factors. Built on a premium, mobile-first, dark-theme interface, the portal lets patients complete non-invasive, visual speech elicitation tests in seconds and generates comprehensive, weighted dementia conversion risk assessments.

Key Features & Technological Innovation

  • Lossless acoustic elicitation: a custom-built, client-side Web Audio API recorder encodes raw microphone samples directly into lossless 16-bit mono PCM WAV, a format that cloud-based automatic speech recognition engines accept without conversion.
  • Speechmatics SaaS ASR integration: speech analyses are orchestrated through Express backend proxies. Because disfluency filtering is disabled (remove_disfluencies: false), transcripts retain spoken filler words, which the portal captures and tags.
  • Acoustic pause biomarker extraction: word-level transcript timestamps are used to isolate and flag conversational hesitation gaps longer than 1.5 seconds, a key marker in acoustic cognitive screens.
  • Wiley 2025 multi-factorial stratification: a validated clinical risk algorithm based on the October 2025 study in Alzheimer's & Dementia (DOI: 10.1002/alz.70870), covering age scoring, cognitive reserve, BMI, and depression screening (PHQ-2).
  • Weighted clinical risk dial: vocal speech biomarkers (40% weight) are weighed against clinical modifiable factors (60% weight) to render a composite, color-coded Stratified Risk Indicator (Low, Moderate, or High Risk).
  • Comprehensive session review timeline: a self-healing SQLite database stores serialized disfluency details, so patients can inspect past tests and review historical reports exactly as they appeared live, alongside an interactive demographics editor.
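The project's recorder runs in the browser via the Web Audio API. The same technique (clamp float samples, scale them to signed 16-bit integers, prepend a 44-byte RIFF/WAVE header) can be sketched in Python as below. The function name and the 16 kHz default sample rate are assumptions for illustration, not details taken from the project.

```python
import struct
from typing import Sequence


def encode_wav_pcm16_mono(samples: Sequence[float], sample_rate: int = 16000) -> bytes:
    """Encode float samples in [-1.0, 1.0] as a 16-bit mono PCM WAV byte string."""
    pcm = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clamp out-of-range input
        # Scale asymmetrically so -1.0 maps to -32768 and 1.0 maps to 32767.
        value = int(s * 32768) if s < 0 else int(s * 32767)
        pcm += struct.pack("<h", value)  # little-endian signed 16-bit

    num_channels = 1
    bits_per_sample = 16
    block_align = num_channels * bits_per_sample // 8
    byte_rate = sample_rate * block_align
    data_size = len(pcm)

    # Canonical 44-byte RIFF/WAVE header; audio format 1 means uncompressed PCM.
    header = struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_size, b"WAVE",
        b"fmt ", 16, 1, num_channels, sample_rate, byte_rate, block_align, bits_per_sample,
        b"data", data_size,
    )
    return header + bytes(pcm)


wav = encode_wav_pcm16_mono([0.0, 0.5, -0.5], sample_rate=16000)
print(len(wav))  # 50: 44-byte header + 3 samples x 2 bytes
```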
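The pause biomarker described above reduces to comparing each word's start time with the previous word's end time. A minimal sketch, assuming the word-level `results` array of a Speechmatics JSON transcript in which each entry carries `type`, `start_time`, `end_time`, and `alternatives`; verify these field names against the current Speechmatics output documentation. The function name and sample data are hypothetical.

```python
from typing import Any, Dict, List

PAUSE_THRESHOLD_S = 1.5  # hesitation threshold described by the project


def find_long_pauses(
    results: List[Dict[str, Any]], threshold: float = PAUSE_THRESHOLD_S
) -> List[Dict[str, Any]]:
    """Return gaps between consecutive words that last longer than `threshold` seconds."""
    # Punctuation entries carry timestamps too; only word-to-word gaps count here.
    words = [r for r in results if r.get("type") == "word"]
    pauses = []
    for prev, nxt in zip(words, words[1:]):
        gap = nxt["start_time"] - prev["end_time"]
        if gap > threshold:
            pauses.append({
                "after": prev["alternatives"][0]["content"],
                "before": nxt["alternatives"][0]["content"],
                "start": prev["end_time"],
                "duration": round(gap, 3),
            })
    return pauses


# Hypothetical transcript fragment in the assumed results shape.
sample = [
    {"type": "word", "start_time": 0.0, "end_time": 0.4, "alternatives": [{"content": "I"}]},
    {"type": "word", "start_time": 0.5, "end_time": 0.9, "alternatives": [{"content": "went"}]},
    {"type": "word", "start_time": 2.8, "end_time": 3.1, "alternatives": [{"content": "um"}]},
    {"type": "punctuation", "start_time": 3.1, "end_time": 3.1, "alternatives": [{"content": "."}]},
]
print(find_long_pauses(sample))  # one pause, between "went" and "um" (about 1.9 s)
```

Keeping filler words in the transcript (disfluency filtering off) matters here: if "um" were stripped, the gap would be measured to the next real word and the hesitation would look longer than it was.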
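The 40/60 risk dial is a weighted sum followed by banding. A minimal sketch, assuming both component scores are already normalized to the range 0 to 1 (higher meaning more risk). The band cut-offs of 0.33 and 0.66 are hypothetical placeholders; the project's actual thresholds, and how it scores the Wiley 2025 factors, are not described here.

```python
from typing import Tuple

VOCAL_WEIGHT = 0.4     # weight for speech biomarkers, per the project description
CLINICAL_WEIGHT = 0.6  # weight for modifiable clinical factors, per the project description

# Hypothetical band cut-offs on the 0-1 composite scale.
LOW_MAX = 0.33
MODERATE_MAX = 0.66


def composite_risk(vocal_score: float, clinical_score: float) -> Tuple[float, str]:
    """Combine normalized (0-1) scores into a weighted composite and a risk tier."""
    for name, value in (("vocal_score", vocal_score), ("clinical_score", clinical_score)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    score = VOCAL_WEIGHT * vocal_score + CLINICAL_WEIGHT * clinical_score
    if score <= LOW_MAX:
        tier = "Low Risk"
    elif score <= MODERATE_MAX:
        tier = "Moderate Risk"
    else:
        tier = "High Risk"
    return score, tier


score, tier = composite_risk(vocal_score=0.5, clinical_score=0.8)
print(tier)  # High Risk (composite of about 0.68)
```

Validating the inputs up front keeps a mis-scaled score (for example, a raw PHQ-2 total) from silently skewing the composite.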

VoxCall Oracle: Live Audio Trading Agent

The latency gap in modern finance is a multi-million-dollar problem. Traditional algorithmic trading relies on text-based transcripts that humans or bots read minutes after the words are actually spoken on an earnings call. By the time the data is digested, the market has already moved.

VoxCall Oracle is an end-to-end, production-ready pipeline designed to bridge the gap between raw audio and instant market execution. Our fully autonomous agent listens to live earnings calls and executes trades in real time without human intervention. We used four core technologies to build this stack:

  1. Speechmatics: we use their speech-to-text API for real-time transcription and speaker diarization. This lets the agent know exactly *what* was said and *who* said it (e.g., separating a confident CEO from a cautious reporter).
  2. Featherless AI: to reduce the risk of AI hallucinations, we route transcript chunks into a 3-model financial ensemble via Featherless (Llama-Open-Finance, Fin-o1, and finance-chat). These specialized models vote on market sentiment to improve the accuracy of each signal.
  3. LangGraph: in automated trading, safety is critical. We built a LangGraph state machine that acts as a risk-gating firewall. Trades are approved only if the AI ensemble's confidence score exceeds a strict user-defined threshold (e.g., >75%).
  4. Kraken CLI: approved signals are routed immediately to the Kraken API for low-latency order execution.

The entire platform is wrapped in a premium glassmorphism UI deployed on Streamlit Cloud, featuring a live paper PnL tracker and execution logs. VoxCall Oracle isn't just a script; it's a fully functional, cloud-native hedge fund analyst.
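The ensemble vote and confidence gate described above can be sketched in plain Python; this is a stand-in for the logic, not LangGraph itself. The label set, the confidence rule (the majority's confidences summed over *all* voters), and the "neutral never trades" rule are assumptions, since the project does not say how its confidence score is computed.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Vote:
    model: str
    label: str         # "bullish", "bearish", or "neutral" (assumed label set)
    confidence: float  # model's self-reported confidence in [0, 1]


@dataclass
class Decision:
    label: str
    confidence: float
    approved: bool


def ensemble_decision(votes: List[Vote], threshold: float = 0.75) -> Decision:
    """Majority-vote the labels, then gate the trade on ensemble confidence."""
    if not votes:
        raise ValueError("need at least one vote")
    # On a tie, most_common returns the label seen first.
    label, _ = Counter(v.label for v in votes).most_common(1)[0]
    # Assumed rule: divide the majority's summed confidence by *all* voters,
    # so disagreement lowers the score even when the majority is very sure.
    confidence = sum(v.confidence for v in votes if v.label == label) / len(votes)
    # Assumed rule: neutral signals never trade; others need confidence > threshold.
    approved = label != "neutral" and confidence > threshold
    return Decision(label, round(confidence, 4), approved)


votes = [
    Vote("Llama-Open-Finance", "bullish", 0.90),
    Vote("Fin-o1", "bullish", 0.85),
    Vote("finance-chat", "bearish", 0.60),
]
print(ensemble_decision(votes))  # bullish, confidence 0.5833, not approved
```

Under this rule, a 2-to-1 split caps confidence at 2/3, so with three models a >75% threshold effectively requires a unanimous vote. A softer rule (for example, the mean confidence of the majority alone) would let split votes through and should be chosen deliberately.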