Speechmatics

Founded in 2006 by Dr. Tony Robinson, a Cambridge University speech recognition pioneer, Speechmatics builds infrastructure to understand every voice globally. The company's core mission is inclusive, multilingual speech AI, covering transcription, real-time voice agents, and on-device deployment. Speechmatics serves enterprise clients across media, healthcare, financial services, and contact centers.

General
Company	Speechmatics
Founded	2006 by Dr. Tony Robinson
CEO	Katy Wigdahl
Headquarters	Cambridge, United Kingdom
Website	speechmatics.com
Documentation	docs.speechmatics.com
GitHub	github.com/speechmatics
Type	Speech AI / B2B SaaS

Core Products

Speechmatics API (Speech-to-Text)

The Speechmatics API provides batch and real-time transcription across 55+ languages, powered by the Ursa 2 model released in October 2024. It supports speaker diarization, custom dictionaries, automatic translation, and Voice Intelligence add-ons (summarization, sentiment analysis, entity recognition) with no retraining required.
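As a rough illustration of how the batch options described above fit together, the sketch below assembles a job configuration of the kind a batch transcription request carries. The field names (`transcription_config`, `diarization`, `additional_vocab`) are written from recollection of the public documentation and should be checked against docs.speechmatics.com before use. The helper `build_batch_config` is hypothetical, and no request is sent.

```python
import json


def build_batch_config(language="en", speaker_diarization=True, custom_words=None):
    """Build a JSON-serializable batch transcription config (illustrative only)."""
    transcription_config = {"language": language}
    if speaker_diarization:
        # Assumed field and value for speaker diarization; verify in the docs.
        transcription_config["diarization"] = "speaker"
    if custom_words:
        # Assumed shape of custom dictionary entries: one "content" key per phrase.
        transcription_config["additional_vocab"] = [
            {"content": word} for word in custom_words
        ]
    return {"type": "transcription", "transcription_config": transcription_config}


config = build_batch_config(custom_words=["Speechmatics", "Ursa"])
print(json.dumps(config, indent=2))
```

Keeping the configuration in a plain dictionary makes it easy to log, diff, and adjust per job before it is handed to whichever SDK or HTTP client is used.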

Speechmatics Flow

Flow is a voice agent API that combines Speechmatics' speech-to-text with an LLM and text-to-speech into a single real-time pipeline. Developers connect through a single API call to build conversational AI agents with smart turn detection, interruption handling, and function calling support.

Developer Resources

Speechmatics provides SDKs for Python, JavaScript/TypeScript, and .NET, along with a developer portal and free tier to get started without a credit card.

Helpful Links

Documentation — official API reference, quickstarts, and integration guides
GitHub — open-source SDKs and client libraries
Developer Portal — API key management and usage dashboard
Pricing — free tier and pay-as-you-go rates

Key Features

Multilingual transcription across 55+ languages: Speechmatics' Ursa 2 model leads accuracy benchmarks in 62% of supported languages on the FLEURS dataset, with 7.88% WER on Kincaid46 for English, surpassing human-level accuracy on that benchmark.

Flexible deployment: Speechmatics runs on private SaaS cloud, on-premises, on-device, and via Docker or Kubernetes, making it suitable for data-sensitive industries like healthcare and finance.

Voice Intelligence add-ons: Summarization, sentiment analysis, topic detection, chapter generation, and entity recognition layer on top of transcription without requiring additional integration work.
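The accuracy figures above are quoted as word error rate (WER), which is conventionally defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below computes it with a word-level edit distance. It is a minimal illustration, not the scoring pipeline behind the published figures: real benchmarks normalize casing, punctuation, and number formats first, which this example does not attempt.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the transcripts, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(f"{wer:.2%}")  # prints 16.67%
```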

Use Cases

Contact center automation: Real-time transcription and sentiment analysis during calls, combined with Flow for automated voice agent handling of common queries.

Clinical transcription: Speechmatics' Medical Model (launched 2024) targets 93% real-time accuracy and 96% medical keyword recall for English, German, Danish, and Norwegian.

Media and broadcast: Batch transcription of audio and video files for subtitling, archiving, and content search across multiple languages.

Speechmatics AI Technologies Hackathon projects

Discover innovative solutions crafted with Speechmatics AI Technologies, developed by our community members during our engaging hackathons.

ForgeAI

ForgeAI is a hardware-aware AI model optimization platform that automatically finds the fastest, most efficient version of a model for a specific GPU — starting with AMD MI300X. Instead of manually tuning models for each accelerator, ForgeAI runs a 7-phase optimization pipeline: architecture search finds the best candidate structures, knowledge distillation transfers accuracy from a teacher model, pruning removes redundant weights, quantization compresses from FP32 to INT8, benchmarking measures real performance on target hardware, Pareto analysis identifies optimal latency-accuracy tradeoffs, and Optuna hyperparameter tuning auto-optimizes across 6 parameters with 50 trials and early stopping. The platform consists of a FastAPI backend with 9 optimization modules, a Next.js 14 frontend, and WebSocket-based live progress streaming. Users upload a PyTorch checkpoint, select target hardware, set constraints (max latency, max memory, min accuracy), and watch the pipeline execute in real time. Results include a Pareto frontier chart, before/after performance comparison, and export to ONNX or TorchScript. ForgeAI targets the $100B+ AI inference market where hardware-specific optimization is still done manually. Unlike Neural Magic and OpenVINO (CPU-focused, tool-by-tool), ForgeAI is AMD-native, full-pipeline, and open source under Apache 2.0.
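The Pareto analysis step described above can be pictured as filtering out every candidate model that some other candidate beats on both latency and accuracy. The sketch below is one minimal way to do that, not ForgeAI's actual code. The `Candidate` type, the candidate names, and all numbers are invented for illustration.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    latency_ms: float  # lower is better
    accuracy: float    # higher is better


def pareto_frontier(candidates):
    """Return the candidates that no other candidate dominates, sorted by latency."""
    frontier = []
    for c in candidates:
        dominated = any(
            o.latency_ms <= c.latency_ms
            and o.accuracy >= c.accuracy
            and (o.latency_ms < c.latency_ms or o.accuracy > c.accuracy)
            for o in candidates
        )
        if not dominated:
            frontier.append(c)
    return sorted(frontier, key=lambda c: c.latency_ms)


# Hypothetical benchmark results for four variants of one model.
candidates = [
    Candidate("fp32-baseline", 12.0, 0.920),
    Candidate("pruned", 9.0, 0.915),
    Candidate("int8", 5.0, 0.900),
    Candidate("int8-aggressive", 5.5, 0.880),
]

for c in pareto_frontier(candidates):
    print(c.name, c.latency_ms, c.accuracy)
```

Here `int8-aggressive` drops out because `int8` is both faster and more accurate. The remaining three are the tradeoffs a user would actually choose between under latency and accuracy constraints.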

Speech Transcription and Recording Assistant

ASTRA, the Adaptive Speech Transcription and Recording Assistant, is a hybrid Windows desktop application designed to turn live meetings, interviews, hearings, trainings, consultations, and uploaded recordings into organized, reviewable, and exportable documentation. Before transcription begins, ASTRA prepares audio locally using FFmpeg and Silero Voice Activity Detection. Silent and non-speech portions are skipped, while useful speech is isolated and compressed before online transmission. This reduces upload size, unnecessary AI processing, and provider usage. Long recordings are divided into manageable sections, allowing users to monitor progress, replay audio, retry failed parts, resume interrupted work, and avoid restarting an entire transcription because of one failed section. Users can choose between online and offline processing. Offline mode runs Whisper locally for privacy, poor connectivity, or reduced cloud dependence. Online mode connects to the ASTRA Server through a license-protected API. The server validates access, accepts individual or batched audio clips, creates asynchronous transcription jobs, and returns job status while processing continues. It can route requests across multiple configured speech-to-text providers and automatically try another provider when the preferred service becomes unavailable. This server layer keeps provider credentials away from the desktop app and allows models or providers to be changed without rebuilding the client. After transcription, local Sherpa-ONNX speaker diarization adds anonymous speaker labels and keeps conversations easier to follow across sections. ASTRA also supports transcript polishing, summaries, timestamps, playback, speech-detection logs, processing status, and exportable output. The result is a practical transcription workflow that combines local privacy, cloud performance, provider resilience, and efficient AI resource usage for real documentation work.
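The section-by-section processing described above, where a failed part can be retried and interrupted work resumed without restarting the whole recording, can be sketched as follows. This is an illustrative pattern only, not ASTRA's implementation. The `transcribe_sections` helper, the stub transcribers, and the use of `RuntimeError` to signal a failed section are all assumptions.

```python
def transcribe_sections(sections, transcribe, results=None, max_attempts=3):
    """Transcribe each section independently.

    Returns (results, failed), where results maps section index to text and
    failed lists the indices that still have no transcript.
    """
    results = dict(results or {})
    failed = []
    for index, section in enumerate(sections):
        if index in results:
            continue  # finished in an earlier run, so resuming skips it
        for _ in range(max_attempts):
            try:
                results[index] = transcribe(section)
                break
            except RuntimeError:
                pass  # try this section again, up to max_attempts
        else:
            failed.append(index)  # every attempt failed; other sections continue
    return results, failed


def failing_on_b(section):
    """Stub transcriber that never succeeds on section 'b'."""
    if section == "b":
        raise RuntimeError("provider unavailable")
    return section.upper()


sections = ["a", "b", "c"]
results, failed = transcribe_sections(sections, failing_on_b)
print(results, failed)  # sections 0 and 2 done, section 1 failed

# Resume later with a working transcriber: only section 1 is processed.
results, failed = transcribe_sections(sections, str.upper, results=results)
print(results, failed)
```

Passing the earlier `results` back in is what makes the second call a resume rather than a restart.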

Apohara Synthex

AI agents now run on the live web, but prompt injection is the number-one risk on the OWASP LLM Top 10, and most teams cannot prove what their agents ingested, or that it was safe. Apohara Synthex fixes that. Synthex is the provenance and security layer for the web data an AI agent consumes. It fetches across the full Bright Data spectrum: Web Unlocker, the Web Scraper API, SERP API, Scraping Browser, and the MCP Server. We didn't just use Bright Data; we improved it, contributing PR #140 upstream. Every fetch runs a layered defense before anything reaches a model. A deterministic regex pass and Qwen3Guard on Featherless form a high-recall net; NVIDIA's NemoGuard, selected by a measured benchmark, is the low-false-positive block gate; and a reasoning model on the AI/ML API knows the difference between describing an attack and executing one. Clean content is classified across four lenses, then sealed into an enterprise Evidence Report. The seal is real and shipped: an Ed25519 signature, an RFC 3161 DigiCert timestamp, an offline-verifiable Sigstore Rekor transparency log, and C2PA Content Credentials. Anyone can verify it in seconds with openssl, the industry's own c2patool, and a public ledger. No trust required. Cognee adds memory across re-scrapes, TriggerWare turns it into an automated monitor, and Kiro runs our continuous test and QA hooks. Synthex spans all three tracks, Security & Compliance, Finance & Market Intelligence, and GTM Intelligence, built for the CISO, CFO, compliance lead, and underwriter who need evidence they can defend to a board or a regulator. The average data breach costs 4.44 million dollars; Synthex seals an evidence artifact for a fraction of a cent. Everything signed, nothing trusted, and every number ships with a command to reproduce it.

WebDataOS

WebDataOS turns public web signals into sourced intelligence briefs for Security, GTM, and Finance teams. Enterprise AI agents fail on the live web: bot detection, JavaScript rendering, geo-blocks, and stale data break scrapers, and even when data arrives, someone still has to decide what matters. WebDataOS addresses both problems with a seven-step pipeline. Users submit tasks via text, voice, or audio, and Speechmatics transcribes the spoken input. Cognee checks its knowledge graph for prior context. The Bright Data gateway retrieves fresh evidence across five tools (SERP API, Web Scraper, Web Unlocker, Scraping Browser, and MCP Server), with automatic failure detection and recovery routing between them. OpenAI synthesizes contextual analysis from evidence and memory. The reasoning engine assesses each finding against organizational context: contracts, risk thresholds, financial exposure, and deadlines. Material findings generate action proposals (draft emails, scheduled reviews, register updates), with human approval gates for high-stakes decisions. Outcomes are recorded to calibrate future accuracy. The platform serves developers via API and business users via a web UI. Three domains are available as Core (pick 1), Pro (pick 2), or Enterprise OS (all 3 unified). It is deployed live on Vercel and Vultr with all partner integrations configured, and a single submission covers all three hackathon tracks and all four partner prizes.
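The failure detection and recovery routing between retrieval tools mentioned above can be illustrated with a simple ordered fallback: try each tool in turn, treat an exception or an empty response as a failure, and move on to the next. This is a sketch under those assumptions, not the project's gateway code. The `fetch_with_fallback` helper and the stub fetchers are hypothetical and do not call any real Bright Data client.

```python
def fetch_with_fallback(url, tools):
    """Try each (name, fetch) pair in order and return (name, content) on first success.

    A tool counts as failed if it raises or returns empty content.
    """
    errors = {}
    for name, fetch in tools:
        try:
            content = fetch(url)
        except Exception as exc:  # broad on purpose: any tool error triggers rerouting
            errors[name] = str(exc)
            continue
        if content:
            return name, content
        errors[name] = "empty response"
    raise RuntimeError(f"all tools failed for {url}: {errors}")


# Stub fetchers standing in for real retrieval tools.
def blocked(url):
    raise ConnectionError("blocked by bot detection")


def empty(url):
    return ""


def rendered(url):
    return f"<html>rendered {url}</html>"


tools = [("unlocker", blocked), ("scraper", empty), ("browser", rendered)]
print(fetch_with_fallback("https://example.com", tools))
```

Collecting the per-tool errors before raising keeps a record of why each route failed, which is useful when every option is exhausted.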

Sentra AI - Live GTM Intelligence OS

Enterprise GTM, strategy, and sales teams lose days stitching competitor pricing, SERPs, product pages, hiring signals, and macro news across spreadsheets, ad-hoc searches, and static battlecards. By the time a slide deck is updated, the market has already moved. Sentra AI is a GTM intelligence operating system built on Next.js and Supabase. Users describe what to watch in plain language; Sentra infers monitor intent, routes collection through Bright Data (SERP API, Web Unlocker, scraper/browser zones, and MCP search + scrape), and synthesizes evidence-backed risks, opportunities, and recommendations with AI/ML API and Featherless for document-heavy workflows. The dashboard surfaces live briefings and market context; chat answers GTM questions with visible provider attribution; Alerts run on-demand or on a schedule with executive reports, webhooks, and CRM-style export. Visual Forensics and Face Intelligence support image authenticity investigations. World Engine adds macro scenario views. Speechmatics enables spoken briefings. A unified History page stores every analysis run for review. Target audience: GTM leaders, competitive intelligence, revenue operations, and founders who need defensible, current external evidence—not generic LLM guesses. Production mode prioritizes live Bright Data collection so judges and users see real web evidence when zones are configured.