Speechmatics API

The Speechmatics API is the company's core speech-to-text service, providing batch file transcription and real-time streaming transcription via WebSocket. Powered by the Ursa 2 model (released October 2024), it supports 55+ languages and dialects, speaker diarization, automatic translation into 30+ target languages, and a suite of Voice Intelligence add-ons. Transcription requires no model fine-tuning; custom dictionaries of up to 1,000 words take effect immediately.

General
Release date	Generally available; Ursa 2 model released Oct 2024
Developer	Speechmatics
Type	Cloud speech-to-text API (batch and real-time)
License	Commercial API
Documentation	docs.speechmatics.com/speech-to-text
GitHub	speechmatics/speechmatics-python-sdk

Core Features

55+ languages and dialects: broad multilingual support including accent and dialect variants.
Two accuracy tiers: Enhanced (optimized for accuracy) and Standard (optimized for speed and cost).
Speaker diarization: multi-speaker detection included at no extra cost in all plans.
Custom dictionary: up to 1,000 domain-specific words added without retraining.
Automatic translation: transcripts translated into 30+ target languages via AI.
Voice Intelligence add-ons: summarization, sentiment analysis, topic detection, chapter generation, and entity recognition.
Audio events detection: identifies non-speech events in audio.
Smart formatting: formats numbers, dates, currencies, and capitalization automatically.
Sub-1-second real-time latency: streaming transcription via WebSocket.
Flexible deployment: cloud API, on-premises, on-device, Docker, and Kubernetes.
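
As a rough illustration of how the features above could map onto a batch request, the sketch below assembles a job configuration in Python. The field names (`operating_point`, `diarization`, `additional_vocab`, `translation_config`) are assumptions based on the commonly documented batch config format and should be checked against the Batch API Reference. `build_job_config` is a hypothetical helper, not part of any official SDK.

```python
import json

MAX_CUSTOM_WORDS = 1000  # custom dictionary limit stated in the feature list


def build_job_config(language, custom_words=(), enhanced=True,
                     diarize=True, translate_to=()):
    """Build a batch job config dict (hypothetical helper, assumed field names)."""
    words = list(custom_words)
    if len(words) > MAX_CUSTOM_WORDS:
        raise ValueError(
            f"custom dictionary holds at most {MAX_CUSTOM_WORDS} words"
        )

    transcription = {
        "language": language,
        # Assumed mapping of the Enhanced / Standard accuracy tiers.
        "operating_point": "enhanced" if enhanced else "standard",
    }
    if diarize:
        # Assumed value for speaker diarization.
        transcription["diarization"] = "speaker"
    if words:
        # Assumed shape of custom dictionary entries.
        transcription["additional_vocab"] = [{"content": w} for w in words]

    config = {"type": "transcription", "transcription_config": transcription}
    if translate_to:
        config["translation_config"] = {"target_languages": list(translate_to)}
    return config


if __name__ == "__main__":
    example = build_job_config("en", ["Ursa", "Kincaid46"], translate_to=["de"])
    print(json.dumps(example, indent=2))
```

The resulting dictionary would typically be serialized to JSON and sent alongside the audio file when creating a job. The endpoint and authentication details are left to the official reference.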

Accuracy Benchmarks (Ursa 2)

Metric	Result
WER on Kincaid46 (English)	7.88% (surpasses human-level on that test)
WER improvement vs. previous Ursa	18% reduction across 50+ languages
FLEURS dataset leadership	Leads in 62% of supported languages
Head-to-head vs. other providers	Wins 88% of comparisons
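
For readers unfamiliar with the metric, word error rate (WER) is conventionally computed as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below is a minimal illustration of that standard definition using word-level edit distance. It is not the evaluation code behind the figures in the table, which would normally also apply text normalization before scoring.

```python
def word_error_rate(reference, hypothesis):
    """Return WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 / 6.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```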

Pricing

Tier	Included	Rate / notes
Free	480 minutes/month	No credit card required
Pro	Up to 6,000 hours/month	From $0.24/hour (with discount)
Enterprise	Unlimited scale, no rate limits	Custom

Volume discounts apply automatically above 500 hours per month per transcription type.
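
The figures above can be turned into quick back-of-the-envelope arithmetic. The sketch below converts the free allowance to hours and estimates a Pro bill at a flat hourly rate. It assumes the "from" price of $0.24/hour applies uniformly, which may not hold for every tier or transcription type, and it does not model the volume discounts because their size is not stated here.

```python
FREE_MINUTES_PER_MONTH = 480  # from the pricing table
PRO_RATE_PER_HOUR = 0.24      # "from" price; the actual rate may be higher


def free_tier_hours():
    """Free allowance expressed in hours per month."""
    return FREE_MINUTES_PER_MONTH / 60


def estimate_pro_cost(hours, rate_per_hour=PRO_RATE_PER_HOUR):
    """Estimate a monthly cost in USD at an assumed flat hourly rate."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return round(hours * rate_per_hour, 2)


if __name__ == "__main__":
    print(free_tier_hours())       # 480 minutes is 8 hours
    print(estimate_pro_cost(100))  # 100 hours at the assumed flat rate
```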

Tools and Resources

Batch API Reference: REST API for asynchronous file transcription jobs.
Real-time API Reference: WebSocket API reference for streaming transcription.
Python SDK: official SDK covering STT batch, real-time, and TTS.
JavaScript/TypeScript SDK: official browser and Node.js SDK.
Developer Portal: API key management and usage monitoring.

Ecosystem and Integrations

Integrates with LiveKit, Pipecat, and Vapi for voice pipeline deployments.
Available on Microsoft Azure Marketplace.
Compatible with on-device and edge deployments via Docker or Kubernetes.
Medical Model variant targets clinical transcription in English, German, Danish, and Norwegian.

Start building with the free tier (no credit card required) and explore the full API via docs.speechmatics.com.

Speechmatics API Hackathon Projects

Discover solutions built with the Speechmatics API by community members during our hackathons.

ForgeAI

ForgeAI is a hardware-aware AI model optimization platform that automatically finds the fastest, most efficient version of a model for a specific GPU — starting with AMD MI300X. Instead of manually tuning models for each accelerator, ForgeAI runs a 7-phase optimization pipeline: architecture search finds the best candidate structures, knowledge distillation transfers accuracy from a teacher model, pruning removes redundant weights, quantization compresses from FP32 to INT8, benchmarking measures real performance on target hardware, Pareto analysis identifies optimal latency-accuracy tradeoffs, and Optuna hyperparameter tuning auto-optimizes across 6 parameters with 50 trials and early stopping. The platform consists of a FastAPI backend with 9 optimization modules, a Next.js 14 frontend, and WebSocket-based live progress streaming. Users upload a PyTorch checkpoint, select target hardware, set constraints (max latency, max memory, min accuracy), and watch the pipeline execute in real time. Results include a Pareto frontier chart, before/after performance comparison, and export to ONNX or TorchScript. ForgeAI targets the $100B+ AI inference market where hardware-specific optimization is still done manually. Unlike Neural Magic and OpenVINO (CPU-focused, tool-by-tool), ForgeAI is AMD-native, full-pipeline, and open source under Apache 2.0.
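
The Pareto analysis step mentioned above can be illustrated with a small sketch. It keeps only the model variants that no other variant beats on both latency and accuracy. The `Candidate` type, the variant names, and the numbers are hypothetical and chosen for illustration only. This is a generic Pareto-frontier filter, not ForgeAI's actual implementation.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    latency_ms: float  # lower is better
    accuracy: float    # higher is better


def pareto_frontier(candidates):
    """Return the candidates not dominated on (latency, accuracy)."""
    frontier = []
    best_accuracy = float("-inf")
    # Visit fastest first; on equal latency, the more accurate one comes first.
    for cand in sorted(candidates, key=lambda c: (c.latency_ms, -c.accuracy)):
        # A slower candidate earns its place only by being strictly more accurate.
        if cand.accuracy > best_accuracy:
            frontier.append(cand)
            best_accuracy = cand.accuracy
    return frontier


if __name__ == "__main__":
    variants = [  # hypothetical measurements
        Candidate("fp32-baseline", 12.0, 0.91),
        Candidate("int8-quantized", 5.0, 0.89),
        Candidate("pruned", 8.0, 0.88),
        Candidate("distilled", 7.0, 0.90),
    ]
    for cand in pareto_frontier(variants):
        print(cand.name)  # int8-quantized, distilled, fp32-baseline
```

In this made-up example the pruned variant drops out because the distilled variant is both faster and more accurate.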

Speech Transcription and Recording Assistant

ASTRA, the Adaptive Speech Transcription and Recording Assistant, is a hybrid Windows desktop application designed to turn live meetings, interviews, hearings, trainings, consultations, and uploaded recordings into organized, reviewable, and exportable documentation. Before transcription begins, ASTRA prepares audio locally using FFmpeg and Silero Voice Activity Detection. Silent and non-speech portions are skipped, while useful speech is isolated and compressed before online transmission. This reduces upload size, unnecessary AI processing, and provider usage. Long recordings are divided into manageable sections, allowing users to monitor progress, replay audio, retry failed parts, resume interrupted work, and avoid restarting an entire transcription because of one failed section. Users can choose between online and offline processing. Offline mode runs Whisper locally for privacy, poor connectivity, or reduced cloud dependence. Online mode connects to the ASTRA Server through a license-protected API. The server validates access, accepts individual or batched audio clips, creates asynchronous transcription jobs, and returns job status while processing continues. It can route requests across multiple configured speech-to-text providers and automatically try another provider when the preferred service becomes unavailable. This server layer keeps provider credentials away from the desktop app and allows models or providers to be changed without rebuilding the client. After transcription, local Sherpa-ONNX speaker diarization adds anonymous speaker labels and keeps conversations easier to follow across sections. ASTRA also supports transcript polishing, summaries, timestamps, playback, speech-detection logs, processing status, and exportable output. The result is a practical transcription workflow that combines local privacy, cloud performance, provider resilience, and efficient AI resource usage for real documentation work.
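
The provider fallback described above can be sketched as a loop over an ordered list of providers. Everything here is hypothetical: `ProviderUnavailable`, `transcribe_with_fallback`, and the stand-in provider functions are illustrative names, and the real ASTRA Server may detect failures and route requests quite differently.

```python
class ProviderUnavailable(Exception):
    """Raised by a provider callable when its service cannot be reached."""


def transcribe_with_fallback(audio_clip, providers):
    """Try each (name, callable) pair in order and return the first success.

    Returns a (provider_name, transcript) tuple. Raises RuntimeError when
    every configured provider is unavailable.
    """
    errors = {}
    for name, transcribe in providers:
        try:
            return name, transcribe(audio_clip)
        except ProviderUnavailable as exc:
            errors[name] = str(exc)  # remember why this provider was skipped
    raise RuntimeError(f"all providers failed: {errors}")


if __name__ == "__main__":
    def preferred(clip):
        # Stand-in for a provider that is currently down.
        raise ProviderUnavailable("service timed out")

    def backup(clip):
        # Stand-in for a working provider.
        return f"transcript of {len(clip)} bytes"

    print(transcribe_with_fallback(b"\x00" * 16, [("preferred", preferred),
                                                  ("backup", backup)]))
```

Keeping this logic on the server side fits the design described above, where provider credentials stay off the desktop client.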

Apohara Synthex

AI agents now run on the live web, but prompt injection is the number-one risk on the OWASP LLM Top 10, and most teams cannot prove what their agents ingested, or that it was safe. Apohara Synthex fixes that. Synthex is the provenance and security layer for the web data an AI agent consumes. It fetches across the full Bright Data spectrum: Web Unlocker, the Web Scraper API, SERP API, Scraping Browser, and the MCP Server. We didn't just use Bright Data; we improved it, contributing PR #140 upstream. Every fetch runs a layered defense before anything reaches a model. A deterministic regex pass and Qwen3Guard on Featherless form a high-recall net; NVIDIA's NemoGuard, selected by a measured benchmark, is the low-false-positive block gate; and a reasoning model on the AI/ML API knows the difference between describing an attack and executing one. Clean content is classified across four lenses, then sealed into an enterprise Evidence Report. The seal is real and shipped: an Ed25519 signature, an RFC 3161 DigiCert timestamp, an offline-verifiable Sigstore Rekor transparency log, and C2PA Content Credentials. Anyone can verify it in seconds with openssl, the industry's own c2patool, and a public ledger. No trust required. Cognee adds memory across re-scrapes, TriggerWare turns it into an automated monitor, and Kiro runs our continuous test and QA hooks. Synthex spans all three tracks, Security & Compliance, Finance & Market Intelligence, and GTM Intelligence, built for the CISO, CFO, compliance lead, and underwriter who need evidence they can defend to a board or a regulator. The average data breach costs 4.44 million dollars; Synthex seals an evidence artifact for a fraction of a cent. Everything signed, nothing trusted, and every number ships with a command to reproduce it.

WebDataOS

WebDataOS turns public web signals into sourced intelligence briefs for Security, GTM, and Finance teams. Enterprise AI agents fail on the live web. Bot detection, JavaScript rendering, geo-blocks, and stale data break scrapers. Even when data arrives, someone still has to decide what matters. WebDataOS solves both problems. The system runs a seven-step pipeline. Users submit tasks via text, voice, or audio. Speechmatics transcribes. Cognee checks its knowledge graph for prior context. The Bright Data gateway retrieves fresh evidence across five tools — SERP API, Web Scraper, Web Unlocker, Scraping Browser, and MCP Server — with automatic failure detection and recovery routing between them. OpenAI synthesizes contextual analysis from evidence and memory. The reasoning engine assesses each finding against organizational context: contracts, risk thresholds, financial exposure, and deadlines. Material findings generate action proposals — draft emails, schedule reviews, update registers — with human approval gates for high-stakes decisions. Outcomes are recorded to calibrate future accuracy. The platform serves developers via API and business users via web UI. Three domains are available as Core (pick 1), Pro (pick 2), or Enterprise OS (all 3 unified). Deployed live on Vercel and Vultr with all partner integrations configured. Covers all three hackathon tracks and all four partner prizes from one submission.

Sentra AI - Live GTM Intelligence OS

Enterprise GTM, strategy, and sales teams lose days stitching competitor pricing, SERPs, product pages, hiring signals, and macro news across spreadsheets, ad-hoc searches, and static battlecards. By the time a slide deck is updated, the market has already moved. Sentra AI is a GTM intelligence operating system built on Next.js and Supabase. Users describe what to watch in plain language; Sentra infers monitor intent, routes collection through Bright Data (SERP API, Web Unlocker, scraper/browser zones, and MCP search + scrape), and synthesizes evidence-backed risks, opportunities, and recommendations with AI/ML API and Featherless for document-heavy workflows. The dashboard surfaces live briefings and market context; chat answers GTM questions with visible provider attribution; Alerts run on-demand or on a schedule with executive reports, webhooks, and CRM-style export. Visual Forensics and Face Intelligence support image authenticity investigations. World Engine adds macro scenario views. Speechmatics enables spoken briefings. A unified History page stores every analysis run for review. Target audience: GTM leaders, competitive intelligence, revenue operations, and founders who need defensible, current external evidence—not generic LLM guesses. Production mode prioritizes live Bright Data collection so judges and users see real web evidence when zones are configured.

EROS - External Reality OS

EROS (External Reality Operating System) is a next-generation enterprise intelligence platform designed to help organizations understand and navigate the constantly changing external world. While companies have ERP systems for internal operations, CRM systems for customer relationships, and BI platforms for internal analytics, they lack a unified system capable of continuously monitoring, interpreting, and reasoning about external reality. Critical business signals such as competitor movements, supplier risks, market shifts, regulatory changes, pricing updates, technology adoption, and emerging opportunities already exist across the web, but they remain fragmented, unstructured, and difficult to operationalize. EROS solves this challenge by leveraging Bright Data's web intelligence infrastructure to collect, structure, and analyze public information at scale. The platform creates a living External Reality Twin for every monitored entity, including customers, prospects, vendors, suppliers, competitors, technologies, industries, and markets. Using a layered intelligence architecture, EROS transforms raw web data into evidence, evidence into signals, signals into events, and events into actionable business intelligence. The platform combines knowledge graphs, organizational memory, causal reasoning, pattern detection, and future-ready multi-agent intelligence to help organizations answer critical questions: What changed? Why did it change? How confident are we? What evidence supports this conclusion? What is likely to happen next? What action should we take? By turning the internet into a continuously updated intelligence layer, EROS enables sales teams to identify buying signals earlier, procurement teams to reduce supplier risk, security teams to detect external threats faster, and executives to make strategic decisions with real-time context. EROS transforms the web from a source of information into a system of enterprise intelligence.