Cloudflare Workers AI

Run machine learning models, powered by serverless GPUs, on Cloudflare's global network. Workers AI allows you to run AI models in a serverless way, without having to worry about scaling, maintaining, or paying for unused infrastructure. You can invoke models running on GPUs on Cloudflare's network from your own code: from Workers, from Pages, or from anywhere via the Cloudflare API.
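
To illustrate the "anywhere via the Cloudflare API" path, here is a minimal Python sketch that uses only the standard library. It assumes the REST endpoint `/accounts/{account_id}/ai/run/{model}` authenticated with a bearer API token, and a text-generation model that accepts a `prompt` field. The model ID is only an example and the function names are illustrative, so check the Workers AI documentation for the current catalog and input schema.

```python
import json
import urllib.request

# Base URL of the Cloudflare REST API (v4).
API_BASE = "https://api.cloudflare.com/client/v4"


def build_run_request(account_id: str, api_token: str, model: str,
                      prompt: str) -> urllib.request.Request:
    """Build a POST request that runs `model` on Workers AI with a text prompt."""
    url = f"{API_BASE}/accounts/{account_id}/ai/run/{model}"
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_token}",  # API token with Workers AI access
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_model(account_id: str, api_token: str, model: str, prompt: str) -> dict:
    """Send the request and return the decoded JSON response body."""
    request = build_run_request(account_id, api_token, model, prompt)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (needs a real account ID and token, so it is left commented out):
# reply = run_model("<ACCOUNT_ID>", "<API_TOKEN>",
#                   "@cf/meta/llama-3.1-8b-instruct", "What is edge inference?")
```

The response comes back as a JSON envelope whose `result` field holds the model output. Its exact shape depends on the model, so inspect it rather than relying on this sketch.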

General
Author	Cloudflare, Inc.
Website	Cloudflare Workers AI
Documentation	https://developers.cloudflare.com/workers-ai/
Type	Serverless AI Inference Platform
Launch Year	2023
GPU Network	180+ cities globally

Features

50+ Open-Source Models

Text generation (Llama, Mistral, and more)
Text embeddings and classification
Image generation and classification
Automatic speech recognition
Translation models
Object detection capabilities

Serverless Infrastructure

Pay-for-what-you-use pricing model
Automatic scaling with demand
No infrastructure management required
Fast cold start times with V8 isolates

Global Edge Network

AI inference close to users for low latency
Models available in 180+ cities worldwide
Reduced network bottlenecks
Consistent performance globally

Developer Platform Integration

Seamless integration with Cloudflare Workers
Works with Pages for full-stack AI applications
REST API for platform-agnostic access
Integration with Vectorize (vector database)
AI Gateway for monitoring and control

Key Capabilities

Edge AI Computing: Run AI models at the network edge for minimal latency
Serverless GPU Access: Access powerful GPU infrastructure without provisioning
Model Catalog: Curated selection of popular open-source AI models
Real-time Inference: Low-latency AI processing for interactive applications
Global Deployment: Deploy once, run everywhere on Cloudflare's network
Integrated Ecosystem: Works with R2 storage, D1 database, and other Cloudflare services

Use Cases

Building AI-powered chatbots and conversational interfaces
Real-time content moderation and classification
Image and video processing at scale
Personalization and recommendation engines
Automated translation and localization
Voice recognition and text-to-speech applications
RAG (Retrieval-Augmented Generation) systems
AI-powered API endpoints and microservices

Supported Model Categories

Large Language Models: For text generation and chat applications
Embedding Models: For semantic search and similarity matching
Image Models: For generation, classification, and analysis
Speech Models: For transcription and synthesis
Vision Models: For object detection and recognition
Translation Models: For multilingual content processing
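
Embedding models make semantic search a matter of comparing vectors. The sketch below ranks documents by cosine similarity. It assumes the vectors were already produced by an embedding model (for example, one from the Workers AI catalog) and uses tiny hand-written vectors only for illustration. In a deployed system, a vector database such as Vectorize would perform this lookup.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_documents(query_vec: List[float], doc_vecs: Dict[str, List[float]],
                   top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k (doc_id, score) pairs, most similar first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

Cosine similarity ignores vector length and compares only direction, which is the usual choice for text embeddings.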

Cloudflare Workers AI Hackathon Projects

Discover innovative solutions built with Cloudflare Workers AI by our community members during our hackathons.

Continuity Agent

Continuity Agent is an offline-first backend that helps small Ethiopian businesses (shops, restaurants, hotel front desks) keep recording sales even when the internet or mobile payment networks such as Telebirr and CBE Birr go down. This is a frequent, real operational risk: when connectivity drops, sales simply stop being recorded, forcing merchants back to paper records and manual reconciliation. The system lets a merchant record a sale instantly to a local SQLite queue, regardless of connectivity. When the connection returns, a hybrid AI routing engine reconciles every queued transaction: routine sales are classified and cleared by a fast, inexpensive model (Llama 3.1 8B via Fireworks AI on AMD Instinct GPUs), while unusual or high-value transactions are escalated to a more powerful model (Llama 3.1 70B) that writes an audit note for human review. This hybrid approach is not just a technical choice; it directly determines whether the product is affordable for a small merchant operating on thin margins. Based on real logged transaction data from the /metrics endpoint, this routing approach delivers an estimated 86% cost reduction compared with routing every transaction through the larger model. If the Fireworks API is ever unavailable, the system automatically falls back to rule-based routing, so the merchant's core workflow (record, queue, sync) never breaks, even in a worst-case scenario. This reliability mirrors the promise the product makes to its users. Built solo in Addis Ababa, Ethiopia, the project targets a problem that affects hundreds of thousands of small merchants locally, and the same underlying pattern, mobile-first commerce outrunning infrastructure reliability, applies broadly across emerging markets.
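
The record-then-reconcile flow can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not the project's code. The table schema, the `HIGH_VALUE_THRESHOLD` value, and the `classify` callable, which stands in for the call to the small model, are all hypothetical. If no model client is supplied, or the call raises a network error, the sketch falls back to a simple rule, mirroring the fallback described above.

```python
import sqlite3
from typing import Callable, Dict, Optional

# Hypothetical cut-off (in birr) above which a sale is escalated to the large model.
HIGH_VALUE_THRESHOLD = 10_000.0


def init_queue(conn: sqlite3.Connection) -> None:
    """Create the local queue; `route` stays NULL until a sale is reconciled."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sales (
               id     INTEGER PRIMARY KEY AUTOINCREMENT,
               amount REAL NOT NULL,
               note   TEXT NOT NULL DEFAULT '',
               route  TEXT
           )"""
    )


def record_sale(conn: sqlite3.Connection, amount: float, note: str = "") -> int:
    """Write a sale locally; this works with or without connectivity."""
    cur = conn.execute("INSERT INTO sales (amount, note) VALUES (?, ?)", (amount, note))
    conn.commit()
    return cur.lastrowid


def rule_based_route(amount: float, note: str) -> str:
    """Offline fallback: escalate purely on the sale amount."""
    return "large" if amount >= HIGH_VALUE_THRESHOLD else "small"


def reconcile(conn: sqlite3.Connection,
              classify: Optional[Callable[[float, str], str]] = None) -> Dict[str, int]:
    """Assign every pending sale to the 'small' or 'large' model tier.

    `classify` (hypothetical) stands in for the cheap model's triage call;
    if it is missing or raises a network error, the rule-based router is used.
    """
    counts = {"small": 0, "large": 0}
    pending = conn.execute(
        "SELECT id, amount, note FROM sales WHERE route IS NULL"
    ).fetchall()
    for sale_id, amount, note in pending:
        if classify is None:
            route = rule_based_route(amount, note)
        else:
            try:
                route = classify(amount, note)
            except OSError:  # ConnectionError and TimeoutError are OSError subclasses
                route = rule_based_route(amount, note)
        conn.execute("UPDATE sales SET route = ? WHERE id = ?", (route, sale_id))
        counts[route] += 1
    conn.commit()
    return counts
```

Because the queue is written before any network call, a sale is never lost. Reconciliation can run repeatedly and only touches rows whose `route` is still NULL.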

Bayanihan Collective

Bayanihan Collective is the operational backbone for independent AI-developer cooperatives: a member-owned model built on mutual support, shared resources, and collective governance, not a commission-based freelance marketplace. Tech-worker cooperatives already exist and prove the model works (CoTech, Patio, FACTTIC), but they run on informal networks, mailing lists, and spreadsheets, with no modern operational tooling built for the AI era. Bayanihan Collective is that missing layer. The platform has three surfaces in one shell: an Ops Dashboard for case tracking with a 3-tier escalation model; an Onboarding Presenter that guides new members through real content, a scroll-gated quiz, an Available Mentors panel, and live AI-grounded Q&A; and a Member Concierge chat that answers onboarding, resource-sharing, and mentorship questions. Both AI-powered features run on the same two-layer pipeline. First, Gemma classifies intent through a three-tier chain. The primary tier is a self-hosted google/gemma-3-12b-it model served via vLLM on an AMD Instinct MI300X GPU instance (AMD Developer Cloud). Only if that tier is unreachable does the chain fall back to Google AI Studio, and then to a local classifier. Second, Claude, always the primary responder, generates the actual reply grounded in Gemma's classification. Every message shows exactly which tier answered, so the architecture is visible, not just claimed. Built solo by Tribeium, containerized with Docker Compose, deployed live on Cloudflare Pages and Render, and open source under the MIT license.
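
The three-tier classification chain follows a common fallback pattern: try tiers in priority order and record which one answered, so the UI can display it. This is a minimal sketch. The tier names, intent labels, and keyword rules are hypothetical, and in a real deployment the first two tiers would wrap HTTP calls to the vLLM server and to Google AI Studio.

```python
from typing import Callable, List, Tuple

# A tier is a (name, classifier) pair; a classifier maps a message to an intent label.
Tier = Tuple[str, Callable[[str], str]]


def classify_with_fallback(message: str, tiers: List[Tier]) -> Tuple[str, str]:
    """Try each tier in order; return (intent, name of the tier that answered)."""
    errors = []
    for name, classifier in tiers:
        try:
            return classifier(message), name
        except Exception as exc:  # unreachable tier, timeout, malformed reply, ...
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all classifier tiers failed: " + "; ".join(errors))


def local_keyword_classifier(message: str) -> str:
    """Last-resort tier: simple keyword rules that need no network access."""
    text = message.lower()
    if "mentor" in text:
        return "mentorship"
    if "share" in text or "resource" in text:
        return "resource_sharing"
    return "onboarding"
```

Returning the tier name alongside the label is what lets each chat message show which tier handled it.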

ExecOS AI – Multi-Agent Executive Operating System

ExecOS AI is an AI-powered Executive Operating System designed to help founders, startups, and small businesses make faster, data-driven decisions without hiring an expensive executive team. Instead of simply answering questions, ExecOS AI acts like an AI boardroom where specialized AI agents, each acting as a business expert, collaborate to analyze business performance and generate actionable insights. Users can upload business documents such as financial reports, sales data, operational records, marketing performance, customer feedback, and internal reports. These documents are automatically indexed for Retrieval-Augmented Generation (RAG), so the platform can ground its answers and reports in the company's current business context. ExecOS AI uses a workflow-based orchestration system that coordinates multiple AI specialists, including Finance, Marketing, Business Analysis, Customer Success, and Executive Strategy. Their combined findings are synthesized into a single executive response, giving business owners practical recommendations instead of isolated answers. The platform supports natural-language business conversations where users can ask questions such as "Why has our profit margin decreased?", "What is our revenue trend?", or "How can we reduce operating expenses?" The AI analyzes the uploaded business knowledge and produces contextual recommendations, risk assessments, and improvement strategies tailored to the business. One of the platform's key features is AI-generated Executive Reports. With a single click, ExecOS AI produces a structured executive report containing financial analysis, marketing insights, operational risks, customer experience analysis, business health scoring, confidence metrics, priority actions, and a 30-day strategic action plan. Our goal is to make executive-level business intelligence accessible to every startup and small business by providing instant, AI-driven strategic guidance through a modern and intuitive platform.
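
The workflow-based orchestration can be pictured as a fan-out/fan-in. Each specialist analyzes the same question against the retrieved document context, and a final step merges the findings into one executive answer. This is a minimal sketch under stated assumptions: the specialists are keyword-matching stand-ins for LLM calls, the RAG retrieval step is assumed to have already produced `context`, and all function names are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical signature: a specialist reads the question plus retrieved
# document snippets and returns its finding as text.
Specialist = Callable[[str, List[str]], str]


def keyword_specialist(role: str, keywords: List[str]) -> Specialist:
    """Build a toy specialist that cites snippets containing its keywords.
    In a real system each specialist would be an LLM call with a role prompt."""
    def analyze(question: str, context: List[str]) -> str:
        hits = [s for s in context if any(k in s.lower() for k in keywords)]
        if not hits:
            return f"{role}: no relevant data in uploaded documents."
        return f"{role}: {len(hits)} relevant snippet(s), e.g. '{hits[0]}'."
    return analyze


def simple_synthesis(question: str, findings: Dict[str, str]) -> str:
    """Placeholder for the executive-strategy step: concatenate the findings."""
    lines = [f"Executive summary for: {question}"]
    lines.extend(f"- {text}" for text in findings.values())
    return "\n".join(lines)


def run_boardroom(question: str, context: List[str],
                  specialists: Dict[str, Specialist],
                  synthesize: Callable[[str, Dict[str, str]], str]) -> str:
    """Fan the question out to every specialist, then merge their findings."""
    findings = {role: fn(question, context) for role, fn in specialists.items()}
    return synthesize(question, findings)
```

Keeping the synthesizer as a parameter lets the simple concatenation be swapped for an LLM-based executive summary without changing the fan-out logic.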

LogiSecure AI – Autonomous On-Prem Logistics Copilot

What is LogiSecure AI? LogiSecure AI is an enterprise-grade autonomous logistics platform that combines real-time global supply chain monitoring with secure on-premise AI. Through a 5-step agentic workflow, it detects disruptions, analyzes their impact, and responds autonomously while keeping sensitive shipment data on local AMD hardware.

The Problem: Logistics companies face a critical trade-off between using cloud AI, which risks exposing confidential operational data, and keeping data on-premise, which sacrifices real-time global visibility. The result is delays, inefficiencies, and reactive decision-making.

Our Solution: LogiSecure AI is a 100% on-premise agentic control hub that:

Monitors global supply chains using aviation, maritime, GPS, weather, and geopolitical APIs
Detects disruptions with AI-driven risk scoring
Matches incidents with confidential local shipment data
Performs local LLM inference on AMD ROCm to optimize routes and assess impact
Automatically updates routes and prepares client notifications

Technical Implementation: Built with FastAPI, LangGraph, and Fireworks AI, featuring:

5-step multi-agent workflow
Real-time data from OpenSky, AISstream, OSRM, and NewsData
Unified air, sea, and land shipment tracking
AMD ROCm-ready on-premise inference
Dockerized deployment
Interactive Swagger API

Business Value:

🔒 Data Sovereignty: Zero cloud data leakage
⚡ Faster Operations: Response time reduced from hours to seconds
💰 Lower Costs: Eliminates cloud inference fees
🛡️ Risk Mitigation: Proactive disruption management
🌍 Competitive Edge: Secure real-time supply chain visibility
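
The five steps (monitor, detect, match, plan, respond) map naturally onto a pipeline where each stage reads and enriches a shared state, which is the shape a LangGraph graph takes. The sketch below uses plain Python instead of LangGraph so it runs without dependencies. The data classes, the 0.6 risk threshold, and the rerouting rule are hypothetical. In particular, the rerouting rule simply skips the disrupted region and stands in for the local LLM step.

```python
from dataclasses import dataclass, field
from typing import Dict, List

RISK_THRESHOLD = 0.6  # hypothetical cut-off on a 0..1 risk score


@dataclass
class Incident:
    region: str
    kind: str          # e.g. "weather", "port_congestion"
    severity: float    # 0..1 risk score


@dataclass
class Shipment:
    shipment_id: str
    client: str
    route: List[str]   # ordered regions the shipment passes through


@dataclass
class State:
    events: List[Incident] = field(default_factory=list)
    risks: List[Incident] = field(default_factory=list)
    affected: Dict[str, Incident] = field(default_factory=dict)
    reroutes: Dict[str, List[str]] = field(default_factory=dict)
    notifications: List[str] = field(default_factory=list)


def monitor(state: State, feed: List[Incident]) -> None:
    """Step 1: collect events from the external feeds."""
    state.events = list(feed)


def detect(state: State) -> None:
    """Step 2: keep only events whose risk score clears the threshold."""
    state.risks = [e for e in state.events if e.severity >= RISK_THRESHOLD]


def match(state: State, shipments: List[Shipment]) -> None:
    """Step 3: join risky incidents against confidential local shipment data."""
    for s in shipments:
        for incident in state.risks:
            if incident.region in s.route:
                state.affected[s.shipment_id] = incident
                break


def plan(state: State, shipments: List[Shipment]) -> None:
    """Step 4: stand-in for local LLM planning; skip the disrupted region."""
    by_id = {s.shipment_id: s for s in shipments}
    for sid, incident in state.affected.items():
        state.reroutes[sid] = [r for r in by_id[sid].route if r != incident.region]


def respond(state: State, shipments: List[Shipment]) -> None:
    """Step 5: apply the new routes and draft client notifications."""
    by_id = {s.shipment_id: s for s in shipments}
    for sid, new_route in state.reroutes.items():
        by_id[sid].route = new_route
        state.notifications.append(
            f"{by_id[sid].client}: shipment {sid} rerouted via {' -> '.join(new_route)}")


def run_workflow(feed: List[Incident], shipments: List[Shipment]) -> State:
    """Run the five steps in order over one shared state."""
    state = State()
    monitor(state, feed)
    detect(state)
    match(state, shipments)
    plan(state, shipments)
    respond(state, shipments)
    return state
```

Because every step only reads and writes `State`, each function could become a graph node with little change.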

LessonForge – AI Video Lessons and Short Reels

LessonForge is an AI studio that turns a topic and a set of learning objectives into a fully rendered, studio-quality video lesson: narrated, on-brand, and personalised per student. The problem: producing educational video is slow and expensive. A single lesson can take a faculty member days of scripting, filming, and editing, and it can't easily be adapted for different students, languages, or reading levels. Great teaching gets locked to one recording. Our approach flips that. Faculty define a lesson once (topic, objectives, and a chosen on-screen educator). LessonForge's agentic Lesson Application Framework then does the heavy lifting: a Curriculum Architect structures the lesson, a Scene Consistency Director locks visual continuity across every shot, and a Prompt Engineer produces clean, filmable scene descriptions. Each shot is first rendered as a still for human review and, once approved, generated as final video with synced narration. The stack is built end-to-end on the hackathon's ecosystem. Gemma 4, served through Fireworks AI, powers the agentic reasoning. AMD Instinct MI300X GPUs run the video generation pipeline, with LTX video models producing the final clips. A human-in-the-loop review gate keeps every frame accurate before it ships. The result: faculty record once, and AMD-powered agents personalise the lesson for every student, with different presenters, pacing, and scripts from a single source lesson. What took weeks now takes minutes.
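
The agent chain and its review gate can be sketched as a sequence of stages in which final video is rendered only for shots a human approves. Each agent below is a trivial string-building stand-in for a Gemma call. The `render_still`, `approve`, and `render_video` callables are hypothetical placeholders for the MI300X rendering jobs and the human reviewer. None of this is LessonForge's actual code.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Shot:
    description: str
    still: Optional[str] = None   # reference to the preview still
    video: Optional[str] = None   # reference to the final clip
    approved: bool = False


def curriculum_architect(topic: str, objectives: List[str]) -> List[str]:
    """Stand-in agent: one scene per learning objective."""
    return [f"{topic}: {objective}" for objective in objectives]


def scene_consistency_director(scenes: List[str], educator: str) -> List[str]:
    """Stand-in agent: pin the same on-screen educator in every scene."""
    return [f"{scene} | presenter={educator}" for scene in scenes]


def prompt_engineer(scenes: List[str]) -> List[Shot]:
    """Stand-in agent: turn each scene into a filmable shot description."""
    return [Shot(description=f"Medium shot, studio lighting. {scene}") for scene in scenes]


def produce_lesson(topic: str, objectives: List[str], educator: str,
                   render_still: Callable[[str], str],
                   approve: Callable[[Shot], bool],
                   render_video: Callable[[str], str]) -> List[Shot]:
    """Run the agent chain, then gate expensive video rendering on human approval."""
    scenes = scene_consistency_director(curriculum_architect(topic, objectives), educator)
    shots = prompt_engineer(scenes)
    for shot in shots:
        shot.still = render_still(shot.description)   # cheap preview first
        shot.approved = approve(shot)                 # human-in-the-loop gate
        if shot.approved:
            shot.video = render_video(shot.description)
    return shots
```

Rendering a cheap still before the costly video step means rejected shots never consume video-generation GPU time.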

Emergent Negotiation Arena

Emergent Negotiation Arena is an interactive multi-agent AI experiment that explores how communication can emerge from necessity. Three autonomous agents enter a resource-scarce 10×10 grid world with no shared language or predefined vocabulary. To survive, they must negotiate trades. Instead of communicating with ordinary words, the agents invent and exchange symbols. As the simulation progresses, the system tracks how often those symbols are reused, in which trade contexts they appear, and whether multiple agents begin assigning them the same meaning. When a symbol consistently stabilizes across agents for the same context, it becomes an emergent shared “word.” The project supports three execution modes: Fireworks mode runs live LLM-powered agents, heuristic mode provides a rule-based live simulation, and replay mode lets judges inspect a recorded run without consuming API credits. A real-time Gradio dashboard visualizes the world, agent activity, trade outcomes, vocabulary formation, emergent words, and benchmark data. The project is designed as both an AI research experiment and an accessible interactive demo. It demonstrates how coordination, negotiation, and primitive language can emerge without giving agents a shared vocabulary in advance. The application includes a tested multi-agent simulation engine, parallel agent execution, live AI integration, replayable experiment artifacts, and 84/84 passing tests.
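
The word-detection idea described above (reuse, trade context, and cross-agent agreement) can be expressed as a small counting pass over the trade log. This is a minimal sketch. The log format and the three thresholds are hypothetical and do not reflect the project's actual criteria.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical thresholds for calling a symbol an emergent word.
MIN_AGENTS = 2          # distinct agents that must use the symbol in that context
MIN_USES = 3            # total uses of the symbol in its dominant context
MIN_CONSISTENCY = 0.8   # share of the symbol's uses that fall in that context


def emergent_words(log: List[Tuple[str, str, str]]) -> Dict[str, str]:
    """Log entries are (agent, symbol, trade_context); returns symbol -> context."""
    contexts: Dict[str, Counter] = defaultdict(Counter)
    users: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for agent, symbol, context in log:
        contexts[symbol][context] += 1
        users[(symbol, context)].add(agent)

    words = {}
    for symbol, counter in contexts.items():
        context, uses = counter.most_common(1)[0]   # the symbol's dominant context
        consistency = uses / sum(counter.values())
        if (uses >= MIN_USES and consistency >= MIN_CONSISTENCY
                and len(users[(symbol, context)]) >= MIN_AGENTS):
            words[symbol] = context
    return words
```

Requiring several distinct agents filters out a single agent's private habit; only a symbol that multiple agents use for the same context counts as shared.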