ExecuTorch

ExecuTorch is Meta's production-grade framework for on-device AI inference. It runs PyTorch models natively on mobile phones, wearables, embedded systems, and AI PCs without cloud connectivity. Unlike conversion-based pipelines, ExecuTorch lowers the graph captured by torch.export directly to a .pte binary, retaining PyTorch semantics through the entire deployment stack. It reached general availability (v1.0) in October 2025 and is already in production in Meta's Ray-Ban Smart Glasses and Meta Quest headsets, and behind billions of on-device AI feature interactions on Instagram, WhatsApp, and Messenger.

General
GA date	18 Oct 2025 (v1.0)
Developer	Meta / PyTorch
Type	On-Device AI Inference Framework
License	BSD License
GitHub	pytorch/executorch
Documentation	docs.pytorch.org/executorch

Core Features

PyTorch-native export — models go from torch.export directly to .pte format with no ONNX or TFLite conversion step.
50 KB base runtime — minimal core footprint suitable for microcontrollers and embedded targets.
Ahead-of-time compilation — models compiled offline to .pte binaries, reducing on-device startup overhead.
Single-line backend switching — swap hardware accelerators (CPU, NPU, GPU) without rewriting model code.
Quantization tooling — INT8, INT4 (per-block), QAT+LoRA (QLoRA), and SpinQuant quantization via integrated PyTorch tools.
Selective operator builds — include only operators the model uses, minimizing binary size.
Multimodal support — composable backbone for LLMs, vision-language models, image segmentation, depth estimation, OCR, ASR, and object detection.
Hugging Face Optimum-ExecuTorch — over 80% of the most-downloaded edge-friendly models on Hugging Face run on ExecuTorch out of the box.
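
The export path described in the features above can be sketched roughly as follows. This is a minimal illustration, assuming the `executorch` and `torch` packages are installed and that the XNNPACK CPU backend is the target; the function name `export_to_pte` and the default output path are hypothetical, and the imports are placed inside the function only so the snippet can be loaded on machines without those packages.

```python
def export_to_pte(model, example_inputs, out_path="model.pte"):
    """Export an eager PyTorch module to a .pte file (hypothetical helper).

    `example_inputs` is a tuple of sample tensors matching the model's
    forward() signature.
    """
    # Lazy imports: an illustration choice, not a requirement of the API.
    import torch
    from executorch.exir import to_edge_transform_and_lower
    from executorch.backends.xnnpack.partition.xnnpack_partitioner import (
        XnnpackPartitioner,
    )

    # 1. Capture the graph with torch.export (no ONNX or TFLite step).
    exported = torch.export.export(model.eval(), example_inputs)

    # 2. Lower the graph and delegate supported operators to a backend.
    #    Passing a different partitioner here is what targets another
    #    accelerator; the model code itself is unchanged.
    edge = to_edge_transform_and_lower(
        exported, partitioner=[XnnpackPartitioner()]
    )

    # 3. Serialize the ahead-of-time compiled program to disk.
    program = edge.to_executorch()
    with open(out_path, "wb") as f:
        f.write(program.buffer)
    return out_path
```

A call would look like `export_to_pte(my_module, (torch.randn(1, 3, 224, 224),))`, where the input shape is only an example.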

Supported Hardware Backends

Backend	Target	Status
XNNPACK + Arm KleidiAI	CPU (Android, iOS, Linux, AI PCs)	Stable
Apple Core ML	Apple silicon (iOS, macOS)	Stable
Qualcomm AI Engine / Hexagon NPU	Android (Qualcomm SoCs)	Stable
Arm Ethos-U NPU	Embedded / MCU	Stable
Vulkan GPU	Cross-platform GPU (Android, Linux)	Stable
Apple MPS (Metal Performance Shaders)	iOS / macOS GPU	Alpha
MediaTek NPU	Android (MediaTek SoCs)	Beta
Samsung Exynos NPU	Android (Samsung SoCs)	Alpha
Intel OpenVINO	AI PCs (Windows / Linux x86)	Alpha
CUDA	Linux / Windows GPU	Experimental

Hardware partners include Apple, Arm, Cadence, Intel, MediaTek, NXP Semiconductors, Qualcomm, and Samsung.

Performance (Llama 3.2 1B Quantized)

Device	Decode Speed	Prefill Speed
Samsung Galaxy S24+	>40 tokens/s	>350 tokens/s
OnePlus 12	50.2 tokens/s	260 tokens/s
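
To read the table in practical terms, a rough response-time estimate adds the time to process the prompt (prefill) to the time to generate the reply (decode). The sketch below uses the OnePlus 12 figures from the table; the prompt and reply lengths are hypothetical, and the model ignores load time and thermal throttling.

```python
def estimate_response_seconds(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Approximate end-to-end latency: prefill time plus decode time."""
    prefill_seconds = prompt_tokens / prefill_tps
    decode_seconds = output_tokens / decode_tps
    return prefill_seconds + decode_seconds


# OnePlus 12 speeds from the table; 260-token prompt and 100-token reply
# are made-up workload sizes chosen for illustration.
seconds = estimate_response_seconds(
    prompt_tokens=260, output_tokens=100, prefill_tps=260.0, decode_tps=50.2
)
print(f"{seconds:.2f} s")  # about 2.99 s
```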

Quantization reduces model size by ~52% (2.3 GiB to 1.1 GiB) and peak runtime memory by ~39%, with 2.5x average decode latency improvement over BF16 baseline.
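
The size figure can be checked with simple arithmetic, and the 2.5x speedup can be restated as a fraction of baseline latency. The helper name below is hypothetical.

```python
def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Model size: 2.3 GiB (BF16) down to 1.1 GiB (quantized).
size_cut = percent_reduction(2.3, 1.1)  # about 52.2

# A 2.5x decode latency improvement means each token takes 1/2.5 of the
# baseline time, i.e. 40% of the BF16 latency.
latency_fraction = 1 / 2.5

print(f"size reduction: {size_cut:.1f}%")
print(f"latency vs. baseline: {latency_fraction:.0%}")
```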

Tools and Resources

GitHub — github.com/pytorch/executorch — source code, model export scripts, and backend integrations.
Documentation — docs.pytorch.org/executorch/stable — installation, export workflow, and backend guides.
PyPI — pip install executorch — Python package for model export and tooling.
Llama export script — export_llm command in the repo for exporting Llama variants to .pte format.
PyTorch blog — pytorch.org/blog/introducing-executorch-1-0 — GA announcement with performance benchmarks.

Ecosystem and Integrations

Powers on-device AI in Meta Ray-Ban Smart Glasses (live translation, visual captions, menu translation) and Oakley Meta Vanguard glasses (athletic performance insights).
Runs scene understanding, depth estimation, hand tracking, and persistent room memory on Meta Quest 3 / Quest 3S.
Llama 3.2 1B and 3B models were developed in collaboration with Qualcomm and MediaTek and optimized for deployment on their mobile SoCs via ExecuTorch.
Backend available in Hugging Face Optimum-ExecuTorch for direct integration with the Hugging Face model hub.
Succeeds PyTorch Mobile, which is no longer actively developed, giving teams already in the PyTorch ecosystem a significantly smaller runtime and broader edge-hardware coverage.

Get started by cloning github.com/pytorch/executorch and following the quickstart guide, or install via pip install executorch for model export tooling.
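
Once a .pte file exists, it can be smoke-tested from Python before it is bundled into an app. The sketch below assumes the pip package's Python runtime bindings (`executorch.runtime`) are available and that the exported method is named "forward"; the function name `run_pte` is hypothetical, and the import is kept inside the function so the snippet loads without the package installed.

```python
def run_pte(pte_path, inputs):
    """Load a .pte program and execute its "forward" method (hypothetical helper).

    `inputs` is a list of tensors in the order the exported model expects.
    """
    # Lazy import: an illustration choice so this file loads anywhere.
    from executorch.runtime import Runtime

    runtime = Runtime.get()
    program = runtime.load_program(pte_path)
    method = program.load_method("forward")
    return method.execute(inputs)
```

For example, `run_pte("model.pte", [torch.randn(1, 3, 224, 224)])` would return a list of output tensors, assuming the model was exported with an input of that shape.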

Meta ExecuTorch AI technology Hackathon projects

Discover innovative solutions crafted with Meta ExecuTorch AI technology, developed by our community members during our engaging hackathons.

LifePilot — On-Device Wellness Co-Pilot

Every AI wellness app works the same way: your tasks, sleep, spending, and health data get shipped to someone else's servers before you get an answer. LifePilot removes that trade-off: every model runs on the device itself.

LifePilot is a calm, offline-first wellness app with four features, each backed by its own on-device model:

• Overwhelm Manager: describe what's overwhelming you; an on-device Llama 3.2 1B model breaks it into a concrete 5–8 step plan, personalized from on-device memory that never syncs anywhere.
• Energy Planner: a trained time-series model predicts your energy curve for the day from recent sleep and activity.
• Hydration Tracker: a trained regression model sets a personalized daily water target and explains why.
• Expense Scanner: point the camera at a receipt; on-device OCR plus trained models extract the merchant, total, currency, and category. No photo or text is ever uploaded.

Every model is exported to ExecuTorch and runs directly on the phone's own chip. There is no server-side inference path: put the phone in airplane mode and everything still works identically. That's not a fallback; it's the only mode.

Where AMD fits: the four trained models (energy, hydration, and two expense models) are trained and exported on AMD Instinct MI300X GPUs via ROCm, on AMD's ROCm cloud notebooks, producing the exact .pte files that ship in the app. All four features are built and running on a real Android device today, including the on-device Llama agent generating multi-step task breakdowns in airplane mode. Privacy isn't a marketing line here; it's the architecture.

Lantern: Model anything

Lantern is an AI-powered mobile application that transforms a standard smartphone into a real-time 3D scanner using entirely on-device inference. Users simply point their phone at an object and walk around it for approximately 30 seconds while Lantern estimates depth, tracks camera motion, reconstructs geometry, and generates a dimensionally accurate 3D mesh, all without sending data to the cloud. Built with ExecuTorch, Lantern demonstrates how edge AI can deliver fast, private, and offline 3D reconstruction on consumer smartphones. By eliminating the need for expensive LiDAR hardware, external computers, or cloud processing, Lantern makes high-quality 3D scanning more accessible to engineers, designers, makers, researchers, and students. The generated 3D models can be used for reverse engineering, CAD workflows, rapid prototyping, digital asset creation, manufacturing, and AR/VR applications. Lantern showcases the power of on-device AI by combining computer vision, monocular depth estimation, and real-time 3D reconstruction into a seamless mobile experience. Our goal is to make professional-quality 3D scanning available to anyone with a smartphone while preserving user privacy and enabling low-latency performance through efficient edge inference.
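
The step from an estimated depth map to 3D geometry is usually a pinhole-camera back-projection. The project description does not state Lantern's actual method, so the following is only a sketch of that standard technique; the intrinsics (`fx`, `fy`, `cx`, `cy`) and the sample pixel values are made up for illustration.

```python
def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Convert pixel (u, v) with metric depth into a camera-space 3D point.

    fx, fy are focal lengths in pixels; cx, cy is the principal point.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


# Hypothetical 640x480 camera with a 500 px focal length.
center_point = backproject(320, 240, 2.0, fx=500, fy=500, cx=320, cy=240)
right_point = backproject(420, 240, 2.0, fx=500, fy=500, cx=320, cy=240)
print(center_point)  # (0.0, 0.0, 2.0)
print(right_point)   # x is about 0.4 m to the right of the optical axis
```

Applying this to every pixel of each frame, then transforming the points by the tracked camera pose, yields the point cloud from which a mesh can be reconstructed.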

Memory Recall Assistant

Memory Recall Assistant is a fully on-device, multimodal AI memory aid designed for people with early-to-mid stage cognitive decline. The app runs continuously in the background, periodically capturing context through the camera and microphone. When the user presses Recall while looking at someone, the app identifies the person using CavaFace face embeddings (w8a16 quantized, ~99.5% accuracy on LFW benchmark), retrieves stored conversation history transcribed by Whisper-Small-Quantized, and generates a warm, natural memory cue using Llama 3.2 1B Instruct - all within seconds and entirely offline. A key design innovation is center crop targeting: rather than identifying the largest or highest-confidence face in frame, the app identifies the face closest to the center of the camera frame - the person the user is actively talking to, not a bystander. This makes identification intentional and natural, and extends directly to Meta AI Glasses where the camera points where the user looks. All sensitive data - face embeddings, conversation transcripts, location context - is stored on-device in a local SQLite knowledge graph and never transmitted anywhere. The system functions fully in airplane mode, demonstrating that privacy-sensitive AI for vulnerable populations is not only possible on mobile hardware - it is better this way. There is no server to breach, no API to call, and no company holding your family's data.
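
The center crop targeting rule described above can be sketched as a nearest-to-center selection over detected face boxes. This is an illustration only, not the app's code; the function name and the `(x, y, width, height)` box format are assumptions.

```python
from math import hypot


def pick_center_face(boxes, frame_w, frame_h):
    """Return the index of the face box nearest the frame center, or None.

    Each box is (x, y, width, height) in pixels, with (x, y) the top-left
    corner. Box size and detector confidence are deliberately ignored.
    """
    if not boxes:
        return None
    frame_cx, frame_cy = frame_w / 2, frame_h / 2

    def distance_to_center(box):
        x, y, w, h = box
        return hypot(x + w / 2 - frame_cx, y + h / 2 - frame_cy)

    return min(range(len(boxes)), key=lambda i: distance_to_center(boxes[i]))


# A large bystander face at the edge and a small face near the center.
faces = [(0, 0, 400, 400), (600, 300, 100, 100)]
chosen = pick_center_face(faces, frame_w=1280, frame_h=720)
print(chosen)  # 1
```

The larger face loses here because only distance from the frame center is scored, which matches the stated intent of identifying the person the user is facing.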

PrivateDoc

PrivateDoc is an offline, on-device document intelligence assistant designed for sensitive personal paperwork, running on a Samsung Galaxy S25 Ultra phone powered by Qualcomm Snapdragon. It combines ML Kit OCR, ExecuTorch, MiniLM semantic retrieval, and an on-device Llama 3.2 1B LLM to scan, classify, index, and answer questions about prescriptions, contracts, bills, IDs, and other documents without requiring an Internet connection. A strict retrieval-augmented generation (RAG) pipeline ensures every response is grounded in retrieved OCR evidence, with supporting text highlighted on the original document to improve transparency and reduce hallucinations. By performing OCR, retrieval, and language generation entirely on the device, PrivateDoc preserves user privacy while delivering fast, reliable, and source-backed document understanding.
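
A strict RAG step of this kind can be sketched as: rank OCR chunks by embedding similarity, keep only those above a threshold, and refuse to build a prompt when nothing qualifies. The code below is a toy illustration with hand-written 3-dimensional vectors standing in for MiniLM embeddings; the function names, threshold, and prompt wording are all assumptions, not PrivateDoc's implementation.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, chunks, k=2, min_score=0.3):
    """Return up to k (score, text) pairs scoring at least min_score."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(score, text) for score, text in scored[:k] if score >= min_score]


def build_prompt(question, evidence):
    """Build a grounded prompt, or return None when there is no evidence."""
    if not evidence:
        return None  # strict mode: no retrieved text, no generated answer
    cited = "\n".join(f"[{i}] {text}" for i, (_, text) in enumerate(evidence, 1))
    return f"Answer using only the numbered evidence.\n{cited}\nQuestion: {question}"


# Toy OCR chunks with made-up embedding vectors.
chunks = [
    ("Take 1 tablet twice daily", [1.0, 0.0, 0.0]),
    ("Rent is due on the 1st", [0.0, 1.0, 0.0]),
    ("Dosage: 500 mg", [0.9, 0.1, 0.0]),
]
hits = retrieve([1.0, 0.0, 0.0], chunks)
prompt = build_prompt("How often do I take it?", hits)
```

Because each retrieved chunk keeps its text, the same list can drive the highlighting of supporting passages on the original document.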

EchoWalk: On-Device Guidance for Low-Vision Users

Imagine walking through an unfamiliar room with your eyes closed. You need to know what is ahead, what is around you, and how to reach the chair someone mentioned, without cloud latency or sending your camera feed anywhere. EchoWalk is built for that moment. On a Galaxy S25 Ultra, one shared camera pipeline feeds a central ModeManager that decides when to warn, when to describe, and when to search, all on the Snapdragon NPU via ExecuTorch and Qualcomm QNN.

Safety Radar runs continuously. Depth Anything V2 and YOLOv10 fuse on the Hexagon NPU: not just what is there, but how far and whether it is a trip hazard or a wall you can trail. Spatial audio and haptics place obstacles in space; a VoiceWarningEngine speaks when it matters. A live bounding-box overlay helps sighted helpers follow along in demos.

Scene Description is on demand: tap the preview, the Describe button, or long-press Volume Up. A short burst of frames runs through a Places365 classifier and pairs the room label with live YOLO directions: "You appear to be in a living room: couch on your left, TV ahead." Auto-describe announces stable scene changes hands-free. The full SmolVLM-500M stack is integrated and validated through handoff scripts; richer VLM captions are ready for the next aligned build.

Find Mode is voice-first. Long-press Volume Down, say "find the bottle," and the app maps your words to everyday object labels. It scans the room, guides you turn by turn, warns about obstacles in your path, highlights the target on screen, and remembers where it last saw it so the next search starts with a hint.

Accessibility is front and center: lock-screen access, screen-on at launch, spoken onboarding with a first orientation from live radar, eyes-free volume shortcuts, and double-tap to repeat your last description. No cloud. No upload. Your home never leaves your pocket.
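
The depth and detection fusion behind Safety Radar can be illustrated by sampling the depth map inside each detected box. This is a sketch under stated assumptions, not EchoWalk's code: it assumes the depth map is in meters (a metric depth model), uses the median as the distance estimate, and picks a hypothetical 1.5 m warning threshold and a left/ahead/right split by thirds of the frame.

```python
from statistics import median


def describe_obstacle(label, box, depth_map, warn_below_m=1.5):
    """Fuse one detection box with a depth map into a spoken-style summary.

    box is (x0, y0, x1, y1) in pixel indices, exclusive on x1 and y1.
    depth_map is a list of rows, each a list of distances in meters.
    """
    x0, y0, x1, y1 = box
    frame_w = len(depth_map[0])

    # Median depth inside the box is robust to background pixels.
    samples = [depth_map[row][col] for row in range(y0, y1) for col in range(x0, x1)]
    distance = median(samples)

    # Split the frame into thirds to pick a direction word.
    center_x = (x0 + x1) / 2
    if center_x < frame_w / 3:
        side = "left"
    elif center_x > 2 * frame_w / 3:
        side = "right"
    else:
        side = "ahead"

    return {"label": label, "distance_m": distance, "side": side,
            "warn": distance < warn_below_m}


# Toy 6x4 depth map: a near object in the two left columns, far elsewhere.
depth = [[1.0, 1.0, 3.0, 3.0, 3.0, 3.0] for _ in range(4)]
near = describe_obstacle("chair", (0, 0, 2, 4), depth)
far = describe_obstacle("tv", (2, 0, 4, 4), depth)
```

Here `near` reports a chair on the left at 1.0 m with a warning, while `far` reports a TV ahead at 3.0 m with none.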

Beacon

When the river crests and the towers go dark, a hundred people end up stranded in a school gym with no signal and no way to call for help. A volunteer nurse faces a growing line of the sick and injured with no one to consult. A teacher manages sixty frightened kids alone. A family doesn't know if their water is safe to drink. Every one of them is holding a phone with a powerful on-device NPU, but cloud AI dies the instant the network does, and no single phone has the memory or compute to run a frontier-grade LLM by itself. Beacon is built around this constraint from the start: the model is pre-sharded before disaster strikes, not after. Users opt in ahead of time, downloading a layer-wise slice of a large language model's weights onto their device, a contiguous block of transformer layers sized to that phone's available memory and NPU class. These shards sit dormant on the device, costing nothing until they're needed. When the network goes down, phones nearby connect over a peer-to-peer hotspot network: one phone hosts, others join directly, with no router or internet infrastructure required. Beacon assembles an inference cluster from whichever pre-loaded layer shards happen to be present in the room, sequencing them in the correct layer order for a forward pass. The hotspot link only needs to negotiate which layers are available, route activations between phones in sequence, and reroute around a phone that drops out or runs out of battery. The heavy lifting, distribution, was done in advance, when everyone still had a connection. The result is a cluster that can assemble in seconds during an emergency, because the only real-time job is discovery and coordination, not download. The nurse gets triage guidance. The teacher gets crisis-management support. The family gets a real answer about their water. The help didn't arrive; it was already pre-positioned in their pockets, just waiting to be switched on.
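
The coordination step described above, ordering whichever layer shards are present into a full forward pass, can be sketched as a greedy interval cover. This is an illustration only; the data layout (phone id mapped to a half-open layer range), the function name, and the 24-layer model are assumptions, and real routing would also weigh battery and link quality.

```python
def plan_pipeline(shards, num_layers):
    """Order phones so their layer ranges cover layers 0..num_layers-1.

    shards maps phone_id -> (start, end), a half-open range of layer indices.
    Returns a list of (phone_id, first_layer, end_layer) hops in execution
    order, or None if the phones present cannot cover every layer.
    """
    plan = []
    next_layer = 0
    while next_layer < num_layers:
        # Phones holding the next needed layer; prefer the one reaching furthest.
        candidates = [
            (end, phone)
            for phone, (start, end) in shards.items()
            if start <= next_layer < end
        ]
        if not candidates:
            return None  # a gap: no phone in the room holds this layer
        end, phone = max(candidates)
        end = min(end, num_layers)
        plan.append((phone, next_layer, end))
        next_layer = end
    return plan


# Five phones with overlapping pre-loaded shards of a 24-layer model.
room = {"a": (0, 8), "b": (6, 16), "c": (8, 12), "e": (12, 16), "d": (16, 24)}
full_plan = plan_pipeline(room, 24)

# Phone "b" drops out: re-plan around it using the phones that remain.
without_b = {phone: rng for phone, rng in room.items() if phone != "b"}
rerouted = plan_pipeline(without_b, 24)
```

With all phones present the plan is a, b, d; after "b" drops out it becomes a, c, e, d, and if no remaining phone held the final layers the function would return None.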