
env-doctor

Stop wasting expensive GPU hours on environment failures.

Have you ever had an ML build fail, or had to rerun an expensive training job, because of a CUDA mismatch? You launch a job on a $30/hr H100 cluster, only to find it crashed 5 minutes in because flash-attn wasn't compiled for your CUDA version or xformers didn't match your torch version.

Enter env-doctor: a local-first runtime compatibility platform for Python AI/ML workflows.

Premise: "If one user faces an environment failure, no other user will ever face it again."

Why env-doctor?

Traditional package managers only check whether packages can be installed together. env-doctor checks: "Will this stack actually work at runtime on your exact hardware?" It catches OOM errors, silent CUDA fallback slowdowns, and breaking changes before you provision a GPU.

Core Features

Community Intelligence: Reported issues are vetted by AI agents (Watsonx Orchestrate) and pushed to a global database.
Smart VRAM Estimation: OOM detection that accounts for quantization, KV cache, and fragmentation across vllm, transformers, llama.cpp, and tgi.
Stable Recommendations: Analyzes your hardware to recommend stable dependency stacks.
Deep Checks: Scans requirements files against known ABI conflicts and CUDA mismatches.
AI Bug Reporting: Captures stack traces and system state to generate new protection rules.

Quick Start

Install env-doctor globally using uv or pip:

    pip install env-doctor-pypi

1. Sync Database:

    env-doctor update-db

2. Check Project:

    env-doctor check requirements.txt

   Example output:

    Critical Issue: torch 2.1.0 and flash-attn 2.5.0 conflict. Will cause segmentation fault.

3. Estimate VRAM:

    env-doctor vram --model meta-llama/Llama-2-7b-hf --runtime vllm --seq-len 32768 --quant fp16

4. Get Recommendations:

    env-doctor recommend

Supported

vllm, transformers, tgi, deepspeed, tensorrt-llm, llama.cpp, and onnxruntime on NVIDIA GPUs (CUDA).

(The majority of the code was written by IBM Bob. Thanks, Bob!)
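To give a feel for what the VRAM check in Quick Start step 3 has to account for, here is a back-of-the-envelope estimate in Python. This is a sketch, not env-doctor's actual implementation. It assumes the published Llama-2-7B shape (32 layers, 32 KV heads, head dimension 128, roughly 6.74B parameters), an fp16 KV cache, and a hypothetical 10% allowance for fragmentation and runtime overhead. The names `ModelShape`, `kv_cache_bytes`, `estimate_vram_gib`, and `overhead` are made up for illustration.

```python
from dataclasses import dataclass

GIB = 1024 ** 3

# Bytes per weight for a few common quantization levels.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical container for the numbers the estimate needs."""
    n_params: float
    n_layers: int
    n_kv_heads: int
    head_dim: int


# Llama-2-7B: 32 layers, 32 KV heads (no grouped-query attention), head_dim 128.
LLAMA2_7B = ModelShape(n_params=6.74e9, n_layers=32, n_kv_heads=32, head_dim=128)


def weight_bytes(shape: ModelShape, quant: str) -> float:
    """Memory taken by the model weights at the given quantization."""
    return shape.n_params * BYTES_PER_PARAM[quant]


def kv_cache_bytes(shape: ModelShape, seq_len: int,
                   batch_size: int = 1, kv_bytes: int = 2) -> int:
    """KV cache size: one key and one value vector per layer, head, and token."""
    per_token = 2 * shape.n_layers * shape.n_kv_heads * shape.head_dim * kv_bytes
    return per_token * seq_len * batch_size


def estimate_vram_gib(shape: ModelShape, seq_len: int, quant: str = "fp16",
                      batch_size: int = 1, overhead: float = 0.10) -> float:
    """Weights + KV cache, padded by an assumed fragmentation/overhead factor."""
    total = weight_bytes(shape, quant) + kv_cache_bytes(shape, seq_len, batch_size)
    return total * (1.0 + overhead) / GIB


estimate = estimate_vram_gib(LLAMA2_7B, seq_len=32768, quant="fp16")
print(f"Estimated VRAM: {estimate:.1f} GiB")
```

Under these assumptions the KV cache alone comes to 16 GiB at 32,768 tokens, more than the fp16 weights themselves. This is why a model that loads without trouble can still run out of memory at long sequence lengths. Real runtimes differ in how they preallocate and page the cache, so treat the result as a rough lower bound rather than a prediction.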
17 May 2026