
**M3I: Targeted LoRA-DPO via Multi-Observer Attribution**

When small LLMs fail at multi-step reasoning, the errors rarely spread uniformly; specific layers carry the failure. Standard LoRA is blind to this. M3I asks: can two LLM observers identify those layers, so that LoRA-DPO can be trained on only them? The pipeline has five phases, validated end-to-end on DeepSeek-V2-Lite (16B MoE); hedged sketches of phases 2 through 5 and the planned control appear after the notes below.

1. Inference plus per-layer signal capture via a custom ggml hook
2. Cross-observer audit (Claude Sonnet 4.5 + Gemini 2.5 Pro); agreement between observers required
3. Statistical attribution: paired t-tests at failure vs. control tokens, Bonferroni-corrected
4. LoRA-DPO on the attribution-identified layers only
5. Held-out eval: verdict scoring plus position-bias-controlled pairwise comparison

**Status (May 10, 2026):** Trained on 78 problems (50 PhD-level math + 28 crypto microstructure), 116 preference pairs, 3 epochs. Attribution flagged layers {6, 7, 8, 10, 13}, with the strongest signals at L13 attn (Z=+3.71), L10 attn (Z=+3.13), and L6 MoE (Z=-2.84). Held-out eval on 12 problems: base 3 / LoRA 1 / tie 8 in pairwise wins. Three problems shift from "incorrect" to "insufficient", a behavioral change without a quality gain. Net: null to slightly negative.

**Caveats:** n=12 held-out is small; ~20% of adapter capacity was stripped for GGUF conversion (kv_b_proj / MLA incompatibility); 116 pairs is modest; 16B is below the capability floor for these problems.

**Next:** (a) a random-layer control to separate "attribution fails" from "any DPO at this scale fails"; (b) seed-sensitivity runs for error bars. Roughly 90 min of MI300X compute plus ~$2 of API spend buys a defensible methodology paper either way.

Inference: AMD MI300X via llama.cpp with the HIP backend, ~115 t/s with the adapter.
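Phase 2, the agreement gate: a minimal sketch of how "agreement required" can be enforced. `ask_observer` and the audit prompt are assumptions standing in for the real observer API calls and prompt templates.

```python
# Minimal sketch of the cross-observer agreement gate (phase 2).
# ask_observer is a hypothetical wrapper around one observer's API; the
# prompt wording and response schema below are assumptions.
import json

def audited_layers(trace_summary: str, observers) -> set[int]:
    """Return only the layers that every observer independently flags."""
    flagged_sets = []
    for ask_observer in observers:
        reply = ask_observer(
            "Given this per-layer signal summary of a failed reasoning trace, "
            "return a JSON array of the layer indices most implicated in the "
            "failure.\n\n" + trace_summary
        )
        flagged_sets.append(set(json.loads(reply)))
    # Agreement required: a layer survives only if every observer names it.
    return set.intersection(*flagged_sets) if flagged_sets else set()
```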
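Phase 3, statistical attribution: a sketch of the per-layer paired t-test with Bonferroni correction, assuming the per-layer signals have already been extracted at matched failure/control token positions. The array names and shapes are illustrative, not the real capture format.

```python
# A minimal sketch of the attribution test (phase 3).
import numpy as np
from scipy import stats

def attribute_layers(failure: np.ndarray, control: np.ndarray, alpha: float = 0.05):
    """Paired t-test per layer, Bonferroni-corrected across layers.

    failure, control: arrays of shape (n_pairs, n_layers); each row holds the
    per-layer signal at one matched (failure token, control token) pair.
    Returns (layer, t, p) for layers whose corrected p-value clears alpha.
    """
    n_pairs, n_layers = failure.shape
    corrected_alpha = alpha / n_layers          # Bonferroni over the layer-wise tests
    flagged = []
    for layer in range(n_layers):
        t, p = stats.ttest_rel(failure[:, layer], control[:, layer])
        if p < corrected_alpha:
            flagged.append((layer, t, p))
    return flagged
```

The sign of the t statistic gives the direction of the effect, which is how a layer can show up as positive (e.g. L13 attn) or negative (e.g. L6 MoE) in the flagged set.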
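Phase 4, targeted LoRA-DPO: a sketch of restricting the adapter to the attributed layers, assuming HF PEFT plus TRL. The module names are illustrative for a DeepSeek-V2-style block; exact names depend on the checkpoint, and the TRL trainer signature varies across versions.

```python
# A minimal sketch of confining LoRA-DPO to the attributed layers (phase 4),
# assuming the peft library. Module names and hyperparameters are assumptions.
from peft import LoraConfig

ATTRIBUTED_LAYERS = [6, 7, 8, 10, 13]   # output of the attribution phase

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "kv_b_proj", "o_proj"],  # assumed projection names
    layers_to_transform=ATTRIBUTED_LAYERS,             # adapters only on flagged layers
    task_type="CAUSAL_LM",
)

# This config would then be passed as peft_config to TRL's DPO trainer, so the
# preference loss only updates the low-rank adapters on those layers.
```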
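Phase 5, the position-bias control in the pairwise eval: each held-out problem is judged twice with the answer order swapped, and a win only counts when both orderings agree. `judge` is a hypothetical call to an external judge model returning "A", "B", or "tie".

```python
# A minimal sketch of the position-bias-controlled pairwise verdict (phase 5).
# judge is a hypothetical judge-model call; its prompt is not shown.
def pairwise_verdict(judge, question: str, base_answer: str, lora_answer: str) -> str:
    """Score one held-out problem; returns 'base', 'lora', or 'tie'."""
    first = judge(question, base_answer, lora_answer)    # base shown in position A
    second = judge(question, lora_answer, base_answer)   # positions swapped
    if first == "A" and second == "B":
        return "base"                                    # same winner in both orderings
    if first == "B" and second == "A":
        return "lora"
    return "tie"                                         # any disagreement counts as a tie
```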
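The planned random-layer control (next step a) could be as simple as drawing the same number of layers from outside the attributed set and training an otherwise identical adapter; a sketch, with the layer count left as a parameter.

```python
# A minimal sketch of selecting layers for the random-layer control.
import random

def control_layers(attributed: list[int], n_layers: int, seed: int = 0) -> list[int]:
    """Pick len(attributed) layers uniformly at random from the non-attributed ones."""
    pool = [i for i in range(n_layers) if i not in set(attributed)]
    rng = random.Random(seed)
    return sorted(rng.sample(pool, k=len(attributed)))
```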
10 May 2026