
Japan
2+ years of experience
I'm an AI/ML developer based in Japan (UTC+9), focused on on-device LLM fine-tuning and deployment for iOS. My main project is TeenEmo, an on-device AI counseling app for teenagers built with Swift/SwiftUI and locally running LLMs. I specialize in post-training pipelines (SFT, DPO, and GRPO reinforcement learning) using TRL and Unsloth, with experience training models in the 0.6B–4B range. I've also contributed to the open-source MLX ecosystem (merged PR to ml-explore/mlx-lm), and I publish models and datasets on Hugging Face under the username YUGOROU.
Tech stack: Python (TRL, Unsloth, Hugging Face), Swift/SwiftUI/CoreML/MLX Swift, llama.cpp.
Compute: Vast.ai, Google Colab Pro, Hugging Face Pro.
I'm particularly interested in domain-specific fine-tuning, reward model design, and efficient on-device inference, and I'm always looking to collaborate with teams pushing the boundaries of open-source LLMs.

Lumi is a domain fine-tuned AI voice companion for dementia and Alzheimer's care. It is designed to handle confusion, repetition, and emotional fragility the way a trained caregiver would, not the way a generic chatbot does.

55 million people live with dementia worldwide, and families cannot provide 24/7 care. Existing AI companions fail them: they reset every session, correct temporal confusion (which is clinically harmful), and leave patients exposed to elder fraud, which costs $3 billion annually. Lumi is the first AI companion purpose-built for this population.

Lumi was fine-tuned on AMD MI300X using QLoRA and GRPO, the reinforcement learning technique behind DeepSeek-R1. The training set consists of 8,540 dementia-specific samples processed through our EQ-Matrix framework, covering scenarios across severity levels, emotional states, and scam patterns.

Persistent memory via ChromaDB injects prior session context into every new conversation, so patients do not have to repeat what they have already shared. A structured output format sends the opening spoken line to TTS before the full response is generated, achieving time-to-first-audio under 1.5 seconds. A binary scam deflection classifier intercepts fraud attempts gently, without alarming the patient.

On EQ-Bench 3, Lumi ranked 7th out of 46 models with a Rubric Score of 14.55, which points to genuine emotional intelligence gains rather than surface fluency.

The entire pipeline runs locally on AMD MI300X with ROCm and vLLM. No data leaves the device. No proprietary APIs. Fully private, fully open hardware.
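
The early-audio step described above, where the opening spoken line goes to TTS before the full response is generated, could be sketched roughly as follows. This is only an illustration under stated assumptions: Lumi's actual output format is not given here, so the sketch assumes the opening line is simply the first newline-terminated line of the token stream. The function name `stream_with_early_speech`, the `speak` callback, and the `delimiter` parameter are all hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple


def stream_with_early_speech(
    tokens: Iterable[str],
    speak: Callable[[str], None],
    delimiter: str = "\n",  # assumed marker ending the opening spoken line
) -> Tuple[str, str]:
    """Hand the opening line to `speak` as soon as it is complete.

    Returns (opening_line, rest_of_response).
    """
    buffer = ""
    opening: Optional[str] = None  # set once the delimiter has been seen
    for token in tokens:
        buffer += token
        if opening is None and delimiter in buffer:
            head, buffer = buffer.split(delimiter, 1)
            opening = head.strip()
            # Fires here, while the remaining tokens are still being generated.
            speak(opening)
    if opening is None:
        # No delimiter was emitted: treat the whole output as the opening line.
        opening, buffer = buffer.strip(), ""
        speak(opening)
    return opening, buffer.strip()


if __name__ == "__main__":
    # Stand-in for a streamed model response; a real TTS call would replace print.
    fake_stream = iter(["I'm ", "here with you.", "\nLet's look ", "at the garden photos."])
    opening, rest = stream_with_early_speech(fake_stream, print)
```

In a setup like this, `tokens` would be whatever streaming iterator the inference server exposes and `speak` would enqueue text for the TTS engine, so time-to-first-audio depends on the length of the opening line rather than on the full response.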
10 May 2026