Building a physics-grounded RL training pipeline for datacenter operations: from custom environment to teacher-distilled SFT to live-simulation GRPO.
Student
DC-Ops teaches a 7B model to run a datacenter via physics-grounded RL. A reasoning teacher model produces the distilled SFT data, then GRPO trains the 7B model against a live RC thermal and power simulation. It is built on Meta OpenEnv, exposes a natural-language (NL) command interface, and runs on a single AMD MI300X.
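
The description does not spell out the RC thermal model, so the following is only a minimal sketch of what one simulation step could look like, assuming a single lumped node governed by C * dT/dt = P - (T - T_amb) / R and integrated with explicit Euler. The names `RCNode`, `step`, and `steady_state`, and all parameter values, are hypothetical and not taken from DC-Ops.

```python
from dataclasses import dataclass


@dataclass
class RCNode:
    """One lumped thermal node (hypothetical): resistance R to ambient, capacitance C."""
    r_thermal: float  # K/W, thermal resistance to the cooling air (assumed value)
    c_thermal: float  # J/K, thermal capacitance of the node (assumed value)
    temp_c: float     # current node temperature in degrees Celsius


def step(node: RCNode, power_w: float, ambient_c: float, dt_s: float) -> float:
    """Advance one explicit-Euler step of C * dT/dt = P - (T - T_amb) / R."""
    heat_out_w = (node.temp_c - ambient_c) / node.r_thermal
    node.temp_c += dt_s * (power_w - heat_out_w) / node.c_thermal
    return node.temp_c


def steady_state(power_w: float, ambient_c: float, r_thermal: float) -> float:
    """Temperature at which heat removed equals heat dissipated."""
    return ambient_c + power_w * r_thermal


if __name__ == "__main__":
    # Time constant R * C = 100 s, so a 1 s step keeps explicit Euler stable.
    rack = RCNode(r_thermal=0.002, c_thermal=50_000.0, temp_c=25.0)
    for _ in range(3600):
        step(rack, power_w=10_000.0, ambient_c=25.0, dt_s=1.0)
    print(round(rack.temp_c, 2), steady_state(10_000.0, 25.0, 0.002))
```

In a setup like this, an RL environment would call `step` once per control interval and derive the reward from the resulting temperatures and power draw. A real datacenter model would likely chain many such nodes (racks, aisles, chillers) rather than use a single one.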