A-JEPA inference pipeline that encodes speech into acoustic latents, predicts 80-band log-mel spectrograms, and can reconstruct waveforms with HiFi-GAN on AMD MI300X.
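For readers unfamiliar with the prediction target, the following is a minimal NumPy sketch of how an 80-band log-mel spectrogram can be computed from a waveform. It is not this project's code: the function names (`mel_filterbank`, `log_mel_spectrogram`) are hypothetical, and the sample rate, FFT size, hop length, and frequency range are assumed values of the kind commonly paired with HiFi-GAN vocoders, not settings confirmed by the description above.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel variants exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=np.float64) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=np.float64) / 2595.0) - 1.0)


def mel_filterbank(sr=22050, n_fft=1024, n_mels=80, fmin=0.0, fmax=8000.0):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    # n_mels + 2 edge frequencies, equally spaced on the mel scale.
    edges_hz = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, center, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(wave, sr=22050, n_fft=1024, hop=256, n_mels=80):
    """Return a log-mel spectrogram of shape (n_mels, n_frames)."""
    wave = np.asarray(wave, dtype=np.float64)
    # Center the frames by reflect-padding half a window on each side.
    pad = n_fft // 2
    wave = np.pad(wave, (pad, pad), mode="reflect")
    n_frames = 1 + (len(wave) - n_fft) // hop
    # Slice the signal into overlapping Hann-windowed frames.
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = wave[idx] * np.hanning(n_fft)
    # Magnitude spectrum per frame, then project onto the mel filters.
    mag = np.abs(np.fft.rfft(frames, axis=1))
    mel = mag @ mel_filterbank(sr, n_fft, n_mels).T
    # Clip before the log so silent frames stay finite.
    return np.log(np.clip(mel, 1e-5, None)).T


if __name__ == "__main__":
    # One second of a 440 Hz tone at the assumed 22050 Hz sample rate.
    t = np.arange(22050) / 22050.0
    spec = log_mel_spectrogram(np.sin(2.0 * np.pi * 440.0 * t))
    print(spec.shape)
```

With these assumed settings, one second of audio yields an array with 80 rows (mel bands) and one column per 256-sample hop. A vocoder such as HiFi-GAN consumes a representation of this kind to synthesize a waveform, so the analysis parameters used here would need to match whatever the vocoder was trained with.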
Learning