article / WayDigital / Essays
On-policy distillation: teaching a model inside its own mess
On-policy distillation: teaching a model inside its own mess Large models have a strange training problem. During training, the world is clean. At deployment, the world is whatever the model just wrote. The dataset does...