What is Distillation?

Distillation

Deep Learning

Knowledge distillation is a model compression technique in which a smaller student model is trained to replicate the behavior of a larger teacher model. This makes it possible to deploy capable models in resource-constrained environments where the teacher itself would be too large or too slow to run.

Understanding Distillation

Distillation, or knowledge distillation, is a model compression technique in which a smaller "student" model is trained to replicate the behavior of a larger, more capable "teacher" model. Rather than learning solely from hard labels, the student also learns from the teacher's soft targets: the full probability distribution the teacher assigns across classes, often softened with a temperature parameter. These soft targets capture relationships between classes that the teacher has learned, such as which incorrect answers are nearly right, information that one-hot labels discard. The approach is widely used to deploy deep learning models on resource-constrained edge devices, such as mobile phones and IoT sensors, where a full-sized foundation model would be impractical. Distillation has also been applied to compress large language models into smaller variants that retain much of the original performance; DistilBERT and DistilGPT2, for example, are distilled versions of BERT and GPT-2. The technique can be combined with other optimization methods, such as quantization and pruning, to further reduce model size and inference latency.
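The soft-target idea can be sketched in plain Python. This minimal example assumes the formulation popularized by Hinton et al. (2015): both models' logits are divided by a temperature T before the softmax, the student is penalized by the KL divergence from the teacher's softened distribution (scaled by T²), and that term is mixed with ordinary cross-entropy on the true label through a weight alpha. The function names and the defaults T = 4 and alpha = 0.5 are illustrative assumptions, not values prescribed by this entry; real training would compute this loss over batches in a deep learning framework and backpropagate it through the student only.

```python
# Minimal sketch of a Hinton-style distillation loss for a single example.
# Function names and default hyperparameters are illustrative assumptions.
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; higher temperatures give softer distributions."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    true_label: int,
    temperature: float = 4.0,
    alpha: float = 0.5,
) -> float:
    """Weighted sum of a soft-target term and a hard-label term."""
    # Soft targets: push the student's softened distribution toward the teacher's.
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    # Scale by T^2 so the soft term's gradients stay comparable as T changes.
    soft_loss = kl_divergence(p_teacher, p_student_soft) * temperature ** 2
    # Hard targets: ordinary cross-entropy against the ground-truth label at T = 1.
    p_student = softmax(student_logits)
    hard_loss = -math.log(p_student[true_label])
    return alpha * soft_loss + (1 - alpha) * hard_loss


if __name__ == "__main__":
    teacher = [8.0, 2.0, 1.0]  # confident teacher logits for a 3-class example
    student = [3.0, 1.0, 0.5]  # less confident student logits
    print(softmax(teacher, temperature=1.0))  # sharply peaked on class 0
    print(softmax(teacher, temperature=4.0))  # softer: classes 1 and 2 gain mass
    print(distillation_loss(student, teacher, true_label=0))
```

Raising the temperature flattens the teacher's distribution, so the small probabilities it assigns to incorrect classes carry more weight in the loss; the T² factor keeps the soft term's gradient magnitude roughly comparable across temperatures.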


Related in Deep Learning

Activation Function

Adam Optimizer

Adapter Layers

Attention Mechanism

Autoencoder

Backpropagation

Batch Normalization

Batch Size