What is Knowledge Distillation?

Deep Learning

Knowledge Distillation

Knowledge distillation is a technique where a smaller model (student) is trained to mimic the outputs of a larger model (teacher). This transfers the teacher's knowledge into a more efficient model suitable for deployment.

Understanding Knowledge Distillation

Knowledge distillation is a model compression technique in which a smaller "student" model is trained to replicate the behavior of a larger, more capable "teacher" model. Rather than training on hard labels alone, the student also learns from the teacher's soft probability distributions (typically softmax outputs softened with a temperature parameter), which carry richer information about how classes relate to one another and how uncertain the teacher is. For example, a teacher that assigns 0.90 to "cat", 0.09 to "dog", and 0.01 to "truck" tells the student that cats resemble dogs far more than trucks, something a one-hot label cannot express. The result is a compact model suited to edge devices, mobile phones, and latency-sensitive applications where a large language model or other large neural network would be impractical. Distillation has been used to build efficient versions of BERT (such as DistilBERT), GPT, and vision models that retain much of the teacher's accuracy at a fraction of the computational cost. It is commonly used in MLOps pipelines that must balance inference speed, model size, and accuracy in production.
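As a rough illustration of learning from soft distributions rather than hard labels alone, the sketch below computes a commonly used distillation loss for a single example: a weighted sum of the KL divergence between temperature-softened teacher and student distributions and the ordinary cross-entropy on the true label. The function names, the `temperature` and `alpha` parameters, and their default values are assumptions chosen for illustration, not part of any particular library.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Hypothetical helper: blend of soft-target loss and hard-label loss.

    alpha weights the soft (teacher-matching) term; 1 - alpha weights the
    cross-entropy on the true label. Both defaults are illustrative only.
    """
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # Scaling by temperature^2 keeps the soft term's gradient magnitude
    # comparable across temperatures (a common convention).
    soft_loss = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    # Hard-label cross-entropy uses the unsoftened student distribution.
    hard_loss = -math.log(softmax(student_logits)[true_label])
    return alpha * soft_loss + (1 - alpha) * hard_loss

teacher = [4.0, 1.0, 0.5]          # made-up teacher logits for three classes
agreeing_student = [3.5, 1.2, 0.4]
disagreeing_student = [0.5, 1.0, 4.0]
print(distillation_loss(agreeing_student, teacher, true_label=0))
print(distillation_loss(disagreeing_student, teacher, true_label=0))
```

A student whose logits track the teacher's receives a lower loss than one that ranks the classes differently. In practice this quantity would be averaged over a batch and minimized with a deep learning framework's optimizer; the pure-Python version here only shows the arithmetic.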

Related Deep Learning Terms

Activation Function

Adam Optimizer

Adapter Layers

Attention Mechanism

Autoencoder

Backpropagation

Batch Normalization

Batch Size