NLP-KG

Field of Study:

Knowledge Distillation

Knowledge Distillation (KD) is a technique in which a smaller, simpler model (the student) is trained to mimic the behavior of a larger, more complex model (the teacher), with the aim of transferring the teacher's knowledge to the student. This is particularly useful in NLP, where deploying large models is computationally expensive. Because it is smaller, the student model is cheaper to run while retaining much of the teacher's performance.
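As a rough illustration of this idea, the sketch below shows response-based distillation with temperature-scaled soft targets: the student is trained on a weighted sum of a KL-divergence loss against the teacher's softened outputs and a standard cross-entropy loss on the ground-truth labels. The toy teacher/student architectures, the temperature, and the weight alpha are illustrative assumptions, not details taken from this page.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch of response-based knowledge distillation.
# Architectures and hyperparameters below are illustrative assumptions.
temperature = 2.0   # softens the logit distributions
alpha = 0.5         # weight between distillation loss and hard-label loss

teacher = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
student = nn.Sequential(nn.Linear(128, 32), nn.ReLU(), nn.Linear(32, 10))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

def distillation_step(x, labels):
    with torch.no_grad():            # the teacher is frozen
        teacher_logits = teacher(x)
    student_logits = student(x)

    # KL divergence between temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Standard cross-entropy on the ground-truth labels
    hard_loss = F.cross_entropy(student_logits, labels)

    loss = alpha * soft_loss + (1 - alpha) * hard_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: random features and labels stand in for real NLP data.
x = torch.randn(16, 128)
labels = torch.randint(0, 10, (16,))
print(distillation_step(x, labels))
```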

Synonyms:

KD

Papers published in this field over the years:

Hierarchy


Publications for Knowledge Distillation

Researchers for Knowledge Distillation
