Bayesian distillation of deep learning models (Q2069701)
From MaRDI portal
scientific article; zbMATH DE number 7461105
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | Bayesian distillation of deep learning models | scientific article; zbMATH DE number 7461105 | |
Statements
Bayesian distillation of deep learning models (English)
0 references
21 January 2022
0 references
The authors present a Bayesian approach to knowledge distillation in teacher-student networks. Knowledge distillation was first proposed by \textit{G. Hinton} et al. in their paper [``Distilling the knowledge in a neural network'', Preprint, \url{arXiv:1503.02531}]: a large teacher network is trained on the ground-truth labels, and a smaller student model is then trained on the teacher's outputs, used as ``soft targets''. The present work extends this teacher-student framework. The authors argue that the parameters of the student network can be initialized from the teacher network. Since the teacher network is usually larger than the student network, a meaningful initialization requires pruning the teacher network so that it has the same architecture as the student network. Under the assumption that the posterior of the teacher network's parameters is Gaussian, the authors prove that the posterior of the pruned teacher network is also Gaussian. A schematic code sketch of this setup, under simplified assumptions, is given below.
0 references
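The following is a minimal sketch, not the authors' implementation: it assumes PyTorch, a small fully connected teacher (784-256-10) and student (784-64-10), and substitutes a simple magnitude-based criterion for the paper's Bayesian (Gaussian-posterior) pruning when choosing which teacher units to copy into the student. The temperature, mixing weight, and synthetic batch are placeholders.

```python
# Hedged sketch: distillation with the student initialized from a pruned teacher.
# The widths, hyperparameters, and the magnitude-based pruning criterion are
# illustrative assumptions, not the paper's Bayesian selection rule.
import torch
import torch.nn.functional as F


def prune_to_width(weight: torch.Tensor, keep: int):
    """Keep the `keep` output units of `weight` with the largest L2 norm
    (a stand-in for the Bayesian relevance criterion in the paper)."""
    norms = weight.norm(dim=1)
    idx = torch.topk(norms, keep).indices.sort().values
    return weight[idx], idx


# Teacher: 784 -> 256 -> 10; student: 784 -> 64 -> 10 (assumed architectures).
teacher = torch.nn.Sequential(
    torch.nn.Linear(784, 256), torch.nn.ReLU(), torch.nn.Linear(256, 10)
)
student = torch.nn.Sequential(
    torch.nn.Linear(784, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10)
)

# Initialize the student from a pruned copy of the teacher: select 64 of the
# teacher's 256 hidden units and copy the corresponding rows/columns.
with torch.no_grad():
    w1, idx = prune_to_width(teacher[0].weight, keep=64)
    student[0].weight.copy_(w1)
    student[0].bias.copy_(teacher[0].bias[idx])
    student[2].weight.copy_(teacher[2].weight[:, idx])
    student[2].bias.copy_(teacher[2].bias)

# One distillation step: the student matches the teacher's softened outputs
# ("soft targets") in addition to the ground-truth labels.
T, alpha = 4.0, 0.7                    # temperature and mixing weight (assumed)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

x = torch.randn(32, 784)               # stand-in input batch
y = torch.randint(0, 10, (32,))        # stand-in hard labels

with torch.no_grad():
    soft_targets = F.softmax(teacher(x) / T, dim=1)

logits = student(x)
loss_soft = F.kl_div(F.log_softmax(logits / T, dim=1), soft_targets,
                     reduction="batchmean") * (T * T)
loss_hard = F.cross_entropy(logits, y)
loss = alpha * loss_soft + (1 - alpha) * loss_hard
loss.backward()
opt.step()
```

Scaling the soft-target term by $T^2$ follows the convention of Hinton et al.; in the paper, the copied parameters would instead come from the Gaussian approximation of the pruned teacher's posterior rather than from a magnitude-based selection.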
deep learning
0 references
Bayesian methods
0 references
knowledge distillation
0 references
model selection
0 references
Bayesian inference
0 references