Bayesian distillation of deep learning models (Q2069701)

From MaRDI portal

scientific article; zbMATH DE number 7461105

    Statements

    Bayesian distillation of deep learning models (English)
    21 January 2022
    The authors present a Bayesian approach to knowledge distillation in teacher-student networks. Knowledge distillation was first proposed by \textit{G. Hinton} et al. [``Distilling the knowledge in a neural network'', Preprint, \url{arXiv:1503.02531}], who suggested training a large network on the ground-truth labels as the teacher and then training a smaller student model on the teacher's outputs, used as ``soft targets''. The present work extends this teacher-student framework: the authors argue that the parameters of the student network can be initialized from those of the teacher. Since the teacher network is usually larger than the student, they propose to prune the teacher so that it matches the student's architecture; the pruned weights then serve as the student's initialization. Under the assumption that the posterior distribution of the teacher's parameters is Gaussian, the authors prove that the posterior of the pruned teacher network is also Gaussian.
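    To make the procedure concrete, here is a minimal sketch (assuming a PyTorch implementation, not the authors' code) of the pipeline described above for a two-layer perceptron: the teacher's hidden layer is pruned to the student's width, the surviving weights initialize the student, and the student is then trained on the teacher's soft targets. The magnitude-based pruning rule and the names prune_mlp, soft_target_loss and T are illustrative assumptions; the paper itself selects which parameters to keep with a Bayesian criterion.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def prune_mlp(teacher: nn.Sequential, hidden_keep: int) -> nn.Sequential:
    """Build a student from a two-layer teacher (Linear -> ReLU -> Linear) by
    keeping the `hidden_keep` hidden units with the largest incoming weight-row
    norms and copying their weights. Magnitude pruning is an assumed stand-in
    for the paper's Bayesian selection criterion."""
    fc1, fc2 = teacher[0], teacher[2]
    keep = fc1.weight.norm(dim=1).topk(hidden_keep).indices
    student_fc1 = nn.Linear(fc1.in_features, hidden_keep)
    student_fc2 = nn.Linear(hidden_keep, fc2.out_features)
    with torch.no_grad():
        student_fc1.weight.copy_(fc1.weight[keep])     # rows of the kept hidden units
        student_fc1.bias.copy_(fc1.bias[keep])
        student_fc2.weight.copy_(fc2.weight[:, keep])  # matching input columns
        student_fc2.bias.copy_(fc2.bias)
    return nn.Sequential(student_fc1, nn.ReLU(), student_fc2)


def soft_target_loss(student_logits, teacher_logits, T: float = 2.0):
    """Soft-target loss of Hinton et al.: KL divergence between the
    temperature-softened teacher and student output distributions."""
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T


# Example: distill a 784-512-10 teacher into a 784-64-10 student.
teacher = nn.Sequential(nn.Linear(784, 512), nn.ReLU(), nn.Linear(512, 10))
student = prune_mlp(teacher, hidden_keep=64)
x = torch.randn(32, 784)
loss = soft_target_loss(student(x), teacher(x).detach())
loss.backward()
```

    The Gaussian statement is consistent with the elementary fact that any marginal of a multivariate Gaussian is again Gaussian, so restricting a Gaussian posterior over the teacher's weights to the retained coordinates again yields a Gaussian posterior for the pruned network.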
    deep learning
    Bayesian methods
    knowledge distillation
    model selection
    Bayesian inference

    Identifiers