This repository is for testing knowledge distillation, in particular the MNIST experiments described in [1]: Hinton et al., "Distilling the Knowledge in a Neural Network", NIPS 2014 Deep Learning Workshop. Details of the model structures and training hyper-parameters are given in another paper [2]: Hinton et al., "Improving neural networks by preventing co-adaptation of feature detectors", https://arxiv.org/abs/1207.0580
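For context, the distillation objective from [1] combines a temperature-softened soft-target term with ordinary cross-entropy on the hard labels. The snippet below is only a minimal PyTorch sketch of that objective; the temperature T, the weight alpha, and the function name are illustrative assumptions and may differ from what train_student_distill.py actually does.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Soft-target term: KL divergence between the temperature-softened teacher
    # and student distributions, scaled by T^2 so its gradient magnitude stays
    # comparable to the hard-label term (as described in [1]).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```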
I also wrote a Medium post about this project: Knowledge Distillation for Object Detection 1: Start from simple classification model
scheduler.py is copied from timesler's lr-momentum-scheduler repository (https://github.com/timesler/lr-momentum-scheduler).
- PyTorch (1.5)
- NumPy
- matplotlib
docker pull poperson1205/knowledge_distillation
python train_teacher.py
python train_student.py
python train_student_distill.py
python evaluate.py
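The results below are the number of misclassified images out of the 10,000 MNIST test examples. A rough sketch of such an evaluation loop is shown here, assuming torchvision is used to load MNIST; the helper name and the data directory are illustrative assumptions, not necessarily what evaluate.py does.

```python
import torch
from torchvision import datasets, transforms

def count_test_errors(model, device="cpu", batch_size=1000):
    # Load the 10,000-image MNIST test split (the "./data" path is an assumption).
    test_loader = torch.utils.data.DataLoader(
        datasets.MNIST("./data", train=False, download=True,
                       transform=transforms.ToTensor()),
        batch_size=batch_size,
    )
    model.to(device).eval()
    errors = 0
    with torch.no_grad():
        for images, labels in test_loader:
            # Any flattening of the 1x28x28 images is assumed to happen
            # inside the model's forward pass.
            preds = model(images.to(device)).argmax(dim=1)
            errors += (preds != labels.to(device)).sum().item()
    return errors  # e.g. 111 out of 10000 for the distilled student reported below
```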
- Teacher: 100 errors / 10000 test images (1.00%)
- Student: 171 errors / 10000 test images (1.71%)
- Student with KD: 111 errors / 10000 test images (1.11%)
Although the teacher model (100 errors) does not reach the result reported in the original paper (74 errors), the benefit of knowledge distillation is still clear when comparing the vanilla student model (171 errors) with the distilled student model (111 errors).
[1] Hinton et al. "Distilling the Knowledge in a Neural Network". NIPS 2014 Deep Learning Workshop. https://arxiv.org/abs/1503.02531
[2] Hinton et al. "Improving neural networks by preventing co-adaptation of feature detectors". https://arxiv.org/abs/1207.0580
[3] Presentation slides for paper [1]: https://www.ttic.edu/dl/dark14.pdf