Abstract
Knowledge distillation essentially maximizes the mutual information between teacher and student networks. Typically, a variational distribution is introduced to maximize the variational lower bound. However, the heteroscedastic noise derived from this distribution is often unstable, leading to unreliable data-uncertainty modeling. Our analysis identifies bias-variance coupling in knowledge distillation as the cause of this instability. We thus propose the Bias-variance dEcomposition kNowledge dIstillatioN (BENIN) approach. First, we use bias-variance decomposition to decouple the two components. We then design a lightweight Feature Frequency Expectation Estimation Module (FF-EEM) to estimate the expectation of the student's predictions, from which the bias and variance are computed. Learning the variance measures the data uncertainty in the teacher's predictions, and a balance factor addresses the bias-variance dilemma. Finally, the bias-variance decomposition distillation loss enables the student to learn valuable knowledge while suppressing noise. Experiments on the Synapse and LiTS17 medical-image-segmentation datasets validate BENIN's effectiveness. FF-EEM also mitigates the high-frequency noise introduced by high mask rates, enhancing data-uncertainty estimation and visualization. Our code is available at https://github.com/duanzhongjian/BENIN.
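The abstract does not state the loss in closed form; the sketch below is only a minimal illustration of what a bias-variance decomposed distillation objective could look like in PyTorch. The function name, the externally supplied `student_expectation` tensor (standing in for the FF-EEM output), and the balance factor `beta` are assumptions for illustration, not the authors' implementation (see the linked repository for that).

```python
import torch
import torch.nn.functional as F

def bv_distillation_loss(student_logits, teacher_logits, student_expectation, beta=0.5):
    """Illustrative bias-variance decomposed distillation loss (sketch).

    student_expectation: an estimate of the student's expected prediction,
    assumed to come from an expectation-estimation module; here it is simply
    a tensor with the same shape as the logits.
    beta: balance factor trading off the bias and variance terms.
    """
    p_teacher = F.softmax(teacher_logits, dim=1)
    p_student = F.softmax(student_logits, dim=1)
    p_expect = F.softmax(student_expectation, dim=1)

    # Bias term: squared gap between the student's expected prediction
    # and the teacher's prediction.
    bias = (p_expect - p_teacher).pow(2).mean()

    # Variance term: spread of the student's prediction around its own
    # expectation, used here as a proxy for data uncertainty.
    variance = (p_student - p_expect).pow(2).mean()

    # Balance factor weighs the two terms of the decomposition.
    return beta * bias + (1.0 - beta) * variance
```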
Original language | English |
---|---|
Article number | 130230 |
Journal | Neurocomputing |
Volume | 638 |
Early online date | 12 Apr 2025 |
DOIs | |
Publication status | E-pub ahead of print - 12 Apr 2025 |
Keywords
- Bias-variance decomposition
- Data uncertainty
- Feature frequency expectation estimation
- Knowledge distillation
- Maximizing mutual information
ASJC Scopus subject areas
- Computer Science Applications
- Cognitive Neuroscience
- Artificial Intelligence