TY - JOUR
T1 - Industrial Cyber-Physical Systems-Based Cloud IoT Edge for Federated Heterogeneous Distillation
AU - Wang, Chengjia
AU - Yang, Guang
AU - Papanastasiou, Giorgos
AU - Zhang, Heye
AU - Rodrigues, Joel J. P. C.
AU - De Albuquerque, Victor Hugo C.
N1 - Funding Information:
Manuscript received April 7, 2020; revised June 12, 2020; accepted June 30, 2020. Date of publication July 7, 2020; date of current version May 3, 2021. This work was supported by FCT/MCTES through national funds and when applicable co-funded EU funds under the Project UIDB/EEA/50008/2020, in part by the Brazilian National Council for Scientific and Technological Development (CNPq) under Grant 309335/2017-5, Grant 304315/2017-6, and Grant 430274/2018-1, in part by the Key Program for International Cooperation Projects of Guangdong Province under Grant 2018A050506031, in part by the Key-Area Research and Development Program of Guangdong Province under Grant 2019B010110001, in part by the Guangdong Natural Science Funds for Distinguished Young Scholar under Grant 2019B151502031, and in part by the Fundamental Research Funds for the Central Universities under Grant 19LGZD36. Paper no. TII-20-1728. (Corresponding author: Heye Zhang.)
N1 - Author Affiliations:
Chengjia Wang is with the BHF Centre for Cardiovascular Science, The University of Edinburgh, EH16 4TJ Edinburgh, U.K. (e-mail: [email protected]). Guang Yang is with the Cardiovascular Research Centre, Royal Brompton Hospital and National Heart and Lung Institute, Imperial College London, SW7 2AZ London, U.K. (e-mail: [email protected]). Giorgos Papanastasiou is with the Edinburgh Imaging Facility QMRI, The University of Edinburgh, EH16 4TJ Edinburgh, U.K. (e-mail: [email protected]). Heye Zhang is with the School of Biomedical Engineering, Sun Yat-Sen University, Guangzhou, China (e-mail: [email protected]). Joel J. P. C. Rodrigues is with the Federal University of Piauí (UFPI), Teresina 64049-550, Brazil, and also with the Instituto de Telecomunicações, 1049-001 Lisboa, Portugal (e-mail: [email protected]). Victor Hugo C. de Albuquerque is with the University of Fortaleza, Fortaleza 60811-905, Brazil (e-mail: [email protected]).
Publisher Copyright:
© 2021 IEEE.
PY - 2021/8
Y1 - 2021/8
N2 - Deep convolutional networks have been widely deployed in modern cyber-physical systems to perform different visual classification tasks. Because fog and edge devices have different computing capacities and perform different subtasks, models trained for one device may not be deployable on another. Knowledge distillation can effectively compress well-trained convolutional neural networks into light-weight models suitable for different devices. However, owing to privacy issues and transmission costs, the manually annotated data used to train deep learning models are usually collected gradually and archived at different sites. Simply training a model on a powerful cloud server and compressing it for particular edge devices fails to use the distributed data stored at different sites. This offline training approach is also inefficient at dealing with new data collected from the edge devices. To overcome these obstacles, in this article, we propose the heterogeneous brain storming (HBS) method for object recognition tasks in real-world Internet of Things (IoT) scenarios. Our method enables flexible bidirectional federated learning of heterogeneous models trained on distributed datasets through a new 'brain storming' mechanism and optimizable temperature parameters. In our comparison experiments, the HBS method outperformed multiple state-of-the-art single-model compression methods, as well as the newest multinetwork knowledge distillation methods, with both homogeneous and heterogeneous classifiers. The ablation experiment results proved that introducing a trainable temperature parameter into the conventional knowledge distillation loss can effectively ease the learning process of student networks across different methods. To the best of the authors' knowledge, this is the first IoT-oriented method that allows asynchronous bidirectional heterogeneous knowledge distillation in deep networks.
AB - Deep convolutional networks have been widely deployed in modern cyber-physical systems to perform different visual classification tasks. Because fog and edge devices have different computing capacities and perform different subtasks, models trained for one device may not be deployable on another. Knowledge distillation can effectively compress well-trained convolutional neural networks into light-weight models suitable for different devices. However, owing to privacy issues and transmission costs, the manually annotated data used to train deep learning models are usually collected gradually and archived at different sites. Simply training a model on a powerful cloud server and compressing it for particular edge devices fails to use the distributed data stored at different sites. This offline training approach is also inefficient at dealing with new data collected from the edge devices. To overcome these obstacles, in this article, we propose the heterogeneous brain storming (HBS) method for object recognition tasks in real-world Internet of Things (IoT) scenarios. Our method enables flexible bidirectional federated learning of heterogeneous models trained on distributed datasets through a new 'brain storming' mechanism and optimizable temperature parameters. In our comparison experiments, the HBS method outperformed multiple state-of-the-art single-model compression methods, as well as the newest multinetwork knowledge distillation methods, with both homogeneous and heterogeneous classifiers. The ablation experiment results proved that introducing a trainable temperature parameter into the conventional knowledge distillation loss can effectively ease the learning process of student networks across different methods. To the best of the authors' knowledge, this is the first IoT-oriented method that allows asynchronous bidirectional heterogeneous knowledge distillation in deep networks.
KW - Deep learning
KW - heterogeneous classifiers
KW - Internet of Things (IoT)
KW - knowledge distillation (KD)
KW - online learning
UR - http://www.scopus.com/inward/record.url?scp=85105582558&partnerID=8YFLogxK
U2 - 10.1109/TII.2020.3007407
DO - 10.1109/TII.2020.3007407
M3 - Article
AN - SCOPUS:85105582558
SN - 1551-3203
VL - 17
SP - 5511
EP - 5521
JO - IEEE Transactions on Industrial Informatics
JF - IEEE Transactions on Industrial Informatics
IS - 8
ER -