Optimising Hardware Accelerated Neural Networks with Quantisation and a Knowledge Distillation Evolutionary Algorithm

Robert James Stewart, Andrew Nowlan, Pascal Bacchus, Quentin Ducasse, Ekaterina Komendantskaya

Research output: Contribution to journal › Article › peer-review

Abstract

This paper compares the latency, accuracy, training time and hardware costs of neural networks compressed with our new multi-objective evolutionary algorithm, NEMOKD, and with quantisation. We evaluate NEMOKD on Intel's Movidius Myriad X VPU, and quantisation on Xilinx's programmable Z7020 FPGA. Evolving models with NEMOKD increases inference accuracy by up to 82% at the cost of a 38% increase in latency, with throughput of 100–590 image frames per second (FPS). Quantisation identifies a sweet spot of 3-bit precision in the trade-off between latency, hardware requirements, training time and accuracy. Parallelising FPGA implementations of 2- and 3-bit quantised neural networks increases throughput from 6k FPS to 373k FPS, a 62x speedup.
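
The throughput figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names `speedup` and `seconds_per_frame` are hypothetical, the inputs are the rounded values from the abstract (6k and 373k FPS), and the average time per frame it reports assumes steady throughput, which is not necessarily the per-inference latency of a parallel or pipelined design.

```python
def speedup(baseline_fps: float, improved_fps: float) -> float:
    """Ratio of improved throughput to baseline throughput."""
    if baseline_fps <= 0:
        raise ValueError("baseline_fps must be positive")
    return improved_fps / baseline_fps


def seconds_per_frame(fps: float) -> float:
    """Average time per frame at a steady throughput (an assumption,
    not the same thing as end-to-end inference latency)."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1.0 / fps


# Rounded figures as quoted in the abstract.
baseline_fps = 6_000
parallel_fps = 373_000

ratio = speedup(baseline_fps, parallel_fps)
print(f"speedup: {ratio:.1f}x")
print(f"average time per frame: {seconds_per_frame(parallel_fps) * 1e6:.2f} microseconds")
```

With these rounded inputs the ratio is a little over 62, consistent with the 62x figure reported above.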
Original language: English
Article number: 396
Journal: Electronics
Volume: 10
Issue number: 4
Publication status: Published - 5 Feb 2021

Keywords

  • Evolutionary algorithm
  • FPGA
  • Movidius VPU
  • Neural network
  • Quantisation

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Signal Processing
  • Hardware and Architecture
  • Computer Networks and Communications
  • Electrical and Electronic Engineering
