Abstract
With the shift of deep learning applications to edge computing devices, compression techniques have been introduced to minimize hardware usage, power consumption, and latency. For example, quantization uses low numeric precision to represent inputs, parameters, and activations. Transposed Convolutions (TCONVs) provide neural networks with image up-sampling capabilities. However, the accuracy/performance trade-off of TCONV layers is under-explored: existing works evaluate precisions down to 8 bits, but not below. This research systematically evaluates the impact of very low precision when a two-layer quantized decoder, built from TCONVs, is implemented within an FPGA-based System-on-Chip (SoC) architecture. We evaluate the impact of quantization on throughput and hardware costs, as well as the impact of parallelizing the computations of the TCONV layers, using the same metrics. Results show that, when 4-bit data are processed, the circuit implemented on a Xilinx Zynq-7020 SoC uses only ~15% of the logic and ~7.5% of the on-chip memories, at the expense of a negligible ~2.5% accuracy loss with respect to the 8-bit counterpart. Furthermore, a 3.5× speed-up is observed when inputs are processed with 4× parallelism.
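As a rough illustration of the two techniques the abstract combines, the sketch below emulates a 4-bit uniform symmetric quantizer and a naive single-channel stride-2 transposed convolution in NumPy. The function names (`quantize`, `tconv2d`), the quantization scheme, and the toy shapes are illustrative assumptions, not the paper's FPGA datapath.

```python
import numpy as np

def quantize(x, bits):
    """Uniform symmetric quantization to signed `bits`-bit codes (held in int8).

    Returns the integer codes and the scale needed to dequantize.
    """
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for 4-bit signed
    peak = np.max(np.abs(x))
    scale = peak / qmax if peak > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def tconv2d(x, w, stride=2):
    """Naive transposed convolution: each input pixel scatters a weighted
    copy of the kernel into a larger output (up-sampling)."""
    h_in, w_in = x.shape
    k = w.shape[0]
    y = np.zeros(((h_in - 1) * stride + k, (w_in - 1) * stride + k),
                 dtype=np.int32)          # wide accumulator, as in hardware
    for i in range(h_in):
        for j in range(w_in):
            y[i * stride:i * stride + k,
              j * stride:j * stride + k] += x[i, j].astype(np.int32) * w
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4)).astype(np.float32)   # toy feature map
w = rng.standard_normal((3, 3)).astype(np.float32)   # toy kernel

xq, sx = quantize(x, bits=4)
wq, sw = quantize(w, bits=4)
y = tconv2d(xq, wq.astype(np.int32)) * (sx * sw)     # dequantize the result
print(y.shape)  # (9, 9): the 4x4 input is up-sampled
```

The loop form only mirrors the scatter-style definition of a TCONV; a real implementation would vectorize (or parallelize in hardware) these accumulations and fold the rescaling into the datapath.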
| Original language | English |
| --- | --- |
| Title of host publication | 2022 IEEE International Conference on Pervasive Intelligence and Computing (PiCom) |
| Editors | Giancarlo Fortino, Raffaele Gravina, Antonio Guerrieri, Claudio Savaglio |
| Publisher | IEEE |
| ISBN (Electronic) | 9781665462976 |
| DOIs | |
| Publication status | Published - 13 Dec 2022 |
Keywords
- field programmable gate arrays (FPGAs)
- quantization
- reconfigurable systems-on-chip
- transposed convolution layers
ASJC Scopus subject areas
- Management of Technology and Innovation
- Artificial Intelligence
- Computer Networks and Communications
- Computer Science Applications
- Information Systems
- Information Systems and Management
- Safety, Risk, Reliability and Quality
- Control and Optimization