Abstract
Deep Learning (DL) is pervasive across a wide variety of domains. Convolutional Neural Networks (CNNs) are widely used in DL applications for image processing. Modern CNN models are growing to meet the needs of more sophisticated tasks, e.g. using Transposed Convolutions (TCONVs) for image decompression and image generation. Such state-of-the-art DL models often target GPU-based high-performance architectures, due to the high computational and hardware resource needs of TCONV layers. To avoid prohibitive GPU energy costs, CNNs are increasingly deployed to decentralized embedded autonomous devices, such as those built on Field Programmable Gate Arrays (FPGAs). However, designing efficient hardware implementations of TCONV layers for such devices is challenging. This paper presents a parameterized design and implementation of a new TCONV module that is synthesizable onto FPGAs. It is implemented using High-Level Synthesis (HLS), through a C++ template that parameterizes its functional and non-functional properties. These parameters allow users to vary kernel sizes, image sizes, quantization and parallelism. Through a systematic exploration of this design space, we find an optimal instance of this TCONV module that achieves 6.25 Giga Outputs per Second (Gout/s) using just 1.53 W of power. We then use our TCONV layer in two neural networks, one for image decompression and one for image generation. Image decompression achieves a throughput of more than 30K frames per second (fps) using only 16% of resources on average, while image generation achieves an energy efficiency of 324 fps/W and outperforms comparable state-of-the-art models by at least 7.3×.
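The abstract describes a TCONV module whose kernel size, image size, quantization and parallelism are fixed at compile time through a C++ template. The sketch below illustrates what such a template could look like; it is not the authors' code. It assumes a single channel, an input-oriented (scatter) formulation of transposed convolution, and the standard output size O = (I - 1)·S + K - 2P for input size I, stride S, kernel size K and padding P. The names `TConv`, `PAR`, `S`, `P`, `T` and `Acc` are hypothetical. It is plain C++ rather than HLS code: the loop over `p` marks where an HLS tool would typically be asked to unroll, and `T`/`Acc` stand in for the quantized data and accumulator types.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of a compile-time parameterized, single-channel
// transposed convolution. T models the quantized input/weight type,
// Acc the wider accumulator type, and PAR the number of kernel columns
// handled together in the innermost loop (the loop an HLS flow would unroll).
template <typename T, typename Acc, std::size_t IH, std::size_t IW,
          std::size_t K, std::size_t S, std::size_t P, std::size_t PAR>
struct TConv {
    static_assert(PAR >= 1 && K % PAR == 0, "PAR must divide K");
    static_assert((IH - 1) * S + K > 2 * P, "padding too large (rows)");
    static_assert((IW - 1) * S + K > 2 * P, "padding too large (cols)");

    // Output size: O = (I - 1) * S + K - 2P.
    static constexpr std::size_t OH = (IH - 1) * S + K - 2 * P;
    static constexpr std::size_t OW = (IW - 1) * S + K - 2 * P;

    using Input  = std::array<std::array<T, IW>, IH>;
    using Kernel = std::array<std::array<T, K>, K>;
    using Output = std::array<std::array<Acc, OW>, OH>;

    // Scatter formulation: every input pixel is multiplied by the whole
    // kernel and accumulated into a K x K window of the output, with
    // windows placed S pixels apart; P rows/columns are cropped per side.
    static Output run(const Input& in, const Kernel& w) {
        Output out{};  // value-initialized, i.e. all accumulators start at 0
        for (std::size_t i = 0; i < IH; ++i)
            for (std::size_t j = 0; j < IW; ++j)
                for (std::size_t ki = 0; ki < K; ++ki)
                    for (std::size_t kb = 0; kb < K; kb += PAR)
                        // PAR parallel lanes; in HLS this loop would be unrolled.
                        for (std::size_t p = 0; p < PAR; ++p) {
                            const std::size_t kj = kb + p;
                            // Position in the uncropped output.
                            const std::size_t r = i * S + ki;
                            const std::size_t c = j * S + kj;
                            // Skip positions removed by the padding crop.
                            if (r < P || c < P || r - P >= OH || c - P >= OW)
                                continue;
                            out[r - P][c - P] += static_cast<Acc>(in[i][j]) *
                                                 static_cast<Acc>(w[ki][kj]);
                        }
        return out;
    }
};
```

For example, `TConv<std::int8_t, std::int32_t, 2, 2, 3, 2, 0, 1>` maps a 2×2 8-bit input to a 5×5 output with 32-bit accumulators. Changing `PAR` to 3 changes only the loop schedule, not the result, which is the kind of trade-off (resources against throughput) that a design-space exploration over such parameters would measure.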
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 1245-1263 |
Number of pages | 19 |
Journal | Journal of Signal Processing Systems |
Volume | 95 |
Issue number | 10 |
Early online date | 4 Aug 2023 |
Publication status | Published - Oct 2023 |
Keywords
- Deep Learning
- FPGA
- High-Level Synthesis
- Parallelism
- Quantization
- Transposed Convolution
ASJC Scopus subject areas
- Theoretical Computer Science
- Information Systems
- Signal Processing
- Control and Systems Engineering
- Hardware and Architecture
- Modelling and Simulation