Abstract
Neural networks have proven to be a successful AI approach in many application areas. Some neural network deployments require low inference latency and low power consumption to be useful, e.g. in autonomous vehicles and smart drones. Whilst FPGAs meet these requirements, the hardware resources neural networks need to execute often exceed those available on an FPGA.
Emerging industry-led frameworks aim to solve this problem by compressing the topology and precision of neural networks, reducing the computation and memory required for execution. Compressing neural networks inevitably comes at the cost of reduced inference accuracy.
This paper uses Xilinx's FINN framework to systematically evaluate the trade-off between precision, inference accuracy, training time and hardware resources for 64 quantized neural networks that perform MNIST character recognition.
We identify sweet spots around 3-bit precision in the quantization design space after training for 40 epochs, minimising both hardware resources and accuracy loss. With enough training, 2-bit weights achieve almost the same inference accuracy as 3-8 bit weights.
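The sketch below illustrates the kind of quantization-aware training being evaluated: a small fully connected MNIST network whose weight and activation bit widths (`WBITS`, `ABITS`) are swept to form a design space like the paper's 8x8 grid of 64 networks. It is a minimal, hedged example using plain PyTorch with a straight-through estimator; it is not the paper's actual FINN/Xilinx training flow, and all names (`QuantSTE`, `QuantMLP`, `WBITS`, `ABITS`) are illustrative assumptions.

```python
# Hedged sketch: quantization-aware training of a small MNIST MLP with
# configurable weight/activation bit widths via a straight-through estimator.
# This approximates the design-space sweep described in the abstract; it is
# NOT the paper's FINN flow. Names like WBITS/ABITS are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms

WBITS, ABITS = 3, 3  # one point in an 8x8 (weight bits x activation bits) sweep


class QuantSTE(torch.autograd.Function):
    """Uniform quantizer to 2**bits - 1 levels in [-1, 1]; gradient passes straight through."""

    @staticmethod
    def forward(ctx, x, bits):
        levels = 2 ** bits - 1
        x = torch.clamp(x, -1.0, 1.0)
        return torch.round((x + 1.0) / 2.0 * levels) / levels * 2.0 - 1.0

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None  # straight-through: no gradient w.r.t. bits


class QuantLinear(nn.Linear):
    """Linear layer whose weights are bounded (tanh) and quantized on the forward pass."""

    def __init__(self, in_f, out_f, bits):
        super().__init__(in_f, out_f)
        self.bits = bits

    def forward(self, x):
        w_q = QuantSTE.apply(torch.tanh(self.weight), self.bits)
        return F.linear(x, w_q, self.bias)


class QuantMLP(nn.Module):
    """Small fully connected MNIST topology with quantized weights and activations."""

    def __init__(self, wbits, abits, hidden=256):
        super().__init__()
        self.fc1 = QuantLinear(28 * 28, hidden, wbits)
        self.fc2 = QuantLinear(hidden, hidden, wbits)
        self.out = QuantLinear(hidden, 10, wbits)
        self.abits = abits

    def act(self, x):
        # Quantized activation: clamp to [-1, 1] then quantize to `abits` bits.
        return QuantSTE.apply(torch.clamp(x, -1.0, 1.0), self.abits)

    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = self.act(self.fc1(x))
        x = self.act(self.fc2(x))
        return self.out(x)


def train_one_epoch(model, loader, opt, device):
    model.train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        opt.zero_grad()
        loss = F.cross_entropy(model(images), labels)
        loss.backward()
        opt.step()


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    train_set = datasets.MNIST("data", train=True, download=True,
                               transform=transforms.ToTensor())
    loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
    model = QuantMLP(WBITS, ABITS).to(device)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for epoch in range(40):  # the abstract reports sweet spots after 40 epochs
        train_one_epoch(model, loader, opt, device)
```

Repeating such a run over all weight/activation bit-width pairs, and recording test accuracy, training time and (after compilation to hardware) FPGA resource usage, reproduces the kind of trade-off curves the paper analyses.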
| Original language | English |
| --- | --- |
| Title of host publication | Applied Reconfigurable Computing. Architectures, Tools, and Applications |
| Subtitle of host publication | ARC 2020 |
| Publisher | Springer |
| Pages | 121-135 |
| Number of pages | 15 |
| ISBN (Electronic) | 9783030445348 |
| ISBN (Print) | 9783030445331 |
| DOIs | |
| Publication status | Published - 2020 |
| Event | 16th International Symposium on Applied Reconfigurable Computing 2020, University of Castilla-La Mancha, Toledo, Spain |
| Event duration | 1 Apr 2020 → 3 Apr 2020 |
| Event URL | https://arcoresearch.com/arc2020/ |
Publication series
| Name | Lecture Notes in Computer Science |
| --- | --- |
| Volume | 12083 |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Conference
| Conference | 16th International Symposium on Applied Reconfigurable Computing 2020 |
| --- | --- |
| Abbreviated title | ARC2020 |
| Country/Territory | Spain |
| City | Toledo |
| Period | 1/04/20 → 3/04/20 |
| Internet address | https://arcoresearch.com/arc2020/ |
Keywords
- Deep learning
- FPGA
- Neural networks
- Quantization
ASJC Scopus subject areas
- Theoretical Computer Science
- Computer Science (all)