Algorithm System Modeling of Neural Network Hardware Acceleration System
Abstract
A convolutional neural network (CNN) is a pattern recognition method that combines artificial neural networks with deep learning theory. Parallelism is the foundation and core of a CNN’s overall computing architecture, and a field-programmable gate array (FPGA) can take full advantage of it. The network is implemented with the dedicated multipliers and digital signal processing (DSP) units in the FPGA, which carry out the algorithm’s many multiply and add operations with a high degree of parallelism. A CNN is composed of convolution layers, pooling layers, and other layers with different functions, and the convolution operation often consumes a lot of the time. Having the FPGA take over this part of the computation can effectively speed up both forward propagation and back-propagation.
Python, a scripting language, makes project development easier. The PYNQ-Z1 board from Xilinx Inc. integrates an FPGA with an ARM processor that runs Python, which makes it possible to implement a fully embedded neural network.
In this research, an FPGA serves as the hardware platform, and image classification by the convolutional neural network runs 8.2 times faster than with the software processing method, so the hardware acceleration of the network is well realized. This research also builds the entire neural network system with a Windows PC as the client and the PYNQ-Z1 as the server; both the application program on the Windows PC side and the server program on the PYNQ-Z1 side were implemented. Combining the PYNQ-Z1 neural network with the FPGA parallel accelerator from the same research group yields a complete, working neural network system.
Hardware CNN Accelerator
A traditional convolutional neural network is built from several kinds of layers: convolution layers, pooling layers, activation layers, and fully connected layers. The convolution layer performs a two-dimensional convolution of the image. The pooling layer aggregates features and reduces the amount of computation. The activation layer applies an activation function to its input; a sigmoid, for instance, maps the input to values between 0 and 1. The fully connected layer maps the learned feature maps to the output sample space. The structure of a convolutional neural network is shown in the figure below.
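To make the roles of the four layer types concrete, here is a minimal software sketch in C. It is an illustration only, not the accelerator described in this post: the function names, the single-channel row-major layout, the 2x2 pooling window, and the piecewise-linear `hard_sigmoid` (a stand-in that avoids the exponential of a true sigmoid) are all assumptions, since the post does not specify them.

```c
#include <stddef.h>

/* Convolution layer: valid (no padding), stride-1 2D convolution of an
 * h x w image with a k x k kernel. As in most CNN code, the kernel is
 * not flipped. Output is (h-k+1) x (w-k+1), stored row-major. */
void conv2d_valid(const float *in, size_t h, size_t w,
                  const float *kernel, size_t k, float *out)
{
    size_t oh = h - k + 1, ow = w - k + 1;
    for (size_t r = 0; r < oh; ++r) {
        for (size_t c = 0; c < ow; ++c) {
            float acc = 0.0f;
            for (size_t i = 0; i < k; ++i)
                for (size_t j = 0; j < k; ++j)
                    acc += in[(r + i) * w + (c + j)] * kernel[i * k + j];
            out[r * ow + c] = acc;
        }
    }
}

/* Pooling layer: 2x2 max pooling with stride 2 (h and w assumed even).
 * Output is (h/2) x (w/2), stored row-major. */
void maxpool2x2(const float *in, size_t h, size_t w, float *out)
{
    size_t oh = h / 2, ow = w / 2;
    for (size_t r = 0; r < oh; ++r) {
        for (size_t c = 0; c < ow; ++c) {
            float m = in[(2 * r) * w + 2 * c];
            for (size_t i = 0; i < 2; ++i)
                for (size_t j = 0; j < 2; ++j) {
                    float v = in[(2 * r + i) * w + (2 * c + j)];
                    if (v > m) m = v;
                }
            out[r * ow + c] = m;
        }
    }
}

/* Activation layer (assumption): piecewise-linear approximation of the
 * sigmoid, clamp(0.25 * x + 0.5, 0, 1). It maps any input into [0, 1]. */
float hard_sigmoid(float x)
{
    float y = 0.25f * x + 0.5f;
    if (y < 0.0f) return 0.0f;
    if (y > 1.0f) return 1.0f;
    return y;
}

/* Fully connected layer: out[o] = bias[o] + sum_i weights[o][i] * in[i],
 * with weights stored row-major as n_out rows of n_in values. */
void fully_connected(const float *in, size_t n_in,
                     const float *weights, const float *bias,
                     size_t n_out, float *out)
{
    for (size_t o = 0; o < n_out; ++o) {
        float acc = bias[o];
        for (size_t i = 0; i < n_in; ++i)
            acc += weights[o * n_in + i] * in[i];
        out[o] = acc;
    }
}
```

Almost all of the work in `conv2d_valid` and `fully_connected` is the multiply-accumulate in the innermost loop, which is the part an FPGA's multipliers and DSP units can execute in parallel.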

Traditional FPGA design builds the RTL model of each module by hand. Although a hand-written RTL model can deliver the best performance with the fewest resources, it takes a lot of time to write. Large-scale RTL models are also hard to read and hard to maintain.
Vivado HLS is a high-level synthesis tool for FPGA IP core design. It can create and synthesize IP cores from C, C++, and SystemC, so the RTL model does not have to be written by hand. It also supports the ISE and Vivado design environments. This greatly reduces design time and development difficulty.

Therefore, in this design, we use the Vivado HLS high-level synthesis tool and describe the hardware circuit in C.
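As a rough idea of what C written for high-level synthesis can look like, the sketch below describes one fully connected layer with Vivado HLS pragmas. It is not the design from this project: the function name `fc_layer`, the sizes `N_IN` and `N_OUT`, and the use of plain `int` data (standing in for whatever fixed-point format a real design would choose) are assumptions made for illustration. An ordinary C compiler ignores the pragmas, so the same function can be tested in software first.

```c
#define N_IN  16   /* hypothetical number of inputs  */
#define N_OUT 4    /* hypothetical number of outputs */

/* Hypothetical top-level function for synthesis: one fully connected
 * layer, out[o] = bias[o] + sum_i weights[o][i] * in[i]. */
void fc_layer(int in[N_IN], int weights[N_OUT][N_IN],
              int bias[N_OUT], int out[N_OUT])
{
/* Split the arrays into individual registers so that all N_IN operands
 * of one output can be read in the same clock cycle. */
#pragma HLS ARRAY_PARTITION variable=in complete dim=1
#pragma HLS ARRAY_PARTITION variable=weights complete dim=2

    for (int o = 0; o < N_OUT; ++o) {
/* Ask the tool to start a new output every clock cycle. */
#pragma HLS PIPELINE II=1
        int acc = bias[o];
        for (int i = 0; i < N_IN; ++i) {
/* Fully unroll the loop: N_IN multipliers work in parallel. Pipelining
 * the outer loop already implies this; the pragma makes it explicit. */
#pragma HLS UNROLL
            acc += weights[o][i] * in[i];
        }
        out[o] = acc;
    }
}
```

The loop body is ordinary C; the pragmas only tell the tool how much hardware to spend on it. Removing `UNROLL` and `ARRAY_PARTITION` would trade speed for fewer multipliers without changing the result.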