Reconfigurable Convolutional Neural Network Accelerator Based on Winograd Algorithm

doi:10.16180/j.cnki.issn1007-7820.2022.12.005

Abstract

Neural networks are widely used in pattern recognition, predictive analysis, data fitting, and other fields, and are an important foundation of artificial intelligence. However, the heavy computation of convolution and the large number of network parameters lead to long computation times and high data-access pressure. To address these problems, this study accelerates convolution with the Winograd algorithm and designs an optimized hardware computing structure that improves data reuse efficiency and computational parallelism. Compared with sliding-window convolution, the accelerator increases computational efficiency by 4.352 times. For convolution kernel gradient computation, the accelerator adopts an optimized data distribution method that reduces data movement and meets the data requirements of multiple processing elements (PEs) computing in parallel. Compared with a CPU, performance is improved by 23 times. Experiments show that the accelerator reaches a convolution throughput of 192.55 GFLOPS on the VGG-9 network model, and the recognition rate on the CIFAR-10 data set after training is 76.54%.

Key words: CNN hardware accelerator, Winograd, FPGA, reconfigurable, convolution acceleration, multiplexed parallelism, image identification, VGG network

CLC Number: TN47

YUAN Ziang,NI Wei,RAN Jingnan. Reconfigurable Convolutional Neural Network Accelerator Based on Winograd Algorithm[J].Electronic Science and Technology, 2022, 35(12): 35-42.
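
The Winograd approach mentioned in the abstract can be illustrated with a minimal software sketch. The tile size used by the accelerator is not stated on this page, so the sketch assumes the common F(2×2, 3×3) case with the transform matrices from Lavin and Gray [8]. The function names (`winograd_2x2_3x3`, `direct_2x2_3x3`) are hypothetical and do not describe the hardware design.

```python
# Winograd F(2x2, 3x3): one 2x2 output tile from a 4x4 input tile and a 3x3 kernel.
# Assumption: standard transform matrices from Lavin and Gray [8].

BT = [[1, 0, -1, 0],
      [0, 1, 1, 0],
      [0, -1, 1, 0],
      [0, 1, 0, -1]]          # input transform B^T (4x4)
G = [[1.0, 0.0, 0.0],
     [0.5, 0.5, 0.5],
     [0.5, -0.5, 0.5],
     [0.0, 0.0, 1.0]]         # kernel transform G (4x3)
AT = [[1, 1, 1, 0],
      [0, 1, -1, -1]]         # output transform A^T (2x4)


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def winograd_2x2_3x3(d, g):
    """Y = A^T [ (G g G^T) * (B^T d B) ] A, where * is element-wise."""
    u = matmul(matmul(G, g), transpose(G))      # transformed kernel, 4x4
    v = matmul(matmul(BT, d), transpose(BT))    # transformed input tile, 4x4
    m = [[u[i][j] * v[i][j] for j in range(4)] for i in range(4)]  # 16 multiplications
    return matmul(matmul(AT, m), transpose(AT))  # 2x2 output tile


def direct_2x2_3x3(d, g):
    """Sliding-window reference: 36 multiplications for the same 2x2 tile."""
    return [[sum(d[i + r][j + c] * g[r][c] for r in range(3) for c in range(3))
             for j in range(2)] for i in range(2)]
```

For this tile size the element-wise stage needs 16 multiplications where the sliding-window form needs 36. That arithmetic reduction is a different quantity from the 4.352 times efficiency gain reported in the abstract, which is measured on the accelerator itself.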

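Table 1. Inference and training formulas

	Inference	Training
Fully connected layer	$Z^l = W^l X^l + B^l$; $O^l = \sigma(Z^l)$	$s^l = (W^{l+1})^T \delta^{l+1}$; $\delta^l = s^l \odot \sigma'(Z^l)$; $W^l = W^l + \alpha \delta^l (X^l)^T$; $B^l = B^l + \alpha \delta^l$
Pooling layer	$O^l = \mathrm{maxpool}(X^l)$; $O^l = \mathrm{meanpool}(X^l)$	$\delta^l = \mathrm{upsample}(\delta^{l+1})$
Convolutional layer	$Z^l = X^l \times K^l + B^l$; $O^l = \sigma(Z^l)$	$s^l = \delta^{l+1} \times \mathrm{rot180}(W^{l+1})$; $\delta^l = s^l \odot \sigma'(Z^l)$; $K^l = K^l + \alpha \delta^l \times X^l$; $B^l = B^l + \alpha \sum \delta^l$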
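
The fully connected row of Table 1 can be traced with a short sketch. It assumes a sigmoid activation for σ, which the table does not specify, and it keeps the table's sign convention for the updates (the gradient term is added). The function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def fc_forward(W, X, B):
    """Inference: Z^l = W^l X^l + B^l, O^l = sigma(Z^l)."""
    Z = [sum(w * x for w, x in zip(row, X)) + b for row, b in zip(W, B)]
    O = [sigmoid(z) for z in Z]
    return Z, O


def fc_backward(W_next, delta_next, Z):
    """Training: s^l = (W^{l+1})^T delta^{l+1}, delta^l = s^l (.) sigma'(Z^l)."""
    s = [sum(W_next[k][i] * delta_next[k] for k in range(len(delta_next)))
         for i in range(len(Z))]
    return [si * sigmoid_prime(zi) for si, zi in zip(s, Z)]


def fc_update(W, B, X, delta, alpha):
    """Training: W^l = W^l + alpha * delta^l (X^l)^T, B^l = B^l + alpha * delta^l."""
    W_new = [[w + alpha * d * x for w, x in zip(row, X)] for row, d in zip(W, delta)]
    B_new = [b + alpha * d for b, d in zip(B, delta)]
    return W_new, B_new
```

Here `W_next` and `delta_next` stand for the weights and error term of layer l+1, and `alpha` is the learning rate α from the table.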

References

[1]	He Yihong, Li Yanfeng, Huang Shukai, et al. Adaptive image dehazing algorithm based on deep convolutional neural network[J]. Electronic Science and Technology, 2020, 33(8):70-73.
[2]	Qin Xing, Gao Xiaoqi, Chen Bin. Image super-resolution algorithm based on SqueezeNet convolution neural network[J]. Electronic Science and Technology, 2020, 33(5):1-8.
[3]	Hoel C J, Wolff K, Laine L. Automated speed and lane change decision making using deep reinforcement learning[C]. Maui: Proceedings of the Twenty-first International Conference on Intelligent Transportation Systems, 2018.
[4]	Zhang S, Peng H, Nageshrao S, et al. Discretionary lane change decision making using reinforcement learning with model-based exploration[C]. Boca Raton: Proceedings of the Eighteenth IEEE International Conference on Machine Learning and Applications, 2019.
[5]	Lu Liqiang, Zheng Size, Xiao Qingcheng, et al. Accelerating convolutional neural networks on FPGAs[J]. Scientia Sinica Informationis, 2019, 49(3):277-294.
[6]	Alwani M, Chen H, Ferdman M, et al. Fused-layer CNN accelerators[C]. Taipei: Proceedings of the Forty-ninth Annual IEEE/ACM International Symposium on Microarchitecture, 2016.
[7]	Kim S K, McAfee L C, McMahon P L, et al. A highly scalable restricted Boltzmann machine FPGA implementation[C]. Prague: Proceedings of the International Conference on Field Programmable Logic and Applications, 2009.
[8]	Lavin A, Gray S. Fast algorithms for convolutional neural networks[C]. Las Vegas: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[9]	Zi Jing, Zhang Xuxin, Wang Yu, et al. Hardware design of configurable neural network based on FPGA[J]. Transducer and Microsystem Technologies, 2020, 39(12):92-95.
[10]	Zuo Guowei, Ying Sancong. Design and implementation of configurable convolution operation unit based on FPGA[J]. Microcontrollers & Embedded Systems, 2020, 20(11):54-58.
[11]	Gu J, Wang Z, Kuen J, et al. Recent advances in convolutional neural networks[J]. Pattern Recognition, 2018, 77:354-377.
[12]	Phan H, Hertel L, Maass M, et al. Robust audio event recognition with 1-max pooling convolutional neural networks[C]. Beijing: Proceedings of the Seventeenth Annual Conference of the International Speech Communication Association, 2016.
[13]	Girones R G, Palero R C, Boluda J C, et al. FPGA implementation of a pipelined on-line backpropagation[J]. Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology, 2005, 40(2):189-213. doi: 10.1007/s11265-005-4961-3
[14]	LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11):2278-2324. doi: 10.1109/5.726791
[15]	Zhang Zhanjun, Peng Yanbing, Cheng Guang. The optimization of image categorization model based on CIFAR-10[J]. Computer Applications and Software, 2018, 35(3):177-181.
[16]	Molchanov P, Tyree S, Karras T, et al. Pruning convolutional neural networks for resource efficient transfer learning[C]. San Juan: Proceedings of the International Conference on Learning Representations, 2016.
[17]	Suda N, Chandra V, Dasika G, et al. Throughput-optimized OpenCL-based FPGA accelerator for large-scale convolutional neural networks[C]. Monterey: Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016.
[18]	Zhang C, Prasanna V. Frequency domain acceleration of convolutional neural networks on CPU-FPGA shared memory system[C]. Monterey: Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017.
[19]	Qiu J T, Wang J, Yao S, et al. Going deeper with embedded FPGA platform for convolutional neural network[C]. Monterey: Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016.

	Ref. [17]	Ref. [18]	Ref. [19]	This work
Network architecture	VGG-16	VGG-16	VGG-16	VGG-9
FPGA device	Stratix V	Intel QPI FPGA	Zynq XC7Z045	Virtex6 240T
Frequency	120 MHz	200 MHz	150 MHz	130 MHz
Throughput^[16]	117.8 GOPS	124.0 GFLOPS	137.0 GOPS	192.0 GFLOPS
DSP	727	224	780	508
Throughput/DSP	0.162	0.553	0.175	0.378
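
The throughput-per-DSP row can be rechecked from the two rows above it. The sketch below copies the throughput and DSP figures from the table and divides them; the dictionary layout and the name `throughput_per_dsp` are illustrative only. Note that the table mixes GOPS and GFLOPS, so the ratios are comparable only to the extent that those units are.

```python
# (throughput, DSP count, ratio printed in the table), copied from the comparison table.
designs = {
    "Ref. [17]": (117.8, 727, 0.162),   # GOPS
    "Ref. [18]": (124.0, 224, 0.553),   # GFLOPS
    "Ref. [19]": (137.0, 780, 0.175),   # GOPS
    "This work": (192.0, 508, 0.378),   # GFLOPS
}


def throughput_per_dsp(throughput, dsp_count):
    """Throughput delivered per DSP slice, in the same unit as the throughput."""
    return throughput / dsp_count


for name, (throughput, dsp_count, reported) in designs.items():
    ratio = throughput_per_dsp(throughput, dsp_count)
    print(f"{name}: computed {ratio:.4f}, table {reported}")
```

Each computed ratio agrees with the printed value to within 0.001; the small differences come from how the table rounds or truncates the third decimal place.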

On-chip resource	Used/Total	Utilization
Slice Registers	93 446/301 440	31%
Slice LUTs	63 302/150 720	42%
Memory	42 048/58 400	72%
DSP48E1s	508/768	66%
