Design of FPGA-Based SqueezeNet Inference Accelerator

doi:10.16180/j.cnki.issn1007-7820.2022.02.004

Abstract

To address the large volume of intermediate data and the long computation time of the lightweight deep neural network SqueezeNet, this study divides the entire network into process blocks to speed up computation. Each process block consists of an Expand layer and a Squeeze layer. Because each process block ends with a Squeeze layer, less intermediate data flows between the computing module and the memory, which reduces read and write overhead. The core computing module adopts early termination of the convolution calculation, exploiting the characteristics of the activation function. An effective index survival unit, an effective index control value unit, and a convolution judgment unit are designed to skip the computation and the calculation cycles occupied by invalid values in the convolution. Experimental results show that the accelerator's data flow is reduced by 55.38%, and that the computation and calculation cycles occupied by invalid values are reduced by 14.68%.
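
The invalid-value skipping mentioned in the abstract can be illustrated with a small software sketch. It assumes that "invalid values" are zero-valued activations (for example, outputs of a preceding ReLU) and that an "effective index" is the position of a nonzero activation inside the convolution window; both readings are assumptions, and the function name `conv2d_relu_skip_zeros` is hypothetical. It does not model the paper's hardware units or its early-termination logic.

```python
def conv2d_relu_skip_zeros(fmap, kernel):
    """Valid 2-D convolution (cross-correlation form) followed by ReLU.

    Multiplications are issued only for nonzero input activations.
    Returns (output, macs_performed, macs_dense).
    """
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(fmap) - kh + 1, len(fmap[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    macs_performed = 0
    for r in range(oh):
        for c in range(ow):
            # Effective indices: window positions holding a nonzero activation.
            effective = [(i, j) for i in range(kh) for j in range(kw)
                         if fmap[r + i][c + j] != 0]
            # Accumulate only over effective positions; zeros contribute nothing.
            acc = sum(fmap[r + i][c + j] * kernel[i][j] for i, j in effective)
            out[r][c] = max(acc, 0.0)  # ReLU
            macs_performed += len(effective)
    macs_dense = oh * ow * kh * kw  # cost of a convolution that skips nothing
    return out, macs_performed, macs_dense


# Small sparse feature map (assumed data, for illustration only).
fmap = [[1.0, 0.0, 2.0, 0.0],
        [0.0, 3.0, 0.0, 0.0],
        [4.0, 0.0, 0.0, 5.0],
        [0.0, 0.0, 6.0, 0.0]]
kernel = [[1.0, -1.0],
          [0.5, 2.0]]
out, macs_performed, macs_dense = conv2d_relu_skip_zeros(fmap, kernel)
print(macs_performed, "of", macs_dense, "multiply-accumulates performed")
```

The output matches a dense convolution because a zero activation contributes nothing to the sum; only the count of multiply-accumulate operations changes, and the saving depends on how sparse the feature map is.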

Key words: lightweight deep neural network, SqueezeNet, process block, activation function, early termination of the convolution calculation, effective index, invalid value, calculation period

CLC Number: TP183

CHU Ping, NI Wei. Design of FPGA-Based SqueezeNet Inference Accelerator[J]. Electronic Science and Technology, 2022, 35(2): 20-26.

References

[1]	Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[C]. San Diego:Proceedings of the International Conference on Learning Representations, 2015.
[2]	Szegedy C, Liu W, Jia Y Q, et al. Going deeper with convolutions[C]. Boston:Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[3]	He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition[C]. Las Vegas:Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[4]	Han S, Mao H, Dally W J, et al. Deep compression: compressing deep neural networks with pruning, trained quantization and huffman coding[C]. San Juan:Proceedings of the International Conference on Learning Representations, 2016.
[5]	Han S, Liu X Y, Mao H Z, et al. EIE: efficient inference engine on compressed deep neural network[J]. International Symposium on Computer Architecture, 2016, 44(3):243-254.
[6]	Courbariaux M, Bengio Y, David J P. BinaryConnect: training deep neural networks with binary weights during propagations[C]. Montreal:Proceedings of the Twenty-ninth Annual Conference on Neural Information Processing Systems, 2015.
[7]	Rastegari M, Ordonez V, Redmon J, et al. XNOR-Net: ImageNet classification using binary convolutional neural networks[C]. Amsterdam:Proceedings of the Fourteenth European Conference on Computer Vision, 2016.
[8]	Zhang X Y, Zhou X Y, Lin M X, et al. ShuffleNet: an extremely efficient convolutional neural network for mobile devices[C]. Salt Lake City:Proceedings of the Thirty-first IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.
[9]	Sandler M, Howard A, Zhu M, et al. MobileNetV2: inverted residuals and linear bottlenecks[C]. Salt Lake City:Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.
[10]	Santos A G, Souza C D, Zanchettin C, et al. Reducing SqueezeNet Storage Size with Depthwise Separable Convolutions[C]. Rio de Janeiro:International Joint Conference on Neural Networks, 2018.
[11]	Bi Pengcheng, Luo Jianxin, Chen Weiwei, et al. Lightweight convolutional neural network structure for mobile terminal[J]. Information Technology and Network Security, 2019, 38(9):24-29.
[12]	Hu Ting, Zhu Yongxin, Tian Li, et al. Lightweight convolutional neural network architecture for mobile platforms[J]. Computer Engineering, 2019, 45(1):17-22.
[13]	Qin Xing, Gao Xiaoqi, Chen Bin. Image super-resolution algorithm based on SqueezeNet convolution neural network[J]. Electronic Science and Technology, 2020, 33(5):1-8.
[14]	Huang C, Ni S Y, Chen G S. A layer-based structured design of CNN on FPGA[C]. Guiyang:Proceedings of the Twelfth IEEE International Conference on ASIC, 2017.
[15]	Aimar A, Mostafa H, Calabrese E, et al. Nullhop: a flexible convolutional neural network accelerator based on sparse representations of feature maps[J]. IEEE Transactions on Neural Networks and Learning Systems, 2019, 30(3):644-656. doi: 10.1109/TNNLS.2018.2852335
[16]	Mousouliotis P G, Petrou L P. SqueezeJet: high-level synthesis accelerator design for deep convolutional neural networks[C]. Voros:Proceedings of the International Symposium on Applied Reconfigurable Computing, 2018.

Resource	LUT	FF	DSP	BRAM
Available	1 221 600	2 443 200	2 160	2 584
Used	477 940	152 120	1 536	2 279
Utilization	39.1%	6.2%	71.1%	88.1%
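
The utilization row can be checked directly from the available and used counts. The short sketch below does that arithmetic; the dictionary and function names are illustrative, and the numbers are copied from the table.

```python
# Counts copied from the resource table (available and used rows).
available = {"LUT": 1_221_600, "FF": 2_443_200, "DSP": 2_160, "BRAM": 2_584}
used = {"LUT": 477_940, "FF": 152_120, "DSP": 1_536, "BRAM": 2_279}


def utilization_percent(used_count, available_count):
    """Share of a resource that is in use, as a percentage."""
    return 100.0 * used_count / available_count


utilization = {name: utilization_percent(used[name], available[name])
               for name in available}

for name, pct in utilization.items():
    print(f"{name}: {pct:.1f}%")
```

LUT, FF, and DSP reproduce the tabulated percentages. BRAM comes out at about 88.2% when rounded to one decimal place, so the tabulated 88.1% appears to be truncated rather than rounded.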

Item	Ref. [16]	Ref. [14]	This accelerator
Platform	Zynq XC7Z020	Xilinx XC7Z020	XC7V 2000T
Frequency/MHz	100	110	100
DSP	186	1 879	1 536
BRAM/kB	269	2 715	2 279
Precision	8/16-bit fixed	16-bit fixed	16-bit fixed
Latency/s	0.333 00	0.003 65	0.003 58
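
To put the latency row in relative terms, the sketch below computes the ratio of each design's latency to that of this accelerator, together with the implied single-image throughput. The names are illustrative and the values are copied from the table. The platforms, clock frequencies, and precisions differ, so the ratios are indicative rather than a like-for-like comparison.

```python
# Latencies in seconds, copied from the comparison table.
latency_s = {
    "Ref. [16]": 0.33300,
    "Ref. [14]": 0.00365,
    "This accelerator": 0.00358,
}


def latency_ratio(other_s, this_s):
    """How many times longer the other design takes per inference."""
    return other_s / this_s


this_s = latency_s["This accelerator"]
ratios = {name: latency_ratio(t, this_s) for name, t in latency_s.items()}
frames_per_second = 1.0 / this_s  # assumes one image is processed at a time

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the latency of this accelerator")
print(f"Implied throughput: {frames_per_second:.1f} images/s")
```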
