Structured Compression and Acceleration of Network Based on Tiny-YOLOv3

doi:10.16180/j.cnki.issn1007-7820.2023.08.007

Abstract:

In some application scenarios, the Tiny-YOLOv3 network consumes considerable resources and runs slowly when deployed on an embedded platform. This study proposes a structured compression scheme that combines pruning and quantization, and builds a convolutional layer acceleration system for the compressed network. The compression scheme uses sparse training and channel pruning to reduce the amount of computation in the network, and applies fixed-point quantization to activations and power-of-two quantization to weights to reduce the parameter storage of the convolutional layers. In the acceleration system, the programmable logic part implements a convolutional layer accelerator core that combines parallelism with pipelining, and the processing system part schedules the acceleration system. Experimental results show that the mean average precision of the Tiny-YOLOv3 network after structured compression is 0.46, and the parameters are compressed to about 5% of their original size. When the acceleration system is deployed on a Xilinx ZYNQ chip, the hardware runs stably at a clock frequency of 250 MHz, and the computing power of the convolution unit is 36 GOPS. In addition, the overall power consumption of the acceleration platform is 2.6 W, and the design is economical in hardware resources.
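The power-of-two weight quantization mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact quantizer, so the rounding rule, the zero threshold, the default exponent range, and the function names `quantize_pow2` and `shift_multiply` are all assumptions made for illustration. The point of the example is that a weight of the form ±2^e lets a multiplication by a fixed-point activation be replaced with a bit shift.

```python
import math

def quantize_pow2(w, min_exp=-7, max_exp=0):
    """Map a float weight to 0 or to +/-2^e with min_exp <= e <= max_exp."""
    if w == 0.0:
        return 0.0
    mag = abs(w)
    # Assumption: magnitudes below half the smallest level are set to zero.
    if mag < 2.0 ** (min_exp - 1):
        return 0.0
    # Round the exponent in the log domain, then clamp it to the allowed range.
    exp = min(max(round(math.log2(mag)), min_exp), max_exp)
    return math.copysign(2.0 ** exp, w)

def shift_multiply(act_fixed, exp, negative=False):
    """Multiply an integer fixed-point activation by +/-2^exp using shifts only."""
    shifted = act_fixed << exp if exp >= 0 else act_fixed >> -exp
    return -shifted if negative else shifted

# Hypothetical weights, chosen only to exercise each branch.
weights = [0.3, -0.3, 5.0, 0.001, 0.005]
quantized = [quantize_pow2(w) for w in weights]
```

Rounding is done on the exponent rather than on the value itself, which is simple but not always the nearest level in the linear domain; a real quantizer may choose its thresholds differently.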

Key words: object detection network, Tiny-YOLOv3, neural network compression, structural pruning, quantization, hardware acceleration, pipeline, ZYNQ

CLC Number: TP391

HU Yongyang, LI Miao, MENG Fankai, ZHANG Feng, MENG Yiwei, SONG Yukun. Structured Compression and Acceleration of Network Based on Tiny-YOLOv3[J]. Electronic Science and Technology, 2023, 36(8): 43-48.

Figures/Tables 12: Figures 1 to 9 and Tables 1 to 3.


Model	Computation	Storage /MB	Compression ratio /%	mAP
Full-precision network	8 713 766	33.24	-	0.57
Sparsified network	8 713 766	33.24	100.0	0.54
Pruned network	3 832 938	14.62	44.0	0.54
DoReFa-quantized network	3 832 938	14.62	44.0	0.52
INQ-quantized network	3 832 938	1.83	5.5	0.46
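The step from the sparsified network to the pruned network in the table above can be sketched as follows. The source does not state the pruning criterion, so this example assumes that the absolute value of each channel's batch-normalization scale factor is used as its importance score and that a single global threshold is applied. The function names and the sample values are hypothetical.

```python
def channels_to_keep(gammas_per_layer, prune_fraction):
    """Return one boolean keep-mask per layer from per-channel scale factors."""
    if not 0.0 <= prune_fraction < 1.0:
        raise ValueError("prune_fraction must be in [0, 1)")
    # Global threshold: the smallest prune_fraction of all channels fall below it.
    flat = sorted(abs(g) for layer in gammas_per_layer for g in layer)
    threshold = flat[int(len(flat) * prune_fraction)]
    masks = []
    for layer in gammas_per_layer:
        mask = [abs(g) >= threshold for g in layer]
        if not any(mask):
            # Assumption: never remove a whole layer; keep its strongest channel.
            strongest = max(range(len(layer)), key=lambda i: abs(layer[i]))
            mask[strongest] = True
        masks.append(mask)
    return masks

def compression_ratio_percent(compressed, original):
    """Compressed size as a percentage of the original, to one decimal place."""
    return round(100.0 * compressed / original, 1)

# Hypothetical scale factors for two small layers.
masks = channels_to_keep([[0.9, 0.01, 0.5], [0.02, 0.7, 0.03]], 0.5)
```

The helper `compression_ratio_percent` reproduces the ratio column of the table: 14.62 MB against 33.24 MB gives 44.0, and 1.83 MB against 33.24 MB gives 5.5.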

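Model	Computation	Storage /MB	Compression ratio /%	mAP
Full-precision Tiny-YOLOv3	8 713 766	33.24	-	0.57
Weights quantized to ±2^0 ~ ±2^-15	8 713 766	5.19	15.6	0.50
Weights quantized to ±2^0 ~ ±2^-7	8 713 766	4.16	12.5	0.47
Weights quantized to ±2^0 ~ ±2^-3	8 713 766	3.12	9.4	0.35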
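The storage figures above are consistent with a simple bit-count model, sketched here under two assumptions that the source does not state explicitly: each quantized weight is stored as one sign bit plus just enough bits to index its exponent, and the full-precision baseline uses 32 bits per weight. The function names are illustrative.

```python
import math

def bits_per_weight(min_exp, max_exp=0):
    """One sign bit plus the bits needed to index exponents min_exp..max_exp."""
    levels = max_exp - min_exp + 1
    return 1 + math.ceil(math.log2(levels))

def quantized_storage_mb(full_mb, bits, full_bits=32):
    """Storage after quantization, assuming a full_bits-wide baseline."""
    return full_mb * bits / full_bits

# Exponent ranges from the table: down to 2^-15, 2^-7 and 2^-3.
widths = [bits_per_weight(e) for e in (-15, -7, -3)]
sizes = [quantized_storage_mb(33.24, b) for b in widths]
```

Under this model the three ranges need 5, 4 and 3 bits, which gives about 5.19, 4.16 and 3.12 MB. The same 4-bit model applied to a 14.62 MB network gives about 1.83 MB.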

Resource	Used	Total	Utilization /%
LUT	18 478	99 360	18.60
FF	29 882	141 120	21.17
BRAM	123.5	216	57.18

