Small Object Detection Based on Convolution and Self-Attention of Aggregation

doi:10.16180/j.cnki.issn1007-7820.2024.02.003

Abstract

Small object detection is a research hotspot on most open object detection datasets. To address the insufficient detection accuracy of small objects in multi-scale detection scenarios, this study proposes an improved small object detection model based on YOLOv5s (You Only Look Once version 5s). A convolution and self-attention aggregation residual block is added to the detector's feature extraction network to strengthen feature extraction, and a new feature map is introduced from the shallow layers of the network to enrich the feature information of small objects. The feature fusion network is restructured to make full use of the newly introduced shallow features. SIOU Loss replaces the original GIOU Loss as the bounding-box regression loss to improve detection accuracy and training speed. Experimental results show that, on the PASCAL VOC 2007 and 2012 datasets, the detection accuracy (mAP50) of the improved model is 0.012 higher than that of YOLOv5s, and its small object detection accuracy (mAP50:95 on small objects) is 0.023 higher. On the MS COCO dataset, the detection accuracy (mAP50) of the improved model is 0.010 higher than that of YOLOv5s, and its small object detection accuracy is 0.009 higher.
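As background for the loss replacement mentioned in the abstract, the following is a minimal sketch of the GIoU loss for two axis-aligned boxes, following the standard GIoU definition. The function name `giou_loss` and the `(x1, y1, x2, y2)` box format are assumptions made for illustration; this is not the authors' implementation, and the angle, distance, and shape terms that SIoU adds are not reproduced here.

```python
def giou_loss(box_a, box_b):
    """Return 1 - GIoU for two boxes given as (x1, y1, x2, y2).

    Assumes both boxes have positive width and height.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box enclosing both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h

    # GIoU penalises the empty part of the enclosing box.
    giou = iou - (enclose - union) / enclose
    return 1.0 - giou


identical = giou_loss((0, 0, 2, 2), (0, 0, 2, 2))
disjoint = giou_loss((0, 0, 1, 1), (2, 0, 3, 1))
print(identical, disjoint)
```

Unlike plain IoU, the loss keeps growing as non-overlapping boxes move apart, so it still provides a training signal when a prediction misses the target entirely.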

Key words: small object, object detection, YOLOv5s, convolutional neural network, self-attention, ACmix, SIOU Loss, residual network

CLC Number: TN247

WANG Xiaozhu, YU Lianzhi. Small Object Detection Based on Convolution and Self-Attention of Aggregation[J]. Electronic Science and Technology, 2024, 37(2): 14-22.

Model	Input size	Parameters/MB	mAP50	mAP50:95
SSD	300×300	26.285	0.783	0.470
YOLOv3	416×416	61.626	0.851	0.583
YOLOv4-tiny	416×416	5.918	0.781	0.403
YOLOv4	416×416	64.040	0.880	0.602
YOLOv5s	640×640	7.115	0.860	0.591
Ours	640×640	9.279	0.872	0.599

Model	Input size	Computation/GB	mAP50	mAP50:95
SSD	640×640	282.197	0.742	0.408
YOLOv3	640×640	155.404	0.806	0.444
YOLOv4-tiny	640×640	16.216	0.676	0.308
YOLOv4	640×640	141.766	0.775	0.430
YOLOv5s	640×640	16.541	0.860	0.591
Ours	640×640	23.040	0.872	0.599
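The cost of the accuracy gain can be read off the two tables above. The sketch below recomputes the relative overhead of the proposed model (last row) against YOLOv5s at 640×640; the dictionary keys and the helper `relative_increase` are hypothetical names, and the numbers are copied from the table rows rather than measured.

```python
# Values copied from the YOLOv5s row and the proposed model's row above.
baseline = {"params_mb": 7.115, "compute": 16.541, "mAP50": 0.860}
proposed = {"params_mb": 9.279, "compute": 23.040, "mAP50": 0.872}


def relative_increase(new, old):
    """Return the relative increase of new over old, in percent."""
    return (new - old) / old * 100.0


param_overhead = relative_increase(proposed["params_mb"], baseline["params_mb"])
compute_overhead = relative_increase(proposed["compute"], baseline["compute"])
map_gain = proposed["mAP50"] - baseline["mAP50"]

print(f"parameters:  +{param_overhead:.1f}%")
print(f"computation: +{compute_overhead:.1f}%")
print(f"mAP50 gain:  {map_gain:+.3f}")
```

On these figures the model is roughly 30% larger and needs roughly 39% more computation for a 0.012 gain in mAP50, while remaining far smaller than YOLOv3 and YOLOv4.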

Model	Input size	mAP50	Small (mAP50:95)
SSD	640×640	0.742	0.265
YOLOv3	640×640	0.806	0.362
YOLOv4-tiny	640×640	0.676	0.269
YOLOv4	640×640	0.775	0.385
YOLOv5s	640×640	0.860	0.390
Ours	640×640	0.872	0.413

Model	ResAC	4-FPN+PAN	SIOU Loss	mAP50
Model 1	√			0.865
Model 2		√		0.830
Model 3			√	0.862
Model 4	√	√		0.865
Model 5	√		√	0.868
Model 6	√	√	√	0.872
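To make the ablation rows above easier to compare, the sketch below expresses each configuration as a change in mAP50 relative to a baseline. It assumes the baseline is the YOLOv5s mAP50 of 0.860 from the comparison tables and that the ablation was run on the same dataset; neither is stated in the table itself. All variable names are hypothetical.

```python
# Assumption: baseline is the YOLOv5s mAP50 reported in the comparison tables.
BASELINE_MAP50 = 0.860

COMPONENTS = ("ResAC", "4-FPN+PAN", "SIOU Loss")

# Each entry: (ResAC, 4-FPN+PAN, SIOU Loss) flags and the reported mAP50.
ablation = {
    "Model 1": ((True, False, False), 0.865),
    "Model 2": ((False, True, False), 0.830),
    "Model 3": ((False, False, True), 0.862),
    "Model 4": ((True, True, False), 0.865),
    "Model 5": ((True, False, True), 0.868),
    "Model 6": ((True, True, True), 0.872),
}


def describe(flags):
    """Join the names of the enabled components."""
    return " + ".join(name for name, on in zip(COMPONENTS, flags) if on)


deltas = {
    model: round(score - BASELINE_MAP50, 3)
    for model, (flags, score) in ablation.items()
}
best = max(deltas, key=deltas.get)

for model, (flags, _) in ablation.items():
    print(f"{model}: {describe(flags):<30} {deltas[model]:+.3f}")
print("best configuration:", best)
```

Under that assumption, the four-scale FPN+PAN fusion on its own falls below the baseline (0.830), and only the combination of all three changes reaches 0.872.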

Model	Input size	mAP50	mAP50:95	Small (mAP50:95)
YOLOv5s	640×640	0.539	0.356	0.206
Ours	640×640	0.549	0.355	0.215
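The per-metric differences between the two rows above can be recomputed directly. This is a small arithmetic sketch using the values copied from the table; the dictionary layout and key names are assumptions for illustration.

```python
# Values copied from the two rows above (YOLOv5s and the proposed model).
coco = {
    "YOLOv5s": {"mAP50": 0.539, "mAP50:95": 0.356, "small": 0.206},
    "proposed": {"mAP50": 0.549, "mAP50:95": 0.355, "small": 0.215},
}

# Gain of the proposed model over YOLOv5s for each metric.
gains = {
    metric: round(coco["proposed"][metric] - coco["YOLOv5s"][metric], 3)
    for metric in coco["YOLOv5s"]
}

for metric, gain in gains.items():
    print(f"{metric}: {gain:+.3f}")
```

The gain is concentrated in mAP50 and in the small-object metric, while the overall mAP50:95 is essentially unchanged between the two models.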
