Real-Time Mask-Wearing Detection Method Based on Improved YOLO

doi:10.16180/j.cnki.issn1007-7820.2023.02.011

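Abstract

Existing YOLO object detection models perform multi-object detection under the one-stage paradigm. However, they are weaker at two-class detection and are computationally expensive at inference time. This study proposes a YOLO-based real-time method for two-class mask-wearing detection, aiming to improve both accuracy and efficiency during the COVID-19 outbreak.

First, the model's feed-forward input layer is improved: the data augmentation stage is optimized and adaptive image scaling is added, raising detection accuracy and efficiency for two-class and small-object detection. Second, adaptive anchor boxes are added and the activation function is replaced, which reduces the method's computational cost and thus improves detection efficiency. Third, an optimized Neck and an added Focus structure strengthen feature fusion while reducing the number of parameters, which speeds up detection.

Experimental results show that, compared with YOLOv4, the proposed method improves F₁ by 0.33% and mAP by 0.71% on the dataset used in this study. Detection efficiency is also clearly higher under the same experimental environment.

Key words: YOLOv4, CSPDenseNet, Focus, data augmentation, activation function, CSP2, object detection, mask wearing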
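The text does not show the internals of the Focus structure. As a rough illustration, assume it follows the Focus layer in the public YOLOv5 code. That layer slices every 2×2 pixel neighbourhood into four sub-images and stacks them along the channel axis before a convolution. A 608×608×3 input therefore becomes 304×304×12 without discarding any pixels.

The sketch below implements only the slicing step, on nested Python lists. The function name `focus_slice` is hypothetical, and the convolution that follows is omitted.

```python
from typing import List

# A feature map as channels x rows x cols (nested lists).
Tensor3D = List[List[List[float]]]


def focus_slice(x: Tensor3D) -> Tensor3D:
    """Space-to-depth slicing in the style of the YOLOv5 Focus layer.

    Takes every second pixel at four phase offsets and stacks the four
    sub-images along the channel axis: (C, H, W) -> (4C, H/2, W/2).
    """
    channels = len(x)
    height, width = len(x[0]), len(x[0][0])
    if height % 2 or width % 2:
        raise ValueError("height and width must be even")
    # (row, col) offsets in the same order as the YOLOv5 slicing
    offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]
    out: Tensor3D = []
    for dr, dc in offsets:
        for c in range(channels):
            # Keep rows dr, dr+2, ... and columns dc, dc+2, ... of channel c
            out.append([row[dc::2] for row in x[c][dr::2]])
    return out


if __name__ == "__main__":
    image = [[[r * 4 + c for c in range(4)] for r in range(4)]]  # 1 x 4 x 4
    sliced = focus_slice(image)
    print(len(sliced), len(sliced[0]), len(sliced[0][0]))  # 4 2 2
```

Because only pixels are rearranged, the following convolution sees the full image at half the spatial resolution. This is where the reduction in computation comes from.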

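The abstract does not say how the adaptive anchor boxes are computed. One common approach, introduced with YOLOv2, clusters the widths and heights of the ground-truth boxes with k-means, using d = 1 - IoU as the distance. Whether the authors use exactly this procedure is an assumption.

The sketch below is a minimal, deterministic version of that idea. The names `wh_iou` and `kmeans_anchors` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float]  # (width, height) of a ground-truth box, in pixels


def wh_iou(a: Box, b: Box) -> float:
    """IoU of two boxes aligned at the same corner, so only their sizes matter."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def kmeans_anchors(boxes: List[Box], k: int, max_iter: int = 100) -> List[Box]:
    """Cluster box sizes into k anchors with the distance d = 1 - IoU."""
    if len(boxes) < k:
        raise ValueError("need at least k boxes")
    # Deterministic start: k boxes spread evenly through the list sorted by area.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    centroids = [ordered[i * len(ordered) // k] for i in range(k)]
    assignment: List[int] = []
    for _ in range(max_iter):
        # Nearest centroid = highest IoU, i.e. smallest 1 - IoU.
        new_assignment = [
            max(range(k), key=lambda j: wh_iou(box, centroids[j])) for box in boxes
        ]
        if new_assignment == assignment:
            break  # converged: no box changed cluster
        assignment = new_assignment
        # Move each centroid to the mean width and height of its members.
        for j in range(k):
            members = [b for b, a in zip(boxes, assignment) if a == j]
            if members:
                centroids[j] = (
                    sum(b[0] for b in members) / len(members),
                    sum(b[1] for b in members) / len(members),
                )
    # Return anchors from smallest to largest area.
    return sorted(centroids, key=lambda c: c[0] * c[1])


if __name__ == "__main__":
    sizes = [(10, 12), (11, 13), (9, 11), (100, 80), (98, 82), (102, 78)]
    print(kmeans_anchors(sizes, 2))
```

Using 1 - IoU instead of Euclidean distance keeps large boxes from dominating the clustering. A 10-pixel error matters far more for a small face than for a large one.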

CLC number: TP391

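CHENG Changwen, CHEN Wei, CHEN Jinhong, YIN Zhong. YOLO-Improve Detection Method of Real-Time Mask Wearing[J]. Electronic Science and Technology, 2023, 36(2): 73-80.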


Figures and tables: 19 (Figures 1-13, Tables 1-6)

References (26)

[1]	Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]. Columbus: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014.
[2]	Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. Advances in Neural Information Processing Systems, 2015, 28(4):91-99.
[3]	Tan M, Pang R, Le Q V. EfficientDet: Scalable and efficient object detection[C]. Seattle: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
[4]	Tan M, Le Q. EfficientNet: Rethinking model scaling for convolutional neural networks[C]. Long Beach: International Conference on Machine Learning, 2019.
[5]	Deng J, Guo J, Ververas E, et al. RetinaFace: Single-shot multi-level face localisation in the wild[C]. Seattle: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
[6]	Chen Lei, Zhang Sunjie, Wang Yongxiong. Based on improved YOLOv3 and its detection in remote sensing images[J]. Journal of Chinese Computer Systems, 2020, 41(11):2321-2324. (in Chinese)
[7]	Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection[C]. Las Vegas: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[8]	Wang C Y, Bochkovskiy A, Liao H Y M. Scaled-YOLOv4: Scaling cross stage partial network[C]. Nashville: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021.
[9]	Redmon J, Farhadi A. YOLOv3: An incremental improvement[J]. arXiv preprint arXiv:1804.02767, 2018.
[10]	Maas A L, Hannun A Y, Ng A Y. Rectifier nonlinearities improve neural network acoustic models[C]. Atlanta: Proceedings of the International Conference on Machine Learning, 2013.
[11]	Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[C]. Lille: International Conference on Machine Learning, 2015.
[12]	He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]. Las Vegas: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[13]	Lin T Y, Dollár P, Girshick R, et al. Feature pyramid networks for object detection[C]. Honolulu: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[14]	Wang C Y, Liao H Y M, Wu Y H, et al. CSPNet: A new backbone that can enhance learning capability of CNN[C]. Seattle: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020.
[15]	Huang G, Liu Z, Van Der Maaten L, et al. Densely connected convolutional networks[C]. Honolulu: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[16]	Wang C Y, Liao H Y M, Wu Y H, et al. CSPNet: A new backbone that can enhance learning capability of CNN[C]. Seattle: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020.
[17]	He K, Zhang X, Ren S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9):1904-1916. doi: 10.1109/TPAMI.2015.2389824 pmid: 26353135
[18]	Lin T Y, Dollár P, Girshick R, et al. Feature pyramid networks for object detection[C]. Honolulu: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[19]	Wang W, Xie E, Song X, et al. Efficient and accurate arbitrary-shaped text detection with pixel aggregation network[C]. Seoul: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.
[20]	Yun S, Han D, Oh S J, et al. CutMix: Regularization strategy to train strong classifiers with localizable features[C]. Seoul: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.
[21]	Ge S, Li J, Ye Q, et al. Detecting masked faces in the wild with LLE-CNNs[C]. Honolulu: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[22]	Zhang Chunlei, Niu Xinyuan. Research on ORB binocular image matching method based on YOLO[J]. Journal of Chinese Computer Systems, 2020, 41(1):185-189. (in Chinese)
[23]	Niu Zuodong, Qin Tao, Li Handong, et al. Improved algorithm of RetinaFace for natural scene mask wear detection[J]. Computer Engineering and Applications, 2020, 56(12):1-7. doi: 10.3778/j.issn.1002-8331.2002-0402 (in Chinese)
[24]	Zhang Xiubao, Lin Ziyuan, Tian Wanxin, et al. Mask-wearing recognition in the wild[J]. Science in China: Information Sciences, 2020, 50(7):1110-1120. (in Chinese)
[25]	Deng Huangxiao. Method of mask wearing detection based on transfer learning and RetinaNet[J]. Electronic Technology and Software Engineering, 2020(5):209-211. (in Chinese)
[26]	Zhao Chong, Chi Mengmeng, Chu Cong, et al. Research on motion simulation and visual recognition algorithm of guide dog walking mechanism[J]. Electronic Science and Technology, 2021, 34(9):66-72. (in Chinese)

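Parameter	Value	Description
hsv_h	0.015	Image HSV hue augmentation (fraction)
hsv_s	0.7	Image HSV saturation augmentation (fraction)
hsv_v	0.6	Image HSV value augmentation (fraction)
degrees	1.0	Image rotation (+/- deg)
translate	0.1	Image translation (+/- fraction)
scale	0.6	Image scaling (+/- gain)
shear	1.0	Image shear (+/- deg)
perspective	0.0	Image perspective (+/- fraction), range 0-0.001
flipud	0.01	Image up-down flip (probability)
fliplr	0.5	Image left-right flip (probability)
mixup	0.2	Image mixup (probability)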
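The parameter names and descriptions in this table match the hyperparameter file of the public YOLOv5 repository. The sketch below therefore assumes, though the text does not say so, that they are applied the YOLOv5 way:

- one random gain per HSV channel, drawn from [1 - g, 1 + g];
- flips and mixup triggered with the listed probabilities.

It works on single pixels and boxes to stay dependency-free. All function names are hypothetical.

```python
import random
from typing import Dict, Tuple

# Subset of the hyperparameters listed in the table above.
HYP: Dict[str, float] = {
    "hsv_h": 0.015, "hsv_s": 0.7, "hsv_v": 0.6,
    "fliplr": 0.5, "flipud": 0.01, "mixup": 0.2,
}

HSVPixel = Tuple[int, int, int]  # OpenCV convention: H in [0, 180), S and V in [0, 255]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def sample_hsv_gains(hyp: Dict[str, float], rng: random.Random) -> Tuple[float, ...]:
    """One multiplicative gain per channel, drawn uniformly from [1 - g, 1 + g]."""
    return tuple(rng.uniform(-1.0, 1.0) * hyp[key] + 1.0 for key in ("hsv_h", "hsv_s", "hsv_v"))


def apply_hsv_gains(pixel: HSVPixel, gains: Tuple[float, ...]) -> HSVPixel:
    """Hue wraps around the colour circle; saturation and value are clipped to 255."""
    h, s, v = pixel
    gh, gs, gv = gains
    return (int(h * gh) % 180, min(255, int(s * gs)), min(255, int(v * gv)))


def sample_flags(hyp: Dict[str, float], rng: random.Random) -> Dict[str, bool]:
    """Decide which probability-driven augmentations fire for one image."""
    return {key: rng.random() < hyp[key] for key in ("fliplr", "flipud", "mixup")}


def hflip_box(box: Box, image_width: float) -> Box:
    """Mirror a box horizontally to match a left-right flip of the image."""
    x1, y1, x2, y2 = box
    return (image_width - x2, y1, image_width - x1, y2)


if __name__ == "__main__":
    rng = random.Random(0)
    gains = sample_hsv_gains(HYP, rng)
    print(apply_hsv_gains((90, 128, 200), gains), sample_flags(HYP, rng))
    print(hflip_box((10, 20, 50, 60), 608))  # (558, 20, 598, 60)
```

With hsv_h = 0.015, hue moves by at most 1.5%, while saturation and value can change by up to 70% and 60%. The table thus favours brightness and colour-intensity variation over colour shifts.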

Component	Configuration
Operating system	Windows 10
CPU	Intel Core i7-9750H
Memory	32 GB
GPU	NVIDIA GTX 1080 Ti (11 GB)
Programming language	Python
IDE	PyCharm
Deep learning framework	PyTorch (v1.7)

Training parameter	Value
Epochs	300
Batch size	16
Input image size	608×608
Learning rate (lr)	0.01
Weight decay (weight_decay)	5e-4
Cosine annealing enabled	false
Mosaic augmentation enabled	true
CUDA enabled	true
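Adaptive image scaling, mentioned in the abstract, presumably works like YOLOv5-style letterboxing. This is an assumption; the text gives only the 608×608 input size above. The image is scaled by a single factor so that its longer side fits the target. In adaptive mode, the shorter side is then padded only up to the next multiple of the network stride, rather than to a full square.

The sketch computes the geometry only. The stride of 32 and the name `letterbox_geometry` are assumptions.

```python
from typing import Tuple


def letterbox_geometry(
    height: int, width: int, target: int = 608, stride: int = 32, auto: bool = True
) -> Tuple[float, Tuple[int, int], Tuple[int, int, int, int]]:
    """Scale factor, resized (w, h) and (top, bottom, left, right) padding."""
    # One scale factor for both axes keeps the aspect ratio; the longer side fits.
    r = min(target / height, target / width)
    new_w, new_h = int(round(width * r)), int(round(height * r))
    dw, dh = target - new_w, target - new_h
    if auto:
        # Adaptive mode: pad only up to the next multiple of the stride.
        dw, dh = dw % stride, dh % stride
    # Split the padding as evenly as possible between the two sides.
    top, bottom = dh // 2, dh - dh // 2
    left, right = dw // 2, dw - dw // 2
    return r, (new_w, new_h), (top, bottom, left, right)


if __name__ == "__main__":
    print(letterbox_geometry(1080, 1920))              # adaptive padding
    print(letterbox_geometry(1080, 1920, auto=False))  # full 608 x 608 square
```

For a 1080×1920 frame, the image is resized to 608×342. In adaptive mode it is padded with 5 rows top and bottom to 608×352. Padding to a full square would need 133 rows on each side. The network therefore processes roughly 42% fewer pixels, which is where the efficiency gain of adaptive scaling comes from.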

	Predicted positive	Predicted negative
Correct	True Positive (TP)	True Negative (TN)
Incorrect	False Positive (FP)	False Negative (FN)
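The confusion-matrix entries above feed the F₁ score reported in the abstract and in the comparison table. The sketch uses the standard definitions P = TP/(TP+FP), R = TP/(TP+FN) and F₁ = 2PR/(P+R), which is equivalent to 2TP/(2TP+FP+FN). The function name is hypothetical and the example counts are made up for illustration.

```python
from typing import Dict


def detection_scores(tp: int, fp: int, fn: int) -> Dict[str, float]:
    """Precision, recall and F1 from confusion-matrix counts.

    TN is not used: in object detection there is no meaningful count of
    correctly rejected background regions, so only TP, FP and FN enter.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Illustrative counts only, not results from the paper.
    print(detection_scores(tp=90, fp=10, fn=30))
```

With these illustrative counts, precision is 0.9 and recall is 0.75, giving F₁ ≈ 0.818. Because F₁ is a harmonic mean, it is pulled toward the weaker of the two.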

Method	RetinaFace	DFS	RetinaNet
Dataset size	3,000	32,203	7,959
F₁ score (%)	91.71	92.34	89.55
Frame rate (frames/s)	18.30	16.80	20.82
Mean AP over classes, mAP (%)	90.30	91.20	86.45
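The last row of the table is the mean of the per-class average precision (AP). The text does not specify how AP is interpolated. The sketch assumes the all-point interpolated AP used from PASCAL VOC 2010 onward. Other conventions, such as the 11-point VOC2007 or the 101-point COCO interpolation, give slightly different numbers.

The inputs are detections already matched to ground truth at a fixed IoU threshold. The class names in the usage example are hypothetical.

```python
from typing import Dict, List, Tuple

Detection = Tuple[float, bool]  # (confidence, matched a ground-truth box?)


def average_precision(detections: List[Detection], num_gt: int) -> float:
    """All-point interpolated AP for one class.

    Each detection must already be marked TP or FP by IoU matching against
    the ground truth; the matching step itself is not shown here.
    """
    if num_gt == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: each point takes the best precision at the same or higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the stepwise precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class: Dict[str, Tuple[List[Detection], int]]) -> float:
    """mAP: unweighted mean of the per-class AP values."""
    aps = [average_precision(dets, n) for dets, n in per_class.values()]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    per_class = {
        "mask": ([(0.9, True), (0.8, False), (0.7, True), (0.6, True)], 3),
        "no_mask": ([(0.95, True), (0.5, True)], 2),
    }
    print(mean_average_precision(per_class))
```

In this toy example, the "mask" class has AP = 5/6 and "no_mask" has AP = 1, so mAP = 11/12 ≈ 0.917. With only two classes, a weak class lowers mAP noticeably, which is why two-class accuracy matters for this task.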