A SINGLE-STAGE OBJECT DETECTION METHOD FOR DOMESTIC GARBAGE BASED ON MULTI-SCALE RECEPTIVE FIELD ATTENTION MECHANISM
Abstract: Domestic garbage comes in a wide variety of types, and traditional waste-sorting processes offer limited efficiency and accuracy. To improve the detection accuracy for garbage of multiple scales and materials while keeping classification robust, an algorithm with a multi-scale receptive field attention mechanism, ECA_ERFB_s-YOLOv3, is proposed on the basis of deep convolutional neural networks and the single-stage object detection algorithm YOLOv3. First, a multi-scale receptive field module is introduced in front of the detection heads, so that the algorithm can select a suitable receptive field to match garbage objects of different scales, which improves detection accuracy. Then, ResNet50 replaces the original Darknet53 backbone, and, under transfer learning, an efficient channel attention (ECA) mechanism adaptively enhances or suppresses the features of ResNet50 and of the multi-scale receptive field module, which improves the robustness of the algorithm. Finally, anchor boxes are obtained by K-means clustering, and an allocation scheme for the anchors across the detection scales is designed. Ablation experiments show that ECA_ERFB_s-YOLOv3 achieves higher precision and better robustness; when detecting densely stacked domestic garbage, it satisfies the requirements of the task and delivers better detection results.
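The multi-scale receptive field module (ERFB_s) placed in front of the detection heads builds on the Receptive Field Block idea of Liu et al.: parallel branches combine different kernel sizes with dilated convolutions so that each branch sees a different effective receptive field before the branches are fused. The exact structure of ERFB_s is not given here, so the PyTorch sketch below only illustrates that general RFB-style pattern; the class name RFBStyleBlock, the branch layout, and the channel split are assumptions for illustration, not the paper's published module.

```python
import torch
import torch.nn as nn

class RFBStyleBlock(nn.Module):
    """RFB-style multi-branch block (sketch only, not the paper's ERFB_s).

    Each branch pairs a different kernel size with a dilated 3x3 convolution,
    giving branches progressively larger receptive fields. The branches are
    concatenated, fused by a 1x1 convolution, and added to a shortcut.
    """
    def __init__(self, in_ch, out_ch):
        super().__init__()
        mid = out_ch // 4

        def cbl(ci, co, k=1, d=1):
            # Conv + BatchNorm + LeakyReLU with "same" padding for odd kernels.
            p = d * (k - 1) // 2
            return nn.Sequential(
                nn.Conv2d(ci, co, k, padding=p, dilation=d, bias=False),
                nn.BatchNorm2d(co),
                nn.LeakyReLU(0.1, inplace=True),
            )

        self.branch1 = nn.Sequential(cbl(in_ch, mid), cbl(mid, mid, 3, d=1))
        self.branch2 = nn.Sequential(cbl(in_ch, mid), cbl(mid, mid, 3), cbl(mid, mid, 3, d=3))
        self.branch3 = nn.Sequential(cbl(in_ch, mid), cbl(mid, mid, 5), cbl(mid, mid, 3, d=5))
        self.fuse = nn.Conv2d(3 * mid, out_ch, 1, bias=False)
        self.shortcut = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.act = nn.LeakyReLU(0.1, inplace=True)

    def forward(self, x):
        y = torch.cat([self.branch1(x), self.branch2(x), self.branch3(x)], dim=1)
        return self.act(self.fuse(y) + self.shortcut(x))
```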
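The efficient channel attention used to enhance and suppress features in ResNet50 and in the receptive field module is the ECA module of Wang et al. (ECA-Net): global average pooling, a 1-D convolution whose kernel size is derived adaptively from the channel count, and a sigmoid gate that rescales each channel. A minimal PyTorch sketch of that standard module is shown below; where exactly it is inserted into the backbone and into ERFB_s follows the paper and is not reproduced here.

```python
import math
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention (Wang et al., CVPR 2020)."""

    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        # Adaptive kernel size k = |(log2(C) + b) / gamma|, forced to be odd.
        t = int(abs((math.log2(channels) + b) / gamma))
        k = t if t % 2 else t + 1
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # x: (N, C, H, W) -> per-channel descriptor (N, C, 1, 1).
        y = self.avg_pool(x)
        # Treat the C channels as a length-C sequence for the 1-D convolution.
        y = self.conv(y.squeeze(-1).transpose(-1, -2)).transpose(-1, -2).unsqueeze(-1)
        # Gate each channel of the input with the learned attention weights.
        return x * self.sigmoid(y)
```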
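Anchor boxes are regressed with K-means on the ground-truth box sizes. A common way to do this in the YOLO family, shown as a NumPy sketch below, is to cluster (width, height) pairs under the 1 - IoU distance and then hand the largest anchors to the coarsest detection head and the smallest to the finest one; the function names and the allocation convention in the sketch are assumptions and may differ from the paper's exact design.

```python
import numpy as np

def iou_wh(boxes, clusters):
    """IoU between (w, h) pairs and cluster centroids, all anchored at the origin."""
    w = np.minimum(boxes[:, None, 0], clusters[None, :, 0])
    h = np.minimum(boxes[:, None, 1], clusters[None, :, 1])
    inter = w * h
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + (clusters[:, 0] * clusters[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k=9, iters=300, seed=0):
    """Cluster ground-truth (w, h) pairs with the 1 - IoU distance used by the YOLO family."""
    boxes = np.asarray(boxes, dtype=float)
    rng = np.random.default_rng(seed)
    clusters = boxes[rng.choice(len(boxes), size=k, replace=False)]
    last = None
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, clusters), axis=1)  # nearest centroid = highest IoU
        if last is not None and np.array_equal(assign, last):
            break  # assignments stable: clustering has converged
        for i in range(k):
            if np.any(assign == i):
                clusters[i] = np.median(boxes[assign == i], axis=0)
        last = assign
    # Sort by area: with a 416x416 input, YOLOv3 conventionally assigns the 3
    # largest anchors to the 13x13 head, the middle 3 to the 26x26 head, and
    # the 3 smallest to the 52x52 head.
    return clusters[np.argsort(clusters[:, 0] * clusters[:, 1])]

# Example (hypothetical data): anchors = kmeans_anchors(all_gt_wh, k=9)
# where all_gt_wh is an (N, 2) array of ground-truth box widths and heights.
```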
Key words:
- garbage sorting
- object detection
- multi-scale receptive field
- attention mechanism