[1] Zheng Dong, Li Xiangqun, Xu Xinzheng. Vehicle and Pedestrian Detection Model Based on Lightweight SSD[J]. Journal of Nanjing Normal University (Natural Science Edition), 2019, 42(01): 73. [doi:10.3969/j.issn.1001-4616.2019.01.012]

Vehicle and Pedestrian Detection Model Based on Lightweight SSD

Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Volume:
Vol. 42
Issue:
2019, Issue 01
Page:
73
Column:
Special Column on Artificial Intelligence Algorithms and Applications
Publication date:
2019-03-20

Article Info

Title:
Vehicle and Pedestrian Detection Model Based on Lightweight SSD
Article ID:
1001-4616(2019)01-0073-09
Author(s):
Zheng Dong (1), Li Xiangqun (1), Xu Xinzheng (1, 2)
(1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China)
(2. Key Laboratory of Data Science and Intelligence Application, Fujian Province University, Zhangzhou 363000, China)
Keywords:
object detection; convolutional neural network; lightweight neural network; SSD; MobileNetv2
CLC number:
TP193
DOI:
10.3969/j.issn.1001-4616.2019.01.012
Document code:
A
Abstract:
In recent years, object detection algorithms based on deep learning have developed rapidly. However, deep networks are too large to be widely deployed on embedded platforms. This paper optimizes the model size of the SSD (Single Shot Multi-box Detector) network: it introduces the lightweight convolutional neural network MobileNetv2, analyzes the inverted residual and linear bottleneck structures in MobileNetv2, and compares the network structures of SSD and its lightweight version, SSDLite. On this basis, we propose a lightweight vehicle and pedestrian detection model named LVP-DN (Lightweight Vehicle and Pedestrian Detection Network). First, MobileNetv2 replaces VGG as the base network for feature extraction. Then, SSDLite replaces the original SSD structure to reduce the model size and speed up detection. Furthermore, the ratio of the default boxes is optimized to improve the network's detection accuracy for pedestrians. Finally, on the KITTI and PASCAL VOC datasets, we compare the impact of three factors on network performance: the base network, the input image size, and whether a pre-trained model is used. The experimental results show that, compared with other popular object detection models, the proposed vehicle and pedestrian detection model achieves good results in terms of accuracy, speed, and model size.
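The two design choices named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that SSDLite saves parameters by replacing standard 3x3 convolutions in the SSD prediction layers with depthwise separable ones (as described in the MobileNetV2 paper), and that the "ratio of the default boxes" refers to the aspect ratio a = width/height in the SSD rule w = s*sqrt(a), h = s/sqrt(a). The function names, the channel count 256, the scale 0.2, and the aspect ratios are illustrative assumptions, not values from the paper.

```python
import math


def conv_params(c_in, c_out, k=3):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(c_in, c_out, k=3):
    """Weight count of a depthwise k x k convolution followed by a
    1 x 1 pointwise convolution (bias ignored)."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def default_box_shapes(scale, aspect_ratios):
    """Return (width, height) pairs relative to the image size, using the
    SSD rule w = s * sqrt(a), h = s / sqrt(a) with a = width / height."""
    return [(scale * math.sqrt(a), scale / math.sqrt(a))
            for a in aspect_ratios]


if __name__ == "__main__":
    # Hypothetical layer: 256 input channels, 256 output channels.
    standard = conv_params(256, 256)
    separable = separable_conv_params(256, 256)
    print("standard:", standard, "separable:", separable)

    # Hypothetical ratios: a < 1 gives tall, narrow boxes.
    for w, h in default_box_shapes(0.2, [1.0, 0.5, 1.0 / 3.0]):
        print(f"w={w:.3f} h={h:.3f}")
```

Under these assumptions, every box at a given scale keeps the same area (s squared), so changing the aspect ratio only redistributes it between width and height. Ratios below 1 yield tall, narrow boxes, which plausibly match standing pedestrians better than square or wide boxes.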

Memo

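Received: 2018-08-19.
Funding: National Natural Science Foundation of China (61672522); Open Project of the Key Laboratory of Data Science and Intelligence Application, Fujian Province University (D1804).
Corresponding author: Xu Xinzheng, Ph.D., associate professor; research interests: machine learning and pattern recognition. E-mail: xuxinzh@163.com
Last Update: 2019-03-30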