 Zhang You,Yi Jun,Yu Ye.Two-Stage Monocular 3D Object Detection Based on Deep Fusion and Noise Perturbation Enhancement[J].Journal of Nanjing Normal University(Natural Science Edition),2025,48(06):121-128.[doi:10.3969/j.issn.1001-4616.2025.06.013]

Two-Stage Monocular 3D Object Detection Based on Deep Fusion and Noise Perturbation Enhancement

Journal of Nanjing Normal University (Natural Science Edition) [ISSN:1001-4616/CN:32-1239/N]

Volume:
48
Issue:
No. 06, 2025
Pages:
121-128
Section:
Computer Science and Technology
Publication Date:
2025-12-20

Article Info

Title:
Two-Stage Monocular 3D Object Detection Based on Deep Fusion and Noise Perturbation Enhancement
Article ID:
1001-4616(2025)06-0121-08
Author(s):
Zhang You1, Yi Jun2, Yu Ye1
(1. School of Computer and Information, Hefei University of Technology, Hefei 230601, China)
(2. Huanggang Normal University, Huanggang 438000, China)
Keywords:
3D object detection; noise perturbation; feature enhancement; depth fusion
CLC number:
O643/X703
DOI:
10.3969/j.issn.1001-4616.2025.06.013
Document code:
A
Abstract:
Monocular 3D object detection is one of the key technologies in autonomous driving systems, and with the growing demand for autonomous driving it has attracted increasing attention. However, accurately localizing 3D objects from a single image remains highly challenging: on the one hand, the accuracy of depth estimation still needs improvement; on the other hand, existing methods usually train the 3D and 2D detection branches jointly, and this coupling limits the performance optimization of the 2D branch. To address these problems, this paper proposes a two-stage monocular 3D object detection method based on depth fusion and noise perturbation enhancement. The method designs a depth information fusion mechanism that evaluates the reliability of different depth estimates and fuses them with an adaptive weighting strategy, significantly improving depth estimation accuracy. It also proposes a decoupled training strategy that trains the 3D and 2D detection branches independently and introduces noise perturbations into the 2D branch, strengthening 2D feature extraction through data augmentation and thereby providing more reliable 2D cues for 3D detection. The proposed model is validated on the KITTI dataset; the results show that it recognizes vehicle targets well and achieves good performance on targets of all difficulty levels in the dataset.
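This record does not include the paper's exact formulation, so as an illustrative sketch only, the two ideas in the abstract can be summarized as: (1) fusing several per-pixel depth estimates with weights derived from their reliability scores, and (2) perturbing 2D-branch features with random noise as augmentation. All names below (`fuse_depths`, `perturb`) and the choice of Gaussian noise are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def fuse_depths(depth_maps, confidences):
    """Reliability-weighted fusion of several per-pixel depth estimates.

    depth_maps:  list of HxW arrays, candidate depth estimates
    confidences: list of HxW arrays, reliability scores (higher = more trusted)
    """
    d = np.stack(depth_maps)               # (K, H, W) candidate depths
    c = np.stack(confidences)              # (K, H, W) reliability scores
    w = c / c.sum(axis=0, keepdims=True)   # normalize weights per pixel
    return (w * d).sum(axis=0)             # adaptive weighted average

def perturb(features, sigma=0.1, rng=None):
    """Additive Gaussian noise perturbation for 2D-branch augmentation."""
    rng = np.random.default_rng() if rng is None else rng
    return features + rng.normal(0.0, sigma, size=features.shape)
```

For example, fusing two constant depth maps of 10 m and 20 m with reliabilities 3 and 1 yields a per-pixel depth of 12.5 m, since the more reliable estimate receives three times the weight.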

References:

[1]HUANG Z,WANG Y C,LI D Y. A survey of 3D object detection methods[J]. Chinese journal of intelligent science and technology,2023,5(1):7-31.
[2]LIU Y G,YU F N,ZHANG X J,et al. Research on 3D object detection based on fusion of LiDAR point clouds and images[J]. Journal of mechanical engineering,2022,58(24):289-299.
[3]DUAN K,BAI S,XIE L,et al. CenterNet:keypoint triplets for object detection[C]//International Conference on Computer Vision. Seoul:IEEE,2019:6569-6578.
[4]MOUSAVIAN A,ANGUELOV D,FLYNN J,et al. 3D bounding box estimation using deep learning and geometry[C]//Conference on Computer Vision and Pattern Recognition. Hawaii:IEEE,2017:7074-7082.
[5]JIANG X,JIN S,ZHANG X,et al. MonoMAE:enhancing monocular 3D detection through depth-aware masked autoencoders[C]//Neural Information Processing Systems. Vancouver:Morgan Kaufmann,2024:11392-11411.
[6]TAO R,HAN W,QIU Z,et al. Weakly supervised monocular 3d object detection using multi-view projection and direction consistency[C]//Conference on Computer Vision and Pattern Recognition. Vancouver:IEEE,2023:17482-17492.
[7]ZHOU Y,ZHU H,LIU Q,et al. MonoATT:online monocular 3D object detection with adaptive token transformer[C]//Conference on Computer Vision and Pattern Recognition. Vancouver:IEEE,2023:17493-17503.
[8]WU Z,GAN Y,WANG L,et al. MonoPGC:Monocular 3d object detection with pixel geometry contexts[C]//International Conference on Robotics and Automation. London:IEEE,2023:4842-4849.
[9]MA X,WANG Z,LI H,et al. Accurate monocular 3D object detection via color-embedded 3D reconstruction for autonomous driving[C]//International Conference on Computer Vision. Seoul:IEEE,2019:6851-6860.
[10]HE T,SOATTO S. Mono3D++:Monocular 3D vehicle detection with two-scale 3D hypotheses and task priors[C]//AAAI Conference on Artificial Intelligence. Hawaii:AAAI Press,2019,33(01):8409-8416.
[11]HUANG R,ZHENG H,WANG Y,et al. Training an open-vocabulary monocular 3d detection model without 3d data[C]//Neural Information Processing Systems. Vancouver:Morgan Kaufmann,2024,37:72145-72169.
[12]XU J,PENG L,CHENG H,et al. MonoNeRD:NeRF-like representations for monocular 3D object detection[C]//International Conference on Computer Vision. Paris:IEEE,2023:6814-6824.
[13]LIU Z,ZHOU D,LU F,et al. Autoshape:real-time shape-aware monocular 3D object detection[C]//International Conference on Computer Vision. Montreal:IEEE,2021:15641-15650.
[14]XIONG K,ZHANG D,LIANG D,et al. You only look bottom-up for monocular 3d object detection[J]. Robotics and automation letters,2023,8(11):7464-7471.
[15]YU F,WANG D,SHELHAMER E,et al. Deep layer aggregation[C]//Conference on Computer Vision and Pattern Recognition. Salt Lake City:IEEE,2018:2403-2412.
[16]GEIGER A,LENZ P,STILLER C,et al. Vision meets robotics:the kitti dataset[J]. The international journal of robotics research,2013,32(11):1231-1237.
[17]ZHU W J,ZHANG T,CHENG R Q,et al. No-reference video quality assessment method considering background distortion[J]. Journal of Nanjing Normal University(Natural Science Edition),2025,48(3):103-111.
[18]WANG L,DU L,YE X,et al. Depth-conditioned dynamic message propagation for monocular 3d object detection[C]//Conference on Computer Vision and Pattern Recognition. Nashville:IEEE,2021:454-463.
[19]BRAZIL G,PONS-MOLL G,LIU X,et al. Kinematic 3d object detection in monocular video[C]//European Conference on Computer Vision. Glasgow:Springer,2020:135-152.
[20]CHEN H,HUANG Y,TIAN W,et al. Monorun:monocular 3D object detection by reconstruction and uncertainty propagation[C]//Conference on Computer Vision and Pattern Recognition. Nashville:IEEE,2021:10379-10388.
[21]READING C,HARAKEH A,CHAE J,et al. Categorical depth distribution network for monocular 3D object detection[C]//Conference on Computer Vision and Pattern Recognition. Nashville:IEEE,2021:8555-8564.
[22]HUANG K C,WU T H,SU H T,et al. Monodtr:monocular 3D object detection with depth-aware transformer[C]//Conference on Computer Vision and Pattern Recognition. New Orleans:IEEE,2022:4012-4021.
[23]LIU Z,WU Z,TÓTH R. SMOKE:single-stage monocular 3D object detection via keypoint estimation[C]//Conference on Computer Vision and Pattern Recognition. Seattle:IEEE,2020:996-997.
[24]CHEN Y,TAI L,SUN K,et al. Monopair:monocular 3D object detection using pairwise spatial relationships[C]//Conference on Computer Vision and Pattern Recognition. Seattle:IEEE,2020:12093-12102.
[25]MA X,ZHANG Y,XU D,et al. Delving into localization errors for monocular 3D object detection[C]//Conference on Computer Vision and Pattern Recognition. Nashville:IEEE,2021:4721-4730.
[26]ZHANG Y,LU J,ZHOU J. Objects are different:flexible monocular 3D object detection[C]//Conference on Computer Vision and Pattern Recognition. Nashville:IEEE,2021:3289-3298.
[27]QIN Z,LI X. Monoground:detecting monocular 3D objects from the ground[C]//Conference on Computer Vision and Pattern Recognition. New Orleans:IEEE,2022:3793-3802.
[28]SHI X,CHEN Z,KIM T K. Multivariate probabilistic monocular 3D object detection[C]//Winter Conference on Applications of Computer Vision(WACV). Hawaii:IEEE,2023:4270-4279.
[29]GUAN H,SONG C,ZHANG Z. GRAMO:geometric resampling augmentation for monocular 3D object detection[J]. Frontiers of computer science,2024,18(5):185706.

Memo:

Received: 2025-03-11.
Funding: Anhui Provincial Natural Science Foundation (2308085MF216); National Natural Science Foundation of China (62372153).
Corresponding author: Yi Jun, PhD, lecturer; research interests: computer vision, multimedia technology, and 3D reconstruction. E-mail: junyi@hgnu.edu.cn
Last Update: 2025-12-20