Chen Bin, Fan Feiyan, Lu Tianyi. Skeleton-based Graph Convolution with Residual Combined with Mixed Attention Mechanism for Multi-Person Posture Recognition[J]. Journal of Nanjing Normal University (Natural Science Edition), 2024(04): 106-117. [doi:10.3969/j.issn.1001-4616.2024.04.012]

Skeleton-based Graph Convolution with Residual Combined with Mixed Attention Mechanism for Multi-Person Posture Recognition

Journal of Nanjing Normal University (Natural Science Edition) [ISSN:1001-4616/CN:32-1239/N]

Volume:
Issue: 2024(04)
Pages: 106-117
Column: Computer Science and Technology
Publication date: 2024-12-15

Article Info

Title: Skeleton-based Graph Convolution with Residual Combined with Mixed Attention Mechanism for Multi-Person Posture Recognition
Article ID: 1001-4616(2024)04-0106-12
Author(s): Chen Bin (陈斌), Fan Feiyan (樊飞燕), Lu Tianyi (陆天易)
Affiliation: Informatization Office, Nanjing Normal University, Nanjing 210023, China
Keywords: multi-person posture recognition; residual; mixed attention mechanism; skeletal key point graph; graph convolution
CLC number: TP394.1
DOI: 10.3969/j.issn.1001-4616.2024.04.012
Document code: A
Abstract:
Research on multi-person posture recognition started late and, being immature yet highly complex, has relied on ever deeper networks; as depth grows, the vanishing-gradient problem intensifies and network performance degrades, leading to the common problems of poor recognition accuracy and low recognition efficiency. To address these problems, this paper proposes a multi-person posture recognition model that combines skeleton-based graph convolution with residual connections and a mixed attention mechanism. Following a top-down pipeline, a preprocessing stage detects the people in a multi-person image, locates and boxes each individual, and generates a skeleton key-point graph; residual blocks are then introduced into the network structure to suppress gradient dispersion, and a mixed attention mechanism is loaded to strengthen the model. The proposed model is validated on the MPII and MSCOCO2017 datasets. The results show that it recognizes multi-person postures well, with stable and nearly identical performance across the two datasets. The model is also compared with models reported in the major literature of this field and improves each fine-grained metric to some extent, with good stability and fairly uniform results. Overall, the proposed model delivers good recognition accuracy and efficiency across datasets and adds impetus to research on multi-person posture recognition.
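The abstract describes the architecture only at a high level: graph convolution over a skeleton key-point graph, residual blocks to keep gradients flowing, and a mixed attention mechanism that re-weights features. The sketch below is a minimal PyTorch illustration of that combination under stated assumptions, not the authors' implementation; the module names (MixedAttention, ResidualGCNBlock), tensor shapes, and the symmetric adjacency normalization are hypothetical choices made for the example.

```python
# Minimal sketch only: a residual skeleton-graph-convolution block gated by a
# mixed (channel + joint) attention module, as outlined in the abstract.
# All names, shapes, and the adjacency normalization are assumptions.
import torch
import torch.nn as nn


def normalized_adjacency(edges, num_joints):
    """Symmetrically normalized adjacency D^-1/2 (A + I) D^-1/2 of the skeleton graph."""
    A = torch.eye(num_joints)
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    d_inv_sqrt = A.sum(dim=1).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * A * d_inv_sqrt.unsqueeze(0)


class MixedAttention(nn.Module):
    """Channel attention (squeeze-and-excitation style) followed by per-joint attention."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())
        self.joint_gate = nn.Sequential(nn.Linear(channels, 1), nn.Sigmoid())

    def forward(self, x):                       # x: (batch, channels, joints)
        c = self.channel_gate(x.mean(dim=2))    # (batch, channels)
        x = x * c.unsqueeze(-1)                 # re-weight channels
        j = self.joint_gate(x.transpose(1, 2))  # (batch, joints, 1)
        return x * j.transpose(1, 2)            # re-weight joints


class ResidualGCNBlock(nn.Module):
    """One graph-convolution layer over the skeleton graph with a residual connection."""
    def __init__(self, in_ch, out_ch, adjacency):
        super().__init__()
        self.register_buffer("A", adjacency)    # fixed (joints, joints) adjacency
        self.theta = nn.Conv1d(in_ch, out_ch, kernel_size=1)  # per-joint feature mixing
        self.bn = nn.BatchNorm1d(out_ch)
        self.attn = MixedAttention(out_ch)
        self.skip = nn.Identity() if in_ch == out_ch else nn.Conv1d(in_ch, out_ch, 1)

    def forward(self, x):                       # x: (batch, in_ch, joints)
        y = torch.einsum("bcv,vw->bcw", self.theta(x), self.A)  # aggregate neighbor joints
        y = self.attn(self.bn(y))
        return torch.relu(y + self.skip(x))     # residual sum keeps gradients flowing


if __name__ == "__main__":
    # Toy usage on a 5-joint chain graph, with 2-D joint coordinates as input features.
    A = normalized_adjacency([(0, 1), (1, 2), (2, 3), (3, 4)], num_joints=5)
    block = ResidualGCNBlock(in_ch=2, out_ch=16, adjacency=A)
    print(block(torch.randn(8, 2, 5)).shape)    # torch.Size([8, 16, 5])
```

The channel gate follows the squeeze-and-excitation pattern, and the residual sum mirrors the gradient-preservation argument made in the abstract; how the paper actually instantiates the mixed attention and graph convolution may differ.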


Memo:
Received: 2024-07-18.
Funding: Jiangsu Provincial Modern Educational Technology Research, 2023 Smart Campus Special Project (2023-R-107311).
Corresponding author: Chen Bin, Ph.D., senior engineer. Research interests: pattern recognition, machine learning, and big data analysis. E-mail: 60167@njnu.edu.cn
Last Update: 2024-12-15