
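A Bimodal Classification Method for Chinese Scientific and Technological Literature Based on NLP and Image Classification Models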

Journal of Nanjing Normal University (Natural Science Edition) [ISSN:1001-4616/CN:32-1239/N]

Volume:: 48
Issue:: 2025, No. 03

Pages:: 84-92

Section:: Computer Science and Technology

Publication date:: 2025-06-20

Article Info

Title:: A Bimodal Classification Method for Chinese Scientific and Technological Literature Based on NLP and Image Classification Models

Article ID:: 1001-4616(2025)03-0084-09

Author(s):: Wang Zheng¹; Ding Yi²; Chen Haiming³; Chen Ying⁴
(1. Journal Editorial Department, Taizhou University, Taizhou 318000, China)
(2. School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China)
(3. Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo 315211, China)
(4. College of Electronics and Information Engineering, Taizhou University, Taizhou 318000, China)

Keywords:: classification of scientific and technological literature; document image classification; multi-modal features; natural language processing; deep learning; YOLOv7

CLC number:: TP391.1

DOI:: 10.3969/j.issn.1001-4616.2025.03.010

Document code:: A

Abstract:: As the demands of managing and organizing scientific and technological literature grow sharply, so does the need for more scalable, accurate, and automated document classification. To address the difficulty of analyzing massive volumes of scientific literature, a multi-modal literature analysis engine is proposed that combines the YOLOv7 image classification model with a natural language processing (NLP) model. The architecture exploits three key types of information: the natural language text in a document, its descriptive images, and the intrinsic relationship between the two. By integrating deep learning networks for the different modalities through a joint training procedure, it achieves higher classification accuracy than unimodal methods. The proposed method is applied to a Chinese scientific and technological literature dataset, and the model is trained to classify documents by Chinese Library Classification (CLC) number. The results show that the proposed bimodal method classifies more accurately than unimodal methods, which can help enterprises, public institutions, and research organizations manage data and knowledge more efficiently.
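The abstract does not say how the two modalities are combined, so the following is only a minimal sketch of one common option, feature-level fusion by concatenation, and should not be read as the authors' architecture. The function names, feature sizes, weights, and numbers are all hypothetical, and the third information source named in the abstract (the relationship between text and images) is not modeled here.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(
    text_feat: Sequence[float],
    image_feat: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Concatenate per-modality features, apply one linear layer,
    and return class probabilities."""
    fused = list(text_feat) + list(image_feat)
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Toy inputs (made-up numbers): in practice text_feat would come from an
# NLP encoder and image_feat from an image model; here they are hand-written.
text_feat = [0.2, 0.7, 0.1]   # hypothetical 3-dim text feature
image_feat = [0.9, 0.1]       # hypothetical 2-dim image feature
weights = [
    [1.0, 0.0, 0.0, 1.0, 0.0],  # class 0
    [0.0, 1.0, 1.0, 0.0, 1.0],  # class 1
]
bias = [0.0, 0.0]

probs = fuse_and_classify(text_feat, image_feat, weights, bias)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted, probs)
```

In a trained system the weights would be learned jointly with the two encoders rather than fixed by hand, and each class index would map to a CLC category.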



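Memo

Memo:: Received: 2024-12-12.
Funding: National Natural Science Foundation of China General Program (61976149); Zhejiang Provincial Natural Science Foundation Key Program (Z20F020008); Zhejiang Provincial "14th Five-Year Plan" Teaching Reform Project for Undergraduate Universities (jg20220563); 2025 Zhejiang Provincial Natural Science Foundation Project (LMS25A010011); Soft Science Research Program of the Zhejiang Provincial Department of Science and Technology (2025C35030).
Corresponding authors: Ding Yi, PhD, Professor, research interests: computer vision, artificial intelligence. E-mail: yi.ding@uestc.edu.cn; Chen Ying, Professor, research interests: 人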


Last Update: 2025-06-20