Zhu Yifan, Gao Hua, Ye Ning. Research on Newspaper Layout Segmentation Method Based on Transformer[J]. Journal of Nanjing Normal University (Natural Science Edition), 2025, 48(01): 109-118. [doi:10.3969/j.issn.1001-4616.2025.01.014]


Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Volume: 48
Issue: 2025, No. 01
Pages: 109-118
Section: Computer Science and Technology
Publication date: 2025-02-15

Article Info

Title:
Research on Newspaper Layout Segmentation Method Based on Transformer
Article ID:
1001-4616(2025)01-0109-10
Author(s):
Zhu Yifan, Gao Hua, Ye Ning
(College of Information Science and Technology & Artificial Intelligence, Nanjing Forestry University, Nanjing 210037, China)
Keywords:
layout segmentation; DETR; ShuffleNet V2; Feature Pyramid Networks (FPN); ECA channel attention
CLC number:
TP391
DOI:
10.3969/j.issn.1001-4616.2025.01.014
Document code:
A
Abstract:
In the era of big data, information retrieval and research pose a challenge to the digitization of massive volumes of traditional print media. Thanks to continuing advances in computer vision and artificial intelligence, the DETR model can be applied to newspaper layout segmentation. To address the original model's shortcomings in this task, namely slow detection, a large parameter count, and inaccurate classification, this paper proposes an improved model with a lightweight ShuffleNet V2 backbone, which improves computational efficiency and reduces the number of model parameters, easing the computational burden of the Transformer structure. Through a feature pyramid structure, the model fully fuses global and fine-grained information, markedly strengthening its recognition of multi-scale targets. In addition, the model introduces an Efficient Channel Attention (ECA) module to extract key target features, effectively suppressing irrelevant background information and achieving a lightweight design without sacrificing segmentation performance. Experimental results show that the improved model has 38.5 M parameters, reaches a frame rate (FPS) of 47.5 img/s, and attains an mAP0.5 of 0.806. Compared with the original DETR model, it uses 2.8 M fewer parameters, runs 28.3 img/s faster, and improves mAP0.5 by 3.2%. The proposed model can also provide upstream technical support for OCR of newspaper layouts.
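The ECA gating the abstract describes can be illustrated with a minimal NumPy sketch, assuming the standard formulation from the ECA-Net paper: global average pooling per channel, a 1-D convolution across neighbouring channels with an adaptively chosen odd kernel size, and a sigmoid gate. The function names are illustrative, and the uniform averaging kernel stands in for the learned 1-D convolution weights; this is a sketch of the mechanism, not the authors' implementation.

```python
import numpy as np

def eca_kernel_size(channels, gamma=2, b=1):
    # Adaptive kernel size from ECA-Net: k = |(log2(C) + b) / gamma|, rounded up to odd.
    t = int(abs((np.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1

def eca_weights(feature_map, gamma=2, b=1):
    """feature_map: (C, H, W). Returns per-channel attention weights in (0, 1)."""
    c = feature_map.shape[0]
    k = eca_kernel_size(c, gamma, b)
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    y = feature_map.mean(axis=(1, 2))
    # 1-D convolution across neighbouring channels ("same" padding, one shared kernel);
    # a uniform kernel stands in for the learned weights of the real module.
    kernel = np.full(k, 1.0 / k)
    y = np.convolve(np.pad(y, k // 2, mode="edge"), kernel, mode="valid")
    # Excite: sigmoid gate gives one weight per channel.
    return 1.0 / (1.0 + np.exp(-y))

x = np.random.rand(512, 7, 7)      # e.g. a backbone feature map
w = eca_weights(x)                 # (512,) channel weights
out = x * w[:, None, None]         # channel-wise reweighting of the features
```

For C = 512 channels the adaptive rule gives k = 5, so each channel's weight depends only on its five nearest neighbours; this locality is what keeps the module's overhead negligible compared with full channel attention such as SE blocks.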

References:

[1]COÜASNON B,LEMAITRE A. Recognition of tables and forms[J]. Handbook of document image processing and recognition,2019:647-677.
[2]ZANIBBI R,BLOSTEIN D,CORDY J R. A survey of table recognition:models,observations,transformations,and inferences[J]. Document analysis and recognition,2004,7:1-16.
[3]E SILVA A C,JORGE A M,TORGO L. Design of an end-to-end method to extract information from tables[J]. International journal of document analysis and recognition(IJDAR),2006,8:144-171.
[4]KHUSRO S,LATIF A,ULLAH I. On methods and tools of table detection,extraction and annotation in PDF documents[J]. Journal of information science,2015,41(1):41-57.
[5]EMBLEY D W,HURST M,LOPRESTI D,et al. Table-processing paradigms:a research survey[J]. International journal of document analysis and recognition(IJDAR),2006,8:66-86.
[6]CESARINI F,MARINAI S,SARTI L,et al. Trainable table location in document images[C]//2002 International Conference on Pattern Recognition. Quebec,Canada:IEEE,2002,3:236-240.
[7]YANG X,YUMER E,ASENTE P,et al. Learning to extract semantic structure from documents using multimodal fully convolutional neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu,HI,USA:IEEE,2017:5315-5324.
[8]HE D,COHEN S,PRICE B,et al. Multi-scale multi-task FCN for semantic page segmentation and table detection[C]//2017 14th IAPR International Conference on Document Analysis and Recognition(ICDAR). Kyoto,Japan:IEEE,2017,1:254-261.
[9]SUN H Y. Research on document layout analysis method based on deep learning[D]. Fujian:Xiamen University of Technology,2022.
[10]ZHANG H H. Document image layout analysis algorithm based on attention mechanism[D]. Shandong:Qingdao University of Science and Technology,2023.
[11]YANG C H,ZHOU X L,ZHANG H,et al. Layout segmentation based on Multi-WHFPN and SimAM attention mechanism[J]. Electronic measurement technology,2024,47(1):159-168.
[12]FU M M,DENG M L,ZHANG D X. Object detection algorithm based on deep learning and Transformer[J]. Computer engineering and applications,2023,59(1):37-48.
[13]LI Y Y,LU S L,WANG J J,et al. A survey of Transformer-based DETR object detection algorithms[J]. Computer engineering,2025:1-20.
[14]LI J,DU J Q,ZHU Y C,et al. A survey of Transformer-based object detection algorithms[J]. Computer engineering and applications,2023,59(10):48-64.
[15]ZOU Z,CHEN K,SHI Z,et al. Object detection in 20 years:a survey[J]. Proceedings of the IEEE,2023,111(3):257-276.
[16]ZITNICK C L,DOLLÁR P. Edge boxes:locating object proposals from edges[C]//Computer Vision-ECCV 2014:13th European Conference. Zurich,Switzerland:Springer International Publishing,2014:391-405.
[17]HU Q,ZHAI L. RGB-D image multi-target detection method based on 3D DSF R-CNN[J]. International journal of pattern recognition and artificial intelligence,2019,33(8):1954026.
[18]GIRSHICK R,DONAHUE J,DARRELL T,et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Columbus,OH,USA:IEEE,2014:580-587.
[19]XU D G,WANG L,LI F. A survey of typical object detection algorithms in deep learning[J]. Computer engineering and applications,2021,57(8):10-25.
[20]REDMON J,DIVVALA S,GIRSHICK R,et al. You only look once:unified,real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas,NV,USA:IEEE,2016:779-788.
[21]LIU W,ANGUELOV D,ERHAN D,et al. SSD:single shot multibox detector[C]//Computer Vision-ECCV 2016:14th European Conference. Amsterdam,The Netherlands:Springer International Publishing,2016:21-37.
[22]CARION N,MASSA F,SYNNAEVE G,et al. End-to-end object detection with transformers[C]//European Conference on Computer Vision. Cham:Springer International Publishing,2020:213-229.
[23]LI Z G,SONG Q F,DU Y J,et al. Research on robot riveting defect detection method based on improved DETR[J]. Journal of railway science and engineering,2024,21(4):1690-1700.
[24]XU H D. Research on traffic sign recognition system for autonomous vehicles based on DETR[D]. Shaanxi:Xijing University,2022.
[25]CUI Y,HAN J C,GAO S,et al. Underwater image object detection method based on improved Deformable-DETR[J]. Applied science and technology,2024,51(1):30-36,91.
[26]JIANG Z P,WANG Z Q,ZHANG Y S,et al. Vehicle object detection algorithm for UAV video streams based on improved Deformable DETR[J]. Computer engineering and science,2024,46(1):91-101.
[27]WU T R,GAO J H,CHANG D K,et al. Fault identification in seismic data based on Transformer[J]. Oil geophysical prospecting,2024,59(6):1217-1224.
[28]FENG C,YANG H,WANG S X,et al. Multimodal sentiment analysis based on top-down mask generation and stacked Transformer[J]. Computer engineering and applications,2025:1-11.
[29]MA N,ZHANG X,ZHENG H T,et al. Shufflenet v2:practical guidelines for efficient cnn architecture design[C]//Proceedings of the European Conference on Computer Vision(ECCV). Munich,Germany:Springer,2018:116-131.
[30]LIN T Y,DOLLÁR P,GIRSHICK R,et al. Feature pyramid networks for object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu,HI,USA:IEEE,2017:2117-2125.
[31]WANG Q,WU B,ZHU P,et al. ECA-Net:efficient channel attention for deep convolutional neural networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle,WA,USA:IEEE,2020:11534-11542.

Memo:
Received: 2024-06-07.
Funding: National Key Research and Development Program of China (2016YFD0600101).
Corresponding author: Ye Ning, PhD, professor; research interests: bioinformatics, data mining, and machine learning. E-mail: yening@njfu.edu.cn
Last update: 2025-02-15