
Research on Newspaper Layout Segmentation Method Based on Transformer

Journal of Nanjing Normal University (Natural Science Edition) [ISSN:1001-4616/CN:32-1239/N]

Issue:
2025, No. 1
Page:
109-118
Research Field:
Computer Science and Technology
Publishing date:

Info

Title:
Research on Newspaper Layout Segmentation Method Based on Transformer
Author(s):
Zhu Yifan, Gao Hua, Ye Ning
(College of Information Science and Technology & Artificial Intelligence,Nanjing Forestry University,Nanjing 210037,China)
Keywords:
layout segmentation; DETR; ShuffleNet V2; Feature Pyramid Networks (FPN); ECA
PACS:
TP391
DOI:
10.3969/j.issn.1001-4616.2025.01.014
Abstract:
The retrieval and study of information in the era of big data poses a challenge to the digitization of massive traditional paper media. Thanks to the continuous development of computer vision and artificial intelligence methods, the DETR model can be applied to newspaper layout segmentation. To address the problems of the original model in layout segmentation, such as slow detection speed, a large number of parameters and inaccurate classification, this paper proposes an improved model that adopts the lightweight ShuffleNet V2 backbone network, which effectively improves computing efficiency and reduces the number of model parameters, thereby easing the computational pressure of the Transformer structure. At the same time, through a feature pyramid structure, the model fully fuses global and detail information, significantly enhancing its ability to recognize multi-scale targets. In addition, the model introduces an Efficient Channel Attention (ECA) module to extract key target features, effectively suppressing irrelevant background information and achieving a lightweight design while maintaining segmentation performance. The experimental results show that the improved model has 38.5 M parameters, a frame rate (FPS) of up to 47.5 img/s, and an mAP0.5 of up to 0.806. Compared with the original DETR model, the improved model reduces the number of parameters by 2.8 M, increases the frame rate by 28.3 img/s, and improves mAP0.5 by 3.2%. The proposed model can provide early-stage technical support for OCR recognition of newspaper layouts.

References:

[1]COÜASNON B,LEMAITRE A. Recognition of tables and forms[J]. Handbook of document image processing and recognition,2019:647-677.
[2]ZANIBBI R,BLOSTEIN D,CORDY J R. A survey of table recognition:models,observations,transformations,and inferences[J]. Document analysis and recognition,2004,7:1-16.
[3]E SILVA A C,JORGE A M,TORGO L. Design of an end-to-end method to extract information from tables[J]. International journal of document analysis and recognition(IJDAR),2006,8:144-171.
[4]KHUSRO S,LATIF A,ULLAH I. On methods and tools of table detection,extraction and annotation in PDF documents[J]. Journal of information science,2015,41(1):41-57.
[5]EMBLEY D W,HURST M,LOPRESTI D,et al. Table-processing paradigms:a research survey[J]. International journal of document analysis and recognition(IJDAR),2006,8:66-86.
[6]CESARINI F,MARINAI S,SARTI L,et al. Trainable table location in document images[C]//2002 International Conference on Pattern Recognition. Quebec,Canada:IEEE,2002,3:236-240.
[7]YANG X,YUMER E,ASENTE P,et al. Learning to extract semantic structure from documents using multimodal fully convolutional neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu,HI,USA:IEEE,2017:5315-5324.
[8]HE D,COHEN S,PRICE B,et al. Multi-scale multi-task fcn for semantic page segmentation and table detection[C]//2017 14th IAPR International Conference on Document Analysis and Recognition(ICDAR). Kyoto,Japan:IEEE,2017,1:254-261.
[9]SUN H Y. Research on document layout analysis method based on deep learning[D]. Xiamen:Xiamen University of Technology,2022.
[10]ZHANG H H. Document image layout analysis algorithm based on attention mechanism[D]. Qingdao:Qingdao University of Science and Technology,2023.
[11]YANG C H,ZHOU X L,ZHANG H,et al. Layout segmentation based on Multi-WHFPN and SimAM attention mechanism[J]. Electronic measurement technology,2024,47(1):159-168.
[12]FU M M,DENG M L,ZHANG D X. Object detection algorithms based on deep learning and Transformer[J]. Computer engineering and applications,2023,59(1):37-48.
[13]LI Y Y,LU S L,WANG J J,et al. Survey of Transformer-based DETR object detection algorithms[J]. Computer engineering,2025:1-20.
[14]LI J,DU J Q,ZHU Y C,et al. Survey of object detection algorithms based on Transformer[J]. Computer engineering and applications,2023,59(10):48-64.
[15]ZOU Z,CHEN K,SHI Z,et al. Object detection in 20 years:a survey[J]. Proceedings of the IEEE,2023,111(3):257-276.
[16]ZITNICK C L,DOLLÁR P. Edge boxes:locating object proposals from edges[C]//Computer Vision-ECCV 2014:13th European Conference. Zurich,Switzerland:Springer International Publishing,2014:391-405.
[17]HU Q,ZHAI L. RGB-D image multi-target detection method based on 3D DSF R-CNN[J]. International journal of pattern recognition and artificial intelligence,2019,33(8):1954026.
[18]GIRSHICK R,DONAHUE J,DARRELL T,et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Columbus,OH,USA:IEEE,2014:580-587.
[19]XU D G,WANG L,LI F. Survey of typical object detection algorithms in deep learning[J]. Computer engineering and applications,2021,57(8):10-25.
[20]REDMON J,DIVVALA S,GIRSHICK R,et al. You only look once:unified,real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas,NV,USA:IEEE,2016:779-788.
[21]LIU W,ANGUELOV D,ERHAN D,et al. Ssd:single shot multibox detector[C]//Computer Vision-ECCV 2016:14th European Conference. Amsterdam,The Netherlands:Springer International Publishing,2016:21-37.
[22]CARION N,MASSA F,SYNNAEVE G,et al. End-to-end object detection with transformers[C]//European Conference on Computer Vision. Cham:Springer International Publishing,2020:213-229.
[23]LI Z G,SONG Q F,DU Y J,et al. Research on robot riveting defect detection method based on improved DETR[J]. Journal of railway science and engineering,2024,21(4):1690-1700.
[24]XU H D. Research on traffic sign recognition system for autonomous vehicles based on DETR[D]. Xi'an:Xijing University,2022.
[25]CUI Y,HAN J C,GAO S,et al. Underwater image object detection method based on improved Deformable-DETR[J]. Applied science and technology,2024,51(1):30-36,91.
[26]JIANG Z P,WANG Z Q,ZHANG Y S,et al. Vehicle object detection algorithm for UAV video streams based on improved Deformable DETR[J]. Computer engineering & science,2024,46(1):91-101.
[27]WU T R,GAO J H,CHANG D K,et al. Seismic fault identification based on Transformer[J]. Oil geophysical prospecting,2024,59(6):1217-1224.
[28]FENG C,YANG H,WANG S X,et al. Multimodal sentiment analysis based on top-down mask generation and cascaded Transformer[J]. Computer engineering and applications,2025:1-11.
[29]MA N,ZHANG X,ZHENG H T,et al. Shufflenet v2:practical guidelines for efficient cnn architecture design[C]//Proceedings of the European Conference on Computer Vision(ECCV). Munich,Germany:Springer,2018:116-131.
[30]LIN T Y,DOLLÁR P,GIRSHICK R,et al. Feature pyramid networks for object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu,HI,USA:IEEE,2017:2117-2125.
[31]WANG Q,WU B,ZHU P,et al. ECA-Net:efficient channel attention for deep convolutional neural networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle,WA,USA:IEEE,2020:11534-11542.

Last Update: 2025-02-15