[1] Chai Bianfang, Li Zheng, Zhao Xiaopeng, et al. Deep Document Clustering Model Based on Generalization Graph Convolutional Neural Network[J]. Journal of Nanjing Normal University (Natural Science Edition), 2024, (01): 82-90. [doi:10.3969/j.issn.1001-4616.2024.01.010]

Deep Document Clustering Model Based on Generalization Graph Convolutional Neural Network

Journal of Nanjing Normal University (Natural Science Edition) [ISSN:1001-4616/CN:32-1239/N]

Volume:
Issue:
2024, No. 01
Pages:
82-90
Section:
Computer Science and Technology
Publication date:
2024-03-15

Article Info

Title:
Deep Document Clustering Model Based on Generalization Graph Convolutional Neural Network
Article ID:
1001-4616(2024)01-0082-09
Author(s):
Chai Bianfang1, Li Zheng1, Zhao Xiaopeng2, Wang Rongjuan3
(1. College of Information Engineering, Hebei GEO University, Shijiazhuang 050031, China)
(2. Integrated System Operation and Maintenance Center, Hebei Provincial Department of Finance, Shijiazhuang 050091, China)
(3. Hebei Vocational College of Geology, Shijiazhuang 050086, China)
Keywords:
graph neural network; deep graph clustering; text classification; text representation
CLC number:
TP391
DOI:
10.3969/j.issn.1001-4616.2024.01.010
Document code:
A
Abstract:
Text classification is an important task in natural language processing, and methods based on graph neural networks have become mainstream because they can model multiple kinds of interactions among texts. However, most existing methods rely on labels, and true labels are difficult to obtain. This paper proposes a deep document clustering model based on a generalization graph convolutional neural network (GGCN-DDC), which performs text representation learning and unsupervised document classification simultaneously. The model first represents each document as a text graph. It then uses generalized convolution layers to learn more discriminative word feature representations and document representations. Finally, parameter learning is constrained by a document clustering loss and a document graph reconstruction loss. Experiments on three benchmark datasets show that GGCN-DDC outperforms the baseline algorithms on several metrics.
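
The first step named in the abstract, modeling each document as a text graph, can be illustrated with a minimal sketch. The abstract does not say how the graph is built, so the sliding-window word co-occurrence construction below is an assumption borrowed from common practice in graph-based text classification, not the authors' method. The function name `build_text_graph` and the `window` parameter are hypothetical.

```python
from collections import defaultdict


def build_text_graph(tokens, window=3):
    """Build a word co-occurrence graph for a single document.

    ASSUMPTION: sliding-window co-occurrence is used here for illustration;
    the paper's actual graph construction is not given in the abstract.

    Returns:
        nodes: unique words, in order of first appearance
        edges: dict mapping (i, j) with i < j to a co-occurrence count
    """
    nodes = list(dict.fromkeys(tokens))          # de-duplicate, keep order
    index = {word: i for i, word in enumerate(nodes)}
    edges = defaultdict(int)
    for pos in range(len(tokens)):
        # Pair the word at `pos` with the next (window - 1) words.
        for nxt in range(pos + 1, min(pos + window, len(tokens))):
            a, b = index[tokens[pos]], index[tokens[nxt]]
            if a != b:                            # skip self-loops
                edges[(min(a, b), max(a, b))] += 1
    return nodes, dict(edges)


doc = "graph neural network for text graph clustering".split()
nodes, edges = build_text_graph(doc, window=3)
```

Here every unique word becomes a node and the edge weights count how often two words fall inside the same window. A graph neural network layer would then propagate word features along these edges before pooling them into a document representation.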

References:

[1]KOSAR A,PAUW G D,DAELEMANS W. Unsupervised text classification with neural word embeddings[J]. Computational linguistics in the netherlands journal,2022(12):165-181.
[2]LI Q,PENG H,LI J X,et al. A survey on text classification:from traditional to deep learning[J]. ACM transactions on intelligent systems and technology,2022,13(2):41.
[3]KIM Y. Convolutional neural networks for sentence classification[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg,PA:ACL Press,2014:1746-1751.
[4]LIU P F,QIU X P,HUANG X J. Recurrent neural network for text classification with multi-task learning[C]//Proceedings of the 25th International Joint Conference on Artificial Intelligence. Palo Alto,CA:AAAI Press,2016:2873-2879.
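[5]ZHOU X L,QIU W G,ZHANG L C. Text classification method combining text graph convolution and ensemble learning[J]. Application research of computers,2022,39(9):2621-2625.(in Chinese)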
[6]KIPF T N,WELLING M. Semi-supervised classification with graph convolutional networks[C]//Proceedings of the 5th International Conference on Learning Representations.(2017-02-22)[2023-03-10]. https://doi.org/10.48550/arXiv.1609.02907.
[7]YAO L,MAO C S,LUO Y. Graph convolutional networks for text classification[C]//Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Palo Alto,CA:AAAI Press,2019:7370-7377.
[8]DAI Y,SHOU L J,GONG M,et al. Graph fusion network for text classification[J]. Knowledge-based systems,2022,236:107659.
[9]ZHANG Y F,YU X L,CUI Z Y,et al. Every document owns its structure:inductive text classification via graph neural networks[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg,PA:ACL Press,2020:334-339.
[10]CUI H Y,WANG G K,LI Y X,et al. Self-training method based on GCN for semi-supervised short text classification[J]. Information sciences,2022,611:18-29.
[11]HAJ-YAHIA Z,SIEG A,DELERIS L A. Towards unsupervised text classification leveraging experts and word embeddings[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg,PA:ACL Press,2019:371-379.
[12]SCHOPF T,BRAUN D,MATTHES F. Lbl2Vec:An embedding-based approach for unsupervised document retrieval on predefined topics[J/OL].(2022-10-12)[2023-03-10]. https://doi.org/10.48550/arXiv.2210.06023.
[13]TIAN F,GAO B,CUI Q,et al. Learning deep representations for graph clustering[C]//Proceedings of the 28th AAAI Conference on Artificial Intelligence. Palo Alto,CA:AAAI Press,2014:1293-1299.
[14]ZHANG X T,LIU H,LI Q M,et al. Attributed graph clustering via adaptive graph convolution[C/OL].(2019-08-01)[2023-03-10]. https://doi.org/10.24963/ijcai.2019/601.
[15]ZHU D Y,CHEN S D,MA X H,et al. Adaptive graph convolution using heat kernel for attributed graph clustering[J]. Applied sciences,2020,10(2):1473.
[16]WANG C,PAN S R,HU R Q,et al. Attributed graph clustering:a deep attentional embedding approach[C]//Proceedings of the 28th International Joint Conference on Artificial Intelligence. Palo Alto,CA:AAAI Press,2019:3670-3676.
[17]PENNINGTON J,SOCHER R,MANNING C. Glove:Global vectors for word representation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg,PA:ACL Press,2014:1532-1543.
[18]CUI G Q,ZHOU J,YANG C,et al. Adaptive graph encoder for attributed graph embedding[C]//Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. New York:ACM Press,2020:976-985.
[19]PEROZZI B,AL-RFOU R,SKIENA S. Deepwalk:online learning of social representations[C]//Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York:ACM Press,2014:701-710.
[20]YANG C,LIU Z Y,ZHAO D L,et al. Network representation learning with rich text information[C]//Proceedings of the 24th International Joint Conference on Artificial Intelligence. Palo Alto,CA:AAAI Press,2015:2111-2117.
[21]KIPF T N,WELLING M. Variational graph auto-encoders[C/OL]//Proceedings of the 30th Conference on Neural Information Processing Systems Workshop on Bayesian Deep Learning.(2016-11-21)[2023-03-10]. https://doi.org/10.48550/arXiv.1611.07308.
[22]BO D Y,WANG X,SHI C,et al. Structural deep clustering network[C]//Proceedings of the Web Conference 2020. New York:ACM Press,2020:1400-1410.

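Memo

Received: 2023-05-29.
Funding: Science and Technology Research Project of Higher Education Institutions of Hebei Province (ZD2020175); Hebei GEO University 2023 National Pre-research Project (KY202310).
Corresponding author: Wang Rongjuan, associate professor. Main research interests: natural language processing, graph mining, and machine learning. E-mail: 30673253@qq.com
Last Update: 2024-03-15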