[1] Xu Weizhong, Cao Jinxin, Jin Di, et al. A Unified Semi-supervised Community Detection Approach Integrating Network Topology and Node Contents[J]. Journal of Nanjing Normal University (Natural Science Edition), 2023, 46(01): 130-138. [doi:10.3969/j.issn.1001-4616.2023.01.017]

A Unified Semi-supervised Community Detection Approach Integrating Network Topology and Node Contents

Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Volume:
46
Issue:
2023, No. 01
Pages:
130-138
Section:
Computer Science and Technology
Publication date:
2023-03-15

Article Info

Title:
A Unified Semi-supervised Community Detection Approach Integrating Network Topology and Node Contents
Article ID:
1001-4616(2023)01-0130-09
Author(s):
Xu Weizhong (1), Cao Jinxin (1), Jin Di (2), Sun Xiang (3), Zhang Xiaofeng (1), Liu Lu (3), Ding Weiping (1)
(1. School of Information Science and Technology, Nantong University, Nantong 226019, China)
(2. College of Intelligence and Computing, Tianjin University, Tianjin 300350, China)
(3. School of Informatics, University of Leicester, Leicester LE1 7RH, UK)
Keywords:
community detection; node contents; prior information; stochastic block; nonnegative matrix factorization
CLC number:
TP182
DOI:
10.3969/j.issn.1001-4616.2023.01.017
Document code:
A
Abstract:
Community detection plays an increasingly important role in complex network analysis, and improving its performance in real applications remains a shared research goal. Because node content helps to identify communities, some methods combine network topology with node content and achieve good results. Other methods enhance community detection by using the topological similarity between nodes to adjust the network topology. To further improve detection quality, we propose a unified method, Semi-supervised Community Detection with Node Contents (SCDNC), which both applies this link enhancement to community detection and integrates network topology with node content. First, we use a stochastic block model to describe the community memberships of nodes. Second, we present another stochastic model to describe the community memberships of node contents, using the community memberships of nodes as the weight vectors of node contents; this step integrates network topology with node content. Third, we compute the topological similarity of nodes from the links and build prior information from it, i.e., each node and its most similar neighbor are made to share the same community membership distribution. Finally, we use nonnegative matrix factorization to learn the unified parameters of the model. On both synthetic and real-world networks with ground truth, we compare the new method with several state-of-the-art community detection methods. The experimental results show that, by combining node contents with links strengthened by prior information, the new method significantly improves community detection performance.
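The prior-information step described in the abstract (pair each node with its most similar neighbor) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not name the similarity measure, so the Jaccard index over closed neighborhoods is assumed here, and the function name `most_similar_neighbor_prior` and the 0/1 must-link matrix it returns are hypothetical choices made for illustration.

```python
import numpy as np


def most_similar_neighbor_prior(adjacency):
    """Build a symmetric 0/1 must-link matrix from an undirected adjacency matrix.

    Assumption: similarity is the Jaccard index of closed neighborhoods
    (a node together with its neighbors); the abstract names no measure.
    """
    adj = np.array(adjacency, dtype=bool)   # copy, so the caller's matrix is untouched
    np.fill_diagonal(adj, False)            # ignore self-loops
    n = adj.shape[0]

    # Closed neighborhood indicator: row i marks node i and its neighbors.
    closed = (adj | np.eye(n, dtype=bool)).astype(int)
    shared = closed @ closed.T              # sizes of pairwise intersections
    sizes = closed.sum(axis=1)
    union = sizes[:, None] + sizes[None, :] - shared   # always >= 1
    similarity = shared / union

    # Only actual neighbors may be chosen; everything else gets a sentinel.
    similarity = np.where(adj, similarity, -1.0)

    must_link = np.zeros((n, n), dtype=int)
    for i in range(n):
        if adj[i].any():                    # isolated nodes receive no prior
            j = int(np.argmax(similarity[i]))   # ties go to the lowest index
            must_link[i, j] = must_link[j, i] = 1
    return must_link


# Toy network: two triangles (0,1,2) and (3,4,5) joined by the bridge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6), dtype=int)
for u, v in edges:
    adj[u, v] = adj[v, u] = 1

prior = most_similar_neighbor_prior(adj)
print(prior)
```

In this toy network no must-link pair crosses the bridge between the two triangles, which is the behavior one would want from such a prior. How the resulting pairs enter the factorization objective is not specified in the abstract, so that part is left out of the sketch.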

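Memo

Received: 2021-12-31.
Funding: General Program of the National Natural Science Foundation of China (61976120); General Program of the Natural Science Foundation of Jiangsu Province (BK20191445); General Program of Natural Science Research of Jiangsu Higher Education Institutions (21KJB520018); Nantong University Talent Introduction Project (03081198).
Corresponding author: Cao Jinxin, PhD, lecturer. Research interests: data mining, machine learning, community detection. E-mail: alfred7c@ntu.edu.cn
Last Update: 2023-03-15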