
Research on Improved Algorithm of FT-Tree for Log Template Extraction

Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Issue:
2021, Issue 02
Page:
121-126
Research Field:
Computer Science and Technology
Publishing date:

Info

Title:
Research on Improved Algorithm of FT-Tree for Log Template Extraction
Author(s):
Gu Haiyan, Zheng Qiwen
Department of Computer Information and Cyber Security, Jiangsu Police Institute, Nanjing 210031, China
Keywords:
Log template; extraction method; FT-tree algorithm; Apriori algorithm; improved algorithm
PACS:
TP183
DOI:
10.3969/j.issn.1001-4616.2021.02.017
Abstract:
Because computer log files record event information authentically and comprehensively, they are widely used as electronic evidence in the investigation and handling of cases. Fast, automatic analysis of massive log files requires an efficient and reliable log template extraction method. To address the problems of current log template extraction methods, we take network server logs as the object of template extraction and propose an improved FT-tree algorithm. In the proposed algorithm, the pruning threshold is calculated using the Apriori algorithm, and pruning is then controlled by this threshold. A template extraction experiment is carried out on a real log file from a university network server. The results show that, compared with the original FT-tree algorithm, the improved algorithm significantly improves the accuracy of log template extraction and better meets the needs of practical application.
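As a rough illustration of the role the pruning threshold plays, the following Python sketch builds a simplified FT-tree as the technique is commonly described: the words of each log line are ordered by how many lines contain them, inserted into a prefix tree, and the children of any node with too many children are cut off as likely variable fields. The abstract does not say how the Apriori step derives the threshold, so here it is simply passed in as a parameter; the names `extract_templates` and `max_children`, and the sample log lines, are hypothetical and not taken from the paper.

```python
from collections import Counter


def extract_templates(logs, max_children):
    """Return templates from a simplified FT-tree.

    max_children is the pruning threshold (hypothetical name). How the
    paper computes it with Apriori is not given in the abstract, so it is
    a plain argument in this sketch.
    """
    tokenized = [line.split() for line in logs]

    # Count in how many log lines each word occurs.
    freq = Counter(word for words in tokenized for word in set(words))

    # Insert each line into a prefix tree, most frequent words first
    # (ties broken alphabetically so the result is deterministic).
    root = {}
    for words in tokenized:
        node = root
        for word in sorted(set(words), key=lambda w: (-freq[w], w)):
            node = node.setdefault(word, {})

    def prune(node):
        # A node with too many children is assumed to be followed by
        # variable fields, so everything below it is dropped.
        if len(node) > max_children:
            node.clear()
            return
        for child in node.values():
            prune(child)

    # The root itself is left unpruned in this sketch so that a log with
    # many distinct leading words does not lose every template.
    for child in root.values():
        prune(child)

    # Each root-to-leaf path is reported as one template. A line whose
    # words form a prefix of a longer path is not reported separately.
    templates = []

    def collect(node, path):
        if not node:
            templates.append(tuple(path))
            return
        for word, child in sorted(node.items()):
            collect(child, path + [word])

    if root:
        collect(root, [])
    return templates


# Made-up sample lines, for illustration only.
SAMPLE_LOGS = [
    "Interface eth0 changed state to up",
    "Interface eth1 changed state to up",
    "Interface eth2 changed state to up",
    "Interface eth3 changed state to down",
    "Interface eth4 changed state to down",
    "Interface eth5 changed state to down",
]

if __name__ == "__main__":
    for threshold in (1, 2):
        print(threshold, extract_templates(SAMPLE_LOGS, threshold))
```

With the sample lines, a threshold of 2 keeps the "up" and "down" templates apart while discarding the interface names, whereas a threshold of 1 also merges "up" and "down" into a single template. This sensitivity to the threshold is why choosing it automatically, as the abstract proposes, can affect extraction accuracy.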


Memo

Last Update: 2021-06-30