Article abstract
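Sunhee Baek, Donghwoon Kwon, Sang C. Suh, Hyunjoo Kim, Ikkyun Kim, Jinoh Kim. Clustering-based label estimation for network anomaly detection [J]. Newly launched English-language journal of Chongqing University of Posts and Telecommunications, 2021, 7(1): 37-44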
Clustering-based label estimation for network anomaly detection
Received: July 11, 2018  Revised: May 14, 2020
DOI:https://doi.org/10.1016/j.dcan.2020.06.001
Chinese keywords: 
English keywords: Label estimation; Network anomaly detection; Clustering randomness
Funding: This work was supported in part by an Institute of Information and Communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. 2016-0-00078, Cloud-based Security Intelligence Technology Development for the Customized Security Service Provisioning).
Author | Institution | E-mail
Sunhee Baek | Computer Science Department, Texas A&M University, Commerce, TX, 75429, USA |
Donghwoon Kwon | Computer Science Department, Texas A&M University, Commerce, TX, 75429, USA |
Sang C. Suh | Computer Science Department, Texas A&M University, Commerce, TX, 75429, USA |
Hyunjoo Kim | ETRI, 218 Gajeong-ro, Yuseong-gu, Daejeon, 34129, Republic of Korea |
Ikkyun Kim | ETRI, 218 Gajeong-ro, Yuseong-gu, Daejeon, 34129, Republic of Korea |
Jinoh Kim | Computer Science Department, Texas A&M University, Commerce, TX, 75429, USA | jinoh.kim@tamuc.edu
Chinese abstract:
      
English abstract:
      Substantial work has been done on identifying network anomalies with supervised and unsupervised learning techniques, each of which has its own strengths and weaknesses. In this work, we propose a new approach that combines the advantages of unsupervised and supervised learning. The main objective of the proposed approach is to enable supervised anomaly detection without requiring users to provide the associated labels. To this end, we estimate the label of each connection in the training phase using clustering. The “estimated” labels are then used to build a supervised learning model that classifies connections in the testing stage. To improve the quality of the estimated labels, we define a new property that characterizes anomalies in the context of network anomaly detection. Through extensive experiments with a public dataset (NSL-KDD), we show that the proposed method can achieve performance comparable to that of a model trained with the “original” labels provided in the dataset. We also introduce two heuristic functions that minimize the impact of clustering randomness and thereby improve the overall quality of the estimated labels.
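The pipeline described in the abstract (cluster the unlabeled training connections, turn the clusters into estimated labels, then train a supervised model on those labels) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the rule that the smaller cluster is the anomalous one stands in for the paper's anomaly property, the majority vote over several seeds stands in for its two heuristic functions, a k-nearest-neighbor classifier stands in for its supervised model, and the toy 2-D points stand in for NSL-KDD features.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, seed, iters=20):
    """Plain k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Recompute centroids; keep the old one if a cluster is empty.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return assign

def estimate_labels(points, seed):
    """ASSUMPTION: the smaller of two clusters is labeled anomalous (1)."""
    assign = kmeans(points, 2, seed)
    anomaly_cluster = 0 if assign.count(0) < assign.count(1) else 1
    return [1 if a == anomaly_cluster else 0 for a in assign]

def estimate_labels_voted(points, seeds):
    """ASSUMPTION: majority vote over seeds to damp clustering randomness."""
    runs = [estimate_labels(points, s) for s in seeds]
    return [1 if sum(col) * 2 > len(runs) else 0 for col in zip(*runs)]

def knn_predict(train_x, train_y, query, k=3):
    """Supervised step: k-NN trained on the estimated labels."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: sq_dist(train_x[i], query))[:k]
    return 1 if sum(train_y[i] for i in nearest) * 2 > k else 0

# Hypothetical toy data: 36 "normal" points and 6 "anomalous" points.
normal = [(0.5 * i, 0.5 * j) for i in range(6) for j in range(6)]
anomalous = [(10 + 0.5 * i, 10 + 0.5 * j) for i in range(2) for j in range(3)]
train = normal + anomalous

labels = estimate_labels_voted(train, seeds=range(5))
print(knn_predict(train, labels, (0.3, 0.2)))
print(knn_predict(train, labels, (10.2, 10.4)))
```

No ground-truth labels are passed to the classifier at any point; it only sees what clustering produced. Real traffic features would need scaling and a better-justified anomaly rule than cluster size.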