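Shang Jiaze (尚家泽), An Weipeng (安葳鹏), Guo Yaodan (郭耀丹). BIRCH algorithm improvement and analysis based on threshold value[J]. Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2020, 32(3): 487-494.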
BIRCH algorithm improvement and analysis based on threshold value
Received: 2018-12-18  Revised: 2020-02-25
DOI: 10.3979/j.issn.1673-825X.2020.03.019
Keywords: balanced iterative reducing and clustering using hierarchies (BIRCH) algorithm  self-adaptation  threshold  Bayesian algorithm
Funding: Applied Research Program of the Henan Provincial Department of Education (16A520052)
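Author  Affiliation  E-mail
Shang Jiaze  Henan Polytechnic University, Jiaozuo, Henan 454000  971945362@qq.com
An Weipeng  Henan Polytechnic University, Jiaozuo, Henan 454000  awp@hpu.edu.cn
Guo Yaodan  Henan Polytechnic University, Jiaozuo, Henan 454000  guodandna@163.com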
Abstract:
      Balanced iterative reducing and clustering using hierarchies (BIRCH) is a comprehensive hierarchical clustering algorithm. However, BIRCH sets a uniform spatial threshold for the clusters in its leaf nodes and decides where to insert a data object from the distance between the object and the clusters, thereby ignoring the relationships between clusters. In addition, when splitting a node, the algorithm selects the two clustering features that are farthest apart as the seeds of the sub-clusters, and the remaining clustering features are assigned to one sub-cluster or the other according to their distances to these two seeds. As a result, sample data lying between clusters are misclassified and the relationships between clustering features are ignored. To address these two problems, an adaptive threshold-based algorithm is proposed to replace the uniform spatial threshold of the original algorithm, and the naive Bayes algorithm is combined with the original algorithm to handle the relationships between clustering features. Simulation experiments comparing the improved BIRCH algorithm with the traditional one show that, at some cost in efficiency, the clustering quality is clearly improved; compared with other algorithms, the proposed algorithm performs well and is robust across data sets.
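The two behaviours of the original BIRCH algorithm that the abstract criticizes can be illustrated with a minimal sketch. It shows standard BIRCH only, not the authors' adaptive-threshold or naive Bayes improvements, which the abstract does not specify in detail. The names `CF`, `insert`, and `split` are hypothetical, a flat list stands in for a single leaf node, and the use of the cluster radius as the threshold test is an assumption (BIRCH variants may use the diameter instead).

```python
import math
from itertools import combinations


class CF:
    """Clustering feature (N, LS, SS) of one sub-cluster, as in standard BIRCH."""

    def __init__(self, point):
        self.n = 1                               # number of points
        self.ls = list(point)                    # linear sum per dimension
        self.ss = sum(x * x for x in point)      # sum of squared norms

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius_if_added(self, point):
        """Radius the sub-cluster would have after absorbing `point`."""
        n = self.n + 1
        ls = [s + x for s, x in zip(self.ls, point)]
        ss = self.ss + sum(x * x for x in point)
        sq = ss / n - sum(s * s for s in ls) / (n * n)
        return math.sqrt(max(sq, 0.0))           # clamp tiny negative rounding errors

    def add(self, point):
        self.n += 1
        self.ls = [s + x for s, x in zip(self.ls, point)]
        self.ss += sum(x * x for x in point)


def insert(leaf, point, threshold):
    """Absorb `point` into the closest CF if the shared threshold allows it,
    otherwise open a new CF. The same threshold applies to every CF."""
    if leaf:
        closest = min(leaf, key=lambda cf: math.dist(cf.centroid(), point))
        if closest.radius_if_added(point) <= threshold:
            closest.add(point)
            return
    leaf.append(CF(point))


def split(leaf):
    """Split a leaf: the two farthest CFs become seeds and every other CF
    joins the nearer seed, based on centroid distance alone."""
    a, b = max(combinations(leaf, 2),
               key=lambda p: math.dist(p[0].centroid(), p[1].centroid()))
    left, right = [], []
    for cf in leaf:
        da = math.dist(cf.centroid(), a.centroid())
        db = math.dist(cf.centroid(), b.centroid())
        (left if da <= db else right).append(cf)
    return left, right


leaf = []
for p in [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (10, 0)]:
    insert(leaf, p, threshold=0.5)
left, right = split(leaf)
```

In this toy run the middle sub-cluster near (5, 5) is almost equidistant from the two seeds, and a marginal difference in centroid distance alone decides which side it joins. This is the kind of between-cluster case the abstract describes as prone to misclassification.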