Cite this article:
- CAO Chao, LI Mengli, YANG Shuhong, LI Chungui. Semi-supervised Deep Embedded Clustering of High Dimensional Data Based on Local Structure Preservation[J]. Guangxi Sciences, 2022, 29(5): 922-929.
DOI: 10.13656/j.cnki.gxkx.20221116.013
Received: 2021-11-19; Revised: 2022-02-10
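Funding: Supported by the National Natural Science Foundation of China (62061003, 62062010), the Guangxi Natural Science Foundation (2019GXNSFAA245049), the Guangxi Science and Technology Plan Project (Guike AD19245101), and the Guangxi College Students' Innovation and Entrepreneurship Training Program (201910594057).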
|
Semi-supervised Deep Embedded Clustering of High Dimensional Data Based on Local Structure Preservation
CAO Chao, LI Mengli, YANG Shuhong, LI Chungui
(School of Electrical Electronics and Computer Science, Guangxi University of Science and Technology, Liuzhou, Guangxi, 545006, China)
Abstract:
Clustering is an important topic in machine learning and data mining. In recent years, Deep Neural Networks (DNN) have received extensive attention in various clustering tasks. In particular, semi-supervised clustering can significantly improve clustering performance by introducing only a small amount of prior information into a large amount of unlabeled data. However, these methods ignore the fact that the defined clustering loss may corrupt the feature space, leading to non-representative, meaningless features. To address the insufficient preservation of local structure during feature learning in existing semi-supervised deep clustering methods, this article proposes an Improved Semi-supervised Deep Embedded Clustering (ISDEC) algorithm, which uses an under-complete autoencoder to preserve the intrinsic local structure of the data while learning feature representations. Cluster label assignment and feature representation are jointly optimized by combining a clustering loss, a pairwise constraint loss, and a reconstruction loss. Experimental results on several high-dimensional datasets, including gene data, show that the proposed method achieves better clustering performance than existing methods.
Key words: clustering; semi-supervised; deep embedding; gene; representation learning
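The abstract states that ISDEC jointly optimizes a clustering loss, a pairwise constraint loss, and a reconstruction loss, but it does not give their formulas. The following is a minimal NumPy sketch of how such a combined objective can be evaluated, assuming DEC-style terms: a Student's t soft assignment of embeddings to centroids with a KL divergence to a sharpened target distribution, a mean squared reconstruction error for the autoencoder, and a simple pairwise term that pulls must-link pairs together and pushes cannot-link pairs apart. The function names, the pairwise formula, and the weights `gamma` and `lam` are hypothetical; this is not the authors' implementation, and it only computes the loss value (no encoder, decoder, or gradient updates).

```python
import numpy as np


def soft_assign(z, mu, alpha=1.0):
    """DEC-style soft assignment q_ij using a Student's t kernel."""
    # Squared distances between every embedding z_i and every centroid mu_j: shape (n, k)
    d2 = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)


def target_distribution(q):
    """Sharpened target p_ij proportional to q_ij^2 / f_j, with f_j = sum_i q_ij."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)


def clustering_loss(p, q, eps=1e-12):
    """KL(P || Q), summed over clusters and averaged over samples."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))) / len(q))


def reconstruction_loss(x, x_hat):
    """Mean (over samples) squared reconstruction error of the autoencoder."""
    return float(np.mean(np.sum((x - x_hat) ** 2, axis=1)))


def pairwise_loss(z, must_link, cannot_link):
    """Hypothetical pairwise term: must-link distances minus cannot-link distances."""
    n = len(z)
    ml = sum(np.sum((z[i] - z[j]) ** 2) for i, j in must_link)
    cl = sum(np.sum((z[i] - z[j]) ** 2) for i, j in cannot_link)
    return float((ml - cl) / n)


def total_loss(x, x_hat, z, mu, must_link, cannot_link, gamma=0.1, lam=0.1):
    """Weighted sum: reconstruction + gamma * clustering + lam * pairwise (weights assumed)."""
    q = soft_assign(z, mu)
    p = target_distribution(q)  # in DEC, P is treated as a fixed target
    return (reconstruction_loss(x, x_hat)
            + gamma * clustering_loss(p, q)
            + lam * pairwise_loss(z, must_link, cannot_link))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(6, 20))                 # 6 samples, 20 features (e.g. genes)
    z = rng.normal(size=(6, 3))                  # stand-in for encoder output
    x_hat = x + 0.1 * rng.normal(size=x.shape)   # stand-in for decoder output
    mu = z[:2].copy()                            # 2 initial centroids
    print(total_loss(x, x_hat, z, mu, must_link=[(0, 2)], cannot_link=[(0, 1)]))
```

In an actual training loop these terms would be computed on mini-batches inside an automatic-differentiation framework, with the target P refreshed periodically and held fixed in between, as in DEC. Note that the cannot-link term as written is unbounded below, so a practical version would likely need a margin or an upper bound on the pushed-apart distance.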