Guangxi Sciences

Cite this article:

WEI Huixian, WEI Chengdong, CHEN Shaofan, HE Guoyuan, LI Yetong. CDKM: A Causal Decomposition Method Based on K-means Clustering[J]. Guangxi Sciences, 2025, 32(1): 121-131.

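CDKM: A Causal Decomposition Method Based on K-means Clustering
WEI Huixian¹, WEI Chengdong¹, CHEN Shaofan², HE Guoyuan³, LI Yetong¹
(1. School of Mathematics and Statistics, Nanning Normal University, Nanning, Guangxi 530100, China; 2. Editorial Office of Guangxi Sciences, Guangxi Academy of Sciences, Nanning, Guangxi 530007, China; 3. School of Economics and Management, Hezhou University, Hezhou, Guangxi 542899, China)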

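Abstract:

Redundant conditional independence tests seriously impair the efficiency and accuracy of constraint-based causal discovery methods. To address this problem, this study proposes a causal decomposition method based on K-means clustering (CDKM). CDKM uses K-means clustering to divide the original causal discovery problem into multiple sub-problems, then merges the discovered sub-causal networks into a complete causal network. Specifically, CDKM first uses K-means clustering to split the original variable set into k clusters. Second, it adds to each cluster the two nodes from other clusters that have the smallest correlation distance to that cluster, yielding k updated clusters. Third, it performs causal discovery on each updated cluster to obtain k sub-causal networks. Finally, it merges all sub-causal networks into one complete causal network. CDKM not only avoids using high-order conditional independence tests for decomposition but also greatly reduces redundant conditional independence tests. Compared with traditional recursive constraint-based causal discovery methods, CDKM can partition the original variable set arbitrarily. Experimental results on 8 datasets show that CDKM greatly accelerates causal discovery, reduces time complexity, and achieves higher accuracy than the baseline models.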
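The four steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions, because the abstract does not specify them: each variable is represented by its raw data column for K-means, the correlation distance is taken as 1 - |Pearson correlation|, the distance from a node to a cluster is the minimum distance to any member, and `toy_discover` is a hypothetical placeholder that only links strongly correlated pairs. A real pipeline would call a constraint-based algorithm such as PC at that step. All function names are illustrative.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def corr_distance(cols):
    """Assumed definition: d(i, j) = 1 - |Pearson correlation of columns i, j|."""
    p = len(cols)
    return [[1.0 - abs(pearson(cols[i], cols[j])) for j in range(p)]
            for i in range(p)]


def kmeans_variables(cols, k, iters=50, seed=0):
    """Step 1: plain Lloyd's K-means where each variable (column) is one point."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(cols, k)]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(col, centers[c])))
            for col in cols
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [cols[i] for i, lab in enumerate(labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = [sum(v) / len(members) for v in zip(*members)]
    clusters = [[i for i, lab in enumerate(labels) if lab == c] for c in range(k)]
    return [c for c in clusters if c]  # drop empty clusters


def augment(clusters, dist, extra=2):
    """Step 2: add the `extra` outside nodes closest to each cluster."""
    augmented = []
    for members in clusters:
        outside = [j for j in range(len(dist)) if j not in members]
        # Assumed node-to-cluster distance: minimum over cluster members.
        outside.sort(key=lambda j: min(dist[i][j] for i in members))
        augmented.append(sorted(members + outside[:extra]))
    return augmented


def toy_discover(nodes, dist, threshold=0.5):
    """Step 3 placeholder: NOT causal discovery, only links correlated pairs."""
    return {(a, b) for a in nodes for b in nodes
            if a < b and dist[a][b] < threshold}


def cdkm_sketch(rows, k):
    """Run the four steps on `rows` (a list of samples) and merge the results."""
    cols = [list(c) for c in zip(*rows)]
    dist = corr_distance(cols)
    clusters = kmeans_variables(cols, k)
    augmented = augment(clusters, dist)
    edges = set()
    for nodes in augmented:  # Step 4: merge sub-networks by edge union
        edges |= toy_discover(nodes, dist)
    return clusters, augmented, edges
```

Calling `cdkm_sketch(rows, k=2)` on a list of samples returns the initial clusters, the augmented clusters, and the merged edge set. The two borrowed nodes give neighboring clusters overlapping variables, which is presumably what lets edges that cross cluster boundaries survive the merge. Merging by plain edge union is also an assumption here, since the abstract does not say how conflicts between sub-networks are resolved.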

Keywords: causal discovery; causal decomposition; K-means clustering; causal network; conditional independence test

DOI: 10.13656/j.cnki.gxkx.20240709.001

Received: 2024-04-22; Revised: 2024-05-20

Funding: Supported by the National Natural Science Foundation of China (11561010).

