Cite this article: |
- 韦慧娴,韦程东,陈少凡,何国源,李冶通.CDKM:基于K-means聚类的因果分解方法[J].广西科学,2025,32(1):121-131.
- WEI Huixian, WEI Chengdong, CHEN Shaofan, HE Guoyuan, LI Yetong. CDKM: A Causal Decomposition Method Based on K-means Clustering[J]. Guangxi Sciences, 2025, 32(1): 121-131.
|
|
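Abstract: |
Redundant conditional independence tests severely degrade the efficiency and accuracy of constraint-based causal discovery methods. To address this problem, this study proposes a causal decomposition method based on K-means clustering (CDKM). CDKM uses K-means clustering to divide the original causal discovery problem into multiple sub-problems and then merges the discovered sub-causal networks into a complete causal network. Specifically, CDKM first uses K-means clustering to partition the original variable set into k clusters; second, it adds to each cluster the two nodes from the other clusters that have the smallest correlation distance to that cluster, yielding k updated clusters; third, it performs causal discovery on each cluster to obtain k sub-causal networks; finally, it merges all sub-causal networks into one complete causal network. CDKM not only avoids using high-order conditional independence tests for decomposition but also greatly reduces redundant conditional independence tests. In contrast to traditional recursive constraint-based causal discovery methods, CDKM can partition the original variable set arbitrarily. Experimental results on 8 datasets show that CDKM greatly accelerates causal discovery, lowers time complexity, and achieves higher accuracy than the baseline models. |
Keywords: causal discovery; causal decomposition; K-means clustering; causal network; conditional independence test |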
DOI:10.13656/j.cnki.gxkx.20240709.001 |
Received: 2024-04-22; Revised: 2024-05-20 |
Funding: Supported by the National Natural Science Foundation of China (11561010). |
|
CDKM: A Causal Decomposition Method Based on K-means Clustering |
WEI Huixian1, WEI Chengdong1, CHEN Shaofan2, HE Guoyuan3, LI Yetong1
|
(1.School of Mathematics and Statistics, Nanning Normal University, Nanning, Guangxi, 530100, China;2.The Editor Office of Guangxi Sciences, Guangxi Academy of Sciences, Nanning, Guangxi, 530007, China;3.School of Economics and Management, Hezhou University, Hezhou, Guangxi, 542899, China) |
Abstract: |
Redundant conditional independence tests seriously degrade the efficiency and accuracy of constraint-based causal discovery methods. To solve this problem, a causal decomposition method based on K-means clustering (CDKM) is proposed. CDKM uses K-means clustering to divide the original causal discovery problem into multiple sub-problems and then merges the resulting sub-causal networks into a complete causal network. Specifically, CDKM first uses K-means clustering to partition the original variable set into k clusters. It then adds to each cluster the two nodes from the other clusters that have the smallest correlation distance to that cluster, yielding k updated clusters. Next, it performs causal discovery on each cluster to obtain k sub-causal networks. Finally, it merges all the sub-causal networks into a complete causal network. CDKM avoids using high-order conditional independence tests for decomposition and greatly reduces redundant conditional independence tests. Unlike traditional recursive constraint-based methods, CDKM can partition the original variable set arbitrarily. Experimental results on 8 datasets show that CDKM greatly accelerates causal discovery, reduces time complexity, and achieves higher accuracy than the baseline models. |
Key words: causal discovery; causal decomposition; K-means clustering; causal network; conditional independence test |
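The four steps described in the abstract (cluster the variables, add two nearby outside nodes to each cluster, run causal discovery per cluster, merge) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not state: correlation distance is taken as 1 - |Pearson r|, each variable is represented for K-means by its row of the distance matrix, an outside node's distance to a cluster is its smallest distance to any member, and merging is a plain union of edges. All function names are hypothetical, and `discover` stands in for whatever constraint-based learner is run on each cluster.

```python
import math
import random


def corr_distance(data):
    """d[i][j] = 1 - |Pearson r| between columns i and j (assumed definition).

    `data` is a list of rows; columns are assumed to be non-constant.
    """
    n, p = len(data), len(data[0])
    cols = [[row[j] for row in data] for j in range(p)]
    cent = [[x - sum(c) / n for x in c] for c in cols]
    norms = [math.sqrt(sum(x * x for x in c)) for c in cent]
    return [[1.0 - abs(sum(a * b for a, b in zip(cent[i], cent[j]))
                       / (norms[i] * norms[j]))
             for j in range(p)] for i in range(p)]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd-style K-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(points[i]) for i in rng.sample(range(len(points)), k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2
                                        for a, b in zip(pt, centers[c])))
                  for pt in points]
        for c in range(k):
            members = [points[i] for i, l in enumerate(labels) if l == c]
            if members:  # keep the old center if a cluster became empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def augment(labels, d, k, extra=2):
    """Add to each cluster the `extra` closest nodes from the other clusters."""
    clusters = []
    for c in range(k):
        inside = [i for i, l in enumerate(labels) if l == c]
        if not inside:
            continue  # skip empty clusters
        outside = [i for i, l in enumerate(labels) if l != c]
        # Assumed rule: node-to-cluster distance = smallest distance to a member.
        outside.sort(key=lambda j: min(d[j][i] for i in inside))
        clusters.append(sorted(inside + outside[:extra]))
    return clusters


def cdkm_sketch(data, k, discover):
    """Decompose, run `discover` on each cluster, and union the edges.

    `discover(sub_data)` is a hypothetical callable returning (a, b) pairs of
    local column indices; the union merge is an assumption of this sketch.
    """
    d = corr_distance(data)
    labels = kmeans(d, k)  # each variable is represented by its distance row
    edges = set()
    for cluster in augment(labels, d, k):
        sub = [[row[j] for j in cluster] for row in data]
        for a, b in discover(sub):
            edges.add((cluster[a], cluster[b]))  # map back to global indices
    return edges
```

Because each cluster receives only two extra nodes, every call to `discover` sees a small variable set, which is where the reduction in conditional independence tests would come from in this sketch.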