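Abstract:
[Objective] Without prior knowledge, a weighted fuzzy C-means (WFCM) clustering algorithm based on particle swarm optimization (PSO) is used to mine records of suspected medical insurance fraud from more than 300,000 medical insurance records. [Methods] First, an improved Euclidean distance, a similarity function, and a cross-entropy function are introduced, and the cross-entropy function is minimized by PSO to analyze the attribute weights. Second, the Calinski-Harabasz (CH) validity index is selected to study clustering validity. Third, the preprocessed data are fed to the PSO algorithm, which iteratively updates the weight of each attribute, while the CH validity index is used to estimate the optimal number of clusters dynamically, which speeds up FCM clustering. Finally, the attribute weights and the optimal number of clusters are applied to the FCM clustering algorithm, and the suspected medical insurance fraud records are obtained by clustering according to the membership matrix. [Results] Based on the above method, the cluster analysis is carried out according to the final membership matrix. [Conclusion] Applying the optimized weights to the weighted FCM clustering algorithm and to the clustering validity evaluation both improves the efficiency of the clustering algorithm and avoids the influence of subjective evaluation on the classification.
Key words: PSO, WFCM, CH validity index, medical insurance fraud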
DOI: 10.13657/j.cnki.gxkxyxb.20170223.001
Received: 2016-11-26; Revised: 2016-12-07
Funding: Supported by the National Natural Science Foundation of China (61363003).
|
Study on WFCM Algorithm Based on PSO and Its Application in Identifying Medicare Fraud
LI Hua, CHEN Ningjiang
|
(School of Computer, Electronics and Information, Guangxi University, Nanning, Guangxi, 530004, China)
Abstract:
[Objective] This paper aims to identify records of suspected medicare fraud among more than 300,000 medical insurance records, without prior knowledge, by using a weighted fuzzy C-means (WFCM) clustering algorithm based on particle swarm optimization (PSO). [Methods] First, an improved Euclidean distance, a similarity function, and a cross-entropy function are introduced, and the cross-entropy function is minimized by the PSO algorithm to determine the attribute weights. Second, the Calinski-Harabasz (CH) validity index is selected to study clustering validity. Third, the preprocessed data are passed to the PSO algorithm, which repeatedly updates the weight of each attribute, and the optimal number of clusters is estimated dynamically with the CH validity index in order to speed up FCM clustering. Finally, the attribute weights and the optimal number of clusters are applied to the FCM clustering algorithm, and the suspected medicare fraud records are obtained from the membership matrix. [Results] Based on the above method, the final membership matrix is used to carry out the cluster analysis. [Conclusion] Applying the optimized weights to the WFCM clustering algorithm and to the clustering validity evaluation improves the running efficiency of the clustering algorithm and avoids the influence of subjective evaluation on the classification.
Key words: PSO, WFCM, CH validity index, medicare fraud
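
The last two steps of the method (weighted FCM clustering and choosing the number of clusters with the CH validity index) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the synthetic data, the candidate range of 2 to 6 clusters, and the fixed weight vector `w` are all assumptions made for illustration. In the paper the weights come from PSO minimizing a cross-entropy function, a step this sketch omits. The sketch also uses a plain attribute-weighted squared Euclidean distance, which may differ from the paper's improved Euclidean distance.

```python
import numpy as np

def weighted_fcm(X, w, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Fuzzy C-means using an attribute-weighted squared Euclidean distance."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # memberships of each record sum to 1
    for _ in range(n_iter):
        Um = U ** m
        # Cluster centers: membership-weighted means of the records.
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # d2[i, k] = sum_j w[j] * (X[i, j] - centers[k, j]) ** 2
        diff = X[:, None, :] - centers[None, :, :]
        d2 = np.maximum((w * diff ** 2).sum(axis=2), 1e-12)
        # Standard FCM membership update expressed with squared distances.
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers

def calinski_harabasz(X, labels):
    """CH index: between-cluster over within-cluster dispersion (hard labels)."""
    n = X.shape[0]
    clusters = np.unique(labels)
    k = len(clusters)
    if k < 2:
        return 0.0  # index is undefined for a single cluster
    overall = X.mean(axis=0)
    between = within = 0.0
    for lab in clusters:
        pts = X[labels == lab]
        mu = pts.mean(axis=0)
        between += len(pts) * ((mu - overall) ** 2).sum()
        within += ((pts - mu) ** 2).sum()
    return (between / (k - 1)) / (within / (n - k))

def pick_cluster_count(X, w, candidates=range(2, 7)):
    """Run weighted FCM for each candidate c and keep the highest CH score."""
    Xw = X * np.sqrt(w)  # scaling by sqrt(w) reproduces the weighted distance
    scores = {}
    for c in candidates:
        U, _ = weighted_fcm(X, w, c)
        scores[c] = calinski_harabasz(Xw, U.argmax(axis=1))
    return max(scores, key=scores.get), scores

# Synthetic stand-in for preprocessed records: two informative attributes
# forming three groups, plus one noise attribute.
rng = np.random.default_rng(42)
blob_means = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 9.0]])
informative = np.vstack([rng.normal(mu, 0.5, size=(50, 2)) for mu in blob_means])
noise = rng.normal(0.0, 1.0, size=(150, 1))
X = np.hstack([informative, noise])
w = np.array([0.45, 0.45, 0.10])  # illustrative weights, NOT PSO output

best_c, scores = pick_cluster_count(X, w)
U, centers = weighted_fcm(X, w, best_c)
print("chosen number of clusters:", best_c)
print("cluster sizes:", np.bincount(U.argmax(axis=1), minlength=best_c))
```

Each row of `U` gives one record's degree of membership in every cluster. How a cluster is then judged to contain suspected fraud is not specified in the abstract, so the sketch stops at the membership matrix.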