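Cite this article:
- HUANG Jia-yu, LIU Lian-fang. 基于多质心的不良文本快速过滤方法 (A fast filtering method for harmful text based on multiple centroids) [J]. Journal of Guangxi Academy of Sciences, 2010, 26(4): 436-438.
- HUANG Jia-yu, LIU Lian-fang. A Method of Illegal and Harmful Text Fast Filter Based on Multi-Centroid Vector [J]. Journal of Guangxi Academy of Sciences, 2010, 26(4): 436-438.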
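Abstract:
Rocchio is easily affected by the distribution of class samples and by noise, which leads it to wrongly enlarge the class region. To address this problem, we propose clustering the training samples and using the centroid vectors of the resulting clusters, in place of a single centroid vector, as the group of decision vectors for filtering. The method preserves filtering efficiency while achieving higher recall and accuracy than single-centroid Rocchio filtering.
Keywords: harmful text; fast filtering; multi-centroid vector; Rocchio; K-means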
DOI:
Received: 2010-09-28; Revised: 2010-10-18
Funding:

A Method of Illegal and Harmful Text Fast Filter Based on Multi-Centroid Vector
HUANG Jia-yu, LIU Lian-fang
(Pingsoft New Technology Co. Ltd. of Nanning, Nanning, Guangxi, 530007, China)
Abstract:
Rocchio has the weakness that its class region is easily over-extended by the distribution of class samples and by noise. To address this, the paper presents a filtering method that clusters the training samples and replaces the single centroid vector with the group of centroid vectors of the resulting clusters, which then serves as the decision vector group for filtering. The method preserves filtering efficiency, and its recall and accuracy are higher than those of single-centroid Rocchio filtering.
Key words: illegal and harmful text; fast filter; multi-centroid vector; Rocchio; K-means
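
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the decision rule, so the cosine-similarity threshold of 0.9, the farthest-point seeding of K-means, the toy three-dimensional vectors, and all function names (`kmeans`, `is_harmful`, and so on) are assumptions made for illustration only.

```python
import math

def normalize(v):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def kmeans(vectors, k, iters=20):
    """Plain K-means on unit-normalized vectors.

    Seeding is deterministic (assumed for this sketch): start from the first
    point, then repeatedly add the point farthest from the seeds chosen so far.
    """
    pts = [normalize(v) for v in vectors]
    centroids = [pts[0]]
    while len(centroids) < k:
        centroids.append(max(pts, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids

def is_harmful(doc, centroids, threshold=0.9):
    """Flag a document if it is close enough to ANY centroid (assumed rule)."""
    return max(cosine(doc, c) for c in centroids) >= threshold

# Toy harmful training set with two distinct sub-topics (hypothetical data).
harmful_train = [
    [1.0, 0.1, 0.0], [0.9, 0.0, 0.1],   # sub-topic A
    [0.0, 1.0, 0.1], [0.1, 0.9, 0.0],   # sub-topic B
]

single = [mean([normalize(v) for v in harmful_train])]  # one Rocchio-style centroid
multi = kmeans(harmful_train, k=2)                      # one centroid per cluster

harmful_doc = [1.0, 0.0, 0.0]   # clearly sub-topic A
benign_doc = [1.0, 1.0, 0.0]    # lies between the two sub-topics

print("single centroid:", is_harmful(harmful_doc, single), is_harmful(benign_doc, single))
print("multi centroid: ", is_harmful(harmful_doc, multi), is_harmful(benign_doc, multi))
```

On this toy data the single centroid falls between the two sub-topics, so it accepts the in-between benign document and misses the sub-topic A document, while the two cluster centroids classify both correctly. Filtering cost stays proportional to the number of centroids rather than the number of training samples, which is consistent with the efficiency claim in the abstract.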