Cite this article
  • HUANG Jia-yu, LIU Lian-fang. A Method of Illegal and Harmful Text Fast Filter Based on Multi-Centroid Vector[J]. Journal of Guangxi Academy of Sciences, 2010, 26(4): 436-438.
A Method of Illegal and Harmful Text Fast Filter Based on Multi-Centroid Vector
HUANG Jia-yu, LIU Lian-fang
(Pingsoft New Technology Co. Ltd. of Nanning, Nanning, Guangxi, 530007, China)
Abstract:
Rocchio filtering is sensitive to how a class's samples are distributed and to noise, and as a result it can wrongly enlarge the region assigned to a class. To address this, the paper proposes clustering the training samples and replacing the single centroid vector with the centroid vectors of the resulting clusters, which together serve as the group of decision vectors for filtering. The method preserves filtering efficiency while achieving higher recall and accuracy than single-centroid Rocchio filtering.
Key words:  illegal and harmful text  fast filter  multi-centroid vector  Rocchio  K-means
Received: 2010-09-28  Revised: 2010-10-18
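The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names only Rocchio and K-means, so the cosine-similarity decision rule, the 0.8 threshold, the toy three-dimensional vectors, and the function names `normalize`, `kmeans_centroids`, and `is_harmful` are all assumptions made for illustration.

```python
import numpy as np

def normalize(X):
    """Scale each row to unit length so dot products are cosine similarities."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)

def kmeans_centroids(X, k, iters=20, seed=0):
    """Plain K-means on unit vectors; returns k unit-length centroids."""
    rng = np.random.default_rng(seed)
    X = normalize(X)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign every sample to its most similar centroid.
        labels = np.argmax(X @ centroids.T, axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
        centroids = normalize(centroids)
    return centroids

def is_harmful(doc_vec, centroids, threshold=0.8):
    """Assumed decision rule: flag the document if it is close to ANY centroid."""
    sims = centroids @ normalize([doc_vec])[0]
    return bool(sims.max() >= threshold)

# Hypothetical training vectors for the harmful class, forming two separate groups.
train = [
    [1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [1.0, 0.1, 0.0],
    [0.0, 1.0, 0.0], [0.1, 0.9, 0.0], [0.1, 1.0, 0.0],
]

single = kmeans_centroids(train, k=1)  # k=1 reduces to one Rocchio-style centroid
multi = kmeans_centroids(train, k=2)   # one centroid per cluster

probe = [1.0, 1.0, 0.0]  # lies between the two groups, resembles neither
print(is_harmful(probe, single), is_harmful(probe, multi))
```

In this toy setup the single centroid falls between the two groups, so the in-between probe is flagged. With two cluster centroids the probe stays below the threshold, while vectors close to either group are still flagged. Each decision costs only k dot products, which is in keeping with the abstract's claim that filtering stays fast.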
