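Abstract: |
A web page classification system is designed and implemented. Using the same feature weighting method, feature selection algorithm, and classification algorithm, comparative experiments are run between a word-segmentation-based web page classification system and an N-Gram-based one, and the classification performance of the two is analyzed. The results show that the N-Gram-based system matches, and to some extent exceeds, the performance of the word-segmentation-based system. |
Keywords: Chinese web pages, classification, N-Gram, word segmentation, KNN |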
DOI: |
Received: 2005-09-07 |
Funding: |
|
A Comparative Study of Word-Segment and N-Gram Categorization Systems |
Gao Weifeng, Liu Lianfang
|
(Nanning Pingsoft New Technology Co. Ltd., Nanning, Guangxi, 530003, China) |
Abstract: |
This paper designs and implements a Chinese web page categorization system. Using the same feature weighting method, feature selection algorithm, and categorization algorithm, it compares a word-segmentation-based categorization system with an N-Gram-based one and analyzes their categorization performance. The experiments show that the N-Gram-based system matches, and to some extent outperforms, the word-segmentation-based system. |
Key words: Chinese web page, categorization, N-Gram, word segmentation, KNN |
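
The N-Gram approach named in the abstract can be illustrated with a minimal sketch. The abstract does not state the value of N, the feature weighting method, the feature selection algorithm, or the value of K, so everything below is an assumption made for illustration: character bigrams (N = 2), raw term frequency as the weight, cosine similarity, and similarity-weighted voting among the K nearest neighbours. Feature selection is omitted. The function names (`char_ngrams`, `tf_vector`, `cosine`, `knn_classify`) and the toy training data are hypothetical and are not taken from the paper.

```python
from collections import Counter
import math


def char_ngrams(text, n=2):
    """Return overlapping character n-grams of text, ignoring whitespace.

    No word segmentation is needed: the text is simply sliced into
    windows of n consecutive characters.
    """
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def tf_vector(text, n=2):
    """Sparse feature vector: n-gram -> raw term frequency (assumed weighting)."""
    return Counter(char_ngrams(text, n))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(text, training, k=3, n=2):
    """Label text by similarity-weighted vote of its k most similar training docs.

    training is a list of (document_text, label) pairs.
    """
    query = tf_vector(text, n)
    scored = sorted(
        ((cosine(query, tf_vector(doc, n)), label) for doc, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter()
    for similarity, label in scored[:k]:
        votes[label] += similarity
    return votes.most_common(1)[0][0] if votes else None


# Toy training set (hypothetical, for illustration only).
TRAINING = [
    ("足球比赛结果", "sports"),
    ("篮球比赛直播", "sports"),
    ("股票市场行情", "finance"),
    ("基金市场分析", "finance"),
]

if __name__ == "__main__":
    print(char_ngrams("中文网页", 2))
    print(knn_classify("足球比赛直播", TRAINING, k=3))
```

A word-segmentation-based variant would differ only in the feature extraction step: `char_ngrams` would be replaced by a segmenter that returns words, while the weighting and the KNN stage stay the same. That shared pipeline is what makes the comparison described in the abstract a controlled one.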