Cite this article: |
- 梁莹,肖健,李玥.基于多引擎的印刷体汉字识别系统的设计[J].广西科学院学报,2011,27(4):317-319.
- LIANG Ying, XIAO Jian, LI Yue. Development of Multi-engine Printed Chinese Character Recognition System[J]. Journal of Guangxi Academy of Sciences, 2011, 27(4): 317-319.
|
|
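Abstract: |
A multi-engine recognition system for printed Chinese characters is designed. The system gives priority to the layout analysis produced by the Hanwang optical character recognition (OCR) engine. After the Hanwang and Tsinghua OCR engines have each completed character recognition, the system merges their results according to the image coordinates of each character and highlights in color the characters on which the two engines conflict, characters with low confidence, and characters flagged as errors by the WiseCheck semantic proofreading engine. The system improves on the working mode of existing large-scale digitization production lines, in which operators proofread the recognized text against the image character by character across the full text; it can reduce labor intensity, raise work efficiency, and lower processing costs. |
Keywords: Chinese character recognition; optical character recognition; semantic proofreading; multi-engine |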
DOI: |
Received: 2011-09-16 |
Funding: |
|
Development of Multi-engine Printed Chinese Character Recognition System |
LIANG Ying, XIAO Jian, LI Yue
|
(Guangxi Computing Center, Nanning, Guangxi, 530022, China) |
Abstract: |
A printed Chinese character recognition system based on multiple engines has been constructed. Starting from the HW-OCR engine's layout analysis, the HW-OCR and TH-OCR engines each perform character recognition. The system then merges the two engines' results according to the coordinates of each character image and uses different colors to highlight the characters on which the engines conflict, the low-confidence characters, and the other errors flagged by WiseCheck (a semantic collation, i.e. proofreading, engine). The system improves on the working mode of existing mass-digitization production lines, in which operators proofread the recognized text against the image character by character across the full text, and can thereby reduce labor intensity, improve work efficiency, and lower processing costs. |
Key words: Chinese character recognition; OCR; semantic collation; multi-engine |
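
The coordinate-based merging step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `OcrChar` record, the intersection-over-union matching rule, the threshold values, and the `unmatched` flag are all assumptions made for illustration, and the WiseCheck semantic check is not modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in image pixels


@dataclass(frozen=True)
class OcrChar:
    """One recognized character (hypothetical record, not a real engine API)."""
    text: str
    box: Box
    confidence: float  # assumed to be normalized to the range 0..1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    left, top = max(a[0], b[0]), max(a[1], b[1])
    right, bottom = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, right - left) * max(0, bottom - top)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def merge_results(primary: List[OcrChar], secondary: List[OcrChar],
                  min_iou: float = 0.5,
                  min_conf: float = 0.8) -> List[Tuple[str, str]]:
    """Pair each primary-engine character with the best-overlapping
    secondary-engine character and attach a review flag.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    merged = []
    for p in primary:
        best = max(secondary, key=lambda s: iou(p.box, s.box), default=None)
        if best is None or iou(p.box, best.box) < min_iou:
            flag = "unmatched"        # no counterpart at these coordinates
        elif best.text != p.text:
            flag = "conflict"         # the two engines disagree
        elif min(p.confidence, best.confidence) < min_conf:
            flag = "low_confidence"   # engines agree, but one is unsure
        else:
            flag = "ok"
        merged.append((p.text, flag))
    return merged


# Made-up sample data: the primary list stands in for the HW-OCR output,
# the secondary list for the TH-OCR output.
hw = [OcrChar("汉", (0, 0, 10, 10), 0.95),
      OcrChar("字", (10, 0, 20, 10), 0.90),
      OcrChar("识", (20, 0, 30, 10), 0.60)]
th = [OcrChar("汉", (0, 0, 10, 10), 0.97),
      OcrChar("宇", (11, 0, 20, 10), 0.88),
      OcrChar("识", (20, 0, 30, 10), 0.92)]

for text, flag in merge_results(hw, th):
    print(text, flag)
```

In a proofreading interface, each flag would map to a highlight color, so that an operator reviews only the flagged characters instead of traversing the full text.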