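Abstract:
The Highland catalog auto-recognition system is designed. After obtaining preprocessed catalog images, the system performs layout analysis on each image, recognizes the text with OCR, and adaptively measures the indentation of the catalog entries as the basis for determining their hierarchy level. Catalog extraction and manual correction then yield a unified catalog format. The system automatically recognizes and extracts the catalog structure of books and can effectively handle book catalogs in a variety of formats.
Keywords: catalog recognition; OCR; layout analysis; indentation; catalog extraction; manual correction
DOI:
Received: 2004-09-30
Funding:
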
Design of Highland Catalog Auto-Recognition System
Liang Ying, Shi Shandan
(Guangxi Computing Center, Nanning, Guangxi, 530022, China)
Abstract:
The Highland catalog auto-recognition system, which handles catalog images in a variety of styles, is proposed, and its key technical characteristics are described. The system first analyzes the layout of the preprocessed catalog images, then recognizes the characters with OCR, and then uses the relative indentation of the entries to derive the hierarchical structure of the catalog. The result is corrected manually to obtain a unified catalog format.
Key words: catalog recognition; OCR; layout analysis; indentation; catalog extraction; manual correction
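
The indentation step summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' algorithm, so everything here is an assumption made for illustration: the `TocLine` record, the `indent_levels` and `build_tree` helpers, and the pixel `tolerance` are hypothetical names, and the gap-based grouping of left offsets is only one plausible way to treat slightly uneven OCR coordinates as the same indentation level.

```python
from dataclasses import dataclass


@dataclass
class TocLine:
    text: str  # recognized text of one catalog line (hypothetical OCR output)
    left: int  # x-coordinate of the line's left edge, in pixels


def indent_levels(lines, tolerance=8):
    """Assign each line a hierarchy level (0 = top) from its left offset.

    Assumption: offsets that differ by at most `tolerance` pixels belong
    to the same indentation level; a wider gap starts a deeper level.
    """
    offsets = sorted({line.left for line in lines})
    level_of = {}
    level = 0
    for i, x in enumerate(offsets):
        if i > 0 and x - offsets[i - 1] > tolerance:
            level += 1
        level_of[x] = level
    return [level_of[line.left] for line in lines]


def build_tree(lines, tolerance=8):
    """Nest lines into a list of {'title', 'children'} dicts by level."""
    root = {"title": None, "children": []}
    stack = [(-1, root)]  # (level, node) path from the root to the latest entry
    for line, level in zip(lines, indent_levels(lines, tolerance)):
        node = {"title": line.text, "children": []}
        # Climb back up until the top of the stack is a shallower level.
        while stack[-1][0] >= level:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root["children"]


sample = [
    TocLine("Chapter 1", 40),
    TocLine("1.1 Intro", 82),
    TocLine("1.2 Scope", 80),
    TocLine("1.2.1 Detail", 121),
    TocLine("Chapter 2", 42),
]
levels = indent_levels(sample)
tree = build_tree(sample)
```

In this sketch the levels are derived from the offsets observed on the page rather than from a fixed indent width, which is one way to read "adaptively" in the abstract. A real system would still need the manual correction step for pages where OCR coordinates are noisy.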