ASM Based Synthesis of Handwritten Arabic Text Pages

The Scientific World Journal, Volume 2015 (2015), Article ID 323575, 18 pages

Research Article

Institute for Information Technology and Communications (IIKT), Otto-von-Guericke-University Magdeburg, 39016 Magdeburg, Germany

Umm Al-Qura University, Makkah 21421, Saudi Arabia

Faculty of Computers and Information, Menoufia University (MUFIC), Menofia 32721, Egypt

Department of Software Engineering, College of Computer Science and Information Sciences, King Saud University, Riyadh 11451, Saudi Arabia

Department of Computer Science, College of Science, Menoufia University, Menofia 32721, Egypt

Received 7 January 2015; Revised 28 April 2015; Accepted 29 April 2015

Academic Editor: Tongxing Li

Copyright © 2015 Laslo Dinges et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Document analysis tasks, such as text recognition, word spotting, or segmentation, are highly dependent on comprehensive and suitable databases for training and validation. However, generating such databases is expensive in terms of labor and time. In fact, such databases are scarce, which complicates research and development. This is especially true for Arabic handwriting recognition, which involves different preprocessing, segmentation, and recognition methods, each with its own demands on samples and ground truth. To bypass this problem, we present an efficient system that automatically turns Arabic Unicode text into synthetic images of handwritten documents with detailed ground truth. Active Shape Models (ASMs) based on 28046 online samples were used for character synthesis, and statistical properties were extracted from the IESK-arDB database to simulate baselines and word slant or skew. In the synthesis step, ASM-based representations are composed into words and text pages, smoothed by B-spline interpolation, and rendered considering writing speed and pen characteristics. Finally, we use the synthetic data to validate a segmentation method. An experimental comparison with the IESK-arDB database encourages training and testing document analysis methods on synthetic samples whenever sufficient natural ground-truthed data is not available.
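The smoothing step named in the abstract can be illustrated with a minimal sketch. It assumes a uniform cubic B-spline in which the synthesized trajectory points act as control points, so the curve approximates the points rather than passing through them; the abstract does not state which B-spline scheme the authors use. The function name `cubic_bspline`, the parameter `samples_per_segment`, and the sample stroke are hypothetical and chosen only for illustration.

```python
def cubic_bspline(points, samples_per_segment=8):
    """Smooth a 2D polyline with a uniform cubic B-spline.

    `points` is a list of (x, y) tuples used as control points
    (an assumption of this sketch, not the paper's stated method).
    """
    if len(points) < 4:
        raise ValueError("need at least 4 control points")
    curve = []
    # Each run of four consecutive control points defines one curve segment.
    for i in range(len(points) - 3):
        p0, p1, p2, p3 = points[i:i + 4]
        for s in range(samples_per_segment):
            t = s / samples_per_segment
            # Uniform cubic B-spline basis functions; they sum to 1 for any t.
            b0 = (1 - t) ** 3 / 6
            b1 = (3 * t ** 3 - 6 * t ** 2 + 4) / 6
            b2 = (-3 * t ** 3 + 3 * t ** 2 + 3 * t + 1) / 6
            b3 = t ** 3 / 6
            x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
            y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
            curve.append((x, y))
    return curve


# Hypothetical jagged pen stroke standing in for a synthesized character outline.
stroke = [(0, 0), (1, 1), (2, -1), (3, 1), (4, -1), (5, 0)]
smooth_stroke = cubic_bspline(stroke)
```

With this choice the smoothed curve stays inside the convex hull of each group of four control points, which damps the zigzag of the input. It also starts and ends short of the first and last control points; repeating the end points is one common way to pull the curve toward them.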

Authors: Laslo Dinges, Ayoub Al-Hamadi, Moftah Elzobi, Sherif El-etriby, and Ahmed Ghoneim


