Arabic Scene Text Acquisition and Statistics
Classification relies on the characteristics of the data provided. Machine learning tasks produce good results when classification techniques are applied to large datasets, so the dataset is considered an integral part of machine learning research. Unlike traditional machine learning algorithms, for which dataset size was rarely a central concern, state-of-the-art machine learning techniques usually work with very large amounts of data and cannot produce good results unless they have been trained on a significant number of samples. This book highlights the challenges of recognizing Arabic text that appears in natural images and presents solutions to them. Accordingly, this chapter presents Arabic scene text datasets and provides researchers with a benchmark database for evaluating their solutions. It also discusses the sources used to capture scene text. Scene text recognition is a challenging problem because the text varies in font style, size, alignment, and orientation, and is affected by reflection, illumination changes, blur, and complex backgrounds. Among cursive scripts, Arabic scene text recognition is considered more challenging still because of joined writing, variations in the shape of the same character, the large number of ligatures, multiple baselines, and similar factors. Because scene text recognition is an application of supervised learning, the chapter also presents two ways to generate ground truth.