Abstract
This research paper describes the work accomplished in the last phase of an ongoing research project whose objective is to develop a system that extracts moving Arabic text from video for efficient content-based indexing and searching. The novelty of this paper is the technique used to concatenate the individual standalone Arabic characters that are extracted and recognized from image frames. The extracted characters are concatenated using their Unicode representation, which has not been done before. The concatenated characters are written continuously into a text file. These text files are indexed using Lucene, so that searches for a desired string are fast and precise.
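The abstract does not give the concatenation step in code. The Python sketch below shows one way it could work, under assumptions the paper does not state: that the recognizer emits each character as a single Unicode code point, possibly an Arabic presentation form such as ARABIC LETTER KAF ISOLATED FORM, and that word gaps arrive as spaces. The helpers `concatenate_characters`, `append_to_transcript`, `build_index` and `search` are hypothetical names. The toy inverted index only stands in for Lucene; it does not reproduce Lucene's API, analyzers, or ranking.

```python
from __future__ import annotations

import unicodedata
from collections import defaultdict
from pathlib import Path


def concatenate_characters(recognized: list[str]) -> str:
    """Join individually recognized Arabic characters into one Unicode string.

    Assumption: each item is one character, possibly an isolated presentation
    form (U+FE70-U+FEFF); " " marks a word gap. NFKC folds presentation forms
    (including the LAM-ALEF ligatures) to base letters in the Arabic block
    (U+0600-U+06FF), which a text renderer then shapes contextually.
    """
    return unicodedata.normalize("NFKC", "".join(recognized))


def append_to_transcript(path: Path, text: str) -> None:
    """Append text to a UTF-8 file as one continuous, space-separated stream."""
    with path.open("a", encoding="utf-8") as f:
        f.write(text + " ")


def build_index(transcripts: dict[str, str]) -> dict[str, set[str]]:
    """Toy inverted index mapping each word to the ids of transcripts containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in transcripts.items():
        for word in unicodedata.normalize("NFKC", text).split():
            index[word].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Look up a single word, normalizing the query the same way as the transcripts."""
    key = unicodedata.normalize("NFKC", query.strip())
    return set(index.get(key, set()))


if __name__ == "__main__":
    # Four isolated glyphs as a recognizer might emit them, one per character.
    frame_chars = [
        unicodedata.lookup("ARABIC LETTER KAF ISOLATED FORM"),
        unicodedata.lookup("ARABIC LETTER TEH ISOLATED FORM"),
        unicodedata.lookup("ARABIC LETTER ALEF ISOLATED FORM"),
        unicodedata.lookup("ARABIC LETTER BEH ISOLATED FORM"),
    ]
    word = concatenate_characters(frame_chars)
    print([f"U+{ord(c):04X}" for c in word])  # ['U+0643', 'U+062A', 'U+0627', 'U+0628']
    index = build_index({"video1": word})
    print(search(index, word))  # {'video1'}
```

The main design choice here is NFKC normalization: presentation forms display correctly on their own, but they do not compare equal to base-letter text typed by a user, so folding both transcripts and queries to the U+0600 block lets a search string match the extracted text.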
Acknowledgments
This research is supported by King Abdulaziz City for Science and Technology (KACST), Saudi Arabia, under grant no. AT-32-87.
Copyright information
© 2016 Springer India
About this paper
Cite this paper
Saudagar, A.K.J., Mohammed, H.V. (2016). Concatenation Technique for Extracted Arabic Characters for Efficient Content-based Indexing and Searching. In: Satapathy, S., Raju, K., Mandal, J., Bhateja, V. (eds) Proceedings of the Second International Conference on Computer and Communication Technologies. Advances in Intelligent Systems and Computing, vol 379. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2517-1_55
DOI: https://doi.org/10.1007/978-81-322-2517-1_55
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2516-4
Online ISBN: 978-81-322-2517-1
eBook Packages: Engineering, Engineering (R0)