Concatenation Technique for Extracted Arabic Characters for Efficient Content-based Indexing and Searching

  • Abdul Khader Jilani Saudagar
  • Habeeb Vulla Mohammed
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 379)


This paper presents the work accomplished in the final phase of an ongoing research project whose objective is to develop a system that extracts moving Arabic text from video for efficient content-based indexing and searching. The novelty of this paper lies in the technique used to concatenate the individual stand-alone Arabic characters that are extracted and recognized from image frames. The extracted characters are concatenated using their Unicode representation, which has not been done before. The concatenated characters are written continuously to a text file. These text files are indexed using Lucene, so that searches for a desired string are fast and precise.
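The pipeline summarized above can be sketched as follows: Unicode-based concatenation, continuous writing to a text file, then indexing and search. This is a minimal Python illustration under stated assumptions, not the authors' implementation. The sketch assumes the recognizer emits one Unicode code point per character, possibly as a positional presentation form. All function names and the file name `video1.txt` are hypothetical. A small in-memory inverted index stands in for Lucene, which is what the paper actually uses.

```python
import tempfile
import unicodedata
from collections import defaultdict
from pathlib import Path


def concatenate(code_points):
    """Join stand-alone recognized characters into one Unicode string.

    Assumption (not from the paper): the recognizer emits one Unicode code
    point per character, possibly a positional presentation form
    (U+FB50-U+FDFF, U+FE70-U+FEFF). Characters are appended in logical
    (reading) order; right-to-left display and contextual joining are left
    to the text renderer.
    """
    text = "".join(chr(cp) for cp in code_points)
    # NFKC folds presentation forms to base letters (e.g. U+FEDB -> U+0643),
    # so stored text matches queries typed on an ordinary keyboard.
    return unicodedata.normalize("NFKC", text)


def append_text(path, text):
    """Append one recognized text unit to a UTF-8 file, one unit per line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(text + "\n")


def build_index(paths):
    """Toy inverted index (a stand-in for Lucene): term -> set of file names."""
    index = defaultdict(set)
    for path in paths:
        for term in Path(path).read_text(encoding="utf-8").split():
            index[term].add(str(path))
    return index


def search(index, query):
    """Return the files that contain every whitespace-separated query term."""
    terms = unicodedata.normalize("NFKC", query).split()
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "video1.txt"  # hypothetical per-video output file
        # Positional glyphs of "كتاب" (book) as a recognizer might emit them:
        # kaf initial, teh medial, alef final, beh isolated.
        append_text(out, concatenate([0xFEDB, 0xFE98, 0xFE8E, 0xFE8F]))
        index = build_index([out])
        return [Path(p).name for p in search(index, "كتاب")]


if __name__ == "__main__":
    print(demo())  # prints ['video1.txt']
```

The normalization step matters for search. A query typed on a keyboard uses base letters, so positional glyph codes would otherwise never match it. In Lucene, tokenization and normalization are handled by an `Analyzer`. For example, `ArabicAnalyzer` also applies Arabic-specific normalization and light stemming. Whether the authors used it is not stated.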


Keywords: Concatenation · Extraction · Recognition · Indexing and searching



This research is supported by King Abdulaziz City for Science and Technology (KACST), Saudi Arabia, under grant no. AT-32-87.



Copyright information

© Springer India 2016

Authors and Affiliations

  • Abdul Khader Jilani Saudagar (1)
  • Habeeb Vulla Mohammed (1)
  1. College of Computer and Information Sciences, Al Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, Saudi Arabia
