Arabian Journal for Science and Engineering, Volume 40, Issue 11, pp 3211–3232

Enhanced Arabic Document Retrieval Using Optimized Query Paraphrasing

Research Article - Computer Engineering and Computer Science

Abstract

Query paraphrasing aims to construct a better formulation of a user's query in order to enhance retrieval. Formulating search queries remains difficult for a subset of Web users. In a typical situation, a user does not receive satisfactory results from the submitted query and subsequently tries different paraphrases of it. The Arabic vocabulary is rich in synonyms and hyponyms. This richness makes automated paraphrasing crucial for Arabic information retrieval systems, so that synonym substitution need not be done manually. In this article, we propose an enhancement for Arabic information retrieval using a query paraphrasing technique. Furthermore, we propose two query paraphrasing optimization techniques to overcome the time complexity and exhaustive calculation of existing query paraphrasing techniques. One uses a genetic algorithm (GA–QP), and the other employs the artificial bee colony algorithm (ABC–QP). We compare the performance of the two algorithms. ABC–QP improves Arabic information retrieval performance compared with GA–QP.

Keywords

Arabic language · Arabic information retrieval · Query paraphrasing · Genetic algorithm · Artificial bee colony

Copyright information

© King Fahd University of Petroleum & Minerals 2015

Authors and Affiliations

  1. Department of Information Technology, College of Computer and Information Sciences, King Saud University, Riyadh, Kingdom of Saudi Arabia
  2. Department of Information Systems, College of Computer and Information Sciences, King Saud University, Riyadh, Kingdom of Saudi Arabia
