Enhanced Arabic Document Retrieval Using Optimized Query Paraphrasing
Abstract
Query paraphrasing aims to reformulate a user's query so that retrieval improves. Formulating effective search queries remains difficult for some Web users: typically, a user who does not receive satisfactory results from a submitted query tries several paraphrases of it. Arabic vocabulary is rich in synonyms and hyponyms, which makes automating the paraphrasing process especially important for Arabic information retrieval systems. In this article, we propose an enhancement for Arabic information retrieval based on query paraphrasing. We further propose two optimization techniques that address the time complexity and exhaustive computation of existing query paraphrasing techniques: one uses a genetic algorithm (GA–QP), and the other uses the artificial bee colony algorithm (ABC–QP). We compare the performance of the two algorithms; ABC–QP improves Arabic information retrieval performance relative to GA–QP.
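To make the cost argument concrete, the following is a minimal illustrative sketch and not the authors' GA–QP implementation. It assumes a hypothetical synonym table, a toy document set, and a stand-in fitness function (a count of term matches); all names and data here are invented for illustration, and English placeholder terms are used instead of Arabic. It contrasts scoring every synonym combination with a small genetic search over the same space.

```python
import itertools
import random

# Hypothetical synonym lists, one per query term (illustrative data only).
SYNONYMS = [
    ["car", "automobile", "vehicle"],
    ["price", "cost", "fee"],
    ["cheap", "inexpensive", "affordable"],
]

# Toy document collection used by the stand-in fitness function.
DOCS = [
    "affordable automobile cost guide",
    "vehicle fee schedule",
    "inexpensive automobile cost comparison",
    "cheap car price list",
    "automobile cost trends",
]


def fitness(paraphrase):
    """Toy score: total number of (term, document) matches."""
    return sum(term in doc.split() for doc in DOCS for term in paraphrase)


def exhaustive_best():
    """Score every combination; the count is the product of the list sizes."""
    return max(itertools.product(*SYNONYMS), key=fitness)


def genetic_best(generations=20, pop_size=6, mutation_rate=0.2, seed=0):
    """Search the same space with selection, crossover, and mutation."""
    rng = random.Random(seed)
    pop = [tuple(rng.choice(s) for s in SYNONYMS) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(SYNONYMS))  # one-point crossover
            child = list(a[:cut] + b[cut:])
            for i, options in enumerate(SYNONYMS):
                if rng.random() < mutation_rate:  # mutate: swap in a synonym
                    child[i] = rng.choice(options)
            children.append(tuple(child))
        pop = parents + children
    return max(pop, key=fitness)


if __name__ == "__main__":
    print("exhaustive:", exhaustive_best())
    print("genetic:   ", genetic_best())
```

With three terms and three synonyms each, the exhaustive search scores 27 candidates; the number grows multiplicatively with each added term, which is the kind of cost a heuristic search is meant to avoid. The genetic search is not guaranteed to find the best-scoring paraphrase.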
Keywords
Arabic language; Arabic information retrieval; Query paraphrasing; Genetic algorithm; Artificial bee colony