Query expansion based on clustering and personalized information retrieval

Khalifi, Hamid; Cherif, Walid; Qadi, Abderrahim El; Ghanou, Youssef

doi:10.1007/s13748-019-00178-y

Regular Paper
Published: 04 March 2019

Volume 8, pages 241–251, (2019)

Progress in Artificial Intelligence

Hamid Khalifi ORCID: orcid.org/0000-0002-3367-9748¹,
Walid Cherif²,
Abderrahim El Qadi³ &
…
Youssef Ghanou¹


Abstract

Information retrieval systems cover a variety of processes that deliver information to the people who need it. Several mathematical approaches have been studied to formalize the main components of such a system: query representation, information item representation, and the retrieval process. Even so, these systems still struggle to extract relevant information for users, especially when the processed data are texts, because of the complex nature of text databases. Generally, an information retrieval system reformulates queries according to associations among information items before matching them against dataset items. In this setting, semantic relationships or machine learning techniques can be applied to refine the returned results. This paper presents a formal model to organize data and a new search algorithm to browse it. The approach incorporates a natural language preprocessing stage, a statistical representation of short documents and queries, and a machine learning model to select relevant results. We then propose two further optimizations that returned satisfactory results on two datasets in reasonable computation time. The first concerns query expansion, and the second concerns dataset restructuring. We formally evaluate the impact of each optimization by measuring the performance of the information retrieval system with and without it; the highest recall and precision reached were 96.2% and 99.2%, respectively.
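The with-and-without comparison described in the abstract rests on precision and recall. As a minimal sketch of how such a comparison could be computed for a single query, the Python below uses the standard set-based definitions. The document identifiers and the two result lists are hypothetical illustrations, not data or code from the paper, and the function name `precision_recall` is an assumption introduced here.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant retrieved items / all retrieved items
    recall    = relevant retrieved items / all relevant items
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ground truth and result lists (illustrative only).
relevant = {"d1", "d2", "d3", "d4"}
baseline = ["d1", "d2", "d7", "d9"]   # results without the optimization
expanded = ["d1", "d2", "d3", "d7"]   # results with the optimization

baseline_pr = precision_recall(baseline, relevant)
expanded_pr = precision_recall(expanded, relevant)
print("without:", baseline_pr)
print("with:   ", expanded_pr)
```

Averaging these per-query values over a full query set would give system-level figures of the kind the abstract reports; the exact averaging scheme used by the authors is not stated here.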

Notes

Yahoo! Webscope dataset ydata-ymusic-user-artist-ratings-v1_0 [http://research.yahoo.com/Academic_Relations].

Author information

Authors and Affiliations

TIM Team, High School of Technology, Moulay Ismail University, Meknes, Morocco
Hamid Khalifi & Youssef Ghanou
SI2M Laboratory, National Institute of Statistics and Applied Economics, Rabat, Morocco
Walid Cherif
High School of Technology, Mohammed V University, Rabat, Morocco
Abderrahim El Qadi

Corresponding author

Correspondence to Hamid Khalifi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Khalifi, H., Cherif, W., Qadi, A.E. et al. Query expansion based on clustering and personalized information retrieval. Prog Artif Intell 8, 241–251 (2019). https://doi.org/10.1007/s13748-019-00178-y

Received: 25 December 2018
Accepted: 26 February 2019
Published: 04 March 2019
Issue Date: 01 June 2019
DOI: https://doi.org/10.1007/s13748-019-00178-y
