Towards a Contextual and Semantic Information Retrieval System Based on Non-negative Matrix Factorization Technique

Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 736)

Abstract

With the rapid pace of technological evolution, information retrieval systems must cope with large amounts of textual data in order to retrieve information pertinent to users' needs or queries. These systems also depend on the user's query, yet users often find it difficult to express their needs.

To resolve these problems, we propose, in this paper, a new approach that provides a contextual and semantic information retrieval system.

Our proposed system is based, first, on the NNMF (Non-negative Matrix Factorization) technique for data analysis, which represents textual data in a new, lower-dimensional form and organizes it into categories. Second, our system tries to improve the user's query with new semantic keywords that keep the same context as the original query, by exploiting the results of this data analysis technique together with the LSM method, which defines semantic relationships between terms.
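As an illustration of the first step, the sketch below factors a small term-document matrix V into non-negative factors W (terms by categories) and H (categories by documents), then assigns each document to its strongest category. It is a minimal sketch under stated assumptions, not the system described in this paper: the abstract does not say which NNMF algorithm is used, so the standard multiplicative update rules for the squared-error objective are assumed here, and the toy vocabulary, the counts, and all function names are hypothetical. The LSM step is not reproduced; the top-weighted terms of a category are shown only as plausible candidates for query expansion.

```python
import random

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def nmf(v, rank, iterations=500, seed=0, eps=1e-9):
    """Approximate v (terms x docs) as w @ h with non-negative w and h.

    Assumption: multiplicative updates for the squared-error objective.
    """
    rng = random.Random(seed)
    n_terms, n_docs = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_terms)]
    h = [[rng.random() + eps for _ in range(n_docs)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_docs)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_terms)]
    return w, h

def top_terms(w, terms, category, n=2):
    """Return the n terms with the largest weight in one column of w."""
    ranked = sorted(range(len(terms)), key=lambda i: w[i][category], reverse=True)
    return [terms[i] for i in ranked[:n]]

# Hypothetical toy data: rows are terms, columns are four documents.
terms = ["heart", "cardiac", "artery", "skin", "rash", "itch"]
v = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
    [0, 0, 1, 1],
]

w, h = nmf(v, rank=2)

# Each document is assigned to the category with the largest weight in its column of h.
categories = [max(range(len(h)), key=lambda k: h[k][j]) for j in range(len(v[0]))]
print("document categories:", categories)
print("strongest terms for document 0's category:", top_terms(w, terms, categories[0]))
```

Each document is reduced from six term counts to two category weights, which is the sense in which the factorization yields a smaller representation. A query could be projected in the same way, and the strongest terms of its dominant category offered as candidate keywords that stay within the query's context.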

Experimental results obtained on the ClefEhealth-2014 database demonstrate the performance of our proposed approach on large-scale text collections.

Keywords

Semantic search system, Data analysis, Information retrieval, Text representation, Semantic relationships

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Nesrine Ksentini (1)
  • Mohamed Tmar (1)
  • Faïez Gargouri (1)
  1. MIRACL Laboratory, University of Sfax, Sakiet Ezzeit, Sfax, Tunisia
