Towards adaptive structured Dirichlet smoothing model for digital resource objects

Multimedia Tools and Applications

Abstract

Digital resource objects (DROs) are among the most valuable resources that store the accumulated knowledge of humankind, and many organisations now aim to make these resources available to users. The Dirichlet smoothing (DS) model is widely used to retrieve DRO documents. The DS model uses a smoothing parameter μ, which plays a strong role in estimating the probability of unseen terms so that zero probabilities are avoided. For documents of equal length, μ is set to a constant, although its value depends on the length of a document. In DROs, almost all documents differ in length, and each metadata unit within a document also has a different length. Hence, it is not appropriate to predefine μ as a constant and use it across different search spaces; doing so makes DRO documents difficult to access and retrieve. To solve the fixed smoothing-parameter problem in DRO retrieval and make DROs more accessible, the Adaptive Dirichlet Smoothing (ADS) and Adaptive Structured Dirichlet Smoothing (ASDS) models are proposed. They improve DRO retrieval performance by estimating the smoothing parameter automatically. The proposed ASDS model combines the ADS model with an existing DS model. Experimental results on the CHiC2013 collections show that the proposed models retrieve the most relevant results (documents or metadata units) for a given query and reduce zero-probability values compared with traditional state-of-the-art methods, particularly on DROs. Moreover, a t-test is used to show that the performance of the proposed models is statistically significant.
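To make the fixed-μ problem concrete, the following minimal Python sketch implements the standard Dirichlet-smoothed estimate from the language-modelling literature, p(w|d) = (c(w,d) + μ·p(w|C)) / (|d| + μ). It illustrates only the constant-μ baseline that the abstract describes, not the ADS or ASDS estimators, whose formulas are not given in this abstract. The toy collection, the value μ = 5, and all function and variable names are assumptions made for illustration.

```python
import math
from collections import Counter


def dirichlet_prob(term, doc_counts, doc_len, coll_prob, mu):
    """Standard Dirichlet-smoothed p(term | doc) with a fixed mu."""
    # A term unseen in the document still receives mass mu * p(term | C).
    return (doc_counts.get(term, 0) + mu * coll_prob.get(term, 0.0)) / (doc_len + mu)


def query_log_likelihood(query, doc_tokens, coll_prob, mu):
    """Sum of log p(term | doc) over query terms (query-likelihood score).

    Assumes every query term occurs somewhere in the collection;
    otherwise the probability is zero and math.log raises ValueError.
    """
    counts = Counter(doc_tokens)
    return sum(
        math.log(dirichlet_prob(t, counts, len(doc_tokens), coll_prob, mu))
        for t in query
    )


# Hypothetical toy collection: one short and one long metadata record.
docs = {
    "short": ["gold", "coin"],
    "long": ["gold", "coin", "roman", "empire", "museum",
             "gold", "silver", "coin", "hoard", "roman"],
}
all_tokens = [t for toks in docs.values() for t in toks]
coll_prob = {t: c / len(all_tokens) for t, c in Counter(all_tokens).items()}

mu = 5.0  # assumed constant, chosen only for the illustration

# "museum" never occurs in the short record, yet its probability is non-zero.
p_unseen = dirichlet_prob("museum", Counter(docs["short"]), len(docs["short"]),
                          coll_prob, mu)

# Weight given to the collection model, mu / (|d| + mu), for each record.
collection_weight = {name: mu / (len(toks) + mu) for name, toks in docs.items()}

# Query-likelihood scores for a two-term query under the same fixed mu.
scores = {name: query_log_likelihood(["gold", "museum"], toks, coll_prob, mu)
          for name, toks in docs.items()}
```

With a single μ, the collection weight μ / (|d| + μ) is much larger for the short record than for the long one, so the same constant smooths documents of different lengths very differently. This is the behaviour that motivates estimating the parameter per document or per metadata unit.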

Author information

Corresponding author

Correspondence to Wafa’ Za’al Alma’aitah.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Alma’aitah, W.Z., Talib, A.Z. & Osman, M.A. Towards adaptive structured Dirichlet smoothing model for digital resource objects. Multimed Tools Appl 80, 12175–12194 (2021). https://doi.org/10.1007/s11042-020-10305-w

  • DOI: https://doi.org/10.1007/s11042-020-10305-w
