Opportunities and challenges in enhancing access to metadata of cultural heritage collections: a survey


Metadata are machine-processable data that describe digital and non-digital resources. Different metadata standards exist for describing various types of digital objects. Several studies have reported on how to address issues related to accessing metadata resources. Most studies on metadata involve the cultural heritage domain, which indicates the importance of this domain in metadata research and development. Research on metadata in cultural heritage mainly revolves around three fundamental issues: (1) the poor quality of metadata content in most cases, (2) the difficulty of accessing metadata content, due largely to users' limited knowledge of that content, and (3) the heterogeneity of the data at the schema level, which makes access even more difficult. Poor-quality metadata makes it difficult for users to retrieve and explore information that satisfies their needs. To make the content more accessible, the metadata content therefore needs to be enhanced, especially for cultural heritage collections, which consist of digital objects (structured documents) described by a variety of metadata schemas. This paper presents the issues and challenges in enhancing access to metadata by reviewing existing approaches in the metadata environment, with particular emphasis on cultural heritage collections. We first look at the classification of metadata, which is divided into two categories, namely data retrieval and information retrieval. We then present the analysis, findings and suggestions on how to address issues in enhancing access to metadata content, especially in cultural heritage collections. A detailed comparison between information retrieval and data retrieval is given, focusing on the applicability of one approach over the other. A framework that aims to improve retrieval effectiveness when searching metadata is also proposed and tested. The proposed framework consists of approaches and methods that are expected to enhance access to metadata, especially in cultural heritage collections, and to be useful for those with limited knowledge of cultural heritage. The experiments were conducted on CHiC2013, a cultural heritage collection. The results show a considerable improvement over other IR approaches that use expansion methods.
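The contrast between data retrieval and information retrieval mentioned above can be illustrated with a minimal sketch. It is not the framework proposed in the paper: the records, the field names (loosely modelled on Dublin Core elements), and the simple TF-IDF style scoring are all assumptions chosen for illustration. Data retrieval returns only records that exactly satisfy a field condition, while information retrieval ranks records by how well their text matches a free-text query.

```python
import math
from collections import Counter

# Hypothetical records with Dublin Core-like fields; all values are illustrative.
RECORDS = [
    {"id": "r1", "title": "Bronze age sword", "subject": "weapons",
     "description": "A bronze sword found in a burial site"},
    {"id": "r2", "title": "Portrait of a merchant", "subject": "painting",
     "description": "Oil painting of a merchant holding a sword"},
    {"id": "r3", "title": "Clay vessel", "subject": "pottery",
     "description": "Decorated clay vessel from a burial site"},
]


def data_retrieval(records, field, value):
    """Exact-match lookup: a record either satisfies the condition or it does not."""
    return [r["id"] for r in records if r.get(field) == value]


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumed, deliberately simple choice)."""
    return text.lower().split()


def information_retrieval(records, query):
    """Rank records by a simple TF-IDF style score over all text fields."""
    docs = {
        r["id"]: Counter(
            tokenize(" ".join(str(v) for k, v in r.items() if k != "id"))
        )
        for r in records
    }
    n = len(docs)
    scores = {}
    for doc_id, tf in docs.items():
        score = 0.0
        for term in tokenize(query):
            # Document frequency: number of records containing the term.
            df = sum(1 for d in docs.values() if term in d)
            if df:
                score += tf[term] * math.log(1 + n / df)
        if score > 0:
            scores[doc_id] = score
    # Highest-scoring records first; records with no matching term are omitted.
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    print(data_retrieval(RECORDS, "subject", "weapons"))
    print(data_retrieval(RECORDS, "subject", "sword"))
    print(information_retrieval(RECORDS, "bronze sword"))
```

In this sketch, a user who does not know the exact vocabulary of the `subject` field gets nothing from the exact-match query for "sword", whereas the ranked query still surfaces the records whose titles or descriptions mention the term.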





Author information



Corresponding author

Correspondence to Wafa’ Za’al Alma’aitah.


Cite this article

Alma’aitah, W.Z., Talib, A.Z. & Osman, M.A. Opportunities and challenges in enhancing access to metadata of cultural heritage collections: a survey. Artif Intell Rev 53, 3621–3646 (2020). https://doi.org/10.1007/s10462-019-09773-w



  • Metadata
  • Information retrieval
  • Data retrieval
  • Cultural heritage