Abstract
Harvesting tasks gather information into a central repository. We studied the metadata returned from 744,179 harvesting tasks on 2,120 harvesting services in 529 harvesting rounds over a period of two years. To obtain them, we initiated nearly 1,500,000 tasks, because a significant share of the Open Archives Initiative harvesting services never worked or have ceased working, while many other services fail occasionally. We studied the synthesis of the harvested metadata (which elements are used and how verbose their values are) and how it evolved over time. We found that most services use almost all Dublin Core elements, but some services provide only minimal descriptions. Most services have minimal updates and, overall, the harvested metadata is slowly improving over time, with “description” and “relation” improving the most. Our results help us better understand how and when the metadata is improved, and set more realistic expectations about metadata quality when designing harvesting or information systems that rely on it.
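To make the notion of "synthesis" concrete, the following minimal sketch counts which Dublin Core elements appear in a single harvested record and how long their values are on average. It is an illustration only, not the paper's actual measurement procedure: the sample record, the function name `summarize_record`, and the choice of mean character length as a verbosity measure are all assumptions.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Namespace of the unqualified Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical oai_dc record, invented for illustration only.
SAMPLE_RECORD = """\
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>A sample title</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:creator>Roe, Richard</dc:creator>
  <dc:description>A short description.</dc:description>
  <dc:relation></dc:relation>
</oai_dc:dc>
"""


def summarize_record(xml_text):
    """Return (element counts, mean value length) for one record.

    Empty elements are ignored, on the assumption that they carry
    no descriptive information.
    """
    root = ET.fromstring(xml_text)
    counts = Counter()
    lengths = defaultdict(list)
    prefix = "{" + DC_NS + "}"
    for child in root:
        if not child.tag.startswith(prefix):
            continue  # skip anything outside the Dublin Core namespace
        name = child.tag[len(prefix):]
        value = (child.text or "").strip()
        if not value:
            continue  # skip empty values
        counts[name] += 1
        lengths[name].append(len(value))
    mean_len = {n: sum(v) / len(v) for n, v in lengths.items()}
    return dict(counts), mean_len


counts, mean_len = summarize_record(SAMPLE_RECORD)
print(counts)
print(mean_len)
```

Running the same summary over the records of a service in successive harvesting rounds would give one simple way to observe whether elements such as "description" or "relation" become more frequent or more verbose over time.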
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Kapidakis, S. (2018). Metadata Synthesis and Updates on Collections Harvested Using the Open Archive Initiative Protocol for Metadata Harvesting. In: Méndez, E., Crestani, F., Ribeiro, C., David, G., Lopes, J. (eds) Digital Libraries for Open Knowledge. TPDL 2018. Lecture Notes in Computer Science, vol 11057. Springer, Cham. https://doi.org/10.1007/978-3-030-00066-0_2
DOI: https://doi.org/10.1007/978-3-030-00066-0_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00065-3
Online ISBN: 978-3-030-00066-0
eBook Packages: Computer Science, Computer Science (R0)