Metadata Synthesis and Updates on Collections Harvested Using the Open Archive Initiative Protocol for Metadata Harvesting

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11057)

Abstract

Harvesting tasks gather information into a central repository. We studied the metadata returned by 744,179 harvesting tasks from 2,120 harvesting services over 529 harvesting rounds during a period of two years. To obtain them, we initiated nearly 1,500,000 tasks, because a significant part of the Open Archive Initiative harvesting services never worked or have ceased working, while many other services fail occasionally. We studied the synthesis (elements and verbosity of values) of the harvested metadata and how it evolved over time. We found that most services use almost all Dublin Core elements, but some services provide only minimal descriptions. Most services have only minimal updates and, overall, the harvested metadata is slowly improving over time, with “description” and “relation” improving the most. Our results help us better understand how and when the metadata are improved, and form more realistic expectations about metadata quality when we design harvesting or information systems that rely on them.
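To make the notion of metadata "synthesis" concrete, the following is a minimal sketch of how element usage and value verbosity might be measured on one harvested response. It is not the paper's method: the function name `summarize_elements`, the embedded `SAMPLE_RESPONSE`, and the choice to measure verbosity as mean value length in characters are all assumptions made for illustration. Only the OAI-PMH and Dublin Core namespace URIs are taken from the respective specifications.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespace URIs defined by OAI-PMH 2.0 and the Dublin Core element set.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical two-record response, invented for this illustration.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>First item</dc:title>
          <dc:description>A short text</dc:description>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Second</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()


def summarize_elements(xml_text):
    """Return, per Dublin Core element, the share of records that use it
    and the mean length (in characters) of its non-empty values.

    "Verbosity" as mean character length is an assumption of this sketch.
    """
    root = ET.fromstring(xml_text)
    dc_prefix = "{" + DC_NS + "}"

    records_with = defaultdict(int)   # records containing the element
    total_chars = defaultdict(int)    # summed length of all values
    value_count = defaultdict(int)    # number of non-empty values
    n_records = 0

    for record in root.iter("{" + OAI_NS + "}record"):
        n_records += 1
        seen = set()
        for el in record.iter():
            if not el.tag.startswith(dc_prefix):
                continue
            value = (el.text or "").strip()
            if not value:
                continue  # empty values do not count as usage
            name = el.tag[len(dc_prefix):]
            seen.add(name)
            total_chars[name] += len(value)
            value_count[name] += 1
        for name in seen:
            records_with[name] += 1

    return {
        name: {
            "record_share": records_with[name] / n_records,
            "mean_length": total_chars[name] / value_count[name],
        }
        for name in records_with
    }


if __name__ == "__main__":
    for name, stats in sorted(summarize_elements(SAMPLE_RESPONSE).items()):
        print(name, stats)
```

Running the same summary on responses from successive harvesting rounds and comparing the per-element figures would be one simple way to observe how the metadata changes over time.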

Notes

  1. http://www.openarchives.org/OAI/2.0/openarchivesprotocol.htm.

  2. http://www.openarchives.org/rs/1.1/resourcesync.

  3. https://www.openarchives.org/Register/BrowseSites.

Author information

Correspondence to Sarantos Kapidakis.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Kapidakis, S. (2018). Metadata Synthesis and Updates on Collections Harvested Using the Open Archive Initiative Protocol for Metadata Harvesting. In: Méndez, E., Crestani, F., Ribeiro, C., David, G., Lopes, J. (eds) Digital Libraries for Open Knowledge. TPDL 2018. Lecture Notes in Computer Science, vol 11057. Springer, Cham. https://doi.org/10.1007/978-3-030-00066-0_2

  • DOI: https://doi.org/10.1007/978-3-030-00066-0_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00065-3

  • Online ISBN: 978-3-030-00066-0

  • eBook Packages: Computer Science, Computer Science (R0)