Benchmarking and evaluating the interpretation of bibliographic records

International Journal on Digital Libraries

Abstract

In a global context that promotes the use of explicit semantics for sharing information and developing new services, the MAchine Readable Cataloguing (MARC) format commonly used by libraries worldwide has shown many limitations. The conceptual reference model for bibliographic information presented in the Functional Requirements for Bibliographic Records (FRBR) is expected to be the foundation for a new generation of catalogs that will replace MARC and the digital card catalog. The need to transform legacy MARC records into a FRBR representation (FRBRization) has led to the proposal of various tools and approaches. However, these projects and the results they achieve are difficult to compare because of the lack of common datasets and of well-defined, appropriate metrics. Our contribution fills this gap by proposing BIB-R, the first public benchmark for the FRBRization process. It is composed of two datasets that enable the identification of the strengths and weaknesses of a FRBRization tool. It also defines a set of metrics that evaluate the different steps of the FRBRization process. These resources, as well as the results of a large experiment in which three FRBRization tools were tested against our benchmark, are available to the community under an open licence.

Notes

  1. http://www.rdatoolkit.org/.

  2. http://www.ld4l.org/ontology.

  3. http://loc.gov/bibframe/.

  4. http://bib-r.github.io/.

  5. http://creativecommons.org/licenses/by-nc/2.0/.

  6. We use “Person” in our examples for the sake of readability. The initial FRBR model also includes a Corporate Body entity type. In the revised Library Reference Model the proper supertype is “Agent”.

  7. “Concept” is used as a categorical supertype for anything that can be the subject.

  8. Variations VFRBR rules available at http://www.dlib.indiana.edu/projects/vfrbr/projectDoc/.

  9. http://bib-r.github.io/specifications-metrics.txt.

  10. Note that the extraction can have a higher complexity in specific cases, such as when records contain references to other records that need to be looked up during the extraction.

  11. Presentation at code4lib 2011 about improving the performance of eXtensible Catalog’s deduplication module, http://www.extensiblecatalog.org/learnmore/publications.

  12. Our expert collections include specific annotations for each element of the patterns; otherwise it would not be possible to compute the metrics MEND, MRND and ESE.

  13. http://www.rdaregistry.info/.

  14. BIB-RCAT is a recursive acronym that stands for “BIB-RCAT Is Basically a Real-world CATalogue”.

  15. http://bib-r.github.io/mappings.xml.

  16. https://github.com/naimdjon/marc2frbr FRBR-ML tool, previously named marc2frbr.

  17. http://www.extensiblecatalog.org/ Extensible Catalog.

  18. https://github.com/naimdjon/vfrbr-frbrize-marc Variations VFRBR tool (adjusted version, only to facilitate compilation).

  19. The tests were chosen in sequential order (recall that test 5.4 does not exist). The analysis of the results is, however, not limited to this subset.

  20. Note that XC does not create Agent and Concept entities; instead, it adds properties within the main Work or Expression. Our evaluation takes this specificity into account, and XC is not penalized when a property and its associated value correctly represent the Agent or the Concept.

  21. Note that the category patterns 4.x (aggregations) and 5.x (complementary works) do not have secondary elements and all tools achieve a 0% ESE score for these tests.

  22. Note that the expert already knew the proposed metrics; the time required may be longer for people who first need to understand the concepts behind these metrics.

  23. http://www.nist.gov/tac/2016/KBP/.

Acknowledgements

This work has been partially supported by the French agency ANRT (www.anrt.asso.fr), the company PROGILONE (www.progilone.com/), PHC Aurora funding (#34047VH) and CNRS PICS funding (#PICS06945).

Author information

Corresponding author

Correspondence to Fabien Duchateau.

About this article

Cite this article

Aalberg, T., Duchateau, F., Takhirov, N. et al. Benchmarking and evaluating the interpretation of bibliographic records. Int J Digit Libr 20, 143–165 (2019). https://doi.org/10.1007/s00799-018-0233-2

  • DOI: https://doi.org/10.1007/s00799-018-0233-2
