Abstract
In a global context that promotes the use of explicit semantics for sharing information and developing new services, the MAchine Readable Cataloguing (MARC) format commonly used by libraries worldwide has shown many limitations. The conceptual reference model for bibliographic information presented in the Functional Requirements for Bibliographic Records (FRBR) is expected to be the foundation for a new generation of catalogs that will replace MARC and the digital card catalog. The need to transform legacy MARC records into a FRBR representation (FRBRization) has led to the proposal of various tools and approaches. However, these projects and their results are difficult to compare because of the lack of common datasets and of well-defined, appropriate metrics. Our contribution fills this gap by proposing BIB-R, the first public benchmark for the FRBRization process. It is composed of two datasets that enable the identification of the strengths and weaknesses of a FRBRization tool. It also defines a set of metrics that evaluate the different steps of the FRBRization process. These resources, as well as the results of a large experiment involving three FRBRization tools tested against our benchmark, are available to the community under an open licence.
Notes
We use “Person” in our examples for the sake of readability. The initial FRBR model also includes a Corporate Body entity type. In the revised Library Reference Model the proper supertype is “Agent”.
“Concept” is used as a categorical supertype for anything that can be the subject.
Variations VFRBR rules available at http://www.dlib.indiana.edu/projects/vfrbr/projectDoc/.
Note that the extraction can have a higher complexity in specific cases, such as when records contain references to other records that need to be looked up during the extraction.
Presentation at code4lib 2011 about improving the performance of eXtensible Catalog’s deduplication module, http://www.extensiblecatalog.org/learnmore/publications.
Our expert collections include specific annotations for each element of the patterns; otherwise it would not be possible to compute the metrics MEND, MRND and ESE.
BIB-RCAT is a recursive acronym that stands for “BIB-RCAT Is Basically a Real-world CATalogue”.
FRBR-ML tool, previously named marc2frbr: https://github.com/naimdjon/marc2frbr.
Extensible Catalog: http://www.extensiblecatalog.org/.
Variations VFRBR tool (adjusted version, only to facilitate compilation): https://github.com/naimdjon/vfrbr-frbrize-marc.
The tests have been chosen in sequential order (recall that test 5.4 does not exist). The analysis of the results is, however, not limited to this subset.
Note that XC does not create Agent and Concept entities; instead, it adds properties within the main Work or Expression. Our evaluation takes this specificity into account, and XC is not penalized when a property and its associated value correctly represent the Agent or the Concept.
Note that the category patterns 4.x (aggregations) and 5.x (complementary works) do not have secondary elements and all tools achieve a 0% ESE score for these tests.
Note that the expert already knew the proposed metrics; the time required may be longer for people who first need to understand the concepts behind these metrics.
Acknowledgements
This work has been partially supported by the French agency ANRT (www.anrt.asso.fr), the company PROGILONE (www.progilone.com/), PHC Aurora funding (#34047VH) and CNRS PICS funding (#PICS06945).
Cite this article
Aalberg, T., Duchateau, F., Takhirov, N. et al. Benchmarking and evaluating the interpretation of bibliographic records. Int J Digit Libr 20, 143–165 (2019). https://doi.org/10.1007/s00799-018-0233-2
DOI: https://doi.org/10.1007/s00799-018-0233-2