Benchmarking and evaluating the interpretation of bibliographic records

Aalberg, Trond; Duchateau, Fabien; Takhirov, Naimdjon; Decourselle, Joffrey; Lumineau, Nicolas

doi:10.1007/s00799-018-0233-2

Published: 30 January 2018

Volume 20, pages 143–165 (2019)

International Journal on Digital Libraries

Trond Aalberg¹,
Fabien Duchateau ORCID: orcid.org/0000-0001-6803-917X²,
Naimdjon Takhirov³,
Joffrey Decourselle² &
Nicolas Lumineau²

Abstract

In a global context that promotes the use of explicit semantics for sharing information and developing new services, the MAchine Readable Cataloguing (MARC) format commonly used by libraries worldwide has demonstrated many limitations. The conceptual reference model for bibliographic information presented in the Functional Requirements for Bibliographic Records (FRBR) is expected to be the foundation for a new generation of catalogs that will replace MARC and the digital card catalog. The need to transform legacy MARC records into a FRBR representation (FRBRization) has led to the proposal of various tools and approaches. However, these projects and the results they achieve are difficult to compare due to the lack of common datasets and of well-defined, appropriate metrics. Our contributions fill this gap by proposing BIB-R, the first public benchmark for the FRBRization process. It is composed of two datasets that enable the identification of the strengths and weaknesses of a FRBRization tool. It also defines a set of metrics that evaluate the different steps of the FRBRization process. These resources, as well as the results of a large experiment involving three FRBRization tools tested against our benchmark, are available to the community under an open licence.

Notes

http://www.rdatoolkit.org/.
http://www.ld4l.org/ontology.
http://loc.gov/bibframe/.
http://bib-r.github.io/.
http://creativecommons.org/licenses/by-nc/2.0/.
We use “Person” in our examples for the sake of readability. The initial FRBR model also includes a Corporate Body entity type. In the revised Library Reference Model the proper supertype is “Agent”.
“Concept” is used as a categorical supertype for anything that can be the subject.
Variations VFRBR rules available at http://www.dlib.indiana.edu/projects/vfrbr/projectDoc/.
http://bib-r.github.io/specifications-metrics.txt.
Note that the extraction can have a higher complexity in specific cases, such as when records contain references to other records that need to be looked up during the extraction.
Presentation at code4lib 2011 about improving the performance of eXtensible Catalog’s deduplication module, http://www.extensiblecatalog.org/learnmore/publications.
Our expert collections include specific annotations for each element of the patterns; otherwise, it would not be possible to compute the metrics MEND, MRND and ESE.
http://www.rdaregistry.info/.
BIB-RCAT is a recursive acronym that stands for “BIB-RCAT Is Basically a Real-world CATalogue”.
http://bib-r.github.io/mappings.xml.
https://github.com/naimdjon/marc2frbr FRBR-ML tool, previously named marc2frbr.
http://www.extensiblecatalog.org/ Extensible Catalog.
https://github.com/naimdjon/vfrbr-frbrize-marc Variations VFRBR tool (adjusted version, only to facilitate compilation).
The tests have been chosen in sequential order (recall that test 5.4 does not exist). The analysis of the results is, however, not limited to this subset.
Note that XC does not create Agent and Concept entities; rather, it adds properties within the main Work or Expression. Our evaluation takes this specificity into account, and XC is not penalized when a property and its associated value correctly represent the Agent or the Concept.
Note that the category patterns 4.x (aggregations) and 5.x (complementary works) do not have secondary elements and all tools achieve a 0% ESE score for these tests.
Note that the expert had knowledge about the proposed metrics, and the given time may increase for people who need to understand the concepts behind these metrics.
http://www.nist.gov/tac/2016/KBP/.

Acknowledgements

This work has been partially supported by the French agency ANRT (www.anrt.asso.fr), the company PROGILONE (www.progilone.com/), PHC Aurora funding (#34047VH) and CNRS PICS funding (#PICS06945).

Author information

Authors and Affiliations

NTNU, Trondheim, Norway
Trond Aalberg
LIRIS, UMR5205, Université Claude Bernard Lyon 1, Lyon, France
Fabien Duchateau, Joffrey Decourselle & Nicolas Lumineau
Faculty of Technology, Westerdals - Oslo School of Arts, Communication and Technology, Oslo, Norway
Naimdjon Takhirov

Corresponding author

Correspondence to Fabien Duchateau.

About this article

Cite this article

Aalberg, T., Duchateau, F., Takhirov, N. et al. Benchmarking and evaluating the interpretation of bibliographic records. Int J Digit Libr 20, 143–165 (2019). https://doi.org/10.1007/s00799-018-0233-2

Received: 31 January 2017
Revised: 14 December 2017
Accepted: 01 January 2018
Published: 30 January 2018
Issue Date: 01 June 2019
DOI: https://doi.org/10.1007/s00799-018-0233-2
