Identification of FRBR Works Within Bibliographic Databases: An Experiment with UNIMARC and Duplicate Detection Techniques

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4822)

Abstract

Many experiments and studies have been conducted on the application of FRBR as an implementation model for bibliographic databases, in order to improve resource discovery services and to convey a better perception of the information spaces represented in catalogues. One of these applications is the attempt to identify the FRBR work instances shared by several bibliographic records. In our work we evaluate how well techniques based on string similarity, used in duplicate detection procedures mainly by the database research community, apply to this problem. We describe the particularities of applying these techniques to bibliographic data, and empirically compare their results with those of current techniques, which are based on exact matching. Experiments performed on the Portuguese national union catalogue show a significant improvement over currently used approaches.
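The contrast the abstract draws between exact matching and string similarity can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not name the similarity metric, the fields compared, or any threshold, so the use of Python's `difflib.SequenceMatcher`, the `author` and `title` fields, the 0.85 threshold, and the sample records are all assumptions made for illustration.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, strip diacritics and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_marks.lower())
    return " ".join(cleaned.split())


def exact_key(record: dict) -> str:
    """Exact-matching baseline: two records share a work only if keys are identical."""
    return normalize(record["author"]) + "/" + normalize(record["title"])


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; difflib's ratio is a stand-in, not the paper's metric."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def same_work(r1: dict, r2: dict, threshold: float = 0.85) -> bool:
    """Hypothetical rule: author and title must both reach the assumed threshold."""
    return (similarity(r1["author"], r2["author"]) >= threshold
            and similarity(r1["title"], r2["title"]) >= threshold)


# Illustrative records (not taken from the paper): one author spelling variant.
a = {"author": "Queirós, Eça de", "title": "Os Maias"}
b = {"author": "Queiroz, Eça de", "title": "Os Maias."}

print(exact_key(a) == exact_key(b))  # the exact keys differ by one letter
print(same_work(a, b))               # the similarity rule still groups them
```

In this sketch a single spelling variant in the author heading is enough to split the two records under the exact-key baseline, while the similarity rule keeps them together. A real system would need a threshold tuned on catalogue data, since a lower value groups more variants but also risks merging distinct works.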

Editor information

Dion Hoe-Lian Goh, Tru Hoang Cao, Ingeborg Torvik Sølvberg, Edie Rasmussen

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Freire, N., Borbinha, J., Calado, P. (2007). Identification of FRBR Works Within Bibliographic Databases: An Experiment with UNIMARC and Duplicate Detection Techniques. In: Goh, D.HL., Cao, T.H., Sølvberg, I.T., Rasmussen, E. (eds) Asian Digital Libraries. Looking Back 10 Years and Forging New Frontiers. ICADL 2007. Lecture Notes in Computer Science, vol 4822. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77094-7_36

  • DOI: https://doi.org/10.1007/978-3-540-77094-7_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77093-0

  • Online ISBN: 978-3-540-77094-7

  • eBook Packages: Computer Science (R0)
