Skip to main content

PACE: A General-Purpose Tool for Authority Control

  • Conference paper

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 240))

Abstract

Curating the records of an authority file is an activity as important as committing for many organizations, which have to rely on experts equipped with so-called authority control tools, capable of automatically supporting complex disambiguation workflows through user-friendly interfaces. This paper presents PACE, an open source authority control tool which offers user interfaces for (i) customizing the structure (ontology) of authority files, (ii) tune-up probabilistic disambiguation of authority files through a set of similarity functions for detecting record candidates for duplication and overload (iii) curate such authority files by applying record merges and splitting actions, and (iv) expose authority files to third-party consumers in several ways. PACE’s back-end is based on Cassandra’s “NOSQL”technology to offer (i) read-write performances that scale up linearly with the number of records and (ii) parallel and efficient (MapReduce-based) record sorting and matching algorithms.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Benjelloun, O., Garcia-Molina, H., Su, Q., Widom, J.: Swoosh: A generic approach to entity resolution. Stanford University technical report (March 2005)

    Google Scholar 

  2. Charikar, M.: Similarity estimation techniques from rounding algorithms. In: 34th Annual Symposium on Theory and Computing, Montreal, Quebec, Canada (May 2002)

    Google Scholar 

  3. Christen, T., Churches, P., Zhu, J.: Probabilistic name and address cleaning and standardization. In: The Australian Data Mining Workshop (November 2002)

    Google Scholar 

  4. Churches, T., Christen, P., Lu, J., Zhu, J.X.: Preparation of name and address data for record linkage using hidden markov models. BioMed Central Medical Informatics and Decision Making 2(9) (2002)

    Google Scholar 

  5. Cohen, W.W., Ravikumar, P., Fienberg, S.E.: A comparison of string metrics for matching names and addresses. In: International Joint Conference on Artificial Intelligence, Proceedings of the Workshop on Information Integration on the Web (August 2003)

    Google Scholar 

  6. Dalrymple, P.W., Young, J.A.: From authority control to informed retrieval: Framing the expanded domain of subject access. College & Research Libraries 52, 139–149 (1991)

    Article  Google Scholar 

  7. Elmagarmid, A., Ipeirotis, P., Verykios, V.: Duplicate record detection: A survey. IEEE Transactions on Knowledge and Data Engineering 19(1), 1–16 (2007)

    Article  Google Scholar 

  8. Fayad, U., Uthurusamy, R.: Evolving data mining into solutions for insights. Communications of the Association of Computing Machinery 45(8), 28–31 (2002)

    Article  Google Scholar 

  9. Gong, C., Huang, Y., Cheng, X., Bai, S.: Detecting near-duplicates in large-scale short text databases. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds.) PAKDD 2008. LNCS (LNAI), vol. 5012, pp. 877–883. Springer, Heidelberg (2008)

    Chapter  Google Scholar 

  10. Gorman, M.: Authority control in the context of bibliographic control in the electronic environment. In: International Conference Authority Control: Definition and International Experiences, Florence, February 10-12 (2003)

    Google Scholar 

  11. Lakshman, A., Malik, P.: Cassandra: a decentralized structured storage system. SIGOPS Oper. Syst. Rev. 44, 35–40 (2010)

    Article  Google Scholar 

  12. Manku, G., Jain, A., S.A.D.: Detecting near-duplicates for web crawling. In: 16th International World Wide Conference, Banff, Alberta, Canada (May 2007)

    Google Scholar 

  13. Rick, B., Hengel-Dittrich, C., O’Neill, E.T., Tillett, B.: Viaf (virtual international authority file): Linking the deutsche nationalbibliothek and library of congress name authority files. International Cataloging and Bibliographic Control 36(1), 12–19 (2007)

    Google Scholar 

  14. Tejada, S., Knoblock, C., Minton, S.: Learning object identification rules for information extraction. Information Systems 26(8), 607–633 (2001)

    Article  MATH  Google Scholar 

  15. Tillett, B.T.: Authority control: State of the art and new perspectives. In: Authority Control International Conference, Florence, Italy (2003)

    Google Scholar 

  16. Wang, C., Wang, J., Lin, X., Wang, W., Wang, H., Li, H., Tian, W., Xu, J., Li, R.: Mapdupreducer: detecting near duplicates over massive datasets. In: Proceedings of the 2010 International Conference on Management of Data, SIGMOD 2010, pp. 1119–1122. ACM, New York (2010)

    Google Scholar 

  17. Weber, J.: Leaf. linking and exploring authority files. In: International Conference Authority Control: Definition and International Experiences, Florence, February 10-12 (2003)

    Google Scholar 

  18. Winkler, W.E.: String comparator metrics and enhanced decision rules in the fellegi-sunter model of record linkage. In: Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 354–359 (1990)

    Google Scholar 

  19. Winkler, W.E.: Overview of record linkage and current research directions. Technical report, Research Report Series, RRS (2006)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Manghi, P., Mikulicic, M. (2011). PACE: A General-Purpose Tool for Authority Control. In: García-Barriocanal, E., Cebeci, Z., Okur, M.C., Öztürk, A. (eds) Metadata and Semantic Research. MTSR 2011. Communications in Computer and Information Science, vol 240. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24731-6_8

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-24731-6_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24730-9

  • Online ISBN: 978-3-642-24731-6

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics