SudocAD: A Knowledge-Based System for the Author Linkage Problem

  • Michel Chein
  • Michel Leclère
  • Yann Nicolas
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 244)

Abstract

SudocAD is a system that addresses the author linkage problem in the context of bibliographic databases. The problem is stated as follows: given a bibliographic database \(\mathcal E\) and a (new) bibliographic notice d, with r an identifier of an author in \(\mathcal E\) and r′ an identifier of an author in d, do r and r′ refer to the same author? The system, currently a prototype, has been evaluated in a real situation. Compared with the results given by expert librarians, those of SudocAD are encouraging enough to plan turning the prototype into a production system. SudocAD is based on a method that combines numerical and knowledge-based techniques. The method is defined abstractly, so although SudocAD is devoted to the author linkage problem, it could be adapted to other kinds of linkage problems, especially in the semantic web context.
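
To make the problem statement concrete, the following minimal Python sketch shows one way a numerical similarity score might be combined with symbolic rules to decide whether r and r′ denote the same author. It is not SudocAD's actual method: the `AuthorRef` schema, the attributes, the thresholds, and the three rules are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class AuthorRef:
    """An author identifier with a few descriptive attributes (hypothetical schema)."""
    ident: str
    name: str
    birth_year: Optional[int] = None
    domains: frozenset = frozenset()


def name_similarity(a: str, b: str) -> float:
    """Numerical step: case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_decision(r: AuthorRef, r_prime: AuthorRef) -> str:
    """Rule-based step: turn numeric evidence into a qualitative verdict."""
    sim = name_similarity(r.name, r_prime.name)
    # Assumed rule 1: two known but conflicting birth years rule out a link.
    if r.birth_year and r_prime.birth_year and r.birth_year != r_prime.birth_year:
        return "different"
    # Assumed rule 2: near-identical names plus a shared domain support a link.
    if sim >= 0.9 and r.domains & r_prime.domains:
        return "same"
    # Assumed rule 3: clearly dissimilar names rule out a link.
    if sim < 0.5:
        return "different"
    # Otherwise the evidence is insufficient and the case is left to an expert.
    return "undecided"
```

With these assumed rules, two references sharing the name "Dupont, Jean" and a common domain are judged "same" when at most one birth year is known, "different" when both birth years are known and conflict, and "undecided" when no domain is shared.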

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Michel Chein (1)
  • Michel Leclère (1)
  • Yann Nicolas (2)
  1. LIRMM-GraphIK (CNRS, INRIA, UM2), Montpellier Cedex 5, France
  2. ABES, Montpellier Cedex 5, France