
Querying and Merging Heterogeneous Data by Approximate Joins on Higher-Order Terms

Conference paper
Inductive Logic Programming (ILP 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5194)


Abstract

Integrating heterogeneous data from sources as diverse as web pages, digital libraries, knowledge bases, the Semantic Web and databases is an open problem. The ultimate aim of our work is to be able to query such heterogeneous data sources as if their data were conveniently held in a single relational database. Pursuant to this aim, we propose a generalisation of joins from the relational database model to enable joins on arbitrarily complex structured data in a higher-order representation. By incorporating kernels and distances for structured data, we further extend this model to support approximate joins of heterogeneous data. We demonstrate the flexibility of our approach in the publications domain by evaluating example approximate queries on the CORA data sets, joining on types ranging from sets of co-authors through to entire publications.
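The central idea of the abstract is that an approximate join replaces the equality test of an ordinary relational join with a kernel-based distance on structured values. The following minimal Python sketch applies that idea to a join on sets of co-authors. It is an illustration under stated assumptions, not the authors' implementation. It assumes a matching (delta) kernel on author-name strings summed over all element pairs, which is one standard kernel on sets, followed by cosine normalisation and the induced distance sqrt(k(x,x) − 2k(x,y) + k(y,y)). The function names `set_kernel`, `normalised_kernel`, `kernel_distance` and `approximate_join`, the threshold of 0.7, and the toy records are all hypothetical.

```python
import math
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical relation layout: record id -> set of co-author names.
Relation = Dict[str, FrozenSet[str]]


def set_kernel(x: FrozenSet[str], y: FrozenSet[str]) -> float:
    """Sum a matching (delta) kernel over all element pairs of x and y.

    delta(a, b) is 1 when a == b and 0 otherwise, so the sum equals |x & y|.
    """
    return float(len(x & y))


def normalised_kernel(x: FrozenSet[str], y: FrozenSet[str]) -> float:
    """Cosine-normalise the set kernel so that k(x, x) == 1 for non-empty x."""
    kxx, kyy = set_kernel(x, x), set_kernel(y, y)
    if kxx == 0.0 or kyy == 0.0:
        return 0.0  # treat an empty set as orthogonal to everything
    return set_kernel(x, y) / math.sqrt(kxx * kyy)


def kernel_distance(x: FrozenSet[str], y: FrozenSet[str]) -> float:
    """Kernel-induced distance sqrt(k(x,x) - 2 k(x,y) + k(y,y))."""
    sq = (normalised_kernel(x, x)
          - 2.0 * normalised_kernel(x, y)
          + normalised_kernel(y, y))
    return math.sqrt(max(0.0, sq))  # clamp tiny negative rounding errors


def approximate_join(left: Relation, right: Relation,
                     threshold: float) -> List[Tuple[str, str, float]]:
    """Return (left_id, right_id, distance) for every pair within threshold.

    A nested-loop join in which the equality test of an ordinary join is
    replaced by a distance bound on the structured join attribute.
    """
    pairs = []
    for lid, lval in left.items():
        for rid, rval in right.items():
            d = kernel_distance(lval, rval)
            if d <= threshold:
                pairs.append((lid, rid, d))
    # Closest matches first; ties broken by record ids.
    return sorted(pairs, key=lambda p: (p[2], p[0], p[1]))


if __name__ == "__main__":
    # Toy data (hypothetical): co-author sets from two bibliographic sources.
    source_a = {"a1": frozenset({"Smith", "Jones", "Lee"}),
                "a2": frozenset({"Patel", "Kim"})}
    source_b = {"b1": frozenset({"Smith", "Jones"}),
                "b2": frozenset({"Patel", "Kim", "Wu"})}
    for lid, rid, d in approximate_join(source_a, source_b, threshold=0.7):
        print(lid, rid, round(d, 3))
```

Running the script pairs a1 with b1 and a2 with b2, each at a distance of about 0.606. The disjoint pairs sit at sqrt(2) and fall outside the threshold. The nested loop keeps the sketch simple, but it compares every pair of records, so a practical system would need some form of indexing or blocking to avoid the quadratic cost.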



Author information

Authors: S. Price, P. Flach

Editor information

Filip Železný, Nada Lavrač


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Price, S., Flach, P. (2008). Querying and Merging Heterogeneous Data by Approximate Joins on Higher-Order Terms. In: Železný, F., Lavrač, N. (eds) Inductive Logic Programming. ILP 2008. Lecture Notes in Computer Science, vol 5194. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85928-4_19


  • DOI: https://doi.org/10.1007/978-3-540-85928-4_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85927-7

  • Online ISBN: 978-3-540-85928-4

  • eBook Packages: Computer Science, Computer Science (R0)
