Discovering Similarity and Dissimilarity Relations for Knowledge Propagation in Web Ontologies

Abstract

We focus on the problem of predicting missing class memberships and property assertions in Web Ontologies. We start from the assumption that related entities influence each other, and that they may be either similar or dissimilar with respect to a given set of properties: the former case is referred to as homophily, and the latter as heterophily. We present an efficient method for predicting missing class and property assertions for a set of individuals within an ontology. The method works in two phases: a learning phase, which identifies the relations that are likely to encode influence between individuals, and an inference phase, which leverages those relations to propagate property information across related entities. We show that the complexity of both inference and learning is nearly linear in the number of edges in the influence graph, and we provide an empirical evaluation of the proposed method.
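The inference phase described above can be illustrated with a minimal sketch. It is not the method of the paper: it assumes a generic quadratic penalty in which similarity edges (positive weight) pull the scores of two individuals together and dissimilarity edges (negative weight) push them toward opposite signs. The function name `propagate`, the parameters `mu` and `eps`, and the toy graph are all hypothetical. The sketch uses a dense solver for brevity, so it does not reflect the nearly linear complexity claimed in the abstract, which would require a specialised solver.

```python
import numpy as np


def propagate(n, edges, labels, mu=1.0, eps=1e-2):
    """Propagate +1/-1 labels over a signed influence graph (illustrative only).

    edges  : iterable of (i, j, w); w > 0 is a similarity edge,
             w < 0 is a dissimilarity edge (assumed encoding).
    labels : dict mapping a node index to its known label (+1.0 or -1.0).
    Minimises  sum_labeled (f_i - y_i)^2
             + mu * sum_edges |w_ij| * (f_i - sign(w_ij) * f_j)^2
             + eps * ||f||^2.
    """
    W = np.zeros((n, n))
    for i, j, w in edges:
        W[i, j] = W[j, i] = w            # symmetric signed weights

    D = np.diag(np.abs(W).sum(axis=1))   # degrees use absolute weights
    S = np.zeros((n, n))                 # selects the labeled nodes
    y = np.zeros(n)
    for i, value in labels.items():
        S[i, i] = 1.0
        y[i] = value

    # Setting the gradient of the penalty to zero gives a linear system.
    A = S + mu * (D - W) + eps * np.eye(n)
    return np.linalg.solve(A, S @ y)


# Toy chain: 0 ~ 1 (similar), 1 !~ 2 (dissimilar), 2 ~ 3 (similar); node 0 is labeled +1.
scores = propagate(4, [(0, 1, 1.0), (1, 2, -1.0), (2, 3, 1.0)], {0: 1.0})
```

In this toy graph the scores of nodes 0 and 1 come out positive and those of nodes 2 and 3 negative, since the label crosses one dissimilarity edge. The system matrix built here is symmetric and diagonally dominant, which is the property that fast solvers for such systems exploit.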

Notes

  1. OWL 2 W3C Recommendation: http://www.w3.org/TR/owl-overview/.

  2. Please note that class memberships can be regarded as type, or is-a, properties: the assertion “x is American” can be encoded as \(\text {American}(x)\) or as \(\text {nationality}(x, \text {American})\).

  3. For instance, see the probabilistic interpretation of the penalty term at the end of this section.

  4. A matrix \(\mathbf {A}\) is SDD (symmetric diagonally dominant) iff \(\mathbf {A}\) is symmetric (i.e. \(\mathbf {A}= \mathbf {A}^{T}\)) and \(\forall i : \mathbf {A}_{ii} \ge \sum _{j \ne i} |\mathbf {A}_{ij}|\).

  5. The main difference between SPARQL and SPARQL-DL queries is that, in SPARQL-DL queries, one needs to specify whether a variable occurring in place of a role name refers to an object property or a data property.

  6. Pellet v2.3.1: http://clarkparsia.com/pellet/.

  7. Static dump version V2012-02-21 retrieved from http://www.aifb.kit.edu/web/Wissensmanagement/Portal.

  8. http://data.bgs.ac.uk/ as of March 2014.

  9. https://www.bgs.ac.uk/opengeoscience/.

  10. http://www.aifb.kit.edu/

  11. https://code.google.com/p/lamg/.
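The SDD condition stated in note 4 can be checked numerically. The following is a minimal sketch under stated assumptions: the helper name `is_sdd` and the tolerance `tol` are hypothetical, introduced only for illustration.

```python
import numpy as np


def is_sdd(A, tol=1e-12):
    """Return True if A is symmetric and diagonally dominant (SDD).

    Checks A == A^T and, for every row i,
    A[i, i] >= sum over j != i of |A[i, j]| (up to the tolerance `tol`).
    """
    A = np.asarray(A, dtype=float)
    if A.ndim != 2 or A.shape[0] != A.shape[1]:
        return False                      # only square matrices qualify

    symmetric = np.allclose(A, A.T, atol=tol)
    diag = np.diag(A)
    # Row sums of absolute values, minus the diagonal entry's own contribution.
    off_diag_abs_sum = np.abs(A).sum(axis=1) - np.abs(diag)
    dominant = np.all(diag >= off_diag_abs_sum - tol)
    return bool(symmetric and dominant)


# A graph Laplacian with an off-diagonal entry of either sign satisfies the condition.
laplacian_like = np.array([[1.0, -1.0, 0.0],
                           [-1.0, 2.0, 1.0],
                           [0.0, 1.0, 1.0]])
```

Note that the diagonal entry is compared without taking its absolute value, as in the definition: a matrix with a negative diagonal entry and a non-zero row fails the test.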

Author information

Corresponding author

Correspondence to Pasquale Minervini.

About this article

Cite this article

Minervini, P., d’Amato, C., Fanizzi, N. et al. Discovering Similarity and Dissimilarity Relations for Knowledge Propagation in Web Ontologies. J Data Semant 5, 229–248 (2016). https://doi.org/10.1007/s13740-016-0062-7

Keywords

  • Discriminant Function
  • Penalty Term
  • Similarity Graph
  • Conjunctive Query
  • Heterogeneous Information Network