Data Mining and Knowledge Discovery

, Volume 33, Issue 5, pp 1254–1297 | Cite as

Counts-of-counts similarity for prediction and search in relational data

  • Manfred JaegerEmail author
  • Marco Lippi
  • Giovanni Pellegrini
  • Andrea Passerini
Part of the following topical collections:
  1. Journal Track of ECML PKDD 2019


Defining appropriate distance functions is a crucial aspect of effective and efficient similarity-based prediction and retrieval. Relational data are especially challenging in this regard. By viewing relational data as multi-relational graphs, one can easily see that a distance between a pair of nodes can be defined in terms of a virtually unlimited class of features, including node attributes, attributes of node neighbors, structural aspects of the node neighborhood and arbitrary combinations of these properties. In this paper we propose a rich and flexible class of metrics on graph entities based on earth mover’s distance applied to a hierarchy of complex counts-of-counts statistics. We further propose an approximate version of the distance using sums of marginal earth mover’s distances. We show that the approximation is correct for many cases of practical interest and allows efficient nearest-neighbor retrieval when combined with a simple metric tree data structure. An experimental evaluation on two real-world scenarios highlights the flexibility of our framework for designing metrics representing different notions of similarity. Substantial improvements in similarity-based prediction are reported when compared to solutions based on state-of-the-art graph kernels.


Relational data mining Graph mininga Similarity search Earth-mover’s distance Statistical relational learning 


Supplementary material


  1. Arya S, Mount DM, Netanyahu NS, Silverman R, Wu AY (1998) An optimal algorithm for approximate nearest neighbor searching fixed dimensions. J ACM (JACM) 45(6):891–923MathSciNetCrossRefzbMATHGoogle Scholar
  2. Assche AV, Vens C, Blockeel H, Dzeroski S (2006) First order random forests: learning relational classifiers with complex aggregates. Mach Learn 64:149–182CrossRefzbMATHGoogle Scholar
  3. Barla A, Odone F, Verri A (2003) Histogram intersection kernel for image classification. In: Proceedings 2003 international conference on image processing (Cat. No.03CH37429), vol. 3, pp III–513Google Scholar
  4. Bellet A, Habrard A, Sebban M (2013) A survey on metric learning for feature vectors and structured data. CoRR arXiv:1306.6709
  5. Berg C, Christensen JP, Ressel P (1984) Harmonic analysis on semigroups: theory of positive definite and related functions, Graduate Texts in Mathematics, vol 100, 1st edn. Springer, BerlinCrossRefzbMATHGoogle Scholar
  6. Chan T, Esedoglu S, Ni K (2007) Histogram based segmentation using Wasserstein distances. In: International conference on scale space and variational methods in computer vision, Springer, pp 697–708Google Scholar
  7. Clarkson KL (2006) Nearest-neighbor searching and metric space dimensions. In: In nearest-neighbor methods for learning and vision: theory and practice, MIT Press, CambridgeGoogle Scholar
  8. Cuturi M, Avis D (2014) Ground metric learning. J Mach Learn Res 15(1):533–564MathSciNetzbMATHGoogle Scholar
  9. Datta R, Joshi D, Li J, Wang JZ (2008) Image retrieval: ideas, influences, and trends of the new age. ACM Comput Surv 40(2):5:1–5:60CrossRefGoogle Scholar
  10. Egghe L (2006) Theory and practise of the g-index. Scientometrics 69(1):131–152MathSciNetCrossRefGoogle Scholar
  11. Gardner A, Duncan CA, Kanno J, Selmic RR (2018) On the definiteness of earth mover’s distance and its relation to set intersection. IEEE Trans Cybern 48(11):3184–3196CrossRefGoogle Scholar
  12. Grover A, Leskovec J (2016) Node2vec: scalable feature learning for networks. In: Proceedings of the 22Nd ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, NY, KDD ’16, pp 855–864Google Scholar
  13. Hirsch JE (2005) An index to quantify an individual’s scientific research output. Proc Natl Acad Sci USA 102(46):16,569CrossRefzbMATHGoogle Scholar
  14. Hoff PD (2009) Multiplicative latent factor models for description and prediction of social networks. Comput Math Organ Theory 15(4):261–272CrossRefGoogle Scholar
  15. Jaeger M, Lippi M, Passerini A, Frasconi P (2013) Type extension trees for feature construction and learning in relational domains. Artif Intell 204:30–55MathSciNetCrossRefzbMATHGoogle Scholar
  16. Järvelin K, Kekäläinen J (2000) IR evaluation methods for retrieving highly relevant documents. In: Proceedings of the 23rd annual international ACM SIGIR conference on research and development in information retrieval, ACM, pp 41–48Google Scholar
  17. Jeh G, Widom J (2002) Simrank: a measure of structural-context similarity. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, NY, KDD ’02, pp 538–543Google Scholar
  18. Khan A, Li N, Yan X, Guan Z, Chakraborty S, Tao S (2011) Neighborhood based fast graph search in large networks. In: Proceedings of the 2011 ACM SIGMOD international conference on management of data, ACM, pp 901–912Google Scholar
  19. Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. In: Proceedings of the 3rd international conference on learning representations (ICLR)Google Scholar
  20. Knobbe AJ, Siebes A, van der Wallen D (1999) Multi-relational decision tree induction. In: Proceedings of PKDD-99, pp 378–383Google Scholar
  21. Leicht EA, Holme P, Newman ME (2006) Vertex similarity in networks. Phys Rev E 73(2):026,120CrossRefGoogle Scholar
  22. Liu T, Moore AW, Yang K, Gray AG (2005) An investigation of practical approximate nearest neighbor algorithms. In: Saul LK, Weiss Y, Bottou L (eds) Advances in neural information processing systems, vol 17. MIT Press, Cambridge, pp 825–832Google Scholar
  23. Liu Z, Zheng VW, Zhao Z, Zhu F, Chang KC, Wu M, Ying J (2017) Semantic proximity search on heterogeneous graph by proximity embedding. In: Singh SP, Markovitch S (eds) Proceedings of the thirty-first AAAI conference on artificial intelligence, February 4–9, 2017, San Francisco, CA, AAAI Press, pp 154–160Google Scholar
  24. Ljosa V, Bhattacharya A, Singh AK (2006) Indexing spatially sensitive distance measures using multi-resolution lower bounds. In: International conference on extending database technology, Springer, Berlin, pp 865–883Google Scholar
  25. Loosli G, Canu S, Ong CS (2016) Learning SVM in Kreĭn spaces. IEEE Trans Pattern Anal Mach Intell 38(6):1204–1216CrossRefGoogle Scholar
  26. Mottin D, Lissandrini M, Velegrakis Y, Palpanas T (2014) Exemplar queries: give me an example of what you need. Proc VLDB Endow 7(5):365–376CrossRefGoogle Scholar
  27. Muja M, Lowe DG (2014) Scalable nearest neighbor algorithms for high dimensional data. IEEE Trans Pattern Anal Mach Intell 36(11):2227–2240CrossRefGoogle Scholar
  28. Naor A, Schechtman G (2007) Planar earthmover is not in l_1. SIAM J Comput 37(3):804–826MathSciNetCrossRefzbMATHGoogle Scholar
  29. Neumann M, Garnett R, Bauckhage C, Kersting K (2016) Propagation kernels: efficient graph kernels from propagated information. Mach Learn 102(2):209–245MathSciNetCrossRefzbMATHGoogle Scholar
  30. Neville J, Jensen D, Friedland L, Hay M (2003) Learning relational probability trees. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining (KDD-03)Google Scholar
  31. Newman ME (2006) Finding community structure in networks using the eigenvectors of matrices. Phys Rev E 74(3):036,104MathSciNetCrossRefGoogle Scholar
  32. Oglic D, Gaertner T (2018) Learning in reproducing kernel Krein spaces. In: Dy J, Krause A (eds) Proceedings of the 35th international conference on machine learning, PMLR, proceedings of machine learning research, vol 80, pp 3856–3864Google Scholar
  33. Pele O, Werman M (2008) A linear time histogram metric for improved SIFT matching. In: Forsyth DA, Torr PHS, Zisserman A (eds) Computer vision–ECCV 2008, 10th European conference on computer vision, Marseille, France, October 12–18, 2008, proceedings, Part III, Springer, Lecture Notes in Computer Science, vol 5304, pp 495–508Google Scholar
  34. Richards DS (1985) Positive definite symmetric functions on finite-dimensional spaces ii. Stat Probab Lett 3(6):325–329MathSciNetCrossRefzbMATHGoogle Scholar
  35. Richardson M, Domingos P (2006) Markov logic networks. Mach Learn 62(1):107–136CrossRefGoogle Scholar
  36. Rubner Y, Tomasi C, Guibas LJ (1998) A metric for distributions with applications to image databases. In: Sixth international conference on computer vision, 1998, IEEE, pp 59–66Google Scholar
  37. Schölkopf B, Smola A (2002) Learning with Kernels. The MIT Press, Cambridge, MAzbMATHGoogle Scholar
  38. Shervashidze N, Schweitzer P, van Leeuwen EJ, Mehlhorn K, Borgwardt KM (2011) Weisfeiler–Lehman graph kernels. J Mach Learn Res 12:2539–2561MathSciNetzbMATHGoogle Scholar
  39. Sun Y, Han J, Yan X, Yu PS, Wu T (2011) Pathsim: meta path-based top-k similarity search in heterogeneous information networks. Proc VLDB Endow 4(11):992–1003Google Scholar
  40. Tong H, Faloutsos C, Gallagher B, Eliassi-Rad T (2007) Fast best-effort pattern matching in large attributed graphs. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, NY, KDD ’07, pp 737–746Google Scholar
  41. Uhlmann JK (1991) Satisfying general proximity/similarity queries with metric trees. Inf Process Lett 40(4):175–179CrossRefzbMATHGoogle Scholar
  42. Vens C, Gassen SV, Dhaene T, Saeys Y (2014) Complex aggregates over clusters of elements. In: Davis J, Ramon J (eds) Inductive logic programming–24th international conference, ILP 2014, Nancy, France, September 14–16, 2014, Revised Selected Papers, Springer, Lecture Notes in Computer Science, vol 9046, pp 181–193Google Scholar
  43. Wang F, Guibas LJ (2012) Supervised earth mover’s distance learning and its computer vision applications. In: Fitzgibbon AW, Lazebnik S, Perona P, Sato Y, Schmid C (eds) Computer vision–ECCV 2012–12th European conference on computer vision, Florence, Italy, October 7–13, 2012, Proceedings, Part I, Springer, Lecture Notes in Computer Science, vol 7572, pp 442–455Google Scholar
  44. Wang J, Shen HT, Song J, Ji J (2014) Hashing for similarity search: a survey. CoRR arXiv:1408.2927
  45. Wang J, Zhang T, Song J, Sebe N, Shen HT (2017) A survey on learning to hash. IEEE Trans Pattern Anal Mach Intell PP(99):1–1Google Scholar
  46. Wichterich M, Assent I, Kranen P, Seidl T (2008) Efficient emd-based similarity search in multimedia databases via flexible dimensionality reduction. In: Proceedings of the 2008 ACM SIGMOD international conference on management of data, ACM, pp 199–212Google Scholar
  47. Yanardag P, Vishwanathan S (2015) Deep graph kernels. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 1365–1374Google Scholar
  48. Zhang CT (2009) The e-index, complementing the h-index for excess citations. PLoS One 4(5):e5429CrossRefGoogle Scholar

Copyright information

© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2019

Authors and Affiliations

  1. 1.Institut for DatalogiAalborg UniversityAalborgDenmark
  2. 2.Dipartimento di Scienze e Metodi per l’IngegneriaUniversità degli Studi di Modena e Reggio EmiliaReggio EmiliaItaly
  3. 3.Dipartimento di Ingegneria e Scienza dell’InformazioneUniversità degli Studi di TrentoTrentoItaly

Personalised recommendations