Data Mining and Knowledge Discovery, Volume 29, Issue 5, pp 1233–1257

Beyond rankings: comparing directed acyclic graphs

Abstract

Defining appropriate distance measures among rankings is a classic area of study which has led to many useful applications. In this paper, we propose a more general abstraction of preference data, namely directed acyclic graphs (DAGs), and introduce a measure for comparing DAGs, given that a vertex correspondence between the DAGs is known. We study the properties of this measure and use it to aggregate and cluster a set of DAGs. We show that these problems are \(\mathbf {NP}\)-hard and present efficient methods to obtain solutions with approximation guarantees. In addition to preference data, these methods turn out to have other interesting applications, such as the analysis of a collection of information cascades in a network. We test the methods on synthetic and real-world datasets, showing that the methods can be used to, e.g., find a set of influential individuals related to a set of topics in a network or to discover meaningful and occasionally surprising clustering structure.

Keywords

Directed acyclic graphs, Aggregation, Clustering, Preferences, Information cascades

Copyright information

© The Author(s) 2015

Authors and Affiliations

  1. HIIT and Department of Computer Science, Aalto University, Espoo, Finland
