
Data Mining and Knowledge Discovery, Volume 29, Issue 5, pp 1233–1257

Beyond rankings: comparing directed acyclic graphs

  • Eric Malmi
  • Nikolaj Tatti
  • Aristides Gionis

Abstract

Defining appropriate distance measures among rankings is a classic area of study that has led to many useful applications. In this paper, we propose a more general abstraction of preference data, namely directed acyclic graphs (DAGs), and introduce a measure for comparing DAGs, given that a vertex correspondence between the DAGs is known. We study the properties of this measure and use it to aggregate and cluster a set of DAGs. We show that these problems are \(\mathbf{NP}\)-hard and present efficient methods to obtain solutions with approximation guarantees. In addition to preference data, these methods turn out to have other interesting applications, such as the analysis of a collection of information cascades in a network. We test the methods on synthetic and real-world datasets, showing that they can be used, for example, to find a set of influential individuals related to a set of topics in a network or to discover meaningful and occasionally surprising clustering structure.

Keywords

Directed acyclic graphs, Aggregation, Clustering, Preferences, Information cascades

Notes

Acknowledgments

The authors are grateful to Nicola Barbieri for providing the Last.fm dataset. We also thank the anonymous reviewers for their constructive feedback. This work was supported by Academy of Finland grant 118653 (ALGODAN).

Copyright information

© The Author(s) 2015

Authors and Affiliations

  1. HIIT and Department of Computer Science, Aalto University, Espoo, Finland
