Beyond rankings: comparing directed acyclic graphs

Abstract

Defining appropriate distance measures among rankings is a classic area of study that has led to many useful applications. In this paper, we propose a more general abstraction of preference data, namely directed acyclic graphs (DAGs), and introduce a measure for comparing DAGs when a vertex correspondence between them is known. We study the properties of this measure and use it to aggregate and cluster a set of DAGs. We show that these problems are \(\mathbf{NP}\)-hard and present efficient methods that obtain solutions with approximation guarantees. Beyond preference data, these methods have other interesting applications, such as analyzing a collection of information cascades in a network. We test the methods on synthetic and real-world datasets and show that they can be used, for example, to find a set of influential individuals related to a set of topics in a network or to discover meaningful and occasionally surprising clustering structure.

Notes

  1. The Kendall-tau distance is most often normalized by the total number of vertex pairs, \(\binom{|V|}{2}\), so that it takes a value between 0 and 1.

  2. The dataset can be downloaded at http://users.ics.aalto.fi/emalmi/artist_preference_data.zip.
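
The normalization mentioned in Note 1 can be illustrated with a minimal sketch for the classic case of two full rankings over the same items. This is the standard Kendall-tau distance (the number of item pairs ordered differently), not the DAG measure proposed in the paper; the function names and the example rankings are illustrative assumptions.

```python
from itertools import combinations
from math import comb


def kendall_tau_distance(rank_a, rank_b):
    """Count the item pairs that the two rankings order differently."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    if pos_a.keys() != pos_b.keys():
        raise ValueError("rankings must cover the same items")
    # A pair is discordant when its relative order differs between rankings.
    return sum(
        1
        for u, v in combinations(rank_a, 2)
        if (pos_a[u] - pos_a[v]) * (pos_b[u] - pos_b[v]) < 0
    )


def normalized_kendall_tau_distance(rank_a, rank_b):
    """Divide by the number of item pairs, C(n, 2), to get a value in [0, 1]."""
    n = len(rank_a)
    if n < 2:
        return 0.0  # no pairs to disagree on
    return kendall_tau_distance(rank_a, rank_b) / comb(n, 2)


# Hypothetical example rankings: two of the six pairs are ordered differently.
example_a = ["a", "b", "c", "d"]
example_b = ["b", "a", "d", "c"]
example_distance = normalized_kendall_tau_distance(example_a, example_b)
```

With this normalization, identical rankings are at distance 0 and a ranking compared with its reverse is at distance 1.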

Acknowledgments

The authors are grateful to Nicola Barbieri for providing the Last.fm dataset. We also thank the anonymous reviewers for their constructive feedback. This work was supported by Academy of Finland grant 118653 (ALGODAN).

Author information

Corresponding author

Correspondence to Eric Malmi.

Additional information

Responsible editors: Joao Gama, Indre Zliobaite, Alipio Jorge, Concha Bielza.

About this article

Cite this article

Malmi, E., Tatti, N. & Gionis, A. Beyond rankings: comparing directed acyclic graphs. Data Min Knowl Disc 29, 1233–1257 (2015). https://doi.org/10.1007/s10618-015-0406-1

Keywords

  • Directed acyclic graphs
  • Aggregation
  • Clustering
  • Preferences
  • Information cascades