Advertisement

Reducing seed noise in personalized PageRank

  • Shengyu Huang
  • Xinsheng Li
  • K. Selçuk CandanEmail author
  • Maria Luisa Sapino
Original Article
  • 222 Downloads

Abstract

Network-based recommendation systems leverage the topology of the underlying graph and the current user context to rank objects in the database. Random walk-based techniques, such as PageRank, encode the structure of the graph in the form of a transition matrix of a stochastic process from which the significances of the nodes in the graph are inferred. Personalized PageRank (PPR) techniques complement this with a seed node set which serves as the personalization context. In this paper, we note (and experimentally show) that PPR algorithms that do not differentiate among the seed nodes may not properly rank nodes in situations where the seed set is incomplete and/or noisy. To tackle this problem, we propose alternative robust personalized PageRank (RPR) strategies, which are insensitive to noise in the set of seed nodes and in which the rankings are not overly biased towards the seed nodes. In particular, we show that novel teleportation-discounting and seed-set maximal PPR techniques help eliminate harmful bias of individual seed nodes and provide effective seed differentiation to lead to more accurate rankings. We also show that the proposed techniques lead to efficient implementations, where existing approximation algorithms and/or parallel implementations for computing the PPR scores can be easily leveraged. Moreover, the proposed formulations are reuse-promoting in the sense that, it is possible to divide the work relative to individual seed nodes and cache the intermediary results obtained during the computation, and especially in systems with large query throughputs, it may be possible to cluster queries based on the partial overlaps between the seed sets and reduce the overall robust PPR computation costs. Experiment results show that the proposed techniques are efficient and highly effective in improving recommendations and eliminating unwanted bias due to imperfections in the seed set.

Keywords

Seed Node Block Diagonalization Random Walk With Restart Seed Differentiation PageRank Score 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Andersen R, Borgs C, Chayes J, Feige U, Flaxman A, Kalai A, Mirrokni V, Tennenholtz M (2008) Trust-based recommendation systems: an axiomatic approach. In: Proceedings of the 17th international conference on World Wide Web. ACM, New York, pp 199–208Google Scholar
  2. Avrachenkov K, Litvak N, Nemirovsky D, Smirnova E, Sokol M (2011) Quick detection of top-k personalized PageRank lists. In: Algorithms and Models for the Web Graph. Springer, Berlin Heidelberg, pp 50–61Google Scholar
  3. Bahmani B, Chakrabarti K, Xin D (2011) Fast personalized PageRank on MapReduce. In: Proceedings of the 2011 ACM SIGMOD international conference on management of data, pp 973–984Google Scholar
  4. Bahmani B, Chowdhury A, Goel A (2010) Fast incremental and personalized PageRank. In: Proceedings of the VLDB Endowment, vol 4, pp 173–184Google Scholar
  5. Balmin A, Hristidis V, Papakonstantinou Y (2004) ObjectRank: authority-based keyword search in databases. In: Proceedings of the Thirtieth international conference on very large data bases. Morgan Kaufman, San Francisco, pp 564–575Google Scholar
  6. Berkhin P (2007) Bookmark-coloring approach to personalized pagerank computing. Int Math 3(1):41–62MathSciNetzbMATHGoogle Scholar
  7. Boldi P, Rosa M, Vigna S (2011) HyperANF: approximating the neighbourhood function of very large graphs on a budget. In: Proceedings of the 20th international conference on World Wide Web. ACM, New York, pp 625–634Google Scholar
  8. Borgs C, Brautbar M, Chayes J, Teng SH (2014) Multiscale matrix sampling and sublinear-time pagerank computation. Int Math 10(1–2):20–48MathSciNetGoogle Scholar
  9. Borgatti MG, Jones C, Everett MG (1998) Network measures of social capital. Connections 21(2):27–36Google Scholar
  10. Brin S, Page L (1998) The anatomy of a large-scale hypertextual Web search engine. Comput Netw ISDN Syst 30:107–117CrossRefGoogle Scholar
  11. Buckley C, Voorhees EM (2004) Retrieval evaluation with incomplete information. In: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, New York, pp 25–32Google Scholar
  12. Candan KS, Li WS (2000) Using random walks for mining web document associations. In: Proceedings of the fourth European conference on machine learning and principles and practice of knowledge discovery in databases, pp 294–305Google Scholar
  13. Candan KS, Li WS (2002) Reasoning for Web document associations and its applications in site map construction. Data Knowl Eng 43(2):121–150CrossRefzbMATHGoogle Scholar
  14. Chakrabarti S (2007) Dynamic personalized pagerank in entity-relation graphs. In: Proceedings of the 16th international conference on World Wide Web. ACM, New York, pp 571–580Google Scholar
  15. Chen M, Liu J, Tang X (2008) Clustering via random walk hitting time on directed graphs. In: Proceedings of the 23rd national conference on Artificial intelligence, pp 616–621Google Scholar
  16. Cohen E, Halperin E, Kaplan H, Zwick U (2003) Reachability and distance queries via 2-hop labels. SIAM J Comput 32(5):1338–1355MathSciNetCrossRefzbMATHGoogle Scholar
  17. Csalogany K, Fogaras D, Rácz B, Sarlós T (2005) Towards scaling fully personalized PageRank: algorithms, lower bounds, and experiments. Int Math 2(3):333–358MathSciNetzbMATHGoogle Scholar
  18. Davis TA (2006) Direct methods for sparse linear systems. SIAM, Philadephia, PA, pp 1–211CrossRefzbMATHGoogle Scholar
  19. Foster KC, Muth SQ, Potterat JJ, Rothenberg RB (2001) A faster Katz status score algorithm. Comput Math Organ Theo 7(4):275–285CrossRefGoogle Scholar
  20. Fouss F, Pirotte A, Renders JM, Saerens M (2007) Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. IEEE Transact Knowl Data Eng 5:1041–4347Google Scholar
  21. Fujiwara Y, Nakatsuji M, Onizuka M, Kitsuregawa M (2012) Fast and exact top-k search for random walk with restart. In: Proceedings of the VLDB Endowment, vol 5, pp 442–453Google Scholar
  22. Guan Z, Bu J, Mei Q, Chen C, Wang C (2009) Personalized tag recommendation using graph-based ranking on multi-type interrelated objects. In: Proceedings of the 32nd international ACM SIGIR conference on research and development in information retrieval. ACM, New York, pp 540–547Google Scholar
  23. Gupta M, Pathak A, Chakrabarti S (2008) Fast algorithms for top-k personalized PageRank queries. In: Proceedings of the 17th international conference on World Wide Web, pp 1225–1226Google Scholar
  24. Haveliwala TH (2002) Topic-sensitive PageRank. In: Proceedings of the 11th international conference on World Wide Web. ACM, New York, pp 517–526Google Scholar
  25. Jeh G, Widom J (2003) Scaling personalized web. In: Proceedings of the 12th international conference on World Wide Web. ACM, New York, pp 271–279Google Scholar
  26. Katz L (1953) A new status index derived from sociometric analysis. Psychometrika 18:39–43CrossRefzbMATHGoogle Scholar
  27. Kamvar SD, Haveliwala T, Manning CD, Golub G (2003) Extrapolation methods for accelerating PageRank computations. In: Proceedings of the 12th international conference on World Wide Web. ACM, New York, pp 261–270Google Scholar
  28. Kim HJ, Candan KS, Sapino ML (2013) LR-PPR: locality-sensitive, re-use promoting, approximate personalized PageRank computation. In: Proceedings of the 22nd ACM international conference on information & knowledge management, pp 1801–1806Google Scholar
  29. Kim HN, El-Saddik A (2011) Personalized PageRank vectors for tag recommendations: inside FolkRank. In: Proceedings of the fifth ACM conference on recommender systems. ACM, New York, pp 45–52Google Scholar
  30. Langville AN, Meyer CD (2004) Updating pagerank with iterative aggregation. In: Proceedings of the 13th international World Wide Web conference on alternate track papers & posters. ACM, New York, pp 392–393Google Scholar
  31. Maehara T, Akiba T et al (2014) Computing personalized PageRank quickly by exploiting graph structures. In: Proceedings of the VLDB endowment, vol 7, pp 1023–1034Google Scholar
  32. Malewicz G, Austern MH, Bik AJC, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data. ACM, New York, pp 135–146Google Scholar
  33. Mei Q, Zhou D, Church K (2008) Query suggestion using hitting time. In: Proceedings of the 17th ACM conference on information and knowledge management. ACM, New York, pp 469–478Google Scholar
  34. Palmer C, Gibbons P, Faloutsos C (2002) Anf: a fast and scalable tool for data mining in massive graphs. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 81–90Google Scholar
  35. Piegorsch W, Casella GE (1990) Inverting a sum of matrices. SIAM Rev 32:470MathSciNetCrossRefzbMATHGoogle Scholar
  36. Perozzi B, McCubbin C, Halbert JT (2014) Scalable graph clustering with parallel approximate PageRank. Soc Netw Anal Min 4:179–189CrossRefGoogle Scholar
  37. Sarkar P, Moore AW, Prakash A (2008) Fast incremental proximity search in large graphs. In: Proceedings of the 25th international conference on machine learning. ACM, pp 896–903Google Scholar
  38. Sarma AD, Molla AR, Pandurangan G, Upfal E (2013) Fast distributed PageRank computation. In: Proceedings of 14th international conference on distributed computing and networking, pp 11–26Google Scholar
  39. Tong H, Faloutsos C (2006) Center-piece subgraphs: problem definition and fast solutions. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, pp 404–413Google Scholar
  40. Tong H, Faloutsos C, Koren Y (2007) Fast direction-aware proximity for graph mining. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, pp 747–756Google Scholar
  41. Tong H, Faloutsos C, Pan JY (2006) Fast random walk with restart and its applications. In: Proceedings of the sixth international conference on data mining, pp 613–622Google Scholar
  42. Wei F (2010) Tedi: efficient shortest path query answering on graphs. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data. ACM, New York, pp 99–110Google Scholar
  43. White DR, Borgatti SP (1994) Betweenness centrality measures for directed graphs. Soc Netw 16:335–346CrossRefGoogle Scholar
  44. Xiao Y, Wu W, Pei J, Wang W, He Z (2009) Efficiently indexing shortest paths by exploiting symmetry in graphs. In: Proceedings of the 12th international conference on extending database technology: advances in database technology. ACM, pp 493–504Google Scholar
  45. Zhou L, Chen L, Ozsu MT (2009) Distance-join: pattern match query in a large graph. In: Proceedings of the VLDB endowment, vol 2, pp 886–897Google Scholar

Copyright information

© Springer-Verlag Wien 2016

Authors and Affiliations

  • Shengyu Huang
    • 1
  • Xinsheng Li
    • 1
  • K. Selçuk Candan
    • 1
    Email author
  • Maria Luisa Sapino
    • 2
  1. 1.Arizona State UniversityTempeUSA
  2. 2.University of TorinoTurinItaly

Personalised recommendations