Deterministic Coresets for Stochastic Matrices with Applications to Scalable Sparse PageRank

Lang, Harry; Baykal, Cenk; Samra, Najib Abu; Tannous, Tony; Feldman, Dan; Rus, Daniela

doi:10.1007/978-3-030-14812-6_25

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11436)

Included in the following conference series:

International Conference on Theory and Applications of Models of Computation

Abstract

The PageRank algorithm is used by search engines to rank websites in their search results. The algorithm outputs a probability distribution describing how likely a person randomly clicking on links is to arrive at any particular page. Intuitively, a node in the center of the network should be visited with high probability even if it has few edges, whereas an isolated node that has many (local) neighbours will be visited with low probability. The idea of PageRank is to rank nodes according to a stable state rather than according to the earlier local measure of a node's incoming and outgoing edges, which can be manipulated more easily than the corresponding entry in the stable state.
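To make the "stable state" concrete, the following is a minimal sketch of the textbook power-iteration formulation of PageRank on an adjacency-list graph. It is not the algorithm of this paper. The function name `pagerank`, the damping factor of 0.85, the tolerance, and the toy graph are all assumptions chosen for illustration.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=1000):
    """Textbook power iteration; out_links[i] lists the nodes that node i links to."""
    n = len(out_links)
    rank = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # Probability mass on nodes without out-links is spread uniformly.
        dangling = sum(rank[i] for i in range(n) if not out_links[i])
        new_rank = [(1.0 - damping) / n + damping * dangling / n] * n
        for i, targets in enumerate(out_links):
            if targets:
                # Node i passes an equal share of its rank to each page it links to.
                share = damping * rank[i] / len(targets)
                for j in targets:
                    new_rank[j] += share
        # Stop once the l1 change between successive iterates is below tol.
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy graph: 0 -> 1, 1 -> 0, 2 -> 0, 3 -> 2.
graph = [[1], [0], [0], [2]]
ranks = pagerank(graph)
```

Each pass touches every node and every edge once, so a single iteration costs time proportional to the number of nodes plus the number of links. In the toy graph, node 3 has no incoming links and therefore receives only the uniform teleportation share, while nodes 0 and 1 accumulate most of the mass.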

In this paper we present a deterministic and completely parallelizable algorithm for computing an \(\varepsilon\)-approximation to the PageRank of a graph of \(n\) nodes. Typical inputs consist of millions of pages, but the average number of links per page is less than ten. Our algorithm takes advantage of this sparsity, assuming the out-degree of each node is at most \(s\), and terminates in \(O(n s / \varepsilon^2)\) time. Beyond the input graph, which may be stored in read-only storage, our algorithm uses only \(O(n)\) memory. This is the first algorithm whose complexity takes advantage of sparsity. Real data exhibits an average out-degree of 7 while \(n\) is in the millions, so the advantage is immense. Moreover, our algorithm is simple and robust to floating-point precision issues. Our sparse solution (coreset) is based on reducing the PageRank problem to an \(\ell_2\) approximation of the Carathéodory problem, which independently has many applications, for example in machine learning and game theory. We hope that our approach will be useful for many other applications that involve learning from sparse data and graphs.
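As an illustration of what an \(\ell_2\) approximation of the Carathéodory problem asks for, the sketch below uses the classical Frank-Wolfe method to express a target point in a convex hull as a sparse convex combination of the input points. This is a generic illustration under stated assumptions, not the paper's coreset construction. The function name `sparse_convex_approx`, the step-size rule \(2/(t+2)\), and the toy input are hypothetical choices.

```python
import math


def sparse_convex_approx(points, target, num_iters):
    """Frank-Wolfe on f(x) = 0.5 * ||x - target||^2 over the convex hull of points.

    Returns (weights, x): weights maps point index -> positive weight summing
    to 1, with at most num_iters + 1 entries, and x is the combined point.
    """
    dim = len(target)
    weights = {0: 1.0}  # start at the first input point
    x = list(points[0])
    for t in range(num_iters):
        grad = [x[d] - target[d] for d in range(dim)]
        # Linear minimization step: the input point with the smallest <grad, p>.
        best = min(range(len(points)),
                   key=lambda i: sum(g * c for g, c in zip(grad, points[i])))
        step = 2.0 / (t + 2.0)
        # Shrink existing weights and move toward the chosen point.
        for i in weights:
            weights[i] *= 1.0 - step
        weights[best] = weights.get(best, 0.0) + step
        x = [(1.0 - step) * x[d] + step * points[best][d] for d in range(dim)]
    # Drop entries whose weight was scaled down to exactly zero.
    return {i: w for i, w in weights.items() if w > 0.0}, x


# Hypothetical toy input: the four corners of a simplex and its centroid.
points = [[1.0, 0.0, 0.0, 0.0],
          [0.0, 1.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.0],
          [0.0, 0.0, 0.0, 1.0]]
target = [0.25, 0.25, 0.25, 0.25]
weights, x = sparse_convex_approx(points, target, num_iters=50)
err = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, target)))
```

Each iteration adds at most one new point to the combination, so the number of iterations directly bounds the sparsity of the result. Under the standard Frank-Wolfe analysis the \(\ell_2\) error shrinks on the order of \(1/\sqrt{k}\) after \(k\) iterations, which is why an \(\varepsilon\)-approximation needs on the order of \(1/\varepsilon^2\) points for inputs of bounded diameter.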

Algorithm, analysis, and open code with experimental results are provided.

Lang, Baykal, and Rus thank NSF 1723943, NSF 1526815, and The Boeing Company. This research was supported by Grant No. 2014627 from the United States-Israel Binational Science Foundation (BSF) and by Grant No. 1526815 from the United States National Science Foundation (NSF). Dan Feldman is grateful for the support of the Simons Foundation for part of this work that was done while he was visiting the Simons Institute for the Theory of Computing.

H. Lang and C. Baykal contributed equally to this work.

Author information

Authors and Affiliations

MIT CSAIL, Cambridge, USA
Harry Lang, Cenk Baykal & Daniela Rus
Computer Science Department, University of Haifa, Haifa, Israel
Najib Abu Samra, Tony Tannous & Dan Feldman

Corresponding author

Correspondence to Cenk Baykal.

Editor information

Editors and Affiliations

Anna University, Chennai, India
T.V. Gopal
Waseda University, Kitakyushu, Japan
Junzo Watada

About this paper

Cite this paper

Lang, H., Baykal, C., Samra, N.A., Tannous, T., Feldman, D., Rus, D. (2019). Deterministic Coresets for Stochastic Matrices with Applications to Scalable Sparse PageRank. In: Gopal, T., Watada, J. (eds) Theory and Applications of Models of Computation. TAMC 2019. Lecture Notes in Computer Science, vol 11436. Springer, Cham. https://doi.org/10.1007/978-3-030-14812-6_25

DOI: https://doi.org/10.1007/978-3-030-14812-6_25
Published: 06 March 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-14811-9
Online ISBN: 978-3-030-14812-6
eBook Packages: Computer Science, Computer Science (R0)
