Web Page Rank Prediction with PCA and EM Clustering

  • Conference paper
Algorithms and Models for the Web-Graph (WAW 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5427)

Abstract

In this paper we describe learning algorithms for Web page rank prediction. We consider linear regression models and combinations of regression with probabilistic clustering and Principal Components Analysis (PCA). These models are learned from time-series data sets and can predict the ranking of a set of Web pages at some future time. The first algorithm uses separate linear regression models. This is further extended by applying probabilistic clustering based on the EM algorithm. Clustering allows the Web pages to be grouped together by fitting a mixture of regression models. A different method combines linear regression with PCA so that dependencies between different Web pages can be exploited. All the methods are evaluated using real data sets obtained from the Internet Archive, Wikipedia and Yahoo! ranking lists. We also study the temporal robustness of the prediction framework. Overall, the system constitutes a set of tools for high-accuracy page rank prediction which can be used for efficient resource management by search engines.
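
The first algorithm in the abstract, separate linear regression models learned from time-series data, can be illustrated with a minimal sketch. This is not the authors' implementation: the order-1 autoregressive form (next score predicted from the current score), the function names `fit_ar1` and `predict_next`, and the toy score histories are all assumptions made for illustration. The abstract does not specify the features or the regression order.

```python
def fit_ar1(series):
    """Fit series[t+1] ~ a * series[t] + b by ordinary least squares.

    Assumed model form (order-1 autoregression); the paper may use
    a longer history window or different inputs.
    """
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # Constant history: fall back to predicting the mean.
        return 0.0, mean_y
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def predict_next(series, model):
    """Predict the score one time step after the end of `series`."""
    a, b = model
    return a * series[-1] + b


# Made-up rank scores (higher is better) for three hypothetical pages.
histories = {
    "page_a": [0.10, 0.20, 0.30, 0.40, 0.50],  # steadily rising
    "page_b": [0.90, 0.80, 0.70, 0.60, 0.50],  # steadily falling
    "page_c": [0.45, 0.45, 0.45, 0.45, 0.45],  # flat
}

# One separate regression model per page.
models = {page: fit_ar1(h) for page, h in histories.items()}
predicted = {page: predict_next(histories[page], models[page]) for page in histories}

# Predicted ranking at the next time step, best page first.
predicted_ranking = sorted(predicted, key=predicted.get, reverse=True)
print(predicted_ranking)
```

Fitting each page independently ignores any relationship between pages. That limitation is what the clustering and PCA extensions described in the abstract are meant to address.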

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zacharouli, P., Titsias, M., Vazirgiannis, M. (2009). Web Page Rank Prediction with PCA and EM Clustering. In: Avrachenkov, K., Donato, D., Litvak, N. (eds) Algorithms and Models for the Web-Graph. WAW 2009. Lecture Notes in Computer Science, vol 5427. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-95995-3_9

  • DOI: https://doi.org/10.1007/978-3-540-95995-3_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-95994-6

  • Online ISBN: 978-3-540-95995-3

  • eBook Packages: Computer Science, Computer Science (R0)
