Web Page Rank Prediction with PCA and EM Clustering

  • Polyxeni Zacharouli
  • Michalis Titsias
  • Michalis Vazirgiannis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5427)


In this paper we describe learning algorithms for Web page rank prediction. We consider linear regression models and combinations of regression with probabilistic clustering and Principal Components Analysis (PCA). These models are learned from time-series data sets and can predict the ranking of a set of Web pages at a future time. The first algorithm uses separate linear regression models. This is further extended by applying probabilistic clustering based on the EM algorithm. Clustering groups the Web pages together by fitting a mixture of regression models. A different method combines linear regression with PCA so that dependencies between different Web pages can be exploited. All the methods are evaluated using real data sets obtained from the Internet Archive, Wikipedia and Yahoo! ranking lists. We also study the temporal robustness of the prediction framework. Overall, the system constitutes a set of tools for high-accuracy PageRank prediction which can be used for efficient resource management by search engines.
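
As a rough illustration of the first algorithm (a separate linear regression model learned from a page's time series), the following minimal sketch fits a least-squares autoregressive model to one page's past rank scores and predicts the next value. The abstract does not specify the feature construction, so the window length `order`, the bias term, and the function names `fit_page_regression` and `predict_next` are assumptions made for this example, not the authors' implementation.

```python
import numpy as np

def fit_page_regression(series, order=1):
    """Fit a linear model predicting series[t] from the `order` previous values.

    Assumption for this sketch: an autoregressive design with a bias term.
    Returns the weight vector, with the bias as its last entry.
    """
    series = np.asarray(series, dtype=float)
    # Each row holds `order` consecutive past values; the target is the next value.
    X = np.array([series[t - order:t] for t in range(order, len(series))])
    y = series[order:]
    X = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares
    return w

def predict_next(series, w):
    """Predict the value following `series` using the fitted weights `w`."""
    order = len(w) - 1
    last = np.asarray(series[-order:], dtype=float)
    return float(last @ w[:-1] + w[-1])

# Hypothetical rank-score history for a single page (illustrative numbers only).
history = [0.0, 1.0, 1.5, 1.75, 1.875, 1.9375]
weights = fit_page_regression(history, order=1)
prediction = predict_next(history, weights)
```

Fitting one such model per page treats pages independently; the clustering and PCA variants described above are the paper's ways of sharing information across pages, and are not covered by this sketch.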


Linear Regression Model · Probabilistic Cluster · Internet Archive · Query Load · PageRank Score





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Polyxeni Zacharouli (1)
  • Michalis Titsias (2)
  • Michalis Vazirgiannis (1)
  1. Univ. of Economics and Business, Athens, Greece
  2. School of Computer Science, University of Manchester, UK
