Web Page Rank Prediction with PCA and EM Clustering

  • Polyxeni Zacharouli
  • Michalis Titsias
  • Michalis Vazirgiannis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5427)

Abstract

In this paper we describe learning algorithms for Web page rank prediction. We consider linear regression models and combinations of regression with probabilistic clustering and Principal Components Analysis (PCA). These models are learned from time-series data sets and can predict the ranking of a set of Web pages at some future time. The first algorithm uses separate linear regression models. This is further extended by applying probabilistic clustering based on the EM algorithm. Clustering allows the Web pages to be grouped together by fitting a mixture of regression models. A different method combines linear regression with PCA so that dependencies between different Web pages can be exploited. All the methods are evaluated using real data sets obtained from the Internet Archive, Wikipedia and Yahoo! ranking lists. We also study the temporal robustness of the prediction framework. Overall, the system constitutes a set of tools for high-accuracy page rank prediction which can be used for efficient resource management by search engines.
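As a rough illustration of the regression baseline described in the abstract, the sketch below fits a least-squares predictor of a page's next rank from its most recent ranks. The abstract does not give the model's exact form, so everything here is an assumption: the autoregressive feature layout, the pooling of all pages into one model, and the names `fit_rank_regressor`, `predict_next_ranks` and `order` are hypothetical and are not taken from the paper.

```python
import numpy as np


def fit_rank_regressor(ranks, order):
    """Fit one least-squares model that predicts a rank from the previous `order` ranks.

    ranks: array of shape (n_pages, n_snapshots), one rank time series per page.
    Assumption: all pages are pooled into a single model for illustration.
    """
    ranks = np.asarray(ranks, dtype=float)
    n_pages, n_snapshots = ranks.shape
    features, targets = [], []
    # Slide a window of length `order` over every page's series.
    for t in range(order, n_snapshots):
        features.append(ranks[:, t - order:t])
        targets.append(ranks[:, t])
    X = np.vstack(features)
    y = np.concatenate(targets)
    # Append a constant column so the model has an intercept term.
    X = np.hstack([X, np.ones((X.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights


def predict_next_ranks(ranks, weights):
    """Predict the rank of every page at the next (unseen) snapshot."""
    ranks = np.asarray(ranks, dtype=float)
    order = weights.shape[0] - 1
    X = np.hstack([ranks[:, -order:], np.ones((ranks.shape[0], 1))])
    return X @ weights


# Toy data: three pages whose ranks follow exact linear trends.
toy_ranks = np.array([
    [0.0, 1.0, 2.0, 3.0, 4.0],
    [10.0, 8.0, 6.0, 4.0, 2.0],
    [5.0, 5.0, 5.0, 5.0, 5.0],
])
toy_weights = fit_rank_regressor(toy_ranks, order=2)
toy_prediction = predict_next_ranks(toy_ranks, toy_weights)
```

The clustering and PCA variants mentioned in the abstract would replace the single pooled model with a mixture of such regressors, or with a regression on projected rank trajectories; neither is shown here.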

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Polyxeni Zacharouli (1)
  • Michalis Titsias (2)
  • Michalis Vazirgiannis (1)

  1. Univ. of Economics and Business, Athens, Greece
  2. School of Computer Science, University of Manchester, UK
