Information Retrieval, Volume 8, Issue 2, pp 245–264

Rank-Stability and Rank-Similarity of Link-Based Web Ranking Algorithms in Authority-Connected Graphs

Article

Abstract

Web search algorithms that rank Web pages by examining the link structure of the Web are attractive from both theoretical and practical standpoints. Today's prevailing link-based ranking algorithms rank Web pages by using the dominant eigenvector of certain matrices, such as the co-citation matrix or variations thereof. Recent analyses of ranking algorithms have focused attention on the case where the corresponding matrices are irreducible, thus avoiding singularities of reducible matrices. Consequently, rank analysis has concentrated on authority-connected graphs, which are graphs whose co-citation matrix is irreducible (after deleting zero rows and columns). Such graphs conceptually correspond to thematically related collections, in which most pages pertain to a single, dominant topic of interest.
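To make the eigenvector-based ranking concrete, the following is a minimal sketch, not taken from the paper, of ranking pages by the dominant eigenvector of the co-citation matrix C = AᵀA, where A is the adjacency matrix of the link graph. The function names, the four-page graph, and the use of plain power iteration are assumptions chosen for illustration.

```python
def cocitation_matrix(adj):
    """Return C = A^T A, where adj[i][j] == 1 iff page i links to page j."""
    n = len(adj)
    return [[sum(adj[k][i] * adj[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dominant_eigenvector(matrix, iterations=200):
    """Power iteration from the uniform vector, normalised to sum to 1."""
    n = len(matrix)
    v = [1.0 / n] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        if total == 0:  # all-zero matrix: nothing to rank
            return v
        v = [x / total for x in w]
    return v


# Hypothetical graph: pages 0 and 1 act as hubs, pages 2 and 3 as authorities.
adj = [
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
scores = dominant_eigenvector(cocitation_matrix(adj))
ranking = sorted(range(len(adj)), key=lambda p: -scores[p])
```

In this toy graph, deleting the zero rows and columns of C leaves a 2×2 matrix with all entries positive, so the graph is authority-connected in the sense described above, and page 2 (cited by both hubs) ranks above page 3.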

A link-based search algorithm A is rank-stable if minor changes in the link structure of the input graph, which is usually a subgraph of the Web, do not affect the ranking it produces; two algorithms A and B are rank-similar if they produce similar rankings. These concepts were introduced and studied recently for various existing search algorithms.
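One way to compare two rankings numerically is to count the page pairs that they order in opposite ways. The sketch below is an illustrative Kendall-tau-style measure; the abstract does not state which distance the paper uses, so the function name `rank_disagreement` and the sample scores are assumptions. The same measure could be applied to one algorithm on a graph before and after a small perturbation (stability) or to two algorithms on the same graph (similarity).

```python
from itertools import combinations


def rank_disagreement(scores_a, scores_b):
    """Fraction of page pairs that the two score vectors order in opposite ways."""
    pairs = list(combinations(range(len(scores_a)), 2))
    if not pairs:
        return 0.0
    discordant = sum(
        1 for i, j in pairs
        if (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j]) < 0
    )
    return discordant / len(pairs)


# Hypothetical scores for the same four pages under two algorithms.
scores_a = [0.4, 0.3, 0.2, 0.1]
scores_b = [0.3, 0.4, 0.2, 0.1]
distance = rank_disagreement(scores_a, scores_b)
```

Here only the pair of pages 0 and 1 is swapped, so one of the six pairs is discordant. Tied pairs are not counted as disagreements in this sketch.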

This paper studies the rank-stability and rank-similarity of three link-based ranking algorithms (PageRank, HITS and SALSA) in authority-connected graphs. For this class of graphs, we show that neither HITS nor PageRank is rank-stable. We then show that HITS and PageRank are not rank-similar on this class, nor is either of them rank-similar to SALSA.

Keywords

Web IR citation link analysis 

Copyright information

© Springer Science + Business Media, Inc. 2005

Authors and Affiliations

  1. IBM Research Lab, Haifa, Israel
  2. Department of Computer Science, Technion, Haifa, Israel
