Abstract
Web search algorithms that rank Web pages by examining the link structure of the Web are attractive from both theoretical and practical aspects. Today’s prevailing link-based ranking algorithms rank Web pages by using the dominant eigenvector of certain matrices—like the co-citation matrix or variations thereof. Recent analyses of ranking algorithms have focused attention on the case where the corresponding matrices are irreducible, thus avoiding singularities of reducible matrices. Consequently, rank analysis has been concentrated on authority connected graphs, which are graphs whose co-citation matrix is irreducible (after deleting zero rows and columns). Such graphs conceptually correspond to thematically related collections, in which most pages pertain to a single, dominant topic of interest.
A link-based search algorithm A is rank-stable if minor changes in the link structure of the input graph, which is usually a subgraph of the Web, do not affect the ranking it produces; algorithms A,B are rank-similar if they produce similar rankings. These concepts were introduced and studied recently for various existing search algorithms.
This paper studies the rank-stability and rank-similarity of three link-based ranking algorithms (PageRank, HITS and SALSA) in authority connected graphs. For this class of graphs, we show that neither HITS nor PageRank is rank-stable. We then show that HITS and PageRank are not rank-similar on this class, and that neither of them is rank-similar to SALSA.
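The notions used in the abstract can be illustrated with a small sketch. It builds the co-citation matrix of a toy graph, checks whether the graph is authority connected, ranks pages by the dominant eigenvector of that matrix (the HITS authority weights, obtained here by power iteration), and compares two rankings. The function names, the toy edge lists, and the pairwise-disagreement measure are assumptions made for illustration; the abstract does not state the paper's formal definitions of rank-stability or rank-similarity, so this should not be read as the authors' method.

```python
from itertools import combinations


def cocitation(edges, n):
    """Co-citation matrix C = A^T A: C[i][j] counts pages that link to both i and j."""
    inlinks = [set() for _ in range(n)]
    for src, dst in edges:
        inlinks[dst].add(src)
    return [[len(inlinks[i] & inlinks[j]) for j in range(n)] for i in range(n)]


def is_authority_connected(C):
    """True if C is irreducible after deleting zero rows and columns.

    C is symmetric and nonnegative, so this is equivalent to its support
    graph being connected once pages without inlinks are dropped.
    """
    nodes = [i for i in range(len(C)) if C[i][i] > 0]  # C[i][i] is the indegree of i
    if not nodes:
        return False
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        i = stack.pop()
        for j in nodes:
            if C[i][j] > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return len(seen) == len(nodes)


def hits_authorities(C, iters=200):
    """Power iteration toward the dominant eigenvector of C, normalized to sum to 1."""
    n = len(C)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(w) or 1.0
        v = [x / total for x in w]
    return v


def ranking(scores):
    """Page indices sorted by descending score; ties are broken by index."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))


def pairwise_disagreement(r1, r2):
    """Fraction of page pairs ordered differently by two rankings (assumed measure)."""
    pos1 = {p: k for k, p in enumerate(r1)}
    pos2 = {p: k for k, p in enumerate(r2)}
    pairs = list(combinations(r1, 2))
    flipped = sum((pos1[a] < pos1[b]) != (pos2[a] < pos2[b]) for a, b in pairs)
    return flipped / len(pairs)


# Hypothetical toy graph: pages 0 and 1 act as hubs, pages 2, 3 and 4 as authorities.
base_edges = [(0, 2), (0, 3), (1, 3), (1, 4)]
perturbed_edges = base_edges + [(0, 4)]  # a single added link

base_C = cocitation(base_edges, 5)
base_scores = hits_authorities(base_C)
base_rank = ranking(base_scores)
perturbed_rank = ranking(hits_authorities(cocitation(perturbed_edges, 5)))

print(is_authority_connected(base_C))
print(base_rank, perturbed_rank)
print(pairwise_disagreement(base_rank, perturbed_rank))
```

Comparing the rankings computed before and after a small change to the edge list gives an informal feel for the stability question; the same comparison applied to the outputs of two different algorithms on one graph corresponds to the similarity question.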
Additional information
This research was supported by the Fund for the Promotion of Research at the Technion, and by the Barnard Elkin Chair in Computer Science.
About this article
Cite this article
Lempel, R., Moran, S. Rank-Stability and Rank-Similarity of Link-Based Web Ranking Algorithms in Authority-Connected Graphs. Inf Retrieval 8, 245–264 (2005). https://doi.org/10.1007/s10791-005-5661-0
DOI: https://doi.org/10.1007/s10791-005-5661-0