Effective Filtering for Collaborative Publishing

  • Arindam Chakrabarti
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3828)


In little more than a decade, the World Wide Web has established itself as a medium of interaction, communication, content delivery, and collaboration, opening opportunities on a scale unprecedented in human history. At the same time, information overload, caused by the democratization of content creation and delivery, remains a major problem. In this paper, we postulate that the problems of democracy are solved by democracy itself: harnessing the people power of the World Wide Web through collaborative filtering of content is the natural solution to the information overload problem. We present approaches to promote such collaboration.

We show that the standard PageRank algorithm is becoming less effective in this increasingly democratized world of online content. PageRank was inspired by the effectiveness of citation-structure analysis (“all links are good, and the more the better”) in estimating the relative importance of articles in the scientific literature. As long as uniformly edited content produced by media companies and other corporate entities dominated the web, its topology remained sufficiently similar to that of the scientific literature. The explosion of unedited blogs, discussion fora, and wikis, with their “messier” hyperlink structure, is rapidly reducing this similarity, and with it the effectiveness of standard PageRank-based filtering methods.
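For reference, the standard PageRank computation discussed here can be sketched as a power iteration. This is a minimal illustration, not code from the paper: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the uniform redistribution of rank from pages without outgoing links are all assumptions chosen for brevity.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to
           (illustrative input format, assumed for this sketch).
    Returns a dict mapping each page to its rank; ranks sum to 1.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # Every outgoing link counts as an equal endorsement.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumption: a page with no links spreads its rank uniformly.
                share = damping * rank[p] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

In this toy graph, page `c` is linked from both `a` and `b` and therefore ends up with the highest rank. Every link counts in its target's favour, whatever the linking author intended, which is the property the abstract questions.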

We assume a slightly modified Web infrastructure in which links have positive and negative weights, and show that this enables radically different and more effective approaches to page ranking and collaborative content filtering. The result is a much improved environment for incentivizing content creation and co-operation on the World Wide Web, helping to realize, in essence, a far more efficient information economy in today’s online global village.
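The abstract does not spell out how signed link weights feed into ranking, so the following is only a hypothetical sketch of the data model: links carried as `(source, target, weight)` triples, with a simple per-page sum of incoming weights. The function name `net_endorsement`, the weight range of [-1, 1], and the summation itself are assumptions made for illustration, not the ranking method of the paper.

```python
from collections import defaultdict


def net_endorsement(signed_links):
    """Sum the signed weights of the links pointing at each page.

    signed_links: iterable of (source, target, weight) triples, where a
    positive weight is an endorsement and a negative weight a disapproval.
    The [-1, 1] range is an assumption of this sketch.
    """
    score = defaultdict(float)
    for source, target, weight in signed_links:
        if not -1.0 <= weight <= 1.0:
            raise ValueError(f"weight out of range for {source} -> {target}")
        score[target] += weight
    return dict(score)


example_links = [
    ("blog", "spam-page", -1.0),
    ("wiki", "spam-page", -0.5),
    ("blog", "paper", 1.0),
    ("wiki", "paper", 0.5),
]
scores = net_endorsement(example_links)
```

Here both pages receive two incoming links, so a purely count-based view cannot tell them apart, while the signed sums separate the endorsed page from the criticized one.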


Keywords: Discussion Forum, Online Content, Content Creation, Corporate Entity, PageRank Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Arindam Chakrabarti (1)
  1. Computer Science Division, Department of Electrical Engineering and Computer Sciences, University of California at Berkeley
