Social Network Analysis

  • Bing Liu
Chapter
Part of the Data-Centric Systems and Applications book series (DCSA)

Abstract

Early search engines retrieved relevant pages for the user based primarily on the content similarity between the user query and the pages in the search engine's index. The retrieval and ranking algorithms were simply direct implementations of those from information retrieval. Starting in 1996, it became clear that content similarity alone was no longer sufficient for search, for two reasons. First, the number of Web pages grew rapidly during the middle to late 1990s. Given any query, the number of relevant pages can be huge. For example, given the search query “classification technique”, the Google search engine estimates that there are about 10 million relevant pages. This abundance of information causes a major problem for ranking, i.e., how to choose only 30–40 pages and rank them suitably to present to the user. Second, content similarity methods are easily spammed. A page owner can repeat some important words and add many remotely related words to his/her pages to boost the rankings of the pages and/or to make the pages relevant to a large number of possible queries.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  1. Department of Computer Science, University of Illinois, Chicago, Chicago, USA