
CUVIM: Extracting Fresh Information from Social Network

  • Rui Guo
  • Hongzhi Wang
  • Kaiyu Li
  • Jianzhong Li
  • Hong Gao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7923)

Abstract

Social networks record the lives of their users and offer great potential for journalists, sociologists and business analysts. Crawling data from a social network is a basic step in social network information analysis and processing. As networks grow huge and their information updates faster than ordinary web pages, crawling becomes more difficult because of limits on bandwidth, politeness etiquette and computation power. To extract fresh information from social networks efficiently and effectively, this paper presents a novel crawling method for social networks. To discover the features of social networks, we gather data from a real social network, analyze it and build a model that describes the patterns in users’ behavior. Based on the modeled behavior, we propose methods to predict users’ behavior. According to the predictions, we schedule our crawler more reasonably and extract more fresh information. Experimental results demonstrate that our strategies can obtain information from SNS efficiently and effectively.
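
The idea of scheduling a crawler from predicted user behavior can be illustrated with a minimal sketch. This is not the paper's model: it assumes, purely for illustration, that each user posts according to a Poisson process with a known rate, and it ranks users by the probability that they have posted since the last crawl. The function names, the user table and all numbers are hypothetical.

```python
import heapq
import math


def update_probability(rate_per_hour, hours_since_crawl):
    """Probability of at least one new post, under an ASSUMED Poisson model."""
    return 1.0 - math.exp(-rate_per_hour * hours_since_crawl)


def pick_users_to_crawl(users, now, budget):
    """Return the `budget` user ids most likely to have fresh content.

    `users` maps a user id to (estimated posts per hour, last crawl time in hours).
    The crawl budget stands in for bandwidth and politeness limits.
    """
    scored = (
        (update_probability(rate, now - last_crawl), uid)
        for uid, (rate, last_crawl) in users.items()
    )
    # nlargest returns the highest-scoring entries in descending order.
    return [uid for _, uid in heapq.nlargest(budget, scored)]


# Hypothetical users: (posts per hour, time of last crawl in hours).
users = {
    "alice": (2.0, 9.0),
    "bob": (0.1, 9.0),
    "carol": (0.5, 4.0),
}
selected = pick_users_to_crawl(users, now=10.0, budget=2)
print(selected)
```

With these made-up numbers, a slow poster who has not been crawled for a long time can outrank a fast poster who was crawled recently, which is the kind of trade-off a freshness-oriented scheduler has to weigh.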

Keywords

Social Network, Crawler, Freshness



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Rui Guo (1)
  • Hongzhi Wang (1)
  • Kaiyu Li (1)
  • Jianzhong Li (1)
  • Hong Gao (1)

  1. Harbin Institute of Technology, China
