SocWeb: Efficient Monitoring of Social Network Activities

  • Fotis Psallidas
  • Alexandros Ntoulas
  • Alex Delis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8181)


Although the extraction of facts and aggregated information from individual Online Social Networks (OSNs) has been extensively studied in the last few years, the examination of content across social media has received limited attention. Such content examination involving multiple OSNs gains significance as a way to either verify as-yet-unconfirmed evidence or expand our understanding of occurring events. Driven by the emerging requirement that future applications engage multiple sources, we present the architecture of a distributed crawler which harnesses information from multiple OSNs. We demonstrate that contemporary OSNs feature similar, if not identical, baseline structures. To this end, we propose an extensible model termed SocWeb that articulates the essential structural elements of OSNs in wide use today. To accurately capture features required for cross-social media analyses, SocWeb exploits intra-connections and forms an “amalgamated” OSN. We introduce a flexible API that enables applications to effectively communicate with designated OSN providers and discuss key design choices for our distributed crawler. Our approach helps attain diverse qualitative and quantitative performance criteria including freshness of facts, scalability, quality of fetched data and robustness. We report on a cross-social media analysis compiled using our extensible SocWeb-based crawler in the presence of Facebook and YouTube.
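The idea of a shared baseline structure can be illustrated with a minimal sketch. It is not the SocWeb model or API itself, which the abstract does not specify; the class names, the object types ("user", "post", "video") and the link types ("authored", "shares") are all assumptions chosen for illustration. The sketch represents objects from different providers as typed nodes, connects them with typed links, and reports the links that span providers, which is one plausible reading of how an “amalgamated” OSN could be formed.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """One object in some OSN (all field values here are hypothetical)."""
    osn: str          # provider name, e.g. "facebook" or "youtube"
    object_type: str  # assumed object type, e.g. "user", "post", "video"
    object_id: str    # identifier local to the provider


@dataclass
class AmalgamatedGraph:
    """Typed, directed links between objects of one or more OSNs."""
    # Maps each node to the set of (link_type, target) pairs leaving it.
    edges: dict = field(default_factory=dict)

    def add_link(self, source, link_type, target):
        self.edges.setdefault(source, set()).add((link_type, target))
        self.edges.setdefault(target, set())  # register the target as a node too

    def neighbours(self, node, link_type=None):
        """Targets reachable from node, optionally filtered by link type."""
        targets = (
            target
            for kind, target in self.edges.get(node, ())
            if link_type is None or kind == link_type
        )
        return sorted(targets, key=lambda n: (n.osn, n.object_type, n.object_id))

    def cross_osn_links(self):
        """Links whose endpoints belong to different providers."""
        return [
            (source, kind, target)
            for source, links in self.edges.items()
            for kind, target in links
            if source.osn != target.osn
        ]
```

A short usage example under the same assumptions: a Facebook user authors a post, and that post shares a YouTube video, so the only cross-provider link is the one from the post to the video.

```python
graph = AmalgamatedGraph()
user = Node("facebook", "user", "u1")
post = Node("facebook", "post", "p1")
video = Node("youtube", "video", "v1")
graph.add_link(user, "authored", post)
graph.add_link(post, "shares", video)
cross = graph.cross_osn_links()  # one entry: (post, "shares", video)
```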


Social Network; Application Program Interface; Online Social Network; Object Type; Link Type
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Fotis Psallidas (1)
  • Alexandros Ntoulas (2, 3)
  • Alex Delis (3)
  1. Columbia University, New York, USA
  2. Zynga, San Francisco, USA
  3. Univ. of Athens, Athens, Greece
