Challenges in the Analysis of Online Social Networks: A Data Collection Tool Perspective


The internet era has radically changed the way people communicate with each other. Online Social Network platforms have extended this to real-time communication, where interactions range from casual relationships to formal bonds. This real-time communication between users generates data that, directly or indirectly, carries a great deal of information. Extracting this data and mining information from it, however, is a profound challenge. Researchers need appropriate tools to collect this data and to derive valuable information by analyzing and visualizing it. This paper presents a comprehensive survey of the types of Online Social Network Analysis and groups the research challenges associated with each type. It then examines the existing data collection tools and analysis techniques in detail to understand the challenges a researcher faces while using them. Finally, a mapping analysis relates the research challenges, the data collection tools and the types of Online Social Network Analysis, to determine the extent to which the existing tools and techniques can meet the research challenges. The mapping analysis shows a clear need for new data collection tools and algorithms for researchers and developers.




  1. 1.

    Chen, Z., Kalashnikov, D. V., & Mehrotra, S. (2009, June). Exploiting context analysis for combining multiple entity resolution systems. In Proceedings of the 2009 ACM SIGMOD international conference on management of data (pp. 207–218). ACM.

  2. 2.

    Statista, Accessed December, 2015.

  3. 3.

    Wassaerman, S., & Faust, K. (1994). Social network analysis in the social and behavioural sciences. In Social network analysis: Methods and applications. Cambridge: Cambridge University Press.

  4. 4.

    Mislove, A., Marcon, M., Gummadi, K. P., Druschel, P., & Bhattacharjee, B. (2007, October). Measurement and analysis of online social networks. In Proceedings of the 7th ACM SIGCOMM conference on Internet measurement (pp. 29–42). ACM.

  5. 5.

    Flake, G. W., Lawrence, S., Giles, C. L., & Coetzee, F. M. (2002). Self-organization and identification of web communities. Computer, 35(3), 66–70.

    Google Scholar 

  6. 6.

    Flake, G. W., Tarjan, R. E., & Tsioutsiouliklis, K. (2004). Graph clustering and minimum cut trees. Internet Mathematics, 1(4), 385–408.

    MathSciNet  MATH  Google Scholar 

  7. 7.

    Girvan, M., & Newman, M. E. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12), 7821–7826.

    MathSciNet  MATH  Google Scholar 

  8. 8.

    Hopcroft, J., Khan, O., Kulis, B., & Selman, B. (2003, August). Natural communities in large linked networks. In Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 541–546). ACM.

  9. 9.

    Newman, M. E. (2004). Detecting community structure in networks. The European Physical Journal B-Condensed Matter and Complex Systems, 38(2), 321–330.

    Google Scholar 

  10. 10.

    Kaplan, A. M., & Haenlein, M. (2010). Users of the world, unite! The challenges and opportunities of Social Media. Business Horizons, 53(1), 59–68.

    Google Scholar 

  11. 11.

    Site of SEO Company, SEO Positive, Accessed December 31, 2014.

  12. 12.

    Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1–2), 1–135.

    Google Scholar 

  13. 13.

    Asur, S., & Huberman, B. (2010). Predicting the future with social network. In Web intelligence and intelligent agent technology (WIIAT), 2010 IEEE/WIC/ACM international conference on (Vol. 1).

  14. 14.

    Bakshy, E., Hofman, J. M., Mason, W. A., & Watts, D. J. (2011, February). Identifying influencers on twitter. In Fourth ACM international conference on web search and data mining (WSDM).

  15. 15.

    Wen-ying, S. C., Hunt, Y. M., Beckjord, E. B., Moser, R. P., & Hesse, B. W. (2009). Social media use in the United States: Implications for health communication. Journal of Medical Internet Research, 11(4), e48.

    Google Scholar 

  16. 16.

    Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1), 1–167.

    Google Scholar 

  17. 17.

    Shin, D. H., & Shin, Y. J. (2011). Why do people play social network games? Computers in Human Behavior, 27(2), 852–861.

    Google Scholar 

  18. 18.

    Blogger, Accessed December, 2015.

  19. 19., Accessed December, 2015.

  20. 20.

    Facebook, Accessed December, 2015.

  21. 21.

    Twitter, Accessed December, 2015.

  22. 22.

    LinkedIn, Accessed December, 2015.

  23. 23.

    YouTube, Accessed December, 2015.

  24. 24.

    Flikr, Accessed December, 2015.

  25. 25.

    Podcast Alley, Accessed December, 2015.

  26. 26.

    Digg, Accessed December, 2015.

  27. 27.

    Foursquare, Accessed December, 2015.

  28. 28.

    Google Groups, Accessed December 2015.

  29. 29.

    Yang, T. A., Kim, D. J., & Dhalwani, V. (2008). Social networking as a new trend in e-marketing. In Research and practical issues of enterprise information systems II (pp. 847–856). Springer US.

  30. 30.

    Karimzadehgan, M., Agrawal, M., & Zhai, C. (2009). Towards advertising on social networks. Information Retrieval and Advertising (IRA-2009), 28.

  31. 31.

    Huberman, B. A., Romero, D. M., & Wu, F. (2008). Social networks that matter: Twitter under the microscope. Available at SSRN 1313405.

  32. 32.

    Christakis, N. A., & Fowler, J. H. (2007). The spread of obesity in a large social network over 32 years. New England Journal of Medicine, 357(4), 370–379.

    Google Scholar 

  33. 33.

    Tracy, E. M., Kim, H., Brown, S., Min, M. O., Jun, M., & McCarty, C. (2012). Substance abuse treatment stage and personal social networks among women in substance abuse treatment. Journal of the Society for Social Work and Research, 3(2), 65–79.

    Google Scholar 

  34. 34.

    Wipfli, H. L., Fujimoto, K., & Valente, T. W. (2010). Global tobacco control diffusion: the case of the framework convention on tobacco control. American Journal of Public Health, 100(7), 1260–1266.

    Google Scholar 

  35. 35.

    Perliger, A., & Pedahzur, A. (2011). Social network analysis in the study of terrorism and political violence. PS: Political Science & Politics, 44(01), 45–50.

    Google Scholar 

  36. 36.

    Hewitt, A., & Forte, A. (2006). Crossing boundaries: Identity management and student/faculty relationships on the Facebook. Poster presented at CSCW, Banff, Alberta, 1–2.

  37. 37.

    Sjolander, C., & Ahlstrom, G. (2012). The meaning and validation of social support networks for close family of persons with advanced cancer. BMC Nursing, 11(1), 1.

    Google Scholar 

  38. 38.

    Dall’Asta, L., Marsili, M., & Pin, P. (2012). Collaboration in social networks. Proceedings of the National Academy of Sciences, 109(12), 4395–4400.

    Google Scholar 

  39. 39.

    Diesner, J., & Carley, K. M. (2005, April). Exploration of communication networks from the enron email corpus. In SIAM International Conference on Data Mining: Workshop on Link Analysis, Counterterrorism and Security, Newport Beach, CA.

  40. 40.

    Zuber, M. (2014). A survey of data mining techniques for social network analysis. International Journal of Research in Computer Engineering & Electronics, 3(6), 1–8.

    Google Scholar 

  41. 41.

    Shin, H., Byun, C., & Lee, H. (2015). The influence of social media: Twitter usage pattern during the 2014 super bowl game. Life, 10(3), 109–118.

    Google Scholar 

  42. 42.

    Ruhela, A., Tripathy, R. M., Triukose, S., Ardon, S., Bagchi, A., & Seth, A. (2011, December). Towards the use of online social networks for efficient internet content distribution. In Advanced networks and telecommunication systems (ANTS), 2011 IEEE 5th international conference on (pp. 1–6). IEEE.

  43. 43.

    Guille, A., Hacid, H., Favre, C., & Zighed, D. A. (2013). Information diffusion in online social networks: A survey. ACM SIGMOD Record, 42(2), 17–28.

    Google Scholar 

  44. 44.

    Edward M. Lazzarin, An overview of analysis of online social networks. Accessed January, 2015.

  45. 45.

    Baldi, P., Frasconi, P., & Smyth, P. (2003). Modeling the internet and the web—probabilistic methods and algorithms. Chichester, West Sussex: Wiley.

  46. 46.

    Barabási, A. L., Albert, R., & Jeong, H. (1999). The diameter of the world wide web. Nature, 401(6749), 130–131.

    Google Scholar 

  47. 47.

    Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., et al. (2000). Graph structure in the web. Computer Networks, 33(1), 309–320.

    Google Scholar 

  48. 48.

    Krapivsky, P. L., Redner, S., & Leyvraz, F. (2000). Connectivity of growing random networks. Physical Review Letters, 85(21), 4629.

    Google Scholar 

  49. 49.

    Dorogovtsev, S. N., Mendes, J. F. F., & Samukhin, A. N. (2000). Structure of growing networks with preferential linking. Physical Review Letters, 85(21), 4633.

    Google Scholar 

  50. 50.

    Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications (Vol. 8). Cambridge: Cambridge University Press.

    Google Scholar 

  51. 51.

    Freeman, L. C. (1977). A set of measures of centrality based on betweenness. Sociometry, 40(1), 35–41.

    Google Scholar 

  52. 52.

    de Sola Pool, I., & Kochen, M. (1979). Contacts and influence. Social Networks, 1(1), 5–51.

    MathSciNet  Google Scholar 

  53. 53.

    Milgram, S. (1967). The small world problem. Psychology Today, 2(1), 60–67.

    Google Scholar 

  54. 54.

    Strogatz, S. H. (2001). Exploring complex networks. Nature, 410(6825), 268–276.

    MATH  Google Scholar 

  55. 55.

    Amaral, L. A. N., Scala, A., Barthelemy, M., & Stanley, H. E. (2000). Classes of small-world networks. Proceedings of the National Academy of Sciences, 97(21), 11149–11152.

    Google Scholar 

  56. 56.

    Leskovec, J., & Horvitz, E. (2008, April). Planetary-scale views on a large instant-messaging network. In Proceedings of the 17th international conference on World Wide Web (pp. 915–924). ACM.

  57. 57.

    Cha, M., Mislove, A., Adams, B., & Gummadi, K. P. (2008, August). Characterizing social cascades in flickr. In Proceedings of the first workshop on Online social networks (pp. 13–18). ACM.

  58. 58.

    Ediger, D., Jiang, K., Riedy, J., Bader, D. A., Corley, C., Farber, R., & Reynolds, W. N. (2010, September). Massive social network analysis: Mining twitter for social good. In Parallel Processing (ICPP), 2010 39th International Conference on (pp. 583–593). IEEE.

  59. 59.

    Weisstein, E. W. “Weakly Connected Component.” From MathWorld—A Wolfram Web Resource.

  60. 60.

    Myers, S. A., Sharma, A., Gupta, P., & Lin, J. (2014, April). Information network or social network? The structure of the twitter follow graph. In Proceedings of the companion publication of the 23rd international conference on World wide web companion (pp. 493–498). International World Wide Web Conferences Steering Committee.

  61. 61.

    Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of ‘small-world’networks. Nature, 393(6684), 440–442.

    MATH  Google Scholar 

  62. 62.

    Newman, M. E., Strogatz, S. H., & Watts, D. J. (2001). Random graphs with arbitrary degree distributions and their applications. Physical Review E, 64(2), 026118.

    Google Scholar 

  63. 63.

    Li, L., Alderson, D., Doyle, J. C., & Willinger, W. (2005). Towards a theory of scale-free graphs: Definition, properties, and implications. Internet Mathematics, 2(4), 431–523.

    MathSciNet  MATH  Google Scholar 

  64. 64.

    Garriss, S., Kaminsky, M., Freedman, M. J., Karp, B., Mazières, D., & Yu, H. (2006, May). RE: Reliable Email. In NSDI (Vol. 6, pp. 22–22).

  65. 65.

    Mislove, A., Gummadi, K. P., & Druschel, P. (2006, November). Exploiting social networks for internet search. In 5th Workshop on Hot Topics in Networks (HotNets06). Citeseer (p. 79).

  66. 66.

    Yu, H., Kaminsky, M., Gibbons, P. B., & Flaxman, A. (2006). Sybilguard: defending against sybil attacks via social networks. ACM SIGCOMM Computer Communication Review, 36(4), 267–278.

    Google Scholar 

  67. 67.

    Krishnamurthy, B. (2009, January). A measure of online social networks. In Communication systems and networks and workshops, 2009. COMSNETS 2009. First international (pp. 1–10). IEEE.

  68. 68.

    Golder, S. A., Wilkinson, D. M., & Huberman, B. A. (2007). Rhythms of social interaction: Messaging within a massive online network. In Communities and technologies 2007 (pp. 41–66). Springer London.

  69. 69.

    Some, R. (2013). A survey on social network analysis and its future trends. International Journal of Advanced Research in Computer and Communication Engineering, 2(6), 2403–2405.

    Google Scholar 

  70. 70.

    Ting, I. (2008, June). Web mining techniques for on-line social networks analysis. In Service Systems and Service Management, 2008 international conference on (pp. 1–5). IEEE.

  71. 71.

    Getoor, L., & Diehl, C. P. (2005). Link mining: A survey. ACM SIGKDD Explorations Newsletter, 7(2), 3–12.

    Google Scholar 

  72. 72.

    Zhang, M. (2009, January). Exploring adolescent peer relationships online and offline: an empirical and social network analysis. In Communications and mobile computing, 2009. CMC’09. WRI international conference on (Vol. 3, pp. 268–272). IEEE.

  73. 73.

    Zhu, M., Liu, W., Hu, W., & Fang, Z. (2009, December). Social Network Analysis in IT Company. In 2009 International conference on e-learning, E-business, enterprise information systems, and E-government (pp. 305–307). IEEE.

  74. 74.

    Yusof, N., & Rahman, A. A. (2009, November). Analyzing online asynchronous discussion using content and social network analysis. In Intelligent Systems Design and Applications, 2009. ISDA’09. Ninth International Conference on (pp. 872–877). IEEE.

  75. 75.

    Guber, T. (1993). A translational approach to portable ontologies. Knowledge Acquisition, 5(2), 199–229.

    Google Scholar 

  76. 76.

    Antoniou, G., & Van Harmelen, F. (2004). A semantic web primer. Cambridge: MIT Press.

    Google Scholar 

  77. 77.

    Wennerberg, P. O. (2005). Ontology based knowledge discovery in Social Networks. Final Report, JRC Joint Research Center, 1–34.

  78. 78. Social network analysis software and services for organizations and their consultants

  79. 79.

    Freeman, L. C. (2004). The development of social network analysis: A study in the sociology of science. Canada: Empirical Press.

  80. 80.

    Hoser, B., Hotho, A., Jäschke, R., Schmitz, C., & Stumme, G. (2006). Semantic network analysis of ontologies (pp. 514–529). Berlin, Heidelberg: Springer.

    Google Scholar 

  81. 81.

    Fox, S., Karnawat, K., Mydland, M., Dumais, S., & White, T. (2005). Evaluating implicit measures to improve web search. ACM Transactions on Information Systems (TOIS), 23(2), 147–168.

    Google Scholar 

  82. 82.

    Joachims, T. (2002, July). Optimizing search engines using clickthrough data. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 133–142). ACM.

  83. 83.

    Joachims, T., Granka, L., Pan, B., Hembrooke, H., & Gay, G. (2005, August). Accurately interpreting clickthrough data as implicit feedback. In Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 154–161). ACM.

  84. 84.

    Xue, G. R., Zeng, H. J., Chen, Z., Yu, Y., Ma, W. Y., Xi, W., & Fan, W. (2004, November). Optimizing web search using web click-through data. In Proceedings of the thirteenth ACM international conference on Information and knowledge management (pp. 118–126). ACM.

  85. 85.

    Cooley, R., Mobasher, B., & Srivastava, J. (1997, November). Web mining: Information and pattern discovery on the world wide web. In Tools with Artificial Intelligence, 1997. Proceedings., Ninth IEEE International Conference on (pp. 558–567). IEEE.

  86. 86.

    Hand, D. J., Mannila, H., & Smyth, P. (2001). Principles of data mining. Cambridge: MIT Press.

    Google Scholar 

  87. 87.

    Chakrabarti, S. (2003). Mining the web: Discovering knowledge from hypertext data. San Francisco: Morgan Kaufmann Publishers.

    Google Scholar 

  88. 88.

    Chen, H., & Chau, M. (2004). Web mining: Machine learning for web applications. Annual Review of Information Science and Technology (ARIST), 38, 289–329.

    Google Scholar 

  89. 89.

    Desikan, P., Srivastava, J., Kumar, V., Tan, P.N. (2002). Hyperlink Analysis: Techniques and Applications, Technical Report (TR 2002-0152), Army High Performance Computing Center.

  90. 90.

    Faca, F. M., & Lanzi, P. L. (2005). Mining interesting knowledge from weblogs: A survey. Data Knowledge Engineering, 53(3), 225–241.

    Google Scholar 

  91. 91.

    Pal, S., Talwar, V., & Mitra, P. (2002). Web mining in soft computing framework: Relevance, state of the art and future directions. IEEE Transactions on Neural Networks, 13(5), 1163–1177.

    Google Scholar 

  92. 92.

    Srivastava, J., Cooley, R., Deshpande, M., & Tan, P. (2000). Web usage mining: Discovery and applications of usage patterns from web data. SIGKDD Explorations, 1, 12–23.

    Google Scholar 

  93. 93.

    Garg, A. K., Amir, M., Jarrar Ahmed, M. S., & Bansal, S. (2014). Implementation of a Search Engine. International Journal of Science and Research (IJSR) ISSN (Online), 3(4), 2319–7064.

    Google Scholar 

  94. 94.

    Srivastava, J. “Web Mining: Accomplishments & Future Directions”, University of Minnesota USA,,

  95. 95.

    Mr. Dushyant B. Rathod, Dr. Samrat Khanna, “A Review on Emerging Trends of Web Mining and its Applications” ISSN: 2321-9939.

  96. 96.

    Sona, J. S., & Ambhaikar, A. (2014). A reconciling website system to enhance efficiency with web mining techniques. International Journal of Scientific and Engineering Research, 3(2), 498–500.

    Google Scholar 

  97. 97.

    Sandhya., Chaturvedi, M. (2013). A survey on web mining algorithms. The International Journal Of Engineering And Science (IIJES), 2(3), 25–30.

  98. 98.

    Zhang, Y., Yu, J. X., & Hou, J. (2005). Web communities: Analysis and construction. Berlin: Springer.

    Google Scholar 

  99. 99.

    Chen, F., Deng, P., Wan, J., Zhang, D., Vasilakos, A. V., & Rong, X. (2015). Data mining for the internet of things: Literature review and challenges. International Journal of Distributed Sensor Networks, 2015, 12.

    Google Scholar 

  100. 100.

    Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: A review. ACM Computing Surveys (CSUR), 31(3), 264–323.

    Google Scholar 

  101. 101.

    Tseng, B. L., Tatemura, J., & Wu, Y. (2005, May). Tomographic clustering to visualize blog communities as mountain views. In WWW 2005 Workshop on the weblogging ecosystem.

  102. 102.

    Godbole, N., Srinivasaiah, M., & Skiena, S. (2007). Large-scale sentiment analysis for news and blogs. ICWSM, 7(21), 219–222.

    Google Scholar 

  103. 103.

    Mika, P. (2005). Flink: Semantic web technology for the extraction and analysis of social networks. Web Semantics: Science, Services and Agents on the World Wide Web, 3(2), 211–223.

    Google Scholar 

  104. 104.

    Dumais, S., Platt, J., Heckerman, D., & Sahami, M. (1998). Inductive algorithms and representations for text categorization. In Proceedings of the Seventh ACM International Conference on Information and Knowledge Management (pp. 148–155).

  105. 105.

    Frank, E., Trigg, L. E., Holmes, G., & Witten, I. H. (1998). Naive Bayes for regression. Machine Learning, 41(1), 5–25.

    Google Scholar 

  106. 106.

    Feldman, R., & Dagan, I, (1995). Knowledge discovery in textual databases (kdt). In The proceeding of the first international conference on knowledge discovery and data mining (KDD-95).

  107. 107.

    Freitag, D., & McCallum, A. (1999, July). Information extraction with HMMs and shrinkage. In Proceedings of the AAAI-99 workshop on machine learning for information extraction (pp. 31–36).

  108. 108.

    Pierrakos, D., Paliouras, G., Papatheodorou, C., & Spyropoulos, C. D. (2003). Web usage mining as a tool for personalization: A survey. User Modeling and User-Adapted Interaction, 13(4), 311–372.

    Google Scholar 

  109. 109.

    Lento, T., Welser, H. T., Gu, L., & Smith, M. (2006, May). The ties that blog: Examining the relationship between social ties and continued participation in the wallop weblogging system. In 3rd Annual Workshop on the Weblogging ecosystem (Vol. 12).

  110. 110.

    Patil, U. M., & Patil, J. B. (2012, August). Web data mining trends and techniques. In Proceedings of the International Conference on Advances in Computing, Communications and Informatics (pp. 961–965). ACM.

  111. 111.

    Ting, I. H., & Wu, H. J. (2009). Web mining techniques for on-line social networks analysis: An overview. In Web Mining Applications in E-commerce and E-services (pp. 169–179). Springer Berlin Heidelberg.

  112. 112.

    Nina, S. P., Rahaman, M., Bhuiyan, K., and Khandakar E. (2009). Pattern Discovery Of Web Usage Mining, International Conference On Computer Technology and Development, Vol. 1.

  113. 113.

    Kosala, R., & Blockeel, H. (2000). Web mining research: A survey. ACM SIGKDD Explorations Newsletter, 2(1), 1–15.

    Google Scholar 

  114. 114.

    Büchner, A. G., & Mulvenna, M. D. (1998). Discovering internet marketing intelligence through online analytical web usage mining. ACM Sigmod Record, 27(4), 54–61.

    Google Scholar 

  115. 115.

    Raju, E., & Sravanthi, K. (2012). Analysis of social networks using the techniques of web mining. International Journal of Advanced Research in Computer Science and Software Engineering, 2(10), 5.

    Google Scholar 

  116. 116.

    Goodreau, S. M. (2007). Advances in exponential random graph (p*) models applied to a large social network. Social Networks, 29(2), 231–248.

    Google Scholar 

  117. 117.

    Kolari, P., & Joshi, A. (2004). Web mining: Research and practice. Computing in Science & Engineering, 6(4), 49–53.

    Google Scholar 

  118. 118.

    Kleinberg, J. M. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM (JACM), 46(5), 604–632.

    MathSciNet  MATH  Google Scholar 

  119. 119.

    Biswal, B. (2008). Web site optimization through mining user navigational patterns, web engineering and application. New Delhi: Narosa Publishing House.

    Google Scholar 

  120. 120.

    Li, F. (2008). Extracting structure of web site based on hyperlink analysis, fourth international conference on wireless communication. Networking and Mobile Computing, 1–4.

  121. 121.

    Fang, X., & Sheng, O. (2004). LinkSelector: Web mining approach to hyperlink selection for web portals. ACM Transactions on Internet Technology, 4(2), 209–237.

    Google Scholar 

  122. 122.

    Brin, S., & Page, L. (2012). Reprint of: The anatomy of a large-scale hypertextual web search engine. Computer Networks, 56(18), 3825–3833.

    Google Scholar 

  123. 123.

    Bharat, K., & Henzinger, M. R. (1998, August). Improved algorithms for topic distillation in a hyperlinked environment. In Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval (pp. 104–111). ACM.

  124. 124.

    Mladenic, D., Grobelnik, M. (1999). Predicting content from hyperlinks. In Proceedings of the 16th International ICML99 Workshop on Machine Learning in Text Data Analysis (pp. 109–113).

  125. 125.

    Berendt, B. (2002). Using site semantic to analyze, visualize and support navigation. Data Mining and Knowledge Discovery, 6, 37–59.

    MathSciNet  Google Scholar 

  126. 126.

    Dai, H. Mobasher, B. (2003). A road map to more effective Web personalization; Integrating domain knowledge with Web usage mining. In Proceedings of the International Conference on Internet Computing (IC 2003), Las Vegas, Nevada.

  127. 127.

    Oberle, D., Berendt, B., Hotho, A., Gonzalez, J. (2003). Conceptual user tracking. Lecture notes on artificial intelligence (Vol. 2663, pp. 155–164).

  128. 128.

    Spiliopoulou, M., & Pohle, C. (2001). Data mining for measuring and improving the success of Web sites. Data Mining and Knowledge Discover, 5(1–2), 85–114.

    MATH  Google Scholar 

  129. 129.

    Mishne, G. (2007, March). Using blog properties to improve retrieval. In ICWSM.

  130. 130.

    Jalali, M., Mustapha, N., Sulaiman, M. N., & Mamat, A. (2010). WebPUM: A Web-based recommendation system to predict user future movements. Expert Systems with Applications, 37(9), 6201–6212.

    Google Scholar 

  131. 131.

    Yu, J. X., Ou, Y., Zhang, C., & Zhang, S. (2005). Identifying interesting visitors through web log classification. IEEE Intelligent Systems, 20(3), 55–59.

    Google Scholar 

  132. 132.

    Bommepally, K., Glisa T.K., Prakash, J. J., Singh, R., and Murthy, H. A. (2010). Internet Activity Analysis through Proxy Log, National Conference on Communications (NCC), Chennai, India.

  133. 133.

    Suneetha, K. R., & Krishnamoorthi, R. (2010). Classification of web log data to identify interested user using decision trees. In Proceedings of the International Conference on Computing Communications and Information Technology Applications.

  134. 134.

    Bai, S., Han, Q., Liu, Q., & Gao, Z. (2009). Research of an algorithm based on web usage mining. In IEEE International Workshop on Intelligent Systems and Applications (pp. 1–4).

  135. 135.

    Lappas, G. (2011, July). From web mining to social multimedia mining. In Advances in Social Networks Analysis and Mining (ASONAM), 2011 International Conference on (pp. 336–343). IEEE.

  136. 136.

    Feldman, R. (2002). Link analysis: Current state of the art. In Tutorial at the KDD-02.

  137. 137.

    Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: Bringing order to the web. In Technical Report. Stanford, CA: Stanford University.

  138. 138.

    Freeman, L. C. (1978). Centrality in social networks conceptual clarification. Social Networks, 1(3), 215–239.

    Google Scholar 

  139. 139.

    Bonacich, P. (1987). Power and centrality: A family of measures. American Journal of Sociology, 92, 1170–1182.

    Google Scholar 

  140. 140.

    O’Madadhain, J., Hutchins, J., & Smyth, P. (2005). Prediction and ranking algorithms for event-based network data. ACM SIGKDD Explorations Newsletter, 7(2), 23–30.

    Google Scholar 

  141. 141.

    O’Madadhain, J., & Smyth, P. (2005, August). EventRank: A framework for ranking time-varying networks. In Proceedings of the 3rd international workshop on Link discovery (pp. 9–16). ACM.

  142. 142.

    Oh, H. J., Myaeng, S. H., & Lee, M. H. (2000, July). A practical hypertext catergorization method using links and incrementally available class information. In Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval (pp. 264–271). ACM.

  143. 143.

    Chakrabarti, S., Dom, B., & Indyk, P. (1998, June). Enhanced hypertext categorization using hyperlinks. In ACM SIGMOD record (Vol. 27, No. 2, pp. 307–318). ACM.

  144. 144.

    Lafferty, J., McCallum, A., & Pereira, F. C. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of ICML (pp. 282–289).

  145. 145.

    Neville, J., & Jensen, D. (2000, July). Iterative classification in relational data. In Proceedings of the AAAI-2000 workshop on learning statistical models from relational data (pp. 13–20).

  146. 146.

    Lu, Q., & Getoor, L. (2003, August). Link-based classification. In ICML (Vol. 3, pp. 496–503).

  147. 147.

    Dzeroski, S., & Lavrac, N. (1993). Inductive logic programming: Techniques and applications. New York: Routledge.

  148. 148.

    Bach, F. R., & Jordan, M. I. (2004). Learning spectral clustering. In Advances in neural information processing systems (pp. 305–312).

  149. 149.

    Tyler, J. R., Wilkinson, D. M., & Huberman, B. A. (2005). E-mail as spectroscopy: Automated discovery of community structure within organizations. The Information Society, 21(2), 143–153.

    Google Scholar 

  150. 150.

    Nowicki, K., & Snijders, T. A. B. (2001). Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96(455), 1077–1087.

    MathSciNet  MATH  Google Scholar 

  151. 151.

    Ananthakrishna, R., Chaudhuri, S., & Ganti, V. (2002, August). Eliminating fuzzy duplicates in data warehouses. In Proceedings of the 28th international conference on Very Large Data Bases (pp. 586–597). VLDB Endowment.

  152. 152.

    Kalashnikov, D. V., Mehrotra, S., & Chen, Z. (2005, April). Exploiting relationships for domain-independent data cleaning. In SDM (pp. 262–273).

  153. 153.

    Bhattacharya, I., & Getoor, L. (2004, June). Iterative record linkage for cleaning and integration. In Proceedings of the 9th ACM SIGMOD workshop on research issues in data mining and knowledge discovery (pp. 11–18). ACM.

  154. 154.

    Dong, X., Halevy, A., & Madhavan, J. (2005, June). Reference reconciliation in complex information spaces. In Proceedings of the 2005 ACM SIGMOD international conference on Management of data (pp. 85-96). ACM.

  155. 155.

    Li, X., Morie, P., & Roth, D. (2005). Semantic integration in text: From ambiguous names to identifiable entities. AI Magazine, 26(1), 45.

    Google Scholar 

  156. 156.

    Domingos, P. (2004). Multi-relational record linkage. In Proceedings of the KDD-2004 workshop on multi-relational data mining.

  157. 157.

    Pasula, H., Marthi, B., Milch, B., Russell, S., & Shpitser, I. (2002). Identity uncertainty and citation matching. In Advances in neural information processing systems (pp. 1401–1408).

  158. 158.

    Culotta, A., & McCallum, A. (2005, October). Joint deduplication of multiple record types in relational data. In Proceedings of the 14th ACM international conference on Information and knowledge management (pp. 257–258). ACM.

  159. 159.

    Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3(1), 993–1022.

    MATH  Google Scholar 

  160. 160.

    Gupta, N., & Singh, A. (2014, December). A novel strategy for link prediction in social networks. In Proceedings of the 2014 CoNEXT on student workshop (pp. 12–14). ACM.

  161. 161.

    Al Hasan, M., & Zaki, M. J. (2011). A survey of link prediction in social networks. In Social network data analytics (pp. 243–275). Springer US.

  162. 162.

    Liben-Nowell, David, & Kleinberg, Jon. (2007). The link prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7), 1019–1031.

    Google Scholar 

  163. 163.

    Albert, R., & Barabási, A. L. (2002). Statistical mechanics of complex networks. Reviews of Modern Physics, 74(1), 47.

    MathSciNet  MATH  Google Scholar 

  164. 164.

    Adamic, L. A., & Adar, E. (2003). Friends and neighbors on the web. Social Networks, 25(3), 211–230.

    Google Scholar 

  165. 165.

    Bliss, C. A., Frank, M. R., Danforth, C. M., & Dodds, P. S. (2014). An evolutionary algorithm approach to link prediction in dynamic social networks. Journal of Computational Science, 5(5), 750–764.

    MathSciNet  Google Scholar 

  166. 166.

    Chebotarev, P., & Shamis, E. (2006). The matrix-forest theorem and measuring relations in small social groups. arXiv preprint math/0602070.

  167. 167.

    Fire, M., Tenenboim, L., Lesser, O., Puzis, R., Rokach, L., & Elovici, Y. (2011, October). Link prediction in social networks using computationally efficient topological features. In 2011 IEEE third international conference on privacy, security, risk and trust (PASSAT) and 2011 IEEE third international conference on social computing (SocialCom) (pp. 73–80). IEEE.

  168. 168.

    Fire, M., Tenenboim-Chekina, L., Puzis, R., Lesser, O., Rokach, L., & Elovici, Y. (2013). Computationally efficient link prediction in a variety of social networks. ACM Transactions on Intelligent Systems and Technology (TIST), 5(1), 10.


  169. 169.

    Popescul, A., & Ungar, L. H. (2003, August). Statistical relational learning for link prediction. In IJCAI workshop on learning statistical models from relational data (Vol. 2003).

  170. 170.

    O’Madadhain, J., Smyth, P., & Adamic, L. (2005, February). Learning predictive models for link formation. In International sunbelt social network conference.

  171. 171.

    Getoor, L. (2003). Link mining: A new data mining challenge. ACM SIGKDD Explorations Newsletter, 5(1), 84–89.


  172. 172.

    Rattigan, M. J., & Jensen, D. (2005). The case for anomalous link discovery. ACM SIGKDD Explorations Newsletter, 7(2), 41–47.


  173. 173.

    Chellappa, R., & Jain, A. (Eds.). (1993). Markov random fields: Theory and application. Boston: Academic Press.

  174. 174.

    Taskar, B., Wong, M. F., Abbeel, P., & Koller, D. (2003). Link prediction in relational data. In Advances in neural information processing systems.

  175. 175.

    Domingos, P., & Richardson, M. (2004). Markov logic: A unifying framework for statistical relational learning. In ICML-2004 Workshop on Statistical Relational Learning (Vol. 1, pp. 49–54).

  176. 176.

    Alavijeh, Z. Z. (2015). The application of link mining in social network analysis. Advances in Computer Science: An International Journal, 4(3), 64–69.


  177. 177.

    Inokuchi, A., Washio, T., & Motoda, H. (2000). An apriori-based algorithm for mining frequent substructures from graph data. In Principles of data mining and knowledge discovery (pp. 13–23). Springer Berlin Heidelberg.

  178. 178.

    Kuramochi, M., & Karypis, G. (2001). Frequent subgraph discovery. In Data Mining, 2001. ICDM 2001, Proceedings IEEE international conference on (pp. 313–320). IEEE.

  179. 179.

    Yan, X., & Han, J. (2002). gSpan: Graph-based substructure pattern mining. In Data mining, 2002. ICDM 2002. Proceedings. 2002 IEEE international conference on (pp. 721–724). IEEE.

  180. 180.

    Agrawal, R., & Srikant, R. (1994, September). Fast algorithms for mining association rules. In Proceedings of the 20th international conference very large data bases, VLDB (Vol. 1215, pp. 487–499).

  181. 181.

    Otero, R., & Tamaddoni-Nezhad, A. (1992). In S. Muggleton (Ed.), Inductive logic programming (Vol. 38, pp. 281–298). London: Academic Press.

  182. 182.

    Matsuda, T., Horiuchi, T., Motoda, H., & Washio, T. (2000). Extension of graph-based induction for general graph structured data. In Knowledge discovery and data mining. Current issues and new applications (pp. 420–431). Springer Berlin Heidelberg.

  183. 183.

    Cook, D. J., & Holder, L. B. (1994). Substructure discovery using minimum description length and background knowledge. Journal of Artificial Intelligence Research, 1, 231–255.


  184. 184.

    Holder, L. B., & Cook, D. J. (2009). Graph-based data mining. Encyclopedia of Data Warehousing and Mining, 2, 943–949.


  185. 185.

    Yoshida, K., Motoda, H., & Indurkhya, N. (1994). Graph-based induction as a unified learning framework. Applied Intelligence, 4(3), 297–316.


  186. 186.

    King, R. D., Muggleton, S. H., Srinivasan, A., & Sternberg, M. J. (1996). Structure-activity relationships derived by machine learning: The use of atoms and their bond connectivities to predict mutagenicity by inductive logic programming. Proceedings of the National Academy of Sciences, 93(1), 438–442.


  187. 187.

    Gärtner, T., Driessens, K., & Ramon, J. (2002). Exponential and geometric kernels for graphs. In NIPS workshop on unreal data: Principles of modeling nonvectorial Data (Vol. 5, pp. 49–58).

  188. 188.

    Kashima, H., & Inokuchi, A. (2002, July). Kernels for graph classification. In ICDM workshop on active mining (Vol. 2002).

  189. 189.

    Yin, H., Wong, S., Xu, J., & Wong, C. K. (2002). Urban traffic flow prediction using a fuzzy-neural approach. Transportation Research Part C: Emerging Technologies, 10(2), 85–98.


  190. 190.

    Kazienko, P., Musiał, K., & Kajdanowicz, T. (2011). Multidimensional social network in the social recommender system. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 41(4), 746–759.


  191. 191.

    Kunegis, J., Lommatzsch, A., & Bauckhage, C. (2009, April). The slashdot zoo: Mining a social network with negative edges. In Proceedings of the 18th international conference on World wide web (pp. 741–750). ACM.

  192. 192.

    Zhang, Z. K., Zhou, T., & Zhang, Y. C. (2010). Personalized recommendation via integrated diffusion on user–item–tag tripartite graphs. Physica A: Statistical Mechanics and its Applications, 389(1), 179–186.


  193. 193.

    Kubica, J., Moore, A., & Schneider, J. (2003). Tractable group detection on large link data sets. In The third IEEE international conference on data mining.

  194. 194.

    Kubica, J., Moore, A., Schneider, J., & Yang, Y. (2002, July). Stochastic link and group detection. In Proceedings of the national conference on artificial intelligence (pp. 798–806). Menlo Park, CA; Cambridge, MA; London: AAAI Press; MIT Press.

  195. 195.

    Adibi, J., Chalupsky, H., Melz, E., & Valente, A. (2004, July). The KOJAK group finder: Connecting the dots via integrated knowledge-based and statistical reasoning. In Proceedings of the national conference on artificial intelligence (pp. 800–807). Menlo Park, CA; Cambridge, MA; London: AAAI Press; MIT Press.

  196. 196.

    Wang, X., Mohanty, N., & McCallum, A. (2005, August). Group and topic discovery from relations and text. In Proceedings of the 3rd international workshop on Link discovery (pp. 28–35). ACM.

  197. 197.

    Carpenter, T., Karakostas, G., & Shallcross, D. (2002). Practical issues and algorithms for analyzing terrorist networks. In Proceedings of the western simulation multiconference.

  198. 198.

    Huang, Z., & Lin, D. K. (2009). The time-series link prediction problem with applications in communication surveillance. INFORMS Journal on Computing, 21(2), 286–303.


  199. 199.

    Scripps, J., Nussbaum, R., Tan, P. N., & Esfahanian, A. H. (2011). Link-based network mining. In Structural analysis of complex networks (pp. 403–419). Boston: Birkhäuser.

  200. 200.

    Basuchowdhuri, P., & Chen, J. (2010, August). Detecting communities using social ties. In Granular Computing (GrC), 2010 IEEE International Conference on (pp. 55–60). IEEE.

  201. 201.

    Sugimoto, C., Hank, C., Bowman, T., & Pomerantz, J. (2015). Friend or faculty: Social networking sites, dual relationships, and context collapse in higher education. First Monday. doi:10.5210/fm.v20i3.5387.


  202. 202.

    Scott, J., & Carrington, P. J. (2011). The SAGE handbook of social network analysis. Thousand Oaks: SAGE Publications.


  203. 203.

    Laumann, E. O., Marsden, P. V., & Prensky, D. (1989). The boundary specification problem in network analysis. Research Methods in Social Network Analysis, 61, 87.


  204. 204.

    Borgatti, S. P., Mehra, A., Brass, D. J., & Labianca, G. (2009). Network analysis in the social sciences. Science, 323(5916), 892–895.


  205. 205.

    Yu, S., & Kak, S. (2012). A survey of prediction using social media. arXiv preprint arXiv:1203.1647.

  206. 206.

    Kemp, C., Griffiths, T. L., & Tenenbaum, J. B. (2004). Discovering latent classes in relational data. In Technical Report AI Memo 2004-019. MIT.

  207. 207.

    Kemp, C., Tenenbaum, J. B., Griffiths, T. L., Yamada, T., & Ueda, N. (2006, July). Learning systems of concepts with an infinite relational model. In AAAI (Vol. 3, p. 5).

  208. 208.

    Kurihara, K., Kameya, Y., & Sato, T. (2006). A frequency-based stochastic blockmodel.

  209. 209.

    Airoldi, E. M., Blei, D. M., Fienberg, S. E., & Xing, E. P. (2006). Stochastic block models of mixed membership. Bayesian Analysis, 1(1), 1–23.


  210. 210.

    De Laat, M. (2002, January). Network and content analysis in an online community discourse. In Proceedings of the conference on computer support for collaborative learning: Foundations for a CSCL community (pp. 625–626). International Society of the Learning Sciences.

  211. 211.

    Lorrain, F., & White, H. C. (1971). Structural equivalence of individuals in social networks. The Journal of Mathematical Sociology, 1(1), 49–80.


  212. 212.

    Wolfe, A. P., & Jensen, D. (2004). Playing multiple roles: Discovering overlapping roles in social networks. In ICML-04 workshop on statistical relational learning and its connections to other fields (p. 75).

  213. 213.

    Choudhary, P., & Singh, U. (2015). A survey on social network analysis for counter-terrorism. International Journal of Computer Applications, 112(9), 24–29.


  214. 214.

    Campbell, W. M., Dagli, C. K., & Weinstein, C. J. (2013). Social network analysis with content and graphs. Lincoln Laboratory Journal, 20(1), 61–81.


  215. 215.

    YouTube. Accessed September 1, 2014.

  216. 216.

    Flickr. Accessed September 1, 2014.

  217. 217.

    By the Numbers: 400 Amazing Facebook Statistics and Facts. Accessed September 1, 2014.

  218. 218.

    Statista. Accessed September 1, 2014.

  219. 219.

    400 Amazing Twitter Statistics and Facts. Accessed September 1, 2014.

  220. 220.

    Szabo, G., & Huberman, B. A. (2010). Predicting the popularity of online content. Communications of the ACM, 53(8), 80–88.


  221. 221.

    Lerman, K., & Galstyan, A. (2008, August). Analysis of social voting patterns on digg. In Proceedings of the first workshop on Online social networks (pp. 7–12). ACM.

  222. 222.

    Fiebert, M. S., Aliee, A., Yassami, H., & Dorethy, M. D. (2014). The life cycle of a facebook post. The Open Psychology Journal, 7(1), 18–19.


  223. 223.

    Do, T. M. T., & Gatica-Perez, D. (2013). Human interaction discovery in smartphone proximity networks. Personal and Ubiquitous Computing, 17(3), 413–431.


  224. 224.

    Olguín, D. O., Gloor, P. A., & Pentland, A. S. (2009). Capturing individual and group behavior with wearable sensors. In Proceedings of the 2009 AAAI spring symposium on human behavior modeling, SSS (Vol. 9).

  225. 225.

    Weinstein, C., Campbell, W., Delaney, B., & O’Leary, G. (2009, March). Modeling and detection techniques for counter-terror social network analysis and intent recognition. In Aerospace conference, 2009 IEEE (pp. 1–16). IEEE.

  226. 226.

    Olguín, D. O., Waber, B. N., Kim, T., Mohan, A., Ara, K., & Pentland, A. (2009). Sensible organizations: Technology and methodology for automatically measuring organizational behavior. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 39(1), 43–55.


  227. 227.

    Jansen, B. J., Zhang, M., Sobel, K., & Chowdury, A. (2009). Twitter power: Tweets as electronic word of mouth. Journal of the American Society for Information Science and Technology, 60(11), 2169–2188.


  228. 228.

    Ghiassi, M., Skinner, J., & Zimbra, D. (2013). Twitter brand sentiment analysis: A hybrid system using n-gram analysis and dynamic artificial neural network. Expert Systems with Applications, 40(16), 6266–6282.


  229. 229.

    Salganik, M. J., Dodds, P. S., & Watts, D. J. (2006). Experimental study of inequality and unpredictability in an artificial cultural market. Science, 311(5762), 854–856.


  230. 230.

    Lansdall-Welfare, T., Lampos, V., & Cristianini, N. (2012, April). Effects of the Recession on Public Mood in the UK. In Proceedings of the 21st international conference companion on World Wide Web (pp. 1221–1226). ACM.

  231. 231.

    Tumasjan, A., Sprenger, T. O., Sandner, P. G., & Welpe, I. M. (2010). Predicting elections with twitter: What 140 characters reveal about political sentiment. ICWSM, 10, 178–185.


  232. 232.

    Culotta, A. (2010, July). Towards detecting influenza epidemics by analyzing Twitter messages. In Proceedings of the first workshop on social media analytics (pp. 115–122). ACM.

  233. 233.

    Bollen, J., Mao, H., & Zeng, X. (2011). Twitter mood predicts the stock market. Journal of Computational Science, 2(1), 1–8.


  234. 234.

    Shamma, D. A., Kennedy, L., & Churchill, E. F. (2011, March). Peaks and persistence: modeling the shape of microblog conversations. In Proceedings of the ACM 2011 conference on Computer supported cooperative work (pp. 355–358). ACM.

  235. 235.

    Weng, J., & Lee, B. S. (2011). Event detection in twitter. ICWSM, 11, 401–408.


  236. 236.

    Hu, M., Liu, S., Wei, F., Wu, Y., Stasko, J., & Ma, K. L. (2012, May). Breaking news on twitter. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 2751–2754). ACM.

  237. 237.

    Sakaki, T., Okazaki, M., & Matsuo, Y. (2010, April). Earthquake shakes Twitter users: real-time event detection by social sensors. In Proceedings of the 19th international conference on World wide web (pp. 851–860). ACM.

  238. 238.

    Neubig, G., Matsubayashi, Y., Hagiwara, M., & Murakami, K. (2011, November). Safety Information Mining-What can NLP do in a disaster-. In IJCNLP (Vol. 11, pp. 965–973).

  239. 239.

    Chen, J., Nairn, R., Nelson, L., Bernstein, M., & Chi, E. (2010, April). Short and tweet: experiments on recommending content from information streams. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 1185–1194). ACM.

  240. 240.

    Backstrom, L., Kleinberg, J., Lee, L., & Danescu-Niculescu-Mizil, C. (2013, February). Characterizing and curating conversation threads: expansion, focus, volume, re-entry. In Proceedings of the sixth ACM international conference on Web search and data mining (pp. 13–22). ACM.

  241. 241.

    Irfan, R., King, C. K., Grages, D., Ewen, S., Khan, S. U., Madani, S. A., et al. (2015). A survey on text mining in social networks. The Knowledge Engineering Review, 30(02), 157–170.


  242. 242.

    Kurka, D. B., Godoy, A., & Von Zuben, F. J. (2015). Online social network analysis: A survey of research applications in computer science. arXiv preprint arXiv:1504.05655.

  243. 243.

    Yoo, K. (2012). Automatic document archiving for cloud storage using text mining-based topic identification technique. In Proceedings of international conference on information and computer application, Singapore (pp. 189–192).

  244. 244.

    Cimiano, P., Handschuh, S., & Staab, S. (2004, May). Towards the self-annotating web. In Proceedings of the 13th international conference on World Wide Web (pp. 462–471). ACM.

  245. 245.

    Mika, P. (2005). Ontologies are us: A unified model of social networks and semantics. In The Semantic Web – ISWC 2005 (pp. 522–536). Springer Berlin Heidelberg.

  246. 246.

    Finin, T., Ding, L., Zhou, L., & Joshi, A. (2005). Social networking on the semantic web. The Learning Organization, 12(5), 418–435.


  247. 247.

    Friend of a Friend. Accessed October, 2015.

  248. 248.

    Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The semantic web. Scientific American, 284(5), 28–37.


  249. 249.

    Maia, M., Almeida, J., & Almeida, V. (2008, April). Identifying user behavior in online social networks. In Proceedings of the 1st workshop on social network systems (pp. 1–6). ACM.

  250. 250.

    Adar, E., & Huberman, B. A. (2000). Free riding on Gnutella. First Monday, 5(10).

  251. 251.

    Feldman, M., Papadimitriou, C., Chuang, J., & Stoica, I. (2004, September). Free-riding and whitewashing in peer-to-peer systems. In Proceedings of the ACM SIGCOMM workshop on Practice and theory of incentives in networked systems (pp. 228–236). ACM.

  252. 252.

    Marques Neto, H. T., Almeida, J. M., Rocha, L. C., Meira, W., Guerra, P. H., & Almeida, V. A. (2004). A characterization of broadband user behavior and their e-business activities. ACM SIGMETRICS Performance Evaluation Review, 32(3), 3–13.


  253. 253.

    Backstrom, L., Kumar, R., Marlow, C., Novak, J., & Tomkins, A. (2008, February). Preferential behavior in online groups. In Proceedings of the 2008 international conference on web search and data mining (pp. 117–128). ACM.

  254. 254.

    Agichtein, E., Brill, E., & Dumais, S. (2006, August). Improving web search ranking by incorporating user behavior information. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 19–26). ACM.

  255. 255.

    Agichtein, E., Castillo, C., Donato, D., Gionis, A., & Mishne, G. (2008, February). Finding high-quality content in social media. In Proceedings of the 2008 international conference on web search and data mining (pp. 183–194). ACM.

  256. 256.

    Fisher, D., Smith, M., & Welser, H. T. (2006, January). You are who you talk to: Detecting roles in usenet newsgroups. In System Sciences, 2006. HICSS’06. Proceedings of the 39th annual hawaii international conference on (Vol. 3, pp. 59b–59b). IEEE.

  257. 257.

    Menascé, D. A., Almeida, V. A., Fonseca, R., & Mendes, M. A. (2000). Business-oriented resource management policies for e-commerce servers. Performance Evaluation, 42(2), 223–239.


  258. 258.

    Oard, D. W., & Kim, J. (2001). Modeling information content using observable behavior. In Proceedings of the 64th annual conference of the American society for information science and technology (pp. 481–488). Washington.

  259. 259.

    Viswanath, B., Bashir, M. A., Crovella, M., Guha, S., Gummadi, K. P., Krishnamurthy, B., & Mislove, A. (2014, August). Towards detecting anomalous user behavior in online social networks. In Proceedings of the 23rd USENIX Security Symposium (USENIX Security).

  260. 260.

    Benevenuto, F., Rodrigues, T., Cha, M., & Almeida, V. (2009, November). Characterizing user behavior in online social networks. In Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference (pp. 49–62). ACM.

  261. 261.

    Jin, L., Chen, Y., Wang, T., Hui, P., & Vasilakos, A. V. (2013). Understanding user behavior in online social networks: A survey. IEEE Communications Magazine, 51(9), 144–150.


  262. 262.

    Tan, E., Guo, L., Chen, S., Zhang, X., & Zhao, Y. (2012, June). Spammer behavior analysis and detection in user generated content on social networks. In Distributed Computing Systems (ICDCS), 2012 IEEE 32nd International Conference on (pp. 305–314). IEEE.

  263. 263.

    Sato, Y., Utsuro, T., Murakami, Y., Fukuhara, T., Nakagawa, H., Kawada, Y., & Kando, N. (2008, April). Analysing features of Japanese splogs and characteristics of keywords. In Proceedings of the 4th international workshop on Adversarial information retrieval on the web (pp. 33–40). ACM.

  264. 264.

    Wang, Y. M., Ma, M., Niu, Y., & Chen, H. (2007, May). Spam double-funnel: Connecting web spammers with advertisers. In Proceedings of the 16th international conference on World Wide Web (pp. 291–300). ACM.

  265. 265.

    E-mail spam, spam. Accessed December, 2014.

  266. 266.

    Stringhini, G., Kruegel, C., & Vigna, G. (2010, December). Detecting spammers on social networks. In Proceedings of the 26th Annual Computer Security Applications Conference (pp. 1–9). ACM.

  267. 267.

    Gomes, L. H., Cazita, C., Almeida, J. M., Almeida, V., & Meira Jr, W. (2004, October). Characterizing a spam traffic. In Proceedings of the 4th ACM SIGCOMM conference on Internet measurement (pp. 356–369). ACM.

  268. 268.

    Ramachandran, A., & Feamster, N. (2006). Understanding the network-level behavior of spammers. ACM SIGCOMM Computer Communication Review, 36(4), 291–302.


  269. 269.

    Androutsopoulos, I., Koutsias, J., Chandrinos, K. V., & Spyropoulos, C. D. (2000, July). An experimental comparison of naive Bayesian and keyword-based anti-spam filtering with personal e-mail messages. In Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval (pp. 160–167). ACM.

  270. 270.

    Jung, J., & Sit, E. (2004, October). An empirical study of spam traffic and the use of DNS black lists. In Proceedings of the 4th ACM SIGCOMM conference on Internet measurement (pp. 370–375). ACM.

  271. 271.

    Delany, M. (2007). Domain-based email authentication using public keys advertised in the DNS (DomainKeys). In RFC 4870, Network Working Group. IETF.

  272. 272.

    Xie, Y., Yu, F., Achan, K., Panigrahy, R., Hulten, G., & Osipkov, I. (2008, August). Spamming botnets: signatures and characteristics. In ACM SIGCOMM Computer Communication Review (Vol. 38, No. 4, pp. 171–182). ACM.

  273. 273.

    Hao, S., Syed, N. A., Feamster, N., Gray, A. G., & Krasser, S. (2009, August). Detecting Spammers with SNARE: Spatio-temporal Network-level Automatic Reputation Engine. In USENIX Security Symposium (Vol. 9).

  274. 274.

    Becchetti, L., Castillo, C., Donato, D., Leonardi, S., & Baeza-Yates, R. (2006, December). Link-based characterization and detection of web spam. In 2nd International workshop on adversarial information retrieval on the web, AIRWeb 2006, at the 29th annual international ACM SIGIR conference on research and development in information retrieval, SIGIR 2006.

  275. 275.

    Castillo, C., Donato, D., Gionis, A., Murdock, V., & Silvestri, F. (2007, July). Know your neighbors: Web spam detection using the web topology. In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 423–430). ACM.

  276. 276.

    Gyongyi, Z., & Garcia-Molina, H. (2005). Web spam taxonomy. In First international workshop on adversarial information retrieval on the web (AIRWeb 2005).

  277. 277.

    Niu, Y., Chen, H., Hsu, F., Wang, Y. M., & Ma, M. (2007, February). A quantitative study of forum spamming using context-based analysis. In NDSS.

  278. 278.

    Kolari, P., Java, A., & Finin, T. (2006, May). Characterizing the splogosphere. In Proceedings of the 3rd annual workshop on weblogging ecosystem: Aggregation, analysis and dynamics, 15th World Wide Web conference. University of Maryland, Baltimore County.

  279. 279.

    Grier, C., Thomas, K., Paxson, V., & Zhang, M. (2010, October). @spam: the underground on 140 characters or less. In Proceedings of the 17th ACM conference on computer and communications security (pp. 27–37). ACM.

  280. 280.

    Kolari, P., Finin, T., & Joshi, A. (2006, March). SVMs for the blogosphere: Blog identification and splog detection. In AAAI spring symposium: Computational approaches to analyzing weblogs (pp. 92–99).

  281. 281.

    Kolari, P., Java, A., Finin, T., Oates, T., & Joshi, A. (2006, July). Detecting spam blogs: A machine learning approach. In Proceedings of the national conference on artificial intelligence (Vol. 21, No. 2, p. 1351). Menlo Park, CA; Cambridge, MA; London: AAAI Press; MIT Press.

  282. 282.

    Gao, H., Hu, J., Wilson, C., Li, Z., Chen, Y., & Zhao, B. Y. (2010, November). Detecting and characterizing social spam campaigns. In Proceedings of the 10th ACM SIGCOMM conference on Internet measurement (pp. 35–47). ACM.

  283. 283.

    Katayama, T., Utsuro, T., Sato, Y., Yoshinaka, T., Kawada, Y., & Fukuhara, T. (2009, April). An empirical study on selective sampling in active learning for splog detection. In Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web (pp. 29–36). ACM.

  284. 284.

    Lee, K., Caverlee, J., & Webb, S. (2010, July). Uncovering social spammers: social honeypots + machine learning. In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval (pp. 435–442). ACM.

  285. 285.

    Lin, Y. R., Sundaram, H., Chi, Y., Tatemura, J., & Tseng, B. L. (2007, May). Splog detection using self-similarity analysis on blog temporal dynamics. In Proceedings of the 3rd international workshop on Adversarial information retrieval on the web (pp. 1–8). ACM.

  286. 286.

    Ma, J., Saul, L. K., Savage, S., & Voelker, G. M. (2009, June). Beyond blacklists: learning to detect malicious web sites from suspicious URLs. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 1245–1254). ACM.

  287. 287.

    Rieder, B. (2013, May). Studying Facebook via data extraction: the Netvizz application. In Proceedings of the 5th Annual ACM Web Science Conference (pp. 346–355). ACM.

  288. 288.

    Quercia, D., Lambiotte, R., Stillwell, D., Kosinski, M., & Crowcroft, J. (2012, February). The personality of popular facebook users. In Proceedings of the ACM 2012 conference on computer supported cooperative work (pp. 955–964). ACM.

  289. 289.

    Abdesslem, F. B., Parris, I., & Henderson, T. (2012). Reliable online social network data collection. In Computational Social Networks (pp. 183–210). Springer London.

  290. 290.

    Besmer, A., & Richter Lipford, H. (2010, April). Moving beyond untagging: photo privacy in a tagged world. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 1563–1572). ACM.

  291. 291.

    Ozok, A. A., & Zaphiris, P. (2009). Online communities and social computing. New York: Springer.


  292. 292.

    Ellison, N. B., Steinfield, C., & Lampe, C. (2007). The benefits of Facebook “friends:” Social capital and college students’ use of online social network sites. Journal of Computer-Mediated Communication, 12(4), 1143–1168.


  293. 293.

    Krasnova, H., Günther, O., Spiekermann, S., & Koroleva, K. (2009). Privacy concerns and identity in online social networks. Identity in the Information Society, 2(1), 39–63.


  294. 294.

    Lampe, C., Ellison, N. B., & Steinfield, C. (2008, November). Changes in use and perception of Facebook. In Proceedings of the 2008 ACM conference on Computer supported cooperative work (pp. 721–730). ACM.

  295. 295.

    Roblyer, M. D., McDaniel, M., Webb, M., Herman, J., & Witty, J. V. (2010). Findings on Facebook in higher education: A comparison of college faculty and student uses and perceptions of social networking sites. The Internet and Higher Education, 13(3), 134–140.


  296. 296.

    Csikszentmihalyi, M., & Larson, R. (2014). Validity and reliability of the experience-sampling method. In Flow and the Foundations of Positive Psychology (pp. 35–54). Springer Netherlands.

  297. 297.

    Mancini, C., Thomas, K., Rogers, Y., Price, B. A., Jedrzejczyk, L., Bandara, A. K., … & Nuseibeh, B. (2009, September). From spaces to places: emerging contexts in mobile privacy. In Proceedings of the 11th international conference on Ubiquitous computing (pp. 1–10). ACM.

  298. 298.

    Pempek, T. A., Yermolayeva, Y. A., & Calvert, S. L. (2009). College students’ social networking experiences on Facebook. Journal of Applied Developmental Psychology, 30(3), 227–238.


  299. 299.

    Anthony, D., Henderson, T., & Kotz, D. (2007). Privacy in location-aware computing environments. IEEE Pervasive Computing, 4, 64–72.


  300. 300.

    Schäfer, M. T. (2011). Bastard culture! How user participation transforms cultural production (p. 256). Amsterdam: Amsterdam University Press.


  301. 301.

    Ugander, J., Karrer, B., Backstrom, L., & Marlow, C. (2011). The anatomy of the facebook social graph. arXiv preprint arXiv:1111.4503.

  302. 302.

    Leskovec, J. (2008). Dynamics of large networks. Doctoral Dissertation, Carnegie Mellon University, Pittsburgh.

  303. 303.

    Ahn, Y. Y., Han, S., Kwak, H., Moon, S., & Jeong, H. (2007, May). Analysis of topological characteristics of huge online social networking services. In Proceedings of the 16th international conference on World Wide Web (pp. 835–844). ACM.

  304. 304.

    DATASIFT, Accessed September, 2014.

  305. 305.

    GNIP, Accessed September, 2014.

  306. 306.

    Customer relationship management, Accessed September, 2014.

  307. 307.

    Garg, S., Gupta, T., Carlsson, N., & Mahanti, A. (2009, November). Evolution of an online social aggregation network: an empirical study. In Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference (pp. 315–321). ACM.

  308. 308.

    Cha, M., Haddadi, H., Benevenuto, F., & Gummadi, P. K. (2010). Measuring user influence in twitter: The million follower fallacy. ICWSM, 10(10–17), 30.


  309. 309.

    Ghosh, S., Korlam, G., & Ganguly, N. (2010, June). The Effects of Restrictions on Number of Connections in OSNs: A Case-Study on Twitter. In WOSN.

  310. 310.

    Ghosh, S., Zafar, M. B., Bhattacharya, P., Sharma, N., Ganguly, N., & Gummadi, K. (2013, October). On sampling the wisdom of crowds: Random vs. expert sampling of the twitter stream. In Proceedings of the 22nd ACM international conference on information & knowledge management (pp. 1739–1744). ACM.

  311. 311.

    González-Bailón, S., Wang, N., Rivero, A., Borge-Holthoefer, J., & Moreno, Y. (2014). Assessing the bias in samples of large online networks. Social Networks, 38, 16–27.


  312. 312.

    Morstatter, F., Pfeffer, J., Liu, H., & Carley, K. M. (2013). Is the sample good enough? comparing data from twitter’s streaming api with twitter’s firehose. arXiv preprint arXiv:1306.5204.

  313. 313.

    Lindamood, J., Heatherly, R., Kantarcioglu, M., & Thuraisingham, B. (2009, April). Inferring private information using social network data. In Proceedings of the 18th international conference on World wide web (pp. 1145–1146). ACM.

  314. 314.

    Gyarmati, L., & Trinh, T. A. (2010). Measuring user behavior in online social networks. IEEE Network, 24(5), 26–31.


  315. 315.

    Iachello, G., Smith, I., Consolvo, S., Chen, M., & Abowd, G. D. (2005, July). Developing privacy guidelines for social location disclosure applications and services. In Proceedings of the 2005 symposium on Usable privacy and security (pp. 65–76). ACM.

  316. 316.

    Prabaker, M., Rao, J., Fette, I., Kelley, P., Cranor, L., Hong, J., & Sadeh, N. (2007, September). Understanding and capturing people’s privacy policies in a people finder application. In Proceedings of the workshop ubicomp privacy.

  317. 317., Accessed September, 2014.

  318. 318.

    TAPoR, Accessed September, 2014.

  319. 319.

    Truthy, Accessed September, 2014.

  320. 320.

    Tweet Archivist, Accessed September, 2014.

  321. 321.

    TweetStats, Accessed September, 2014.

  322. 322.

    Twiangulate, Accessed September, 2014.

  323. 323.

    Twitonomy, Accessed September, 2014.

  324.

    YourTwapperKeeper, Accessed September, 2014.

  325.

    Tweetnest, Accessed September, 2014.

  326.

    NodeXL, Accessed September, 2014.

  327.

    Netlytic, Accessed September, 2014.

  328.

    Textexture, Accessed September, 2014.

  329.

    ThinkUp, Accessed September, 2014.

  330.

    Aggarwal, C. C., & Wang, H. (2011). Text mining in social networks. In Social Network Data Analytics (pp. 353–378). Springer US.

  331.

    ClusterHQ, Accessed November, 2015.

  332.

    Followthehashtag, Accessed November, 2015.

  333.

    iSciencemaps, Accessed November, 2015.

  334.

    QSR, Accessed November, 2015.

  335.

    Mozdeh, Accessed November, 2015.

  336.

    The Chorus project, Accessed November, 2015.

  337.

    Cattell, R. (2011). Scalable SQL and NoSQL data stores. ACM SIGMOD Record, 39(4), 12–27.

  338.

    Stonebraker, M. (2010). SQL databases v. NoSQL databases. Communications of the ACM, 53(4), 10–11.

  339.

    Gjoka, M., Kurant, M., Butts, C. T., & Markopoulou, A. (2010, March). Walking in Facebook: A case study of unbiased sampling of OSNs. In INFOCOM, 2010 Proceedings IEEE (pp. 1–9). IEEE.

  340.

    Lewis, K., Kaufman, J., & Christakis, N. (2008). The taste for privacy: An analysis of college student privacy settings in an online social network. Journal of Computer-Mediated Communication, 14(1), 79–100.

  341.

    Doddington, G. R., Mitchell, A., Przybocki, M. A., Ramshaw, L. A., Strassel, S., & Weischedel, R. M. (2004, May). The Automatic Content Extraction (ACE) Program-Tasks, Data, and Evaluation. In LREC (Vol. 2, p. 1).

Author information



Corresponding author

Correspondence to Ajey Kumar.

Cite this article

Goswami, A., Kumar, A. Challenges in the Analysis of Online Social Networks: A Data Collection Tool Perspective. Wireless Pers Commun 97, 4015–4061 (2017).


  • OSN
  • SNA
  • Data collection tools
  • Research challenges