Social Network Analysis and Mining, Volume 2, Issue 1, pp 69–95

Semantically interconnected social networks

  • Alessandro Cucchiarelli
  • Fulvio D’Antonio
  • Paola Velardi
Review Article


Social network analysis aims to identify collaborations and helps people organize themselves through community participation and information sharing. The primary sources for social network modelling are explicit relationships such as co-authoring, citations and friendship. However, to enable the integration of on-line community information and to fully describe the content and structure of community sites, secondary sources of information, such as documents, e-mails, blogs and discussions, can be exploited. In this paper we describe a methodology and a battery of tools that automatically extract from documents the relevant topics shared among community members and analyse the evolution of the network, including the emergence and decay of collaboration themes. Experiments are conducted on a scientific network funded by the European Community, the INTEROP network of excellence, and on the United Kingdom research community in medical image understanding and analysis.


Social networks; Semantic web; Natural language processing; Text analysis; Clustering; Computer-supported collaborative work



The authors wish to thank Vincenzo Casini for his help in developing the GVI tool.



Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  • Alessandro Cucchiarelli (1)
  • Fulvio D’Antonio (1)
  • Paola Velardi (2)
  1. DIIGA, Università Politecnica delle Marche, Ancona, Italy
  2. DIS, ‘Sapienza’ University of Rome, Rome, Italy
