Knowledge and Information Systems, Volume 48, Issue 1, pp 179–199

Local bilateral clustering for identifying research topics and groups from bibliographical data

  • Sara Elena Garza Villarreal
  • Satu Elisa Schaeffer
Regular Paper


The structure of scientific collaboration networks provides insight into the relationships between people and disciplines. In this paper, we study a bipartite graph connecting authors to publications and extract from it clusters of authors and articles, interpreting the author clusters as research groups and the article clusters as research topics. Visualisations are proposed to ease the interpretation of such clusters in terms of discovering leaders, the activity level, and other semantic aspects. We discuss the process of obtaining and preprocessing the information from scientific publications, the formulation and implementation of the clustering algorithm, and the creation of the visualisations. Experiments on a test data set are presented, using an initial prototype implementation of the proposed modules.
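
To make the author-publication bipartite graph concrete, the following is a minimal sketch under stated assumptions and is not the clustering algorithm of the paper. The toy `records`, the function names `build_bipartite` and `components`, and the use of connected components as a stand-in for clustering are all hypothetical choices made for illustration. Each resulting component pairs a set of authors (a candidate research group) with a set of articles (a candidate research topic).

```python
from collections import defaultdict

# Hypothetical toy records: (article id, list of author names).
records = [
    ("p1", ["Ana", "Ben"]),
    ("p2", ["Ben", "Cho"]),
    ("p3", ["Dee", "Eli"]),
]


def build_bipartite(records):
    """Return two adjacency maps: author -> articles and article -> authors."""
    author_to_articles = defaultdict(set)
    article_to_authors = defaultdict(set)
    for article, authors in records:
        for author in authors:
            author_to_articles[author].add(article)
            article_to_authors[article].add(author)
    return author_to_articles, article_to_authors


def components(author_to_articles, article_to_authors):
    """Split the bipartite graph into connected components.

    Each component is an (authors, articles) pair. This is only a crude
    placeholder for a real clustering method (an assumption of this sketch).
    """
    seen_authors = set()
    result = []
    for start in sorted(author_to_articles):
        if start in seen_authors:
            continue
        authors, articles = {start}, set()
        seen_authors.add(start)
        stack = [start]
        while stack:
            current = stack.pop()
            for article in author_to_articles[current]:
                if article in articles:
                    continue
                articles.add(article)
                # Every co-author of this article joins the same component.
                for coauthor in article_to_authors[article]:
                    if coauthor not in seen_authors:
                        seen_authors.add(coauthor)
                        authors.add(coauthor)
                        stack.append(coauthor)
        result.append((authors, articles))
    return result


a2p, p2a = build_bipartite(records)
for authors, articles in components(a2p, p2a):
    print(sorted(authors), sorted(articles))
```

Connected components are far coarser than a proper clustering: a single shared article merges two otherwise separate groups, which is one reason a dedicated clustering method is needed on real bibliographical data.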


Keywords: Clustering, Knowledge discovery, Collaboration networks, Network analysis



The first author was supported by SEP-PROMEP Grant No. 103.5/12/7884. We thank the anonymous reviewers for their useful suggestions that helped improve the manuscript.



Copyright information

© Springer-Verlag London 2015

Authors and Affiliations

  • Sara Elena Garza Villarreal (1)
  • Satu Elisa Schaeffer (1)
  1. FIME, Universidad Autónoma de Nuevo León, San Nicolás de los Garza, Mexico
