Local bilateral clustering for identifying research topics and groups from bibliographical data

Abstract

The structure of scientific collaboration networks provides insight into the relationships between people and disciplines. In this paper, we study a bipartite graph connecting authors to publications and extract from it clusters of authors and articles, interpreting the author clusters as research groups and the article clusters as research topics. We propose visualisations that ease the interpretation of such clusters, for instance for discovering leaders, assessing the activity level, and examining other semantic aspects. We discuss the process of obtaining and preprocessing the information from scientific publications, the formulation and implementation of the clustering algorithm, and the creation of the visualisations. Experiments on a test data set are presented, using an initial prototype implementation of the proposed modules.

Notes

  1. As the graph is bipartite, necessarily \(\varGamma (v) \subseteq T\) as well as \(\varGamma (w) \subseteq T\).

  2. Available at http://arnetminer.org.

  3. Available at http://tartarus.org/martin/PorterStemmer/.

  4. In the symmetric mode, once a cluster is computed, the included vertices are no longer available for inclusion in future cluster computations.

  5. The weight of an edge w(vu) is computed as the multiplicity of that edge; for purposes of the clustering phase, the edges are treated as directed and the weight is normalised by the degree of vertex v, making the directed edge weight asymmetric.

  6. Available at http://www.mathiasbader.de/studium/bioinformatics/.

  7. Available at http://neoformix.com/2008/ClusteredWordClouds.html.

  8. For example, the Wordle tool (http://www.wordle.net).

  9. Available at http://dblp.uni-trier.de/ in XML format.

  10. In a systematic sample, each element is chosen after k steps, where k results from dividing the total number of elements by the desired sample size.

  11. Only iterations where the cluster order was above the threshold were considered.
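
Notes 5 and 10 describe two small computations that a short sketch may make more concrete. The Python below is illustrative only and is not taken from the authors' prototype. The function names `directed_weights` and `systematic_sample` are hypothetical. The sketch assumes that the degree of a vertex counts its incident edges with multiplicity, that the input has no self-loops, that the systematic sample starts at the first element, and that k is rounded down to an integer.

```python
from collections import Counter


def directed_weights(edges):
    """Return {(v, u): multiplicity(v, u) / degree(v)} for a multigraph.

    `edges` is an iterable of undirected (v, u) pairs; repeated pairs
    represent edge multiplicity. Assumption: degree(v) counts incident
    edges with multiplicity, and there are no self-loops.
    """
    multiplicity = Counter()
    degree = Counter()
    for v, u in edges:
        # Record the edge in both directions so each endpoint
        # gets its own directed weight later.
        multiplicity[(v, u)] += 1
        multiplicity[(u, v)] += 1
        degree[v] += 1
        degree[u] += 1
    # Normalising by the degree of the source vertex makes the
    # weight of (v, u) differ from that of (u, v) in general.
    return {(v, u): m / degree[v] for (v, u), m in multiplicity.items()}


def systematic_sample(items, sample_size):
    """Pick every k-th element, with k = len(items) // sample_size.

    Assumption: sampling starts at index 0 and k is rounded down.
    """
    if not 0 < sample_size <= len(items):
        raise ValueError("sample_size must be between 1 and len(items)")
    k = len(items) // sample_size
    return list(items[::k][:sample_size])


if __name__ == "__main__":
    # Hypothetical toy data: author a1 is linked to paper p1 twice.
    toy_edges = [("a1", "p1"), ("a1", "p1"), ("a1", "p2"), ("a2", "p2")]
    for pair, weight in sorted(directed_weights(toy_edges).items()):
        print(pair, round(weight, 3))
    print(systematic_sample(list(range(10)), 3))
```

In the toy data, the weight from a1 to p1 differs from the weight from p1 to a1 because the two endpoints have different degrees, which is the asymmetry Note 5 refers to.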

Acknowledgments

The first author was supported by SEP-PROMEP Grant No. 103.5/12/7884. We thank the anonymous reviewers for their useful suggestions that helped improve the manuscript.

Author information

Corresponding author

Correspondence to Satu Elisa Schaeffer.

About this article

Cite this article

Villarreal, S.E.G., Schaeffer, S.E. Local bilateral clustering for identifying research topics and groups from bibliographical data. Knowl Inf Syst 48, 179–199 (2016). https://doi.org/10.1007/s10115-015-0867-y

Keywords

  • Clustering
  • Knowledge discovery
  • Collaboration networks
  • Network analysis