Skip to main content

Advertisement

Log in

An interactive graphical method for community detection in network data

  • Original Paper
  • Published:
Computational Statistics Aims and scope Submit manuscript

Abstract

The detection of community structures within network data is a type of graph analysis with increasing interest across a broad range of disciplines. In a network, communities represent clusters of nodes that exhibit strong intra-connections or relationships among nodes in the cluster. Current methodology for community detection often involves an algorithmic approach, and commonly partitions a graph into node clusters in an iterative manner before some stopping criterion is met. Other statistical approaches for community detection often require model choices and prior selection in Bayesian analyses, which are difficult without some amount of data inspection and pre-processing. Because communities are often fuzzily-defined human concepts, an alternative approach is to leverage human vision to identify communities. The work presents a tool for community detection in form of a web application, called gravicom, which facilitates the detection of community structures through visualization and direct user interaction. In the process of detecting communities, the gravicom application can serve as a standalone tool or as a step to potentially initialize (and/or post-process) another community detection algorithm. In this paper we discuss the design of gravicom and demonstrate its use for community detection with several network data sets. An “Appendix” describes details in the technical formulation of this web application built on the R package Shiny and the JavaScript library D3.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14

Similar content being viewed by others

References

  • Airoldi EM, Blei DM, Fienberg SE, Xing EP (2009) Mixed membership stochastic blockmodels. In: Koller D, Schuurmans D, Bengio Y, Bottou L (eds) Advances in neural information processing systems 21. Curran Associates Inc, Red Hook, pp 33–40

    Google Scholar 

  • Amini AA, Chen A, Bickel PJ, Levina E (2013) Pseudo-likelihood methods for community detection in large sparse networks. Ann Stat 41(4):2097–2122

    Article  MathSciNet  MATH  Google Scholar 

  • Ball B, Karrer B, Newman MEJ (2011) Efficient and principled method for detecting communities in networks. Phys Rev E 84(036):103

    Google Scholar 

  • Bastian M, Heymann S, Jacomy M (2009) Gephi: an open source software for exploring and manipulating networks. http://www.aaai.org/ocs/index.php/ICWSM/09/paper/view/154

  • Bickel P, Choi D, Chang X, Zhang H (2013) Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels. Ann Stat 41(4):1922–1943

    Article  MathSciNet  MATH  Google Scholar 

  • Bostock M, Ogievetsky V, Heer J (2011) D3: data-driven documents. IEEE Trans Vis Computer Gr 17(12):2301–2309

    Article  Google Scholar 

  • Clauset A, Newman MEJ, Moore C (2004) Finding community structure in very large networks. Phys Rev E 70(066):111

    Google Scholar 

  • Couture-Beil A (2013) rjson: JSON for R. http://CRAN.R-project.org/package=rjson, R package version 0.2.12

  • Csardi G, Nepusz T (2006) The igraph software package for complex network research. InterJ Complex Syst :1695, http://igraph.sf.net

  • Cukier J (2012) Events in the game of thrones. http://www.jeromecukier.net/projects/agot/events.html

  • Duch J, Arenas A (2005) Community detection in complex networks using extremal optimization. Phys Rev E 72(027):104

    Google Scholar 

  • Dunne C, Shneiderman B (2013) Motif simplification: improving network visualization readability with fan, connector, and clique glyphs. In: Proceedings of the SIGCHI conference on human factors in computing systems, ACM, New York, NY, USA, CHI ’13, pp 3247–3256

  • Dwyer T, Lee B, Fisher D, Quinn KI, Isenberg P, Robertson G, North C (2009) A comparison of user-generated and automatic graph layouts. IEEE Trans Vis Computer Gr 15(6):961–968

    Article  Google Scholar 

  • Fortunato S (2010) Community detection in graphs. Phys Rep 486(35):75–174

    Article  MathSciNet  Google Scholar 

  • Gandrud C (2014) d3Network: tools for creating D3 JavaScript network, tree, dendrogram, and Sankey graphs from R. http://CRAN.R-project.org/package=d3Network, R package version 0.5.1

  • Gansner ER, North SC (2000) An open graph visualization system and its applications to software engineering. Softw Pract Exp 30(11):1203–1233

    Article  MATH  Google Scholar 

  • Girvan M, Newman MEJ (2002) Community structure in social and biological networks. Proc Natl Acad Sci 99(12):7821–7826

    Article  MathSciNet  MATH  Google Scholar 

  • Greene D, Doyle D, Cunningham P (2010) Tracking the evolution of communities in dynamic social networks. In: Advances in Social Networks Analysis and Mining (ASONAM), international conference on 2010, pp 176–183

  • Guo J, Wilson AG, Nordman DJ (2013) Bayesian nonparametric models for community detection. Technometrics 55(4):390–402

    Article  MathSciNet  Google Scholar 

  • Hansen D, Shneiderman B, Smith MA (2010) Analyzing social media networks with NodeXL: Insights from a connected world. Morgan Kaufmann, Burlington

    Google Scholar 

  • Hofman JM, Wiggins CH (2008) Bayesian approach to network modularity. Phys Rev Lett 100(258):701

    Google Scholar 

  • Holland PW, Leinhardt S (1981) An exponential family of probability distributions for directed graphs. J AmStat Assoc 76(373):33–50

    Article  MathSciNet  MATH  Google Scholar 

  • Karrer B, Newman MEJ (2011) Stochastic blockmodels and community structure in networks. Phys Rev E 83(016):107

    MathSciNet  Google Scholar 

  • Kawadia V, Sreenivasan S (2012) Sequential detection of temporal communities by estrangement confinement. Sci Rep 2, http://www.nature.com/articles/srep00794

  • Kemp C, Tenenbaum JB, Griffiths TL, Yamada T, Ueda N (2006) Learning systems of concepts with an infinite relational model. In: Proceedings of the 21st national conference on artificial intelligence - Vol 1, AAAI Press, AAAI’06, pp 381–388

  • Kling F (2014) Jsnetworkx: A javascript port of the networkx graph library. http://felix-kling.de/JSNetworkX/

  • Krebs V (2004a) Books about US politics. http://networkdata.ics.uci.edu/data.php?d=polbooks

  • Krebs V (2004b) Divided we stand... still. http://www.orgnet.com/divided2.html

  • Lancichinetti A, Fortunato S, Radicchi F (2008) Benchmark graphs for testing community detection algorithms. Phys Rev E 78(046):110

    Google Scholar 

  • Leskovec J, Lang KJ, Dasgupta A, Mahoney MW (2008) Statistical properties of community structure in large social and information networks. In: Proceedings of the 17th International Conference on World Wide Web, ACM, New York, NY, USA, WWW ’08, pp 695–704

  • Leskovec J, Lang KJ, Mahoney M (2010) Empirical comparison of algorithms for network community detection. In: Proceedings of the 19th International Conference on World Wide Web, ACM, New York, NY, USA, WWW ’10, pp 631–640

  • Martin GR (1996) A game of thrones. Bantam Books, New York

    Google Scholar 

  • Martin GR (1999) A clash of kings. Bantam Books, New York

    Google Scholar 

  • Martin GR (2000) A storm of swords. Bantam Books, New York

    Google Scholar 

  • Martin GR (2005) A feast for crows. Bantam Books, New York

    Google Scholar 

  • Martin GR (2011) A dance with dragons. Bantam Books, New York

    Google Scholar 

  • McGrath C, Blythe J, Krackhardt D (1996) Seeing groups in graph layouts. Connections 19(2):22–29

    Google Scholar 

  • Newman MEJ (2003) The structure and function of complex networks. SIAM Rev 45(2):167–256

    Article  MathSciNet  MATH  Google Scholar 

  • Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69(026):113

    Google Scholar 

  • Nowicki K, Snijders TAB (2001) Estimation and prediction for stochastic blockstructures. J Am Stat Assoc 96(455):1077–1087

    Article  MathSciNet  MATH  Google Scholar 

  • R Core Team (2014) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, http://www.R-project.org

  • Rosvall M, Bergstrom CT (2008) Maps of random walks on complex networks reveal community structure. Proc Natl Acad Sci 105(4):1118–1123

    Article  Google Scholar 

  • RStudio Inc (2013) shiny: Web application framework for R. http://CRAN.R-project.org/package=shiny, R package version 0.4.0

  • Wasserman S, Faust K (1994) Social network analysis: methods and applications, vol 8. Cambridge University Press, New York

    Book  MATH  Google Scholar 

  • Xie J, Chen M, Szymanski BK (2013) Labelrankt: Incremental community detection in dynamic networks via label propagation. In: Proceedings of the workshop on dynamic networks management and mining, ACM, New York, NY, USA, DyNetMM ’13, pp 25–32

  • Zachary WW (1977) An information flow model for conflict and fission in small groups. J Anthropol Res 33(4):452–473

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Andee Kaplan.

Technical appendix

Technical appendix

gravicom utilizes three main pieces of software to establish interactive user control of a random graph as sketched out in Fig. 15, which are Shiny, D3, and igraph. These are used, respectively, for server/client interaction management, user interface and graph layout, and data formatting, respectively. In the following subsections, we describe the purposes of these three components in more detail.

Fig. 15
figure 15

Relationship between client and server, specifically focusing on how data travels between the two

There are very minimal software requirements for a user of gravicom. The client simply needs to have a JavaScript enabled internet browser with HTML5 compatibility, something which almost any modern browser fulfills (an exception is IE8 and below).

The server side requirements are more extensive, but this does not affect the user of gravicom, only those wanting to host their own instance of the application. To host gravicom, a Linux server is required, with the following installed:

  • Node.js (0.8.16 or later)

  • R (2.15 or later)

  • Shiny R package, installed into the machine-wide site library.

  • Shiny Server

1.1 Shiny

Shiny RStudio Inc (2013) is an R package created by RStudio that enables R users to create an interactive web application that utilizes R as the background engine. Through default methods to build user interface elements in HTML and a handle to the server side code, Shiny is a simple way to turn R code into a website.

gravicom uses the Shiny functionality to create user controls, pass correctly formatted data to the client, and as a means to display summary information regarding the user’s interactions with a graph at any point in time. In this context, Shiny serves as the translator between the formatted data and what the user sees and interacts with on his screen.

1.2 D3

D3 Bostock et al. (2011) stands for “Data Driven Documents” and is a JavaScript library developed and maintained by Mike Bostock with the purpose of visualizing and interacting with data in a web-based interface. It is freely available from http://www.d3js.org. The library facilitates manipulation of HTML elements, SVG (scalable vector graphics), and CSS (cascading style sheets) with the end goal of rendering animations and providing user interactions that are tied to the underlying data. The key idea behind the library is that Document Object Model elements are completely determined by the data. The Document Object Model (DOM) is a convention for representing and interacting with objects in HTML, XHTML and XML. So, rather than adding elements to a web page to be viewed by users, D3 allows users to see and interact with graphical representations of their data in a web framework.

Fig. 16
figure 16

Page lifecycle beginning from on load. User actions are highlighted in red, server actions in green, and actions completed on the client side are highlighted in blue (color figure online)

gravicom uses D3 to handle all graphical displays and user interactions with the graph. The data is passed to the client and able to be used through Shiny’s input bindings. It is crucial that the data has been formatted correctly at this point for the JavaScript to properly function. For this reason, we limit the file types being passed into the tool to a robust graph-specific type.

At this point in the page lifecycle, the graph nodes are tied to circles and the edges are tied to paths on the page. User manipulations such as selecting, dragging, and grouping are handled by D3 and data is passed back to the server via Shiny’s output bindings to allow for communication between user and the R engine underneath. This is illustrated in Fig. 16. What this means is that all visualization and user interaction with the graph are accomplished using JavaScript, more specifically the library D3. Shiny and R serve as the framework on which the data sits, but when the user touches the data they are doing so through the JavaScript elements.

1.3 igraph

igraph Csardi and Nepusz (2006) is a software package used for creating and manipulating undirected and directed graphs. It is a cross-language package available for C, R, python, and Ruby. igraph also supports multiple graph file formats and visualization of graph structures.

gravicom utilizes two parts of igraph, first is the conversion from a gml file to an XML file. The gml file format, short for Graph Modelling Language, is a hierarchical ASCII-based file format for describing graphs. Below is an example gml file of an undirected graph consisting of two nodes linked by a single edge. The important points to note are that node identifiers (id) have to be numeric. An edge consists only of source and target ids of the nodes it connects, while nodes can have other attributes, e.g. value in the example. For a directed graph, the parameter directed has to be set to 1, which will result in the edge information on target and source being evaluated accordingly.

figure a

For the conversion from an XML file to a JSON file we make user of the R package rjson Couture-Beil (2013). JSON is the native data format used in D3, which makes working with data in the D3 library incredibly straightforward. Here is our example in the finalized JSON format:

figure b

Our example data will yield the graph in Fig. 17.

Fig. 17
figure 17

Graph created from sample gml file

The second use of igraph within gravicom is to compute initial x and y coordinates for the nodes of the graph using a force-driven layout. This provides the initialization for the force-layout algorithm in D3. This reduces the computational load on the clients’ side and helps minimize unnecessary movement by the nodes. This is critical as the extra movement at the loading of the pages creates an unnecessarily chaotic start to the user’s experience.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Kaplan, A., Hofmann, H. & Nordman, D. An interactive graphical method for community detection in network data. Comput Stat 32, 535–557 (2017). https://doi.org/10.1007/s00180-016-0663-5

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00180-016-0663-5

Keywords

Navigation