Employing the Gini coefficient to measure participation inequality in treatment-focused Digital Health Social Networks

Digital Health Social Networks (DHSNs) are common; however, there are few metrics that can be used to identify participation inequality. The objective of this study was to investigate whether the Gini coefficient, an economic measure of statistical dispersion traditionally used to measure income inequality, could be employed to measure DHSN inequality. Quarterly Gini coefficients were derived from four long-standing DHSNs. The combined data set included 625,736 posts that were generated from 15,181 actors over 18,671 days. The range of actors (8–2323), posts (29–28,684), and Gini coefficients (0.15–0.37) varied. Pearson correlations indicated statistically significant associations between number of actors and number of posts (0.527–0.835, p < .001), and Gini coefficients and number of posts (0.342–0.725, p < .001). However, the association between Gini coefficient and number of actors was only statistically significant for the addiction networks (0.619 and 0.276, p < .036). Linear regression models had positive but mixed R2 results (0.333–0.527). In all four regression models, the association between Gini coefficient and posts was statistically significant (t = 3.346–7.381, p < .002). However, unlike the Pearson correlations, the association between Gini coefficient and number of actors was only statistically significant in the two mental health networks (t = −4.305 and −5.934, p < .000). The Gini coefficient is helpful in measuring shifts in DHSN inequality. However, as a standalone metric, the Gini coefficient does not indicate optimal numbers or ratios of actors to posts, or effective network engagement. Further, mixed-methods research investigating quantitative performance metrics is required.


Introduction
Technology is now an important component of the healthcare ecosystem, and both computer and management science are playing a greater role in analyzing data to measure efficiencies (Chaudhry et al. 2006; Harrison et al. 2007; Holden and Karsh 2010). The digital health industry is maturing, and as a result, highly tailored evidence-based behavior change programs are increasingly available to consumers through pharmaceutical companies, non-profit organizations, insurers, private corporations, and government entities.
A component of these programs is Digital Health Social Networks (DHSNs), otherwise known as bulletin boards or peer-to-peer support groups. While there are still no firm conclusions on how to determine their efficacy (Eysenbach et al. 2004; Graham et al. 2015), the general consensus is that social support and knowledge sharing increase patient education, enhance self-management, and decrease the burden on existing health services (Bender et al. 2013; Brennan et al. 1995; Cobb et al. 2011; Conrad et al. 2016; Ploderer et al. 2013; Takahashi et al. 2009; Wicks et al. 2012; Wright 2002).
Hypothetically, an ideal DHSN would consist of members who are equally engaged. In reality, network participation is unequal. An issue is that other than observing a network's number of actors and number of posts, there are few metrics that can be used to identify participation inequality.
To address this issue, some research has sought to define actor roles (Carron-Arthur et al. 2015; Cleary and Stanton 2015; Cunningham et al. 2008; Jones et al. 2011; Selby et al. 2010; van Mierlo et al. 2012). By systematically categorizing participants, taxonomies can give insight into how various actors in complex networks function in relation to one another.
Other research has explored network topologies. Some of these studies have employed traditional social network analysis, a method that focuses on nodes, ties, density of relationships, and degree centrality (Cobb et al. 2010; Urbanoski et al. 2016). Other streams have examined marketing rules of thumb (van Mierlo 2014), latent semantic analysis (Myneni et al. 2013), natural language processing (Wang et al. 2015), or the phenomenon of power laws (Carron-Arthur et al. 2014; van Mierlo et al. 2015).
As DHSNs shift into mainstream healthcare delivery, it will be important to develop metrics that help managers assess growth, sustainability, and participation equality (Healey et al. 2014; Stearns et al. 2014). While the quality of DHSN content is important and is rooted in behavioral science, quantitative methods to analyze the health of a network may come from established theoretical constructs in economics and computer science.
Through measuring 222 quarters of participation from 15,181 actors from four separate DHSNs, this paper investigates whether the Gini coefficient, an economic measure of statistical dispersion traditionally used to measure income distribution in populations, can be employed as a management tool to help measure inequality of member participation over time.

The Lorenz curve
In economics, the Lorenz curve is a popular method that illustrates income distribution (Lorenz 1905). Graphically, the y-axis represents the cumulative percentage of income in an economy, and the x-axis represents the cumulative percentage of households, ordered from lowest to highest income (Fig. 1).
In a perfectly equal economy, all citizens share the same income, and the Lorenz curve would resemble the red line in Fig. 1, where y = x. Most economies are not equal. In Fig. 1, the Lorenz curve (blue line) illustrates economic inequality. Here, approximately 20 % of households receive 1 % of income, 40 % receive 3 %, 60 % receive 8 %, and 80 % receive 25 %.
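As a rough illustration of how the points on a Lorenz curve are obtained, the following Python sketch sorts a set of incomes and accumulates population and income shares. The function name `lorenz_points` and the five household incomes are hypothetical; the incomes were chosen only so that the cumulative shares match the approximate values quoted above for Fig. 1.

```python
def lorenz_points(values):
    """Return (cumulative population share, cumulative value share) pairs."""
    ordered = sorted(values)          # lowest income first
    total = sum(ordered)              # assumes a non-empty list with a positive total
    n = len(ordered)
    points = [(0.0, 0.0)]             # every Lorenz curve starts at the origin
    running = 0
    for i, v in enumerate(ordered, start=1):
        running += v
        points.append((i / n, running / total))
    return points


# Hypothetical incomes (illustrative only, not data from this study).
equal = lorenz_points([10, 10, 10, 10, 10])    # lies on the line y = x
unequal = lorenz_points([1, 2, 5, 17, 75])     # bows below the line y = x
```

In the unequal case, the lowest 80 % of households hold 25 % of total income, so the curve lies well below the line of equality.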

The Gini coefficient
Developed by Corrado Gini in 1912 (Gini 1912), the Gini coefficient is an inequality measure based on the Lorenz curve. Specifically, it measures the distance between the Lorenz curve and perfect equality (Bellu and Liberati 2006). A Gini coefficient of 1 represents an economy where a single individual generates all income, whereas a Gini coefficient of 0 represents an economy where all citizens share the same income.
If translated to DHSNs, a Gini coefficient of 1 would represent a network where one individual created all posts. Alternatively, a Gini coefficient of 0 would represent a social network where all members authored the same number of posts.
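One common way to write this down, given here as a sketch rather than as the formula used in this study, is in terms of the Lorenz curve $L(x)$, where $x$ is the cumulative share of members. The symbols $n$ (number of actors) and $p_i$ (posts by the actor ranked $i$-th from lowest to highest) are notation introduced here for illustration.

```latex
G = 1 - 2\int_0^1 L(x)\,dx,
\qquad
G = \frac{2\sum_{i=1}^{n} i\,p_i}{n\sum_{i=1}^{n} p_i} - \frac{n+1}{n},
\quad p_1 \le p_2 \le \dots \le p_n .
```

Under this discrete form, equal posting ($p_1 = \dots = p_n$) gives $G = 0$, while a single actor authoring every post gives $G = (n-1)/n$, which approaches 1 only as the number of actors grows.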
To our knowledge, Gini coefficient numerical and visual outputs have yet to be applied to assess participation inequality in DHSNs. However, the method has been utilized in other studies.
For example, a 2005 study accessed U.S. census data to measure Personal Computer (PC) ownership inequalities amongst whites and African Americans at national, regional, and state levels (Chakraborty and Bosman 2005). Results indicated that although decreasing overall, PC ownership inequality is substantially smaller among white households. A strength of the study was the use of the Lorenz curve and Gini coefficient to graphically illustrate variations in income and PC ownership within and between the two groups.
Gini coefficient numerical scores and their visual representations were also utilized in a 2002 Statistics Canada research study depicting the digital divide, or cumulative internet usage amongst differing household income deciles (Sciadas 2002). Results indicate that as time progresses, the Gini coefficient is decreasing and the digital divide is closing. However, graphical outputs show that the shift is mainly attributable to middle-income groups, while lower-income and upper-income groups remain fairly stable.
Although there are other uses in research, a final example is a 2010 study analyzing university rankings. This study employed the Gini coefficient to assess whether academic institutions were becoming increasingly unequal (Halffman and Leydesdorff 2010). Ranking data mainly consisted of weighted contributions of total number of publications and citations, and findings indicate that contrary to popular belief, the 500 universities that publish at greatest frequency were becoming more equal in terms of output.
For the study duration, a staff of trained moderators monitored and maintained the four networks. Key moderator roles included ensuring compliance with network policies and user agreements, answering actor questions, guiding discussions when deemed appropriate, and offering technical assistance. For the purpose of analysis, moderator posts were removed from the dataset.
For each network, data sets were divided into annual quarters. Within each quarter, the total number of posts by each unique actor was calculated, and actors were ranked according to level of contribution. Next, the population in each quarter was divided into quintiles. Finally, Gini coefficients for each quarter were calculated. Pearson correlations and linear regression were used to assess the strength of relationships between Gini coefficient, actors, and posts.
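The quarterly procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the input format (pairs of actor identifier and posting date), the function names, and the sample posts are hypothetical, and the coefficient is computed directly from per-actor post counts rather than from the quintile grouping, whose exact use is not specified in the text.

```python
from collections import Counter, defaultdict
from datetime import date


def gini(counts):
    """Gini coefficient of non-negative counts (0 = perfectly equal)."""
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def quarter_key(d):
    """Map a date to (year, quarter), e.g. 2010-05-01 -> (2010, 2)."""
    return (d.year, (d.month - 1) // 3 + 1)


def quarterly_gini(posts):
    """posts: iterable of (actor_id, posting_date); moderators already removed."""
    per_quarter = defaultdict(Counter)
    for actor, posted_on in posts:
        per_quarter[quarter_key(posted_on)][actor] += 1
    return {
        q: {"actors": len(c), "posts": sum(c.values()), "gini": gini(c.values())}
        for q, c in sorted(per_quarter.items())
    }


# Hypothetical posts (illustrative only).
sample = [
    ("a", date(2010, 4, 1)), ("a", date(2010, 5, 1)), ("a", date(2010, 6, 1)),
    ("b", date(2010, 6, 2)),
    ("c", date(2010, 7, 1)),
]
summary = quarterly_gini(sample)
```

Each quarter's entry carries the three quantities compared later in the analysis: number of actors, number of posts, and Gini coefficient.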
All data collection policies and procedures adhered to international privacy guidelines.

Results
Table 2 displays the relationships between each quarter's Gini coefficient, number of actors, and number of posts. For each network, fluctuations in the quarterly Gini coefficient were graphed (Figs. 2, 3, 4, 5). Summary statistics were computed to identify means and standard deviations for each network's number of actors, number of posts, and Gini coefficient (see Table 3).
The DHSN with the fewest actors in any single quarter was the DC (n = 8), and the SSC was the DHSN with the most actors in any single quarter (n = 2323). The mean number of actors varied (40.8-304.7), as did the standard deviation (19.6-347.3).
The AHC and DC had the fewest posts in any single quarter (n = 29), and the SSC was the DHSN with the most posts in any single quarter (n = 28,684).
With regard to the Gini coefficient, three of the four DHSNs had at least one quarter at the highest observed level of inequality (0.37). Interestingly, each of the four social networks had quarters with similar lowest levels of inequality (0.15-0.19). The range of the Gini coefficient varied slightly (0.15-0.22); however, the mean (0.287 and 0.332) and standard deviation (0.032-0.058) did not.
Pearson correlations were then calculated to compare number of actors, posts, and Gini coefficient (see Table 4).
Pearson correlations showed positive and statistically significant relationships between number of actors and number of posts (0.527-0.835, p < .001), and Gini coefficient and number of posts (0.342-0.725, p < .001). However, the relationship between Gini coefficient and number of actors was only positive and statistically significant for the addiction networks (0.619 and 0.276, p < .036).
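For readers who want to reproduce this kind of comparison on their own quarterly series, a Pearson correlation can be computed with a few lines of standard-library Python. The function name `pearson_r` and the sample values are hypothetical, and the sketch returns only the coefficient, not the p-values reported above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)   # undefined if either series is constant


# Hypothetical quarterly values (illustrative only, not taken from Table 2).
actors_per_quarter = [1, 2, 3, 4]
posts_per_quarter = [1, 3, 2, 4]
r = pearson_r(actors_per_quarter, posts_per_quarter)
```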
Multiple regressions were then computed to examine the association between the Gini coefficient (dependent variable), and independent variables actors and posts (see Table 5).
Linear regression models had mixed R2 results (0.333-0.527). In all four regression models, the association between Gini coefficient and posts was statistically significant (t = 3.346-7.381, p < .002). However, as opposed to the Pearson correlations, the relationship between Gini coefficient and number of actors was only statistically significant in the two mental health networks (t = -4.305 and -5.934, p < .000).
With regard to collinearity, tolerance values were all above 0.10 (0.303-0.723), and variance inflation factors were low to moderate (1.384-3.297), indicating that multicollinearity is not a concern. However, as they did not approach a score of 2.0, Durbin-Watson statistics indicated the presence of autocorrelation (0.312-1.638).
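The two diagnostics reported here can be sketched in a few lines of Python. With exactly two predictors (actors and posts), tolerance equals one minus the squared correlation between them, and the variance inflation factor is its reciprocal; the Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals. The function names and input values below are hypothetical and are not taken from Table 5.

```python
def tolerance_and_vif(r_between_predictors):
    """Tolerance and VIF for a model with exactly two predictors."""
    tolerance = 1.0 - r_between_predictors ** 2
    return tolerance, 1.0 / tolerance       # undefined if |r| equals 1


def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 suggests little first-order autocorrelation."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical inputs (illustrative only).
tol, vif = tolerance_and_vif(0.5)
dw_alternating = durbin_watson([1, -1, 1, -1])   # strong negative autocorrelation
dw_constant = durbin_watson([1, 1, 1, 1])        # strong positive autocorrelation
```

Because the variance inflation factor is the reciprocal of tolerance, the two reported ranges convey the same information; values of the Durbin-Watson statistic well below 2, as reported above, point toward positive autocorrelation in the quarterly residuals.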

Discussion
The results of this study generate several unique contributions to DHSN research, all of which require further investigation.

Detecting shifts in social network inequality
From both visual and quantitative perspectives, the Gini coefficient was effective in identifying shifts and trends in inequality. However, as a standalone metric, the Gini coefficient can be deceptive, as a coefficient of 0.33 can be calculated from a network of 29 actors who created 321 posts (see Table 2, AHC period 10-Q2), or a network of 119 actors who created 2867 posts (see Table 2, SSC period 10-Q2).
Future research in developing metrics to determine social network inequality should incorporate the Gini coefficient, but also test ratios pertaining to number of posts per actor, number of posts per Gini score, number of actors to Gini score, or various combinations.
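As a simple illustration of the suggested ratios, the following Python sketch computes them for the two quarters cited above, which share a Gini coefficient of 0.33. The function name `companion_ratios` and the dictionary keys are hypothetical; the ratios themselves are untested candidates, not validated metrics.

```python
def companion_ratios(actors, posts, gini):
    """Candidate ratios to report alongside a quarter's Gini coefficient."""
    return {
        "posts_per_actor": posts / actors,
        "posts_per_gini": posts / gini,
        "actors_per_gini": actors / gini,
    }


# Actor, post, and Gini values as cited in the text for period 10-Q2.
ahc = companion_ratios(actors=29, posts=321, gini=0.33)
ssc = companion_ratios(actors=119, posts=2867, gini=0.33)
```

Although the two quarters have the same Gini coefficient, posts per actor differ by more than a factor of two (roughly 11 for the AHC and 24 for the SSC).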

Use of the Gini coefficient as a management tool
During the process of the study, informal qualitative in-person interviews were conducted with moderators and other management, and the visual depictions of shifts in quarterly Gini coefficient were deemed particularly helpful when recalling effects of technical outages, policy changes, the implementation of management issues and techniques, and dynamics of personalities within groups of actors.
In these informal meetings, discussion often centered on the status and intensity of participation from Superusers, actors who generate the greatest amount of network externalities (van Mierlo et al. 2012). Strategies to increase equality and engage non-Superusers were also discussed. Future research may investigate the value of utilizing the Gini coefficient as a tool to assist with network management.

Insights into social network utility and function
While there were consistent and statistically significant associations between Gini coefficient and number of posts in the Pearson correlations and the regressions, the associations between Gini coefficient and number of actors were inconsistent. It is interesting to note that this inconsistency was consistent for the two addiction DHSNs (AHC and SSC), and the two mood disorder DHSNs (DC and PC), and may potentially lend insight into quantifying the utility and function of DHSNs that promote different therapeutic approaches. For example, the AHC and SSC interventions focus on addictions, and the theoretical approach to treatment in these programs is the Stages of Change (Prochaska et al. 2008) and Structured Relapse Prevention (Sanchez-Craig 1995). Both treatment approaches are designed to assist users in developing coping skills to assist with strong, yet relatively short-term cravings. This is reflected in program content, which consists of short exercises that offer brief, A 2010 mixed-methods study on the SSC social network (Selby et al. 2010) analyzed demographic and smoking characteristics for both actors and non-actors, and qualitative analyses were conducted to explore themes in message content. Results indicated that the most frequent first posts were from new actors who were struggling with their quit attempts, and 90.6 % of responses to these posts were from experienced actors. The authors concluded that social support in the network was particularly beneficial to many new actors who short-term, time-sensitive assistance.
Conversely, the two mental health interventions (DC and PC) are heavily focused on Cognitive Behavioral Therapy (CBT) (Herbert and Forman 2011). At the core of each intervention are nine sessions that are designed to take a minimum of 9 weeks to complete. Members are given homework between each session, which involves intensive experiments, journaling, and self-reflection.
A 2013 mixed-methods doctoral thesis on the DC social network (Sugimoto 2013) found that DC actors generally sought informational support, emotional support, coaching support, and social companionship over long periods of time. Actors providing emotional support created an average of 8.3 posts, and those actors seeking support created an average of 5 posts. The author concluded that actor participation and social support in the network was long-term, and contributed to the everyday lives of actors.

Future directions
Future research may investigate potential relationships or patterns between Gini coefficient and number of actors in DHSNs leveraging differing therapeutic approaches. While this study focused on calculating the Gini coefficient over annual quarters, future research may experiment with calculating the Gini coefficient over shorter or longer time periods.

Strengths and limitations
The main strengths of this paper were the number of study participants, the longevity of the DHSNs, the number of posts, and the four separate indications, with half of the social networks focused on mental health and the other half on addictions. We are not aware of any other study in the healthcare literature with such an extensive and complete data set. Both a strength and a limitation is that the populations analyzed are self-selecting populations that actively sought help. In the context of this study, it was helpful to have data sets of active and engaged participants. However, these results will not be indicative of populations of patients in health plans, hospital networks, or mass public health campaigns.

Conclusion
The Gini coefficient is helpful in measuring shifts in DHSN inequality. However, as a standalone metric, the Gini coefficient may be misleading, as it does not indicate optimal numbers or ratios of actors to posts, or effective network engagement.
From a management perspective, the Gini coefficient may be leveraged as a tool to assist moderators in detecting trends, or as a training tool to help explain how network inequality fluctuates.
The results of this study may support prior mixed-methods research on two of the four social networks (Selby et al. 2010; Sugimoto 2013), which found differences in social network utility, functionality, and tone.
Further research investigating quantitative scoring techniques, performance metrics, and optimization benchmarks is required.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.