Abstract
Network information has become a common feature of many modern experiments. From vaccine efficacy studies to marketing for product adoption, stakeholders aim to estimate global treatment effects: the difference between what happens if everyone in a network is treated and what happens if no one is treated. Because individual outcomes are potentially influenced by the treatments or behaviors of others in the network, experimental designs must condition on the underlying network. Social networks frequently exhibit homophilous community structure, meaning that individuals within observed or latent communities are more similar to each other than to individuals in other communities. This observation motivates the development of community-aware experimental design. This design recognizes that information is more likely to flow along within-community edges than along across-community edges. We demonstrate that this design reduces the bias of a simple difference-in-means estimator, even when the community structure of the graph must be estimated. Further, we show that the proposed design maintains its performance as the community detection problem becomes more difficult, or when the community structure does not affect the causal question.
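As a rough illustration of why community-aware assignment can reduce the bias of the difference-in-means estimator, the sketch below simulates a four-block stochastic block model. It compares individual Bernoulli randomization with randomizing whole communities. The sketch makes several assumptions. The communities are treated as known rather than estimated. Outcomes follow a hypothetical linear-in-means model, Y_i = alpha + beta*z_i + gamma*(share of treated neighbors) + eps_i. The model, the parameter values, and all helper names (`sample_sbm`, `community_design`, etc.) are illustrative choices, not the authors' simulation setup.

```python
import random
from statistics import mean


def sample_sbm(sizes, p_in, p_out, rng):
    """Sample an undirected stochastic block model; return (community labels, adjacency lists)."""
    labels = [b for b, size in enumerate(sizes) for _ in range(size)]
    adj = [[] for _ in labels]
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    return labels, adj


def outcomes(z, adj, eps, alpha=0.0, beta=1.0, gamma=2.0):
    """Hypothetical linear-in-means model: direct effect beta plus gamma * share of treated neighbors."""
    ys = []
    for i, nbrs in enumerate(adj):
        # Isolated nodes receive no spillover.
        share = sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        ys.append(alpha + beta * z[i] + gamma * share + eps[i])
    return ys


def diff_in_means(y, z):
    """Mean outcome among treated units minus mean outcome among control units."""
    treated = [yi for yi, zi in zip(y, z) if zi == 1]
    control = [yi for yi, zi in zip(y, z) if zi == 0]
    return mean(treated) - mean(control)


def bernoulli_design(labels, rng):
    """Treat each unit independently with probability 1/2, ignoring the network."""
    return [int(rng.random() < 0.5) for _ in labels]


def community_design(labels, rng):
    """Treat every unit in a uniformly chosen half of the communities; the rest are controls."""
    blocks = sorted(set(labels))
    treated = set(rng.sample(blocks, len(blocks) // 2))
    return [int(b in treated) for b in labels]


def average_bias(design, labels, adj, eps, gte, rng, reps=200):
    """Monte Carlo average of (difference-in-means estimate minus true global treatment effect)."""
    errors = []
    for _ in range(reps):
        z = design(labels, rng)
        errors.append(diff_in_means(outcomes(z, adj, eps), z) - gte)
    return mean(errors)


rng = random.Random(0)
labels, adj = sample_sbm([50] * 4, p_in=0.2, p_out=0.01, rng=rng)
# Fixed unit-level noise, treated as part of the potential outcomes.
eps = [rng.gauss(0.0, 1.0) for _ in labels]
n = len(labels)
# Global treatment effect: everyone treated versus no one treated.
gte = mean(outcomes([1] * n, adj, eps)) - mean(outcomes([0] * n, adj, eps))

bias_bernoulli = average_bias(bernoulli_design, labels, adj, eps, gte, rng)
bias_community = average_bias(community_design, labels, adj, eps, gte, rng)
print(f"GTE = {gte:.2f}, Bernoulli bias = {bias_bernoulli:.2f}, community bias = {bias_community:.2f}")
```

Under Bernoulli assignment, a unit's neighbors are treated at about the same rate whether or not the unit itself is treated. The spillover term therefore largely cancels in the difference, and the estimate recovers roughly the direct effect alone. Randomizing whole communities keeps most of a treated unit's neighbors treated and most of a control unit's neighbors untreated, so much more of the spillover is captured. Only the across-community edges contribute residual bias.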
Acknowledgements
The authors gratefully acknowledge financial support from the Statistical and Applied Mathematical Sciences Institute, the National Science Foundation (DMS 2046880), and the Army Research Institute (W911NF1810233).
Ethics declarations
Competing interests
The authors do not have any competing interests.
Code availability
Code for all simulation studies will be made available. No additional data were generated for this manuscript.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Mathews, H., Volfovsky, A. Community informed experimental design. Stat Methods Appl 32, 1141–1166 (2023). https://doi.org/10.1007/s10260-022-00679-6