Causal inference in collaboration networks using propensity score methods

Using panel data of school-class networks of 11–13-year-old students, this study investigates effects of schoolwork collaboration networks on grades and school-related well-being. It suggests propensity score weighting regression as a method of causal inference for data collected in social contexts, and for studies analyzing node attributes as outcomes of interest. It will be argued that this alternative approach is useful when stochastic actor-based models (SAOMs) show convergence problems in sparse networks. Three methods of causal analysis dealing with the problems of endogeneity bias and interference between observations will be discussed in this study. First, SAOMs for the co-evolution of networks and behavior/attitudes will be estimated, but this results in a systematic loss of data. Second, propensity score matching compares treated cases with untreated nearest neighbors. However, the stable unit treatment value assumption (SUTVA) requires that the final analysis controls for network embeddedness. Third, this is possible by using propensity score weighting regression, which is a flexible approach to capture treatment diffusion via multiplex networks.


Introduction
Human communities depend on diffusion of information through networks, social influence (Henrich 2016) and social exchange (Windzio 2018). While most social network analyses were rather descriptive in the 1960s and 1970s (Prell 2012), recent studies focus on network diffusion (Valente 1995) and the co-evolution of network ties and behavior (Snijders et al. 2010). New methods for causal inference sensitized researchers to the pitfalls of deriving causal conclusions from cross-sectional data, but also from longitudinal regression models (Morgan and Winship 2007; Brüderl and Ludwig 2014, 331p; VanderWeele and An 2014). Causal inference is important in research fields where social influence, contagion or diffusion regularly occur. Nevertheless, the development of longitudinal methods for the analysis of networks took place more or less separately from innovations in methods for causal inference. Surprisingly, appropriate methods of causal inference for network data are rather new (Robins 2015, 216 pp), and only a few network researchers seem to be familiar with the literature on causal inference (VanderWeele and An 2014; An 2018; Aral and Nicolaides 2017).
Studies on peer influence face the challenge of disentangling selection and influence (Ragan et al. 2019): cross-sectional correlations between ego's and alter's characteristics do not distinguish between whether we select our peers with respect to certain characteristics, or whether we assimilate towards our peers' characteristics (Shalizi and Thomas 2011). Undoubtedly, stochastic actor-oriented models (SAOMs) for the co-evolution of networks and behavior (Snijders et al. 2010) are a breakthrough for empirical research. In some cases, however, SAOMs are prone to convergence problems if networks are either sparse or do not show an appropriate ratio of stability and change. Excluding non-converging networks from the analysis can lead to considerable loss of data, which is why 'conventional' methods of causal analysis such as propensity score matching (PSM) are worthy of consideration, although network data violates the assumption of independent observations. In a recent contribution, Ragan et al. (2019) showed that conventional methods of panel data analysis, namely random effects, hybrid fixed effects and lagged hybrid fixed effects do not tend to overestimate peer influence compared with SAOMs. Nevertheless, in most network data the stable unit treatment value assumption (SUTVA) is violated and causal inference from methods that do not explicitly account for the embeddedness of dyads in the surrounding network structure should be interpreted with caution. According to the SUTVA, causal inference e.g. by using propensity score matching is only reliable when there is no diffusion of relevant information between cases in the treatment and in the control group.
The present study investigates the challenge of estimating causal effects of ties in dyadic collaboration-networks. In a first step of this study, SAOMs for the co-evolution of networks and behavior will estimate the effect of schoolwork collaboration on grades and well-being in school. Non-convergence of SAOMs (VanderWeele and An 2014: 368) indeed results in a considerable loss of data, for instance due to sparseness of ties in the schoolwork network. SAOMs are also vulnerable to omitted variable bias (Shalizi and Thomas 2011, 218;Robins 2015, 220;Ragan et al. 2019, 25)-e.g. when an explanatory variable x is correlated with the error term e (corr(x,e) ≠ 0), and is contaminated with (unobserved) information related to x and the dependent variable y. Secondly, propensity score matching will be used to estimate the causal effect of ties in the schoolwork-network on grades and on well-being. Since this approach suffers from the violation of the SUTVA and does not account for inherent spill-over from neighboring dyads, propensity score weighting regression (Morgan and Winship 2007;Guo and Fraser 2010) will be suggested as an alternative third approach. In line with the model suggested by  propensity score weighting regression can account for multiplex networks, in this case for the embeddedness of collaborative dyads into friendship networks. Friendships between treated and non-treated students can be controlled in order to account for potential 'diffusion' of information across groups. The method captures at least partially the effect of contact among observations and is more in line with the SUTVA. Further developing propensity score weighting regression and related methods for the analysis of causal effects in network data can be a fruitful alternative to SAOMs in situations of limited data or sparse networks.

Analyzing outcomes of network ties
Communication among adolescents in networks is considered increasingly important e.g. for political mobilization (Saud 2018; Ida et al. 2020) and cooperation. Over the last three decades, school-children's group-work became a growing research field in education science (Howe and Tolmie 2003). The outcome of interest is usually the development of academic performance (e.g. grades) (Webb 1989; Crosnoe 2000; Lubbers 2004). In addition, social benefits and pupils' well-being are desired results of group work, since they affect the learning environment in the classroom (Howe and Tolmie 2003). Providing causal evidence on the effects of group-work networks is far from trivial. When two pupils become involved in networks of schoolwork collaboration, they might also have similar attitudes towards academic issues. This selectivity is prone to endogeneity bias (Morgan and Winship 2007: 77p) in the estimation of a causal effect. Gremmen et al. (2017) analyzed the co-evolution of adolescents' friendship and academic achievement. According to their six-wave network analysis in first and second year secondary school, students select friends on the basis of alters' grades. Subsequently, their own grades develop in the same direction as their alters' grades ("First selection, then influence") (Gremmen et al. 2017). To date, systematic analyses of networks of schoolwork collaboration are rare. Schoolwork-networks have been analyzed (Windzio 2013; Ivaniushina et al. 2016), but not the impact of these networks on outcomes.
Studies on the diffusion of knowledge, behaviour or attitudes are well established (Rogers 2003), even though the substantial social structure through which diffusion proceeds, namely the network, was not systematically considered before T. Valente's work (Valente 1995). In the early standard models of network diffusion, the hazard rate of adoption at time t depends on ties to e.g. infectors or opinion leaders at t − 1. However, there was no systematic treatment of selection processes due to the characteristics of interest, e.g. when non-infected persons get into contact with infected persons in order to care for them, and thereby contract the disease. Criticism of Christakis and Fowler's (2007) study on diffusion of obesity through networks sensitized researchers to the problem of latent homophily when making causal inference in network-diffusion studies. Shalizi and Thomas (2011) demonstrated how difficult it is to statistically disentangle selection and influence even in longitudinal settings, and how vulnerable diffusion models in general are to omitted variable bias.
Analysing the diffusion of a mobile service application in a large global instant messaging network, Aral et al. (2009) used propensity score matching methods (PSM). The treatment was the presence of adopters in the subjects' local messaging network, and was predicted by a vector of behavioral and demographic covariates of the respective individual at time t (Aral et al. 2009). The authors conclude that not accounting for selection by using PSM would lead to a 700% over-estimation of the treatment effect. Arpino et al. (2017) combined social network data with PSM and analyzed effects of countries' GATT membership (General Agreement on Tariffs and Trade) on global economic interdependence. Units of analysis were 1319 country dyads in the year 1954. The outcome of interest was the log of trade flows in the respective dyad at t + 1. Their study is an important substantial contribution to the field, but also to the methodological problem of applying propensity score methods to network data. When predicting the propensity score, Arpino et al. (2017) controlled for a set of network statistics such as a node's degree-centrality and local and global clustering. The authors assume that the statistical non-independence of dyads would be controlled when computing the treatment by accounting for the network characteristics in the selection model.
Propensity score matching methods of causal inference for networks imply the 'no interference' assumption: subjects are independent of each other and there is no diffusion of information from treated to control cases (VanderWeele and An 2014). More precisely, the stable unit treatment value assumption (SUTVA) states "… that the value of Y for unit i when exposed to the treatment w will be the same no matter what mechanism is used to assign treatment w to unit i and no matter what treatments the other units receive" (Guo and Fraser 2010: 35). In other words, it is assumed "… that there is no contamination, no information shared, between treated and untreated matched samples …" (Barringer et al. 2014, p. 18). Ruling out contamination and diffusion is obviously difficult in social network studies, but the problem also affects studies on school classes where networks remain unobserved, but where social processes which could be represented by networks nevertheless exist.
A revealing example of how information spreads through networks and thereby affects non-treated subjects comes from a smoking-prevention intervention study. Regarding interference itself as the outcome of interest, the study showed how friendship ties significantly increased the log odds of receiving information from an intervention brochure (An and VanderWeele 2019). Whilst interference is a nuisance in many research settings when it violates the SUTVA, it is here the outcome of interest. An (2018) explicitly measured information-diffusion in the survey, but admits that "… treatment diffusion data may not be readily available" in most studies (An 2018: 172). Figure 1 shows a school-class network where black lines represent ties of schoolwork collaboration and grey lines children's friendships. Having network ties in two different dimensions-friendship and schoolwork collaboration-is called "multiplexity" in social network terminology. The issue of contamination and shared information becomes obvious in Fig. 1: dyad 1 is just a friendship, there is no "treatment" by joint schoolwork collaboration. If both students in dyad 2 benefit from collaboration (see below), then also dyad 1 will show an average increase in competence because the upper-left actor of dyad 2 is also involved in the untreated dyad 1. There is contamination between these two dyads. Furthermore, information from a treatment could also pass several steps through the network. Violation of the SUTVA might be an issue in the studies of Aral et al. (2009) and Arpino et al. (2017), where this problem has not been systematically discussed.
Units of the following PSM analysis are dyads. The specific characteristic of collaboration networks is the (potential) mutual benefit of both actors in a collaborative dyad. Figure 2 shows four possible dyadic constellations A-D of a collaboration effect in schoolwork networks.
In scenario A of symmetric collaboration, the dark grey small circles indicate a gain in competence for ego and alter. This is the ideal case of collective good-generation. In scenario B there is an asymmetric, unidirectional transfer of competence from the left to the right actor. Scenario C is asymmetric as well, but due to the collaboration the left actor also further increases his or her competence, since explaining academic issues usually leads to a consolidation of skills and knowledge on the sender's side as well. Finally, the sender in D benefits, but does not succeed in transferring competence to the potential receiver.
There is a specific problem of causal inference of dyadic collaboration-effects in a network N. With respect to the average benefit in a dyad, the empty dyad {d, i} ∈ N in Fig. 2 benefits from the collaboration in {c, d} ∈ N, even though actors in {d, i} ∈ N do not collaborate. If, however, collaboration generates a collective good the outcome is dyadic. While this is not a problem for the SAOM (see below) because it is actor-oriented, it has consequences for propensity score matching based on dyads. Propensity score weighting regression, in contrast, allows addressing this problem by controlling the relevant covariates. Figure 3 summarizes the analytical approach graphically. The first step is the SAOM. In step 2, a probit p* model will be estimated to predict the selectivity of ties in schoolwork networks also by parental contact and children's friendship (Table 1). Since the causal order between ties in these network dimensions could also be reversed, observed values of the respective independent variable (friendship, parental contact) have been replaced by an instrumented variable, where the instruments are derived from the models in Table 6 in the "Appendix". Ties in the parental contact and friendship network have been predicted by network self-organization (Lusher et al. 2013; Windzio 2015), namely by network structural effects of mutuality, 2-in- and 2-out-stars, transitive and cyclic triads and same-sex (the latter in the parental contact network only). What is the reason to estimate the models in Table 6 ("Appendix")? They predict the propensity of a tie in a friendship or parental contact network basically by 'network self-organization'. Results are instrumented variables, which will be used as explanatory variables in the probit model in Table 1, instead of the observed network ties (Windzio 2015, and see below).
Fig. 2 Modes of competence generation in collaborative dyads. A Both actors benefit; B one is sender, the other is receiver, only the receiver benefits; C one is the sender, the other is the receiver, both benefit; D sender tries to explain and benefits, but the receiver is desperate

Modeling approach
These models allow a good prediction of the propensity to form a tie in the schoolwork collaboration network used in the matching analysis. Propensity scores are predicted separately for different subgroups in order to get the conditional average treatment effect on the treated (CATT) in step 2 (columns (2) and (3) in Table 1). The
In the first approach, stochastic actor-based models of co-evolution of networks and behaviour are applied to disentangle effects of selection and influence in longitudinal network data (Snijders et al. 2010). Losing a large amount of data in the SAOM can induce bias, which is why, secondly, propensity score matching is used (step 2). This method is based on the 'stable unit treatment value assumption' (SUTVA): there should not be any interaction between units from treatment and control group (Morgan and Winship 2007; Gangl 2010). Moreover, having social network information in the data allows at least to control for the embeddedness of dyads in the surrounding social network, e.g. in transitive and cyclic triads, 2-in- and 2-out-stars and patterns of mutuality. This will be the third approach, the propensity score weighting regression model (Guo and Fraser 2010). What are the advantages of propensity score weighting regression?
1. Predicting the propensity score by a p* model for ties in networks takes the statistical non-independence of dyads into account. However, the final estimation of the average treatment effect on the treated (ATT) still assumes statistical independence of dyads. By using propensity score weighting regression researchers can control for a large set of independent variables in the second step, and thereby can take the embeddedness of a dyad into the surrounding subnetwork into account, e.g. diffusion of information of the treatment 'schoolwork collaboration' via friendship networks. Arpino et al. (2017: 545) mention propensity score weighting regression as an option, but regret at the same time: "Unfortunately, there is little guidance on how to select between propensity score methods". Their study, too, might have benefited from the advantage of including further control variables in a regression model.
2. The propensity score weighting regression model can account for the multiplexity of social networks (VanderWeele and An 2014: 370) by including these networks into the final estimation. Depending on the research question, contact and interference (VanderWeele and An 2014: 353) in different network dimensions and in various forms, also between treated and untreated, can be included into the final model.
3. In a longitudinal setting, the final estimation of the ATT can account for a lagged dependent variable in the propensity score weighting regression and thereby allows at least for a testing of the causal effect (VanderWeele and An 2014: 366). Furthermore, a longitudinal analysis can use a differences-in-differences (DiD) estimator, which is a further advantage in the identification of a causal effect.
4. Controlling for the network embeddedness in propensity score weighting regression is particularly important if bias-reduction due to propensity score matching is insufficient. Appropriate matching usually reduces bias for (observed) variables that induce the selectivity of the treatment. In some cases, however, the matching procedure does not sufficiently reduce the bias for indicators of network embeddedness, which might be due to a strong dependence of observations in networks. Instead of simply accepting insufficient bias reduction for network indicators and just proceeding with propensity score matching, or abstaining from causal analysis, propensity score weighting regression allows controlling for these network indicators in the final prediction of the outcome of interest.

Data and measurement
Our school-survey collected three-wave panel data on 1676 students between 2010 and 2012 in grades 5-7 in two cities in northern Germany (Windzio 2018). Respondents were 10-12 year-old pupils. The population consisted of 149 fifth-grade school-classes, out of which 94 classes in 55 registered schools participated in the first wave (time 1 in 2010). The response varied between these three waves; 1087 children in 58 school-classes completed the questionnaire in wave 2, and 1561 children from 65 classes in wave 3. The majority of school principals was willing to cooperate, but teachers could decide on participation. Non-response occurred predominantly at the class-level. At the children's level, response rates varied between 75.4% (wave 1 in 2010), 80.4% (wave 2 in 2011) and 80.4% (wave 3 in 2012). Only classes where either N = 17 or 75% of pupils were present during the survey were included in the network analysis. Moreover, since the propensity score analysis uses longitudinal information for the differences-in-differences estimation (see below), only classes that participated in the first two waves could be used. As a result of this selection rule and of item non-response among pupils, N = 382 pupils and 6170 dyads in 26 classrooms were potentially available (see "Appendix" Table 5). Moreover, for the SAOM the data was limited to classes where information on all three waves was available. Due to convergence issues of the SAOM the sample has been limited to 6-12 classes. The mean number of students in a class is 23.7 (grade) and 24.4 (well-being), with a range between 20 and 30, and the total number of students varies between N = 148 and N = 281. During the class interview pupils filled out the questionnaires under the interviewers' guidance. To guarantee the anonymity of the information, clearly visible ID numbers were placed on the pupils' desks during the survey. 
The respective pupil's own ID number was entered into the questionnaire, and the network contacts with classmates were recorded by entering their ID numbers.

Methods
Processes of network self-organization motivate the inclusion of structural parameters in models predicting ties in networks. One important self-organizing process in networks is closure, especially in terms of transitive triads (Lusher et al. 2013; Windzio 2015). Such processes are considered "…'purely structural' effects because they do not involve actor attributes or other exogenous factors. … the network patterns arise solely from the internal processes of the system of network ties" (Lusher et al. 2013: 23). We take advantage of the 'purely structural' internal processes by constructing an instrument in step 2 (Fig. 3). This variable is the predicted propensity of a tie e.g. by triadic closure (see Table 6, "Appendix", for the models). Only the particular component of a change in x (e.g. parental contact) which results from a change in z (e.g. transitive closure) will be used to predict y (e.g. a tie in the friendship network) (Morgan and Winship 2007, 190). The instrumental variable estimator β_IV is thus:
$$\beta_{IV} = \frac{dy/dz}{dx/dz}$$
Transitivity-based predictions in the friendship and parental contact networks will be used as explanatory variables in the p* model to predict ties in the schoolwork network (Table 6, "Appendix"). Subsequently, predictions from the p* models in Table 1 will be used to compute a propensity score (Guo and Fraser 2010) (step 2 in Fig. 3). The p* model is a pseudolikelihood estimation of the probability of a tie in a network and is a variant of an exponential random graph model (ERGM) (Harris 2014: 23).
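The ratio form of the instrumental variable estimator can be illustrated with a minimal sketch. It assumes a single instrument z and linear relations, so that dy/dz and dx/dz reduce to covariances with z; the function names and the toy data are hypothetical, and the study itself derives its instruments from the p* models in Table 6 rather than from this simple ratio.

```python
def covariance(a, b):
    """Sample covariance of two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def iv_estimate(z, x, y):
    """Ratio-type IV estimate: (dy/dz) / (dx/dz) = cov(z, y) / cov(z, x)."""
    cov_zx = covariance(z, x)
    if cov_zx == 0:
        raise ValueError("instrument z is unrelated to x")
    return covariance(z, y) / cov_zx


# Toy data (hypothetical): x = 1 + 2z and y = 2 + 3z, so the ratio is 3 / 2.
z = [0.0, 1.0, 2.0, 3.0]
x = [1.0, 3.0, 5.0, 7.0]
y = [2.0, 5.0, 8.0, 11.0]
beta_iv = iv_estimate(z, x, y)
```

Only the part of the variation in x that moves with z enters the estimate, which is the idea behind using transitivity-based predictions instead of observed ties.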
The first approach to analysing the causal effect of schoolwork collaboration on grades and well-being is the stochastic actor-oriented model (SAOM) (step 1 in Fig. 3), which disentangles the effects of selection into network ties and social influence (Snijders et al. 2010). The algorithm (SIENA) simulates changes of networks between discrete states by assuming continuous micro-steps of actors' decisions between measurements. The resulting coefficients represent the log odds of observing a tie in the network and the effects on behavioural change.
For the second approach (step 2 in Fig. 3), propensity scores have been predicted from the subgroup-specific probit p* models in Table 1. Figure 4 compares the distribution of the propensity score between the treated and the non-treated in the overall sample, whereby treatment of a dyad is defined as having a tie in the schoolwork network. The overlap of the distributions indicates a good condition for propensity score matching analyses. The average treatment effect on the treated (ATT) will be estimated in two variants (Gangl 2010): First, nearest-neighbour matching with four observations of the non-treated as matches (NN+4) will be used to estimate effects on grades or well-being at time 2. Second, the same procedure will be applied to compute differences-in-differences (DiD). The DiD approach also includes a lagged dependent variable. Different variants will be estimated by using calipers with values of 0.2 and 0.01. Calipers are restrictions on the maximal distance between nearest neighbours. If the distance is exceeded, cases will not be matched even if they are nearest neighbours (Guo and Fraser 2010: 147).
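The matching rule described above can be sketched as follows. This is an illustration under stated assumptions, not the estimation routine used in the study: matching is done with replacement, the caliper is applied to the absolute difference in propensity scores (some implementations scale it by the standard deviation of the score), and all function names and toy values are hypothetical.

```python
def nn_match(treated_scores, control_scores, k=4, caliper=0.2):
    """For each treated unit, return indices of up to k nearest controls
    whose propensity score lies within the caliper (with replacement)."""
    matches = []
    for p in treated_scores:
        # Rank all controls by distance to the treated unit's score.
        ranked = sorted(range(len(control_scores)),
                        key=lambda j: abs(control_scores[j] - p))
        # Drop candidates beyond the caliper, even if they are nearest.
        within = [j for j in ranked if abs(control_scores[j] - p) <= caliper]
        matches.append(within[:k])
    return matches


def att_from_matches(y_treated, y_control, matches):
    """Mean difference between each treated outcome and the mean outcome
    of its matched controls; unmatched treated units are skipped."""
    diffs = []
    for y, matched in zip(y_treated, matches):
        if matched:
            control_mean = sum(y_control[j] for j in matched) / len(matched)
            diffs.append(y - control_mean)
    return sum(diffs) / len(diffs) if diffs else float("nan")


# Toy example (hypothetical values): the second treated unit has no
# control within the caliper and therefore drops out of the estimate.
matches = nn_match([0.5, 0.9], [0.46, 0.55, 0.1, 0.52], k=2, caliper=0.1)
att = att_from_matches([3.0, 2.0], [2.0, 1.0, 4.0, 3.0], matches)
```

Replacing the outcome at time 2 by the change between time 1 and time 2 would turn the same routine into a DiD-type comparison.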
This propensity score will also be used to create the probability weight for propensity score weighting (Guo and Fraser 2010: 173) (step 3 in Fig. 3). The propensity score weight ω was constructed as the indicator W of the treatment (0 or 1), plus 1 minus the indicator W times the probability of the treatment ê, divided by 1 minus the probability of the treatment ê, i.e. ω = W + (1 − W) · ê / (1 − ê) (Guo and Fraser 2010, p. 161).
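The verbal definition of the weight can be made concrete with a short sketch. It assumes the reading ω = W + (1 − W) · ê / (1 − ê), under which treated dyads receive weight 1 and untreated dyads are weighted by their odds of treatment; the helper names and toy values are hypothetical, and the study's final models are weighted regressions with further covariates rather than this simple difference in means.

```python
def att_weight(treated, e_hat):
    """Propensity score weight for the ATT: W + (1 - W) * e / (1 - e)."""
    if treated not in (0, 1):
        raise ValueError("treated must be 0 or 1")
    if not 0.0 <= e_hat < 1.0:
        raise ValueError("e_hat must lie in [0, 1)")
    return treated + (1 - treated) * e_hat / (1.0 - e_hat)


def weighted_att(outcomes, treated, e_hats):
    """Treated mean minus the weighted mean of the untreated outcomes."""
    weights = [att_weight(w, e) for w, e in zip(treated, e_hats)]
    t_sum = sum(y for y, w in zip(outcomes, treated) if w == 1)
    t_n = sum(treated)
    c_sum = sum(y * wt for y, w, wt in zip(outcomes, treated, weights) if w == 0)
    c_wt = sum(wt for w, wt in zip(treated, weights) if w == 0)
    return t_sum / t_n - c_sum / c_wt


# Toy data (hypothetical): two treated and two untreated dyads.
estimate = weighted_att([3.0, 2.0, 1.0, 2.0], [1, 1, 0, 0], [0.6, 0.4, 0.5, 0.2])
```

Untreated dyads that resemble treated ones (high ê) count more, which is what makes the weighted comparison group mimic the treated group.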
If relational information on students' ties is available, we can control for simple forms of embeddedness of dyads into the surrounding network. We control for mutuality, 2-in-stars, 2-out-stars, transitive triads, and cyclic triads in the friendship network. In addition, we can measure the number of each student's friends who belong to the treatment group. Interacting this indicator with the treatment status captures at least to some degree the social embeddedness and thereby the possible diffusion of information between treated and untreated dyads. Table 1 shows predictions of propensity scores by three probit p* models. We see that the instrumented variable 'parental contact (log)(IV)' (predicted by models in Table 6, "Appendix") has a positive and highly significant effect on ties in the schoolwork network (all: 0.126***; at least one migrant: 0.132***), except for whole day schools. Similarly, the effect on ties in the friendship network is significantly positive in all three models. Ties in the friendship network might capture a considerable part of homophily. Spatial proximity matters for ties in all three networks: if ego lives close to alter (walk within 5 min) the propensity of a schoolwork tie is considerably increased (e.g. at least one migrant: 0.613***). We also see that the effect of similarity in grade point average is significant (10% level) only in the subset of dyads where at least one node is a migrant (0.135+). As known from other social network studies, there is a positive effect of 'same sex' on ties in all three network-dimensions (all: 0.457**). Similarity in the (negative) learning self-concept and in the number of books at home are not significant at the 5% level. There is a strong tendency towards mutuality in all three network-dimensions. The effect of 2-in-stars as an indicator of prestige is negative in all three networks, but insignificant in dyads with at least one migrant, whereas 2-out-stars are generally insignificant. 
We find the expected effects of transitive (positive) and cyclic triads (negative) for all three models (Robins 2015).

Results
The first approach of the causal analysis is the SAOM (step 1 in Fig. 3). In Table 2 the behaviour change variable in Models 1-2 is the grade point average in mathematics, German and English. The outcome in Models 3-4 is the level of well-being in school, ranging from 1 to 10 (see Table 5, "Appendix"). There is a significantly positive effect of having or establishing a reciprocal tie in the schoolwork network. Similarly, transitive triplets show a positive effect, indicating the common tendency towards triadic closure. Aside from that, there are not any other significant effects in the network-parts of the models. In the behavioural change-equations (grades and well-being in school) we find negative quadratic shape effects in Models 1-2. Accordingly, there seems to be a significant decrease in average academic performance over time.
The most important information is the absence of effects of networks on behaviour in the dimensions 'grades' and 'well-being in school'. If there were noteworthy social influence via grades or well-being, we would have found positive effects of 'total alter' and/or 'average alter', which is not the case. What would positive effects indicate? In case of 'total alter' it would mean that ego adapts to the sum of the respective behaviour values of those alters with whom ego is connected; 'average alter' is the average of these values.
The number of networks used in the meta-analysis varies between k = 6 (N = 148 students) and k = 12 (N = 281 students), depending on the number of non-converging schoolwork networks, which were all of comparatively low density. Overall, the survey data includes 21 school-classes with data for 3 waves, so the loss of data is considerable, and might not be at random. Table 3 shows the results of the propensity score matching analysis (step 2 in Fig. 3). Differences-in-differences (DiD) test whether the outcome in a dyad has increased between t 1 and t 2 due to schoolwork collaboration. The propensity score has been estimated from the probit p* model for schoolwork networks at t 1 (see Table 1). In the lower panel of Table 3, conditional average treatment effects on the treated (CATT) are shown for dyads where at least one node is of migrant background, or where students learn in whole-day schools (see Table 1, model 'at least one migrant', and model 'whole day').
Results in Table 3 are again pessimistic about effects of schoolwork collaboration on both grades and well-being. Balancing the sample by matching leads to an insignificant ATT on grades (− 0.029, t = − 0.56). The DiD is insignificant as well, and small in magnitude (0.011, t = 0.35). Moreover, there even tends to be a negative effect of the DiD estimator of schoolwork ties on well-being (− 0.277, t = − 1.76 +). This general pattern does not change when we introduce calipers of 0.2 or 0.01. The CATTs are not significantly positive for any outcome; there is no evidence of a benefit from being involved in schoolwork-dyads. In contrast, the effect of schoolwork collaboration even turns negative in whole day schools when we introduce a caliper of 0.01 (− 0.270, t = − 2.97**).
Table 2 SAOM: effects on ties in homework networks, grades and well-being. Results of SIENA co-evolution model and random-effects meta-analysis, grades 5, 6, 7; k = no. of networks; N = no. of students. + p < 0.10, *p < 0.05, **p < 0.01, ***p < 0.001
Objections against this analysis could be that (a) the estimation violates the SUTVA and (b) untreated dyads will also gain in competence (averaged between ego and alter) if at least one node is connected to a (successfully) treated dyad (see Fig. 2). If network information is available, it can be used to mitigate the consequences of the SUTVA violation when estimating the ATT. The propensity score weighting regression (Guo and Fraser 2010: 161) allows controlling for the embeddedness of dyads into the surrounding network, which can be particularly important if the amount of bias reduction due to matching is insufficient. For instance, according to a common rule of thumb, a residual bias after matching of 5-8% is acceptable [but recent studies have shown that the balancing should be better (Gangl 2014: 261)]. But if the bias is above a certain threshold, should researchers then abstain from a causal analysis?
Propensity score weighting regression can control for variables that still show considerable bias after matching. Table 7 ("Appendix") shows the remaining bias estimated in a post-matching test where variables indicating network embeddedness have been excluded from the probit model (not shown in Table 1). Table 8 shows the remaining bias when the network variables are also included. In the first case, the remaining median bias is 6.7, in the second case 8.0, so the remaining bias after matching is at the upper limit. Instead of abstaining from the matching analysis, propensity score weighting regression (step 3 in Fig. 3) offers the opportunity to control the bias-inducing variables in the final model. In order to take the network embeddedness into account in the final prediction of the ATT, effects of a dyad's embeddedness in the friendship network, namely mutuality, 2-in-stars, 2-out-stars, transitive triads and cyclic triads have been controlled. A further control variable is the number of friendship ties each student has with treated students ('no. of treated friends'). Similar to the embeddedness effects, in combination with an interaction effect with the treatment status ('no. of treated friends X treated'), the main effect controls the number of 'treated' friendships for those who did not receive treatment, which indicates the opportunities for the 'diffusion' of treatment information to non-treated cases. 
Table 3 Effects of network ties on grades and well-being at t 2 and DiD (Δ: differences in means, t: t-statistics), nearest-neighbor propensity score matching. Units are dyads. Higher values of grade/well-being indicate better performance/higher well-being
Table 4 The effect of ties in schoolwork networks on grades and well-being in school. ATT from propensity score weighting. Units are dyads. + p < 0.10, *p < 0.05, **p < 0.01, ***p < 0.001, standard errors corrected for clustering in persons
In addition, the analysis controls for whole day schools, the interaction 'treatment X whole day schools', number of books at home of ego and alter, migration background, negative learning concept of ego and alter, spatial proximity (ego lives close to either), as well as ego's 'maternal control of leisure time' and 'mother helps with schoolwork if needed' (see Table 5, "Appendix" for the coding). Table 4 shows three propensity score weighting regression models each for grades and for school-related well-being. The first model in each set (Models 1 and 4) predicts the outcome at t2. Each second model (Models 2 and 5) contains a lagged dependent variable, and each third model (Models 3 and 6) is based on a DiD estimator. The effects of the individual control variables are interesting in themselves, but are not interpreted here. All models show insignificant effects of the treatment, and in 5 out of 6 models the insignificant effect is even negative. Accordingly, step 3 of estimating a causal effect of ties in schoolwork networks does not show any significant effect either, on grades or on well-being.
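The logic of a propensity score weighting regression for the ATT can be sketched as follows. The sketch assumes the usual ATT weights (1 for treated cases, p/(1 - p) for untreated cases, with p the estimated propensity score) and a weighted least squares fit computed with NumPy; the function names and the toy data are hypothetical, and the cluster-corrected standard errors reported in Table 4 are not reproduced.

```python
import numpy as np

def att_weights(treated, pscore):
    """ATT weights: 1 for treated units, p / (1 - p) for untreated units."""
    treated = np.asarray(treated, dtype=float)
    pscore = np.asarray(pscore, dtype=float)
    return np.where(treated == 1, 1.0, pscore / (1.0 - pscore))

def weighted_regression(y, treated, controls, weights):
    """Weighted least squares of y on an intercept, treatment and controls.

    Returns the coefficient vector; element 1 is the treatment coefficient.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), treated, controls])
    # Scaling rows by sqrt(w) turns ordinary least squares into WLS.
    root_w = np.sqrt(weights)
    beta, *_ = np.linalg.lstsq(X * root_w[:, None], y * root_w, rcond=None)
    return beta

# Hypothetical dyads: outcome built as 2 + 1.5 * treated + 0.5 * x (no noise).
treated = np.array([1, 1, 1, 0, 0, 0])
pscore = np.array([0.6, 0.7, 0.5, 0.4, 0.2, 0.5])
x = np.array([1.0, 2.0, 0.5, 1.5, 0.0, 1.0])
y = 2.0 + 1.5 * treated + 0.5 * x

w = att_weights(treated, pscore)
beta = weighted_regression(y, treated, x, w)
```

In this setup, the three model types in Table 4 would differ only in the inputs: the outcome at t2 as y, the same outcome with the t1 value added to the controls, or the change between t1 and t2 as y for the DiD variant. Embeddedness terms and the number of treated friends would enter as further columns of the controls.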

Discussion
The research design in the present study differs from the studies discussed above: while e.g. Aral et al. (2009) use nodes (individuals) as units of observation, dyads have been analysed in the matching analyses of the present study. If actor A in a treated dyad improves his or her grade, this automatically affects untreated dyads that include A as well. In the present study, the dyadic characteristic of mean grades or well-being could be disaggregated into nodal characteristics, which is impossible for trade flows between countries (Arpino et al. 2017). In their study, edge-attributes (trade flows) are the outcome of interest, whereas here it is dyadic node-attributes, namely both actors' grades and well-being. Node-attributes of ego and alter are often of primary interest, e.g. in research on social capital, cooperation and the consequences of network embeddedness. Providing appropriate methods for such research was one motive for developing models for the co-evolution of network and behaviour in the SAOM.
It is controversial whether the SAOM of co-evolution actually is a causal model. Ragan et al. (2019) have shown that SAOM results are similar to those of conventional methods of panel analysis. In econometrics, panel analysis is an approach to causal analysis when only observational, non-experimental data are available (Brüderl and Ludwig 2014). In this respect, SAOMs are also an approach to causal analysis, even though causal interpretations will always remain open to easy criticism if the study design is non-experimental. Alternatively, PSM is useful in limited data situations where non-convergence of SAOMs would result in the loss of many networks, e.g. when networks are sparse. PSM relies on the SUTVA, whose potential violation becomes obvious when researchers collect information on social networks. Strictly speaking, the problem of the SUTVA is relevant to any kind of causal inference using school-class data or otherwise clustered data where networks exist but network information was ignored in the data collection. Researchers usually address statistical non-independence of observations by applying multilevel models, but these models do not appropriately account for the diffusion of information within clusters, e.g. in subnetworks.
Please note that the approach suggested in this paper is rather conservative because untreated dyads in a network consist of nodes involved in other dyads, which can be treated (see Figs. 1, 2). Hence, if there were a treatment effect due to schoolwork collaboration, untreated dyads would also benefit, which reduces the estimated effect, even though propensity score weighting regression is able to control for many confounders.
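The attenuation described above can be made concrete with a toy example. The sketch assumes four hypothetical students, one treated dyad, a gain of 1.0 for every student in a treated dyad, and a dyadic outcome defined as the mean gain of ego and alter; none of these values stem from the data analysed here.

```python
def dyad_outcome(dyad, node_gain):
    """Dyadic outcome: mean gain of the two nodes in the dyad."""
    ego, alter = dyad
    return (node_gain[ego] + node_gain[alter]) / 2

def mean_difference(treated_dyads, untreated_dyads, node_gain):
    """Difference in mean dyadic outcome between treated and untreated dyads."""
    t = [dyad_outcome(d, node_gain) for d in treated_dyads]
    u = [dyad_outcome(d, node_gain) for d in untreated_dyads]
    return sum(t) / len(t) - sum(u) / len(u)

treated_dyads = [("A", "B")]
untreated_dyads = [("A", "C"), ("C", "D")]  # (A, C) shares node A with the treated dyad
true_effect = 1.0

# Every student in a treated dyad gains; all other students gain nothing.
treated_nodes = {n for d in treated_dyads for n in d}
node_gain = {n: (true_effect if n in treated_nodes else 0.0) for n in "ABCD"}

# Comparison with all untreated dyads, including those that share a treated node.
naive = mean_difference(treated_dyads, untreated_dyads, node_gain)

# Comparison restricted to untreated dyads without any treated node.
isolated = [d for d in untreated_dyads if not treated_nodes & set(d)]
clean = mean_difference(treated_dyads, isolated, node_gain)
```

Here the comparison with all untreated dyads yields 0.75 instead of the assumed effect of 1.0, because the untreated dyad (A, C) inherits half of A's gain.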

Conclusion
The SAOM for the co-evolution of schoolwork networks and grades/well-being did not show significant effects on the outcomes. An obvious limitation of the SAOM is the considerable loss of data, particularly when there are good reasons to assume that the selection is informative with regard to y and x.
Matching methods do not require strong assumptions, except for the SUTVA. Any school- or organization-related causal research should be aware of this assumption and its implications. Statistical non-independence becomes obvious in social network data when relational information is available: pupils interact with each other in friendships, during leisure time, in schoolwork and through other kinds of social ties. Since causal effects of dyadic collaboration embedded in networks pose a challenge for matching methods (Figs. 1, 2), propensity score weighting regression makes it possible to control for potential spill-over effects.
Overall, SAOMs, propensity score matching and weighting analysis did not show any substantial average treatment effect on the treated. A problem of the present study is the low density of the schoolwork collaboration networks. Effects might be insignificant simply because there is not enough information available in these networks. However, if schoolwork collaboration networks really had a considerable effect on the outcomes, it should become apparent in the statistical models based on around 300 treated dyads (Table 3).
The present study suggested applying propensity score weighting regression, which allows for the statistical control of dyadic embeddedness in the wider network structure, such as transitivity and the number of network ties to subjects from the treatment group. The latter indicates the opportunity structure for 'diffusion' of treatment information via friendship ties. Moreover, the model can account for lagged dependent variables and difference-in-differences in longitudinal studies.
The problem of SUTVA violations affects most studies using school surveys. A practical implication of the present paper is that school studies should consider collecting network data in clustered data situations. Hence, when researchers conduct surveys in (almost) complete school-classes and are interested in causal inference, they should consider collecting network data in order to apply propensity score weighting regression. Perhaps the present study will stimulate the collection of network data in future studies on organisational contexts, such as schools, even if networks are not of primary interest. Propensity score weighting regression provides a method of causal inference for observational data when the SUTVA is violated.
Future research should consider simulation studies that assess how strongly violations of the SUTVA affect results. Moreover, researchers should more systematically elaborate the consequences of different research settings for causal inference. Regarding the development of causal methods for network data, propensity score weighting regression can be a fruitful contribution, in particular when data limitations or sparse networks impede the convergence of SAOMs or induce severe bias in these models due to systematic loss of data (VanderWeele and An 2014: 368).