High school students’ friendship network, physical activity and residential locations – a stochastic actor based model

Evidence shows that adolescents do not do enough physical activity (PA), which could contribute to childhood overweight and obesity. Studies have shown that both the built environment and social networks could influence adolescents’ PA, but more studies are needed to investigate their combined influence using longitudinal data. We used a stochastic actor-based model to analyze two waves of Add Health data and test whether (1) home location has a significant influence on high school students’ friendships, and (2) the neighborhood built environment has a significant influence on high school students’ PA while controlling for friendship networks. The results indicate that students tended to emulate their peers’ PA levels and that students who lived closer together were more likely to form friendships. However, the built environment variables that described adolescents’ residential neighborhoods did not show a significant influence on students’ PA dynamics. This study contributes to our understanding of the joint impacts of social networks and home location on adolescents’ friend networks and PA dynamics in urban settings.


Obesity, physical activity, environment, and social network
Since the mid-1990s, obesity has been recognized as one of the leading public health problems in the U.S. (Flegal et al., 2016). Obesity among adolescents is also a serious health issue. Between 2015 and 2016, nearly one-fifth of all U.S. adolescents were obese (prevalence rate = 18.5%) (Hales et al., 2017). As obesity and overweight are due to an imbalance of caloric intake and expenditure, the lack of exercise is a major direct cause of unhealthy weight.
A national sample of 24,800 U.S. high school students between 2013 and 2015 showed that about 66% of boys and 75% of girls did not get daily PA. In contrast, approximately one-fifth of students spent over 5 h per day on screen devices (e.g., computers and smartphones) (Kenney & Gortmaker, 2017), an activity counterproductive to exercise.
Among many factors that are associated with obesity and PA, the built environment is an important one. Since the 1990s, scholars have been investigating the association between public health outcomes, in particular obesity, and the low-density, automobile-dependent urban form in the U.S., and in particular, the influence of built environment on PA (Ledoux et al., 2016). Studies revealed that a pedestrian and bicycle-friendly built environment, which is characterized by high population and housing density, mixed land uses, and highly connected road network, promotes people's PA (Handy et al., 2002). Availability and accessibility to PA facilities (Mason et al., 2018;Powell et al., 2006), as well as neighborhood safety (Harrison et al., 2007;Molnar et al., 2004), also exert a positive influence on PA. Meanwhile, travel behavior, accessibility, safety, and PA are also shaped by the socio-demographic composition of various neighborhoods (Vojnovic et al., 2019).
More recently, studies also began to investigate the relationship between obesity and social networks. In the longitudinal Framingham Heart Study, Christakis and Fowler (2007) conducted a social network analysis, and their findings suggested that obesity is "contagious" via social influence. Such a spreading mechanism among social networks was also found to be associated with other health-related factors, such as smoking, happiness, and loneliness (Cacioppo et al., 2009). Christakis and Fowler's study on obesity attracted wide attention and aroused public debate (Zhang et al., 2018). Counterarguments suggested that the clustering of obesity observed in a social network could result from the shared environment (Cohen-Cole & Fletcher, 2008; Lyons, 2011), or from the friendship selection process, i.e., the homophily effect where people tend to associate with those who share similar characteristics (Lyons, 2011). These debates inspired researchers to further investigate the complex relationship between social networks and health. In terms of social networks and obesity, studies were conducted to disentangle social influence from social selection as well as other confounding processes (Zhang et al., 2018).

SAB models of obesity and PA: a brief review
Among studies exploring the underlying causal relationship between obesity (and obesity-related behaviors) and social networks, a commonly used method is the Stochastic Actor Based (SAB) model (Snijders et al., 2010), also called the 'stochastic actor-oriented' model (Snijders, 2017). SAB models use longitudinal data to simulate the evolution of a network as a stochastic process driven by actors who decide on their outgoing ties (e.g., friendship). Network dynamics are affected by the network's own structure and by exogenous factors, i.e., the characteristics of actors or dyads (ties). SAB models have the advantage of simultaneously analyzing the coevolution of the network and the behavior(s) of its actors. This dynamic system is the outcome of a Markov process, where a number of unobserved small changes are assumed to occur between each of the successive observed states of a network and behavior. A SAB model has two discrete parts: a friendship dynamics model and a behavior dynamics model. More details about the SAB model can be found in Snijders (2001) and Snijders et al. (2010).
Most studies on social networks and obesity using SAB models are about children and adolescents. To demonstrate how the SAB model was applied in studies about the relationship between social networks and obesity, we reviewed research articles included in two recently published systematic review papers (Prochnow et al., 2020; Zhang et al., 2018). De La Haye et al. (2011a) used four waves of longitudinal data from a high school in Australia. They found that similarities in the weights of friends were mainly driven by friend selection (both homophily and weight-based stigma) instead of influence. Meng's (2016) study investigated the relationship between social networks and body weight in a virtual space by analyzing social network data collected from a social networking site for weight management. The results indicated that homophily predicted preferential selection in an online social network and that an individual's weight tended to become similar to that of 'health buddies' over time as an outcome of social influence (Meng, 2016).
Two studies (Shoham et al., 2012;Simpkins et al., 2013) both using the National Longitudinal Study of Adolescent Health (Add Health) data and the SAB models, found evidence of the homophily effect based on PA and social influence from peers. The effect was in the form of assimilation, i.e., over time, individual's PA level was becoming closer to that of their friends. Consistent results about PA's influence on friend selection and assimilation of PA among friends were also found in an Australian study (De La Haye et al., 2011b). Different from the above studies, Gesell et al. (2012) found that PA had no impact on forming or dissolving friendships in an after-school friendship network. They found that students would adjust their PA level to emulate the activity levels of their peers. Although the same sample of schools from Add Health data were used, the study by Long et al. (2017) only found significant assimilation to friends' PA but no significant homophily effect of PA on friend selection.

Purpose of study
The review of existing studies points to a gap in the literature. There are inconsistent findings from previous longitudinal studies using SAB models on PA among adolescents. More importantly, existing studies using social network analysis did not emphasize the role of geographic space when investigating adolescents' friendships and PA. The environmental influence was not considered and tested. Since little research has been conducted to investigate the combined influence of the environment and social network on adolescent PA using longitudinal data, this study aims to integrate the environmental drivers within the social network models to investigate their joint impact. Specifically, we aimed to test the following hypotheses: (1) home location has a significant influence on high school students' friendships, and (2) neighborhood environment has a significant influence on high school students' PA while controlling for friendship networks.
Liu et al. Computational Urban Science (2021)
We used the SAB model to extend the existing studies by including the environmental variables and the home distance to schools. The SAB model was implemented using the Simulation Investigation for Empirical Network Analysis package in R (R-SIENA version 4) (Ripley & Snijders, 2009).

Study population
This study used the Add Health (the National Longitudinal Study of Adolescent Health) data, which is a nationwide school-based longitudinal dataset originally collected to understand the driving forces of adolescent health and health behaviours (Harris, 2013). In the first wave, participants in grades 7-12 were sampled from sample schools across the United States. They were also followed through adolescence towards adulthood in the following three waves (Wave 2 to Wave 4) with in-home interviews. In this study, we used Wave 1 (1994-95 school year) and Wave 2 (1995-96 school year) Add Health data. More details about the Add Health sampling methodology and study design can be found elsewhere (Harris, 2013; Harris et al., 2009). Among all Add Health sample schools, students from 16 selected schools were interviewed and asked to nominate up to five male and five female friends, i.e., a maximum of 10 friends in total. These 16 schools were called "saturated" schools because a complete friendship network can be constructed from these interview data. In this research, Wave 1 and Wave 2 data of students from the two saturated schools with the largest sample sizes were used for the analysis. These two schools are well suited to comparative analysis because one is in a midsized town dominated by non-Hispanic white students while the other is in an urban setting with a more diverse population (Shoham et al., 2012). The Institutional Review Board of Michigan State University approved the use of Add Health data for this study (IRB# x16-380e).
There were 2553 students in total in the Wave 1 in-home survey (school A: N = 832; school B: N = 1721). We excluded students in grade 12 because they would not be at school in Wave 2 due to graduation; thus, 756 students were removed (192 from school A and 564 from school B). After merging with Wave 2 data, 222 students were removed (78 from school A and 144 from school B) because they had no observations at the second wave. Lastly, we examined the friendship data and excluded students who, in both waves, neither nominated any other student as a friend nor were nominated by others. We did so because this study focused simultaneously on the dynamics of the social network and the influence of peers. In the final sample set used in this study, there were 557 students from school A and 948 students from school B.
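The exclusion steps above can be checked with simple arithmetic. The sketch below uses only the counts reported in the text; the number of students dropped in the last step (those with no friendship ties) is not stated in the paper, so it is derived here under the assumption that no other exclusions were applied.

```python
# Sample sizes reported in the text, per school.
wave1 = {"A": 832, "B": 1721}
grade12_removed = {"A": 192, "B": 564}
no_wave2_removed = {"A": 78, "B": 144}
final_sample = {"A": 557, "B": 948}


def isolates_removed(school):
    """Students dropped in the last step, implied by the reported counts.

    Assumption: the three steps described in the text are the only exclusions.
    """
    remaining = wave1[school] - grade12_removed[school] - no_wave2_removed[school]
    return remaining - final_sample[school]


for school in ("A", "B"):
    print(school, isolates_removed(school))
```

Under that assumption, the reported totals imply that 5 students from school A and 65 from school B were removed for having no friendship ties.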

Friendship network
During the in-home interviews in both waves, students were asked to nominate up to five male and five female closest friends. Among all the nominees, we excluded those who were not students in the two selected sample schools. As mentioned earlier, students who did not nominate any friends in their school and were not nominated by any other participants were dropped from this study.

PA
Three ordinal variables measuring students' PA were selected to create an index reflecting their PA on a weekly basis. These variables recorded the number of times per week that students: 1) went "roller-blading, roller skating, skate-boarding or bicycling"; 2) played "an active sport, such as baseball, softball, basketball, soccer, swimming, or football"; and 3) exercised, "such as jogging, walking, karate, jumping rope, gymnastics or dancing". Each variable ranged from 0 to 3, where 0 indicates no such PA at all, 1 indicates 1 or 2 times, 2 indicates 3 or 4 times, and 3 indicates 5 or more times in a week. We calculated the sum of all three variables as the Total PA, which ranged from 0 to 9. "Refused" and "don't know" were treated as missing values during the calculation.
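A minimal sketch of the Total PA index follows. The function and argument names are hypothetical, and the rule that a single missing item makes the whole index missing is an assumption; the text only states that "refused" and "don't know" were treated as missing values.

```python
def total_pa(skating_cycling, active_sport, exercise):
    """Sum three 0-3 ordinal items into a 0-9 Total PA score.

    None stands for a "refused" or "don't know" answer. Assumption: if any
    item is missing, the index is returned as missing (None).
    """
    items = (skating_cycling, active_sport, exercise)
    if any(value is None for value in items):
        return None
    for value in items:
        if value not in (0, 1, 2, 3):
            raise ValueError("each item must be coded 0, 1, 2, or 3")
    return sum(items)


print(total_pa(1, 2, 3))     # all three items answered
print(total_pa(3, None, 1))  # one item missing
```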

Spatial data
The coordinates of home addresses were collected during the Add Health survey, and GPS readings were converted to relative coordinates based on the central point of a community to ensure anonymity among the students. Students of the same school are in the same community in this research. We calculated the Euclidean distance between the home locations of each pair of students, separately for School A and School B, to control for propinquity among students based on where they lived.
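The pairwise distance calculation can be sketched as below. The coordinates are made-up relative coordinates, not Add Health data, and the function name is hypothetical.

```python
import math


def distance_matrix(coords):
    """Symmetric matrix of Euclidean distances between all pairs of homes."""
    n = len(coords)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = math.dist(coords[i], coords[j])
    return dist


# Hypothetical relative coordinates for three students in one school.
homes = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
for row in distance_matrix(homes):
    print(row)
```

A matrix like this, built per school, is the kind of dyadic covariate that can enter the friend selection model.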
The Obesity and Neighborhood Environment (ONE) database linked Add Health respondents' residential locations with their community-level data spatially and temporally, which enabled us to investigate the influence of the neighborhood environment on students' behavior. Among all the available measures, we extracted five variables that we hypothesized to influence students' PA: (1) distance from home to school; (2) counts of all types of PA resources within 3, 5, and 8 km road network radii; (3) the road connectivity index within 3, 5, and 8 km of Wave 1 respondent locations, i.e., the Gamma index, which is the ratio of actual links to the maximum number of possible links between nodes in the road network; (4) Simpson's diversity index (ranging between 0 and 1) of land cover within 3, 5, and 8 km radii, with a higher value indicating greater land cover diversity; and (5) the 1990 population within 3, 5, and 8 km buffers around each residential location.
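The two index-type variables can be illustrated with a short sketch. Both formulas are assumptions: the Gamma index is computed here with the common planar-network maximum of 3(v − 2) links, and Simpson's diversity index is computed in its 1 − Σp² form. The text does not confirm that the ONE database used exactly these definitions, and the function names are hypothetical.

```python
def gamma_index(n_links, n_nodes):
    """Observed links divided by the maximum possible links.

    Assumption: a planar road network, where the maximum is 3 * (v - 2).
    """
    if n_nodes < 3:
        raise ValueError("the planar formula needs at least three nodes")
    return n_links / (3 * (n_nodes - 2))


def simpson_diversity(areas):
    """1 - sum(p_i^2) over land cover classes (assumed Gini-Simpson form)."""
    total = sum(areas)
    return 1.0 - sum((a / total) ** 2 for a in areas)


# Hypothetical buffer: 6 road links among 5 intersections, and
# two land cover classes with equal area.
print(gamma_index(6, 5))
print(simpson_diversity([50, 50]))
```

Both measures fall between 0 and 1, which matches the range the text gives for Simpson's index.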
Distance to school could affect the time available for extracurricular sports. The number of PA facilities might influence the availability of and accessibility to PA resources. Road connectivity, land-use diversity, and population density are related to neighborhood walkability (Handy et al., 2002). The distribution of sample students' home locations and the neighborhood environment variables included in this study were mapped and can be found in Supplementary Materials 1.

Other related measurements
Sex
Information on students' sex was recorded as male = 1 and female = 2.

Race and ethnicity
Race information was stored in five different binary variables (White, Black or African American, American Indian or Native American, Asian or Pacific Islander, and Other). We integrated all five variables and recoded values (1 = White, 2 = Black or African American, 3 = American Indian or Native American, 4 = Asian or Pacific Islander, 5 = Other, 6 = missing value). Ethnicity was a binary variable with value 1 indicating Hispanic or Latino origin and 0 as not.

Body mass index (BMI)
Students reported their height and weight in both waves. The BMI value was calculated using the weight (kg) and height (m) reported in the survey ($\mathrm{BMI} = \mathrm{weight}/\mathrm{height}^2$). BMI at Wave 1 was used as a constant covariate in our models.

Motivation
During the in-home survey, participants were asked whether, in the past 7 days, they exercised to 1) lose weight/keep from gaining weight, or 2) gain weight/build muscle. To control for motivation, we created two variables called "exercise to lose weight" and "exercise to gain muscle". Each motivation variable was set to 1 if the answer to the corresponding motive was true and to 0 otherwise.

Course overlapping
Add Health data provide information about the extent of courses common to each pair of students. A weighted course-overlap measure was used in this study to control for the influence of taking the same course on friend selection. Weights were determined based on the number of Carnegie units taken by students and the number of classes per course.

Analytic plan
In this study, we adopted SAB models to understand the relationship among high school students' friendships in school, their PA, and their residential locations. In a SAB model, the evolution of a network is treated as a stochastic process driven by actors (i.e., students) who decide on their outgoing ties (i.e., friend nominations). The SAB model assumes many unobserved micro-steps between two consecutive observations (in our case, a certain number of micro-steps between Wave 1 and Wave 2). A rate parameter determines the number of micro-steps. In each micro-step, at most one change occurs in the network (forming a new tie, dropping an existing tie, or no change to the current network). Which tie changes and how it changes is captured by a linear additive objective function, consisting of many effects, whose value can be translated into an expected probability. To test our first hypothesis about the influence of home location on friend selection, we included the Euclidean distance between the home locations of each pair of students from the same school as a covariate in the SAB selection model. A significant coefficient would reject our null hypothesis that the residential distance between two adolescents has no impact on forming or maintaining a friendship between them. Other effects in the selection model include (1) structural effects that represent the endogenous network processes; (2) homophily effects that capture the assimilation process during friend selection; and (3) behavior effects, which helped to investigate the influence of PA on the dynamics of a social network. Descriptions of the effects included in the model are shown in Table 1.
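How an objective function value is translated into a probability can be sketched as a multinomial logit choice, the form commonly used in stochastic actor-oriented models. This is an illustration under that assumption, not the RSiena implementation; the function names and the objective values are hypothetical.

```python
import math
import random


def choice_probabilities(objective_values):
    """Multinomial logit: p_k = exp(f_k) / sum_m exp(f_m).

    Each entry is the objective function value of the network that would
    result from one option (toggle the tie to a given alter, or no change).
    """
    top = max(objective_values)  # subtract the maximum for numerical stability
    weights = [math.exp(value - top) for value in objective_values]
    total = sum(weights)
    return [weight / total for weight in weights]


def micro_step(objective_values, rng):
    """Draw the index of the option the actor picks in one micro-step."""
    probs = choice_probabilities(objective_values)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical values: no change, add a tie to a nearby alter,
# add a tie to a distant alter.
values = [0.0, 0.4, -0.3]
print(choice_probabilities(values))
print(micro_step(values, random.Random(1)))
```

Options with a higher objective value are chosen more often, but every option keeps a nonzero probability, which is what makes the process stochastic.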
Coevolution of behavior is also integrated into the SAB model, which enabled us to analyze peers' influence on participants' PA. Similar to the selection model, the SAB behavior model also has a rate parameter and a linear additive objective function describing how different effects would influence change in an actor's PA (increasing one unit, decreasing one unit, or no change per micro-step). To test our second hypothesis about the influence of the built environment on PA, we included five environmental effects (see Table 1) at three different geographic scales (3 km, 5 km, and 8 km) with each scale as a separate SAB behavior model.
To test our hypotheses, we built the SAB models using the RSiena package in R. We used a forward selection process (Snijders et al., 2010) and only kept the significant effects in the selection model before we modeled the coevolution of selection and behavioral change. Since there were two schools and three geographic scales of environmental effects, a total of six models were tested.
In both schools, more than half of the students indicated motivation to increase PA at Wave 1. Also, on average, students from School A had lower BMI and more PA than students from School B. In terms of environmental variables, there were no dramatic differences among index-type variables (road connectivity and land cover diversity) at different scales, while count-type variables …
Table 4 shows the results of the SAB friend selection model. The overall convergence ratios of both schools were under 0.25. All the convergence t-ratios were under 0.1. Together they indicate an adequate convergence of the model for the two sample schools.
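The convergence criteria reported above amount to two threshold checks, sketched below. The thresholds (0.25 and 0.1) come from the text; the function name and the example values are hypothetical and are not the fitted results.

```python
def adequately_converged(t_ratios, overall_ratio,
                         t_limit=0.1, overall_limit=0.25):
    """True when every t-ratio and the overall ratio are under their limits."""
    all_t_ok = all(abs(t) < t_limit for t in t_ratios)
    return all_t_ok and overall_ratio < overall_limit


# Hypothetical diagnostics for one fitted model.
print(adequately_converged([0.03, -0.08, 0.01], overall_ratio=0.18))
print(adequately_converged([0.03, 0.12, 0.01], overall_ratio=0.18))
```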

SAB friend selection model
First, the spatial effect we examined in the friend selection model, the distance between an individual's home and a friend's home, had a significantly negative coefficient in both schools. This important finding suggests that an alter living farther from the ego was slightly less likely to be selected as a friend (estimate = −0.0712, exp(−0.0712) = 0.93). Consequently, we reject our first null hypothesis and conclude that home location had a significant impact on the dynamics of the friendship network.
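The exponentiated estimate can be reproduced directly. In the sketch below, the coefficient is read as a log odds ratio per unit of home distance between two otherwise identical alters; the distance unit is not stated in this section, and the function name is hypothetical.

```python
import math


def odds_multiplier(estimate, units=1.0):
    """Factor by which the odds of a tie change for `units` more distance."""
    return math.exp(estimate * units)


print(round(odds_multiplier(-0.0712), 2))            # 0.93
print(round(odds_multiplier(-0.0712, units=5), 2))   # 0.7
```

One extra unit of distance multiplies the odds by about 0.93, and the effect compounds multiplicatively, so five extra units give roughly 0.70 under the same assumptions.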
In terms of other effects, all included structural effects exerted significant influence (p < 0.05) on the network dynamics, and the results of the two schools were consistent with each other. According to the estimates, outdegree had a significant negative coefficient, suggesting that the actors in the network were not inclined to make friends with random alters. The significant positive coefficients of reciprocity indicated that students tended to maintain existing friendship ties or to nominate those who nominated them as friends. Estimates for transitive triplets and popularity were also significant and positive. The former suggests that individuals were inclined to become friends with their friends' friends. The latter indicates that students who received a lot of nominations would attract more incoming ties.
In terms of the homophily effects, for School B, all variables included in the selection model exerted significant (p < 0.05) influence on forming or maintaining ties. However, race, ethnicity, and BMI homophily effects were not significant for School A. Students who had more course overlap were more likely to be friends. If two students were of the same gender, they would be …
The PA alter effect was not significant, indicating that a physically active student and a physically inactive student did not differ in their likelihood of being nominated as a friend by others, with all other characteristics unchanged. Also, the nonsignificant coefficient of PA similarity suggested that a similar PA level had no impact on attracting more incoming ties. The estimate of PA ego was significantly negative in School B's selection model, which indicated that the more physically active students in that school were less likely to form or maintain friendship ties with others.
Following our analysis plan, only significant covariates in the selection model were kept in developing the network-behavior coevolution model. Given the inconsistency in the results of two sample schools, the coevolution model of School A had fewer covariates than School B.

SAB coevolution model
In the SAB network-behavior coevolution model, PA was treated as another dependent variable to test the influence of students' social networks on their PA behavior. Since the estimates and significance test results were consistent with the selection model that the coevolution model was built from, we focus only on the results of the behavior model in this section.
For School A (Table 5), the PA total similarity effects were positive and significant (p < 0.05) in all models of different spatial scales (3 km, 5 km, 8 km), indicating an assimilation process where adolescents tended to adopt a similar level of PA of their friends. In our model we used and reported the total similarity effect which means the total influence of nominated friends was proportional to the number of nominations. We also tested the average similarity effect at different spatial scales while holding all other effects the same, and results showed that the average similarity effect of PA remained significant. For school B (Table 5), the PA total similarity effect showed a consistent result as in School A, i.e. the effect was significant at all three geographic scales (p < 0.05). All estimates were positive thus we concluded that like in School A, students in school B also tended to adopt their friends' PA level.
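The difference between the total and average similarity effects can be sketched as follows. The similarity score 1 − |z_i − z_j| / range and the optional centering constant follow the usual definition of these statistics in stochastic actor-oriented models, but this is an illustration, not the RSiena code; the function names are hypothetical and the centering defaults to 0 for simplicity.

```python
def similarity(pa_i, pa_j, value_range=9):
    """1 when two PA scores are equal, 0 when they are a full range apart."""
    return 1.0 - abs(pa_i - pa_j) / value_range


def total_similarity(ego_pa, friends_pa, value_range=9, mean_similarity=0.0):
    """Sum of (centered) similarities over all nominated friends."""
    return sum(similarity(ego_pa, pa, value_range) - mean_similarity
               for pa in friends_pa)


def average_similarity(ego_pa, friends_pa, value_range=9, mean_similarity=0.0):
    """Total similarity divided by the number of nominated friends."""
    if not friends_pa:
        return 0.0
    total = total_similarity(ego_pa, friends_pa, value_range, mean_similarity)
    return total / len(friends_pa)


# A student with Total PA = 3 who nominates two or four equally active friends.
print(total_similarity(3, [3, 3]), average_similarity(3, [3, 3]))
print(total_similarity(3, [3, 3, 3, 3]), average_similarity(3, [3, 3, 3, 3]))
```

The total statistic grows with the number of nominations while the average does not, which is why the total similarity effect makes peer influence proportional to the number of nominated friends.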
In terms of other direct effects (motivation and environmental effects), we did not observe any significant influence in either school at any spatial scale. Thus, in this study, we were not able to reject our second null hypothesis (i.e., the built environment exerts no significant influence on adolescents' PA dynamics in the selected sample schools between Wave 1 and Wave 2).

Discussion
This study extended prior research and contributed to the physical inactivity and childhood obesity literature by combining social, spatial, and environmental variables to test their influences on the dynamics of friend selection and adolescent PA. Our results show that, in the friend selection model, home distance between high school students was significantly and negatively associated with tie creation and maintenance, which means that students who live closer together are more likely to be friends. This may indicate that students interact outside of school contexts, such as by spending time together after school or during summer and winter breaks. We also found that students' PA could be influenced by friends via an assimilation process. Together, these two findings imply that interventions outside school, such as PA-related activities in community centers or self-organized outdoor sports arranged by parents, might help promote adolescents' PA through direct participation or indirectly through changes in friends' behavior.
The environment variables that described adolescents' residential neighborhoods did not show a significant influence on students' PA dynamics. This is consistent with some existing studies which showed that the built environment had trivial to small impacts on PA among youth (McGrath et al., 2015). However, other reasons could contribute to the nonsignificant association between the built environment and PA dynamics in this study. One might be that Wave 1 and Wave 2 were only 1 year apart, whereas the shaping effects of the environment on behavior may take longer. Another possible reason is that, for students participating in the Add Health survey, the neighborhood outdoor environment was not their primary location for PA. Without further detailed information, we were not able to determine whether the PA reported in the survey took place near home or mostly in school. Unlike adults, who may largely rely on public amenities such as parks to do certain sports, adolescents spend a great amount of time in school and have easy access to facilities that the school provides for students. It is also possible that the features of the neighborhood environment we chose to investigate were not very important for adolescents' decision making about PA. In future studies, it may be useful to examine other variables such as safety. Some of our results are consistent with the findings of Simpkins et al. (2013) and Shoham et al. (2012), who used the same dataset. These include the homophily effects of grade and gender and the effect of course overlap on friend selection. However, we also found some inconsistencies. For instance, in our study, the PA ego effects and BMI similarity effects in the SAB selection model were only significant in School B, whereas they were both significant in the work of Simpkins et al. (2013).
We hypothesize that these disparities can be attributed to differences in data filtering and the selection of explanatory variables due to different research questions. In the model of Simpkins et al. (2013), BMI was classified whereas we used raw (numerical) BMI values, which may also cause differences in the level of significance.
This study has some limitations. First, in our analysis, we used secondary data collected in 1994/1995. We realize that, after 30 years, the way that high school students interact with peers may have changed. Compared to millennials, the lifestyle of centennials (people born between the late 1990s and 2010) is greatly influenced by online interaction, which may have a varying effect on PA. Online social networks are playing an … (Lenhart, 2015). The popularization of computers, cellphones, and the internet not only greatly influences the social networks of adolescents, but also contributes to their screen time, which might otherwise be devoted to PA. Friendships and their influence on students may also be moderated by screens, and a friend's influence through screens may differ from that in one-on-one contact relationships. Given these changes in society and culture, the samples used in this study might not represent well the behavior patterns and attitudes of adolescents today. More recent large-scale longitudinal data with complete social networks will be of great value for future studies. Second, the data were self-reported rather than measured. For example, the key variable that we used in our analyses, total PA, only reflected the reported frequency of PA in the 7 days preceding the survey. However, the duration and intensity of the activity were unknown. This could lead to inconsistency and uncertainty when trying to investigate the changes in PA and the difference in PA between a pair of students.
Third, this study did not reveal the actual processes behind adolescents' influence on their peer behavior. Although we found some association between change in one's PA and the average PA of this student's nominated friends, it is not clear what mechanisms cause these associations. There is a lack of information about whether or not the reported PA was done with an individual's friends. The influence from peers could be from direct interactions. It is possible that a student was frequently invited by friends to participate in PA together after school, which boosted her PA to be similar to her physically active friends. Or, on the contrary, she could be invited to watch TV or play video games together, which reduced her leisure time for PA and made her less physically active. A student could also be influenced by friends by simple observation or verbal communication. A student might not participate in PA with her friends, but she might see her physically active friends as role models and mimic their behavior when she is in a more private setting. It is also possible that she devoted more time to certain activities, such as doing sports or watching TV, in order to have a conversation with friends as a way of maintaining the friendship or becoming more popular among peers.
We also recognize that the small sample size of this research (two sample schools with 557 and 948 participants, respectively, included in the analyses) could affect the generalizability of the study. In addition, collecting complete social network data is time- and resource-consuming, and there is a lack of secondary social network data on adolescents available in the field. Therefore, more funded studies that provide data access with adolescent participants' privacy and confidentiality well protected would greatly benefit the field.
Regardless of the many limitations embedded in the data or the availability of data, we noticed that the Siena model had been used in many contexts. Examples include understanding the dynamics of online social networks, such as online course discussion forums (Zhang et al., 2016), open-source software project communities (Kavaler & Filkov, 2017), and health-specific social networking sites (Meng, 2016). However, while in the era of big data scholars have access to dynamic data on online social networks, studies on minors still face data accessibility and availability problems, as well as serious ethical issues. Regardless of the aforementioned limitations, we believe that this study lays a strong foundation to further our understanding of the joint impact of social networks and neighborhood environments on adolescents' friend selection and PA.

Summary
In this study, we analyzed two waves of Add Health data from two sample schools. We built SAB models to investigate the relationship among friendship networks, home locations, neighborhood environments, and adolescents' PA. We found that students were inclined to be friends with those who lived closer, but we failed to detect a significant influence of the built environment on PA level. This study contributes to the field of children's studies by extending existing research via incorporating spatial and environmental variables in the analysis. Due to the limitations of this study, the relationship between environment, PA, and obesity is still not clear, and further research with more recent data is required in the future.
Additional file 1. Distribution of environment variables.
Additional file 2. Specification of the SAB Model.