The Impact of Youth Digital Citizenship Education: Insights from a Cluster Randomized Controlled Trial Outcome Evaluation of the Be Internet Awesome (BIA) Curriculum

School psychologists and other school professionals are increasingly turning to digital citizenship programs to improve children’s online safety. However, to date, there has been no rigorous outcome evaluation of these programs. The current study used a cluster randomized controlled trial (CRCT) methodology to evaluate the impact of the Be Internet Awesome digital citizenship program on knowledge, attitudinal, and behavioral outcomes for 1072 4th–6th graders at 14 participating schools (7 treatment and 7 control schools). Multilevel logistic, multinomial, and Gaussian models were used to test outcomes. Relative to students in the control condition, the study found support for program impact on children’s knowledge of new online safety concepts and self-efficacy for handling online problems. However, no program impact was found for other outcomes including online privacy or civility, cyberbullying, and talking with parents about online problems. Findings highlight the need for clarity on digital citizenship program goals, innovative approaches, and increased evaluation research.


Introduction
Internet safety education and digital citizenship programs have been developed to improve youth online safety, with schools as one of the primary channels for education efforts (Palfrey et al., 2010). A 2010 national survey found that almost half of youth in the USA reported receiving internet safety education in their schools (Mitchell et al., 2013), and it is likely that percentage has increased over the last decade. For school psychologists and other school professionals working to improve children's emotional well-being and reduce mean behavior and bullying (Elbedour et al., 2020), digital citizenship education may be seen as a promising tool. However, to date, there has been no rigorous evaluation to define the outcomes of these programs or assess their effectiveness (Finkelhor et al., 2020; Jones, 2010). To initiate rigorous research on the impact of these programs on children's online safety and digital citizenship skills and behaviors, the current study used a cluster randomized controlled trial (CRCT) methodology to evaluate the impact of the Be Internet Awesome digital citizenship program on knowledge, attitudinal, and behavioral outcomes for 4th-6th graders.

Online Safety and Digital Citizenship Education for Youth
Ninety-four percent of children (3-18 years) in the USA have internet access at home (National Center for Education Statistics, 2020) and in 2018, 95% of teens reported using a smartphone and 89% said they used the internet either "almost constantly" or "several times a day" (Anderson & Jiang, 2018). Social media use is a popular online activity even for younger children: 63% of 13-14-year-olds reported using Instagram and Snapchat in 2018, and 47% used Facebook. The Pew Research Center found that children 11 and younger are typically using the internet to watch videos and play online games (Auxier et al., 2020). However, 37% of parents of 9-11-year-old children say their child has their own smartphone and between 10 and 30% of children in this age group are using social media sites, with rates varying depending on the social media platform.
The extent to which younger children are interacting with digital technology has raised worries about their vulnerability to online risks (Finkelhor et al., 2000; Holloway et al., 2013; Livingstone & Haddon, 2009; Mitchell et al., 2011; Rice et al., 2015). One meta-analysis found prevalence rates of 15% for youth cyberbullying victimization (Modecki et al., 2014) and a meta-analysis of nine studies on unwanted online sexual solicitation experiences by youth found a prevalence rate of 11.5% (Madigan et al., 2018). Relatively less research has been conducted on other online youth victimization experiences such as online fraud and hacking. One US survey found that 12% of 15- to 17-year-olds said they have had someone hack into their accounts online, and 6% said someone has stolen or coercively obtained their password (Lenhart et al., 2016).
Early approaches to online safety education focused primarily on online predators, and then increasingly began to address a wider variety of online concerns such as cyberbullying, online scams, and privacy. Digital citizenship education appears to address many of these same online safety concerns, but also often incorporates digital literacy education elements, such as safe password use. Cortesi and colleagues from the Berkman Klein Center for Internet & Society (2020) conducted a review of 35 youth digital citizenship resources and identified four thematic clusters: (1) participation (digital access, digital literacy, content production, security, and law); (2) empowerment (civic and political engagement, context, information quality, and media literacy); (3) engagement (e.g., digital economy, data, computational thinking, and artificial intelligence); and (4) well-being (e.g., privacy and reputation, identity exploration and formation, positive and respectful behavior, and safety and well-being). "Positive online behavior" and "digital literacy" were the most common elements included in the reviewed frameworks, and this appears to be where school-based digital citizenship education focuses (Finkelhor et al., 2020; International Society for Technology in Education (ISTE), n.d.; Mossberger et al., 2008; Ribble & Bailey, 2011).
A recent survey found that approximately 60% of US teachers reported using digital citizenship materials in their classrooms, with cyberbullying cited as the most frequently addressed topic (Vega & Robb, 2019). Another survey found that three-quarters of 3rd-5th grade teachers reported teaching digital citizenship in their classrooms, including topics of privacy and safety, digital drama, cyberbullying, digital footprints, and media literacy (Lauricella et al., 2020). To school psychologists who are working with vulnerable populations of youth or advocating for prevention programs in schools to improve student emotional well-being, digital citizenship programs may be seen as an attractive addition. Research has documented the greater online risks for emotionally vulnerable youth (Hinduja & Patchin, 2007; Wells & Mitchell, 2008); and, for youth today, peer relationships require savvy and skill in navigating online communication in healthy ways (Clark et al., 2018).
Unfortunately, little evaluation research has been conducted to determine whether digital citizenship programs are achieving their expected outcomes. A recent summary of evidence on online safety and digital citizenship programs and policies noted "there is little evidence available for their effectiveness or long-term impact" (Burton et al., in press, p. 71). Schools are increasingly seeking optimal integration of a range of school-based prevention goals (Domitrovich et al., 2010), and at this time the field lacks data on how digital citizenship programs might be enhancing or supporting existing school-based prevention.

Current Study
The current study sought to begin building a more rigorous evidence base for digital citizenship education by evaluating Be Internet Awesome (BIA), a digital citizenship curriculum developed and disseminated by Google, Inc. While no prior evaluation had been conducted on the Be Internet Awesome program, the program's content was adapted from earlier internet safety education programs, which did have some exploratory evaluation. However, evaluations of early internet safety education programs had small sample sizes, no data from control groups, little assessment of behavioral or experiential outcomes, and the original evaluation reports are no longer publicly accessible (Jones et al., 2014). A cluster randomized controlled trial (CRCT) methodology was chosen to assess the impact of the BIA program for youth (grades 4-6) on knowledge, attitudes, and behaviors related to improved safety online. Fourteen US elementary schools were randomly assigned to either an intervention condition or a control condition for the 2018-2019 school year. Comparing outcomes for children pre- and post-program delivery, we hypothesized that in comparison to students in control schools, students in intervention schools who received the BIA program would see the following improvements: (1) increased knowledge of digital citizenship vocabulary and concepts taught in the BIA program; (2) increased self-efficacy to handle online problems; (3) increased online privacy behaviors; (4) decreased harassing behaviors; (5) increased positive and civil behaviors online; and (6) increased help-seeking from parents for online problems (Fig. 1).

Design
US primary schools with 4th, 5th, and 6th grades were identified through the Common Core of Data (CCD) school directory published by the Department of Education's Institute of Education Sciences (National Center for Education Statistics, 2018) and were sent information about the study using postcards and emails in the spring of 2018. Nineteen schools contacted us for more information about the study. To be eligible for the study, schools had to agree to have students complete a pre-test survey in the fall of 2018 and a post-test survey in spring of 2019.
Participation also required that schools agreed to random assignment to one of two conditions: (1) implementing the BIA program between pre-test and post-test administration (intervention condition); or (2) not implementing the BIA program until after the post-test had been completed (control condition). One school withdrew participation after learning more about the details of the study requirements.
The 18 remaining schools were divided into pairs by matching them as closely as possible on three variables: (1) grade level, (2) percentage of students eligible for free or reduced-priced lunch, and (3) student ethnic/racial distribution. First, two 5th/6th grade only schools were matched; then six schools with less than 50% White student body were matched on this variable; finally, the remainder were matched on percent free or reduced lunch. After the matching process, one school in each pair was then randomly assigned to either the intervention or control condition using a random number generator. Prior to the start of the school year, two schools withdrew from participation because of personnel changes (one intervention and one control school), resulting in two broken pairs. At the analysis stage, errors in student ID number designation and tracking problems resulted in an additional two schools being dropped from the analyses for this paper (one intervention and one control school), resulting in a final evaluation sample of 14 schools (7 treatment and 7 control schools) (see Fig. 2). The schools included in the evaluation study were from nine different states, from a range of geographic regions of the country (Northeastern, Midwest, Western, and Southern USA). They were all public elementary schools, except for the inclusion of one charter school. Eight schools in the sample were described as being located in rural communities, five in suburban communities, and one in an urban area. The total student body at the elementary schools ranged from 64 to 559, with an average of 256 students, smaller than the national average of 483 (National Center for Education Statistics, 2018). Overall, our sample of schools had a greater percentage of White students (62% on average, ranging from 6 to 96%) compared to the national average (47%); but had approximately the same mean percentage of students receiving free or reduced-cost lunch (50%, ranging from 19 to 95%, compared to the national average of 52%).
[Fig. 1 logic model, outcome panels: (1) Increased knowledge of concepts taught in BIA lessons: improved understanding of the meaning of concepts such as "digital footprint," "catfishing," "hacker," and "trolling," and of indicators that a website is safe. (2) Improved online privacy behaviors: check app privacy settings more; set stricter privacy settings; and think more about others' privacy and who might see posts before posting. (3) Increased safety with passwords: increased perception of sharing passwords as risky. (4) Increased digital citizenship behaviors: increased positive behaviors online. (5) Increased self-efficacy to handle online problems: increased knowledge about what to do when faced with online problems, such as mean behavior or upsetting content. (6) Decreased online harassment behaviors: decreased rude comments online, telling rumors about someone, or posting photos of someone when they knew it might upset them. (7) Increased asking adults for help: increased number of youth asking a parent to help with responding to someone online or handling an online problem.]
Initially, when the 18 schools were randomly assigned to intervention and control conditions, there were no significant differences between treatment groups in school-level racial and ethnic distributions or in the percentage of students receiving free or reduced-price lunch. However, the withdrawal of two schools and the removal of two schools from analyses resulted in a statistically significant difference in the student racial composition of schools in the intervention condition (64% White students) versus the control condition (40% White students) (F = 13.614, p < 0.001). There remained, however, no statistically significant difference in the percentage of students receiving free or reduced-price lunch (40% for intervention schools and 54% for control schools). Additionally, despite a large school in the intervention arm (n = 511, one of the broken pairs), there was also no statistically significant difference in student body size across schools (mean student body size was 189 and 101 for intervention and control schools, respectively).
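The within-pair random assignment described in this section can be illustrated with a minimal sketch. The school labels, the function name `assign_matched_pairs`, and the seed are hypothetical; the study reports only that a random number generator was used, so this is one plausible way to do it, not the authors' procedure.

```python
import random


def assign_matched_pairs(pairs, seed=None):
    """Randomly assign one school in each matched pair to each condition."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    assignment = {}
    for school_a, school_b in pairs:
        # A fair draw decides which member of the pair receives the intervention.
        if rng.random() < 0.5:
            intervention, control = school_a, school_b
        else:
            intervention, control = school_b, school_a
        assignment[intervention] = "intervention"
        assignment[control] = "control"
    return assignment


# Hypothetical labels: 18 matched schools form 9 pairs.
pairs = [(f"school_{2 * i + 1}", f"school_{2 * i + 2}") for i in range(9)]
assignment = assign_matched_pairs(pairs, seed=2018)
```

Because assignment happens within pairs, the two arms start out balanced on the matching variables; the balance can be lost later if one member of a pair withdraws, as happened here.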

Participants
Across the 14 schools included in the evaluation study, 1536 students completed the pre-test, with response rates per school ranging from 30 to 100% (M = 70%, SD = 24%). Four hundred and sixty-four pre-test students lacked a matching post-test (e.g., due to absence or missing ID codes) and were excluded from analyses. The final sample of participants consisted of 1072 students in 4th, 5th, and 6th grades with completed and matched pre-post data (70% retention rate). There were no statistically significant differences between excluded and included students in terms of the school condition assignment or gender. However, there were differences across grade with 38% of 4th graders unable to be matched compared to 27% and 29% of 5th and 6th graders, respectively (χ² = 13.75, p < 0.001). Table 1 provides data on sample demographics across intervention and control schools for the final sample of 1072 students. One of the intervention schools was a large school with only 5th and 6th grade students, resulting in fewer 4th graders proportionally in the intervention school sample. Additional baseline differences between intervention and control samples are discussed below in the "Results" section.
Table 1 Student demographics and technology use (past 6 months) at baseline at implementing and control schools. Notes: (a) Cohen's D was used for binary variables and Cohen's W was used for the ordinal variable grade; (b) a "high internet user" was defined as a student who reported texting, sharing content online at least several times a week, and being online for 3 or more hours per day. **p ≤ 0.01; ***p ≤ 0.001

Procedures
Prior to pre-test administration, caregivers of students in all schools were sent a letter that provided detailed information on the project and a form for opting their child out of the study if they did not want their child to participate. Data were collected at two time points: (1) before the intervention at all schools (baseline, time 1); and (2) approximately 2-3 months post-intervention (time 2). Data were collected via a secure online survey that is hosted by the University of New Hampshire. Students accessed the survey through a web address provided by their teacher. Prior to pre-test and post-test administration, a teacher or school staff member read an assent form verbally to the youth. Students indicated assent by continuing with the survey if they wanted to participate. If students' caregivers opted them out of the survey or they chose not to participate, they did other computer work.
Online surveys were provided in English and completed during school hours. Surveys took the students 30-40 min to complete, on average. All procedures were approved by the Institutional Review Board (IRB) at the University of New Hampshire. Each participating school was provided with a $1000 stipend for the help they provided with planning and data collection.

Intervention and Control Conditions
The BIA program, curriculum, and materials were developed by Google, Inc. in partnership with educators, researchers, and online safety experts for youth in grades 3 to 6 (ages 8-12). The free, online curriculum (Google Inc., 2018) provides educators with materials and activities for five lessons: "Share with Care (Be Internet Smart);" "Don't Fall for Fake (Be Internet Alert);" "Secure Your Secrets (Be Internet Strong);" "It's Cool to Be Kind (Be Internet Kind);" and "When in Doubt, Talk It Out (Be Internet Brave)." Each lesson provides educators with 2-6 activities that incorporate worksheet activities, small group discussions, and role-playing. The first four lessons are also accompanied by online games intended to reinforce the lesson materials. The curriculum suggests some flexibility for delivery, noting "we encourage you to experiment to find what works well for your learners, whether that means completing the curriculum start to finish or going deep on the one or two lessons of greatest importance for your learning environment." A logic model was developed by researchers for the study connecting each of the five BIA lessons with associated outcomes (see Fig. 1). Instructors in the seven intervention schools taught the BIA program to students after the students completed the pre-test survey. At five of these schools, lessons were provided by a single lead instructor for all students (library and education technology professionals); at two schools, teachers provided lessons to their own classrooms. Mirroring procedures for how the program is delivered in school settings, instructors were provided with access to all online program materials including the curriculum manual; no specialized training was provided to schools beyond what was in the curriculum materials. Although we offered to help answer any questions that instructors had about program delivery, the only questions received were related to survey administration.
Implementation information was collected via instructor emails and surveys. The timing of the delivery of the five lessons varied across schools from October to February. Most educators provided lessons to students once a week or almost every week (68%), with the remainder using other timeframes, such as multiple days a week over the course of 2-3 weeks (32%). Although the study required presentation of all five BIA lessons and four games, a small percentage of instructors (13%) did not complete all five lessons due to time constraints. Use of each of the four BIA games was reported by 88-100% of instructors depending on the game. Because of variations in implementation timeframes and testing schedules, the time between program completion and post-test surveys ranged from 6 to 12 weeks for the intervention schools.
Control schools were instructed to not use the BIA program, or to wait until after the post-test survey was completed to implement. Five of the schools were planning to implement the BIA program in late spring after the post-test survey; however, two of the control schools were required to provide students with digital literacy education during the study timeframe, and students received some non-BIA internet safety program education using materials that had been developed by the educators.

Measurement
Most of the survey questions were developed for the study, with some adaptation from prior work by the authors (Jones & Mitchell, 2016). For the evaluation, we focused on six categories of questions.

Technology Use
Survey questions asked students about activities they had done online in the last month, how much time each day they spend online, and whether they had their own cellphone to use. Three composite variables were then created that identified: (a) whether they were doing any online activities (yes/no); (b) whether they were posting and sharing content online (yes/no); and (c) "high technology users" defined as texting and sharing content online at least several times a week and being online for 3 or more hours per day.

Knowledge of Online Safety Concepts
Five knowledge questions tested material covered in the BIA program. Specifically, students were asked multiple-choice questions about the terms: "digital footprint," "catfishing," "hacker," and "trolling," and about indicators that a website is safe. Selection of the correct answer was coded as yes/no. Missing responses were coded as having incorrectly answered the question.
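The scoring rule above, in which a missing response counts as incorrect, can be sketched as follows. The function name and the answer labels are hypothetical; the actual response options are not given in the text.

```python
def score_knowledge_item(response, correct_answer):
    """Code a multiple-choice response as 1 (correct) or 0 (incorrect or missing)."""
    if response is None:
        # Missing responses are treated as incorrect, as described above.
        return 0
    return int(response == correct_answer)


# Hypothetical responses to one item whose correct option is labeled "b".
scores = [score_knowledge_item(r, "b") for r in ["b", "a", None]]
```

Coding missing answers as incorrect keeps every student in the knowledge analyses, at the cost of treating a skipped item the same as a wrong one.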

Self-Efficacy to Handle Online Problems
Two questions asked students about whether they know what to do when faced with an online problem: "I know what to do if someone is mean to me online" and "I know what to do if I see something online that makes me sad or upset." Students responded on a 4-point scale from "Not true for me" to "Very true for me."

Online Privacy Behaviors and Attitudes
Four questions about online privacy behaviors and attitudes were used in the evaluation study: (1) "When you set up a new online account or profile, how often do you check the privacy settings to see how private they are?" (4-point scale from "Never" to "All of the time"); (2) "Thinking about your social media profiles like on Instagram or Facebook, how private are your posts?" (My posts are public-anyone can see them; My posts are partly private-friends of friends can see them; My posts are private-only friends can see them); (3) "How often do you think of people's privacy before you post a picture online with other people in it?" (4-point scale from "Never" to "All of the time"); (4) "When you post something online, how often do you think about all of the people who might see it?" (4-point scale from "Never" to "All of the time"). An additional question asked students how risky they thought it was to share passwords with their friends (4-point scale from "Not at all risky" to "Very risky").

Online Harassment
Online harassment was measured with a composite of three questions. Students were asked whether, in the last month, they had (1) made any rude comments about someone online in texts or messages; (2) told rumors about someone online through texts or messages; or (3) posted a video or picture of someone online when they knew it might upset them. If students responded yes to any of the three questions, they were recorded as having engaged in online harassment in the past month.
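As a minimal sketch of this composite, the indicator below is true when any of the three items is endorsed. The function and argument names are hypothetical, and the sketch assumes all three responses are present, since the text does not specify how partially missing responses were handled.

```python
def any_online_harassment(rude_comment, told_rumor, upsetting_post):
    """Return True if the student endorsed at least one of the three items."""
    return any((rude_comment, told_rumor, upsetting_post))


# Hypothetical student who endorsed only the third item.
engaged = any_online_harassment(False, False, True)
```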

Help-Seeking from Adults for Online Problems
Two questions asked students about whether they asked for their parents' advice or help with an online problem: "In the last month, I asked for my parent's advice for what I should say back to someone online;" "In the last month, I have asked for my parent's help with a problem online that was hard for me to handle." Students responded on a 4-point scale: "Never;" "A few times;" "Several times;" and "A lot." The modal response to both questions was for students to report they "never" sought help from adults for these problems (>50% both pre-test and post-test). We therefore dichotomized the response to measure whether the student asked a parent for help at all in the last month for the problem (no/yes).

Online Civility Behaviors
An 8-item Online Civility Scale (OCS) was adapted from a prior study (Beseler et al., 2021) for use with a younger age group (4th-6th graders). Students were provided with 8 statements about positive technology use such as "When I'm online, I try to make sure that I don't sound mean" and "I use the Internet to share things that I am good at or proud of," and asked how much the statements were "like them" (4-point scale from "Not at all like me" to "A lot like me"). Psychometric analyses on the scale indicated that a one-factor model best fit the eight items. Internal reliability was strong with an alpha of 0.85 at both pre-test and post-test. Validity measures (privacy related to posting and texting, pretending to be someone else online, bullying others online, and having behavioral issues in school) were all significantly associated with the online civility construct.

Missing Data
Of the sample of 1072 students, 719 youth had complete survey data (67%). For all of the study outcome variables except the privacy variables and OCS scores, missing data ranged from 1 to 12%. For the privacy variables, youth were excluded from analyses if they reported on the survey that they did not participate in the activity queried (e.g., when asked about privacy settings for online accounts, they selected the response: "I have not set up any online accounts or profiles"). This resulted in a smaller n for these analyses ranging from n = 487 to n = 895. For OCS scores, a summed score was used and therefore youth were only included if they responded to every item at pre-test and post-test, resulting in n = 824 for the analyses with this scale. When youth were missing any data from the questions on this scale, they were typically missing more than half of the data. Fifty percent (n = 248) were missing all or almost all (7/8) of the data on the pre-test or post-test OCS scale. We used pairwise deletion in the analyses of the data rather than imputing missing values because only our dependent variable had missing values; treatment group, gender, grade, and matched pre-test responses were not missing (Jakobsen et al., 2017).
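The complete-item rule for the OCS summed score can be sketched as below. The function names are hypothetical, and the 1-4 item coding is an assumption (it is consistent with the 4-point response scale, but the text does not state the numeric codes).

```python
def ocs_sum(items):
    """Sum the 8 OCS items, or return None if any item is missing."""
    if len(items) != 8 or any(value is None for value in items):
        return None
    return sum(items)


def include_in_ocs_analysis(pre_items, post_items):
    """A student is analyzed only with complete pre-test and post-test items."""
    return ocs_sum(pre_items) is not None and ocs_sum(post_items) is not None


# Hypothetical student: complete pre-test, one missing post-test item.
pre = [3, 4, 3, 4, 2, 3, 4, 3]
post = [3, 4, 3, 4, 2, 3, 4, None]
included = include_in_ocs_analysis(pre, post)
```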

Analyses
Analyses for this paper were conducted using SAS statistical software, version 9.4 except for effect sizes, which were calculated in R using the effect size package. In testing for baseline equivalence between treatment groups, we calculated effect sizes using the conversion from an odds ratio to a Cohen's D value for binary variables and Cohen's W for ordinal variables (Borenstein et al., 2009). To test hypotheses, we used generalized linear mixed models (multilevel modeling) to account for students being nested in schools and specified a random intercept for each school. We used the Laplace estimation method and specified a variance components structure for the random effects covariance matrix. Potential confounding was assessed by testing whether gender or grade were associated with treatment group membership and the outcomes of interest. By definition, a confounder must be associated with both the covariate and the outcome to impact the association between a predictor and an outcome (Rothman, 2012). Grade level met the definition of a confounder, but gender did not. When grade was not significant in a model (self-efficacy), a likelihood ratio test (LRT) was used to assess its effect on model fit. Pretest measures were included in each model to account for the level of each knowledge and behavior trait at baseline. Odds ratios (OR) with 95% confidence intervals (CI) were calculated for each question and estimates adjusted for confounding variables.
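For readers who want to reproduce the odds-ratio-to-Cohen's-D step, the sketch below uses the standard logistic conversion d = ln(OR) × √3 / π given by Borenstein et al. (2009). That the authors applied exactly this form is an assumption, and the function name is hypothetical.

```python
import math


def odds_ratio_to_cohens_d(odds_ratio):
    """Convert an odds ratio to Cohen's d via the logistic approximation."""
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return math.log(odds_ratio) * math.sqrt(3) / math.pi


# An odds ratio of 1 (no difference between groups) maps to d = 0.
d_null = odds_ratio_to_cohens_d(1.0)
# Illustrative input only: an odds ratio of 1.99 gives d of roughly 0.38.
d_example = odds_ratio_to_cohens_d(1.99)
```

The conversion depends only on the odds ratio, which is why it gives an effect size that is comparable across outcomes with different sample sizes.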
A logit link function was used for binary responses including knowledge of BIA content (vocabulary items), online harassment behaviors, and asking a parent for help with an online problem. A proportional odds model for ordinal outcomes was used to estimate intervention effects over the levels of the response variable for the two self-efficacy questions and five online privacy questions. We converted the odds ratios obtained from binary and multinomial multilevel models into a Cohen's D for an easy estimate of effect size that removes the effect of varying sample sizes. To test the impact of the BIA program on the eight questions in the Online Civility Scale (OCS), the items were summed to form a score. We used a multilevel Gaussian model with the dependent variable being the post-test OCS score and adjusted the treatment effect for the OCS pre-test score and potential confounding variables.
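The three model families described above can be written out in one common notation. This is a sketch under stated assumptions: the symbols (T_j for the school-level treatment indicator, G_ij for grade, u_j for the school random intercept) are introduced here for illustration, and the sign convention for the proportional odds model varies across software, so it may not match the SAS parameterization the authors used.

```latex
% Student i nested in school j; u_j is a school-level random intercept.
\begin{align*}
% Binary outcomes (knowledge items, harassment, help-seeking):
\operatorname{logit}\Pr(Y_{ij}=1)
  &= \beta_0 + \beta_1 T_j + \beta_2 Y^{\mathrm{pre}}_{ij} + \beta_3 G_{ij} + u_j \\
% Ordinal outcomes (self-efficacy, privacy), proportional odds over levels k:
\operatorname{logit}\Pr(Y_{ij}\le k)
  &= \alpha_k - \bigl(\beta_1 T_j + \beta_2 Y^{\mathrm{pre}}_{ij} + \beta_3 G_{ij} + u_j\bigr) \\
% Continuous outcome (OCS summed score):
Y^{\mathrm{post}}_{ij}
  &= \beta_0 + \beta_1 T_j + \beta_2 Y^{\mathrm{pre}}_{ij} + \beta_3 G_{ij} + u_j + \varepsilon_{ij} \\
u_j &\sim N(0,\sigma_u^2), \qquad \varepsilon_{ij} \sim N(0,\sigma_\varepsilon^2)
\end{align*}
```

In the first two forms, the reported odds ratio for the program corresponds to exp(β₁); in the Gaussian form, β₁ is the adjusted difference in post-test score between conditions.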

Sensitivity Analysis
Given the inclusion of the large school in the intervention group, and the imbalance of 4th graders in the intervention and control groups, we re-ran analyses excluding both the large school and 4th graders. We have included these analyses as supplementary material (Online Resource Tables A and B).

Results

Baseline Differences
Table 1 provides baseline data on the students' use of technology. Most youth reported having their own cell phone (60.9%), and almost all youth were participating in some kind of online activity (98.3%). Just over half of the sample (53.0%) used social media, and 64.7% of the sample reported sharing and posting content online. Around 10% of the sample were categorized as "high technology users" (i.e., texting, sharing content online several times a week, and being online for 3 or more hours a day). At baseline, youth in the intervention condition reported greater technology involvement than those in the control group. Intervention condition youth were significantly more likely to have their own cell phone (p < 0.001, Cohen's D = 0.35), were more likely to be posting and sharing content online (p < 0.001, Cohen's D = 0.34), and were more likely to use social media (p < 0.001, Cohen's D = 0.24). The non-equivalence at baseline was likely related to the greater percentage of younger 4th grade students in the control condition. There were no significant differences between the groups of youth in terms of general online activities. There were also no significant differences in the percentage of youth in each group determined to be a "high" technology user. Table 2 provides overall sample percentages on outcome variables and bivariate differences between youth in the intervention and control conditions at baseline.
At baseline, youth in the intervention condition had better knowledge of several internet safety terms such as "catfishing" (p < 0.001, Cohen's D = 0.37); "hacker" (p < 0.001, Cohen's D = 0.38); and "trolling" (p = 0.002, Cohen's D = 0.26). There were no significant differences at baseline between groups in terms of their self-efficacy for how to handle someone being mean to them online or when something upsets them online. Online privacy behaviors did vary at baseline between groups. Youth in the intervention condition were more likely to check privacy settings in new online accounts (p < 0.001, Cohen's W = 0.17); less likely to have public social media posts, or not know their setting (p = 0.043, Cohen's W = 0.08); more likely to think of others' privacy when posting (p = 0.004, Cohen's W = 0.11); more likely to see sharing passwords with friends as not at all or not very risky (p = 0.018, Cohen's W = 0.09); and more likely to ask a parent to help respond to a comment online (p = 0.038, Cohen's D = 0.17). There was no significant difference between groups at baseline in terms of thinking about who might see posts.
Finally, there were also no significant differences between groups in reports of online harassment behaviors or asking a parent to help them with an online problem, or for mean scores on the Online Civility Scale.

Online Safety Knowledge
The values in Table 3 provide the odds ratios for the five knowledge questions, accounting for differences by grade level. A 95% confidence interval around the comparison in the odds was computed to provide a measure of certainty in the estimation in the difference in odds. Odds ratios indicated that youth in the intervention condition were more likely to show significantly greater knowledge of the meaning of "catfishing" (OR = 1.99, p < 0.001) and "digital footprint" (OR = 2.09, p = 0.006). They were also more likely to correctly identify markers of a safe website (OR = 2.07, p = 0.028). No significant differences were found for two of the knowledge questions: the meaning of "hacker" and "trolling".

Online Harassment Behavior
Controlling for grade, the odds ratios presented in Table 3 indicate no significant change in online harassment behaviors from pre- to post-test for either study group (OR = 1.96, p = 0.078), although the odds of a student reporting harassing behavior were greater overall for the intervention group than for the control group after adjusting for grade.

Asking an Adult for Help
Analyses in Table 3 provide no evidence of impact of the BIA program on youth asking an adult to help with an online problem that was hard to handle, or for advice about what to say back to someone online. There were no significant differences by group for the response to either question. In models adjusted for grade and gender, there was also no significant change from pre-test to post-test in youth asking a parent to help with an online problem that was hard to handle or asking for a parent's help with what to say back to someone online.

Self-Efficacy
The values in Table 3 provide the odds ratios for the two self-efficacy variables. These models were not adjusted for grade because grade was not associated with either of the self-efficacy items (p = 0.39, p = 0.48), and including it significantly reduced the model fit in a likelihood-ratio test (LRT). For both questions, the odds ratios showed a significant effect for the intervention group. Specifically, odds ratios indicated that youth in the intervention condition reported more knowledge about what to do if someone is mean to them online (OR = 1.34, p = 0.047), and if something online upsets them (OR = 1.54, p = 0.031).
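
For readers unfamiliar with the mechanics, the sketch below shows how a likelihood-ratio test compares two nested models, such as one with and one without a covariate like grade. The text reports neither log-likelihoods nor degrees of freedom, so the function name and the numbers here are hypothetical, and only the closed-form cases of 1 or 2 degrees of freedom are handled.

```python
import math


def lrt_p_value(loglik_reduced, loglik_full, df):
    """Likelihood-ratio test for nested models (closed forms for df = 1 or 2)."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    if stat < 0:
        raise ValueError("full model should fit at least as well as the reduced one")
    if df == 1:
        # Chi-square survival function with 1 degree of freedom.
        p_value = math.erfc(math.sqrt(stat / 2.0))
    elif df == 2:
        # Chi-square survival function with 2 degrees of freedom.
        p_value = math.exp(-stat / 2.0)
    else:
        raise NotImplementedError("this sketch only covers df = 1 or 2")
    return stat, p_value


# Hypothetical log-likelihoods, not values from the study.
stat, p_value = lrt_p_value(loglik_reduced=-512.4, loglik_full=-511.9, df=2)
print(f"LRT statistic = {stat:.2f}, p = {p_value:.3f}")
```

A large p-value from such a test indicates that the added covariate does not meaningfully improve fit, which is one common justification for leaving it out of the final model.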

Privacy Behaviors and Attitudes
Four questions were asked about privacy behaviors and attitudes, and results of the analyses presented in Table 3 show no significant impact of the BIA program on these outcomes. There were no significant differences across pre- and post-tests for any of the four privacy outcomes in models adjusted for grade.

Online Civility Scale
At pre-test, the implementation group was similar to the control group (Table 2). After the intervention, the implementation group's OCS scores increased slightly (mean = 26.4, standard deviation (SD) = 4.86) and the control group's OCS scores decreased slightly (mean = 25.5, SD = 6.05). In the multilevel model, no treatment effects were seen for the OCS, adjusting for pre-test OCS scores and grade. The beta coefficient was positive, indicating that the treatment group showed a 0.72 unit change in the score compared to the control group; however, variability was high (SE = 0.54).
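
As a worked check on why a coefficient of 0.72 with SE = 0.54 is not statistically significant, the sketch below computes the corresponding z-statistic, two-sided p-value, and 95% confidence interval. It uses a normal approximation, which is an assumption; the authors' multilevel model may have used a different reference distribution, so these values are illustrative only. The function name is hypothetical.

```python
from statistics import NormalDist


def wald_summary(beta, se, level=0.95):
    """z-statistic, two-sided p-value, and CI for a coefficient (normal approximation)."""
    std_normal = NormalDist()
    z = beta / se
    p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))
    z_crit = std_normal.inv_cdf(1 - (1 - level) / 2)
    return z, p_two_sided, (beta - z_crit * se, beta + z_crit * se)


# Values reported in the text for the OCS treatment effect.
z, p, (low, high) = wald_summary(0.72, 0.54)
print(f"z = {z:.2f}, p = {p:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

Because the estimate is only about 1.3 standard errors from zero, the interval spans zero, consistent with the reported absence of a treatment effect on the OCS.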

Sensitivity Analyses
We ran all outcome analyses without inclusion of the large school and with only 5th and 6th graders (see Online Resource Tables A and B). Although there were some small changes to outcomes due to the resulting reduction in sample size and power, overall, study findings remained mostly consistent.

Discussion
This study represents the first rigorous evaluation of a digital citizenship education program. The evaluation of the Be Internet Awesome program suggests promise for program impact across some selected outcomes, but also highlights areas of caution for the current approach to digital citizenship education. Although it is critical that research expands on and verifies the findings from this first study, results suggest that the BIA program can increase youth knowledge of online safety vocabulary and concepts, even 2 months post-program delivery. Students who received the BIA program also reported a significant increase in self-efficacy for knowing what to do when online problems occur such as someone being mean to them or coming across something upsetting online. However, no effect of the BIA program was found for a number of other hypothesized outcomes including online privacy behaviors, asking a parent for help with online problems, self-reported online harassment behaviors, or self-reported online civility. Although study power limitations may have reduced the ability to detect some program effects, overall, study findings emphasize the need for stakeholders to clarify digital citizenship education goals and increase evaluation research to establish and verify effective education practices.

Knowledge of Online Safety Concepts
It is encouraging that gains in knowledge were found for students receiving the BIA program. Retention of new information is likely a foundation for change for some online behavior and safety outcomes. Understanding the meaning of terms like "hacker" or "catfishing," for example, may help increase children's awareness of online risks. Research will need to map out the media literacy concepts that seem most important for younger children to understand to make sure they are fully incorporated into education programs. Additionally, the program theory linkages between increased media literacy and improved skills or safety need to be made explicit and tested. Nonetheless, children appear to maintain the information about online safety provided to them through educational programs.

Self-Efficacy
Self-efficacy to handle online problems may be a particularly important value of digital citizenship programs, and next-step evaluation of digital citizenship education should study this possible outcome in more detail. As in their face-to-face interactions, youth are likely to encounter problematic situations online that can be resolved successfully in multiple ways. Providing younger children who are starting to engage with technology with multiple options for handling problems might increase their sense of confidence. Research in other areas has documented the relationship between self-efficacy and behavioral outcomes often targeted by prevention programs, such as bystander action and help-seeking (Bussey et al., 2020; Sheeran et al., 2016). Future studies should further examine the pathway between online knowledge, self-efficacy for handling online problems, and longer-term problem-solving and safety outcomes.

Online Privacy Behaviors
Helping to prevent privacy breaches and reduce online contact by those who might seek to harm youth is a key aim for digital citizenship programs (Finkelhor et al., 2020). Educational messages related to privacy encourage youth to be careful about who they include as "friends" on social media sites, limit interactions with strangers online, reduce identifiable information about themselves online, keep passwords private, and respect their friends' privacy. The evaluation of the BIA program did not find successful program outcomes with respect to any of the measured online privacy behaviors. The power of the evaluation for this particular outcome may have been limited by the fact that only a portion of younger children (4th-6th graders) are using social media and sharing content online, so future research should explore this online safety goal with larger samples and older youth. However, it is also possible that online privacy behaviors are difficult to influence, and that this is a content and outcome goal that needs to be re-evaluated by digital citizenship programs. Research finds that the majority of youth already use social media privacy settings and report high confidence that they can manage these settings (Madden et al., 2013). It is possible that they do not view adults as contributing additional helpful knowledge on this. It also might be that the abstract and personal nature of privacy makes it a difficult behavior to teach (Livingstone et al., 2019).

Online Harassment and Online Civility
Reducing cyberbullying behavior and increasing children's positive behavior when interacting with others online is a prominent focus for digital citizenship programs (Cortesi et al., 2020). The evaluation of the BIA program saw no impact on students' self-reported behaviors such as making rude comments about someone online, spreading rumors, or posting pictures and videos that might upset someone. It is notable that in a recent meta-analysis, many of the programs that resulted in successful reductions in cyberbullying behaviors were intensive and comprehensive school-based bullying prevention programs (Gaffney et al., 2019). The solutions for reducing online harassment and cyberbullying likely go beyond what internet safety or digital citizenship education programs can provide (Finkelhor et al., 2020). Significant differences between treatment and control groups were also not found for scores on the Online Civility Scale (OCS), which measures children's endorsement of positive online behaviors and preferences. Although students receiving the BIA program showed a small increase in OCS scores pre-test to post-test compared to students in the control condition, who saw a decrease in scores, no significant differences between groups were found in the multilevel models. For 6th graders, it is possible that messages about the importance of "being kind online" start to become ineffective as students gain more independence and maturity, and more sophisticated strategies are needed.

Telling a Parent About Online Problems
Finally, the evaluation of the BIA program did not find compelling evidence for increased help-seeking from parents by youth when faced with online problems. Telling an adult when there is a problem online is an intuitive message for online safety. As much as children and youth might be able to navigate many online issues themselves, there are times when online harassment or questionable contacts require adult assistance. Yet, most youth who fail to disclose problems are probably aware that they can tell an adult and are choosing not to. Research suggests that youth are skeptical that telling helps (Davis & Nixon, 2010) and report that such disclosures often result in no change or can even make things worse (Bradshaw et al., 2007). Making the issue of "telling" even more complex, many of the youth running into the greatest amount of trouble online often have existing communication problems with parents and other adults (Ybarra et al., 2007). Digital citizenship and online safety programs might need to develop and evaluate creative new strategies to improve help-seeking for youth who run into problems online that pose significant risk and need adult intervention.

Limitations
Study findings should be considered in light of the limitations of the study. School recruitment procedures occurred over a short timeframe, and the small number of schools involved in the study limited our ability to ensure equivalence between implementation and control schools. The resulting baseline differences between groups may have affected results in ways that we were not able to fully control for. Additionally, many of the 4th, 5th, and 6th graders in the study were not yet posting and sharing photos online, affecting the power of the evaluation study to adequately test privacy behavior outcomes. Also limiting power were some differences in how schools implemented the BIA program, both in terms of timing of the lessons and in the instructor's professional background (i.e., technology specialists, librarians, or classroom teachers). Additionally, students in at least two control-condition schools were exposed to general internet use lessons between pre- and post-surveys.

Implications for Digital Citizenship Education and Research
Despite widespread dissemination of digital citizenship education programs in schools across the USA, there has been little work to define measurable goals of these educational initiatives. The current evaluation defined and operationalized a set of outcomes specific to the BIA curriculum, and we hope this initiates conversations about whether these outcomes successfully capture the goals of digital citizenship education more generally. Once goals are better defined, research can help identify optimal strategies for achieving digital citizenship goals. Findings from the study suggest that current strategies might be effective for some goals, although follow-up research is needed to confirm this and understand longer-term impacts. It is encouraging, for example, that the evaluation of BIA found positive results for knowledge and self-efficacy. Digital citizenship education for young children might be able to prepare children with tools and options for handling online problems and instill excitement about participating in many of the positive aspects of online culture.
The findings indicate that other apparent goals of digital citizenship education need to be revisited, including the focus on cyberbullying, privacy, and reputation. For these concerns, it might be more effective for digital citizenship proponents and online safety advocates to advocate for well-established prevention programs that target underlying causes of concerning behaviors. For example, cyberbullying prevention may be most successfully addressed through evidence-based bullying prevention or social emotional learning (SEL) programs (Durlak et al., 2010; Gaffney et al., 2019). Or, if reduced risk from technology-facilitated sexual exploitation is a goal, online safety experts could help contribute content to sexual assault prevention programs, sexual health programs, or curricula on healthy relationships.
For other outcomes, it may be that digital citizenship programs need to get more creative in their approach. For example, encouraging younger children to seek help from adults for online problems is a reasonable goal for digital citizenship educators, but creativity is needed to overcome some of the barriers. It may help youth to role-play how to talk with adults about problems that have gone beyond what they can handle on their own. An additional focus for online safety education could be programming for parents and teachers that helps them respond constructively to conversations with children about online problems.
Finally, as digital citizenship education evolves, it is critical that it is accompanied by a greater amount of research. Research is needed to establish psychometrically strong outcome measures, and rigorous evaluation is needed to test that educational objectives are being achieved with digital citizenship programs and messaging strategies. The findings from the current study need to be replicated, both with BIA and with similar programs, testing key outcomes more comprehensively. Additional research is also needed that focuses specifically on online safety outcomes for vulnerable and diverse youth populations. It was clear from the current study that even across a narrow age range, 4th, 5th, and 6th grades, youth development and online activity changed substantially. These changes will continue to extend and expand through middle and high school, with children's behavior online, independence, and receptiveness to adult instruction changing substantially each year. Using research to guide and verify, educational programs will need to adjust their content and approach to improve outcomes for children at each developmental stage.

Conclusions
The evaluation of the BIA program provides the first study that we are aware of to use a rigorous evaluation methodology to explore the impact of internet safety or digital citizenship programs. The findings of this study on 4th-6th grade students offer encouragement for the possible impact of digital citizenship education on some intended outcomes but highlight the need for a re-evaluation of others. Findings identified that the BIA program can help late elementary and early middle school children learn and retain knowledge of new online safety vocabulary and concepts and feel more confident in their ability to handle online problems. Results from other outcomes such as online privacy behaviors, cyberbullying, and talking with parents about online problems suggest the need for more conversations and research by school professionals and digital citizenship experts to clarify educational goals and improve strategies.

Acknowledgements
The authors would like to acknowledge the substantial contribution and assistance by staff and students in the participating schools to the study.

Data and Code Availability
The data and code are available from the corresponding author upon request.

Author Contribution
LMJ and KJM designed the study and instruments; CLB conducted the statistical analysis; all coauthors contributed to writing and editing of the manuscript.