A total of 117 students (87 women) participated in exchange for course credit. Most participants (109) were Dutch. Participants’ ages ranged from 17 to 33 years (M = 19.56, SD = 2.14). Of the sample, 109 participants self-identified as heterosexual, one as homosexual/lesbian, four as bisexual, and three as “other” (e.g., pansexual). On average, participants knew approximately 5 LGBT people in person (SD = 5.5; range, 0–50), of whom they considered approximately 2 to be good friends (SD = 2.2; range, 0–15). Only four participants indicated that they did not know any LGBT people in person, and about one-third (32.5%) did not have any good friends who were LGBT. Sixteen percent of the participants reported being religious (Footnote 2). Participants also indicated on a 7-point scale how important religion was to them (1 = totally unimportant; 7 = very important); on average, participants scored on the lower end of this scale (M = 2.86, SD = 2.03).
We used a mixed (between- and within-subjects) design. The between-subjects factor was condition (control vs. experimental), which was randomized at the group level using a random number generator. The within-subjects factor was time (pretest, t0 vs. posttest, t1 vs. follow-up, t2). Participants in the control condition were placed on a “waiting list” and received the contact intervention several weeks after the experimental groups. Group-randomized designs are unlikely to have adequate power for between-group comparisons without at least 8–10 groups per condition (Murray et al., 2004). We therefore aimed to collect a minimum of 16 groups in this study and allocated any additional groups to a related study for which we recruited participants simultaneously. Note that for studying the change in prejudice over time (i.e., within-group comparisons), power was adequate, as three measurements were available for each participant and we had about 100 participants in total, spread across the (limited number of) groups. Participants took part in existing workgroups as part of a first-year introductory psychology course. Of the 32 groups that were approached, 23 agreed to participate (~72% participation rate). Eighteen groups were included in the current study; the five remaining groups participated in a subsequent study, executed after the current one, that will not be discussed further here. Group size in the current study varied between 3 and 10, with a mean of 6.87 participants (SD = 1.80).
Eight groups were randomly assigned to the experimental condition (56 students in total), and eight groups were randomly assigned to the control condition (61 students in total). Two groups that were initially planned to participate in the subsequent study were eventually added to the current sample. For one of these groups, no educator was available; as a consequence, the participants in this group did not receive any treatment, although they completed the pre-measure and the follow-up measure. This group was therefore added to the control condition of the current study. The other group did participate in the intervention program but did not receive the additional treatment that was part of the subsequent study and thus received exactly the same treatment as the participants in the experimental condition of the current study; this group was added to the experimental condition. This led to a total of k = 18 groups.
Below, we describe what happened during each part of the study; see the Measures section for the specific measures used in the current study and the time-points at which they were collected.
Participants who registered for the study received a link to an online pretest survey (programmed in Qualtrics). The pretest started with an information screen, after which participants provided informed consent. Participants then completed a variety of measures (see Table S1 in the supplementary materials), including several scales measuring sexual and gender prejudice as well as background variables. Participants could also note down any remarks they had. Hereafter, participants were thanked for their participation in the first part of the study.
T1: Intervention + T1 posttest (experimental groups only)
Only the experimental groups participated in the intervention. There were on average 11.7 days (SD = 6.5; range, 4.6–22.9 days) between the pretest (T0) and the contact intervention (T1). The intervention was taught by experienced educators of the “COC Mid-Netherlands” who themselves identified as sexual or gender identity minority members (four identified as gay men, three as lesbians, and one as a bisexual woman), and who discussed their personal experiences as minority members during the intervention.
In the current research, we examined an existing intervention that was developed and implemented by the organization “COC Mid-Netherlands” (www.cocmiddennederland.nl) (Footnote 3). The contact intervention was led by one or two experienced LGBT educators. It took place in a video laboratory and was videotaped by means of four video cameras mounted in the corners of the ceiling. This part of the study lasted 45–60 min. Participants were invited to the laboratory, where they met the educator(s). Chairs were aligned in a half circle, facing the front of the room, where the educators were seated next to a flip chart. Each session consisted of two parts: an introduction part and an interaction part. During the introduction part, participants and educators introduced themselves, and the rules for the meeting were explained (e.g., that participants could ask any questions they wanted, but should do so respectfully). Hereafter, participants were asked to write down or mention their associations regarding sexual and gender diversity and to formulate one or more questions they had regarding this topic. The questions and associations were written on the flip chart and were later used as a guideline for discussion. During this first part, factual information about sexual and gender diversity was also discussed (e.g., what sexual and gender diversity entails, how many people are LGBT+, etc.). The introduction part ended with an educator telling his or her personal coming-out story. This story typically consisted of the educator explaining when and how they found out they were LGBT, and how their friends and family reacted to their coming out. The coming-out stories differed between educators, as they were based on personal experiences. One example entailed a gay educator discussing how difficult it was for him to come out to his parents.
He told the participants how he found himself sitting at his parents’ kitchen table every week with a different excuse, until he finally found the courage to tell his mother that he was gay.
Hereafter, the interaction part started, in which participants and educators discussed sexual and gender diversity topics and participants got the chance to ask questions. While the first part of the class was relatively similar across groups, the second part depended on the participants’ input during the introduction part. For example, in some groups participants asked more questions about transgender people, while in other groups there were more questions about how same-sex couples could become parents. During the second part, the educators tried to discuss all questions that had been raised. To this end, several techniques were used. For example, to let participants experience how difficult it can be to “come out,” the “statements game” was played: statements were read out, and participants had to stand up when a statement applied to them personally. These statements were somewhat personal, such as “I have stolen something in the past” or “I want to make my parents proud.” Playing this game allowed participants to experience how difficult it can be to stand up and acknowledge something personal, an experience comparable to coming out, where someone has to “stand up” and tell people that he or she is not heterosexual. These techniques were based on “best practices” and were often developed by the educators themselves; as far as we know, they were not based on theory or empirically tested before.
Together, the intervention combined some of the “contact conditions” described by contact theory (Cook, 1985), such as acquaintance potential (the coming-out story), common goals (several collective tasks), and support by authorities (the school/university that included the intervention in its curriculum). The contact intervention session ended after 45 min; the experimenter then entered the room and notified the educators that they needed to wrap up the session.
Directly after the contact intervention, participants in the experimental condition completed the online posttest (T1) in individual cubicles or on laptops in the room where the session took place. Participants completed questionnaires measuring their evaluation of the contact intervention, modern LGBT negativity, and additional measures. To reduce demand characteristics, it was noted several times that there were no right or wrong answers and that we were interested in participants’ personal opinions. Finally, participants could provide remarks and/or ask questions about the research, were thanked for their participation, and were told that they would receive an email one week later with a link to the final questionnaire.
One week after the contact intervention, participants received a link to the online follow-up test. On average, there were 7.5 days (SD = 0.6; range, 6.8–9.0 days) between the intervention + posttest (T1) and the follow-up test (T2). In the follow-up test, participants completed questionnaires measuring their evaluations of the contact intervention, modern LGBT negativity, and additional measures (see Table S1 in the supplementary materials). Finally, participants in the experimental condition were thanked for their participation and received course credits. Participants in the control condition were thanked for their participation and were informed that they would be taking part in the contact intervention at a designated time in the future. For these participants, the full debriefing followed after their participation in the contact intervention.
Except where indicated otherwise, responses to all self-report items were provided on 7-point Likert scales ranging from 1 (strongly disagree) to 7 (strongly agree); see Table S1 in the supplementary materials for the exact wording of all items, as well as the response options and the time-points at which the items were assessed.
Modern LGBT negativity (t0, t1, t2) was assessed with an adapted version of Morrison and Morrison’s (2003, 2011) 12-item Modern Homonegativity Scale–Gay Men (MHS-G; Cronbach’s α: t0 = 0.82, t1 = 0.86, t2 = 0.86), in which “gay men” was replaced in all statements by “lesbians, gays, bisexuals and transgenders.” An example item is “Many lesbians, gays, bisexuals and transgendered people use their sexual orientation so that they can obtain special privileges.” Items were answered using a slider bar ranging from 0 (strongly disagree) to 100 (strongly agree).
Old-fashioned prejudice (t0, t1, t2) was assessed with the 10-item Revised Short Version of the Attitudes Toward Lesbians and Gay Men Scale (ATLG-R-S5; Herek, 1997; five items about lesbians and five items about gay men) (α: t0 = 0.64, t1 = 0.61, t2 = 0.61). An example item was “Sex between two men is just plain wrong.”
Attitudes toward public displays of affection (t0, t1, t2) were assessed with four items (Kuyper, 2015), such as “I think it’s offensive when two women kiss in public” (α: t0 = 0.88, t1 = 0.90, t2 = 0.90).
Attitudes toward gender non-conformity (t0, t1, t2) were assessed with four items (Kuyper, 2015) such as “I do not feel comfortable being around women who look masculine” (α: t0 = 0.85, t1 = 0.89, t2 = 0.83).
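The internal consistencies reported for these scales (Cronbach’s α) follow the standard formula based on item and sum-score variances. As a minimal sketch (the analyses themselves were not run in Python; the variable names and simulated data here are purely illustrative):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items in the scale
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the sum score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Toy example: four items driven by one common trait yield a high alpha.
rng = np.random.default_rng(0)
trait = rng.normal(size=200)
scores = trait[:, None] + rng.normal(scale=0.5, size=(200, 4))
print(round(cronbach_alpha(scores), 2))
```

Alpha rises with the number of items and with the inter-item correlations, which is one reason the short 10-item ATLG-R-S5 shows lower α values than the 12-item MHS-G adaptation.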
Evaluation of the Intervention
Participants’ self-reported effectiveness of the contact intervention (t1, t2) was assessed with the statement “The contact program has positively changed my views of LGBT people.” Evaluation of the intervention (t1, t2) was assessed with three items (α: t1 = 0.76, t2 = 0.83): “I thought the education class was useful,” “I thought the education class was informative,” and “I thought the contact program was useless” (recoded). Two open-ended questions (t1) assessed which aspects of the contact intervention had made the most positive and least positive impression, and why. Participants also graded the intervention (t1, t2) and the educators (t1) on scales ranging from 1 (lowest grade) to 10 (highest grade). Experienced empathy after the intervention (t1) was assessed with a self-developed 8-item scale (α at t1 = 0.73). An example item was “I could empathize with the educators’ stories.” Feelings of unsafety during the intervention (t1) were assessed with a self-developed 9-item scale (α at t1 = 0.82). An example item was “I was afraid that my opinions would be criticized during the education class.” All these questions were completed only by participants in the experimental condition.
Behavior During the Intervention
The behaviors of participants and educators during the intervention were coded by two trained research assistants who were masked to conditions and hypotheses, but aware of the general topic of the study (i.e., testing the effectiveness of an intervention). Seven out of eight videos were coded by both coders. To increase inter-coder reliability, the coding of the first few videos was discussed together with the first author; during these discussions, the ratings of the two coders were compared across 5-min intervals, and possible discrepancies were discussed.
Initially, coders tallied how often and for how long the following behaviors took place: asking a question, making a comment, mentioning a negative stereotype/prejudice, laughing, giving positive reinforcement, making a positive remark, nodding, raising one’s hand, talking among themselves, and mentioning a positive stereotype/prejudice. However, scoring and interpreting these categories proved complex. For example, in many of the videos these behaviors occurred only a few times or not at all. Furthermore, participants often mentioned examples of behaviors or attitudes that they disagreed with, which were hard to score (e.g., is the comment “some people think that gay men are effeminate, but I disagree” a negative stereotype, a positive remark, or both?). It also happened that participants’ verbal behavior did not match their non-verbal behavior (e.g., saying that one has no problem with LGBT people, but laughing when a stereotype is mentioned). As a consequence, coders sometimes showed large discrepancies in their coding of such events (i.e., one coder marked an event as positive and the other as negative). Because of these difficulties, in the current report we focus on the percentage of time that educators and participants were actively engaged (e.g., talking and making comments), as this could be determined in an objective and reliable way.
The following demographic variables were assessed: sex, age, sexual orientation, ethnicity (t0), and religiosity (t2). Contact with LGBT people (t0) was assessed with two open-ended questions asking how many lesbians, gay men, bisexuals, and/or transgender people participants knew personally, and how many of them they considered to be (good) friends. Finally, to examine the possible role of demand characteristics in reporting (low levels of) prejudice, we included the Social Desirability Scale (Strahan & Gerbasi, 1972; 20 items, α = 0.74) at t0. Answers were given on a binary scale (true vs. false). An example item was “I’m always willing to admit it when I make a mistake.”
As we were primarily interested in the effect of the intervention on modern LGBT negativity, we first investigated—by means of multilevel modeling—whether modern LGBT negativity could be predicted based on condition and time of measurement (controlling for gender, age, and group size). Next, we also studied the pattern of change over time for the other forms of prejudice (old-fashioned prejudice, attitudes toward gender non-conformity, and attitudes toward public displays of affection).
Two models were built for modern LGBT negativity. First, we built a multilevel model for the experimental condition only, to compare modern LGBT negativity across all three measurements (Model 1: t0, t1, t2). This allowed us to examine the immediate effect of the intervention (t0–t1) and the extent to which this effect lasted over time (t1–t2 and t0–t2). Second, we built a multilevel model to compare differences between the pre- and follow-up measures of LGBT negativity (t0–t2) between the experimental and control conditions (Model 2; note that there were no measurements for the control condition at t1).
For both models, we primarily examined changes in prejudice associated with the intervention. Additionally, we examined whether the intervention effect was influenced by the control variables sex, age, and group size. We used multilevel modeling to deal with the hierarchical structure of the data (i.e., three-level data with repeated measures of prejudice within participants and multiple participants per group, which caused prejudice scores within participants and within groups to be correlated) and examined the within-participant and within-group differences (Singer & Willett, 2003). To this end, we used the statistical software R (version 3.3.1) with the "lmer" function from the "lme4" package, and obtained p-values via the Satterthwaite approximation using the "lmerTest" package.
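The three-level structure (measurements within participants within groups) can be sketched as follows. The study's models were fit in R with lme4's lmer; since that exact code is not reproduced here, this is an illustrative Python analogue using statsmodels, with simulated data and assumed variable names (`negativity`, `time`, `group`, `participant`):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for the study's structure: 3 time-points nested in
# participants, participants nested in 18 groups.
rng = np.random.default_rng(1)
rows = []
for g in range(18):
    g_eff = rng.normal(scale=5)                 # group random intercept
    for p in range(6):
        p_eff = rng.normal(scale=8)             # participant random intercept
        for t in range(3):                      # t0, t1, t2
            rows.append({"group": g, "participant": f"{g}_{p}", "time": t,
                         "negativity": 40 - 3 * t + g_eff + p_eff
                                       + rng.normal(scale=6)})
df = pd.DataFrame(rows)

# Random intercepts for groups (grouping factor) plus participants nested
# within groups (variance component) -- roughly analogous to
# lmer(negativity ~ time + (1 | group) + (1 | group:participant)) in R.
model = smf.mixedlm("negativity ~ time", data=df, groups="group",
                    re_formula="1",
                    vc_formula={"participant": "0 + C(participant)"})
fit = model.fit(reml=True)
print(fit.params["time"])   # estimated fixed effect of time (simulated truth: -3)
```

The key point is that the fixed effect of time is estimated while the correlation of scores within participants and within groups is absorbed by the two random-intercept terms.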
Building the Multilevel Model
We constructed multilevel models in the following stepwise fashion: We started with a simple model and added or removed effects until we reached a final model that adequately described our data. In particular, first, an unconditional means model with only random intercepts for participants (Level 2) and groups (Level 3) was fit in order to inspect how the dependent variable varied across time-points, participants, and groups (Step 1). In a second step, we added time as a fixed effect to compare modern LGBT negativity across the three time-points (Step 2). Next, we made the effect of time random across participants (Step 3) and groups (Step 4) in order to examine variations in the slopes for time across participants and groups.
In a next step, we added (Level 2) participant characteristics (i.e., sex and age) as fixed effects to explain between-participant differences in modern LGBT negativity at the beginning of the study and between-participant differences in the (short- and long-term) intervention effect (Step 5). In a subsequent step, we tested whether the between-participant differences from the previous model differed across groups (i.e., random effects at Level 3), and we added (Level 3) group characteristics (i.e., condition and group size) as fixed effects to explain these between-group differences (Step 6). Next, we added the interaction effects of all fixed effects already in the model (Step 7). Finally, we made the model more parsimonious by eliminating all variables and interaction effects whose removal did not substantially decrease model fit, starting with the highest-order interaction effects. In each step, we used a likelihood-ratio test (LRT) to examine whether adding effects significantly improved model fit and whether removing effects did not significantly decrease it.
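The likelihood-ratio test used at each step compares two nested models fit by maximum likelihood: twice the difference in log-likelihoods is referred to a chi-square distribution with degrees of freedom equal to the number of added (or removed) parameters. A minimal sketch (the log-likelihood values below are made up for illustration):

```python
from scipy.stats import chi2

def likelihood_ratio_test(llf_reduced: float, llf_full: float,
                          df_diff: int) -> float:
    """p-value for comparing two nested models fit by maximum likelihood.

    llf_reduced / llf_full: log-likelihoods of the smaller and larger model;
    df_diff: difference in the number of estimated parameters.
    """
    lr_stat = 2 * (llf_full - llf_reduced)      # always >= 0 for nested fits
    return chi2.sf(lr_stat, df_diff)

# Example: adding one parameter raises the log-likelihood by 3 points.
p = likelihood_ratio_test(llf_reduced=-520.0, llf_full=-517.0, df_diff=1)
print(round(p, 3))  # → 0.014, so the added effect would be retained at α = .05
```

In R this is what `anova(model_reduced, model_full)` reports for two lmer fits; note that LRTs for fixed effects require ML rather than REML estimation.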
Assumptions and Bootstrap
We tested the multilevel model assumptions for the final models. In particular, we tested for linearity, normality, and homoscedasticity of the residuals. Unless described otherwise, we found no clear violations of these assumptions. To determine the robustness of our conclusions against possible violations of the model assumptions, we also performed a clustered bootstrap analysis with 10,000 bootstrap samples (Davison & Hinkley, 1997; Deen & de Rooij, 2020) and compared our final model with the bootstrap results.
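The core of a clustered bootstrap is that whole groups, not individual observations, are resampled with replacement, so the dependence within groups is preserved in every bootstrap sample. A simplified sketch (the statistic here is a plain mean of per-group change scores rather than the study's full multilevel model, and all data are simulated):

```python
import numpy as np

def clustered_bootstrap(groups, stat, n_boot=10_000, seed=0):
    """Percentile CI that respects clustering: resample whole groups.

    groups: list of 1-D arrays, one array of observations per group.
    stat:   function mapping a 1-D array of pooled observations to a scalar.
    """
    rng = np.random.default_rng(seed)
    n = len(groups)
    reps = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # draw n groups, with replacement
        sample = np.concatenate([groups[i] for i in idx])
        reps.append(stat(sample))
    return np.percentile(reps, [2.5, 97.5])

# Toy example: 18 groups of pre-to-follow-up change scores (true mean = -3).
rng = np.random.default_rng(2)
toy_groups = [rng.normal(loc=-3, scale=4, size=6) for _ in range(18)]
lo, hi = clustered_bootstrap(toy_groups, np.mean, n_boot=2000)
print(f"95% CI: [{lo:.2f}, {hi:.2f}]")
```

Because resampling is done at the group level, the resulting interval is wider than a naive observation-level bootstrap would suggest whenever scores within a group are positively correlated.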