1 Introduction

Muslims form the largest religious minority in Germany. Anti-Muslim attitudes have remained stable for around 20 years and are repeatedly taken up in political debates (Decker and Brähler 2020; Zick et al. 2019). Anti-Islam and anti-Muslim crime has been on the rise in recent years (BKA 2020; Jansen 2021). At the same time a right-wing party, the AfD, has been enjoying enormous electoral success since 2015 by, among other things, specifically speaking out against the immigration of people from predominantly Muslim countries and rejecting Islam and Muslims as foreign and alien.

Various initiatives from civil society and politics are countering this antagonism. The German Islam Conference, established by the Federal Minister of the Interior in 2006, aims at an improvement of relations between the state and Islam. The nationwide Open Mosque Day, established in 1997 by a national federation of mosques, is intended to signify the attachment of Islam to its non-Muslim environment and continues to receive high media attention. For a much longer period of time, and presumably with a wider reach, albeit less conspicuously, organized tours of mosques have been the form of action chosen by a multitude of actors at the grassroots level. They are initiated from within mosque communities and often jointly organized with non-Muslim individuals and institutions. Mosque community volunteers and imams guide mainly non-Muslim visitors through the premises, tell about the history and everyday life of the community and give lectures on Islamic faith.

This paper focuses on the effects of the tours from the perspective of applied contact research. Scientific studies on the effects of intergroup contact were spurred by Gordon Allport’s (1954) work on The Nature of Prejudice. The basic idea of the so-called contact hypothesis is that contact with minority groups reduces prejudice against members of these groups. Already supported at that time by numerous studies, his remarks triggered an enormous wave of studies. To this day, contact is studied in various forms (direct contact, indirect contact, imagined contact or extended contact), under diverse conditions and between different groups in different regions of the world. The findings point to a great potential for reducing prejudice against members of outgroups. Therefore, they have a high social relevance for political and social work. However, research on contact increasingly laments the lack of practice-based studies that can provide concrete evidence for policy-making (Paluck and Green 2009; Paluck et al. 2019; Lemmer and Wagner 2015). Among other things, there is a lack of verification of long-term effects, of experimental studies with control groups, but also of specific descriptions of the contact studied (Paluck et al. 2019).

From an analytical point of view, mosque tours are encounters of a non-Muslim in-group with Muslims and Islam as their out-group. This contact constellation is characterized by an unusual contrast. On the one hand, favorable conditions prevail. A large number of encounters between Muslims and non-Muslims (250,000 annually according to a projection by Bentrup and Salentin 2021) have steadily taken place in mosques for decades. They are embedded in informal but nonetheless tried-and-true structures, because dedicated activists are involved on both sides. The institutions in which they operate are sympathetic to the encounters. Access is low-threshold: only a phone call or email is required to register. In principle the encounters could thus have a broad impact.

On the other hand, going by findings we shall present in more detail in Chap. 2, the implementation of the contact has a number of weaknesses, as an evaluation of reports on the visits (Haubach and Salentin 2015) has shown. The tours have no evidence-based conceptual design tapping contact research knowledge. At first sight they have little in common with the contact studied in the majority of academic inquiries, as we shall explain in more detail below. Meetings with socially similar persons are the exception. Most hosts are poorly prepared pedagogically and rhetorically to deal with conflict situations. Indeed, a typical tour in a way resembles a visit to a museum. All in all, these circumstances cast doubt on the impact of the visits. The open and as yet empirically unanswered question is therefore whether desirable contact effects occur even under obviously unfavorable conditions. Our paper seeks to answer this question with data from a pre-post study of mosque visitor attitudes.

The aim of our empirical part is twofold. First, we want to ascertain whether a tour of a mosque reduces prejudice. We use quantitative data from a three-wave panel sample of secondary school students. This section focuses on a unidimensional attitude measure and adheres to conventional survey methodology. A straightforward answer to that question certainly has value in itself. But it would also be helpful to gain some indication of possible explanations. We want to learn more about what changes the tour induces in the cognitive representation of Islam and of Muslims. This is the second aim. Therefore, we widen the scope of inquiry and analyze qualitative data on mental images held before and after the tour.

We shall set out with a discussion of findings from previous research. The circumstances under which contact is particularly effective are well known, but these findings are not applicable here. We therefore review work that examines the effects of contact under less favorable conditions and that at least suggests the possibility of attitudinal change.

2 Applied research of contact hypothesis—What do we know about contact in the field?

The largest meta-analysis of contact studies to date, by Pettigrew and Tropp (2006), shows that contact generally attenuates prejudice. In addition, the two authors show that the conditions already formulated by Allport (1954) strengthen the effect of contact. These include “equal status contact between majority and minority groups in the pursuit of common goals” and contact that is “sanctioned by institutional supports (i.e., by law, custom or local atmosphere), and provided it is of a sort that leads to the perception of common interests and common humanity between members of the two groups” (Allport 1954, p. 281). The four conditions (equal status, common goals, intergroup cooperation and the support of authorities) have since been tested in many studies. Yet the general message is that even in the absence of optimal conditions, contact has a prejudice-reducing effect (Pettigrew and Tropp 2006, pp. 760–766).

However, these results are only of limited use for political and social work in practice. This is because the majority of results comes either from surveys or from experimental studies under laboratory conditions, i.e. contact situations that can be controlled and manipulated (e.g. also Kotzur et al. 2019; Kende et al. 2017). Laboratory studies have the advantage that treatments can be used in a targeted manner and the effects can be tested for causality through an experimental design. The boundary conditions can be controlled. What cannot be taken into account, however, are social processes that frame contact in the real world (Paluck and Green 2009, p. 349). The complexity of social reality and its influence on contact are thus left out. Transferring findings from experimental studies under laboratory conditions to the design of encounters in practice is therefore difficult.

In survey studies, contact experiences are recorded retrospectively. For example, people are asked about the frequency of contact in their circle of friends, at school, at work or in general with people of a certain group, and their answers are then related to prejudice against this group (e.g. Wagner et al. 1989; Steinmann 2020). This correlation has also been examined for Muslims in German surveys. The results generally show that the more contact with Muslims is reported, the lower the prejudice (e.g. Pickel et al. 2020; Foroutan et al. 2014). However, whether contact leads to less prejudice or whether people with less prejudice are more open to outgroup contact cannot be clearly determined from simple cross-sectional surveys (Pettigrew and Tropp 2006, p. 753). The problem of the unknown direction of causality between contact and prejudice is known as selection bias (Pettigrew 1998, p. 69). Panel studies are necessary to determine the direction of causality. Selection bias also affects the validity of experimental studies if the participants are not randomly distributed between the control and treatment groups, but decide on their own whether they want to make contact with an outgroup or not.

In order to investigate the effectiveness of contact under real-world influences, Paluck and Green (2009, p. 357) make two demands. First, the results of laboratory experiments should be tested under real conditions. Second, they see field experimentation not only as “a method for testing theoretical ideas developed in the laboratory—the field itself should be used as a laboratory for generating richer, more multidimensional theory” (ibid.). The authors thus also call for generating information from the field.

The criticism of the lack of practical relevance of research on the contact hypothesis and the interest in studies outside the laboratory are taken up by field experiments with control group designs. For example, Green and Wong (2009) examine contact during two- to three-week wilderness courses on a camping expedition. They compare groups with only white participants to groups with at least three African Americans and/or Latinos/as whereby the contacts within the respective groups are linked to a common goal and are designed cooperatively. The authors found a favorable effect on prejudice a few weeks later (Green and Wong 2009, pp. 5–14). Finseraas and Kotsadam (2017) also find more positive attitudes towards the work ethics of immigrants among soldiers with an ethnic Norwegian background who share a room with at least one soldier with an ethnic minority background for eight weeks (Finseraas and Kotsadam 2017, p. 714). In this setting too, cooperation between the participants is required to accomplish a task together, in this case military training. Another typical feature of this form of contact is that it takes place in peer groups that are in comparable life situations. This provides essential conditions that favor a reduction of prejudice.

Other practice-based studies focus on targeted intervention programs that aim to reduce prejudice and associated discrimination against minorities. Intervention programs are characterized by the fact that contact does not take place naively, but follows a structured procedure. Examples of this are workshops in which groups meet. For instance, Maoz (2000) describes a reduction of prejudice for both Jewish-Israelis and Palestinians after a two-day meeting with joint activities and discussions. However, many studies also focus on interventions that involve only indirect contact. These can be, for example, stories told from the perspective of protagonists from the outgroup (Aronson et al. 2016; Liebkind et al. 2019) or knowledge transfer about the outgroup (Moritz et al. 2018). There is broad empirical evidence for effects of both direct and indirect contact interventions (Lemmer and Wagner 2015; Paluck et al. 2019). In a meta-analysis, Lemmer and Wagner (2015) show that both structured intergroup discussions and dialogues and cooperative learning programs improve ethnic attitudes in practice. They also confirm long-term effects and identify indirect contact as a useful alternative to direct contact.

Rather unusual formats are examined in the two studies by Walch et al. (2012) and Orosz et al. (2016). Their focus is on outgroup individuals who report discrimination experiences. For example, the “living library” project aims to reduce prejudice against Roma and LGBT people. Volunteer Roma or LGBT persons (the so-called “Books”, 30 to 50 years old) share their experiences of discrimination, which can be traced back to their group membership, with participants (the “Readers”, 14 to 20 years old) who do not belong to these groups. The aim is to establish a personal conversation. Here too, the before-and-after surveys show consistent effects of contact (Orosz et al. 2016, p. 516). The study by Walch et al. (2012) also speaks for the effect of personal narratives. It compares two differently designed interventions on the topic of transphobia. The first intervention includes a factual lecture about transphobia by an expert. In the second intervention, a transgender person speaks about her own experiences. The authors find the stronger effect after the experience report. Whether written down in stories or told by a person present, such formats make use of the effect of perspective taking (Berthold et al. 2013). However, the settings studied by Orosz et al. and Walch et al. do not exhibit the other conditions that have proven to be optimal for the reduction of prejudice: the contacts are not necessarily designed for peers or persons of equal social status. Nor are common goals or cooperation apparent. Support by authorities could be present in principle, but is not mentioned. Contacts that are not subject to ideal conditions thus also show a positive effect on prejudice in practice.

The scientific discussions about the intergroup effect of narratives from a personal perspective in organized settings go beyond quantitative considerations. The NGO initiative “My Story” was accompanied by a qualitative interview study. With the aim of reconciling the former war opponents, personal stories were told by the three main Bosnian ethnic groups at events (Oberpfalzerová et al. 2019, p. 2). The analyses of the interviews reveal, among other things, that the listeners show more emotional and cognitive empathy and individualization of the outgroup instead of homogenization; the individual persons move into the foreground through their story and their respective nationality into the background. The authors describe this effect as personalization and rehumanization of the outgroup (Oberpfalzerová et al. 2019, pp. 9–14). From the viewpoint of perspective taking, not only the influence of literature but also narratives conveyed through film is discussed as a factor that can support a reconciliation process between two groups in conflict (Bocheńska 2018).

To summarize, research on practice-based contact confirms a favorable effect on prejudice against outgroups even in the absence of optimal conditions. However, personal narratives by outgroup members compensated for these conditions, clearing the way for outgroup perspective taking. But what happens when even this possibility is missing?

3 A case study on mosque visits

Our study examines a contact situation in which, at first glance, an outgroup individual is in the foreground: the guide on a mosque tour. A mosque community volunteer or an imam guides a (predominantly) non-Muslim group through the building. But contrary to what the ongoing presence of this person might lead one to expect, it is not about his or her personal experience. This differs markedly from the projects described above such as “living library” or “My Story”. It is rather the building, the explanations about the building and also the explanations about the religion that make up the main part of the tour. We thus find a contact situation that generally remains superficial in interpersonal terms and deviates from the ideal contact conditions that have proven themselves in research (equal status between the groups in the situation, common goals, intergroup cooperation and the support of authorities). Furthermore, perspective taking is rarely fostered in such encounters.

Mosque tours cannot be understood as interventions in the classical sense. Rather, they have emerged as a grassroots movement, acting in a decentralized and unstructured manner, and can be understood as a bona fide reaction to the prevailing anti-Islam and anti-Muslim attitudes in Europe. Even though the course of mosque tours is often similar (Haubach and Salentin 2015), they were not developed along research hypotheses or concepts, but rather came about through pragmatic considerations and hands-on experience. With the tours, mosques nevertheless refer to the effects of contact and aim to reduce anti-Islam and anti-Muslim prejudice among non-Muslims (Janzen et al. 2016).

According to our research (Haubach and Salentin 2015; Bentrup and Salentin 2021), school classes form the largest subgroup of mosque tour participants, which is why the focus here is on students. The excursions to mosques are undertaken as a class unit. Apart from the teachers, the contact takes place between an adult from the mosque and young people forming a school class. A special feature of this group of visitors is certainly that there are also Muslim students in most classes. On the one hand, this can have an effect on the course of conversations during the guided tours or afterwards. On the other hand, these fellow students represent an earlier contact for the non-Muslim participants. This fact is taken into account in the analyses.

With a duration of about one to two hours, the encounters are short, and they only take place once. The contact cannot be understood as purely interreligious, as the visitors do not necessarily belong to a particular religious community or identify as religious at all. Another special feature of the contact is the location. Mosques themselves have become projection surfaces for prejudice (see Rezek 2019; Bayrakli and Hafez 2020, p. 18). During a guided tour, participants not only come into contact with a devout Muslim, but also enter a place where Islam is practiced.

In the present study, we pursue two goals. First, we test the contact hypothesis under real conditions by asking whether the encounters at mosque visits reduce prejudice against Islam. For this purpose, we use panel data and a control group design to compare students’ attitudes before and after a visit (quantitative study). Second, we ask how contact in the specific setting of a mosque visit affects perceptions of Muslims. The second question is an attempt to use the field itself as a site of observation and to gain an insight into the impact of these special encounters through explorative analyses (Paluck and Green 2009, p. 357). We use the evaluation of free associations with Muslims before and after a mosque visit (qualitative study).

As explained above, we focus on both anti-Islam and anti-Muslim attitudes. In doing so, we follow recent discussions and recommendations regarding the distinction between the two phenomena (see Diekmann 2022; Uenal 2016). Following Diekmann (2022), we define anti-Islam attitudes as hostile attitudes toward the religion of Islam and anti-Muslim attitudes as hostile attitudes toward people of Muslim faith (Diekmann 2022, p. 299). There is empirical evidence that anti-Islam and anti-Muslim attitudes are not congruent, which we acknowledge by analyzing anti-Islam attitudes (first study) and anti-Muslim attitudes (second study) separately. Nonetheless, both phenomena are correlated and, due to their close relationship and overlap, should ideally be recognized and interpreted as complementary to one another (Diekmann 2022). Therefore, anti-Islam attitudes and anti-Muslim attitudes would ideally be considered in both the quantitative and the qualitative study. However, the available data limit the analysis. For this reason, we make full use of the database, shedding light on both changes in attitudes towards Islam and in images about Muslims instead of examining only one of these dimensions.

4 Part 1: quantitative study

4.1 Study design

For the quantitative examination of whether mosque tours change participants’ attitudes towards Islam, the study is designed so that any changes in attitudes can be attributed to the tours and other influences can be excluded. The set-up corresponds to a natural experiment in the terminology suggested by Shadish et al. (2002, p. 12 ff), while Paluck and Green (2009, p. 344) would classify it as a quasi-experimental panel study: we could neither manipulate the presumed cause of attitude change, viz. mosque attendance, nor randomly assign participants, but we included pretest observations and a control group of the highest achievable similarity to the treatment group to rule out alternative explanations. Specifically, this means:

First, attitudes of a panel sample were measured several times. The first measurement (t1) took place before, the second (t2) after the guided tour. This was to exclude participants’ experiences that had an impact on their attitudes before t1 as an explanation. Such influences are conceivable during the preparation for the tour, for example through reading, lectures or other information. A third measurement took place several months after the mosque tour (t3). It was intended to test whether potential changes in attitudes were sustainable.

Second, we interviewed a directly comparable control group, which, with otherwise identical conditions, only differed from the visitors in that they did not undertake a visit. This design was necessary to exclude third-party influences in the period after t1 as a cause of attitude changes. Such influences could have come from daily political events such as terrorist attacks (which, however, did not actually occur) and the like.

The study approached the ideal of a natural experiment within the possibilities and restrictions that the practice of visiting mosques entails. The treatment consisted of guided tours of mosques organized by teachers for school classes. In order to reveal the effects of the encounters as they naturally exist, we did not influence the course of the tour. The program was determined by the guides and teachers alone (Haubach and Salentin 2015 describe typical tour procedures). In contrast to an experiment proper (described by Shadish et al. 2002, p. 7), the treatment was not amenable to experimenter influence and thus not manipulated based on theoretical considerations. Further, a random division into treatment and control groups was not feasible for practical reasons. The organization of the lessons made it necessary to adopt the given division according to school classes. Thus, already at t1, differences between the visitor and control groups and between individual classes within the two groups could not be ruled out. They may come about through a varying proportion of Muslim students in a class, the religious instruction attended, private contacts with Muslims and other factors. In our analyses of the change between the pretest and posttest, we therefore take into account any differences between the groups before the treatment. Thus, the design does not correspond to the ideal experiment in two respects, but only under these conditions did the study become possible at all. The great advantage of this approach is the particular closeness to reality of the data. The design of the study claims to reflect contact as it is practiced in society.

Another positive aspect is that, as teachers assign entire school classes to treatment and control conditions, selection bias (see Chap. 2) occurs, if at all, through administrator selection (Shadish et al. 2002, p. 14) rather than through self-selection based on pre-existing dispositions at the level of the subjects linked to the attitudes at hand. After the teacher’s decision as to whether a class visits a mosque, it takes an active refusal by the student or the parents to avoid contact. The decision whether a class visits a mosque or not can also depend on other factors (e.g. other topics or other excursions with the class). We cannot say how relevant the teachers’ selection bias is. However, we shall check whether the students’ attitudes at t1 differ between the treatment and control group (see Chap. 4.3).

4.2 Sample

We contacted mosques all over Germany and found out in which mosques school classes had registered for guided tours. In order to ensure a certain variation in the number of students and the mosques visited, we selected a total of six mosques in the north, south, east and west of Germany. They belong to different mosque associations. We then contacted the teachers involved and their schools. The students were informed that their participation in our study was voluntary (see footnote 1).

The students were from grades seven to nine, from different types of schools in the hierarchical German school system. The control group consisted of parallel classes of the same age in the same school. One of the schools did not have a parallel class available and we chose a comparable class at a neighboring school. The survey took place between April 2016 and February 2017, with visits occurring between late April and early December. The average interval between t1 and t2 was 11.5 days (SD = 9.1, min. 4, max. 35). The average interval between t1 and t3 was 117.4 days (SD = 47.5, min. 59, max. 209). With regard to the time intervals of the surveys, we mainly had to follow the possibilities of the schools in order to be able to conduct the survey in class.

A total of 20 classes from nine schools took part in the survey. Ten classes attended a guided tour between t1 and t2. They form the treatment group. The other ten classes did not visit a mosque either within the survey period or previously in this class composition. They constitute our control group. In total, over 400 students were interviewed at each time point and 1353 valid questionnaires were collected from the two groups across the three time points (see Table 1). We excluded from analysis the questionnaires of those students who did not participate at all three time points (viz. N = 261 questionnaires, ≙ 19.2%). Among the likely reasons for non-participation at individual time points were illness, change of school, and refusal. However, since the schools did not provide us with details, we cannot specify this any further. Likewise, all respondents who stated that they belonged to a Muslim religious community (N = 57) were removed from the data set, as the study concerns outgroup prejudice. A further N = 3 cases with conspicuous response patterns gave reason to assume invalid data and were also removed. After that, balanced panel data for 344 people remained. In a final step, all cases without a valid value on the dependent variable (see Chap. 4.3) at every time point were also excluded. We calculated how many cases had valid values on the factor at all three time points. One case had a valid value at only one time point, 19 cases had a valid value at two time points, and 324 cases had valid values at all three time points. The analyses thus take place with a panel of 324 respondents (treatment group: n = 162, control group: n = 162).
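The exclusion steps described above can be sketched as a filter over long-format questionnaire records. This is only an illustration: the field names (`pid`, `wave`, `muslim`, `index`) and the toy records are hypothetical and do not come from the study's data set.

```python
from collections import defaultdict

WAVES = {1, 2, 3}  # t1, t2, t3

def balanced_panel(records):
    """Return ids of respondents who pass all exclusion steps.

    Hypothetical record layout: {"pid", "wave", "muslim", "index"}.
    """
    by_person = defaultdict(dict)
    for r in records:
        by_person[r["pid"]][r["wave"]] = r
    kept = []
    for pid, waves in by_person.items():
        if set(waves) != WAVES:
            continue  # did not participate at all three time points
        if any(w["muslim"] for w in waves.values()):
            continue  # the study concerns outgroup prejudice
        if any(w["index"] is None for w in waves.values()):
            continue  # no valid value on the dependent variable
        kept.append(pid)
    return sorted(kept)

# Toy records, invented for illustration only
records = [
    {"pid": "a", "wave": 1, "muslim": False, "index": 3.0},
    {"pid": "a", "wave": 2, "muslim": False, "index": 2.8},
    {"pid": "a", "wave": 3, "muslim": False, "index": 2.9},
    {"pid": "b", "wave": 1, "muslim": False, "index": 3.2},
    {"pid": "b", "wave": 2, "muslim": False, "index": 3.1},
    {"pid": "c", "wave": 1, "muslim": True, "index": 2.0},
    {"pid": "c", "wave": 2, "muslim": True, "index": 2.0},
    {"pid": "c", "wave": 3, "muslim": True, "index": 2.0},
    {"pid": "d", "wave": 1, "muslim": False, "index": 3.5},
    {"pid": "d", "wave": 2, "muslim": False, "index": None},
    {"pid": "d", "wave": 3, "muslim": False, "index": 3.4},
]
print(balanced_panel(records))
```

In the toy data only respondent "a" survives: "b" misses the third wave, "c" is excluded as a member of the ingroup, and "d" lacks an index value at t2.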

Table 1 Sample

The age of the respondents is between 11 and 16 years (M = 13.47, SD = 1.235, N = 318). The proportion of female respondents is slightly higher than that of male respondents (170 female, 140 male, 6 diverse, 8 missing values).

Although we were able to achieve variation according to several characteristics, it is explicitly not a random sample. For example, the number of classes is unevenly distributed regionally. It would also be inadmissible to speak of a random sample because the population of visitors to mosque tours is unknown.

Moreover, due to the small number of tours, characteristics such as school type, region and mosque association are closely linked to individual tours. Therefore, statistical controls for these characteristics are not possible and we only consider dummy categories for individual tours in our analyses. The number of classes is also too small for multi-level analyses.

4.3 Measurement

The scale on anti-Islam attitudes was part of a larger self-administered paper-and-pencil questionnaire. Nine items measured attitudes toward Islam. An exploratory factor analysis across these items showed one single factor. The anti-Islam factor (see Table 2) is represented by four negative and five positive statements, with good internal consistency (Cronbach’s alpha at t1 = 0.870; see footnote 2). We calculated the index as the average of the nine items. This is our dependent variable. Responses were rated on a 5-point scale ranging from 1 (totally disagree) to 5 (totally agree). Items with a positive wording were recoded. A high index value thus stands for a strong rejection of Islam.
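For readers who want to retrace the scale construction, the recoding and the reliability coefficient can be sketched as follows. The formula is the standard one for Cronbach's alpha; the three-item toy data, the function names and the restriction to complete response rows are assumptions made for illustration, since the text does not say how missing answers entered the reported coefficient.

```python
from statistics import variance

def reverse_code(value, low=1, high=5):
    """Recode a positively worded item so that high values mean rejection."""
    return low + high - value

def cronbach_alpha(rows):
    """Cronbach's alpha for complete response rows (one list per respondent)."""
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Toy data, invented: three items, the third one positively worded
raw = [[1, 1, 5], [2, 2, 4], [3, 3, 3]]
positive_items = {2}
recoded = [
    [reverse_code(v) if j in positive_items else v for j, v in enumerate(row)]
    for row in raw
]
print(recoded)
print(cronbach_alpha(recoded))
```

Without the recoding step the positively worded item would correlate negatively with the others and pull the coefficient down, which is why the items are recoded before the index is formed.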

Table 2 Items of the factor: anti-Islam attitudes at t1

The proportion of missing values for most items ranges above what is usual in studies with university samples. In part, this is due to the general inexperience of the sample subjects in this age group with questionnaires. In part, however, the students refused—justifiably, in fact—to assess matters they did not think they knew well enough (Janzen et al. 2016, p. 95). In a pretest, they said, “But we can’t say anything about that because we don’t know enough about it.” Therefore, to counter the risk of unit non-responses, we added a “don’t know” option to the response scale. Many students made use of it for the question “Islam helps its believers to overcome difficulties”, for example. Given the number of missing values, before indexing, we used Little’s test to check whether the missing values were completely at random (MCAR): They are (χ2 = 523.836, DF = 524, p = 0.494). Despite the comparatively high number of missing values (see Table 2), we then calculated an index value for each case for which at least one item was non-missing.
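The index rule described above (a value for every case with at least one answered item) amounts to a mean over the available items. A minimal sketch, assuming that "don't know" answers are stored as missing (`None`) and using hypothetical item positions:

```python
def attitude_index(responses, positive_items, low=1, high=5):
    """Mean of all answered items after recoding positively worded ones.

    `responses` holds one value per item; None marks a missing or
    "don't know" answer (an assumption of this sketch). Returns None
    when no item was answered.
    """
    scored = []
    for j, value in enumerate(responses):
        if value is None:
            continue
        scored.append(low + high - value if j in positive_items else value)
    return sum(scored) / len(scored) if scored else None

# Toy respondent, invented: items 1 and 3 are positively worded
print(attitude_index([4, 2, None, 2], positive_items={1, 3}))
```

Because the mean is taken over however many items were answered, respondents with many missing answers contribute index values that rest on fewer items.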

In order to measure earlier contact or contact in the current close social environment, we asked “How many of your friends are Muslim?” and “How many of your classmates are Muslim?” Respondents could choose (almost) nobody, less than half, about half, more than half or (almost) all. For further analyses, the two contact items were recoded into dichotomous variables (contact vs. no contact), since the vast majority of respondents placed themselves in the category “(almost) nobody”. The chi-square test showed no significant difference between treatment and control group at t1 (χ2(DF = 1, N = 300) = 1.733, p = 0.236 for Muslims as friends, χ2(DF = 1, N = 287) = 1.307, p = 0.258 for Muslims in class).
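The recoding and the group comparison can be illustrated with a small sketch. It computes the uncorrected Pearson chi-square statistic for a 2×2 table; whether the reported tests used a continuity correction or an exact procedure is not stated, so this is an assumption, and the cell counts below are invented.

```python
from math import erfc, sqrt

def dichotomize(answer):
    """Map the five answer categories to no contact (0) vs. contact (1)."""
    return 0 if answer == "(almost) nobody" else 1

def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Requires all row and column totals to be non-zero.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = erfc(sqrt(chi2 / 2))  # upper tail of the chi-square distribution, 1 df
    return chi2, p

# Invented counts: rows = treatment / control, columns = no contact / contact
table = [[10, 20], [20, 10]]
chi2, p = chi_square_2x2(table)
print(round(chi2, 3), round(p, 4))
```

The closed form for the p-value works only for one degree of freedom, which is the case for any 2×2 table.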

The data also show no significant difference in attitudes towards Islam between the treatment (M = 2.97, SD = 0.780) and control group (M = 3.05, SD = 0.743) at t1 (F(DF = 1, N = 322) = 0.47, p = 0.368). Despite the non-random selection of the two groups, they do not show any differences on the important indicators, so they start with the same preconditions in terms of attitudes and earlier contact.

4.4 Do mosque visits change attitudes towards Islam?

Figure 1 illustrates the development of anti-Islam attitudes over time for all 324 persons divided into control and treatment group (left side) and into the respective mosque visit groups of the treatment group (right side). Our measure of anti-Islam attitudes shows almost no change across the interviews for the control group (t1C = 3.05, t2C = 3.09, t3C = 3.08, NC = 162). The treatment group averages range from 2.34 for group 4 at t2 up to 3.70 for group 5 at t1 and t3. The treatment group averages at t1 are also evenly distributed below and above the control group mean, with groups 1 and 5 having higher values than the control group mean at t1 and groups 2, 3, 4 and 6 having lower values than the control group mean at t1. This observation and the fact that the overall mean for the treatment group at t1 (t1T = 2.97) does not differ greatly from the control group mean at t1 (t1C = 3.05) speak in favor of the consistency of our anti-Islam attitudes measure.

Fig. 1 Mean values of the groups. Note: Group 1: N = 16, Group 2: N = 36, Group 3: N = 66, Group 4: N = 15, Group 5: N = 15, Group 6: N = 14. For detailed information, see Table 5 in the appendix

The average values of the treatment groups change between the interviews. Most treatment groups show a decrease between the first and the second point in time, that is, across the guided mosque visit. From t2 to t3 there seems to be a general trend of growing anti-Islam prejudice. However, in no group does this growth lead to higher anti-Islam prejudice than at the initial time point.

For our analysis we choose a fixed-effect approach to isolate a possible treatment effect using a difference-in-difference estimator, which controls for unobserved heterogeneity in a panel setting and is commonly used to evaluate natural experiments (Wooldridge 2016).

The first model (M1) explains the difference in anti-Islam attitudes between the point in time prior to the mosque visit (t1) and the first follow-up interview (t2). It provides information on the short-term effect of mosque visits on anti-Islam attitudes. As all time-invariant variables are eliminated in fixed-effect regression, the model only includes a dummy variable for each of the six mosque visits. Each person has either taken part in one of the six mosque visits or did not visit a mosque at all. Students who did not visit a mosque at all are the reference group, and their change is represented by the intercept in our model.
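The logic of such a first-difference model with group dummies can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' estimation code: the records are invented, group 0 stands for the control group, and the function name is hypothetical. Because each person belongs to exactly one group, ordinary least squares on the differenced outcome with a full set of group dummies reduces to comparing mean changes, which is what the sketch computes directly.

```python
from statistics import mean

# Hypothetical records: (group, attitude at t1, attitude at t2).
# Group 0 is the control group; other numbers stand for mosque visits.
# Values are invented for illustration, not taken from the study.
records = [
    (0, 3.0, 3.1), (0, 3.2, 3.2), (0, 2.9, 2.9),
    (1, 3.5, 3.0), (1, 3.7, 3.3),
    (3, 2.8, 2.4), (3, 3.0, 2.5), (3, 2.6, 2.3),
]


def first_difference_did(rows, reference=0):
    """First-difference estimates for a model with group dummies only.

    Differencing removes every person-specific, time-constant factor.
    The intercept is the mean change in the reference group, and each
    coefficient is a group's mean change minus that intercept.
    """
    changes = {}
    for group, y_t1, y_t2 in rows:
        changes.setdefault(group, []).append(y_t2 - y_t1)
    intercept = mean(changes[reference])
    coefficients = {
        group: mean(deltas) - intercept
        for group, deltas in sorted(changes.items())
        if group != reference
    }
    return intercept, coefficients


intercept, coefs = first_difference_did(records)
print(f"intercept (control change): {intercept:+.3f}")
for group, coef in coefs.items():
    print(f"group {group}: {coef:+.3f}")
```

A negative coefficient here means that attitudes in that group fell by more than they did in the control group over the same interval. Standard errors and confidence intervals, as shown in the coefficient plots, would require a regression routine and are omitted from this sketch.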

Of the six mosque visits, four have a significant impact on anti-Islam attitudes in our model (groups 1, 3, 4 and 6; p < 0.05). All six coefficients are negative, which corresponds to a decrease in anti-Islam prejudice from t1 to t2. Figure 2 contains the coefficient plots including the 95% confidence intervals for M1.

Fig. 2

First model (M1)—Regression analysis t1 − t2. Note: For detailed information, see Table 6 in the appendix

The second model (M2) serves to analyze the difference between the point in time prior to the mosque visit (t1) and the second follow-up interview (t3), thus checking for a possible long-term effect.

In M2 only one of the six mosque visit variables remains significant (group 3). M2, however, does not necessarily contradict M1, as can be seen in Fig. 3, which contains the coefficient plots including the 95% confidence intervals for M2. Most notably, all point estimates remain below zero (left side of the dashed line in Fig. 3). This pattern can be interpreted as a hint at possible, albeit weak, long-term effects.

Fig. 3

Second model (M2)—Regression analysis t1 − t3. Note: For detailed information, see Table 7 in the appendix

Even though we cannot explain exactly why we do not observe stronger effects in the long term, our methods allow us to be very certain that the short-term effects are induced by the mosque visits and not by other factors. This is nicely depicted by the homogeneous decrease from t1 to t2 shown in Fig. 1. By eliminating unobserved heterogeneity through first-differencing in our models, we can also be a lot more certain that the mosque visits are the primary reason for the change in anti-Islam attitudes. This is supported by an additional robustness check in the third model (M3). Instead of using a fixed-effects approach, we choose the anti-Islam attitudes at the second interview as our dependent variable. We model it using some important variables from the survey: the students’ anti-Islam attitudes at the first interview, whether students had Muslims in their class (earlier contact, Muslims in class), whether students had any Muslims as close friends (earlier contact, Muslims as friends), and the students’ gender and age. Figure 4 contains the coefficient plots including the 95% confidence intervals for M3. The same groups that show a significant difference in anti-Islam attitudes between t1 and t2 in M1 also have a significant influence on anti-Islam attitudes in M3 compared to the control group. Furthermore, anti-Islam attitudes at t1 are a strong positive predictor of anti-Islam attitudes at t2. Gender has a small but significant effect, with individuals who identify as female showing weaker anti-Islam prejudice. Whether students had Muslims in their class or as close friends, as well as their age, is not predictive of anti-Islam attitudes. Previous contact does not seem to have any influence on the effect of the mosque visit.

Fig. 4

Third model (M3)—Regression analysis with control variables. Note: The intercept (t-value = 1.94, p = 0.053) has been omitted in this Figure due to lack of space. Coefficients can be found in Table 8 in the appendix

An overall look at the data and the results from the models suggests that guided mosque visits can indeed alleviate anti-Islam prejudice, but the effects seem to fall off and have less impact in the long term. This can be due to a myriad of reasons, but the most likely candidates include the violated condition that contact between groups should last for an extended period of time. Since the visit is only a one-time experience, it is not very surprising that at best a very small reduction in anti-Islam prejudice is present at a later point in time.

Another likely reason is that the mosque visits differed considerably in structure and content. While some may have been more interactive, cooperative and personal, others may have stayed on a very factual level, presenting information without actual contact, let alone peer contact as described by the contact hypothesis.

The mean levels of anti-Islam attitudes in the different groups range from 2.34 to 3.70 on our scale from 1 (lowest) to 5 (highest). It is not very surprising that some variance between groups can be observed. One of several possible explanations is that the groups come from very heterogeneous regions of Germany: some regions have a rather large Muslim population, while others have a rather small one, which makes the mosque visit a more novel experience for students who previously had fewer Muslims in their everyday lives.

All in all, guided mosque visits certainly show potential to alleviate students’ anti-Islam prejudice, but the question of how to create sustained effects remains. Our results suggest that long-term effects are, in principle, possible. Further research is needed to find the right levers for a sustainable impact of mosque tours.

5 Part 2: Qualitative Study

Findings from the regression analyses show that contact with Muslims and Islam in the setting of a guided mosque visit has the potential to reduce anti-Islam prejudice. This discovery holds significant importance, and it warrants further investigation. Often, when quantitative research designs reach this point, they fail to offer opportunities for in-depth analysis. Fortunately, our questionnaire includes an open-ended question that can elicit more detailed information on the participants’ attitudes towards Muslims. To comprehensively understand the cognitive changes between t1 and t2, during which we observed a reduction in prejudice, we zoom in on this particular time frame and perform a thorough qualitative analysis of the participants’ associations with Muslims. We aim to identify relevant topics and connotations in the context of the category ‘Muslims’ and compare the students’ free associations before and after visiting the mosque. Changes in the meaning attached to the label ‘Muslims’ might give us some insights into how a mosque visit influences the content of a category.

To investigate the possible changes in the meanings of the category, we use the same data set as described in Chap. 4b. In this part, we focus on the treatment group, so that 162 cases were available for the analysis of the free associations. We focus on the changes between t1 and t2, as this is the most interesting point in time regarding the reduction of prejudice.

5.1 Exploring/Understanding the Category ‘Muslims’ Using the Free Association Method

Categories, such as ‘Muslims’, are heterogeneous and the underlying content of such labels varies across individuals, countries, and contexts. Meanings attached to labels or categories can depend on experiences from (in)direct encounters, population size/composition and visibility of specific groups, and positive or negative portrayal and media coverage (Asbrock et al. 2014; Wallrich et al. 2020). A mosque visit—conceived as an encounter with Islam and Muslims that generates new experiences and knowledge—potentially challenges existent associations and creates new ones. Research in this field indicates that the strength of prejudice towards groups depends on the underlying content, the meaning given to certain labels, i.e. the most salient group that is mainly associated by the participants and which influences the participants’ response (Asbrock et al. 2014; Spruyt et al. 2016; Wallrich et al. 2020). Associating primarily Muslims when asked for foreigners/strangers, for instance, correlates with more negative attitudes towards foreigners or strangers, respectively (Spruyt et al. 2016; Wallrich et al. 2020).

Based on these findings, i.e. the idea that a category’s content influences attitudes towards this category, we aim to gain insights into the content that is evoked by the category ‘Muslims’ before and after the guided mosque visit, since changing meanings attached to the group of Muslims might be responsible for changing attitudes towards Islam and Muslims. To study changes and continuities regarding the perception of Muslims, we analyze free associations from an open-ended question that was part of the questionnaire. It reads “Please write down everything that spontaneously comes to mind when you think of the group of Muslims. The following comes to mind about Muslims …”, followed by ten numbered blank lines for the participants to write down their associations. Each line represents one association. All associations per participant and time point form one set. This approach results in two sets per participant (t1 and t2), each containing up to ten associations.

We then coded all the sets we received from the open-ended question. We used both a deductive and an inductive approach, which means that some codes were derived from the literature while others were generated from the material itself. Park et al. (2007), for instance, find that many associations with Arab Muslims can be located within the thematic field of threat & conflict (e.g. ‘terrorism’, ‘violent’, ‘destructive’) and deep religiosity. They also identify physical features and outfits as a relevant topic, which we, too, see in our data. Violence and oppression (Gottschalk and Greenberg 2008) and ‘Islam as a threat’ (Halm 2013), respectively, are further examples of relevant associations that can be found in the literature on associations with and discourses around Islam and Muslims. It does not come as a surprise that in the context of a mosque visit, associations referring to the doctrine of Islam and religious practices are quite dominant among the participants’ associations. Therefore, we developed several subcodes such as Praying Five Times a Day, No Pork, Mosque, or Mecca/Pilgrimage. Each set could be given multiple codes (e.g. Threat & Conflict, Mosque, and Ramadan/Fasting). However, a code could only be assigned once per set, so that we can make statements about the percentage of students who mentioned at least one aspect from a certain code at a certain point in time.
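The counting rule described above, one code at most once per set, can be illustrated with a short sketch. It rests on several assumptions that should be read as such: the keyword mapping is a hypothetical stand-in for the codebook (in the study, codes were assigned by human coders, not by keyword matching), the example sets are invented, and the function names are ours.

```python
# Hypothetical keyword-to-code mapping for illustration only; the
# study's codes were assigned manually, not by keyword lookup.
CODEBOOK = {
    "mosque": "Mosque",
    "ramadan": "Ramadan/Fasting",
    "fasting": "Ramadan/Fasting",
    "terror": "Threat & Conflict",
    "war": "Threat & Conflict",
}


def codes_for_set(associations):
    """Return the codes present in one participant's set of associations.

    Collecting codes in a Python set enforces the rule that a code
    counts once per participant and time point, however many
    associations fall under it.
    """
    found = set()
    for text in associations:
        for word in text.lower().split():
            if word in CODEBOOK:
                found.add(CODEBOOK[word])
    return found


def share_mentioning(sets_of_associations):
    """Percentage of participants with at least one association per code."""
    n = len(sets_of_associations)
    counts = {}
    for associations in sets_of_associations:
        for code in codes_for_set(associations):
            counts[code] = counts.get(code, 0) + 1
    return {code: 100 * count / n for code, count in counts.items()}


# Invented example: one list per participant at a single time point.
t1_sets = [
    ["attend a mosque", "terror", "war"],
    ["Ramadan", "fasting"],
    ["attend a mosque"],
    ["headscarf"],
]
shares = share_mentioning(t1_sets)
print(shares)
```

In the first invented set, ‘terror’ and ‘war’ both fall under Threat & Conflict, yet the participant is counted only once for that code. The resulting percentages therefore describe participants, not individual associations.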

5.2 How do associations change after a visit to a mosque?

A first remarkable finding regarding the associations with the term ‘Muslims’ relates to the number of associations as such. In the control group, we observe no significant difference in the mean number of associations per student between t1 and t2 (t1: M = 4.02, SD = 2.685, t2: M = 4.14, SD = 3.028, t(161) = −0.85, p = 0.398). In the treatment group, by contrast, the mean number of associations increases significantly from an average of 4.65 (SD = 2.851) associations per participant at t1 to 5.72 (SD = 3.098) associations at t2 (t(161) = −5.50, p < 0.001). This finding can be interpreted against the background of increased outgroup heterogeneity, which has in the past been associated with more positive attitudes towards the outgroup (Wallrich et al. 2020). A significantly higher number of associations per participant at t2 compared to t1 is, therefore, an interesting finding when considering the decrease of anti-Islam prejudice we observed in the regression models.
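The comparison of association counts within each group is a paired-samples test, which can be sketched as follows. The counts are invented for illustration, the function name is hypothetical, and the sign convention (t1 minus t2) is an assumption chosen because it yields a negative t when counts rise, as in the values reported above. The p-value is omitted because it requires the t distribution, which the Python standard library does not provide.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Paired-samples t statistic for (before - after), with its df.

    With this sign convention the statistic is negative when values
    increase from the first to the second measurement.
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Invented numbers of associations per participant at t1 and t2.
t1_counts = [4, 5, 3, 6, 4, 5]
t2_counts = [5, 7, 4, 6, 6, 6]
t_value, df = paired_t(t1_counts, t2_counts)
print(f"t({df}) = {t_value:.2f}")
```

Because each participant serves as their own comparison, stable individual differences in how much people write do not enter the test.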

To go beyond sheer numbers, we further focus on the quality of this change within the treatment group, i.e., the concrete content lying behind the category ‘Muslims’ before and after the mosque visit and take a closer look at the different codes. We concentrate on those codes only that have been mentioned in at least fifteen sets. In addition, we also focus on codes that have been mentioned at t2 only, meaning associations the participants did mention after but not prior to the mosque visit and therefore representing new content of the category ‘Muslim’ that might have been generated by visiting the mosque. Table 3 shows the 22 different codes we were able to extract from and find in our data, including exemplary associations to illustrate the content of the codes.

Table 3 Codes and their Descriptions/Examples

The codes displayed in Table 3 can be divided into two groups. Firstly, we see codes directly related to religion, religiosity, and religious practices (see Table 3 “Religion and Faith Practice”). A considerable number of sets can be assigned to one of these codes. Secondly, we find codes in which religion tends to be absent or at least plays a subordinate role (see Table 3 “Other Dimensions”). In the context of these codes, other dimensions, such as political ones, come to the fore. Instead of religious practices, traditions, rules, or objects, these codes contain associations regarding, for instance, terror and violence, gender, language, or the visual appearance of Muslims or those perceived as Muslims.

Table 4 compares the percentage of participants who mention at least one aspect of a given code at t1 and t2, respectively. First, it can be noted that some codes increase and others decrease from t1 to t2 in terms of the number of students who mentioned at least one aspect from the respective code. The code Appearance & Clothing, for instance, was mentioned by 15% of the respondents at t1. It decreases at t2, meaning that after having visited the mosque, only 4% of the participants mention aspects related to Muslims’ appearance or specific clothing traditions. This finding does not come as a surprise, since a person’s appearance is easily accessible and does not require deeper knowledge about this (out)group. Therefore, associations in the realm of Appearance & Clothing can be interpreted as an expression of lacking (alternative) information on and contact with Muslims. We find similar patterns for nearly all codes that do not primarily refer to religion. International Reference, Gender, and Threat & Conflict are all codes that are more prevalent prior to than after the mosque visit. The decrease of the code International Reference by half points to a shifted perception: Compared to t1, Islam is at t2 less often perceived as an international and therefore maybe distant phenomenon. Though it is not explicit in the wording, this code may also indicate a cognitive link between Islam and grievance and conflict, as the countries mentioned here are often associated with war and human rights violations in media reports. A decrease could then be taken to mean that a menacing connotation has been stripped from Islam. Gender and Threat & Conflict are especially interesting from a valence analytical perspective since associations within these codes are almost entirely negative. Negative associations, such as oppression of women, terror, war, or so-called Islamic State, decrease after the mosque visit. Such ‘secondary meanings’, which are not explicitly related to religion and religiosity but refer to other, mainly political, dimensions and which other religions, in comparison, rarely face, thus become less frequent. If we locate the phenomenon on a cognitive map, it seems as if it has slightly shifted from Islamism towards the mere religion of Islam.

Table 4 Percentage of Participants mentioning at least one aspect from the respective Code

The second group of codes refers to Religion and Faith Practices, which reflects the topics at the center of a mosque visit. The most dominant codes at t1 are Headscarf/Veiling and Mosque, but also aspects from codes such as Ramadan/Fasting or Quran have been mentioned by around one-third of the participants. Around one in five students names aspects that can be assigned to the codes Praying, Strictly Religious/Strong Belief, No Pork, Mecca/Pilgrimage, and Islam/Religion. Some of these codes, such as Islam/Religion or Headscarf/Veiling experience a moderate decrease at t2, meaning that after having visited a mosque, fewer participants mention aspects from these codes. The same tendency, but an even sharper decline can be observed for the code Strictly Religious/Strong Belief, which has been reduced by almost half. In a way, this code occupies a special position in the group of religion-related codes, since most of the associations contain evaluations in the form of adjectives such as “strict” or “strong”, whereas all other religion-related codes refer to (more or less) concrete and neutral objects. So if there is any code at all with a rather negative connotation in this second group, it is this one and it shows a strong decline for the time after the mosque visit.

We also find codes that show an increase from t1 to t2. Some of these codes appear in the context of the five pillars of Islam (Five Pillars of Islam, Praying Five Times a Day, and Mecca/Pilgrimage). At t2, after visiting the mosque, a higher proportion of participants mention at least one aspect from the respective code than at t1. This finding can be interpreted as an indicator that imparting general knowledge about Islam is an integral part of guided mosque visits and that this specific, albeit very theoretical, knowledge is retained by the students.

Analyzing these subcategories, it becomes clear that the participants seem to gain knowledge through the mosque visit. This becomes visible not only in the increase of the aforementioned codes, but also in the emergence of completely new codes at t2. The most striking difference between t1 and t2 is the amount of specific knowledge and the richness of detail. Some subcategories that refer to very specific religious practices (Prayer Times/Calendar/Sun, Moon, Prayer Niche, Prayer Rug, Ritual Washing/Clean Place) are almost nonexistent at t1. At t2, we suddenly find associations that are linked to these subcategories. They represent very detailed knowledge that, on the one hand, goes far beyond what we would assume to be general knowledge about Islam and, on the other hand, is less theoretical and may be linked to concrete experiences during the mosque visit. For instance, the students may have had to take off their shoes before entering the mosque, seen a prayer niche, or touched a prayer rug. This form of knowledge at least has the potential to be connected with concrete experiences the participants had during the mosque visit.

How much some associations change can be illustrated by these two sets of the same participant at t1 compared to t2:

t1:

“faith in Allah—some are with the IS—attend a mosque”

t2:

“pray more often during the day—are very social—attend a mosque—pray in the direction of Mecca”

Prior to the intervention, this student associated Muslims with IS—although we already see a distinction instead of a generalization here (“some are …”). After the mosque visit, the associations are not only more detailed concerning religious practices, but also much more positive.

At t2 we observe new and more detailed knowledge of Islam and Muslims and at the same time reduced prejudice towards Islam. Of course, we cannot make claims about correlation or causality based on these data. We can only try to understand how the participants’ associations with Muslims change over the course of the mosque visit, and treat these findings as an indicator when discussing possible explanations for the decrease in anti-Islam prejudice. The observation is nevertheless in line with the contact hypothesis, which postulates that “Contacts that bring knowledge and acquaintance are likely to engender sounder beliefs concerning minority groups, and for this reason contribute to the reduction of prejudice” (Allport 1954, p. 268). Pettigrew and Tropp (2008) demonstrate a significant mediation effect of knowledge in a meta-analysis.

To sum up, we not only find a significant increase in the number of associations per participant from t1 to t2, but we also observe qualitative changes regarding the meanings attached to the category ‘Muslim’ before and after visiting the mosque. Some codes are less dominant at t2 compared to t1. These are especially codes that are not related to religion in the first place but refer to other (rather political) dimensions. From a valence analytical perspective, we also observe a decrease for the codes which might be described as rather negative, such as Threat & Conflict, Gender, or Strictly Religious/Strong Belief. Negative associations seem to abate after the mosque visit. In contrast to these codes, other codes experience an increase and are more prevalent at t2 compared to t1. These codes mainly refer to knowledge about Islam, although they can be categorized into two different types of knowledge. On the one hand, we find an increase in codes related to theoretical general knowledge about Islam. On the other hand, our findings indicate that very specific and potentially experience-based knowledge emerges after having visited the mosque. In addition to Part 1, in which we could show a decrease in anti-Islam prejudice, Part 2 revealed substantial changes in the meanings attached to the category ‘Muslim’, that might have an effect on the evaluation of Islam. All in all, we find a shift in the meanings attached to the category ‘Muslim’ from rather superficial associations at t1 to more detailed knowledge and less negative evaluations at t2. Furthermore, we observe a prevalence of religion-related associations at t2 and a decrease in connotations from other dimensions, such as political ones. These findings reveal the potential of mosque visits to generate new and broader religion-related knowledge and to rid Islam and its believers of such non-religious connotations as war, terror, or oppression.

6 Discussion

We examined the contact between Muslims and non-Muslims during guided tours of mosques in Germany with two goals in mind. The first goal was to find out whether prejudice is actually reduced under real world conditions of mosque tours. Their character does not necessarily favor an effect: The contact is short, takes place only once, and does not comprise targeted perspective taking. Neither can we speak of a peer contact throughout, nor is a common goal or cooperative behavior recognizable. Whether the contact is supported by authorities remains questionable. However, guided tours are offered by mosques all over Germany. They thus provide a platform for encounters that would not take place without them. Our data now show that the efforts are worthwhile. In a pre- and post-test, students show a significant short-term decrease in anti-Islam attitudes. Mosque tours, as they are practiced in Germany today, have the potential to contribute to more positive attitudes toward Islam. If one takes into account the unfavorable conditions of contact during such tours from the perspective of contact research, this is a remarkable result.

Even if the effect partially disappears in a second post-test after a few months, the results confirm that this platform for encounters is in principle effective. However, in view of the unsystematic conception of the tours so far, one may also assume a lot of unused potential. The research results on the contact hypothesis suggest that the effects could be strengthened if certain conditions were changed. This would not necessarily require major alterations in the way the tours have been conducted to date. For example, it is conceivable that authorities, such as the city authorities or other local religious communities, would provide visible support. Since mosque tours are carried out by a grassroots movement that is primarily the initiative of mosque congregations and individual teachers, visible external support has been rare. Next, elements of perspective taking could also be introduced. From what we know so far, mosque tours focus on lectures about the building, rituals and the orthopraxy of the religion. An additional portrayal of the guide’s own experiences and indirect peer contact through the inclusion of narratives by young people of Muslim faith (e.g., in the form of text, audio, or video) could increase the impact.

These are only theoretical considerations, because the impact of variations in the program of mosque tours is yet to be investigated. In order to avoid erroneous conclusions when naively transferring the results of other studies, further research would be necessary that experimentally tests the magnitude of effects of different elements of mosque tours within a targeted program design. The importance of doing so is demonstrated by the variation in the effects of the six tours studied here. It appears that certain elements of tours are likely to either inhibit or promote positive effects. We might ask, for example, whether the gender of the guides makes a difference, or whether tours that involve exploration of the entire building have a stronger effect than tours that consist only of mere lectures in the main hall of the mosque. Such differences need to be systematically investigated. Particularly in the case of school classes, additional questions must be asked about how the tours are embedded in the lessons, what form of preparation and follow-up reinforces the effects, and how the effects are sustained by long-term strategies. Is it possible, for example, to use teaching materials to establish indirect contact in the following school term and thereby consolidate the effects? Finally, due to the sample, the results can only be generalized to a limited extent. The six tours observed certainly do not fully represent the range of possible visitor experiences. The reports evaluated by Haubach and Salentin (2015) document ambivalent visitor reactions to confrontation with gender segregation, rigid dress codes, and other aspects of orthopraxy. Negative contact consequences that would strain attitudes toward Islam (Schäfer et al. 2021) therefore cannot be ruled out. In particular, further studies are needed on effects in adulthood, when visitors arrive with more consolidated and sometimes markedly more biased attitudes.

The second aim of the article was to explore how the image of Muslims changes as a result of a mosque tour. The analysis of mosque visitors’ free associations concerning their images of Muslims shows that qualitative shifts take place. First, the context becomes visible. The place where the encounter took place plays a significant role. After the visit, objects are remembered that were hardly or not at all mentioned beforehand and with which the respondents came into contact during the tour. The acquired knowledge, e.g., about religious rituals and commandments, also becomes visible in the subsequent associations. The way in which knowledge is acquired differs from school lessons. In the rooms of a mosque, content becomes vivid and tangible. An increase in knowledge about religion can be observed in the changes within categories with religious reference, which contain more details and are less superficial than before. We can conclude that images after a contact have a lot to do with the concrete situation. Second, however, we also see a change in higher-level topics. The decrease of superficial associations is also shown in the Appearance & Clothing category. Associations with external features such as hair color or clothing style play a lesser role. In addition, we find that Islam was less often understood as an international movement after the visit and came to be perceived as a somewhat more domestic phenomenon. At the same time, negatively charged categories such as Threat & Conflict decrease. The image of Muslims thus indirectly becomes more positive. In comparison, the associations before a visit to a mosque are more superficial, more negative and fewer in number. After the visit, more associations are reported overall, and these are more varied, go into more detail, have fewer international references, and deal less often with threats and conflicts. Mosque tours thus have the potential to enrich knowledge and generate fresh cognitive associations.

From the perspective of contact research, mosque tours are located in the gray area between everyday contacts and professional interventions. Such contact platforms are difficult to classify within the scope of previous research and receive little attention from it. At the same time, they have an enormous relevance for society, as they create opportunities for encounter and combat prejudice, devaluation and discrimination of outgroups. A scientific approach to forms of contact such as mosque tours is important in order to make results from research usable for practice. In the present case, we take a first step by testing the contact hypothesis under the realistic conditions of mosque tours. The experimental design of the study in the field makes it possible to clearly attribute the effects to the tours.

Another innovative component of the study is the application of an explorative method that allows a better understanding of the effect of mosque tours on visitors. The comparison of cognitive associations with the outgroup before and after a contact transcends the standardized measurements of established research, allows a more multi-layered understanding of the contact situation and promises new categories of analysis.

In this study, the comparison of associations has proven to be a fruitful instrument for tracing the process of change in mosque visitors. We see how the setting of the contact is reflected in the images after the encounter, but that at the same time superordinate meanings change. Such categories of analysis, generated from the field, allow insights into the content of prejudice toward a group, how it is framed and perceived by respondents prior to contact, and the changes that each specific contact situation engenders. For further research on contact situations, the central questions are whether overarching categories can be found alongside situation-specific categories in different forms of contact, what influences them, and how they interact with conventional measures of prejudice. Such findings, in turn, are particularly useful for practice when it comes to the way in which contact is designed and thus to the question of the ways in which contact can have the most positive effect possible.