The link between ethnic diversity and scientific impact: the mediating effect of novelty and audience diversity

Understanding the nature and value of scientific collaboration is essential for sound management and proactive research policies. One component of collaboration is the composition and diversity of contributing authors. This study explores how ethnic diversity in scientific collaboration affects scientific impact by presenting a conceptual model that connects ethnic diversity, based on author names, with scientific impact, with novelty and audience diversity as mediators. The model also controls for affiliated country diversity and affiliated country size. Using path modeling, we apply the model to the Web of Science subject categories Nanoscience & Nanotechnology, Ecology, and Information Science & Library Science. For all three subject categories, and regardless of whether control variables are included, we find a weak positive relationship between ethnic diversity and scientific impact. The relationship is weaker, however, when control variables are included. For all three fields, the effect mediated through audience diversity is substantially stronger than the effect mediated through novelty, and the former is much stronger than the direct effect of ethnic diversity on scientific impact. Our findings further suggest that ethnic diversity is more strongly associated with short-term than with long-term scientific impact.


Introduction
Scientific collaboration is now common and one of the main modes of doing research. Understanding the nature and value of scientific collaboration is not only an attractive research topic for exploring the patterns of scientific activities, but also important for managing science rationally and for designing collaboration policies. As co-authorship has increased in scholarly communication, the number of authors per paper has increased in most fields (Wuchty et al., 2007), especially in international collaborations (e.g. Gazni et al., 2012; Glänzel & De Lange, 2002). As collaboration networks become larger and more complex, it is difficult to obtain a deeper understanding of the nature of scientific collaboration from linear perspectives alone, such as the degree or scale of collaboration. A structural perspective is therefore more appropriate for analyzing scientific collaboration. Ecological diversity theory, which combines the three dimensions variety, balance and similarity/disparity (Leinster & Cobbold, 2012; Stirling, 2007), can provide such a perspective and help to understand the structure, nature and value of scientific collaboration. This approach is in line with similar studies that applied an ecological diversity perspective to co-authorship diversity (e.g. AlShebli et al., 2018; Freeman & Huang, 2015).
Our study focuses on co-authorship diversity with respect to ethnicity, i.e. ethnic diversity in scholarly collaboration. Most research problems and their solutions are borderless, and globalization is an inevitable phenomenon in research (Abbasi & Jaafari, 2013; Barjak & Robinson, 2008). In global research activity, ethnically diverse collaboration can happen within and across nations, and there are several possible reasons for it. The main reason may be collaboration between co-authors from different countries. However, ethnically diverse authorship can also arise when all authors are affiliated with one country, as a result of the increasing mobility of researchers between countries and continents (Chinchilla-Rodríguez et al., 2018; Robinson-Garcia et al., 2019; Sugimoto et al., 2017). Similarly, countries with high levels of immigration, like the USA, and research-intensive universities with high levels of international staff could also lead to high degrees of ethnically diverse co-authorship.
Ethnic diversity can potentially lead to multiple types of advantages for the research performed, for instance novelty of ideas (Freeman & Huang, 2014) and citation impact through publications being better known (AlShebli et al., 2018). Nathan and Lee (2013) studied links among cultural diversity (which is similar to ethnic diversity), innovation, entrepreneurship, and sales strategies in London businesses. They found that companies with culturally diverse management are more likely to introduce new product innovations than those with homogeneous top teams. Ethnic diversity may also play a role in shaping scientists' social identities, unconscious biases, and knowledge that likely applies to social situations (AlShebli et al., 2018). Ethnic diversity is further associated with outcomes in terms of health (Alvarez & Levy, 2012; Dasmunshi et al., 2010) and economic development (Montalvo & Reynalquerol, 2005). Ottaviano and Peri (2006) found that the growing cultural diversity of American cities has positive economic consequences for U.S. natives.
In the scientometric field, many earlier (ecological) diversity studies concern interdisciplinarity, i.e. field diversity based on references, and mainly focus on methods to measure interdisciplinarity and on its relationship with scientific performance (e.g. Leydesdorff & Rafols, 2011; Leydesdorff et al., 2019; Rafols & Meyer, 2010; Wagner et al., 2011; Wang & Schneider, 2017; Zhang et al., 2016). Most earlier studies on co-authorship diversity with respect to different attributes (e.g. Dong et al., 2018; Freeman & Huang, 2015; Wagner et al., 2019) use diversity measurement methods similar to those in the interdisciplinarity studies and, like these, focus on the relationship between co-authorship diversity and scientific performance.
Few previous quantitative studies have examined ethnic diversity in co-authorship, and most of these deal with the relationship between ethnic diversity and scientific impact. Freeman and Huang (2015) found that higher diversity in author ethnicity on average leads to higher citation impact. These authors used the Herfindahl-Hirschman Index of concentration as a measure of homophily (the opposite of diversity), based on the ethnicity of first and last authors of 2.57 million publications with co-authors from the United States from Web of Science and PubMed. AlShebli et al. (2018) analyzed papers from the Microsoft Academic Graph dataset to study the relationship between research impact and five classes of diversity, including ethnic diversity. Taking Gini Impurity as the measure of diversity and the number of citations received within 5 years of publication as the measure of scientific impact, they found, using regression analysis and randomized baseline models, that ethnic diversity is positively correlated with scientific impact. Lerback et al. (2020), using data for submissions and publications across all American Geophysical Union (AGU) journals, found that diversity in USA-based author teams regarding race/ethnicity (as identified by self-provided demographic information within the AGU's membership database) is associated with lower acceptance rates and citation rates. However, more studies on ethnic diversity, using data from different sources and different disciplines, are needed, especially for understanding the causes behind the relationship between ethnic diversity and scientific impact.
In this study, we consider three research questions:
1. Does higher ethnic diversity in co-authorship increase scientific impact, taking the two control variables affiliated country diversity and affiliated country size into account? Even though earlier research has considered the relationship between ethnic diversity and scientific impact (AlShebli et al., 2018; Freeman & Huang, 2015), these two variables have not been taken into account. Moreover, that earlier research used an ethnic diversity measure that only takes the balance and variety components of the diversity concept into account. In our study, the disparity component is also taken into account (Leinster & Cobbold, 2012; Stirling, 2007).
2. Does ethnic diversity affect scientific impact mediated through novelty or audience diversity? Earlier studies proposed some causes behind the relationship between ethnic diversity and scientific impact (Freeman & Huang, 2014). However, these studies did not test the mechanisms empirically.
3. Is ethnic diversity more strongly associated with short-term than with long-term scientific impact? Here, the aim is to detect whether the relationship between ethnic diversity in authorship and scientific impact is influenced by time accumulation, an issue that, to the best of our knowledge, has not been dealt with in earlier ethnic diversity studies.
To answer the three research questions, we put forward a model that connects ethnic diversity with scientific impact, with international collaboration factors as control variables and novelty and audience diversity as mediators, and use it to study both short-term and long-term effects.
This paper, which is expanding and improving on Ding et al. (2019), is structured as follows. In the second section, we put forward hypotheses, and a conceptual framework of the study is given. In the third section, the data, indicators and methods of the study are described. The fourth section gives the results, as well as interpretations of them. In the fifth section, the results are discussed, and in the final section, we deal with limitations of the study and with future research.

Hypotheses
We hypothesize that the effect of ethnic diversity on scientific impact is mediated by novelty and audience diversity. Novelty and audience effects are assumed to be effects of different kinds of diversity in the discussion parts of some earlier publications (e.g. Freeman & Huang, 2014). However, novelty and audience can be assumed to work through different mechanisms. Novelty in scientific publications can be related to the level of new scientific findings and/or their quality (Klavans & Boyack, 2013), which can drive reuse, breakthroughs and, hence, citation impact. The audience effect, on the other hand, concerns how knowledge about new publications is spread and in which scientific networks, which can be expected to relate to scientific impact through a "marketing" effect. We take audience diversity as a variable to represent the audience effect in order to make it measurable.
As we aim to detect whether the relationship between ethnic diversity and scientific impact is influenced by time accumulation, our model includes both a short-term and a long-term variant of the two variables audience diversity and scientific impact. The two variants are based on different time windows. The other variables are based on co-authorship, which is fixed when a publication is published. As indicated above, we further use the two control variables affiliated country diversity and affiliated country size. The reasons for using them are given in the section "Data and methods".
Our conceptual model (Fig. 1) includes five hypotheses, H1-H5, which we now describe.
H1 Ethnic diversity has positive effects on scientific impact.
Earlier research has found a positive relationship between ethnic diversity and scientific impact. AlShebli et al. (2018) found that group and individual ethnic diversity can have positive effects on scientific impact, whereas Freeman and Huang (2015) found that ethnic diversity can increase the scientific impact of collaborations.
H2 Ethnic diversity has positive effects on novelty.
A reasonable assumption is that more diverse groups will, on average, contain more diverse perspectives, and this may in turn lead to a higher probability of forming novel ideas. More diverse teams have greater opportunity to leverage the expertise of members and to bring a wider range of information to the knowledge creation process (Bercovitz & Feldman, 2011; Cohen & Levinthal, 2000; Wagner et al., 2019). Peterson (2001) showed that multi-national collaboration, where the collaborators have different cultural (and educational) backgrounds, tends to stimulate new ideas and new approaches to theoretical or practical problems. Freeman and Huang (2014) suggested that teams with members from diverse ethnic backgrounds may benefit from a greater variety of perspectives. Wagner et al. (2019) suggest that co-authored articles are more novel and that diverse groups have a greater chance of producing creative work.
Moreover, ethnically diverse co-authorship can break the lock-in and inertia effect in homogeneous co-authorship networks, an effect that can be an obstacle to innovation in collaboration. Boschma (2005) suggested that different forms of proximity between people or organizations, such as cognitive or social proximity, can bring lock-in and inertia effects, which can prevent them from recognizing new possibilities and thereby hinder innovation. Thus, ethnically diverse co-authorship may reduce obstacles to innovation to a certain extent, and thereby have a positive effect on novelty. We further mention that Brixy et al. (2020) found, in a study on newly founded firms in Germany, that unusual combinations of ethnic backgrounds had a positive association with the probability of a start-up introducing an innovation.
H3 Ethnic diversity has positive effects on audience diversity.
Freeman and Huang (2014) assumed network effects, which is another expression for the audience effect, and offered a different sort of explanation of why ethnic diversity is associated with higher scientific impact. They suggested that a publication generated by a more diverse research group could tap into different networks of research groups and thus attract a more diverse set of citing authors as potential audience. Wagner et al. (2019), in their research on co-authorship diversity of affiliated countries, speculated that citations to international work may be explained by an audience effect, where more authors from more countries result in access to a larger citing community.
H4 Novelty has positive effects on scientific impact.
Previous research found that highly novel publications deliver high gains to science: they are more likely to be highly cited papers (top 1%) in the long run. Klavans and Boyack (2013), using Scopus and WoS data, observed that the citation impact of a publication is larger if it is more innovative, even though there are occasional exceptions. Moreover, they observed that the impact of novel research may not be obvious in the first few years after publication, and that it takes time to accumulate the advantage in academic impact. On the other hand, Freeman and Huang (2014) pointed out that novelty could be an explanation of their finding that greater ethnic homogeneity among authors is associated with publishing in lower-impact journals.
H5 Audience diversity has positive effects on scientific impact.
If the ethnic diversity of an audience is large, this is likely to open more channels for disseminating the research, which may lead to higher scientific impact. Kerr (2008) showed that ethnic technology transfers are particularly strong in high-tech industries and among Chinese economies. The strong Chinese outcomes of technology transfer may be due to unique qualities of this ethnicity's network, like size and network effects, which can yield an audience effect with respect to the outcomes. Alternatively, the strong Chinese outcomes may be due to manufacturing focus. Wagner et al. (2019), in studying the effect of the co-authorship diversity of affiliated countries on citation impact, proposed that the audience effect of international collaboration could be an important reason for the increase of academic impact.

Data collection
The data source of the study was Bibmet, a database of the Royal Institute of Technology (KTH) in Sweden and based on Web of Science (WoS). Bibmet covers publications from SCIE, SSCI and A&HCI since 1980, and is updated quarterly. To effectively study the effects of ethnic diversity, we selected two research fields with a large number of coauthors on average, in order to get a large spectrum of ethnic diversity, from low diversity to very high diversity in some papers. With this in mind, we selected the two following WoS subject categories: Nanoscience & Nanotechnology and Ecology. Information Science & Library Science was the third WoS subject category selected, since it shows a lower rate of co-authorship and most of the authors of this paper are experts in this field, which facilitates the interpretation of the results.
The WoS database links author names to addresses since the publication year 2008. For publications published before 2008, there are only a few with such links. Since the variable affiliated country diversity in our design needs to be calculated based on the corresponding links between authors and their addresses, we use publication year 2008 as our start year. Furthermore, in order to calculate long-term citation impact according to our design, the publication end year should not be too recent. Considering these two aspects, we use the publication period 2008-2012. We further use two citation windows with different lengths: a 3-year citing window and 6-year citing window.
We collected publications of the type Article from the three subject categories; the number of publications is shown in the row 'All publications' in Table 1. Co-author publications, i.e. publications with two or more authors, are the focus of this study, and the number and proportion of co-author publications are also shown in Table 1. The table shows that the collaboration rate for Nanoscience & Nanotechnology is higher than the rate for Ecology, which in turn is higher than the rate for Information Science & Library Science.
In our preliminary dataset of co-authored publications, it is not feasible to retrieve values for all variables in our design for all publications. The exceptions are publications without references pointing to journals covered by WoS, since we only use such references for calculating novelty, and publications without links between authors and their addresses, as such links are needed for calculating affiliated country diversity. After excluding the publications without references of the indicated kind or without author-address links from the set of all co-author publications, the final datasets used in our analysis of the three subject categories were obtained.

Ethnicity identification
Identifying and classifying the ethnicity of each author is the first step in measuring and studying ethnic diversity. Classification of ethnicity is a topic pertinent to demographic applications and public health. In this study we use the ontology Cultural, Ethnic and Linguistic (CEL) Taxonomy/classification (Ambekar et al., 2009; Mateos et al., 2007; Mateos, 2007). CEL is a new ontology of ethnicity, which is multidimensional in nature, assimilating aspects of language, religion, geographical region and culture through the shared characteristics of names. There are two commonly used CEL hierarchical classifications, which can be used to classify a population into different groups based on names. One of the classifications consists of 13 groups as leaf nodes (Mateos, 2007; Ambekar et al., 2009), while the other is a more fine-grained classification consisting of 39 groups as leaf nodes (Ye et al., 2017). HMM (Hidden Markov Model), NamePrism, Ethnea and EthnicSeer are existing name-based nationality classifiers, which use name substrings as features to identify ethnicity/CEL groups. Among these four tools, NamePrism uses 57 million contact lists (telephone directory) from a major Internet company as training data, while the other three classifiers are trained on small, non-representative sets of labeled names, typically extracted from Wikipedia. NamePrism is more accurate than the other three tools, and it has a higher F1 score (a measure that considers both precision and recall) for some common nationality/ethnic classes (Ye et al., 2017). Moreover, NamePrism provides a fine-grained classification of 39 CEL groups and a free API for researchers (Ye et al., 2017). Considering these advantages over the other three tools, we use NamePrism to automatically identify the ethnicity of authors. For each name, NamePrism gives several candidate ethnicities ranked by probability.
In most cases, the probability of the first candidate is higher than 60%, which is considerably higher than the probabilities of the remaining ones. In our analysis, we use the first candidate ethnicity as the final ethnicity for an author name. The WoS database has full names for authors since the publication year 2006. The author names in our dataset are thereby to a large extent full names, as the publication period of the study is 2008-2012. For the 171,167 target publications, there are 402,690 distinct author names, 79.1% of which are full names. There are 1,375,287 citing publications for the target publications in the 6-year citing window, and there are 1,895,194 distinct author names in the citing publications, 82.2% of which are full names.
The ethnicities of all target publication authors and citing authors have been obtained by use of the free API provided by NamePrism. The distribution of ethnicities in the 10 most frequently occurring affiliated countries (the top 10 affiliated countries based on P_AF i in Eq. (10)), with regard to the target publications and the three fields, is shown in "Appendix 1" (Fig. 5). Normalized collaboration frequencies for all pairs of the 39 ethnicities, with regard to the target publications and the three fields, are also shown in "Appendix 1" (Figs. 6,7,8).
Clearly, algorithmically identifying the ethnicity of authors based on their names has limitations. One important concern is that ethnic diversity can potentially be substantially overestimated, especially for publications with authors from countries with high levels of immigration. This in turn would mean that we underestimate, perhaps to a large extent, an effect (positive or negative) of real ethnic diversity on citation impact. In a country with a high level of immigration, like the USA, a group of authors can be highly diverse with regard to name ethnicity. However, it might be the case that all the authors in the group grew up in the same area, attended the same schools (from high school to university) and obtained the same university degree. In cases like this, real ethnic diversity can reasonably be said to be absent. It follows that there are no scientific gains in different problem solving strategies, ways of thinking, and so on, that can be attributed to diverse ethnic backgrounds of the authors in the group. We will return to the discrepancy between name ethnic diversity and real ethnic diversity in the section "Limitations and future research".

Measurement of variables
The measures used for the variables in the conceptual model of this research (Fig. 1) are briefly reported in Table 3. All the measures are calculated for each publication in the three datasets.

Ethnic diversity
Ethnic diversity is the independent variable of the analysis. We used the true diversity measure (Zhang et al., 2016) for ethnic diversity, based on the distribution of authors over ethnic categories for a target publication. For a target publication, t_n is the number of ethnic categories represented in the publication, P_t_i is the proportion of authors in ethnic category i, and P_t_j is the proportion of authors in ethnic category j. S_ij is the ethnicity similarity between ethnic categories i and j, which is calculated from the NamePrism ethnicity identification results for our data. As mentioned above, NamePrism gives several candidate ethnicities, with corresponding positive probabilities, for each input author name. We used the probabilities of each name in each ethnic category to create a matrix with ethnic categories as rows and author names as columns. We only used the probabilities of the first three candidate ethnicities for each name to construct the matrix, since the total probability of the first three candidates is over 80% for most names. We used the cosine measure to calculate the ethnicity similarity between the 39 ethnic categories based on the rows of the matrix. The result of the ethnicity similarity calculations is shown in "Appendix 1" (Table 6). The ethnic diversity of a target publication, E_D, is defined as:
$$E\_D = \frac{1}{\sum_{i,j=1}^{t\_n} S_{ij}\, P_{t\_i}\, P_{t\_j}}$$

Table 3 (fragment)
FNCS_3y: field normalized citation score with 3-year citation window
FNCS_6y: field normalized citation score with 6-year citation window
Control variables
Affiliated country diversity, AF_D: true diversity measure of affiliated countries for a target publication
Affiliated country size, AF_S: average number of publication fractions of the affiliated countries for a target publication
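As a rough illustration of the calculation described above, the following Python sketch builds a similarity matrix from the rows of a category-by-name probability matrix and then computes a similarity-weighted true diversity. It assumes that the measure has the form 1 / sum_ij S_ij * P_i * P_j; the function names and the toy probabilities are hypothetical and are not taken from the study or from NamePrism.

```python
import numpy as np

def cosine_similarity_rows(matrix):
    """Return the cosine similarity between every pair of rows."""
    m = np.asarray(matrix, dtype=float)
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows as zeros instead of dividing by 0
    unit_rows = m / norms
    return unit_rows @ unit_rows.T

def true_diversity(proportions, similarity):
    """Similarity-weighted true diversity (assumed form): 1 / sum_ij S_ij * p_i * p_j."""
    p = np.asarray(proportions, dtype=float)
    return 1.0 / float(p @ np.asarray(similarity, dtype=float) @ p)

# Hypothetical probabilities: 3 ethnic categories (rows) x 4 author names (columns).
name_probabilities = np.array([
    [0.9, 0.8, 0.0, 0.1],
    [0.1, 0.2, 0.1, 0.0],
    [0.0, 0.0, 0.9, 0.9],
])
S = cosine_similarity_rows(name_probabilities)

# A publication with four authors: two in category 0 and two in category 2.
P_t = np.array([0.5, 0.0, 0.5])
E_D = true_diversity(P_t, S)
```

Under this form, a publication whose authors all fall in one category gets the value 1, and a publication spread evenly over k completely dissimilar categories gets the value k, so the measure can be read as an effective number of ethnic categories.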

Novelty
Novelty is one of the two mediators. We follow the combinatorial novelty perspective, which assesses the extent to which a publication makes novel combinations of prior knowledge components. We used an existing novelty measure that is commonly used or discussed in scientometric work (e.g. Leydesdorff et al., 2018; Tahamtan & Bornmann, 2018). The basic idea of this measure is to count the new, important and useful combinations of referenced journal pairs, weighted by the difficulty of forming such combinations. For each target publication p, the measure is obtained as follows:
1. We retrieved all referenced journal pairs of p. To reduce the probability of picking up trivial pairs, however, we required that both journals in a pair belong to the 50% most frequently cited journals. This was based on the total number of citations received by publications published from 1980 onwards, where the citations come from publications published in the past three years. A_ij = 1 for a referenced journal pair i-j if this condition is true, otherwise A_ij = 0. A_ij refers to an element in the symmetric matrix A, in which the rows and the columns represent the journals referenced in p. Similarly, B_ij, D_ij and C_ij in steps 2-4 below refer to elements in the symmetric matrices B, D and C, respectively, whose rows and columns also represent the journals referenced in p. The elements of the main diagonal of the four matrices, which are not used for the calculation of novelty values, are set to 1.
2. We examined each pair i-j to see whether it is new, i.e. has never appeared before (with the publication year 1980 as start year). If so, B_ij = 0, otherwise B_ij = 1.
3. Then, to further reduce the probability of picking up trivial journal pairs, we checked whether a pair i-j is reused in the following three years. If so, D_ij = 1, otherwise D_ij = 0.
4. We assessed the ease of forming each journal pair i-j by measuring the cosine similarity (C_ij) between the co-citation profiles of i and j. By subtracting C_ij from 1, we obtain a value that indicates how difficult it is to form the referenced pair i-j.
5. Finally, the novelty of p is defined as in Eq. (2), where J is a unit matrix (all elements equal to 1) with the same dimensions as the other four matrices, and "*" stands for element-wise matrix multiplication. Since i > j, the summation concerns the sub-diagonal elements of the involved matrices.
Informally, the novelty of p is the sum of the difficulty values (the third factor in Eq. (2)) across the journal pairs referenced in p, where only highly cited, new and reused pairs contribute to the novelty value of p.
For the details of the novelty calculations, we refer the reader to "Appendix 2".
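The five steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the matrices A, B, D and C are taken as already computed, J is assumed to be a matrix of ones, and the novelty of p is assumed to be the sum over i > j of the element-wise product A * (J - B) * D * (J - C). The exact grouping of factors in the published equation may differ, and the function name and toy values are hypothetical.

```python
import numpy as np

def novelty(A, B, D, C):
    """Sum the difficulty (1 - C_ij) over highly cited, new and reused journal pairs.

    A: 1 if both journals of a pair are among the most frequently cited journals.
    B: 0 if the pair is new, 1 otherwise.
    D: 1 if the pair is reused in the following three years.
    C: cosine similarity of the co-citation profiles of the two journals.
    """
    A, B, D, C = (np.asarray(m, dtype=float) for m in (A, B, D, C))
    J = np.ones_like(A)  # assumption: J is a matrix of ones
    contribution = A * (J - B) * D * (J - C)  # element-wise products
    rows, cols = np.tril_indices_from(contribution, k=-1)  # sub-diagonal, i > j
    return float(contribution[rows, cols].sum())

# Toy publication referencing three journals (all values are hypothetical).
A = np.ones((3, 3))
B = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1]])  # pairs 1-0 and 2-1 are new, pair 2-0 is not
D = np.array([[1, 1, 1],
              [1, 1, 0],
              [1, 0, 1]])  # pair 2-1 is not reused
C = np.array([[1.0, 0.2, 0.5],
              [0.2, 1.0, 0.5],
              [0.5, 0.5, 1.0]])

novelty_p = novelty(A, B, D, C)
```

In this toy case only the pair 1-0 is highly cited, new and reused, so the novelty equals its difficulty value, 1 - 0.2.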

Audience diversity
Audience diversity is one of the two mediators. For this variable, too, we use the true diversity measure (Zhang et al., 2016). We use author names from citing publications and a 3-year citation window for short-term audience diversity, and author names from citing publications and a 6-year citation window for long-term audience diversity. For a target publication, a3_n (a6_n) is the number of ethnic categories of its three-year (six-year) citing publications, P_a3_i (P_a6_i) is the proportion of authors from its three-year (six-year) citing publications in ethnic category i, and P_a3_j (P_a6_j) is the proportion of authors from its three-year (six-year) citing publications in ethnic category j. S_ij is the ethnicity similarity, i.e. the similarity between different ethnic categories, defined in the same way as for E_D.
The short-term audience diversity (three-year citing window) of a publication, A3_E_D, is defined as follows:
$$A3\_E\_D = \frac{1}{\sum_{i,j=1}^{a3\_n} S_{ij}\, P_{a3\_i}\, P_{a3\_j}}$$
The long-term audience diversity (six-year citing window) of a publication, A6_E_D, is defined as follows:
$$A6\_E\_D = \frac{1}{\sum_{i,j=1}^{a6\_n} S_{ij}\, P_{a6\_i}\, P_{a6\_j}}$$

Scientific impact
Scientific impact is the dependent variable of the study. We use the field normalized citation score (FNCS) with different citing window lengths to measure short-term and long-term impact. Even though we compare the scientific impact of publications within a given WoS subject category, we still choose the field normalized citation score. A reason for this is that a large percentage of the target publications belong to more than one WoS category.
For a publication, C_3 (C_6) is its citation count in the latest three years (six years), and BC_3 (BC_6) is its citation reference value, equal to the average citation count in the latest three years (six years) for the publications of the same publication year, of the same document type and belonging to the same WoS subject category. When a publication belongs to more than one WoS subject category, the publication, as well as its citation count, is distributed uniformly across the categories. For a given category, its reference value is obtained by dividing the sum of products of publication fractions and citation counts for the category by the sum of the publication fractions for the category.
The field normalized citation score, for a three-year citing window, of a publication, FNCS_3y, is defined as follows:
$$FNCS\_3y = \frac{C_3}{BC_3}$$
The field normalized citation score, for a six-year citing window, of a publication, FNCS_6y, is defined as follows:
$$FNCS\_6y = \frac{C_6}{BC_6}$$
Moreover, for calculating FNCS_3y and FNCS_6y when a publication belongs to more than one WoS subject category, we use the mean of the ratios. For example, if a publication with 3-year citation count C_3 belongs to both subject category A (with reference value BC_3_a) and subject category B (with reference value BC_3_b), the FNCS_3y of that publication is equal to (1/2)(C_3/BC_3_a + C_3/BC_3_b).
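The reference value and the mean-of-ratios rule described above can be illustrated with a short Python sketch. It assumes that the input list has already been restricted to one publication year and one document type, and that each publication is split uniformly across its subject categories; the data structure, function names and citation counts are hypothetical.

```python
from collections import defaultdict

def reference_values(publications):
    """Fractionally counted average citation count per subject category."""
    weighted_citations = defaultdict(float)
    fractions = defaultdict(float)
    for pub in publications:
        share = 1.0 / len(pub["categories"])  # uniform split across categories
        for category in pub["categories"]:
            weighted_citations[category] += share * pub["citations"]
            fractions[category] += share
    return {c: weighted_citations[c] / fractions[c] for c in fractions}

def fncs(pub, reference):
    """Mean of the ratios citation count / reference value over the categories."""
    ratios = [pub["citations"] / reference[c] for c in pub["categories"]]
    return sum(ratios) / len(ratios)

# Toy set: same publication year and document type, one citation window.
publications = [
    {"citations": 10, "categories": ["A"]},
    {"citations": 4, "categories": ["A", "B"]},
    {"citations": 2, "categories": ["B"]},
]
reference = reference_values(publications)
score = fncs(publications[1], reference)
```

Here the reference value of category A is (10 + 0.5 * 4) / 1.5 = 8, and the two-category publication gets the mean of 4/8 and 4/(8/3). The sketch does not handle a reference value of zero, which would need a separate rule.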

Control variables
As ethnic diversity in co-authorship might be correlated with multiple countries in author affiliations or broad international collaborations, and international collaboration is associated with greater scientific impact (e.g. Lancho-Barrantes et al., 2012; Wagner et al., 2019), we use affiliated country diversity as a control variable. We further use affiliated country size as a control variable. Adding authors from large countries with high publication output might mean that the chance of receiving citations from these countries increases (through higher visibility of and interest in the publication in the countries where the added authors come from). Adding two authors from an ethnicity with a large research output to a publication without international or ethnic collaboration could potentially increase this "audience effect" much more than adding two authors from a smaller country or a smaller ethnic group, all other things being equal (like ethnic diversity). Of course, readership and citations by researchers are governed mainly by other factors (quality, relevance etc.), but country size is a reason why one could expect the "audience effect" to differ between countries. We do not claim that affiliated country size is a perfect proxy for the country properties determining the "audience effect", but the variable should be able to control for this effect to some extent.
We use the true diversity measure (Zhang et al., 2016) for affiliated country diversity, based on the distribution of authors across the affiliated countries of a target publication. For a target publication, a_n is the number of affiliated countries, Pa_i is the proportion of author fractions in affiliated country i (authors are counted fractionally when one author is affiliated with more than one country), and Pa_j is the proportion of author fractions in affiliated country j. Sa_ij denotes the similarity between affiliated countries i and j, which, in our analysis, is based on geographic distance. For affiliated countries i and j, we define the geographic distance between them as the great-circle distance, 6 denoted by Da_ij, between the capitals of i and j. We used the maximum distance on the earth's surface, Max_D, to normalize Da_ij. Max_D can be obtained by taking (0, 0) and (0, 180) as input for the geographical coordinates. Sa_ij is then defined as follows:

Sa_ij = 1 − Da_ij / Max_D

Then the affiliated country diversity of a target publication, AF_truediv, is defined as follows:

AF_truediv = 1 / Σ_i Σ_j (Sa_ij × Pa_i × Pa_j), with i, j = 1, …, a_n

The other control variable, affiliated country size, is operationalized as the average number of publication fractions of the affiliated countries of a target publication. For a target publication, n is the number of affiliated countries in the address list, and P_AF_i is the sum of publication fractions (author level fractionalization for assigning a publication to its affiliated countries was used; when an author is affiliated with more than one country, the contribution of the author is distributed uniformly across the involved countries 7 ) for country i in the period 2008-2012 and in the subject category (one of the three selected categories used in the study) of the target publication. 8 Then the affiliated country size of a target publication, AF_S, is defined as follows:

AF_S = (1/n) × Σ_i P_AF_i, with i = 1, …, n

One might consider using the number of authors of a publication as a control variable (AlShebli et al., 2018).
However, we only use co-authored publications in our analysis. More importantly, the ethnic diversity indicator we use, E_D (see the section "Data and methods" below), implicitly controls for the number of authors, to some extent. Even if there is a positive correlation between the number of authors and the number of ethnic categories, this is consistent with no correlation, or a considerably weaker correlation, between the number of authors and E_D. The reason is that E_D takes into account not only the number of ethnic categories, but also the balance of the distribution of authors across the ethnic categories of a publication (the higher the balance, the higher the value of E_D, everything else being equal). Thus a publication, P, with a considerably higher number of authors than another publication, P', can have a smaller value of E_D than P'. We believe, then, that the number of authors is not a confounder in our study. Notice that there is no need to control for publication year, since we normalize citation counts in the study: the raw citation count of a given publication is divided by a reference value, equal to the average citation count of the publications published in the same publication year, of the same document type and belonging to the same WoS subject category.
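As an illustration of the two control variables described in this section, the following sketch computes a distance-based similarity, a true diversity value and an average country size. It rests on several assumptions that are not confirmed by the text: the haversine formula and a mean Earth radius are used for the great-circle distance, the true diversity is taken in the form 1 / Σ_i Σ_j (S_ij × P_i × P_j), and all country labels, coordinates and publication fractions are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption of this sketch)


def great_circle_km(p, q):
    """Haversine great-circle distance between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


# Maximum distance on the earth's surface, from (0, 0) and (0, 180).
MAX_D = great_circle_km((0, 0), (0, 180))


def similarity(p, q):
    """Similarity between two capitals: 1 minus the normalized distance."""
    return 1 - great_circle_km(p, q) / MAX_D


def true_diversity(shares, capitals):
    """Assumed form: 1 / sum over all country pairs of S_ij * P_i * P_j."""
    total = sum(shares[i] * shares[j] * similarity(capitals[i], capitals[j])
                for i in shares for j in shares)
    return 1 / total


def country_size(publication_fractions):
    """Average publication fractions of the affiliated countries."""
    return sum(publication_fractions) / len(publication_fractions)


# Hypothetical publication with authors split evenly over two countries
# whose capitals lie at opposite points on the equator.
capitals = {"A": (0, 0), "B": (0, 180)}
shares = {"A": 0.5, "B": 0.5}
print(true_diversity(shares, capitals))  # two maximally distant, balanced countries
print(country_size([100.0, 300.0]))
```

Under these assumptions, a publication with a single affiliated country has diversity 1, and the value grows as authors are spread more evenly over more distant countries.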

Path models
Path modeling is capable of modeling hypothesized routes to the same outcome, assessing reciprocal effects, and decomposing the total effect of a hypothesized causal factor into direct and indirect effects (Morgan, 2013). In scientometrics, path modeling has been used to analyze more complex relationships, such as multi-layer relationships, among a set of variables (Guan et al., 2015; Potthoff & Zimmermann, 2017; Yu et al., 2009). Path modeling is further suitable for the conceptual model given in Fig. 1. In this study, we apply path modeling to examine the relationships between the variables in the conceptual model. After defining relevant measures for the variables (shown in Table 3), we transform the conceptual model into a set of corresponding path models. For each of the three studied subject categories, models for ethnic diversity with short-term and long-term impact are obtained. Besides full models, we also obtain reduced models, where the control variables are absent from the models. With this design, comparisons can be made (a) between full models and reduced models in order to study the influence of control variables, and (b) between ethnic diversity in relation to short-term scientific impact and ethnic diversity in relation to long-term scientific impact. For each subject category, there are four path models, two full and two reduced. Taking the models in Nanoscience & Nanotechnology as examples, Nano_S and Nano_L are full path models for short-term and long-term impact, whereas Nano_S' and Nano_L' are reduced models for short-term and long-term impact.
In path modeling, global fit indices measure how well a model fits the data globally. If a model is considered acceptable, the regression coefficients of its specific paths are then examined. Table 4 lists several commonly used measures of global fit for path models, such as RMSEA, GFI etc. The values of these fit indices for our models are reported in the table. The twelve models can be regarded as acceptable, since they satisfy the fit thresholds for all the fit indices, the only exception being the RMSEA value of ILS_L'.
Regression coefficients were standardized for all paths to allow comparisons between variables and between models. Since the variance of the dependent variables (FNCS_3y and FNCS_6y) is not constant across different segments of the independent variable (E_D), regardless of subject category, we used White robust standard errors in the path models (White, 1980). In this way, more accurate confidence intervals for the regression coefficients are obtained. For the effect of ethnic diversity on scientific impact, two kinds of effect are distinguished: the direct effect and the total effect. The direct effect is the regression coefficient between the variables, while the total effect is the sum of the direct effect and the indirect (mediated) effect. The indirect effect is the sum, over the paths from ethnic diversity to scientific impact that pass through a mediator, of the products of the direct effects along each path. We now show, using the model Nano_S (Fig. 2 below) as an example, how these effects are obtained. All ρ values given below are standardized regression coefficients 9 (standardized and non-standardized regression coefficients are given in "Appendix 3"). Further, a pair of indices "ij" in the ρ notation corresponds to an arrow from j to i in Fig. 2. With the two control variables taken into account in Nano_S, the total effect of E_D (1) on FNCS_3y (4) is 0.056, which is decomposed as follows. The direct effect of E_D on FNCS_3y is ρ_41 = −0.030, the indirect effect of E_D on FNCS_3y through Novelty (2) is ρ_42 ρ_21 = 0.004 × 0.018, and the indirect effect of E_D on FNCS_3y through A3_E_D (3) is ρ_43 ρ_31 = 0.287 × 0.299. The indirect effect of E_D on FNCS_3y is then ρ_42 ρ_21 + ρ_43 ρ_31 = 0.004 × 0.018 + 0.287 × 0.299 = 0.086, whereas the total effect of E_D on FNCS_3y is the sum of the direct effect and the indirect effect: ρ_41 + ρ_42 ρ_21 + ρ_43 ρ_31 = 0.056.
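The decomposition for Nano_S can be retraced with a few lines of arithmetic. The sketch below uses only the standardized coefficients quoted in the text; the function name `decompose` is a hypothetical helper and not part of the software used in the study.

```python
def decompose(direct, mediated_paths):
    """Return (indirect, total) effects.

    `mediated_paths` is a list of coefficient pairs, one pair per mediator:
    (mediator -> outcome, predictor -> mediator).
    """
    indirect = sum(a * b for a, b in mediated_paths)
    return indirect, direct + indirect


# Coefficients for Nano_S as quoted in the text.
indirect, total = decompose(
    direct=-0.030,
    mediated_paths=[
        (0.004, 0.018),  # via Novelty
        (0.287, 0.299),  # via A3_E_D (audience diversity)
    ],
)
print(round(indirect, 3), round(total, 3))  # 0.086 0.056
```

The numbers make the pattern discussed later visible: almost all of the indirect effect comes from the audience diversity path, while the novelty path contributes less than 0.0001.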
When control variables are not considered, the total effect of ethnic diversity on scientific impact is identical to the correlation between the two variables in a correlation analysis. Using the model Nano_S′ (Fig. 2) as an example, the total effect of E_D (1) on FNCS_3y (4) is r_14 = 0.065, which is equal to the correlation between the two variables in a correlation analysis (the outcome of a correlation analysis is given in "Appendix 3").
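The identity between total effect and correlation in a reduced model can be checked on simulated data. The sketch below is a simplification under stated assumptions: it uses a single mediator rather than the two mediators of the study, the data are randomly generated, and the standardized coefficients are obtained from the closed-form solution for a regression on two standardized predictors.

```python
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


# Simulated data: x -> m -> y plus a direct path x -> y (hypothetical values).
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(500)]
m = [0.3 * xi + rng.gauss(0, 1) for xi in x]
y = [-0.05 * xi + 0.4 * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

r_xm, r_xy, r_my = pearson(x, m), pearson(x, y), pearson(m, y)

# Standardized regression of y on x and m (closed form for two predictors).
beta_x = (r_xy - r_xm * r_my) / (1 - r_xm ** 2)  # direct effect of x
beta_m = (r_my - r_xm * r_xy) / (1 - r_xm ** 2)  # effect of the mediator

# The standardized path x -> m equals r_xm, so the mediated effect is beta_m * r_xm.
total = beta_x + beta_m * r_xm
print(abs(total - r_xy) < 1e-9)
```

The equality holds algebraically for any sample, which is why the total effects of the reduced models can be read directly from a correlation table.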
For a given target publication, a necessary condition for audience diversity to increase is that the number of citing publications for the target publication increases. It follows that audience diversity is to some extent dependent on the number of citing publications. However, a target publication with, say, twice as many citing publications as another target publication (within the same field) may have considerably less audience diversity than the less cited publication. In view of this, and of the fact that the proportions of uncited target publications are small in each of our three datasets 10 , we believe that the effect of the number of citing publications on audience diversity is sufficiently small not to cause any serious problems for the path models of the study.
The path modeling was done with the aid of AMOS 22.0 and the R package lavaan (Rosseel, 2012).

Results
Figures 2, 3 and 4, which correspond to the three studied subject categories, display our 12 path models with 95% confidence intervals for regression coefficients. In order to facilitate the interpretation of the results shown in Figs. 2, 3 and 4 relative to our five hypotheses, we summarize the results in Table 5.
In the remainder of this section, we analyze the results for the three subject categories Nanoscience & Nanotechnology (Nano), Ecology (Eco) and Information Science & Library Science (ILS) according to our three research questions, and we investigate if the regression results are related to outlier observations.

The relationship between ethnic diversity and scientific impact
Regarding the total effect of ethnic diversity on scientific impact, the results for the three fields are quite similar (Figs. 2, 3, 4, Table 5).
a. For Nano, Eco and ILS, regardless of whether the time window is long-term or short-term, and regardless of whether control variables are considered, we find a weak positive relationship between ethnic diversity and scientific impact. Taking the results for Nano as an example (Fig. 2), for each of the four models, Nano_S, Nano_L, Nano_S' and Nano_L', there is a weak positive relationship between E_D and FNCS_3y (FNCS_6y).
b. For all three fields, the relationship between the two variables is weaker when control variables are included than when they are not included, for both the long-term and the short-term time window. This shows that the two control variables moderate the relationship between ethnic diversity and scientific impact.
c. When comparing the coefficients of the two variables among the three fields, we find that the relationship in ILS is stronger than the relationship in Eco, which in turn is stronger than the relationship in Nano. In ILS, when control variables are considered and with respect to long-term scientific impact, a one standard deviation increase in E_D is associated with an increase in FNCS_6y by 0.087 standard deviations ("Appendix 3", Table 9). Regarding the corresponding non-standardized regression coefficient, a one unit increase in E_D is associated with an increase in FNCS_6y by 0.285 ("Appendix 3", Table 12). Information on non-standardized regression coefficients is helpful for a reader who wants to assess how important the total effect is in terms of raw data.

Fig. 2
Model results for Nanoscience & Nanotechnology. Note: 95% confidence intervals for std. regression coefficients. The intervals based on White robust standard errors. N = 97,684. Confidence intervals marked with "*" concern total effects (direct effects + mediated effects) for the corresponding paths, while the other intervals concern direct effects for their corresponding paths

Fig. 3
Model results for Ecology. Note: 95% confidence intervals for std. regression coefficients. The intervals based on White robust standard errors. N = 64,409. Confidence intervals marked with "*" concern total effects (direct effects + mediated effects) for the corresponding paths, while the other intervals concern direct effects for their corresponding paths

Regarding the direct effect of ethnic diversity on scientific impact, the results for the three fields (Figs. 2, 3, 4, Table 5) are slightly different, but have some points in common:
a. In all three fields, when considering control variables, ethnic diversity has a weak negative (direct) effect on scientific impact for the short-term window (Nano_S, Eco_S, ILS_S), while the variable does not have any effect, or has a negative effect, on scientific impact for the long-term window (Nano_L, Eco_L, ILS_L).
b. In general, the two control variables influence the (direct) relationship between ethnic diversity and scientific impact. In Nano and Eco, for the short-term relationship, E_D has a weak negative effect on FNCS_3y (Nano_S' and Eco_S') when control variables are not considered, but the negative regression coefficient is stronger when control variables are considered (Nano_S and Eco_S). For the long-term relationship between the two variables in Nano, E_D has a weak positive effect on FNCS_6y (Nano_L') without control variables. However, in the full model, E_D does not have any effect on FNCS_6y (Nano_L).

Fig. 4
Model results for Information Science & Library Science. Note: 95% confidence intervals for std. regression coefficients. The intervals based on White robust standard errors. N = 9074. Confidence intervals marked with "*" concern total effects (direct effects + mediated effects) for the corresponding paths, while the other intervals concern direct effects for their corresponding paths

In summary, regarding the effect of ethnic diversity on scientific impact, the results are quite different depending on whether the total effect or the direct effect (the total effect excluding mediated effects) is considered. This is due to the strong mediating effects of the two mediating variables, effects that we now describe.

The mediating effects of novelty and audience diversity
Regarding the mediating effects of novelty and audience diversity in the relationship between ethnic diversity and scientific impact, the results for the three fields (Figs. 2, 3, 4, Table 5) are again similar.
a. For all three fields, i.e. Nano, Eco and ILS, audience diversity is a more important mediator than novelty. Taking the results in Nano as an example, in all four models (Nano_S, Nano_L, Nano_S′ and Nano_L′), E_D has a positive relationship with A3_E_D (and with A6_E_D), but has a much weaker relationship with Novelty. Further, A3_E_D (and A6_E_D) has a positive relationship with FNCS_3y (and with FNCS_6y), but Novelty has a much weaker relationship with FNCS_3y (and with FNCS_6y).
b. For all three fields, we also find that the mediating effect of audience diversity in the relationship between ethnic diversity and scientific impact is much stronger than the direct effect between the two variables. This can be seen as an explanation of the difference between the total effect and the direct effect of ethnic diversity on scientific impact.

The difference regarding ethnic diversity related to short-term impact versus long-term impact
Regarding ethnic diversity in relation to short-term impact versus long-term impact, the results for the three fields (Figs. 2, 3, 4, Table 5) are also similar. Ethnic diversity is related more to short-term impact than to long-term impact. The direct effect of ethnic diversity on scientific impact (short-term and long-term) is weak for all three fields, however, so we only compare short-term and long-term impact based on total effect. For the total effect of ethnic diversity on scientific impact, ethnic diversity promotes scientific impact more for short-term impact than for long-term impact in all three fields. This is connected to the important mediator variable audience diversity, which is more related to short-term impact than to long-term impact. We discuss this latter fact in the section "Discussion".

The effects of removing the most influential observations
It cannot be ruled out that the regression results are related to outlier observations. To investigate this question, we performed an analysis based on Cook's distance, which is a measure of the overall influence an observation has on the estimated regression coefficients (Cook, 1977). The following was done for each of the 12 models:
(1) Calculation of Cook's distance D_i for each observation i, which yielded a distribution of n distances.
(2) Deletion of the observation with the largest D value.
(3) Re-estimation of the model with n − 1 observations.
(4) Comparison of the confidence intervals for the new regression coefficients with the original intervals.
It turned out that all corresponding confidence intervals are not only overlapping but also have equal or very similar endpoints ("Appendix 3", Tables 7, 8, 9; "Appendix 5", Tables 24, 25, 26). Thus, our regression results did not change substantially when the most influential observations were removed.
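The four-step procedure can be sketched for the simplest possible case. The code below is an illustration under stated assumptions, not the computation used in the study: it handles a single regression with one predictor and an intercept instead of a full path model, the data are invented, and the comparison of confidence intervals in step (4) is replaced by a comparison of the re-estimated slope.

```python
def fit_line(x, y):
    """Ordinary least squares fit; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def cooks_distances(x, y):
    """Cook's distance for each observation in a simple linear regression."""
    n, p = len(x), 2  # p = number of estimated coefficients
    intercept, slope = fit_line(x, y)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - p)  # residual variance
    distances = []
    for xi, e in zip(x, residuals):
        leverage = 1 / n + (xi - mx) ** 2 / sxx
        distances.append(e * e / (p * s2) * leverage / (1 - leverage) ** 2)
    return distances


def drop_most_influential(x, y):
    """Steps (1) and (2): find and delete the observation with the largest D."""
    d = cooks_distances(x, y)
    worst = max(range(len(d)), key=d.__getitem__)
    keep = [i for i in range(len(x)) if i != worst]
    return worst, [x[i] for i in keep], [y[i] for i in keep]


# Invented data: y = 1 + 2x, except for the last observation.
x = list(range(10))
y = [1, 3, 5, 7, 9, 11, 13, 15, 17, 40]

worst, x_new, y_new = drop_most_influential(x, y)
print(worst, fit_line(x, y)[1], fit_line(x_new, y_new)[1])  # step (3): re-estimation
```

In this toy example the slope changes markedly once the influential point is removed, which is the kind of sensitivity the check is meant to detect.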

Discussion
We have studied the relationship between ethnic diversity and scientific impact, with novelty and audience diversity as mediators, and with affiliated country diversity and affiliated country size as control variables. Using path modeling, we took the Web of Science subject categories Nanoscience & Nanotechnology, Ecology and Information Science & Library Science as cases. We will now discuss our three initial research questions, as well as the five hypotheses in Fig. 1, in the light of the empirical findings.
Does higher ethnic diversity in co-authorship increase scientific impact, taking the two control variables affiliated country diversity and affiliated country size into account?
Hypothesis H1, which is related to this research question, was confirmed by our results (Table 5, the two rows for H1 with respect to total effect). Overall, regardless of Web of Science subject category, we have observed a weak positive relationship between the two variables. In earlier studies on ethnic diversity, similar findings have been obtained (AlShebli et al., 2018; Freeman & Huang, 2015). However, in contrast to these earlier studies, we have used control variables, and we have studied the direct effect of ethnic diversity on scientific impact.
When the control variables affiliated country diversity and affiliated country size are taken into account, the effect of ethnic diversity on scientific impact is reduced: once the contribution of the control variables is removed, the effect becomes less positive or more negative. In this sense, the control variables act as positive noise on the effect of ethnic diversity on scientific impact. This finding indicates that the scientific impact promoted by ethnic diversity to some extent comes from the two control variables.
Regarding the direct effect of ethnic diversity on scientific impact, the results show that ethnic diversity has either a negative effect or no effect on scientific impact. One reason behind this finding might be that ethnic diversity brings cultural barriers to collaboration (Freeman & Huang, 2015). Earlier research has pointed out that members of diverse groups may encounter greater challenges to their ideas (Apfelbaum et al., 2014), and that communication can be hampered by linguistic or cultural differences (Samovar et al., 2009; Freeman & Huang, 2014), as well as by transaction costs (Wagner et al., 2019). These factors may partly explain the negative direct effect of ethnic diversity on scientific impact. Moreover, the direct effect is negative, but the total effect is positive. This indicates that ethnic diversity brings mediated effects that can outweigh the negative effect of ethnic diversity itself.

Does ethnic diversity affect scientific impact mediated through novelty or audience diversity?
We generally found that the effect of ethnic diversity was mediated (see hypotheses H2-H5), which answers this research question in the affirmative. More specifically, in the relationship between ethnic diversity and scientific impact, audience diversity played the largest part, regardless of which Web of Science subject category was studied, while novelty played a smaller role. When Wagner et al. (2019) studied international collaboration diversity (another type of co-authorship diversity than ethnic co-authorship diversity), they found that international collaboration diversity was not associated with more novel publications, and speculated that an audience effect might be an important reason why international collaboration diversity can generate scientific impact. Our study confirms and extends these results, suggesting that the effect of international collaboration on scientific impact may be driven by audience diversity. Either way, our findings highlight the important role of audience diversity for the relationship between ethnic diversity and scientific impact. Ethnic diversity can generate audience diversity by attracting more audiences to pay attention to the research, which in turn can increase scientific impact. This path of influence can explain the difference between the total effect and the direct effect of ethnic diversity on scientific impact.

Is ethnic diversity more associated with short-term scientific impact compared to long-term scientific impact?
Hypothesis H1, which is related to this research question, was confirmed by our results (Table 5, the two rows for H1 with respect to total effect). Our analysis, regardless of the Web of Science category considered, supports an affirmative answer to the research question. The findings suggest that ethnic diversity promotes short-term scientific impact more than long-term scientific impact, both with and without control variables. We have found that the positive effect of ethnic diversity on scientific impact mainly comes from the mediating effects of audience diversity and novelty, and that these effects are sensitive to time accumulation factors. When comparing the effect of each mediator in different time windows, we see that novelty is connected to long-term impact more than to short-term impact, whereas the opposite is the case for audience diversity. Moreover, as we have seen, the mediating effect of audience diversity is much more important than the corresponding effect of novelty in the relationship between ethnic diversity and scientific impact. The stronger relationship between ethnic diversity and short-term impact, compared to the relationship between ethnic diversity and long-term impact, is then determined by the stronger relationship between audience diversity and short-term impact, compared to the relationship between novelty and long-term impact.
It is not difficult to understand why novelty is more related to long-term scientific impact than to short-term scientific impact, and why the opposite is the case for audience diversity. Audience diversity is related to particular attention from certain citing communities. This particular attention usually declines with time, as the potential citing community increases. Novelty is related to the content or the value of the research underlying a publication. It can take time before a large number of peers have recognized the value of the research, and earlier research shows that the value of research with a high degree of novelty may not be obvious in the first few years after publication (Klavans & Boyack, 2013).

Limitations and future research
The study has several limitations that need to be pointed out. First, for the mediated effects, our study only takes audience diversity and novelty into account, and there may be other variables that are mediators with respect to the relationship between ethnic diversity and citation impact. Second, the audience diversity variable is based on citing authors, an audience that can be extended to a wider one, including (non-citing) readers. Third, the novelty measure we used in this study has high requirements (for instance, that a journal pair has never occurred in prior publications and must be reused in the next three years) for a publication to be regarded as novel. In our data, many publications fail to meet the novelty requirements, and thereby get zero as the value of the novelty measure. The results of the model calculations would potentially be more informative if a more differentiated novelty measure were used. Fourth, there are limitations related to NamePrism, the ethnicity classification tool we used in our study. NamePrism gives different identification results for the same name in upper case letters and in lower case letters. Another limitation is that the ethnic classification of NamePrism, which classifies the world population into 39 CEL groups/ethnicities, has granularity differences between different areas of the world. For example, there are more fine-grained CEL groups for Europe than for Asia or Africa.
In the section "Ethnicity identification", we discussed a discrepancy between real ethnic diversity and name ethnic diversity. A fifth limitation is that ethnic diversity could potentially be substantially overestimated in our study, foremost in publications involving countries with a high level of immigration. This in turn would mean that we underestimate, perhaps to a large extent, an effect (positive or negative) of real ethnic diversity on citation impact. In order to shed light on this issue, we conducted a robustness test. For each of the three subject categories, we retrieved from the full publication set for the category the subset of publications with at least one author affiliated with the USA. In this way, we obtained three datasets of USA publications. We further refer to the three full datasets without their USA publications (defined as above) as the full-not-USA datasets. For each USA set and each full-not-USA set, we performed the same statistical analysis as for the three full publication sets. Generally, the results for the USA sets agree well with the corresponding results for the full-not-USA sets with respect to the directions of the relationships ("Appendix 4", Tables 16, 17, 18, 19, 20, 21, 22, 23). For instance, the weak positive relationship (total effect) between ethnic diversity and scientific impact observed in the full-not-USA sets, irrespective of Web of Science subject field, is observed also in the USA datasets. It might be the case, though, that our study involves overestimation measurement errors, since the point estimates (i.e. the standardized regression coefficients) for direct effects (of ethnic diversity on scientific impact) tend to be less negative for the USA sets than for the full-not-USA sets. However, the general agreement between the results for the USA sets and the results for the full-not-USA sets indicates that the effect of the errors is negligible.
For future research, besides trying to find more mediating variables and trying to design a wider audience diversity measure, we plan to study other types of co-authorship diversity, such as educational background diversity, geographic diversity, age diversity and gender diversity. It is possible that common rules for different types of co-authorship diversity can be found, which would help in understanding the structure of collaboration and the value of diversity itself. However, more research on co-authorship diversity is clearly needed.

Appendix 1: Statistics related to ethnicities
See Figs. 5, 6, 7, 8, Table 6.

Fig. 5
The distribution of ethnicities in the 10 most frequently occurring countries. Note: The top 10 countries are selected based on the sum of publication fractions (author fractionalization for assigning a publication to its affiliated countries was used) of affiliated countries for the target publications of a given field. The affiliated countries are ordered from left to right by their publication fractions. The number in each cell is the relative number of distinct author names of an ethnicity in an affiliated country: 100 × x/y, where x is the number of distinct author names of an ethnicity in an affiliated country, and y is the total number of distinct author names in the affiliated country

[Caption truncated] …, where x is the number of collaboration publications between two ethnicities in the subject field, and y and z are the total numbers of publications of the two ethnicities in the subject field, respectively

In the heat map of the table, the lower (higher) a similarity value is, the closer the color of its corresponding cell is to red (green)

Appendix 2: Calculation of novelty values
In this section, we give the details of the calculation of novelty values.
Step 1. Selection of the top 50% most frequently cited journals in order to reduce the probability of picking up trivial journal pairs.
For the publication year y of target publications, the top 50% cited journals were selected based on the total citations received by publications published in the period 1980 to (y − 1), where the citations come from publications published in the period (y − 3) to (y − 1). Let J50% = {J_1, J_2, J_3, …, J_n} be the set of the selected journals. All journal information was manually cleaned and disambiguated before the calculations.
Step 2. Construction of a journal combination matrix from the references of a target publication based on co-citation.
For a target publication p published in year y, we constructed a journal combination matrix A for the journals in J50%, with A_ij ∈ {0, 1} representing whether journal i and journal j, regarding their publications published in the period 1980 to y, are co-cited by p (1) or not (0).
Step 3. Construction of an existence journal combination matrix based on co-citations to decide if a given journal combination in the matrix of step 2 is a new combination or not.
For p, we constructed the existence co-citation journal matrix E for the journals in J50%, with E_ij representing the number of times publications in journal i and journal j, published in the period 1980 to (y − 1), are co-cited by publications published in the same period. We then transformed E to a binary matrix B, with B_ij = 1 if E_ij > 0, and B_ij = 0 otherwise.
Step 4. Construction of a journal similarity matrix, in order to obtain a measure of the ease of forming journal pairs.
The journal similarity matrix C was calculated based on the co-citation matrix E from step 3 using cosine similarity. In C, C_ij = cos(J_i, J_j) = (J_i · J_j) / (||J_i|| × ||J_j||).
Step 5. Construction of the reused journal combination matrix in order to further reduce the probability of picking up trivial journal pairs.
For p, we calculated the reused journal combination matrix D for the journals in J50%, with D_ij = 1 representing that there is at least one publication in journal i and one in journal j, both published in the period 1980 to (y + 3), that are co-cited by at least one publication published in the period (y + 1) to (y + 3). If this condition is not true, D_ij = 0.
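Steps 2 to 5 can be sketched on a toy example with three journals. The sketch makes two assumptions that are not stated in this section: the vectors J_i and J_j used in the cosine similarity are taken to be rows of the co-citation matrix E, and a journal pair is treated as a candidate new combination when it is co-cited by the target publication (A_ij = 1), has no prior co-citation (B_ij = 0) and is reused later (D_ij = 1). How the similarity values are combined into a final novelty value is not described here, so the sketch stops at listing candidate pairs with their similarity. All matrices are invented.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def candidate_pairs(A, E, D):
    """Map each new and reused journal pair (i, j), i < j, to its similarity C_ij."""
    n = len(A)
    B = [[1 if E[i][j] > 0 else 0 for j in range(n)] for i in range(n)]  # step 3
    pairs = {}
    for i in range(n):
        for j in range(i + 1, n):
            if A[i][j] == 1 and B[i][j] == 0 and D[i][j] == 1:
                pairs[(i, j)] = cosine(E[i], E[j])  # step 4, rows of E assumed
    return pairs


# Invented matrices for three journals (indices 0, 1, 2).
A = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]  # pairs co-cited by the target publication
E = [[0, 2, 0], [2, 0, 1], [0, 1, 0]]  # prior co-citation counts
D = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]  # pairs co-cited again in (y + 1) to (y + 3)

print(candidate_pairs(A, E, D))  # {(0, 2): 1.0}
```

In the toy data, the pair (0, 1) is excluded because it was already co-cited before year y, while the pair (0, 2) qualifies and has a high similarity because both journals are co-cited with journal 1.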

Table 7
Model results for Nano (standardized regression coefficients). 95% confidence intervals for std. regression coefficients. The intervals based on White robust standard errors. N = 97,684. Confidence intervals marked with "*" concern total effects (direct effects + mediated effects) for the corresponding paths, while the other intervals concern direct effects for their corresponding paths

Table 8
Model results for Eco (standardized regression coefficients). 95% confidence intervals for std. regression coefficients. The intervals based on White robust standard errors. N = 64,409. Confidence intervals marked with "*" concern total effects (direct effects + mediated effects) for the corresponding paths, while the other intervals concern direct effects for their corresponding paths

Table 9
Model results for ILS (standardized regression coefficients). 95% confidence intervals for std. regression coefficients. The intervals based on White robust standard errors. N = 9074. Confidence intervals marked with "*" concern total effects (direct effects + mediated effects) for the corresponding paths, while the other intervals concern direct effects for their corresponding paths

Table 10
Model results for Nano (non-standardized regression coefficients) 95% confidence intervals for non-standardized regression coefficients. The intervals based on White robust standard errors. N = 97,684. Confidence intervals marked with "*" concern total effects (direct effects + mediated effects) for the corresponding paths, while the other intervals concern direct effects for their corresponding paths Path

Table 11
Model results for Eco (non-standardized regression coefficients). 95% confidence intervals for non-standardized regression coefficients. The intervals are based on White robust standard errors. N = 64,409. Confidence intervals marked with "*" concern total effects (direct effects + mediated effects) for the corresponding paths, while the other intervals concern direct effects for their corresponding paths.

Table 12
Model results for ILS (non-standardized regression coefficients). 95% confidence intervals for non-standardized regression coefficients. The intervals are based on White robust standard errors. N = 9074. Confidence intervals marked with "*" concern total effects (direct effects + mediated effects) for the corresponding paths, while the other intervals concern direct effects for their corresponding paths.

Tables 16, 17, 18, 19, 20, 21, 22 and 23.

Table 16
Model results for Nano (standardized regression coefficients) for the USA dataset. 95% confidence intervals for std. regression coefficients. The intervals are based on White robust standard errors. N = 25,751. Confidence intervals marked with "*" concern total effects (direct effects + mediated effects) for the corresponding paths, while the other intervals concern direct effects for their corresponding paths.

Table 17
Model results for Eco (standardized regression coefficients) for the USA dataset. 95% confidence intervals for std. regression coefficients. The intervals are based on White robust standard errors. N = 23,770. Confidence intervals marked with "*" concern total effects (direct effects + mediated effects) for the corresponding paths, while the other intervals concern direct effects for their corresponding paths.

Table 18
Model results for ILS (standardized regression coefficients) for the USA dataset. 95% confidence intervals for std. regression coefficients. The intervals are based on White robust standard errors. N = 3755. Confidence intervals marked with "*" concern total effects (direct effects + mediated effects) for the corresponding paths, while the other intervals concern direct effects for their corresponding paths.

Table 19
Model results for Nano (standardized regression coefficients) for the full-not-USA dataset. 95% confidence intervals for std. regression coefficients. The intervals are based on White robust standard errors. N = 71,933. Confidence intervals marked with "*" concern total effects (direct effects + mediated effects) for the corresponding paths, while the other intervals concern direct effects for their corresponding paths.

Table 20
Model results for Eco (standardized regression coefficients) for the full-not-USA dataset. 95% confidence intervals for std. regression coefficients. The intervals are based on White robust standard errors. N = 40,639. Confidence intervals marked with "*" concern total effects (direct effects + mediated effects) for the corresponding paths, while the other intervals concern direct effects for their corresponding paths.

Table 21
Model results for ILS (standardized regression coefficients) for the full-not-USA dataset. 95% confidence intervals for std. regression coefficients. The intervals are based on White robust standard errors. N = 5319. Confidence intervals marked with "*" concern total effects (direct effects + mediated effects) for the corresponding paths, while the other intervals concern direct effects for their corresponding paths.

"+" indicates that the hypothesis is confirmed by a positive relationship between the two, "−" indicates a negative relationship, whereas "/" indicates that there is no relationship (a confidence interval not including 0 means a positive/negative relationship, while a confidence interval including 0 means no relationship). For each cell, two results are given only if the results differ when considering control variables compared to not considering them.

Table 24
Model results for Nano (standardized regression coefficients) for the full dataset minus 1 observation. 95% confidence intervals for std. regression coefficients. The intervals are based on White robust standard errors. N = 97,683. Confidence intervals marked with "*" concern total effects (direct effects + mediated effects) for the corresponding paths, while the other intervals concern direct effects for their corresponding paths.

Table 25
Model results for Eco (standardized regression coefficients) for the full dataset minus 1 observation. 95% confidence intervals for std. regression coefficients. The intervals are based on White robust standard errors. N = 64,408. Confidence intervals marked with "*" concern total effects (direct effects + mediated effects) for the corresponding paths, while the other intervals concern direct effects for their corresponding paths.

Table 26
Model results for ILS (standardized regression coefficients) for the full dataset minus 1 observation. 95% confidence intervals for std. regression coefficients. The intervals are based on White robust standard errors. N = 9073. Confidence intervals marked with "*" concern total effects (direct effects + mediated effects) for the corresponding paths, while the other intervals concern direct effects for their corresponding paths.