Why do I publish research articles in English instead of my own language? Differences in Spanish researchers’ motivations across scientific domains


Previous studies have reported the increased use of English as the “lingua franca” for academic purposes among non-Anglophone researchers. But despite data that confirm this trend, little is known about the reasons why researchers decide to publish their results in English rather than in their first language. The aim of this study is to determine the influence of researchers’ scientific domain on their motivation to publish in English. The results are based on a large-scale survey of Spanish postdoctoral researchers at four different universities and one research centre, and reflect responses from 1717 researchers about their difficulties, motivations, attitudes and publication strategies. Researchers’ publication experiences as corresponding authors of articles in English and in their first language are strongly related to their scientific domain. But surprisingly, Spanish researchers across all domains expressed a similar degree of motivation when they write research articles in English. They perceive a strong association between this language and the desire for their research to be recognized and rewarded. Our study also shows that the target scientific audience is a key factor in understanding the choice of publication language. The implications of our findings go beyond the field of linguistics and are relevant to studies of scientific productivity and visibility, the quality and impact of research, and research assessment policies.

Fig. 1
Fig. 2
Fig. 3
Fig. 4


  1. In the field of contrastive rhetoric, this concept is based on the assumption that language learners will transfer the rhetorical or stylistic features of their native language to the target language, causing interference in second language writing (Connor 1996; Davies 2003).

  2. “The argument of embeddedness” (Granovetter 1985: 481) states that behaviours and institutions are constrained by ongoing social relations.

  3. Webster’s Dictionary (http://www.webster-dictionary.org) defines ‘value orientation’ as “principles of right and wrong that are accepted by an individual or a social group”. According to McCarty and Hattwick (1992: 34), “cultural value orientations represent the basic and core beliefs of a culture; these basic beliefs deal with human’s relationships with one another and with their world”.

  4. The concept of ‘discourse community’ is widely used in the literature on multilingual researchers’ international publication practices. Swales (1990: 29) uses this notion to describe a group of individuals defined by six characteristics: “common goals, participatory mechanisms, information exchange, community-specific genres, a highly specialized terminology and a high general level of expertise”.

  5. ‘Integrated regulation’ is the most developmentally advanced form of extrinsic motivation. It involves regulations that are fully assimilated within the individual’s other values, needs, and identities.


This study is part of a project financed by the Spanish Ministry of Science and Innovation (Ref. FFI2009-08336/FILO; Ana I. Moreno, Principal Investigator). Our study would not have been possible without the collaboration of the following institutions and researchers: Consejo Superior de Investigaciones Científicas (CSIC), Universidad de León, Universidad de La Laguna, Sally Burgess and Pedro Martín-Martín, Universitat Jaume I, María Lluisa Gea Valor, Universidad de Zaragoza, Rosa Lorés, Pilar Mur and Enrique Lafuente. Our particular thanks go to Itesh Sachdev, School of Oriental & African Studies, University of London. We express our appreciation to members of the technical staff (José Manuel Rojo, Belén Garzón and Almudena Mata) of the Statistical Analysis Unit of the Centro de Ciencias Humanas y Sociales (CCHS-CSIC), and the Centro de Supercomputación de Galicia (CESGA). Our thanks also go to all our interview informants and survey participants. We are also grateful to María Bordons and the two reviewers for their thoughtful reading and constructive comments and suggestions. We thank K. Shashok for improving the use of English in the manuscript.

Appendix 1: formulation of the position index (PI)

The PI is formulated as follows (Silva 1997; author’s translation into English):

Let $P_i$ be the proportion of individuals who choose category $i$ of a $k$-point scale (in our case, $k = 5$, so $i$ takes integer values from 1 to 5). The weighted score $M$ is calculated as follows:

$$M = \sum\limits_{i = 1}^{k} {iP_{i} }$$

Accordingly, PI is defined as follows:

$${\text{PI}} = \frac{M - 1}{k - 1}$$
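As an illustration of the two formulas above, the following is a minimal Python sketch that computes PI from raw ratings. It assumes the input is a list of individual responses on a 1 to k scale; the function name and the example responses are hypothetical. Because M ranges from 1 (every respondent chooses category 1) to k (every respondent chooses category k), PI rescales it onto the interval from 0 to 1.

```python
from collections import Counter


def position_index(ratings: list[int], k: int = 5) -> float:
    """Position index (PI) for responses on a 1..k rating scale.

    P_i is the proportion of respondents choosing category i,
    M = sum_i i * P_i is the weighted score, and PI = (M - 1) / (k - 1).
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    if any(r < 1 or r > k for r in ratings):
        raise ValueError(f"ratings must lie in 1..{k}")
    counts = Counter(ratings)  # missing categories count as 0
    n = len(ratings)
    # Weighted score M: each category value i weighted by its proportion P_i
    m = sum(i * counts[i] / n for i in range(1, k + 1))
    return (m - 1) / (k - 1)


# Hypothetical responses from 20 researchers on a 5-point scale
responses = [1] * 2 + [2] * 4 + [3] * 6 + [4] * 5 + [5] * 3
print(round(position_index(responses), 4))  # prints 0.5375
```

Here the proportions are 0.10, 0.20, 0.30, 0.25 and 0.15, so M = 3.15 and PI = (3.15 − 1)/4 = 0.5375.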

Appendix 2: PROXSCAL procedure for calculating distances among scientific domains

PROXSCAL (proximity scaling) uses multidimensional scaling to find the structure in a set of proximity measures between objects such that the distances between points in the space match the given (dis)similarities as closely as possible (Meulman and Heiser 2010).

Distances were calculated as follows. Given the table of averages of the variables (in our case, the ratings of the different motivations for publishing in English and in Spanish) in each group (in our case, each domain in each language), we constructed a distance matrix in which cell ij contains the distance between the averages of groups i and j.

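We started with a table of averages by domain such as the one below (see, for example, Table 5):

| | Natural and Exact Sciences (NE) | Technological Sciences (TS) | Social Sciences (SS) | Arts and Humanities (AH) |
|---|---|---|---|---|
| Item 1 | Average NE₁ | Average TS₁ | Average SS₁ | Average AH₁ |
| Item 2 | Average NE₂ | Average TS₂ | Average SS₂ | Average AH₂ |
| … | … | … | … | … |
| Item n | Average NEₙ | Average TSₙ | Average SSₙ | Average AHₙ |

We then converted this information into a matrix with the following structure:

$$\begin{array}{c|cccc} & \text{NE} & \text{TS} & \text{SS} & \text{AH} \\ \hline \text{NE} & & X_{1} & Y_{1} & Z_{1} \\ \text{TS} & X_{2} & & Y_{2} & Z_{2} \\ \text{SS} & X_{3} & Y_{3} & & Z_{3} \\ \text{AH} & X_{4} & Y_{4} & Z_{4} & \end{array}$$

where each value from $X_1$ to $Z_4$ is the Euclidean distance between the item averages of two domains, calculated separately for each language as follows (shown here for $X_1$, the distance between NE and TS):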

$${\text{X}}_{1} = \left[ {\left( {{\text{Average NE}}_{ 1} - {\text{Average TS}}_{ 1} } \right)^{ 2} + \, \left( {{\text{Average NE}}_{ 2} - {\text{Average TS}}_{ 2} } \right)^{ 2} + \ldots + \, \left( {{\text{Average NE}}_{\text{n}} - {\text{Average TS}}_{\text{n}} } \right)^{ 2} } \right]^{ 1/ 2}$$
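The distance formula for $X_1$ applies in the same way to every pair of domains. Below is a minimal Python sketch of the full matrix, assuming each domain's item averages are stored as a list in the same item order; the function name and the example averages are hypothetical.

```python
import math


def distance_matrix(averages: dict[str, list[float]]) -> list[list[float]]:
    """Euclidean distances between the item-average profiles of each pair of groups.

    `averages` maps a group label (e.g. a domain) to its item averages,
    listed in the same item order (Item 1 ... Item n) for every group.
    """
    if len({len(v) for v in averages.values()}) != 1:
        raise ValueError("every group needs the same number of item averages")
    labels = list(averages)
    # Cell [i][j] is the distance between groups i and j; the diagonal is 0
    return [[math.dist(averages[a], averages[b]) for b in labels] for a in labels]


# Hypothetical averages for three items in one language
example = {
    "NE": [5.0, 1.0, 3.0],
    "TS": [4.0, 1.0, 3.0],
    "SS": [5.0, 4.0, 3.0],
    "AH": [2.0, 5.0, 3.0],
}
for label, row in zip(example, distance_matrix(example)):
    print(label, row)
```

The resulting matrix is symmetric with zeros on the diagonal, which is why the matrix above leaves the diagonal cells empty.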

Because the subsamples differed in size (i.e. the number of informants who reported having published in English and in Spanish, and who were therefore asked to assess their motivations for publishing in each language), averages were homogenized through ranks to make the distances for English and Spanish comparable. This made it possible to represent the assessments of motivations for publishing in either language on the same plane in a PROXSCAL graph.
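PROXSCAL is an SPSS procedure based on iterative majorization, and its exact settings are not reported here, so the following NumPy sketch is only a rough illustration of the pipeline under stated assumptions. We assume that "homogenized through ranks" means replacing each group's item averages by their ranks within that group, and we swap in classical (Torgerson) multidimensional scaling, a simpler, non-iterative technique, to place the groups on a plane. All function names are hypothetical.

```python
import numpy as np


def ranks(values: np.ndarray) -> np.ndarray:
    """Rank a 1-D array (1 = smallest); tied values share their average rank."""
    order = np.argsort(values, kind="stable")
    r = np.empty(len(values), dtype=float)
    r[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        r[tied] = r[tied].mean()
    return r


def classical_mds(d: np.ndarray, dims: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS: coordinates whose distances approximate d."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    b = -0.5 * j @ (d ** 2) @ j                # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(b)       # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:dims]     # indices of the largest `dims`
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))


def embed_groups(averages: np.ndarray, dims: int = 2) -> np.ndarray:
    """averages: shape (groups, items), one row per domain-language group."""
    # Assumption: ranks are taken over the items within each group
    ranked = np.vstack([ranks(row) for row in averages])
    diff = ranked[:, None, :] - ranked[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))      # Euclidean distance matrix
    return classical_mds(d, dims)
```

When the input distances come from points that genuinely lie in a plane, classical MDS reproduces them exactly (up to rotation and reflection); PROXSCAL instead minimizes a stress function iteratively, so its configuration can differ for real data.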

Appendix 3: procedure for the allocation of respondents to a specific scientific domain

The procedure is based on the following assumptions: (a) researchers belonging to a specific domain have a profile determined by the presence or absence of particular UNESCO codes; and (b) researchers working simultaneously in two scientific areas do not necessarily divide their work equally between them; instead, they work mainly in a single domain. To resolve ties (i.e. respondents belonging to more than one domain), we developed a model based on the UNESCO codes to predict which domain each researcher belongs to. We started with respondents who selected UNESCO codes in both Natural and Exact Sciences and Technological Sciences. Taking into account the UNESCO codes selected by respondents in NE only or in TS only, we developed a model to predict the domain that best fitted each respondent’s profile. The coefficients of this logistic regression model were estimated using only respondents who belonged to a single domain.

$$P\left(Y = \text{Domain 1} \mid \text{Unesco}_{11}, \ldots, \text{Unesco}_{99}\right) = \frac{1}{1 + e^{-\sum_{i} b_{i}\,\text{Unesco}_{i}}}$$
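To make the formula concrete, the following Python sketch evaluates it for one respondent. The code labels and coefficient values are hypothetical, and assigning a respondent to Domain 1 when the probability is at least 0.5 is our assumption about the cut-off. As in the formula, no intercept term is included.

```python
import math


def prob_domain1(codes: dict[str, float], coefs: dict[str, float]) -> float:
    """P(Y = Domain 1 | codes) = 1 / (1 + exp(-sum_i b_i * Unesco_i)).

    `codes` maps a UNESCO code to the respondent's value (1 if selected);
    codes not present count as 0. No intercept, matching the formula.
    """
    z = sum(b * codes.get(code, 0.0) for code, b in coefs.items())
    # Numerically stable logistic function (avoids overflow for large |z|)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical coefficients and a respondent with codes in both domains
coefs = {"code_a": 1.2, "code_b": 0.8, "code_c": -1.5}
respondent = {"code_b": 1.0, "code_c": 1.0}
p = prob_domain1(respondent, coefs)
# Assumed cut-off: 0.5
domain = "Domain 1" if p >= 0.5 else "Domain 2"
print(round(p, 3), domain)
```

Here the linear predictor is 0.8 − 1.5 = −0.7, so the respondent is assigned to Domain 2.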

To estimate the parameters and evaluate the predictive model, we used only the sample without ties and then applied the model to the rest of the sample (i.e. researchers with codes belonging to more than one domain). We used only UNESCO codes with non-zero variance (σ > 0). To address multicollinearity among the codes, the data matrix was reduced by factor analysis without rotation, as this technique yields orthogonal factors. The predictive capacity of this model is shown in Table 9. The model correctly classified 99.6% of cases, indicating excellent predictive capacity.
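The steps above can be sketched end to end with NumPy. This is an illustration under assumptions, not the procedure actually run: principal components from a singular value decomposition stand in for the unrotated factor extraction, plain batch gradient descent stands in for the statistical package's fitting routine, and no intercept is fitted (matching the formula above). The function name, learning rate, number of steps and the toy data in the test are hypothetical. The returned share of correctly classified single-domain respondents is analogous to the percentage reported in the classification table.

```python
import numpy as np


def classify_ties(x_single, y_single, x_ties, n_factors=2, lr=0.5, steps=2000):
    """Fit on single-domain respondents, then assign respondents with ties.

    x_single: (n, codes) 0/1 UNESCO-code indicators, single-domain respondents
    y_single: (n,) 1 = Domain 1 (e.g. NE), 0 = Domain 2 (e.g. TS)
    x_ties:   (m, codes) indicators for respondents with codes in both domains
    Returns (0/1 labels for x_ties, share of x_single correctly classified).
    """
    # Keep only codes with non-zero standard deviation in the fitting sample
    keep = x_single.std(axis=0) > 0
    xs, xt = x_single[:, keep], x_ties[:, keep]
    # Assumption: unrotated principal components (orthogonal) via SVD
    mean = xs.mean(axis=0)
    _, _, vt = np.linalg.svd(xs - mean, full_matrices=False)
    loadings = vt[:n_factors].T
    fs, ft = (xs - mean) @ loadings, (xt - mean) @ loadings
    # Logistic regression without intercept, fitted by batch gradient descent
    b = np.zeros(fs.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-fs @ b))
        b -= lr * fs.T @ (p - y_single) / len(y_single)
    fitted = 1.0 / (1.0 + np.exp(-fs @ b)) >= 0.5
    accuracy = float(np.mean(fitted == (y_single == 1)))
    p_ties = 1.0 / (1.0 + np.exp(-ft @ b))
    return (p_ties >= 0.5).astype(int), accuracy
```

Fitting only on unambiguous respondents and then scoring the ties mirrors the two-stage logic described in the text; a production analysis would use a dedicated statistics library rather than hand-written gradient descent.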

Table 9 Classification table$^{a}$

Appendix 4: factor analyses: model summary

See Tables 10, 11, 12, 13 and 14.

Table 10 Total variance explained
Table 11 Communalities
Table 12 Kaiser–Meyer–Olkin and Bartlett’s test
Table 13 Correlation matrix for motivations to publish in English$^{a,b}$
Table 14 Correlation matrix for motivations to publish in Spanish$^{a,b}$

