1 Introduction

Two of the most commonly applied measures of professional recognition in academia are the number of articles published in recognised journals and the number of times these articles are cited by fellow researchers [23]. These numbers are frequently examined when decisions on funding, promotion and career advancement are made [7]. Publication and citation counts are also used to evaluate departments, schools and journals [19]. In this article we focus on citation counts in particular.

The widespread use of citation counts for academic evaluation rests on the assumption that researchers select references according to how relevant the sources are to their own work and how much they contribute to it, and that all important sources are cited [12]. In addition, citation counts are frequently used as a proxy for publication quality [1]. It is therefore common to assume that highly cited articles are more influential and of higher quality than less cited ones.

Nevertheless, several scholars have questioned the validity of using citation counts to evaluate research. Because space in journal articles is limited, not all sources drawn upon by researchers are cited in their work, which leaves room for secondary motives in the choice of citations [17]. In research fields such as ecology [12], crime psychology [23], medicine [5, 13] and chemistry [10], factors unrelated to scientific relevance and quality have been found to be associated with citation counts.

This research has typically made a distinction between the characteristics of the article, the characteristics of the authors who have written it, their affiliation, and the characteristics of the journal in which the article has been published [15, 21, 23].

The influence of author characteristics on citation counts has been investigated in previous research. Studies of articles in sociology [6] and astrophysics [2] find that articles written by men are cited more than articles written by women. However, citation counts of articles in ecology suggest that gender does not influence citation counts in that field [12]. Research also suggests that articles with a first author from an English-speaking country receive significantly more citations than articles written by authors from non-English-speaking countries [12], even though both groups wrote in English.

Research on the influence of article characteristics on citation counts has found that citation rates increase significantly with the number of authors [3, 4]. Also, articles that lead an issue, i.e. appear first, are generally cited more than articles appearing at the back of an issue [18]. Moreover, because long articles contain more content to cite than short ones, longer articles tend to be cited more frequently [2, 11].

This brief review of the influence of various, more or less extra-scientific, variables on citation counts draws on studies of articles published in several research fields. However, to the best of our knowledge, no similar study has been conducted on articles published in transportation journals, which leaves open the question of whether citation counts are a valid measure of scientific merit in transportation research.

Because unfair research evaluation can be a major source of frustration in a scientific community and a potential threat to the scientific enterprise [17], this article investigates how attributes related to article and author characteristics influence article citation counts, controlling for the journal and year in which the article was published. More specifically, the purpose is to assess whether factors unrelated to scientific relevance and quality influence citation counts in transportation research. The results can be useful when evaluating researchers and research departments in the field of transport.

The remainder of this article is organised as follows. Section 2 describes the data sources and the variables used in the study. Section 3 presents the model specification and the estimation results. Section 4 discusses the lessons to be learned from the findings. Lastly, Section 5 draws conclusions based on our findings.

2 The data and a priori assumptions

2.1 The data

The data used in this study are drawn from the following five transportation journals: Transportation (TR), Transportation Research Part A (PA), Transportation Research Part B (PB), Transportation Science (TS) and Journal of Transport Economics and Policy (JT). These are all internationally recognised, peer-reviewed journals, published by Springer (TR), Elsevier (PA, PB), the Institute for Operations Research and the Management Sciences (TS) and the University of Bath (JT).

We use data from Scopus, the world’s largest abstract and citation database of peer-reviewed literature (www.scopus.com), on all the articles published in the five above-mentioned journals from 1 January 2000 to 31 December 2004, i.e. over a five-year period. This period was chosen because it allows articles sufficient time to reveal their merit. The dataset includes 783 articles, which had been cited a total of 21,987 times by 27 December 2012, an average of 28.1 citations per article. Table 1 shows the relevant data for each of the five journals.

Table 1 Descriptive statistics for the journals analysed

Table 2 summarises the variables explored in this study. The dependent variable, i.e. the number of article citations (CI), refers to the number of times each article in our dataset has been cited in Scopus by December 27th 2012. The value of CI includes self-citations.

Table 2 Summary statistics of the dataset

Tables 3 and 4 show the pairwise correlation matrix of the explanatory variables. The highest correlation coefficients are between PA and PB (−0.38), between KW and PA (0.36), and between KW and JT (−0.36). The first coefficient indicates that Transportation Research Part A and Transportation Research Part B are, to some degree, substitutes. The latter two indicate that Transportation Research Part A and Journal of Transport Economics and Policy allow the most and the fewest keywords, respectively. All three coefficients are statistically significant at the 5 % level. Subsequent tests show that the correlations in Tables 3 and 4 do not cause any estimation problems for our model specification.
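The pairwise correlations and their 5 % significance test can be illustrated with a short, self-contained Python sketch. It is an assumption that the reported coefficients are ordinary Pearson correlations (for two 0/1 dummies such as PA and PB, this equals the phi coefficient). The test uses the standard t-statistic with n − 2 degrees of freedom and a large-sample critical value of about 1.96; all function names are illustrative, not taken from the study.

```python
import math

# Large-sample two-sided 5 % critical value of the t distribution (approx.).
CRITICAL_5PCT = 1.96


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences.
    For two 0/1 dummy variables this equals the phi coefficient."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length (at least 3)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t-statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def correlation_matrix(columns):
    """Pairwise correlations for a dict {variable name: list of values}."""
    names = list(columns)
    return {(a, b): pearson_r(columns[a], columns[b]) for a in names for b in names}


def significant_at_5pct(r, n):
    """Two-sided test of H0: rho = 0 using the large-sample critical value."""
    return abs(t_statistic(r, n)) > CRITICAL_5PCT
```

For example, with r = 0.36 and n = 774 (assuming the correlations were computed on the estimation sample), `t_statistic(0.36, 774)` is about 10.7, far above 1.96, so a correlation of this size is comfortably significant at the 5 % level.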

Table 3 Pairwise correlation matrix of the explanatory variables
Table 4 Pairwise correlation matrix of the explanatory variables

2.2 A priori assumptions

CI is regressed on nine independent variables: four article characteristics and five author characteristics. In addition, we use dummy variables to control for the journal and year in which each article was published.

The first article characteristic included in our model is KW, the number of keywords the authors of each article have listed. Online databases such as Scopus and Web of Science search keywords, so articles with many keywords are likely to appear in more search results than articles with few keywords. Because awareness of an article is a prerequisite for it being cited, we hypothesise a positive association between citation count and the number of keywords. The second article characteristic in our model is the number of references (RF) cited by the authors of each article. Authors of scientific articles may ask to be notified electronically when their articles are cited, so more researchers can be assumed to be notified about an article with a long reference list. We therefore expect a positive association between citation count and the number of references in an article.

The third article characteristic is the length of the article abstract (WA), measured by number of words. Online databases also search abstracts, so it is reasonable to assume that articles with long abstracts appear in more search results than articles with short abstracts. We therefore hypothesise a positive association between citation count and the number of words in the abstract. The fourth and final article characteristic is the length of the article title (WT), measured by number of words. Because online databases also search article titles, an article with a long title can be expected to appear in the results of more searches than an article with a shorter title, which suggests that a higher WT leads to more citations. However, Paiva, Lima and Paiva [16] report a negative association between title length and citation count, and shorter titles might be cited more frequently because long or confusing titles may deter further reading [22]. Consequently, it is difficult to formulate a firm a priori hypothesis about how WT influences citation counts.

The first author characteristic included in our model is the alphabetical position of the first author’s surname (AN). Tregenza [20] found that articles by authors whose surnames begin with a letter near the beginning of the alphabet are cited more frequently than articles by authors whose surnames begin with a letter near the end. We therefore expect articles whose first author’s surname begins with a letter early in the alphabet to be cited more frequently than articles whose first author’s surname begins with a letter late in the alphabet. The second author characteristic in our model is the gender of the first author (GE). Because articles written by women have been found to be less cited than articles written by men [2, 6], we expect articles by women to be less cited than articles by men. English as the national language (EN) is the third author characteristic in our model. This variable captures whether articles whose first author is from a country where English is the national language receive more citations than other articles. Because previous studies have found higher citation counts for articles written by authors from countries where English is the native language [12], we expect articles written by authors from English-speaking countries to receive more citations. The fourth author characteristic is the number of authors (NA). Articles written by more than one author may benefit from division of labour and are, other things being equal, likely to be distributed to larger research networks. We therefore expect articles written by more than one author to be cited more than single-authored articles. The fifth and final author characteristic is the number of countries (AC) the authors are from. This variable captures whether cross-border cooperation in writing scientific articles influences how often they are cited. Articles written by authors from more than one country may likewise benefit from division of labour and are, other things being equal, likely to be distributed to larger research networks than articles written by authors from one country only. The hypothesis is therefore that articles with co-authors from more than one country are cited more frequently than articles written by authors from one country only.

The single most important factor driving citations to an article is the prestige or average citation rate of the journal in which it was published [9]. To control for the prestige and average citation rate of the journals from which the articles were drawn, the five journals are represented by dummy variables (TR = 1 if the article was published in Transportation, TR = 0 otherwise, and so on), with JT serving as the reference category in Eq. 1. In addition, dummy variables (Y2000, Y2001, Y2002, Y2003) representing the publication year of the article were added to our model, with 2004 as the reference year. A reasonable assumption is that articles published in 2000, 2001, 2002 and 2003 have accumulated more citations by 2012 than articles published in 2004 simply because they are older. It is also worth noting that the values of WT, AN, GE and EN have little or nothing to do with the scientific quality of an article; hence, they can be regarded as non-scientific factors.
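The journal and year controls amount to dummy coding with one omitted category per group. Since Eq. 1 contains TR, PA, PB and TS but not JT, and Y2000 to Y2003 but not Y2004, the sketch below assumes that JT and 2004 are the reference categories. The helper `control_dummies` is hypothetical and only illustrates the coding; it is not the authors’ code.

```python
# Journal codes and publication years used in the study.
JOURNALS = ["TR", "PA", "PB", "TS", "JT"]
YEARS = [2000, 2001, 2002, 2003, 2004]

# Assumed reference categories: they have no coefficient in Eq. 1.
REFERENCE_JOURNAL = "JT"
REFERENCE_YEAR = 2004


def control_dummies(journal, year):
    """Return the journal and year dummies for one article (hypothetical helper)."""
    if journal not in JOURNALS:
        raise ValueError(f"unknown journal code: {journal!r}")
    if year not in YEARS:
        raise ValueError(f"publication year outside 2000-2004: {year}")
    row = {}
    for j in JOURNALS:
        if j != REFERENCE_JOURNAL:
            row[j] = 1 if journal == j else 0
    for y in YEARS:
        if y != REFERENCE_YEAR:
            row[f"Y{y}"] = 1 if year == y else 0
    return row


print(control_dummies("PA", 2002))
# {'TR': 0, 'PA': 1, 'PB': 0, 'TS': 0, 'Y2000': 0, 'Y2001': 0, 'Y2002': 1, 'Y2003': 0}
```

Because a JT article from 2004 has all eight dummies equal to zero, each journal (year) coefficient measures the expected difference in citations relative to a JT article (a 2004 article), holding the other variables fixed.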

3 Model specification and estimation results

3.1 The model

The following model is used to investigate how attributes related to article and author characteristics influence article citation counts, controlling for publication journal and publication year:

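$$\mathit{CI} = \beta_{KW}\mathit{KW} + \beta_{RF}\mathit{RF} + \beta_{WA}\mathit{WA} + \beta_{WT}\mathit{WT} + \beta_{AN}\mathit{AN} + \beta_{GE}\mathit{GE} + \beta_{EN}\mathit{EN} + \beta_{NA}\mathit{NA} + \beta_{AC}\mathit{AC} + \beta_{TR}\mathit{TR} + \beta_{PA}\mathit{PA} + \beta_{PB}\mathit{PB} + \beta_{TS}\mathit{TS} + \beta_{2000} Y_{2000} + \beta_{2001} Y_{2001} + \beta_{2002} Y_{2002} + \beta_{2003} Y_{2003} + \varepsilon \tag{1}$$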

where $\varepsilon$ is a random error term with constant variance and an expected value of zero. The analysis is conducted at the article level, and it follows from Eq. 1 that $\partial \mathit{CI} / \partial j = \beta_j$ for $j \in \{\mathit{KW}, \mathit{RF}, \mathit{WA}, \ldots\}$. Hence, the marginal change in the dependent variable (CI) for a one-unit change in $j$ is $\beta_j$.

It follows from the hypotheses presented earlier, and from previous research, that, all other things being equal, an article whose first author’s surname begins with a letter late in the alphabet is expected to be less cited than an article whose first author’s surname begins with a letter early in the alphabet. Moreover, we assume, other things being equal, that articles written by women are less cited than articles written by men. Hence, we assume that $\beta_{AN}, \beta_{GE} < 0$. The earlier discussion suggests that the sign of $\beta_{WT}$ is uncertain. The remaining independent variables are assumed to have positive effects on the number of citations; that is, we assume $\beta_{KW}, \beta_{RF}, \beta_{WA}, \beta_{EN}, \beta_{NA}, \beta_{AC}, \beta_{TR}, \beta_{PA}, \beta_{PB}, \beta_{TS}, \beta_{2000}, \beta_{2001}, \beta_{2002}, \beta_{2003} > 0$. As such, all the independent variables except the alphabetical position of the first author’s surname (AN), the gender of the first author (GE) and the title length (WT), whose sign is left open, are assumed to influence the number of citations positively.

3.2 Model estimates

Table 5 reports the OLS estimates of Eq. 1 for 774 articles published between 2000 and 2004 in the five transportation journals. The F-statistic shows that the model is significant at the 1 % level. The coefficient of determination (R²) is 0.14, indicating that the model explains 14 % of the variance in citation counts across the 774 transportation articles. All t-statistics of the coefficients are calculated using White [24] robust standard errors to correct for heteroscedasticity. To check for multicollinearity, we estimated the variance inflation factors (VIFs). The VIFs in our model range from 1.02 to 3.47, with an average of 1.72, well below 10, the level at which researchers would begin to worry about multicollinearity [8].
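The estimation steps described above (OLS, White heteroscedasticity-robust standard errors and VIFs) can be sketched in plain Python. This is a minimal illustration, not the authors’ code, and all function names are hypothetical. It assumes the regressors are already assembled into a list of rows, uses the HC0 form of White’s estimator (statistical packages often default to a small-sample-adjusted variant such as HC1, so reported standard errors may differ slightly), and computes each VIF from an auxiliary regression that includes a constant.

```python
import math


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-12:
            raise ValueError("singular X'X: regressors are perfectly collinear")
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def ols(X, y):
    """OLS estimates and residuals. X is a list of rows; add a column of ones
    yourself if a constant term is wanted."""
    xtx_inv = inverse(matmul(transpose(X), X))
    xty = [sum(row[k] * yi for row, yi in zip(X, y)) for k in range(len(X[0]))]
    beta = [sum(a * b for a, b in zip(r, xty)) for r in xtx_inv]
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    return beta, resid, xtx_inv


def ols_white(X, y):
    """OLS with White (HC0) heteroscedasticity-robust standard errors:
    Var(beta) = (X'X)^-1 [sum_i e_i^2 x_i x_i'] (X'X)^-1."""
    beta, resid, xtx_inv = ols(X, y)
    k = len(beta)
    meat = [[sum(e * e * row[a] * row[b] for row, e in zip(X, resid))
             for b in range(k)] for a in range(k)]
    cov = matmul(matmul(xtx_inv, meat), xtx_inv)
    # Clamp tiny negative values caused by floating-point rounding.
    se = [math.sqrt(max(cov[i][i], 0.0)) for i in range(k)]
    return beta, se


def r_squared(y, resid):
    mean_y = sum(y) / len(y)
    sst = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - sum(e * e for e in resid) / sst


def vif(columns):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 is from regressing column j on
    all other columns plus a constant."""
    result = []
    for j, target in enumerate(columns):
        others = [c for i, c in enumerate(columns) if i != j]
        X = [[1.0] + [c[r] for c in others] for r in range(len(target))]
        _, resid, _ = ols(X, target)
        result.append(1.0 / (1.0 - r_squared(target, resid)))
    return result
```

A t-statistic is then `beta[i] / se[i]`. Since VIF = 1/(1 − R²), the largest reported VIF of 3.47 corresponds to an auxiliary R² of roughly 0.71, while the rule-of-thumb threshold of 10 corresponds to R² = 0.9. In practice an established statistics library would be used rather than hand-rolled matrix code.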

Table 5 Regression analysis (dependent variable: citations per article, n = 774, R² = 0.14, F-statistic: 6.33**)

The estimates show that the coefficients of all author and article characteristics included in the model, except GE, have the hypothesised signs. Three of the characteristics (RF, WT, AC) are statistically significant at the 1 % level or better. Based on these results, the following conclusions can be drawn.

Firstly, two of the four explanatory variables representing article characteristics were found to be significantly associated with citation counts: the number of references (RF) is positively associated with the number of times the articles in our study have been cited, and the title length (WT) is negatively associated with CI.

The regression coefficient of RF is 0.48 (p < 0.01). Thus, each additional reference in the articles studied here is associated with an increase of 0.48 in the number of times the article is cited, other things being equal. This finding, i.e. that the reference count is positively associated with the citation count, should not be interpreted as an indication that researchers look at the number of references cited by the authors of an article before deciding to cite the article itself. It is more likely that the reference count is an indicator of the quality of the article, i.e. that high-quality articles have more references. The rationale for this argument is that an extensive literature review, which increases the reference count, implies that the article is built on a firm theoretical basis. Also, some authors are automatically notified of articles in which they are cited, so more authors can be assumed to be notified about articles with long reference lists than about articles with short ones. Finally, a high number of references may also indicate that the research community is interested in the research topic of the article, increasing the probability that fellow researchers will cite it.

The regression coefficient of WT is −0.73 (p < 0.01). Thus, each additional word in the title of the articles studied here is associated with a reduction of 0.73 in the number of times the article is cited, other things being equal.
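As a purely illustrative calculation, consider two hypothetical articles (not drawn from the dataset) that are identical except that one has ten more references and a title three words shorter. Because Eq. 1 is linear and additive, the estimated coefficients of RF and WT imply the following expected difference in citations:

$$\Delta \mathit{CI} = \beta_{RF}\,\Delta \mathit{RF} + \beta_{WT}\,\Delta \mathit{WT} = 0.48 \times 10 + (-0.73) \times (-3) = 4.80 + 2.19 = 6.99$$

That is roughly seven additional citations, compared with an average of 28.1 citations per article in the full dataset.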

Secondly, of the five author characteristics regressed against citation counts (AN, GE, EN, NA, AC), only the dummy variable for international cooperation (AC) is found to be significantly associated with the citation counts of the articles studied. The regression coefficient of AC is 8.86 (p < 0.01). The finding that articles resulting from international cooperation are more frequently cited may be due to the following factors. (1) Network effect: knowledge about an article can be expected to spread faster, and to a wider audience of researchers, when it has been written by authors from more than one country, because the authors of such articles are more likely to belong, or have access, to separate research networks than authors from a single country. White [25] and Mählck and Persson [14] report that researchers tend to cite articles written by researchers with whom they are personally acquainted. (2) Complementary competencies: international cooperation may provide access to highly competent researchers, resulting in high-quality research that is cited more frequently. (3) Quality projects: because the start-up cost of an international research project is likely to exceed that of a national project, such projects can be expected to be initiated only when a particularly good idea exists, and research projects based on good ideas may in turn be expected to generate more citations. (4) Wider interest: an author will probably not participate in a research project without an interest in its topic. It is reasonable to assume that some research topics are the domain of only some countries, so co-authorship across countries indicates that an international research community interested in the topic exists. Such international research communities can be expected to have more members than national ones; as such, articles written by authors from more than one country can be expected to be cited more frequently than articles by authors from just one country.

4 Lessons to be learned

The most important lesson to be drawn by researchers in transportation from our findings is that international cooperation does pay off in the form of higher citation counts. Articles written by researchers from more than one country can be expected to be cited more than articles written by authors from just one country. This effect can probably be attributed to complementary effects, i.e. to team members from the different countries participating in the project possessing complementary competencies. Researchers should therefore choose prospective international partners carefully, based on whether they can contribute knowledge and competencies not already present in the project group.

Another important lesson researchers can draw from our findings is that they can expect to be cited more if they give their article a short, concise title that encourages readers to read the full article. Finally, articles with extensive literature reviews can, other things being equal, be expected to receive more citations than articles with short reference lists. Consequently, researchers are advised to take the time and effort needed to conduct an extensive literature review, even though this may not necessarily improve the actual quality of the article.

Based on our findings, funders of research in transportation should encourage international cooperation when they announce new funding, and thus favour research projects with international participation. The rationale is that, in line with the arguments above, research projects with participants from more than one country can be expected to produce higher-quality research, which will therefore also be cited more frequently. Funders should, however, also be aware that factors unrelated to scientific relevance and quality influence the number of times research articles are cited. Consequently, they should be careful not to treat citation counts as anything more than a crude measure of research quality, and should not make them a decisive factor when allocating research funds.

Editors can learn from our findings that their journal is likely to achieve a higher impact factor if the articles it publishes have titles that are easy to understand and short enough not to tire readers. Shorter titles will not in themselves increase the quality of the articles published in the journal, but because the impact factor is often interpreted as a measure of journal quality, a higher impact factor may improve the perceived quality of the journal. This may in turn persuade authors to submit higher-quality articles to the journal concerned. Consequently, by implementing strategies that improve article titles, an editor may indirectly increase the quality of the articles published in the journal he or she edits.

Editors can also learn from our findings that the impact factor of the journal they edit can be influenced indirectly by the thoroughness of the literature reviews in the articles they publish. Greater emphasis on literature reviews can be expected to lead to more citations and a higher impact factor, in turn bringing greater prestige to the editor. More importantly, it will also increase the probability that the theoretical foundation of the published articles is at the frontier of research, making these articles more interesting to read and more likely to be cited by fellow researchers. Editors should therefore ask reviewers to place great emphasis on the thoroughness of the literature reviews in articles being considered for publication.

In sum, the implications of the research presented in this article can help transportation researchers produce articles that are cited more frequently, thereby enhancing their professional standing. By implementing the suggestions in this article, funders of transportation research may find that the research they fund is more relevant to the research community and thus has a greater impact on the development of research. By following the suggestions proposed in this article, editors of transportation journals can improve the quality of the articles published in their journals, make the journals more relevant to the research community and, not least, improve their journals’ professional standing.

5 Conclusion

The main objective of this article has been to investigate how attributes related to article and author characteristics influence article citation counts. The purpose was to assess whether factors unrelated to scientific relevance and quality influence citation counts in these particular journals. Controlling for publication year and the journal in which the articles were published, two factors were found to be positively associated with citation counts: (1) the number of references cited in each article and (2) whether the article was written by authors from more than one country. Moreover, articles with short titles were found to be cited more than articles with longer titles.

Finally, it should be remembered that our results, like those of all empirical studies, have weaknesses. First, the articles analysed in our study were published within a relatively narrow timeframe of five years, and it could be argued that the results would have been different had articles published in another timeframe been analysed. Second, only articles published in a limited number of journals were analysed, and these journals are probably not representative of all transportation journals. Third, our analysis did not include any measure of the quality of the research design employed in the articles. Because high-quality research designs produce more trustworthy results, studies using them are likely to be cited more frequently. Last, our model contains no explanatory variables that capture the academic merits of the authors.

Despite the above limitations, to the best of our knowledge this paper is the first to investigate the influence of article and author characteristics on citation counts in transportation research. Researchers, funders and editors of transportation research can use these findings to maximise the impact of their research.