Introduction

In 2009, an estimated 194,280 new cases of breast cancer were diagnosed in the United States; breast cancer was estimated to account for 27% of all new cancer cases and 15% of cancer-related mortality in women [1]. Similarly, in Europe in 2008, the disease was reckoned to account for some 28% and 17% of new cancer cases and cancer-related mortality in women, respectively [2].

The last 50 years have seen an exponential increase in scientific output generally, and particularly in oncology; a recent report demonstrated that in January 2009 alone, 11,215 new cancer-related papers and 1,220 review articles were indexed in PubMed [3]. The importance of quantitative and qualitative assessment of scientific output has increased in tandem with this information explosion, and such assessments now play an integral role in decisions regarding grant funding and the prioritisation of resources, as exemplified by the Research Assessment Exercise in the UK [4]. Despite the disease burden described above, relatively little effort has been made to understand trends in the breast cancer literature. While there has been some focus on the bibliometrics of cancer research generally [5, 6], just three publications have evaluated breast cancer-related output specifically: Dalpe et al. focused on the identification of BRCA1 and BRCA2 in the 1990s [7], Donato et al. analysed the Portuguese contribution [8], and Li and McCain examined the development of research themes in the radiological detection of breast cancer [9]. The primary aim of the present work was therefore to provide an in-depth evaluation of research output on breast cancer from 1945 to 2008, using large-scale data analysis, bibliometric indicators of production and quality, and density-equalizing mapping.

Materials and methods

Data source

Data were retrieved from the Web of Science (WOS) Science Citation Index Expanded database (SCI-Expanded) produced by Thomson Reuters. To approximate the overall number of published items on breast cancer, the following search strategy was employed: TS = ((phyllodes tumo$r$) OR (Cystosarcoma Phyllo$des) OR (Malignant Cystosarcoma Phyllodes) OR (breast invasive ductal carcinoma) OR (infiltrating duct carcinoma$) OR (mammary ductal carcinoma$) OR (breast cancer) OR (breast neoplasm$) OR (breast tumo$r$) OR (human mammary neoplasm$) OR (human mammary carcinoma$)), where TS = Topic search and $ = zero or one character. Because this work was designed to assess overall activity in relation to breast cancer, we did not restrict the search to particular document types, such as original articles or reviews, nor did we exclude others, such as letters and editorials. The time span analysed was 1945 to 2008 inclusive. The search was performed in November 2009; 2009 was therefore excluded, as database entries for that year would not have been complete at the time of the search.
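To illustrate how the `$` wildcard widens each term (for example, `tumo$r$` covers tumor, tumour, tumors and tumours), the sketch below translates a single search term into a Python regular expression. It is a rough emulation under stated assumptions, not the database's own matching logic: it assumes `$` means zero or one character, treats a multi-word term as an exact phrase (the database's handling of unquoted multi-word terms may differ), and the function name `wos_term_to_regex` is hypothetical.

```python
import re

def wos_term_to_regex(term: str) -> re.Pattern:
    """Translate a WOS-style search term into a case-insensitive regex.

    Assumptions (approximate emulation only): '$' = zero or one character,
    '*' = any run of characters, '?' = exactly one character; everything
    else is literal, and the whole term is matched as a phrase.
    """
    parts = []
    for ch in term:
        if ch == "$":
            parts.append(r"\w?")      # zero or one word character
        elif ch == "*":
            parts.append(r"\w*")      # any run of word characters
        elif ch == "?":
            parts.append(r"\w")       # exactly one word character
        else:
            parts.append(re.escape(ch))
    # Word boundaries approximate whole-word matching.
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)

pattern = wos_term_to_regex("breast tumo$r$")
for title in ["Breast tumour imaging", "breast tumors in mice", "breast tissue"]:
    print(title, bool(pattern.search(title)))
# Expected: True for the first two titles, False for the third.
```

The same helper shows why `Phyllo$des` was used: it matches both the "phyllodes" and "phylloides" spellings.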

Each item of information downloaded from the WOS was contained in a 'data block', preceded by a tag indicating the content of the block (for example, AU = authors, TI = title, PY = publication year). The data were then parsed using software developed at the Charité University in Berlin: each time it encountered a tag, it read the associated data and saved them to an Access database, and the information was later transferred to an Excel database for analysis. Published items were analysed using the citation report method as described previously [10, 11]. The number of citations per year and the average number of citations per item were assessed. The latter indicates the average number of citing articles for all items in the set and is calculated as the sum of the times cited divided by the number of items retrieved.
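The tag-driven parsing and the average-citations calculation described above can be sketched as follows. This is an illustration only, not the Charité software: it assumes a field-tagged plain-text layout (two-character tag, a space, then the value; continuation lines indented by three spaces; `ER` closing a record), assumes the times-cited count sits in a `TC` field, and uses two invented records. The function names are hypothetical.

```python
from collections import defaultdict

def parse_tagged_records(text: str) -> list[dict[str, list[str]]]:
    """Split a field-tagged export into records mapping tag -> list of values.

    Assumed layout: a two-character tag at the start of a line, a space,
    then the value; continuation lines start with three spaces; 'ER'
    closes a record.
    """
    records: list[dict[str, list[str]]] = []
    current = defaultdict(list)
    tag = None
    for line in text.splitlines():
        if not line.strip():
            continue                                # skip blank separators
        if line.startswith("   "):
            if tag is not None:
                current[tag].append(line.strip())   # continuation of last tag
            continue
        tag, value = line[:2], line[3:].strip()
        if tag == "ER":                             # end of one record
            records.append(dict(current))
            current, tag = defaultdict(list), None
        elif tag in ("FN", "VR", "EF"):             # file-level header/footer
            tag = None
        else:
            current[tag].append(value)
    return records

def average_citations_per_item(records: list[dict[str, list[str]]]) -> float:
    """Sum of times cited (assumed 'TC' field) divided by the number of items."""
    if not records:
        return 0.0
    total = sum(int(r.get("TC", ["0"])[0]) for r in records)
    return total / len(records)

# Two invented records, for illustration only.
sample = """FN Example export
VR 1.0
PT J
AU Smith, J
   Jones, K
TI An example title
PY 1957
TC 120
ER

PT J
AU Lee, A
TI Another example title
PY 2008
TC 4
ER
EF"""

records = parse_tagged_records(sample)
print(len(records), average_citations_per_item(records))  # 2 62.0
```

Keeping every value as a list means multi-author fields (one author per continuation line) need no special handling.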

Mapping was performed as described by Groneberg-Kloft et al. in 2008 [12]. The area of each nation that had contributed output was rescaled in proportion to the variable under study, for example the total number of items it had published on breast cancer or its average number of citations per item. The calculations were based on the algorithm published by Gastner and Newman in 2004 [13], which employs a diffusion equation borrowed from elementary physics, solved in the Fourier domain, and allows variable resolution by tracking moving boundaries [13, 14].
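As we understand Gastner and Newman's diffusion approach [13], the variable of interest is first expressed as a density $\rho(\mathbf{r},0)$ per unit area, for example items published per unit of a country's area, with the area outside all countries set to the mean density. The density is then allowed to diffuse until it is uniform, and every point on the map is carried along by the resulting flow. The summary below uses a unit diffusion constant and assumes a periodic (or suitably reflected) domain for the Fourier solution. It is a sketch of the principle, not the exact implementation used here.

```latex
% Diffusion of the density field (diffusion constant set to 1)
\frac{\partial \rho(\mathbf{r},t)}{\partial t} = \nabla^{2}\rho(\mathbf{r},t)

% Solution in the Fourier domain: each wavevector k decays independently
\tilde{\rho}(\mathbf{k},t) = \tilde{\rho}(\mathbf{k},0)\, e^{-\lvert\mathbf{k}\rvert^{2} t}

% Velocity field implied by the flux J = -\nabla\rho = \rho\,\mathbf{v}
\mathbf{v}(\mathbf{r},t) = -\frac{\nabla\rho(\mathbf{r},t)}{\rho(\mathbf{r},t)}

% Each boundary point is displaced by integrating the velocity field
\mathbf{r}(t) = \mathbf{r}(0) + \int_{0}^{t} \mathbf{v}\big(\mathbf{r}(t'),t'\big)\,dt'
```

As $t \to \infty$ the density becomes uniform, so the cumulative displacement $\mathbf{r}(\infty)$ yields a map on which each country's area is proportional to its share of the variable.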

Cooperation analysis was employed to determine bilateral and multilateral cooperation between countries on breast cancer research. A cooperation network was computed by checking all combinations of countries that had registered international cooperation on at least 25 items over the study period. These data were saved to a "matrix", or two-dimensional table, which the software then read to produce a density-equalising map representing them graphically. The threshold of 25 items was set to improve readability.
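The pairwise counting and thresholding step can be sketched as below. Two assumptions are not stated in the text. First, each item is taken to be represented by a list of contributing countries. Second, every pair of countries on a multilateral item is counted towards that pair, so a trilateral item contributes to three pairs. The function name and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooperation_counts(item_countries, threshold=25):
    """Count country pairs co-occurring on items, keeping pairs >= threshold.

    item_countries: iterable of per-item country lists (hypothetical input).
    Returns (pairs_at_or_above_threshold, items_by_number_of_countries),
    where a key of 2 means bilateral, 3 trilateral, and so on.
    """
    pairs = Counter()
    by_size = Counter()
    for countries in item_countries:
        unique = sorted(set(countries))
        if len(unique) < 2:
            continue                      # single-country item: no cooperation
        by_size[len(unique)] += 1
        for a, b in combinations(unique, 2):
            pairs[(a, b)] += 1            # every pair on the item is counted
    kept = {pair: n for pair, n in pairs.items() if n >= threshold}
    return kept, by_size

# Toy data with a low threshold so that something survives the filter.
items = [["USA", "Canada"], ["USA", "Canada", "UK"], ["USA", "UK"],
         ["Germany"], ["Canada", "USA"]]
kept, by_size = cooperation_counts(items, threshold=2)
print(kept)     # {('Canada', 'USA'): 3, ('UK', 'USA'): 2}
print(by_size)  # Counter({2: 3, 3: 1})
```

The `kept` dictionary is effectively the upper triangle of the country-by-country matrix, restricted to partnerships above the readability threshold.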

Journals that had published on breast cancer were analysed with respect to both the Journal Impact Factor (IF) and the recently developed Eigenfactor (EF). The IF is a ratio of two elements: the numerator is the number of citations in the current year to items published in the previous two years, and the denominator is the number of substantive articles and reviews published in the same two years [15]. The EF is calculated using a more complex algorithm that takes into account not only the quantity of citations but also their "quality", by weighting each citation according to its source. Full details of the algorithm are available online [16].
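The contrast between the two indicators can be sketched as follows. The IF is a simple ratio. The EF idea of weighting citations by their source is illustrated here with a generic PageRank-style power iteration over a cross-citation matrix. This is explicitly a simplified stand-in and not the published Eigenfactor algorithm, which differs in details such as its five-year citation window, how it redistributes weight, and its final scaling [16]. All function names and numbers below are hypothetical.

```python
def impact_factor(citations_this_year: int, citable_items_prev_two_years: int) -> float:
    """IF = citations this year to items from the previous two years,
    divided by the substantive articles and reviews from those two years."""
    return citations_this_year / citable_items_prev_two_years

def eigenvector_weights(cites: list[list[int]], damping: float = 0.85,
                        iterations: int = 100) -> list[float]:
    """Simplified PageRank-style journal weights (not the official EF).

    cites[i][j] is the number of citations from journal j to journal i.
    Self-citations (i == j) are ignored. Each journal passes its current
    weight to the journals it cites, in proportion to how often it cites
    them, so a citation from a highly weighted journal counts for more.
    """
    n = len(cites)
    # Total outgoing (non-self) citations of each citing journal j.
    out = [sum(cites[i][j] for i in range(n) if i != j) for j in range(n)]
    w = [1.0 / n] * n
    for _ in range(iterations):
        # Weight held by journals that cite nothing is spread evenly.
        dangling = sum(w[j] for j in range(n) if out[j] == 0)
        new = []
        for i in range(n):
            flow = sum(w[j] * cites[i][j] / out[j]
                       for j in range(n) if j != i and out[j] > 0)
            new.append(damping * (flow + dangling / n) + (1 - damping) / n)
        w = new
    return w  # weights sum to 1

print(impact_factor(1000, 400))  # 2.5
cites = [[0, 10, 10],   # citations received by journal 0 from journals 1 and 2
         [1, 0, 5],     # received by journal 1
         [1, 0, 0]]     # received by journal 2
print([round(x, 3) for x in eigenvector_weights(cites)])
```

In the toy matrix, journal 0 receives most of its citations from the other two journals and ends up with the largest weight. Journal 1's weight depends on the weight of journal 2, which cites it, and not merely on how many citations it receives.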

Results

Total number of published items

The number of published items on breast cancer was employed as an index of research productivity. During the period 1945 to 2008 (1974 excluded, n = 352), a total of 180,126 items on this topic were catalogued in the WOS. The earliest catalogued items were published in 1945 (n = 17), although output did not begin to increase substantially year on year until 1990 (Figure 1). Output more than doubled from 1990 (n = 1,436) to 1992 (n = 3,342). The greatest output for any single year was in 2008 (n = 17,413).
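The growth figures above can be checked with two simple summaries, using the yearly counts reported in this section. The first is the fold change between two years. The second is the compound annual growth rate, meaning the constant yearly rate that would carry the 1945 count to the 2008 count. This endpoint-based rate is only one way to summarise growth. It can differ considerably from an arithmetic mean of individual year-on-year changes, which is sensitive to the very small counts of the early years. The helper names are hypothetical.

```python
def fold_change(start: float, end: float) -> float:
    """How many times larger `end` is than `start`."""
    return end / start

def compound_annual_growth(start: float, end: float, years: int) -> float:
    """Constant yearly rate r such that start * (1 + r) ** years == end."""
    return (end / start) ** (1 / years) - 1

print(round(fold_change(1436, 3342), 2))                          # 1990 -> 1992: 2.33
print(round(compound_annual_growth(17, 17413, 2008 - 1945), 3))   # 0.116
```

Because only the endpoint years enter the compound rate, the missing 1974 data do not affect it.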

Figure 1

Total breast cancer-related output and associated citations, Web of Science, Science Citation Index Expanded database, 1945-2008.

Total number of citations

The 180,126 indexed items have been cited 4,136,224 times since 1945. Figure 1 demonstrates that the number of citations increased in parallel with the number of published items. Items published in 2001 received more citations than those published in any other year (n = 274,601). The average number of citations per item was greatest in 1957, however, when 40 items received 2,767 citations, an average of 69.01 citations per item. The average number of citations per item has followed a downward trend since the millennium.

Country of origin

A total of 155 different countries contributed to the literature on breast cancer over the study period. The United States was responsible for the greatest output, with 77,101 items. Other high-output countries included the United Kingdom (n = 18,357), Germany (n = 12,529), Italy (n = 10,828) and Japan (n = 10,109) (Table 1). Density-equalizing mapping of this dataset demonstrates that a relatively small number of countries were responsible for most of the output (Figure 2). The Gambia had the highest average citation rate per item (67.67), followed by Kenya (40.69) and Costa Rica (39.53) (Table 1). When the analysis was confined to countries that had produced at least 30 items, however, those with the highest average citation rate per item were Iceland (56.62), Finland (35.48), Denmark (32.88) and Switzerland (31.85) (Figure 3).
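The effect of the 30-item threshold can be sketched as follows. Without it, a country with only a handful of highly cited items can top a ranking by average citations per item. The input format `{country: (items, citations)}`, the function name and the figures for "Country A/B/C" are hypothetical and chosen purely for illustration.

```python
def rank_by_mean_citations(stats: dict[str, tuple[int, int]],
                           min_items: int = 30) -> list[tuple[str, float]]:
    """Rank countries by average citations per item, best first.

    stats maps country -> (number of items, total citations). Countries
    with fewer than `min_items` items are excluded before ranking.
    """
    eligible = {country: cites / items
                for country, (items, cites) in stats.items()
                if items >= min_items}
    return sorted(eligible.items(), key=lambda kv: kv[1], reverse=True)

stats = {"Country A": (3, 203),        # very few items, high mean
         "Country B": (200, 11000),
         "Country C": (1000, 35000)}
print(rank_by_mean_citations(stats))               # [('Country B', 55.0), ('Country C', 35.0)]
print(rank_by_mean_citations(stats, min_items=1)[0][0])  # Country A
```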

Table 1 Leading countries by output and average citations per item, 1945-2008
Figure 2

Density-equalizing calculations, total output by country. Illustration of the total number of breast cancer-related items per country. The size of each country has been scaled in proportion to its total number of publications. A colour-coded system shows the publication numbers.

Figure 3

Density-equalizing calculations, research quality. Illustration of the average number of citations per breast cancer-related item per country. The size of each country has been scaled in proportion to its average number of citations per item. A colour-coded system shows the average number of citations per item. Countries with fewer than 30 published items are excluded.

Cooperation analysis was employed to assess bilateral and multilateral cooperation from 1973 to 2008; the first item in the dataset produced through international cooperation was published in 1973. In total, 142 different countries had collaborated on at least one published item. International cooperation increased steadily throughout the study period, peaking in 2008 with 3,127 items produced through cooperation. Bilateral cooperation was the most common form (n = 19,437), followed by trilateral (n = 3,157) and quadrilateral cooperation (n = 836). Cooperation between the United States and Canada was the most common bilateral partnership (n = 2,223), followed by that between the United States and the United Kingdom (n = 2,007) (Figure 4). All of the 10 most common bilateral partnerships involved the United States (Table 2).

Figure 4

Radar chart of cooperation density. Threshold >25 international cooperative partnerships.

Table 2 Top 25 collaborating relationships, breast cancer-related items, 1949-2008

Publishing journals

A total of 4,096 journals had published at least one item on breast cancer. The journals that have published most prolifically on breast cancer, led by Cancer Research (5,290 items), are listed in Table 3. The 50 most prolific titles, representing just 1.2% of all contributing journals, together accounted for over 43% (77,517/180,126) of the total output. Thirty of these 50 titles belonged to the Journal Citation Reports (JCR) category 'Oncology'. Other represented subject categories included 'Surgery' (n = 5), 'Pathology' (n = 4), 'Radiology, Nuclear Medicine and Medical Imaging' (n = 4), 'Biochemistry and Molecular Biology' (n = 3) and 'Medicine, General and Internal' (n = 3). The median IF and EF of these titles were 4.73 and 0.05, respectively. Cancer Research also recorded the highest number of citations overall (n = 309,568), followed by the Journal of Clinical Oncology (n = 177,189), Cancer (n = 166,834), the Journal of the National Cancer Institute (JNCI) (n = 131,637) and the British Journal of Cancer (n = 110,307) (Table 3).
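The concentration figures above follow from simple ratios. The sketch below defines a hypothetical helper that computes the share of output produced by the k most prolific journals from a list of per-journal item counts. It then checks the two percentages reported in this section directly from the stated totals.

```python
def top_k_share(items_per_journal: list[int], k: int) -> float:
    """Fraction of all items published by the k most prolific journals."""
    counts = sorted(items_per_journal, reverse=True)
    return sum(counts[:k]) / sum(counts)

# Check against the totals reported above.
print(f"{77_517 / 180_126:.1%} of items")   # 43.0% of items
print(f"{50 / 4_096:.1%} of journals")      # 1.2% of journals
```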

Table 3 Leading titles, breast cancer-related items, 1945 to 2008

Discussion

In his seminal work on the exponential growth of science, Little Science, Big Science, Price noted in 1963 that all of the scientific periodicals founded since the first, the Journal des Sçavans (first published in 1665), had together produced a world total of six million scientific papers over the preceding 300 years [17]. By contrast, Druss demonstrated that in just 23 years, from 1978 to 2001, a total of 8.1 million articles were published in MEDLINE [18]. The present analysis demonstrates this growth in breast cancer research specifically, with an average 15% increase in output annually since 1945 and a greater than 100% increase since the millennium alone. This compares with a recent analysis of total scientific output in PubMed, which estimated an average growth rate of 4% per year between 1957 and 2007 [4].

This analysis has employed the citation count as a proxy measure of research quality. Forming an essential component of the dialogue of medical research [19], citations are regarded as a key indicator of the relevance and importance of a published item. We have shown a parallel increase in citation count with the number of breast cancer-related items, an unsurprising finding recently mirrored in analyses of scientific output on scoliosis [20] and asthma [10]. The average number of citations per item was highest in 1957, although this was largely owing to the citation classic by Bloom and Richardson, in which they outlined their system for the histological grading of breast cancer and its association with prognosis [21]; it has since been cited 2,259 times. To put this figure into perspective, Garfield noted in 2006 that of 38 million items cited from 1900 to 2005, only 0.5% were cited more than 200 times [15]. Although the average number of citations per item has tended to decrease since the mid-1990s, it is difficult to draw firm conclusions from this finding. It may be explained by the sharp increase in the number of items published in the intervening years, or by the time lag inherent in citation analysis, which biases counts towards older publications.

This analysis has demonstrated the leading role that the United States plays in breast cancer research, a finding previously noted in other scientific disciplines [22, 23]. This is not surprising given the enormous sums spent on the management of breast cancer there each year. It has been estimated that the new cases of breast cancer diagnosed worldwide in 2009 alone will cost some $28 billion, of which $16 billion will be spent in the United States [24]. In addition to being the single largest contributor to the literature on breast cancer, the United States has played a key role in fostering international cooperation, in particular with its neighbour Canada, but also with many European nations, including Germany, the United Kingdom and Italy.

The large number of nations involved in breast cancer research reflects the disease's global burden. That said, the map of global production shown in Figure 2 clearly demonstrates the marked underrepresentation of South America, Africa and, to a lesser extent, Asia. Most of the predicted 26% increase in breast cancer incidence by 2020 is expected to occur in the developing world [24]. A concerted effort is therefore needed to involve these regions further in future research initiatives. Such research should focus in particular on how breast cancer can be diagnosed and managed cost-effectively, with efficacy similar to that presently achieved in Europe and the United States.

The quality of breast cancer-related output from both the United States and the United Kingdom was high, using the average citation rate per published item as a proxy measure. Several smaller countries, including Iceland, Finland, Switzerland and Denmark, also produced output of high quality, with impressive average citation rates. Interestingly, all four of these countries collaborated internationally on a high proportion of their output (Figure 4) (Iceland 110/216, 50.92%; Finland 1,045/2,334, 44.77%; Switzerland 1,741/2,989, 58.24%; Denmark 1,050/2,377, 44.17%). This may suggest that international cooperation improves the quality, and hence the citation rate, of the associated output.

Our finding that breast cancer research has been published in more than 4,000 journals reinforces the view that it is now impossible for those working in the field to appraise all of the relevant literature. Our work has, however, identified a core set of journals publishing on breast cancer, with the top 50 accounting for 43% of the total output. The median IF and EF of these titles (4.73 and 0.05) compare favourably with the median values for all 143 journals in the 2008 JCR category 'Oncology' (2.66 and 0.01, respectively), which points to the quality of output in this subject area.

There are a number of limitations to this work. Output from 1974 (n = 352, 0.2% of total output) was accidentally excluded during data collection and was therefore not included in the subsequent analysis. In addition, this study has focused on entries contained in the Web of Science only; other databases, such as PubMed and Scopus, may have yielded slightly different results. That said, the Web of Science covers the oldest publications, with archived records dating back to 1900 [25], and should therefore provide an accurate overview of output over the entire study period. Finally, while we have provided an overview of geographic output on breast cancer, we have not related our findings to underlying socio-economic and demographic variables, which would be an interesting avenue for future investigation.

Conclusions

This work represents the first comprehensive bibliometric assessment of research quantity and quality in the breast cancer literature. The results have demonstrated the ongoing expansion of that literature, while also identifying the key nations and journals involved in its production since 1945. In an era when bibliometric indicators are increasingly employed in the assessment of individual, institutional and national performance, these findings should provide useful information for those tasked with improving that performance.