Introduction

Any reasonable observer might assume that the 100 most cited articles in surgery represent classic, landmark, or marquee communications. Certainly, the number of times articles are cited is used widely to measure the impact of journals and assess the quality of authors’ contributions. Yet it has been reported that among such best seller lists exist articles relating to topics that were “hot” or popular at one time and then faded. As time passes, even true classics are arguably cited less frequently, because their substance has been absorbed into the received wisdom of the literature—a phenomenon described as obliteration by incorporation.

A reference is the acknowledgment that one article gives to another; a citation is the acknowledgement that one article receives from another. Citation analysis is that area of bibliometrics that deals with the study of these relations. Citation analysis involves ranking and evaluating an article or journal based on the number of citations it receives. The establishment of a citation rank list identifies published work that has had the greatest intellectual influence. In addition to determining the most frequently cited articles, this analysis is also used to rank journals in terms of impact [1]. Many medical specialties have utilized citation rank analysis to identify the most influential papers in their field which include trauma and orthopaedic surgery [2], general surgery [3, 4], emergency general surgery [5], and oncology [6].

Citations take time to accumulate, and other, faster assessment means have emerged recently which has led to the development of alternative metrics, or “Altmetrics”. These extend the concept of citation beyond references in other scientific papers; by recording, for example, how often a paper is downloaded, or when the outcome of a clinical trial is used to develop guidelines for doctors, or if a piece of work is included in a course curriculum. To date, no study has undertaken to determine the most influential articles in the global field of surgery and to compare the relative value of citation number, with level of evidence or Altmetric score (AS). The aim of this study, therefore, was to determine the areas of translational surgical research that have been most influential in driving advances in the art and science of surgery, not only to identify what makes a surgical article citable, but also to develop what might arguably constitute an all-time surgical must-do reading list. The hypotheses were: firstly, higher citation number and rank would correlate positively with higher levels of evidence than lower citation number and rank; secondly, AS and rank would correlate with citation number and rank, after a certain critical publication date, given that organisations such as “twitter” have only been in existence since 2006.

Methods

A search of the Thomson Reuters Web of Science citation-indexing database and research platform was completed using the search term Surgery. The returned dataset was filtered to include only English-language and full articles and sorted by number of citations; a method initially developed by Paladugu et al. [3]. The 100 most cited articles were identified from the large number of manuscripts returned. The dataset was then further evaluated examining title, first and senior author, institution and department of the first author, topic, specialty, year of publication, and the country of origin. The 5-year impact factor (for the year 2015) of each journal publishing the articles was recorded. The quality of evidence contained within the articles was assessed according to the Sackett scoring system [7] and the Oxford Evidence Based Medicine scoring system [8]. Altmetric scores were obtained by downloading the “Altmetric it” function from the Altmetric.com website (https://www.altmetric.com/products/free-tools/bookmarklet/) and analysed by utilising the journal article page containing the doi reference number.

Results

The Web of Science search returned 286,122 full-length, English-language articles. Table 1 lists the 100 most cited articles [9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108]. The number of citations ranged from 5746 for Dindo et al. [9] to 446 for Kennedy et al. [108]. The median citation was 574 [interquartile range (IQR) 354.8–792.3], which was not normally distributed. The oldest article featured in the top 100 was by Bigelow et al. [69]. The most recent article was by Clavien et al. [13].

Table 1 The top 100 cited papers in surgery

The 100 most cited articles were published in 30 journals with the number of articles per journal ranging from 1 to 32 (Table 2). Annals of Surgery not only published the most articles (n = 32), but also received the most citations (26,457). Annals of Surgery had the highest impact factor of 8.6, with a 5-year impact factor of 8.7.

Table 2 Journals with the top 100 cited surgery articles

The country and year with the most articles in the top 100 was the USA with 50 and 1999 (n = 11), followed by France with 8 and 2004 (n = 9). The institution with the most citations was the University of Zurich with 7644 across 3 articles (supplementary Table 1). The Memorial Sloan–Kettering Cancer Center and Washington University had the highest number of articles in the top 100 with 4 (supplementary Table 1). Seven authors had 2 first author articles in the top 100 with the highest citation index of 2312.

The commonest related specialty to feature in the top 100 was hepato-pancreatico-biliary surgery with 15 articles. This was followed by Trauma & Orthopaedic surgery (n = 13), and Cardiothoracic surgery (n = 11), respectively (supplementary Table 2). The specialties with the fewest articles in the top 100 were Ophthalmology, Plastic surgery, and Urology, with one each. The most common topic to feature in the top 100 was management of surgical disease with 53 articles, which included 14 (26%) randomised control trials (supplementary Table 3). The second commonest topic was the identification, classification and management of surgical complications with 16 articles.

Of the 14 randomised control trials, 8 related to the management of cancer, 2 related to the management of aneurysmal disease, 2 related to the use of laparoscopic bariatric procedures, 1 article related to the management of post-operative adhesions, and 1 compared autologous chondrocyte implantation with arthroscopic microfracture in patients with full-thickness traumatic defects of the knee. Of those articles related to cancer, colorectal cancer (n = 12) was the most prevalent followed by gastric cancer (n = 4). The median citation of the clinical trials was 504 (IQR 351.0–656.0) with a range of 687 to 446 compared with 588.0 (IQR 336.3–839.8) with a range of 5746 to 446, p = 0.031 for non clinical trials (supplementary figure 1).

Evidence levels of the included articles were initially scored using the Sackett scoring method [7]. Three were level 1 evidence, 12 were level 2 evidence, 18 were level 3 evidence, 25 were level 4 evidence, 28 were level 5 evidence, and 14 were not scored as they were guidelines or consensus statements. There was no relationship between the quality of evidence and the median number of citations received (p = 0.674, supplementary figure 2). The median number of citations received for each evidence level was: level 1 studies 488.00 (range 486.00–516.00), level 2 584.50 (IQR 449.00–766.25), level 3 604.50 (IQR 458.00–879.25), level 4 556.00 (IQR 446.00–798.00) and level 5 554.50 (IQR 446–919.50) (supplementary figure 2). A possible limitation of the Sackett scoring system is that a well-designed prognostic study is still a cohort study that is considered low-level evidence. Therefore, the studies were also scored using the Oxford Evidence Based Medicine scoring system [8]. Studies were grouped as either therapeutic/aetiology (n = 69) or prognostic (n = 25). With regard to therapeutic/aetiology studies, there was no difference in the median number of citations received for each evidence level. The median for level 1 (n = 10) was 546.00, level 2 (n = 8) was 559.00, level 3 (n = 1) was 587.00, level 4 (n = 40) was 587.50, and level 5 (n = 10) was 668.00 (p = 0.444) (Fig. 1). For prognostic studies, there was a difference in the median number of citations received for each evidence level. The median for level 1 (n = 3) was 853.00, level 2 (n = 12) was 525.00, level 3 (n = 2) was 498.00, level 4 (n = 4) was 655.50, and level 5 (n = 4) was 549.50 (p = 0.393) (Fig. 1).

Fig. 1
figure 1

The relationship between evidence level and number of citations in manuscripts stratified by study type. *Therapeutic/aetiology studies p = 0.444 Kruskal–Wallis test. Prognostic studies p = 0.393 Kruskal–Wallis test

To test for a correlation between citations and evidence levels, the number of citations variable was grouped into deciles to meet the assumptions of the Spearman’s correlation coefficient (SCC) test. There was no significant correlation between number of citations and the Oxford evidence level (SCC 0.094, p = 0.352). When therapeutic/aetiology studies were analysed, there was a weak statistical association between low evidence level and higher citation number (SCC 0.233, p = 0.054); however, this was not evident for prognostic studies (RCC − 0.012, p = 0.955).

A possible limitation of this type of study is that historical articles may accrue a larger number of citations despite lacking the impact of newer articles. To control for this, the number of citations were divided by the number of years since publication to give a citation rate index (CRI) (Table 3) [9,10,11,12,13, 15, 16, 19, 33, 50]. The CRI for the top 10 articles ranged from 442.00 for Dindo et al. [9] to 61.40 for Wente et al. [50]. The USA had the most articles in the top 10 CRI with 4, followed by Sweden and Switzerland with 2 each. Management of complications (n = 3) and consensus statements on the management of peripheral vascular disease (n = 3) were the commonest topics in the top 10 CRI. The citation rate analysis for clinical trials ranged from 53.91 to 18.43 compared with 442.00 to 8.03 for all 100 articles studied.

Table 3 Top 10 articles with the highest citation rate

Altmetric scores ranged from 0 to 53 (median 0) with 40 articles scoring ≥ 1.0. The article with the highest AS was by Bigelow et al. [69] (Table 4). The USA had the most articles in the top 10 AS with 4, followed by Canada and Sweden with 2 each. Clinical guidelines (n = 3) were the commonest topic in the top 10 AS, followed by characterisation and management of complications (n = 2). Articles published from the year 2000 onwards had a significantly higher AS (p = 0.018) with a median of 1.0 (IQR 0.0–9.0), compared with a median of 0.0 (0.0–1.0) in articles published before 2000 (Fig. 2). AS correlated with citation rate index (r = 0.266, p = 0.008), but not with total number of citations (r = 0.179, p = 0.079). In articles published after 2000, AS was associated with number of citations (r = 0.461, p = 0.001), and citation rate index (r = 0.455, p = 0.002). This correlation was not evident in articles published before 2000 for number of citations (r = 0.085, p = 0.548) or for citation rate index (r = 0.075, p = 0.595, Fig. 3). Twenty-three articles appeared in both the top 40 for citations and AS. AS was not associated with evidence level when grouped as therapeutic/aetiology studies (SCC = 0.149, p = 0.244), prognostic studies (SCC − 0.277, p = 0.197), or journal impact factor (r = 0.160, p = 0.118).

Table 4 Top 10 articles with the highest Altmetric score
Fig. 2
figure 2

The distribution of Altmetric scores in articles published pre- and post-2000. *Kruskal–Wallis test p = 0.018

Fig. 3
figure 3

The relationship between Altmetric score, number of citations, citation rate index stratified by pre- and post-2000 publication

Discussion

Periodical journals have been the principal means of disseminating scientific research since the seventeenth century, with the oldest still in existence, the Philosophical Transactions of the Royal Society, appearing first in 1665. Over the intervening three and a half centuries, journals have established conventions for publication; insisting on independent (and usually anonymous) peer review of submissions, intended to preserve the integrity of the scientific process, but have come under increasing recent scrutiny. This bibliometric and Altmetric analysis is the first of its kind to identify the authors and themes that have had the greatest impact within the global arena of surgery. Several different pathological conditions and surgical interventions were included within this diverse field, as demonstrated by the top 100 articles relating to some six different anatomical and physiological systems and subject areas. A surprising finding was that no correlation was found between citation number and level of evidence, as defined by objective evidence based medicine references, which may seem counterintuitive, yet likely represents the challenge of linking impact with citation and research quality. Altmetric scores represent the latest emerging development and, in a decade or less, have clearly had significant impact because AS was significantly associated with both citation number and citation rate.

Citation rates were higher for recently published articles, which imply that these will become more influential clinically within the next 5–10 years. Influential articles are more likely to be cited by the scientific community, and these citations form the basis of the impact factor; which quantifies the average citations of the articles published within the journal during a specific period. Journals with a higher impact factor are recognised as being of a higher quality and more likely to contain influential articles. The majority of articles were published in journals with an impact factor of less than 5.7. Annals of Surgery published the most articles ranked within the upper 100 (n = 32) and has historically been the surgical journal ranked highest with an impact factor of 8.7. Journals with high-impact factors (54.42–29.35), NEJM, Lancet, JAMA, and Nature Genetics, were not represented in this analysis, yet the citations accrued by the articles identified were similar to those reported in other bibliometric analyses; colorectal cancer (7850–989) [109], gastric cancer (2893–299) [4], and oesophageal cancer (1833–293) [110]. A possible explanation relates to the novelty of the results, which may be classified relating to the scientific community in general, or confined to the field studied. Findings already reported in other arenas may then be re-established in the arena of surgery, and such articles are unlikely to be published in high-impact scientific journals, yet within the context of this study, remain likely to be considered influential.

There are a number of potential limitations inherent within this study related to several types of bias, which may confound the results. Disproportionate multiple citations may result from institutional bias, self-citation, powerful person, or language bias. Geographically, the high rate of publications from the USA has been reported previously [4, 5], arguably reflecting a preference by USA institutions to cite research performed locally. This effect may have been amplified by limiting the search to English-language articles. Moreover, older articles have a greater citation potential simply because of the length of time they have been in the public domain, rather than representing a true measure of impact. In an attempt to control for this, the number of citations was divided by the number of years since publication: a citation rate index. Despite these measures, the lead-time for publication of citing articles may still result in under representation of more recent articles. The inclusion of only first and senior authors and the institution of the first author is also a possible limitation of the study, as it is possible that several first authors may have co-authored other articles, and are therefore under represented. The search terms used also represent an inherent weakness associated with interrogating bibliographic repositories, in that no combination of search terms is perfect. To identifying all papers relevant to surgery would arguably require an almost infinite number of search terms, which is not pragmatic. While the relative limitation of the search term ‘surgery’ is acknowledged, it is unlikely that this methodology has had a material effect on the ability of the study to address the a priori hypotheses. In contrast, the study has strengths. The many-landmark, randomized trials of surgical procedures, published in scientific and general medical journals, were included by the search methodology, which included the search term surgery, and encompassed non-surgical journals such as Nature, Science, NEJM, and the Lancet. The use of the search term surgery, more so than procedure or intervention, ensures that the results retain relevance to the field of surgery.

Another role that academic journals have come to play, and not part of their original job-description of disseminating scientific results, is as a marker of a researcher’s prowess, and thus a surrogate determinant of academic career potential. Publication in a blue-chip title such as Nature or Science, without question represents a feather in the cap of any clinician, and therefore unlikely to be overlooked by any academically focused appointment committee. An article’s true quality is better revealed by the number of times it is cited elsewhere (ideally not self-citation), but citations take time to accumulate, and other, faster assessment means would be welcome, which has inevitably led to the development of alternative metrics in this regard, now termed “altimetry”. These extend the concept of citation beyond references in other scientific papers; by recording, for example, how often a paper is downloaded, or when the outcome of a clinical trial is used to develop guidelines for doctors, or if a piece of work is included in a course curriculum. Altmetric.com, based in London, was one of the first companies to work in this area, and since 2011, it has tracked mentions of published papers in sources ranging from social media and Wikipedia, to policy documents published by government departments. A rival firm, Plum Analytics, in Philadelphia, tracks mentions, downloads, clicks and the like, of everything from preprints (papers that have been made publicly available, but are not yet formally published), and sets of raw data, to non-commercial computer programs which investigators have written to assist their own endeavours. Using Altmetrics should therefore indicate the importance of a wider range of research-related activities than citations provide, and moreover, do so faster. Plum Analytics was bought by Elsevier in 2017, one of the world’s largest scientific publishers; suggesting that Altmetrics may also prove profitable, as well as useful.

Findings of medical research have long been considered to be disseminated too slowly, but that is about to change. In January 2017 the Gates Foundation introduced a policy, that research it supports (it is the world’s biggest source of charitable money for scientific endeavours, to the tune of some $4bn a year) must, when published, be available to all, and followed this by announcing that it will foot the cost of placing such research in one repository of freely available articles, meaning that Gates-sponsored research cannot be invoiced. Such a manoeuvre would carry the caveat that recipients of Gates’ financial support can no longer offer their output to journals such as Nature, or the New England Journal of Medicine, since accessing the content of these publications has associated cost. Their prestige is based on their ability to pick and publish only the best. If some work is out of bounds to these journals, no matter how good, that will risk diminishing their quality, and arguably those journals’ businesses could suffer and even crumble. Moreover, by actively directing the beneficiaries of its patronage towards the repository in question, set up last year by the Wellcome Trust (after Gates, the world’s second-largest medical research charity), the foundation is pointing to a specific type of alternative, and a future scientific publication arena that, if not completely journal-free, is likely journal-light.

Conclusion

This list of the top-cited articles in surgery has worth for a number of reasons. It identifies seminal contributions, facilitates the understanding and development of contemporary surgical history; and offers clues regarding what makes an article a likely top-cited classic. To produce such a work, the author or research group must come up with a clinical or nonclinical innovation, observation, or discovery that has a potential long-standing effect on surgical clinical practice. Based on the findings of this study, to be rewarded by citation number, such a scholarly contribution should be published in a high-impact journal, is more likely to be amplified and resonate loudly if it originates from an Anglo-Saxon academic institute, and in view of the dissonance regarding level of evidence, needs to be noticed by surgeons active on social media, implying a deal of good fortune is required. Recent studies have identified Twitter as the most commonly used smartphone application by surgeons, but the rates of engagement are variable between medical specialties; for example colorectal surgeons’ engagement levels lag behind other disciplines. In 2014, only 31% of UK consultant colorectal surgeons were found to have Twitter profiles, compared with higher engagement rates within other surgical sub-specialties such as urology (33%), plastic surgery (22%) and vascular surgery (5%) [111]. To make any best seller list has now become that much more challenging.