“There Ain’t No Such Thing As A Free Lunch.” Robert Heinlein in “The Moon is a Harsh Mistress”, popularized by Milton Friedman.

Introduction

Committees assessing academics in medicine for promotion or tenure claim to use a rational process that relies heavily on indicators of scientific productivity and impact (i.e. the number of published papers and their citation rate), usually measured using two proxies: (a) the journal impact factor (JIF); (b) the Hirsch Index (h-index, the maximum number of published papers h that have each been cited at least h times) (Hirsch, 2005). The JIF has poor specificity and sensitivity for individual article quality. Authoritative statements, shared by numerous academic institutions, such as the San Francisco Declaration on Research Assessment (DORA), recommend discontinuing its use (Brembs et al., 2013; Hatch & Curry, 2020). The h-index also has severe limitations, such as not accounting for differences between fields or authorship contributions (Bornmann & Daniel, 2009). It is further limited by its retrospective nature, requiring many years of intensive work, publishing and an accumulation of citations to reach the higher values.

Furthermore, societal contributions should be crucial in evaluating academics, but there is presently no consensus as to how to reliably quantify these contributions. Given the ubiquitous societal consensus on the pursuit of profit, an unmediated measure in academic medicine could be derived from how much the industry invests in regaling a clinical researcher with rewards like a leisurely meal in a Michelin-starred restaurant or a relaxing stay in a high-end resort hotel. In return, the researcher, wittingly or unwittingly, may tend to lend himself or herself to bolstering key marketing goals, for instance by not reporting (Turner et al., 2008), selectively reporting (Vedula et al., 2009) or spinning inconvenient research results (Lundh et al., 2017). In addition, the investment can sometimes yield even higher returns, as researchers who become key opinion leaders (KOLs) contribute to developing clinical guidelines (Clinckemaillie et al., 2022) and thus have an even larger societal impact by orienting clinical practice to align with the industry’s financial objectives.

We thus developed a new indicator applied to medicine, the fl-index; fl for “Free lunches”—in reference to the “no Free Lunch” campaign (Abbasi & Smith, 2003)—measuring the total value of financial gifts, including meals, from any industry marketing products for human use and health purposes. Ideally, this index, ultimately a marker of integrity and research independence, should be unrelated or even inversely related to the standard for academic success. The alternative would imply that the industry, used to aiming for high returns on investment, is deliberately targeting the “crème de la crème”, the elite of the academic establishment and/or that the currently-used markers of academic accomplishment are largely manipulable and arbitrary.

We investigated the relationship between the fl-index and the presently-used measures of academic success (number of publications and the h-index). This study focused on French academics, providing an intriguing proof-of-concept, as the French are globally recognized as seasoned “gourmets” (who relish anything from snails to frogs).

Methods

Study design and participants

In France, academic duties require both active clinical activity and research activity. While clinical skills are evaluated qualitatively, the assessment of research activity relies on publication metrics.

Two researchers (PES and TC) identified French professors of medicine and associate professors according to their academic discipline from university lists and the “Annuaire Santé” of the Department of Health which details medical specialties. Academic disciplines were categorized according to the national organization (Conseil National des Universités) in charge of hiring, promotion and tenure of academics in France.

From an existing database of nearly all French academics’ administrative data matched with Web of Science publication data (Carayol & Carpentier, 2021; Carayol & Lanoë, 2017), we extracted gender, the year of the first paper (an indicator of seniority) and quantitative indicators of productivity for the 2014–2019 period: the h-index increase over the period (defined as the difference between the h-index at the start and at the end of the period), the number of publications and the number of citations for these publications.
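As an illustration of the two definitions used here (the h-index, and its increase over a period), the following minimal sketch may help. It is not the extraction pipeline used in the study; the function names and the citation counts are hypothetical and chosen only for illustration.

```python
def h_index(citations):
    """Largest h such that h papers have each been cited at least h times."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def h_index_increase(citations_at_start, citations_at_end):
    """Difference between the h-index at the end and at the start of a period."""
    return h_index(citations_at_end) - h_index(citations_at_start)


# Hypothetical per-paper citation counts for one academic (illustrative only).
start_of_period = [10, 8, 5, 4, 3]
end_of_period = [25, 18, 12, 9, 7, 6, 6, 2]

print(h_index(start_of_period))                          # 4
print(h_index(end_of_period))                            # 6
print(h_index_increase(start_of_period, end_of_period))  # 2
```

Because the h-index can only grow as papers and citations accumulate, the increase computed this way is never negative for a single author whose record is complete at both dates.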

The Department of Health’s open access database (https://www.transparence.sante.gouv.fr/), established according to the French Sunshine Act, documents three types of financial payments by the industry in three respective tables: advantages, special agreements (i.e. conventions) and remunerations. For each academic, we retrieved the amount of money declared by the industry during the 2014–2019 period exclusively for direct categories of gifts (e.g. meals, accommodation and travel) to healthcare professionals, i.e. the “advantage” table. We did not use the tables reporting “conventions” and “remunerations”, because these are incomplete regarding payments, as the industry often invokes its contractual obligation to ensure business confidentiality. We retrieved the financial data concerning gifts on May 18, 2020 from EurosForDocs (https://www.eurosfordocs.fr/metabase/dashboard/2), a website created by Regards Citoyens (https://www.regardscitoyens.org/), a non-governmental organization that manages the official database and enables more user-friendly extractions. The linking of databases was automated and completed by manual checks (by PES and TC) exploring the possibility of misidentification.

Indexes

The fl-index was defined as the sum of gifts received by each academic in the 2014–2019 period. As no gold standard for high scientific productivity and impact exists, we considered a straightforward, relevant reference: being in the top 25% of scientists in terms of increase in the h-index during the period 2014–2019. As institutions need to make decisions under constraints of limited time and budgets, this reflects the fact that in the current system only top researchers can actually expect promotion. Even if other indicators are perhaps more accepted by the scientometrics community (e.g. the average number of citations normalized by field and publication year, the number or percentage of publications in the top 10% most highly cited publications worldwide normalized by field and publication year), the h-index is widely known and used in academia.
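The two definitions in this section can be sketched as follows. This is a simplified illustration, not the study's code: the academic identifiers, gift amounts and h-index increases are invented, and the choice of the "inclusive" quantile method is an assumption, since the text does not say how the top-quartile cut-off was computed.

```python
from collections import defaultdict
from statistics import quantiles


def fl_index(gifts):
    """Sum the declared gift amounts (in euros) for each academic."""
    totals = defaultdict(float)
    for academic_id, amount in gifts:
        totals[academic_id] += amount
    return dict(totals)


def top_quartile_flags(h_increase):
    """Flag academics whose h-index increase reaches the third quartile."""
    # Assumption: inclusive quantile method; other methods may shift the cut-off.
    cutoff = quantiles(h_increase.values(), n=4, method="inclusive")[2]
    return {who: value >= cutoff for who, value in h_increase.items()}


# Hypothetical declarations: (academic identifier, gift value in euros).
gifts = [("A", 60.0), ("A", 209.0), ("B", 30.0), ("D", 1200.0), ("D", 450.5)]
# Hypothetical h-index increases over the period.
h_increase = {"A": 0, "B": 1, "C": 4, "D": 8, "E": 12}

fl = fl_index(gifts)
flags = top_quartile_flags(h_increase)
for who in h_increase:
    # Academics without any declared gift get an fl-index of 0.
    print(who, fl.get(who, 0.0), flags[who])
```

Because the h-index increase takes integer values, ties at the cut-off can place somewhat more than 25% of academics in the reference group.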

Outcomes and analysis

First, a descriptive analysis was performed, consisting of numbers and percentages for categorical outcomes, and median (interquartile range, IQR) for quantitative outcomes. Then Pearson’s correlations were used to explore the association between indicators and several characteristics of the academics, after logarithmic transformation (index value + 1€), if needed.
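A minimal sketch of a Pearson correlation computed after a log(value + 1) transformation is given below. The data are invented, and the use of the natural logarithm is an assumption, since the base is not stated. The base does not affect the result, because changing it only rescales the transformed values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log_plus_one(values):
    """Apply log(value + 1) so that zero values remain defined."""
    return [math.log(v + 1.0) for v in values]


# Hypothetical fl-index values (euros) and h-index increases.
fl_values = [0.0, 603.0, 6264.0, 24793.0, 100000.0]
h_increases = [0, 1, 4, 8, 15]

r_raw = pearson(fl_values, h_increases)
r_log = pearson(log_plus_one(fl_values), log_plus_one(h_increases))
print(round(r_raw, 3), round(r_log, 3))
```

Adding 1 before taking the logarithm keeps academics with no gifts, or no h-index increase, in the analysis instead of producing an undefined log(0).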

Receiver Operating Characteristic (ROC) curves were plotted and the optimal threshold was determined according to Youden’s J measure (Youden, 1950). The main outcomes for this study were the diagnostic properties (sensitivity, specificity, positive predictive value, negative predictive value) of the fl-index to objectively reflect high scientific productivity. An exploratory analysis of the effect of gender, seniority (year of first paper) and academic degree (MDs (Medical Doctors) versus non-MDs) on the fl-index association with h-index increases was performed using a linear mixed model with a random intercept for the academic discipline. Then, to account for possible variations according to subspecialty, subgroup analyses (fl-index, h-index, correlation between fl-index and h-index, optimal threshold) were performed separately for each academic discipline. 95% confidence intervals (95% CI) for test parameters were estimated using a bootstrap procedure.
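The threshold selection and the four diagnostic properties can be illustrated with the short sketch below. It is not the pROC implementation used in the study. It assumes that an academic is classified as positive when the fl-index is greater than or equal to the threshold and that candidate thresholds are the observed values, whereas pROC may use different conventions. The scores and labels are invented.

```python
def diagnostics(scores, labels, threshold):
    """Sensitivity, specificity, PPV and NPV when 'score >= threshold' is positive."""
    tp = fp = fn = tn = 0
    for score, positive in zip(scores, labels):
        predicted = score >= threshold
        if predicted and positive:
            tp += 1
        elif predicted:
            fp += 1
        elif positive:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def youden_threshold(scores, labels):
    """Observed score that maximizes J = sensitivity + specificity - 1."""
    best_threshold, best_j = None, float("-inf")
    for candidate in sorted(set(scores)):
        d = diagnostics(scores, labels, candidate)
        j = d["sensitivity"] + d["specificity"] - 1.0
        if j > best_j:
            best_threshold, best_j = candidate, j
    return best_threshold, best_j


# Hypothetical fl-index values (euros) and high-productivity labels (1 = top quartile).
scores = [0, 100, 500, 3000, 8000, 9000, 20000, 30000]
labels = [0, 0, 0, 0, 1, 0, 1, 1]

threshold, j = youden_threshold(scores, labels)
print(threshold, round(j, 2))                   # 8000 0.8
print(diagnostics(scores, labels, threshold))
```

Youden's J weights sensitivity and specificity equally, so the selected threshold does not depend on how frequent high productivity is, whereas the predictive values do.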

Statistical analyses were performed using the open source statistical software R (R Development Core Team), packages pROC and tidyverse (R: A Language & Environment for Statistical Computing, 2017; Robin et al., 2011; Wickham et al., 2019).

Reporting and ethics

This study is reported according to the Standards for Reporting Diagnostic Accuracy (Bossuyt et al., 2015). The project was based on a protocol that has been approved by the CNGE [Collège National des Généralistes Enseignants] Ethics Committee (Approval Number 110321260) on 03/26/2021. The project used publicly available data, and procedures for the protection of persons complied with the data protection regulations (GDPR). It was registered on 03/27/2021 on the Open Science Framework prior to any data collection (osf.io/7d4bk) and the protocol was publicly available online. The protocol insufficiently defined the primary outcome of the h-index over the study period and was amended, before the automatic extraction from Web of Science, to become the increase in the h-index over the period. We clarified the definition as the difference between h-index at the start and at the end of the period when we performed the automatic extraction from Web of Science. We explored the number of gifts as a new secondary outcome and we distinguished between MD and non-MD academics.

Patient and public involvement

No patients were involved in the research and it is unlikely that patients share meals with the academics. However, it is possible that an fl-index could make sense for the growing number of patient groups sponsored by the industry (McCoy et al., 2017).

Results

Participants

We identified 4,320 academics from 28 (out of 37) of the French Schools of Medicine that publicly list academics. Among them, 354 were not registered as healthcare professionals and could not be found in the Sunshine Act database. An additional 30 were neither professors nor associate professors.

3,936 academics were included in the main analysis. Of these, 513 had no productivity metrics in the automatic extraction from Web of Science. As this could have been due either to a mismatch in name or to hiring on the grounds of clinical but not scientific background, these academics were retained in the main analysis and excluded from a sensitivity analysis performed on the 3,423 remaining academics. Web Appendix 1 summarizes the steps leading to the selection of the academics.

The value of gifts to these 3,936 academics reached a total of 77.8 M€ over 5 years. The median value of individual gifts was 60 € (IQR: 30 € to 209 €) and the median number of gifts was 41 (IQR: 8 to 120). The distribution of the values of individual gifts classified by type of gift is presented in Web Appendixes 2 and 3. The correlation between the number of gifts and the fl-index was 0.88 [95% CI 0.88 to 0.89].

334 (8.5%) academics had no history of gifts from the industry. These academics were more frequently women, non-MDs or general practitioners, and they scored lower on productivity metrics (see Web Appendix 4). 908 (24%) academics had no increase in the h-index over the period 2014–2019. 120 (3%) academics neither received gifts, nor had an h-index increase in the period.

The fl-index as a marker of academic impact

The median h-index increase was 4 (IQR 1–8): an increase ≥ 8 defined high scientific productivity and impact for academics over the 2014–2019 period. The median fl-index value was 6,264 € (IQR: 603€ to 24,793€).

A positive correlation was found between the fl-index and the usual scientific publication metrics (Fig. 1), with correlation coefficients at 0.31 (95% CI 0.29 to 0.34) between the fl-index and the increase in the h-index, 0.32 (95% CI 0.30 to 0.35) between the fl-index and the number of publications and 0.41 (95% CI 0.38 to 0.47) between the fl-index and the number of citations. The correlation between the fl-index and increase in the h-index was 0.32 [95% CI 0.29 to 0.34] and 0.17 [95% CI 0.04 to 0.28] respectively for MD and non-MD academics (Web Appendix 5 details the differences between MD and non-MD academics). The association between the fl-index and the h-index persisted after controlling for seniority, gender, specialty, and degree (MD or not) (Web Appendix 6).

Fig. 1

Relationship between the fl-index and increase in the h-index over the period 2014–2019, published papers and citations with linear regression plots. Coloring indicates density

The area under the ROC curve for the fl-index was 0.66 (95% CI 0.64 to 0.68). A threshold of 7,700 € was determined, resulting in a sensitivity to predict high productivity and impact of 65% (95% CI 61 to 67%), a specificity of 59% (95% CI 57 to 61%), a positive predictive value of 35% (95% CI 33 to 37%) and a negative predictive value of 83% (95% CI 81 to 84%).
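As a consistency check, the predictive values can be recomputed from the reported sensitivity and specificity using Bayes' rule. The sketch below assumes a prevalence of high productivity of exactly 25%, following the top-quartile definition. The actual proportion of academics with an increase ≥ 8 is not stated here and may differ slightly because of ties.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values from test properties and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Reported sensitivity and specificity; the 0.25 prevalence is an assumption.
ppv, npv = predictive_values(sensitivity=0.65, specificity=0.59, prevalence=0.25)
print(round(ppv, 2), round(npv, 2))  # 0.35 0.83
```

Under this assumption the recomputed values agree with the reported positive predictive value of 35% and negative predictive value of 83%.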

Figure 2 and Web Appendix 7 present the analysis for subgroups of academic disciplines. There were considerable differences across medical disciplines, with correlations ranging from 0.12 (Morphology and morphogenesis) to 0.51 (Internal medicine, geriatrics, general surgery and general medicine), and with the median fl-index ranging from 37 € (Public health, environment and society) to 30,404 € (Cardiorespiratory and vascular pathologies). Importantly, the best correlations and the highest values for the fl-index were observed for clinical disciplines. Similar results were observed in the sensitivity analysis (Web Appendix 8).

Fig. 2

Subgroup analyses per medical discipline: correlations, fl-index distribution (with the identified threshold and 95% CI) and h-index distribution

Discussion

Statement of the main findings

Overall, the correlation between the fl-index and an increase in the h-index was modest. The fl-index clearly cannot be used as a surrogate for academic success as gauged by productivity-based metrics. After all, medical doctors often receive lunches, not so much to encourage them to publish, as to encourage them to prescribe the medications/devices that the company is producing (Goupil et al., 2019). We nevertheless evidenced positive correlations across all academic disciplines (except morphology and morphogenesis). Despite heterogeneity, the most robust results were observed in the clinical fields. This unexpected property of the fl-index is in line with recent findings showing that the industry favors spending on KOLs with an impact on patient care (Clinckemaillie et al., 2022). The heterogeneity in the fl-index across fields suggests that, with the exception of general practitioners, MD academics differ in the sensitivity of their taste buds and related opportunities to delight in fine French food for free.

Furthermore, in a dystopian future, the fl-index could even be used to choose between two competing academics with similar productivity metrics. Indeed, assessment committees are interested not only in scientific productivity and impact, but also in scientific influence, which the fl-index could measure more efficiently than the h-index by capturing particular interactions within networks of researchers with financial conflicts of interest. In addition, while the h-index is built on citations corresponding to somewhat dated publications, the fl-index instead captures contemporary behaviors, which could have better prognostic properties for future academic accomplishments. But of course, this intriguing conjecture requires future research (e.g. by considering a potential time lag between the two indicators instead of using the same period for both types of measure), which we are confident will be highly cited, thereby further boosting some h-indexes.

Furthermore, the large variation in the fl-index observed across fields underscores the fact that different disciplines do not have the same “value” (or, more accurately, market value). Therefore, our findings could provide some guidance for future residents as to what specialties to choose (or to avoid), depending on whether they are more eager to produce scientific articles or to enjoy an affluent lifestyle perceived as well-deserved. Indeed, one could simply check the baseline h- and fl-indices of the specialty to assist an informed choice.

These bleak implications are made possible by the crude, tractable and ultimately meaningless nature of productivity-based metrics, which incentivize more (not “less”) research while failing as reliable proxies for both research quality (“better research”) and societal impact (“performed for the right reasons”) (Altman, 1994). Undeniably, the fl- and h-indexes share some caveats. Price tags unfortunately do not always reflect quality. Indeed, a high h-index might not even be related to the conduct of good quality research, as shown for other similarly revered productivity metrics, like the JIF and citation counts (Dougherty & Horne, 2022).

Nonetheless, importantly, the fl-index is easy to measure, as French law compels all companies to declare gifts to medical doctors. It is also easy to interpret, since the fl-index is expressed in the universal language of money. Furthermore, it is less susceptible to manipulation, as it is more difficult (and costlier) to invite oneself to a restaurant than to cite oneself in an article for publication. Invitations extended by a pharmaceutical company to an exquisite restaurant or a gorgeous island resort are not just an indicator of refined taste, but also of benevolence and open-mindedness towards genuine medical innovations, such as esketamine, brexanolone (Cristea & Naudet, 2019), eteplirsen (Kesselheim & Avorn, 2016), or aducanumab (Walsh et al., 2021). These are small steps forward (or, if one dispenses with charity, several steps backward) for evidence-based medicine, but leaps forward for commercial endeavors. In this highly competitive marketplace, also flooded with “me-too” drugs, gifts to KOLs may have become a crucial strategy for drugs to gain attention, sell more and make investors and company boards richer and happier. The fl-index is also more stable, since it is harder for academics to manipulate drug companies to pay them (though it appears to be pretty easy the other way around).

Conversely, citation-based measures can be hijacked and various problematic behaviors of this nature have been identified (Horne et al., 2009). Salami slicing, p-hacking, and self-citations are a few examples of practices that can be used to improve productivity indicators, although such practices have been shown to be major threats to research reproducibility (Munafò et al., 2017). The present paper is an eminent example: it includes many subgroup analyses, targets a high-impact-factor journal, and is a humble attempt to boost our h-index. The ounce of integrity we still possessed led us to perform a sensitivity analysis to consider the data quality issues inherent in big data analysis. Luckily, the results proved robust.

Strengths and weaknesses of the study

The fl-index underestimates academics’ market value, as the French database for “gifts received” excludes contracts such as scientific presentations, “training”, advice, consultancies, and various collaborations in clinical research or seeding (Braillon, 2014). However, it could also overestimate certain values of the fl-index, as the data are derived from the declarations of the industry itself, with possible typos.

Some mismatches and selection biases are possible, as we were not able to identify academics from 9 universities. We did not control for geographical differences. However, we included towns like Lyon, which is well known for its gastronomy, and Bordeaux, which is well known for its wine. Wine stimulates fruitful collaborations with academics and the industry knows this well: for instance, a California winery, Ridge Vineyards, is a subsidiary of Otsuka Pharmaceutical, a company that markets an alcohol-weaning pill for alcohol dependence in Japan (Miyata et al., 2019).

We only explored French academics and these results may not apply to other countries, all the more as French academics are small players. In contrast, the Yale School of Medicine Dean Robert Alpern received $648,183 from the pharmaceutical companies Abbott Laboratories and AbbVie, Inc in 2018, including about $162,000 for meals, drinks, travel and accommodation (Peryer, 2018). Last, the fl-index cannot be computed in countries with weak or no legislation, but a passion for joie de vivre, like good food, wine or opportunities to unwind in breathtaking scenery. Still, the fl-index could be externally validated in settings comparable to the situation in France, such as Italy, which has at last approved its own Sunshine Act. It would also be important to explore the congruence of fl-index values with the actual conflict of interest disclosures, which are of course diligently (not) reported in academics’ scientific productions (Okike et al., 2009). Still, there is no reason to despair if the fl-index turns out to be linked to lack of integrity or transparency, as revolving doors ensure smooth communication between academic medicine and industry (Thomas & Ornstein, 2020).

Strengths and weaknesses in relation to other studies, discussing marked differences in results

Associations between industry funding and scientific productivity have previously been shown for neurosurgeons (Eloy et al., 2017) and oncologists (Kaestner et al., 2018). Yet the immense potential of industry funding as a tool for boosting academic careers has remained unexplored, even in major initiatives proposing to reform the academic reward system, for instance by adopting “responsible indicators for assessing scientists” (Moher et al., 2018). Finally, we did not explore whether ties with the industry precede or follow tenure. It is therefore unclear whether the industry presciently invests in future leaders or just resigns itself to feeding those already appointed.

Ideas and speculation

These preliminary findings provide avenues for improving academic assessment for key decisions like promotion and tenure. The usual “disclosure of conflict of interests” could be replaced by the details of “free lunch interests”. Indeed, the campaign lobbying for “No Free Lunches” (Abbasi & Smith, 2003) was a flash in the pan, and faced well-deserved scorn for such a tasteless and puritanical proposal. Finally, one cannot ignore the fact that the sponsorship of trials by the industry is associated with a reduced likelihood of reporting unfavorable results (Friedberg et al., 1999). Accordingly, it is likely that academics wined and dined by the industry will produce more favorable results. This fruitful partnership can thus foster major breakthroughs, if not for patients and evidence-based medicine, at least for companies and their profit margins. Yet it is often met with disdain and public backlash, most probably owing to envy on the part of researchers not deemed worthy of enough “gifts” to reach high fl-indexes. Less cynically, if the standard for academic success is founded on crude indicators that may even positively correlate with the amount of gratifications received for leisure purposes from a profit-driven industry, perhaps the dystopian future that this paper lays out (and desperately tries to prevent) is not so remote. Sycophantic adoption of productivity-based metrics for all crucial decisions in assessing scientists prepares the ground for the next level: first, academics are defined by a meaningless number, and next this number is set to become a price-tag.