Introduction

If results from research were not published, there could be no scientific development: “scientists have to publish their results” (van Raan, 1999, p. 417). Since results need description, explanation, contextualization, and discussion, they are reported in publications that can be read, discussed, cited, and commented on by colleagues. Every publication becomes part of the worldwide archive, Karl R. Popper’s world 3 (Popper, 1972), and is available for further reception and use (in science). Scientists are connected to their research through their publications: research-reporting scientists are therefore perceived as research-active scientists in the community.

Since the beginning of the modern science system, “publication productivity is widely considered an appropriate operationalization of a scientist’s performance” (Sonnert, 1995, p. 36). The popularity of the publication count metric can probably be traced back (besides the necessity to publish research results) to its simple retrieval in multi-disciplinary literature databases (such as Web of Science from Clarivate or Scopus from Elsevier). Although the number of publishing scientists (Dong et al., 2017) and the number of papers (Bornmann & Mutz, 2015; Bornmann et al., 2021) have grown at an exponential rate, scientific development does not seem to profit from the expansion: the absolute amount of disruptive research that entered world 3 has been relatively constant over the years (Park et al., 2023). One reason for this constancy may be that the number of publications a typical scientist produces in one year is relatively constant over the years (Wang & Barabási, 2021). There seems to be a concentration of high (constant) productivity (referring to important contributions to the development of science) on a small group of publishing scientists.

Ioannidis et al. (2014) estimated the existence of around 15,000,000 active (publishing) scientists in the 16-year period from 1996 to 2011. They found that “only 150,608 (< 1%) of them have published something in each and every year in this 16-year period (uninterrupted, continuous presence … in the literature)”. These few scientists “account for 41.7% of all papers in the same period and 87.1% of all papers with > 1000 citations in the same period”. The results of the study may reveal that scientific development (the regular appearance of new and important scientific findings) is a matter of a small core set of scientists. Other empirical results indicate that this core set of high producers is relatively stable in its composition over the years (including the same scientists; Abramo et al., 2017), whereby this core set also seems to represent eminence in science in general: high producers are characterized by other signs of reputation such as having received important scientific prizes and prestigious occupational positions (Li et al., 2019; Wang & Barabási, 2021).

The mathematically defined law for identifying the core set of high producers in science was introduced by Price (1963): “half the published output in a subject field will be contributed by a highly productive subset of authors approximately equal to the square root of the total number of authors publishing in that area” (Nicholls, 1988, p. 469). The Price law of productivity states that most scientists produce only one or two publications, and only a few scientists produce ten or more publications in a certain period (van Raan, 2019). A similar law for predicting productivity differences between scientists was published by Lotka (1926). According to Nicholls (1988), the Price law (and also the law by Lotka, 1926) is based on Rousseau’s law from the social sciences: “any population of size N contains an effective elite of size $N^{0.5}$” (Rescher, 1978, p. 97). Three theories exist to explain the substantial and skewed productivity differences between scientists (Kwiek, 2015):

(1) The “sacred spark” theory claims that substantial individual differences in motivation, ability, and creativity lead to large differences in productivity. (2) The theory of “accumulative advantage” claims that productivity is a self-referential process: frequent early publications lead to an increasing number of publications later in the career (e.g., because frequent early publications are good starting conditions for later success in science, such as success in obtaining research funds). The related “reinforcement” theory states that “scientists who are rewarded are productive, and scientists who are not rewarded become less productive” (Cole & Cole, 1973, p. 114). (3) According to the “utility maximizing” theory, young scientists tend to have a high level of research activities, since they profit from the activities at that career stage (in terms of funds or jobs); eminent scientists tend to reduce these activities, since other tasks have become more important (e.g., mentoring and reviewing), and a significant additional increase in rewards cannot be expected from research at that career stage.

The three theories of research productivity are complementary and focus on different points. It is not clear which theory is valid. One can speculate that each theory may reflect one part of reality in science. This paper is intended to introduce a research program for analyzing productivity differences between scientists which can also be used to study the three theoretical perspectives on productivity. This program is based on the Anna Karenina Principle (AKP), which was introduced to the sociology of science by Bornmann and Marx (2012). The principle states that success in research is the result of several prerequisites that must all be fulfilled. If at least one prerequisite is not fulfilled, failure follows, whereby the failure is specific to the set of fulfilled and missing prerequisites: every failure has its specific combination of missing (and fulfilled) prerequisites.

In this study, the AKP is transferred to the phenomenon of productivity differences between scientists: high productivity is given for the few scientists who fulfill all prerequisites (e.g., high motivation, pronounced creativity, reputable professional position, early important papers in high-impact journals), and low productivity is connected to a specific combination of missing and fulfilled prerequisites. In the following sections, the research program for analyzing productivity differences between scientists is outlined. The program includes the AKP as the theoretical principle underlying the program (from Bornmann & Marx, 2012), a mathematical concept explaining skewness (from Shockley, 1957), and statistical methods for empirical productivity analyses (Boolean logit and probit procedures from Braumoeller, 2003).

The Anna Karenina principle (AKP)

The AKP is rooted in the first sentence of Leo Tolstoy’s book Anna Karenina: “Happy families are all alike; every unhappy family is unhappy in its own way” (Tolstoy, 1875–1877/2001, p. 1). With this sentence, Leo Tolstoy claims that several key factors (prerequisites) must be present for a family to be happy (e.g., financial security, good health, and stable relationships among all family members). If at least one key factor is deficient or absent, the family is unhappy. Leo Tolstoy claims that each unhappy family is characterized by a certain constellation of given, deficient, and absent factors. As a result, each unhappy family is unhappy in its own specific way.

The use of the AKP as a concept for understanding or explaining failure has a long tradition in science. Diamond (1994, 1997) used the first sentence of Leo Tolstoy’s book to discuss requirements for success in complex undertakings and introduced the expression AKP. He defined the AKP as follows: (1) Success is based on several key factors (requirements) that must be fulfilled. (2) Failure in any one or in several of the factors leads to specific failures of complex undertakings. Each factor from a set of factors is essential for the success of complex undertakings. The undertakings are doomed to fail if at least one factor is lacking. In the area of consumer behavior research or psychology, a rule related to the AKP has been proposed: the conjunctive decision rule (Gilbride & Allenby, 2004). The rule is defined as follows: a person selects only that object among the available objects in a certain decision situation that is found acceptable on the set of all relevant criteria. The conjunctive decision rule differs, e.g., from the compensatory decision rule, under which an unacceptable rating on one criterion can be compensated by an acceptable rating on another criterion for an object to be successful in a certain decision situation.
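The contrast between the conjunctive and the compensatory decision rule can be made concrete with a minimal Python sketch. The function names, the example ratings, the equal weighting in the compensatory rule, and the acceptance threshold are hypothetical and chosen for illustration only; they are not taken from Gilbride and Allenby (2004).

```python
def conjunctive_accept(ratings, threshold):
    """Accept only if every criterion reaches the threshold (AKP-like rule)."""
    return all(r >= threshold for r in ratings)


def compensatory_accept(ratings, threshold):
    """Accept if the mean rating reaches the threshold, so that a weak
    criterion can be offset by a strong one (assumed equal weights)."""
    return sum(ratings) / len(ratings) >= threshold


# Hypothetical ratings on three criteria (scale 0 to 10).
balanced = [6, 6, 6]
lopsided = [10, 10, 2]  # one unacceptable criterion, two excellent ones

for name, ratings in [("balanced", balanced), ("lopsided", lopsided)]:
    print(name,
          "conjunctive:", conjunctive_accept(ratings, threshold=5),
          "compensatory:", compensatory_accept(ratings, threshold=5))
```

In this sketch, the lopsided object passes under the compensatory rule but fails under the conjunctive rule, which mirrors the AKP logic that a single deficient factor is enough for failure.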

The use of the AKP for the explanation of complex undertakings has important implications for the conditions of success and failure in science:

(1) Although “we tend to seek easy, single-factor explanations for success” (Diamond, 1994, p. 4), these explanations are usually not useful for complex phenomena or situations. Possible explanations for high productivity in science are good examples in this respect. Since high productivity in science is a complex phenomenon depending on many factors and conditions, the three theories which have been proposed to explain productivity differences in science (“sacred spark”, “accumulative advantage”, and “utility maximizing” theories) will not be sufficient: each theory focusses on only one or a few factors (areas).

(2) “Success actually requires avoiding many separate causes of failure” (Diamond, 1997, p. 157); success only happens if failure in all factors is avoided.

(3) “No one property [factor] guarantees success, but many can lead to failure” (Shugan, 2007, p. 145).

(4) As success requires every factor from a set of factors to be acceptable, and failure only requires at least one unacceptable detail, “favorable outcomes are rare and more informative than unfavorable outcomes” (Shugan & Mitra, 2009, p. 11). This is an important implication for explaining productivity differences in science with the AKP, since favorable outcomes (scientists with an excellent publication record) are rare.

To explain success, the AKP has been used in many fields of research in the past years. For example, Diamond (1994, 1997) applied the AKP to find an answer to the question, “why have so many seemingly suitable, big, wild mammal species, such as zebras and peccaries, never been domesticated, and why were the successful domesticates almost exclusively Eurasian?” (Diamond, 1994, p. 4). Diamond’s AKP theory names several necessary conditions for domestication, e.g., no nasty dispositions with tendencies to kill humans and quick growth rates. In another study, Moore (2001) used the AKP in ecological risk assessments: “Following from the Anna Karenina principle, there are many ways to ruin an ecological risk assessment, but only a few pathways to success” (p. 236).

McClay and Balciunas (2005) applied the AKP to explanations in the area of biological control of weeds. Shugan and Mitra (2009) used the AKP in a study on using certain statistics for success metrics: “When environments are adverse (e.g., failure-rich), non-averaging metrics correctly overweight favorable outcomes. We refer to this environmental effect as the Anna Karenina effect, which occurs when less favorable outcomes convey less information” (p. 4). In a recent commentary, Brand and Hardy (2022) deployed the AKP to learn more about the reasons why patients are dissatisfied: “whereas happy patients receive the same procedures for similar indications as unhappy patients, evidence supports a clear association between negative psychological function and worse preoperative and postoperative patient-reported outcome measures” (p. 3207).

Bornmann and Marx (2012) proposed to use the AKP as a concept to explain success in science. The authors explain the way the AKP probably works in science for three central areas in modern science:

(1) Peer review of research grant proposals and manuscripts: manuscripts submitted to journals/proposals submitted to funding organizations are only accepted for publication/funding if all prerequisites for publication/funding (formulated by the publisher/funding organization) are fulfilled (Bornmann & Daniel, 2005).

(2) Citations of publications: many factors exist with an influence on citation counts (Tahamtan & Bornmann, 2018, 2019). The AKP claims that a paper accrues many citations only when several prerequisites are fulfilled, such as a publishing author with high reputation and reported research with high quality and interesting findings.

(3) Scientific discoveries: the AKP proposes that several prerequisites for research exist that lead to scientific revolutions such as solid evidence, verified evidence from independent research groups, and available techniques for required measurements. In two case studies, Marx and Bornmann (2010) and Marx and Bornmann (2013) identified several prerequisites that were (possibly) necessary for two revolutions in science: the transition from a static to a dynamic universe in cosmology and the emergence of plate tectonics.

Bornmann and Marx (2012) presented peer review, citations of publications, and scientific discoveries as example areas for explaining success in science based on the AKP. Many additional areas may exist in science where the AKP is relevant. In this study, the AKP underlies a program for studying productivity differences between individual scientists. Whereas this section focused on the general definition and application of the AKP in science, the next section will deal with the question of how several factors (prerequisites for success) should be combined so that only a few ideas, papers, or scientists are very successful, and many are not. This question was answered several decades ago by William Shockley, who was one of the winners of the Nobel Prize in Physics in 1956.

Explanation by Shockley (1957) of individual productivity differences in research laboratories

Shockley (1957) examined individual productivity differences and published his findings in a paper entitled “On the statistics of individual variations of productivity in research laboratories”. The paper is based on his observation that great individual productivity differences exist “in any large and reasonably homogeneous laboratory, such as, for example, the Los Alamos Scientific Laboratory and the research staff of the Brookhaven National Laboratory” (Shockley, 1957, p. 280). Although Shockley (1957) does not mention the AKP to explain the observed individual productivity differences between scientists, his explanation is in line with the AKP concept. Shockley (1957) demonstrates how the AKP can be mathematically operationalized to explain productivity differences based on several prerequisites for success.

In his paper, Shockley (1957) starts the discussion about productivity differences with a comparison of individual achievements in science with achievements in areas other than science. In contrast to science, individual achievements in many areas outside science are characterized by similarities and small differences: “very few individuals walk at speeds outside the range of 2 to 5 miles per hour. In competitive activities involving trained and selected people, such as running the mile, the variation is much smaller, the ratio of speed for the mile between world’s record and good high school performance being probably less than 1.5” (Shockley, 1957, p. 281). Whereas in science hyperprolific authors (Ioannidis et al., 2018) exist alongside non-publishing scientists (among active and trained scientists), many competitive activities outside science are characterized by bounded performance differences: “the bounded nature of performance reminds us that it is difficult, if not impossible, to significantly outperform the competition in any domain” (Wang & Barabási, 2021, p. 13). The question thus arises why this difference between science and many areas outside science exists. Shockley (1957) comes up with an elegant solution.

The basis of his explanation for the phenomenon that some scientists are much more productive than others is as follows: “the large changes in rate of production may be explained in terms of much smaller changes in certain attributes” (Shockley, 1957, p. 285). The publication of a research paper depends on many factors (prerequisites) that a scientist must manage. This approach of explaining productivity differences follows the AKP. Shockley (1957) presents a list of factors that is neither ordered by importance nor complete: (1) “ability to think of a good problem, (2) ability to work on it, (3) ability to recognize a worthwhile result, (4) ability to make a decision as to when to stop and write up the results, (5) ability to write adequately, (6) ability to profit constructively from criticism, (7) determination to submit the paper to a journal, (8) persistence in making changes (if necessary as a result of journal action)” (p. 286). Several other factors may be considered here, especially those factors that have become relevant in the current science system. William Shockley wrote about the science system of the 1950s, roughly a decade after the Second World War.

In a short literature overview, Milojevic et al. (2018) mention some other factors that may be relevant in the modern science system, e.g., number of collaborators (Zalewska-Kurek et al., 2010), prestige of PhD granting and hiring institutions (Li et al., 2019), prestige of the collaborators (advisors) (Hammarfelt et al., 2020; Li et al., 2019), and level of specialization. The results by Li et al. (2020) reveal that especially collaborations with other scientists from the beginning are central for being a successful (publishing) scientist. Ioannidis et al. (2018) asked around 300 authors about possible reasons for their very high productivity (“scientists who publish a paper every five days”). Common themes of the 81 authors who replied are: “hard work; love of research; mentorship of very many young researchers; leadership of a research team, or even of many teams; extensive collaboration; working on multiple research areas or in core services; availability of suitable extensive resources and data; culmination of a large project; personal values such as generosity and sharing; experiences growing up; and sleeping only a few hours per day” (p. 168).

Previous studies therefore indicate that many factors may be relevant for being a productive scientist. Shockley (1957) proposes that large and skewed productivity differences between scientists are the result of a multiplicative relationship between the single factors. Since Shockley (1957) mentions eight factors as being relevant for successfully publishing, the probability that a scientist publishes a paper in a certain period is the product of the eight factors that the scientist has to manage. If the factors are conceptually independent of each other, the productivity P of a scientist may be determined by the formula

$$P = F_1 \times F_2 \times F_3 \times F_4 \times F_5 \times F_6 \times F_7 \times F_8 \tag{1}$$

To demonstrate the enormous influence of each factor in a multiplicative productivity model, Table 1 shows the publication output in percent depending on eight factors. The results in the table show that even small changes on the factors’ side have an enormous influence on individual publication output and facilitate output outliers. Scientist A is set to 100% in publication output and the publication output of the other scientists (B to E) is compared to the output of scientist A. The extent to which a factor is given for a scientist is measured on a scale from 0 to 10, where 0 means that a factor is not existent at all and 10 means that a factor is maximally given.

Table 1 Differences in individual productivity depending on eight factors

The only difference between scientist A and scientist B is the reduced value of F8 (from 10 to 5): both scientists have the same value of ten for the remaining seven factors. This change in only one factor (F8 is half-given for scientist B compared to scientist A) halves the publication output. Thus, a small variation in the list of factors has an enormous effect on the individual publication numbers (Shockley, 1957). A reduction in two factors from ten to five (see scientist C in comparison to scientist A) reduces the output to only one quarter of that of scientist A. For each paper by scientist C, there are four papers by scientist A. As the result for scientist E shows, if at least one factor is absent for a scientist (even if the scientist has strengths in many factors), no publication can be expected from the scientist. One may speculate that non-publishing scientists are affected by a specific missing factor leading to failure.
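The arithmetic described for Table 1 can be retraced with a short Python sketch of the multiplicative model in formula (1). The factor profiles below are assumptions reconstructed from the verbal description only: which two factors are reduced for scientist C and which factor is absent for scientist E are chosen arbitrarily, and scientist D is omitted because the text gives no values for D.

```python
from math import prod


def productivity(factors):
    """Multiplicative model of formula (1): P is the product of all factors."""
    return prod(factors)


# Assumed factor profiles on the 0-10 scale (F1..F8).
scientists = {
    "A": [10] * 8,           # all factors maximally given
    "B": [10] * 7 + [5],     # F8 reduced from 10 to 5
    "C": [10] * 6 + [5, 5],  # two factors reduced (assumed here: F7 and F8)
    "E": [10] * 7 + [0],     # one factor absent (assumed here: F8)
}

baseline = productivity(scientists["A"])
relative_output = {
    name: 100 * productivity(f) / baseline for name, f in scientists.items()
}

for name, pct in relative_output.items():
    print(f"Scientist {name}: {pct:.0f}% of A's output")
```

Under these assumptions, the sketch reproduces the pattern in the text: 100% for A, 50% for B, 25% for C, and 0% for E.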

On the one hand, the different performances of the fictitious scientists in Table 1 reveal the importance and large influence of each factor that may determine the publication process. On the other hand, the results for the scientists demonstrate how outliers in individual productivity emerge: they result from the success in all factors (Wang & Barabási, 2021). Both conclusions are characteristic of processes in which the AKP governs the outcome: the few “happy” scientists are the successful exceptions who can manage all relevant prerequisites; the many (more or less) “unhappy” scientists have specific factor patterns that lead to (significantly) reduced publication outputs.

The effect of the AKP on publication productivity differences does not only lead to an advantage in individual productivity for the few scientists. Empirical research suggests that these few scientists are not only happy because of their enormous publication output: there are further consequences. Since publication productivity is a decisive factor for career success in science, the advantage in publication output leads to advantages in other areas. For example, several empirical studies have been published in the past that demonstrate the correlation between publication output and citation impact (e.g., Abramo et al., 2010; Tabah, 1999): researchers who publish frequently not only “write the best articles” (Hemlin, 1996, p. 236), but very high citation impact can also be expected for their most cited paper (Wang & Barabási, 2021). The results by Nielsen and Andersen (2021) further reveal that the advantages of top-performing scientists compared to ordinary scientists have increased in recent years.

In the following, some studies are presented in more detail that demonstrate the strong dependency between output and impact. Diem and Wolter (2013) investigated the research performance of education science professors in Switzerland. The authors not only found many professors without any publications and only a few professors with many publications, but also large differences between the professors in citation impact which correspond with output. Larivière and Costas (2015, 2016) published a global study of individual performance including multiple disciplines. The authors report results that “conform to the Mertonian theory of cumulative advantages … the higher the number of papers an author contributes to, the more he or she gets known and, hence, is likely to attract citations” (Larivière & Costas, 2015, p. 594). According to the results by Costas et al. (2009), researchers profit from an increase in numbers of publications, but differently: “researchers in low field-citation-density regions and those whose impact is below world class tend to benefit the most from an increase in number of publications” (p. 750).

Haslam and Laham (2010) investigated the careers of 85 social-personality psychologists about ten years after receiving the PhD. They found that the “expected quality was maximized at about 30 publications (88th percentile) and declined at higher quantities” (p. 219). If individual productivity growth is examined, “growth is more pronounced for high-impact scientists and is modest for low-impact scientists” (Sinatra et al., 2016, p. 597). The authors examined careers of scientists not only from physics, but also several other disciplines. The more pronounced growth for high-impact scientists may be rooted in advantages in early careers. Lee (2019) found that the factor that “most contributed to explaining the future research performance (i.e. publication numbers) and future research impact (i.e. citation counts of publications) was the number of publications (both journal articles and conference papers) produced by the target scientists in their early career years” (p. 1481).

The results of the empirical studies correlating publication output and citation impact demonstrate that the significant advantage in publication output for a few scientists is probably the key factor for a successful scientific career in general: thus, if we deal with publication output, we speak about careers.

In this section, it has been demonstrated that Shockley (1957) proposed the formula to combine different factors for explaining productivity differences between scientists in line with the AKP (without knowing of the AKP). The next section will focus on the question of how the formula can be applied empirically (based on bibliometric data): which statistical procedure can be used in the program outlined in this paper for studying productivity differences?

The empirical analysis of the Anna Karenina principle (AKP) by Boolean logit and probit procedures

In the previous sections, it has been demonstrated how the success of only a few scientists can be traced back to positive outcomes in several factors that are multiplicatively related. Negative outcomes in at least one factor lead to specific failures in publication output. Since publication output is decisive for a successful career in science, failure in publication output entails failure in other important areas for career success such as citation impact. In this section, a statistical procedure is presented that can be used to investigate whether factors are multiplicatively related with respect to a binary outcome. In the case of publication output, the binary outcome could be, e.g., publishing (1) and non-publishing (0) scientists. The binary outcome could also be based on the position of a scientist in a ranking of all scientists in a certain field by publication output: the scientists in the list could belong to the 1% of scientists with the highest productivity (1) or not (0). The Boolean logit and probit procedures introduced by Braumoeller (2003, 2004) model the probabilities of the dependent variable (e.g., probability of belonging to the 1% of scientists with the highest productivity) “as ordinary logit and probit curves, constructs a likelihood function based on the posited logic of their interaction and the observed dependent variable … and maximizes to obtain coefficient estimates” (Braumoeller, 2004, p. 361).
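The logic of such a likelihood can be illustrated with a minimal Python sketch for the conjunctive ("and") case, in which the probability of the outcome is the product of one logit curve per factor. This is a simplified illustration, not the implementation by Braumoeller (2003, 2004): it assumes a single covariate with intercept and slope per factor, the function names, toy data, and parameter values are hypothetical, and the maximization step (which would require a numerical optimizer) is left out.

```python
import math


def logistic(z):
    """Standard logistic curve."""
    return 1.0 / (1.0 + math.exp(-z))


def prob_success(x, params):
    """P(y = 1) under a conjunction: product of one logit curve per factor.

    x      -- list of factor values for one scientist
    params -- list of (intercept, slope) pairs, one per factor (hypothetical)
    """
    p = 1.0
    for x_j, (a_j, b_j) in zip(x, params):
        p *= logistic(a_j + b_j * x_j)
    return p


def log_likelihood(X, y, params):
    """Bernoulli log-likelihood of binary outcomes y given factor data X."""
    eps = 1e-12  # guards against log(0)
    ll = 0.0
    for x_i, y_i in zip(X, y):
        p = min(max(prob_success(x_i, params), eps), 1.0 - eps)
        ll += y_i * math.log(p) + (1 - y_i) * math.log(1.0 - p)
    return ll


# Toy data: two factors per scientist; y = 1 marks the top-productivity group.
X = [[9, 8], [9, 1], [2, 9], [8, 9], [1, 2]]
y = [1, 0, 0, 1, 0]

flat = [(0.0, 0.0), (0.0, 0.0)]        # factors carry no information
informed = [(-5.0, 1.0), (-5.0, 1.0)]  # both factors must be high

print(log_likelihood(X, y, flat), log_likelihood(X, y, informed))
```

In the toy data, only scientists with high values on both factors belong to the top group, so the parameter set in which both factors must be high fits better (higher log-likelihood) than the uninformative one.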

The Boolean logit and probit procedures have been used in several studies in the past. Kroneberg (2012) applied the procedures to investigate acts of help in the rescue of Jews in World War II. Weingartner (2019) investigated cultural consumption (in the form of opera attendance) that may be explained by two decision processes on the individual level: “first, reflexive weighing of cultural preferences and opportunities, and second, automatic activation of cultural situation models and personal norms” (p. 53). The Boolean probit model revealed that each decision process can be regarded as being a sufficient cause for opera attendance. In another study from the science of science sector, Bornmann and Daniel (2005) investigated the relationship of selection criteria and decisions in the committee peer review process of the Boehringer Ingelheim Fonds (BIF, https://www.bifonds.de). BIF is an international foundation for the promotion of basic research in biomedicine that awards long-term fellowships to post-graduate researchers.

For approval or rejection of BIF fellowship applications, three criteria were decisive: the applicant’s track record, the originality of the research project, and the quality of the laboratory in which the applicant wants to pursue the project. A fellowship application could only be accepted if all three selection criteria were rated positively. No substitutability was foreseen: a negative rating on one criterion could not be compensated by a positive rating on another. Bornmann and Daniel (2005) investigated whether these requirements stand up to empirical testing. The authors used the Boolean probit procedure to model the binary outcome rejection or approval of applications including the three criteria for funding. The results of the procedure showed a strong association between positive ratings on the criteria and approval of the application and between negative ratings on the criteria and rejection of the application. Thus, there seemed to be a conjunctural causation of the decisions.

Following the examples of Bornmann and Daniel (2005) and the other previous studies, the effect of the AKP on individual publication outputs can be empirically analyzed using the Boolean logit and probit procedures. The binary dependent variable could be the information whether a scientist belongs to the 1% of scientists in a certain discipline with the most publications. To use the statistical procedure for assessing the probability of belonging to the 1% of scientists with the highest productivity, one needs data and information for the factors which may be decisive for publishing papers. Shockley (1957) lists eight factors (see above): (1) “ability to think of a good problem, (2) ability to work on it, (3) ability to recognize a worthwhile result, (4) ability to make a decision as to when to stop and write up the results, (5) ability to write adequately, (6) ability to profit constructively from criticism, (7) determination to submit the paper to a journal, (8) persistence in making changes (if necessary as a result of journal action)” (p. 286). These factors may be difficult to operationalize empirically. In Sect. “Explanation by Shockley (1957) of individual productivity differences in research laboratories”, several other factors have been worked out from empirical studies such as the number of collaborators, the prestige of PhD granting and hiring institutions, and the prestige of the collaborators.

A study investigating the influence of factor patterns on the productivity of scientists should have two steps. In the first step, it is necessary to identify the factors that are decisive for productivity in a certain discipline or environment. These factors may be found (i) in empirical studies which investigate productivity differences and their correlates based on publication data or (ii) by surveys asking scientists to weight predetermined factors by their importance. In the second step, one needs assessments by a sample of scientists with respect to the factors identified in step one. These assessments can then be analyzed with the Boolean logit and probit procedures.
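Before the second step can be analyzed, the binary dependent variable has to be derived from publication counts. A minimal Python sketch of this coding step follows. The function name, the publication counts, and the handling of ties (scientists tied at the cutoff are all coded 1) are assumptions made for illustration; a real study would have to justify its own cutoff rule.

```python
import math


def top_share_indicator(pub_counts, share=0.01):
    """Return 1 for scientists in the top `share` by publication count, else 0.

    Assumption: scientists tied with the last one inside the cutoff are
    also coded 1, so the top group can be slightly larger than `share`.
    """
    # round() protects the ceiling against floating-point noise in the product
    n_top = max(1, math.ceil(round(share * len(pub_counts), 9)))
    cutoff = sorted(pub_counts, reverse=True)[n_top - 1]
    return [1 if c >= cutoff else 0 for c in pub_counts]


# Hypothetical publication counts for 200 scientists in one field:
# most publish little, two publish a lot.
counts = [1] * 120 + [2] * 50 + [5] * 28 + [40, 75]
y = top_share_indicator(counts, share=0.01)

print(sum(y), "of", len(counts), "scientists coded as top producers")
```

The resulting 0/1 vector could then serve as the dependent variable, with the factor assessments from step two as the independent variables.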

Discussion

The scientific endeavor is characterized by events of great success which are unlikely to occur. This applies not only to publication output, but also to many other activities in science: “like the probability of possessing scientific talent, the probability of having an article highly cited is low” (Seglen, 1992, p. 634). With respect to publication output, it can be empirically observed that only a few scientists are able to publish a substantial number of papers every year; most scientists have an output of only a few publications or no publications at all. Shockley (1957) found that “the more or less normal distribution of the logarithm of rate of publication is characteristic of the statistics of the scientific creative process … the rate of publication increases approximately exponentially from individual to individual, taken in order of increasing rate, and that the differences in rate between low and high producers are very large” (p. 281).
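The link between a multiplicative model and an approximately normal distribution of the logarithm of productivity can be illustrated with a small simulation: the logarithm of a product is a sum of logarithms, and a sum of several independent terms tends toward a normal distribution. The Python sketch below assumes eight independent factors, each drawn uniformly from 0.1 to 1.0; this distribution, the sample size, and the seed are illustrative assumptions and are not taken from Shockley (1957).

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible

N_SCIENTISTS = 10_000
N_FACTORS = 8


def simulated_productivity():
    """Product of eight independent factors, each drawn uniformly from
    [0.1, 1.0] (an illustrative assumption, not an empirical estimate)."""
    return math.prod(random.uniform(0.1, 1.0) for _ in range(N_FACTORS))


output = [simulated_productivity() for _ in range(N_SCIENTISTS)]
log_output = [math.log(p) for p in output]

# Raw productivity is right-skewed: the mean lies well above the median.
print("mean/median of productivity:",
      statistics.mean(output) / statistics.median(output))

# On the log scale, mean and median lie close together relative to the
# spread, as expected for a roughly symmetric distribution.
print("mean - median of log productivity:",
      statistics.mean(log_output) - statistics.median(log_output))
print("standard deviation of log productivity:",
      statistics.stdev(log_output))
```

In such a simulation, a few simulated scientists with high values on all factors account for the long right tail, while the distribution on the log scale is far more symmetric.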

Several theories have been proposed in the past to explain the observable differences in publication output: (1) The “sacred spark” theory attributes the output differences to substantial individual differences in scientists’ properties (e.g., motivation or creativity). (2) In the context of the “accumulative advantage” theory, productivity is seen as a self-reinforcing process in which (very) early publication success leads to later success. (3) According to the “utility maximizing” theory, only scientists in certain career phases (mostly early career phases) are interested in maximizing their publication output.

The three theories focus on different aspects of the publication process for the explanation of productivity differences. In this study, a program is proposed for investigating individual publication processes. The AKP has been introduced as a principle in the program that may underlie and link the three theories. The AKP focuses on certain prerequisites for publication success, and these prerequisites can be linked to the three theories. That means that the prerequisites may be determined (i) by the creativity and motivation of the scientist (“sacred spark” theory), (ii) by the number of publications (in reputable journals) of the scientist in the early career (“accumulative advantage” theory), and (iii) by the current career stage of the focal scientist (“utility maximizing” theory). Depending on the context of the research the focal scientist is involved in, further prerequisites can be added (e.g., good knowledge of statistics and excellent writing skills).

Based on observations of scientists in laboratories, Shockley (1957) proposes that the publication success of a few scientists is the result of positive outcomes in several prerequisites (factors) that are multiplicatively related. If one knows the factors that are important for the publication process in a certain research area, and has the factors’ data for the scientists in that area on hand, one is able to explain differences in publication success and may be able to predict success or failure in publishing papers. Either a negative outcome in at least one factor can lead to an individual failure in publication success, or the failure in the publication process is determined by a certain combination of positive and negative factor outcomes. It is characteristic of multiplicative combinations that the lack of even one factor has great consequences for the overall outcome.
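The consequence of a multiplicative combination can be made concrete with a short arithmetic sketch. It contrasts a multiplicative score with an additive (averaging) score for two hypothetical scientists who are rated between 0 and 1 on eight factors; the ratings and function names are assumptions chosen only for illustration.

```python
import math


def multiplicative_score(factors):
    """Overall score when factors are multiplied."""
    return math.prod(factors)


def additive_score(factors):
    """Overall score when factors are averaged."""
    return sum(factors) / len(factors)


# Two hypothetical scientists: strong on all eight factors versus
# strong on seven factors and weak on one.
strong = [0.9] * 8
one_gap = [0.9] * 7 + [0.1]

for label, factors in (("strong", strong), ("one gap", one_gap)):
    print(label,
          round(multiplicative_score(factors), 3),
          round(additive_score(factors), 3))

# Small advantages on every factor also compound: exceeding a colleague
# by 50% on each of eight factors multiplies the overall score by 1.5 ** 8.
advantage = 1.5 ** 8
print(round(advantage, 1))
```

Under the additive score, the single weak factor lowers the result only from 0.9 to 0.8. Under the multiplicative score, the result drops from about 0.43 to about 0.05, a reduction by a factor of nine.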

The use of the AKP for understanding publication success can yield important insights into success processes in science because it contradicts common assumptions linked to success (Bornmann & Marx, 2012). Success in the publication process is frequently associated with specific characteristics of the successful scientists, and what is unusual about them is emphasized. The study by Ioannidis et al. (2018) is a good example in this respect: the authors asked very productive scientists for possible reasons for their very high productivity (“scientists who publish a paper every five days”). When examining publication success, we mostly do not think about a lack of success, although one would possibly learn more about publication success by interviewing unsuccessful scientists: these scientists could report on the factors that were specifically missing for their success. Considering the AKP when thinking about success in science shifts the focus from the successful scientists to the unsuccessful scientists, each of whom is unsuccessful in his or her own way.

Being successful in publication output is decisive for a successful career in science in general. Research has revealed that this output is closely related to success in other scientific areas and outcomes (e.g., citation impact): it can be expected that failure in publication output entails failure in other areas. Since publication data are available in multi-disciplinary databases (which is mostly not the case for other science outcome data such as funding received), success in publishing papers can be investigated straightforwardly. With the Boolean logit and probit analysis, a statistical procedure has been presented in Sect. “The empirical analysis of the Anna Karenina principle (AKP) by Boolean logit and probit procedures” which can be used to empirically examine whether factors are multiplicatively related with respect to a binary outcome. The binary outcome may separate publishing and non-publishing scientists or very productive and rather unproductive scientists. To identify factors that are decisive for being successful or not in publishing, factors from the scientometrics literature should be collected in a first step. In the second step, scientists can be surveyed on whether the factors apply to them or not. With the Boolean logit and probit procedures, it can then be tested whether the factors are multiplicatively related or not with respect to success or failure.

This study proposed a program for studying individual publication processes and their underlying factors. Many other outcomes in science, such as the acquisition of third-party funds or the citation impact of publications, have distributions that are as skewed as those of publication output. Future studies could focus on these areas to discuss or empirically analyze success and failure based on the AKP, the multiplicative combination of factors leading to success and failure, and the Boolean logit and probit procedures.