Introduction

The traditional single-authored monograph-style doctoral dissertation has declined into relative obscurity in the natural and biomedical sciences (cf. Larivière et al. 2008), and been largely superseded by a modern PhD thesis consisting of a collection of multi-authored manuscripts and publications (e.g. Powell 2004). Although the two formats still coexist and are subject to the same basic academic requirement for the production of an independent and cohesive body of new scientific knowledge (Anonymous 2007a; Wilson 1998), the transition towards dissertation by publication has taken place without resolving the pertinent question of the actual number of publications (Powell 2004), or more specifically the actual amount of authorship credit required for a typical dissertation. For multi-authored papers the answer to this question must include a formula for how credit is to be impartially partitioned between the individual candidate who is awarded the PhD and the coauthors, whose participation has made the dissertation a cooperative effort (cf. Gannon 2006).

Impartial partitioning of authorship credit has hitherto been impractical because routine bibliometric methods for allocating authorship credit are biased by not recognizing differential coauthor contributions (Hagen 2008, 2009, 2010). Such bias is generated either by allocating one full credit repeatedly to all coauthors of a paper (inflated counting), or by dividing one credit equally among all coauthors irrespective of authorship rank (fractional counting). The end result of inflated counting is to award a PhD candidate full credit for a cooperative effort, while the end result of fractional counting is to underestimate the authorship credit for publications where the candidate is a primary author (Hagen 2008, fig. 3 therein).
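The two routine schemes described above can be sketched in a few lines. This is a minimal illustration only; the function names are hypothetical and do not come from the cited papers.

```python
def inflated_credit(n_authors):
    """Inflated counting: every coauthor receives one full credit."""
    return [1.0] * n_authors


def fractional_credit(n_authors):
    """Fractional counting: one credit split equally, ignoring byline rank."""
    return [1.0 / n_authors] * n_authors


n = 4  # hypothetical paper with four coauthors
print(sum(inflated_credit(n)))   # total credit handed out for one paper: 4.0
print(fractional_credit(n)[0])   # first author's share: 0.25
print(fractional_credit(n)[-1])  # last author's share: 0.25
```

Under inflated counting a single paper generates as many credits as it has authors, and under fractional counting the first and last authors receive identical shares, which is the rank-blindness the paragraph above describes.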

In contrast, harmonic counting provides a transparent post hoc protocol for unbiased decoding of the byline hierarchy by implementing three simple ethical criteria for bibliometric partitioning of authorship credit among N coauthors (Hagen 2008, 2010):

  1. one publication credit is shared among all coauthors,

  2. the first author gets the most credit, and in general the ith author receives more credit than the (i + 1)th author, and

  3. the greater the number of authors, the less credit per author.

The process by which a byline hierarchy is established is the subject of careful collegial scrutiny, recurrent controversy and interminable ethical debate which strives to ensure that the published byline reflects the best possible judgment of the relative importance of the contribution made by each coauthor (Anonymous 2009a, b; Costa and Gatz 1992; Cronin 2001; Fine and Kurdek 1993; Geelhoed et al. 2007; Moore and Griffin 2006; Oberlander and Spencer 2006; Sandler and Russell 2005; Spiegel and Keith-Spiegel 1970; Strange 2008). Given that a byline is the consensual end result of such a painstaking pre-publication process to determine authorship inclusion and rank, harmonic counting as defined above is a remarkably straightforward method for decoding byline information into unbiased authorship credit. Furthermore, I have recently shown that harmonic credit scores provide a robust fit to empirical data from studies on perceived notions of the relationship between byline position and authorship credit in medicine, psychology and chemistry (Hagen 2010).

Here I use harmonic counting to analyze retrospectively the actual amount of authorship credit attributable to the individual PhD candidates who graduated from two Scandinavian universities in 2008. I also establish a de facto baseline for the requisite scientific productivity of these contemporary PhD candidates, and present evidence to suggest that a sliding bibliometric baseline may be a side effect of non-quantification.

Materials and methods

Data

The dataset consists of all PhD dissertations completed in 2008 at the Karolinska Institute, Stockholm, Sweden, and all available PhD dissertations completed in 2008 at the University of Tromsø (UT), Tromsø, Norway. Dissertations were classified as traditional monographs when single authored and unpublished, or when consisting solely of unpublished single authored manuscripts or chapters. All papers submitted as part of a modern publication based dissertation, whether published, submitted or unpublished, were included in the bibliometric analysis.

The data for the Karolinska Institute consist of 352 modern doctoral dissertations by publication. (The official count of 353 included a double entry for S. Lundgren.) PDF files of all dissertations were downloaded from the exemplary, up-to-date repository of the Karolinska Institute, which provides immediate online open access to all theses (http://diss.kib.ki.se/search/diss_2008_se.html).

The data for the University of Tromsø consist of 58 modern dissertations from a total of 104 doctoral dissertations completed in 2008 (Table 1). Excluding 32 monographs and a solitary modern thesis from the Faculty of Law left a subtotal of 71 UT dissertations potentially available for bibliometric analysis. Four of these dissertations were unavailable due to restricted access from the depository at the national library in Oslo, and 9 dissertations were still unavailable by April 2009 as they had not yet been registered by the national Norwegian library system BIBSYS (http://www.bibsys.no/norsk/english.php). Only 13 dissertations from the University of Tromsø were available for downloading as PDF files from BIBSYS. The remainder were accessed through interlibrary loan, or directly at the University Library in Tromsø during a visit in February 2009.

Table 1 Bibliometric summary of PhD dissertations at Karolinska Institute and University of Tromsø (UT) in 2008

Partitioning authorship credit

Bibliometrically identifiable authorship credit was partitioned between PhD candidates and their coauthors according to a harmonic counting scheme based on relevant byline information including authorship rank, the number of coauthors and any indication of equal contribution by two or more coauthors (Hagen 2008, 2010). Harmonic credit for the ith author of a publication with N coauthors was calculated as follows:

$$ i\text{th author credit} = \frac{1/i}{1 + \frac{1}{2} + \cdots + \frac{1}{N}} $$

Equal authorship was included in the calculation by summing the harmonic authorship credit of the equal coauthor positions, dividing the sum by the number of equal coauthors, and allotting the result to each. In the dataset, equal authorship was limited to two or three of several coauthors, and no paper indicated that all coauthors were equal. Fractional counting was therefore not a valid option, and would only have introduced bias (cf. Hagen 2010, fig. 3 therein).
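The harmonic formula and the equal-authorship adjustment described above can be sketched as follows. This is an illustrative sketch, not the author's own code: the function name `harmonic_credits` and the `equal_groups` parameter are assumptions introduced here, and exact fractions are used only to keep the arithmetic transparent.

```python
from fractions import Fraction


def harmonic_credits(n_authors, equal_groups=()):
    """Harmonic credit for each byline position (index 0 = first author).

    equal_groups: tuples of 1-based byline positions declared as equal
    contributors (a hypothetical input format chosen for this sketch).
    """
    # Denominator of the harmonic formula: 1 + 1/2 + ... + 1/N
    denom = sum(Fraction(1, k) for k in range(1, n_authors + 1))
    credits = [Fraction(1, i) / denom for i in range(1, n_authors + 1)]
    # Equal authors: sum their credits, divide by group size, allot to each.
    for group in equal_groups:
        shared = sum(credits[p - 1] for p in group) / len(group)
        for p in group:
            credits[p - 1] = shared
    return credits


plain = harmonic_credits(4)
shared = harmonic_credits(4, equal_groups=[(1, 2)])
print([float(c) for c in plain])   # [0.48, 0.24, 0.16, 0.12]
print([float(c) for c in shared])  # [0.36, 0.36, 0.16, 0.12]
```

In both cases the credits sum to exactly one publication credit, so declaring two authors equal redistributes credit between them without affecting the remaining coauthors.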

Senior authorship is conventionally indicated by the corresponding author occupying the last position in the byline. This traditional but controversial practice is common in biomedical research but was apparently not used consistently in publications from the medical PhD programs at Karolinska and UT. Therefore, last authors were not conferred special status in the calculation of authorship credit. Group authors have no recognizable bibliometric status at present and were not included in the calculation of authorship credit.

Statistical tests

Levene’s test was used to test for equal coefficient of variation (CV) in harmonic and inflated authorship credit scores at Karolinska Institute (Van Valen 2005). Both the retention rate for ≥50% authorship credit from the different PhD programs, and the proportion of first authored papers submitted by PhD candidates from the Karolinska Institute were tested for statistical independence using a contingency table chi-square test (Sokal and Rohlf 1995).

Results

Partitioning authorship credit for multi-authored dissertation papers

The total proportion of unbiased harmonic authorship credit retained by PhD graduates ranged from 40% for Karolinska Medical Science, to ≈45% for UT Medical Science and UT Natural Science, to 50% for UT Fisheries Science (Table 1). In contrast, graduates from the UT Social Science program retained 75% authorship credit, and the solitary publication based dissertation from UT Law had no coauthors and therefore retained 100% credit.

The median amount of unbiased harmonic authorship credit retained per dissertation peaked at 2.9 undivided papers for UT Social Science, but was less than two undivided papers for the other PhD programs (Fig. 1a). In contrast, the median number of submitted papers per dissertation was four for all PhD-programs (Fig. 1b), suggesting that authorship rank and the number of coauthors are the main determinants of the observed variation in retained authorship credit per dissertation (Fig. 1a).
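To illustrate how four submitted papers can translate into fewer than two undivided papers, the sketch below sums a candidate's harmonic credit over one dissertation. The byline positions are hypothetical and are not taken from the dataset; the function name is likewise an assumption.

```python
def harmonic_credit(rank, n_authors):
    """Harmonic credit of the author at 1-based byline position `rank`."""
    denom = sum(1.0 / k for k in range(1, n_authors + 1))
    return (1.0 / rank) / denom


# Hypothetical dissertation: (candidate's rank, number of coauthors) per paper.
papers = [(1, 5), (1, 5), (1, 6), (2, 5)]

total = sum(harmonic_credit(rank, n) for rank, n in papers)
print(round(total, 2))                # 1.5 undivided papers
print(round(total / len(papers), 2))  # 0.38 of the available credit retained
```

Even with first authorship on three of four papers, this hypothetical candidate retains about 1.5 undivided papers, which shows how coauthor numbers and rank, rather than paper count, drive the retained credit.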

Fig. 1

Authorship credit per dissertation. a Amount of harmonic authorship credit attributed to graduates from the different PhD-programs. Hypothetical benchmark at two undivided papers per dissertation is indicated by the horizontal red line. b Number of papers submitted by graduates from the different PhD-programs. Common median number of submitted papers per dissertation for all PhD programs is indicated by the black horizontal line. Bold lines represent the median, boxes the 25 and 75% quartiles, and whiskers extend to the 10 and 90% percentiles. The sample includes 1488 papers submitted in 2008 by 352 PhD graduates from Karolinska Institute, and 233 papers submitted by 58 PhD graduates from University of Tromsø (UT)

Multiple authorship was the overall norm, and single authorship was rare except in UT Social Science (48.3%) and UT Fisheries Science (9.8%). The median number of coauthors per submitted paper (Fig. 2) was highest in the Karolinska and UT Medical Science programs (5), intermediate in UT Natural Science and UT Fisheries Science (4), and lowest in UT Social Science (2).

Fig. 2

Coauthors. Number of coauthors on papers submitted by graduates from the different PhD-programs. Most papers had ≥4 coauthors as indicated by the horizontal red line. Bold lines represent the median, boxes the 25 and 75% quartiles, and whiskers extend to the 10 and 90% percentiles. The sample includes 1488 papers submitted in 2008 by 352 PhD graduates from Karolinska Institute, and 233 papers submitted by 58 PhD graduates from University of Tromsø (UT)

First authorship by the graduate dominated on submitted papers from all PhD programs (Fig. 3) and covered a range from 77.7% for Karolinska Medical Science, to 86.5% for UT Medical Science, to 93.1% for UT Social Science (Table 2).

Fig. 3

Authorship rank. First authorship dominates on submitted papers from all PhD-programs. Fractions indicate shared first authorship among 2 (0.5) or 3 (0.33) coauthors. The sample includes 1488 papers submitted in 2008 by 352 PhD graduates from Karolinska Institute, and 233 papers submitted by 58 PhD graduates from University of Tromsø (UT)

Table 2 First authorship patterns and effect of harmonic authorship benchmarks for PhD graduates from Karolinska Institute and University of Tromsø (UT) in 2008

Benchmarking authorship credit

The median amount of retained harmonic authorship credit for 4 of the 5 PhD programs fell below a hypothetical benchmark of two undivided papers per dissertation (Fig. 1a). This benchmark level would disqualify 37.5% of the graduates from UT Social Science, 57.1% from UT Natural Science, 72.7% from UT Fisheries Science, 80% from UT Medical Science, and 81.2% from Karolinska Medical Science (Fig. 4; Table 2).

Fig. 4

Benchmarking authorship credit per dissertation. Setting a hypothetical benchmark at two undivided papers per dissertation (vertical red line) would disqualify approximately 80% of the graduates from the Karolinska Institute (horizontal red line). The sample includes 1488 papers submitted in 2008 by 352 PhD graduates from Karolinska Institute, and 233 papers submitted by 58 PhD graduates from University of Tromsø (UT)

The proportion of dissertations where the graduate retained ≥50% authorship credit was lowest for Karolinska Medical Science (11.7%, Table 2). In contrast, the retention rate for ≥50% authorship credit for UT Medical Science was 24%, and for UT Natural Science 35.7%. UT Fisheries Science had an intermediate value of 27.3% while UT Social Science peaked at 75%. The retention rate for ≥50% authorship credit was not independent of PhD program for the whole sample (Table 2; 4 degrees of freedom, chi square = 23.858, P < 0.0001), nor for the subset consisting of the natural and medical science programs (3 degrees of freedom, chi square = 8.883, P < 0.05). This analysis suggests that authorship credit retention patterns for PhD graduates from the Karolinska Institute are substantially different from the credit retention patterns for PhD graduates from the University of Tromsø. The underlying causes of the observed difference are explored in more detail below.

Karolinska Institute: authorship credit by number of submitted papers

Most PhD candidates from Karolinska Institute submitted 4 papers for their dissertation, 12% of the candidates submitted 5 papers, and a small percentage submitted either 3 papers or ≥6 papers (Fig. 5a). However, the amount of harmonic authorship credit per dissertation was significantly more variable than the inflated credit represented by number of submitted papers (CV for harmonic credit = 30.2, CV for inflated credit = 19.2, Levene’s test P < 0.0001). The amount of harmonic authorship credit attributable to the 79% majority of PhD candidates who submitted four papers each ranged from 0.67 to 2.55 undivided papers (median 1.58), indicating that a candidate’s actual contribution to any one submitted paper ranged from 16.7 to 63.6% (median 39.6%).

Fig. 5

Karolinska Institute: Harmonic authorship credit by number of submitted papers. a Proportion of PhD candidates submitting 3, 4, 5 or ≥6 papers for their dissertation. b Harmonic authorship credit by number of submitted papers. Hypothetical benchmark at two undivided papers per dissertation is indicated by the horizontal red line. Bold lines represent the median, boxes the 25 and 75% quartiles, and whiskers extend to the 10 and 90% percentiles. The sample includes 1488 papers submitted in 2008 by 352 PhD graduates from Karolinska Institute

The median harmonic authorship credit per dissertation increased from 1.0 to 2.21 when the number of submitted papers increased from 3 to ≥6 (Fig. 5b), suggesting that a benchmark level of two undivided papers per dissertation would translate into a requirement for at least six submitted papers per dissertation at the current level of coauthorship.

Karolinska Institute: authorship credit and authorship rank

The proportion of first authored papers submitted by PhD candidates from the Karolinska Institute was significantly higher than would be expected if authorship rank were independent of candidature (chi-square = 483.07, P < 0.0001). First authorship or shared first authorship among two or three coauthors was identified for 80.4% of the 1488 submitted papers (Fig. 6a). Correcting for shared first authorship reduced the level of overall first authorship to 77.7% (Table 2).

Fig. 6

Karolinska Institute: Harmonic authorship credit by authorship rank. a Proportion of submitted papers by PhD candidates whose authorship rank was classified as: single; first; shared first among 2 or 3 coauthors; second and lower. b Harmonic authorship credit attributable to PhD candidates by authorship rank. Hypothetical benchmark at 0.25 credits is indicated by the horizontal red line. Bold lines represent the median, boxes the 25 and 75% quartiles, and whiskers extend to the 10 and 90% percentiles. The sample includes 1488 papers submitted in 2008 by the 352 PhD graduates from Karolinska Institute

Harmonic authorship credit per paper ranged from 0.02, for a paper where the PhD candidate was the 14th of 20 coauthors, to the full amount of one credit for single authored papers (median 0.41). For first authored papers the PhD candidate’s median harmonic authorship credit was 0.48 per paper, and when first authorship was shared the median dropped to 0.29 for two equal first authors, and to 0.21 for three equal first authors. Second and lower ranking authors obtained a median harmonic credit of 0.19 per paper (Fig. 6b).
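The extreme values quoted above follow directly from the harmonic formula, as the short check below suggests. The function name is an assumption made for this sketch, and the first-of-four case is included only as an illustration of a byline that yields the same value as the reported median for first-authored papers.

```python
def harmonic_credit(rank, n_authors):
    """Harmonic credit of the author at 1-based byline position `rank`."""
    denom = sum(1.0 / k for k in range(1, n_authors + 1))
    return (1.0 / rank) / denom


print(round(harmonic_credit(14, 20), 2))  # 14th of 20 coauthors: 0.02
print(round(harmonic_credit(1, 1), 2))    # single author: 1.0
print(round(harmonic_credit(1, 4), 2))    # first of 4 coauthors: 0.48
```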

Setting a hypothetical benchmark of 0.25 as the PhD candidate’s minimal acceptable contribution to a paper would effectively restrict submissions to single and first authored papers, or papers with shared first authorship between two coauthors (Fig. 6b).

Dissertation format

The modern publication based dissertation format dominated in the natural and medical science programs (Table 1), where it ranged from 87.5% for UT Fisheries Science to 100% for UT Natural Science and Karolinska Medical Science, but was less prevalent for UT Social Science and UT Law, and absent from UT Humanities. Conversely, the traditional monograph dissertation was completely absent from the Karolinska Institute and from the UT Natural Science program, and virtually absent from UT Fisheries and UT Medical Science (Table 1). Although still dominant in UT Humanities, and thriving in Law and Social Science at UT, the monograph has definitively yielded its former dominance in the natural and medical sciences where, according to the present sample, it survives merely as an anachronism. As a result, part of the evaluation process has shifted from the supervisors and external examiners to the editors and anonymous peer reviewers of the international scientific community.

Discussion

Harmonic allocation of authorship credit provides an unbiased post hoc bibliometric measure of current PhD requirements (Hagen 2008, 2010), and sets a de facto baseline for the requisite scientific productivity of contemporary PhD candidates from the Karolinska Institute at a median value of approximately 1.6 undivided papers per dissertation (Fig. 1a). This baseline figure reflects the fact that most PhD candidates submitted four papers but retained only 39.2% of the bibliometrically identifiable authorship credit after correction for authorship rank, shared first authorship, and the number of coauthors.

The median productivity of PhD candidates at the University of Tromsø, although comparable to Karolinska for Medical, Fisheries and Natural Science where the median amount of harmonic authorship credit ranged from 1.6 to 1.9 undivided papers per dissertation, was conspicuously higher at 2.9 undivided papers per dissertation for the Social Science program (Fig. 1a). This pattern was a result of higher authorship credit retention among fewer coauthors, and suggests that UT Social Science graduates publish primarily with their main supervisor, UT Fisheries and UT Natural Science graduates publish mainly with their supervisory committee members, and that medical science graduates from both programs tend to publish with members of an extended research group.

Shifting bibliometric baselines

The number of papers per dissertation appears to have decreased since the mid 1990s when Breimer (1996) reported that a random sample of 72 biomedical dissertations from Swedish universities contained 302 published and submitted papers, as well as 111 unpublished manuscripts. These figures give a combined mean value of 5.7 papers per dissertation, compared to a mean of 4.2 papers for the Karolinska Institute in 2008 (Table 1). During the same time period the median number of coauthors per paper increased from four in Breimer’s dataset to five in 2008 (Fig. 2). Breimer (1996) also suggested that a similar increase in the median number of coauthors, from three to four, had taken place during an earlier time period.
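The combined mean for Breimer’s sample can be reproduced from the counts quoted above, assuming that published, submitted and unpublished items are simply pooled:

$$ \frac{302 + 111}{72} = \frac{413}{72} \approx 5.7 $$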

The combined effect of a trend towards more coauthors and fewer papers is a gradual diminishing of the PhD candidate’s share of authorship credit for the dissertation. This situation is analogous to the shifting baseline syndrome of fisheries management (Pauly 1995) and systems ecology (Dayton et al. 1998), whereby referential standards decline as each new generation of observers redefines a slightly reduced baseline as the new norm. The shifting baseline syndrome is relevant in light of the continued international trend towards an ever increasing number of coauthors (Wuchty et al. 2007), and explicit government policies to streamline PhD education while increasing the number of dissertations per annum (Anonymous 2009b).

For doctoral dissertations a simple solution to the shifting baseline syndrome would be to benchmark the amount of unbiased authorship credit deemed necessary for successful completion of a specific PhD program, and then monitor for departures from this level over time. For example, a hypothetical benchmark of two undivided papers per dissertation would disqualify 81.2% of the dissertations from the Karolinska Institute and 57.1% of the dissertations from the UT Natural Science program (Fig. 4). Setting the benchmark at “the equivalence of three single-authored published papers in leading journals” as recently suggested for Helsinki University (Kumpulainen 2008), would disqualify 98% of the dissertations from Karolinska Institute and 86% of the dissertations from UT Natural Science. An alternative approach would be to raise requirements by setting a minimum benchmark at a predetermined level of disqualification, e.g. by disqualifying 15% of current dissertations the minimum requirement would be raised to approximately 1.3 undivided papers per dissertation for Karolinska Institute and 1.5 papers per dissertation for UT Natural Science (Fig. 4).
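The two benchmarking approaches described above (fixing a credit level and reading off the disqualification rate, or fixing a disqualification rate and reading off the credit level) can be sketched as below. The credit values are invented for illustration and are not data from this study; the function names and the simple rank-based percentile rule are likewise assumptions.

```python
def share_disqualified(credits, benchmark):
    """Fraction of dissertations whose credit falls below the benchmark."""
    below = sum(1 for c in credits if c < benchmark)
    return below / len(credits)


def benchmark_for_share(credits, percent):
    """Credit level below which roughly `percent` % of dissertations fall."""
    ordered = sorted(credits)
    index = len(ordered) * percent // 100
    return ordered[min(index, len(ordered) - 1)]


# Hypothetical harmonic credit per dissertation (not data from the study).
credits = [0.9, 1.2, 1.4, 1.5, 1.6, 1.7, 1.9, 2.1, 2.4, 3.0]

print(share_disqualified(credits, 2.0))  # 0.7
print(benchmark_for_share(credits, 20))  # 1.4
```

Monitoring either quantity over successive cohorts would give a direct indication of whether the baseline is drifting.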

Juxtaposing bibliometric benchmarks with current baselines may also assist the development of harmonized guidelines and transparent transnational quality assurance procedures for doctoral programs (Morley et al. 2002; Rauhvargers et al. 2009) by providing a robust and meaningful standard for further exploration of the causes of intra- and inter-institutional variation in the amount of unbiased authorship credit per dissertation.

Principal, minimal and senior authorship

Principal authorship, as required in some PhD regulations, is an undefined entity in terms of authorship credit. Taking it to mean ≥50% authorship credit per paper, as occasionally specified in university guidelines (Anonymous 2007b; Kumpulainen 2008), would imply a request for first authorship on papers with no more than three coauthors (Hagen 2008, fig. 2A therein). At Karolinska Institute only 16.2% of the 1488 papers submitted in 2008 qualified for ≥50% harmonic authorship credit.

Inclusion of papers with a minimal contribution by the PhD candidate is frequently discouraged but rarely quantified (e.g. Anonymous 2007b; Kumpulainen 2008). Defining a minimal contribution as at least 25% harmonic authorship credit would imply first authorship on papers with no more than 30 coauthors, or second authorship on papers with no more than three coauthors (Hagen 2008, fig. 2A therein). At the Karolinska Institute 80.4% of papers submitted in 2008 qualified for ≥25% harmonic authorship credit.
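The coauthor limits implied by the 50% and 25% thresholds can be checked by searching for the longest byline at which a given rank still meets the threshold. This is a minimal sketch; the function names and the search limit are assumptions made here.

```python
def harmonic_credit(rank, n_authors):
    """Harmonic credit of the author at 1-based byline position `rank`."""
    denom = sum(1.0 / k for k in range(1, n_authors + 1))
    return (1.0 / rank) / denom


def max_authors(rank, threshold, limit=1000):
    """Largest byline length at which `rank` still earns >= threshold.

    Returns None if the threshold is not met even on the shortest
    possible byline (n_authors == rank).
    """
    best = None
    for n in range(rank, limit + 1):
        if harmonic_credit(rank, n) >= threshold:
            best = n
        else:
            break  # credit only decreases as the byline grows
    return best


print(max_authors(1, 0.50))  # first author, >=50% credit: 3
print(max_authors(1, 0.25))  # first author, >=25% credit: 30
print(max_authors(2, 0.25))  # second author, >=25% credit: 3
print(max_authors(2, 0.50))  # second author can never reach 50%: None
```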

Senior authorship (last author), although still recognized on a par with first authorship in some biomedical subfields (Wren et al. 2007), is controversially associated with unwarranted byline inclusion (e.g. Strange 2008; Ward 1994) and was not unequivocally identifiable in the present dataset due to inconsistent usage. For applicable papers the overall effect of not including senior authorship in the quantification of authorship credit is to overestimate the contribution of PhD graduates by an amount equivalent to the difference between sole first authorship and shared first authorship (Fig. 5) (Hagen 2008, fig. 5 therein).

Conclusions

Harmonic allocation of authorship credit provides an unbiased bibliometric measure of current PhD requirements, and sets a de facto baseline for the requisite scientific productivity of a contemporary PhD at a median value of approximately 1.6 undivided papers per dissertation. Comparison with previous census data suggests that the baseline has shifted over the past two decades as a result of a decrease in the number of submitted papers per candidate and an increase in the number of coauthors per paper. Setting a hypothetical benchmark at two undivided papers per dissertation would disqualify almost 80% of the dissertations in the sample.