Introduction

Since the introduction of the h-index (Hirsch 2005), methods that calculate impact indices from an appropriately selected smaller subset of the total publication set have been preferred for assessing the publication impact of scientists or journals (Vinkler 2009, Schreiber 2010, Bornmann et al. 2011, Schreiber et al. 2012, Wildgaard et al. 2014, Todeschini and Baccini 2016). These appropriately selected smaller sets may be termed elite or core sets (Vinkler 2017a, b, 2019).

The application of elite sets involves the assumption that indicators derived from an exclusive part of the publications, those with the highest number of citations, may represent the relative impact of the results more appropriately than indicators obtained from the total. This practice is in accordance with one of the basic paradigms of scientometrics: citations indicate impact, and more citations indicate greater impact.

It is well known that the distribution of citations over journal papers is skewed (Seglen 1992). Accordingly, the highly cited papers in the elite set may be regarded as relatively most influential within the corresponding total set. There are several methods for obtaining elite sets, e.g., h, g, and π-statistics, or applying a certain percentage of the total (Vinkler 2007, 2019).

J. Hirsch, who introduced the h-index (Hirsch 2005), is a university professor of physics. He published altogether 272 journal papers up to 03 06 2019. Until that date, his papers received 19,304 citations. The mean citation frequency of the papers is equal to 71.23. Taking into account all journal papers of Hirsch, independent of their topic, his h-index is 58 (see WoS). The mentioned author initiated a revolution in the development of scientometric impact indices by suggesting the h-index.

Hirsch’s first scientometric paper (Hirsch 2005) obtained 3633 citations up to the mentioned date. Accordingly, we would think he was one of the most influential scientists in the field. However, surveying the publication list of the mentioned author in WoS, only 5 of his 272 papers may be classified as scientometric publications. The citation counts of these papers are as follows: 1/(published in 2005): 3633, 8/(2007): 434, 31/(2010): 123, 145/(2014): 25, and 249/(2019): 0. (The number preceding each slash is the rank of the paper by citation among all 272 papers of the mentioned author.) Accordingly, the h-index of these papers, i.e. the scientometric impact of J. Hirsch, would be equal to 4, which seems to be a rather low value. This is because the value of the h-index cannot exceed the number of publications in the set studied, and it does not take into account the total number of citations obtained (Vinkler 2007, 2017a).

In contrast, calculating the π-index (Vinkler 2009) of the scientometric papers of Hirsch yields a relatively high value. Hirsch published five scientometric papers up to 03 06 2019. Accordingly, the number of his π-core papers is √5 = 2.24 (rounded to 2), see Table 1. To calculate the π-index, we sum the number of citations to his π-core papers and divide the sum by 100. Accordingly, Hirsch’s π-index = 0.01 · (3633 + 434) = 40.67. This value seems to be rather high compared to that for well-known scientometricians (see Table 1 and Vinkler 2017b). (In Table 1 the π-index of papers of Hirsch in the scientometric part-set is 39.81. The reason for the difference (40.67 vs. 39.81) is the different date (03 06 2019 vs. 30 01 2019) for obtaining the citation data.)
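The two calculations above can be reproduced with a minimal sketch. The function names are illustrative, and rounding the π-core size to the nearest integer is an assumption based on the "2.24 (rounded to 2)" remark in the text.

```python
import math

def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

def pi_index(citations):
    """One hundredth of the citations to the top sqrt(P) papers."""
    ranked = sorted(citations, reverse=True)
    # Assumption: the pi-core size is sqrt(P) rounded to the nearest integer.
    core_size = int(math.sqrt(len(ranked)) + 0.5)
    return sum(ranked[:core_size]) / 100

# Citations to Hirsch's five scientometric papers as of 03 06 2019 (from the text).
hirsch_scientometric = [3633, 434, 123, 25, 0]

h_value = h_index(hirsch_scientometric)    # 4
pi_value = pi_index(hirsch_scientometric)  # 40.67
print(h_value, pi_value)
```

The h-index is capped by the number of papers (five here), whereas the π-index grows with the citations to the two π-core papers.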

Table 1 Number of publications (P), h-index (h), and π-index (π) for the total set (T) and its elite part-sets (Scientometrics: S and other sciences: O, respectively) of some scientometricians

The given example makes it clear that the π-index favours scientists with a high number of citations to their π-core publications, even if they have published a relatively low number of papers.

The above example makes it reasonable to separate publications in the complex set of researchers working on more than a single science field by part-sets when evaluating their activity. Separating publications according to research fields finds support in the fact that the bibliometric features (i.e. mean number and age of references in papers, yearly number of the publications, etc.) are rather different by the field (Vinkler 2010). It is obvious that criteria for selecting the papers in a complex (“mixed”) set by parts should be determined according to the goals of the assessment.

As criteria for classifying the publications within a complex set, different publishing periods, cooperating partners or affiliations of the authors, etc. may also be applied.

As complex publication sets, the sets consisting of two or more part-sets of publications with different bibliometric features may be regarded. From this definition of complex sets it follows that a publication part-set, i.e. part of a complex publication set selected by scientific disciplines or fields or subfields (or other criteria) would contain bibliometrically (or by other aspects) more or less consistent publications.

It seems to be obvious, if we were interested in determining the impact of publications of a physicist on physics, we should not calculate with his or her possible other publications on e.g. economics or history. In addition, for comparing scientific production of individuals working on different fields, we have to select different reference standards. Anyway, the goals of the evaluation should determine the possible classification of the papers in complex sets.

The methods for obtaining elite sets may be distinguished according to the standards applied. The calculations may apply inside or outside standards.

Inside publication standards are reference values for publication assessments calculated from the data of the studied set. One such value is the h-index of the studied publication set, which represents the number of papers in the h-core. We may also use the number of π-core papers, which is equal to the square root of the total number of papers, as the number of elite papers, or we may count the papers in the studied set that obtained more than an arbitrary number (e.g. 50 or 100) of citations. There are of course other possibilities as well, e.g. the number of papers that are cited more heavily than the mean, or the most cited 10% of papers in the set.

Outside (field) standards are reference values obtained from the data of a set selected as reference. Such a reference set may be, e.g., the set of papers in a single journal, in several journals, or in all journals of the field whose topic is related to that of the papers studied. As a standard, e.g., the set of papers with a higher number of citations than the mean citation frequency (C/P) of the papers in a selected journal or in the journals of the field may be applied. Accordingly, we may calculate the number of publications from the set studied that fall within the reference set of papers selected according to the mentioned method.

The aims of the present study:

  • comparing the number of journal papers in the elite publication sets selected by different methods and comparing the number of citations to those papers,

  • comparing the h and π index calculated for complex (total) publication sets with the sum of h and π indices, respectively obtained for two part-sets within the corresponding complex set, separately.

Methods and data

Model 1 in Table 2 demonstrates one of the main research questions of the present study. It is assumed that the complEx publication set (E) of a scientist consists of 8 publications, which can be attributed to two different subject fields: a1 and a2. The papers are listed by decreasing number of citations. The h-index of set E is four because there are four papers with four or more citations. Let us suppose that the part-sets (a1 and a2) derived from the parent set E contain five and three papers, respectively. Accordingly, the h-index for set a1 = 3 and for set a2 = 2. The sum of the h-indices of the part-sets, h(a1) + h(a2) = 5, is higher than the h-index of the parent set, h(E) = 4. This observation suggests dividing set E into part-sets if we are interested in the impact of the papers on field a1 or on field a2 separately.
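The situation described in Model 1 may be illustrated with a short sketch. The citation counts below are hypothetical: they are not the values of Table 2, and were chosen only so that they give the same indices as stated in the text (h(E) = 4, h(a1) = 3, h(a2) = 2).

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

# Hypothetical citation counts, chosen to match the indices quoted for Model 1.
a1 = [6, 4, 3, 1, 0]  # five papers attributed to field a1
a2 = [5, 4, 2]        # three papers attributed to field a2
E = a1 + a2           # the complex (parent) set of eight papers

h_E = h_index(E)                      # 4
h_sum = h_index(a1) + h_index(a2)     # 3 + 2 = 5
print(h_E, h_index(a1), h_index(a2), h_sum)
```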

Table 2 Model 1: Calculating the sum (H) of h1-index and h2-index of two part-sets (a1 and a2, respectively) from the same parent set (E)

For the present study, 10 scientists were selected who published papers both in scientometrics (S) and in other field(s) (O) (Table 1). Scientometrics was selected because the author has been active in this field for several years. The publications were classified as scientometric or “other” according to their title and abstract. Each abstract was surveyed individually if the title of the paper did not yield enough information for the decision. Consequently, not only publications in journals classified as library and information science (LIS) by WoS were taken into account as scientometric publications, but also papers in any other journal (e.g. Research Policy, Nature, Analytical Chemistry, etc.) which could be classified as scientometric. In addition, of course, not all papers published in LIS journals were accepted as scientometric publications. The papers were collected by the name of the authors from WoS. Scientometric, bibliometric, webometric and altmetric papers were all classified as “scientometric”. Papers on, e.g., physical, chemical, astronomical, astrophysical, medical, and biological topics and on library science were attributed to “other” fields.

The number of and citations to the papers were collected from Clarivate Analytics Web of Science All Databases 1975–2018 on 30 January, 2019. The field standard of citation frequency (Cf/Pf) was calculated from data of Scientometrics on 01 September, 2019. In 1975–2018 Pf = 5,542 papers were published in the periodical to which 87,409 citations (Cf) were received in the same period. Accordingly the citation frequency: Cf/Pf = 15.77 (rounded: 16). The standards applied in the paper, i.e. number of citations by paper are the following: 2.5Cf/Pf = 40 and 5Cf/Pf = 80. The paper in Scientometrics ranked as (P/100)th = 55, obtained 150 citations in the mentioned period. Accordingly, those papers of the studied scientists that obtained more citations than 150 are counted as papers belonging to the elite set of this type.
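The arithmetic behind the field standards can be checked with a few lines. This is only a sketch of the calculation described above; the variable names are illustrative, and it assumes, as the quoted values 40 and 80 suggest, that the thresholds are derived from the rounded mean (16) rather than from 15.77.

```python
# Data for the journal Scientometrics, 1975-2018, as quoted in the text.
C_f = 87_409  # citations received
P_f = 5_542   # papers published

mean_rate = C_f / P_f            # about 15.77
rounded_mean = round(mean_rate)  # 16

# Assumption: thresholds are computed from the rounded mean.
threshold_2_5 = 2.5 * rounded_mean  # 40.0
threshold_5 = 5 * rounded_mean      # 80

# Rank of the paper that closes the top 1% of the journal: (P/100)th.
top_percent_rank = P_f // 100       # 55

print(round(mean_rate, 2), rounded_mean, threshold_2_5, threshold_5, top_percent_rank)
```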

For obtaining inside standards (i.e. standards calculated from data of the publication set analysed) the following methods were applied:

Methods taking into account the size of the publication set analysed (P: number of publications) for obtaining the number of papers in the elite set (Pe):

  • π-statistics: P(e) = P(π) = √P, or 2P(π) = 2√P, or 3√P

  • logarithmic statistics: P(e) = 2 logP

  • percentage statistics: P(e) = 0.1P

A method taking into account the rank of papers by citation and number of citations obtained to the individual papers:

  • h-statistics: P(h), i.e. number of papers in the h-core.
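The size-based inside standards listed above can be sketched as follows. The base of the logarithm is not stated in the text; base 10 is assumed here because it is consistent with the relation P(π) > 2logP reported later. The values are returned unrounded, since the rounding convention is given only for the π-core.

```python
import math

def elite_set_sizes(p):
    """Unrounded elite-set sizes for a set of p papers (size-based standards)."""
    root = math.sqrt(p)
    return {
        "sqrt(P)": root,
        "2*sqrt(P)": 2 * root,
        "3*sqrt(P)": 3 * root,
        "2*log10(P)": 2 * math.log10(p),  # assumption: logarithm to base 10
        "0.1*P": p / 10,
    }

# Illustrative set size, not taken from the tables.
sizes = elite_set_sizes(100)
for name, value in sizes.items():
    print(f"{name}: {value:.2f}")
```

The h-core size is not included because it depends on the citation distribution, not only on P.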

Methods applying outside standards (here: field standards, e.g. those papers published in a special journal or all papers published on the corresponding field during a selected period):

  • Number of papers in the set studied with number of citations 5 times (or 2.5 times) higher than the mean citation rate (Cf/Pf = 80 and 40, respectively) of journal papers in Scientometrics in the studied period. (Cf is the total number of citations whereas Pf is total number of papers.)

  • Number of papers in the analysed set with equal to or higher than 150 citations.
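Counting the papers that reach an outside standard is a simple threshold test, sketched below. The thresholds 40, 80, and 150 come from the text; the comparison "equal to or higher than" is an assumption applied to all three. The citation counts are those of Hirsch's five scientometric papers as of 03 06 2019, so the counts need not coincide with Table 3, which uses data from 30 January 2019.

```python
def count_at_least(citations, threshold):
    """Number of papers with at least `threshold` citations."""
    return sum(1 for c in citations if c >= threshold)

# Citations to Hirsch's five scientometric papers as of 03 06 2019 (from the text).
papers = [3633, 434, 123, 25, 0]

# Field standards derived from Scientometrics: 2.5*Cf/Pf, 5*Cf/Pf, (P/100)th paper.
elite_counts = {t: count_at_least(papers, t) for t in (40, 80, 150)}
print(elite_counts)
```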

It is obvious that studying relations between part sets and complex sets several indicators may be selected. In this study two elite set indicators, namely h-index and π-index are chosen. The former may represent most h-type indices, whereas π-index may refer to the elite set indices, which are obtained by simple mathematical calculations (e.g. 1% or 10% of total papers). Table 3 makes it possible to compare the number of papers in h-core and π-core with the size of elite sets obtained by other methods.

Table 3 Number of papers in different elite sets (P(h), P(π), etc.) of the scientometric publications (part-sets S in Table 1) of some scientometricians

The author was not interested in attributing papers to other topics or disciplines outside scientometrics. This way the present study may refer to “two part-sets systems”, i.e. papers within a complex (mixed) set consisting of publications on scientometrics and on “other” topics. The study with more than two part-sets may be more complicated. The “two part-sets system” presented here may be regarded as the starting step toward evaluating more complex publication systems of scientists through separating the publications according to different topics, time-periods, affiliations, cooperating partners, etc.

The statistical analyses were performed with STATISTICA data analysis software system version 13, TIBCO Software Inc. Palo Alto, USA.

Results

Complex sets and part-sets

Table 1 shows the number of publications (P), h, and π-index of the scientometricians analysed. The number of papers in the complex (total) set ranges from 30 (MacRoberts) up to 263 (Braun) and 272 (Hirsch). The ratios of the number of papers in the part sets (scientometrics, S related to “other” fields, O) range from 0.02 (Hirsch) and 0.06 (Seglen) up to 3.93 (Vinkler) and 9.80 (VanRaan). There are individuals publishing papers almost exclusively on non-scientometric topics (e.g. Hirsch, Seglen, Kosmulski), whereas other scientists publish similar number of papers in both scientometrics and in other fields (e.g. Braun or Moravcsik). There are researchers with many more scientometric papers than publications on other topics (VanRaan and Vinkler).

It is worth mentioning that the Spearman correlation coefficient between the number of papers (P) in the complex set and h-index of the studied scientists is 0.79, and that between P and π-index: 0.72, whereas the correlation coefficient between h-index and π-index is 0.83. The mentioned coefficients are significant at p ≤ 0.05.

Both h and π-index reflect scientific excellence. Both indices depend on the number of publications and citations and on the distribution of citations over publications. Nevertheless, the h-index lays greater stress upon the number of publications than the π-index. According to Iglesias and Pecharromán (2007) and Schubert and Glänzel (2007) there is a strong linear correlation between h-index and the product: [k·(C/P)^(2/3)·P^(1/3)], where C is the total number of citations to P papers in the set analysed and k is a constant. In contrast, the π-index lays greater stress upon the number of citations to the most frequently cited papers than the h-index, as it calculates with the number of citations to the top P^(1/2) papers multiplied by 0.01.

Table 1 shows both the h-index and π-index of papers of the studied scientists also by part-sets. There are scientists with significantly higher indices for their non-scientometric publications, h(O) or π(O) > h(S) or π(S), respectively, like Abt, Hirsch, Kosmulski, and Seglen, whereas other individuals (Braun, MacRoberts, Pouris, VanRaan, and Vinkler) show higher indices for their scientometric papers than that for other topics. Moravcsik seems to be an exception. His h-index is higher for papers in non-scientometric field (theoretical physics) than that in scientometrics (O: 13 vs. S:11). In contrast to this, his π-index is lower in physics than in scientometrics (O: 2.98 vs. S: 6.07). Taking into account the similar number of papers in the part-sets (S: 88 vs. O: 77), the higher π-index may indicate a greater influence on scientometrics than on theoretical physics. Naturally, for drawing the mentioned conclusion, we should assume similar bibliometric features for both fields.

The difference between h(O) and h(S), or between π(O) and π(S), should be influenced by the different number of papers in the part-sets. The scientists publishing more scientometric papers than papers on other topics (Braun, MacRoberts, Pouris, VanRaan, and Vinkler) show higher h(S) and π(S) values than h(O) and π(O), respectively, whereas those with fewer papers in scientometrics than in other fields (Abt, Hirsch, Kosmulski, and Seglen) show lower ones.

Table 1 also contains the means and standard deviation (SD) values of the indicators. Surveying the SD values, it is obvious that the differences between means cannot be significant in most cases. One of the reasons is that the number of cases (10) is relatively low. The difference is significant, however, between the mean number of papers in the complex set, P(E), and mean h(E) (p = 0.0003) and mean π(E) (p = 0.0007). Further, the difference is significant between the mean h-index of the total set, h(E), and the mean h-index of the scientometric part-set, h(S) (p = 0.011).

No significant difference was found between the mean h and π-index of the complex set, h(E) and π(E) and the corresponding mean of sum of indices of the part-sets, H(S + O), Π(S + O), respectively. Nevertheless it is not surprising taking into consideration the high SD values (Table 1). The high SD values may be caused by the following factors: different bibliometric features of the fields where the authors are active, different number of papers of the individuals on the different fields, and the relatively low number of scientists studied.

In contrast to the mentioned general tendency of significance, it seems to be relevant to take into account the difference between the h or π indicator referring to the complex (total) set and the sum of the indicators derived as cumulative number of the index of the corresponding part-sets on individual level. Naturally, the difference mentioned may refer to any other impact indicator.

To clarify the relations between the mentioned indices, Models 2 and 3 yield some more information.

Calculating elite subsets by different methods

The size of the elite sets of scientometric papers of the scientists obtained by inside and outside standards are shown in Table 3. The data indicate the following conclusions:

  • P(h) > P(π)

  • 2P(π) > P(π)

  • P(h) and P(π) > 2logP

Earlier studies maintain the statement: P(h) > P(π) (Vinkler 2011, 2017a, b). Relations: 2P(π) > P(π) and P(π) > 2logP are obvious.

Figure 1 shows the percentage rate of papers in the elite sets related to the total number of scientometric papers of the studied scientists. The rank of scientometricians on the x-axis reflects the decreasing number of citations in the h-core of their papers.

Fig. 1 Per cent of papers in different elite sets related to the total of the scientometricians studied. The decreasing number of citations in the h-core gives the rank of the scientometricians

The highest share can be obtained in most cases by h-statistics. The dynamic range of P(h) indices: 12.50% (Moravcsik) – 81.82% (Seglen). The shares obtained by 2P(π) method show similar values: 16.00% (Braun) - 80.00% (Hirsch). The percentage rate strongly depends on the distribution of citations over the papers.

The Spearman correlation coefficients of the indices are given in Table 4. The data show that the number of scientometric papers, P(S), in the set correlates significantly (at p ≤ 0.05) with the size of elite sets obtained by inside standards: P(h), P(π), 2P(π), 2logP, 3√P, 0.1P (r = 0.73–0.99). However, P(S) does not correlate significantly with the size of elite sets obtained by outside standards (ci ≥ 150, ci ≥ 80, and ci ≥ 40). The sizes of the elite sets obtained by inside standards correlate significantly with each other (0.73–1.00). The same is valid also for the correlation between 5Cf/Pf and 0.1Pf (r = 0.96), 2.5Cf/Pf and 0.1Pf (r = 0.74), and 2.5Cf/Pf with 5Cf/Pf (r = 0.88). All the other correlations between the sizes of elite sets calculated with outside standards are not significant at p ≤ 0.05.

Table 4 Spearman rank correlational coefficients between number of papers, P(S) in the scientometric part-set (see Table 1) and that in the corresponding elite sets, P(h), P(π), etc. (see Table 3)

The size of elite sets (i.e. number of papers in the elite sets) should not be regarded as an impact measure, a priori. The number of papers in the π-core, i.e. P(π) of the publication set of individuals shows only quantitative aspects. However, the size of the h-core, P(h) (i.e. number of papers in the h-core) is equal to the h-index which may be regarded as an impact index.

For obtaining comparable indices for characterizing impact of papers in elite subsets, the number of citations to the papers in the elite subset may be recommended. The number of citations in the elite subsets calculated by different methods is given in Table 5.

Table 5 Number of citations in different elite sets of the scientometric part-set (S) of publications of the studied scientists (see Table 1)

Figure 2 shows the number of citations in the different elite subsets by the scientists studied. The rank number of scientists (1, 2, 3, etc.) is the same as in Fig. 1. The data reveal that in most cases the number of citations is highest in the P(h) or in the 2P(π) set. The rank of the elite subsets by the number of citations strongly depends on the distribution of the publications by citation.

Fig. 2 Number of citations to the publications in the elite sets of journal papers of the studied scientists. The decreasing number of citations in the h-core gives the rank of the scientometricians

The number of citations (C) to the papers in the individual elite sets (h-core, π-core, 2logP-core, 3√P-core, etc.) correlated with the rank number of the scientists by the corresponding number of citations to the papers in the individual elite sets, yields negative exponential functions. Some of the equations are given as follows: C(h) = 6949.547·e^(−0.390·r(h)) (see Fig. 3) and C(π) = 5661.205·e^(−0.401·r(π)), where C(h) and C(π) are the number of citations to h-core and π-core papers, respectively, whereas r(h) and r(π) are the rank number of scientists according to h-index and π-index, respectively. The corresponding equation for the citations (C) in the elite set obtained through the field standard C ≥ 5Cf/Pf, with rank number r: C = 7065.067·e^(−0.522·r).

Fig. 3 Number of citations in the h-core of publications of the studied scientometricians correlated with the rank number of the scientists by the number of citations to the h-core papers

Spearman correlation coefficients (Table 6) between the numbers of citations in the individual elite sets reveal close correlation independent of the calculation method. In contrast to the correlations between the number of papers in the elite subsets (Table 4), in Table 6 all correlation coefficients (r = 0.93–1.00) between the number of citations obtained by either inside or outside standard are significant at p ≤ 0.05. The correlation coefficient between the number of citations to the h-core and π-core is very strong: r = 0.99.

Table 6 Spearman rank correlational coefficients between the number of citations in the elite sets, P(h), P(π), etc. (see Table 5) of the corresponding scientometric part-set (see Table 1)

Model 2: Relating h-index and π-index of part-sets to h-index and π-index of the complex set

Table 1 reveals that the h(S)-index of the part-set containing exclusively scientometric papers is in all cases lower than the h(E)-index of the complex set, except for VanRaan (the value of both indices is 23). The difference between the indices may be extremely high (e.g. Hirsch: 4 vs. 58 or Seglen: 9 vs. 61) or lower (e.g. Moravcsik: 11 vs. 16). The π-index of the part-set of scientometric papers (S) is also lower in most cases than that for the corresponding complex (total, E = S + O) set. In two cases, however, the indices are equal (MacRoberts: 9.33 vs. 9.33 and VanRaan: 10.87 vs. 10.87). The difference is extremely large for Hirsch (39.81 vs. 105.69) and for Seglen (16.99 vs. 103.54).

The very large differences in the impact indices may be attributed to the very high number of citations of the mentioned authors obtained to “other”, non-scientometric papers.

The sums of the h-indices and π-indices of the part-sets, H(S + O) and Π(S + O), respectively, i.e. the cumulative indices, are in all cases higher than the corresponding value of the complex set (Table 1). The ratios H(S + O)/h(E), however, may be different: e.g. Abt: 42/30 = 1.40, whereas Seglen: 67/61 = 1.09.

The present study may be regarded as a first step for exploring relations of indicators calculated from data of complex sets containing part-sets. The number of the studied scientists is low, therefore most of the means in Table 1 do not significantly differ (at p < 0.05) from each other. However, the mean h-index for the complex sets, h(E) = 30.30 of the studied scientists significantly differs (p = 0.01) from the mean h(S) = 13.80 which refers to the scientometric part-sets. Nevertheless, the mean cumulative h-index, H(S + O) = 37.20 does not significantly differ (p = 0.37) from the mean h-index of the complex set, h(E) = 30.30. The relation between the mean cumulative π-index Π(S + O) = 37.23 to the corresponding index for the complex set, π(E) = 31.52 is similar. Although the difference of the mean h(E) and H(S + O) and π(E) and Π(S + O) referring to the set of individuals in Table 1 is not significant, on the individual level there are great discrepancies. E.g. Abt: h(E) = 30 versus H(S + O) = 42 and similarly: Braun: h(E) = 32 versus H(S + O) = 48, or Moravcsik: h(E) = 16 versus H(S + O) = 24. The situation is similar also for the π-index. E.g. Hirsch: π(E) = 105.62 versus Π(S + O) = 150.02. This observation may indicate the application of the cumulative index (sum of indices obtained for the individual part-sets) for evaluating general (life) performance of individuals, instead of assessing impact of their publications by indicators obtained from the complex publication set containing publications from different fields.

The size of the difference between h(E) and H(S + O) or π(E) and Π(S + O) depends on several factors, e.g. the number of papers in the different part-sets, differences in the bibliometric features of the corresponding fields, and the scientific level, relevance and timeliness of the papers. The studied scientists are active in different fields outside scientometrics. This may be one of the reasons that the corresponding means of the total set of scientists do not significantly differ. Nevertheless, this observation should not lead us to ignore the classification of the publications of individuals according to fields. Surveying the h(E) and H(S + O) or π(E) and Π(S + O) data of the scientists listed in Table 1 indicates that classifying the papers in the complex sets of individuals and assessing them separately would be highly relevant.

In order to clear some relations between the sum of h and π-indices of the part-sets and the h and π-index of the corresponding complex set, a model experiment was made.

Table 7 shows Model 2 with several examples for calculating h and π-index of part-sets with different number of papers derived from the corresponding complex sets. The question: what is the relation between the h-index of the complEx set, h(E) and the sum of h-indices of the corresponding part-sets (a1 and a2, b1 and b2…f1 and f2)? Similarly, what is the ratio between the sum of π-indices referring to the corresponding part-sets and the π-index of the complex sets?

Table 7 Model 2: Calculating h and π index and their sum for part-sets (a1, a2, b1, b2…f1, f2) derived from a common Parent Set (E) containing 9 papers

Model 2 shows a publication set (“complex” set: E) consisting of 9 papers which received a total of 42 citations. The papers (1–9) are ranked according to decreasing number of citations. The h-index of the complex set is 5 because there are 5 papers in the set obtaining 5 or more citations. The π-index, π = 0.01C(√P) is 0.24 because there are √9 = 3 papers in the π-core, and the number of citations obtained by the first three papers is equal to: 9 + 8 + 7 = 24.
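The indices of the complex set in Model 2 can be recomputed with a short sketch. Table 7 is not reproduced here, so the full citation list below is an assumption: only the three highest values (9, 8, 7), the total (42), and the h-index (5) are stated in the text, and the remaining values were chosen to be consistent with them.

```python
import math

def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

def pi_index(citations):
    """One hundredth of the citations to the top sqrt(P) papers (size rounded)."""
    ranked = sorted(citations, reverse=True)
    core_size = int(math.sqrt(len(ranked)) + 0.5)
    return sum(ranked[:core_size]) / 100

# Assumed citation counts for the 9 papers of set E (only 9, 8, 7 are given in the text).
E = [9, 8, 7, 6, 5, 4, 2, 1, 0]

total_citations = sum(E)  # 42
h_E = h_index(E)          # 5
pi_E = pi_index(E)        # 0.24
print(total_citations, h_E, pi_E)
```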

Case “A” in Table 7 represents two part-sets (a/1, a/2, differing from each other by research field) of which total number of papers is equal to that in the complex set: E(P) = 5 + 4 = 9. It is assumed that the number of citations to the individual papers in the corresponding part-sets is the same as that in the complex set. Table 7 shows, the value of h-index is equal to three for both a/1 and a/2, accordingly the sum: H = 3 + 3 = 6 is higher than the h-index of the complex set: h(E) = 5. The situation is similar for cases B and C.

It is obvious that the sum of the h-indices of two part-sets cannot be lower than the h-index of the parent complex set. Naturally, this relation is valid only if the sum of papers in the part-sets equals the number of papers in the complex set, and if the number of citations to the individual papers is the same in both the complex set and the corresponding part-set. If one of the part-sets contains only a single paper with zero citations (see case f/1 in Table 7), so that the h-index of this part-set is equal to zero, the sum of indices of the part-sets, H = (h1 + h2) = (0 + 5) = 5, will be equal to the h-index of the complex set, h(E) = 5.

In cases D, F, and G the sum of h-indices of the part-sets is equal to the h-index calculated for the complex set.

Taking into account that the sum of h-indices of the part-sets can be equal to or higher than the corresponding index of the complex set, six possible relations can be assumed between the h and π-indices. Accordingly, the sum of the h-indices increases whereas the sum of π-indices also increases or decreases or it remains the same as the π-index of the complex set. Further, the sum of h-indices corresponds to that of the complex set, whereas the sum of π-indices decreases, increases or does not change related to the π-index of the complex set.

Model 2 demonstrates each of the mentioned opportunities. Case C, e.g. shows that the sum of h-indices of the part-sets (h1 + h2 = 5 + 1 = 6) is greater than the h-index of the complex set: h(E) = 5, whereas the sum of π-indices (π1 + π2 = 0.17 + 0.03 = 0.20) is lower than the value of the complex set: π(E) = 0.24.

Considering the data referring to cases A–G in Table 7, the conclusion may be drawn that the sum of h-indices of part-sets can be equal to or higher than the h-index of the corresponding complex set. Further, the Model indicates that the sum of π-indices of the part-sets can be equal to, lower than, or even higher than the π-index of the complex set.
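These conclusions can be checked by enumerating every possible division of a complex set into two non-empty part-sets. The sketch below uses an assumed citation list for set E (consistent with the totals quoted for Model 2, but not taken from Table 7) and compares π-core citation sums as integers to avoid floating-point comparisons.

```python
import math
from itertools import product

def h_index(citations):
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

def core_citations(citations):
    """Citations to the pi-core (100 times the pi-index)."""
    ranked = sorted(citations, reverse=True)
    core_size = int(math.sqrt(len(ranked)) + 0.5)
    return sum(ranked[:core_size])

# Assumed citation counts for the 9 papers of the complex set E.
E = [9, 8, 7, 6, 5, 4, 2, 1, 0]
h_E, core_E = h_index(E), core_citations(E)

h_sums, pi_relations = set(), set()
for mask in product([0, 1], repeat=len(E)):
    part1 = [c for c, m in zip(E, mask) if m == 0]
    part2 = [c for c, m in zip(E, mask) if m == 1]
    if not part1 or not part2:
        continue  # both part-sets must contain at least one paper
    h_sums.add(h_index(part1) + h_index(part2))
    core_sum = core_citations(part1) + core_citations(part2)
    pi_relations.add("lower" if core_sum < core_E
                     else "equal" if core_sum == core_E else "higher")

print(min(h_sums), max(h_sums), sorted(pi_relations))
```

For this assumed set, no division gives a sum of h-indices below h(E), while the sum of π-indices falls below, equals, or exceeds π(E) depending on the division.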

Table 1 shows that the studied scientists belong to class A. Accordingly, both the sum of h-indices (h1 + h2) and π-indices (π1 + π2) of the part-sets (i.e. set of scientometric publications and set of papers on “other” topics) are higher than the h-index and π-index of the corresponding complex set.

Model 3: Maximum value (Hmax) of the sum of h-indices (cumulative h-index) of two part-sets (h1 + h2) belonging to the same complex set

In studying the publication indicators of part-sets derived from a complex set, the question may arise: what is the maximum value of the sum of h-indices of the part-sets? Model 3 may yield an answer. The criteria of the Model given in Table 8 are the following:

  • the complex set (E) contains 12 publications,

  • each publication received 5 citations,

  • the number of publications (12) in the complex set (E) is higher than two times the value of the h-index of the complex set: 2h(E) = 2·5 = 10,

  • the sum of the number of papers in the part-set pairs (a1 and a2) or (b1 and b2) or…(g1 and g2) equals the number of papers in the complex set, i.e.: P(E) = P(a1) + P(a2) = P(b1) + P(b2)…. = 12.

Table 8 Model 3: Maximum value of the sum of h-indices

Let’s start with part-sets a1 and a2 containing 11 papers and a single paper, respectively. It is assumed that the number of papers gradually decreases in the first part-set (a1) from 11 to 6 (g1), whereas it increases in the second part-set from unity (a2) up to 6 (g2). The last two part-sets (g1 and g2) contain six papers each. The model shows that the sum of h-indices (H) of the part-sets increases from six up to 10.
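Model 3 is fully specified by the criteria above (12 papers with 5 citations each), so the sums can be recomputed directly. The following is a minimal sketch; the function and variable names are illustrative.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

TOTAL_PAPERS = 12
CITATIONS_PER_PAPER = 5

h_E = h_index([CITATIONS_PER_PAPER] * TOTAL_PAPERS)  # 5

# First part-set shrinks from 11 to 6 papers; the second grows from 1 to 6.
cumulative = []
for n1 in range(11, 5, -1):
    n2 = TOTAL_PAPERS - n1
    h1 = h_index([CITATIONS_PER_PAPER] * n1)
    h2 = h_index([CITATIONS_PER_PAPER] * n2)
    cumulative.append(h1 + h2)
    print(f"P1={n1:2d}  P2={n2}  h1={h1}  h2={h2}  H={h1 + h2}")
```

The sum H rises from 6 to 10 and stops at 2·h(E), because neither part-set can have an h-index above the 5 citations each paper received.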

Naturally, with more papers and citations in the parent set, the h-index of the parent set could change, and accordingly the h-indices of the part-sets and their sum (H) may also increase. Nevertheless, the sum of the h-indices of the part-sets of the same parent (complex) set does not exceed two times the h-index of the complex set.

Figure 4 shows another example. In the presented model, a complex set contains two part-sets (1 and 2) according to the different topics of the papers. Let us investigate dynamically how the sum of the h-indices of the part-sets changes. The rank number of the publication sets on the x-axis indicates the state of the sets in successive years. Let us assume that in the first year of the studied period part-set 2 contains 10 papers, each cited 10 times, and that these papers obtain no new citations in the following years. Accordingly, the h-index of this part-set is h2 = 10 in each year. It is further assumed that in the first year part-set 1 contains only a single paper with a single citation. Accordingly, the h-index of this set is h1 = 1. The sum of the h-indices of the part-sets in the first year is H = h1 + h2 = 1 + 10 = 11. In the second year the number of papers in part-set 2 and the citations to them do not change. It is assumed, however, that in the second year part-set 1 contains two papers, each cited two times. Accordingly, h1 = 2, and the corresponding sum is H = 2 + 10 = 12. The number of papers in the corresponding complex set (part-sets 1 and 2 together) is 12 in the second year, whereas its h-index remains 10, as there are 10 papers in the complex set cited 10 times each and two papers with two citations each.

Fig. 4 Model 3: Possible maximum value (here: Hmax = 20) of the sum of the h-indices of two part-sets (here: h1 = 1, 2, 3…10 and h2 = 10, 10, 10…10). The part-sets contain 1 and 10, 2 and 10, 3 and 10…10 and 10 papers, respectively

We may assume that the number of papers in set 1, and the number of citations to each of them, increases by one from year to year up to 10. Accordingly, in the 10th year the sum of the h-indices, i.e. the cumulative h-index of the part-sets, is h1 + h2 = 20. In the 10th year, the complex set (1 and 2 together) would contain 20 papers, each cited 10 times. Consequently, the h-index of the complex set is equal to 10, and the cumulative h-index of the part-sets is two times the h-index of the complex set.
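The yearly development described for Fig. 4 may be traced with a minimal sketch. It assumes, as in the text, that in year t part-set 1 holds t papers with t citations each, while part-set 2 stays fixed at 10 papers with 10 citations each. The function names are illustrative.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    # The condition cites >= rank holds for a prefix of the ranked list only,
    # so counting the papers that satisfy it gives the h-index.
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)


def figure4_series(years=10, fixed_papers=10, fixed_cites=10):
    """Return (year, h1, h2, H, h(E)) for each year of the model."""
    part2 = [fixed_cites] * fixed_papers
    rows = []
    for year in range(1, years + 1):
        part1 = [year] * year  # `year` papers, each cited `year` times
        h1 = h_index(part1)
        h2 = h_index(part2)
        h_complex = h_index(part1 + part2)
        rows.append((year, h1, h2, h1 + h2, h_complex))
    return rows


if __name__ == "__main__":
    for row in figure4_series():
        print(row)
```

Under these assumptions H grows from 11 to 20 while h(E) stays at 10 throughout, so the ratio H / h(E) reaches 2 only in the 10th year.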

If in the following years the number of citations to the papers published in the first 10-year period does not change, and any new papers obtain 10 or fewer citations, the sum of the h-indices of the part-sets (1 and 2) would remain 20. If, however, the number of papers and citations increased in the complex set (and, of course, correspondingly in the part-sets), the h-index of the complex set and the sum of the h-indices of the part-sets could increase accordingly. Nevertheless, because each part-set is a subset of the complex set, neither h1 nor h2 can exceed h(E). The sum of the h-indices of the part-sets therefore cannot surpass 2h(E), i.e.: Hmax = (h1 + h2)max = 2h(E).
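The bounds h(E) ≤ h1 + h2 ≤ 2h(E) can also be checked numerically. The following sketch is a randomized sanity check rather than a proof: it draws hypothetical citation lists, splits each at random into two part-sets, and tests both inequalities. The set sizes, citation ranges, and seed are arbitrary assumptions.

```python
import random


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)


def check_bounds(trials=2000, seed=0):
    """Return True if h(E) <= h1 + h2 <= 2*h(E) held in every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        n_papers = rng.randint(1, 40)
        cites = [rng.randint(0, 50) for _ in range(n_papers)]
        # Assign every paper at random to part-set 1 or part-set 2.
        labels = [rng.randint(1, 2) for _ in range(n_papers)]
        part1 = [c for c, label in zip(cites, labels) if label == 1]
        part2 = [c for c, label in zip(cites, labels) if label == 2]
        h_complex = h_index(cites)
        h_sum = h_index(part1) + h_index(part2)
        if not (h_complex <= h_sum <= 2 * h_complex):
            return False
    return True


if __name__ == "__main__":
    print(check_bounds())
```

The lower bound holds because the h(E) core papers are divided between the two part-sets, and each part-set's share of them already guarantees an h-index at least as large as that share.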

Summary and conclusions

The evaluation of the scientific publications of individuals is a difficult endeavour. The methods applied should correspond to the criteria of the assessment in question. Recently, methods based on scientometric indicators derived from only a subset (core or elite set) of the total have been preferred. However, several scientists publish papers not in a single field but in several fields or subfields, whose bibliometric features (i.e. characteristic types of publications, rate of development, citing or referencing norms and traditions, etc.) may differ. This phenomenon can also be observed in the publications of authors in scientometrics. Therefore, in assessing the publications of scientists it seems reasonable to separate their complex publication set into part-sets by field and to calculate the corresponding scientometric indices accordingly.

It follows from Model 2 that the sum of the h-indices of the part-sets of a parent (complex) set can be equal to or higher than the h-index of the corresponding complex (total) set. In contrast, the sum of the π-indices of the part-sets may be equal to, higher than, or even lower than the π-index of the corresponding complex set. The data on the publications of several scientometricians in Table 1 show that in practice the sum of the h- or π-indices obtained for the part-sets separately is greater in all cases than the same index calculated for the corresponding complex (total) set as a whole. Accordingly, H(S + O) = h(S) + h(O) > h(E) and Π(S + O) = π(S) + π(O) > π(E), where “S” refers to papers in the scientometric part-set, “O” to papers in the “other” (non-scientometric) part-set, and “E” stands for all publications in the complex set, i.e. the S and O papers together. The mentioned findings indicate that analysing the complex publication sets of scientists is reasonable only after separating the publications according to the different scientific fields.

It follows from Model 3 that the maximum possible value of the sum of the h-indices of the part-sets originating from a common complex set is equal to two times the h-index of the complex (parent) set.

There are several statistics (e.g. h, g, π, percentage, etc.) offered in the literature (Vinkler 2017a, b) for calculating elite subsets. The size of the elite sets is not appropriate for assessing scientific eminence, except for the h-index. The size of elite sets strongly depends on the method selected. Significant correlation coefficients were obtained between the number of papers in the elite sets calculated by inside standards (i.e. standards derived from the total set analysed) (Table 4). This feature was found valid also for the number of papers in the elite sets calculated with outside (field) standards. In contrast, the correlation coefficients between the size of the elite sets obtained by inside standards and that obtained by outside standards are not significant (Table 4).

Relations concerning the number of citations in the elite sets obtained by different methods differ from those mentioned above (Table 5). Namely, the correlation coefficients between the numbers of citations in the different elite sets obtained by either inside or outside standards are significant (Table 6). This finding would indicate that scientists with a high number of citations would show high impact according to any reasonable scientometric evaluation method.

The study may indicate some special consequences for the publication assessment of scientists. Evaluation exercises may refer to several aspects. They may aim at comparing the scientific performance of individuals active in a special field or topic. In selecting a person as university professor of Physical Chemistry, for example, we would not be primarily interested in the publications of the candidates on, say, history, economics or library science. Accordingly, we should select the publications of the individuals on physical chemistry and calculate the impact indices referring only to this field. It may happen, however, that several scientists are proposed by a board for a general (life-work) award (like “state prize” or “for merits in advancing science”, etc.). In this case all scientific publications of the candidates, independent of their topic, should be taken into account. Accordingly, for characterizing the comprehensive (total) scientific performance of scientists, we should use the sum (i.e. the cumulative index) of the corresponding indicators calculated separately for the part-sets of their total. Naturally, weighting the individual indices that refer to special fields would be relevant, but these aspects cannot be tackled in this paper.

The application of the part-sets method presented here is supported by the well-known fact that bibliometric features depend on the field. Consequently, impact indicators like the h-index or π-index obtained from a mixed (complex) publication set (i.e. a set containing journal papers from several fields) cannot reflect the scientific merits of scientists correctly. For comparing the scientific impact of researchers working in several fields, separating the publications according to the corresponding fields may be recommended.

It is obvious, however, that the goals of the assessment in question should always determine the items and methods applied.