The prediction of science indicators using information theory

Scientometrics

Abstract

The study discusses the application of various forms of time series analysis to national performance data for EEC countries and the US. First, it is shown that at the aggregated level, a straightforward relation exists between output and input, which varies with time. Various analytical techniques to account for the time factor are discussed. By using information theory, a simple formula can be derived which gives the best prediction for the following year's data. Subsequently, this model is extended to multi-variate forecasting of distributions. Additionally, it can be shown by using this method that in terms of percentage of world share of publications the hypothesis that the EEC develops as a single publication system has to be rejected. However, when co-authorship relations among EEC member countries are used as an indicator, the predominance of a system is suggested.
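
As a rough illustration of the information-theoretic approach named in the abstract, the sketch below computes the expected information content of a change between two distributions of shares, I = Σ q_i log2(q_i / p_i), the measure associated with Theil's work cited in the notes. The article's own prediction formula is not reproduced on this page, so this is a sketch under stated assumptions rather than the author's method; the function name and the share values are hypothetical.

```python
import math

def expected_information(posterior, prior):
    """Expected information (in bits) of the message that turns
    the prior distribution into the posterior distribution."""
    if len(posterior) != len(prior):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for q, p in zip(posterior, prior):
        if q == 0:
            continue  # a zero posterior share contributes nothing
        if p == 0:
            raise ValueError("prior share is zero where posterior is not")
        total += q * math.log2(q / p)
    return total

# Hypothetical shares for three units in two consecutive years.
shares_year_1 = [0.50, 0.30, 0.20]
shares_year_2 = [0.45, 0.33, 0.22]

print(expected_information(shares_year_2, shares_year_1))
```

The value is zero when the two distributions coincide and positive otherwise.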

Notes and references

  1. R.J.W. Tijssen, J. de Leeuw, Multivariate data-analysis methods in bibliometric studies of science and technology, In: A.F.J. van Raan (Ed.), Handbook of Quantitative Studies of Science and Technology, Amsterdam, Elsevier, 1988; L. Leydesdorff, Some methodological guidelines for the interpretation of scientometric mappings, R & D Evaluation Newsletter, (1989) No. 2, 4–7.

  2. F. Narin, Evaluative Bibliometrics, CHI Inc., Cherry Hill, 1976; H.F. Moed, W.J.M. Burger, J.G. Frankfort, A.F.J. van Raan, On the Measurement of Research Performance: The Use of Bibliometric Indicators, Leiden, Research Policy Unit, 1983; T. Braun, W. Glänzel, A. Schubert, Scientometric Indicators. A 32-Country Comparative Evaluation of Publishing Performance and Citation Impact, Singapore/Philadelphia, World Scientific, 1985; A.F.J. van Raan (Ed.), Handbook of Quantitative Studies of Science and Technology, Amsterdam, Elsevier, 1988; H.F. Moed, The Use of Bibliometric Indicators for the Assessment of Research Performance in the Natural and Life Sciences, Ph.D. Thesis, Leiden, 1989.

  3. M. P. Carpenter, F. Narin, Clustering of scientific journals, Journal of the American Society of Information Science, 24 (1973) 425; H. G. Small, Co-citation in the scientific literature: A new measure of the relationship between two documents, Journal of the American Society of Information Science, 24 (1973) 265; M. Callon, J-P. Courtial, W.A. Turner, S. Bauin, From translations to problematic networks: an introduction to co-word analysis, Social Science Information, 22 (1983) 191–235; H. Small, E. Sweeney, Clustering the Science Citation Index using co-citation I. A comparison of methods, Scientometrics, 7 (1985) 391; P. Healey, H. Rothman, P. Koch, An experiment in science mapping for research planning, Research Policy, 15 (1986) 179–84; L. Leydesdorff, Various methods for the mapping of science, Scientometrics, 11 (1987) 295–324; L. Leydesdorff, Words and co-words as indicators of intellectual organization, Research Policy, 18 (1989) 209–223.

  4. H. Small, E. Greenlee, Collagen research in the 1970s, Scientometrics, 10 (1986) 95–117; L. Leydesdorff, The development of frames of references, Scientometrics, 9 (1986) 103–25; L. Leydesdorff, P. Van der Schaar, The use of scientometric methods for evaluating national research programs, Science & Technology Studies, 5 (1987) 22–31.

  5. This assumption of “the constant journal set” played a role in the debate on the “decline of British science.” See: L. Leydesdorff, Problems with the “measurement” of national scientific performance, Science and Public Policy, 15 (1988) 149–152; J. Anderson, P.M.D. Collins, J. Irvine, P.A. Isard, B.R. Martin, F. Narin, K. Stevens, On-line approaches to measuring national scientific output: A cautionary tale, Science and Public Policy, 15 (1988) 153–161; T. Braun, W. Glänzel, A. Schubert, Assessing assessments of British science. Some facts and figures to accept or decline, Scientometrics, 15 (1989) 165–170; L. Leydesdorff, The Science Citation Index and the measurement of national performance in terms of numbers of scientific publications, Scientometrics, 17 (1989) 111–120.

  6. Leydesdorff, 1987, op. cit., note 3.

  7. The study is part of a series of studies. In two previous studies, I have shown that methods from information theory give us a basis to address in a single methodological framework crucial questions in science studies such as the relations among heterogeneous entities in networks at different levels of aggregation, also using different measurement scales, both in a static and in a dynamic model. (L. Leydesdorff, Relations among science indicators or more generally among anything one might wish to count about texts. I. The static model, Scientometrics, 18 (1990) 281–307; L. Leydesdorff, Predictions on the basis of science indicators or more generally on the basis of anything one might know about texts. II. The dynamics of science, Scientometrics, (in press).) In this study, I want to extend that approach to the prediction of most likely values for future events. In a next study, extension to the static and dynamic analysis of scientometric transaction matrices is envisaged. (L. Leydesdorff, The static and dynamic analysis of network data using information theory. Paper to be presented at XIIth World Congress of Sociology, Madrid, 1990.)

  8. Anderson et al. 1988, op. cit., note 5; Leydesdorff 1989, op. cit., note 5.

  9. For the possible order of magnitude of misspellings in the SCI, see also: Leydesdorff 1988, op. cit., note 5.

  10. This database is compiled biennially for the US National Science Foundation by Computer Horizons Inc. (CHI). Recently it was suggested that “letters” should also be included in this more qualitative indicator of scientific output. (Braun et al. 1989, op. cit., note 5).

  11. Using integer counting, one full count is attributed to each of the institutional addresses in the case of a co-authored paper. Alternatively, one may attribute an equal or weighted fraction to each of the authors involved. See also: Anderson et al. 1988, op. cit., note 5.

  12. This data is generated by combining each EEC country's data with the combined remainder of the EEC using a Boolean AND. The data is being collected primarily for use in a future study of European network relations in terms of international co-authorship, but is occasionally used in the present study as well. In my opinion, trends in international co-authorship, the attribution of credit for co-authored publications, and the measurement of national performance in terms of numbers of publications should not be subsumed under one single indicator, as has been advocated by those who favour “fractional counting” (e.g., Anderson et al., op. cit.), since co-authorship relations merit separate analysis as a network indicator. See also: G. Lewison, P. Cunningham, The use of bibliometrics in the evaluation of Community biotechnology research programmes, In: A. F. J. van Raan, A. J. Nederhof, H. F. Moed (Eds), Science and Technology Indicators. Their Use in Science Policy and Their Role in Science Studies, DSWO, Leiden, 1989, pp. 99–114.

  13. For the PPP normalization, see: OECD, National Accounts, Vol. 1. Main Aggregates 1951–1980, Paris, 1982, p. 9.

  14. Coverage of EEC countries, the US and Japan is also more complete than in previously published surveys. For example, the US National Science Foundation's Science and Engineering Indicators 1987 gives normalized input statistics for three major European countries only; an older UK report extends this with data for the Netherlands (B. Martin, J. Irvine, N. Minchin, An International Comparison of Government Funding of Academic and Academically Related Research, ABRC Science Policy Studies No. 2, London, 1986). Data for 1979 was given in OECD Science and Technology Indicators (Paris, 1984), but since this data is normalized on a different basis, it is omitted here.

  15. When the observations are auto-correlated, the errors are likely to be auto-correlated as well. See also: R. McCleary, R. A. Hay Jr., Applied Time Series Analysis for the Social Sciences, Sage, Beverly Hills/London, 1980; C. Chatfield, The Analysis of Time Series, Chapman and Hall, London/New York, 1975, 1984.

  16. G. E. P. Box, G. C. Tiao, A change in level of a nonstationary time series, Biometrika, 51 (1965) 181–192; G. E. P. Box, G. M. Jenkins, Time Series Analysis: Forecasting and Control, Revised Edition, Holden-Day, San Francisco, 1976.

  17. SPSS/PC+Trends, SPSS, Chicago, 1987, B-30.

  18. First, with insufficient observations, the use of autocorrelation and partial autocorrelation functions for the identification of the model becomes problematic; and secondly, significance testing of the fits is sensitive to the amount of data involved. McCleary et al. 1980 (op. cit., note 15) recommend a minimum of fifty measurement points as a rule of thumb.

  19. See also: K. Krippendorff, Information Theory. Structural Models for Qualitative Data, Sage, Beverly Hills, etc., 1986; Leydesdorff (forthcoming), op. cit., note 7.

  20. See also: H. Theil, Applied Economic Forecasting, North-Holland, Amsterdam, 1966; H. Theil, Statistical Decomposition Analysis, North-Holland, Amsterdam, 1972.

  21. Ibid.

  22. Leydesdorff 1990, op. cit., note 7.

  23. Alternatively, depending on the research question, one may follow Theil (1966, op. cit., pp. 325–7), who sought to improve the predictions based on information theory by weighting more recent data linearly, using the following formula: \(\sum F_i' = \dfrac{N F_n + (N - 1) F_{n - 1} + (N - 2) F_{n - 2} + \ldots + F_i}{\tfrac{1}{2} N (N + 1)}\). The assumption of equality for \(Q_{n+1}\) and \(P_n\) in order to estimate next year's share can be maintained; however, the computation of relative frequencies and predictions becomes a bit more complex.

  24. SPSS/PC+Trends, Chicago, 1987.

  25. See also: Leydesdorff 1990, op. cit., note 7.

  26. See also: Leydesdorff 1989, op. cit., note 5.

  27. See also: Theil 1972, op. cit., note 20.

  28. However, the analysis in terms of Markov chains is not necessarily fruitful, since in most scientometric applications the chains are not regular. (F. C. Schoute, Prestatie-analyse van Telecommunicatie Systemen, Kluwer, Deventer etc., 1988.)

  29. Anderson et al. 1988, op. cit.; Braun et al. 1989, op. cit., note 5.

  30. Lewison et al. 1989, op. cit., note 12.

  31. See, among others, notes 4 to 6.

  32. Krippendorff 1986, op. cit., note 19; Leydesdorff, op. cit., note 7.

  33. J. Irvine, B. R. Martin, Is Britain spending enough on science? Nature, 323 (October 16, 1986) 591–4; D. C. Smith, P. M. D. Collins, D. M. Hicks, S. Wyatt, National performance in basic research, Nature, 323 (October 23, 1986) 681–4; Evaluation of National Performance in Basic Research, ABRC Science Policy Studies No. 1, London, 1986.

  34. Ibid.

  35. This can be achieved either by transforming the data using algorithms which remove the autocorrelation or by ARIMA algorithms which estimate it. See: SPSS/PC+Trends, op. cit., Chicago, 1987, B-72.

  36. Leydesdorff (in press), op. cit., note 7.
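
The linear weighting of more recent data described in note 23 can be sketched as follows. This is a minimal illustration, assuming N observations ordered from oldest to newest, so that the newest receives weight N and the oldest weight 1, with the weighted sum divided by N(N + 1)/2. The function name and the sample figures are hypothetical and are not taken from the article.

```python
def linearly_weighted_mean(values):
    """Weighted mean with weights 1..N (oldest to newest),
    normalized by N(N+1)/2, the sum of those weights."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")
    weighted_sum = sum(weight * value
                       for weight, value in enumerate(values, start=1))
    return weighted_sum / (n * (n + 1) / 2)

# Hypothetical yearly frequencies, oldest first.
series = [10.0, 12.0, 14.0]
print(linearly_weighted_mean(series))
```

Because the weights sum to N(N + 1)/2, a constant series is returned unchanged, while a rising series yields a value closer to its most recent observations than the plain mean would.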

Cite this article

Leydesdorff, L. The prediction of science indicators using information theory. Scientometrics 19, 297–324 (1990). https://doi.org/10.1007/BF02095353

  • DOI: https://doi.org/10.1007/BF02095353
