The prediction of science indicators using information theory

Leydesdorff, L.

doi:10.1007/BF02095353

Published: September 1990

Volume 19, pages 297–324 (1990)

Scientometrics

L. Leydesdorff¹

Abstract

The study discusses the application of various forms of time series analysis to national performance data for EEC countries and the US. First, it is shown that at the aggregated level, a straightforward relation exists between output and input, which varies with time. Various analytical techniques to account for the time factor are discussed. By using information theory, a simple formula can be derived which gives the best prediction for the following year's data. Subsequently, this model is extended to multi-variate forecasting of distributions. Additionally, it can be shown by using this method that in terms of percentage of world share of publications the hypothesis that the EEC develops as a single publication system has to be rejected. However, when co-authorship relations among EEC member countries are used as an indicator, the predominance of a system is suggested.

Notes and references

R.J.W. Tijssen, J. de Leeuw, Multivariate data-analysis methods in bibliometric studies of science and technology, In: A.F.J. van Raan (Ed.), Handbook of Quantitative Studies of Science and Technology, Amsterdam, Elsevier, 1988; L. Leydesdorff, Some methodological guidelines for the interpretation of scientometric mappings, R & D Evaluation Newsletter, (1989) No. 2, 4–7.
F. Narin, Evaluative Bibliometrics, CHI Inc., Cherry Hill, 1976; H.F. Moed, W.J.M. Burger, J.G. Frankfort, A.F.J. van Raan, On the Measurement of Research Performance: The Use of Bibliometric Indicators, Leiden, Research Policy Unit, 1983; T. Braun, W. Glänzel, A. Schubert, Scientometric Indicators. A 32-Country Comparative Evaluation of Publishing Performance and Citation Impact, Singapore/Philadelphia, World Scientific, 1985; A.F.J. van Raan (Ed.), Handbook of Quantitative Studies of Science and Technology, Amsterdam, Elsevier, 1988; H.F. Moed, The Use of Bibliometric Indicators for the Assessment of Research Performance in the Natural and Life Sciences, Ph.D. Thesis, Leiden, 1989.
M. P. Carpenter, F. Narin, Clustering of scientific journals, Journal of the American Society of Information Science, 24 (1973) 425; H. G. Small, Co-citation in the scientific literature: A new measure of the relationship between two documents, Journal of the American Society of Information Science, 24 (1973) 265; M. Callon, J-P. Courtial, W.A. Turner, S. Bauin, From translations to problematic networks: an introduction to co-word analysis, Social Science Information, 22 (1983) 191–235; H. Small, E. Sweeney, Clustering the Science Citation Index using co-citation I. A comparison of methods, Scientometrics, 7 (1985) 391; P. Healey, H. Rothman, P. Koch, An experiment in science mapping for research planning, Research Policy, 15 (1986) 179–84; L. Leydesdorff, Various methods for the mapping of science, Scientometrics, 11 (1987) 295–324; L. Leydesdorff, Words and co-words as indicators of intellectual organization, Research Policy, 18 (1989) 209–223.
H. Small, E. Greenlee, Collagen research in the 1970s, Scientometrics, 10 (1986) 95–117; L. Leydesdorff, The development of frames of references, Scientometrics, 9 (1986) 103–25; L. Leydesdorff, P. Van der Schaar, The use of scientometric methods for evaluating national research programs, Science & Technology Studies, 5 (1987) 22–31.
This assumption of “the constant journal set” played a role in the debate on the “decline of British science.” See: L. Leydesdorff, Problems with the “measurement” of national scientific performance, Science and Public Policy, 15 (1988) 149–152; J. Anderson, P.M.D. Collins, J. Irvine, P.A. Isard, B.R. Martin, F. Narin, K. Stevens, On-line approaches to measuring national scientific output: A cautionary tale, Science and Public Policy, 15 (1988) 153–161; T. Braun, W. Glänzel, A. Schubert, Assessing assessments of British science. Some facts and figures to accept or decline, Scientometrics, 15 (1989) 165–170; L. Leydesdorff, The Science Citation Index and the measurement of national performance in terms of numbers of scientific publications, Scientometrics, 17 (1989) 111–120.
Leydesdorff, 1987, op. cit., note 3.
This study is part of a series. In two previous studies, I have shown that methods from information theory provide a single methodological framework for addressing crucial questions in science studies, such as the relations among heterogeneous entities in networks at different levels of aggregation, measured on different scales, in both a static and a dynamic model. (L. Leydesdorff, Relations among science indicators or more generally among anything one might wish to count about texts. I. The static model, Scientometrics, 18 (1990) 281–307; L. Leydesdorff, Predictions on the basis of science indicators or more generally on the basis of anything one might know about texts. II. The dynamics of science, Scientometrics, (in press).) In this study, I want to extend that approach to the prediction of the most likely values for future events. In a subsequent study, extension to the static and dynamic analysis of scientometric transaction matrices is envisaged. (L. Leydesdorff, The static and dynamic analysis of network data using information theory. Paper to be presented at XIIth World Congress of Sociology, Madrid, 1990).
Anderson et al. 1988, op. cit., note 5; Leydesdorff 1989, op. cit., note 5. L. Leydesdorff, The Science Citation Index and the measurement of national performance in terms of numbers of scientific publications, Scientometrics, 17 (1989) 111–120.
For the possible order of magnitude of misspellings in the SCI, see also: Leydesdorff 1988, op. cit., note 5.
This database is compiled biennially for the US National Science Foundation by Computer Horizons Inc. (CHI). Recently it was suggested that “letters” should also be included in this more qualitative indicator of scientific output. (Braun et al. 1989, op. cit., note 5).
Using integer counting, one full count is attributed to each of the institutional addresses in the case of a co-authored paper. Alternatively, one may attribute an equal or weighted fraction to each of the authors involved. See also: Anderson et al. 1988, op. cit., note 5.
This data is generated by combining each EEC country's data with the combined remainder of the EEC using a Boolean AND. The data is being collected primarily for use in a future study of European network relations in terms of international co-authorship, but is occasionally used in the present study as well. In my opinion, trends in international co-authorship, the attribution of credit for co-authored publications, and the measurement of national performance in terms of numbers of publications should not be subsumed under one single indicator, as has been advocated by those who favour “fractional counting” (e.g., Anderson et al., op. cit.), since co-authorship relations merit separate analysis as a network indicator. See also: G. Lewison, P. Cunningham, The use of bibliometrics in the evaluation of Community biotechnology research programmes, In: A. F. J. van Raan, A. J. Nederhof, H. F. Moed (Eds), Science and Technology Indicators. Their Use in Science Policy and Their Role in Science Studies, DSWO, Leiden, 1989, pp. 99–114.
For the PPP normalization, see: OECD, National Accounts, Vol. 1. Main Aggregates 1951–1980, Paris, 1982, p. 9.
Coverage of EEC countries, the US and Japan is also more complete than in previously published surveys. For example, the US National Science Foundation's Science and Engineering Indicators 1987 gives normalized input statistics for three major European countries only; an older UK report extends this with data for the Netherlands (B. Martin, J. Irvine, N. Minchin, An International Comparison of Government Funding of Academic and Academically Related Research, ABRC Science Policy Studies No. 2, London 1986). Data for 1979 was given in OECD Science and Technology Indicators (Paris, 1984), but since this data is normalized on a different basis, it is omitted here.
When the observations are auto-correlated, the errors are likely to be auto-correlated as well. See also: R. McCleary, R. A. Hay Jr., Applied Time Series Analysis for the Social Sciences, Sage, Beverly Hills/London, 1980; C. Chatfield, The Analysis of Time Series, Chapman and Hall, London/New York, 1975, 1984.
G. E. P. Box, G. C. Tiao, A change in level of a nonstationary time series, Biometrika, 52 (1965) 181–192; G. E. P. Box, G. M. Jenkins, Time Series Analysis: Forecasting and Control, Revised Edition, Holden-Day, San Francisco, 1976.
SPSS/PC+ Trends, SPSS, Chicago, 1987, B-30.
First, with insufficient observations, the use of autocorrelation and partial autocorrelation functions for the identification of the model becomes problematic; and secondly, significance testing of the fits is sensitive to the amount of data involved. McCleary et al. 1980 (op. cit., note 17) recommend a minimum of fifty measurement points as a rule of thumb.
See also: K. Krippendorff, Information Theory. Structural Models for Qualitative Data, Sage, Beverly Hills, etc., 1986; Leydesdorff (forthcoming), op. cit. (The static and dynamic analysis of network data using information theory. Paper to be presented at XIIth World Congress of Sociology, Madrid, 1990).
See also: H. Theil, Applied Economic Forecasting, North-Holland, Amsterdam, 1966; H. Theil, Statistical Decomposition Analysis, North-Holland, Amsterdam, 1972.
Ibid.
Leydesdorff 1990, op. cit., note 7.
Alternatively, depending on the research question, one may follow Theil (1966, op. cit., pp. 325–7), who sought to improve the predictions based on information theory by weighting more recent data linearly, using the following formula: \(\sum F_i' = \dfrac{N F_n + (N - 1) F_{n - 1} + (N - 2) F_{n - 2} + \ldots + F_i}{\tfrac{1}{2} N (N + 1)}\). The assumption of equality for \(Q_{n+1}\) and \(P_n\) in order to estimate next year's share can be maintained; however, the computation of relative frequencies and predictions becomes a bit more complex.
SPSS/PC+Trends, Chicago, 1987.
See also: Leydesdorff, 1990, op. cit., note 7.
See also: Leydesdorff 1989, op. cit., note 5.
See also: Theil 1972, op. cit., note 22.
However, the analysis in terms of Markov chains is not necessarily fruitful, since in most scientometric applications the chains are not regular. (F. C. Schoute, Prestatie-analyse van Telecommunicatie Systemen, Kluwer, Deventer etc., 1988.)
Anderson et al. 1988, op. cit.; Braun et al. 1989, op. cit., note 5. T. Braun, W. Glänzel, A. Schubert, Assessing assessments of British science. Some facts and figures to accept or decline, Scientometrics, 15 (1989) 165–170.
Lewison et al. 1989, op. cit., note 14.
See, among others, notes 4 to 6.
Krippendorff 1986, op. cit., note 21. K. Krippendorff, Information Theory. Structural Models for Qualitative Data, Sage, Beverly Hills, etc., 1986; Leydesdorff, op. cit., note 7. L. Leydesdorff, The static and dynamic analysis of network data using information theory. Paper to be presented at XIIth World Congress of Sociology, Madrid, 1990.
J. Irvine, B. R. Martin, Is Britain spending enough on science? Nature, 323 (October 16, 1986) 591–4; D. C. Smith, P. M. D. Collins, D. M. Hicks, S. Wyatt, National performance in basic research, Nature, 323 (October 23, 1986) 681–4; Evaluation of National Performance in Basic Research, ABRC Science Policy Studies No. 1, London, 1986.
Ibid.
This can be achieved either by transforming the data using algorithms which remove the autocorrelation or by ARIMA algorithms which estimate it. See: SPSS/PC+ Trends, op. cit., Chicago, 1987, B-72.
Leydesdorff (in press), op. cit., note 7.
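
The linear weighting of more recent data that the note on Theil (1966) describes can be illustrated with a minimal sketch. It assumes the frequencies are supplied oldest first, that the most recent value receives weight N and the oldest weight 1, and that the denominator is N(N + 1)/2 as in the formula quoted in that note. The function name `linearly_weighted_mean` and the sample values are hypothetical and are not taken from the article.

```python
from typing import Sequence


def linearly_weighted_mean(values: Sequence[float]) -> float:
    """Weighted mean in which more recent observations count more heavily.

    Assumption: `values` is ordered oldest to newest, so the i-th value
    (1-based) gets weight i and the newest value gets weight N.
    """
    n = len(values)
    if n == 0:
        raise ValueError("at least one observation is required")
    # Numerator: 1*F_oldest + ... + (N-1)*F_{n-1} + N*F_n
    weighted_sum = sum(weight * value
                       for weight, value in enumerate(values, start=1))
    # Denominator: sum of the weights 1 + 2 + ... + N = N(N + 1)/2
    return weighted_sum / (n * (n + 1) / 2)


# Hypothetical yearly frequencies, oldest first.
example = [1.0, 2.0, 3.0]
result = linearly_weighted_mean(example)
```

For a series that does not change over time the weighted mean equals the constant value, which is a quick sanity check on the weights.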

Author information

Authors and Affiliations

Department of Science Dynamics, Nieuwe Achtergracht 166, 1018 WV, Amsterdam (The Netherlands)
L. Leydesdorff

About this article

Cite this article

Leydesdorff, L. The prediction of science indicators using information theory. Scientometrics 19, 297–324 (1990). https://doi.org/10.1007/BF02095353

Received: 05 December 1989
Issue Date: September 1990
DOI: https://doi.org/10.1007/BF02095353