Abstract
The study discusses the application of various forms of time series analysis to national performance data for EEC countries and the US. First, it is shown that at the aggregated level, a straightforward relation exists between output and input, which varies with time. Various analytical techniques to account for the time factor are discussed. By using information theory, a simple formula can be derived which gives the best prediction for the following year's data. Subsequently, this model is extended to multi-variate forecasting of distributions. Additionally, it can be shown by using this method that, in terms of percentage world share of publications, the hypothesis that the EEC develops as a single publication system has to be rejected. However, when co-authorship relations among EEC member countries are used as an indicator, the predominance of a system is suggested.
Notes and references
R.J.W. Tijssen, J. de Leeuw, Multivariate data-analysis methods in bibliometric studies of science and technology, In: A.F.J. van Raan (Ed.), Handbook of Quantitative Studies of Science and Technology, Amsterdam, Elsevier, 1988; L. Leydesdorff, Some methodological guidelines for the interpretation of scientometric mappings, R & D Evaluation Newsletter, (1989) No. 2, 4–7.
F. Narin, Evaluative Bibliometrics, CHI Inc., Cherry Hill, 1976; H.F. Moed, W.J.M. Burger, J.G. Frankfort, A.F.J. van Raan, On the Measurement of Research Performance: The Use of Bibliometric Indicators, Leiden, Research Policy Unit, 1983; T. Braun, W. Glänzel, A. Schubert, Scientometric Indicators. A 32-Country Comparative Evaluation of Publishing Performance and Citation Impact, Singapore/Philadelphia, World Scientific, 1985; A.F.J. van Raan (Ed.), Handbook of Quantitative Studies of Science and Technology, Amsterdam, Elsevier, 1988; H.F. Moed, The Use of Bibliometric Indicators for the Assessment of Research Performance in the Natural and Life Sciences, Ph.D. Thesis, Leiden, 1989.
M. P. Carpenter, F. Narin, Clustering of scientific journals, Journal of the American Society for Information Science, 24 (1973) 425; H. G. Small, Co-citation in the scientific literature: A new measure of the relationship between two documents, Journal of the American Society for Information Science, 24 (1973) 265; M. Callon, J-P. Courtial, W.A. Turner, S. Bauin, From translations to problematic networks: an introduction to co-word analysis, Social Science Information, 22 (1983) 191–235; H. Small, E. Sweeney, Clustering the Science Citation Index using co-citation I. A comparison of methods, Scientometrics, 7 (1985) 391; P. Healey, H. Rothman, P. Koch, An experiment in science mapping for research planning, Research Policy, 15 (1986) 179–84; L. Leydesdorff, Various methods for the mapping of science, Scientometrics, 11 (1987) 295–324; L. Leydesdorff, Words and co-words as indicators of intellectual organization, Research Policy, 18 (1989) 209–223.
H. Small, E. Greenlee, Collagen research in the 1970s, Scientometrics, 10 (1986) 95–117; L. Leydesdorff, The development of frames of references, Scientometrics, 9 (1986) 103–25; L. Leydesdorff, P. Van der Schaar, The use of scientometric methods for evaluating national research programs, Science & Technology Studies, 5 (1987) 22–31.
This assumption of “the constant journal set” played a role in the debate on the “decline of British science.” See: L. Leydesdorff, Problems with the “measurement” of national scientific performance, Science and Public Policy, 15 (1988) 149–152; J. Anderson, P.M.D. Collins, J. Irvine, P.A. Isard, B.R. Martin, F. Narin, K. Stevens, On-line approaches to measuring national scientific output: A cautionary tale, Science and Public Policy, 15 (1988) 153–161; T. Braun, W. Glänzel, A. Schubert, Assessing assessments of British science. Some facts and figures to accept or decline, Scientometrics, 15 (1989) 165–170; L. Leydesdorff, The Science Citation Index and the measurement of national performance in terms of numbers of scientific publications, Scientometrics, 17 (1989) 111–120.
Leydesdorff, 1987, op. cit., note 3.
This study is part of a series. In two previous studies, I have shown that methods from information theory provide a single methodological framework for addressing crucial questions in science studies, such as the relations among heterogeneous entities in networks at different levels of aggregation, also using different measurement scales, both in a static and in a dynamic model. (L. Leydesdorff, Relations among science indicators or more generally among anything one might wish to count about texts. I. The static model, Scientometrics, 18 (1990) 281–307; L. Leydesdorff, Predictions on the basis of science indicators or more generally on the basis of anything one might know about texts. II. The dynamics of science, Scientometrics, (in press).) In this study, I want to extend that approach to the prediction of most likely values for future events. In a next study, extension to the static and dynamic analysis of scientometric transaction matrices is envisaged. (L. Leydesdorff, The static and dynamic analysis of network data using information theory. Paper to be presented at XIIth World Congress of Sociology, Madrid, 1990.)
Anderson et al. 1988, op. cit., note 5; Leydesdorff 1989, op. cit., note 5.
For the possible order of magnitude of misspellings in the SCI, see also: Leydesdorff 1988, op. cit., note 5.
This database is compiled biennially for the US National Science Foundation by Computer Horizons Inc. (CHI). Recently it was suggested that “letters” should also be included in this more qualitative indicator of scientific output. (Braun et al. 1989, op. cit., note 5.)
Using integer counting, one full count is attributed to each of the institutional addresses in case of a co-authored paper. Alternatively, one may attribute an equal or weighted fraction to each of the authors involved. See also: Anderson et al. 1988, op. cit., note 5.
This data is generated by combining each EEC country's data with the combined remainder of the EEC using a Boolean AND. The data is being collected primarily for use in a future study of European network relations in terms of international co-authorship, but is occasionally used in the present study as well. In my opinion, trends in international co-authorship, the attribution of credit for co-authored publications, and the measurement of national performance in terms of numbers of publications should not be subsumed under one single indicator, as has been advocated by those who favour “fractional counting” (e.g., Anderson et al., op. cit.), since co-authorship relations merit separate analysis as a network indicator. See also: G. Lewison, P. Cunningham, The use of bibliometrics in the evaluation of Community biotechnology research programmes, In: A. F. J. van Raan, A. J. Nederhof, H. F. Moed (Eds), Science and Technology Indicators. Their Use in Science Policy and Their Role in Science Studies, DSWO, Leiden, 1989, pp. 99–114.
For the PPP normalization, see: OECD, National Accounts, Vol. 1. Main Aggregates 1951–1980, Paris, 1982, p. 9.
Coverage of EEC countries, the US and Japan is also more complete than in previously published surveys. For example, the US National Science Foundation's Science and Engineering Indicators – 1987 gives normalized input statistics for three major European countries only; an older UK report extends this with data for the Netherlands (B. Martin, J. Irvine, N. Minchin, An International Comparison of Government Funding of Academic and Academically Related Research, ABRC Science Policy Studies No. 2, London, 1986). Data for 1979 was given in OECD Science and Technology Indicators (Paris, 1984), but since this data is normalized on a different basis, it is omitted here.
When the observations are auto-correlated, the errors are likely to be auto-correlated as well. See also: R. McCleary, R. A. Hay Jr., Applied Time Series Analysis for the Social Sciences, Sage, Beverly Hills/London, 1980; C. Chatfield, The Analysis of Time Series, Chapman and Hall, London/New York, 1975, 1984.
G. E. P. Box, G. C. Tiao, A change in level of a nonstationary time series, Biometrika, 51 (1965) 181–192; G. E. P. Box, G. M. Jenkins, Time Series Analysis: Forecasting and Control, Revised Edition, Holden-Day, San Francisco, 1976.
SPSS/PC+Trends, SPSS, Chicago, 1987, B-30.
First, with insufficient observations, the use of autocorrelation and partial autocorrelation functions for the identification of the model becomes problematic; and secondly, significance testing of the fits is sensitive to the amount of data involved. McCleary et al. 1980 (op. cit., note 17) recommend a minimum of fifty measurement points as a rule of thumb.
See also: K. Krippendorff, Information Theory. Structural Models for Qualitative Data, Sage, Beverly Hills, etc., 1986; Leydesdorff (forthcoming), op. cit.: The static and dynamic analysis of network data using information theory. Paper to be presented at XIIth World Congress of Sociology, Madrid, 1990.
See also: H. Theil, Applied Economic Forecasting, North-Holland, Amsterdam, 1966; H. Theil, Statistical Decomposition Analysis, North-Holland, Amsterdam, 1972.
Ibid.
Leydesdorff 1990, op. cit., note 7.
Alternatively, depending on the research question, one may follow Theil (1966, op. cit., pp. 325–7), who sought to improve the predictions based on information theory by weighting more recent data linearly, using the following formula: \(\sum F_i' = \left( N F_n + (N - 1) F_{n - 1} + (N - 2) F_{n - 2} + \ldots + F_i \right) / \left( \tfrac{1}{2} N (N + 1) \right)\). The assumption of equality for \(Q_{n+1}\) and \(P_n\) in order to estimate next year's share can be maintained; however, the computation of relative frequencies and predictions becomes a bit more complex.
SPSS/PC+Trends, Chicago, 1987.
See also: Leydesdorff 1990, op. cit., note 7.
See also: Leydesdorff 1989, op. cit., note 5.
See also: Theil 1972, op. cit., note 22.
However, the analysis in terms of Markov chains is not necessarily fruitful, since in most scientometric applications the chains are not regular. (F. C. Schoute, Prestatie-analyse van Telecommunicatie Systemen, Kluwer, Deventer etc., 1988.)
Anderson et al. 1988, op. cit.; Braun et al. 1989, op. cit., note 5.
Lewison et al. 1989, op. cit., note 14.
See, among others, notes 4 to 6.
Krippendorff 1986, op. cit., note 21; Leydesdorff, op. cit., note 7.
J. Irvine, B. R. Martin, Is Britain spending enough on science? Nature, 323 (October 16, 1986) 591–4; D. C. Smith, P. M. D. Collins, D. M. Hicks, S. Wyatt, National performance in basic research, Nature, 323 (October 23, 1986) 681–4; Evaluation of National Performance in Basic Research, ABRC Science Policy Studies No. 1, London, 1986.
Ibid.
This can be achieved either by transforming the data using algorithms which remove the autocorrelation or by ARIMA algorithms which estimate it. See: SPSS/PC+Trends, op. cit., Chicago, 1987, B-72.
Leydesdorff (in press), op. cit., note 7.
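The linear weighting of more recent data described in the note on Theil (1966) can be illustrated with a short sketch. This is not code from the study: the function name `linearly_weighted_average` is hypothetical, the input is assumed to be ordered from oldest to most recent observation, and the elided terms of the formula are assumed to run down to a weight of 1 for the oldest observation, so that the weights sum to N(N + 1)/2.

```python
from typing import Sequence


def linearly_weighted_average(frequencies: Sequence[float]) -> float:
    """Weighted average of a series ordered from oldest to most recent.

    The most recent value receives weight N, the one before it N - 1,
    and so on down to weight 1 for the oldest value (an assumption about
    the elided terms). The weights sum to N(N + 1)/2.
    """
    n = len(frequencies)
    if n == 0:
        raise ValueError("at least one observation is required")
    # enumerate(..., start=1) pairs the oldest value with weight 1
    # and the most recent value with weight N.
    weighted_sum = sum(
        weight * value for weight, value in enumerate(frequencies, start=1)
    )
    return weighted_sum / (0.5 * n * (n + 1))


# Hypothetical yearly counts, oldest first (illustrative numbers only).
counts = [10.0, 12.0, 15.0]
print(linearly_weighted_average(counts))
```

For a constant series the weighted average equals that constant, which is a quick sanity check that the weights are normalized correctly.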
Leydesdorff, L. The prediction of science indicators using information theory. Scientometrics 19, 297–324 (1990). https://doi.org/10.1007/BF02095353