Accurate forecast of countries’ research output by macro-level indicators

Mueller, Christoph Emanuel

doi:10.1007/s11192-016-2084-1


Published: 23 July 2016

Volume 109, pages 1307–1328, (2016)

Scientometrics

Christoph Emanuel Mueller¹


Abstract

Research output, measured as the amount of published peer-reviewed literature, varies greatly across countries. Besides identifying the causal determinants of these differences, an important task of scientometric research is to make accurate predictions of countries’ future research output. Building on previous research on the key drivers of differences in countries’ research outputs, this study develops a model that includes sixteen macro-level predictors representing aspects of the research and economic system, the political conditions, and structural and cultural attributes of countries. By applying a machine learning procedure called boosted regression trees, the study demonstrates that these predictors are sufficient for making highly accurate forecasts of countries’ research output across scientific disciplines. The study also shows that using a functionally flexible procedure like boosted regression trees can substantially increase the predictive power of the model compared to traditional regression. Finally, the results allow a different perspective on the functional forms of the relations between the predictors and the response variable.
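The idea of a functionally flexible learner outperforming a straight-line fit can be illustrated with a minimal sketch. It is not the study's implementation (the Notes state that a Stata plugin was used) and it does not use the study's data: the single predictor, the S-shaped response curve, and all function names and settings (`fit_boosted`, `n_trees=200`, `rate=0.1`) are assumptions made purely for illustration. The boosting shown here is the simplest squared-error variant, in which each new one-split tree (a stump) is fitted to the residuals of the current ensemble.

```python
import math
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, value if x <= threshold, value otherwise)


def fit_stump(x: List[float], r: List[float]) -> Stump:
    """Find the single split on x that minimises squared error on residuals r.

    Assumes x contains at least two distinct values.
    """
    best_sse, best = math.inf, None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - lm) ** 2 for ri in left) + sum((ri - rm) ** 2 for ri in right)
        if sse < best_sse:
            best_sse, best = sse, (t, lm, rm)
    return best


def fit_boosted(x: List[float], y: List[float], n_trees: int = 200, rate: float = 0.1):
    """Squared-error boosting: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        t, lm, rm = fit_stump(x, resid)
        stumps.append((t, rate * lm, rate * rm))  # store the shrunken contributions
        pred = [pi + rate * (lm if xi <= t else rm) for xi, pi in zip(x, pred)]
    return base, stumps


def predict_boosted(model, xi: float) -> float:
    base, stumps = model
    return base + sum(lv if xi <= t else rv for t, lv, rv in stumps)


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - slope * mx, slope


def rmse(observed: List[float], predicted: List[float]) -> float:
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


# Synthetic, hypothetical data: an S-shaped relation between predictor and response.
grid = [i / 10 for i in range(60)]
truth = [5 / (1 + math.exp(-2 * (g - 3))) for g in grid]
x_tr = [g for i, g in enumerate(grid) if i % 3 != 0]
y_tr = [v for i, v in enumerate(truth) if i % 3 != 0]
x_te = [g for i, g in enumerate(grid) if i % 3 == 0]  # held-out points
y_te = [v for i, v in enumerate(truth) if i % 3 == 0]

model = fit_boosted(x_tr, y_tr)
intercept, slope = fit_line(x_tr, y_tr)
rmse_boosted = rmse(y_te, [predict_boosted(model, g) for g in x_te])
rmse_linear = rmse(y_te, [intercept + slope * g for g in x_te])
print(f"held-out RMSE: boosted stumps {rmse_boosted:.3f}, straight line {rmse_linear:.3f}")
```

On a curve like this one, the straight line cannot follow the flat ends and the steep middle at the same time, whereas the sum of many small step functions can. How large the gain is on real data depends on how nonlinear the true relations are.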


Notes

Data on all predictors from 2004 to 2012 were used for fitting the model (Step 1), predictor data from 2013 were used for model validation (Step 2), and predictor data from 2014 were used for forecasting research output in 2015 (see the “Analytical strategy” section).
Since sufficient data could be retrieved only for 2015, the 2015 values were used for the entire period. Consequently, the number of universities was treated as constant over time.
The automatic identification of the optimal tree number is implemented in the Stata plugin boost (Schonlau 2005). This plugin was used for all the predictions made in this study.
To account for the increasing trends of many variables, a time trend term was included in the FE and RE models.
Additionally, a negative binomial regression was estimated without log-transforming the dependent variable. Since the results of the negative binomial regression were even less accurate than those from OLS, this approach was not pursued any further. Such a result is not unusual, given that employing log-transformed dependent variables within a linear approach may be more appropriate in some cases than using a count data model (Thelwall and Wilson 2014).
To test whether the accuracy of the BRT model depends on the number of observations used to train it, and whether it loses predictive accuracy when applied to data from a more distant future, I performed a further sensitivity test: I trained the model with data from 2004 to 2009 and validated it with data from 2014. In other words, the model trained with predictor data up to 2009 was used to predict ln(docs) in 2014, and the predicted values were then compared with the actual 2014 values. Although the prediction error increased slightly (RMSE = 0.29), the loss in predictive accuracy is moderate. Thus, even when considerably fewer observations are used for training and the trained model is validated with data further in the future, BRT still outperforms traditional regression approaches in terms of predictive accuracy.
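The mechanics of such a year-based split and of the RMSE on the ln(docs) scale can be sketched as follows. The panel below is invented for illustration, and the field names (`country_id`, `year`, `ln_docs`) and helper names are assumptions rather than anything taken from the study. The final lines show one way to read an RMSE of 0.29 on a log scale: exponentiating it gives a rough multiplicative error factor on the original document counts.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, float]  # hypothetical record layout: country_id, year, ln_docs


def split_by_year(rows: Iterable[Row], train_years: Iterable[int],
                  validation_year: int) -> Tuple[List[Row], List[Row]]:
    """Split a country-year panel by calendar year, never at random."""
    train_set = set(train_years)
    rows = list(rows)
    train = [r for r in rows if r["year"] in train_set]
    valid = [r for r in rows if r["year"] == validation_year]
    return train, valid


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between observed and predicted values."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


# Invented toy panel: 3 countries observed yearly from 2004 to 2014.
panel = [{"country_id": c, "year": y, "ln_docs": 5.0 + 0.1 * (y - 2004) + c}
         for c in range(3) for y in range(2004, 2015)]

# Train on 2004-2009 and validate on 2014, as in the sensitivity test described above.
train, valid = split_by_year(panel, range(2004, 2010), 2014)

observed = [r["ln_docs"] for r in valid]
predicted = [v + 0.29 for v in observed]  # pretend every forecast misses by 0.29 log units
error = rmse(observed, predicted)

# On a log scale, an error of `error` corresponds to a multiplicative factor exp(error).
factor = math.exp(error)
print(f"RMSE on ln scale: {error:.2f}; multiplicative factor: {factor:.2f}")
```

Under this reading, an RMSE of 0.29 in ln(docs) means that a typical forecast is off by a factor of about 1.34, that is, by roughly a third of the observed number of documents.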

References

Abramo, G., & D’Angelo, C. A. (2014). How do you define and measure research productivity? Scientometrics, 101, 1129–1144.
Basu, A. (2010). Does a country’s scientific ‘productivity’ depend critically on the number of country journals indexed? Scientometrics, 82, 507–516.
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Belmont: Wadsworth.
Canagarajah, A. S. (2002). A geopolitics of academic writing. Pittsburgh: University of Pittsburgh Press.
Diaz-Puente, J. M., Cazorla, A., & Dorrego, A. (2007). Crossing national, continental, and linguistic boundaries: Toward a worldwide evaluation research community in journals of evaluation. American Journal of Evaluation, 28, 399–415.
Elith, J., Leathwick, J. R., & Hastie, T. (2008). A working guide to boosted regression trees. Journal of Animal Ecology, 77, 802–813.
Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29, 1189–1232.
Friedman, J. H. (2002). Stochastic gradient boosting. Computational Statistics and Data Analysis, 38, 367–378.
Friedman, J. H., & Meulman, J. J. (2003). Multiple additive regression trees with application in epidemiology. Statistics in Medicine, 22, 1365–1381.
Gantman, E. R. (2009). International differences of productivity in scholarly management knowledge. Scientometrics, 80, 155–167.
Gantman, E. R. (2012). Economic, linguistic, and political factors in the scientific productivity of countries. Scientometrics, 93, 967–985.
Gul, S., Nisa, N. T., Shah, T. A., Gupta, S., Jan, A., & Ahmad, S. (2015). Middle East: Research productivity and performance across nations. Scientometrics, 105, 1157–1166.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction (2nd ed.). New York: Springer.
Hsieh, P.-N., & Chang, P.-L. (2009). An assessment of world-wide research productivity in production and operations management. International Journal of Production Economics, 120, 540–551.
Jamjoom, B. A., & Jamjoom, A. B. (2016). Impact of country-specific characteristics on scientific productivity in clinical neurology research. eNeurologicalSci, 4, 1–3.
Kaufmann, D., Kraay, A., & Mastruzzi, M. (2011). The worldwide governance indicators: Methodology and analytical issues. Hague Journal on the Rule of Law, 3, 220–246.
King, D. A. (2004). The scientific impact of nations. Nature, 430, 311–316.
Koljatic, M. M., & Silva, M. R. (2001). The international publication productivity of Latin American countries in the economics and business administration fields. Scientometrics, 51, 381–394.
Lee, L.-C., Lin, P.-H., Chuang, Y.-W., & Lee, Y.-Y. (2011). Research output and economic productivity: A Granger causality test. Scientometrics, 89, 465–478.
Makridakis, S. G., Wheelwright, S. C., & Hyndman, R. J. (1998). Forecasting: Methods and applications (3rd ed.). New York: Wiley.
Man, J. P., Weinkauf, J. G., Tsang, M., & Sin, D. D. (2004). Why do some countries publish more than others? An international comparison of research funding, English proficiency and publication output in highly ranked general medical journals. European Journal of Epidemiology, 19, 811–817.
Meo, S. A., Al Masri, A. A., Usmani, A. M., Memon, A. N., Zaidi, S. Z., et al. (2013). Correction: Impact of GDP, spending on R&D, number of universities and scientific journals on research publications among Asian countries. PLoS One, 8, e66449.
Ntuli, H., Inglesi-Lotz, R., Chang, T., & Pouris, A. (2015). Does research output cause economic growth or vice versa? Evidence from 34 OECD countries. Journal of the Association for Information Science and Technology, 66, 1709–1716.
Origgi, G., & Ramello, G. B. (2015). Current dynamics of scholarly publishing. Evaluation Review, 39, 3–18.
Rahman, M., & Fukui, T. (2003). Biomedical research productivity: Factors across the countries. International Journal of Technology Assessment in Health Care, 19, 249–260.
Research Trends (2008). Geographical trends of research output. http://www.researchtrends.com/issue8-november-2008/geographical-trends-of-research-output. Accessed 10 Apr 2016.
Rodriguez, V., & Soeparwata, A. (2012). ASEAN benchmarking in terms of science, technology, and innovation from 1999 to 2009. Scientometrics, 92, 549–573.
Sarwar, R., & Hassan, S.-U. (2015). A bibliometric assessment of scientific productivity and international collaboration of the Islamic World in science and technology (S&T) areas. Scientometrics, 105, 1059–1077.
Schonlau, M. (2005). Boosted regression (boosting): An introductory tutorial and a Stata plugin. The Stata Journal, 5, 330–354.
Short, J. R., Boniche, A., Kim, Y., & Li, P. L. (2001). Cultural globalization, global English, and geography journals. The Professional Geographer, 53, 1–11.
Thelwall, M., & Wilson, P. (2014). Regression for citation data: An evaluation of different methods. Journal of Informetrics, 8, 963–971.
Trivedi, P. (1993). An analysis of publication lags in econometrics. Journal of Applied Econometrics, 8, 93–100.
Vinkler, P. (2008). Correlation between the structure of scientific research, scientometric indicators and GDP in EU and non-EU countries. Scientometrics, 74, 237–254.
Vinluan, L. R. (2012). Research productivity in education and psychology in the Philippines and comparison with ASEAN countries. Scientometrics, 91, 277–294.
Witten, I. H., Frank, E., & Hall, M. A. (2011). Data mining: Practical machine learning tools and techniques. Burlington: Elsevier.


Acknowledgments

The author wishes to thank the anonymous reviewer for her/his invaluably helpful comments on an earlier version of this article.

Author information

Authors and Affiliations

Department of Sociology, Center for Evaluation (CEval), Saarland University, P.O. Box 151150, 66041, Saarbrücken, Germany
Christoph Emanuel Mueller


Corresponding author

Correspondence to Christoph Emanuel Mueller.

Appendix

See Tables 3, 4, 5 and 6.

Table 3 Countries used for fitting the model (Step 1)


Table 4 Observed and predicted values of ln(docs) in 2014 (Step 2)


Table 5 Observed and predicted number of citable docs in 2014 (Step 2)


Table 6 Inter-correlations of predictor variables



About this article

Cite this article

Mueller, C.E. Accurate forecast of countries’ research output by macro-level indicators. Scientometrics 109, 1307–1328 (2016). https://doi.org/10.1007/s11192-016-2084-1


Received: 17 May 2016
Published: 23 July 2016
Issue Date: November 2016
DOI: https://doi.org/10.1007/s11192-016-2084-1

