Accurate forecast of countries’ research output by macro-level indicators

Scientometrics

Abstract

Research output, measured as the amount of published peer-reviewed literature, varies greatly across countries. Besides identifying the causal determinants of these differences, an important task of scientometric research is to make accurate predictions of countries’ future research output. Building on previous research on the key drivers of differences in countries’ research output, this study develops a model that includes sixteen macro-level predictors representing aspects of the research and economic systems, the political conditions, and the structural and cultural attributes of countries. Applying a machine learning procedure called boosted regression trees (BRT), the study demonstrates that these predictors are sufficient for making highly accurate forecasts of countries’ research output across scientific disciplines. The study also shows that a functionally flexible procedure such as boosted regression trees can substantially increase the predictive power of the model compared with traditional regression. Finally, the results offer a different perspective on the functional forms of the relations between the predictors and the response variable.
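As a rough illustration of why a functionally flexible learner can outpredict a linear model, the following Python sketch implements least-squares gradient boosting with depth-one trees (stumps), in the spirit of Friedman (2001). It is not the study's model: the Stata boost plugin, the sixteen predictors, and any tuning values are not reproduced, and all names and defaults here (`boost`, `n_trees`, `shrinkage`, etc.) are hypothetical.

```python
"""Least-squares gradient boosting with regression stumps (illustrative sketch).

Generic technique in the spirit of Friedman (2001); NOT the Stata `boost`
plugin or the specification used in the study. All names are hypothetical.
"""
import math
from typing import List, Sequence, Tuple

# A stump: (feature index, threshold, value if x[j] <= threshold, value otherwise)
Stump = Tuple[int, float, float, float]
Model = Tuple[float, float, List[Stump]]


def fit_stump(X: Sequence[Sequence[float]], r: Sequence[float]) -> Stump:
    """Return the single split that minimises the squared error of residuals r."""
    n, total = len(r), sum(r)
    best: Stump = (0, math.inf, total / n, total / n)  # fallback: constant fit
    best_gain = -math.inf
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left = 0.0
        for k in range(1, n):
            left += r[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # no threshold separates identical values
            # Maximising this term is equivalent to minimising the split's SSE.
            gain = left ** 2 / k + (total - left) ** 2 / (n - k)
            if gain > best_gain:
                best_gain = gain
                best = (j, (lo + hi) / 2, left / k, (total - left) / (n - k))
    return best


def stump_predict(s: Stump, x: Sequence[float]) -> float:
    j, thr, left_val, right_val = s
    return left_val if x[j] <= thr else right_val


def boost(X, y, n_trees: int = 200, shrinkage: float = 0.1) -> Model:
    """Fit each new stump to the current residuals (the negative gradient of
    squared-error loss) and add a shrunken copy of it to the ensemble."""
    f0 = sum(y) / len(y)
    pred = [f0] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        s = fit_stump(X, resid)
        stumps.append(s)
        pred = [p + shrinkage * stump_predict(s, x) for p, x in zip(pred, X)]
    return f0, shrinkage, stumps


def predict(model: Model, x: Sequence[float]) -> float:
    f0, nu, stumps = model
    return f0 + nu * sum(stump_predict(s, x) for s in stumps)


def ols_fit_1d(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))
```

On a toy U-shaped relation such as `xs = [[i / 4] for i in range(-12, 13)]` with `ys = [x[0] ** 2 for x in xs]`, the OLS slope is zero by symmetry, so the straight line predicts the mean everywhere, whereas the boosted stumps approximate the curvature. The toy data say nothing about the study's actual results.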

Fig. 1
Fig. 2
Fig. 3

Notes

  1. Data on all predictors from 2004 to 2012 were used for fitting the model (Step 1), predictor data from 2013 were used for model validation (Step 2), and predictor data from 2014 were used for forecasting research output in 2015 (see the “Analytical strategy” section).

  2. Since sufficient data on the number of universities could only be retrieved for 2015, these data were used for the complete period. Consequently, the number of universities was treated as constant over time.

  3. The automatic identification of the optimal number of trees is implemented in the Stata plugin boost (Schonlau 2005), which was used for all the predictions made in this study.

  4. To account for the increasing trends of many variables in the fixed-effects (FE) and random-effects (RE) models, a time trend term was included in the models.

  5. Additionally, a negative binomial regression was estimated without log-transforming the dependent variable. Since its results were even less accurate than those from OLS, this approach was not pursued further. Such a result is not unusual: applying a linear approach to a log-transformed dependent variable may in some cases be more appropriate than using a count data model (Thelwall and Wilson 2014).

  6. To test whether the accuracy of the BRT model depends on the number of observations used to train it, and whether it loses predictive accuracy when applied to data further in the future, I performed an additional sensitivity test: I trained the model with data from 2004 to 2009 and validated it with data from 2014. In other words, the model trained with predictor data up to 2009 was used to predict ln(docs) in 2014, and the predicted values were compared with the observed values for 2014. Although the prediction error increased slightly (RMSE = 0.29), the loss in predictive accuracy is moderate. Thus, even when considerably fewer observations are used for training and the trained model is validated with data further in the future, BRT still outperforms traditional regression approaches in terms of predictive accuracy.
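Note 1 describes a purely year-based split of the data. Assuming, as Note 1 and the title of Table 4 suggest, that predictors measured in year t are paired with research output in year t + 1, the split can be sketched in Python as follows. `Record`, its fields, and `split_by_year` are hypothetical names, not the study's code.

```python
"""Year-based split into fitting, validation, and forecasting sets (sketch).

Assumes predictors measured in year t are paired with ln(docs) in year t + 1.
`Record` and its fields are hypothetical names, not taken from the study.
"""
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Record(NamedTuple):
    country: str
    year: int                       # year the predictors were measured
    predictors: Tuple[float, ...]   # macro-level indicators for that year
    ln_docs_next: Optional[float]   # ln(docs) in year + 1; None if not yet observed


def split_by_year(
    records: Iterable[Record],
    fit_years: range = range(2004, 2013),  # 2004-2012: Step 1 (fitting)
    valid_year: int = 2013,                # Step 2: validated against 2014 output
    forecast_year: int = 2014,             # used to forecast 2015 output
) -> Tuple[List[Record], List[Record], List[Record]]:
    fit: List[Record] = []
    valid: List[Record] = []
    forecast: List[Record] = []
    for r in records:
        if r.year in fit_years:
            fit.append(r)
        elif r.year == valid_year:
            valid.append(r)
        elif r.year == forecast_year:
            forecast.append(r)
    # Fitting and validation both need an observed response; forecasts do not.
    if any(r.ln_docs_next is None for r in fit + valid):
        raise ValueError("fitting and validation records need an observed response")
    return fit, valid, forecast
```

Splitting by calendar year rather than at random keeps every validation and forecast record strictly later than the data the model was fitted on, which mirrors the forecasting task.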
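Note 3 attributes the choice of the optimal number of trees to the Stata plugin boost. One common way to make such a choice, sketched below under the assumption that it resembles what the plugin does (its internal rule is not described in this text), is to add trees one at a time on a training part of the data and keep the tree count with the lowest error on a held-out part. For brevity the sketch uses a single predictor and depth-one trees; `choose_n_trees`, `max_trees`, and `shrinkage` are hypothetical names.

```python
"""Choosing the number of boosted trees by held-out error (sketch).

Fit trees one at a time on a training part and keep the tree count with the
lowest error on a held-out part. This is an assumed illustration; the Stata
`boost` plugin's internal rule may differ.
"""
import math
from typing import List, Sequence, Tuple

Stump = Tuple[float, float, float]  # (threshold, value if x <= threshold, value otherwise)


def best_stump(x: Sequence[float], r: Sequence[float]) -> Stump:
    """Single split on one predictor minimising the squared error of r."""
    pairs = sorted(zip(x, r))
    n, total = len(pairs), sum(r)
    best: Stump = (math.inf, total / n, total / n)  # fallback: constant fit
    best_gain, left = -math.inf, 0.0
    for k in range(1, n):
        left += pairs[k - 1][1]
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold separates identical values
        gain = left ** 2 / k + (total - left) ** 2 / (n - k)
        if gain > best_gain:
            best_gain = gain
            best = ((pairs[k - 1][0] + pairs[k][0]) / 2, left / k, (total - left) / (n - k))
    return best


def stump_value(s: Stump, xi: float) -> float:
    return s[1] if xi <= s[0] else s[2]


def choose_n_trees(x_tr, y_tr, x_va, y_va, max_trees: int = 500,
                   shrinkage: float = 0.1) -> Tuple[int, List[float]]:
    """Return (best tree count, validation MSE after each added tree)."""
    f0 = sum(y_tr) / len(y_tr)
    p_tr, p_va = [f0] * len(y_tr), [f0] * len(y_va)
    curve: List[float] = []
    for _ in range(max_trees):
        # Each tree is fitted to training residuals only.
        s = best_stump(x_tr, [y - p for y, p in zip(y_tr, p_tr)])
        p_tr = [p + shrinkage * stump_value(s, xi) for p, xi in zip(p_tr, x_tr)]
        p_va = [p + shrinkage * stump_value(s, xi) for p, xi in zip(p_va, x_va)]
        curve.append(sum((y - p) ** 2 for y, p in zip(y_va, p_va)) / len(y_va))
    best = min(range(max_trees), key=curve.__getitem__) + 1
    return best, curve
```

Returning the whole validation curve, not just its minimum, makes it easy to check whether the error has flattened out or is still falling when `max_trees` is reached.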
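Note 4 adds a time trend to the fixed-effects (FE) and random-effects (RE) models. The Python sketch below shows, for one hypothetical predictor, how a linear trend can enter a fixed-effects (within) estimator of y_it = a_i + b·x_it + g·t + e_it: every variable, including calendar year, is demeaned by country, and b and g are solved from the 2-by-2 normal equations. The study's sixteen-predictor specification and its RE counterpart are not reproduced; `within_demean` and `fe_with_trend` are hypothetical names.

```python
"""Within (fixed-effects) estimator with a linear time trend (sketch).

Model assumed for illustration: y_it = a_i + b * x_it + g * t + e_it,
with one predictor x and country-specific intercepts a_i.
"""
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def within_demean(values: Sequence[float], groups: Sequence[Hashable]) -> List[float]:
    """Subtract each group's (country's) mean, which removes a_i."""
    sums: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def fe_with_trend(y: Sequence[float], x: Sequence[float], year: Sequence[int],
                  country: Sequence[Hashable]) -> Tuple[float, float]:
    """Return (b, g): the predictor slope and the time-trend coefficient."""
    yd = within_demean(y, country)
    xd = within_demean(x, country)
    td = within_demean([float(t) for t in year], country)
    # Normal equations: [sxx sxt; sxt stt] [b, g]^T = [sxy, sty]^T
    sxx = sum(a * a for a in xd)
    stt = sum(c * c for c in td)
    sxt = sum(a * c for a, c in zip(xd, td))
    sxy = sum(a * v for a, v in zip(xd, yd))
    sty = sum(c * v for c, v in zip(td, yd))
    det = sxx * stt - sxt * sxt
    if abs(det) <= 1e-12 * sxx * stt:
        raise ValueError("predictor is collinear with the time trend after demeaning")
    b = (sxy * stt - sxt * sty) / det  # Cramer's rule
    g = (sxx * sty - sxt * sxy) / det
    return b, g
```

Without the trend term, a predictor that simply grows over time would absorb the common upward drift in output and its coefficient would be overstated; the trend separates the two.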
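Note 6 reports the prediction error as an RMSE on the ln(docs) scale. A minimal Python sketch of that computation follows, together with one rough way to read a log-scale RMSE: because ln(predicted) - ln(observed) = ln(predicted / observed), an RMSE of 0.29 corresponds to predictions that are typically off by a factor of about exp(0.29) ≈ 1.34, too high or too low. This reading is an approximation rather than a statement from the study, and `rmse` and `log_rmse_as_factor` are hypothetical names.

```python
"""RMSE on log-transformed counts and a rough multiplicative reading (sketch)."""
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error of two equal-length, non-empty sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def log_rmse_as_factor(log_rmse: float) -> float:
    """Typical multiplicative error on the count scale implied by an RMSE
    computed on natural-log values (a rough reading, not an exact bound)."""
    return math.exp(log_rmse)
```

For example, `rmse([math.log(d) for d in observed_docs], [math.log(d) for d in predicted_docs])` gives the log-scale error for counts, and `math.exp` of a predicted ln(docs) value returns it to the count scale used in Table 5.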

References

  • Abramo, G., & D’Angelo, C. A. (2014). How do you define and measure research productivity? Scientometrics, 101, 1129–1144.

  • Basu, A. (2010). Does a country’s scientific ‘productivity’ depend critically on the number of country journals indexed? Scientometrics, 82, 507–516.

  • Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Belmont: Wadsworth.

  • Canagarajah, A. S. (2002). A geopolitics of academic writing. Pittsburgh: University of Pittsburgh Press.

  • Diaz-Puente, J. M., Cazorla, A., & Dorrego, A. (2007). Crossing national, continental, and linguistic boundaries: Toward a worldwide evaluation research community in journals of evaluation. American Journal of Evaluation, 28, 399–415.

  • Elith, J., Leathwick, J. R., & Hastie, T. (2008). A working guide to boosted regression trees. Journal of Animal Ecology, 77, 802–813.

  • Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29, 1189–1232.

  • Friedman, J. H. (2002). Stochastic gradient boosting. Computational Statistics and Data Analysis, 38, 367–378.

  • Friedman, J. H., & Meulman, J. J. (2003). Multiple additive regression trees with application in epidemiology. Statistics in Medicine, 22, 1365–1381.

  • Gantman, E. R. (2009). International differences of productivity in scholarly management knowledge. Scientometrics, 80, 155–167.

  • Gantman, E. R. (2012). Economic, linguistic, and political factors in the scientific productivity of countries. Scientometrics, 93, 967–985.

  • Gul, S., Nisa, N. T., Shah, T. A., Gupta, S., Jan, A., & Ahmad, S. (2015). Middle East: Research productivity and performance across nations. Scientometrics, 105, 1157–1166.

  • Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction (2nd ed.). New York: Springer.

  • Hsie, P.-N., & Chang, P.-L. (2009). An assessment of world-wide research productivity in production and operations management. International Journal of Production Economics, 120, 540–551.

  • Jamjoom, B. A., & Jamjoom, A. B. (2016). Impact of country-specific characteristics on scientific productivity in clinical neurology research. eNeurologicalSci, 4, 1–3.

  • Kaufmann, D., Kraay, A., & Mastruzzi, M. (2011). The worldwide governance indicators: Methodology and analytical issues. Hague Journal on the Rule of Law, 3, 220–246.

  • King, D. A. (2004). The scientific impact of nations. Nature, 430, 311–316.

  • Koljatic, M. M., & Silva, M. R. (2001). The international publication productivity of Latin American countries in the economics and business administration fields. Scientometrics, 51, 381–394.

  • Lee, L.-C., Lin, P.-H., Chuang, Y.-W., & Lee, Y.-Y. (2011). Research output and economic productivity: A Granger causality test. Scientometrics, 89, 465–478.

  • Makridakis, S. G., Wheelwright, S. C., & Hyndman, R. J. (1998). Forecasting: Methods and applications (3rd ed.). New York: Wiley.

  • Man, J. P., Weinkauf, J. G., Tsang, M., & Sin, D. D. (2004). Why do some countries publish more than others? An international comparison of research funding, English proficiency and publication output in highly ranked general medical journals. European Journal of Epidemiology, 19, 811–817.

  • Meo, S. A., Al Masri, A. A., Usmani, A. M., Memon, A. N., Zaidi, S. Z., et al. (2013). Correction: Impact of GDP, spending on R&D, number of universities and scientific journals on research publications among Asian countries. PLoS One, 8, e66449.

  • Ntuli, H., Inglesi-Lotz, R., Chang, T., & Pouris, A. (2015). Does research output cause economic growth or vice versa? Evidence from 34 OECD countries. Journal of the Association for Information Science and Technology, 66, 1709–1716.

  • Origgi, G., & Ramello, G. B. (2015). Current dynamics of scholarly publishing. Evaluation Review, 39, 3–18.

  • Rahman, M., & Fukui, T. (2003). Biomedical research productivity: Factors across the countries. International Journal of Technology Assessment in Health Care, 19, 249–260.

  • Research Trends (2008). Geographical trends of research output. http://www.researchtrends.com/issue8-november-2008/geographical-trends-of-research-output. Accessed 10 Apr 2016.

  • Rodriguez, V., & Soeparwata, A. (2012). ASEAN benchmarking in terms of science, technology, and innovation from 1999 to 2009. Scientometrics, 92, 549–573.

  • Sarwan, R., & Hassan, S.-U. (2015). A bibliometric assessment of scientific productivity and international collaboration of the Islamic World in science and technology (S&T) areas. Scientometrics, 105, 1059–1077.

  • Schonlau, M. (2005). Boosted regression (boosting): An introductory tutorial and a Stata plugin. The Stata Journal, 5, 330–354.

  • Short, J. R., Boniche, A., Kim, Y., & Li, P. L. (2001). Cultural globalization, global English, and geography journals. The Professional Geographer, 53, 1–11.

  • Thelwall, M., & Wilson, P. (2014). Regression for citation data: An evaluation of different methods. Journal of Informetrics, 8, 963–971.

  • Trivedi, P. (1993). An analysis of publication lags in econometrics. Journal of Applied Econometrics, 8, 93–100.

  • Vinkler, P. (2008). Correlation between the structure of scientific research, scientometric indicators and GDP in EU and non-EU countries. Scientometrics, 74, 237–254.

  • Vinluan, L. R. (2012). Research productivity in education and psychology in the Philippines and comparison with ASEAN countries. Scientometrics, 91, 277–294.

  • Witten, I. H., Frank, E., & Hall, M. A. (2011). Data mining: Practical machine learning tools and techniques. Burlington: Elsevier.

Acknowledgments

The author wishes to thank the anonymous reviewer for his or her invaluable comments on an earlier version of this article.

Author information

Correspondence to Christoph Emanuel Mueller.

Appendix

See Tables 3, 4, 5 and 6.

Table 3 Countries used for fitting the model (Step 1)
Table 4 Observed and predicted values of ln(docs) in 2014 (Step 2)
Table 5 Observed and predicted number of citable docs in 2014 (Step 2)
Table 6 Inter-correlations of predictor variables

About this article

Cite this article

Mueller, C.E. Accurate forecast of countries’ research output by macro-level indicators. Scientometrics 109, 1307–1328 (2016). https://doi.org/10.1007/s11192-016-2084-1

  • DOI: https://doi.org/10.1007/s11192-016-2084-1
