Limited Dependent Variable Models and Probabilistic Prediction in Informetrics

  • Nick Deschacht
  • Tim C. E. Engels


This chapter explores the potential for informetric applications of limited dependent variable models, i.e., binary, ordinal, and count data regression models. In bibliometrics and scientometrics, such models can be used to analyze all kinds of categorical and count data, such as assessment scores, career transitions, citation counts, editorial decisions, or funding decisions. The chapter reviews the use of these models in the informetrics literature and introduces the models, their underlying assumptions, and their potential for predictive purposes. The main advantage of limited dependent variable models is that they allow us to identify the main explanatory variables in a multivariate framework and to estimate the size of their (marginal) effects. The models are illustrated using an example data set to analyze the determinants of citations. The chapter also shows how these models can be estimated using the statistical software Stata.



The authors thank Fereshteh Didegah, Raf Guns, Edward Omey, and Ronald Rousseau for their suggestions during the writing of this chapter. We also thank Richard Williams and Paul J Wilson for their feedback and excellent suggestions.


Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Faculty of Economics and Business, KU Leuven, Brussel, Belgium
  2. Department of Research Affairs and Centre for Research & Development Monitoring (ECOOM), University of Antwerp, Antwerp, Belgium
  3. Antwerp Maritime Academy, Antwerp, Belgium
