Catching Gazelles with a Lasso: Big data techniques for the prediction of high-growth firms

Small Business Economics


We investigate whether our limited ability to predict high-growth firms (HGF) is because previous research has used a restricted set of explanatory variables, and in particular because there is a need for explanatory variables with high variation within firms over time. To this end, we apply “big data” techniques (i.e., LASSO; Least Absolute Shrinkage and Selection Operator) to predict HGFs in comprehensive datasets on Croatian and Slovenian firms. Firms with low inventories, higher previous employment growth, and higher short-term liabilities are more likely to become HGFs. Pseudo-R² statistics of around 10% indicate that HGF prediction remains a challenging exercise.
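
To illustrate the kind of variable selection that LASSO performs, the following minimal sketch fits a linear LASSO by coordinate descent on a small made-up design matrix. It is not the procedure used in the paper, which relies on the hdm package in R with data-driven penalty loadings and a logit variant. The data, the penalty value lam = 0.5, and all function names are assumptions chosen purely for illustration.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: shrink z toward zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=50):
    """Minimise (1/(2n)) * sum_i (y_i - x_i'b)^2 + lam * sum_j |b_j|.

    X is a list of rows and y a list of outcomes; both are assumed to be
    centred already, so no intercept is fitted (illustrative simplification).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    # Mean squared value of each column, used to scale the update.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance between column j and the residual that ignores column j.
            rho = 0.0
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
            rho /= n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta


# Hypothetical toy design: three orthogonal, centred candidate predictors.
X = [
    [1, 1, 1], [1, 1, -1], [1, -1, 1], [1, -1, -1],
    [-1, 1, 1], [-1, 1, -1], [-1, -1, 1], [-1, -1, -1],
]
# Outcome driven strongly by predictor 0, weakly by predictor 1, not by predictor 2.
y = [2.0 * row[0] + 0.1 * row[1] for row in X]

beta = lasso_coordinate_descent(X, y, lam=0.5)
print(beta)
```

In this toy design the strong predictor is kept with its coefficient shrunk from 2.0 to about 1.5, while the weak and the irrelevant predictors are set exactly to zero. That sparsity is what makes LASSO usable when there are several hundred candidate explanatory variables.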


  1. See the working papers by Miyakawa et al. (2017) and McKenzie and Sansone (2017), which apply LASSO to firm performance data. Miyakawa et al. (2017) seek to predict high growth performance in a sample of Japanese firms, although they use a non-standard definition of high-growth firms. McKenzie and Sansone (2017) investigate top 10% growth among business plan competition winners and non-winners in Nigeria. These working papers came to our attention at an advanced stage of this research.

  2. See also the exchange between Derbyshire and Garnsey (2014) and Coad et al. (2015) on the randomness of growth. We are grateful to a reviewer for this suggestion.

  3. Relatedly, Bernerth et al. (2018) recommend that the control variables be mentioned specifically in the formulation of the hypotheses. Of course, we cannot do this in our context, because we have several hundred explanatory variables, and we apply data-driven procedures to decide which of these explanatory variables to keep. Nevertheless, in step 7 of Algorithm 1, we add in a minimalist set of control variables that are included for theoretical reasons, i.e., sector dummies, year dummies, a dummy for the Zagreb capital region, and firm age.

  4. From the Eurostat website, under National accounts aggregates by industry (up to NACE A*64), we obtain series in current prices (million units of national currency) and in previous-year prices (million units of national currency). Dividing the former by the latter gives the ratio of current to previous-year prices, which we express in constant 2010 prices. Where two-digit NACE deflators could not be obtained, one-digit deflators were used (e.g., for mining and quarrying, the one-digit NACE deflator was used for the four separate two-digit NACE sectors).

  5. Previous research has shown that employment growth and sales growth are the two most common indicators of firm growth, and we include them both because they are alternative and complementary indicators that capture different aspects of the firm growth process (Delmar 1997; Shepherd and Wiklund 2009).

  6. Including firms with 1 or 2 employees was not possible, because the LASSO computations did not converge to a solution. This does not seem to be a problem, however: although firms with one or two employees are numerous, they make a small aggregate contribution to the national economy, and they are relatively unlikely to become HGFs (Neumark et al. 2011). Note also that the Eurostat-OECD HGF definition excludes all firms with fewer than 10 employees.

  7. Note that the share of HGFs jumps from 1.53% to 10.75% when we exclude firms with fewer than 10 employees. This could explain why some countries have higher HGF shares than others: the databases being used may differ in their coverage of micro firms (e.g., Coad and Scott 2018).

  8. The number 7.8 comes from the minimum possible growth increment to become an HGF according to the OECD definition. A firm with 10 employees in the first year, with average annual growth of 20% over 3 years, will need to grow by 10 × [1.20³ − 1] = 7.28 employees.

  9. By applying the natural log transformation to all variables, we follow the recommendation of Makridakis et al. (2018, p. 21) to automate the preprocessing of data before applying data-intensive forecasting methods, which limits the role of potentially ad hoc decisions by the researcher.

  10. Croatia's population (4.15 million) is about twice that of Slovenia (2.07 million).

  11. These data-driven penalty loadings for LASSO are different from the canonical penalty loadings proposed in Tibshirani (1996).

  12. The decision on the number of financial variables per LASSO procedure is left to the researchers. When lambdaCalculation gave a penalty that selected only a few variables, we gradually decreased the penalty. Details on the penalty level are given in Online Appendix 8.

  13. Logit LASSO follows the same intuition as linear LASSO, because logit regression can be reduced to the linear case by employing reweighted regression.

  14. These results are available from the authors upon request.

  15. Nevertheless, note that the studies in Table 1 display heterogeneity regarding their HGF indicators (Birch index, top 10% of firms, etc.) as well as size of firms in the samples, which limits the comparability of the pseudo-R² statistics across studies.

  16. It is possible that the usage and significance of inventories differ between the manufacturing and services sectors. We therefore repeated the analysis on subsamples of manufacturing and services firms, and the results for inventories held.

  17. Note that growth of profits is only selected by LASSO in model 2, for Croatian employment HGFs.

  18. The level of intangible assets is positive in the subsample of all firms with 3 or more employees (i.e., model 2), while growth of intangible assets is positive in the subsample of all firms with 10 or more employees (i.e., model 1).

  19. One possible explanation for the varying results could be that the regression specifications for Slovenia do not include an age variable, because this variable is not present in the Slovenian data.

  20. Table 2 shows that “cash in bank” is positive and significant in model 1 (i.e., for firms with 10+ employees) for Croatia.

  21. Ries (2011, p. 184) gives the example of folding newsletters, sealing them into envelopes, and attaching a stamp. The standard approach might be to begin by folding all the newsletters and only afterward put them all into envelopes. However, this approach has drawbacks relating to the time taken to sort, stack, and move around large piles of half-complete envelopes. It is also possible that the letters do not fit in the envelopes, a problem that would only be discovered late in the production process. Instead, “single-piece flow” (see also “continuous flow manufacturing”), which corresponds to completing each envelope one at a time, is a surprisingly efficient production method, and the superiority of “single-piece flow” has been confirmed by studies (Ries, 2011, p. 184).

  22. One possibility could be that the effects of previous growth rate on subsequent HGF status are nonlinear across the distribution of previous growth rates.
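
The threshold arithmetic in note 8 and the size cut-off mentioned in note 6 can be checked with a short sketch. The function names, the greater-than-or-equal comparison at the 20% boundary, and the example employee counts are assumptions made for illustration; they are not taken from the Eurostat-OECD manual.

```python
def min_hgf_increment(start_employees=10, annual_growth=0.20, years=3):
    """Smallest employment gain that reaches the compound growth threshold."""
    return start_employees * ((1 + annual_growth) ** years - 1)


def is_hgf(emp_start, emp_end, annual_growth=0.20, years=3, min_size=10):
    """Flag a firm as high-growth under the rule sketched in the notes.

    Firms below min_size employees at the start are excluded; otherwise the
    firm qualifies if its cumulative growth matches at least annual_growth
    compounded over the given number of years (boundary handling assumed).
    """
    if emp_start < min_size:
        return False
    required_ratio = (1 + annual_growth) ** years
    return emp_end / emp_start >= required_ratio


# A 10-employee firm needs roughly 7.28 additional employees over 3 years.
print(round(min_hgf_increment(), 2))

# Hypothetical firms: (employees in year 0, employees in year 3).
for start, end in [(10, 18), (10, 17), (5, 20)]:
    print(start, end, is_hgf(start, end))
```

The third hypothetical firm quadruples its headcount but is still excluded because it starts below the 10-employee cut-off, which is why the HGF share depends so strongly on how micro firms are treated.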


  • Achtenhagen, L., Naldi, L., & Melin, L. (2010). “Business growth”—Do practitioners and scholars really talk about the same thing? Entrepreneurship Theory and Practice, 34(2), 289–316.

  • Arrighetti, A., & Lasagni, A. (2013). Assessing the determinants of high-growth manufacturing firms in Italy. International Journal of the Economics of Business, 20(2), 245–267.

  • Audretsch, D. B., Santarelli, E., & Vivarelli, M. (1999). Start-up size and industrial dynamics: Some evidence from Italian manufacturing. International Journal of Industrial Organization, 17, 965–983.

  • Belloni, A., Chen, D., Chernozhukov, V., & Hansen, C. (2012). Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica, 80(6), 2369–2429.

  • Belloni, A., Chernozhukov, V., & Hansen, C. (2014). High-dimensional methods and inference on structural and treatment effects. Journal of Economic Perspectives, 28(2), 29–50.

  • Belloni, A., Chernozhukov, V., & Wei, Y. (2016). Post-selection inference for generalized linear models with many controls. Journal of Business & Economic Statistics, 34(4), 606–619.

  • Bernerth, J. B., Cole, M. S., Taylor, E. C., & Walker, H. J. (2018). Control variables in leadership research: A qualitative and quantitative review. Journal of Management, 44(1), 131–160.

  • Bianchini, S., Bottazzi, G., & Tamagni, F. (2017). What does (not) characterize persistent corporate high-growth? Small Business Economics, 48(3), 633–656.

  • Birch, D. L. (1979). The job generation process. Cambridge, MA: MIT program on neighborhood and regional change, Massachusetts Institute of Technology.

  • Bjuggren, C.-M., Daunfeldt, S.-O., & Johansson, D. (2013). High-growth firms and family ownership. Journal of Small Business & Entrepreneurship, 26(4), 365–385.

  • Brown, R., & Mawson, S. (2013). Trigger points and high-growth firms. Journal of Small Business and Enterprise Development, 20(2), 279–295.

  • Brynjolfsson, E., & McAfee, A. (2014). The second machine age: Work, progress, and prosperity in a time of brilliant technologies. WW Norton & Company.

  • Chernozhukov, V., Hansen, C., & Spindler, M. (2016). High-dimensional metrics in R. arXiv preprint arXiv:1603.01700.

  • Cho, H. J., & Pucik, V. (2005). Relationship between innovativeness, quality, growth, profitability, and market value. Strategic Management Journal, 26(6), 555–575.

  • Churchill, N. C., & Mullins, J. W. (2001). How fast can your company afford to grow? Harvard Business Review, 79(5), 135–143.

  • Coad, A. (2009). The growth of firms: A survey of theories and empirical evidence. Cheltenham, UK and Northampton, MA, USA: Edward Elgar.

  • Coad, A., Cowling, M., & Siepel, J. (2017). Growth processes of high-growth firms as a four-dimensional chicken and egg. Industrial and Corporate Change, 26(4), 537–554.

  • Coad, A., Frankish, J. S., Roberts, R. G., & Storey, D. J. (2015). Are firm growth paths random? A reply to “firm growth and the illusion of randomness.” Journal of Business Venturing Insights, 3, 5–8.

  • Coad, A., & Guenther, C. (2014). Processes of firm growth and diversification: Theory and evidence. Small Business Economics, 43, 857–871.

  • Coad, A., & Planck, M. (2012). Firms as bundles of discrete resources—Towards an explanation of the exponential distribution of firm growth rates. Eastern Economic Journal, 38, 189–209.

  • Coad, A., & Rao, R. (2011). The firm-level employment effects of innovations in high-tech US manufacturing industries. Journal of Evolutionary Economics, 21(2), 255–283.

  • Coad, A., & Scott, G. (2018). High-growth firms in Peru. Cuadernos de Economia, 37(75), 671–696.

  • Cowling, M. (2004). The growth-profit nexus. Small Business Economics, 22(1), 1–9.

  • Davidsson, P., Steffens, P., & Fitzsimmons, J. (2009). Growing profitable or growing from profits: Putting the horse in front of the cart? Journal of Business Venturing, 24(4), 388–406.

  • Davidsson, P., & Wiklund, J. (2000). Conceptual and empirical challenges in the study of firm growth. In D. Sexton, & H. Landström (Eds.), The Blackwell Handbook of Entrepreneurship (reprinted 2006 in Entrepreneurship and the Growth of Firms, Elgar): 26–44. Oxford, MA: Blackwell Business.

  • Daunfeldt, S.-O., Elert, N., & Johansson, D. (2014). The economic contribution of high-growth firms: Do policy implications depend on the choice of growth indicator? Journal of Industry, Competition and Trade, 14(3), 337–365.

  • Daunfeldt, S.-O., & Halvarsson, D. (2015). Are high-growth firms one-hit wonders? Evidence from Sweden. Small Business Economics, 44, 361–383.

  • De Loecker, J. (2007). Do exports generate higher productivity? Evidence from Slovenia. Journal of International Economics, 73(1), 69–98.

  • Delmar, F. (1997). Measuring growth: Methodological considerations and empirical results. In R. Donckels, & A. Miettinen (Eds.), Entrepreneurship and SME research: On its way to the next millennium (also reprinted 2006 in Entrepreneurship and the Growth of Firms, Elgar): 190–216. Aldershot, UK and Brookfield, VA: Ashgate.

  • Delmar, F., Davidsson, P., & Gartner, W. B. (2003). Arriving at the high-growth firm. Journal of Business Venturing, 18, 189–216.

  • Delmar, F., McKelvie, A., & Wennberg, K. (2013). Untangling the relationships among growth, profitability and survival in new firms. Technovation, 33(8–9), 276–291.

  • Derbyshire, J., & Garnsey, E. (2014). Firm growth and the illusion of randomness. Journal of Business Venturing Insights, 1-2, 8–11.

  • Denton, F. T. (1985). Data mining as an industry. Review of Economics and Statistics, 124–127.

  • Evans, D. S. (1987). Tests of alternative theories of firm growth. Journal of Political Economy, 95(4), 657–674.

  • Eurostat-OECD (2007). Eurostat-OECD Manual on Business Demography Statistics, Office for Official Publications of the European Communities, Luxembourg.

  • Fan, L., Chen, S., Li, Q., & Zhu, Z. (2015). Variable selection and model prediction based on lasso, adaptive lasso and elastic net, 2015 4th International Conference on Computer Science and Network Technology (ICCSNT), Harbin, 2015, 579–583.

  • George, G., Haas, M. R., & Pentland, A. (2014). Big data and management. Academy of Management Journal, 57(2), 321–326.

  • Geroski, P. A., Machin, S. J., & Walters, C. F. (1997). Corporate growth and profitability. Journal of Industrial Economics, 45(2), 171–189.

  • Geroski, P. A. (2000). The growth of firms in theory and in practice. In N. Foss, & V. Mahnke (Eds.), Competence, governance and entrepreneurship: 168–186. Oxford, UK: Oxford University Press.

  • Geroski, P., & Gugler, K. (2004). Corporate growth convergence in Europe. Oxford Economic Papers, 56, 597–620.

  • Goedhuys, M., & Sleuwaegen, L. (2016). High-growth versus declining firms: The differential impact of human capital and R&D. Applied Economics Letters, 23(5), 369–372.

  • Grover Goswami, A., Medvedev, D., & Olafsen, E. (2019). High-growth firms: Facts, fiction, and policy options for emerging economies. Washington, DC: World Bank.

  • Guzman, J., & Stern, S. (2016). The state of American entrepreneurship: New estimates of the quantity and quality of entrepreneurship for 15 US states, 1988–2014. No. w22095. National Bureau of Economic Research.

  • Hall, B. H. (1987). The relationship between firm size and firm growth in the US manufacturing sector. Journal of Industrial Economics, 35(4), 583–606.

  • Hambrick, D. C. (2007). The field of management’s devotion to theory: Too much of a good thing? Academy of Management Journal, 50(6), 1346–1352.

  • Harhoff, D., Stahl, K., & Woywode, M. (1998). Legal form, growth and exit of west German firms—Empirical results for manufacturing, construction, trade and service industries. Journal of Industrial Economics, 46(4), 453–488.

  • Helfat, C. E. (2007). Stylized facts, empirical research and theory development in management. Strategic Organization, 5(2), 185–192.

  • Henrekson, M., & Johansson, D. (2010). Gazelles as job creators: A survey and interpretation of the evidence. Small Business Economics, 35, 227–244.

  • Hollenbeck, J. R., & Wright, P. M. (2017). Harking, sharking, and tharking: Making the case for post hoc analysis of scientific data. Journal of Management, 43(1), 5–18.

  • Hölzl, W. (2014). Persistence, survival, and growth: A closer look at 20 years of fast-growing firms in Austria. Industrial and Corporate Change, 23(1), 199–231.

  • Ijiri, Y., & Simon, H. A. (1964). Business firm growth and size. American Economic Review, 54(2), 77–89.

  • Ijiri, Y., & Simon, H. A. (1967). A model of business firm growth. Econometrica, 35(2), 348–355.

  • Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review, 2(3), 196–217.

  • Kumar, M. S. (1985). Growth, acquisition activity and firm size: Evidence from the United Kingdom. Journal of Industrial Economics, 33(3), 327–338.

  • Lee, N. (2014). What holds back high-growth firms? Evidence from UK SMEs. Small Business Economics, 43(1), 183–195.

  • Locke, E. A. (2007). The case for inductive theory building. Journal of Management, 33(6), 867–890.

  • Lopez-Garcia, P., & Puente, S. (2012). What makes a high-growth firm? A dynamic probit analysis using Spanish firm-level data. Small Business Economics, 39, 1029–1041.

  • Makridakis, S., Spiliotis, E., & Assimakopoulos, V. (2018). Statistical and machine learning forecasting methods: Concerns and ways forward. PLoS One, 13(3), e0194889.

  • Mason, C., & Brown, R. (2013). Creating good public policy to support high-growth firms. Small Business Economics, 40(2), 211–225.

  • McKelvie, A., & Wiklund, J. (2010). Advancing firm growth research: A focus on growth mode instead of growth rate. Entrepreneurship Theory and Practice, 34(2), 261–288.

  • McKenzie, D. (2017). Identifying and spurring high-growth entrepreneurship: Experimental evidence from a business plan competition. American Economic Review, 107(8), 2278–2307.

  • McKenzie, D., & Sansone, D. (2017). Man vs. machine in predicting successful entrepreneurs: Evidence from a business plan competition in Nigeria. World Bank Policy Research Working Paper 8271.

  • Megaravalli, A. V., & Sampagnaro, G. (2018). Predicting the growth of high-growth SMEs: Evidence from family business firms. Journal of Family Business Management.

  • Miyakawa, D., Miyauchi, Y., & Perez, C. (2017). Forecasting firm performance with machine learning: Evidence from Japanese firm-level data. Research Institute of Economy, Trade and Industry (RIETI).

  • Moschella, D., Tamagni, F., & Yu, X. (2018). Persistent high-growth firms in China’s manufacturing. Small Business Economics, in press.

  • Nason, R. S., & Wiklund, J. (2018). An assessment of resource-based theorizing on firm growth and suggestions for the future. Journal of Management, 44(1), 32–60.

  • NESTA. (2009). The vital 6 per cent: How high growth innovative businesses generate prosperity and jobs. London: NESTA.

  • Neumark, D., Wall, B., & Zhang, J. (2011). Do small businesses create more jobs? New evidence for the United States from the National Establishment Time Series. Review of Economics and Statistics, 93(1), 16–29.

  • Penrose, E. T. (1959). The theory of the growth of the firm. Oxford, UK: Basil Blackwell.

  • Pereira, V., & Temouri, Y. (2018). Impact of institutions on emerging European high-growth firms. Management Decision, 56(1), 175–187.

  • Peric, M., & Vitezic, V. (2016). Impact of global economic crisis on firm growth. Small Business Economics, 46(1), 1–12.

  • Ries, E. (2011). The lean startup: How today's entrepreneurs use continuous innovation to create radically successful businesses. Crown Books: New York.

  • Sermpinis, G., Tsoukas, S., & Zhang, P. (2018). Modelling market implied ratings using LASSO variable selection techniques. Journal of Empirical Finance. Forthcoming.

  • Shane, S. (2009). Why encouraging more people to become entrepreneurs is bad public policy. Small Business Economics, 33, 141–149.

  • Shepherd, D., & Wiklund, J. (2009). Are we comparing apples with apples or apples with oranges? Appropriateness of knowledge accumulation across growth studies. Entrepreneurship Theory and Practice, 33(1), 105–123.

  • Singh, A., & Whittington, G. (1975). The size and growth of firms. Review of Economic Studies, 42(1), 15–26.

  • Srhoj, S., Zupic, I., & Jaklič, M. (2018). Stylized facts about Slovenian high-growth firms. Economic Research-Ekonomska Istraživanja, 31(1), 1851–1879.

  • Storey, D. J. (2011). Optimism and chance: The elephants in the entrepreneurship room. International Small Business Journal, 29(4), 303–321.

  • Tian, S., Yu, Y., & Guo, H. (2015). Variable selection and corporate bankruptcy forecasts. Journal of Banking & Finance, 52, 89–100.

  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267–288.

  • Tibshirani, R. (2011). Regression shrinkage and selection via the lasso: A retrospective. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(3), 273–282.

  • Tong, et al. (2016). Comparison of predictive modeling approaches for 30-day all-cause non-elective readmission risk. BMC Medical Research Methodology, 16, 26.

  • Vancouver, J. B. (2018). In defense of HARKing. Industrial and Organizational Psychology, 11(1), 73–80.

  • van Witteloostuijn, A., & Kolkman, D. (2019). Is firm growth random? A machine learning perspective. Journal of Business Venturing Insights, forthcoming.

  • Vitezić, V., Srhoj, S., & Perić, M. (2018). Investigating industry dynamics in a recessionary transition economy. South East European Journal of Economics and Business, 13(1), 43–67.

  • Weinblat, J. (2017). Forecasting European high-growth firms—A random forest approach. Journal of Industry, Competition and Trade, 1–42.

  • Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320.


We are grateful to Martin Spindler (maintainer of the HDM package in R) for advice on the software and to Iris Loncar, accounting professor, for discussions on accounting practice and the composition of particular variables. Thanks also go to Barbara Zitek for translating the accounting variables from Slovenian to English, and to Margherita Bacigalupo for introducing the authors of this manuscript to each other, and to Ivan Zilic for helpful comments on machine learning. Three anonymous reviewers provided many helpful comments. Any remaining errors are ours alone.

Author information


Corresponding author

Correspondence to Alex Coad.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Appendix 1. LASSO results for the Croatian sample

Table 4 Logit model 1, employment indicator
Table 5 Logit model 1, turnover indicator
Table 6 Model 2, employment indicator
Table 7 Model 2, turnover indicator

Appendix 2. LASSO results for the Slovenian sample

Table 8 Model 2, employment indicator, Slovenia
Table 9 Model 2, turnover indicator, Slovenia

Cite this article

Coad, A., Srhoj, S. Catching Gazelles with a Lasso: Big data techniques for the prediction of high-growth firms. Small Bus Econ 55, 541–565 (2020).
