
Machine Learning Application to Family Business Status Classification

  • Conference paper
  • First Online:
Machine Learning, Optimization, and Data Science (LOD 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12565)

Abstract

In line with a recent research trend, there is growing interest in applying machine learning techniques to business analytics. In this work, both supervised and unsupervised machine learning techniques are applied to the analysis of a dataset comprising both family and non-family firms. This is worth investigating because the two kinds of firms typically differ in several performance-related aspects, which can be reflected in balance-sheet data. First, binary classification techniques are applied to discriminate between the two kinds of firms, by combining an unlabeled dataset with the labels provided by a survey. The most important features for this binary classification are identified. Then, clustering is applied to highlight why supervised learning can be effective in the previous task, by showing that most of the largest clusters found are quite unequally populated by the two classes.
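
As a rough illustration of the two-step analysis outlined above, the sketch below trains a binary classifier on synthetic stand-ins for balance-sheet features and then clusters the same data to inspect how the two classes distribute across clusters. The data, the logistic-regression classifier, and the k-means clustering are assumptions made for illustration; they are not necessarily the models used in the paper.

```python
# Hypothetical sketch: (1) supervised classification of family vs. non-family
# firms from balance-sheet-like features, (2) clustering to inspect how the
# two classes distribute across clusters. Data and models are illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Synthetic stand-in for balance-sheet indicators of 152 firms
X, y = make_classification(n_samples=152, n_features=10, n_informative=4,
                           random_state=0)

# Step 1: binary classification of family (1) vs. non-family (0) firms
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))

# Coefficient magnitudes as a rough proxy for feature importance
coefs = clf.named_steps["logisticregression"].coef_.ravel()
print("most influential features:", np.argsort(-np.abs(coefs))[:3])

# Step 2: clustering, then checking how evenly each cluster mixes the two classes
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(X))
for k in range(4):
    members = labels == k
    print(f"cluster {k}: size={members.sum()}, "
          f"share of class 1={y[members].mean():.2f}")
```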

Notes

  1. It is worth noting that the distinction between family and non-family firms depends not only on their different ownership structures, but also on other factors. Moreover, it is not always straightforward to determine whether owners belong to the same family from ownership data alone, for two reasons: they might have different surnames even though they belong to the same family (this is particularly common when family firms go through several generational shifts), and there might be corporate groups linked by shareholding, which may make it difficult to identify individual owners. Current definitions of the family firm follow one of two approaches: the demographic approach (combining family ownership, governance, and/or management) and the essence approach (a behavioral perspective on the firm’s nature). Demographic approaches combining ownership and management (or governance) are the most widely used [20].

  2. The complete dataset consists of 152 companies and is comparable in size with other datasets used in family business research [14].

  3. Here, we used a mixed demographic approach. This definition is very close to the one proposed by the European Commission (https://ec.europa.eu/growth/smes/promoting-entrepreneurship/we-work-for/family-business_en). We raised the minimum number of family members involved in the firm’s governance from 1 to 2, to ensure a clearer demarcation between sole-founder firms and family-owned and family-governed firms.

  4. AIDA, which stands for “Analisi Informatizzata delle Aziende Italiane” (English translation: “Computerised Analysis of Italian Firms”), is a commercial database managed by Bureau van Dijk, a Moody’s Analytics company, which contains a comprehensive set of financial information on companies in Italy.

  5. This way of pre-selecting a set of features (for the subsequent selection of a subset of it) can improve the overall performance of machine learning algorithms, as shown in [21] in a different framework; see also the sketch following these notes.

  6. Initially, for each feature, the average of its annual change was also considered, but the difference in its means over the two classes never turned out to be statistically significant (the sketch following these notes illustrates such a significance test).

  7. Other architectures/machine learning algorithms (e.g., enhanced k-nearest neighbors [27], multilayer perceptrons [28], random forests [29], gradient boosting algorithms like XGBoost [30], LightGBM [31], or CatBoost [32]) could be considered for further developments.

  8. The techniques adopted in [13] are random forests, boosted trees, and artificial neural networks. These choices are justified by the much larger number of features and training examples available in the dataset considered therein.
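
The following sketch illustrates the feature pre-selection ideas mentioned in notes 5 and 6: features are first screened by testing whether their class means differ significantly (here, a two-sample Welch t-test on synthetic data), and a smaller subset is then chosen with an embedded method (here, an L1-penalized logistic regression). The significance threshold, the test variant, and the second-stage selector are illustrative assumptions, not the paper's exact procedure.

```python
# Illustrative two-stage feature selection: keep only features whose class
# means differ significantly, then select a smaller subset with an embedded
# method. Data, threshold (0.05), and methods are assumptions for illustration.
import numpy as np
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a pool of candidate balance-sheet features
X, y = make_classification(n_samples=152, n_features=40, n_informative=5,
                           random_state=0)

# Stage 1: pre-selection via significance of the difference in class means
p_values = np.array([
    stats.ttest_ind(X[y == 1, j], X[y == 0, j], equal_var=False).pvalue
    for j in range(X.shape[1])
])
pool = np.where(p_values < 0.05)[0]
print("features passing the t-test filter:", pool)

# Stage 2: further subset selection with an L1-penalized logistic regression
selector = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
).fit(X[:, pool], y)
print("features kept after both stages:", pool[selector.get_support()])
```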

References

  1. Giudici, P., Figini, S.: Applied Data Mining for Business and Industry. Wiley, Hoboken (2009)

  2. Alexandropoulos, S.-A.N., Aridas, C.K., Kotsiantis, S.B., Vrahatis, M.N.: A deep dense neural network for bankruptcy prediction. In: Macintyre, J., Iliadis, L., Maglogiannis, I., Jayne, C. (eds.) EANN 2019. CCIS, vol. 1000, pp. 435–444. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-20257-6_37

  3. Kirkos, E., Spathis, C., Manolopoulos, Y.: Data mining techniques for the detection of fraudulent financial statements. Expert Syst. Appl. 32(4), 995–1003 (2007)

  4. Perols, J.: Financial statement fraud detection: an analysis of statistical and machine learning algorithms. Audit. J. Pract. Theory 30(2), 19–50 (2011)

  5. Varian, H.R.: Big data: new tricks for econometrics. J. Econ. Perspect. 28, 3–28 (2014)

  6. Athey, S., Imbens, G.: Recursive partitioning for heterogeneous causal effects. Proc. Natl. Acad. Sci. 113, 7353–7360 (2016)

  7. Wager, S., Athey, S.: Estimation and inference of heterogeneous treatment effects using random forests. J. Am. Stat. Assoc. 113, 1228–1242 (2018)

  8. Bargagli Stoffi, F.J., Gnecco, G.: Causal tree with instrumental variable: an extension of the causal tree framework to irregular assignment mechanisms. Int. J. Data Sci. Anal. 9(3), 315–337 (2019). https://doi.org/10.1007/s41060-019-00187-z

  9. Gnecco, G., Nutarelli, F.: On the trade-off between number of examples and precision of supervision in regression. In: Proceedings of the 4th International Conference of the International Neural Network Society on Big Data and Deep Learning (INNS BDDL 2019), Sestri Levante, Italy, pp. 1–6 (2019)

  10. Gnecco, G., Nutarelli, F.: On the trade-off between number of examples and precision of supervision in machine learning problems. Optim. Lett. 3, 1–23 (2019). https://doi.org/10.1007/s11590-019-01486-x

  11. Gnecco, G., Nutarelli, F.: Optimal trade-off between sample size and precision of supervision for the fixed effects panel data model. In: Nicosia, G., Pardalos, P., Umeton, R., Giuffrida, G., Sciacca, V. (eds.) LOD 2019. LNCS, vol. 11943, pp. 531–542. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-37599-7_44

  12. Soler, I.P., Gemar, G., Guerrero-Murillo, R.: Family and non-family business behaviour in the wine sector: a comparative study. Eur. J. Family Bus. 7(1), 65–73 (2017)

  13. Peltonen, J.: Can supervised machine learning be used to identify family firms using a sophisticated definition? Acad. Manag. Proc. 2018(1) (2018). 6 pages. https://doi.org/10.5465/AMBPP.2018.154

  14. Beck, L., Janssens, W., Debruyne, M., Lommelen, T.: A study of the relationships between generation, market orientation, and innovation in family firms. Family Bus. Rev. 24(3), 252–272 (2011)

  15. Litz, R.A.: The family business: toward definitional clarity. Family Bus. Rev. 8(2), 71–81 (1995)

  16. Chua, J.H., Chrisman, J.J., Sharma, P.: Defining the family business by behavior. Entrepr. Theory Pract. 23(4), 19–39 (1999)

  17. Astrachan, J.H., Klein, S.B., Smyrnios, K.X.: The F-PEC scale of family influence: a proposal for solving the family business definition problem. Family Bus. Rev. 15(1), 45–58 (2002)

  18. Corbetta, G., Salvato, C.: Strategies for Longevity in Family Firms: A European Perspective. Palgrave Macmillan, London (2012)

  19. Baù, M., Chirico, F., Pittino, D., Backman, M., Klaesson, J.: Roots to grow: family firms and local embeddedness in rural and urban contexts. Entrepr. Theory Pract. 43(2), 360–385 (2018)

  20. Basco, R.: The family’s effect on family firm performance: a model testing the demographic and essence approaches. J. Family Bus. Strat. 4(2), 42–66 (2013)

  21. Plonsky, O., Erev, I., Hazan, T., Tennenholtz, M.: Psychological forest: predicting human behavior. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI 2017), San Francisco, USA, pp. 656–662 (2017)

  22. Greene, W.H.: Econometric Analysis. Prentice Hall, Upper Saddle River (2003)

  23. Snedecor, G.W., Cochran, W.G.: Statistical Methods. Iowa State University Press, Iowa (1989)

  24. Collin, S.M.H.: Dictionary of Accounting. A & C Black Publishers, London (2007)

  25. Mooney, K.: The Essential Accounting Dictionary. Sphinx Publishing (2008)

  26. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York (2008)

  27. Nguyen, B.P., Tay, W.-L., Chui, C.-K.: Robust biometric recognition from palm depth images for gloved hands. IEEE Trans. Hum. Mach. Syst. 45(6), 799–804 (2015)

  28. Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice Hall, Upper Saddle River (1998)

  29. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)

  30. Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2016), San Francisco, USA, pp. 785–794 (2016)

  31. Ke, G., et al.: LightGBM: a highly efficient gradient boosting decision tree. In: Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, USA, pp. 3149–3157 (2017)

  32. Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A.V., Gulin, A.: CatBoost: unbiased boosting with categorical features. In: Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada, pp. 6638–6648 (2018)

  33. Hansen, L.K., Rieger, L.: Interpretability in intelligent systems – a new concept? In: Samek, W., Montavon, G., Vedaldi, A., Hansen, L.K., Müller, K.-R. (eds.) Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. LNCS (LNAI), vol. 11700, pp. 41–49. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-28954-6_3

  34. McConaughy, D.L., Walker, M.C., Henderson Jr., G.V., Mishra, C.S.: Founding family controlled firms: efficiency and value. Rev. Financ. Econ. 7(1), 1–19 (1998)

  35. Martikainen, M., Nikkinen, J., Vähämaa, S.: Production functions and productivity of family firms: evidence from the S&P 500. Q. Rev. Econ. Finance 49(2), 295–307 (2009)

  36. Anderson, R.C., Mansi, S.A., Reeb, D.M.: Founding-family ownership and the agency cost of debt. J. Financ. Econ. 68(2), 263–287 (2003)

  37. Basuchoudhary, A., Bang, J.T., Sen, T.: Machine-Learning Techniques in Economics. SE. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-69014-8

  38. Cameron, A.C., Trivedi, P.K.: Microeconometrics: Methods and Applications. Cambridge University Press, Cambridge (2005)

  39. Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33(1), 1–22 (2010)

  40. Cohen, S., Ruppin, E., Dror, G.: Feature selection based on the Shapley value. In: Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI 2005), Edinburgh, Scotland, pp. 665–670 (2005)

  41. Chapelle, O., Schölkopf, B., Zien, A.: Semi-Supervised Learning. MIT Press, Cambridge (2006)

Acknowledgements

The first author is a member of the Gruppo Nazionale per l’Analisi Matematica, la Probabilità e le loro Applicazioni (GNAMPA) of the Istituto Nazionale di Alta Matematica (INdAM), Italy. We would like to thank Giovanni Foresti and Sara Giusti of the Intesa Sanpaolo Research Unit (Direzione Studi e Ricerche Intesa Sanpaolo), within a joint research program with the IMT School for Advanced Studies, Lucca.

Author information

Corresponding author

Correspondence to Giorgio Gnecco.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Gnecco, G., Amato, S., Patuelli, A., Lattanzi, N. (2020). Machine Learning Application to Family Business Status Classification. In: Nicosia, G., et al. (eds.) Machine Learning, Optimization, and Data Science. LOD 2020. Lecture Notes in Computer Science, vol. 12565. Springer, Cham. https://doi.org/10.1007/978-3-030-64583-0_3

  • DOI: https://doi.org/10.1007/978-3-030-64583-0_3

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-64582-3

  • Online ISBN: 978-3-030-64583-0

  • eBook Packages: Computer Science, Computer Science (R0)
