Abstract
Recent research shows growing interest in applying machine learning techniques to business analytics. In this work, both supervised and unsupervised machine learning techniques are applied to a dataset comprising family and non-family firms. The comparison is worth investigating because the two kinds of firms typically differ in performance-related aspects, which can be reflected in balance sheet data. First, binary classification techniques are applied to discriminate between the two kinds of firms, by combining an unlabeled dataset with labels provided by a survey, and the most important features for this classification are identified. Then, clustering is applied to highlight why supervised learning can be effective in the previous task, by showing that most of the largest clusters found are quite unequally populated by the two classes.
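As a rough illustration of the last step, the following minimal sketch shows one way to check how unequally clusters are populated by the two classes. It assumes that cluster assignments are already available from some clustering method (the abstract does not name one); the function name `cluster_composition` and the toy data are hypothetical and are not taken from the chapter.

```python
from collections import defaultdict


def cluster_composition(cluster_ids, labels):
    """Return (cluster, size, family_share) tuples, largest clusters first.

    cluster_ids: cluster index assigned to each firm by any clustering method
    labels:      1 for a family firm, 0 for a non-family firm (e.g. survey labels)
    """
    counts = defaultdict(lambda: [0, 0])  # cluster -> [non-family, family]
    for c, y in zip(cluster_ids, labels):
        counts[c][y] += 1
    rows = [(c, n0 + n1, n1 / (n0 + n1)) for c, (n0, n1) in counts.items()]
    # Sort by cluster size so the largest clusters are inspected first.
    return sorted(rows, key=lambda r: r[1], reverse=True)


# Hypothetical toy assignment, for illustration only.
cluster_ids = [0, 0, 0, 0, 0, 1, 1, 1, 1, 2]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]
composition = cluster_composition(cluster_ids, labels)
```

A family share far from the overall share of family firms in a large cluster suggests that the features separate the two classes, which is consistent with a supervised classifier being effective on the same features.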
Notes
- 1.
It is worth noting that the distinction between family and non-family firms depends not only on their different ownership structures, but also on other factors. Moreover, it is not always straightforward to determine from ownership data alone whether the owners belong to the same family, for two reasons: they might have different surnames even though they belong to the same family (this is particularly common when family firms go through several generational shifts); and there might be corporate groups linked by shareholding, which can make it difficult to identify individual owners. Current definitions of family firm follow one of two approaches: the demographic approach (combining family ownership, governance and/or management) and the essence approach (the behavioral perspective on the firm’s nature). Demographic approaches combining ownership and management (or governance) are the most widely used [20].
- 2.
The complete dataset comprises 152 companies and is comparable in size with other datasets used in family business research [14].
- 3.
Here, we used a mixed demographic approach. This definition is very close to the one proposed by the European Commission (https://ec.europa.eu/growth/smes/promoting-entrepreneurship/we-work-for/family-business_en). We raised the minimum number of family members involved in the firm’s governance from 1 to 2, to ensure a clearer demarcation between sole-founder firms and family-owned and family-governed firms.
- 4.
AIDA, which stands for “Analisi Informatizzata delle Aziende Italiane” (English translation: “Computerised Analysis of Italian Firms”), is a commercial database managed by Bureau van Dijk, a Moody’s Analytics company. It contains a comprehensive set of financial information on companies in Italy.
- 5.
Pre-selecting a set of features in this way, from which a subset is later selected, can improve the overall performance of machine learning algorithms, as shown in [21] in a different framework.
- 6.
Initially, the average annual change of each feature was also considered, but the difference between its means over the two classes never turned out to be statistically significant.
- 7.
- 8.
The techniques adopted in [13] are random forests, boosting trees, and artificial neural networks. These are justified by the much larger number of features and training examples available in the dataset considered therein.
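The pre-selection mentioned in Notes 5 and 6, which keeps only features whose means differ significantly between the two classes, can be sketched as follows. The notes do not state which statistical test was used, so this sketch swaps in a two-sided permutation test as an assumption; the function names, the significance level `alpha=0.05`, and the toy feature values are all hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference in means of a and b."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # The add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


def preselect(features, labels, alpha=0.05):
    """Keep the features whose class means differ significantly.

    features: dict mapping feature name -> list of values (one per firm)
    labels:   list of 0/1 class labels (1 = family firm), in the same order
    """
    kept = []
    for name, values in features.items():
        family = [v for v, y in zip(values, labels) if y == 1]
        other = [v for v, y in zip(values, labels) if y == 0]
        if permutation_p_value(family, other) < alpha:
            kept.append(name)
    return kept


# Synthetic toy data, for illustration only (not from the chapter's dataset).
labels = [1] * 10 + [0] * 10
features = {
    "leverage": [0.9 + 0.01 * i for i in range(10)]
    + [0.3 + 0.01 * i for i in range(10)],
    "noise": [0.1 * (i % 5) for i in range(20)],
}
selected = preselect(features, labels)
```

In this toy example only the feature whose class means are clearly separated survives the filter; the feature with identical class means is dropped, mirroring the fate of the average annual changes described in Note 6.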
References
Giudici, P., Figini, S.: Applied Data Mining for Business and Industry. Wiley, Hoboken (2009)
Alexandropoulos, S.-A.N., Aridas, C.K., Kotsiantis, S.B., Vrahatis, M.N.: A deep dense neural network for bankruptcy prediction. In: Macintyre, J., Iliadis, L., Maglogiannis, I., Jayne, C. (eds.) EANN 2019. CCIS, vol. 1000, pp. 435–444. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-20257-6_37
Kirkos, E., Spathis, C., Manolopoulos, Y.: Data mining techniques for the detection of fraudulent financial statements. Expert Syst. Appl. 32(4), 995–1003 (2007)
Perols, J.: Financial statement fraud detection: an analysis of statistical and machine learning algorithms. Audit. J. Pract. Theory 30(2), 19–50 (2011)
Varian, H.R.: Big data: new tricks for econometrics. J. Econ. Perspect. 28, 3–28 (2014)
Athey, S., Imbens, G.: Recursive partitioning for heterogeneous causal effects. Proc. Natl. Acad. Sci. 113, 7353–7360 (2016)
Wager, S., Athey, S.: Estimation and inference of heterogeneous treatment effects using random forests. J. Am. Stat. Assoc. 113, 1228–1242 (2018)
Bargagli Stoffi, F.J., Gnecco, G.: Causal tree with instrumental variable: an extension of the causal tree framework to irregular assignment mechanisms. Int. J. Data Sci. Anal. 9(3), 315–337 (2019). https://doi.org/10.1007/s41060-019-00187-z
Gnecco, G., Nutarelli, F.: On the trade-off between number of examples and precision of supervision in regression. In: Proceedings of the 4th International Conference of the International Neural Network Society on Big Data and Deep Learning (INNS BDDL 2019), Sestri Levante, Italy, pp. 1–6 (2019)
Gnecco, G., Nutarelli, F.: On the trade-off between number of examples and precision of supervision in machine learning problems. Optim. Lett. 3, 1–23 (2019). https://doi.org/10.1007/s11590-019-01486-x
Gnecco, G., Nutarelli, F.: Optimal trade-off between sample size and precision of supervision for the fixed effects panel data model. In: Nicosia, G., Pardalos, P., Umeton, R., Giuffrida, G., Sciacca, V. (eds.) LOD 2019. LNCS, vol. 11943, pp. 531–542. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-37599-7_44
Soler, I.P., Gemar, G., Guerrero-Murillo, R.: Family and non-family business behaviour in the wine sector: a comparative study. Eur. J. Family Bus. 7(1), 65–73 (2017)
Peltonen, J.: Can supervised machine learning be used to identify family firms using a sophisticated definition? Acad. Manag. Proc. 2018(1) (2018). 6 pages. https://doi.org/10.5465/AMBPP.2018.154
Beck, L., Janssens, W., Debruyne, M., Lommelen, T.: A study of the relationships between generation, market orientation, and innovation in family firms. Family Bus. Rev. 24(3), 252–272 (2011)
Litz, R.A.: The family business: toward definitional clarity. Family Bus. Rev. 8(2), 71–81 (1995)
Chua, J.H., Chrisman, J.J., Sharma, P.: Defining the family business by behavior. Entrepr. Theory Pract. 23(4), 19–39 (1999)
Astrachan, J.H., Klein, S.B., Smyrnios, K.X.: The F-PEC scale of family influence: a proposal for solving the family business definition problem. Family Bus. Rev. 15(1), 45–58 (2002)
Corbetta, G., Salvato, C.: Strategies for Longevity in Family Firms: A European Perspective. Palgrave Macmillan, London (2012)
Baù, M., Chirico, F., Pittino, D., Backman, M., Klaesson, J.: Roots to grow: family firms and local embeddedness in rural and urban contexts. Entrepr. Theory Pract. 43(2), 360–385 (2018)
Basco, R.: The family’s effect on family firm performance: a model testing the demographic and essence approaches. J. Family Bus. Strat. 4(2), 42–66 (2013)
Plonsky, O., Erev, I., Hazan, T., Tennenholtz, M.: Psychological forest: predicting human behavior. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI 2017), San Francisco, USA, pp. 656–662 (2017)
Greene, W.H.: Econometric Analysis. Prentice Hall, Upper Saddle River (2003)
Snedecor, G.W., Cochran, W.G.: Statistical Methods. Iowa State University Press, Iowa (1989)
Collin, S.M.H.: Dictionary of Accounting. A & C Black Publishers, London (2007)
Mooney, K.: The Essential Accounting Dictionary. Sphinx Publishing (2008)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer (2008)
Nguyen, B.P., Tay, W.-L., Chui, C.-K.: Robust biometric recognition from palm depth images for gloved hands. IEEE Trans. Hum. Mach. Syst. 45(6), 799–804 (2015)
Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice Hall, Upper Saddle River (1998)
Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2016), San Francisco, USA, pp. 785–794 (2016)
Ke, G., et al.: LightGBM: a highly efficient gradient boosting decision tree. In: Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, USA, pp. 3149–3157 (2017)
Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A.V., Gulin, A.: CatBoost: unbiased boosting with categorical features. In: Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada, pp. 6638–6648 (2018)
Hansen, L.K., Rieger, L.: Interpretability in intelligent systems – a new concept? In: Samek, W., Montavon, G., Vedaldi, A., Hansen, L.K., Müller, K.-R. (eds.) Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. LNCS (LNAI), vol. 11700, pp. 41–49. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-28954-6_3
McConaughy, D.L., Walker, M.C., Henderson Jr., G.V., Mishra, C.S.: Founding family controlled firms: efficiency and value. Rev. Financ. Econ. 7(1), 1–19 (1998)
Martikainen, M., Nikkinen, J., Vähämaa, S.: Production functions and productivity of family firms: evidence from the S&P 500. Q. Rev. Econ. Finance 49(2), 295–307 (2009)
Anderson, R.C., Mansi, S.A., Reeb, D.M.: Founding-family ownership and the agency cost of debt. J. Financ. Econ. 68(2), 263–287 (2003)
Basuchoudhary, A., Bang, J.T., Sen, T.: Machine-Learning Techniques in Economics. SE. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-69014-8
Cameron, A.C., Trivedi, P.K.: Microeconometrics: Methods and Applications. Cambridge University Press, Cambridge (2005)
Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33(1), 1–22 (2010)
Cohen, S., Ruppin, E., Dror, G.: Feature selection based on the Shapley value. In: Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI 2005), Edinburgh, Scotland, pp. 665–670 (2005)
Chapelle, O., Schölkopf, B., Zien, A.: Semi-Supervised Learning. MIT Press, Cambridge (2006)
Acknowledgements
The first author is a member of the Gruppo Nazionale per l’Analisi Matematica, la Probabilità e le loro Applicazioni (GNAMPA) of the Istituto Nazionale di Alta Matematica (INdAM), Italy. We would like to thank Giovanni Foresti and Sara Giusti of Intesa Sanpaolo Research Unit (Direzione Studi e Ricerche Intesa Sanpaolo), as part of a joint research program with the IMT - School for Advanced Studies, Lucca.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Gnecco, G., Amato, S., Patuelli, A., Lattanzi, N. (2020). Machine Learning Application to Family Business Status Classification. In: Nicosia, G., et al. Machine Learning, Optimization, and Data Science. LOD 2020. Lecture Notes in Computer Science, vol 12565. Springer, Cham. https://doi.org/10.1007/978-3-030-64583-0_3
DOI: https://doi.org/10.1007/978-3-030-64583-0_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64582-3
Online ISBN: 978-3-030-64583-0
eBook Packages: Computer Science (R0)