
Machine Learning Application to Family Business Status Classification

  • Conference paper
  • First Online:
Machine Learning, Optimization, and Data Science (LOD 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12565)

Abstract

In line with a recent research trend, there is growing interest in applying machine learning techniques to business analytics. In this work, both supervised and unsupervised machine learning techniques are applied to the analysis of a dataset comprising both family and non-family firms. This is worth investigating because the two kinds of firms typically differ in several performance-related aspects, which can be reflected in balance-sheet data. First, binary classification techniques are applied to discriminate between the two kinds of firms, by combining an unlabeled dataset with the labels provided by a survey. The most important features for this binary classification are identified. Then, clustering is applied to highlight why supervised learning can be effective in the previous task, by showing that most of the largest clusters found are quite unequally populated by the two classes.
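
As a rough illustration of the two-step analysis outlined above, the sketch below trains a binary classifier on synthetic stand-ins for balance-sheet features and then clusters the same data to inspect how the two classes distribute across clusters. The data, the logistic-regression classifier, and the k-means clustering are assumptions made for illustration; they are not necessarily the models used in the paper.

```python
# Hypothetical sketch: (1) supervised classification of family vs. non-family
# firms from balance-sheet-like features, (2) clustering to inspect how the
# two classes distribute across clusters. Data and models are illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Synthetic stand-in for balance-sheet indicators of 152 firms
X, y = make_classification(n_samples=152, n_features=10, n_informative=4,
                           random_state=0)

# Step 1: binary classification of family (1) vs. non-family (0) firms
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))

# Coefficient magnitudes as a rough proxy for feature importance
coefs = clf.named_steps["logisticregression"].coef_.ravel()
print("most influential features:", np.argsort(-np.abs(coefs))[:3])

# Step 2: clustering, then checking how evenly each cluster mixes the two classes
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(X))
for k in range(4):
    members = labels == k
    print(f"cluster {k}: size={members.sum()}, "
          f"share of class 1={y[members].mean():.2f}")
```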

Notes

  1. It is worth noting that the distinction between family and non-family firms depends not only on their different ownership structures, but also on other factors. Moreover, it is not always straightforward to determine whether owners belong to the same family from ownership data alone, for two reasons: they might have different surnames even though they belong to the same family (this is particularly common when family firms go through several generational shifts), and there might be corporate groups linked by shareholding, which may make it difficult to identify individual owners. Current definitions of the family firm follow one of two approaches: the demographic approach (combining family ownership, governance, and/or management) and the essence approach (a behavioral perspective on the firm’s nature). Demographic approaches combining ownership and management (or governance) are the most widely used [20].

  2. The complete dataset consists of 152 companies and is comparable in size with other datasets used in family business research [14].

  3. Here, we used a mixed demographic approach. This definition is very close to the one proposed by the European Commission (https://ec.europa.eu/growth/smes/promoting-entrepreneurship/we-work-for/family-business_en). We raised the minimum number of family members involved in the firm’s governance from 1 to 2, to ensure a clearer demarcation between sole-founder firms and family-owned and family-governed firms.

  4. AIDA, which stands for “Analisi Informatizzata delle Aziende Italiane” (English translation: “Computerised Analysis of Italian Firms”), is a commercial database managed by Bureau van Dijk, a Moody’s Analytics company, which contains a comprehensive set of financial information on companies in Italy.

  5. This way of pre-selecting a set of features (for the subsequent selection of a subset of it) can improve the overall performance of machine learning algorithms, as shown in [21] in a different framework; see also the sketch following these notes.

  6. Initially, for each feature, the average of its annual change was also considered, but the difference in its means over the two classes never turned out to be statistically significant (the sketch following these notes illustrates such a significance test).

  7. Other architectures/machine learning algorithms (e.g., enhanced k-nearest neighbors [27], multilayer perceptrons [28], random forests [29], gradient boosting algorithms like XGBoost [30], LightGBM [31], or CatBoost [32]) could be considered for further developments.

  8. The techniques adopted in [13] are random forests, boosted trees, and artificial neural networks. These choices are justified by the much larger number of features and training examples available in the dataset considered therein.
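
The following sketch illustrates the feature pre-selection ideas mentioned in notes 5 and 6: features are first screened by testing whether their class means differ significantly (here, a two-sample Welch t-test on synthetic data), and a smaller subset is then chosen with an embedded method (here, an L1-penalized logistic regression). The significance threshold, the test variant, and the second-stage selector are illustrative assumptions, not the paper's exact procedure.

```python
# Illustrative two-stage feature selection: keep only features whose class
# means differ significantly, then select a smaller subset with an embedded
# method. Data, threshold (0.05), and methods are assumptions for illustration.
import numpy as np
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a pool of candidate balance-sheet features
X, y = make_classification(n_samples=152, n_features=40, n_informative=5,
                           random_state=0)

# Stage 1: pre-selection via significance of the difference in class means
p_values = np.array([
    stats.ttest_ind(X[y == 1, j], X[y == 0, j], equal_var=False).pvalue
    for j in range(X.shape[1])
])
pool = np.where(p_values < 0.05)[0]
print("features passing the t-test filter:", pool)

# Stage 2: further subset selection with an L1-penalized logistic regression
selector = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
).fit(X[:, pool], y)
print("features kept after both stages:", pool[selector.get_support()])
```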

References

  1. Giudici, P., Figini, S.: Applied Data Mining for Business and Industry. Wiley, Hoboken (2009)

  2. Alexandropoulos, S.-A.N., Aridas, C.K., Kotsiantis, S.B., Vrahatis, M.N.: A deep dense neural network for bankruptcy prediction. In: Macintyre, J., Iliadis, L., Maglogiannis, I., Jayne, C. (eds.) EANN 2019. CCIS, vol. 1000, pp. 435–444. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-20257-6_37

  3. Kirkos, E., Spathis, C., Manolopoulos, Y.: Data mining techniques for the detection of fraudulent financial statements. Expert Syst. Appl. 32(4), 995–1003 (2007)

  4. Perols, J.: Financial statement fraud detection: an analysis of statistical and machine learning algorithms. Audit. J. Pract. Theory 30(2), 19–50 (2011)

  5. Varian, H.R.: Big data: new tricks for econometrics. J. Econ. Perspect. 28, 3–28 (2014)

  6. Athey, S., Imbens, G.: Recursive partitioning for heterogeneous causal effects. Proc. Natl. Acad. Sci. 113, 7353–7360 (2016)

  7. Wager, S., Athey, S.: Estimation and inference of heterogeneous treatment effects using random forests. J. Am. Stat. Assoc. 113, 1228–1242 (2018)

  8. Bargagli Stoffi, F.J., Gnecco, G.: Causal tree with instrumental variable: an extension of the causal tree framework to irregular assignment mechanisms. Int. J. Data Sci. Anal. 9(3), 315–337 (2019). https://doi.org/10.1007/s41060-019-00187-z

  9. Gnecco, G., Nutarelli, F.: On the trade-off between number of examples and precision of supervision in regression. In: Proceedings of the 4th International Conference of the International Neural Network Society on Big Data and Deep Learning (INNS BDDL 2019), Sestri Levante, Italy, pp. 1–6 (2019)

  10. Gnecco, G., Nutarelli, F.: On the trade-off between number of examples and precision of supervision in machine learning problems. Optim. Lett. 3, 1–23 (2019). https://doi.org/10.1007/s11590-019-01486-x

  11. Gnecco, G., Nutarelli, F.: Optimal trade-off between sample size and precision of supervision for the fixed effects panel data model. In: Nicosia, G., Pardalos, P., Umeton, R., Giuffrida, G., Sciacca, V. (eds.) LOD 2019. LNCS, vol. 11943, pp. 531–542. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-37599-7_44

  12. Soler, I.P., Gemar, G., Guerrero-Murillo, R.: Family and non-family business behaviour in the wine sector: a comparative study. Eur. J. Family Bus. 7(1), 65–73 (2017)

  13. Peltonen, J.: Can supervised machine learning be used to identify family firms using a sophisticated definition? Acad. Manag. Proc. 2018(1) (2018). 6 pages. https://doi.org/10.5465/AMBPP.2018.154

  14. Beck, L., Janssens, W., Debruyne, M., Lommelen, T.: A study of the relationships between generation, market orientation, and innovation in family firms. Family Bus. Rev. 24(3), 252–272 (2011)

  15. Litz, R.A.: The family business: toward definitional clarity. Family Bus. Rev. 8(2), 71–81 (1995)

  16. Chua, J.H., Chrisman, J.J., Sharma, P.: Defining the family business by behavior. Entrepr. Theory Pract. 23(4), 19–39 (1999)

  17. Astrachan, J.H., Klein, S.B., Smyrnios, K.X.: The F-PEC scale of family influence: a proposal for solving the family business definition problem. Family Bus. Rev. 15(1), 45–58 (2002)

  18. Corbetta, G., Salvato, C.: Strategies for Longevity in Family Firms: A European Perspective. Palgrave Macmillan, London (2012)

  19. Baù, M., Chirico, F., Pittino, D., Backman, M., Klaesson, J.: Roots to grow: family firms and local embeddedness in rural and urban contexts. Entrepr. Theory Pract. 43(2), 360–385 (2018)

  20. Basco, R.: The family’s effect on family firm performance: a model testing the demographic and essence approaches. J. Family Bus. Strat. 4(2), 42–66 (2013)

  21. Plonsky, O., Erev, I., Hazan, T., Tennenholtz, M.: Psychological forest: predicting human behavior. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI 2017), San Francisco, USA, pp. 656–662 (2017)

  22. Greene, W.H.: Econometric Analysis. Prentice Hall, Upper Saddle River (2003)

  23. Snedecor, G.W., Cochran, W.G.: Statistical Methods. Iowa State University Press, Iowa (1989)

  24. Collin, S.M.H.: Dictionary of Accounting. A & C Black Publishers, London (2007)

  25. Mooney, K.: The Essential Accounting Dictionary. Sphinx Publishing (2008)

  26. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York (2008)

  27. Nguyen, B.P., Tay, W.-L., Chui, C.-K.: Robust biometric recognition from palm depth images for gloved hands. IEEE Trans. Hum. Mach. Syst. 45(6), 799–804 (2015)

  28. Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice Hall, Upper Saddle River (1998)

  29. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)

  30. Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2016), San Francisco, USA, pp. 785–794 (2016)

  31. Ke, G., et al.: LightGBM: a highly efficient gradient boosting decision tree. In: Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, USA, pp. 3149–3157 (2017)

  32. Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A.V., Gulin, A.: CatBoost: unbiased boosting with categorical features. In: Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada, pp. 6638–6648 (2018)

  33. Hansen, L.K., Rieger, L.: Interpretability in intelligent systems – a new concept? In: Samek, W., Montavon, G., Vedaldi, A., Hansen, L.K., Müller, K.-R. (eds.) Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. LNCS (LNAI), vol. 11700, pp. 41–49. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-28954-6_3

  34. McConaughy, D.L., Walker, M.C., Henderson Jr., G.V., Mishra, C.S.: Founding family controlled firms: efficiency and value. Rev. Financ. Econ. 7(1), 1–19 (1998)

  35. Martikainen, M., Nikkinen, J., Vähämaa, S.: Production functions and productivity of family firms: evidence from the S&P 500. Q. Rev. Econ. Finance 49(2), 295–307 (2009)

  36. Anderson, R.C., Mansi, S.A., Reeb, D.M.: Founding-family ownership and the agency cost of debt. J. Financ. Econ. 68(2), 263–287 (2003)

  37. Basuchoudhary, A., Bang, J.T., Sen, T.: Machine-Learning Techniques in Economics. SE. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-69014-8

  38. Cameron, A.C., Trivedi, P.K.: Microeconometrics: Methods and Applications. Cambridge University Press, Cambridge (2005)

  39. Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33(1), 1–22 (2010)

  40. Cohen, S., Ruppin, E., Dror, G.: Feature selection based on the Shapley value. In: Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI 2005), Edinburgh, Scotland, pp. 665–670 (2005)

  41. Chapelle, O., Schölkopf, B., Zien, A.: Semi-Supervised Learning. MIT Press, Cambridge (2006)

Acknowledgements

The first author is a member of the Gruppo Nazionale per l’Analisi Matematica, la Probabilità e le loro Applicazioni (GNAMPA) of the Istituto Nazionale di Alta Matematica (INdAM), Italy. We would like to thank Giovanni Foresti and Sara Giusti of the Intesa Sanpaolo Research Unit (Direzione Studi e Ricerche Intesa Sanpaolo), within a joint research program with the IMT School for Advanced Studies, Lucca.

Author information

Corresponding author

Correspondence to Giorgio Gnecco.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Gnecco, G., Amato, S., Patuelli, A., Lattanzi, N. (2020). Machine Learning Application to Family Business Status Classification. In: Nicosia, G., et al. (eds.) Machine Learning, Optimization, and Data Science. LOD 2020. Lecture Notes in Computer Science, vol. 12565. Springer, Cham. https://doi.org/10.1007/978-3-030-64583-0_3

  • DOI: https://doi.org/10.1007/978-3-030-64583-0_3

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-64582-3

  • Online ISBN: 978-3-030-64583-0

  • eBook Packages: Computer Science, Computer Science (R0)
