Abstract
Using the survival Clayton copula, we propose a method for generating synthetic data on firm-size variables such as operating revenues and the number of employees. The synthetic data must satisfy two stylized facts of firm-size statistics. First, firm-size distributions have power-law tails. Second, Gibrat’s law should hold for the ratio of two different firm-size variables. With the survival Clayton copula, we introduce random variables whose marginal distributions are uniform on the interval from 0 to 1 and transform them to obey power-law distributions. The resulting variables satisfy both stylized facts.
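The two steps named in the abstract (drawing dependent uniform variables from the survival Clayton copula, then transforming them to power-law variables) can be sketched as follows. This is a minimal illustration, not the authors' exact procedure. The function names, the use of pure Pareto marginals, the conditional-inversion sampler, and all parameter values (theta, alpha_x, alpha_y, x_min, y_min) are assumptions made for the example.

```python
import numpy as np


def sample_clayton(n, theta, rng):
    """Draw n pairs (u, v) from the Clayton copula (theta > 0) by conditional inversion."""
    u = 1.0 - rng.random(n)  # uniform on (0, 1], avoids division by zero
    w = 1.0 - rng.random(n)  # independent uniform used to invert the conditional CDF
    # Solve dC/du = w for v, where C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta).
    v = ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** (-theta) + 1.0) ** (-1.0 / theta)
    return u, v


def generate_firm_sizes(n, theta, alpha_x, alpha_y, x_min=1.0, y_min=1.0, seed=0):
    """Return two dependent Pareto-distributed variables (hypothetical firm sizes).

    The survival Clayton pair is (1 - u, 1 - v). The Pareto quantile function is
    x_min * (1 - p)^(-1/alpha), so plugging in p = 1 - u gives x_min * u^(-1/alpha).
    Small u (where the Clayton copula is strongly dependent) maps to large sizes.
    """
    rng = np.random.default_rng(seed)
    u, v = sample_clayton(n, theta, rng)
    x = x_min * u ** (-1.0 / alpha_x)
    y = y_min * v ** (-1.0 / alpha_y)
    return x, y


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the article.
    x, y = generate_firm_sizes(n=10_000, theta=2.0, alpha_x=1.0, alpha_y=1.2)
    print("largest synthetic sizes:", x.max(), y.max())
    print("correlation of log sizes:", np.corrcoef(np.log(x), np.log(y))[0, 1])
```

Because the Clayton copula concentrates dependence in its lower tail, its survival version concentrates dependence among the largest values, which is why the sketch maps small u to large sizes. Whether the generated pairs also reproduce Gibrat's law for the ratio depends on the parameter choices and is not checked here.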
Acknowledgements
We express deep gratitude to Dr. Ietomi, Dr. Ohnishi, Dr. Tanaka and Dr. Watanabe, all of whom inspired us to start this research.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethics Approval
This article contains no studies with human participants performed by any of the authors.
About this article
Cite this article
Fujimoto, S., Ishikawa, A. & Mizuno, T. Copula-Based Synthetic Data Generation in Firm-Size Variables. Rev Socionetwork Strat 16, 479–492 (2022). https://doi.org/10.1007/s12626-022-00128-6
DOI: https://doi.org/10.1007/s12626-022-00128-6