Copula-Based Synthetic Data Generation in Firm-Size Variables

The Review of Socionetwork Strategies

Abstract

Using the survival Clayton copula, we propose a method for generating synthetic data on firm-size variables such as operating revenues and the number of employees. Synthetic data must satisfy two stylized facts of firm-size statistics. First, firm-size distributions have power-law tails. Second, Gibrat’s law holds for the ratio of two different firm-size variables. With the survival Clayton copula, we introduce random variables whose marginal distributions are uniform on the interval from 0 to 1 and transform them to obey power-law distributions. The resulting variables satisfy the two stylized facts.
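
The two steps named in the abstract (sampling from a survival Clayton copula, then transforming the uniform marginals to power laws) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives no parameter values or formulas, so the copula parameter `theta`, the tail exponents `alphas`, the lower cutoffs `x_mins`, the pure Pareto form of the marginals, and all function names are hypothetical. Clayton pairs are drawn by the standard conditional-inversion method, and the survival copula is obtained by reflecting them as (1 - u1, 1 - u2). The sketch does not check Gibrat's law.

```python
import random


def sample_clayton_pair(theta, rng):
    """Draw one pair (u1, u2) from a Clayton copula with theta > 0.

    Uses conditional inversion: u1 and w are independent uniforms, and u2
    solves dC(u1, u2)/du1 = w for the Clayton copula
    C(u1, u2) = (u1**-theta + u2**-theta - 1) ** (-1/theta).
    """
    u1 = 1.0 - rng.random()  # in (0, 1], so negative powers stay finite
    w = 1.0 - rng.random()
    inner = u1 ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0
    u2 = inner ** (-1.0 / theta)
    return u1, u2


def generate_firm_sizes(n, theta, alphas, x_mins, rng):
    """Return n pairs (x, y) with Pareto marginals and survival Clayton dependence.

    Assumed marginals: P(X > x) = (x / x_min) ** -alpha for x >= x_min.
    """
    data = []
    for _ in range(n):
        u1, u2 = sample_clayton_pair(theta, rng)
        # The survival Clayton sample is (v1, v2) = (1 - u1, 1 - u2).
        # The inverse Pareto CDF is x = x_min * (1 - v) ** (-1 / alpha);
        # because 1 - v = u, it is evaluated with u directly to avoid
        # rounding 1 - v to zero for very small u.
        x = x_mins[0] * u1 ** (-1.0 / alphas[0])
        y = x_mins[1] * u2 ** (-1.0 / alphas[1])
        data.append((x, y))
    return data


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    # Hypothetical parameters, chosen only for illustration.
    sample = generate_firm_sizes(5, theta=2.0, alphas=(1.0, 1.5),
                                 x_mins=(1.0, 10.0), rng=rng)
    for x, y in sample:
        print(f"{x:12.3f} {y:12.3f}")
```

Reflecting the Clayton copula moves its tail dependence from the lower to the upper corner of the unit square, so in this sketch large values of one variable tend to occur together with large values of the other.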

Acknowledgements

We express deep gratitude to Dr. Ietomi, Dr. Ohnishi, Dr. Tanaka and Dr. Watanabe, all of whom inspired us to start this research.

Author information

Corresponding author

Correspondence to Shouji Fujimoto.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethics Approval

This article contains no studies with human participants performed by any of the authors.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Fujimoto, S., Ishikawa, A. & Mizuno, T. Copula-Based Synthetic Data Generation in Firm-Size Variables. Rev Socionetwork Strat 16, 479–492 (2022). https://doi.org/10.1007/s12626-022-00128-6

  • DOI: https://doi.org/10.1007/s12626-022-00128-6
