Machine learning improves accounting: discussion, implementation and research opportunities

Abstract

Machine learning has been growing in importance in empirical accounting research. In this opinion piece, I review the unique challenges of going beyond prediction and leveraging these tools to obtain generalizable conceptual insights. Taking as a springboard “Machine learning improves accounting estimates,” presented at the 2019 Conference of the Review of Accounting Studies, I propose a conceptual framework with various testable implications. I also develop implementation considerations for panels of accounting data, such as collinearities between accounting numbers or suitable choices of validation and test samples to mitigate between-sample correlations. Lastly, I offer a personal viewpoint on embracing the many low-hanging opportunities to bring the methodology to bear on major unanswered accounting questions.

Notes

  1.

    This would be desirable if, for example, a decision-maker bears a quadratic loss \(\mathbb{E}((g-r_{t})^{2}\mid h^{t})\) when making a decision based on g. This representation is a normalization to the extent that we can always define \(r_{t}\) as the quantity whose first moment is of interest to a decision-maker: if the decision-maker has a loss function \(\mathbb{E}(L(g,r_{t})\mid h^{t})\) with an optimum given by the first-order condition \(\mathbb{E}(L_{1}(g^{ML}(h^{t}),r_{t})\mid h^{t})=0\), we can redefine the (implied) quantity of interest as \(r_{t}^{\prime}\equiv L_{1}(g^{ML}(h^{t}),r_{t})+g^{ML}(h^{t})\), which satisfies (1).
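    To make the normalization explicit: under quadratic loss, the first-order condition is
    \[ \frac{\partial}{\partial g}\,\mathbb{E}\big((g-r_{t})^{2}\mid h^{t}\big) = 2\big(g-\mathbb{E}(r_{t}\mid h^{t})\big)=0 \;\Longrightarrow\; g^{ML}(h^{t})=\mathbb{E}(r_{t}\mid h^{t}), \]
    and, for a general loss, the redefined target satisfies
    \[ \mathbb{E}(r_{t}^{\prime}\mid h^{t}) = \mathbb{E}\big(L_{1}(g^{ML}(h^{t}),r_{t})\mid h^{t}\big) + g^{ML}(h^{t}) = g^{ML}(h^{t}), \]
    so the machine-learning estimate is again the conditional mean of the (implied) quantity of interest.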

  2.

    I thank Ting Sun for sharing this analysis for purposes of discussion.

  3.

    To illustrate, suppose that we (minimally) wish to separate a sample into three subsamples: a training sample to fit the model, a validation sample to select the hyperparameters of the model, and a test sample to assess performance. The original sample consists of firms observed over a full time series. To select these subsamples in a manner that does not imply any time or firm-level correlations, we would have to first divide the periods into subsamples 1, 2, and 3 and then subdivide the firms into groups a, b, and c, partitioning the entire sample as 1a, 2b, and 3c but dropping all other subgroups to avoid correlations. Assuming each group is equally sized, this would imply a data loss of 2/3.
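    As an illustration (not code from the original paper), a minimal sketch of such a split, assuming a pandas DataFrame panel with hypothetical "firm" and "year" columns, could look as follows:

    import numpy as np

    def correlation_free_split(df, firm_col="firm", time_col="year", seed=0):
        """Split a firm-year panel (a pandas DataFrame) into training (1a),
        validation (2b), and test (3c) blocks, dropping all other firm-period
        combinations so that no firm and no period appears in more than one
        subsample. Roughly 2/3 of the observations are discarded."""
        rng = np.random.default_rng(seed)
        periods = np.sort(df[time_col].unique())        # time blocks 1, 2, 3 (chronological)
        firms = rng.permutation(df[firm_col].unique())  # firm groups a, b, c (random)
        period_blocks = np.array_split(periods, 3)
        firm_blocks = np.array_split(firms, 3)
        subsamples = [
            df[df[time_col].isin(p) & df[firm_col].isin(f)]
            for p, f in zip(period_blocks, firm_blocks)
        ]
        train, valid, test = subsamples
        return train, valid, test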

References

  1. Bao, Y., Ke, B., Li, B., Yu, Y.J., & Zhang, J. (2019). Detecting accounting fraud in publicly traded U.S. firms using a machine learning approach. Available at SSRN 2670703.

  2. Barth, M.E., Li, K., & McClure, C. (2019). Evolution in value relevance of accounting information.

  3. Bertomeu, J., Beyer, A., & Taylor, D.J. (2016). From casual to causal inference in accounting research: The need for theoretical foundations. Foundations and Trends in Accounting, 10(2-4), 262–313.

  4. Bertomeu, J., Cheynel, E., Floyd, E., & Pan, W. (2019). Using machine learning to detect misstatements. Available at SSRN 3496297.

  5. Binz, O., Schipper, K., & Standridge, K. (2020). What can analysts learn from artificial intelligence about fundamental analysis?

  6. Chemla, G., & Hennessy, C.A. (2019). Rational expectations and the paradox of policy-relevant natural experiments. Journal of Monetary Economics.

  7. Dechow, P.M., & Skinner, D.J. (2000). Earnings management: Reconciling the views of accounting academics, practitioners, and regulators. Accounting Horizons, 14(2), 235–250.

  8. Dechow, P., Ge, W., & Schrand, C. (2010). Understanding earnings quality: A review of the proxies, their determinants and their consequences. Journal of Accounting and Economics, 50(2-3), 344–401.

  9. Deng, H. (2019). Interpreting tree ensembles with inTrees. International Journal of Data Science and Analytics, 7(4), 277–287.

  10. Ding, K., Lev, B., Peng, X., Sun, T., & Vasarhelyi, M.A. (2019). Machine learning improves accounting estimates. Review of Accounting Studies, forthcoming.

  11. Elliott, G., & Timmermann, A. (2013). Handbook of economic forecasting. Elsevier.

  12. Gerakos, J.J., Hahn, P.R., Kovrijnykh, A., & Zhou, F. (2016). Prediction versus inducement and the informational efficiency of going concern opinions. Available at SSRN 2802971.

  13. Gu, Z., & Wu, J.S. (2003). Earnings skewness and analyst forecast bias. Journal of Accounting and Economics, 35(1), 5–29.

  14. Horowitz, J.L. (2001). The bootstrap. In Handbook of econometrics (Vol. 5, pp. 3159–3228). Elsevier.

  15. Hugon, A., Kumar, A., & Lin, A.-P. (2016). Analysts, macroeconomic news, and the benefit of active in-house economists. The Accounting Review, 91(2), 513–534.

  16. Li, F. (2010). The information content of forward-looking statements in corporate filings: A naïve Bayesian machine learning approach. Journal of Accounting Research, 48(5), 1049–1102.

  17. Mohri, M., Rostamizadeh, A., & Talwalkar, A. (2018). Foundations of machine learning. MIT Press.

  18. Pagan, A., & Ullah, A. (1999). Nonparametric econometrics. Cambridge University Press.

  19. Perols, J. (2011). Financial statement fraud detection: An analysis of statistical and machine learning algorithms. Auditing: A Journal of Practice & Theory, 30(2), 19–50.

  20. Perols, J.L., Bowen, R.M., Zimmermann, C., & Samba, B. (2017). Finding needles in a haystack: Using data analytics to improve fraud prediction. The Accounting Review, 92(2), 221–245.

  21. Sun, T. (2019). Applying deep learning to audit procedures: An illustrative framework. Accounting Horizons, 33(3), 89–109.

  22. Wager, S., & Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523), 1228–1242.

  23. Watts, R.L. (2003). Conservatism in accounting part II: Evidence and research opportunities. Accounting Horizons, 17(4), 287–301.

Acknowledgments

I thank Xuan Peng and Ting Sun for being extremely patient, for offering many of the critical insights that ultimately led to this discussion, and, especially, for conducting additional analyses for purposes of discussion. I also thank Edwige Cheynel, Iván Marinovic, Eric Floyd, Wenqiang Pan, and Allan Timmermann for many fireside discussions that helped mature the ideas contained here, and I most gratefully thank Joey Engelberg for creating the discussion group that awakened my interest in machine learning.

Author information

Corresponding author

Correspondence to Jeremy Bertomeu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Random Forests with scikit-learn

(The original appendix contains two code listings, labeled as figures c and d, which are not reproduced in this version.)
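Since the original listings could not be recovered, the following is a minimal, self-contained sketch of fitting a random forest with scikit-learn on simulated data; all variable names, data, and hyperparameter choices are placeholders rather than those used in the paper.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Simulate a placeholder dataset: 1,000 observations, 10 predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=1000)

# Hold out a test sample for out-of-sample evaluation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit a random forest; hyperparameter values are illustrative only.
forest = RandomForestRegressor(
    n_estimators=500,      # number of trees in the ensemble
    max_features="sqrt",   # predictors considered at each split
    min_samples_leaf=5,    # regularization via minimum leaf size
    random_state=0,
)
forest.fit(X_train, y_train)

# Out-of-sample fit and variable importances.
print("out-of-sample MSE:", mean_squared_error(y_test, forest.predict(X_test)))
print("feature importances:", forest.feature_importances_)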

About this article

Cite this article

Bertomeu, J. Machine learning improves accounting: discussion, implementation and research opportunities. Rev Account Stud 25, 1135–1155 (2020). https://doi.org/10.1007/s11142-020-09554-9

Keywords

  • Machine learning
  • Accounting
  • Estimates
  • Modelling

JEL Classification

  • C4
  • C5
  • G3
  • M2
  • M4