Automatic document classification via transformers for regulations compliance management in large utility companies

  • Original Article
Neural Computing and Applications

Abstract

The operation of large utility companies such as Consolidated Edison Company of New York, Inc. (Con Edison) typically relies on large quantities of regulation documents from external institutions, which inform the company of upcoming or ongoing policy changes or of new requirements it may need to comply with if deemed applicable. As a concrete example, if a recent regulatory publication mentions that the timeframe for the Company to respond to a reported system emergency in its service territory changes from within X time to within Y time, then the affected operating groups must be notified, and internal Company operating procedures may need to be reviewed and updated accordingly to comply with the new regulatory requirement. Each such regulation document must be reviewed manually by an expert to determine whether it is relevant to the company and, if so, which department it is relevant to. To help enterprises improve the efficiency of this operation, we propose an automatic document classification pipeline that determines whether a document is important to the company and, if so, forwards it to the appropriate departments within the company for further review. The binary classification task of determining the importance of a document is performed by ensembling Naive Bayes (NB), support vector machine (SVM), random forest (RF), and artificial neural network (ANN) classifiers for the final prediction, whereas the multi-label classification task of identifying the relevant departments for a document is performed by the transformer-based DocBERT model. We apply our pipeline to a large corpus of tens of thousands of documents provided by Con Edison and achieve an accuracy score of over \(80\%\). Compared with existing document classification solutions that rely on a single classifier, our paper i) ensembles multiple classifiers for better accuracy and a reduced risk of overfitting, ii) utilizes the pretrained transformer-based DocBERT model to achieve strong performance on the multi-label classification task, and iii) introduces a bi-level structure that improves the performance of the whole pipeline, in which the binary classification module acts as a coarse filter before documents are distributed to the corresponding departments by the multi-label classification module.
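As a rough illustration of this bi-level structure, the sketch below chains a soft-voting binary filter with a multi-label department router using scikit-learn. The model choices, hyperparameters, and department names are assumptions for illustration only, and a TF-IDF one-vs-rest model stands in for the fine-tuned DocBERT stage described in the paper.

```python
# A minimal sketch of the bi-level pipeline, assuming scikit-learn-style
# components; all names and settings below are illustrative, not the
# paper's actual configuration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

DEPARTMENTS = ["gas_operations", "electric_operations", "environmental"]  # hypothetical labels

# Stage 1: binary "is this document relevant?" filter, a soft-voting
# ensemble of NB, SVM, RF, and a small ANN over TF-IDF features.
binary_filter = make_pipeline(
    TfidfVectorizer(max_features=20_000),
    VotingClassifier(
        estimators=[
            ("nb", MultinomialNB()),
            ("svm", SVC(probability=True)),  # probabilities required for soft voting
            ("rf", RandomForestClassifier(n_estimators=200)),
            ("ann", MLPClassifier(hidden_layer_sizes=(128,), max_iter=300)),
        ],
        voting="soft",  # average the four predicted probability vectors
    ),
)

# Stage 2: multi-label routing of relevant documents to departments
# (DocBERT in the paper; a one-vs-rest stand-in here).
department_router = make_pipeline(
    TfidfVectorizer(max_features=20_000),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)

# Both stages must be fitted on the labeled corpus before use, e.g.:
# binary_filter.fit(train_texts, train_relevance)         # 0/1 relevance labels
# department_router.fit(train_texts, train_label_matrix)  # multi-hot department rows

def route_document(text: str) -> list[str]:
    """Return the departments a document should be forwarded to
    ([] if the binary filter deems it irrelevant to the company)."""
    if binary_filter.predict([text])[0] == 0:  # 0 = "not relevant" (assumed encoding)
        return []
    flags = department_router.predict([text])[0]  # multi-hot indicator row
    return [dept for dept, flag in zip(DEPARTMENTS, flags) if flag]
```

The design point of the bi-level structure is visible in `route_document`: the cheap binary filter rejects irrelevant documents first, so the multi-label stage only sees documents that have already passed the coarse relevance check.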

Data availability

The datasets analyzed during the current study are not publicly available due to commercial and privacy restrictions, as they contain detailed regulation documents that are confidential to Consolidated Edison Company of New York, Inc. (Con Edison); they are, however, available from the corresponding author on reasonable request.

Author information

Corresponding author

Correspondence to Jing Wang.

Ethics declarations

Conflict of interest

No potential conflict of interest was reported by the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A Detailed Pipeline Figure

The detailed diagram of the pipeline is provided in Fig. 9 below.

Fig. 9 Detailed diagram of the pipeline

Appendix B Binary classifier comparison on different datasets

We provide the results for the four soft classifiers not only on the train and validation datasets (shown in Table 20), but also on the held-out test dataset. Even though SVM performs better on the held-out test dataset, it does not outperform the other classifiers on the original train and validation datasets. Moreover, no single soft classifier beats the final ensembling strategy of Section 5.1.3, which achieves a held-out test accuracy of \(92\%\).

Table 20 Training accuracy and final validation accuracy of ANN2, NB, SVM and RF
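A comparison of this kind can be tabulated with a few lines of code. The sketch below assumes scikit-learn-style classifiers that have already been fitted; the variable, model, and split names are placeholders for illustration, not the paper's actual code.

```python
# Sketch of the Appendix B comparison: accuracy of each fitted soft
# classifier (and the ensemble) on the train, validation, and held-out
# test splits. All names below are illustrative placeholders.
from sklearn.metrics import accuracy_score

def accuracy_table(models, splits):
    """models: {name: fitted classifier}; splits: {split_name: (X, y)}.
    Returns {model_name: {split_name: accuracy}}, i.e., the shape of
    Table 20 extended with a held-out test column."""
    return {
        name: {split: accuracy_score(y, clf.predict(X))
               for split, (X, y) in splits.items()}
        for name, clf in models.items()
    }

# Hypothetical usage, assuming the classifiers and splits are defined:
# table = accuracy_table(
#     {"NB": nb, "SVM": svm, "RF": rf, "ANN2": ann, "Ensemble": ensemble},
#     {"train": (X_train, y_train), "val": (X_val, y_val), "test": (X_test, y_test)},
# )
```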

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Dimlioglu, T., Wang, J., Bisla, D. et al. Automatic document classification via transformers for regulations compliance management in large utility companies. Neural Comput & Applic 35, 17167–17185 (2023). https://doi.org/10.1007/s00521-023-08555-4

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00521-023-08555-4

Keywords

Navigation