Multi-label charge predictions leveraging label co-occurrence in imbalanced data scenario

  • Methodologies and Application
Soft Computing

Abstract

Charge prediction aims to predict the applicable charges from a case's fact description and plays a significant role in legal aid systems. Automatically predicting charges in the multi-label classification paradigm, which fits real applications, is a fundamental and challenging task. Existing works focus either on multiple charges in balanced data scenarios or on few-shot charges with a single label. Moreover, previous models use a special initialization with label patterns to improve multi-label classification performance, which is only applicable when training data are scarce and results in poor robustness. To this end, a multi-task framework called CBLLC, which combines a convolutional neural network with bidirectional long short-term memory and leverages label co-occurrence, is introduced to predict multiple charges together with article information in imbalanced data scenarios. We develop a new learning mechanism that trains the framework on charge and article patterns when training data are plentiful, increasing its robustness. In CBLLC, data preprocessing helps training generalize and reduces overfitting. Salient word annotation is introduced to deal with few-shot charges. The processed data yield better classification results and improve the generality of the model. Experimental results on the Chinese AI and Law Challenge test set show the superiority of the proposed method over state-of-the-art methods. In particular, macro-F1 scores of 92.9% for charges and 86.6% for articles are achieved with co-occurrence of charges and patterns of articles.
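Two notions in the abstract, label co-occurrence and macro-F1, can be made concrete with a small sketch. The Python below is illustrative only and is not the CBLLC implementation: the function names, the toy charge labels, and the per-label averaging convention for macro-F1 are all assumptions, and the article may define its metric differently.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(label_sets):
    """Count how often each unordered pair of labels appears in the same sample."""
    counts = Counter()
    for labels in label_sets:
        for pair in combinations(sorted(set(labels)), 2):
            counts[pair] += 1
    return counts


def macro_f1(true_sets, pred_sets, labels):
    """Unweighted mean of per-label F1, so rare labels weigh as much as frequent ones."""
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(true_sets, pred_sets) if label in t and label in p)
        fp = sum(1 for t, p in zip(true_sets, pred_sets) if label not in t and label in p)
        fn = sum(1 for t, p in zip(true_sets, pred_sets) if label in t and label not in p)
        denom = 2 * tp + fp + fn
        # A label with no true or predicted occurrences scores 0 here (an assumption).
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, made-up data: each sample is the set of charges attached to one case.
true_sets = [{"theft", "robbery"}, {"theft"}, {"fraud"}, {"theft", "robbery"}]
pred_sets = [{"theft", "robbery"}, {"theft"}, {"theft"}, {"theft"}]

pairs = cooccurrence_counts(true_sets)
score = macro_f1(true_sets, pred_sets, ["theft", "robbery", "fraud"])
print(pairs)
print(round(score, 4))
```

In this toy data the frequent label is predicted well while the rare label "fraud" is missed entirely, and the macro average drops accordingly. That sensitivity to rare labels is why macro-F1 is a natural metric for imbalanced data.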

Acknowledgements

This work is supported by the National Key R&D Program of China under Grant No. 2018YFC0830800. The authors would like to thank Dr. Xiaoyang Li from the School of Electronics and Information, Northwestern Polytechnical University, for his valuable comments on the article.

Author information

Corresponding author

Correspondence to Fengbao Yang.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Dong, H., Yang, F. & Wang, X. Multi-label charge predictions leveraging label co-occurrence in imbalanced data scenario. Soft Comput 24, 17821–17846 (2020). https://doi.org/10.1007/s00500-020-05029-w

  • DOI: https://doi.org/10.1007/s00500-020-05029-w
