Data Quality for Deep Learning of Judgment Documents: An Empirical Study

  • Jiawei Liu
  • Dong Wang
  • Zhenzhen Wang
  • Zhenyu Chen
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1157)


Advances in hardware technology have made it possible to obtain high-definition data through highly sophisticated algorithms. Deep learning has emerged and is now widely used in many fields, and the judicial area is no exception. As the carrier of litigation activities, judgment documents record the process and results of the people’s courts, and their quality directly affects the fairness and credibility of the law. Interpretability is an indispensable dimension for measuring the quality of judgment documents. Unfortunately, because of various uncontrollable factors in the process, such as data transmission and storage, the data set used for training is usually of poor quality. In addition, because the distribution of case data is severely imbalanced, data augmentation is essential to generate data for low-frequency cases. Based on the existing data set and the application scenarios, we explore data quality issues along four dimensions. We then systematically investigate these dimensions to determine their impact on the data set. Finally, we compare the four dimensions to find out which one does the most damage to the data set.


Keywords: Judgment document, Deep learning, Quality measurement, Natural language processing



The work is supported in part by the National Key Research and Development Program of China (2016YFC0800805) and the National Natural Science Foundation of China (61832009, 61932012).



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  • Jiawei Liu (1, 2)
  • Dong Wang (2)
  • Zhenzhen Wang (2, 3), corresponding author
  • Zhenyu Chen (1, 2)
  1. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
  2. Software Testing Engineering Laboratory of Jiangsu Province, Nanjing, China
  3. School of Software, Jinling Institute of Technology, Nanjing, China
