Data Quality for Deep Learning of Judgment Documents: An Empirical Study
The revolution in hardware technology has made it possible to obtain high-definition data through highly sophisticated algorithms. Deep learning has emerged and is now widely used in various fields, and the judicial area is no exception. As the carrier of litigation activities, judgment documents record the process and results of the people’s courts, and their quality directly affects the fairness and credibility of the law. For measuring the quality of judgment documents, interpretability is an indispensable dimension. Unfortunately, due to various uncontrollable factors in processes such as data transmission and storage, the data set used for training is usually of poor quality. In addition, because the distribution of case data is severely imbalanced, data augmentation is essential to generate data for low-frequency cases. Based on the existing data set and the application scenarios, we explore data quality issues along four dimensions. We then systematically investigate each one to determine its impact on the data set. Finally, we compare the four dimensions to find out which one does the most damage to the data set.
Keywords: Judgment document, Deep learning, Quality measurement, Natural language processing
This work is supported in part by the National Key Research and Development Program of China (2016YFC0800805) and the National Natural Science Foundation of China (61832009, 61932012).