Word Embedding Based Document Similarity for the Inferring of Penalty

He, Tieke; Lian, Hao; Qin, Zemin; Zou, Zhipeng; Luo, Bin

doi:10.1007/978-3-030-02934-0_22

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11242)

Included in the following conference series:

International Conference on Web Information Systems and Applications

Abstract

In this paper, we present a novel framework for inferring the fine amount in judicial cases that uses word embeddings to calculate the distances between documents. Our work builds on recent studies of word embeddings, which learn semantically meaningful word representations from local co-occurrences in sentences. By adopting word2vec embeddings, the framework takes the context of words into account, unlike traditional methods such as hierarchical clustering, kNN, k-means, and conventional collaborative filtering, which operate on vectors that do not capture this context. In judicial research, one open problem is deciding the amount of the fine or penalty for a legal case. We treat it as a recommendation task: we divide all legal cases into 7 classes according to the fine amount and then, for a target case, infer which class it belongs to. Extensive experiments on a legal case dataset show that our method outperforms all the compared methods in terms of Precision, Recall, and F1-score.
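
The abstract does not say exactly how word vectors are combined into a document distance, so the following is only a minimal sketch of the overall idea under stated assumptions. Each case is represented by the average of its word vectors (a simple stand-in; the authors' actual distance may differ, for example Word Mover's Distance). Cases are compared by cosine similarity, a target case receives the majority fine class among its k most similar labelled cases, and predictions are scored with macro-averaged precision, recall, and F1 (the averaging scheme is also an assumption). The embedding table `EMBEDDINGS`, the toy cases, the class labels, and all function names are hypothetical; in practice the vectors would come from a word2vec model trained on the case texts.

```python
import math
from collections import Counter

# Hypothetical 3-dimensional embeddings standing in for trained word2vec vectors.
EMBEDDINGS = {
    "theft": [0.9, 0.1, 0.0],
    "stolen": [0.8, 0.2, 0.1],
    "fraud": [0.1, 0.9, 0.2],
    "deceived": [0.2, 0.8, 0.1],
    "injury": [0.0, 0.1, 0.9],
    "assault": [0.1, 0.0, 0.8],
}


def doc_vector(tokens, embeddings):
    """Mean of the vectors of in-vocabulary tokens; None if no token is known."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return None
    return [sum(column) / len(vecs) for column in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def infer_fine_class(target_tokens, labelled_cases, embeddings, k=3):
    """Majority fine class (0..6) among the k labelled cases most similar to the target."""
    target = doc_vector(target_tokens, embeddings)
    if target is None:
        raise ValueError("target case has no in-vocabulary words")
    scored = []
    for tokens, fine_class in labelled_cases:
        vec = doc_vector(tokens, embeddings)
        if vec is not None:
            scored.append((cosine(target, vec), fine_class))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Counter keeps first-seen order, so a tied vote goes to the class
    # whose best-ranked neighbour is most similar to the target.
    votes = Counter(fine_class for _, fine_class in scored[:k])
    return votes.most_common(1)[0][0]


def macro_prf(y_true, y_pred):
    """Macro-averaged precision, recall and F1 over classes seen in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical labelled cases: (tokens, fine class in 0..6).
cases = [
    (["theft", "stolen"], 1),
    (["stolen", "theft", "theft"], 1),
    (["fraud", "deceived"], 4),
    (["deceived", "fraud", "fraud"], 4),
    (["assault", "injury"], 6),
]
print(infer_fine_class(["stolen", "theft"], cases, EMBEDDINGS, k=3))  # -> 1
print(macro_prf([1, 1, 4, 6], [1, 4, 4, 6]))
```

Averaging word vectors is chosen here only for brevity: it discards word order and weights every word equally, whereas a distance such as Word Mover's Distance compares documents word by word. Either choice plugs into the same nearest-neighbour step.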

This work is supported in part by the National Key Research and Development Program of China (2016YFC0800805), and the Fundamental Research Funds for the Central Universities (021714380013).


Author information

Authors and Affiliations

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210093, China
Tieke He, Hao Lian, Zemin Qin, Zhipeng Zou & Bin Luo

Corresponding author

Correspondence to Tieke He.

Editor information

Editors and Affiliations

Renmin University of China, Beijing, China
Xiaofeng Meng
Huazhong University of Science and Technology, Wuhan, China
Ruixuan Li
Renmin University of China, Beijing, China
Kanliang Wang
Taiyuan University of Technology, Yuci, China
Baoning Niu
Tianjin University, Tianjin, China
Xin Wang
South China Normal University, Guangzhou, China
Gansen Zhao

About this paper

Cite this paper

He, T., Lian, H., Qin, Z., Zou, Z., Luo, B. (2018). Word Embedding Based Document Similarity for the Inferring of Penalty. In: Meng, X., Li, R., Wang, K., Niu, B., Wang, X., Zhao, G. (eds) Web Information Systems and Applications. WISA 2018. Lecture Notes in Computer Science, vol 11242. Springer, Cham. https://doi.org/10.1007/978-3-030-02934-0_22

DOI: https://doi.org/10.1007/978-3-030-02934-0_22
Published: 20 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02933-3
Online ISBN: 978-3-030-02934-0
eBook Packages: Computer Science, Computer Science (R0)
