Phrases provide a better foundation for indexing and retrieving documents than individual words, because the constituents of a phrase make one another less ambiguous than when the words appear separately. Intuitively, classifiers that index documents by phrases should perform better than those that use words. Although pioneers explored phrase-based indexing of English documents decades ago, there have been relatively few similar attempts for Chinese documents, partly because segmenting Chinese text into words correctly is itself difficult. We build a domain-dependent word list with the help of Chien’s PAT-tree-based method and HowNet, and use the resulting word list to define relevant phrases for classifying Chinese judicial documents. Experimental results indicate that indexing with phrases indeed allows us to classify judicial documents that are closely similar to each other. With a relatively more efficient algorithm, our classifier offers better performance than that reported in related work.
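The intuition that phrases separate closely similar documents better than single words can be illustrated with a minimal sketch. This is not the method of the paper: it assumes that a phrase is simply a pair of adjacent words in already segmented text, and the function names (`word_pairs`, `index_terms`, `overlap`), the `max_gap` parameter, and the two token lists are hypothetical, chosen only for illustration.

```python
from collections import Counter


def word_pairs(words, max_gap=1):
    """Return ordered word pairs whose members are at most `max_gap` positions apart.

    Assumption: `words` is text that has already been segmented into words.
    """
    pairs = []
    for i, first in enumerate(words):
        for second in words[i + 1 : i + 1 + max_gap]:
            pairs.append((first, second))
    return pairs


def index_terms(words, max_gap=1):
    """Index a document by the multiset of its word pairs."""
    return Counter(word_pairs(words, max_gap))


def overlap(a, b):
    """Count the index terms shared by two multisets (with multiplicity)."""
    return sum((a & b).values())


# Two hypothetical, pre-segmented descriptions that differ in a single verb.
doc_a = ["被告", "強取", "他人", "財物"]
doc_b = ["被告", "竊取", "他人", "財物"]

word_overlap = overlap(Counter(doc_a), Counter(doc_b))
pair_overlap = overlap(index_terms(doc_a), index_terms(doc_b))

print("shared words:", word_overlap)  # 3 of 4 words are shared
print("shared pairs:", pair_overlap)  # only 1 of 3 adjacent pairs is shared
```

Under these assumptions the two documents share three of their four words but only one of their three adjacent word pairs, so a pair-based index keeps them further apart than a word-based one. A real system would derive the word list and the phrase definitions from domain data rather than from adjacency alone.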
- Word Pair
- Chinese Character
- Chinese Word
- Chinese Text
- Criminal Charge
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Salton, G., Yang, C.S., Yu, C.T.: A theory of term importance in automatic text analysis. J. of the American Society for Information Science 26(1), 33–44 (1975)
Fagan, J.L.: Automatic phrase indexing for document retrieval. In: Proc. of the 10th SIGIR, pp. 91–101 (1987)
Croft, W.B., Turtle, H.R., Lewis, D.D.: The use of phrases and structured queries in information retrieval. In: Proc. of the 14th SIGIR, pp. 32–45 (1991)
Chien, L.-F.: PAT-tree-based keyword extraction for Chinese information retrieval. In: Proc. of the 20th SIGIR, pp. 50–58 (1997)
Moulinier, I., Molina-Salgado, H., Jackson, P.: Thomson Legal and Regulatory at NTCIR-3: Japanese, Chinese and English retrieval experiments. In: Proc. of the 3rd NTCIR Workshop on Research in Information Retrieval, Automatic Text Summarization and Question Answering (2002)
Chien, L.-F.: Fast and quasi-natural language search for gigabytes of Chinese texts. In: Proc. of the 18th SIGIR, pp. 112–120 (1995)
Kwok, K.L.: Comparing representations in Chinese information retrieval. In: Proc. of the 20th SIGIR, pp. 34–41 (1997)
Liu, C.-L., Chang, C.-T., Ho, J.-H.: Case instance generation and refinement for case-based criminal summary judgments in Chinese. J. of Information Science and Engineering 20(4), 783–800 (2004)
Liu, C.-L., Liao, T.-M.: Classifying criminal charges in Chinese for Web-based legal services. In: Proc. of the 7th Asia Pacific Web Conf., pp. 64–75 (2005)
Mitchell, T.: Machine Learning. McGraw-Hill, New York (1997)
Manning, C.D., Schütze, H.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge (1999)
Thompson, P.: Automatic categorization of case law. In: Proc. of the 8th ICAIL, pp. 70–77 (2001)
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Liu, C.-L., Hsieh, C.-D. (2006). Exploring Phrase-Based Classification of Judicial Documents for Criminal Charges in Chinese. In: Esposito, F., Raś, Z.W., Malerba, D., Semeraro, G. (eds) Foundations of Intelligent Systems. ISMIS 2006. Lecture Notes in Computer Science, vol 4203. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875604_75
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45764-0
Online ISBN: 978-3-540-45766-4