Exploring Phrase-Based Classification of Judicial Documents for Criminal Charges in Chinese
Phrases provide a better foundation for indexing and retrieving documents than individual words: the component words of a phrase disambiguate one another, so a phrase is less ambiguous than its words taken separately. Intuitively, classifiers that use phrases for indexing should therefore perform better than those that use single words. Although researchers explored phrase-based indexing of English documents decades ago, there have been relatively few similar attempts for Chinese documents, partly because correctly segmenting Chinese text into words is already difficult. We build a domain-dependent word list with the help of Chien's PAT-tree-based method and HowNet, and use the resulting word list to define relevant phrases for classifying Chinese judicial documents. Experimental results indicate that indexing with phrases indeed allows us to distinguish between judicial documents that are very similar to each other. Using a comparatively efficient algorithm, our classifier outperforms the classifiers reported in related work.
Keywords: Word Pair, Chinese Character, Chinese Word, Chinese Text, Criminal Charge
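The abstract does not spell out how phrases are defined or how the classifier works, so the following is only a minimal sketch of the general idea of indexing by phrases instead of single words. Every concrete choice in it is an assumption: greedy forward maximum matching stands in for segmentation, a tiny hypothetical `DOMAIN_WORDS` set stands in for the word list built with the PAT tree and HowNet, "phrases" are taken to be ordered pairs of consecutive domain words, and a nearest-centroid cosine classifier (hypothetical helpers `segment`, `phrase_features`, `train`, `classify`) replaces whatever algorithm the authors actually use.

```python
import math
from collections import Counter

# Hypothetical domain word list. The paper builds its list with Chien's
# PAT-tree-based method and HowNet; these few entries are illustrative only.
DOMAIN_WORDS = {"被告", "竊取", "強取", "持刀", "財物", "皮包", "機車"}


def segment(text, lexicon, max_len=4):
    """Greedy forward maximum matching (an assumed segmenter): at each position,
    take the longest lexicon entry of up to max_len characters, else one character."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def phrase_features(text, lexicon):
    """Index a document by ordered pairs of consecutive domain words
    (a hypothetical phrase definition); out-of-lexicon tokens are dropped."""
    kept = [w for w in segment(text, lexicon) if w in lexicon]
    return Counter(zip(kept, kept[1:]))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(examples, lexicon):
    """Sum the phrase counts of all training documents for each charge."""
    profiles = {}
    for text, charge in examples:
        profiles.setdefault(charge, Counter()).update(phrase_features(text, lexicon))
    return profiles


def classify(text, profiles, lexicon):
    """Return the charge whose summed profile is most similar to the document."""
    features = phrase_features(text, lexicon)
    return max(profiles, key=lambda charge: cosine(features, profiles[charge]))


# Toy training data: theft (竊盜) vs. robbery (強盜).
TRAINING = [("被告竊取財物", "竊盜"), ("被告持刀強取財物", "強盜")]
profiles = train(TRAINING, DOMAIN_WORDS)
print(classify("被告持刀強取皮包", profiles, DOMAIN_WORDS))  # prints 強盜
print(classify("被告竊取機車", profiles, DOMAIN_WORDS))      # prints 竊盜
```

In this toy data, 被告 (defendant) and 財物 (property) occur under both charges, so as single words they carry little evidence; pairs such as (持刀, 強取) are what separate robbery from theft, which mirrors the abstract's motivation for phrase-based indexing.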
- 2. Fagan, J.L.: Automatic phrase indexing for document retrieval. In: Proc. of the 10th SIGIR, pp. 91–101 (1987)
- 3. Croft, W.B., Turtle, H.R., Lewis, D.D.: The use of phrases and structured queries in information retrieval. In: Proc. of the 14th SIGIR, pp. 32–45 (1991)
- 4. Chien, L.-F.: PAT-tree-based keyword extraction for Chinese information retrieval. In: Proc. of the 20th SIGIR, pp. 50–58 (1997)
- 5. Moulinier, I., Molina-Salgado, H., Jackson, P.: Thomson Legal and Regulatory at NTCIR-3: Japanese, Chinese and English retrieval experiments. In: Proc. of the 3rd NTCIR Workshop on Research in Information Retrieval, Automatic Text Summarization and Question Answering (2002)
- 6. Chien, L.-F.: Fast and quasi-natural language search for gigabytes of Chinese texts. In: Proc. of the 18th SIGIR, pp. 112–120 (1995)
- 7. Kwok, K.L.: Comparing representations in Chinese information retrieval. In: Proc. of the 20th SIGIR, pp. 34–41 (1997)
- 8. HowNet, www.keenage.com
- 9. Liu, C.-L., Chang, C.-T., Ho, J.-H.: Case instance generation and refinement for case-based criminal summary judgments in Chinese. J. of Information Science and Engineering 20(4), 783–800 (2004)
- 10. Liu, C.-L., Liao, T.-M.: Classifying criminal charges in Chinese for Web-based legal services. In: Proc. of the 7th Asia Pacific Web Conf., pp. 64–75 (2005)
- 13. Thompson, P.: Automatic categorization of case law. In: Proc. of the 8th ICAIL, pp. 70–77 (2001)