Exploring Phrase-Based Classification of Judicial Documents for Criminal Charges in Chinese

  • Chao-Lin Liu
  • Chwen-Dar Hsieh
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4203)


Phrases provide a better foundation for indexing and retrieving documents than individual words, because the words within a phrase disambiguate one another in a way that isolated words cannot. Intuitively, classifiers that index documents by phrases should therefore outperform those that index by words. Although phrase-based indexing of English documents was first explored decades ago, comparatively few similar attempts have been made for Chinese documents, partly because segmenting Chinese text into words correctly is itself difficult. We build a domain-dependent word list with the help of Chien's PAT-tree-based method and HowNet, and use the resulting word list to define relevant phrases for classifying Chinese judicial documents. Experimental results indicate that indexing by phrases does allow us to classify judicial documents that closely resemble one another. With a relatively more efficient algorithm, our classifier performs better than those reported in related work.
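The general idea of indexing by phrases built from a domain word list can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not give the segmentation procedure, the phrase definition, or the classifier, so the greedy longest-match segmenter, the use of ordered word pairs as phrases, the overlap-count scoring, and the tiny word list, example texts, and charge labels below are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical domain word list; in the paper it is built with a
# PAT-tree-based method and HowNet, which this sketch does not reproduce.
WORD_LIST = {"持刀", "搶奪", "財物", "竊取"}

# Hypothetical labeled examples (charge label -> document text).
EXAMPLES = {
    "robbery": "被告持刀搶奪被害人財物",
    "theft": "被告竊取被害人財物",
}

def segment(text, word_list):
    """Greedy longest-match segmentation; unlisted characters are skipped."""
    max_len = max(len(w) for w in word_list)
    words, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in word_list:
                words.append(text[i:i + n])
                i += n
                break
        else:
            i += 1  # no listed word starts at this position
    return words

def word_pairs(words):
    """Ordered pairs of distinct words, standing in for phrases (an assumption)."""
    return {(a, b) for a, b in combinations(words, 2) if a != b}

def classify(text, examples, word_list):
    """Return the label whose example shares the most word pairs with the text."""
    query = word_pairs(segment(text, word_list))
    return max(
        examples,
        key=lambda label: len(query & word_pairs(segment(examples[label], word_list))),
    )

predicted = classify("持刀搶奪財物", EXAMPLES, WORD_LIST)
```

In this toy setting the query shares the single word 財物 with the theft example, so a word-level index finds some evidence for both charges, whereas the word-pair index matches only the robbery example. This mirrors the abstract's point that phrases help separate closely similar documents.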


Word Pair, Chinese Character, Chinese Word, Chinese Text, Criminal Charge
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. Salton, G., Yang, C.S., Yu, C.T.: A theory of term importance in automatic text analysis. J. of the American Society for Information Science 26(1), 33–44 (1975)
  2. Fagan, J.L.: Automatic phrase indexing for document retrieval. In: Proc. of the 10th SIGIR, pp. 91–101 (1987)
  3. Croft, W.B., Turtle, H.R., Lewis, D.D.: The use of phrases and structured queries in information retrieval. In: Proc. of the 14th SIGIR, pp. 32–45 (1991)
  4. Chien, L.-F.: PAT-tree-based keyword extraction for Chinese information retrieval. In: Proc. of the 20th SIGIR, pp. 50–58 (1997)
  5. Moulinier, I., Molina-Salgado, H., Jackson, P.: Thomson Legal and Regulatory at NTCIR-3: Japanese, Chinese and English retrieval experiments. In: Proc. of the 3rd NTCIR Workshop on Research in Information Retrieval, Automatic Text Summarization and Question Answering (2002)
  6. Chien, L.-F.: Fast and quasi-natural language search for gigabytes of Chinese texts. In: Proc. of the 18th SIGIR, pp. 112–120 (1995)
  7. Kwok, K.L.: Comparing representations in Chinese information retrieval. In: Proc. of the 20th SIGIR, pp. 34–41 (1997)
  8.
  9. Liu, C.-L., Chang, C.-T., Ho, J.-H.: Case instance generation and refinement for case-based criminal summary judgments in Chinese. J. of Information Science and Engineering 20(4), 783–800 (2004)
  10. Liu, C.-L., Liao, T.-M.: Classifying criminal charges in Chinese for Web-based legal services. In: Proc. of the 7th Asia Pacific Web Conf., pp. 64–75 (2005)
  11. Mitchell, T.: Machine Learning. McGraw-Hill, New York (1997)
  12. Manning, C.D., Schütze, H.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge (1999)
  13. Thompson, P.: Automatic categorization of case law. In: Proc. of the 8th ICAIL, pp. 70–77 (2001)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Chao-Lin Liu (1)
  • Chwen-Dar Hsieh (1)
  1. Department of Computer Science, National Chengchi University, Taiwan
