Exploring Phrase-Based Classification of Judicial Documents for Criminal Charges in Chinese

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI, volume 4203)


Phrases provide a better foundation for indexing and retrieving documents than individual words do: each word in a phrase makes the other words in that phrase less ambiguous than they would be in isolation. Intuitively, then, classifiers that index documents by phrases should perform better than those that index by words. Although pioneers explored phrase-based indexing of English documents decades ago, there have been relatively few similar attempts for Chinese documents, partly because correctly segmenting Chinese text into words is already difficult. We build a domain-dependent word list with the help of Chien's PAT-tree-based method and HowNet, and use the resulting word list to define the phrases relevant for classifying Chinese judicial documents. Experimental results indicate that indexing with phrases indeed allows us to distinguish judicial documents that are closely similar to each other. Using a relatively more efficient algorithm, our classifier performs better than the classifiers reported in related work.
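The abstract does not specify the segmenter, the exact phrase definition, or the classification algorithm, so the following Python is only a minimal sketch of phrase-based indexing under our own assumptions, not the authors' method. A tiny hypothetical `LEXICON` stands in for the word list built with the PAT tree and HowNet; forward maximum matching (a common segmentation baseline, assumed here) splits the text; ordered pairs of nearby lexicon words serve as phrase features; and a 1-nearest-neighbour rule with cosine similarity picks the charge. All function names, example sentences, and labels are hypothetical.

```python
import math
from collections import Counter

# Hypothetical stand-in for the domain word list (not taken from the paper).
LEXICON = {"被告", "竊取", "搶奪", "他人", "財物", "機車"}


def fmm_segment(text, lexicon, max_len=4):
    """Forward maximum matching: at each position take the longest
    lexicon word starting there, otherwise a single character."""
    words, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + n]
            if n == 1 or cand in lexicon:
                words.append(cand)
                i += n
                break
    return words


def phrase_features(words, lexicon, window=3):
    """Count ordered pairs (w1, w2) of lexicon words in which w2 is
    one of the next `window` lexicon words after w1 (assumed phrase definition)."""
    terms = [w for w in words if w in lexicon]
    feats = Counter()
    for i, w1 in enumerate(terms):
        for w2 in terms[i + 1:i + 1 + window]:
            feats[(w1, w2)] += 1
    return feats


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(text, training, lexicon):
    """1-nearest-neighbour: return the charge label of the training
    document whose phrase vector is most similar to that of `text`."""
    query = phrase_features(fmm_segment(text, lexicon), lexicon)
    _, label = max(
        training,
        key=lambda doc: cosine(query, phrase_features(fmm_segment(doc[0], lexicon), lexicon)),
    )
    return label


# Hypothetical labelled examples: 竊盜 (theft) vs. 搶奪 (snatching).
TRAINING = [("被告竊取他人財物", "竊盜"), ("被告搶奪他人財物", "搶奪")]

if __name__ == "__main__":
    print(classify("被告徒手竊取路人財物", TRAINING, LEXICON))  # prints 竊盜
```

The two hypothetical training documents differ only in the verb, much as closely similar judicial documents can; the query shares all three of its word pairs with the theft example but only one with the snatching example. Any other classifier over the same pair features would be an equally reasonable choice for this sketch.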


  • Word Pair
  • Chinese Character
  • Chinese Word
  • Chinese Text
  • Criminal Charge

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Salton, G., Yang, C.S., Yu, C.T.: A theory of term importance in automatic text analysis. J. of the American Society for Information Science 26(1), 33–44 (1975)

  2. Fagan, J.L.: Automatic phrase indexing for document retrieval. In: Proc. of the 10th SIGIR, pp. 91–101 (1987)

  3. Croft, W.B., Turtle, H.R., Lewis, D.D.: The use of phrases and structured queries in information retrieval. In: Proc. of the 14th SIGIR, pp. 32–45 (1991)

  4. Chien, L.-F.: PAT-tree-based keyword extraction for Chinese information retrieval. In: Proc. of the 20th SIGIR, pp. 50–58 (1997)

  5. Moulinier, I., Molina-Salgado, H., Jackson, P.: Thomson Legal and Regulatory at NTCIR-3: Japanese, Chinese and English retrieval experiments. In: Proc. of the 3rd NTCIR Workshop on Research in Information Retrieval, Automatic Text Summarization and Question Answering (2002)

  6. Chien, L.-F.: Fast and quasi-natural language search for gigabytes of Chinese texts. In: Proc. of the 18th SIGIR, pp. 112–120 (1995)

  7. Kwok, K.L.: Comparing representations in Chinese information retrieval. In: Proc. of the 20th SIGIR, pp. 34–41 (1997)

  8. HowNet,

  9. Liu, C.-L., Chang, C.-T., Ho, J.-H.: Case instance generation and refinement for case-based criminal summary judgments in Chinese. J. of Information Science and Engineering 20(4), 783–800 (2004)

  10. Liu, C.-L., Liao, T.-M.: Classifying criminal charges in Chinese for Web-based legal services. In: Proc. of the 7th Asia Pacific Web Conf., pp. 64–75 (2005)

  11. Mitchell, T.: Machine Learning. McGraw-Hill, New York (1997)

  12. Manning, C.D., Schütze, H.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge (1999)

  13. Thompson, P.: Automatic categorization of case law. In: Proc. of the 8th ICAIL, pp. 70–77 (2001)

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liu, CL., Hsieh, CD. (2006). Exploring Phrase-Based Classification of Judicial Documents for Criminal Charges in Chinese. In: Esposito, F., Raś, Z.W., Malerba, D., Semeraro, G. (eds) Foundations of Intelligent Systems. ISMIS 2006. Lecture Notes in Computer Science, vol 4203. Springer, Berlin, Heidelberg.

  • DOI:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-45764-0

  • Online ISBN: 978-3-540-45766-4

  • eBook Packages: Computer Science, Computer Science (R0)