Improving Persian Text Classification Using Persian Thesaurus

  • Hamid Parvin
  • Behrouz Minaei-Bidgoli
  • Atousa Dahbashi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7042)


This paper proposes an innovative approach to improving the performance of Persian text classification. The proposed method uses a thesaurus as an auxiliary knowledge source to obtain the real frequencies of words in the corpus. Three types of relationships are considered in the thesaurus. This is the first attempt to use a Persian thesaurus in the field of Persian information retrieval. Experimental results show a significant improvement when the Persian thesaurus is employed rather than common methods.
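
The idea of recovering "real" word frequencies with a thesaurus can be illustrated with a minimal Python sketch. It assumes a single relationship type (synonymy) and a hypothetical toy thesaurus in which English words stand in for Persian terms; the names `TOY_THESAURUS` and `concept_frequencies` are illustrative and are not taken from the paper, whose actual thesaurus, relationship types, and weighting are not detailed in this abstract.

```python
from collections import Counter

# Hypothetical toy thesaurus (an assumption for illustration only):
# each surface word maps to one canonical concept label.
TOY_THESAURUS = {
    "car": "car",
    "automobile": "car",
    "doctor": "doctor",
    "physician": "doctor",
}


def concept_frequencies(tokens, thesaurus):
    """Count tokens after folding each one into its thesaurus concept.

    Tokens that are missing from the thesaurus are counted under their
    own surface form.
    """
    return Counter(thesaurus.get(token, token) for token in tokens)


tokens = "the car and the automobile need a doctor or a physician".split()

raw = Counter(tokens)                                 # plain term counts
merged = concept_frequencies(tokens, TOY_THESAURUS)   # synonym-merged counts
```

In this toy input, "car" and "automobile" are counted separately in `raw` but share one entry in `merged`, so a classifier built on `merged` would see the concept's combined frequency rather than two weaker signals.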


Persian Text, Persian Thesaurus, Semantic-Based Text Classification



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Hamid Parvin (1)
  • Behrouz Minaei-Bidgoli (1)
  • Atousa Dahbashi (1)
  1. School of Computer Engineering, Iran University of Science and Technology (IUST), Tehran, Iran
