Improving Persian Text Classification and Clustering Using Persian Thesaurus

  • Hamid Parvin
  • Atousa Dahbashi
  • Sajad Parvin
  • Behrouz Minaei-Bidgoli
Conference paper
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 151)


This paper proposes an approach to improving the classification performance of Persian texts. The proposed method uses a thesaurus as auxiliary knowledge to obtain more representative word frequencies in the corpus. The thesaurus used here covers two types of word relationships. This is the first attempt to use a Persian thesaurus in the field of Persian information retrieval. Experimental results indicate that text classification performance improves significantly when the Persian thesaurus is employed, compared with when it is ignored.
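As a rough illustration of how a thesaurus can yield more representative word frequencies, the minimal sketch below folds related terms into one preferred term before counting. It is not the authors' method: the abstract does not specify the two relationship types or how they are weighted, so the `THESAURUS` mapping, the `term_frequencies` helper, and the English stand-in words are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical toy thesaurus: each term maps to a preferred (canonical) term.
# The entries are illustrative English stand-ins, not taken from the paper.
THESAURUS = {
    "automobile": "car",
    "vehicle": "car",
    "physician": "doctor",
}


def term_frequencies(tokens, thesaurus=None):
    """Count tokens, optionally folding related terms into one preferred term."""
    thesaurus = thesaurus or {}
    # Terms missing from the thesaurus are counted under their own name.
    return Counter(thesaurus.get(tok, tok) for tok in tokens)


doc = "car automobile vehicle doctor physician road".split()

raw = term_frequencies(doc)                # every surface form counted separately
merged = term_frequencies(doc, THESAURUS)  # related forms pooled together

print(raw)
print(merged)
```

Pooling counts in this way concentrates evidence that would otherwise be spread across several surface forms, which is one plausible reading of "more representative word frequencies"; a real system would also need Persian tokenization and a policy for each relationship type.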


Keywords: Persian Text, Persian Thesaurus, Semantic-Based Text Classification





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Hamid Parvin (1)
  • Atousa Dahbashi (1)
  • Sajad Parvin (1)
  • Behrouz Minaei-Bidgoli (1)

  1. Nourabad Mamasani Branch, Islamic Azad University, Nourabad Mamasani, Iran
