Abstract
This paper proposes an innovative approach to improving the performance of Persian text classification. The proposed method uses a thesaurus as auxiliary knowledge to obtain the real frequencies of words in the corpus. Three types of relationships are considered in the thesaurus. This is the first attempt to use a Persian thesaurus in the field of Persian information retrieval. Experimental results show a significant improvement when the Persian thesaurus is employed rather than common methods.
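As a rough illustration of what obtaining thesaurus-adjusted word frequencies could look like, the sketch below credits each occurrence of a term to the terms it is linked to in a thesaurus. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not name the three relationship types or say how they are weighted, so the relation names (`synonym`, `broader`, `related`), the weights, the toy English entries, and the function `thesaurus_frequencies` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy thesaurus: term -> list of (linked_term, relation).
# Relation names are assumptions; the abstract only says three types are used.
THESAURUS = {
    "automobile": [("car", "synonym")],
    "vehicle": [("car", "broader")],
    "road": [("car", "related")],
}

# Hypothetical weights: how much one occurrence of a linked term contributes.
RELATION_WEIGHTS = {"synonym": 1.0, "broader": 0.5, "related": 0.25}


def thesaurus_frequencies(tokens, thesaurus=THESAURUS, weights=RELATION_WEIGHTS):
    """Return raw term counts plus weighted credit from thesaurus-linked terms."""
    raw = Counter(tokens)
    # Start from the raw counts, stored as floats.
    adjusted = {term: float(count) for term, count in raw.items()}
    # Add each term's occurrences, scaled by relation weight, to its linked terms.
    for term, count in raw.items():
        for target, relation in thesaurus.get(term, []):
            adjusted[target] = adjusted.get(target, 0.0) + weights[relation] * count
    return adjusted


tokens = ["car", "automobile", "automobile", "vehicle", "road"]
freqs = thesaurus_frequencies(tokens)
print(freqs)
```

In this toy input, "car" appears only once, but its adjusted frequency also reflects the occurrences of the terms linked to it. The adjusted counts could then replace raw term counts as features for any standard text classifier.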
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Parvin, H., Minaei-Bidgoli, B., Dahbashi, A. (2011). Improving Persian Text Classification Using Persian Thesaurus. In: San Martin, C., Kim, SW. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2011. Lecture Notes in Computer Science, vol 7042. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25085-9_46
DOI: https://doi.org/10.1007/978-3-642-25085-9_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25084-2
Online ISBN: 978-3-642-25085-9
eBook Packages: Computer Science, Computer Science (R0)