Natural language processing (NLP) techniques can prove relevant to a variety of specialties in the field of cognitive science, including sentiment analysis. This paper investigates the impact of NLP tools, various sentiment features, and sentiment lexicon generation approaches on the sentiment polarity classification of internet reviews written in Persian. For this purpose, a comprehensive Persian WordNet (FerdowsNet), with high recall and proper precision (based on Princeton WordNet), was developed. Using FerdowsNet and a generated corpus of reviews, a Persian sentiment lexicon was built by (i) mapping to SentiWordNet and (ii) a semi-supervised learning method, after which the results of both methods were compared. In addition to sentiment words, a set of various features was extracted and applied to the sentiment classification. Then, by employing various well-known feature selection approaches and state-of-the-art machine learning methods, sentiment classification of Persian text reviews was carried out. The obtained results demonstrate the critical role of sentiment lexicon quality in improving the quality of sentiment classification in Persian.
Parts of Speech
Objectivity score = 1 − (positivity score + negativity score)
A more detailed description of FarsNet and the other Persian WordNets is provided in the Persian WordNet section.
WordNet 3.1 database statistics
point-wise mutual information
If the total positive and negative sentiment polarity of a word is more than 0.5, the word is subjective.
Available at https://code.google.com/archive/p/word2vec/
Available at https://github.com/attardi/deepnl/
The opinion corpus and the Persian text-processing tools are available for non-commercial use on the website of the Web Technology Laboratory of Ferdowsi University (http://wtlab.um.ac.ir).
The SVM method was also studied with different non-linear kernel functions (sigmoid, polynomial, RBF); it was less accurate than the linear SVM method.
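The objectivity score and the subjectivity rule stated in the notes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the class name `SentimentScores`, its field names, and the validation checks are hypothetical, and the scores are assumed to follow the SentiWordNet convention in which positivity and negativity each lie in [0, 1] and sum to at most 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentimentScores:
    """Hypothetical container for the polarity scores of a word or synset."""

    positivity: float
    negativity: float

    def __post_init__(self) -> None:
        # Assumption: scores follow the SentiWordNet convention, so each
        # lies in [0, 1] and their sum cannot exceed 1.
        for name in ("positivity", "negativity"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")
        if self.positivity + self.negativity > 1.0:
            raise ValueError("positivity + negativity must not exceed 1")

    @property
    def objectivity(self) -> float:
        # Objectivity score = 1 - (positivity score + negativity score)
        return 1.0 - (self.positivity + self.negativity)

    def is_subjective(self, threshold: float = 0.5) -> bool:
        # A word is subjective if its total positive and negative
        # polarity is strictly more than the threshold (0.5 in the notes).
        return self.positivity + self.negativity > threshold


if __name__ == "__main__":
    scores = SentimentScores(positivity=0.625, negativity=0.125)
    print(scores.objectivity, scores.is_subjective())
```

The comparison is strict, so a word whose total polarity is exactly 0.5 is treated as not subjective, matching the "more than 0.5" wording of the note.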
Conflict of Interest
The authors declare that they have no conflict of interest.
This article does not contain any studies with human participants performed by any of the authors.
In this paper, informed consent was not needed. We do not use any private or personal information in this research study.
Cite this article
Asgarian, E., Kahani, M. & Sharifi, S. The Impact of Sentiment Features on the Sentiment Polarity Classification in Persian Reviews. Cogn Comput 10, 117–135 (2018). https://doi.org/10.1007/s12559-017-9513-1
- Opinion mining
- Persian sentiment word miner
- Feature engineering
- Comprehensive Persian WordNet