Sentiment Classification of Customer’s Reviews About Automobiles in Roman Urdu

  • Moin Khan
  • Kamran Malik
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 887)


Text mining is a broad field, and sentiment mining is one of its important constituents: it tries to deduce the behavior of people towards a specific item, such as merchandise, politics or sports, from sources such as social media comments and review sites. Among the many issues in sentiment mining, analysis and classification, a major one is that reviews and comments can be written in different languages, such as English, Arabic and Urdu, and handling each language according to its own rules is a difficult task. A great deal of research on sentiment analysis and classification has been done for the English language, but only limited work has been carried out on other regional languages such as Arabic, Urdu and Hindi. In this paper, the Waikato Environment for Knowledge Analysis (WEKA) is used as a platform to run different classification models on Roman Urdu text. A dataset of reviews was scraped from different automobile websites. The extracted Roman Urdu reviews, 1000 positive and 1000 negative, are saved in the WEKA attribute-relation file format (ARFF) as labeled examples. Training is done on 80% of this data and the remaining 20% is used for testing; each model is evaluated this way and its results are analyzed. The results show that Multinomial Naïve Bayes outperformed the Bagging, Deep Neural Network, Decision Tree, Random Forest, AdaBoost, k-NN and SVM classifiers in terms of accuracy, precision, recall and F-measure.
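The evaluation procedure described in the abstract (an 80/20 split, a Multinomial Naïve Bayes classifier, and accuracy, precision, recall and F-measure) can be illustrated with a small pure-Python sketch. This is not the authors' WEKA setup: the function names, the whitespace tokenisation, the Laplace smoothing constant `alpha`, and the ten toy Roman Urdu reviews are all assumptions made for illustration, and the metrics are computed for the positive class only.

```python
import math
import random
from collections import Counter


def split_80_20(docs, labels, seed=0):
    """Shuffle the examples and return (train, test) with an 80/20 split."""
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    cut = len(idx) * 4 // 5  # first 80% of the shuffled indices for training
    train = [(docs[i], labels[i]) for i in idx[:cut]]
    test = [(docs[i], labels[i]) for i in idx[cut:]]
    return train, test


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes with Laplace (add-alpha) smoothing."""
    classes = sorted(set(labels))
    vocab = {tok for doc in docs for tok in doc.split()}
    counts = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        counts[label].update(doc.split())
    n_docs = Counter(labels)
    model = {"classes": classes, "vocab": vocab, "log_prior": {}, "log_lik": {}}
    for c in classes:
        total = sum(counts[c].values())
        model["log_prior"][c] = math.log(n_docs[c] / len(docs))
        model["log_lik"][c] = {
            t: math.log((counts[c][t] + alpha) / (total + alpha * len(vocab)))
            for t in vocab
        }
    return model


def predict(model, doc):
    """Return the class with the highest log posterior; unseen tokens are skipped."""
    scores = {}
    for c in model["classes"]:
        scores[c] = model["log_prior"][c] + sum(
            model["log_lik"][c][t] for t in doc.split() if t in model["vocab"]
        )
    return max(scores, key=scores.get)


def evaluate(y_true, y_pred, positive="pos"):
    """Accuracy plus precision, recall and F-measure for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Invented toy reviews for illustration only; NOT taken from the paper's dataset.
    reviews = [
        ("gari bohat achi hai", "pos"), ("engine zabardast hai", "pos"),
        ("mileage bohat achi hai", "pos"), ("drive aram deh hai", "pos"),
        ("behtareen gari hai", "pos"), ("gari bohat kharab hai", "neg"),
        ("engine bekar hai", "neg"), ("mileage bohat kam hai", "neg"),
        ("suspension kharab hai", "neg"), ("bekar gari hai", "neg"),
    ]
    docs, labels = zip(*reviews)
    train, test = split_80_20(docs, labels)
    model = train_multinomial_nb([d for d, _ in train], [l for _, l in train])
    predictions = [predict(model, d) for d, _ in test]
    print(evaluate([l for _, l in test], predictions))
```

With only ten toy reviews the printed scores carry no meaning; the sketch only shows how the split, the classifier and the four reported measures fit together. WEKA reports per-class and weighted-average figures, which may differ from the single positive-class values computed here.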


Keywords: Sentiment analysis, Classification, Customer reviews, Automobiles, WEKA, Roman Urdu



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Punjab University College of Information Technology, University of the Punjab, Lahore, Pakistan
