
Integrating Noun-Based Feature Ranking and Selection Methods with Arabic Text Associative Classification Approach

  • Research Article - Computer Engineering and Computer Science
Arabian Journal for Science and Engineering

Abstract

Feature ranking and selection (FR&S) is an important preprocessing phase for text classification: in most cases it extracts a small, informative feature subset from the whole feature space and reduces classification errors. Because the associative classification (AC) approach is efficient and its training and testing depend on how features are ranked and selected, examining feature ranking methods is important. This paper presents a method that integrates Arabic noun extraction with four FR&S methods: term frequency–inverse document frequency (TF-IDF), document frequency, odds ratio, and class discriminating measure (CDM). Association rule mining is applied to the output of the integrated feature selection to construct an Arabic text associative classifier. In this study, AC uses the majority voting and ordered decision list prediction methods to assign each test document to its category. A set of experiments is conducted on a collection of Arabic text documents, and the experimental results show that our AC method works better when noun extraction is combined with a feature selection method than with the feature selection method alone. AC based on the CDM and TF-IDF methods outperforms the other methods in terms of classification accuracy. The results indicate that the proposed method produces satisfactory classification accuracy and is effective at selecting features for the Arabic text associative classifier.
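The abstract names the four ranking measures and the two prediction methods but not their formulas. The following Python sketch uses common textbook forms, which are assumptions and may differ from the definitions in the full paper: P(t|c) and P(t|¬c) are estimated from document frequencies and clipped by a small `eps`; the odds ratio is taken as the maximum over classes; CDM sums |log P(t|c) − log P(t|¬c)| over classes; and TF-IDF is aggregated over the whole collection. Noun extraction and rule mining are not shown: the documents are assumed to be already reduced to nouns, and the rules are assumed to be pre-mined `(antecedent, class, confidence, support)` tuples. The names `rank_features`, `select_top_k`, and `predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict

METHODS = {"df", "tfidf", "or", "cdm"}


def _prob(count, total, eps):
    """Document-level probability estimate, clipped to [eps, 1 - eps] so logs stay finite."""
    return min(max(count / total, eps), 1.0 - eps)


def rank_features(docs, labels, method="cdm", eps=1e-3):
    """Return {term: score} for one FR&S measure; a higher score means a more useful term.

    Assumption: `docs` are token lists already reduced to nouns by a separate
    noun-extraction step, which is not implemented here.
    """
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    n_docs = len(docs)
    class_sizes = Counter(labels)
    df_by_class = defaultdict(Counter)  # class -> term -> number of docs containing the term
    for terms, label in zip(docs, labels):
        df_by_class[label].update(set(terms))
    df_total = Counter()
    for counts in df_by_class.values():
        df_total.update(counts)

    scores = {}
    for term, df in df_total.items():
        if method == "df":
            scores[term] = float(df)
        elif method == "tfidf":
            # Collection-level TF-IDF: total term frequency times log(N / df).
            tf = sum(doc.count(term) for doc in docs)
            scores[term] = tf * math.log(n_docs / df)
        else:
            per_class = []
            for c, size in class_sizes.items():
                in_c = df_by_class[c][term]
                p_c = _prob(in_c, size, eps)  # P(t | c)
                p_not = _prob(df - in_c, max(n_docs - size, 1), eps)  # P(t | not c)
                if method == "or":
                    per_class.append(math.log(p_c * (1 - p_not) / ((1 - p_c) * p_not)))
                else:  # cdm
                    per_class.append(abs(math.log(p_c) - math.log(p_not)))
            # Odds ratio keeps the best class; CDM sums over all classes.
            scores[term] = max(per_class) if method == "or" else sum(per_class)
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def predict(doc_terms, rules, method="vote", default=None):
    """Classify one document with pre-mined rules (antecedent, label, confidence, support)."""
    terms = set(doc_terms)
    matching = [r for r in rules if r[0] <= terms]  # antecedent contained in the document
    if not matching:
        return default
    if method == "vote":
        # Majority voting: each matching rule casts one vote for its class.
        return Counter(r[1] for r in matching).most_common(1)[0][0]
    if method == "list":
        # Ordered decision list: the highest-ranked (confidence, then support) rule fires.
        return max(matching, key=lambda r: (r[2], r[3]))[1]
    raise ValueError(f"unknown method: {method}")
```

On a document matched by rules for more than one class, the two prediction methods can disagree: majority voting follows the class with more matching rules, while the ordered decision list follows the single highest-ranked rule.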



Author information


Correspondence to Abdullah S. Ghareb.

About this article

Cite this article

Ghareb, A.S., Hamdan, A.R. & Bakar, A.A. Integrating Noun-Based Feature Ranking and Selection Methods with Arabic Text Associative Classification Approach. Arab J Sci Eng 39, 7807–7822 (2014). https://doi.org/10.1007/s13369-014-1304-3

  • DOI: https://doi.org/10.1007/s13369-014-1304-3
