A modified multi objective heuristic for effective feature selection in text classification

Cluster Computing

Abstract

Text categorization is the process of sorting text documents into one or more predefined categories or classes of similar documents. Differences in categorization results arise from the feature set chosen as the basis for associating a given document with a given category. The task is challenging mainly because the number of discriminating words can be very large, which leaves many current algorithms unable to complete it. Most such tasks involve both relevant and irrelevant features. The objective here is to classify text on the basis of selected features, with pre-processing to reduce the dimensionality of the feature vector and increase classification accuracy. Meta-heuristic algorithms are the most commonly used methods for this selection. The artificial fish swarm algorithm (AFSA) draws on the underlying intelligence of fish swarming behaviour to address optimization problems as well as combinatorial problems. It has been highly successful in diverse applications but suffers from certain limitations, such as a lack of multiplicity. A modified AFSA (MAFSA) is therefore proposed, which adds a crossover operation to improve feature selection for text classification. Support vector machine (SVM), AdaBoost, and naïve Bayes classifiers are used here. MAFSA proved superior to AFSA in terms of precision and the number of selected features.
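The abstract does not say how the crossover in MAFSA is defined, so the following is only a minimal sketch of what a crossover over candidate feature subsets could look like. It assumes that each candidate is a binary mask over the vocabulary, that the operator is a single-point crossover, and that fitness trades summed feature relevance against subset size. The names `single_point_crossover`, `toy_fitness`, `crossover_step`, and the `penalty` parameter are hypothetical and do not come from the article.

```python
import random
from typing import List, Tuple

Mask = List[int]  # 1 = feature kept, 0 = feature dropped


def single_point_crossover(a: Mask, b: Mask, point: int) -> Tuple[Mask, Mask]:
    """Swap the tails of two equal-length masks from index `point` onward."""
    if len(a) != len(b):
        raise ValueError("masks must have the same length")
    if not 0 < point < len(a):
        raise ValueError("point must fall strictly inside the mask")
    return a[:point] + b[point:], b[:point] + a[point:]


def toy_fitness(mask: Mask, relevance: List[float], penalty: float = 0.1) -> float:
    """Hypothetical score: summed relevance of kept features minus a size penalty."""
    kept = sum(mask)
    gain = sum(r for m, r in zip(mask, relevance) if m)
    return gain - penalty * kept


def crossover_step(a: Mask, b: Mask, relevance: List[float],
                   rng: random.Random) -> Mask:
    """Cross two candidates at a random point and keep the fittest of the four masks."""
    point = rng.randrange(1, len(a))
    children = single_point_crossover(a, b, point)
    return max((a, b) + children, key=lambda m: toy_fitness(m, relevance))
```

In a real experiment the toy score would be replaced by the measured precision of a classifier such as SVM, AdaBoost, or naïve Bayes on the selected features. The size penalty here only illustrates the second objective, a smaller feature count.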



Author information

Corresponding author

Correspondence to D. Thiyagarajan.

About this article

Cite this article

Thiyagarajan, D., Shanthi, N. A modified multi objective heuristic for effective feature selection in text classification. Cluster Comput 22 (Suppl 5), 10625–10635 (2019). https://doi.org/10.1007/s10586-017-1150-7

  • DOI: https://doi.org/10.1007/s10586-017-1150-7
