High performance feature selection algorithms using filter method for cloud-based recommendation system

Cluster Computing

Abstract

In a cloud-based recommendation system, feature selection is used to reduce the high dimensionality of cloud data. Feature selection improves the performance of the recommendation system without affecting its accuracy. This paper proposes two filter-model-based algorithms, SFS and MSFS, to extract the features that the recommendation system needs. The state-of-the-art Naive Bayes classification algorithm is used to evaluate the performance of the feature selection algorithms. The benchmark datasets Newsgroups, WebKB and Book Crossing are used for the performance evaluation. The experimental results show that the proposed algorithms are superior to the existing feature selection algorithms T-Score, Information Gain and Chi-squared.
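
The abstract does not say how SFS and MSFS score features, so the following is only a minimal sketch of the general workflow it describes: rank features with a filter score, keep the top k, then evaluate with Naive Bayes. Chi-squared, one of the baselines named above, is used as a stand-in scoring function. The toy documents, labels, and all function names are assumptions made for illustration and are not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def chi2_scores(docs, labels):
    """Chi-squared score of each term against the class labels (max over classes)."""
    n = len(docs)
    class_count = Counter(labels)
    term_count = Counter()             # number of documents containing the term
    term_class = defaultdict(Counter)  # term -> class -> documents containing the term
    for doc, y in zip(docs, labels):
        for t in set(doc):
            term_count[t] += 1
            term_class[t][y] += 1
    scores = {}
    for t, n_t in term_count.items():
        best = 0.0
        for c, n_c in class_count.items():
            a = term_class[t][c]       # term present, class c
            b = n_t - a                # term present, other classes
            c_ = n_c - a               # term absent, class c
            d = n - n_t - c_           # term absent, other classes
            denom = (a + b) * (c_ + d) * (a + c_) * (b + d)
            if denom:
                best = max(best, n * (a * d - b * c_) ** 2 / denom)
        scores[t] = best
    return scores


def select_top_k(scores, k):
    """Filter step: keep the k best-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:k])


def train_nb(docs, labels, vocab):
    """Multinomial Naive Bayes with Laplace smoothing, restricted to vocab."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, y in zip(docs, labels):
        for t in doc:
            if t in vocab:
                word_counts[y][t] += 1
    model = {}
    for c in class_docs:
        total = sum(word_counts[c].values())
        log_prior = math.log(class_docs[c] / len(docs))
        log_like = {t: math.log((word_counts[c][t] + 1) / (total + len(vocab)))
                    for t in vocab}
        model[c] = (log_prior, log_like)
    return model


def predict_nb(model, doc):
    """Return the class with the highest log-posterior; unselected terms are ignored."""
    def score(c):
        log_prior, log_like = model[c]
        return log_prior + sum(log_like[t] for t in doc if t in log_like)
    return max(sorted(model), key=score)


# Hypothetical toy corpus: two classes, with "the" occurring in every document.
docs = [
    ["cheap", "book", "offer", "the"],
    ["book", "offer", "sale", "the"],
    ["match", "team", "goal", "the"],
    ["team", "goal", "score", "the"],
]
labels = ["shop", "shop", "sport", "sport"]

scores = chi2_scores(docs, labels)
selected = select_top_k(scores, k=4)
model = train_nb(docs, labels, selected)
print(sorted(selected))
print(predict_nb(model, ["cheap", "book", "the"]))
```

Because the filter score is computed once per feature and independently of the classifier, swapping Chi-squared for another filter criterion only requires replacing `chi2_scores`; the selection and evaluation steps stay the same.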

Corresponding author

Correspondence to D. Muthusankar.

Cite this article

Muthusankar, D., Kalaavathi, B. & Kaladevi, P. High performance feature selection algorithms using filter method for cloud-based recommendation system. Cluster Comput 22 (Suppl 1), 311–322 (2019). https://doi.org/10.1007/s10586-018-1901-0

  • DOI: https://doi.org/10.1007/s10586-018-1901-0
