Abstract
In a cloud-based recommendation system, feature selection is used to reduce the high dimensionality of cloud data. Feature selection improves the performance of the recommendation system without affecting its accuracy. In this paper, two filter-model-based algorithms, SFS and MSFS, are proposed to extract the features the recommendation system needs. The Naive Bayes classification algorithm is used to evaluate the performance of the feature selection algorithms. The benchmark datasets Newsgroups, WebKB and Book Crossing are used for the performance evaluation. The experimental results show that the proposed algorithms are superior to the existing feature selection algorithms T-Score, Information Gain and Chi-squared.
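As an illustration of what a filter-model feature selector does, the following minimal Python sketch scores each term with the chi-squared statistic (one of the baseline methods named above) and keeps the top-scoring terms. It is not the SFS or MSFS algorithm, which the abstract does not define. The function names `chi_squared_scores` and `select_top_k`, the binary term-presence representation, and the toy documents are all assumptions made for the example.

```python
from collections import Counter


def chi_squared_scores(docs, labels):
    """Score each term by its largest chi-squared statistic over all classes.

    Assumption: each document is a list of tokens and only term
    presence/absence is considered (not term frequency).
    """
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)
    class_counts = Counter(labels)
    scores = {}
    for term in vocab:
        best = 0.0
        for cls, n_cls in class_counts.items():
            # 2x2 contingency table: term present/absent vs. in class/not in class
            a = sum(1 for s, y in zip(doc_sets, labels) if term in s and y == cls)
            b = sum(1 for s, y in zip(doc_sets, labels) if term in s and y != cls)
            c = n_cls - a
            d = n - n_cls - b
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
            best = max(best, chi2)
        scores[term] = best
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus: two classes, one uninformative term ("the").
docs = [
    ["cloud", "server", "the"],
    ["cloud", "storage", "the"],
    ["book", "novel", "the"],
    ["book", "author", "the"],
]
labels = ["tech", "tech", "books", "books"]

scores = chi_squared_scores(docs, labels)
print(select_top_k(scores, 2))  # ['book', 'cloud']
```

Because a filter scores features independently of any classifier, the selected subset can then be passed to a classifier such as Naive Bayes to measure how much accuracy is retained at the reduced dimension.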
Cite this article
Muthusankar, D., Kalaavathi, B. & Kaladevi, P. High performance feature selection algorithms using filter method for cloud-based recommendation system. Cluster Comput 22 (Suppl 1), 311–322 (2019). https://doi.org/10.1007/s10586-018-1901-0
DOI: https://doi.org/10.1007/s10586-018-1901-0