Abstract
The KNNModel algorithm is an improved version of the k-nearest neighbor method. However, it suffers from high time complexity and degraded performance when dealing with complex data. This paper proposes an optimal subspace classification method, IKNNModel, which projects the training samples of each class onto its own optimal subspace and constructs the corresponding class clusters and pure clusters as the basis for classification. For datasets with complex structure, that is, where training samples from different categories overlap in the original space or the data have high dimensionality, the proposed method can easily construct the corresponding clusters for the overlapping samples in their own subspaces. Experimental results show that, compared with KNNModel, the proposed method not only significantly improves classification performance on datasets with complex structure but also improves classification efficiency.
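The core idea of subspace, cluster-based classification can be illustrated with a minimal sketch. This is not the authors' IKNNModel algorithm: the use of PCA (via SVD) to pick each class's subspace, the single-cluster-per-class simplification, and the radius-normalized distance score are all assumptions made for illustration only.

```python
import numpy as np

def fit_class_clusters(X, y, n_components=1):
    """For each class, choose a class-specific subspace via PCA and
    build one cluster (centroid + radius) in that subspace."""
    clusters = {}
    for label in np.unique(y):
        Xc = X[y == label]
        mean = Xc.mean(axis=0)
        # Principal directions of this class's samples (its "own" subspace).
        _, _, Vt = np.linalg.svd(Xc - mean, full_matrices=False)
        basis = Vt[:n_components]          # (n_components, n_features)
        proj = (Xc - mean) @ basis.T       # samples in the class subspace
        centroid = proj.mean(axis=0)
        radius = np.linalg.norm(proj - centroid, axis=1).max()
        clusters[label] = (mean, basis, centroid, radius)
    return clusters

def predict(x, clusters):
    """Assign x to the class whose subspace cluster it fits best,
    measured by distance to the centroid relative to the cluster radius."""
    best, best_score = None, np.inf
    for label, (mean, basis, centroid, radius) in clusters.items():
        p = (x - mean) @ basis.T           # project x into this class's subspace
        score = np.linalg.norm(p - centroid) / (radius + 1e-12)
        if score < best_score:
            best, best_score = label, score
    return best
```

Because each class is projected onto its own subspace, two classes that overlap in the original feature space can still form compact, separable clusters in their respective subspaces, which is the intuition behind the method described in the abstract.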
Acknowledgments
This work was supported by the National Natural Science Foundation of China under Grant No. 61070062, the key Scientific Research Project on the Cooperation of Industry and University of Fujian Province of China under Grant No. 2010H6007 and the key Scientific Research Project of the Higher Education Institutions of Fujian Province of China under Grant No. JK2009006.
Cite this article
Li, N., Guo, GD., Chen, LF. et al. Optimal subspace classification method for complex data. Int. J. Mach. Learn. & Cyber. 4, 163–171 (2013). https://doi.org/10.1007/s13042-012-0080-1