Abstract
K-NN is a classification algorithm that is suitable for large amounts of data and achieves high accuracy for internet traffic classification. Unfortunately, K-NN has a disadvantage in computation time, because it calculates the distance from each query to every record in the dataset. This research provides an alternative solution to reduce K-NN computation time: a clustering process is run before the classification process. The clustering process does not require high computation time. The Fuzzy C-Means algorithm is implemented in this research to cluster the input dataset. Fuzzy C-Means has a disadvantage as a clustering method: its results are often not the same even when the input data are the same, and the initial dataset given to Fuzzy C-Means is not optimal. To optimize the initial dataset, this research uses a feature selection algorithm; after the main features of the dataset are selected, the output of Fuzzy C-Means becomes consistent. Feature selection is expected to provide an initial dataset that is optimal for Fuzzy C-Means. The feature selection algorithm used in this study is Principal Component Analysis (PCA). PCA removes nonsignificant attributes to create an optimal dataset and can improve the performance of the clustering and classification algorithms. The result of this research is that clustering and principal feature selection have a significant impact on accuracy and computation time for internet traffic classification. The combination of these three methods has been successfully modeled to produce a classification method for internet bandwidth usage data.
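The three-stage pipeline described in the abstract (PCA, then Fuzzy C-Means, then K-NN) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes NumPy, synthetic two-class data in place of real traffic records, a fixed random seed for the Fuzzy C-Means initialization, and one plausible way of using the clusters to cut K-NN cost: searching for neighbors only inside the cluster nearest to the query. The function names `pca_fit`, `fuzzy_c_means`, and `knn_in_cluster` are hypothetical.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the training mean and the top principal directions."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy C-means; returns (centers, memberships)."""
    rng = np.random.default_rng(seed)  # assumption: fixed seed for repeatable runs
    u = rng.random((len(X), n_clusters))
    u /= u.sum(axis=1, keepdims=True)  # each row of memberships sums to 1
    for _ in range(n_iter):
        w = u ** m
        centers = (w.T @ X) / w.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
    return centers, u

def knn_in_cluster(x, X_train, y_train, train_cluster, centers, k=3):
    """Majority vote among the k nearest training points in x's nearest cluster."""
    c = np.argmin(np.linalg.norm(centers - x, axis=1))
    idx = np.flatnonzero(train_cluster == c)
    if idx.size == 0:  # fall back to the full training set
        idx = np.arange(len(X_train))
    d = np.linalg.norm(X_train[idx] - x, axis=1)
    nearest = idx[np.argsort(d)[:k]]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Synthetic stand-in for traffic records: two classes, five attributes,
# of which only the first two separate the classes.
rng = np.random.default_rng(42)
n = 100
class0 = rng.normal(0.0, 0.5, size=(n, 5))
class1 = rng.normal(0.0, 0.5, size=(n, 5))
class1[:, :2] += 5.0
X = np.vstack([class0, class1])
y = np.array([0] * n + [1] * n)

# Stage 1: reduce to two principal components.
mean, comps = pca_fit(X, 2)
Z = (X - mean) @ comps.T

# Stage 2: cluster the reduced data and harden the memberships.
centers, u = fuzzy_c_means(Z, 2)
train_cluster = u.argmax(axis=1)

# Stage 3: classify a query using only the points in its nearest cluster.
query = np.array([5.0, 5.0, 0.0, 0.0, 0.0])
zq = (query - mean) @ comps.T
pred = knn_in_cluster(zq, Z, y, train_cluster, centers)
print("predicted class:", pred)
```

Restricting the neighbor search to one cluster means each query is compared against only a fraction of the training set, which is where the time saving would come from under this assumption. The trade-off is that a query near a cluster boundary may miss true neighbors that fell into the adjacent cluster.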
Acknowledgments
We would like to thank Indonesian Higher Education and Research for this opportunity and the research grant, and the University of Ciputra for the research facilities.
Copyright information
© 2016 Springer Science+Business Media Singapore
About this paper
Cite this paper
Wiradinata, T., Adi Suryaputra, P. (2016). Clustering and Principal Feature Selection Impact for Internet Traffic Classification Using K-NN. In: Pasila, F., Tanoto, Y., Lim, R., Santoso, M., Pah, N. (eds) Proceedings of Second International Conference on Electrical Systems, Technology and Information 2015 (ICESTI 2015). Lecture Notes in Electrical Engineering, vol 365. Springer, Singapore. https://doi.org/10.1007/978-981-287-988-2_7
DOI: https://doi.org/10.1007/978-981-287-988-2_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-287-986-8
Online ISBN: 978-981-287-988-2
eBook Packages: Engineering, Engineering (R0)