Abstract
Data are now generated continuously across many domains, and machine learning algorithms continue to emerge to improve the learning process. Clustering is one such technique. It is widely regarded as one of the most important tools of unsupervised learning, but it is sensitive to outliers, so outliers should be removed before the data are clustered. Most outlier detection techniques require user-defined parameters, which makes their accuracy user-dependent. We therefore propose an algorithm that depends as little as possible on user-defined values. The algorithm takes as input the number of clusters into which the user wants to partition the data and detects outliers within those clusters using the Silhouette Coefficient. We compared it with several existing outlier detection algorithms on relevant benchmark datasets from the UCI repository. In these experiments the proposed algorithm performed better than the existing algorithms.
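The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes plain k-means as the clustering step (the abstract names no clustering method), uses the standard silhouette definition s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from point i to the other members of its cluster and b(i) is the smallest mean distance to any other cluster, and ranks points from lowest to highest silhouette. The function names and the `threshold` cut-off are hypothetical choices made for illustration.

```python
import math

def kmeans_labels(points, k, max_iter=100):
    """Plain Lloyd's k-means with a deterministic farthest-point start (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from every centre chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties out
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def silhouette_values(points, labels):
    """Per-point silhouette s = (b - a) / max(a, b)."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        others = [idx for other, idx in clusters.items() if other != lab]
        if not own or not others:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(sum(math.dist(points[i], points[j]) for j in idx) / len(idx)
                for idx in others)
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return scores

def flag_outliers(points, k, threshold=0.0):
    """Rank points by ascending silhouette and flag those below `threshold`.

    `threshold` is a hypothetical cut-off, not a value taken from the paper.
    """
    labels = kmeans_labels(points, k)
    scores = silhouette_values(points, labels)
    ranked = sorted(range(len(points)), key=lambda i: scores[i])
    flagged = [i for i in ranked if scores[i] < threshold]
    return ranked, scores, flagged

# Two tight groups plus one point (index 8) that sits away from both.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (4, 4)]
ranked, scores, flagged = flag_outliers(points, k=2, threshold=0.5)
print("lowest-silhouette point:", points[ranked[0]])
print("flagged indices:", flagged)
```

In this sketch the only required input besides the data is the number of clusters `k`. The default cut-off of 0 flags only points that are, on average, closer to a neighbouring cluster than to their own; any other cut-off is a tuning choice that the abstract neither specifies nor endorses.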
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Lodhi, P., Mishra, O., Rajpoot, D.S. (2019). Sorted Outlier Detection Approach Based on Silhouette Coefficient. In: Rawat, B., Trivedi, A., Manhas, S., Karwal, V. (eds) Advances in Signal Processing and Communication. Lecture Notes in Electrical Engineering, vol 526. Springer, Singapore. https://doi.org/10.1007/978-981-13-2553-3_19
DOI: https://doi.org/10.1007/978-981-13-2553-3_19
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2552-6
Online ISBN: 978-981-13-2553-3
eBook Packages: Engineering, Engineering (R0)