Abstract
Data mining is the extraction of different data of intriguing as such (constructive, relevant, constructive, previously unexplored and considerably valuable) patterns or information from very large stack of data or different dataset. In other words, it is the experimental exploration of associations, links, and mainly the overall patterns that prevails in large datasets but is hidden or unknown. So, to explore the performance analysis using different clustering techniques we used R Language. This R language is a tool, which allows the user to analyse the data from various and different perspective and angles, in order to get a proper experimental results and in order to derive a meaningful relationships. In this paper, we are studying, analysing and comparing various algorithms and their techniques used for cluster analysis using R language. Our aim in this paper, is to present the comparison of 5 different clustering algorithms and validating those algorithms in terms of internal and external validation such as Silhouette plot, dunn index, Connectivity and much more. Finally as per the basics of the results that obtained we analyzed and compared, validated the efficiency of many different algorithms with respect to one another.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Smith, T.F., Waterman, M.S.: Identification of Common Molecular Subsequences. J. Mol. Biol. 147, 195–197 (1981)
May, P., Ehrlich, H.-C., Steinke, T.: ZIB structure prediction pipeline: composing a complex biological workflow through web services. In: Nagel, W.E., Walter, W.V., Lehner, W. (eds.) Euro-Par 2006. LNCS, vol. 4128, pp. 1148–1158. Springer, Heidelberg (2006). https://doi.org/10.1007/11823285_121
Foster, I., Kesselman, C.: The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann, San Francisco (1999)
Czajkowski, K., Fitzgerald, S., Foster, I., Kesselman, C.: Grid ınformation services for distributed resource sharing. In: 10th IEEE International Symposium on High Performance Distributed Computing, pp. 181–184. IEEE Press, New York (2001)
Foster, I., Kesselman, C., Nick, J., Tuecke, S.: The physiology of the grid: an open grid services architecture for distributed systems ıntegration. Technical report, Global Grid Forum (2002)
National Center for Biotechnology Information. http://www.ncbi.nlm.nih.gov
Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., Wu, A.Y.: An efficient k-means clustering algorithm: analysis and implementation. IEEE Trans. Pattern Anal. Mach. Intell. 24(7), 881–892 (2002)
Liu, Y., Li, Z., Xiong, H., Gao, X., Wu, J.: Understanding of internal clustering validation measures. In: 2010 IEEE 10th International Conference on Data Mining (ICDM), pp. 911–916. IEEE, December 2010
Liu, Y., Li, Z., Xiong, H., Gao, X., Wu, J., Wu, S.: Understanding and enhancement of internal clustering validation measures. IEEE Trans. Cybern. 43(3), 982–994 (2013)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Jayaram Reddy, A., Tripathy, B., Nimje, S., Sree Ganga, G., Varnasree, K. (2018). Performance Analysis of Clustering Algorithm in Data Mining in R Language. In: Zelinka, I., Senkerik, R., Panda, G., Lekshmi Kanthan, P. (eds) Soft Computing Systems. ICSCS 2018. Communications in Computer and Information Science, vol 837. Springer, Singapore. https://doi.org/10.1007/978-981-13-1936-5_39
Download citation
DOI: https://doi.org/10.1007/978-981-13-1936-5_39
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1935-8
Online ISBN: 978-981-13-1936-5
eBook Packages: Computer ScienceComputer Science (R0)