ICPCA/SWS 2013: Pervasive Computing and the Networked World pp 563-574
The Calculation of Similarity and Its Application in Data Mining
Abstract
Similarity is a measure of the strength of the relationship between two objects, that is, of how closely they resemble each other. Different object types call for different similarity calculation methods. Similarity calculation is widely used in data classification, where it forms the basis for assigning objects to classes. In this paper, data objects are divided into three kinds (numerical, non-numerical, and mixed), and the similarity calculation methods for each type are discussed. Finally, we illustrate the application of similarity to data classification and data clustering.
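The abstract does not say which measures the paper adopts for each data type, so the following is only a minimal sketch built on common textbook choices, assumed here for illustration: a Euclidean-distance-based similarity for numerical objects, the simple matching coefficient for non-numerical (nominal) objects, and a Gower-style average for mixed objects. A one-nearest-neighbor rule then shows how a similarity function can drive classification. All function names, the sample records, and the attribute ranges are hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple, Union

Value = Union[float, str]


def euclidean_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Numerical objects: map Euclidean distance d into (0, 1] via 1 / (1 + d)."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return 1.0 / (1.0 + d)


def matching_similarity(x: Sequence[str], y: Sequence[str]) -> float:
    """Non-numerical objects: fraction of attributes whose values are equal."""
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)


def gower_similarity(x: Sequence[Value], y: Sequence[Value],
                     ranges: Sequence[float]) -> float:
    """Mixed objects: Gower-style average of per-attribute similarities.

    ranges[f] is the range (max - min) of numeric attribute f over the data set;
    it is ignored for nominal (string) attributes.
    """
    total = 0.0
    for a, b, r in zip(x, y, ranges):
        if isinstance(a, str) or isinstance(b, str):
            total += 1.0 if a == b else 0.0          # nominal: exact match or not
        else:
            total += 1.0 - abs(a - b) / r if r > 0 else 1.0  # numeric: range-scaled
    return total / len(x)


def nearest_neighbor_label(query, training: Sequence[Tuple[tuple, str]],
                           similarity: Callable[[tuple, tuple], float]) -> str:
    """Classify `query` with the label of its most similar training object (1-NN)."""
    _, best_label = max(training, key=lambda item: similarity(query, item[0]))
    return best_label


# Hypothetical mixed-type records: (age, income, occupation) -> class label.
records = [
    ((25.0, 50000.0, "student"), "low"),
    ((45.0, 120000.0, "engineer"), "high"),
    ((35.0, 80000.0, "teacher"), "medium"),
]
ranges = (20.0, 70000.0, 0.0)  # numeric ranges taken from the records above
query = (40.0, 110000.0, "engineer")
label = nearest_neighbor_label(
    query, records, lambda a, b: gower_similarity(a, b, ranges))
print(label)  # high
```

Scaling each numeric attribute by its range keeps income from dominating age in the mixed case; an unscaled Euclidean distance over the same records would be driven almost entirely by the income column.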
Keywords
similarity, object, data mining, data type