The Calculation of Similarity and Its Application in Data Mining

Teng, Shaohua; Li, Junlei; Li, Rigui; Zhang, Wei

doi:10.1007/978-3-319-09265-2_57

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 8351)

Included in the following conference series:

Joint International Conference on Pervasive Computing and the Networked World

Abstract

Similarity is a measure of the strength of the relationship between two objects, that is, of how close they are to each other. The method used to calculate similarity depends on the type of the objects. Similarity calculation is widely used in classifying data and forms the basis of object classification. In this paper, data objects are divided into three kinds: numeric, non-numeric, and mixed. The similarity calculation methods for each type are discussed. Finally, we illustrate the application of similarity to data classification and data clustering.
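The abstract does not state which formulas the paper uses, so the following is only a minimal sketch of one common choice per object type, not the authors' method. It assumes a distance-based similarity 1 / (1 + Euclidean distance) for numeric objects, simple matching for non-numeric objects, and a Gower-style average for mixed objects. All function names, the `ranges` parameter, and the sample records are hypothetical and introduced for illustration.

```python
import math


def numeric_similarity(x, y):
    """Similarity in (0, 1] derived from Euclidean distance (assumed choice)."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return 1.0 / (1.0 + dist)


def categorical_similarity(x, y):
    """Simple matching: fraction of attributes whose values are equal.

    Assumes x and y are non-empty and have the same length.
    """
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)


def mixed_similarity(x, y, ranges):
    """Gower-style average of per-attribute similarities (assumed choice).

    ranges[i] is the value range of attribute i if it is numeric,
    or None if attribute i is non-numeric.
    """
    total = 0.0
    for a, b, r in zip(x, y, ranges):
        if r is None:
            # Non-numeric attribute: 1 if equal, 0 otherwise.
            total += 1.0 if a == b else 0.0
        else:
            # Numeric attribute: 1 minus the range-normalised difference.
            total += 1.0 - abs(a - b) / r
    return total / len(ranges)


def nearest_neighbor_label(query, records, labels, ranges):
    """Classify query with the label of its most similar record (1-NN)."""
    best = max(
        range(len(records)),
        key=lambda i: mixed_similarity(query, records[i], ranges),
    )
    return labels[best]


# Hypothetical mixed-type records: (age, occupation).
records = [(25, "student"), (52, "engineer"), (30, "student")]
labels = ["low", "high", "low"]
ranges = (40, None)  # assumed age range of 40; occupation is non-numeric

predicted = nearest_neighbor_label((48, "engineer"), records, labels, ranges)
print(predicted)
```

The last function hints at how similarity feeds classification: the query object takes the label of the most similar labelled object. A clustering algorithm could use the same similarity functions to assign each object to its closest cluster representative.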

Author information

Authors and Affiliations

School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, Guangdong, 510006, China
Shaohua Teng, Junlei Li, Rigui Li & Wei Zhang

Editor information

Editors and Affiliations

School of Logistics Engineering, Wuhan University of Technology, 430063, Wuhan, Hubei, China
Qiaohong Zu
Facultad de Ingenieria y Ciencias Universidad Adolfo Ibanez, Vina del Mar, Chile
Maria Vargas-Vera
Fujitsu, Hayes, Middlesex, UK
Bo Hu

Cite this paper

Teng, S., Li, J., Li, R., Zhang, W. (2014). The Calculation of Similarity and Its Application in Data Mining. In: Zu, Q., Vargas-Vera, M., Hu, B. (eds) Pervasive Computing and the Networked World. ICPCA/SWS 2013. Lecture Notes in Computer Science, vol 8351. Springer, Cham. https://doi.org/10.1007/978-3-319-09265-2_57

DOI: https://doi.org/10.1007/978-3-319-09265-2_57
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09264-5
Online ISBN: 978-3-319-09265-2
eBook Packages: Computer Science, Computer Science (R0)
