
The Calculation of Similarity and Its Application in Data Mining

  • Conference paper
Pervasive Computing and the Networked World (ICPCA/SWS 2013)

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 8351)

Abstract

Similarity is a measure of the strength of the relationship between two objects, that is, of how closely they resemble each other. Different object types call for different similarity calculation methods. Similarity calculation is widely used in data classification and forms the basis of object classification. In this paper, data objects are divided into three kinds: numeric, non-numeric, and mixed, and the similarity calculation methods for each type are discussed. Finally, we illustrate the application of similarity in data classification and data clustering.
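The abstract names three object types but gives no formulas. The Python sketch below illustrates common measures of each kind together with their use in classification and clustering. All function names are hypothetical, and the specific choices (Minkowski distance for numeric objects, the simple matching coefficient for nominal objects, a Gower-style average for mixed objects, 1-nearest-neighbour classification, and the k-means assignment step) are assumptions made for illustration, not the authors' exact definitions.

```python
import math
from typing import Callable, Hashable, List, Optional, Sequence, Tuple, Union

# Illustrative sketch only: these helpers are hypothetical, not the paper's code.
Value = Union[float, str]


def minkowski(x: Sequence[float], y: Sequence[float], p: float = 2.0) -> float:
    """Numeric objects: Minkowski distance (p=1 Manhattan, p=2 Euclidean, p=inf Chebyshev)."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return float(max(diffs))  # Chebyshev: largest single-attribute difference
    return sum(d ** p for d in diffs) ** (1.0 / p)


def matching_similarity(x: Sequence[str], y: Sequence[str]) -> float:
    """Non-numeric (nominal) objects: fraction of attributes with equal values."""
    if len(x) != len(y) or not x:
        raise ValueError("objects must have the same, non-zero number of attributes")
    return sum(a == b for a, b in zip(x, y)) / len(x)


def mixed_similarity(x: Sequence[Value], y: Sequence[Value],
                     ranges: Sequence[Optional[float]]) -> float:
    """Mixed objects: Gower-style average of per-attribute similarities in [0, 1].

    Assumption: ranges[i] is the range (max - min) of numeric attribute i over
    the data set; it is ignored for nominal (string) attributes.
    """
    if not (len(x) == len(y) == len(ranges)) or not x:
        raise ValueError("objects and ranges must have the same, non-zero length")
    total = 0.0
    for a, b, r in zip(x, y, ranges):
        if isinstance(a, str) or isinstance(b, str):
            total += 1.0 if a == b else 0.0      # nominal attribute: exact match
        elif r:
            total += 1.0 - abs(a - b) / r        # numeric attribute: range-normalised
        else:
            total += 1.0 if a == b else 0.0      # zero or unknown range: equality
    return total / len(x)


def nearest_neighbor_label(query: Sequence[Value],
                           examples: Sequence[Tuple[Sequence[Value], Hashable]],
                           similarity: Callable[[Sequence[Value], Sequence[Value]], float]) -> Hashable:
    """Classification: label of the most similar training example (1-NN)."""
    _, label = max(examples, key=lambda ex: similarity(query, ex[0]))
    return label


def assign_to_nearest(points: Sequence[Sequence[float]],
                      centroids: Sequence[Sequence[float]], p: float = 2.0) -> List[int]:
    """Clustering: the k-means assignment step (index of the closest centroid per point)."""
    return [min(range(len(centroids)), key=lambda j: minkowski(pt, centroids[j], p))
            for pt in points]


if __name__ == "__main__":
    print(minkowski([0, 0], [3, 4], p=1))         # 7.0
    print(minkowski([0, 0], [3, 4], p=2))         # 5.0
    print(minkowski([0, 0], [3, 4], p=math.inf))  # 4.0
    print(matching_similarity(["red", "small", "round"], ["red", "large", "round"]))
    print(mixed_similarity((25.0, "M", 170.0), (35.0, "M", 180.0), (50.0, None, 40.0)))
    training = [((1.0, 1.0), "A"), ((5.0, 5.0), "B")]
    print(nearest_neighbor_label((1.5, 2.0), training, lambda u, v: -minkowski(u, v)))  # A
    print(assign_to_nearest([(0, 0), (1, 0), (9, 9)], [(0, 0), (10, 10)]))  # [0, 0, 1]
```

Negating a distance turns it into a similarity, so the same 1-NN helper works with a distance measure or with either of the similarity functions above.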

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Teng, S., Li, J., Li, R., Zhang, W. (2014). The Calculation of Similarity and Its Application in Data Mining. In: Zu, Q., Vargas-Vera, M., Hu, B. (eds) Pervasive Computing and the Networked World. ICPCA/SWS 2013. Lecture Notes in Computer Science, vol 8351. Springer, Cham. https://doi.org/10.1007/978-3-319-09265-2_57

  • DOI: https://doi.org/10.1007/978-3-319-09265-2_57

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09264-5

  • Online ISBN: 978-3-319-09265-2

  • eBook Packages: Computer Science, Computer Science (R0)
