Similarity Measure Design on Big Data

  • Sanghyuk Lee
  • Yan Sun
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 235)


A clustering algorithm for big data was designed, based on the definition of a similarity measure. A traditional similarity measure for overlapped data was illustrated and then applied to non-overlapped data. A similarity measure for high-dimensional data was obtained by drawing on information from neighboring data. Its usefulness was proved and then verified by calculating the similarity for an artificial data example.
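As a rough illustration of the kind of distance-based similarity measure commonly used for overlapped fuzzy data, the sketch below computes one minus the normalized Hamming distance between two membership vectors. The function name `hamming_similarity` and the choice of Hamming distance are assumptions made for illustration only; the abstract does not give the chapter's actual formula, and the neighbor-information measure for non-overlapped, high-dimensional data is not reproduced here.

```python
def hamming_similarity(mu_a, mu_b):
    """Similarity of two fuzzy sets given as membership vectors in [0, 1].

    Assumed form (not taken from the chapter): s(A, B) = 1 - d(A, B),
    where d is the Hamming distance normalized by the number of points.
    """
    if len(mu_a) != len(mu_b) or len(mu_a) == 0:
        raise ValueError("membership vectors must be non-empty and equal in length")
    # Normalized Hamming distance: mean absolute difference of memberships.
    distance = sum(abs(a - b) for a, b in zip(mu_a, mu_b)) / len(mu_a)
    return 1.0 - distance


# Hypothetical membership values on a shared universe of four points.
a = [0.1, 0.6, 0.9, 0.3]
b = [0.2, 0.5, 0.7, 0.3]
print(hamming_similarity(a, b))
```

A measure of this form returns 1 for identical membership vectors and 0 when the two sets are complements on crisp data, and it is symmetric in its arguments.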


Similarity measure, Big data, Neighbor information, High dimension data



Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  1. Department of Electrical and Electronic Engineering, Xi’an Jiaotong-Liverpool University, Suzhou, China
  2. School of Business Economic and Management, Xi’an Jiaotong-Liverpool University, Suzhou, China
