Similarity measure design for high dimensional data

Lee, Sang-hyuk; Yan, Sun; Jeong, Yoon-su; Shin, Seung-soo

doi:10.1007/s11771-014-2333-5

Published: 06 September 2014

Volume 21, pages 3534–3540, (2014)

Journal of Central South University

Abstract

Information analysis of high dimensional data was carried out by applying a similarity measure. High dimensional data were treated as an atypical structure. Overlapped and non-overlapped data were introduced, and similarity measure analysis on them was illustrated and compared with a conventional similarity measure. For overlapped data, the similarity could be expressed with the conventional similarity measure. The similarity analysis of non-overlapped data provided the clue for resolving the similarity of high dimensional data. The analysis of high dimensional data was designed to take neighborhood information into account, and conservative and strict solutions were proposed. The proposed similarity measure was applied to express financial fraud among multi-dimensional datasets. In an illustrative example, financial fraud similarity with respect to age, gender, qualification and job was presented. With the proposed similarity measure, high dimensional personal data were evaluated for how similar they are to financial fraud. The calculated results show that the actual fraud has a rather high similarity measure compared to the average, from a minimum of 0.0609 to a maximum of 0.1667.

Author information

Authors and Affiliations

Department of Electrical and Electronic Engineering, Xi’an Jiaotong-Liverpool University, Suzhou, 215123, China
Sang-hyuk Lee
International Business School Suzhou, Xi’an Jiaotong-Liverpool University, Suzhou, 215123, China
Sun Yan
Department of Information Communication Engineering, Mokwon University, 21 Mokwon-gil, Seo-gu, Daejeon, 302-318, Korea
Yoon-su Jeong
Department of Information Security, Tongmyong University, Sinseonno, Nam-gu, Busan, 608-711, Korea
Seung-soo Shin

Corresponding author

Correspondence to Yoon-su Jeong.

Additional information

Foundation item: Project (RDF 11-02-03) supported by the Research Development Fund of XJTLU, China

About this article

Cite this article

Lee, Sh., Yan, S., Jeong, Ys. et al. Similarity measure design for high dimensional data. J. Cent. South Univ. 21, 3534–3540 (2014). https://doi.org/10.1007/s11771-014-2333-5

Received: 18 February 2014
Accepted: 27 May 2014
Published: 06 September 2014
Issue Date: September 2014
DOI: https://doi.org/10.1007/s11771-014-2333-5
