Similarity measure design for high dimensional data

Journal of Central South University

Abstract

High dimensional data were analyzed by applying a similarity measure. The high dimensional data were treated as an atypical structure. Overlapped and non-overlapped data were introduced, and the similarity measure analysis was illustrated and compared with a conventional similarity measure. For overlapped data, similarity could be expressed with the conventional similarity measure. The similarity analysis of non-overlapped data provided the clue to solving the similarity of high dimensional data. The high dimensional data analysis was designed with consideration of neighborhood information, and conservative and strict solutions were proposed. The proposed similarity measure was applied to express financial fraud among multi-dimensional datasets. In an illustrative example, financial fraud similarity with respect to age, gender, qualification and job was presented. With the proposed similarity measure, high dimensional personal data were evaluated for how similar they are to financial fraud. The calculation results show that actual fraud has a rather high similarity measure compared with the average, from a minimum of 0.0609 to a maximum of 0.1667.
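The abstract does not state the formula of either the conventional or the proposed measure, so the following is only a minimal sketch of why non-overlapped data need separate treatment. It assumes, purely for illustration, an overlap-based similarity (sum of element-wise minima divided by sum of element-wise maxima) as a stand-in for a conventional measure. The function name `overlap_similarity` and the sample membership vectors are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def overlap_similarity(mu_a: Sequence[float], mu_b: Sequence[float]) -> float:
    """Illustrative overlap-based similarity (assumed, not the paper's measure).

    Returns sum(min(a, b)) / sum(max(a, b)) over two membership vectors
    defined on the same universe of discourse.
    """
    if len(mu_a) != len(mu_b) or len(mu_a) == 0:
        raise ValueError("membership vectors must be non-empty and equal in length")
    intersection = sum(min(a, b) for a, b in zip(mu_a, mu_b))
    union = sum(max(a, b) for a, b in zip(mu_a, mu_b))
    # Two all-zero vectors are treated as identical.
    return 1.0 if union == 0 else intersection / union


# Overlapped data: the two membership vectors share part of their support.
a = [0.0, 0.5, 1.0, 0.5, 0.0, 0.0]
b = [0.0, 0.0, 0.5, 1.0, 0.5, 0.0]

# Non-overlapped data: c and d are adjacent, c and e are far apart,
# but neither pair shares any support.
c = [1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
d = [0.0, 0.0, 1.0, 0.5, 0.0, 0.0]
e = [0.0, 0.0, 0.0, 0.0, 0.5, 1.0]

print(overlap_similarity(a, b))  # a value strictly between 0 and 1
print(overlap_similarity(c, d))  # 0.0
print(overlap_similarity(c, e))  # 0.0
```

Under this assumed measure, the near pair and the far pair of non-overlapped data both score zero, which suggests why a measure for such data would have to draw on neighborhood information rather than on shared support alone.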

Author information

Corresponding author

Correspondence to Yoon-su Jeong.

Additional information

Foundation item: Project(RDF 11-02-03) supported by the Research Development Fund of XJTLU, China

About this article

Cite this article

Lee, Sh., Yan, S., Jeong, Ys. et al. Similarity measure design for high dimensional data. J. Cent. South Univ. 21, 3534–3540 (2014). https://doi.org/10.1007/s11771-014-2333-5

  • DOI: https://doi.org/10.1007/s11771-014-2333-5
