International Conference on Computer Aided Systems Theory

Computer Aided Systems Theory – EUROCAST 2015, pp. 80–87

Dynamic Similarity and Distance Measures Based on Quantiles

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9520)

Abstract

Data Mining emerged in response to technological advances and deals with the treatment of large amounts of data. Its aim is to extract new, valid, comprehensible and useful knowledge by constructing a simple model that describes the data and can also be used in prediction tasks. Extracting knowledge from data is an interdisciplinary challenge that draws upon research in statistics, pattern recognition and machine learning, among other fields.

A common technique for identifying natural groups hidden in data is clustering, a process that automatically discovers structure in data without requiring any supervision. The performance of a clustering model relies heavily on the choice of an appropriate measure: the similarity metric must capture the proximity between two objects, but the separability of the resulting clusters must also be taken into account.

This paper addresses the problem of comparing two or more sets of overlapping data as a basis for comparing different partitions of quantitative data. An approach that uses statistical concepts to measure the distance between partitions is presented. The data’s descriptive knowledge is expressed by means of a boxplot that allows for the construction of clusters taking into account conditional probabilities.
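The abstract does not specify the measure itself, so the following is only an illustrative sketch of the general idea of comparing two groups of quantitative data through the quantiles a boxplot displays. The function names (`five_number_summary`, `quantile_distance`, `iqr_overlap`) and both formulas are assumptions made for this example; they are not the definitions used in the paper, and conditional probabilities are not modelled here.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a boxplot displays."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, q2, q3, max(values))


def quantile_distance(a, b):
    """Hypothetical distance: mean absolute difference of the five boxplot values."""
    sa, sb = five_number_summary(a), five_number_summary(b)
    return sum(abs(x - y) for x, y in zip(sa, sb)) / len(sa)


def iqr_overlap(a, b):
    """Hypothetical similarity: shared part of the two boxes (Q1..Q3)
    divided by the span covered by both boxes together.
    0.0 means the boxes are disjoint, 1.0 means they coincide."""
    _, a1, _, a3, _ = five_number_summary(a)
    _, b1, _, b3, _ = five_number_summary(b)
    shared = max(0.0, min(a3, b3) - max(a1, b1))
    span = max(a3, b3) - min(a1, b1)
    return shared / span if span > 0 else 1.0


if __name__ == "__main__":
    group_a = [1, 2, 3, 4, 5]
    group_b = [11, 12, 13, 14, 15]
    print(quantile_distance(group_a, group_b))
    print(iqr_overlap(group_a, group_b))
```

Two groups whose boxes do not overlap are well separated under this sketch, while overlapping boxes signal that a distance based only on central values may overstate how distinct the groups are.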


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Departament de Ciències Matemàtiques i Informàtica, Universitat de les Illes Balears, Palma de Mallorca, Spain
