Proximities in Statistics: Similarity and Distance

Lenz, Hans-J.

doi:10.1007/978-3-211-85432-7_6

Part of the book series: CISM International Centre for Mechanical Sciences (CISM, volume 504)

Abstract

We review similarity and distance measures used in statistics for clustering and classification. Our motivation is that most of these measures fail to make adequate use of a non-uniform distribution defined on the data or sample space.

Such measures are mappings O × O → R₊, where O is either a finite set of objects or a vector space such as R^p, and R₊ is the set of non-negative real numbers. In most cases these mappings fulfil conditions such as symmetry and reflexivity. Further characteristics, such as transitivity or, for distance measures, the triangle inequality, are also of concern.
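As an illustration of the conditions named above, the following minimal sketch checks non-negativity, reflexivity (taken here to mean d(x, x) = 0), symmetry and the triangle inequality on a small finite set of vectors. The function names, the sample points and the choice of the Euclidean and squared Euclidean distances are assumptions made for this example; they are not taken from the chapter.

```python
import math
from itertools import product


def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def squared_euclidean(x, y):
    """Squared Euclidean distance (a dissimilarity, not a metric)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def check_distance_axioms(objects, d, tol=1e-12):
    """Check basic properties of a mapping d: O x O -> R+ on a finite set O.

    Hypothetical helper for illustration; 'reflexive' means d(x, x) == 0.
    """
    pairs = list(product(objects, repeat=2))
    triples = product(objects, repeat=3)
    return {
        "non_negative": all(d(x, y) >= 0 for x, y in pairs),
        "reflexive": all(abs(d(x, x)) <= tol for x in objects),
        "symmetric": all(abs(d(x, y) - d(y, x)) <= tol for x, y in pairs),
        "triangle": all(d(x, z) <= d(x, y) + d(y, z) + tol
                        for x, y, z in triples),
    }


# Arbitrary sample points in R^2 (assumed data for the example).
points = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
print(check_distance_axioms(points, euclidean))

# Three collinear points: the squared distance violates the triangle inequality.
line = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(check_distance_axioms(line, squared_euclidean))
```

On the collinear points the squared Euclidean distance remains symmetric and reflexive but fails the triangle inequality, since 4 > 1 + 1. This shows why the triangle inequality is treated as a separate requirement for distance measures.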

We start with Hartigan’s list of proximity measures, which he compiled in 1967. It is good practice to pay special attention to the scale types of the variables involved, i.e. nominal (often binary), ordinal and metric (interval and ratio) scales. We are interested in the algebraic structure of proximities as suggested by (1967) and (1971), information-theoretic measures as discussed by (1971), and the probabilistic W-distance measure as proposed by (1970). The last measure combines distances between objects or vectors with their corresponding probabilities to improve overall discriminating power. The idea is that rare events, i.e. sets of values observed with very low probability, related to a pair of objects may be a strong indication that the pair is similar.

References

Borgelt, Ch., Prototype-based Classification and Clustering, Habilitationsschrift, Otto-von-Guericke-Universität Magdeburg, Magdeburg, 2005
Cormack, R.M., A review of classification (with Discussion), J. R. Stat. Soc., A, 31, 321–367
Cox, T.F. and Cox, M.A.A., Multidimensional Scaling, 2nd ed., Chapman & Hall, Boca Raton etc., 2001
Frakes, W.B. and Baeza-Yates, R., Information Retrieval: Data Structures and Algorithms, Prentice Hall, Upper Saddle River, 1992
Godan, M., Über die Komplexität der Bestimmung der Ähnlichkeit von geometrischen Objekten in höheren Dimensionen, Dissertation, Freie Universität Berlin, 1991
Gower, J., A general coefficient of similarity and some of its properties, Biometrics, 27, 857–874
Hartigan, J.A., Representation of similarity matrices by trees, J. Am. Stat. Assoc., 62, 1140–1158, 1967
Hubálek, Z., Coefficients of association and similarity based on binary (presence-absence) data; an evaluation, Biol. Rev., 57, 669–689
Kruse, R. and Meyer, K.D., Statistics with Vague Data, D. Reidel Publishing Company, Dordrecht, 1987
Kullback, S., Information Theory and Statistics, Wiley, New York etc., 1959
Jardine, N. and Sibson, R., Mathematical Taxonomy, Wiley, London, 1971
Mahalanobis, P.C., On the Generalized Distance in Statistics. In: Proceedings Natl. Inst. Sci. India, 2, 49–55, 1936
Murtagh, F., Identifying and Exploiting Ultrametricity. In: Advances in Data Analysis, Decker, R. and Lenz, H.-J. (eds.), Springer, Heidelberg, 2007
Skarabis, H., Mathematische Grundlagen und praktische Aspekte der Diskrimination und Klassifikation, Physica-Verlag, Würzburg, 1970
Sneath, P.H.A. and Sokal, R.R., Numerical Taxonomy, Freeman and Co., San Francisco, 1973

Author information

Authors and Affiliations

Institute of Statistics and Econometrics, Freie Universität Berlin, Germany
Hans-J. Lenz

Editor information

Editors and Affiliations

University of Udine, Udine, Italy
Giacomo Della Riccia
University of Toulouse, France
Didier Dubois
University of Magdeburg, Magdeburg, Germany
Rudolf Kruse
Freie Universität Berlin, Berlin, Germany
Hans-Joachim Lenz

About this chapter

Cite this chapter

Lenz, HJ. (2008). Proximities in Statistics: Similarity and Distance. In: Della Riccia, G., Dubois, D., Kruse, R., Lenz, HJ. (eds) Preferences and Similarities. CISM International Centre for Mechanical Sciences, vol 504. Springer, Vienna. https://doi.org/10.1007/978-3-211-85432-7_6

DOI: https://doi.org/10.1007/978-3-211-85432-7_6
Publisher Name: Springer, Vienna
Print ISBN: 978-3-211-85431-0
Online ISBN: 978-3-211-85432-7
eBook Packages: Engineering, Engineering (R0)