Joint European Conference on Machine Learning and Knowledge Discovery in Databases

ECML PKDD 2015: Machine Learning and Knowledge Discovery in Databases pp 251-266

ConDist: A Context-Driven Categorical Distance Measure

  • Markus Ring
  • Florian Otto
  • Martin Becker
  • Thomas Niebler
  • Dieter Landes
  • Andreas Hotho
Conference paper

DOI: 10.1007/978-3-319-23528-8_16

Part of the Lecture Notes in Computer Science book series (LNCS, volume 9284)
Cite this paper as:
Ring M., Otto F., Becker M., Niebler T., Landes D., Hotho A. (2015) ConDist: A Context-Driven Categorical Distance Measure. In: Appice A., Rodrigues P., Santos Costa V., Soares C., Gama J., Jorge A. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2015. Lecture Notes in Computer Science, vol 9284. Springer, Cham

Abstract

A distance measure between objects is a key requirement for many data mining tasks like clustering, classification or outlier detection. However, for objects characterized by categorical attributes, defining meaningful distance measures is a challenging task since the values within such attributes have no inherent order, especially without additional domain knowledge. In this paper, we propose an unsupervised distance measure for objects with categorical attributes based on the idea that categorical attribute values are similar if they appear with similar value distributions on correlated context attributes. Thus, the distance measure is automatically derived from the given data set. We compare our new distance measure to existing categorical distance measures and evaluate on different data sets from the UCI machine-learning repository. The experiments show that our distance measure is recommendable, since it achieves similar or better results in a more robust way than previous approaches.
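
The core idea in the abstract, that two categorical values are close when they co-occur with similar value distributions on a context attribute, can be illustrated with a small sketch. This is a simplified illustration and not the ConDist measure as defined in the paper: it uses a single context attribute, a plain Euclidean distance between conditional distributions, and omits any weighting by attribute correlation. The function names (`conditional_distributions`, `value_distance`) and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def conditional_distributions(rows, target, context):
    """Estimate P(context value | target value) from a list of dict rows."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[target]][row[context]] += 1
    # Fix an ordering of context values so the vectors are comparable.
    context_values = sorted({row[context] for row in rows})
    return {
        t: [c[v] / sum(c.values()) for v in context_values]
        for t, c in counts.items()
    }


def value_distance(rows, target, context, a, b):
    """Euclidean distance between the context distributions of values a and b.

    Assumption: a simplified stand-in for a context-driven distance,
    without the correlation-based weighting described in the abstract.
    """
    dists = conditional_distributions(rows, target, context)
    return sqrt(sum((p - q) ** 2 for p, q in zip(dists[a], dists[b])))


# Toy data set (hypothetical): "color" is the target, "shape" the context.
rows = [
    {"color": "red", "shape": "round"},
    {"color": "red", "shape": "round"},
    {"color": "orange", "shape": "round"},
    {"color": "orange", "shape": "round"},
    {"color": "blue", "shape": "square"},
    {"color": "blue", "shape": "square"},
]

d_red_orange = value_distance(rows, "color", "shape", "red", "orange")
d_red_blue = value_distance(rows, "color", "shape", "red", "blue")
print(d_red_orange, d_red_blue)
```

In this toy data, "red" and "orange" share the same shape distribution and so get distance 0, while "red" and "blue" never share a shape and get the maximum distance for two probability vectors under this metric. A fuller version would presumably combine several context attributes and weight each by how strongly it correlates with the target attribute.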

Keywords

  • Categorical data
  • Distance measure
  • Heterogeneous data
  • Unsupervised learning

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Markus Ring (1)
  • Florian Otto (1)
  • Martin Becker (2)
  • Thomas Niebler (2)
  • Dieter Landes (1)
  • Andreas Hotho (2)
  1. Faculty of Electrical Engineering and Informatics, Coburg University of Applied Sciences and Arts, Coburg, Germany
  2. Data Mining and Information Retrieval Group, University of Würzburg, Würzburg, Germany
