Data Mining and Knowledge Discovery, Volume 14, Issue 1, pp 1-23

An efficient approach to external cluster assessment with an application to martian topography

  • R. Vilalta, Department of Computer Science, University of Houston
  • T. Stepinski, Lunar and Planetary Institute
  • M. Achari, Department of Computer Science, University of Houston



Automated tools for knowledge discovery are frequently invoked in databases where objects already group into some known (i.e., external) classification scheme. In the context of unsupervised learning or clustering, such tools delve inside large databases looking for alternative classification schemes that are meaningful and novel. The information gained with new clusters can be assessed by measuring the degree of separation between each new cluster and its most similar class. Our approach models each cluster and class as a multivariate Gaussian distribution and estimates their degree of separation through an information-theoretic measure (i.e., through relative entropy or Kullback–Leibler distance). The inherently large computational cost of this step is alleviated by first projecting all data over the single dimension that best separates the two distributions (using Fisher’s Linear Discriminant). We test our algorithm on a dataset of Martian surfaces using the traditional division into geological units as external classes and the new, hydrology-inspired, automatically performed division as novel clusters. We find the new partitioning constitutes a formally meaningful classification that deviates substantially from the traditional classification.
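
The two steps named in the abstract, projection onto Fisher's Linear Discriminant followed by a relative-entropy estimate, can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. All function names are hypothetical, the within-class scatter matrix is assumed to be invertible, and the projected data are summarized by their sample mean and variance so that the closed-form Kullback–Leibler expression for two univariate Gaussians applies. The abstract does not say which direction of the (asymmetric) measure is used; the sketch computes it from cluster to class.

```python
import math

def mean_vector(X):
    """Column-wise mean of a list of equal-length rows."""
    return [sum(col) / len(X) for col in zip(*X)]

def scatter_matrix(X, mu):
    """Sum of outer products (x - mu)(x - mu)^T over all rows x."""
    d = len(mu)
    S = [[0.0] * d for _ in range(d)]
    for x in X:
        diff = [xi - mi for xi, mi in zip(x, mu)]
        for i in range(d):
            for j in range(d):
                S[i][j] += diff[i] * diff[j]
    return S

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fisher_direction(X_a, X_b):
    """Direction w proportional to S_w^{-1} (mu_a - mu_b)."""
    mu_a, mu_b = mean_vector(X_a), mean_vector(X_b)
    S_a, S_b = scatter_matrix(X_a, mu_a), scatter_matrix(X_b, mu_b)
    # Within-class scatter; assumed invertible in this sketch.
    S_w = [[sa + sb for sa, sb in zip(ra, rb)] for ra, rb in zip(S_a, S_b)]
    return solve(S_w, [a - b for a, b in zip(mu_a, mu_b)])

def project(X, w):
    """Project every row of X onto the direction w."""
    return [sum(xi * wi for xi, wi in zip(x, w)) for x in X]

def mean_and_variance(z):
    """Sample mean and unbiased sample variance of a 1-D list."""
    m = sum(z) / len(z)
    return m, sum((v - m) ** 2 for v in z) / (len(z) - 1)

def kl_gaussian_1d(mu_p, var_p, mu_q, var_q):
    """KL(p || q) in nats for univariate Gaussians p and q."""
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def separation(cluster, cls):
    """Relative entropy from cluster to class after the 1-D projection."""
    w = fisher_direction(cluster, cls)
    mu_c, var_c = mean_and_variance(project(cluster, w))
    mu_k, var_k = mean_and_variance(project(cls, w))
    return kl_gaussian_1d(mu_c, var_c, mu_k, var_k)
```

Calling `separation(cluster, cls)` with two lists of feature vectors returns a non-negative score that grows as the cluster moves away from the class. Under the abstract's description, a cluster would be compared against every external class and the smallest score taken as the distance to its most similar class. Working in one dimension keeps each comparison to scalar means and variances rather than full covariance matrices.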


Keywords: External cluster validation; Multivariate Gaussian distributions; Martian topography