Distance Measures in DNA Microarray Data Analysis

  • R. Gentleman
  • B. Ding
  • S. Dudoit
  • J. Ibrahim
Part of the Statistics for Biology and Health book series (SBH)

Abstract

Both supervised and unsupervised machine learning techniques require selection of a measure of distance between, or similarity among, the objects to be classified or clustered. Different measures of distance or similarity will lead to different machine learning performance. The appropriateness of a distance measure will typically depend on the types of features being used in the learning process.

In this chapter, we examine the properties of distance measures in the context of the analysis of gene expression data from DNA microarray experiments. The feature vectors represent transcript levels, i.e., mRNA abundance or relative abundance, either across biological samples (if comparing genes) or across genes (if comparing samples).
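
The two orientations can be sketched on a small example. The sketch assumes a toy expression matrix stored with genes as rows and samples as columns. The layout, the values, and the helper name `feature_vectors` are illustrative assumptions, not taken from the chapter.

```python
# Hypothetical toy expression matrix: rows are genes, columns are samples.
expression = [
    [2.1, 2.4, 0.3],  # gene A
    [1.9, 2.2, 0.5],  # gene B
    [0.2, 0.1, 3.0],  # gene C
]


def feature_vectors(matrix, compare="genes"):
    """Return the vectors a distance would be computed between.

    compare="genes":   each vector is one gene measured across samples (a row).
    compare="samples": each vector is one sample measured across genes (a column).
    """
    if compare == "genes":
        return [list(row) for row in matrix]
    if compare == "samples":
        # zip(*matrix) transposes the row-major matrix into columns.
        return [list(col) for col in zip(*matrix)]
    raise ValueError("compare must be 'genes' or 'samples'")


gene_vectors = feature_vectors(expression, "genes")
sample_vectors = feature_vectors(expression, "samples")
```

The same distance function can then be applied to either list of vectors. Only the orientation changes what is being compared.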

We consider different aspects of distances that help address the heterogeneity of the data and differences in interpretation depending on the source of the data (cDNA arrays versus short oligonucleotide arrays). Traditional measures, such as Euclidean and Manhattan distances as well as correlation-based distances, are considered. Other dissimilarity functions, which involve comparisons of distributions based on the Kullback-Leibler and mutual information criteria, are also examined.
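
As a rough illustration of the measures named above, the following sketch implements common textbook forms of the Euclidean, Manhattan, and correlation-based distances, plus the Kullback-Leibler divergence between two discrete distributions. The abstract does not give the chapter's exact definitions, so the function names and conventions here are assumptions: correlation distance is taken as one minus the Pearson correlation, and the divergence assumes probability vectors with strictly positive `q`. Mutual information is not covered.

```python
import math


def euclidean(x, y):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def correlation_distance(x, y):
    """One minus the Pearson correlation (assumed convention).

    Raises ZeroDivisionError if either vector is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - cov / math.sqrt(sxx * syy)


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions.

    Assumes p and q each sum to 1 and q has no zero entries.
    Terms with p_i == 0 contribute nothing. Not symmetric in p and q.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

Note that the correlation-based distance ignores differences in location and scale between two profiles, whereas the Euclidean and Manhattan distances do not. The Kullback-Leibler divergence is not symmetric, so it is not a distance in the metric sense unless it is symmetrized.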

Keywords

Mutual Information; Distance Measure; Linear Discriminant Analysis; Mahalanobis Distance; Expression Measure


Copyright information

© Springer Science+Business Media, Inc. 2005

