Abstract
We present a robust clustering method based on a modified Weiszfeld algorithm for the multivariate median, and associated data depth. The multivariate medians are used to represent the clusters, while the induced relative L1-depths are used to identify outliers and to select the number of clusters. We develop a cluster validation and visualization tool based on the within-cluster data depths, and the cluster data depths with respect to competing clusters. We apply our method to high-dimensional gene expression data, and several simulated data sets. Our method successfully identifies the number of clusters in noisy data sets, and generates accurate cluster assignments.
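To make the two building blocks named in the abstract concrete, the following is a minimal sketch of a modified Weiszfeld iteration for the multivariate L1-median and of an L1 data depth. It assumes unit weights and Euclidean distance. The function names `l1_median` and `l1_depth`, the starting point (the coordinate-wise mean), and the stopping rule are illustrative choices, not taken from the chapter. The relative depths and the cluster-selection procedure described in the abstract are not reproduced here.

```python
import math


def _norm(v):
    """Euclidean norm of a sequence of floats."""
    return math.sqrt(sum(c * c for c in v))


def l1_median(points, tol=1e-9, max_iter=1000):
    """Approximate the L1-median with a modified Weiszfeld iteration.

    The modification handles the case where the current iterate
    coincides with one or more data points (unit weights assumed).
    """
    n, d = len(points), len(points[0])
    y = [sum(p[k] for p in points) / n for k in range(d)]  # start at the mean
    for _ in range(max_iter):
        w_sum, ties = 0.0, 0
        t = [0.0] * d  # inverse-distance weighted sum of the points
        r = [0.0] * d  # sum of unit vectors pointing from y to the points
        for p in points:
            diff = [p[k] - y[k] for k in range(d)]
            dist = _norm(diff)
            if dist == 0.0:
                ties += 1  # data point sits exactly on the iterate
                continue
            w_sum += 1.0 / dist
            for k in range(d):
                t[k] += p[k] / dist
                r[k] += diff[k] / dist
        r_norm = _norm(r)
        if w_sum == 0.0 or r_norm == 0.0:
            return y  # no remaining pull away from y
        gamma = min(1.0, ties / r_norm)  # weight kept on the current iterate
        y_new = [(1.0 - gamma) * t[k] / w_sum + gamma * y[k] for k in range(d)]
        if _norm([y_new[k] - y[k] for k in range(d)]) < tol:
            return y_new
        y = y_new
    return y


def l1_depth(y, points):
    """L1 data depth of y: 1 - max(0, ||mean unit vector|| - tie fraction)."""
    n, d = len(points), len(y)
    e = [0.0] * d
    ties = 0
    for p in points:
        diff = [y[k] - p[k] for k in range(d)]
        dist = _norm(diff)
        if dist == 0.0:
            ties += 1
        else:
            for k in range(d):
                e[k] += diff[k] / dist
    return 1.0 - max(0.0, _norm(e) / n - ties / n)
```

Under these assumptions, a point near the center of a cluster has depth close to 1, while a point far from the bulk of the data has depth close to 0, which is the property the abstract relies on to flag outliers.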
Copyright information
© 2002 Springer Basel AG
About this paper
Cite this paper
Jörnsten, R., Vardi, Y., Zhang, CH. (2002). A Robust Clustering Method and Visualization Tool Based on Data Depth. In: Dodge, Y. (eds) Statistical Data Analysis Based on the L1-Norm and Related Methods. Statistics for Industry and Technology. Birkhäuser, Basel. https://doi.org/10.1007/978-3-0348-8201-9_29
DOI: https://doi.org/10.1007/978-3-0348-8201-9_29
Publisher Name: Birkhäuser, Basel
Print ISBN: 978-3-0348-9472-2
Online ISBN: 978-3-0348-8201-9
eBook Packages: Springer Book Archive