Abstract
Curve clustering is a fundamental problem in biomedical applications that involve clustering protein sequences or cell shapes in microscopy images. Existing model-based clustering techniques rely on simple probability models that are not generally valid for analyzing shapes of curves. In this chapter, we describe an efficient Bayesian method for clustering curve data using a carefully chosen metric on the shape space. Rather than modeling the infinite-dimensional curves directly, we model a summary statistic: the inner-product matrix obtained from the data. This matrix is modeled using a Wishart distribution whose parameters carry carefully chosen hyperparameters that induce clustering and allow automatic inference on the number of clusters. The posterior is sampled through an efficient Markov chain Monte Carlo procedure based on the Chinese restaurant process. The method is demonstrated on a variety of synthetic data sets and on real data examples from protein structure analysis.
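As a rough illustration of two ingredients named in the abstract, the sketch below builds an inner-product matrix from discretized curves and draws one partition from a Chinese restaurant process prior. It is not the chapter's method: the plain Euclidean inner product on sampled points stands in for the shape-space metric, the Wishart likelihood and the MCMC sampler are omitted, and the function names, the concentration parameter `alpha`, and the toy data are all assumptions made for this example.

```python
import numpy as np

def inner_product_matrix(curves):
    """Gram matrix of discretized curves (assumption: plain Euclidean
    inner product, not the chapter's shape-space metric)."""
    X = curves.reshape(curves.shape[0], -1)  # flatten each curve to a vector
    return X @ X.T

def sample_crp_partition(n, alpha, rng):
    """Draw cluster labels for n items from a Chinese restaurant process prior."""
    labels = np.zeros(n, dtype=int)
    counts = [1]  # the first item opens the first cluster
    for i in range(1, n):
        # An existing cluster k has weight counts[k]; a new cluster has weight alpha.
        weights = np.array(counts + [alpha], dtype=float)
        k = rng.choice(len(weights), p=weights / weights.sum())
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1    # join an existing cluster
        labels[i] = k
    return labels

rng = np.random.default_rng(0)
curves = rng.normal(size=(6, 50, 2))  # hypothetical toy data: 6 planar curves, 50 points each
S = inner_product_matrix(curves)
labels = sample_crp_partition(len(curves), alpha=1.0, rng=rng)
```

Because the number of occupied clusters is itself random under this prior, a posterior built on it can infer the number of clusters rather than fixing it in advance, which is the role the abstract assigns to the Chinese restaurant process.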
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Zhang, Z., Pati, D., Srivastava, A. (2015). Bayesian Shape Clustering. In: Mitra, R., Müller, P. (eds) Nonparametric Bayesian Inference in Biostatistics. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-19518-6_3
DOI: https://doi.org/10.1007/978-3-319-19518-6_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19517-9
Online ISBN: 978-3-319-19518-6
eBook Packages: Mathematics and Statistics (R0)