Abstract
Several probability distributions for circular data have been discussed by Rao (Linear statistical inference and its applications, 2nd edn. Wiley, New York, 1973) in his classic book. The aim of this paper is to introduce model-based clustering methods for bivariate circular, or toroidal, data. A mixture model approach based on the joint distribution of the two dependent circular variables is proposed. In particular, two types of such mixture models are constructed, one based on the marginal specification and the other on the conditional specification. The convergence property of the Expectation-Maximization (EM) algorithm is studied for the members of the regular exponential family used in our models. Cluster properties, such as the optimum number of clusters and homogeneity, are also discussed. A real-life application to gene data illustrates the use of the proposed approaches, and the two models are compared on this example. Clustering methods for observations on the torus do not seem to be available in the literature, and this paper is possibly the first attempt in that direction.
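As a rough illustration of the mixture-plus-EM idea summarized above, the following Python sketch fits a finite mixture to pairs of angles. It is a simplified stand-in and not the models of this paper: each component here is a product of two independent von Mises densities, so it ignores the within-cluster dependence between the two circular variables that the marginal and conditional specifications are designed to capture. The function names (`vm_logpdf`, `em_torus`), the farthest-point initialization, and the closed-form approximation used for the concentration update are all assumptions made for this example.

```python
import numpy as np


def vm_logpdf(x, mu, kappa):
    """Log density of a von Mises distribution (np.i0 is the Bessel function I0)."""
    return kappa * np.cos(x - mu) - np.log(2.0 * np.pi * np.i0(kappa))


def _init_means(theta, phi, k):
    """Deterministic farthest-point initialization using a toroidal dissimilarity."""
    chosen = [0]
    d = (1 - np.cos(theta - theta[0])) + (1 - np.cos(phi - phi[0]))
    while len(chosen) < k:
        j = int(np.argmax(d))
        chosen.append(j)
        d_j = (1 - np.cos(theta - theta[j])) + (1 - np.cos(phi - phi[j]))
        d = np.minimum(d, d_j)
    idx = np.array(chosen)
    return theta[idx].copy(), phi[idx].copy()


def _update(angle, resp, nk):
    """Weighted mean direction and an approximate concentration per component."""
    c = resp.T @ np.cos(angle)
    s = resp.T @ np.sin(angle)
    mu = np.arctan2(s, c)
    # Mean resultant length, clipped so the approximation below stays finite.
    rbar = np.clip(np.hypot(c, s) / nk, 1e-6, 0.99)
    # Closed-form approximation to the solution of I1(kappa)/I0(kappa) = rbar
    # (an assumption of this sketch; an exact M-step would solve it numerically).
    kappa = rbar * (2.0 - rbar**2) / (1.0 - rbar**2)
    return mu, kappa


def em_torus(theta, phi, k, n_iter=100):
    """EM for a k-component mixture of independent von Mises pairs on the torus."""
    theta = np.asarray(theta, dtype=float)
    phi = np.asarray(phi, dtype=float)
    mu1, mu2 = _init_means(theta, phi, k)
    kap1 = np.ones(k)
    kap2 = np.ones(k)
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior membership probabilities, computed in log space.
        logp = (np.log(pi)
                + vm_logpdf(theta[:, None], mu1, kap1)
                + vm_logpdf(phi[:, None], mu2, kap2))
        logp -= logp.max(axis=1, keepdims=True)
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update mixing weights, mean directions, and concentrations.
        nk = resp.sum(axis=0) + 1e-12  # guard against empty components
        pi = nk / nk.sum()
        mu1, kap1 = _update(theta, resp, nk)
        mu2, kap2 = _update(phi, resp, nk)
    return {"pi": pi, "mu1": mu1, "mu2": mu2,
            "kappa1": kap1, "kappa2": kap2, "resp": resp}
```

Calling `em_torus(theta, phi, k=2)` on two arrays of angles in radians returns the fitted weights and parameters together with the matrix of membership probabilities, and `resp.argmax(axis=1)` gives a hard cluster assignment. The number of components is fixed by the caller here; choosing it is a separate question that this sketch does not address.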
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs42519-021-00178-z/MediaObjects/42519_2021_178_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs42519-021-00178-z/MediaObjects/42519_2021_178_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs42519-021-00178-z/MediaObjects/42519_2021_178_Fig3_HTML.png)
Availability of data and material
The data on the peak expression times or phases (from a microarray experiment) have been obtained from [13] as mentioned above.
References
Banfield JD, Raftery AE (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49(3):803–821
Frey BJ, Dueck D (2007) Clustering by passing messages between data points. Science 315:972–976
Cheeseman P, Stutz J (1996) Bayesian classification (AutoClass): theory and results. Adv Knowl Discov Data Min 153–180
Qiu D, Tamhane AC (2007) A comparative study of K-means algorithm and the normal mixture model for clustering: univariate case. J Stat Plan Inference 137:3722–3740
Everitt BS, Landau S, Leese M, Stahl D (2011) Cluster analysis. Wiley Series in Probability and Statistics, UK
Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Technical Report No. 329, Department of Statistics, University of Washington
Fisher NI, Lee AJ (1992) Regression models for an angular response. Biometrics 48(3):665–677
Jammalamadaka SR, SenGupta A (2001) Topics in circular statistics. World Scientific Publishers, New Jersey
Johnson RA, Wehrly TE (1978) Some angular-linear distributions and related regression models. J Am Stat Assoc 73:602–606
Johnson RA, Wichern DW (2007) Applied multivariate statistical analysis. Pearson Prentice Hall, New Jersey
Kim S, SenGupta A (2017) Multivariate multiple circular regression. J Stat Comput Simul 87:1277–1291
Kim S, SenGupta A (2018) Clustering methods for spherical data: an overview and a new generalization. In: Proceedings of the 2nd Pacific Rim Statistical Conference for Production Engineering—big data, Production Engineering and Statistics. ICSA Book Series in Statistics, pp 155–164
Liu D, Peddada SD, Li L, Weinberg CR (2006) Phase analysis of circadian-related genes in two tissues. BMC Bioinform 7:87. https://doi.org/10.1186/1471-2105-7-87
McLachlan GJ, Krishnan T (1997) The EM algorithm and extensions. Wiley, New York
Rao CR (1973) Linear statistical inference and its applications, 2nd edn. Wiley, New York
Saxena A, Prasad M, Gupta A, Bharill N, Patel OP, Tiwari A, Er MJ, Ding W, Lin CT (2017) A review of clustering techniques and developments. Neurocomputing 267:664–681
SenGupta A (2004) On the construction of probability distributions for directional data. Bull Cal Math Soc 96(2):139–154
SenGupta A, Arnold BC, Kim S (2013) Inverse circular-circular regression. J Multivar Anal 119:200–208
SenGupta A, Chattopadhyay AK, Roy M (2020) Model based clustering for cylindrical data. Invited Paper. In: Ghosh I, Balakrishnan N, Ng HKT (eds) Advances in Statistics—Theory and Applications—Honoring the Contributions of Barry C. Arnold in Statistical Science. Springer, New York. In Press
SenGupta A, Ugwuowo F (2011) A classification method for directional data with application to the human skull. Commun Stat Theory Methods 40:457–466
Singh H, Hnizdo V, Demchuk E (2002) Probabilistic model for two dependent circular variables. Biometrika 89(3):719–723
Wallace CS, Dowe DL (1994) Intrinsic classification by MML: the Snob program. In: Proceedings of the 7th Australian Joint Conference on Artificial Intelligence, pp 37–44
Wu JCF (1983) On the convergence properties of the EM algorithm. Ann Stat 11(1):95–103
Xu R, Wunsch D (2005) Survey of clustering algorithms. IEEE Trans Neural Netw 16(3):645–678
Xu R, Wunsch DC (2010) Clustering algorithms in biomedical research: a review. IEEE Rev Biomed Eng 3:120–154
Funding
Not applicable.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Code availability
The code may be made available on request from the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the topical collection “Celebrating the Centenary of Professor C. R. Rao”, guest edited by Ravi Khattree, Sreenivasa Rao Jammalamadaka, and M. B. Rao.
Cite this article
SenGupta, A., Roy, M. Clustering on the Torus. J Stat Theory Pract 15, 58 (2021). https://doi.org/10.1007/s42519-021-00178-z
DOI: https://doi.org/10.1007/s42519-021-00178-z