
Statistical inference for homologous gene pairs between two circular genomes: a new circular–circular regression model

  • Original Paper

Statistical Methods & Applications

Abstract

In this paper, we investigate how to determine the relationship between paired circular genomes, represented by the similarity of their homologous gene configurations, using regression analysis. We propose a new regression model for two circular genomes in which the Möbius transformation arises naturally and is taken as the link function, and we propose least circular distance estimation as an appropriate method for analyzing circular variables. The main utility of the new regression model lies in identifying the new angular location of one gene of a homologous pair between two circular genomes, given the location of the other gene, under various types of possible gene mutation. Furthermore, we demonstrate the utility of the new model for grouping genomes according to the closeness of their relationship. Using the angular locations of homologous genes from five pairs of circular genomes (Horimoto et al. in Bioinformatics 14:789–802, 1998), we compare the new model with existing models.
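The least circular distance criterion measures error by \(1-\cos (\theta -m)\), the loss that appears in \(Q_n\) in the Appendix. As a minimal sketch (plain Python, angles in radians; the function names are hypothetical, not the authors'), the comparison below shows why this loss suits angular data better than a squared difference: two locations 2° apart that straddle 0° are close on the circle but far apart on the real line.

```python
import math

def circular_distance(theta: float, m: float) -> float:
    """Circular distance 1 - cos(theta - m): lies in [0, 2] and is unchanged by adding 2*pi to either angle."""
    return 1.0 - math.cos(theta - m)

def squared_error(theta: float, m: float) -> float:
    """Naive loss that treats angles as ordinary real numbers."""
    return (theta - m) ** 2

# Two angular gene locations 2 degrees apart, on either side of 0 degrees.
theta = math.radians(359.0)
m = math.radians(1.0)

print(f"circular distance: {circular_distance(theta, m):.6f}")
print(f"squared error:     {squared_error(theta, m):.6f}")
print(f"after adding 2*pi: {circular_distance(theta + 2.0 * math.pi, m):.6f}")
```

Here the circular distance is about 0.0006 while the squared error is about 39, and adding \(2\pi\) to either angle leaves the circular distance unchanged.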


References

  • Chakrabarti P, Pal D (2001) The interrelationships of side-chain and main-chain conformations in proteins. Prog Biophys Mol Biol 76:1–102

  • Downs TD, Mardia KV (2002) Circular regression. Biometrika 89:683–698

  • Fisher NI, Lee AJ (1992) Regression models for an angular response. Biometrics 48:665–677

  • Fisher NI (1993) Statistical analysis of circular data. Cambridge University Press, New York

  • Gould AL (1969) A regression technique for angular variates. Biometrics 25:683–700

  • Horimoto K, Suyama M, Toh H, Mori K, Otsuka J (1998) A method for comparing circular genomes from gene locations: application to mitochondrial genomes. Bioinformatics 14:789–802

  • Jammalamadaka SR, SenGupta A (2001) Topics in circular statistics. World Scientific, New York

  • Kato S, Shimizu K, Shieh GS (2008) A circular–circular regression model. Stat Sin 18:633–645

  • Kim S (2009) Inverse circular regression with a possibly asymmetric error distribution. PhD dissertation, University of California, Riverside

  • Liu D, Weinberg CR, Peddada SD (2004) A geometric approach to determine association and coherence of the activation times of cell-cycling genes under different experimental conditions. Bioinformatics 20:2521–2528

  • Liu D, Peddada SD, Li L, Weinberg CR (2006) Phase analysis of circadian-related genes in two tissues. BMC Bioinf. doi:10.1186/1471-2105-7-87

  • Presnell B, Morrison SP, Littell RC (1998) Projected multivariate linear models for directional data. J Am Stat Assoc 93:1068–1077

  • Rivest LP (1997) A decentred predictor for circular–circular regression. Biometrika 84:717–726

  • SenGupta A, Ugwuowo F (2006) Asymmetric circular–linear multivariate regression models with applications to environmental data. Environ Ecol Stat 13:299–309


Author information

Correspondence to Sungsu Kim.

Appendix

1.1 Proof of Theorem 1

The limiting distribution of the least circular distance estimator \(\hat{\zeta }\) of \(\zeta =(a,b)'\) is obtained from an exact first-order (mean-value) Taylor expansion of the first-order condition about the true value \(\zeta _0\), for some \(\zeta ^+\) lying between \(\hat{\zeta }\) and \(\zeta _0\):

$$\begin{aligned} 0=-\frac{\partial Q_n(\zeta )}{\partial \zeta }\Bigg |_{\hat{\zeta }}= & {} \frac{1}{n}\sum _{j=1}^n \frac{\partial m_j(\zeta )}{\partial \zeta }\sin \{\theta _j-m_j(\zeta )\}\Bigg |_{\hat{\zeta }}\\= & {} \frac{1}{n}\sum _{j=1}^n \frac{\partial m_j(\zeta )}{\partial \zeta }\sin \{\theta _j-m_j(\zeta )\}\Bigg |_{\zeta _0}\\&-\frac{1}{n}\sum _{j=1}^n\left[ \frac{\partial m_j(\zeta )}{\partial \zeta }\frac{\partial m_j(\zeta )}{\partial \zeta '}\cos \{\theta _j-m_j(\zeta )\}\right. \\&\left. -\frac{\partial ^2m_j(\zeta )}{\partial \zeta \partial \zeta '}\sin \{\theta _j-m_j(\zeta )\}\right] \Bigg |_{\zeta ^+}(\hat{\zeta }-\zeta _0),\\ \text {so}\quad \sqrt{n}(\hat{\zeta }-\zeta _0)= & {} \left\{ \frac{1}{n}\sum _{j=1}^n\left[ \frac{\partial m_j(\zeta )}{\partial \zeta }\frac{\partial m_j(\zeta )}{\partial \zeta '}\cos \{\theta _j-m_j(\zeta )\}\right. \right. \\&\quad \left. \left. -\frac{\partial ^2m_j(\zeta )}{\partial \zeta \partial \zeta '}\sin \{\theta _j-m_j(\zeta )\}\right] \Bigg |_{\zeta ^+}\right\} ^{-1}\\&\quad \cdot \frac{1}{\sqrt{n}}\sum _{j=1}^n \frac{\partial m_j}{\partial \zeta }\sin (\theta _j-m_j)\Bigg |_{\zeta _{0}}, \end{aligned}$$

where \(Q_n(\zeta )=\frac{1}{n}\sum _{j=1}^n\left[ 1-\cos \{\theta _j-\mu -2\arctan (a+bx_j)\}\right] \), so that \(m_j(\zeta )=\mu +2\arctan (a+bx_j)\). Applying the multivariate central limit theorem for independent random vectors to the score term \(\frac{1}{\sqrt{n}}\sum _{j=1}^n \frac{\partial m_j}{\partial \zeta } \sin (\theta _j-m_j)|_{\zeta _{0}}\) gives its asymptotic multivariate normality, and the matrix in braces has a probability limit:

$$\begin{aligned}&\frac{1}{\sqrt{n}}\sum _{j=1}^n \frac{\partial m_j}{\partial \zeta }\sin (\theta _j-m_j)\Bigg |_{\zeta _{0}}\mathop {\longrightarrow }\limits ^{d}N(0,B_0),\quad \text {where}\\&\quad B_0= \text {var} \left\{ \frac{\partial m_j}{\partial \zeta }\sin (\theta _j-m_j)\right\} \Bigg |_{\zeta _{0}}= \text {E}\left\{ \frac{\partial m_j}{\partial \zeta }\frac{\partial m_j}{\partial \zeta '}\sin ^2(\theta _j-m_j)\right\} \Bigg |_{\zeta _{0}},\\&\quad \text {and}\quad \text {plim}\frac{1}{n}\sum _{j=1}^n\left[ \frac{\partial m_j(\zeta )}{\partial \zeta }\frac{\partial m_j(\zeta )}{\partial \zeta '}\cos (\theta _j-m_j)-\frac{\partial ^2m_j(\zeta )}{\partial \zeta \partial \zeta '}\sin (\theta _j-m_j)\right] \Bigg |_{\zeta ^+}\\&\quad = \lim \frac{1}{n}\sum _{j=1}^n\left[ E\left\{ \frac{\partial m_j(\zeta )}{\partial \zeta }\frac{\partial m_j(\zeta )}{\partial \zeta '}\cos (\theta _j-m_j)\right\} -E\left\{ \frac{\partial ^2m_j(\zeta )}{\partial \zeta \partial \zeta '}\sin (\theta _j-m_j)\right\} \right] \Bigg |_{\zeta _0}\\&\quad =\left[ E\left\{ \frac{\partial m_j(\zeta )}{\partial \zeta }\frac{\partial m_j(\zeta )}{\partial \zeta '}\cos (\theta _j-m_j)\right\} -E\left\{ \frac{\partial ^2m_j(\zeta )}{\partial \zeta \partial \zeta '}\sin (\theta _j-m_j)\right\} \right] \Bigg |_{\zeta _0}\\&\quad =A_0 . \end{aligned}$$

The variance \(B_0\) equals the second moment because the score has mean zero at \(\zeta _0\) (the population first-order condition). The probability limit is evaluated at \(\zeta _0\) because \(\zeta ^+\) lies between \(\hat{\zeta }\) and \(\zeta _0\), so \(\zeta ^+\rightarrow \zeta _0\) in probability whenever \(\hat{\zeta }\) is consistent; the last equality holds when the pairs \((x_j,\theta _j)\) are identically distributed.

Combining these two limits by Slutsky’s theorem (the product limit normal rule), we get

$$\begin{aligned} \sqrt{n}(\hat{\zeta }-\zeta _0)\mathop {\longrightarrow }\limits ^{d}N(0,A_0^{-1}B_0A_0^{-1}). \end{aligned}$$

Hence the asymptotic distribution of \(\hat{\zeta }\) is

$$\begin{aligned} \hat{\zeta }\mathop {\sim }\limits ^{a} N(\zeta _0,n^{-1}A_0^{-1}B_0A_0^{-1}). \end{aligned}$$
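The expansion in this proof also suggests one way to compute \(\hat{\zeta }\) and its standard errors. Writing \(S_n(\zeta )=\frac{1}{n}\sum _{j=1}^n \frac{\partial m_j}{\partial \zeta }\sin (\theta _j-m_j)\), we have \(\partial Q_n/\partial \zeta =-S_n\), and the Hessian of \(Q_n\) is the bracketed matrix whose limit is \(A_0\); a Newton-Raphson update \(\zeta \leftarrow \zeta +\hat{A}^{-1}S_n\) therefore minimizes \(Q_n\), and evaluating sample versions of \(A_0\) and \(B_0\) at \(\hat{\zeta }\) gives a plug-in sandwich estimate of \(n^{-1}A_0^{-1}B_0A_0^{-1}\). This is a minimal sketch under stated assumptions, not the authors' algorithm: \(\mu \) is treated as known (matching \(\zeta =(a,b)'\) above), \(x_j\) is taken as a given real covariate, step halving is our own safeguard, and all function names and simulation values are hypothetical.

```python
import math
import random

def link_derivatives(a, b, x):
    """Gradient and Hessian of m = mu + 2*arctan(a + b*x) with respect to zeta = (a, b)."""
    u = a + b * x
    d1 = 2.0 / (1.0 + u * u)             # d/du of 2*arctan(u)
    d2 = -4.0 * u / (1.0 + u * u) ** 2   # d^2/du^2 of 2*arctan(u)
    grad = (d1, d1 * x)
    hess = ((d2, d2 * x), (d2 * x, d2 * x * x))
    return grad, hess

def q_n(zeta, mu, xs, thetas):
    """Mean circular distance Q_n(zeta) = (1/n) * sum(1 - cos(theta_j - m_j))."""
    a, b = zeta
    return sum(1.0 - math.cos(th - mu - 2.0 * math.atan(a + b * x))
               for x, th in zip(xs, thetas)) / len(xs)

def score_and_matrices(zeta, mu, xs, thetas):
    """Sample score S_n and sample (plug-in) versions of A_0 and B_0 at zeta."""
    a, b = zeta
    n = len(xs)
    S = [0.0, 0.0]
    A = [[0.0, 0.0], [0.0, 0.0]]
    B = [[0.0, 0.0], [0.0, 0.0]]
    for x, th in zip(xs, thetas):
        g, h = link_derivatives(a, b, x)
        r = th - mu - 2.0 * math.atan(a + b * x)   # angular residual theta_j - m_j
        s, c = math.sin(r), math.cos(r)
        for i in range(2):
            S[i] += g[i] * s / n
            for k in range(2):
                A[i][k] += (g[i] * g[k] * c - h[i][k] * s) / n
                B[i][k] += g[i] * g[k] * s * s / n
    return S, A, B

def inv2(M):
    """Inverse of a 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

def matmul2(P, Q):
    """Product of two 2x2 matrices."""
    return [[sum(P[i][j] * Q[j][k] for j in range(2)) for k in range(2)] for i in range(2)]

def fit_lcd(mu, xs, thetas, start=(0.0, 1.0), tol=1e-10, max_iter=100):
    """Minimize Q_n by Newton-Raphson (zeta <- zeta + A^{-1} S_n) with step halving."""
    zeta = list(start)
    q = q_n(zeta, mu, xs, thetas)
    for _ in range(max_iter):
        S, A, _ = score_and_matrices(zeta, mu, xs, thetas)
        Ai = inv2(A)
        step = [Ai[i][0] * S[0] + Ai[i][1] * S[1] for i in range(2)]
        t = 1.0
        while t > 1e-8:
            cand = [zeta[i] + t * step[i] for i in range(2)]
            q_new = q_n(cand, mu, xs, thetas)
            if q_new <= q:
                break
            t /= 2.0
        else:
            break  # no decrease along the Newton direction: stop here
        zeta, q = cand, q_new
        if max(abs(t * v) for v in step) < tol:
            break
    return tuple(zeta)

def sandwich_cov(zeta, mu, xs, thetas):
    """Plug-in estimate of n^{-1} A^{-1} B A^{-1} evaluated at zeta."""
    _, A, B = score_and_matrices(zeta, mu, xs, thetas)
    Ai = inv2(A)
    V = matmul2(matmul2(Ai, B), Ai)
    return [[v / len(xs) for v in row] for row in V]

if __name__ == "__main__":
    random.seed(1)
    mu, a0, b0 = 0.5, 0.3, 1.5   # hypothetical values for a simulation
    xs = [random.uniform(-2.0, 2.0) for _ in range(400)]
    thetas = [(mu + 2.0 * math.atan(a0 + b0 * x) + random.vonmisesvariate(0.0, 20.0))
              % (2.0 * math.pi) for x in xs]
    a_hat, b_hat = fit_lcd(mu, xs, thetas)
    V = sandwich_cov((a_hat, b_hat), mu, xs, thetas)
    print(f"a = {a_hat:.3f} (se {math.sqrt(V[0][0]):.3f}), "
          f"b = {b_hat:.3f} (se {math.sqrt(V[1][1]):.3f})")
```

Step halving is included because a Newton-Raphson step on a non-convex objective such as \(Q_n\) is not guaranteed to decrease it from a poor starting value; any general-purpose optimizer could be substituted. Standard errors are the square roots of the diagonal of the sandwich estimate.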


About this article


Cite this article

SenGupta, A., Kim, S. Statistical inference for homologous gene pairs between two circular genomes: a new circular–circular regression model. Stat Methods Appl 25, 421–432 (2016). https://doi.org/10.1007/s10260-015-0341-8


