Multinomial Principal Component Logistic Regression on Shape Data

  • Original Research
Journal of Classification

Abstract

This paper proposes a linear model that uses principal component scores of shape data to fit nominal responses in the tangent space of shapes. Both multinomial logistic regression for multivariate data and logistic regression for binary responses are considered. Principal components in the tangent space are employed to improve the estimation of the logistic model parameters under multicollinearity and to reduce the dimension of the input data. The proposed approach improves the classification of shape data into their nominal groups. Furthermore, we assess the effectiveness of the proposed method through a comprehensive simulation study and illustrate its benefits on five real-world data sets.
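The pipeline in the abstract has three stages: tangent-space coordinates of shapes, principal component scores of those coordinates, and a multinomial logit fitted to the scores. The following Python/NumPy code is a minimal sketch of that pipeline under several stated assumptions. It assumes planar landmark data. It uses generalized Procrustes alignment with a fixed number of iterations and projection tangent coordinates at the Procrustes mean. The softmax model is fitted by plain gradient descent, which is a swapped-in estimator. The helper names (`gpa`, `tangent_coords`, `pc_scores`, `fit_multinomial`, `predict`, `simulate`) and the simulated quadrilaterals are hypothetical. This is not the authors' estimator or their R code.

```python
import numpy as np


def preshape(X):
    """Remove location and scale: center a (k, 2) landmark matrix, scale to unit norm."""
    Z = X - X.mean(axis=0)
    return Z / np.linalg.norm(Z)


def align(Z, M):
    """Ordinary Procrustes rotation of preshape Z onto M (reflections excluded)."""
    U, _, Vt = np.linalg.svd(Z.T @ M)
    if np.linalg.det(U @ Vt) < 0:
        U[:, -1] *= -1  # flip the weakest axis so the result is a rotation
    return Z @ (U @ Vt)


def gpa(shapes, iters=10):
    """Generalized Procrustes analysis: align every configuration to a running mean."""
    Z = np.array([preshape(X) for X in shapes])
    M = Z[0]
    for _ in range(iters):
        Z = np.array([align(z, M) for z in Z])
        M = preshape(Z.mean(axis=0))
    return Z, M


def tangent_coords(Z, M):
    """Project each vectorized aligned preshape onto the tangent space at the mean M."""
    m = M.ravel()
    return np.array([z.ravel() - (z.ravel() @ m) * m for z in Z])


def pc_scores(V, n_comp):
    """Standardized principal component scores of the tangent coordinates."""
    Vc = V - V.mean(axis=0)
    _, _, Wt = np.linalg.svd(Vc, full_matrices=False)
    S = Vc @ Wt[:n_comp].T
    return S / S.std(axis=0)


def fit_multinomial(S, y, n_classes, lr=0.5, steps=2000):
    """Softmax (multinomial logit) regression on the scores, fitted by gradient descent."""
    X = np.hstack([np.ones((len(S), 1)), S])  # intercept column
    Y = np.eye(n_classes)[y]                   # one-hot responses
    B = np.zeros((X.shape[1], n_classes))
    for _ in range(steps):
        logits = X @ B
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        B -= lr * X.T @ (P - Y) / len(S)            # gradient of mean cross-entropy
    return B


def predict(B, S):
    """Assign each shape to the class with the largest linear predictor."""
    X = np.hstack([np.ones((len(S), 1)), S])
    return np.argmax(X @ B, axis=1)


def simulate(rng, n_per=30, noise=0.03):
    """Hypothetical test data: noisy quadrilaterals, randomly rotated, scaled, shifted."""
    templates = [
        np.array([[0, 0], [1, 0], [1, 1], [0, 1]], float),       # square
        np.array([[0, 0], [2, 0], [2, 1], [0, 1]], float),       # 2:1 rectangle
        np.array([[0, 0], [1, -0.5], [2, 0], [1, 1.5]], float),  # kite
    ]
    shapes, labels = [], []
    for g, T in enumerate(templates):
        for _ in range(n_per):
            th = rng.uniform(0, 2 * np.pi)
            R = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
            X = (T + noise * rng.standard_normal(T.shape)) @ R
            shapes.append(X * rng.uniform(0.5, 2.0) + rng.normal(size=2))
            labels.append(g)
    return shapes, np.array(labels)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shapes, y = simulate(rng)
    Z, M = gpa(shapes)
    S = pc_scores(tangent_coords(Z, M), n_comp=3)
    B = fit_multinomial(S, y, n_classes=3)
    print("training accuracy:", (predict(B, S) == y).mean())
```

With `n_classes=2`, the same softmax fit reduces to ordinary binary logistic regression. Keeping only the leading `n_comp` scores is what reduces the input dimension and yields nearly uncorrelated predictors. In this sketch, `n_comp` is fixed by hand rather than chosen by any criterion.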

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Data Availability

The real data sets analyzed during the current study are available in the R package shapes [https://cran.r-project.org/web/packages/shape/index.html]. The R code for the simulation study is available from the corresponding author upon request.

Acknowledgements

We would like to thank the editor and two referees for their constructive and thoughtful comments, which helped us greatly in improving the manuscript.

Author information

Correspondence to Meisam Moghimbeygi.

Ethics declarations

Conflict of Interest

The authors declare no competing interests.

Supplementary Information

Supplementary file 1 (PDF, 224 KB)

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Moghimbeygi, M., Nodehi, A. Multinomial Principal Component Logistic Regression on Shape Data. J Classif 39, 578–599 (2022). https://doi.org/10.1007/s00357-022-09423-x

  • DOI: https://doi.org/10.1007/s00357-022-09423-x
