Statistics in Biosciences, Volume 4, Issue 1, pp 132–156

New Approaches to Principal Component Analysis for Trees

  • Burcu Aydın
  • Gábor Pataki
  • Haonan Wang
  • Alim Ladha
  • Elizabeth Bullitt
  • J. S. Marron


Object Oriented Data Analysis is a new area in statistics that studies populations of general data objects. In this article we focus on populations of tree-structured objects. We develop improved analysis tools for data lying in a binary tree space, analogous to classical Principal Component Analysis methods in Euclidean space. Our extensions of PCA are analogs of one-dimensional subspaces that best fit the data. Previous work was based on the notion of tree-lines.

In this paper, a generalization of the previous tree-line notion is proposed: k-tree-lines. Previously proposed tree-lines are k-tree-lines where k=1. New sub-cases of k-tree-lines studied in this work are the 2-tree-lines and tree-curves, which explain much more variation per principal component than tree-lines. The optimal principal component tree-lines were computable in linear time. Because 2-tree-lines and tree-curves are more complex, they are computationally more expensive, but yield improved data analysis results.

We provide a comparative study of all these methods on a motivating data set consisting of brain vessel structures of 98 subjects.
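The tree-line notion that the abstract builds on can be illustrated with a small sketch. The abstract itself does not define tree-lines, so everything below is an assumption for illustration: trees are encoded as sets of node indices (root = 1, children of node w are 2w and 2w + 1), the distance between two trees is the size of the symmetric difference of their node sets, and a tree-line is a sequence of trees grown by adding one node at a time along a single descending path. The function names `distance`, `tree_line`, and `project` are hypothetical, and the brute-force projection is not the linear-time method mentioned above.

```python
# Illustrative sketch only; the encoding and the distance are assumptions,
# not definitions taken from the abstract.

def distance(s, t):
    """Count the nodes that belong to exactly one of the two trees."""
    return len(s ^ t)


def tree_line(start, path):
    """Build [l_0, l_1, ..., l_m], adding the nodes of `path` one at a time.

    `start` is the starting tree l_0 (a set of node indices) and `path` is a
    list of node indices in which each node is a child of the previous one.
    """
    members = [frozenset(start)]
    previous = None
    for node in path:
        parent = node // 2
        if previous is None:
            # The first added node must attach to the starting tree.
            if parent not in start or node in start:
                raise ValueError("first path node must be a new child of a start node")
        elif parent != previous:
            raise ValueError("each path node must be a child of the previous one")
        members.append(members[-1] | {node})
        previous = node
    return members


def project(tree, line):
    """Return the member of the tree-line closest to `tree` (first one on ties)."""
    return min(line, key=lambda member: distance(tree, member))


# A tree-line starting at the root-only tree and descending via nodes 2, 5, 10.
line = tree_line({1}, [2, 5, 10])

# A data tree with nodes 1, 2, 3, 5, projected onto that tree-line.
sample = {1, 2, 3, 5}
closest = project(sample, line)
```

In this toy case the closest member is the tree with nodes 1, 2, and 5, which differs from the sample tree only in node 3. A principal component in this setting would be the tree-line minimizing the total of such distances over all data trees.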


  • Binary trees
  • Object oriented data analysis
  • Principal component analysis
  • Tree-lines
  • Vessel structure


Copyright information

© International Chinese Statistical Association 2012

Authors and Affiliations

  • Burcu Aydın (HP Laboratories, Palo Alto, USA)
  • Gábor Pataki (UNC at Chapel Hill, Chapel Hill, USA)
  • Haonan Wang (Colorado State University, Fort Collins, USA)
  • Alim Ladha (UNC at Chapel Hill, Chapel Hill, USA)
  • Elizabeth Bullitt (UNC at Chapel Hill, Chapel Hill, USA)
  • J. S. Marron (UNC at Chapel Hill, Chapel Hill, USA)
