Tree PCA for Extracting Dominant Substructures from Labeled Rooted Trees

  • Tomoya Yamazaki
  • Akihiro Yamamoto
  • Tetsuji Kuboyama
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9356)


We propose a novel principal component analysis (PCA) for rooted labeled trees to discover dominant substructures in a collection of trees. The principal components of trees are defined by analogy with ordinary principal component analysis on numerical vectors. Our methods substantially extend earlier work, in which the input data are restricted to binary trees or rooted unlabeled trees with unique vertex indexing, and the principal components are restricted to paths. In contrast, our extension accepts general rooted labeled trees as input and allows the principal components to take the more expressive form of subtrees instead of paths. For this extension, we employ the technique of flexible tree matching, that is, the various mappings used in tree edit distance algorithms. We design an efficient algorithm based on our framework that uses top-down mappings, and we demonstrate its applicability by extracting dominant patterns from a set of glycan structures.
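To make the notion of a top-down mapping concrete, the following is a minimal sketch of a Selkow-style top-down edit distance between two rooted, labeled, ordered trees. It is not the algorithm of the paper. It only illustrates the restriction that a vertex may be matched only if its parent is matched, so that insertions and deletions remove or add whole subtrees. The tuple representation of trees, the unit costs, the function names, and the monosaccharide labels in the example are all assumptions made for illustration.

```python
from functools import lru_cache

# Assumed representation: a tree is a (label, children) pair,
# where children is a tuple of trees (ordered, possibly empty).


def size(tree):
    """Number of vertices in the tree."""
    _, children = tree
    return 1 + sum(size(child) for child in children)


@lru_cache(maxsize=None)
def top_down_distance(a, b):
    """Selkow-style top-down distance with unit costs (illustrative).

    Roots are always matched (cost 1 if the labels differ). The child
    sequences are then aligned like strings: dropping or adding a child
    costs the size of its whole subtree, and matching two children costs
    their recursive top-down distance.
    """
    (label_a, kids_a), (label_b, kids_b) = a, b
    m, n = len(kids_a), len(kids_b)

    # d[i][j]: cost of aligning the first i children of a
    # with the first j children of b.
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + size(kids_a[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + size(kids_b[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + size(kids_a[i - 1]),   # delete subtree
                d[i][j - 1] + size(kids_b[j - 1]),   # insert subtree
                d[i - 1][j - 1]
                + top_down_distance(kids_a[i - 1], kids_b[j - 1]),
            )
    return int(label_a != label_b) + d[m][n]


# Hypothetical glycan-like trees; the labels are placeholders.
t1 = ("Man", (("GlcNAc", ()), ("Gal", ())))
t2 = ("Man", (("GlcNAc", ()),))
t3 = ("Glc", (("GlcNAc", (("Fuc", ()),)),))

print(top_down_distance(t1, t2))  # deleting the "Gal" leaf
print(top_down_distance(t3, t1))
```

Because matched vertices always form a subtree containing the root, a mapping of this kind yields a common rooted substructure rather than a single path, which is the kind of expressiveness the abstract contrasts with path-shaped components.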


Total Space, Edit Distance, Glycan Structure, Edit Operation, Tree Structure Data
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



The authors would like to thank both the anonymous reviewers and Kouichi Hirata, Kyushu Institute of Technology, Japan for their valuable comments. This work was partially supported by the Grant-in-Aid for Scientific Research (KAKENHI Grant Numbers 26280085, 26280090, and 24300060) from the Japan Society for the Promotion of Science.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Tomoya Yamazaki (1)
  • Akihiro Yamamoto (1)
  • Tetsuji Kuboyama (2)
  1. Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto, Japan
  2. Computer Centre, Gakushuin University, Toshima-ku, Tokyo, Japan
