Fast Classification of Protein Structures by an Alignment-Free Kernel

  • Taku Onodera
  • Tetsuo Shibuya
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9954)

Abstract

Alignment is one of the most fundamental techniques in bioinformatics and is used throughout the field, but its computational cost has become prohibitive for many modern problems because of recent explosive data growth. Hence the development of alignment-free algorithms, i.e., alternative algorithms that avoid computationally expensive alignment, has become a hot topic in algorithmic bioinformatics.

Analysis of protein structures is an important problem in bioinformatics. We focus on predicting the functions of proteins from their structures: protein functions are key to understanding any organism, and these functions are said to be determined by protein structure. However, the previous best-known (i.e., the most accurate) method for this problem uses an alignment-based kernel method, which suffers from the high computational cost of alignments.

For this problem, we propose a new kernel method that does not employ alignments. Instead, we apply the two-dimensional suffix tree and the contact map graph to reduce the kernel-related computation cost dramatically. Experiments show that, compared to the previous best algorithm, our new method runs about 16 times faster in training and about 37 times faster in prediction while preserving comparatively high accuracy.
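
The abstract does not spell out the kernel itself, so the following is only a rough illustration of the general idea of comparing structures through contact maps without alignment. It is not the authors' algorithm. The function names, the 8.0 distance threshold, the block size k = 3, and the choice to count only square blocks on the main diagonal are all assumptions made for this sketch, and the naive counting loop stands in for the two-dimensional suffix tree that the paper uses for efficiency.

```python
from collections import Counter
from math import dist


def contact_map(coords, threshold=8.0):
    """Binary contact map: entry (i, j) is 1 when residues i and j
    lie within `threshold` of each other (threshold is an assumed value)."""
    n = len(coords)
    return [
        [1 if dist(coords[i], coords[j]) <= threshold else 0 for j in range(n)]
        for i in range(n)
    ]


def diagonal_block_counts(cmap, k=3):
    """Count every k-by-k submatrix whose top-left corner is on the main
    diagonal. This naive scan is a hypothetical stand-in for a 2D suffix tree."""
    counts = Counter()
    for s in range(len(cmap) - k + 1):
        block = tuple(tuple(cmap[s + r][s:s + k]) for r in range(k))
        counts[block] += 1
    return counts


def kernel(counts_a, counts_b):
    """Inner product of two sparse block-count vectors."""
    if len(counts_a) > len(counts_b):
        counts_a, counts_b = counts_b, counts_a
    return sum(c * counts_b[block] for block, c in counts_a.items())


# Toy usage: five points on a line, 3.8 units apart.
coords = [(3.8 * i, 0.0, 0.0) for i in range(5)]
counts = diagonal_block_counts(contact_map(coords), k=3)
similarity = kernel(counts, counts)
```

Because each structure is reduced to a count vector once, comparing two structures costs only a sparse inner product, which is the kind of saving an alignment-free kernel aims for.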

Keywords

Feature Vector, Kernel Function, Structural Alignment, Adjacency Matrix, Suffix Array
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgement

This work was supported by JSPS KAKENHI Grant Numbers 25280002 and 24106007. The supercomputing resource was provided by the Human Genome Center (the Univ. of Tokyo).

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. Human Genome Center, Institute of Medical Science, The University of Tokyo, Minato-ku, Japan
  2. CREST, JST, Tokyo, Japan