Evolutionary Optimization of Sequence Kernels for Detection of Bacterial Gene Starts

  • Britta Mersch
  • Tobias Glasmachers
  • Peter Meinicke
  • Christian Igel
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4132)

Abstract

Oligo kernels for biological sequence classification have high discriminative power. This paper presents a new parameterization of the K-mer oligo kernel in which every oligomer of length K is weighted individually. Choosing these parameters in a task-specific way increases classification performance and reveals information about discriminative features. To adapt the many kernel parameters based on cross-validation, the covariance matrix adaptation evolution strategy (CMA-ES) is proposed. It is applied to optimize the trimer oligo kernel for the detection of prokaryotic translation initiation sites. The resulting kernel leads to higher classification rates, and the adapted parameters reveal how important particular triplets are for classification, for example those occurring in the Shine-Dalgarno sequence.
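
To make the idea of individually weighted oligomers concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes that each K-mer's "weight" is a separate Gaussian smoothing width applied to position differences, in the style of the standard oligo kernel; the exact parameterization used in the paper is not given in the abstract. The function names `oligomer_positions`, `weighted_oligo_kernel`, and `uniform_widths` are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def oligomer_positions(seq, k=3):
    """Map each K-mer occurring in seq to the list of its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return positions


def uniform_widths(width=1.0, k=3, alphabet="ACGT"):
    """Give all 4^K oligomers the same width (the non-adapted baseline)."""
    return {"".join(p): width for p in product(alphabet, repeat=k)}


def weighted_oligo_kernel(s, t, sigma, k=3):
    """Oligo-kernel-style similarity with one smoothing width per K-mer.

    ASSUMPTION: sigma[oligo] is a per-oligomer Gaussian width. A small
    width demands near-exact positional agreement; a large width
    tolerates positional shifts of that oligomer.
    """
    pos_s = oligomer_positions(s, k)
    pos_t = oligomer_positions(t, k)
    total = 0.0
    for oligo, ps in pos_s.items():
        qs = pos_t.get(oligo)
        if not qs:
            continue  # oligomer absent from t contributes nothing
        width = sigma[oligo]
        for p in ps:
            for q in qs:
                total += width * math.exp(-((p - q) ** 2) / (4.0 * width ** 2))
    return math.sqrt(math.pi) * total
```

With `sigma = uniform_widths(1.0)`, calling `weighted_oligo_kernel(s, t, sigma)` compares two DNA windows by their shared trimers. In a setup like the one the abstract describes, an evolution strategy such as CMA-ES would search over the 64 entries of `sigma` for trimers, scoring each candidate by the cross-validation error of a classifier trained with the resulting kernel.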

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Britta Mersch (1)
  • Tobias Glasmachers (2)
  • Peter Meinicke (3)
  • Christian Igel (2)
  1. German Cancer Research Center, Heidelberg, Germany
  2. Institut für Neuroinformatik, Ruhr-Universität Bochum, Bochum, Germany
  3. Institut für Mikrobiologie und Genetik, Abteilung für Bioinformatik, Georg-August-Universität Göttingen, Göttingen, Germany
