Feature and Kernel Evolution for Recognition of Hypersensitive Sites in DNA Sequences

  • Conference paper
Bio-Inspired Models of Network, Information, and Computing Systems (BIONETICS 2010)

Abstract

The annotation of DNA regions that regulate gene transcription is the first step towards understanding phenotypical differences among cells and many diseases. Hypersensitive (HS) sites are reliable markers of regulatory regions. Mapping HS sites is the focus of many statistical learning techniques that employ Support Vector Machines (SVM) to classify a DNA sequence as HS or non-HS. The contribution of this paper is a novel methodology inspired by biological evolution to automate the basic steps in SVM and improve classification accuracy. First, an evolutionary algorithm designs optimal sequence motifs used to associate feature vectors with the input sequences. Second, a genetic programming algorithm designs optimal kernel functions that map the feature vectors into a high-dimensional space where the vectors can be optimally separated into the HS and non-HS classes. Results show that the employment of evolutionary computation techniques improves classification accuracy and promises to automate the analysis of biological sequences.
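The two steps described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify how motifs are encoded or which kernel primitives are used. The sketch assumes motifs are written with IUPAC nucleotide codes, that each feature is a motif's occurrence count, and that an evolved kernel is a small expression built from standard kernels. The names `count_motif`, `feature_vector`, and `composite_kernel`, the example motifs, and the `gamma` value are all hypothetical.

```python
import math

# IUPAC nucleotide codes: each symbol stands for a set of concrete bases.
# (Assumed motif alphabet; the abstract does not state the encoding.)
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_motif(sequence, motif):
    """Count the (possibly overlapping) positions where the motif matches."""
    hits = 0
    for start in range(len(sequence) - len(motif) + 1):
        window = sequence[start:start + len(motif)]
        if all(base in IUPAC[symbol] for base, symbol in zip(window, motif)):
            hits += 1
    return hits


def feature_vector(sequence, motifs):
    """Map a DNA sequence to one occurrence count per motif."""
    return [count_motif(sequence, motif) for motif in motifs]


def linear(x, y):
    """Linear kernel: plain dot product."""
    return sum(a * b for a, b in zip(x, y))


def rbf(x, y, gamma=0.1):
    """Gaussian (RBF) kernel; gamma is an illustrative value."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def composite_kernel(x, y):
    """Hypothetical stand-in for an evolved kernel expression.

    The sum of two positive semidefinite kernels is itself a valid kernel,
    so this combination can be handed to an SVM.
    """
    return linear(x, y) + rbf(x, y)


if __name__ == "__main__":
    motifs = ["ACG", "TNA", "RY"]  # hypothetical motifs
    x = feature_vector("ACGTTAACGT", motifs)
    y = feature_vector("TTTTAAGGCC", motifs)
    print(x, y, composite_kernel(x, y))
```

In an evolutionary setting, the motif list and the shape of `composite_kernel` would be the individuals under selection, with fitness given by the classification accuracy of an SVM trained on the resulting kernel values.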

© 2012 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Kamath, U., Shehu, A., De Jong, K.A. (2012). Feature and Kernel Evolution for Recognition of Hypersensitive Sites in DNA Sequences. In: Suzuki, J., Nakano, T. (eds) Bio-Inspired Models of Network, Information, and Computing Systems. BIONETICS 2010. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 87. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32615-8_23

  • DOI: https://doi.org/10.1007/978-3-642-32615-8_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32614-1

  • Online ISBN: 978-3-642-32615-8

  • eBook Packages: Computer Science (R0)
