
Cooperative Metaheuristics for Exploring Proteomic Data

Artificial Intelligence Review

Abstract

Most combinatorial optimization problems cannot be solved exactly. A class of methods called metaheuristics has proved efficient at giving good approximate solutions in a reasonable time. Cooperative metaheuristics are a subset of metaheuristics in which several entities explore the search space in parallel and exchange information with one another. The importance of information exchange in the optimization process is related to the building block hypothesis of evolutionary algorithms, which raises two questions: what is the pertinent information in a given potential solution, and how can this information be shared? A classification of cooperative metaheuristic methods according to the nature of the cooperation involved is presented, and the specific properties of each class, as well as a way to combine them, are discussed. Several improvements in the field of metaheuristics are also given. In particular, a method is proposed to regulate the use of classical genetic operators and to define new, more pertinent ones, taking advantage of a building-block-structured representation of the explored space. Finally, a hierarchical approach resting on multiple levels of cooperative metaheuristics is presented, leading to the definition of a complete concerted cooperation strategy. Some applications of these concepts to difficult proteomics problems, including automatic protein identification, biological motif inference and multiple sequence alignment, are presented.
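
To make the idea of "parallel exploration with information exchange" concrete, here is a minimal sketch of one classical cooperative scheme, the island-model genetic algorithm: several subpopulations evolve independently and periodically pass their best individuals around a ring. This is a generic textbook illustration, not the authors' classification or method; the toy `onemax` objective, the ring topology, and all parameter values are assumptions chosen for brevity.

```python
import random

def onemax(bits):
    """Toy objective (assumption): number of 1 bits, standing in for a real score."""
    return sum(bits)

def tournament(pop, rng, k=3):
    """Pick the fittest of k randomly sampled individuals."""
    return max(rng.sample(pop, k), key=onemax)

def evolve_island(pop, rng, p_mut):
    """One generation of a simple generational GA on a single island."""
    n_bits = len(pop[0])
    children = [max(pop, key=onemax)[:]]  # elitism: keep the island's best
    while len(children) < len(pop):
        a, b = tournament(pop, rng), tournament(pop, rng)
        cut = rng.randrange(1, n_bits)  # one-point crossover
        child = a[:cut] + b[cut:]
        # bit-flip mutation with per-gene probability p_mut
        child = [1 - g if rng.random() < p_mut else g for g in child]
        children.append(child)
    return children

def island_ga(n_islands=4, pop_size=20, n_bits=32, generations=60,
              migrate_every=5, seed=1):
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(n_bits)]
                for _ in range(pop_size)] for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        # independent (parallelizable) exploration on each island
        islands = [evolve_island(pop, rng, 1.0 / n_bits) for pop in islands]
        if gen % migrate_every == 0:
            # cooperation step: ring migration, island i receives the
            # best individual of island i-1 in place of its worst one
            bests = [max(pop, key=onemax) for pop in islands]
            for i, pop in enumerate(islands):
                worst = min(range(len(pop)), key=lambda j: onemax(pop[j]))
                pop[worst] = bests[i - 1][:]
    return max((ind for pop in islands for ind in pop), key=onemax)

best = island_ga()
print(onemax(best), "of", len(best))
```

The migration interval controls how strongly the islands cooperate: exchanging too often collapses them into one population, while never exchanging reduces the scheme to independent runs.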
For each application, an innovative method based on the cooperation concept is given and compared with classical approaches. In the protein identification problem, a first level of cooperation using swarm intelligence is applied to the comparison of mass spectrometric data with biological sequence databases, followed by a genetic programming method to discover an optimal scoring function. The multiple sequence alignment problem is decomposed into three steps: several evolutionary processes infer different kinds of biological motifs, and a concerted cooperation strategy then builds the sequence alignment according to the motif content of the sequences.
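
As a rough illustration of how swarm intelligence can relate spectrum peaks to peptide sequences, the sketch below uses ant colony optimization on a deliberately simplified toy problem: ants build a peptide residue by residue, guided by per-position pheromone, and are rewarded when the cumulative prefix masses of their peptide match observed peaks. This is not the authors' database-comparison algorithm. The reduced residue alphabet, the prefix-mass ladder without proton or water terms, the 0.02 Da tolerance, and the names `prefix_masses`, `score` and `aco_sequence` are all hypothetical choices for the example.

```python
import random

# Monoisotopic residue masses (Da) for a small subset of amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "L": 113.08406, "K": 128.09496, "F": 147.06841,
}
RESIDUES = sorted(RESIDUE_MASS)

def prefix_masses(peptide):
    """Cumulative residue mass of every prefix (simplified ladder, no ion terms)."""
    total, ladder = 0.0, []
    for aa in peptide:
        total += RESIDUE_MASS[aa]
        ladder.append(total)
    return ladder

def score(peptide, peaks, tol=0.02):
    """Number of prefix masses lying within tol of some observed peak."""
    return sum(any(abs(m - p) <= tol for p in peaks) for m in prefix_masses(peptide))

def aco_sequence(peaks, length, n_ants=20, n_iter=50, rho=0.1, seed=0):
    rng = random.Random(seed)
    # pheromone[i][aa]: learned desirability of residue aa at position i
    pheromone = [{aa: 1.0 for aa in RESIDUES} for _ in range(length)]
    best, best_score = None, -1
    for _ in range(n_iter):
        solutions = []
        for _ in range(n_ants):
            # each ant samples one residue per position, biased by pheromone
            pep = "".join(
                rng.choices(RESIDUES, weights=[pheromone[i][aa] for aa in RESIDUES])[0]
                for i in range(length))
            s = score(pep, peaks)
            solutions.append((s, pep))
            if s > best_score:
                best, best_score = pep, s
        # evaporation on every trail
        for pos in pheromone:
            for aa in pos:
                pos[aa] *= 1.0 - rho
        # deposit: each ant reinforces its own choices in proportion to its score
        for s, pep in solutions:
            for i, aa in enumerate(pep):
                pheromone[i][aa] += s / length
    return best, best_score

target = "GASPVLKF"
best, s = aco_sequence(prefix_masses(target), len(target))
print(best, s, "of", len(target))
```

The pheromone table plays the role of the shared memory through which the ants cooperate: no ant sees another's peptide, but every ant is biased by the residues that scored well for the colony as a whole.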



Author information

Correspondence to Robin Gras.


About this article

Cite this article

Gras, R., Hernandez, D., Hernandez, P. et al. Cooperative Metaheuristics for Exploring Proteomic Data. Artificial Intelligence Review 20, 95–120 (2003). https://doi.org/10.1023/A:1026080413328

  • DOI: https://doi.org/10.1023/A:1026080413328
