Cooperative Metaheuristics for Exploring Proteomic Data

Gras, Robin; Hernandez, David; Hernandez, Patricia; Zangge, Nadine; Mescam, Yoann; Frey, Julien; Martin, Olivier; Nicolas, Jacques; Appel, Ron D.

doi:10.1023/A:1026080413328

Published: October 2003

Artificial Intelligence Review, Volume 20, pages 95–120 (2003)


Abstract

Most combinatorial optimization problems cannot be solved exactly. A class of methods called metaheuristics has proven effective at producing good approximate solutions in a reasonable time. Cooperative metaheuristics are a subset of metaheuristics in which several entities explore the search space in parallel and exchange information with one another. The importance of information exchange in the optimization process is related to the building block hypothesis of evolutionary algorithms, which rests on two questions: what is the pertinent information in a given potential solution, and how can this information be shared? A classification of cooperative metaheuristic methods according to the nature of the cooperation involved is presented, and the specific properties of each class, as well as a way to combine them, are discussed. Several improvements in the field of metaheuristics are also given. In particular, a method is proposed to regulate the use of classical genetic operators and to define new, more pertinent ones, taking advantage of a building-block-structured representation of the explored space. Finally, a hierarchical approach resting on multiple levels of cooperative metaheuristics is presented, leading to the definition of a complete concerted cooperation strategy. Some applications of these concepts to difficult proteomics problems, including automatic protein identification, biological motif inference and multiple sequence alignment, are presented.
For each application, an innovative method based on the cooperation concept is given and compared with classical approaches. In the protein identification problem, a first level of cooperation using swarm intelligence is applied to the comparison of mass spectrometric data with a biological sequence database, followed by a genetic programming method to discover an optimal scoring function. The multiple sequence alignment problem is decomposed into three steps involving several evolutionary processes to infer different kinds of biological motifs and a concerted cooperation strategy to build the sequence alignment according to the motif content of the sequences.
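
The idea of parallel exploration with information exchange can be illustrated with a small sketch. The code below is not the method described in the article; it is an island-model genetic algorithm, one common form of cooperative metaheuristic, applied to a toy objective (OneMax, counting 1-bits). All names (`onemax`, `evolve`, `island_search`), the ring migration topology and the parameter values are assumptions chosen for illustration only.

```python
import random


def onemax(bits):
    """Toy fitness: number of 1-bits (a stand-in for a real objective)."""
    return sum(bits)


def evolve(pop, rng, mutation_rate):
    """One generation: binary tournament, uniform crossover, bit-flip mutation."""
    def pick():
        a, b = rng.sample(pop, 2)
        return a if onemax(a) >= onemax(b) else b

    children = []
    for _ in range(len(pop)):
        p1, p2 = pick(), pick()
        child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]
        child = [1 - g if rng.random() < mutation_rate else g for g in child]
        children.append(child)

    # Elitism: the best parent replaces the worst child, so the best
    # fitness on an island never decreases from one generation to the next.
    best_parent = max(pop, key=onemax)
    worst = min(range(len(children)), key=lambda i: onemax(children[i]))
    children[worst] = list(best_parent)
    return children


def island_search(n_islands=4, pop_size=20, n_bits=40, generations=60,
                  migrate_every=5, seed=0):
    """Several populations search in parallel and periodically share their best."""
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(n_bits)]
                for _ in range(pop_size)]
               for _ in range(n_islands)]

    for gen in range(1, generations + 1):
        islands = [evolve(pop, rng, 1.0 / n_bits) for pop in islands]

        if gen % migrate_every == 0:
            # Information exchange (assumed ring topology): each island
            # receives a copy of its neighbour's best individual, which
            # replaces the island's current worst individual.
            bests = [list(max(pop, key=onemax)) for pop in islands]
            for i, pop in enumerate(islands):
                migrant = bests[(i - 1) % n_islands]
                worst = min(range(pop_size), key=lambda j: onemax(pop[j]))
                pop[worst] = migrant

    return max((ind for pop in islands for ind in pop), key=onemax)


best = island_search()
print(onemax(best), "of", len(best), "bits set")
```

Setting `migrate_every` larger than `generations` disables migration and turns the run into independent searches, which gives a simple baseline for judging how much the exchange of information contributes on a given problem.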

Author information

Authors and Affiliations

Swiss Institute of Bioinformatics, CMU, 1 rue Michel Servet, CH-1211, Geneva 4, Switzerland
Robin Gras, David Hernandez, Patricia Hernandez, Nadine Zangge, Yoann Mescam, Julien Frey, Olivier Martin & Ron D. Appel
IRISA-INRIA, Rennes, France
Yoann Mescam & Jacques Nicolas
University of Geneva, Geneva, Switzerland
Ron D. Appel

Corresponding author

Correspondence to Robin Gras.

About this article

Cite this article

Gras, R., Hernandez, D., Hernandez, P. et al. Cooperative Metaheuristics for Exploring Proteomic Data. Artificial Intelligence Review 20, 95–120 (2003). https://doi.org/10.1023/A:1026080413328

Issue Date: October 2003
DOI: https://doi.org/10.1023/A:1026080413328
