Abstract
Most combinatorial optimization problems cannot be solved exactly. A class of methods called metaheuristics has proved efficient at producing good approximate solutions in reasonable time. Cooperative metaheuristics are a subset of metaheuristics in which several entities explore the search space in parallel and exchange information with one another. The importance of information exchange in the optimization process is related to the building block hypothesis of evolutionary algorithms, which rests on two questions: what is the pertinent information in a given potential solution, and how can this information be shared? A classification of cooperative metaheuristics according to the nature of the cooperation involved is presented, and the specific properties of each class, as well as a way to combine them, are discussed. Several improvements in the field of metaheuristics are also given. In particular, a method is proposed to regulate the use of classical genetic operators and to define new, more pertinent ones, taking advantage of a building-block-structured representation of the explored space. Finally, a hierarchical approach based on multiple levels of cooperative metaheuristics is presented, leading to the definition of a complete concerted cooperation strategy. Applications of these concepts to difficult proteomics problems, including automatic protein identification, biological motif inference and multiple sequence alignment, are presented.
For each application, an innovative method based on the cooperation concept is given and compared with classical approaches. In the protein identification problem, a first level of cooperation using swarm intelligence is applied to the comparison of mass spectrometric data with biological sequence databases, followed by a genetic programming method that discovers an optimal scoring function. The multiple sequence alignment problem is decomposed into three steps, involving several evolutionary processes that infer different kinds of biological motifs and a concerted cooperation strategy that builds the sequence alignment according to their motif content.
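The cooperation idea summarized above (several entities exploring the search space in parallel and exchanging information) can be illustrated with a classic island-model genetic algorithm. The following is a minimal sketch under stated assumptions, not the authors' method: the islands are simulated sequentially rather than run in parallel, migration follows a hypothetical ring topology in which each island's best individual replaces the worst individual of the next island, and the fitness is a Royal Road style function that rewards only complete blocks of ones, a toy stand-in for building blocks. All names and parameter values (`BLOCK`, `N_BLOCKS`, `island_ga`, the migration interval) are illustrative assumptions.

```python
import random

BLOCK = 8        # hypothetical: bits per building block
N_BLOCKS = 8     # hypothetical: number of blocks
GENOME_LEN = BLOCK * N_BLOCKS

def royal_road(genome):
    """Royal Road style fitness: each block made entirely of ones scores BLOCK."""
    return sum(BLOCK for i in range(0, GENOME_LEN, BLOCK)
               if all(genome[i:i + BLOCK]))

def tournament(pop, rng, k=2):
    """Binary tournament selection: fittest of k randomly sampled individuals."""
    return max(rng.sample(pop, k), key=royal_road)

def evolve(pop, rng, p_mut=1.0 / GENOME_LEN):
    """One generation of selection, one-point crossover and bit-flip mutation."""
    new_pop = []
    while len(new_pop) < len(pop):
        a, b = tournament(pop, rng), tournament(pop, rng)
        cut = rng.randrange(1, GENOME_LEN)
        child = a[:cut] + b[cut:]
        child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
        new_pop.append(child)
    return new_pop

def migrate(islands):
    """Information exchange (assumed ring topology): a copy of each island's
    best individual replaces the worst individual of the next island."""
    bests = [max(isl, key=royal_road) for isl in islands]  # taken before any replacement
    for i, best in enumerate(bests):
        target = islands[(i + 1) % len(islands)]
        worst = min(range(len(target)), key=lambda j: royal_road(target[j]))
        target[worst] = list(best)

def island_ga(n_islands=4, pop_size=30, generations=200, interval=10, seed=0):
    """Evolve islands independently and migrate every `interval` generations."""
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(GENOME_LEN)]
                for _ in range(pop_size)] for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        islands = [evolve(isl, rng) for isl in islands]
        if gen % interval == 0:
            migrate(islands)
    return max((ind for isl in islands for ind in isl), key=royal_road)

if __name__ == "__main__":
    best = island_ga()
    print(royal_road(best), "of", GENOME_LEN)
```

In this toy setting, the migration interval controls how much the subpopulations share: a short interval spreads good blocks quickly but reduces diversity, while a long one keeps the islands more independent.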
Cite this article
Gras, R., Hernandez, D., Hernandez, P. et al. Cooperative Metaheuristics for Exploring Proteomic Data. Artificial Intelligence Review 20, 95–120 (2003). https://doi.org/10.1023/A:1026080413328