An effective hybrid of hill climbing and genetic algorithm for 2D triangular protein structure prediction

Su, Shih-Chieh; Lin, Cheng-Jian; Ting, Chuan-Kang

doi:10.1186/1477-5956-9-S1-S19

Proceedings
Open access
Published: 14 October 2011

Proteome Science, Volume 9, article number S19 (2011)

Abstract

Background

Proteins play fundamental and crucial roles in nearly all biological processes, such as enzymatic catalysis, signal transduction, DNA and RNA synthesis, and embryonic development. Predicting the tertiary structure of a protein from its primary amino acid sequence has been a long-standing goal in molecular biology. Visual comparison showed that the 2D triangular lattice model gives better structure modeling and prediction for proteins with short primary amino acid sequences.

Methods

This paper proposes a hybrid of hill-climbing and genetic algorithm (HHGA), based on an elite-based reproduction strategy, for protein structure prediction on the 2D triangular lattice.

Results

The simulation results show that the proposed HHGA successfully solves the benchmark protein structure prediction problems. In particular, HHGA significantly outperforms conventional genetic algorithms and is comparable to the state-of-the-art method in terms of free energy.

Conclusions

Because local search strengthens the global search, the proposed HHGA achieves promising results on the 2D triangular protein structure prediction problem. The simulation results demonstrate the effectiveness of the proposed HHGA and the utility of the 2D triangular lattice model for protein structure prediction.

Introduction

Since the introduction of the HP lattice model [1], heuristic search algorithms for a variety of lattice models have been proposed and have proven useful for exploring the relationship between the primary amino acid sequence and its native folded structure, particularly in the protein folding problem (PFP) and protein structure prediction (PSP). The main purpose of the HP lattice model is to understand the physicochemical principles of protein folding by searching for the lowest free-energy conformation of a protein.

Despite their difference in modeling accuracy, both high-resolution and low-resolution models can help in understanding protein structures determined experimentally, for example by NMR or crystallography. Moreover, they have various applications in protein modification and in the study of protein-ligand and protein-protein interactions [2]. Table 1 summarizes the relationship between modeling accuracy and the related applications.

Table 1 The relationship between modeling accuracy and the related application.

To improve modeling accuracy, several lattice models have been developed. The present study visually compares four popular lattice models: the 2D square and 2D triangular lattice models, the 3D cubic lattice model, and the face-centered cubic (FCC) lattice model. The protein structures obtained with the four models were compared with reported 'real' biological protein structures. As Figure 1 shows, the 2D triangular lattice model gives better structure modeling and prediction for proteins with short primary amino acid sequences.

In solving this prediction problem, Hart and Istrail [4] first gave a 1/4-approximation algorithm for the 2D square lattice and a 3/8-approximation algorithm for the 3D cubic lattice, i.e., algorithms guaranteed to find foldings with at least 25% and 37.5% of the optimal number of H-H contacts, respectively. Agarwala et al. [5] gave a 6/11-approximation (about 55%) for the 2D triangular lattice, which is consistent with our experimental results.

Many researchers have focused on the square lattice model because it has many associated benchmarks, a large amount of data accumulated over the years, and published results from different strategies and modeling methods to compare against. By contrast, little work has been done on the 2D triangular lattice model. In this paper, we propose a genetic algorithm with an elite-based reproduction strategy (ERS-GA). Based on ERS-GA, we further develop a hybrid of hill-climbing and genetic algorithm (HHGA) for protein structure prediction on the 2D triangular lattice. Experiments were conducted to validate the effectiveness of this method.

The remainder of this paper is structured as follows. The Preliminaries section gives background and defines the protein structure prediction problem in the HP 2D triangular lattice model. The Methods section describes the proposed algorithms. The Numerical Results section presents and discusses the comparison of results, and the Conclusions section concludes the paper.

Preliminaries

Proteins play fundamental and crucial roles in nearly all biological processes, such as enzymatic catalysis, signal transduction, DNA and RNA synthesis, and embryonic development. Predicting the tertiary structure of a protein from its primary amino acid sequence has been a long-standing goal of molecular biology [6, 7]. This paper focuses on ab initio modeling, within which we consider the 2D HP triangular lattice model the best two-dimensional model for protein structure prediction at present.

HP lattice model

The HP lattice model [1] is the most frequently used lattice model. It is based on the observation that the hydrophobic interaction between amino acid residues is the driving force for protein folding and for the formation of the native state [8]. In this model, each amino acid is classified by its hydrophobicity as either H (hydrophobic or non-polar) or P (hydrophilic or polar). An HP sequence is embedded in the lattice as a self-avoiding walk (SAW), and conformations with more H-H interactions have lower free energy. The energy of a conformation is defined as the negative of the number of topological neighbor (TN) contacts, i.e., pairs of H residues that occupy adjacent lattice points but are not adjacent in the sequence. Figure 2 shows an example for the 2D triangular lattice model.
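The following minimal Python sketch makes the energy definition concrete. It assumes axial coordinates for the triangular lattice, where point (a, b) lies at a·(1, 0) + b·(1/2, √3/2) in the plane, and reads each fold symbol as an absolute direction. `DIRS`, `decode` and `hp_energy` are hypothetical names, not code from the paper.

```python
# Sketch: HP energy of a conformation on the 2D triangular lattice.
# Assumption: axial coordinates, point (a, b) lies at a*(1, 0) + b*(1/2, sqrt(3)/2),
# and each fold symbol is an absolute direction.
DIRS = {
    "R": (1, 0), "RU": (0, 1), "LU": (-1, 1),
    "L": (-1, 0), "LD": (0, -1), "RD": (1, -1),
}
NEIGHBOR_OFFSETS = set(DIRS.values())  # the six lattice neighbors of any point


def decode(moves):
    """Map n-1 fold directions to n lattice points; return None if the walk revisits a point."""
    pos = [(0, 0)]
    seen = {(0, 0)}
    for m in moves:
        dx, dy = DIRS[m]
        nxt = (pos[-1][0] + dx, pos[-1][1] + dy)
        if nxt in seen:
            return None  # not a self-avoiding walk
        pos.append(nxt)
        seen.add(nxt)
    return pos


def hp_energy(seq, moves):
    """Return -(number of H-H contacts between non-consecutive residues), or None if invalid."""
    pos = decode(moves)
    if pos is None:
        return None
    index = {p: i for i, p in enumerate(pos)}
    contacts = 0
    for i, (x, y) in enumerate(pos):
        if seq[i] != "H":
            continue
        for dx, dy in NEIGHBOR_OFFSETS:
            j = index.get((x + dx, y + dy))
            # j > i + 1 skips the covalent neighbor and counts each pair once
            if j is not None and j > i + 1 and seq[j] == "H":
                contacts += 1
    return -contacts


print(hp_energy("HPH", ["R", "LU"]))  # -1
```

In the example, the first and third residues are only two positions apart in the sequence yet touch on the lattice, closing a triangle with the second residue.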

Calculation of free energy

The free energy of a protein can be calculated by the following formulae [9]:

(1)

(2)

where the parameter

(3)

Protein folding can then be cast as an optimization problem: find the conformation with minimal free energy. Formally, given an HP sequence s = s₁s₂…sₙ, the problem is to find c* ∈ C(s) such that E(c*) = min{E(c) | c ∈ C(s)}, where C(s) is the set of all valid conformations of s [10].
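For reference, one standard way to write the HP free energy and the resulting optimization problem, consistent with the contact-counting definition of the HP lattice model, is shown below. The notation (residue positions r_i, contact weight ε, adjacency indicator Δ) is ours and is not necessarily identical to equations (1)-(3) of [9].

```latex
E(c) = \sum_{i=1}^{n-2} \sum_{j=i+2}^{n} \varepsilon(s_i, s_j)\,\Delta(r_i, r_j),
\qquad
\varepsilon(s_i, s_j) =
\begin{cases}
  -1 & \text{if } s_i = s_j = \mathrm{H},\\
  0  & \text{otherwise},
\end{cases}
\qquad
\Delta(r_i, r_j) =
\begin{cases}
  1 & \text{if } r_i \text{ and } r_j \text{ are lattice neighbors},\\
  0 & \text{otherwise},
\end{cases}

c^{*} = \arg\min_{c \in C(s)} E(c).
```

Starting the inner sum at j = i + 2 excludes covalently bonded neighbors, so only topological contacts contribute.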

Triangular lattice model

A significant drawback of the cubic lattice [5] is that two residues at an even distance in the primary sequence can never be in topological contact when the protein is embedded in this lattice. Likewise, on the square lattice, two amino acids in contact in any folding must be an odd distance apart in the protein sequence [5]. To address this issue, Joel et al. [11] introduced the 2D triangular lattice model. As Figure 3 shows, each lattice point has six neighbors in the two-dimensional triangular lattice. Since every residue except the first and the last has two covalent neighbors, an interior residue can be in topological contact with at most four other residues (the two end residues, with one covalent neighbor each, can have up to five). Thus, each interior residue is involved in at most four H-H contacts [11].
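The parity argument can be checked by brute force on short chains. This sketch, assuming axial coordinates for the triangular lattice, enumerates every self-avoiding walk of n residues on the square and triangular lattices and records the sequence distances at which two non-consecutive residues are lattice neighbors. `contact_distances` is a hypothetical helper; exhaustive enumeration grows exponentially with n, so it is only practical for very short chains.

```python
from itertools import product

SQUARE = [(1, 0), (-1, 0), (0, 1), (0, -1)]
# Triangular lattice in axial coordinates (assumption): the square moves plus two diagonals.
TRIANGULAR = SQUARE + [(1, -1), (-1, 1)]


def contact_distances(offsets, n):
    """Sequence distances j - i (j >= i + 2) at which residues touch, over all SAWs of n residues."""
    neighbors = set(offsets)
    found = set()
    for moves in product(offsets, repeat=n - 1):
        pos = [(0, 0)]
        for dx, dy in moves:
            pos.append((pos[-1][0] + dx, pos[-1][1] + dy))
        if len(set(pos)) < n:
            continue  # skip walks that revisit a lattice point
        for i in range(n):
            for j in range(i + 2, n):
                if (pos[j][0] - pos[i][0], pos[j][1] - pos[i][1]) in neighbors:
                    found.add(j - i)
    return found


print(sorted(contact_distances(SQUARE, 6)))      # [3, 5]: only odd distances
print(sorted(contact_distances(TRIANGULAR, 6)))  # [2, 3, 4, 5]: even distances as well
```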

With the unit vectors of the triangular lattice, protein conformations can be modeled on a two-dimensional lattice without the parity problem [5]. However, finding the minimum-energy conformation of a protein modeled as a self-avoiding walk on a lattice is NP-complete [12]. To tackle this problem, several heuristic search algorithms [13–18] have been developed for various lattice models. Backofen and Will [21] used advanced techniques such as constraint programming to calculate all optimal side-chain structures of a given sequence and proved their optimality [3]. Further, Böckenhauer et al. [15] extended the library to the 2D triangular lattice and implemented a pull move set for triangular lattice models.

In this paper, we develop an effective hybrid of local search and a genetic algorithm (GA) to solve this problem. Its performance is examined and compared with the results reported in [15]. The proposed algorithm is detailed in the next section.

Methods

This paper introduces the elite-based reproduction strategy into a GA, yielding ERS-GA. Further, we propose a hybrid of hill-climbing and ERS-GA, called HHGA, for protein structure prediction on the 2D triangular lattice. In essence, the proposed HHGA combines a global search algorithm with a local search operator. In other words, HHGA works within the framework of ERS-GA and adopts hill climbing to enhance its exploitation capability. Figures 4 and 5 show the flow charts of the proposed ERS-GA and HHGA. The following subsections describe the operators of ERS-GA and HHGA.

Initialization

For an input amino acid sequence of length n, a candidate conformation on the 2D triangular lattice [11, 14] is encoded as a chromosome: a string of length n – 1 over the symbols {L, R, LU, LD, RU, RD}, denoting the fold directions left, right, left-up, left-down, right-up and right-down, respectively. The initial population is generated randomly in this (n – 1)-dimensional space within a predetermined range. In this paper, the population size was set to 200 empirically.
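A minimal sketch of the encoding and random initialization, assuming that each symbol is an absolute direction on the triangular lattice (axial coordinates). The paper does not state how self-colliding chromosomes are handled at initialization, so this sketch assumes a growth procedure that steps only onto free lattice points and restarts when the chain traps itself. `random_chromosome` and `init_population` are hypothetical names.

```python
import random

SYMBOLS = ["L", "R", "LU", "LD", "RU", "RD"]
# Assumed axial-coordinate step for each symbol on the triangular lattice.
DIRS = {"R": (1, 0), "RU": (0, 1), "LU": (-1, 1), "L": (-1, 0), "LD": (0, -1), "RD": (1, -1)}


def random_chromosome(n, rng):
    """Grow a random self-avoiding conformation of n residues, returned as n-1 fold directions."""
    while True:  # restart whenever the chain traps itself
        pos, seen, genes = (0, 0), {(0, 0)}, []
        for _ in range(n - 1):
            free = [s for s in SYMBOLS
                    if (pos[0] + DIRS[s][0], pos[1] + DIRS[s][1]) not in seen]
            if not free:
                break  # dead end: start over
            s = rng.choice(free)
            pos = (pos[0] + DIRS[s][0], pos[1] + DIRS[s][1])
            seen.add(pos)
            genes.append(s)
        if len(genes) == n - 1:
            return genes


def init_population(n, size=200, seed=None):
    """Population of `size` random chromosomes (200 by default, as in the paper)."""
    rng = random.Random(seed)
    return [random_chromosome(n, rng) for _ in range(size)]


population = init_population(20, seed=1)
print(len(population), len(population[0]))  # 200 19
```

Growing the chain step by step keeps every initial chromosome feasible, at the price of a sampling bias relative to uniform sampling over all self-avoiding walks.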

Each chromosome in the population must be evaluated for its fitness. Here, the free energy of equation (2) is used directly as the fitness function. The goal of an optimization algorithm such as HHGA is to minimize the fitness value, namely, the free energy. The evaluated chromosomes are sorted by fitness value, and this sorted population serves as the basis of the subsequent reproduction process.

Elite-Based Reproduction Strategy (ERS)

Reproduction is a process in which the information of candidate solutions is modified and copied, depending on their fitness values. Reproduction in a GA consists of selection, crossover, and mutation. For ERS-GA and HHGA, this study adopts the elite-based reproduction strategy, which copies the top half of the population directly into the next generation and generates offspring by applying crossover and mutation to the second half of the population [19]. In the experiments, this study uses two-point crossover with crossover rate 0.8 and uniform mutation with mutation rate 0.4.
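One elite-based reproduction step, including the fitness-based sorting described earlier, can be sketched as follows. The fitness is passed in as a function (lower is better); infeasible, self-colliding chromosomes are assumed to receive `float("inf")` so that they sink into the bottom half. The paper does not specify whether the mutation rate of 0.4 applies per chromosome or per gene; this sketch assumes per chromosome, redrawing one random gene. All function names are hypothetical.

```python
import random

SYMBOLS = ["L", "R", "LU", "LD", "RU", "RD"]


def two_point_crossover(a, b, rng):
    """Swap the genes between two distinct random cut points (chromosomes need >= 3 genes)."""
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def uniform_mutation(genes, rng):
    """Assumption: redraw one randomly chosen gene uniformly from the six directions."""
    g = list(genes)
    g[rng.randrange(len(g))] = rng.choice(SYMBOLS)
    return g


def ers_generation(population, fitness, rng, p_cross=0.8, p_mut=0.4):
    """One elite-based reproduction step; lower fitness (free energy) is better."""
    ranked = sorted(population, key=fitness)   # best first
    half = len(ranked) // 2
    elites = ranked[:half]                     # copied unchanged to the next generation
    rest = ranked[half:]
    rng.shuffle(rest)                          # random pairing within the second half
    offspring = []
    for k in range(0, len(rest) - 1, 2):
        a, b = rest[k], rest[k + 1]
        if rng.random() < p_cross:
            a, b = two_point_crossover(a, b, rng)
        offspring += [a, b]
    if len(rest) % 2:
        offspring.append(rest[-1])             # odd one out passes through
    offspring = [uniform_mutation(c, rng) if rng.random() < p_mut else c
                 for c in offspring]
    return elites + offspring
```

Keeping the elites untouched guarantees that the best free energy found never gets worse from one generation to the next.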

Local search

Two local search operators are proposed for the protein structure prediction problem. First, given the current solution, local search I generates neighboring solutions in a way similar to the mutation operation, i.e., by randomly changing the direction of a residue. If a neighbor has a better fitness value than the current solution, it replaces the current solution.
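Local search I can be sketched as a simple hill climber. Assumptions: the fitness function is passed in (lower is better), a neighbor differs from the current solution in exactly one fold direction, and a hypothetical budget `tries` bounds the number of neighbors sampled, since the paper does not give one.

```python
import random

SYMBOLS = ["L", "R", "LU", "LD", "RU", "RD"]


def local_search_1(genes, fitness, rng, tries=50):
    """Hill climbing with mutation-like moves (lower fitness = lower free energy = better).

    Each trial redraws the direction of one random residue; the neighbor replaces the
    current solution only if it is strictly better. `tries` is a hypothetical budget.
    """
    best, best_fit = list(genes), fitness(genes)
    for _ in range(tries):
        cand = list(best)
        k = rng.randrange(len(cand))
        cand[k] = rng.choice([s for s in SYMBOLS if s != cand[k]])
        f = fitness(cand)
        if f < best_fit:
            best, best_fit = cand, f
    return best, best_fit
```

With the free energy as fitness, invalid conformations would be mapped to `float("inf")` so that they are never accepted.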

In local search II, neighbors are generated in a way similar to the crossover operation. That is, five neighbors are created by rotating the directions of the segment after the crossover point by 60°, 120°, 180°, 240° and 300°, respectively. If any of these five folds yields a better fitness than the original, it replaces the current solution.
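Because each symbol is an absolute direction, rotating every direction after a cut point by a multiple of 60° rigidly rotates the rest of the chain around the residue at that point (a pivot move). The sketch below assumes axial coordinates, in which the six directions listed counterclockwise are R, RU, LU, L, LD, RD, and keeps the best of the five rotated neighbors if it improves on the current solution; the paper does not say which neighbor is chosen when several improve. `rotate_tail` and `local_search_2` are hypothetical names.

```python
import random

# Directions listed counterclockwise, 60 degrees apart (axial coordinates assumed).
ORDER = ["R", "RU", "LU", "L", "LD", "RD"]


def rotate_tail(genes, point, k):
    """Rotate every direction from index `point` on by k * 60 degrees (pivot at residue `point`)."""
    return genes[:point] + [ORDER[(ORDER.index(g) + k) % 6] for g in genes[point:]]


def local_search_2(genes, fitness, rng):
    """Try rotations of 60..300 degrees at one random cut point; keep the best improving one."""
    point = rng.randrange(1, len(genes))  # point 0 would only rotate the whole chain
    best, best_fit = list(genes), fitness(genes)
    for k in range(1, 6):
        cand = rotate_tail(genes, point, k)
        f = fitness(cand)
        if f < best_fit:
            best, best_fit = cand, f
    return best, best_fit


print(rotate_tail(["R", "R", "R"], 1, 1))  # ['R', 'RU', 'RU']
```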

Termination condition

A genetic algorithm requires a termination condition to stop the evolutionary process and return the final result. In this study, ERS-GA and HHGA were run for a maximum of 200 generations. The best chromosome in the final population is then returned as the result.
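Putting the steps together, the HHGA loop with the fixed 200-generation termination condition can be sketched as below. The operators are passed in as functions, so any initialization, reproduction and local search with these signatures can be plugged in. Applying hill climbing only to the best chromosome of each generation is our assumption, as the paper does not say which chromosomes are refined; `hhga` is a hypothetical name.

```python
import random


def hhga(init_population, fitness, reproduce, local_search, rng, generations=200):
    """Skeleton of the HHGA main loop; lower fitness (free energy) is better.

    Assumption: hill climbing refines the best chromosome of each generation.
    `reproduce(pop, fitness, rng)` returns a new list; `local_search(genes, fitness, rng)`
    returns (genes, fitness value).
    """
    population = init_population(rng)
    for _ in range(generations):                      # termination: fixed generation budget
        population = reproduce(population, fitness, rng)
        population.sort(key=fitness)                  # best chromosome first
        population[0], _ = local_search(population[0], fitness, rng)
    return min(population, key=fitness)               # best chromosome is the final result
```

Passing a local search that returns its input unchanged, such as `lambda g, f, r: (g, f(g))`, turns the skeleton into a plain ERS-GA loop.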

Numerical Results

Table 2 lists the eight benchmark sequences used in our experiments. These sequences have been used as benchmarks for the 2D square HP model [20]; for the 2D triangular HP model, however, their minimum energies were still unknown. Comparison with previous studies provides a means of demonstrating the effectiveness of the method described here.

Table 2 The benchmarks for the 2D triangular lattice HP model.

The experiments were conducted in two steps. First, ERS-GA was used to predict protein structures in order to evaluate its efficacy. Tables 3 and 4 summarize the results and compare them with prior work. According to Table 3, the proposed ERS-GA significantly outperforms the simple genetic algorithm (SGA) and the hybrid genetic algorithm (HGA).

Table 3 Comparison of the proposed approach with the simple genetic algorithm (SGA) and hybrid genetic algorithm (HGA).

Table 4 Comparison of a hybrid of hill-climbing and GA (HHGA) with the tabu search (TS).

Next, HHGA integrates hill-climbing local search into ERS-GA to improve performance. Table 5 shows that this hybrid effectively improves on ERS-GA, and Table 4 shows that it performs comparably to the tabu search (TS) proposed in [15]. These results indicate that HHGA is competitive with the state-of-the-art method in protein structure prediction. Figure 6 plots the structures obtained by HHGA for the eight protein sequences.

Table 5 Comparison of ERS-GA with HHGA in free energy obtained (Mean/Best) and average running time.

Table 5 further compares ERS-GA with HHGA, where each algorithm was run 30 times. The average running time was measured on Intel i7-920 machines. The experimental results show that HHGA achieves better solution quality, i.e., lower energy, than ERS-GA on all the benchmarks, which validates the effectiveness of the local search in HHGA. On the other hand, HHGA gains this advantage at the cost of longer running time.

Conclusions

Among ab initio techniques, the lattice model is one of the most frequently used methods in protein structure prediction. Visual comparison showed that the 2D triangular lattice model can yield better structure modeling and prediction for proteins with short primary amino acid sequences. At the same time, the 2D triangular lattice model has rarely been used in protein structure prediction.

This paper has highlighted this issue and provided a short introduction to working with 2D triangular lattice models. Furthermore, it has proposed the genetic algorithm with an elite-based reproduction strategy (ERS-GA) and a hybrid of hill-climbing and genetic algorithms (HHGA) for protein structure prediction on the 2D triangular lattice. The simulation results show that ERS-GA and HHGA can be applied successfully to protein structure prediction. These results validate the effectiveness of the proposed algorithms and demonstrate that the 2D triangular lattice model is promising for protein structure prediction.

References

[1] Lau KF, Dill KA: Lattice statistical mechanics model of the conformation and sequence space of proteins. Macromolecules 1989, 22: 3986–3997. 10.1021/ma00200a030
[2] Sali A, Kuriyan J: Challenges at the frontiers of structural biology. Trends in Genetics 1999, 15: M20-M24. 10.1016/S0168-9525(99)01908-3
[3] Mann M, Smith C, Rabbath M, Edwards M, Will S, Backofen R: CPSP-web-tools: a server for 3D lattice protein studies. Bioinformatics 2009, 25: 676–677. 10.1093/bioinformatics/btp034
[4] Hart WE, Istrail S: Fast protein folding in the Hydrophobic-Hydrophilic model within three-eighths of optimal (extended abstract). Proceedings of the 27th Annual ACM Symposium on Theory of Computing (STOC '95) 1995, 157–168.
[5] Decatur S, Batzoglou S: Protein folding in the Hydrophobic-Polar model on the 3D triangular lattice. 6th Annual MIT Laboratory for Computer Science Student Workshop on Computing Technologies 1996.
[6] Mirsky AE, Pauling L: On the structure of native, denatured and coagulated proteins. Proc. Natl. Acad. Sci. USA 1936, 22: 439–447. 10.1073/pnas.22.7.439
[7] Orengo CA, Todd AE: From protein structure to function. Curr. Opin. Struct. Biol. 1999, 9: 374–382. 10.1016/S0959-440X(99)80051-7
[8] Guo YZ, Feng EM, Wang Y: Optimal HP configurations of proteins by combining local search with elastic net algorithm. Journal of Biochemical and Biophysical Methods 2007, 70: 335–340. 10.1016/j.jbbm.2006.08.001
[9] Huang C, Yang X, He Z: Protein folding simulations of 2D HP model by the genetic algorithm based on optimal secondary structures. Computational Biology and Chemistry 2010, 34: 137–142. 10.1016/j.compbiolchem.2010.04.002
[10] Shmygelska A, Hoos HH: An ant colony optimisation algorithm for the 2D and 3D hydrophobic polar protein folding problem. BMC Bioinformatics 2005, 6: 30. 10.1186/1471-2105-6-30
[11] Joel G, Martin M, Minghui J: RNA folding on the 3D triangular lattice. BMC Bioinformatics 2009, 10: 369. 10.1186/1471-2105-10-369
[12] Crescenzi P, Goldman D, Papadimitriou C, Piccolboni A, Yannakakis M: On the complexity of protein folding. Journal of Computational Biology 1998, 5: 423–465. 10.1089/cmb.1998.5.423
[13] Unger R, Moult J: Genetic algorithms for protein folding simulations. Journal of Molecular Biology 1993, 231: 75–81. 10.1006/jmbi.1993.1258
[14] Hoque MT, Chetty M, Dooley LS: A hybrid genetic algorithm for 2D FCC hydrophobic–hydrophilic lattice model to predict protein folding. Advances in Artificial Intelligence, Lecture Notes in Computer Science 2006, 4304: 867–876.
[15] Böckenhauer HJ, Ullah AD, Kapsokalivas L, Steinhöfel K: A Local Move Set for Protein Folding in Triangular Lattice Models. Algorithms in Bioinformatics, LNCS 2008, 5251: 369–381. 10.1007/978-3-540-87361-7_31
[16] Albrecht AA, Skaliotis A, Steinhöfel K: Stochastic protein folding simulation in the three-dimensional HP-model. Computational Biology and Chemistry 2008, 32: 248–255. 10.1016/j.compbiolchem.2008.03.004
[17] Ullah AD, Kapsokalivas L, Mann M, Steinhöfel K: Protein Folding Simulation by Two-Stage Optimization. Computational Intelligence and Intelligent Systems, Communications in Computer and Information Science 2009, 51: 138–145.
[18] Zhao X: Advances on protein folding simulations based on the lattice HP models with natural computing. Applied Soft Computing 2008, 8: 1029–1040. 10.1016/j.asoc.2007.03.012
[19] Lin CJ, Hsu YC: Reinforcement hybrid evolutionary learning for recurrent wavelet-based neuro-fuzzy systems. IEEE Transactions on Fuzzy Systems 2007, 15: 729–745.
[20] Jiang T, Cui Q, Shi G, Ma S: Protein folding simulations for the hydrophobic-hydrophilic model by combining tabu search with genetic algorithms. Journal of Chemical Physics 2003, 119: 4592–4596. 10.1063/1.1592796
[21] Backofen R, Will S: A constraint-based approach to fast and exact structure prediction in three-dimensional protein models. Constraints 2006, 11: 5–30. 10.1007/s10601-006-6848-8

Acknowledgements

We would like to thank Dr. Roy Preece at Oxford Brookes University for proofreading the manuscript and Dr. Lihui Wang at Imperial College London for advice on writing the manuscript.

This article has been published as part of Proteome Science Volume 9 Supplement 1, 2011: Proceedings of the International Workshop on Computational Proteomics. The full contents of the supplement are available online at http://www.proteomesci.com/supplements/9/S1.

Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi, 62102, Taiwan, R.O.C
Shih-Chieh Su & Chuan-Kang Ting
Department of Computer Science and Information Engineering, National Chin-Yi University of Technology, Taichung, 41101, Taiwan, R.O.C
Cheng-Jian Lin

Corresponding authors

Correspondence to Cheng-Jian Lin or Chuan-Kang Ting.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

SSC carried out studies on the protein folding prediction models, participated in the design and experiments of the genetic algorithm, and drafted the manuscript. LCJ conceived of the study and participated in the design of the genetic algorithm. TCK conceived of the study, participated in the design and experiments of the genetic algorithm, and drafted the manuscript.

All authors read and approved the final manuscript.

Shih-Chieh Su, Cheng-Jian Lin and Chuan-Kang Ting contributed equally to this work.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Su, SC., Lin, CJ. & Ting, CK. An effective hybrid of hill climbing and genetic algorithm for 2D triangular protein structure prediction. Proteome Sci 9 (Suppl 1), S19 (2011). https://doi.org/10.1186/1477-5956-9-S1-S19
