Membrane Biogenesis pp 85-101
Molecular Dynamics Simulations of Membrane Proteins
Molecular dynamics simulations are a powerful tool for complementing experimental studies, providing insights into biological processes at the molecular and atomistic level, at timescales from picoseconds to microseconds. Simulations are useful for testing hypotheses and can provide explanations for experimental observations as well as suggestions for further experiments. This does require that the simulation setup allows assessment of the question addressed. For example, to simulate a protein in its functional state, the protein model and the environment have to mimic the biological situation as closely as possible. In this chapter, a general strategy is presented for setting up and running simulations of membrane proteins of known structure in biological membranes of diverse composition and size.
Key words: Membrane proteins, Lipid bilayers, Molecular dynamics simulation, All-atom force field, Coarse-grained force field, MARTINI, Backmapping
In molecular dynamics (MD) simulations, particles are represented by soft spheres that interact through bonded and nonbonded interactions. This classical approximation allows using Newton’s equations of motion to determine the time evolution of a system, from which inferences can be made about the mechanics and kinetics of processes observed. The forces between particles are calculated for every configuration, based on a set of functions and associated semiempirical parameters, together forming a force field. Force fields typically contain parameters for bonds, angles, and torsion angles, as well as for nonbonded Coulombic and van der Waals interactions, but may contain additional terms, for example, explicit parameters for hydrogen bonds.
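The terms named above can be illustrated with a minimal sketch. It assumes the simplest common functional forms (a harmonic bond, a 12-6 Lennard-Jones potential, and a plain Coulomb term); real force fields differ in their exact functional forms, combination rules, and cutoff treatment. The function names are hypothetical, and the conversion factor is the one listed in the GROMACS manual for units of kJ/mol, nm, and elementary charges.

```python
# Electric conversion factor 1/(4*pi*eps0) in kJ mol^-1 nm e^-2
# (value as listed in the GROMACS manual; treat it as an assumption here).
COULOMB_FACTOR = 138.935458


def bond_energy(r, r0, k):
    """Harmonic bond energy 0.5 * k * (r - r0)^2 for bond length r (nm)."""
    return 0.5 * k * (r - r0) ** 2


def lennard_jones(r, sigma, epsilon):
    """12-6 Lennard-Jones energy describing van der Waals interactions."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, q1, q2, relative_permittivity=1.0):
    """Coulomb energy between two point charges (in units of e) at distance r (nm)."""
    return COULOMB_FACTOR * q1 * q2 / (relative_permittivity * r)


def pair_energy(r, sigma, epsilon, q1, q2):
    """Total nonbonded energy of one particle pair: van der Waals plus Coulomb."""
    return lennard_jones(r, sigma, epsilon) + coulomb(r, q1, q2)
```

In a simulation, the force on each particle follows from the negative gradient of the sum of such terms over all bonded and nonbonded pairs, evaluated anew for every configuration.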
Atomistic MD simulations can currently be performed for system sizes of up to a million atoms and simulation times in the microsecond range. Recent developments in specialized hardware have even enabled simulations of membrane proteins for hundreds of microseconds. Yet such resources are not commonly available, and the computational resources required for investigating relevant biological processes may often be considered excessive. The length and time scales accessible may be extended significantly by so-called coarse-grained models, at the cost, however, of accuracy and resolution. Both atomistic and coarse-grained models are now widely used in the field of membrane simulations. The type of simulation to be chosen depends very much on the particular problem, and the following questions should be considered: What is the timescale of the processes to be studied? How large should the membrane environment be? Is sufficient sampling expected in the simulation?
In all-atom simulations, the standard technique to study membrane proteins in a lipid bilayer is based on the insertion of the protein of interest into a pre-equilibrated bilayer of given composition and size, moving the lipids out of the way. A different strategy is based on building a bilayer around the protein, either by placing lipid by lipid around the protein or by spontaneous aggregation of lipids to form a micelle or a bilayer around the membrane protein [5, 6]. The latter methods require comparably long simulation times, i.e., up to hundreds of nanoseconds for the combined system, corresponding to several days of computational time on a high-performance compute cluster. An additional problem arises when the membrane into which the protein is inserted has a mixed composition. For single-component membranes, a merged system will be close to equilibrium, but in multicomponent membranes, specific interactions between the protein and the different lipids may cause the merged system to be far from equilibrium, requiring up to microseconds for re-sorting of the lipids.
Modeller (http://salilab.org/modeller/download_installation.html): A program for homology modeling. Note that it requires registration, but it is free for academic use.
Martinize.py (http://md.chem.rug.nl/cgmartini/index.php/downloads/tools/204-martinize): A program for generating coarse-grained structures and topologies from atomistic structure files of proteins and nucleic acids.
Insane (http://md.chem.rug.nl/cgmartini/index.php/downloads/tools/239-insane): A program for building membranes of arbitrary composition for coarse-grained simulations.
Backward (http://md.chem.rug.nl/cgmartini/index.php/downloads/tools/240-backward): A program for mapping a coarse-grained system back to an atomistic representation.
A further requirement is the polarizable MARTINI force field, version 2.1 [7, 8], which can be obtained from http://md.chem.rug.nl/cgmartini/index.php/component/content/article/1-latest-news/224-m22.
Additional topology and structure files, as well as simulation parameters and scripts, are available at http://www.biotechnik.nat.uni-erlangen.de/research/boeckmann/downloads/mps/
For visualization of molecular dynamics trajectories and for displaying protein or membrane structures, it is advised to install the VMD (Visual Molecular Dynamics) program: http://www.ks.uiuc.edu/Research/vmd/.
The steps described below are written to be used on a Linux-based workstation. On other platforms, the syntax may be different. After every step, it is important to check for errors and warnings reported by the software used. It is vital to verify that the results of every step are correct, to avoid wasting time and computational resources on propagating errors.
3.1 Preparing the Structure of the Membrane Protein
Start with downloading a structure of the membrane protein of your choice from the Protein DataBank (http://www.rcsb.org) or from the Orientations of Proteins in Membranes database (http://opm.phar.umich.edu/) (see Note 1).
Preparation
Before building a system from your structure, make sure to answer the following questions:
Is the structure complete or are residues or parts thereof missing?
Are there ligands, cofactors, (metal) ions, or posttranslational modifications, such as palmitoylation, which are required for the process of interest?
What is the oligomeric state of the protein? Is it active as a monomer? Does it require other proteins for stability and/or function?
Which amino acids belong to the extracellular part, which belong to the intracellular part, and which constitute the transmembrane region of the protein?
What is the composition of the membrane required for the process to be investigated? Are models for the components required available in the force field of choice?
What is the time scale required for the process of interest?
Correcting for missing heavy atoms from side chains.
If the protein is missing atoms other than hydrogens, these have to be built in using modeling software. Hydrogen atoms will be built automatically based on the heavy atom positions during generation of a topology in step 5, and are of no concern.
The program Modeller comes with a tool for filling in missing heavy atoms automatically. It is assumed that the protein structure is contained in a file in PDB format, called protein.pdb. The syntax for fixing the atoms then is
The program writes out a file protein_allatom.pdb containing only heavy atoms.
In some cases it may be convenient to manually repair a structure, for example, using GUI-based software, such as SwissPDB Viewer or PyMOL (see Note 2). In PyMOL, there is a mutagenesis wizard that can be used for rebuilding side chains by mutating the broken residue to the same type, which allows choosing between different rotamers.
Adding missing residues
It happens regularly that certain portions of a protein cannot be resolved in experiments due to increased mobility. For the termini this may not matter much, but missing loops in the protein sequence need to be handled with care. Modeller has an additional module to rebuild missing loops based on the sequence of the protein. Structures obtained from the PDB have the sequence of the protein listed, which can be extracted using
Note that this requires the structure file to be named protein.pdb. The sequence is written to a file named protein.seq. This file should be copied twice to a file with the name alignment.ali, in which the ‘structure’ line in the second copy should be replaced by ‘sequence’, and the missing residues should be inserted. Below is an example of such a file for the Melectin peptide with missing residues at positions 5–7 (Ile-Leu-Lys):
structureX:protein: 1 : :+16 : :::-1.00:-1.00
The next step, building the missing residues, requires editing the script http://www.biotechnik.nat.uni-erlangen.de/research/boeckmann/downloads/MPS/model-missingres.py. For the example given above, the residue range should be set to residues 5–7 (line 13):
In addition, the number of models to be generated has to be specified (lines 20–21):
a.starting_model= 1 # index of the first model
a.ending_model = 5 # index of the last model
After editing the script, Modeller can be run to fix the structure. The input structure file should be named protein.pdb and the file alignment.ali needs to be available:
As specified in the script 5 different structure files will be generated, named protein_fill.BL$modellercode.pdb. From these files, the best candidate should be chosen, based on visual inspection in, e.g., PyMOL. The selected structure should be renamed to protein_corrected.pdb.
Topology and protonation states
At this point, a correct starting structure should be available, from which the simulation system can be built. The first step in a simulation is bringing the structure in line with the force field chosen and constructing a topological description of the protein. In GROMACS, the tool for this operation is called pdb2gmx. This program also adds hydrogen atoms to the protein structure based on the positions of the heavy atoms and can be instructed to ignore hydrogens that are present in the file (-ignh). The program is run in interactive mode, which asks the user to specify protonation states for titratable residues (-inter) and termini (-ter):
pdb2gmx -f protein_corrected.pdb -ter -ignh -inter
Standard protonation at neutral pH may be chosen by removing the -inter option from the above command, but mind that this will neglect pKa changes due to the environment (see Note 3).
The program will ask for the protein force field and water type for the simulation. Mixing different, possibly incompatible force fields and lipid or solvent models should always be avoided, as it may lead to artifacts, e.g., in the protein–lipid interaction. For membrane protein simulations, the CHARMM36 and the Amber force fields [19, 20] currently offer parameters for both lipids and proteins. For a united atom force field, in which nonpolar hydrogens in methyl or methylene groups are not explicitly simulated, the GROMOS96 force field (preferentially 54A7) [21, 22] is widely used.
The command given above should yield an output structure named conf.gro and a corresponding topology in a file named topol.top.
Orientation of the protein in space.
In the case of membrane proteins, the orientation of the protein with respect to the membrane requires some attention. The protein should be positioned with the transmembrane region aligned with the membrane plane. Several methods are available for positioning the protein. One such method is used as the basis for the Orientations of Proteins in Membranes database, which provides oriented structures of the membrane proteins available in the Protein DataBank. However, it is often sufficient to orient the protein along its principal axes, although the validity of the result needs to be checked. Aligning the protein with GROMACS can be done using:
editconf -f conf.gro -princ -o protein_princ.pdb
If the resulting orientation is not correct, it is also possible to add the option -rotate to the command line, which takes three arguments, denoting rotation around x, y, and z, respectively.
At this point, the protein should be prepared, including the topology and the proper orientation with respect to the membrane. The next step is setting up and adding the membrane.
3.2 Membrane-Protein System Setup
Coarse-graining of the system and adding a lipid bilayer, water, and ions.
First the protein needs to be converted to a coarse-grained (CG) representation. This is done using the program martinize.py, which was introduced as a one-step solution for generating a coarse-grained structure and topology from a protein structure. The program uses the DSSP program for determining the secondary structure, the path to which needs to be provided. For an explanation of the options, the user is referred to the program's internal help (./martinize.py -h).
./martinize.py -f protein.pdb -nt -v -o topol.top -x protein_CG.pdb -p ALL -ff martini21p -dssp dssppath/dsspcmbi
Following this step, the structure is energy-minimized in vacuum to resolve stretched bonds that may be introduced by the conversion. This is done in two steps, using GROMACS. In the first step the structure, topology, and run parameters are combined into a run input file, using grompp. The second step runs the simulation using mdrun.
grompp -f martini_min.mdp -p topol.top -c protein_CG.pdb -o min_protein.tpr
mdrun -v -deffnm min_protein
Next a coarse-grained lipid bilayer is added around the protein, as well as water and ions. This step uses the program Insane (INSert (in) membrANE), which is a versatile tool for building coarse-grained membranes and solvent.
./insane.py -f min_protein.gro -o withbilayer.gro -pbc rectangular -x 10 -y 10 -z 10 -l POPC -sol PW -salt 0.15
In the example above, a cubic box of 10 nm side length is chosen, and a POPC lipid bilayer is added as well as water and ions (NaCl) at a physiological concentration of 0.15 M (see Note 5). See Note 6 for setting up asymmetric bilayers and defining the relative abundance of lipids in each leaflet.
The resulting protein/bilayer assembly should be checked in a molecular viewer. For visualization of bonds between coarse-grained beads one can use vmd together with the script cg_bonds.tcl. To load this script in VMD, type in the console of VMD:
cg_bonds -tpr topol.tpr -gmx gromacspath/gmxdump -cutoff 12
Bilayer and water equilibration.
The added bilayer and water resemble a crystalline arrangement (Fig. 2), which is quickly dispersed in a simulation run with position restraints on the protein. In this type of simulation, the CG protein beads are held in place using harmonic restraints. The insertion depth of the protein in the bilayer can still be adjusted, though, by motion of the surrounding lipids. Before running the simulation, it has to be verified that the topology file of the coarse-grained system contains definitions for all components, including the solvent and lipids used.
The system can then be energy-minimized, using a sequence of commands similar to that given before:
grompp -f martini_min.mdp -p topol.top -c withbilayer.gro -o min_all.tpr
mdrun -v -deffnm min_all
Subsequently, the position restraint simulation is run, again in a similar manner:
grompp -f posre.mdp -p topol.top -c min_all.gro -o CG_posre.tpr
mdrun -v -deffnm CG_posre
The system as built using martinize and insane may still be strained. To relax the system, a series of short runs has to be performed in the isothermal-isobaric ensemble (NpT, at constant temperature and pressure) with position restraints. In these runs the integration time step is gradually increased from 1 fs to 2 fs, 5 fs, 10 fs, and finally to 20 fs. The relaxation can be checked from the convergence of the simulation box size using the g_energy tool of GROMACS (compare Fig. 3).
The equilibration of a membrane may take from tens to hundreds of nanoseconds, or several microseconds for complex, multicomponent membranes.
Backmapping to all-atom representation.
In order to study the protein mechanics and protein–lipid interactions at atomic resolution, the system has to be converted to an all-atom representation (e.g., the Amber or CHARMM force field) or to a united atom representation (GROMOS), using a procedure called reverse transformation or backmapping. It may be useful to delay this step and perform a first stage of unrestrained coarse-grained simulation, to obtain a rough view of the energy landscape. Structures can then be selected from the resulting trajectory and converted back to atomistic resolution. In this way, larger conformational changes may be identified, which should, however, be interpreted carefully because current CG force fields for proteins are less accurate than atomistic force fields.
A detailed description of the procedure used here for backmapping will be given elsewhere. At the core of this method is the program backward, which constructs an atomistic starting structure from the coarse-grained positions. A unique feature of this procedure is that it requires only the coarse-grained structure and the atomistic topology for the conversion. The method is tailored to work with a native version of GROMACS, although the protocol can be easily implemented for other MD packages, without requiring changes in the code.
Here, the MARTINI structure CG_posre.gro is converted to CHARMM36 all-atom representation and written to aa_charmm.gro:
./initram.sh -f CG_posre.gro -o aa_charmm.gro -to charmm36 -p topol.top
Figure 4 shows part of a CG POPC bilayer together with the atomistic structure resulting from the backmapping.
3.3 Membrane-Protein Simulation
Relaxation
When a system is built by direct embedding in an atomistic membrane or by backmapping from a coarse-grained structure, it can be strained as a result of the treatment. This strain needs to be dissipated before running the production simulation, which is achieved by a series of simulations in which the system is gradually relaxed to the production simulation conditions. This means that at this point a clear notion of the simulation parameters is required. In particular, the following points, which are controlled by parameters in the simulation parameter (.mdp) file, need attention (see Note 7):
Treatment of nonbonded interactions. Several schemes are available for determining nonbonded interactions. Which scheme should be used depends on the force field chosen. Here, using CHARMM36, the particle mesh Ewald (PME) method should be used for Coulombic interactions and a shifted Lennard-Jones potential for van der Waals interactions. This is preset in the parameter files provided.
Temperature and pressure control. Because of the numerical methods used, it is necessary to control the temperature and pressure by coupling to an external bath. In most cases, physiological conditions should be used, meaning a pressure of 1 atm and a temperature of 298 K for laboratory conditions or 310 K for in vivo conditions. Several methods are available for control of temperature and pressure. Current opinion favors the use of the Parrinello–Rahman algorithm for control of pressure and either the Nosé–Hoover or the Bussi algorithm for control of temperature. As these may not be stable if the system is far from equilibrium, it is commonly advised to use the Berendsen algorithm for equilibration towards the target values, in particular for the pressure. The different thermodynamic properties of the membrane with the protein and of the solvent may cause energy to flow from one part to another, which has to be avoided by coupling the protein and the lipids to one temperature bath, and water and ions to a second bath.
The sequence for relaxation typically comprises the following steps:
MD at constant temperature and volume (no pressure coupling) with position restraints (see Note 8) on the protein to allow equilibration of the environment around the protein, without disturbing the protein's internal state:
grompp -f nvt.mdp -c aa_charmm.gro -p topol.top -o nvt.tpr
mdrun -deffnm nvt
MD at constant temperature and pressure with position restraints on the protein to allow further relaxation of the environment:
grompp -f npt.mdp -c nvt.gro -p topol.top -o npt.tpr
mdrun -deffnm npt
Short unrestrained MD under production conditions:
grompp -f pre.mdp -c npt.gro -p topol.top -o pre.tpr
mdrun -deffnm pre
With the system being relaxed, it is time to set up the production run. This requires specifying the length of the simulation by setting the time step to use and the number of steps to simulate. The total time required depends on the typical time of the process studied. Other points of attention are the frequency to write out structures and energies, whether and how often to write out velocities, and which groups to use for splitting the total energy into intra-group and inter-group interaction energies. The latter can be useful for investigating protein–lipid, protein–protein, or protein–ligand interactions. These settings are listed in the simulation parameter file md.mdp. The production simulation is run using the same grompp/mdrun combination as before:
grompp -f md.mdp -c pre.gro -p topol.top -o md.tpr
mdrun -deffnm md
If after the production simulation it appears that the system has not converged sufficiently, then the run should be extended.
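The relation between run length, time step, and number of steps described above is simple arithmetic, sketched below. The function names and the example values (100 ns, 2 fs, output every 5,000 steps) are illustrative assumptions, not settings taken from the md.mdp file provided with this chapter.

```python
def number_of_steps(total_time_ns, time_step_fs):
    """Number of integration steps needed to cover total_time_ns."""
    if total_time_ns <= 0 or time_step_fs <= 0:
        raise ValueError("simulation time and time step must be positive")
    # 1 ns = 1e6 fs
    return int(round(total_time_ns * 1.0e6 / time_step_fs))


def frames_written(n_steps, output_interval_steps):
    """Number of frames written when saving every output_interval_steps steps.

    The starting frame (step 0) is counted as well.
    """
    return n_steps // output_interval_steps + 1


# Hypothetical example: 100 ns production run with a 2 fs time step,
# writing coordinates every 5000 steps (every 10 ps).
steps = number_of_steps(100, 2)
frames = frames_written(steps, 5000)
```

Such an estimate is also useful when extending a run: the extra number of steps follows directly from the additional time required.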
3.4 Simulation Analysis
Simulation quality assurance.
Check the output logfile for error messages (file md.log) and visually inspect the simulation trajectory (traj.xtc), i.e., the file containing the system coordinates at specified time steps of your simulation (specified in the mdp file) and/or the final structure of your simulation in comparison to the starting structure. Particular attention should be paid to the formation of unexpected pores in the membrane, a separation of the lipid leaflets, ion aggregation on the protein surface, or loss of protein secondary structure. Such events may indicate problems in the chosen force field (combination) or the setup of the mixed protein–lipid system.
Inspection of the potential energies, temperature, volume, or box sizes (using the g_energy tool of GROMACS) provides valuable information about the equilibration of the biomolecular system, e.g., no drift in the lateral box size should be observed after equilibration.
Protein structure and quality assurance.
As a first step, calculate the root mean square deviation (RMSD) of the protein as a function of simulation time (use the GROMACS tool g_rms). The RMSD is a measure of the deviation of the protein structure from a reference, here the crystal structure. The fluctuations may be analyzed using the GROMACS tool g_rmsf, which indicates rigid and highly flexible regions. The secondary structure and its time evolution are analyzed per residue with the GROMACS tool do_dssp.
Check the distance between the images of your protein in the periodic simulation setup (GROMACS tool g_mindist with the flag -pi). The distance should at least be larger than twice the largest cutoff distance for nonbonded interactions (see Note 9).
The tilt angle of the whole protein or of its individual helices usually differs for different membrane thicknesses and lipid types. Its analysis (using, e.g., the GROMACS tool g_bundle) is instructive, e.g., for learning about the preferred lipid environment of the membrane protein.
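To make the RMSD used in the first check above concrete, the following sketch computes it for two sets of coordinates. It is a simplified illustration under stated assumptions: no least-squares superposition and no mass weighting are applied, whereas analysis tools usually superimpose each frame onto the reference first. The function name is hypothetical.

```python
import math


def rmsd(coords, reference):
    """Root mean square deviation between two equally long lists of (x, y, z).

    Assumes the structures are already superimposed; no fitting is performed.
    """
    if len(coords) != len(reference) or not coords:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = 0.0
    for (x, y, z), (xr, yr, zr) in zip(coords, reference):
        total += (x - xr) ** 2 + (y - yr) ** 2 + (z - zr) ** 2
    return math.sqrt(total / len(coords))


# Toy example: two atoms, both displaced by 0.3 nm along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 0.3), (1.0, 0.0, 0.3)]
deviation = rmsd(frame, reference)
```

Because no fitting is done, a rigid translation such as the one in the toy example contributes fully to the value; after superposition the same frame would give an RMSD of zero.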
Lipid bilayer around the protein.
The insertion depth of the protein in the membrane, i.e., the distance between the centers of mass of the protein and the bilayer, can be analyzed with the tool g_dist. For locating the insertion depths of individual residues in more detail, the tool g_density is well suited (compare the example given in Fig. 5).
Calculation of the diffusion coefficients of lipids as a function of the distance from the membrane protein provides insight into the strength of the lipid–protein interaction and into protein-induced long-range ordering effects in the membrane. Use the GROMACS tool g_msd (option -lateral z) to compute the diffusion coefficients of lipids (choose the center of mass of the lipid headgroups as a reference, and subtract the overall lateral motion of the whole leaflet).
GridMAT-MD  is a simple program calculating the area per lipid and bilayer thickness for protein-membrane complexes. Check the webpage http://www.bevanlab.biochem.vt.edu/GridMAT-MD/index.html for details and download.
To assess the fluidity of your membrane, the deuterium order parameters can be calculated for each individual chain (sn1 and sn2) by using g_order.
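For orientation, the deuterium order parameter is commonly defined as S_CD = 0.5 <3 cos²θ − 1>, where θ is the angle between a C–D (or C–H) bond and the bilayer normal and the average runs over lipids and time. The sketch below evaluates this definition for a list of bond vectors; it assumes the bilayer normal is the z axis and that explicit bond vectors are available, and it is not a description of how g_order works internally. The function name is hypothetical.

```python
import math


def deuterium_order_parameter(bond_vectors, normal=(0.0, 0.0, 1.0)):
    """Average of 0.5 * (3 * cos^2(theta) - 1) over all bond vectors.

    theta is the angle between each bond vector and the bilayer normal.
    """
    if not bond_vectors:
        raise ValueError("at least one bond vector is required")
    normal_length = math.sqrt(sum(c * c for c in normal))
    total = 0.0
    for vector in bond_vectors:
        length = math.sqrt(sum(c * c for c in vector))
        dot = sum(v * n for v, n in zip(vector, normal))
        cos_theta = dot / (length * normal_length)
        total += 0.5 * (3.0 * cos_theta ** 2 - 1.0)
    return total / len(bond_vectors)
```

Bonds parallel to the normal give 1, bonds perpendicular to it give -0.5, and an isotropic distribution averages to 0, so values closer to 0 indicate a more disordered, more fluid chain segment.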
Check for ligands used, e.g., for improved crystallization, chemically modified amino acids, or nonstandard amino acid names in the pdb file. Make sure that your final structure file contains only polypeptide chains.
If you suspect some amino acids adopting a nonstandard protonation, calculate first the pKa of the titratable groups. One example of such a program is the freely available mcce (Multi-Conformation Continuum Electrostatics) [25, 26].
If you already have a bilayer of interest at hand, the protein is not too large, and the orientation of the protein in the membrane is known, the easiest possibility to insert the protein into the lipid bilayer is the GROMACS tool g_membed . In this tool, the protein is first shrunken and inserted into the bilayer, removing overlapping lipids and water molecules. Subsequently, the protein is slowly resized to its original size, moving the lipids out of the way.
In a first step, orient the protein to the membrane normal and translate it to the membrane center (using editconf). Then run g_membed:
grompp -f membed.mdp -p topol.top -c solvated.pdb
g_membed -f topol.tpr -p topol.top -xyinit 0.1 -xyend 1.0 -nxy 1000 -v -n index.ndx
The lipid types currently supported by Insane (http://md.chem.rug.nl/cgmartini/index.php/downloads/tools/239-insane) are:
Phospholipids: DPPC, DHPC, DLPC, DMPC, DSPC, POPC, DOPC, DOPC, DAPC, DUPC, DPPE, DHPE, DLPE, DMPE, DSPE, POPE, DOPE, PPCS, DOPG, POPG, DOPS, and POPS
Glycolipids: DSMG, DSDG, DSSQ GM1, DGDG, MGDG, SQDG, CER, GCER, DPPI, PI, and PI34
However, be aware that all-atom topologies are not currently available for all of the lipids listed above.
The syntax of insane.py to generate a mixed symmetric bilayer is:
./insane.py -f protein.pdb -o withbilayer.gro -pbc rectangular -x 10 -y 10 -z 10 -l POPC:70 -l POPS:10 -l CHOL:20 -sol PW -salt 0.15
For preparing an asymmetric bilayer, additionally provide the flags -u (upper leaflet) and -l (lower leaflet):
./insane.py -f protein.pdb -o withbilayer.gro -pbc rectangular -x 10 -y 10 -z 10 -l POPC:70 -l POPS:10 -l CHOL:20 -sol PW -u POPC:80 -u CHOL:20 -salt 0.15
It may be convenient to choose the relative abundances in each leaflet such that they sum to 100.
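Rescaling arbitrary lipid ratios so that they sum to 100 can be sketched as follows. The helper names are hypothetical; the second function only assembles a flag string in the LIPID:VALUE form used in the commands above and makes no claim about which number formats insane.py accepts.

```python
def normalize_abundances(ratios):
    """Rescale a {lipid: ratio} mapping so that the values sum to 100."""
    total = sum(ratios.values())
    if total <= 0:
        raise ValueError("the sum of the lipid ratios must be positive")
    return {lipid: 100.0 * value / total for lipid, value in ratios.items()}


def leaflet_flags(abundances, flag="-l"):
    """Format abundances as repeated 'FLAG LIPID:VALUE' command-line options."""
    return " ".join(f"{flag} {lipid}:{value:g}" for lipid, value in abundances.items())


# Hypothetical example: a 7:1:2 mixture of POPC, POPS, and cholesterol.
lower = normalize_abundances({"POPC": 7, "POPS": 1, "CHOL": 2})
lower_flags = leaflet_flags(lower, flag="-l")
```

The resulting string can be pasted into the insane.py command line; for an asymmetric bilayer the same helper would be called a second time with flag="-u" for the upper leaflet.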
For an overview and explanation of all options for the simulation parameter file check the GROMACS manual or the webpage http://manual.gromacs.org/online/mdp_opt.html.
A file with restraints on selected atoms can be generated using the GROMACS tool genrestr. As input it requires a structure file as well as an index file with a group of atoms that should be restrained. Index files are most effectively generated using the make_ndx tool of GROMACS.
For charged systems it is recommended to have a minimum distance between the images of the protein of four times the Debye screening length (0.7–0.8 nm at physiological salt concentration of 0.15 M) to avoid self-interaction.
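The Debye screening length quoted above can be reproduced from the standard expression for an electrolyte, λ_D = sqrt(ε0 εr kB T / (2 N_A e² I)), with the ionic strength I in mol/m³. The sketch below assumes a 1:1 salt (so that the ionic strength equals the salt concentration), a temperature of 298 K, and a relative permittivity of 78.5 for water; these defaults are assumptions, not values from the chapter, and the function names are hypothetical.

```python
import math

VACUUM_PERMITTIVITY = 8.8541878128e-12  # F/m
BOLTZMANN = 1.380649e-23                # J/K
AVOGADRO = 6.02214076e23                # 1/mol
ELEMENTARY_CHARGE = 1.602176634e-19     # C


def debye_length_nm(ionic_strength_molar, temperature_k=298.0, relative_permittivity=78.5):
    """Debye screening length in nm for a given ionic strength in mol/L."""
    ionic_strength = ionic_strength_molar * 1000.0  # mol/L -> mol/m^3
    numerator = VACUUM_PERMITTIVITY * relative_permittivity * BOLTZMANN * temperature_k
    denominator = 2.0 * AVOGADRO * ELEMENTARY_CHARGE ** 2 * ionic_strength
    return math.sqrt(numerator / denominator) * 1.0e9  # m -> nm


def recommended_image_distance_nm(ionic_strength_molar, factor=4.0):
    """Minimum distance between periodic images as a multiple of the Debye length."""
    return factor * debye_length_nm(ionic_strength_molar)


screening = debye_length_nm(0.15)
min_distance = recommended_image_distance_nm(0.15)
```

With these assumptions, 0.15 M salt gives a screening length of just under 0.8 nm, so four times this length corresponds to a minimum image distance slightly above 3 nm.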
This work was supported by a grant from the Deutsche Forschungsgemeinschaft (BO 2963/2-1) to RAB.
- 2. Tieleman DP (2012) Computer simulation of membrane dynamics. In: Comprehensive biophysics, vol 5. Elsevier
- 13. Wassenaar TA, Sengupta D, Tieleman DP, Marrink SJ (in preparation) INSANE: fast and versatile generation of custom membranes for molecular simulations
- 14. Wassenaar TA, Pluhackova K, Böckmann RA, Marrink SJ, Tieleman DP (2013) Going backward: A flexible geometric approach to reverse transformation from coarse grained to atomistic models. (in preparation)
- 17. The PyMOL molecular graphics system, Version 22.214.171.124 Schrödinger, LLC