Ins and outs of AlphaFold2 transmembrane protein structure predictions

Transmembrane (TM) proteins are major drug targets, but determining their structure, a prerequisite for rational drug design, remains challenging. Recently, DeepMind's AlphaFold2 machine learning method greatly expanded the structural coverage of sequences with high accuracy. Since the employed algorithm did not take the specific properties of TM proteins into account, the reliability of the generated TM structures should be assessed. Therefore, we quantitatively investigated the quality of structures at genome scale, at the level of ABC protein superfamily folds, and for specific membrane proteins (e.g. dimer modeling and stability in molecular dynamics simulations). We tested template-free structure prediction with a challenging TM CASP14 target and several TM protein structures published after AlphaFold2 training. Our results suggest that AlphaFold2 performs well in the case of TM proteins and that its neural network is not overfitted. We conclude that cautious application of AlphaFold2 structural models will advance TM protein-associated studies to an unexpected degree.

Supplementary Information: The online version contains supplementary material available at 10.1007/s00018-021-04112-1.

(a) AF2 provided structures for regions that were difficult (the non-conserved extracellular loop 3 (EL3) and helices between the NBD and the linker, colored purple and red, respectively) or impossible (N-terminal helices, blue) to homology model. (b) The non-conserved β1 strand was misaligned in the homology model; this segment is also register-shifted in the ABCG8 experimental structure (Fig. 3a). TM6 was misaligned because of the preceding non-conserved EL3. (c) Based on RMSD (root mean square deviation), the homology model deviated from its initial structure more than the AF2 model did. (d) The structure was also stable on longer time scales (500 ns). (e) In MD simulations, the homology model exhibited higher fluctuations in EL3 and in the β-strands of NBD1 than those observed for the AF2 model (f). Warmer color and greater thickness indicate higher B-factors, which were calculated with the GROMACS rmsf tool for frames between 50 and 100 ns. The disordered regions of AtABCG36 (a.a. 1-39 and 809-863) were removed before MD system building, since they exhibited lower quality scores and could have interfered with the simulation.
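The two quantities used in panels (c), (e) and (f) can be illustrated with a minimal sketch. It is not the GROMACS implementation: the function names are hypothetical, the frames are assumed to be already superposed on a reference, and the B-factor is obtained from the per-atom RMSF through the standard isotropic relation B = (8π²/3)·RMSF². The 50 to 100 ns window follows the caption; everything else is an illustrative assumption.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) tuples.
    Assumes the structures are already superposed (no fitting is done here)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum((a - b) ** 2
                for pa, pb in zip(coords_a, coords_b)
                for a, b in zip(pa, pb))
    return math.sqrt(total / len(coords_a))

def select_window(times_ns, frames, start_ns=50.0, end_ns=100.0):
    """Keep only frames whose time stamp lies within [start_ns, end_ns]."""
    return [f for t, f in zip(times_ns, frames) if start_ns <= t <= end_ns]

def rmsf_per_atom(frames):
    """Per-atom root mean square fluctuation around the mean position.
    frames: list of frames, each a list of (x, y, z) tuples."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        msf = sum(sum((f[i][k] - mean[k]) ** 2 for k in range(3))
                  for f in frames) / n_frames
        result.append(math.sqrt(msf))
    return result

def rmsf_to_bfactor(rmsf):
    """Isotropic B-factor from RMSF: B = (8 * pi^2 / 3) * RMSF^2.
    The result is in squared units of the RMSF input."""
    return 8.0 * math.pi ** 2 / 3.0 * rmsf ** 2

# Toy trajectory: one atom oscillating along x, sampled every 25 ns.
times = [0.0, 25.0, 50.0, 75.0, 100.0]
traj = [[(0.0, 0.0, 0.0)], [(5.0, 0.0, 0.0)],
        [(-1.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], [(-1.0, 0.0, 0.0)]]
window = select_window(times, traj)
bfactors = [rmsf_to_bfactor(x) for x in rmsf_per_atom(window)]
```

With real trajectories, the coordinates would be read from the simulation output and fitted to a reference before either quantity is computed; the toy data above only serves to show the arithmetic.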

Fig. S2 Low thickness and low CCTOP reliability values may indicate structures forming complexes

Fig. S3 Structures used as reference ABC transmembrane domain folds

Fig. S6 Comparing MD simulations with the homology model and the AtABCG36 AF2 model (ACC: Q9XIE2).
Table S2 PFAM entries used to detect transmembrane ABC domains in the 21 proteomes with AF predictions