Structural models of the NaPi-II sodium-phosphate cotransporters
Progress towards understanding the molecular mechanisms of phosphate homeostasis through sodium-dependent transmembrane uptake has long been stymied by the absence of structural information about the NaPi-II sodium-phosphate transporters. For many other coupled transporters, even those unrelated to NaPi-II, internal repeated elements have been revealed as a key feature that is inherent to their function. Here, we review recent structure prediction studies for NaPi-II transporters. Attempts to identify structural templates for NaPi-II transporters have leveraged the structural repeat perspective to uncover an otherwise obscured relationship with the dicarboxylate-sodium symporters (DASS). This revelation allowed the prediction of three-dimensional structural models of human NaPi-IIa and flounder NaPi-IIb, whose folds were evaluated by comparison with available biochemical data outlining the transmembrane topology and solvent accessibility of various regions of the protein. Using these structural models, binding sites for sodium and phosphate were proposed. The predicted sites were tested and refined based on detailed electrophysiological and biochemical studies and were validated by comparison with subsequently reported structures of transporters belonging to the AbgT family. Comparison with the DASS transporter VcINDY suggested a conformational mechanism involving a large, two-domain structural change, known as an elevator-like mechanism. These structural models provide a foundation for further studies into substrate binding, conformational change, kinetics, and energetics of sodium-phosphate transport. We discuss future opportunities, as well as the challenges that remain.
Keywords: Transporter; Structure prediction; Homology modeling; Inverted-topology repeats; Repeat-swap modeling; Hidden Markov models
Biological roles of NaPi-II transporters
Despite the abundance of biochemical and electrophysiological data, no three-dimensional structural data is available for this family of transporters, hindering progress towards a detailed mechanism of transport. In such situations, computational techniques to predict protein structures or to analyze the amino acid sequences of related proteins can prove a valuable stopgap, aiding in the interpretation of the available structure-functional data and leading to new, experimentally testable hypotheses.
Advances in modeling tools
The accuracy of predicted protein structures depends primarily on the level of information available for structural homologs of the protein of interest, or target. If the structure of a close homolog of the target has been determined to high resolution, then that structure can be used as a template during a procedure known as homology modeling, in which the most similar regions of the protein structure are essentially copied, while more dissimilar regions are adjusted or inserted according to physicochemical or empirical rules of protein structure. Assuming that the appropriate relationship between the template and target proteins has been identified, namely, by accurate alignment of their primary sequences, available methods for homology modeling can construct protein models with high accuracy. For membrane proteins, whose structures diverge less during evolution than those of their water-soluble counterparts due to the constraints imposed by the membrane, the reliability of homology models is particularly high. For example, when the sequence alignment between the template and target proteins contains >40% identical residues, models built from those alignments are likely to be correct within ~1 Å of the native structure, at the level of the protein backbone in the transmembrane segments [11, 29]. As the similarity of the target to proteins of known structure decreases, however, several challenges arise. First, identification of the appropriate template structure becomes more difficult. Second, the likelihood of obtaining a reasonable alignment between the sequences of the target and template decreases. Finally, even in cases where the two proteins clearly share the same overall architecture, i.e., the same number, length, and spacing of transmembrane segments, the probability that the target adopts a structure similar to that of the template also diminishes.
Thus, for two proteins sharing 10% identical residues, the expected deviation of the model from the native structure can be as small as 1.5 Å or as large as 3.5 Å, considering only the backbone atoms in the transmembrane helices [11, 29]. And, of course, one has no way of knowing where on this spectrum the current prediction lies.
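The identity thresholds quoted above presuppose a way of counting identical residues in a pairwise alignment. The following is a minimal sketch of one common convention, in which identity is computed over columns where both sequences have a residue. The function name and the toy aligned strings are hypothetical, and other conventions (for example, dividing by the length of the shorter sequence) give different numbers.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over columns where neither sequence has a gap.

    Assumption: both strings come from the same pairwise alignment and
    therefore have equal length. The denominator convention is one of several.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a.upper() == b.upper():
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Toy alignment (made up for illustration, not real NaPi-II or VcINDY sequence).
toy_target = "QSSSLT-A"
toy_template = "QSNTLTGA"
toy_identity = percent_identity(toy_target, toy_template)
```

For distant homologs the value depends strongly on the alignment itself, so a figure such as 10% identity is only as meaningful as the alignment that produced it.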
The strategy of structure prediction by homology modeling as discussed so far assumes the availability of at least one structure of similar architecture. In the absence of such a template, a number of procedures have been developed that either assemble fragments of known structure or use evolutionary information from sequence homologs to identify constraints that, in turn, are used to guide model-building. Both of these template-free methods typically fail for proteins with longer sequences, while the evolutionary methods depend on the availability of a large number of suitably diverse sequence homologs.
Neither of the template-free strategies mentioned, however, can yet reach the reliability of homology modeling when a suitable template is available. Suitable, in this case, implies a structure with a similar overall architecture or “fold,” i.e., containing the same number or length of secondary structure elements arranged in the same relative positions in space. Notably, similar folds can be adopted by proteins with essentially no matching residues, in which case, the fold detection process becomes a matter of matching evolutionary patterns and structural elements rather than individual residues. While the classical search method, BLAST, for detecting sequence relatives was revolutionary in its speed, it nevertheless relies on exact sequence matching. Its powerful cousin, the PSI-BLAST search, incorporates the evolutionary history captured after an initial BLAST search so as to increase the sensitivity in subsequent searches and thereby detect more distantly related proteins. Even greater sensitivity can be achieved by tracking the likelihood of insertions and deletions in specific positions in the evolutionary record, through methods using Hidden Markov Models (HMM) as representations of the target or template, or both [34, 37]. The HMM profiles generated by the method HMMER, for example, comprise a set of aligned sequences combined with a secondary structure prediction averaged over all sequences in the set. In the case of the HHpred prediction tool, an HMM profile generated for the query sequence is scanned against a database containing the HMM of every structure in the Protein Data Bank (PDB).
Predicting the structural fold of NaPi-IIa transporters
Identifying the repeat units of NaPi-IIa using hydrophobicity profiles and HMMs
As mentioned above, at the turn of this century, NaPi-II transporters were believed to contain two sets of transmembrane helices, or structural repeats, separated by an extracellular loop. Each of these sets of helices contains a copy of a motif with the sequence QSSS. Based on the differences in accessibility of these motifs to either side of the membrane, the structural repeats were suggested to adopt an inverted orientation with respect to the membrane plane. However, the boundaries of these repeats were not clear (Fig. 1a). To determine which residues comprise each of the structural repeats and to establish if these two segments shared a common fold, we analyzed the hydropathy plots of these regions, taking advantage of the fact that proteins that share similar folds also share qualitatively similar hydrophobicity profiles. After dividing the full-length profile at the position of the long loop, we then aligned the two fragments, revealing a clear relationship between the first ~180 residues of each of the fragments. The C-terminal fragment, however, contained an extension with two strong peaks likely corresponding to two additional transmembrane segments (Fig. 1b). Based on this analysis, we concluded that NaPi-IIa contains two repeat units (RU1 and RU2) comprising approximately five transmembrane segments each and that together these repeats constitute the core fold of NaPi-IIa (Fig. 1c). Moreover, from analysis of an HMM profile representing all NaPi-IIa amino acid sequences, we observed two distinctive conserved segments corresponding to RU1 and RU2, each containing the conserved QSSS motif, in addition to a short segment on the C-terminal end of the profile. Using HHalign to align the HMM profile segments for the two conserved regions allowed us to assign the boundaries of the repeats to residues 86–256 and 335–489.
The C-terminal residues 504–564 were predicted to contain two transmembrane helices (TM11–12) that are not part of the core fold, but instead are likely to be located at the periphery of the protein structure (Fig. 1c).
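To make the notion of a hydrophobicity profile concrete, the sketch below computes a sliding-window average of per-residue hydropathy values. It assumes the Kyte-Doolittle scale and a 19-residue window, both of which are illustrative choices rather than the settings used in the studies reviewed here; the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; other scales are in common use).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Return the mean hydropathy of every full window along the sequence.

    The result has len(seq) - window + 1 values. Non-standard residue
    codes (e.g. 'X') raise KeyError rather than being silently skipped.
    """
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KD[aa] for aa in seq.upper()]
    total = sum(values[:window])
    profile = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        profile.append(total / window)
    return profile
```

Peaks in such a profile suggest candidate transmembrane segments. Splitting the sequence at the long loop and computing one profile per fragment gives the two curves that are then compared, as described above.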
Template detection using hydrophobicity profiles and HMM methods
The more detailed topology illustrated in Fig. 1c helped to delineate key features of the NaPi-II fold, but was still no substitute for a three-dimensional model of the transporter. Unfortunately, for many years, no structural templates for NaPi-II transporters could be identified using conventional methods such as PSI-BLAST, while the length of the protein (~560 residues) precluded template-free methods of structure prediction. Moreover, the peripheral helices predicted in the NaPi-II sequence were expected to further complicate the detection of distant sequence relationships. To address these challenges, Fenollar-Ferrer et al. questioned whether the sequence search methods might be overlooking a suitable template and adopted a more sensitive approach, namely, scanning the HMM profile of NaPi-IIa against the Protein Data Bank (PDB) using HHpred [18, 38]. This search identified several possible templates, albeit all assigned very low scores (E values ~1). Each of the putative templates was examined in detail, but one stood out: the Na+-coupled dicarboxylate transporter from Vibrio cholerae, VcINDY, which belongs to the dicarboxylate:sodium symporter (DASS) family. Not only did the VcINDY sequence align with the highest coverage (~62%) and identity (~7%) of all the putative templates, but the alignment also matched the conserved QSSS motif to a motif common to the DASS family. In the available structure of VcINDY, residues in this SNT motif contribute to the binding sites for Na+ and the anionic substrate, suggesting that, despite the low sequence identity between the two proteins, the binding regions are at least conserved. Moreover, the VcINDY structure contained a prominent inverted-topology structural repeat, as expected for NaPi-IIa.
The possibility that VcINDY could be a suitable template was put into question by the observation that its structure contains at least four more transmembrane segments than had been predicted for NaPi-II. Indeed, alignments of the full-length protein sequences using conventional methods suggested segment matching that was inconsistent with the known locations of the structural repeats and the core folds; specifically, those additional helices were inserted within the core of the NaPi-II transporter fold. This result reflects a common failure of alignment methods for very distant homologs of different lengths. Fenollar-Ferrer et al. circumvented this issue by adopting a strategy similar to that used for identifying the repeats within the NaPi-IIa fold. Specifically, the repeats of each protein were separated out and aligned in a pairwise manner, with the aim of reducing the chances that core helices become aligned to peripheral helices. Both hydrophobicity profile alignments and HMM profile alignments of the RU1 and RU2 segments of NaPi-II and VcINDY suggested that the core fold of the two proteins is similar even though the first two transmembrane helices of each of the repeats of VcINDY have no counterpart in NaPi-II proteins (Fig. 1d, e). These four helices of VcINDY are in fact peripheral and not part of the core fold responsible for binding of Na+ or substrates.
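The idea of aligning repeats by their hydrophobicity profiles, rather than by residue identity, can be illustrated with a simple global dynamic-programming alignment of two numeric profiles. This is a generic Needleman-Wunsch-style sketch with a linear gap penalty and a column score of minus the absolute hydropathy difference; it is not the tool or scoring scheme used in the studies described here, and the function name and gap penalty are assumptions.

```python
def align_profiles(p, q, gap_penalty=1.0):
    """Globally align two numeric profiles.

    Matching positions i and j scores -|p[i] - q[j]|; each gap costs
    gap_penalty. Returns (best_score, pairs), where pairs lists
    (i, j) index pairs and uses None for a gap in either profile.
    """
    n, m = len(p), len(q)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Borders are filled cumulatively so the traceback comparisons are exact.
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] - gap_penalty
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] - gap_penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] - abs(p[i - 1] - q[j - 1]),  # match
                score[i - 1][j] - gap_penalty,                   # gap in q
                score[i][j - 1] - gap_penalty,                   # gap in p
            )
    # Trace back from the bottom-right corner, preferring matches on ties.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] - abs(p[i - 1] - q[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] - gap_penalty:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return score[n][m], pairs
```

Aligning each repeat of one protein against the corresponding repeat of the other, instead of aligning the full-length profiles, restricts the search space in the way described above: helices outside the repeat cannot be matched to helices inside it.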
Taken together, the high sequence coverage, the qualitatively similar hydrophobicity profiles, the reasonable correspondence between helices when the HMM profiles are aligned, and the matching of conserved residues important for the function of the protein corroborated the choice of VcINDY as a suitable template for homology modeling of NaPi-II transporters. This result also suggested a new, much more complex and detailed transmembrane topology (Fig. 1f).
Building an initial model of human NaPi-IIa
The hNaPi-IIa structural model was also consistent with experimental data available in the literature at the time (Fig. 2d) [7, 16, 20, 21, 23, 24, 32, 41, 42]. In particular, cysteine-scanning mutagenesis (SCAM) data indicated high solvent accessibility of helix 1c, consistent with its location at the external surface of our model, and of loop L5ab, which is at the same depth as the substrate binding sites and, as a consequence, is accessible through the same aqueous pathway as the substrates [7, 23]. Similar experiments on Ser424 concluded that this residue was not exposed to the solvent, in agreement with a more buried position within HP2b in our model. Finally, the SCAM data obtained for TM3 is in agreement with its lipid-lining, buried location in the hNaPi-IIa model.
Predictions obtained from the initial model
Available structural models of NaPi-II transporters
Bolstered by the matching of the QSSS and SNT motifs, this model of hNaPi-IIa was also used to predict the binding sites for several of the substrates, including two of the three sodium ions required for transport. First, one of the sodium ions was modeled at the position of Na2 in VcINDY, where it could be readily coordinated by several suitable side chain and backbone groups from HP2ab and TM5 without additional modifications of the model. A second ion was tentatively modeled in the symmetric position, involving the equivalent segments from the other repeat, namely, HP1ab and TM2, consistent with the proposal from Wang and colleagues. Again, a number of suitable side chain and backbone groups were available for cation coordination in this region without further adjustment to the model. Finally, inorganic phosphate was modeled in between these two cations, almost exactly at the symmetry axis, and similar to the location of the anionic substrate in VcINDY. In this position, the double negative charge on the substrate would be expected to be compensated by the sodium ions on either side.
These modeled binding sites are predictions based primarily on homology, which helped to identify specific residues that might be responsible for binding. In addition, they raised the question of the location of a third sodium binding site, for which no equivalent was identified in the template.
Refining the NaPi-IIa model by iterative modeling and experimental validation
Refining models based on experimental data
The electrogenic isoform NaPi-IIa is characterized by a transport stoichiometry (Na+:HPO₄²⁻) of 3:1 and by voltage-dependent transport kinetics. It has been proposed that only two steps of the transport cycle are voltage dependent and that one of those two steps is the binding of the first Na+ ion to its binding site, which is referred to as Na1. The voltage dependency of transport by NaPi-IIa can be abolished by a single point mutation, D224G, rendering the transporter electroneutral. Subsequent studies explored the role of this residue in NaPi-IIa as well as the equivalent residue in the electroneutral isoform NaPi-IIc in more depth, concluding that Asp224 potentially coordinates the Na+ ion in the Na1 binding site [3, 31].
Together, these data were used to refine the 2014 model in three main regions. First, the alignment of TM2b was shifted so that residues Gln206 and Asp209 pointed towards TM3, while locating Thr211 further away and simultaneously positioning Thr200 to participate in either Na1 or Na2 binding sites (Fig. 3c). Next, the alignment of TM5 and TM6 was adjusted to position residue Ser447 closer to the known Na1-binding residues. As a consequence, residues Thr451 and Thr454 were placed in the Na3 binding site together with Gln417, Ser418, and Ser419 (from the QSSS motif of HP2; Fig. 3c, d). The resultant structural model is improved in the TM5-TM6 region (Table 1) according to the per-residue score from the empirical membrane protein model scoring function, ProQM. The largest improvement in score was observed for TM6, probably due to the repositioning of three arginine side chains into the cytosol and away from the hydrophobic core of the membrane.
The final refined model, published in 2015, represents the hNaPi-IIa state in which the transporter is loaded with three sodium ions occupying the Na binding sites Na1, Na2, and Na3 and with a phosphate molecule interacting with sodium ions at the Na2 and Na3 sites. Residues Thr200, Gln206, Asp209, and Asn227 coordinate one sodium ion at binding site Na1, while Arg210 and Asp224 form a salt bridge nearby. As mentioned above, the refinement also reorganized part of the Na3 binding site so that it is instead formed by residues Gln417, Ser418, Ser419, Thr451, and Thr454 (Figs. 3d, e).
For Na1, the final prediction involves residues from three different TM segments: TM2b, TM3, and TM5, which are far from one another in sequence. The fact that the experimental phenotype upon mutation of these residues is so consistent provides very strong support for the hypothesis that NaPi-II transporters share a common architecture with the DASS family to which VcINDY belongs.
Validation of the ion binding sites by structure comparison
Our computational studies of NaPi-II transporters indicate that this protein family has an overall architecture and core topology similar to that of VcINDY, even though the number of helices and their transmembrane orientation probably differ. More recently, X-ray structures of the transporters YdaH and MtrF [6, 39], which belong to the AbgT family, were compared with the structure of VcINDY, revealing a common two-domain fold (comprising the so-called transport and oligomerization domains) and demonstrating that the structures of the transport domains are particularly well conserved. The structure of YdaH was of particular interest, as a Na+ ion was detected in the second structural repeat. The coordination of this ion involved residues from hairpin HP2 and the helix TM7. This position is symmetric to the site of the Na+ ion bound to repeat 1 of VcINDY and involves equivalent elements to the proposed Na3 binding site in the most recent model of NaPi-IIa, i.e., HP2 and TM5. Structural comparison of YdaH and NaPi-IIa by aligning their transport domains indicated that the predicted Na3 site in NaPi-IIa is in excellent agreement with the position of the Na3 site in YdaH. Indeed, the ion at Na3 and the Cα carbons of residues Ser418 and Thr454 in NaPi-IIa are <2 Å from the ion and equivalent groups in YdaH. This observation provides strong validation of the refined hNaPi-IIa model.
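The comparison described above reduces, once the two transport domains have been superposed, to measuring distances between equivalent atoms. The sketch below shows only that final step. The coordinates are invented placeholders rather than values from either structure, the atom labels are hypothetical, and the superposition itself (for example by a least-squares fit) is assumed to have been done beforehand.

```python
import math


def paired_distances(model, reference):
    """Distance in Å between atoms sharing a label in two superposed coordinate sets."""
    return {
        label: math.dist(model[label], reference[label])
        for label in model
        if label in reference
    }


def all_within(distances, cutoff=2.0):
    """True if every paired distance is below the cutoff (Å)."""
    return all(d < cutoff for d in distances.values())


# Placeholder coordinates (Å), made up for illustration only.
model_atoms = {"Na3": (0.0, 0.0, 0.0), "S418_CA": (3.0, 1.0, 0.0), "T454_CA": (-2.0, 4.0, 1.0)}
reference_atoms = {"Na3": (1.0, 0.0, 0.0), "S418_CA": (3.0, 2.0, 0.0), "T454_CA": (-2.0, 4.0, 2.5)}

distances = paired_distances(model_atoms, reference_atoms)
agreement = all_within(distances, cutoff=2.0)
```

A per-atom check of this kind is stricter than a single averaged deviation, since one badly placed group cannot be masked by good agreement elsewhere.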
Examining conformational change using repeat-swap modeling
While the structural models of hNaPi-IIa reported in 2014 and 2015 provide important insights into the overall topology, they do not reveal a great deal about the mechanism by which the protein changes conformation so as to expose the binding sites to the opposite side of the membrane. For other secondary active transporters with inverted-topology repeats, it has been shown that a model of the state opposite to that observed experimentally can be constructed by exploiting the inherent asymmetry of the known structure [10, 12, 13]. Specifically, the asymmetry manifests as two distinct conformations for the repeat units. Thus, by exchanging their conformations (i.e., RU1 adopting the conformation of RU2, and vice versa), one can reveal the alternate state, i.e., with the binding site exposed to the other side of the membrane. In essence, this so-called repeat-swap modeling procedure is simply homology modeling, albeit using the two halves of the protein as templates for their counterparts simultaneously.
Repeat-swap modeling has been used to predict that VcINDY, the protein used as a template for modeling NaPi-IIa, uses a two-domain elevator-like mechanism. In this dramatic conformational change, observed previously for another transporter containing hairpins, GltPh [35, 36], the substrate binding site is moved in its entirety along with the rest of the transport domain, while another component of the transporter (typically the oligomerization interface) remains essentially static with respect to the membrane plane. The elevator-like conformational mechanism is quite distinct from mechanisms adopted by proteins such as LeuT, in which structural elements “rock” or make clam-shell-like movements around a central binding site. We note also that hybrid mechanisms, combining features of both rocking and elevator-like movements, may also be possible.
Unfortunately, although the inward-facing NaPi-IIb model was of reasonable quality according to the ProQM score (Table 1), this model was limited as it is missing the extracellular loop connecting TM3 and TM4a, as well as the last two transmembrane helices, in addition to being a monomer (as the dimer interface is unknown). The absence of the long extracellular loop in particular prevented a conclusive comparison or validation based on voltage-clamp fluorometry measurements carried out to examine the conformational change. Thus, although the biophysical measurements led to the conclusion that this protein undergoes a large movement similar to that predicted in an elevator-like mechanism, the details of the conformational change remain to be firmly established for the NaPi-II transporters.
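The bookkeeping at the heart of repeat-swap modeling is a residue mapping: each residue in one repeat is assigned the residue in the other repeat whose conformation it will borrow. The sketch below derives that mapping from a gapped alignment of the two repeats. The function name and the toy aligned strings are hypothetical; only the starting residue numbers (86 and 335) are taken from the repeat boundaries quoted earlier, and building the actual three-dimensional model from the mapping would require a homology modeling program.

```python
def repeat_swap_map(aligned_ru1, aligned_ru2, start_ru1, start_ru2, gap="-"):
    """Map residue numbers across two aligned repeats, in both directions.

    aligned_ru1 and aligned_ru2 are equal-length gapped strings from an
    alignment of repeat 1 against repeat 2. Residues opposite a gap have
    no counterpart and are left out of the mapping.
    """
    if len(aligned_ru1) != len(aligned_ru2):
        raise ValueError("aligned repeats must have equal length")
    mapping = {}
    i, j = start_ru1, start_ru2  # current residue numbers in RU1 and RU2
    for a, b in zip(aligned_ru1, aligned_ru2):
        if a != gap and b != gap:
            mapping[i] = j  # RU1 residue takes the conformation of the RU2 residue
            mapping[j] = i  # and vice versa
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return mapping


# Toy alignment (invented sequences); numbering starts at the RU1 and RU2 boundaries.
swap = repeat_swap_map("AB-CD", "ABXC-", start_ru1=86, start_ru2=335)
```

Residues left unmapped, such as those in loops present in only one repeat, are the regions where a swapped model has no template information and is therefore least reliable.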
The future of NaPi-II structure-function studies
The structural models available for NaPi-II transporters have guided a number of experiments that have elucidated central features of their function, including residues contributing to substrate binding and an elevator-like conformational mechanism. Nevertheless, much remains to be learned, including a more detailed atomistic description of the key binding regions as required for drug discovery, as well as conformations of the protein in apo and partially occupied states, to help delineate the steps in the transport cycle. At present, all available models of NaPi-II transporters are limited to the core transmembrane elements and lack the C-terminal peripheral helices, the terminal elements, and the long extracellular loop that hosts the glycosylation sites. Moreover, in the absence of the peripheral helices, it is unclear exactly how the transporter would dimerize, although evidence from other elevator-like transporters indicates that the dimer interface would likely not involve elements of the transport domain. Additional structural data, even in the form of low-resolution cryo-EM maps, would be of great value in this regard, for example, by aiding with positioning of probes to examine transport dynamics and kinetics. Finally, resolving the terminal domain structures would provide key information relating to regulatory interactions with cytoplasmic proteins.
In the meantime, further modeling studies have the potential to provide important insights. For example, recently developed methods that leverage evolutionary-coupling information (see  for review) could provide restraints to complete the model of the protein, including contacts between the peripheral helices and those in the core, or even to refine helix-helix contacts within the core of the protein. As additional structures become available, e.g., of VcINDY in different conformations, or of more closely related proteins, these structures may be used as templates to build additional models that can guide experiments in unforeseeable, but exciting new directions. Whatever may be the case, these studies make clear that structure prediction can, and will continue to, offer powerful contributions when integrated closely with functional studies (see Forster IC et al. in this issue).
We gratefully acknowledge the contributions of Ian Forster (Zurich) and Andreas Werner (Newcastle), whose insight and expertise have made the collaboration leading to this review enormously satisfying and fun.
This research was supported by the Division of Intramural Research of the NIH, National Institute of Neurological Disorders and Stroke, National Institute of Mental Health, and National Institute on Deafness and Other Communication Disorders.
5. Biber J, Hernando N, Forster I (2013) Phosphate transporters and their function. Annu Rev Physiol 75:535–550. https://doi.org/10.1146/annurev-physiol-030212-183748
7. Ehnes C, Forster IC, Bacconi A, Kohler K, Biber J, Murer H (2004) Structure-function relations of the first and fourth extracellular linkers of the type IIa Na+/Pi cotransporter: II. Substrate interaction and voltage dependency of two functionally important sites. J Gen Physiol 124:489–503. https://doi.org/10.1085/jgp.200409061
14. Forster IC, Hernando N, Biber J, Murer H (2012) Phosphate transport kinetics and structure-function relationships of SLC34 and SLC20 proteins. Curr Top Membr 70:313–356. https://doi.org/10.1016/B978-0-12-394316-3.00010-7
39. Su CC, Bolla JR, Kumar N, Radhakrishnan A, Long F, Delmar JA, Chou TH, Rajashankar KR, Shafer WM, Yu EW (2015) Structure and function of Neisseria gonorrhoeae MtrF illuminates a class of antimetabolite efflux pumps. Cell Rep 11:61–70. https://doi.org/10.1016/j.celrep.2015.03.003
43. Yu X, Yang G, Yan C, Baylon JL, Jiang J, Fan H, Lu G, Hasegawa K, Okumura H, Wang T, Tajkhorshid E, Li S, Yan N (2017) Dimeric structure of the uracil:proton symporter UraA provides mechanistic insights into the SLC4/23/26 transporters. Cell Res 27:1020–1033. https://doi.org/10.1038/cr.2017.83
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.