Background

Evolution studies have been undertaken to identify those genetic changes that underlie human-specific features such as susceptibility to acquired immunodeficiency syndrome, bipedalism, a large brain, and higher-order cognitive functions. Several phenotypic differences distinguishing human from other great ape species obviously rely on cerebral activity. Large-scale studies in human and chimpanzee using either genome comparisons [1, 2] or brain transcriptome analyses [3–5] have led to the identification of a subset of genes that may have contributed to the evolution of human brain anatomy and activity from a common primate ancestor. An important complementary approach has relied on the study of candidate genes selected on the basis of their importance in specific human phenotypes. Consequently, several genes involved in the structure and/or functioning of the human brain have been associated with recent positive selection: ASPM [6, 7], MCPH1 [8–10], GLUD2 [11], MAOA [12, 13], SHH [14], and the "speech gene" FOXP2 [15–17]. More recently, accelerated evolution of noncoding sequences has also been shown [18, 19].

The Rolandic and Sylvian fissures divide the cerebral hemispheres of primates into their main anatomical structures. In human, the surrounding areas participate in speech production under the control of Broca's area. We recently identified the SRPX2 gene as being responsible for two related disorders of the Rolandic and Sylvian speech areas [20, 21]. Because SRPX2 is linked to defects in the functioning and development of these brain regions (epileptic seizures, oral and speech dyspraxia, and bilateral perisylvian polymicrogyria), it may be one of the genes whose evolution at the DNA level participated in the recent emergence of higher-order cognitive functions, including the adaptive organization of brain areas for speech production.

In this study, we have examined the molecular evolution of the SRPX2 gene. One single, fixed amino acid change occurred in the first sushi domain (also known as CCP – complement control protein – module, or short consensus repeat) of SRPX2 after the human-chimpanzee split. Three-dimensional modeling showed that both this evolutionary mutation and a previously identified disease-associated mutation [20] lie within a hypervariable loop shared by all sushi modules and that has been implicated in some cases in protein-protein interactions [22]. Using the branch models, the synonymous/non-synonymous analysis was consistent with accelerated evolution in the human lineage but this could not be confirmed when the branch-site models were used. Population genetics tests did not reach statistical significance, indicating either that a selective sweep may have occurred more than 100 000–200 000 years ago, or that there has been no episode of positive selection on SRPX2.

Results and discussion

One single amino acid substitution (R75K) has been fixed in human SRPX2 since the human-chimpanzee split

In silico screening of publicly available databases followed by phylogenetic analysis showed that SRPX2 belongs to a family of five genes: SRPX2 itself (sushi-repeat-containing protein, X-linked 2), SRPX (sushi-repeat-containing protein, X-linked), SELP (selectin P precursor), SELE (selectin E precursor), and SVEP1 (selectin-like protein) (Fig. 1; Additional file 1). This family emerged during vertebrate evolution. In order to trace the recent evolutionary history of SRPX2, all coding exons were sequenced from a subset of primate species that represent key evolutionary steps: human [Genbank: NM_014467], chimpanzee [Genbank: EF369515], gorilla [Genbank: EF369516], orangutan [Genbank: EF369517], gibbon [Genbank: EF369518], macaque [Genbank: EF369519], and baboon [Genbank: EF369520]. Sequence comparison (Fig. 2) detected only five amino acid variant sites (1.1%; 5/465). One discrepancy was detected between the protein [Genbank: ABN46998] predicted from the five chimpanzee SRPX2 sequences obtained here (see Methods section) and the chimpanzee SRPX2 protein [Genbank: XP_521170] predicted from the genomic sequence at the UCSC database [23] (D429 in ABN46998; N429 in XP_521170). This likely corresponds to an error in the previously available chimpanzee genome sequence, although it may be due to a chimpanzee polymorphism. The only amino acid substitution specific to the human lineage is an Arg to Lys change at position 75, corresponding to a 224G>A mutation within exon 4 of SRPX2. The possible importance of the R75K substitution in the evolution of the human species was indicated by the conservation of R75 in all SRPX2 orthologues, from nonhuman primates to ray-finned fishes, as well as by the fixation of K75 in human, as demonstrated by the lack of any variation at position 75 in silico, in the 24 worldwide X-chromosomes tested here, and in the 624 Caucasian control X-chromosomes previously screened [20].
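The relation between the cDNA position and the codon affected by R75K can be checked with a few lines of code. The sketch below is illustrative only: it uses the standard genetic code, the helper names (`codon_of`, `single_nt_changes`) are hypothetical, and base changes are written in the ancestral-to-derived (Arg to Lys) direction.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_of(cdna_pos):
    """Map a 1-based coding position to (codon number, position within codon)."""
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3 + 1


def single_nt_changes(aa_from, aa_to):
    """List every single-nucleotide change turning aa_from into aa_to.

    Each entry is (codon_from, codon_to, position_in_codon, ref_base, alt_base).
    """
    changes = []
    for codon, aa in CODON_TABLE.items():
        if aa != aa_from:
            continue
        for i, ref in enumerate(codon):
            for alt in BASES:
                if alt == ref:
                    continue
                mutant = codon[:i] + alt + codon[i + 1:]
                if CODON_TABLE[mutant] == aa_to:
                    changes.append((codon, mutant, i + 1, ref, alt))
    return changes


if __name__ == "__main__":
    print(codon_of(224))                # codon number and position in codon
    for change in single_nt_changes("R", "K"):
        print(change)
```

Under the standard code, every single-nucleotide route from an arginine codon to a lysine codon passes through the second codon position, which is where coding position 224 falls in codon 75.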

Figure 1

Phylogenetic tree of the SRPX2 family members. ag: Anopheles gambiae; bt: Bos taurus; cn: Canis familiaris; dr: Danio rerio; gg: Gallus gallus; hs: Homo sapiens; mm: Mus musculus; ma: Macaca mulatta; rn: Rattus norvegicus; pt: Pan troglodytes; tn: Takifugu rubripes; tr: Tetraodon nigroviridis. The list of the ENSEMBL gene accession numbers used to construct the phylogenetic tree is available in Additional file 1.

Figure 2

Amino acid sequence alignment of SRPX2 from human and nonhuman primates. Dots represent residues identical to the human amino acid sequence. Black lines represent the three Sushi domains and the dotted line represents the HYR domain. The site of the human-specific variation (R75K) is boxed.

K75 is situated in the hypervariable loop that is usually implicated in protein-protein interactions

The R75K human-specific modification occurred in the first sushi domain of the protein. Sushi domains have been identified in several proteins of the complement system and in the selectin family of proteins [24]. They may serve in protein-protein interactions [22], as demonstrated in the case of the neurocan-L1 interaction [25]. R75K occurs only three amino acid residues away from the tyrosine residue (Y72) that is mutated in a patient with rolandic seizures and bilateral perisylvian polymicrogyria and in his female relatives with mild mental retardation [20]. The amino acid residue at position 75 may thus participate in the proper function of the first sushi domain of SRPX2.

To further address this issue, three-dimensional modeling of the first sushi module of the human SRPX2 protein (i.e. sushi 1) was undertaken by homology with a known sushi domain (CCP module) structure (Fig. 3; Additional file 2). Sushi domains are characterised by a compact hydrophobic core, containing an almost invariant Trp residue, which is enclosed in a framework of five extended segments that form β-strands for all or part of their lengths. The extended segments are aligned with the long axis of the module and are held together by two strictly conserved disulfide bridges [24]. In a previously performed cluster assignment exercise undertaken for more than 240 sushi domain sequences [26], the first sushi domain of SRPX (which is highly similar to sushi 1) fell into cluster-D, which also includes many modules found within the selectin family of proteins.

Figure 3

Three-dimensional modeling of the first sushi domain of human SRPX2. A cartoon representation [65] is shown, highlighting the conserved Trp and Cys residues. The model reveals an additional putative disulfide bridge (Cys71-Cys87), atypical of the classical sushi (CCP) module fold. The hypervariable loop, with the sites of disease-causing mutation (Tyr72) and evolutionary change (Lys75) and the protrusion specific to SRPX, are indicated.

In almost all sushi domains, a region that is highly variable in length, sequence and conformation, and commonly referred to as the hypervariable loop [27], is inserted within the second extended region. Depending on its length, this loop projects laterally from the module and forms an obvious candidate surface for protein-protein recognition. Indeed, the hypervariable loop has been implicated as a "hot-spot" for several protein-protein interactions and disease-causing mutations in CCP-containing complement proteins [22, 28–30]. In the sushi 1 model (Fig. 3) an additional disulfide bridge (cysteine residues C71-C87), atypical of the classical sushi domain fold, is evident. Cys71 lies at the beginning of the hypervariable loop, which is exceptionally long in sushi 1. The hypervariable loop is thus forced to form a prominent protrusion extending towards the N-terminus of the module. This feature has not been seen in any experimentally determined sushi module structure to date, nor is it predicted to occur in other members of the D-cluster except SRPX.

The Y72S mutation is largely solvent exposed and located within the hypervariable loop, adjacent to the cysteine residue (C71) that participates in the non-typical, third disulfide bridge. This change from a large aromatic side-chain (tyrosine; Y) to a small, polar one (serine; S) at position 72 is expected to have a profound effect on the surface properties of this region, which is close to the aforementioned prominent protrusion. K75 is located nearby, within the protrusion, and its side-chain is exposed (Fig. 3). It is reasonable to suggest that the unique structure formed by the hypervariable loop of sushi 1 performs a role that is specific to the SRPX2 protein. Presumably, it is not a coincidence that it is also the site of the human-specific change. The R75K substitution is a conservative one; the substitution of one exposed, positively charged residue for another can easily be accommodated by small atom shifts in surrounding side-chains, and is not likely to affect the structure of the sushi domain. Such a change might have a small, rather than dramatic, functional effect. Conservative substitutions are thought to play a role in adaptive change [31]. Moreover, conservative R-to-K and K-to-R substitutions can alter the properties of either secreted proteins [32] or the extracellular domains of some plasma membrane proteins [33, 34], including a member of the selectin family [35].

Analysis of non-synonymous/synonymous substitution rate ratio in primates

Human-specific modifications with putative functional consequences may have been subject to positive selection since the human-chimpanzee split. The ratio of non-synonymous (Ka) to synonymous (Ks) substitution rates was therefore calculated for each branch of the primate phylogenetic tree (Fig. 4). Using the branch models, the Ka/Ks value in the human lineage differed from that of all other branches with statistical significance (p = 0.027) (Additional file 3). In this model, the calculated Ka/Ks ratio has an infinite value, as no synonymous change has accumulated between human and chimpanzee. However, this elevated Ka/Ks ratio was not significantly greater than 1 (p = 0.371) (Additional file 3). Using the branch-site tests, model A indicated that the K75 site might be under positive selection in the human lineage. However, model A did not differ from the null hypothesis with statistical significance (p = 0.089) (Additional file 3). Overall, positive selection in the human lineage could not be clearly demonstrated, as the branch-site analysis did not confirm the data obtained using the branch models. This may be due either to the actual absence of any positive selection, or to the lack of power of the tests due to the low number of genetic variations within SRPX2.
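Comparisons between nested codon models of this kind are commonly made with a likelihood ratio test, in which twice the log-likelihood difference is compared with a chi-square distribution. The following is a minimal sketch of that arithmetic and of why a branch without synonymous changes yields an infinite ratio. The log-likelihood values are hypothetical placeholders, not those of Additional file 3, and the function names are illustrative.

```python
import math


def omega(ka, ks):
    """Ka/Ks ratio; infinite when Ks is zero but Ka is not."""
    if ks == 0:
        return math.inf if ka > 0 else math.nan
    return ka / ks


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate (df = 1 or 2 only)."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")


def lrt_p_value(lnl_null, lnl_alt, df=1):
    """Likelihood ratio test between a null model and a nested alternative."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(max(stat, 0.0), df)


if __name__ == "__main__":
    # One non-synonymous and no synonymous change on a branch.
    print(omega(1, 0))
    # Hypothetical log-likelihoods for a one-ratio and a two-ratio model.
    stat, p = lrt_p_value(lnl_null=-2000.00, lnl_alt=-1997.55, df=1)
    print(stat, p)
```

With a single free parameter separating the two models, one degree of freedom is the usual choice; other model pairs may call for a different reference distribution.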

Figure 4

Numbers of non-synonymous (left) and synonymous (right) substitutions of different evolutionary lineages in primates.

Population genetics analysis

Evidence for positive selection can also be found by analyzing the intra-specific variation in DNA sequences subjected to a selective sweep. DNA sequences adjacent to an advantageous allele should display lower-than-expected levels of diversity. The levels of polymorphism and divergence were estimated by sequencing a total of 9.908 kb of intronic sequences surrounding exon 4 of SRPX2 in the 24 X-chromosomes mentioned above. Polymorphic sites are shown in Table 1. Nucleotide diversity at the SRPX2 locus (π = 0.00036) was 76% of the estimated nucleotide diversity on chromosome X (π = 0.00047) [36]. The divergence between human and chimpanzee for SRPX2 (D = 0.0075) was nearly identical to the average divergence calculated for X-linked intronic regions (D = 0.0072) [37].
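Nucleotide diversity (π) is the average number of pairwise differences per site among the sampled chromosomes. A minimal sketch of that definition is given below; the toy alignment is invented for illustration and bears no relation to the SRPX2 data, and gaps or missing data are not handled.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site for equal-length aligned sequences."""
    n = len(seqs)
    length = len(seqs[0])
    if n < 2 or any(len(s) != length for s in seqs):
        raise ValueError("need at least two aligned sequences of equal length")
    total_diffs = sum(
        sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) / 2
    return total_diffs / n_pairs / length


if __name__ == "__main__":
    # Hypothetical 10-bp alignment with one singleton variant.
    toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC"]
    print(nucleotide_diversity(toy))
```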

Table 1 Polymorphic sites of SRPX2 introns 3 and 4 in 12 women worldwide.

The McDonald-Kreitman test [38], which measures the fraction of sites under positive selection pressure by comparing the ratio of non-synonymous to synonymous divergence with the ratio of non-synonymous to synonymous polymorphism, was uninformative in the case of SRPX2 because of the very small number of inter- and intra-species DNA variations: no synonymous mutation and only one non-synonymous mutation have occurred since the human-chimpanzee split, and neither non-synonymous nor synonymous polymorphisms were detected when 24 entire human SRPX2 coding sequences from two sub-Saharan African, three Asian, three Maghrebian, and four Caucasian women were tested. Moreover, no polymorphism had been detected in the 198 unrelated patients previously screened for disease-causing mutations in the coding sequence of SRPX2 [20]. It is also noteworthy that while no synonymous polymorphism was found in silico, one non-synonymous variation was detected in silico [dbSNP:rs17851822]. Although the existence of a rare DNA variation cannot be ruled out, this change was detected in one single IMAGE clone only [Genbank: BC020733; IMAGE:4769946]. It is more likely to correspond to a clone artifact, as we detected a large proportion of clones carrying artifactual mutations within SRPX2 inserts, using various Escherichia coli strains (unpublished data).
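The reason the test carries no information here can be seen from its 2 × 2 contingency table of fixed versus polymorphic, non-synonymous versus synonymous changes. The sketch below evaluates such a table with a two-sided Fisher exact test, which is one common choice and an assumption here, since the statistic used for the table is not specified above. The second set of counts is purely illustrative.

```python
from math import comb


def fisher_exact_two_sided(dn, ds, pn, ps):
    """Two-sided Fisher exact p-value for the table [[dn, ds], [pn, ps]]."""
    row1, row2 = dn + ds, pn + ps
    col1 = dn + pn
    total = row1 + row2
    if total == 0:
        raise ValueError("empty table")

    def prob(x):
        # Hypergeometric probability of x non-synonymous fixed differences.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(dn)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


if __name__ == "__main__":
    # Counts described in the text: one fixed non-synonymous change, nothing else.
    print(fisher_exact_two_sided(dn=1, ds=0, pn=0, ps=0))
    # Illustrative counts for a locus with ample variation.
    print(fisher_exact_two_sided(dn=7, ds=17, pn=2, ps=42))
```

With a single observation in the table, only one arrangement of the counts is possible, so the p-value is necessarily 1 whatever the underlying selective regime.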

The HKA (Hudson-Kreitman-Aguadé) test [39] was then used to compare the intra- and inter-specific variations between the SRPX2 locus and control loci assumed to evolve neutrally, but did not yield significant results (Table 2). Tajima's D test was also applied to the SRPX2 intron data. Tajima's D was negative (D = -0.646) but did not reach significance (p > 0.05). As in the case of the ASPM gene that is involved in human brain expansion [6], the present intra-specific analyses did not show significant evidence against neutral expectations. The lack of statistical significance may indicate that no selective sweep has ever occurred in the human lineage. However, it should be mentioned that a selective sweep can be detected by intra-specific studies only for a short period (< 0.5 N generations, where N is the effective human population size, i.e. approximately < 100 000–200 000 years ago) after fixation of the advantageous variant [40, 41].
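Tajima's D contrasts two estimators of diversity: the mean number of pairwise differences and the number of segregating sites scaled by sample size. A minimal sketch of the standard formula follows; the input values in the usage example are illustrative and are not the SRPX2 counts of Table 1.

```python
import math


def tajimas_d(n, s, k):
    """Tajima's D from sample size n, segregating sites s, mean pairwise differences k."""
    if n < 2:
        raise ValueError("at least two sequences are required")
    if s == 0:
        return math.nan  # D is undefined without segregating sites
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    return (k - s / a1) / math.sqrt(variance)


if __name__ == "__main__":
    # Illustrative sample: 10 sequences, 16 segregating sites.
    print(tajimas_d(n=10, s=16, k=3.888889))
```

A negative value indicates an excess of low-frequency variants relative to neutral expectations, which can result from a selective sweep but also from population expansion, so the sign alone does not distinguish between these explanations.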

Table 2 Nucleotide polymorphism and divergence at SRPX2

Conclusion

In this study, we have examined the molecular evolution of SRPX2, a gene whose mutations cause brain disorders of the speech areas. One single, fixed amino acid change (R75K) occurred in the first sushi domain of SRPX2 after the human-chimpanzee split. Neither the primate analysis nor the population genetics analysis, taken separately, demonstrated the existence of positive selection on SRPX2. Whether the single human-specific R75K mutation has been subject to positive selection thus remains an open question. However, the 3-D location of R75K within an important functional domain of the SRPX2 protein, in the immediate vicinity of a pathogenic mutation, indicates that this evolutionary mutation may have functional importance. Because R75K occurred in a domain implicated in protein-protein interactions, it is possible that the interaction with one or several putative SRPX2 partners has been modified qualitatively or quantitatively. Although nothing is currently known about either the actual function of SRPX2 or the proteins it interacts with, comparing the properties of the human and chimpanzee SRPX2 orthologues should help determine if and how new functions were acquired. From this viewpoint, our study represents a first step towards the analysis of the consequences of R75K on SRPX2 functioning and protein interactions.

Methods

DNA samples and sequencing

All human and nonhuman primate DNAs were extracted using standard procedures, in accordance with the appropriate ethical committees and animal care rules, respectively. Polymerase chain reaction (PCR) fragments representing 9.908 kb from introns 3 and 4 were sequenced with specific primers in 12 human females from the major continental populations (two sub-Saharan Africans, four Europeans, three Asians, and three Maghrebians). Each exon of the full-length coding region of SRPX2 was amplified by PCR and sequenced in the same 12 women and in six nonhuman primate species, using inter-specific consensus intronic primers. The nonhuman primates were of the Hominidae, Pongidae, Hylobatidae, and Cercopithecidae families: three chimpanzees (one male and two females; Pan troglodytes), one gorilla (Gorilla gorilla), one orangutan (Pongo pygmaeus), one gibbon (Hylobates sp.), one macaque (Macaca mulatta) and one baboon (Papio sp.). The primer sequences are listed in Additional file 4. Sequencing was performed using standard dye terminator chemistry. Sequences were analysed and polymorphisms detected using the Genalys 3.0 software [42]. Multiple sequence alignments were performed using the ClustalW program [43].

Modeling analyses

Modeling of the first sushi module of human SRPX2 was based upon its closest homologue in the Protein Data Bank (PDB) [44, 45]: the highest-resolution X-ray structure of the first sushi module of complement receptor type 2 (PDB ID: 1LY2) [46], which was identified with a BLAST search [47, 48]. The target and template shared 29% sequence identity from N- to C-terminal cysteines. The program Modeller release 8 version 1 [49] was used.
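A percentage identity of this kind is computed over the aligned region of target and template. The sketch below shows one way to do so; the denominator convention (aligned columns without gaps) is an assumption, as several conventions exist, and the two short sequences are invented for illustration rather than taken from SRPX2 or CR2.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Hypothetical aligned fragments, trimmed from first to last cysteine.
    print(percent_identity("CPAG-YKLC", "CPSGDYRLC"))
```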

The alignment between the target and template sequences was based on an initial multiple sequence alignment of several individual human sushi-module sequences assigned to cluster-D (with the addition of template sequence CR2 module 1), using the program ProbCons [50] to help position indels more appropriately (see Additional file 2 for target-template alignment). Cluster-D members, according to the recently published convention of sequence assignment [26], are characterized by the presence of a six-cysteine residue motif, forming three putative disulfide bridges. The target-template sequence alignment was subjected to further, minor manual editing to improve the correspondence of secondary structure elements between the target and template sequences; secondary structure was predicted by the server PsiPred version 2.4 [51, 52] for the target sequence, or identified by DSSP [53] for the template structure. The additional putative disulfide bridge present in the target (absent in its corresponding template) was restrained during model building. Twenty-five models were generated, and the one with the lowest objective function score [49] was selected as the representative model. The loop KGGNY in the hypervariable loop region of SRPX2, for which CR2-template-derived restraints were largely absent, was subsequently remodeled under SYBYL version 6.9 (Tripos Associates, St. Louis, MO, USA) after a loop search for that region against a high-resolution subset of the PDB, in order to select the loop conformation with the best root-mean-square fit.

Non-identical side-chain residues of the representative model were optimized using the side-chain replacement program SCWRL version 3 [54]. The model was then protonated and subjected to energy minimization using the Tripos forcefield [55] (20 steps steepest descent followed by 20 steps conjugate gradient) under SYBYL to remove clashes and bad geometries. The model structure was finally checked for valid stereochemistry using PROCHECK version 3.5.4 [56].

Evolution data analyses

The phylogenetic analysis was performed using the phylogenomic analysis pipeline available in the FIGENIX [57] automated genomic annotation platform [58] with the human SRPX2 protein sequence [Genbank: NP_055282] as input and the NCBI nr or the Ensembl databases for BLAST searches (Additional file 1). The synonymous/non-synonymous analyses were conducted in primates assuming the following unrooted tree topology: (Baboon, Macaque, ((((Chimpanzee, Human), Gorilla), Orangutan), Gibbon)). Ancestral sequences were reconstructed with the pamp and codeml programs in the PAML package [59, 60], using the parsimony method and the maximum likelihood method, respectively. From these data, the number of synonymous and non-synonymous substitutions and the Ka/Ks ratios were estimated using the DnaSP 4.0 package [61]. The codeml program from the PAML 3.15 package was used to test for positive selection, using the branch models and the branch-site models, as previously described [62, 63] (Additional file 3).

The DnaSP 4.0 program was also used for all population genetic analyses. Nucleotide diversity (π) and Watterson's θ were computed as described [64]. The neutral evolution hypothesis for the SRPX2 intronic polymorphisms was checked with the HKA (Hudson-Kreitman-Aguadé) test [39], using available data on worldwide polymorphisms in non-coding reference (neutral) autosomal and X-linked regions of the human genome (Table 2), as previously described [16].
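Watterson's θ scales the number of segregating sites by a harmonic-number factor that depends on the sample size. A minimal sketch of the per-site estimator is shown below; the number of segregating sites in the usage example is a placeholder, whereas the sample size (24 X-chromosomes) and sequence length (9,908 bp, assuming the sequenced length corresponds to 9.908 kb) follow the figures given in the text.

```python
def harmonic_factor(n):
    """a1 = sum of 1/i for i = 1 .. n-1, for a sample of n chromosomes."""
    if n < 2:
        raise ValueError("at least two chromosomes are required")
    return sum(1.0 / i for i in range(1, n))


def watterson_theta(n, segregating_sites, length):
    """Watterson's estimator of theta per site."""
    if length <= 0:
        raise ValueError("sequence length must be positive")
    return segregating_sites / (harmonic_factor(n) * length)


if __name__ == "__main__":
    # Placeholder count of segregating sites; n and length as stated above.
    print(watterson_theta(n=24, segregating_sites=10, length=9908))
```

Under neutrality and constant population size, θ and π estimate the same quantity, which is why their difference forms the numerator of Tajima's D.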