Background

The melatonin receptor family belongs to the superfamily of G protein-coupled receptors (GPCRs) and contains 3 known subtypes, Mel1a, Mel1b and Mel1c [1–4]. The genes for these receptors contain two exons separated by a large intron [2, 3]. The Mel1a gene is found on chromosome 4 in the chicken and human, Mel1c is also found on chromosome 4 in chicken (it is not found in the human), while Mel1b is encoded on chromosome 1 in the chicken and on chromosome 11 in the human. These 3 receptors all bind melatonin with high affinity (KD = 10 to 200 pM) [5, 6]. Among them, only Mel1a and Mel1b have been cloned and characterized in mammals [7, 8] and have been renamed MT1 and MT2 by the International Union of Pharmacology (IUPHAR). In contrast, Mel1c has been found only in fish, the chicken and Xenopus [4]. The three receptors (MT1, MT2 and Mel1c) share about 60% sequence identity and BRET studies showed that MT1 and MT2 receptors could form heterodimers [9].

In eutherian mammals, a previously known orphan receptor, GPR50, was identified as a melatonin-related receptor because it has 45% identity with the melatonin receptor family [10]. However, GPR50 is encoded by a gene located on the X chromosome and does not bind melatonin [11]. A recent study demonstrated that GPR50 can heterodimerize with MT1 and MT2 receptors [12], leading to a suppression of the affinity of MT1, but not MT2, for melatonin. Interestingly, GPR50 has only been found in eutherian mammals and not in fish or birds.

The aim of this study was to understand the phylogenetic evolution of the melatonin receptor family and more specifically of the Mel1c and GPR50 genes using an in silico approach. Studying the phylogenetic tree of the melatonin receptor family and tracking the synteny of genes surrounding Mel1c in several species strongly suggested that the Mel1c gene found in fish and avian species is the ortholog of the eutherian GPR50 gene. This interpretation was further supported by estimation of selection pressure and by a gene structure analysis of the Mel1c and GPR50 genes.

Results

Phylogenetic analysis of the melatonin receptors

The phylogenetic tree in Fig. 1A–E, built from the NCBI protein database using chicken Mel1c protein as the query, showed that there are four groups of orthologous genes corresponding to GPR50, MT1, MT2 and Mel1c. The GPR50 gene was only detected in mammalian genomes (Fig. 1B). In contrast, the Mel1c gene was only detected in fish species, Xenopus and chicken genomes, confirming previous results [1, 4] (Fig. 1E). Noticeably, prototherian and metatherian species appeared in the Mel1c (platypus) or GPR50 (opossum) branches (Fig. 1B, E). Note that several bootstrap values are low, suggesting that the robustness of the tree is questionable.

Figure 1

Phylogenetic analysis of the GPR50/MT1/MT2/Mel1c genes. (A) Overall phylogenetic tree showing 3 groups of genes: GPR50, MT1/MT2, and Mel1c genes and the animal orders where each branch is expressed. The trees (npl) are the fusion of three phylogenetic trees built based on Neighbour joining, maximum Parsimony and maximum Likelihood (see "Materials and Methods" section for further details). The italic letters correspond to the names given to the branches for the likelihood ratio tests. (B) Phylogenetic tree of GPR50 genes. Please note that only mammalian species appear in the tree. (C) Phylogenetic tree of MT1 genes. (D) Phylogenetic tree of MT2 genes. (E) Phylogenetic tree of Mel1c genes, which seem to appear only in non-mammalian species. Bootstrap values are reported for each npl method.

Branch lengths are correlated with the evolutionary rate of the sequence [13] and branch length values were clearly higher for the GPR50 orthology group than for the other three groups. These results suggest that sequences from the GPR50 group evolved faster than those from MT1, MT2 or Mel1c groups.

Analyses of selection pressure were performed to provide insights into functional constraints applied at sequence level to the GPR50 group. The branch-site model A was applied to test evolutionary shift using the Maximum Likelihood method. The branch leading to the GPR50 group was labelled as the foreground branch and all others as background branches in the phylogenetic tree. Parameter estimates under model A suggested that 71% of sites evolved under purifying selection (ω0 = 0) whereas 29% of sites were identified under the neutrality assumption. Likelihood Ratio Tests (LRTs) were highly significant with P < 0.0001 (2Δl = 24.7 and df = 2) when model A was compared to the null model M1a (neutral). Thus, the model did not find evidence for positive selection and indicated that, on average, numerous sites (~25%) evolved without functional constraint.
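As a hedged illustration of the likelihood ratio test reported above, the following sketch converts the statistic 2Δl = 24.7 with df = 2 into a P value. It relies on the closed form of the chi-square survival function for two degrees of freedom, exp(-x/2). The function name `lrt_pvalue_df2` is an assumption made for this example and is not part of PAML.

```python
import math


def lrt_pvalue_df2(two_delta_l: float) -> float:
    """P value of a likelihood ratio test with 2 degrees of freedom.

    For a chi-square distribution with df = 2 the survival function
    has the closed form exp(-x / 2), so no statistics library is needed.
    """
    if two_delta_l < 0:
        raise ValueError("2*delta(lnL) must be non-negative")
    return math.exp(-two_delta_l / 2.0)


# Statistic reported in the text for model A versus M1a (GPR50 foreground).
p_value = lrt_pvalue_df2(24.7)
print(f"P = {p_value:.2e}")
```

The value obtained this way is well below 0.0001, in line with the significance level stated in the text. For other degrees of freedom a general chi-square survival function would be needed instead of this shortcut.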

In order to test if positive selection occurred in different lineages along the phylogeny, other branches (leading to the Mel1a, b, c orthologous groups) were labelled as foreground branches. The calculations are summarized in Table 1. All the LRTs gave significant results. Compared to the GPR50 group, the percentage of sites evolving under purifying selection was higher, particularly for the Mel1c group (~95%). Remarkably, the percentage of sites evolving under neutrality for GPR50 was ~28%, contrary to the Mel1a, b, and c groups whose percentages were ~3%. Positive selection was detected in the Mel1a, b, c groups. The Bayes empirical Bayes (BEB) analysis identified 1, 4 and 1 sites respectively, under positive selection along branches leading to Mel1a, b, c, at a probability of >95%. Amino acids under positive selection are reported in Table 1.

Table 1 Parameter estimates for the GPR50, Mel1a, b, c groups under Model A and the effects of codon usage bias on LRTs (n = 29)

Analysis of Mel1c synteny

In order to test whether the Mel1c and GPR50 genes were lost in mammals and in non-mammalian species respectively, we carried out a study of the synteny of these genes. At the Mel1c gene locus on chicken chromosome 4, the following group of genes was found, encoding bHLHPAS, Mel1c, HMG2A, CD99 molecule like 2 and myotubularin related protein (Fig. 2). For clarity, accession numbers of these genes and their orthologs in different species are summarized in Table 2. Interestingly, synteny has been best conserved in mammals where the chromosomal locus containing Mel1c is found on the X chromosome (from Ensembl.org website) (Fig. 2). Except for minor changes among species (for example, the insertion of the gene for Ribosomal protein 19 between the genes for GPR50 and HMG in man) this synteny analysis clearly shows that, despite rapid evolution of the coding region of interest, Mel1c evolved into GPR50 in eutherian mammals. A BLAST analysis against all the available mammalian genomes did not reveal any "fossilized" pseudogenes of Mel1c at this locus in any species, strengthening the notion that the Mel1c ancestral gene was not duplicated in mammals to allow the emergence of GPR50 and of a lost Mel1c pseudogene. Interestingly, this rapid evolution was also observed in neighbouring genes, i.e. 2610030H06 Rik [see Additional file 1] and HMG2A [see Additional file 2] whose phylogenetic trees show either poor bootstrap values (HMG2A) or an odd organization (2610030H06Rik). This synteny approach therefore represented a powerful tool to elucidate the orthology relationships between fast evolving genes.

Figure 2

Synteny of Mel1c/GPR50 genes and neighbours in vertebrate genomes. Note that genes are found on chromosome 5 in zebra fish and on chromosome 4 in chicken while they are found on chromosome X in other depicted species. Please note that synteny is mostly conserved for bHLHPAS, 2610030H06 RIK, Mel1c, HMG2A, CD99, and myotubularin related protein in opossum and mammalian species despite the integration of new genes coding for hypothetical proteins (opossum, chimpanzee, cow), ribosomal proteins (dog, chimpanzee, man), NGFI-A binding protein (chimpanzee, man), Utbf (mouse) and MAGE (cattle) proteins. It is also of note that several genes surrounding Mel1c in zebra fish (pdcd8, nono, and the two hypothetical proteins) present high identities with genes found on chromosome X in mouse but not in the GPR50 locus (unpublished data). p.d.: predicted gene. Chrm: chromosome.

Table 2 Accession numbers of genes surrounding Mel1c/GPR50. For clarity, the names of proteins used as an entry were based on the protein identification found in chicken.

Sequence alignments between paralogs of the melatonin receptor family, and sequence identity analysis

The percentages of identically aligned amino acids between MT1, MT2, and Mel1c/GPR50 from five species (Xenopus laevis (Xl), Gallus gallus (Gg), Monodelphis domestica (Md), Mus musculus (Mm), Homo sapiens (Hs)) are presented in Table 3. The greatest sequence identity among the orthologous genes was observed for MT1, where more than 75% of the amino acid alignment has been conserved if we exclude Xenopus laevis.

Table 3 Sequence identity analysis

Mutagenesis studies performed either on the human [14–17] or ovine MT1 receptor [18] and studies using human chimeric GPR50/MT1 receptor constructs [19, 20] have shown several highly conserved residues in transmembrane helices that are critical for ligand binding, especially those in transmembrane helix III (TMIII: S110, G258) (Fig. 3). In addition, mutation of the N124 within the specific NRY signature of the melatonin receptor group, located just downstream of TMIII, dramatically impairs receptor function (binding affinity, control of cAMP level and regulation of ion channel activity [21]). In the same way, key amino-acid residues in the ligand binding pocket of the MT2 receptor are located in transmembrane helices IV (N175), V (V204, V208), VI (G271, L272) and VII (Y298) ([22, 23]; Fig. 3). These residues are extremely conserved among species and receptor subtypes. It is worth noting that some key amino acids for melatonin binding that are found in helix VI of the MT1 receptor (G258) and of the MT2 receptor (G271, L272) are substituted by T257 and V258 respectively in human GPR50 (Fig. 3A).

Figure 3

Sequence alignment of human MT1, MT2 and GPR50 with bovine rhodopsin (pdb 1F88). Sequence identities are reported white on a black background, whereas sequence similarities are boxed (A). The positions of the transmembrane helices, as observed in the bovine rhodopsin structure, are reported above its sequence. Arrows indicate the positions of the amino acids that, in GPR50, evolved under positive selection. Stars indicate amino acids which have been shown to play a key role for melatonin binding in MT1 (dark blue), MT2 (light blue) or both (red). A ribbon representation of the GPR50 3D structure model is represented (B), with transmembrane helices colored according to the sequence alignment. Amino acids evolving under positive selection and amino acids important for melatonin binding in MT1/MT2 are shown according to the colors reported in the sequence alignment.

Structural evolution of Mel1c into GPR50

Sequence alignments of the orthologous genes Mel1c and GPR50 (Fig. 4) reveal the addition of a long C-terminal domain in the GPR50 receptor. As a consequence, the largest discrepancies in the sequence alignment of amino acids were observed for the Mel1c and GPR50 orthologs, where the sequence identity ranged from 45% to 79% (Table 3). This led us to compare the gene structure of the Mel1c and GPR50 receptors.

Figure 4

Sequence alignment of chicken Mel1c, zebra fish Mel1c and human GPR50. Sequence identities are reported white on a black background, whereas sequence similarities are boxed. The positions of the transmembrane helices are reported above its sequence. Arrows indicate the positions of the amino acids that, in GPR50, evolved under positive selection.

Study of Mel1c structures in several species using BLAT software revealed a common gene organization in the zebra fish and the chicken where Mel1c was coded by 2 exons (Fig. 5). The structure of the GPR50 gene was very similar in mouse and man where it also contained 2 exons, but the first exon is segmented into 4 smaller exons in the horse. In contrast, the GPR50 gene in the opossum was made up of 7 smaller exons. The C-terminal fragment in 3' position in eutherian mammals replaces the stop codon found in the chicken and zebra fish; the fragment starts at the 3' end with a SxL amino acid sequence, S being amino acid 328 in the mouse, 320 in man and 525 in the horse (Fig. 5). Analysis of this extension to the C-terminus of the receptor using a bidimensional Hydrophobic Cluster Analysis (HCA; hydrophobic residues gathered into clusters, typical of regular secondary structures) identified a repeated sequence between amino acids 398 and 466 (Fig. 6). This repeated sequence is organized around a degenerated heptapeptide. The first and last positions of the heptad are generally occupied by an aromatic amino acid, the sixth position by an aliphatic, hydrophobic amino acid, and the second and fifth positions are occupied by a basic amino acid (generally K) and by a hydroxyl amino acid (generally S). One of the ten repeats has a single amino acid insertion (S) between the fifth and sixth positions. This repeat heptad is reminiscent of the C-terminal repeat domain (CTD) of RNA polymerase II (RNAPII), with which it aligned well (Fig. 6). The RNAPII CTD is also an unusual extension, outside the catalytic core of the largest subunit of the enzyme, that serves as a flexible binding scaffold for numerous factors that regulate transcription-related events (for reviews, see [24, 25]). The binding of factors to the RNAPII CTD is determined by the pattern of phosphorylation, which principally occurs at Ser2 and Ser5 of the repeat.
It is worth noting that the second and fifth positions in GPR50 repeats include amino acids that are highly conserved. A serine is also highly conserved in the fifth position, as for RNAPII CTD, whereas a basic amino acid (K or R) is invariably conserved in the second position instead of a serine. This conserved pattern in GPR50, together with its similarity to RNAPII CTD, suggests that GPR50 repeats might also constitute a flexible scaffold for the binding of partner(s) that recognize specific phosphorylation sites. The GPR50 heptad repeat is followed by a ~100 amino acid domain, which HCA analysis predicts to be structured. However, this domain is unusually rich in serine and threonine residues and in many other repeated sequences (e.g. a SH dipeptide is repeated five times at non-regular intervals). These sequences are probably involved in specific structures and functions.
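The degenerate heptad described above can be expressed as a simple pattern search. The sketch below is only an illustration: the residue classes (aromatic F/W/Y, basic K/R, hydroxyl S/T, aliphatic A/V/L/I/M) are assumptions chosen for the example, the function name `find_heptads` is hypothetical, and the test sequence is synthetic rather than the GPR50 sequence. The repeats in the paper were identified by HCA, not by this procedure.

```python
import re

# Residue classes used for this illustration (assumed, not taken from the paper).
AROMATIC = "FWY"
BASIC = "KR"
HYDROXYL = "ST"
ALIPHATIC = "AVLIM"

# Positions 1 and 7 aromatic, 2 basic, 3 and 4 unconstrained,
# 5 hydroxyl, 6 aliphatic. The lookahead allows overlapping matches.
HEPTAD = re.compile(
    f"(?=([{AROMATIC}][{BASIC}]..[{HYDROXYL}][{ALIPHATIC}][{AROMATIC}]))"
)


def find_heptads(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, heptad) for every match, overlaps included."""
    return [(m.start() + 1, m.group(1)) for m in HEPTAD.finditer(sequence.upper())]


# Synthetic sequence invented for the example; it is not GPR50.
toy = "GGYKPASVFAAAWRTTTLYGG"
for start, heptad in find_heptads(toy):
    print(start, heptad)
```

A strict pattern of this kind would miss the repeat that carries a single inserted residue between the fifth and sixth positions, so a real search would need a more tolerant pattern or a profile-based method.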

Figure 5

Schematic diagram of the Mel1c/GPR50 gene organization in zebrafish, chicken, opossum, mouse, man, and horse. The stop codon following the second exon in zebrafish and chicken is replaced by the insertion of a protein fragment reminiscent of a DNA directed RNA polymerase II in mammals (light color).

Figure 6

Alignment of the repeated heptad found in the C-terminal extension of human GPR50 and comparison with the repeated heptad observed in the C-terminal domain (CTD) of RNA polymerase II (RNAPII). The three positions (2, 5 and 7) occupied by phosphorylable serine residues in RNAPII CTD are boxed.

Discussion

Using an in silico approach, we have demonstrated in this study that Mel1c, the gene for a high affinity melatonin binding receptor found in the chicken and in Xenopus, rapidly evolved into the GPR50 gene in eutherian mammals. The GPR50 gene encodes a receptor that does not bind melatonin but affects the interaction of this hormone with its cognate MT1 receptor after dimerization.

Analysis of the phylogenetic tree of the melatonin receptor family suggested that Mel1c is not present in mammals. Two evolutionary hypotheses are possible. The first is duplication of the Mel1c/GPR50 ancestral gene before the emergence of vertebrate species. One of these genes evolved into the Mel1c gene of fish species, Xenopus and the chicken, the GPR50 gene being lost in these species, while in mammals, the ancestral gene evolved into GPR50 and the Mel1c gene was lost. This hypothesis implies the existence of a Mel1c pseudogene in mammals and a GPR50 pseudogene in fishes, Xenopus and chicken. A similar pathway exists for the zona pellucida gene family [26] where the ZPAX and ZPD genes have been lost in mammals. However, careful analysis of BLAST data against all the genomes studied failed to find any evidence for "fossilized" genes for GPR50 or Mel1c, for example a residual exon with a stop codon or a deletion. Thus it is unlikely that Mel1c/GPR50 evolved as a consequence of gene duplication. The second, more likely hypothesis is that the Mel1c gene evolved rapidly into the GPR50 gene in mammals by the mutation of several critical amino acids and by the addition of a C-terminal sequence. A condition for this hypothesis to be true is that the Mel1c and GPR50 genes are surrounded by the same genes in the syntenic genomic regions of the chicken, opossum and mammalian genomes. This conserved synteny has been clearly highlighted in our results. In this regard, the Mel1c/GPR50 gene is located close to a break in the synteny of the chromosome. Breaks of synteny zones are associated with regions of chromosomal instability in rodents [27]. Navarro and Barton (2003) have also demonstrated that the Ka/Ks ratio is higher for genes located on chromosomes that underwent structural rearrangements between the human and the chimpanzee compared with colinear chromosomes [28].
It is therefore possible that a link exists between the rapid evolution of Mel1c into the GPR50 gene in mammals and its close vicinity to a site of structural rearrangement in the chromosome.

Considering the rapid evolutionary process of Mel1c into GPR50, two interesting features are worth highlighting. The first is the loss of affinity of GPR50 for melatonin. It is widely assumed that G protein-coupled receptors, and among them the melatonin receptors, share the same structure as rhodopsin, in which predicted critical residues for ligand binding to the appropriate binding pocket are located in the transmembrane regions [29]. Mutagenesis studies performed either on human MT1 or MT2 receptors have shown that several transmembrane amino acids are critical for the binding of melatonin. Among these, the critical amino acids for melatonin binding G258 in helix VI of MT1 and G271 and L272 of MT2 are replaced by T257 and V258 in human GPR50. Interestingly, these amino acid substitutions have probably evolved under conditions of neutrality, strengthening the neutral evolution theory of Kimura [30], even if at present there are no available tools to confirm this theory. Our results show that five other separate sites (one, four and one amino acids for MT1, MT2 and Mel1c respectively) also underwent rapid evolution under positive selection. One of these amino acids (L183), in the second extracellular loop of GPR50, has replaced S or T in all melatonin receptor subtypes of all species except in Esox lucius MT2 (A197). Whether or not these amino acid substitutions have actually led to the loss of affinity of GPR50 for melatonin may be confirmed by site-directed mutagenesis. In seeking to explain the loss of affinity of GPR50 for melatonin, we observed that the amino acids substituted under positive selection were, surprisingly, not the ones identified as being important for high affinity binding. The 3D structural model reveals that both types of sites (positively selected sites and sites important for melatonin binding), although distinct, are close to each other, leading to potential impairment of ligand binding.
The GPR50 3D model also shows that three of the sites under positive selection are located at the external membrane surface of the receptor. At this position they might be able to "gate" the receptor ligand pocket and therefore contribute to the loss of affinity of GPR50 for melatonin.

The second intriguing feature of Mel1c/GPR50 evolution is the addition of the long C-terminal tail. This region may have also contributed to the functional differences between GPR50 and Mel1c. This CTD shows no signs of rapid evolution because it is well conserved among mammals. It is of note that the opossum GPR50 gene contained four more exons coding for the CTD of this receptor than eutherian mammals. However, PSI-BLAST did not reveal any significant homology of the last 4 exons with a known protein suggesting that this CTD appeared after the divergence of non-eutherian and eutherian mammals. The physiological function of this CTD remains to be established.

Despite the functional difference between GPR50 and Mel1c, no positive selection was detected, suggesting that functional shifts are not always correlated with positive selection or that the model used for detecting positive selection is unable to discriminate sufficiently. However, the percentage of sites in the GPR50 group (~28%) that, on average, evolve under neutrality allows us to hypothesize that relaxed selection is linked to functional change. This percentage for the branch leading to the Mel1a, b, c groups is ~3%, suggesting that the melatonin receptor family has a divergent evolutionary history. Currently, only a few examples of relaxed or positive selection have been related to functional shifts, but the role of the environment in gene evolution must be statistically examined before any such claims can be substantiated [31]. Key insights into understanding the role of the environment on the evolution of GPR50 would be gained by pinpointing the selective advantage conferred in mammals by the change. Determining the physiological roles of this receptor is therefore critical to the testing of this hypothesis.

The rapid evolution of Mel1c into GPR50 is associated with a change in the physiological role of the pineal gland, from a directly photosensitive organ with its own photoreceptors in most non-mammalian vertebrates into an indirectly photosensitive neuroendocrine gland in mammals [32]. Moreover, there is a difference between fishes and mammals in the cellular pathways inducing genes in the pineal gland. One is the recently described orphan nuclear receptor Rev-erb α [33]. This gene is regulated by orthodenticle homeobox 5 (Otx5) through a Pineal Expression Related Element (PERE) that is found in fish species and Xenopus but not in mammals [33]. Nishio and co-workers suggest that Rev-erb α gene expression in the pineal underwent a major functional shift in mammals. Interestingly, it was previously reported that the Otx5 family of genes undergoes rapid evolution in mammalian lineages where they are known as Crx genes, with a restricted distribution compared to Otx genes [34]. Taken together with our data, these results suggest that some components of the photoresponsive network, from the genes expressed in the pineal gland to melatonin receptors in the hypothalamus, underwent particularly rapid evolution in vertebrates. However, the precise significance of this evolution still needs to be clarified.

The functional significance of Mel1c and its evolution into GPR50 are unclear, partly because of the lack of knowledge concerning the role of Mel1c receptors in the intracellular transduction of the melatonin signal. Moreover, the physiological regulation [35, 36] and function [12, 36–38] of GPR50 receptors remain largely unknown despite a recent paper indicating an altered metabolic phenotype of GPR50 knockout mice [36]. Interestingly, Mel1c and GPR50 receptors do not share the same distributions in the brain. In the chicken, Mel1c is widely spread in the brain [4], while in mammals GPR50 has been observed in hypothalamo-pituitary regions, for example the dorsomedial hypothalamus in the rodent [36, 39] and the pars tuberalis in the human and the sheep [10, 11]; a notable exception is the ependymal cell layer of the third ventricle, where strong GPR50 expression is seen in all species examined to date [35, 36]. This altered pattern of expression and the loss of affinity for melatonin are probably not neutral physiological events with respect to melatonin signalling. Melatonin is secreted throughout the period of darkness and the duration of its secretion mimics daylength during the year [40]. This signal constitutes a chemical transduction of the time of the year and is a critical factor for the regulation of seasonal functions such as moulting, hibernation and reproduction in mammals (for reviews see: [41, 42]). It is intriguing that GPR50 is present mainly in the ependymal cell layer of the third ventricle [11, 35, 43], where cerebrospinal fluid concentrations of melatonin are up to 20 times higher than in blood [44]. In addition, levels of GPR50 in ependymal cells of Siberian hamsters are lower under photoperiodic conditions that mimic short days [35].
A recently published in vitro study reported decreased function of the MT1 receptor after heterodimerization with GPR50 because of an interaction of the C-terminal tail of GPR50 with regulatory proteins of MT1 receptors [12]. It is thus tempting to speculate that the photoperiodic modulation of GPR50 regulates the function of the MT1 receptor by altering its affinity for melatonin, and hence modulates physiological responses elicited by this receptor. A critical test of such a hypothesis is to determine if MT1 receptors are in the ependymal cell layer of the third ventricle. Presently this is difficult because there is not a suitable MT1 receptor antibody and because of low levels of MT1 mRNA [42].

Conclusion

In conclusion, this work has shown that the methodology allowing the building of phylogenetic trees may not be sufficient to define the orthology relationships among genes that evolve rapidly, and that studying synteny is often necessary to decipher the relationships among genes of a family. When applied to the family of melatonin receptor genes, this approach allowed us to demonstrate that the high affinity melatonin receptor Mel1c found in non-mammalian species is present in the genomes of mammalian species where it has been named GPR50. This receptor has been extensively remodelled through evolution by the mutation of numerous critical amino acids and by the addition of a long C-terminal tail. These alterations have modified the affinity of GPR50 for melatonin and probably affected its interactions with the two functional melatonin receptors, MT1 and MT2, in mammals. Further studies are required to determine the physiological roles of the GPR50 receptor.

Methods

Phylogenetic analysis

We performed the phylogenetic analysis using the phylogenomic analysis pipeline available in the FIGENIX platform [45, 46]. The FIGENIX platform retrieved sequences, provided multiple sequence alignments and phylogenetic reconstruction, and deduced orthology and paralogy relationships (for a detailed description of pipelines and models used, see [46]). The chicken Mel1c protein sequence (346 aa) was extracted from NCBI (accession no. NP_990692.1) and entered in the phylogenomic inference task, which was run with the default parameters and with the Ensembl or NCBI protein database. We also built trees of chicken Mel1c flanking protein sequences: 2610030H06 Rik (Accession number NM_001031127.1) and HMG2A (Accession number XM_001235453.1). We chose the NJ topology for the figures. The trees (npl) are the fusion of three phylogenetic trees built based on Neighbour joining [47], maximum Parsimony and maximum Likelihood [48]. The Dayhoff PAM matrix [49] provided the distance matrix for the NJ method. The evolutionary distance separating sequences is defined as the number of mutational events per site underlying the evolutionary history separating the sequences. Thus, evolutionary relations among sequences are represented by a tree structure where branch length represents the evolutionary distance [13]. In Fig. 1, and in Additional files 1 and 2, for each node, bootstrap values are reported for each npl method. Bootstrapping was carried out with 1000 replications.
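To make the bootstrapping step more concrete, the sketch below shows the column resampling that underlies bootstrap support values. It is a minimal illustration under stated assumptions: the alignment is a toy example invented here, the function name `bootstrap_alignment` is hypothetical, and this is not the FIGENIX implementation.

```python
import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # Draw as many column indices as the alignment has columns.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    # Apply the same column choice to every sequence so columns stay intact.
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


# Toy alignment invented for the example.
toy = {"seqA": "MKTAYIAK", "seqB": "MKSAYLAK", "seqC": "MRTAFIAK"}
rng = random.Random(0)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_alignment(toy, rng) for _ in range(1000)]
print(len(replicates), replicates[0])
```

In a full analysis each replicate would be passed to the tree-building method, and the support for a node would be the fraction of replicate trees that recover the corresponding group.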

Evolutionary shift analysis

The protein sequences were aligned using Clustal W [50]. Correspondence between protein alignment and each DNA sequence was established using the Wise2 software package followed by manual adjustments [51]. The final alignment contained 783 codons and 52 aligned sequences. The codeml program of the PAML (Phylogenetic Analysis by Maximum Likelihood [52]) 3.15 software package was applied to test evolutionary shift; PAML uses a Maximum Likelihood algorithm to assign likelihood scores to different models for selection. We first used model A, which enables ω (= dN/dS) to vary both between sites and between lineages and is implemented in the maximum likelihood framework [53]. Branches a, b, c and d were independently labeled as foreground branches, and all remaining branches were labeled as background branches (see Fig. 1). This model was then used to construct likelihood ratio tests (LRTs) by comparison with the null model (site model M1a neutral).

Analysis of Mel1c synteny

We examined the synteny of genes flanking Mel1c on chicken chromosome 4: bHLH-PAS (XM_420353.2), 2610030H06 (NM_001031127.1), HMG2A (XM_001235453.1), CD99 molecule like 2 (XM_420355.2) and myotubularin related protein (XM_420356.2). These genes are found on chromosome 4 whose synteny in mammals is found on chromosome X [54]. Using the "TBLASTN" software [55], proteins were related to sequences of the genome of the opossum (Monodelphis domestica), dog (Canis familiaris), mouse (Mus musculus), chimpanzee (Pan troglodytes), human (Homo sapiens) and cattle (Bos taurus) (Fig. 2). For clarity, accession numbers of species examined for different gene orthologs are summarized in Table 2. For comparison, the synteny of genes flanking Mel1c found on zebra fish (Danio rerio) chromosome 5 was also added (Fig. 2). Using the TBLASTN software, we found several genes surrounding Mel1c in that species (pdcd8, nono, and the two hypothetical proteins) with a high percentage identity with genes on chromosome X of the mouse but at different loci than GPR50 (data not shown).
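One simple way to summarize conserved gene order once orthologs have been assigned is the longest common subsequence of two gene lists. The sketch below is an illustration only: the gene orders are simplified and hypothetical (orthologs are given a shared label, and a single inserted gene labelled `RP` stands in for a lineage-specific insertion), the function name `conserved_order` is an assumption, and the synteny in this study was assessed from TBLASTN hits rather than with this procedure.

```python
def conserved_order(ref: list[str], other: list[str]) -> list[str]:
    """Longest common subsequence of two gene orders (dynamic programming)."""
    n, m = len(ref), len(other)
    # table[i][j] holds the LCS length of ref[i:] and other[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if ref[i] == other[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table to recover one longest shared ordering.
    shared: list[str] = []
    i = j = 0
    while i < n and j < m:
        if ref[i] == other[j]:
            shared.append(ref[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return shared


# Simplified, hypothetical gene orders used only to exercise the function.
chicken_like = ["bHLH-PAS", "2610030H06Rik", "Mel1c/GPR50", "HMG2A", "CD99L2", "MTMR"]
human_like = ["bHLH-PAS", "2610030H06Rik", "Mel1c/GPR50", "RP", "HMG2A", "CD99L2", "MTMR"]
print(conserved_order(chicken_like, human_like))
```

With these toy lists the inserted gene is skipped and the six shared genes are recovered in the same order, which is the pattern that conserved synteny with a minor insertion would produce.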

Multiple sequence alignments (MT1, MT2, Mel1c and GPR50)

Multiple alignments of the amino-acid sequences of Mel1a, Mel1b and Mel1c/GPR50 were performed using the Clustal W software available at the EMBL-European Bioinformatics Institute web site [56] (Figs. 3 and 4).

Sequence identity analysis

Amino acid sequences of Mel1a, Mel1b and Mel1c/GPR50 for Xenopus laevis, Gallus gallus, Monodelphis domestica, Mus musculus and Homo sapiens were aligned by pairs using the Smith-Waterman local alignment (EMBOSS) software [57]. The program compares protein sequences and calculates the statistical significance of matches. For each alignment, we focused on the percentage of amino acid identity (Table 3).
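As a small worked example of the quantity reported in Table 3, the sketch below computes a percentage of identity from two already aligned sequences. Conventions differ on whether gapped columns count in the denominator; this sketch excludes them, which is an assumption and may not match the figure printed by the EMBOSS program. The function name `percent_identity` and the two toy sequences are invented for the illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over the gap-free aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Toy aligned pair invented for the example (not real receptor sequences).
seq_a = "MKT-AYIAK"
seq_b = "MKSLAY-AK"
print(f"{percent_identity(seq_a, seq_b):.1f}% identity")
```

Here 6 of the 7 gap-free columns are identical. Counting the two gapped columns in the denominator instead would give 6 of 9, which is why the convention should be stated alongside any identity value.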

Gene structure analysis of GPR50

Protein sequences coding for Mel1c in chicken and zebra fish and for GPR50 in opossum, mouse, horse and man were run through the "BLAT" software [58] to deduce the gene structure (Fig. 5).

Analysis of the C-terminal extension of GPR50

The sequence of the C-terminal extension of human GPR50 does not share any obvious similarity with other proteins when searched with sensitive database search programs such as PSI-BLAST [59]. The bi-dimensional method of sequence analysis, called Hydrophobic Cluster Analysis [60, 61], which efficiently combines analysis of the 1D and 2D structures, was used to explore further the GPR50 C-terminal extension. This led to the identification of repeated sequences, which are described in the Results section (Fig. 6).

GPR50 homology modelling

A model of the three-dimensional structure of the GPR50 transmembrane domain was obtained using the high resolution crystal structure of bovine rhodopsin (pdb 1F88) as a template. The multiple alignment of bovine rhodopsin with MT1, MT2 and GPR50, shown in Fig. 3, was performed using MAFFT [62] and refined using Hydrophobic Cluster Analysis (HCA; [60]). This alignment is similar to that reported by Rivara et al. for MT1 and MT2 receptor models [63]. The three-dimensional models (Fig. 3) were generated using MODELLER [64] and their stereochemical quality checked using PROCHECK [65].