Gain and loss events in the evolution of the apolipoprotein family in vertebrata
Various apolipoproteins that are widely distributed among vertebrata play key roles in lipid metabolism and are directly correlated with human diseases, serving as diagnostic markers. However, the evolutionary history of apolipoproteins across species remains unclear. Nine human apolipoproteins and well-annotated genome data from 30 species were used to identify 210 apolipoprotein family members in species ranging from fish to humans. Our study focused on the evolution of nine exchangeable apolipoproteins (ApoA-I/II/IV/V, ApoC-I~IV and ApoE) in Chondrichthyes, Holostei, Teleostei, Amphibia, Sauria (including Aves), Prototheria, Marsupialia and Eutheria.
We report the overall distribution of apolipoprotein family members in vertebrata and the frequent gain and loss events in their evolution. Phylogenetic trees of orthologous apolipoproteins revealed evident divergence between species evolution and apolipoprotein phylogeny. Evaluating the presence and absence of apolipoproteins in the context of species evolution revealed successive gain and loss events. For example, only ApoA-I and ApoA-IV, the most ancient apolipoproteins, were found in cartilaginous fish. ApoA-II, ApoE, and ApoC-I/ApoC-II were first found in Holostei, Coelacanthiformes, and Teleostei, respectively, but ApoE, ApoC-I and ApoC-II were absent from Aves. ApoC-I was also absent from Cetartiodactyla. ApoC-III emerged in terrestrial animals, and ApoC-IV first arose in Eutheria. The results indicate that the apolipoproteins most likely emerged in the order ApoA-I/ApoA-IV, ApoE, ApoA-II, ApoC-I/ApoC-II, ApoA-V, ApoC-III, and ApoC-IV.
This study reveals not only the phylogeny of apolipoprotein family members in species from Chondrichthyes to Eutheria but also the occurrence and origin of new apolipoproteins. This broad view of gain and loss events, together with the evolutionary scenario of apolipoproteins across vertebrata, provides a useful reference for research on apolipoprotein function and related diseases.
Keywords: Phylogenesis, Gain and loss events, Divergence, Apolipoprotein, Vertebrata
Abbreviations
BLAST: basic local alignment search tool
LAL1: lamprey apolipoprotein 1
MEGA7: molecular evolutionary genetics analysis version 7
MUSCLE: MUltiple Sequence Comparison by Log-Expectation
NCBI: National Center for Biotechnology Information
NWMs: new world monkeys
VLDL: very low-density lipoprotein
Apolipoproteins are components of plasma lipoproteins and are mainly synthesized in the small intestine and liver. The major human apolipoproteins include ApoA, ApoB, ApoC, and ApoE. ApoA-I and ApoA-II are the major proteins of plasma high-density lipoproteins (HDLs); both ApoA-I and ApoA-IV are found in chylomicrons and HDLs, while a large portion of ApoA-IV is lipid free [1, 2, 3]. Human ApoA-V circulates at a very low concentration in plasma and is associated with very low-density lipoproteins (VLDLs), HDLs and chylomicrons [4, 5]. In humans, ApoCs are constituents of chylomicrons, VLDLs, and HDLs, and ApoE is a component of various lipoprotein particles, including VLDLs and their remnants, chylomicrons and their remnants, and HDLs.
As crucial structural components of lipoproteins, apolipoproteins form different kinds of lipoproteins and play significant roles in lipid transport and lipoprotein metabolism [6, 8, 9, 10, 11, 12]. Abnormalities in apolipoproteins are risk factors for a variety of human diseases. For example, ApoC-II deficiency, which has been linked to 24 genetic mutations, can cause hypertriglyceridemia (HTG). The level of ApoC-III in human plasma is elevated in hyperlipidemia and diabetes [14, 15]. Accumulation of low-density lipoprotein (LDL) in plasma resulting from a deficiency in LDL receptors, which recognize ApoB and ApoE, is related to the onset of accelerated coronary artery disease. In addition, ApoE allelic variants [17, 18] and single nucleotide polymorphisms (SNPs) in different regions, together with ApoE protein levels, are associated with the risk of Alzheimer’s disease [20, 21, 22, 23, 24].
The human exchangeable apolipoproteins APOA/C/E (ApoA, ApoC, and ApoE) share a common genomic structure and are members of a multigene family. In humans, these apolipoprotein genes have similar structures, including 5′- and 3′-untranslated regions, 2–3 introns, and 3–4 exons. The exons encode signal peptides, propeptides and mature peptides. Apolipoproteins have a common 33-codon block and variable repeats in the mature peptide region; accordingly, their protein domains consist of one 33-amino acid repeat and various 22- and 11-amino acid repeats. The human genes of the apolipoprotein A1/A4/E family are located on three different chromosomes. In Homo sapiens, the APOE/C1/C2/C4 gene cluster is located on chromosome 19q13.32. The APOC1 gene (6.6 kb), together with its downstream pseudogene APOC1′ (6.0 kb), lies downstream of the APOE gene (4.7 kb). The APOC2 gene (4.7 kb) is located between APOC1 and APOC4 (4.2 kb) [26, 27, 28]. Another cluster of human apolipoprotein genes, the APOA1/APOC3/APOA4 cluster on chromosome 11q23.3, spans approximately 43.8 kb [29, 30]. The APOC3 gene (4.1 kb) is located upstream of APOA4 and 4735 bp downstream of the APOA1 gene (2.9 kb). The APOA5 gene (4.0 kb) is located 31.4 kb downstream of the APOA4 gene (3.4 kb). Only the APOA2 gene (1.7 kb) is located on chromosome 1q23.3.
A previous study showed that the apolipoprotein LAL1 (lamprey apolipoprotein 1) of Petromyzon marinus (lamprey) is similar to human ApoA-I/II/IV, ApoE and ApoC-III. This suggests that the APOA/C/E family and LAL1 likely share a common ancestor. Members of the mouse apolipoprotein family are highly conserved with their human counterparts [32, 33]. The ApoA genes of primates and hedgehogs independently underwent convergent evolution through different duplication and modification events [34, 35]. Another study showed that the evolution of ApoE is strongly influenced by feeding habits and that frog ApoE, as an ancestral form, is less closely related to the ApoE of other species.
As early as 1988, Wen-Hsiung Li et al. summarized knowledge of the biosynthesis, structure, and structure-function relationships of the APOA/C/E family and proposed a hypothetical scheme for its evolution. Since then, numerous studies have explored the evolution of the APOA/C/E family from different aspects and in specific lineages. However, systematic comparisons and analyses of all members of the APOA/C/E family across species throughout vertebrata are still lacking. Determining how each apolipoprotein evolved in vertebrata is important for understanding the functions of the APOA/C/E family and the adaptation of specific species. In recent years, the large amount of genomic data that has become available provides a favorable opportunity to study apolipoproteins from a broad perspective. In this study, we used a comparative genomic approach to systematically analyze all members of the APOA/C/E family across vertebrata, including Chondrichthyes, Holostei, Teleostei, Amphibia, Sauria (including Aves) and Mammalia. The analysis revealed the evolutionary relationships and the gain and loss events of the apolipoprotein family members ApoA (I, II, IV, V), ApoC (I, II, III, IV), and ApoE and explored the connection between the evolution of apolipoprotein family members and their biological functions during speciation.
Overall distribution of the apolipoprotein family in vertebrata
Divergence and convergence between species evolution and apolipoprotein phylogeny
In the maximum-likelihood (ML) tree of ApoA-II protein sequences, the Glires clade clustered with Eutheria (albeit with a very low bootstrap value) (Fig. 2b) rather than with Primates in the Euarchontoglires clade. Except in the ApoA-I tree (Fig. 2a), Glires consistently formed a peripheral clade instead of clustering with Primates. Moreover, in the ApoE ML tree (Fig. 2i), Oryctolagus cuniculus (rabbit) showed low sequence similarity to Mus and Rattus, although rabbits and mice both belong to Glires. The distinct evolutionary position of Glires apolipoproteins may reflect their functional divergence from their orthologs.
In addition to the distinct Glires clade, the apolipoproteins of platypus (Ornithorhynchus) and opossum (Monodelphis), which represent early-diverging mammals, also showed phylogenetic positions that diverged from species evolution. For example, in the ApoA-I ML tree (Fig. 2a), opossum clustered at the periphery of Sauria.
It is also noteworthy that branch lengths varied greatly among clades and among the trees of different apolipoproteins. This suggests that the evolutionary rate of this family may vary with species and living environment. For example, the Eutheria clade has shorter branches than the other clades in the ApoA-I and ApoE trees.
Frequent loss of members in the evolution of the apolipoprotein family in vertebrata
During species evolution, ApoC-I, ApoC-II, and ApoE were lost in Aves (L1 event), and ApoC-I was subsequently lost in Cetartiodactyla (L2 event). ApoA-V appeared in Amphibia but disappeared, together with ApoC-I, in platypus (Ornithorhynchus). ApoC-III and ApoC-IV appeared later than the other apolipoproteins, in Sauria and Eutheria, respectively (G3 and G4 events), but were stably retained thereafter. Thus, these apolipoproteins emerged at different speciation events and were lost at different times. These results indicate that apolipoprotein family members did not all arise at the origin of vertebrates and were not always preserved during evolution.
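The presence/absence reasoning above can be illustrated with a Dollo-parsimony sketch, in which each gene is assumed to be gained only once, at the most recent common ancestor of all taxa that carry it, and lost independently in every maximal clade below that node where it is absent. This is a hypothetical illustration, not the authors' procedure: the simplified tetrapod topology, the taxon label `NonAvianSauria` and the function `dollo_events` are assumptions, and the presence sets for ApoC-I, ApoC-III and ApoC-IV are inferred from the text above.

```python
from typing import List, Set, Tuple, Union

# A node is either a leaf name or a (clade name, list of children) pair.
Node = Union[str, Tuple[str, list]]

# Hypothetical, simplified tetrapod topology (an assumption for illustration).
TREE: Node = ("Tetrapoda", [
    "Amphibia",
    ("Amniota", [
        ("Sauria", ["NonAvianSauria", "Aves"]),
        ("Mammalia", [
            "Prototheria",
            ("Theria", [
                "Marsupialia",
                ("Eutheria", ["Cetartiodactyla", "Primates"]),
            ]),
        ]),
    ]),
])


def name(node: Node) -> str:
    return node if isinstance(node, str) else node[0]


def children(node: Node) -> List[Node]:
    return [] if isinstance(node, str) else node[1]


def leaves(node: Node) -> Set[str]:
    """All leaf (taxon) names below a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(c) for c in children(node)))


def dollo_events(tree: Node, present: Set[str]) -> Tuple[str, List[str]]:
    """Return (clade of the single gain, clades in which the gene was lost)."""
    if not present or not present <= leaves(tree):
        raise ValueError("presence set must be a non-empty subset of the leaves")
    # Gain: descend while one child still contains every taxon carrying the gene.
    node = tree
    while True:
        nxt = [c for c in children(node) if present <= leaves(c)]
        if not nxt:
            break
        node = nxt[0]
    losses: List[str] = []

    def collect(n: Node) -> None:
        if not leaves(n) & present:
            losses.append(name(n))  # whole clade lacks the gene: one loss event
            return
        for c in children(n):
            collect(c)

    collect(node)
    return name(node), losses


# Presence sets inferred from the text (ApoC-I lost in Aves, platypus and
# Cetartiodactyla; ApoC-III retained once it appeared; ApoC-IV only in Eutheria).
PRESENCE = {
    "ApoC-I": {"Amphibia", "NonAvianSauria", "Marsupialia", "Primates"},
    "ApoC-III": {"NonAvianSauria", "Aves", "Prototheria", "Marsupialia",
                 "Cetartiodactyla", "Primates"},
    "ApoC-IV": {"Cetartiodactyla", "Primates"},
}

for gene, taxa in PRESENCE.items():
    gain, losses = dollo_events(TREE, taxa)
    print(f"{gene}: gained at {gain}; lost in {losses or 'none'}")
```

Because this sketch tree is rooted at Tetrapoda, a gain placed at the root only means that the gene predates the sampled clade (ApoC-I is reported to have arisen in Teleostei); its three independent losses fall in Aves, Prototheria and Cetartiodactyla. For ApoC-III the sketch places the gain at the amniote ancestor because mammals also carry the gene, which corresponds to the first lineage after Amphibia in which the text reports it.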
Evolutionary relationships among the apolipoproteins ApoA, ApoC and ApoE
The ApoA-I, ApoA-II, ApoA-V, ApoC-I~III and ApoE genes have three introns separating four exons, whereas the ApoA-IV and ApoC-IV genes have two introns and three exons. These genes also partly share similar repeat patterns and have a common block of 33 codons in the third exon. On the basis of these structural characteristics, we propose a schematic of apolipoprotein evolution at the gene level (Fig. 4b). The ancestral apolipoprotein gene was probably similar in structure and length to ApoA-I or ApoA-IV. This primordial gene duplicated in Chondrichthyes: one lineage became ApoA-I, and the other became ApoA-IV after losing the first intron and gaining three 22-codon duplications and one 11-codon duplication in the fourth exon. In the ApoA-I lineage, the fourth exon then gained one 22-codon duplication and one 11-codon duplication, and the duplicated gene became the ApoE gene in Coelacanthidae. In the ApoA-IV lineage, the gene duplicated into ApoA-V, with one 11-codon duplication, when Amphibia arose. In the latter lineage, substantial changes in gene length and further gene duplications took place, including ten or eleven deletions of 11 codons in the fourth exon.
Following duplication, one lineage became ApoA-II in Holostei, and the other became the common ancestor of the ApoCs. In the latter lineage, an 11-codon duplication occurred in the fourth exon in Teleostei, and the gene was then further duplicated into two genes, which lost one and two 22-codon repeats to give rise to ApoC-II and ApoC-I, respectively. In the other lineage, the gene duplicated again: one copy lost 11 codons in Sauria and the other gained an 11-codon duplication in Eutheria, becoming the present ApoC-III and ApoC-IV genes, respectively.
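Each repeat gain or loss in this scheme changes the length of the fourth exon by a fixed number of codons. The sketch below tallies the net change for every step that the text describes with explicit repeat counts, assuming that each 22-codon or 11-codon event adds or removes exactly that many codons (3 nt per codon, one amino acid per codon). The dictionary `EVENTS` and the function `net_codon_change` are hypothetical names for illustration; intron loss is not counted because it does not alter the coding length, and the "ten or eleven" 11-codon deletions are omitted because their count is not fixed.

```python
from typing import Dict

# Net number of repeat units gained (+) or lost (-) per step, keyed by repeat
# size in codons; counts are transcribed from the gene-level scheme above.
EVENTS: Dict[str, Dict[int, int]] = {
    "ApoA-IV": {22: +3, 11: +1},   # after splitting from the ancestral gene
    "ApoE": {22: +1, 11: +1},      # from the ApoA-I lineage
    "ApoA-V": {11: +1},            # from the ApoA-IV lineage
    "ApoC ancestor": {11: +1},     # 11-codon duplication in Teleostei
    "ApoC-II": {22: -1},           # from the duplicated ApoC ancestor
    "ApoC-I": {22: -2},            # from the duplicated ApoC ancestor
    "ApoC-III": {11: -1},          # Sauria
    "ApoC-IV": {11: +1},           # Eutheria
}


def net_codon_change(events: Dict[int, int]) -> int:
    """Codons added (positive) or removed (negative) in the fourth exon."""
    return sum(size * count for size, count in events.items())


for gene, ev in EVENTS.items():
    change = net_codon_change(ev)
    print(f"{gene}: {change:+d} codons ({3 * change:+d} nt, {change:+d} aa)")
```

For example, the step to ApoA-IV adds 3 × 22 + 11 = 77 codons, whereas the step to ApoC-I removes 2 × 22 = 44 codons relative to its precursor.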
Apolipoprotein family members have repeatedly expanded and diversified during species evolution and perform important physiological functions in these species. Although Wen-Hsiung Li et al. summarized the structural features and evolution of apolipoproteins in the 1980s, their study was limited to three species (humans, dogs and mice) and a few apolipoproteins [29, 31, 32, 33]. Because genome data for many species were lacking at that time, the conclusions of those studies are limited and cannot be generalized to all vertebrates. Here, by integrating genome data from a large number of species across vertebrata, we present new insights that differ substantially from those of previous studies. We collected all ApoA, ApoC, and ApoE family members (ApoA-I/II/IV/V, ApoC-I~IV and ApoE) from Chondrichthyes, Holostei, Teleostei, Amphibia, Sauria (including Aves), Prototheria (Ornithorhynchus), Metatheria (Monodelphis), Laurasiatheria (Perissodactyla, Cetartiodactyla, and Carnivora), and Euarchontoglires (Glires and Primates). These representative species capture the dynamic changes in apolipoprotein family members at various stages and nodes of the vertebrate phylogeny more comprehensively.
These results suggest that the ancestral members of the apolipoprotein family were most likely ApoA-I and/or ApoA-IV, and that the other members emerged subsequently through gene duplication. The approximate order of emergence is ApoA-I/ApoA-IV, ApoE, ApoA-II, ApoC-I/ApoC-II, ApoA-V, ApoC-III, and ApoC-IV. This order differs markedly from the hypothesis of Wen-Hsiung Li et al., who, by analyzing only the structural characteristics of human apolipoproteins, suggested that apolipoproteins evolved from less structured to more structured forms and from short to long sequences. In contrast, our study shows that the oldest members of the apolipoprotein family, ApoA-I and ApoA-IV, which appeared in ancient cartilaginous fish, have long amino acid sequences and complex structural compositions.
Regarding the divergence and consistency between species evolution and apolipoprotein phylogeny, almost all nine apolipoproteins show clusters of protein sequences from different species that are inconsistent with the evolutionary history of those species. The Glires clade of apolipoproteins is usually separate from Euarchontoglires or Primates and does not cluster well within Eutheria. Indeed, our literature search showed that species in the genera Mus and Rattus differ considerably from other eutherian species in their life history [37, 38]. In terms of body size, reproductive capacity and longevity, Mus and Rattus species differ significantly from other species: compared with other eutherian species, they are about one-fiftieth the size, reproduce 2 to 3 times as frequently, and have about one-tenth of the average life span. This peculiar life history has likely resulted in distinctive mechanisms of growth and metabolism. We therefore speculate that this diversity may lead to marked differences in lipid metabolism and transport between Glires and Primates. This difference may partly explain why some drugs targeting lipid metabolism are effective in mouse or rat models but ineffective in humans.
The functions of apolipoproteins include not only transferring lipids, regulating lipoprotein metabolism, and acting as receptor ligands and enzyme cofactors but also immune responses related to the pathogenicity factor lipopolysaccharide (LPS) [39, 40]. ApoA-I and ApoA-II, which are present in Teleostei, also have a crucial antimicrobial function [41, 42]. ApoC-I and ApoE are differentially expressed after bacterial infection in fish, and ApoA-IV is associated with food intake in zebrafish. In addition, the apolipoproteins Apo-II and Apo-IV, which do not exist in Mammalia, have been found to be involved in estrogen regulation and egg production [45, 46].
These apolipoproteins, some of which were already present in Chondrichthyes, have important physiological roles, and their functions have been preserved in some species during evolution. However, our results also show that ApoC-I, ApoC-II and ApoE are absent from Aves. Thus, the exchangeable apolipoproteins retained in birds may be responsible for the lipid metabolism and fat storage needed for migration. For example, ApoA-I can bind cholesterol, HDL particle receptors and phospholipids, and it participates in reverse cholesterol transport from tissues to the liver by promoting cholesterol efflux from tissues and by acting as a cofactor for lecithin-cholesterol acyltransferase (LCAT). ApoA-IV can also bind copper ions and has protein homodimerization activity. ApoA-V can bind heparin, lipase, LDL particle receptors and phosphatidylcholine.
Furthermore, some functions of apolipoproteins are unique, and their loss may give rise to corresponding phenotypes. For example, ApoE is related to bone formation and appeared in bony fish but not in cartilaginous fish [47, 48]; it may therefore have provided a basis for changes in bone structure during the evolution from cartilaginous to bony fish. In contrast, ApoE is absent from birds, and this absence may be associated with specific phenotypes, such as hollow bones without bone marrow and slender, ductile bone walls. ApoC-III is an essential component of triglyceride-rich VLDLs and of HDLs in plasma. ApoC-III plays a multifaceted role in triglyceride homeostasis: it promotes hepatic very low-density lipoprotein 1 (VLDL1) assembly and secretion and attenuates the hydrolysis and clearance of triglyceride-rich lipoproteins (TRLs). In the transition from aquatic to terrestrial life, animals expend more energy to cope with changing temperatures and living conditions. ApoC-III may promote triglyceride accumulation and thus store more energy against starvation, but it may also predispose to obesity and cardiovascular disease when food is plentiful.
We expect that evidence linking the absence or presence of other members to specific phenotypes can also be found; this is the focus and challenge of future studies.
We noticed that the sequences of apolipoprotein family members evolved from long (ApoA-I, ApoA-IV and ApoE) to short (ApoA-II, ApoC-I, ApoC-II, ApoC-III). Long apolipoproteins mainly function in lipid binding and transport. Short apolipoproteins have multiple regulatory functions, such as lipase inhibitor activity (ApoA-II, ApoC-I, ApoC-III), lipase activator activity (ApoC-II), signaling receptor binding (ApoA-II), phospholipase activator activity (ApoC-II), phosphatidylcholine-sterol O-acyltransferase activator activity (ApoC-I), and heat shock protein binding (ApoA-II). Therefore, on the basis of these functional annotations and the sequence-shortening events, we hypothesize that, in addition to lipid binding and transport, short apolipoproteins may have evolved new regulatory functions that contribute to more complex and flexible lipid metabolism in vertebrate species.
To summarize, we report numerous gain and loss events of the exchangeable apolipoproteins from Chondrichthyes to Eutheria that have not previously been reported. New members arose at speciation nodes. Following ApoA-I and ApoA-IV, ApoE arose in Coelacanthiformes, ApoA-II in Holostei, and ApoC-I together with ApoC-II in Teleostei; ApoC-III and ApoC-IV emerged in Sauria and Eutheria, respectively; and ApoC-IA emerged in Hominidae. Members were also lost at specific nodes of the vertebrate phylogeny: ApoC-I, ApoC-II and ApoE were absent from Aves; ApoC-I and ApoA-V were absent from Ornithorhynchus, and ApoC-I from Cetartiodactyla; and ApoC-IA was lost in humans. To our knowledge, this is the first description of these gain and loss events, which should benefit research in related fields. Additionally, we noticed that, according to the species in which they first appeared, the sequences of apolipoprotein family members evolved from long (ApoA-I, ApoA-IV and ApoE) to short (ApoA-II and ApoC). The short apolipoproteins ApoA-II and ApoC may have evolved new regulatory functions in addition to lipid binding and transport. Further data mining and functional research will focus on the function and evolution of the major apolipoproteins related to human disease.
Collection of genome data and sequence retrieval
To study the evolution of the apolipoprotein family across vertebrata, we obtained genome information from the Ensembl database (http://www.ensembl.org/index.html). After evaluating the level of genome assembly and annotation, we used only species assembled at the ‘Chromosome’ level for further analysis (Fig. 1) and downloaded the genome and protein sequences of these species. All members of the apolipoprotein family in Homo sapiens were retrieved from the NCBI database (https://www.ncbi.nlm.nih.gov/). These human apolipoprotein sequences (ApoA-I: NP_000030.1, ApoA-II: NP_001634.1, ApoA-IV: NP_000473.2, ApoA-V: NP_443200.2, ApoC-I: NP_001636.1, ApoC-II: NP_000474.2, ApoC-III: NP_000031.1, ApoC-IV: NP_001637.1, and ApoE: NP_000032.1) were used as queries against all protein sequence data in both local and online BLASTP searches with default parameters. Details of sequence retrieval were described in our previous study. To confirm gene losses in specific species, we first searched the genome sequences with these protein sequences using local tblastn and then searched all available nucleotide databases in NCBI using online tblastn.
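The local part of this retrieval step could be scripted with the NCBI BLAST+ command-line tools. The sketch below only assembles the command lines and does not run them: it indexes one species' proteome and genome with `makeblastdb`, queries the proteome with the nine human sequences listed above using `blastp`, and checks the genome for apparently missing genes with `tblastn`. The file names (`human_apos.fa`, `gallus_pep.fa`, `gallus_dna.fa`) and the helper `blast_commands` are hypothetical, and `-outfmt 6` (tabular output) is a choice made here for easy parsing; all other search parameters are left at their defaults, as in the text.

```python
from typing import Dict, List

# Human query sequences listed in the text (NCBI RefSeq protein accessions).
HUMAN_QUERIES: Dict[str, str] = {
    "ApoA-I": "NP_000030.1",
    "ApoA-II": "NP_001634.1",
    "ApoA-IV": "NP_000473.2",
    "ApoA-V": "NP_443200.2",
    "ApoC-I": "NP_001636.1",
    "ApoC-II": "NP_000474.2",
    "ApoC-III": "NP_000031.1",
    "ApoC-IV": "NP_001637.1",
    "ApoE": "NP_000032.1",
}


def blast_commands(species: str, proteome: str, genome: str,
                   queries: str = "human_apos.fa") -> List[List[str]]:
    """Build (but do not run) BLAST+ commands for one species.

    `queries` is assumed to be a FASTA file holding the HUMAN_QUERIES sequences.
    """
    prot_db, nucl_db = f"{species}_prot", f"{species}_genome"
    return [
        # Index the species' proteome and genome as BLAST databases.
        ["makeblastdb", "-in", proteome, "-dbtype", "prot", "-out", prot_db],
        ["makeblastdb", "-in", genome, "-dbtype", "nucl", "-out", nucl_db],
        # Protein-vs-protein search with default parameters, tabular output.
        ["blastp", "-query", queries, "-db", prot_db,
         "-outfmt", "6", "-out", f"{species}_blastp.tsv"],
        # Protein-vs-translated-genome search to confirm apparent gene losses.
        ["tblastn", "-query", queries, "-db", nucl_db,
         "-outfmt", "6", "-out", f"{species}_tblastn.tsv"],
    ]


for cmd in blast_commands("gallus_gallus", "gallus_pep.fa", "gallus_dna.fa"):
    print(" ".join(cmd))
```

Once BLAST+ is installed, each list could be passed to `subprocess.run(cmd, check=True)`. Keeping the commands as argument lists rather than shell strings avoids quoting problems with file paths.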
Phylogenetic analysis of the apolipoprotein family
Phylogenetic analysis of the apolipoprotein family was performed using MEGA7. Amino acid or nucleotide sequences were aligned using MUSCLE with default parameters, and alignment gaps and unmatched regions were removed manually. Phylogenetic trees were constructed using the maximum-likelihood (ML) method based on the Jones-Taylor-Thornton (JTT) matrix-based model or the Poisson model. Tree reliability was assessed by the bootstrap method with 1000 replicates. The ML heuristic search used nearest-neighbor interchange (NNI). Initial trees for the heuristic search were obtained automatically by applying the Neighbor-Joining and BioNJ algorithms to a matrix of pairwise distances estimated using the JTT model and then selecting the topology with the superior log-likelihood value. Trees were drawn to scale, with branch lengths measured in the number of substitutions per site. All positions containing gaps or missing data were eliminated (complete deletion).
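MEGA7 performs these steps internally. Purely to illustrate what "complete deletion" and bootstrap resampling mean, the sketch below removes every alignment column that contains a gap or missing-data character and then draws bootstrap pseudo-replicates by sampling columns with replacement. In a full analysis, a tree would be rebuilt from each replicate and the bootstrap value of a clade would be the percentage of replicate trees containing it; that step is not shown. The function names, the gap/missing symbols ('-', '?', 'X') and the toy sequences are assumptions, not MEGA's implementation or data from this study.

```python
import random
from typing import Dict, List

GAP_OR_MISSING = set("-?X")  # assumed gap / missing-data symbols


def complete_deletion(aln: Dict[str, str]) -> Dict[str, str]:
    """Drop every column in which any sequence has a gap or missing datum."""
    if not aln:
        return {}
    seqs = list(aln.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned (equal length)")
    keep = [i for i in range(length)
            if not any(s[i] in GAP_OR_MISSING for s in seqs)]
    return {name: "".join(s[i] for i in keep) for name, s in aln.items()}


def bootstrap_replicates(aln: Dict[str, str], n: int = 1000,
                         seed: int = 0) -> List[Dict[str, str]]:
    """Build n pseudo-alignments by resampling columns with replacement."""
    rng = random.Random(seed)
    length = len(next(iter(aln.values())))
    reps = []
    for _ in range(n):
        cols = [rng.randrange(length) for _ in range(length)]
        reps.append({name: "".join(s[i] for i in cols) for name, s in aln.items()})
    return reps


# Made-up toy alignment, for illustration only.
toy = {"human": "MKA-VL", "mouse": "MKAQVL", "chicken": "MRAQ?L"}
trimmed = complete_deletion(toy)
print(trimmed)  # {'human': 'MKAL', 'mouse': 'MKAL', 'chicken': 'MRAL'}
replicates = bootstrap_replicates(trimmed, n=1000)
```

Resampling whole columns keeps the residues of all sequences at a site together, so each replicate is still a valid alignment of the same length.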
J-Q L and S-X D carried out the data mining and sequence alignments and analyzed and interpreted the data regarding the homologous apolipoprotein family members. J-Q L, S-X D and J-F H designed and coordinated the study. J-Q L, W-X L, J-J Z, and S-X D participated in the bioinformatics analysis. J-Q L, S-X D and Q-N T contributed to the writing of the manuscript. All authors read and approved the final manuscript.
This work was supported by the National Natural Science Foundation of China (Grant Numbers 31601867 and 31401142) and the National Basic Research Program of China (Grant Number 2013CB835100).
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
- 4. O'Brien PJ, Alborn WE, Sloan JH, Ulmer M, Boodhoo A, Knierman MD, Schultze AE, Konrad RJ. The novel apolipoprotein A5 is present in human serum, is associated with VLDL, HDL, and chylomicrons, and circulates at very low concentrations compared with other apolipoproteins. Clin Chem. 2005;51(2):351–9.
- 17. Munoz SS, Garner B, Ooi L. Understanding the role of ApoE fragments in Alzheimer's disease. Neurochem Res. 2018.
- 23. Strittmatter WJ, Saunders AM, Schmechel D, Pericak-Vance M, Enghild J, Salvesen GS, Roses AD. Apolipoprotein E: high-avidity binding to beta-amyloid and increased frequency of type 4 allele in late-onset familial Alzheimer disease. Proc Natl Acad Sci U S A. 1993;90(5):1977–81.
- 25. Lauer SJ, Walker D, Elshourbagy NA, Reardon CA, Levy-Wilson B, Taylor JM. Two copies of the human apolipoprotein C-I gene are linked closely to the apolipoprotein E gene. J Biol Chem. 1988;263(15):7277–86.
- 27. Wei CF, Tsao YK, Robberson DL, Gotto AM Jr, Brown K, Chan L. The structure of the human apolipoprotein C-II gene. Electron microscopic analysis of RNA:DNA hybrids, complete nucleotide sequence, and identification of 5′ homologous sequences among apolipoprotein genes. J Biol Chem. 1985;260(28):15211–21.
- 44. Otis JP, Zeituni EM, Thierer JH, Anderson JL, Brown AC, Boehm ED, Cerchione DM, Ceasrine AM, Avraham-Davidi I, Tempelhof H, et al. Zebrafish as a model for apolipoprotein biology: comprehensive expression analysis and a role for ApoA-IV in regulating food intake. Dis Model Mech. 2015;8(3):295–309.
- 46. Ratna WN, Bhatt VD, Chaudhary K, Bin Ariff A, Bavadekar SA, Ratna HN. Estrogen-responsive genes encoding egg yolk proteins vitellogenin and apolipoprotein II in chicken are differentially regulated by selective estrogen receptor modulators. Theriogenology. 2016;85(3):376–83.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.