INTRODUCTION

Albumin belongs to the albuminoid superfamily, which also includes vitamin D binding protein, alpha-fetoprotein, and afamin. Albuminoids are found mainly in body fluids and are involved in the transport of several important ligands. Interest in albumin stems from its key contribution to the homeostasis of physiological processes in higher vertebrates, which it achieves through its transport and osmotic functions. Of all albuminoids, albumin transports the widest range of ligands, including inorganic cations, fatty acids, bilirubin, vitamins, hormones, peptides, proteins, and other compounds, thereby supporting plastic and energy metabolism in vertebrates (Curry et al., 1998; Otterbein et al., 2002; Jerkovic et al., 2005; Ascenzi et al., 2013; Malik et al., 2013). Owing to its small globule size and high concentration in blood plasma (~60% of total protein), albumin creates the colloid osmotic pressure of the plasma, maintaining vascular volume and the osmotic homeostasis of the organism's internal environment (Dziegielewska et al., 1980; Byrnes and Gannon, 1990; Gray and Doolittle, 1992; Metcalf et al., 1998a, 1998b; Xu and Ding, 2005; Majorek et al., 2012; Anguizola et al., 2013).

When 183 species of amniotes, amphibians, and fish were tested for albumin, the protein was found in all species of the first two groups but in only half of the fish species (Li et al., 2017). Among teleost fish (Teleostei), for example, albumins were found in salmon (Salmoniformes) (Nynca et al., 2017) but not in Cyprinidae (Noel et al., 2010). Such gene losses are associated with whole genome duplication events (Noel et al., 2010; Braasch et al., 2014, 2016). During evolution, vertebrates experienced several duplication events, both large-scale and smaller, affecting whole genomes or limited regions of them (Ohno, 1970). All multigene families, including albuminoids, were formed under the influence of these processes (Hoegg et al., 2004; Noel et al., 2010; Ozernyuk and Myuge, 2013; Soshnikova et al., 2013; Braasch et al., 2014, 2016). Alongside the duplication of genes and genomes, some multigene families underwent a “narrowing” of their repertoire through the loss of individual genes (Braasch et al., 2016; Pasquier et al., 2016). The loss of the albumin gene in the cyprinid family is one consequence of such genomic rearrangements. However, the extent of this loss in Teleostei remains unclear: is it limited to cyprinids, or does it extend to a wider range of orders?

This review considers several scenarios for the evolutionary transformation of albumin within the albuminoid superfamily, using model representatives of lower aquatic vertebrates: jawless Agnatha and jawed bony fish (Gnathostomata: Osteichthyes). We discuss how these scenarios relate to whole genome, local, and segment duplications, as well as the problem of albumin gene loss, and of compensation for this loss, in the largest groups of teleost fish (Ostariophysi and Acanthopterygii).

MATERIALS AND METHODS

To search for albumins, we selected 26 species belonging to 20 orders of jawless Agnatha (Cyclostomata, 1 order) and bony jawed fish (19 orders), including lobe-finned and ray-finned fish (Gnathostomata: Sarcopterygii, Actinopterygii) (Table 1). When selecting species, we used information on the availability of genomes in the NCBI Genome database and of whole genome sequencing projects in NCBI BioProject, as well as the list of model species from a review (Ravi and Venkatesh, 2018) that gives BioProject accession identifiers for 60 species of ray-finned fish.

Table 1. List of gene and protein identifiers for the tested species

The taxonomy and Latin species names follow the “Annotated Catalog of Cyclostomes and Fishes…” (Reshetnikov, 1998). For the rainbow trout, both Parasalmo mykiss (Reshetnikov, 1998) and Oncorhynchus mykiss are used; the latter name (O. mykiss) is the one used in the NCBI Protein and Gene databases and in KEGG.

For teleost fish, an extended list of model species was compiled, comprising (1) lower Teleostei from two families (Salmonidae, Esocidae) of the order Salmoniformes, (2) Ostariophysi from five orders (Cypriniformes, Gymnotiformes, Gonorynchiformes, Characiformes, and Siluriformes), and (3) higher teleosts, Acanthopterygii, from ten orders (Gobiiformes, Mugiliformes, Cichliformes, Atheriniformes, Cyprinodontiformes, Carangiformes, Pleuronectiformes, Labriformes, Perciformes, and Tetraodontiformes). Mammals (Homo sapiens) served as the reference group (Table 1).

Information on the amino acid sequences, domain organization, and S–S bonds of albumin and other albuminoids of the tested species was obtained from the NCBI Protein database. Complete sequences were used for comparison (the exceptions were albumin fragments of lake char and lamprey). Information on the organization of albumin genes (exon–intron structure; complete length; length of coding and noncoding sequences; and paralogous genes, their chromosomal location, and syntenic groups) was obtained from the NCBI Gene database and Genome Data Viewer (GDV). SmartBLAST and/or KEGG were used to search for closely related sequences (the five best matches and an extended list of homologs), for multiple alignment, and for quantifying sequence similarity. Introns occupying equivalent positions in the albumin genes of the tested species were aligned and compared for the presence of short repeated DNA sequences using MEGA 6 software (Tamura et al., 2013).
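The similarity percentages reported later in this review were computed by SmartBLAST and KEGG with their own alignment and scoring rules. Purely as an illustration, the Python sketch below computes one common definition, percent identity over the ungapped columns of a pairwise alignment that is assumed to be already computed (equal-length strings, "-" for gaps). The function name and the toy sequences are hypothetical and are not real albumin data; other tools may divide by the alignment length or by the length of the shorter sequence instead, so values obtained under different definitions are not directly comparable.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumes both strings come from one pairwise alignment (equal length,
    gaps written as `gap`). Comparison is case-insensitive.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical toy alignment, not real albumin sequences.
toy_a = "MKWVTF-ISLLF"
toy_b = "MKWITFLISLLL"
print(f"{percent_identity(toy_a, toy_b):.1f}%")
```

In the toy alignment, 9 of the 11 ungapped columns are identical, so the sketch prints 81.8%.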

Names of albuminoid proteins/genes and abbreviations for their organizational parameters follow the NCBI Gene and Protein databases: ALB/alb and DBP/dbp for albumin and vitamin D binding protein in lower vertebrates; ALB/ALB, AFP/AFP, AFM/AFM, and DBP/DBP for albumin, alpha-fetoprotein, afamin, and vitamin D binding protein in humans; CDS, coding sequence; SINEs, short interspersed nuclear elements; LINEs, long interspersed nuclear elements; D, domain; L, a.a.r., length of the protein chain in amino acid residues; and L, nt, length of DNA sequences in nucleotides.

Whole genome, taxon-specific, and other duplication events are designated as in the literature: WGD, whole genome duplication; AsGD, Acipenseriformes-specific WGD; TGD, teleost-specific WGD; CyGD, Cyprinidae-specific WGD; SaGD, Salmonidae-specific WGD; SD, segment duplication; LD, local duplication; 1R, 2R, 3R, rounds of WGD (Ohno, 1970; Freeling and Thomas, 2006; Braasch et al., 2016).

Whole Genome, Local, and Segment Duplications in Vertebrates

Most biologists regard gene and genome duplication followed by divergence as a driving force of evolution that stimulated the emergence and radiation of large taxa of animals and plants, for example, vertebrates and angiosperms (Freeling and Thomas, 2006). Genes and genomes are duplicated at different scales (WGD, SD, and LD). In WGD, the entire genome is duplicated, and the resulting paralogs are called ohnologs. In SD, shorter genomic segments are duplicated through chromosome rearrangements; after interchromosomal rearrangements, the copies can lie on different chromosomes. LDs usually arise as tandem repeats, producing a cluster of related genes located one after another on a chromosome. In all these cases, a genome, a group of genes, or individual genes are duplicated, yielding homologous copies of the original (ancestral) genes. Homologous genes in the same genome that arose by duplication of an ancestral gene are paralogs; homologous genes in the genomes of different organisms that descend from the same ancestral gene present in the genome of their common ancestor are orthologs (Ravi and Venkatesh, 2018).

In 1970, Susumu Ohno (Ohno, 1970) put forward the 2R hypothesis, according to which the ancestral vertebrate genome went through two rounds of whole genome duplication (1R (WGD1) and 2R (WGD2)) before vertebrate diversification. Later, the ray-finned fish lineage went through a third round of whole genome duplication (3R, WGD3), which gave rise to the taxon Teleostei and is therefore designated TGD (Ohno, 1970; Noel et al., 2010; Braasch et al., 2016; Pasquier et al., 2016). Among tetrapods, genome duplications occurred in amphibians and reptiles (Sauria); no stable polyploid forms have been found in birds and mammals (Vasil’ev, 1985; Uno et al., 2013; Evans et al., 2017; Ravi and Venkatesh, 2018) (Fig. 1).

Fig. 1.

Scheme of vertebrate evolution showing the main rounds of whole genome duplication (WGD1, WGD2, WGD3) relative to the divergence of Vertebrata into Agnatha and Gnathostomata and of the latter into Sarcopterygii and Actinopterygii. Esoc, Esocidae; Salm, Salmonidae. Duplications in individual orders within taxa are marked with small asterisks. Explanations are in the text.

Most authors consider that the first two rounds of duplication took place before the separation of Agnatha and Gnathostomata (~500–550 million years ago) (Holland et al., 1994; Kuraku et al., 2009a, 2009b; Van de Peer et al., 2017). An alternative scenario places the first round before the divergence of Agnatha and Gnathostomata and the second round after it (Nakatani et al., 2021). The third round presumably occurred ~230–400 (Venkatachalam et al., 2017) or ~300–450 million years ago (Taylor et al., 2001).

Analysis of cyclostome genomes indicates that polyploidization (hexaploidization) probably occurred in the lamprey lineage independently of Gnathostomata (Smith and Keinath, 2015; Nakatani et al., 2021). Independent polyploidization events also occurred in Acipenseriformes, in the families Acipenseridae and Polyodontidae (Crow et al., 2012; Du et al., 2020; Cheng et al., 2021). In the ray-finned fish branch, after the TGD event and the appearance of the taxon Teleostei, genome duplications occurred in cyprinids (CyGD) (Ma et al., 2014) and salmonids (SaGD) (Alexandrou et al., 2013; Berthelot et al., 2014; Zhivotovskii, 2015). Polyploids have also been found in some orders of Acanthopterygii, for example, Labriformes, Anabantiformes, and others (Vasil’ev, 1985; Felip et al., 2009; Cioffi et al., 2015) (Fig. 1).

Besides the 2R hypothesis, models have been proposed that consider the contributions of whole genome, local, and segment duplications to evolution (Freeling and Thomas, 2006). For example, the 1R + SD model takes into account multiple segment duplications in the genomic rearrangements of lampreys both before and after the 1R event (Asrar et al., 2013; Smith and Keinath, 2015; Nakatani et al., 2021). Both whole genome duplications (TGD and SaGD) and SDs are thought to have contributed to the emergence of Salmonidae (Vasil’ev, 1977; Zhivotovskii, 2015).

Organization of Albuminoid Genes on Chromosomes

Among vertebrates, only amniotes have the complete set of four albuminoid genes. In mammals, all four genes (DBP, ALB, AFP, and AFM) are located on the same chromosome (Fig. 2). In jawed fish, genes for two albuminoids (alb and dbp) have been found. Agnatha have a single albuminoid (albumin) and lack the dbp gene (Fanali et al., 2012). As in mammals, the alb and dbp genes of gar pike and sterlet lie on the same chromosome, in opposite transcriptional orientations (Fig. 2b). Lower teleost fish have 2–3 paralogous albumin genes: two in northern pike and three in Salmonidae. In these fish, the alb and dbp genes can be located on the same or on different chromosomes. In pike, the paralogous albumin genes are on two different chromosomes. In rainbow trout, the three paralogous albumin genes are on chromosomes 1, 11, and 23; chromosome 11 carries both an albumin gene and the dbp gene, whereas chromosomes 1 and 23 carry only albumin genes.

Fig. 2.

Positions of albuminoid genes within syntenic groups on the chromosomes of human (a), sterlet (b), Atlantic salmon (c), and northern pike (d). Arrows show the direction of transcription. Gene names and explanations are in the text.

The syntenic groups that include the albumin gene differ between mammals and fish; they also do not match among gar pike, sterlet, and Salmonidae. However, two stable syntenic groups that include paralogous albumin genes were found in the order Salmoniformes (Figs. 2c, 2d). In northern pike (considered the species closest to the common ancestor of Salmonidae) (Rondeau et al., 2014), the syntenic group on chromosome 13 includes five genes (counting albumin): E3 ubiquitin-protein ligase KCMF1-like–solute carrier family 20 member–albumin–methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 2-like–epithelial mitogen (epigen). The other syntenic group, also of five genes, on chromosome 5 includes different genes: poly(A) polymerase gamma–retinal homeobox protein Rx1-like–albumin–v-rel avian reticuloendotheliosis viral oncogene homolog–peroxisomal biogenesis factor 13 (Fig. 2c).

In Atlantic salmon, the syntenic group on chromosome ssa15 matches, except for one gene, the order of the pike genes on chromosome 13, but in reverse: epigen–methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 2-like–albumin–sodium-dependent phosphate transporter 2–Un–E3 ubiquitin-protein ligase KCMF1-like (Fig. 2d). The other syntenic group, on an unidentified salmon chromosome (Un), completely matches the gene order on pike chromosome 5 (Fig. 2d).

Both syntenic groups occur, as complete or truncated variants, on different chromosomes in brown trout Salmo trutta, lake trout Salvelinus namaycush, and rainbow trout Oncorhynchus mykiss. In each species, however, the syntenic groups have been transformed by inversions of single genes, of several genes, or of entire linkage groups.
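The pike/salmon comparison above amounts to comparing two ordered gene lists after discarding genes that are not shared. The Python sketch below, offered only as an illustration, classifies one block as collinear, inverted, or otherwise rearranged relative to another. The short gene labels are hypothetical abbreviations chosen here for readability, not official gene symbols; "Un" is kept as in the text.

```python
def shared_in_order(block: list[str], other: list[str]) -> list[str]:
    """Genes of `block` that also occur in `other`, kept in `block` order."""
    other_set = set(other)
    return [gene for gene in block if gene in other_set]


def compare_blocks(query: list[str], reference: list[str]) -> str:
    """Classify the order of shared genes in `query` relative to `reference`."""
    q = shared_in_order(query, reference)
    r = shared_in_order(reference, query)
    if q == r:
        return "collinear"
    if q == r[::-1]:
        return "inverted"  # whole block reversed
    return "rearranged"  # any other change of order


# Hypothetical shorthand labels for the genes named in the text.
pike_chr13 = ["kcmf1l", "slc20", "alb", "mthfd2l", "epgn"]
salmon_ssa15 = ["epgn", "mthfd2l", "alb", "slc20", "Un", "kcmf1l"]

print(compare_blocks(salmon_ssa15, pike_chr13))
print(sorted(set(salmon_ssa15) - set(pike_chr13)))
```

With these gene orders, the sketch reports "inverted" and identifies "Un" as the single unshared element. It recognizes only whole-block inversion; partial inversions of the kind noted in other salmonids would be reported as "rearranged".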

Exon–Intron Organization of Albumin Genes

The length of the coding sequence (CDS) of the albumin gene in Gnathostomata is relatively constant (Table 2). In most tested species, the gene consists of 14–16 exons; the albumin gene of the lake trout Salvelinus namaycush on chromosome 4 has the fewest exons (13). All exons except the two terminal ones encode the three-domain polypeptide chain. The complete gene length varies considerably because of differences in intron (noncoding DNA region) length (Tables 2, 3; Fig. 3). In the lamprey, all of these parameters exceed those of jawed fish.

Table 2. Parameters of organization of the albumin gene and polypeptide chain in model species of jawless Agnatha and jawed fish (according to the NCBI Gene and Protein databases as of February 14, 2022)
Table 3. Position and length of introns in albumin genes of Gnathostomata (according to the NCBI Gene database as of April 1, 2022)
Fig. 3.

Exon–intron organization of albumin genes in sea lamprey (Petromyzon marinus), coelacanth (Latimeria chalumnae), sterlet (Acipenser ruthenus), gar pike (Lepisosteus oculatus), Atlantic salmon (Salmo salar), northern pike (Esox lucius), and human (Homo sapiens). Exons are shown as dark vertical boxes whose thickness reflects their length; introns are shown as horizontal lines between them. D1–D3, domains. Graphics from the NCBI Gene database were used.

Comparison of the lengths of corresponding introns in the albumin genes of jawed fish and humans revealed no matches (Table 3). Most intron sequences in the NCBI database begin with the GT dinucleotide and end with AG (splice sites); however, these sites are absent in some of the sequences (the intron between the second and third exons of the albumin gene on chromosome 5 of northern pike Esox lucius, and the albumin gene of gar pike Lepisosteus oculatus).
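Checking an intron for the canonical GT…AG boundaries is a simple string test. The Python sketch below assumes intron sequences written 5' to 3' on the coding strand; the function name and example sequences are hypothetical. A negative result should be read with care: besides GT–AG introns, rarer GC–AG and AT–AC introns also exist, and a missing dinucleotide may also reflect an annotation or assembly problem.

```python
def has_canonical_splice_sites(intron: str) -> bool:
    """True if the intron begins with GT (donor) and ends with AG (acceptor).

    Assumes DNA written 5' to 3' on the coding strand; case-insensitive.
    """
    seq = intron.strip().upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


# Hypothetical toy introns, not sequences from the NCBI records.
introns = {
    "toy_canonical": "gtaagtatttcttttcag",
    "toy_noncanonical": "ctgagcatttccttgcac",
}
for name, seq in introns.items():
    print(name, has_canonical_splice_sites(seq))
```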

Comparison of the repeated elements in the albumin gene introns of fish and humans located between exons 2 and 3 (highlighted in pink in Table 3) revealed no matches. The human albumin gene introns contain 11 SINEs and 6 LINEs (short and long interspersed repeats). These do not follow one another in linear order but are scattered throughout the sequence (Nishio and Dugaiczyk, 1996; Nishio et al., 1996). The best-known SINEs are the primate-specific Alu repeats, named after the AluI restriction endonuclease, whose cleavage of DNA led to their discovery: aattgcttgtgtttctt, gatgtatg, gaatactcattcat, and others. Four Alu sequences were detected in the human albumin gene.

Other short repeats were found in the introns of fish albumin genes. Thus, the intron between exons 1 and 2 contains tg repeats of various lengths (from 40 to 124 bp in salmonids, and shorter in coelacanth). The second intron, the longest (7411 bp), in the salmon albumin gene on chromosome ssa15 contains 13 copies of the short repeat agagtatggc, 10 copies of ccttggtaat, and 11 copies of aaccaggtcc. These sequences are absent in the paralogous salmon albumin gene and in the albumin genes of rainbow trout, coelacanth, and humans.
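Counting copies of a short repeat, or measuring the longest uninterrupted dinucleotide run such as the tg repeats above, can be sketched with Python's re module. This is an illustration under simple assumptions (exact matches only, non-overlapping counts, a hypothetical toy sequence rather than the salmon intron); dedicated repeat-annotation tools also score imperfect copies, so their counts can differ.

```python
import re


def count_repeat(seq: str, motif: str) -> int:
    """Number of non-overlapping exact copies of `motif` in `seq`."""
    return seq.lower().count(motif.lower())


def longest_tandem_run(seq: str, unit: str) -> int:
    """Length (bp) of the longest uninterrupted tandem run of `unit`."""
    pattern = f"(?:{re.escape(unit.lower())})+"
    runs = re.findall(pattern, seq.lower())
    return max((len(run) for run in runs), default=0)


# Hypothetical toy sequence, not the salmon intron itself.
toy_intron = "ccctgtgtgtgaatgtgagagtatggcttagagtatggcaacc"
print(count_repeat(toy_intron, "agagtatggc"))
print(longest_tandem_run(toy_intron, "tg"))
```

For the toy sequence, the sketch finds two copies of agagtatggc and a longest tg run of 8 bp.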

Among the tested species, the NCBI database contains an albumin promoter sequence only for Atlantic salmon; it consists of 410 bp (Salmo salar albumin promoter region: GenBank X79487.1). In humans, the promoter region has been studied in detail (Kajiyama et al., 2006); it consists of 406 bp (albumin (ALB) 5' regulatory region; RefSeq NC_000004.12). The albumin gene promoters of both salmon and humans contain TATA motifs. Note that, in many vertebrates, the promoters of a number of genes can contain other initiating elements (Carninci et al., 2006). The TATA box of the salmon albumin gene consists of six nucleotides (TATAAA), and that of the human gene, of seven (TATAAAA). The regulatory regions of the salmon and human albumin genes are homologous, as confirmed by the specific binding of the salmon HNF1 transcription factor to the promoter regions of albumin genes in fish and mammals (Deryckere et al., 1995).
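Locating a TATA motif in a promoter sequence can be illustrated with an exact-match search. The Python sketch below reports the 1-based start positions of all, possibly overlapping, matches; the fragment it searches is a hypothetical toy sequence, not the GenBank X79487.1 or RefSeq records cited above. Because the six-nucleotide motif TATAAA is contained in TATAAAA, a hexamer search also hits the heptamer; an exact search also ignores the degenerate TATA consensus and the motif's expected position upstream of the transcription start site.

```python
import re


def motif_positions(seq: str, motif: str) -> list[int]:
    """1-based start positions of all (possibly overlapping) exact matches."""
    pattern = f"(?={re.escape(motif.upper())})"  # lookahead allows overlaps
    return [m.start() + 1 for m in re.finditer(pattern, seq.upper())]


# Hypothetical toy promoter fragment.
toy_promoter = "GGCTATAAAAGGCACTTATAAAGC"
print(motif_positions(toy_promoter, "TATAAA"))   # six-nucleotide motif
print(motif_positions(toy_promoter, "TATAAAA"))  # seven-nucleotide motif
```

Here the hexamer is found at positions 4 and 17, while the heptamer occurs only at position 4.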

Organization of Albumin Polypeptide Chain

The polypeptide chain of albumin and other albuminoids consists of three albumin domains, which are structural and functional units (Li et al., 2017). The domains contain binding sites for various endogenous and exogenous ligands, including seven fatty acid binding sites (Ghuman et al., 2005). Each domain contains ~190 amino acid residues (a.a.r.). In mammals, each domain is stabilized by 5–6 S–S bonds, and the complete polypeptide chain has 17 S–S bonds (Saber et al., 1977). Helical regions of the chain give it conformational flexibility, while the S–S bonds maintain the required rigidity of its spatial structure. Like mammals, fish retain the conserved three-domain organization of albumin, as well as a conserved number of amino acid residues and S–S bonds. In lampreys, all of these parameters increase proportionally (Table 2, Fig. 4).

Fig. 4.

Albumin domains (D) of the polypeptide chain of human albumin (HSA) (RefSeq NP_000468.1) (a) and of Japanese lamprey Lethenteron camtschaticum albumin (GenBank BAF47283.1) (b); positions of cysteines (C77–C193) and S–S bonds in domain D1 of HSA (RefSeq NP_000468.1) (c). Explanations are in the text.

These conserved parameters are accompanied by relatively high sequence similarity at the intragenus and intraorder levels, whereas similarity between representatives of different orders is much lower. Thus, SDS-1 albumin of sea lamprey shares ~77–99% similarity with other albumins of the same species and of Japanese lamprey, but only ~22–24% with mammalian albumins. Sterlet albumin (612 a.a.r.) shares ~81–99% similarity with albumins of Acipenseriformes; ~58, ~49–51, and ~43% with albumins of Lepisosteidae, Polypteriformes, and Salmonidae, respectively; ~28–35% with AFP and AFM of amniotes; and <28% with vertebrate DBP. Atlantic salmon albumin shows the same pattern: high similarity at the intragenus (~78–99%) and intraorder (~64–93%) levels, lower similarity at the interorder level (~41–49% with albumins of gar pike and Acipenseriformes), and minimal similarity with amniote albuminoids (~23–27%) and DBP (~20–22%).

For gar pike albumin, the highest similarity (93%) was likewise found at the intraorder level, with the albumin of alligator gar Atractosteus spatula; similarity was lower with albumins of Acipenseriformes (~58–60%) and Salmoniformes (~43–49%) and minimal with mammalian albumins (~31–33%), amniote AFP and AFM (~26–32%), and DBP (~21–22%).

Sea lamprey albumins occupy a special position: their polypeptide chain is long (1423 a.a.r.), contains seven albumin domains, and has a number of S–S bonds proportional to its size (41), while on the whole retaining similarity to the albuminoids of Gnathostomata. The high intraorder similarity of lamprey albumin sequences, compared with albumins of other vertebrates, is probably explained by their independent origin from a one- or two-domain precursor, unlike the albumins of other vertebrates, which derive from a three-domain precursor (Gray and Doolittle, 1992). The homology between lamprey albumins and the albuminoids of Gnathostomata extends along the entire lamprey amino acid chain (Fig. 5).
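The statement that the number of S–S bonds scales with chain size can be checked with simple ratios of figures quoted in this section: three domains and 17 S–S bonds for human albumin; seven domains, 41 S–S bonds, and 1423 a.a.r. for sea lamprey albumin. The Python sketch below is a back-of-the-envelope calculation under those numbers, not a structural analysis.

```python
# Figures quoted in the text.
HUMAN_DOMAINS, HUMAN_SS_BONDS = 3, 17
LAMPREY_DOMAINS, LAMPREY_SS_BONDS, LAMPREY_LENGTH = 7, 41, 1423


def per_domain(total: float, domains: int) -> float:
    """Average of a whole-chain count over the number of domains."""
    return total / domains


human_ss_density = per_domain(HUMAN_SS_BONDS, HUMAN_DOMAINS)
lamprey_ss_density = per_domain(LAMPREY_SS_BONDS, LAMPREY_DOMAINS)
lamprey_domain_length = per_domain(LAMPREY_LENGTH, LAMPREY_DOMAINS)
# S-S bonds expected if lamprey albumin kept the human per-domain density.
expected_lamprey_ss = human_ss_density * LAMPREY_DOMAINS

print(f"S-S per domain: human {human_ss_density:.2f}, lamprey {lamprey_ss_density:.2f}")
print(f"lamprey a.a.r. per domain: {lamprey_domain_length:.0f}")
print(f"expected lamprey S-S bonds: {expected_lamprey_ss:.1f} (observed {LAMPREY_SS_BONDS})")
```

Both densities (~5.67 and ~5.86 bonds per domain) fall within the 5–6 bonds per domain stated for mammals, the mean lamprey domain (~203 a.a.r.) is close to the ~190 a.a.r. typical of albumin domains, and applying the human density to seven domains predicts ~39.7 bonds against the 41 observed.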

Fig. 5.

Multiple alignment of the albumin sequences of Japanese river lamprey Lethenteron camtschaticum (a) and sea lamprey Petromyzon marinus (b) with their five closest homologs among the albuminoids ALB, VDB, AFP, and AFM of human, mouse Mus musculus, and zebrafish Danio rerio. The sequence length scale is at the top, sequence lengths are on the right, and identifiers are given next to the protein and species names.

Evolutionary Transformations of Albumin and Their Association with Genome/Gene Duplication

Three-domain albuminoids originated from a single-domain precursor. This conclusion was first drawn from analysis of the amino acid sequence of bovine serum albumin, above all from the arrangement of S–S bridges, which repeats three times along the sequence (Brown, 1976). It later became clear that albuminoids of other mammalian species share a similar arrangement of S–S bonds and a three-domain structure (Gorin et al., 1981; Jagodzinski et al., 1981; Morinaga et al., 1983; Minghetti et al., 1985, etc.).

Albuminoids trace back to an ancient gene encoding a “semi-domain” protein (½D) (Gibbs and Dugaiczyk, 1987). This gene had five exons. It gave rise to a seven-exon gene encoding a single-domain (1D) protein, which in turn gave rise to an 11-exon gene encoding a two-domain protein and then to a 15-exon ancestral gene encoding a three-domain protein (the albuminoid precursor, 3D). The ancient gene was elongated through two events of nonhomologous recombination (unequal crossing over) and one event of homologous recombination. The ancestral gene is thought to have been duplicated, giving rise to two paralogous proto-genes (proto-ALB and proto-DBP). Proto-ALB evolved into ALB genes, which gave rise to the AFP and AFM genes, while the vertebrate vitamin D binding protein genes originated from proto-DBP (Eiferman et al., 1981; Ohno, 1981; Sargent et al., 1981; Gibbs and Dugaiczyk, 1987; Gray and Doolittle, 1992; Noel et al., 2010). Early in its evolution, the DBP gene lost two exons (the 12th and 13th) (Nishio and Dugaiczyk, 1996; Malik et al., 2013). DBP has not been found in lampreys (Gray and Doolittle, 1992), but it is present in almost all other vertebrate groups and has preserved its structure well over ~500 million years of evolution (Bouillon et al., 2020). All vertebrates except some teleost fish have the albumin gene; only amniotes have AFP and AFM (Noel et al., 2010).

The gene duplication events that produced the three-domain structure common to albuminoids presumably occurred between the divergence of lampreys from the main vertebrate trunk (~450 million years ago) and the emergence of DBP (Gray and Doolittle, 1992). Lamprey albumin most likely arose through duplication of a gene encoding a one- or two-domain protein, independently of the events that produced three-domain albumin and before the emergence of the DBP gene (Gray and Doolittle, 1992). The elongation of the albumin chain to seven domains took place in the lamprey lineage. By analogy with the model of Gibbs and Dugaiczyk (1987), this elongation could have resulted from nonhomologous recombination in the form of unequal crossing over. Literature data indicate that lampreys have two albumins (AS and SDS-1), presumably encoded by paralogous genes (Filosa et al., 1998). Recent studies indicate a high probability of polyploidization events, independent of Gnathostomata, in the lamprey lineage in the form of hexaploidization and segment duplications (Smith and Keinath, 2015; Nakatani et al., 2021), which indirectly supports the view that lampreys may have paralogous albumin genes.

After the 2R event, albumin evolution from the three-domain precursor followed different scenarios in different branches of Gnathostomata. The coelacanth (Sarcopterygii) and gar pike (Holostei) have a diploid genome typical of vertebrates, similar to the human genome (Ravi and Venkatesh, 2018), and their albumin is encoded by a single gene. In Tetrapoda (Sarcopterygii), duplication events affected the albuminoids. For example, a whole genome duplication occurred in amphibians ~40 million years ago, as a result of which frogs have albumin genes on homeologous chromosomes. Among amniotes, polyploids appeared in the group Sauria (lizards), whereas no stable polyploids have been found in birds and mammals (Vasil’ev, 1985). At the same time, all amniotes have AFP and AFM genes, which are absent in other vertebrates. These genes presumably arose through local (tandem) duplication of the ancestral albuminoid gene, which produced the ancestral AFP/AFM gene (Noel et al., 2010). Another view considers AFM an intermediate between ALB and AFP (Fasano et al., 2017). Thus, albumin evolution in tetrapods proceeded in two directions: (a) the appearance of multiple copies of the albumin gene on homeologous chromosomes in allotetraploids (amphibians) and (b) tandem duplication of the albumin gene, giving rise to the paralogous AFP and AFM genes and forming the ALB/AFP/AFM cluster, which in amniotes lies 1.5 Mb from the DBP gene. AFP/AFM probably appeared after 2R and after the divergence of amphibians and reptiles ~320 million years ago (Noel et al., 2010).

In the other branch of Gnathostomata (Actinopterygii), two copies of the albumin gene were detected in the sterlet (order Acipenseriformes); they probably arose through the genome duplication specific to Acipenseriformes. In the Actinopterygii branch, the TGD event, which gave rise to the taxon Teleostei and stimulated its diversification, occurred ~230–450 million years ago, around the time of the divergence of amphibians and reptiles. As a result of this genome reorganization, the largest groups of teleosts (Acanthopterygii and Ostariophysi) lost the albumin gene, whereas lower teleosts (Salmonidae and Esocidae) acquired additional copies of it.

Based on these data, we propose a scheme of albumin evolution in Agnatha and Gnathostomata (Fig. 6) that incorporates whole genome and segment duplications, recombination events, and duplications of the albumin gene (including those in tetrapods of the group Amniota).

Fig. 6.

Scheme of evolutionary transformations of the albuminoid precursor and of albumin under the influence of WGD in Agnatha (a) and in Gnathostomata before (b) and after (c) the TGD event. The scheme also shows segment and local (tandem) duplications and the loss of the albumin gene. Explanations are in the text.

Evolutionary Fate of Albumin Paralogous Genes after Duplication Events

Susumu Ohno (1970) suggested that, most often, one of the two paralogous genes loses its function after duplication. One reason for this loss may be the accumulation of harmful mutations that lead to degeneration, pseudogenization, and a nonfunctional state of the gene. Pseudogenes retain a certain degree of homology with functional genes but carry changes in organization that prevent normal transcription and translation. Most copies lose their function during the rediploidization that follows a duplication event. Analysis of TGD consequences in multigene Teleostei families indicates a high rate of gene loss after genome duplication (Brunet et al., 2006; Braasch et al., 2016). For example, only ~20% of all duplicated genes are retained in Danio rerio, while the rest have lost their function (Postlethwait et al., 2000); across teleosts as a whole, the retention rate is ~17% (Braasch and Postlethwait, 2012). The absence of the albumin gene in Acanthopterygii and Ostariophysi is probably also an example of gene loss following TGD. This loss is unrelated to the absence of DBP in lampreys. In the first case, albumin genes were probably lost, since albumins were present in all groups of bony Gnathostomata before the TGD event. In the second case, DBP is absent because Agnatha branched off before the emergence of the three-domain precursor of all albuminoids, and hence before proto-DBP appeared.

In some groups of fish (for example, Salmonidae), however, almost half of all duplicated genes are involved in sub- and neofunctionalization (Berthelot et al., 2014). These two scenarios apply to paralogous genes that are retained after duplication events. In subfunctionalization, the functions of the original gene are divided between the paralogs. The duplication–degeneration–complementation (DDC) model (Force et al., 1999) describes one possible scenario for dividing the functions of an ancestral gene between two paralogs. Under this model, mutations accumulate in both copies and reduce the functionality of each, so that only the two copies together reproduce the effect of the single original gene. If the mutations affect regulatory regions, this is reflected in the expression patterns of the paralogs. If they affect protein-coding regions, they can distribute the functions of the original gene between its daughter copies. Examples of these scenarios have been described in a number of experimental works and reviews (Ozernyuk and Myuge, 2013; Kamenskaya and Brykov, 2020; Bayramov et al., 2021; Gu and Xia, 2019; etc.).
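The logic of the DDC model can be sketched by treating the ancestral gene's subfunctions as a set of labels and asking which of them each copy retains. In the Python sketch below, the labels (expression at two hypothetical ontogenetic stages) and the outcome names are assumptions made for illustration only; the sketch models only the loss of subfunctions, so neofunctionalization (gain of a new function) lies outside it.

```python
def ddc_outcome(ancestral: set[str], copy_a: set[str], copy_b: set[str]) -> str:
    """Classify a duplicate-gene pair under a simplified DDC scheme.

    Subfunctions are modelled as labels; the sketch covers only loss of
    ancestral subfunctions, not the gain of new ones.
    """
    if not (copy_a <= ancestral and copy_b <= ancestral):
        raise ValueError("copies may only retain ancestral subfunctions here")
    if copy_a | copy_b != ancestral:
        return "subfunction lost"  # the pair no longer covers the ancestral role
    if copy_a == ancestral or copy_b == ancestral:
        return "redundant"  # one copy suffices; the other may degenerate
    return "subfunctionalized"  # both copies are needed (complementation)


# Hypothetical subfunctions: expression at two ontogenetic stages.
ancestor = {"embryonic", "adult"}
print(ddc_outcome(ancestor, {"embryonic"}, {"adult"}))
print(ddc_outcome(ancestor, {"embryonic", "adult"}, set()))
print(ddc_outcome(ancestor, {"embryonic"}, set()))
```

If an ancestral gene acting at both stages yields one copy active only in the embryo and another only in the adult, both copies are required, which corresponds to the complementation case of the DDC model.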

Applying these scenarios to albuminoids, we note that the three-domain protein encoded by the ancestral gene probably performed a transport function. Duplication of this gene produced two paralogs (proto-ALB and proto-DBP), which gave rise to the albumin and vitamin D binding protein genes. Their subsequent subfunctionalization could have taken the form of specialization in transporting a particular range of ligands: a wide range in the case of albumin and a narrow one in the case of DBP. The narrowing of DBP function may have resulted from its loss of two exons early in evolution. The acquisition by one of the paralogs (albumin) of a new, osmotic function could be an example of neofunctionalization. This could have occurred through increased expression of the albumin gene (probably due to mutations in its regulatory region). High expression produced a high albumin concentration in the plasma, sufficient to create the colloid osmotic pressure of the plasma. Because albumin has the highest plasma concentration of all albuminoids in vertebrates, selection fixed it as the key protein factor controlling the homeostasis of extracellular fluid in vertebrates.

Another example of subfunctionalization concerns albumin in lampreys and tetrapods. The lamprey albumin paralogs SDS-1 and AS divided the function of ligand transport between different stages of ontogenesis (Filosa et al., 1998), in the same way as the paralogous ALB and AFP genes of mammals divided the transport function between the embryonic and postnatal stages of development.

Salmonidae provide one more example of sub- and neofunctionalization: their genomes underwent whole genome and segmental duplications and experienced at least six events of multiple chromosome fusion (Makhrov, 2017). Almost half of the ancestral genes (48%) were retained in Salmonidae as duplicate copies, a significant proportion of which show divergent expression profiles (levels) (Berthelot et al., 2014). This suggests a high probability of mutations in the regulatory regions of the paralogs and indicates that almost half of the paralogous genes in Salmonidae are involved in sub- and neofunctionalization. Albumin paralogs may also be involved in these processes.

After duplication events, genomes go through a period of rediploidization, that is, a return to a diploid state (Ravi and Venkatesh, 2018). This probably explains the discrepancy between the observed and expected numbers of paralogous genes in polyploids. Ramberg et al. (2021) established that in Salmo salar a number of genes encoding enzymes have, on average, three paralogs. The same number of paralogs was found for albumins in S. salar. This number apparently exceeds two because Salmonidae, in addition to 1R and 2R, went through two further whole genome duplications (TGD and SaGD). At the same time, it is lower than the expected value of four, probably because of the rediploidization of the polyploid genome that Salmonidae underwent in the late Cretaceous and Eocene (Gundappa et al., 2021). By contrast, only two, not three, albumin paralogs are found in northern pike, which is probably explained by the single additional duplication event (TGD, beyond 1R and 2R) in the evolution of Esocidae.
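The arithmetic behind these expectations can be written out explicitly. The sketch below assumes, as the single albumin gene of coelacanth and spotted gar suggests, that one albumin gene was present after 1R and 2R, and that each later whole genome duplication doubled it if no copies were lost; the gap between expected and observed counts is then read, tentatively, as the footprint of rediploidization and gene loss. The observed counts are those given in this review; the names `LINEAGES` and `expected_copies` are hypothetical.

```python
# Lineage-specific whole genome duplications after 1R and 2R, and the number
# of albumin genes reported in this review for each lineage.
LINEAGES = {
    "coelacanth": ([], 1),
    "spotted gar": ([], 1),
    "sterlet": (["Acipenseriformes-specific WGD"], 2),
    "northern pike": (["TGD"], 2),
    "Atlantic salmon": (["TGD", "SaGD"], 3),
}


def expected_copies(n_extra_wgd: int, ancestral_copies: int = 1) -> int:
    """Copies expected if every extra WGD doubled the gene and none were lost."""
    return ancestral_copies * 2 ** n_extra_wgd


for lineage, (events, observed) in LINEAGES.items():
    expected = expected_copies(len(events))
    print(f"{lineage}: expected {expected}, observed {observed}, "
          f"missing {expected - observed}")
# Only Atlantic salmon falls short: 4 expected vs. 3 observed.
```

A shortfall of one copy in S. salar but none in pike is consistent with, though it does not prove, the rediploidization explanation given above.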

These examples of different strategies in the evolution of paralogous genes raise a natural question: what factors determine which strategy is followed?

Factors Affecting the Evolutionary Fate of Paralogous Genes

The evolutionary strategy of paralogous genes depends on the mode of duplication (WGD or SD) and the type of amplified DNA fragment (Freeling and Thomas, 2006). In whole genome duplications, DNA is copied as long copies (ohnologs) that unite multiple groups of genes. Genes involved in the regulation of ontogenesis, oogenesis, the cell cycle, signaling cascades, and other processes are retained in this form. Within ohnologs, genes are often combined into so-called “functional modules,” whose encoded proteins are, as a rule, organized into supramolecular functional complexes. Duplication of such modules is thought to produce a jump in the adaptability of organisms. Individual genes that are not combined into functional modules, by contrast, are retained as short DNA copies produced by segmental duplications. This mode of amplification does not produce a jump in adaptability but contributes to the modulation of biological processes. Genes responsible for conserved functions, such as DNA metabolism and nuclease activity, are retained as short copies (Bowers et al., 2003; Papp et al., 2003; Yang et al., 2003; Maere et al., 2005; Freeling and Thomas, 2006). The idea that gene balance shapes the content of long and short DNA copies rests on the observation that balance is maintained in ohnologs, whereas local (for example, tandem) duplications, because of the risk of imbalance, preferentially involve genes encoding monomeric proteins or genes weakly bound on the chromosome (Freeling and Thomas, 2006).

Returning to the albumins of Salmoniformes (Esocidae and Salmonidae), we note that there is no information on whether they are part of functional modules. However, they were found to be stably located within two syntenic groups of five to six genes. One such “pike” group on chromosome 5 is completely reproduced on one of the chromosomes of Atlantic salmon, brown trout, and rainbow trout. Another pike group, on chromosome 13, is completely reproduced on chromosome 18 of the lake char and, as an inverted gene sequence, on salmon chromosome ssa15, where one of the five pike genes is “lost” and replaced by another. Inversions of one or several genes from these two pike syntenic groups, as well as substitutions of one gene for another, also occur in other Salmoniformes. Neither polyploidization nor the intense chromosomal rearrangements typical of Salmoniformes (which were also found in the precursors of teleosts; Braasch et al., 2016) “broke” these groups; they only rearranged them.
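The kind of comparison described here, in which a pike syntenic group is matched against the corresponding group of another species as fully reproduced, inverted, or with one gene replaced, can be sketched as follows. The gene labels are hypothetical placeholders (the actual genes of the pike groups are not listed here), and the classification rule is our own simplification: only the relative order of shared genes is checked, and genes present in only one block are reported as lost or gained.

```python
def compare_syntenic_block(reference: list[str], query: list[str]) -> dict:
    """Compare the gene order of a query block with a reference block."""
    shared = [g for g in reference if g in query]      # shared genes, reference order
    query_order = [g for g in query if g in shared]    # same genes, query order
    if query_order == shared:
        orientation = "collinear"
    elif query_order == shared[::-1]:
        orientation = "inverted"
    else:
        orientation = "rearranged"
    return {
        "orientation": orientation,
        "lost": [g for g in reference if g not in query],
        "gained": [g for g in query if g not in reference],
    }


# Hypothetical five-gene pike block containing albumin ("alb").
pike = ["g1", "g2", "alb", "g4", "g5"]
lake_char = ["g1", "g2", "alb", "g4", "g5"]   # fully reproduced
salmon = ["g5", "g4", "alb", "gX", "g1"]      # inverted, g2 replaced by gX
print(compare_syntenic_block(pike, lake_char))
# {'orientation': 'collinear', 'lost': [], 'gained': []}
print(compare_syntenic_block(pike, salmon))
# {'orientation': 'inverted', 'lost': ['g2'], 'gained': ['gX']}
```

Real synteny analyses also have to handle partial inversions of a few genes within a block, which this sketch simply reports as "rearranged".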

A comparative analysis of the genomes of teleost fish, gar, the Australian ghostshark Callorhinchus milii, and other Gnathostomata showed that short conserved syntenic groups, or blocks, are typical of teleosts (Ravi and Venkatesh, 2018). For single genes that encode monomeric proteins, are not organized into tandem-repeat clusters, and are not part of functional modules, another pattern was established: they are lost more often than related genes within the functional modules of ohnologs (Freeling and Thomas, 2006). A similar scenario may account for the loss of the albumin gene in Ostariophysi and Acanthopterygii.

WGD gave teleost fish a global jump in evolutionary development, whereas “finer” tuning of the mechanisms that maintain balanced operation of the doubled genome (and, ultimately, the homeostasis of physiological functions) can be achieved through segmental duplications and duplications of individual genes.

Problem of Compensating Albumin Function in the Groups Acanthopterygii and Ostariophysi

Given the role of serum albumin as one of the key factors of physiological homeostasis in higher vertebrates, one might expect its loss in lower vertebrates to have adverse consequences. However, Acanthopterygii and Ostariophysi are among the largest and most evolutionarily advanced groups of vertebrates, which suggests that other plasma proteins took over the osmotic and transport functions of albumin in them.

Analysis of the protein composition of the blood plasma in teleost fish, including Acanthopterygii and Ostariophysi, revealed a wide spectrum of proteins with a strongly negative charge that can bind inorganic cations and water dipoles and thus exhibit osmotic activity. The osmotically active protein fraction in the blood plasma of both albumin-containing and “albumin-free” teleost fish was found to comprise not one but several proteins: (1) albumin (in albumin-containing species), (2) the warm-temperature-acclimation-associated 65 kDa protein, which has hemopexin-like properties, (3) inhibitors of serine and cysteine proteinases, (4) apolipoproteins ApoA-I and Apo-14 as components of high-density lipoproteins, and (5) a multicomponent additional anodal fraction of low-molecular-weight proteins with a highly negative surface potential (Andreeva, 2019, 2020, 2021). The latter fraction contains not only “true” plasma proteins (extracellular proteins with specific functions in the circulatory system) but also so-called “transit” proteins, whose pool is constantly replenished by intracellular proteins entering the blood as a result of cell destruction (Andreeva, 2021).

The proportion of osmotically active plasma proteins in teleost fish of the above groups reaches ~50–60% of the total protein, which is comparable to the albumin content of mammalian plasma. This suggests that the contribution of these proteins to the colloid–osmotic pressure of the plasma in “albumin-free” teleost fish is comparable to that of albumin in mammals. The transport function of the blood plasma was not impaired in Acanthopterygii and Ostariophysi either: a comparison of fatty acid transport by albumins and by high-density lipoproteins suggests that lipoprotein particles perform this function more efficiently (Andreeva, 2019).

Reasons for the Evolutionary Success of Polyploids: The Example of Teleostei, or Why the Loss of Albumin Went Unnoticed in Teleost Fish

The main changes that distinguish Teleostei from more ancient ray-finned fish (and apparently determined their evolutionary success) affected locomotion, feeding, and fertility (Romer and Parsons, 1986). Teleostei surpass other vertebrates in the diversity of body shape, lifestyle, and type of feeding and are distinguished by high fertility. Since teleost genomes passed through an additional round of WGD and taxon-specific events (SaGD, CyGD, etc.), the progressive traits of this group are largely associated with genome duplications. HOX genes occupy a special place among the duplicated genes. After the separation of Acipenseriformes and Semionotiformes, the four HOX gene clusters were duplicated in the precursors of teleosts. This event is assumed to have affected morphogenetic processes and organism morphology and to have caused the radiation of the ancestral group that gave rise to a new taxon (Ozernyuk and Myuge, 2013; Soshnikova et al., 2013).

Genes encoding transcription factors are another group whose duplications had important consequences for the evolution of Teleostei. They belong to various multigenic families (Sox, Bmp, Gtf3a, Hif, HNF1, etc.) involved in regulating the expression of hundreds of other genes, in signaling cascades, and in controlling embryogenesis, ontogenesis, oogenesis, the cell cycle, and other processes (Ozernyuk and Myuge, 2013; Pelster and Egg, 2018; Rojo-Bartolome et al., 2020). Genes encoding proteins of intermediary metabolism undoubtedly also contributed to the evolutionary success of Teleostei. First and foremost, these are proteins involved in lipid metabolism, which underlies the energy metabolism of fish. A special role belongs to transport proteins, including albuminoids, that supply tissue metabolic processes with all the required ligands (Sharma et al., 2006).

The state of rRNA gene clusters is important for the survival of polyploid organisms, since the expression of duplicated genes is limited by the protein-synthesizing capacity of ribosomes, which depends on how efficiently the pools of the different rRNA fractions are replenished. A comparison of the expression of these gene clusters in allotetraploid (4n = 200) and diploid (2n = 100) goldfish Carassius auratus revealed no clear correlation, although differences in the expression of 5S and 45S rRNA were noted (Zhao et al., 2021). The effect of duplication events and chromosome rearrangements on the redistribution, accumulation, and homeostasis of rRNA, which provide the required expression level of duplicated genes in demand, was demonstrated in Danio rerio and polyploid Perciformes (Cioffi et al., 2015; Rojo-Bartolome et al., 2020). Unlike teleost fish and other eukaryotes, the Acipenseridae family shows multiple alleles of the 18S rRNA gene (presumably associated with the polyploidization of this group), as demonstrated in nine North American acipenserid species (Krieger et al., 2000, 2006; Krieger and Fuerst, 2002). Significant individual variation of the 18S rRNA gene is absent in most other species for which its sequences have been determined. The authors suggest that the high polymorphism of these sequences in Acipenseridae may be associated with a low rate of concerted evolution in this group, which slows the loss of any polymorphic variation that arises. By the same reasoning, the absence of similar variability in teleost fish may be explained by the rapid pace of evolutionary transformation in this group (Ravi and Venkatesh, 2018).

CONCLUSIONS

Our search suggests that the scenarios of albumin gene evolution differ between jawless and jawed vertebrates. In the lamprey lineage, albumin evolved under the influence of one or two rounds of WGD and independent events of hexaploidization and segmental duplications. Having originated not from the three-domain precursor but from shorter one- or two-domain proteins, lamprey albumins became twice as long in amino acid sequence as the albumins of Gnathostomata. The two lamprey paralogs, which share the ligand-transport function between juvenile and mature individuals, closely resemble the paralogs of amniotes (albumin and alpha-fetoprotein), which share the transport function of the original gene between the embryonic and postnatal stages. Jawed fish unaffected by the third whole genome duplication (coelacanth and spotted gar) have, like mammals, one albumin gene. By contrast, two albumin paralogs were found in the sterlet, whose evolution included, in addition to the first rounds of WGD, an Acipenseriformes-specific whole genome duplication. In teleost fish, the third round of WGD and taxon-specific whole genome duplications led, on the one hand, to the appearance of albumin paralogs in lower Teleostei and, on the other, to the loss of albumin in the largest and most evolutionarily advanced groups, Acanthopterygii and Ostariophysi, whose albuminoid repertoire consequently “narrowed” to a single protein (DBP).

Because some of the analyzed species have sequenced genomes that lack annotation or have only weakly supported annotation, the search results cannot be considered final. However, extensive literature data on the identification of proteins in the plasma and tissue filtrates of model Acanthopterygii and Ostariophysi species confirm the absence of albumins from their plasma proteomes (Dietrich et al., 2014, 2021; Vilchez et al., 2016; Banerjee et al., 2017; Schrama et al., 2017; etc.). Nor were albumins found in a number of nonmodel species whose proteins show a high level of identity with those of model Acanthopterygii and Ostariophysi (Andreeva et al., 2015, 2017, 2019). These facts allow us to regard the loss of albumin genes as an event that extends beyond the single family Cyprinidae.

The conserved organization of albumin polypeptide chains in all groups of Gnathostomata (preservation of chain length, number of domains, and S–S bonds, as well as the proportional preservation of these features in the longer albumins of lampreys) is probably explained by the fact that a protein organized in this way successfully performs its transport and osmotic functions. All these conserved traits are maintained by selection despite a relatively low amino acid sequence similarity at the interorder level, which rarely exceeds 40%. At the same time, the conserved organization of the polypeptide chain is maintained alongside a high turnover of introns: we found no cases in which the lengths of corresponding introns coincided between paralogous genes or between albumin orthologs. Albumin paralogs in the sterlet and lower teleost fish show dynamic intron turnover (the lengths of their corresponding introns and DNA repeats do not coincide). By contrast, a deficit of intron gain and turnover has been reported for mammalian genes with conserved protein-coding regions (Roy et al., 2003). Since introns also contain regulatory elements, their high turnover in the albumin genes of teleost fish probably contributed to the sub- and neofunctionalization of these genes.

The other evolutionary scenario of albumin in teleost fish, its loss following WGD, did not prevent the Teleostei from developing and conquering new ecological niches. Against a background of dynamic genome transformation, frequent intra- and interchromosomal rearrangements, an accelerated rate of evolution of protein-coding sequences, and a higher turnover of introns and regulatory elements than in other vertebrates (Ravi and Venkatesh, 2018), the loss of albumin in Teleostei went unnoticed, and other osmotically active proteins took over its functions.