Introduction

One of the most fundamental questions in evolutionary developmental biology (evo-devo) is how the evolution of metazoan genomes led to the astonishing diversification of body forms that we observe on Earth and in the ocean (Mora et al. 2011). Among the bilaterians, vertebrates have unique anatomical features and possess the greatest number of cell and tissue types (True and Carroll 2002). It has been speculated that the invention of genes with new functions underlay the increase in developmental, morphological, and metabolic complexity during early vertebrate history (Carroll 2001). In particular, in the early years of genomic research, before large-scale animal genome sequence data became available, it was suggested on the basis of rather inaccurate indicators such as genome size and gene number that whole-genome duplications (WGDs) generated a large amount of raw material that prompted the evolution of novelty and complexity within a short time during early vertebrate history (Ohno 1970; Ohno 1973). This notion, popularly known as the “2R hypothesis” (two rounds of WGD), has been widely debated (Abbasi 2008; Hughes 2001; Hughes and Friedman 2003). In the early years of the genomic era, the 2R hypothesis remained inconclusive owing to the limited availability of sequence data, the sensitivity of computational programs to noisy data, and a lack of methodological rigor (Durand 2003).

HOX paralogon and the history of vertebrate genome

Relying on partial datasets, initial investigations presented several lines of evidence in favor of this hypothesis. The most important is the occurrence of potentially quadruplicated HOX (HSA 2/7/12/17), MHC (HSA 1/6/9/19), and FGFR (HSA 4/5/8/10) regions in human and other mammalian genomes (Larhammar et al. 2002; Panopoulou and Poustka 2005). Among these large quadruplicated regions, the human HOX clusters and their nearby genes are probably the most cited example, and their organization was taken as strong support for ancestral tetraploidy in vertebrates (Furlong and Holland 2002). However, over the past few years, with the rapidly increasing availability of complete genome sequences from a diverse set of animal species, the evolutionary history of human HOX-bearing chromosomes has been subjected to rigorous scrutiny (Abbasi 2010b; Abbasi and Grzeschik 2007; Ambreen et al. 2014; Asrar et al. 2013; Hughes et al. 2001). Four major steps were applied in the large-scale computational analysis of gene duplications: (1) identification of multigene families with at least threefold representation on HSA 2/7/12/17, (2) construction of gene family trees, (3) estimation of the timing of duplications relative to a phylogeny of organisms, and (4) tests of phylogenetic consistency. Combining the results of these methods, a total of 62 families provided results that were inconsistent with the 2R hypothesis. Rather, it appeared that members of these families were created by segmental duplications, independent gene duplications, and translocation events scattered at different times over the history of animals.
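Step (1) of this analysis can be illustrated with a minimal Python sketch. It is not the pipeline used in the cited studies. It assumes that "threefold representation" means that a family has members on at least three of the four HOX-bearing chromosomes, and the function name, record layout, and family labels (famA, famB, famC) are hypothetical placeholders rather than real annotation data.

```python
from collections import defaultdict

# Human HOX-bearing chromosomes named in the text (HSA 2/7/12/17).
HOX_CHROMOSOMES = {"2", "7", "12", "17"}


def families_with_threefold_representation(gene_records, min_chromosomes=3):
    """Return families with members on at least `min_chromosomes`
    of the HOX-bearing chromosomes.

    gene_records: iterable of (family, gene, chromosome) tuples.
    Assumption: representation is counted per distinct chromosome,
    not per gene copy.
    """
    chromosomes_by_family = defaultdict(set)
    for family, _gene, chromosome in gene_records:
        if chromosome in HOX_CHROMOSOMES:
            chromosomes_by_family[family].add(chromosome)
    return sorted(
        family
        for family, chromosomes in chromosomes_by_family.items()
        if len(chromosomes) >= min_chromosomes
    )


# Toy records with placeholder names (not real gene locations).
toy_records = [
    ("famA", "a1", "2"), ("famA", "a2", "7"), ("famA", "a3", "17"),
    ("famB", "b1", "2"), ("famB", "b2", "2"), ("famB", "b3", "X"),
    ("famC", "c1", "12"), ("famC", "c2", "17"),
]
candidate_families = families_with_threefold_representation(toy_records)
```

In this toy input only famA passes the filter: famB has two copies on a single HOX-bearing chromosome, and famC is present on only two of the four. Families retained at this stage would then go forward to tree construction and the timing and consistency tests of steps (2) to (4).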

Reconstructing the history of mammalian HOX clusters

In addition to resolving the history of human HOX-bearing chromosomes, this large-scale phylogenetic investigation of human multigene families has revealed the duplication history of the HOX clusters themselves (Abbasi 2010b; Abbasi and Grzeschik 2007; Ambreen et al. 2014; Asrar et al. 2013). Reconstructing the duplication history of the four mammalian HOX clusters has been problematic because the sequence variation among the clusters is not sufficient to resolve the cluster duplication events (Bailey et al. 1997; Zhang and Nei 1996). Zhang and Nei (1996) analyzed the phylogeny of mammalian HOX clusters and proposed two alternative topologies: (((HOXC HOXD) HOXA) HOXB) and ((HOXC HOXD) (HOXA HOXB)). The former topology suggests three separate regional duplication steps (1 → 2 → 4 HOX clusters), whereas the latter favors two rounds of whole-genome duplication (2 + 2 topology).
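The difference between the two topologies can be made concrete with a short Python sketch. It encodes each rooted tree as nested tuples and classifies it by the sizes of the two groups at the root. The function names and the tuple encoding are illustrative choices, not part of the cited analyses.

```python
def leaves(tree):
    """Return the leaf labels of a tree encoded as nested tuples."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def classify_four_cluster_topology(tree):
    """Classify a rooted, bifurcating tree of four clusters.

    A 1 + 3 split at the root that nests further corresponds to
    sequential (step-wise) duplications; a 2 + 2 split is the shape
    expected under two rounds of whole-genome duplication.
    """
    left, right = tree
    sizes = sorted((len(leaves(left)), len(leaves(right))))
    if sizes == [2, 2]:
        return "symmetric (2 + 2)"
    if sizes == [1, 3]:
        return "nested (sequential)"
    raise ValueError("expected a rooted tree with exactly four leaves")


# The two alternative topologies discussed by Zhang and Nei (1996).
nested_tree = ((("HOXC", "HOXD"), "HOXA"), "HOXB")
symmetric_tree = (("HOXC", "HOXD"), ("HOXA", "HOXB"))

nested_label = classify_four_cluster_topology(nested_tree)
symmetric_label = classify_four_cluster_topology(symmetric_tree)
```

Here `nested_label` is "nested (sequential)" and `symmetric_label` is "symmetric (2 + 2)". The same check could in principle be applied to the trees of linked gene families, since congruent nested trees across families are what a step-wise duplication model predicts.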

Multigene families linked to HOX clusters share the same history as the HOX clusters

It was hypothesized that multigene families closely linked to the four human HOX clusters share the same evolutionary history as the HOX sequences themselves and are therefore good candidates for resolving the cluster duplication history (Bailey et al. 1997). Taking advantage of the well-annotated, high-quality human genomic sequence map, intragenomic conserved synteny was explored around the four human HOX clusters. This survey of human HOX-bearing loci pinpointed triplicate or quadruplicate syntenic association of paralogs from at least four distinct gene families (SP, HNRNPA, ITGA, and FMNL) with the HOX clusters (Fig. 1). The close physical linkage of members of these families with each of the human HOX clusters makes them an interesting test case for evaluating the cluster duplication history (Ambreen et al. 2014). The current availability of an immense amount of protein sequence data from an expanding range of vertebrate and invertebrate species was exploited to conduct a robust and thorough phylogenetic investigation (Fig. 1) (Abbasi 2010b; Abbasi and Grzeschik 2007; Ambreen et al. 2014; Asrar et al. 2013). Intriguingly, results from each of these families provide strong bootstrap support for a tree in which the HOXB cluster branched off first, followed by HOXA, and finally HOXC/D (Fig. 1). This would be expected if the human HOX clusters and members of the SP, HNRNPA, ITGA, and FMNL families (HOX paralogous regions) evolved in conjunction through three independent segmental duplication (SD) events (Fig. 2).

Fig. 1

History of gene syntenic segments spanning human HOX loci. Upper panel schematically depicts the human fourfold paralogy regions spanning HOX loci on HSA2/7/12/17. Paralogous gene sets are color-coded similarly. The “×” symbol denotes putative gene losses. Lower panel depicts the schematic NJ tree topologies of the multigene families constituting the human HOX paralogy block. A congruent-type (AB)(C)(D) topology for the HOX, ITGA, HNRNP, SP, and FMNL families suggests that human HOX paralogy regions were quadruplicated by three rounds of segmental duplication events. HOX genes are depicted in red and designated 1 through 14. Color codes for non-HOX genes are as follows: Itga (Integrin alpha), light blue; Fmnl (formin-like), purple; Sp (Transcription factor family SP), green; Evx (even-skipped homeobox), light green; Meox (mesenchyme homeobox), dark blue; Hnrnp (heterogeneous nuclear ribonucleoprotein), pink. None of the features of this figure are drawn to scale

Fig. 2

Model of HOX locus evolution in vertebrates. A model for the evolutionary history of the human HOX gene clusters is proposed based on the phylogenetic history of the HOX gene clusters and unrelated neighboring genes. Left panel displays the inferred events in detail, whereas the right panel is a summary flow chart. The vertebrate ancestor contained a single coherent HOX gene cluster (depicted in red). Based on comparative syntenic analysis, we inferred the minimal block of synteny spanning the ancestral HOX gene cluster. This ancestral HOX gene cluster, along with at least six neighboring genes (HOX locus), underwent three rounds of segmental duplication (SD) and translocation events that generated four copies of the ancestral HOX locus. The HOX B locus was the first to diverge, followed by HOX A, and finally the split of HOX C and HOX D occurred. These three SD events are depicted sequentially from top to bottom in both the right and left panels. Gene losses after each SD event are also shown. Color codes in this figure are the same as described in Fig. 1. None of the features of this figure are drawn to scale

Minimal HOX locus in the Urbilaterian

The prediction of HOX cluster evolution from interchromosomal syntenic association and tree topology comparison requires that the constituent genes were already linked before the origin of vertebrates. To test this assumption, a survey of the Drosophila (fruit fly) and recently sequenced Tribolium (red flour beetle) HOX loci was conducted (Richards et al. 2008). Comparative genomic data from fruit fly, red flour beetle, and human made it possible to reconstruct the genic content of the minimal HOX locus in the Urbilaterian (the common ancestor of bilaterians) (Fig. 3). This reconstruction shows that canonical HOX genes were clustered with SP, HNRNPA, ITGA, and FMNL genes at the Urbilaterian HOX locus (Fig. 3). Given the relatively well-preserved genomic organization of the HOX loci in the three distantly related genomes analyzed, the approximate ancestral order of the genes is also predicted (Fig. 3). Thus, intergenomic synteny analysis lends support to the hypothesis, based on intragenomic synteny and phylogenetic data, that these genes duplicated in concert with the HOX clusters through SD events (Fig. 2). It is notable that SP, HNRNPA, ITGA, and FMNL are novel examples of genes that were present on the same genomic segment as the HOX cluster in the last common ancestor of bilaterians. Such an arrangement implies that these genes from distinct families have remained linked with HOX clusters over hundreds of millions of years of evolution, which suggests a functional significance. These associations might reflect ancient cis-regulatory constraints, either because multiple genes share cis-regulatory elements or because the cis-elements of developmental regulators are often located at large distances from the transcriptional start site of the gene upon which they act (Parveen et al. 2013).
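The logic of this reconstruction can be sketched in a few lines of Python. The sketch assumes a simple parsimony rule that is not taken from the cited studies: a gene family is inferred to have been linked to the ancestral HOX locus if it is linked to a HOX locus in at least one protostome and at least one deuterostome. The function name and the family labels (fam1 to fam4) are hypothetical placeholders, not observed data.

```python
def infer_ancestral_linked_families(linked_by_species, clade_of):
    """Return families linked to HOX loci in every clade represented.

    linked_by_species: dict mapping species -> set of families found
        next to a HOX locus in that species.
    clade_of: dict mapping species -> clade label.
    Assumption: linkage seen on both sides of the deepest split is
    inherited from the common ancestor rather than gained twice.
    """
    clades_by_family = {}
    for species, families in linked_by_species.items():
        for family in families:
            clades_by_family.setdefault(family, set()).add(clade_of[species])
    required_clades = set(clade_of.values())
    return sorted(
        family
        for family, clades in clades_by_family.items()
        if clades == required_clades
    )


# Toy input with placeholder family names (not real linkage data).
clade_of = {
    "fruit fly": "protostome",
    "red flour beetle": "protostome",
    "human": "deuterostome",
}
linked_by_species = {
    "fruit fly": {"fam1", "fam2"},
    "red flour beetle": {"fam1", "fam3"},
    "human": {"fam1", "fam3", "fam4"},
}
ancestral_families = infer_ancestral_linked_families(linked_by_species, clade_of)
```

With this toy input, fam1 and fam3 are inferred to be ancestral, whereas fam2 (protostome only) and fam4 (deuterostome only) are not. Inferring the ancestral gene order, as opposed to gene content, would additionally require positional information that this sketch does not model.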

Fig. 3

Cladogram depicts HOX locus architecture for representative animals. The putative organization of the genic content of the bilaterian (protostome and deuterostome) HOX locus is shown at the base. The gene content and order of the Urbilaterian (ancestral bilaterian animal) HOX locus were inferred by comparative gene order analysis of extant representatives of protostome (Drosophila and Tribolium) and deuterostome (quadruplicated human HOX loci) animals. The left branch depicts the fragmented HOX loci of Drosophila and Tribolium. The right branch portrays the four coherent human HOX loci. Color codes in this figure are the same as described in Fig. 1. None of the features of this figure are drawn to scale

Ancient vertebrate genome was shaped by small-scale events

Did the paralogy regions on human HOX-bearing chromosomes arise through 2R at the origin of vertebrates? Were the HOX clusters themselves the products of ancient whole-genome duplications? These questions were hotly debated, but a definitive answer remained elusive because of the paucity of genomic sequence data in the early years of the genomic era. However, the recent availability of high-quality whole-genome sequence data from a diverse set of animal species, together with accurate gene prediction and annotation pipelines, makes it possible to test whether the human HOX-bearing chromosomes and the HOX clusters themselves are remnants of two rounds of polyploidy in vertebrate ancestry (Abbasi 2010a). Indeed, the combined application of the map self-comparison approach and comparison with a preduplication species, together with robust and thorough phylogenetic analysis of human multigene families, provides an unprecedented opportunity to gain insight into the mechanisms that contributed to the evolution of the ancestral vertebrate genome. The results are not consistent with the WGD hypothesis. Instead, it appears that the paralogy blocks residing on human HOX-bearing chromosomes and the HOX clusters themselves resulted from small-scale events, which include segmental duplications, independent gene duplications, and translocations. Furthermore, this work discounts the contention that a burst of gene duplication activity took place in early vertebrate history after the divergence of the invertebrate lineage. It appears that the current hierarchy of the human proteome was created by small-scale events scattered at different times over the evolutionary history of animal life.

Conclusion and future perspectives

Over the past few years, a growing body of evidence has suggested that the apparently higher morphological and developmental complexity seen in modern vertebrates must be accounted for by innovations other than the addition of new genes by duplication. These include: (a) evolution of alternative protein domains in ancient genes through changes in parts of protein-coding sequences that are not essential for the current function, (b) alternatively spliced forms of the same gene, (c) evolution of transcriptional regulation, leading to expression of ancient proteins in novel tissues or developmental compartments/stages, (d) origin of new chimeric genes through exon shuffling, and (e) de novo emergence of new coding sequences and cis-regulatory elements from non-coding genomic sequences. The challenge for future studies is to extend beyond the traditionally well-studied mechanism of gene duplication and to portray a comprehensive view of the interplay of all the aforementioned mechanisms that drove vertebrate evolution during early vertebrate history.