Introduction

The gap gene network of the fruit fly Drosophila melanogaster is one of the most thoroughly studied developmental gene regulatory networks. Hundreds of publications describe genetic and molecular analyses of gap genes, their expression, their regulation, and their regulatory effects on downstream targets. Yet we are still far from a complete understanding of the network's pattern-forming and regulatory capacities, not to mention its evolutionary history.

Gap genes have attracted the interest of developmental, evolutionary, and systems biologists for three main reasons: First, they play a key role in patterning the early embryo. The gap gene system implements the most upstream regulatory layer of the segmentation gene network, which determines both the position and the identities of body segments [1, 2]. It solves a fundamental problem of embryonic patterning: how to establish discrete territories of gene expression based on regulatory input from a long-range protein gradient [3–5]. Such gradient-based patterning occurs in most multi-cellular organisms studied so far (see [6–10] for recent reviews).

Second, gap genes played a crucial role during the evolution of segment determination. While most segmented animals—arthropods, annelids, and vertebrates—add segments to their body sequentially during growth, some higher insects have evolved a mode of segment determination in which segments form by simultaneous subdivision of the embryo. This is called the long-germband mode of segment determination. It appears to have evolved many times independently [11, 12], a process which probably involved the recruitment of gap genes into the segmentation gene network [13, 14].

Finally, the gap gene network has become one of the few examples of a developmental gene network that can be studied using data-driven mathematical modeling. Such modeling studies have allowed us to reconstruct the regulatory structure of the gap gene network in silico, to assign particular patterning functions to each regulatory interaction, and to study regulatory feedback based on gap gene cross-regulation in the intact, wild-type system [15–24]. These analyses establish the gap gene network as a model system for the quantitative study of the developmental and evolutionary dynamics of pattern-forming processes.

In this review, I summarize what is known (and what is not) about regulation of gap genes. The information presented here is predominantly based on genetic and molecular evidence. In addition, I have included evidence from selected mathematical models, if (and only if) those models closely adhere to experimental data, and provide specific biological predictions or insights into gap gene regulation. A comprehensive historical review of modeling the Drosophila blastoderm is provided elsewhere [25].

The review is structured as follows: After a brief introduction to segmentation, maternal inputs to the gap genes, and the dominant (but inaccurate) conceptual framework traditionally used to interpret pattern formation in the early Drosophila embryo, I describe phenotypes, expression, and regulation of gap genes in separate sections. This is followed by brief sections summarizing the molecular nature of gap gene regulation, the issue of precision of gap gene expression, as well as gap gene evolution. Since this is a work of reference, not all of these sections need to be read in sequence. Each section is designed to be understandable without the others. Readers interested in specific aspects of gap gene regulation are encouraged to skip ahead to those parts of the review that are relevant to them.

Segmentation genes and segment determination

The gap gene network is involved in segment determination during early embryogenesis. As mentioned above, body segments can be determined in two ways: either they are formed sequentially, by adding them to the posterior end of a growing embryo (short-germband development), or (more or less) simultaneously, by subdividing an embryo into equally sized sub-domains (long-germband development). While vertebrates, annelids, and most arthropods use the former mode of segmentation, insects show both types (including many intermediates between the two extremes; see [11, 12, 26, 27] for review).

Early insect development typically proceeds through syncytial cleavage and blastoderm stages (Fig. 1a) [11, 28, 29]. During these early stages, nuclei divide rapidly and almost simultaneously without becoming separated by cell membranes. Each nucleus is surrounded by microtubule-rich cytoplasm, with which it forms a unit called an energid. Towards the end of the cleavage stage, most energids start to migrate. After a number of cleavage divisions (nine in Drosophila), they arrive at the surface of the embryo to form the syncytial blastoderm, a peripheral layer of nuclei lying within a zone of yolk-free periplasm. At this stage, embryos are most conveniently classified by the number of nuclear divisions: cleavage cycle n corresponds to the period between mitosis n − 1 and mitosis n [30]. These cycles become progressively longer during the blastoderm stage (from about 10 to 50 min between cycles 10 and 14A in Drosophila; [30, 31]). The embryo then becomes cellularized through invagination of cell membranes between nuclei. Subsequently, gastrulation starts, during which the three germ layers (ecto-, endo-, and mesoderm) are formed. This is followed by extension and retraction of the germband. Tissue rearrangements occur mainly during and after gastrulation.

Fig. 1

Segment determination in Drosophila. a The first 3 h of development of Drosophila melanogaster. Numbers indicate cleavage cycle number, where cycle n covers the time between mitosis n − 1 and mitosis n. The blastoderm stage lasts from 1 min into cycle 10 to the onset of gastrulation (grey background). The embryo remains syncytial (without membranes between nuclei) until cellularization occurs during cycle 14A. The cellular blastoderm stage is more or less instantaneous, since gastrulation begins immediately after cellularization is complete. Cycle 14B denotes the part of cycle 14 that occurs after the onset of gastrulation. Embryos are shown with the anterior pole to the top. b The regulatory hierarchy of the Drosophila segmentation gene network. Segment determination is based on a molecular pre-pattern established by the segmentation genes, which are active during the blastoderm stage. Different regulatory tiers of the network can be distinguished based on mutant phenotypes, epistatic interactions, and expression patterns. Maternal co-ordinate genes are expressed in broad gradients (Bcd protein distribution is shown as an example). They regulate the zygotic gap genes, expressed in broad overlapping domains (the central domain of Kr is shown). Maternal co-ordinate and gap genes together regulate pair-rule genes, which are expressed in 7–8 stripes (shown for Even-skipped (Eve) protein). Pair-rule genes in turn regulate segment-polarity genes whose expression in 14 stripes becomes established just before the onset of gastrulation (shown for en mRNA). These stripes constitute the segmentation pre-pattern and correspond to the positions of parasegmental boundaries later in development. Arrows indicate regulatory interactions between classes of segmentation genes. Circular arrows represent cross-regulation within a class. Embryo images are shown with anterior to the left, and dorsal up (see text for details).
a is reproduced with permission from the Journal of Cell Science: http://jcs.biologists.org [30]. b Embryo images (Bcd, Kr, and Eve) are from the FlyEx database [164, 166]. The image of en is courtesy of Carlos E. Vanario-Alonso

Short- versus long-germband modes of development are reviewed in [11, 12]. In most short- and intermediate-germband insects, the blastoderm embryo occupies only a small fraction of the egg (the remainder consists of yolk and extra-embryonic tissue). A number of anterior segments become determined during the blastoderm stage, while posterior segments are added after gastrulation. In contrast, most long-germband embryos take up a large proportion of the egg, and segment determination occurs before the onset of gastrulation. No tissue growth is involved in this process. The morphological formation of segments occurs much later in development; segmental boundaries are clearly visible at the extended germband stage.

The first systematic molecular study of the process of segment determination was carried out in the fruit fly Drosophila melanogaster. Like all dipterans, Drosophila is a long-germband insect [12]. In vitro culture and transplantation experiments established that segment determination occurs at the blastoderm stage [32, 33], 1.5–3 h after egg laying (AEL) [30]. In the late 1970s, methods were developed to saturate the genome of Drosophila with mutations, and to efficiently select for segmentation phenotypes among the mutant progeny [34]. This led to the identification of several dozen genes involved in axis patterning and segmentation [35–37]. The resulting mutant phenotypes were easily classifiable into distinct groups: Mutations affecting the minor (dorso-ventral, D–V) embryonic axis rarely affected patterning along the major (antero-posterior, A–P) axis and vice versa. Zygotic mutants in A–P patterning could be further subdivided into those lacking entire regions of the embryo (gap), those missing every other segment (pair-rule), and those affecting polarity within segments (segment-polarity genes). Screens for maternal factors affecting segmentation uncovered an additional class involved in A–P patterning: the maternal co-ordinate genes [38]. These genes can be subdivided into anterior, posterior, and terminal maternal systems, depending on the regions of the embryo that are affected in the corresponding mutants.

In the decade after the initial screening efforts, segmentation genes were cloned and analyzed molecularly (reviewed in [1, 2]). They encode transcriptional or translational regulators, or proteins involved in signal transduction. Genetic analyses of their epistatic relationships revealed that these factors form a complex hierarchical network of regulatory interactions. The distinct groups of phenotypes correspond to distinct layers in the regulatory hierarchy of the network (Fig. 1b): maternal co-ordinate genes regulate gap genes; both of them jointly regulate pair-rule genes, which in turn regulate the initial expression of segment-polarity genes. In addition, all classes of segmentation genes show cross-regulation. In contrast, products of genes in the lower tiers of the network do not regulate genes in the layers above. Pair-rule genes, for example, do not regulate gap genes and so on.

At the same time, segmentation gene expression patterns were visualized by in situ hybridization and antibody staining. These studies revealed that genes in each layer of the segmentation gene network are expressed in similar patterns, which are clearly distinguishable from those of genes in other layers (Fig. 1b). The protein products of the maternal co-ordinate genes form long-range gradients along the A–P axis. Gap genes are expressed in broad, overlapping domains about 10–20 nuclei wide. The first periodic expression patterns occur at the level of the pair-rule genes, which are expressed in seven to eight stripes, each being about four nuclei wide. Segment-polarity genes show expression in 14 narrow stripes, which form a molecular pre-pattern involved in positioning the morphological segment boundaries later in development. This occurs through the formation of parasegment boundaries (tissue compartment boundaries between cells expressing distinct segment-polarity genes that no cells can cross), which are phase-shifted with regard to the morphological segmental boundaries [39–41]. At the same time, segment identity is established by the expression of homoeotic (Hox) genes during the late blastoderm stage [42]. Hox gene expression is regulated by maternal co-ordinate and gap genes.

Maternal systems, gradients, and the French Flag paradigm

Gap genes receive their initial regulatory inputs from three sub-groups of maternal co-ordinate genes. The anterior and posterior maternal systems are based on long-range gradients of maternal proteins along the A–P axis.

During oogenesis, the mRNA of bicoid (bcd) is localized to the anterior pole of the embryo by other components of the anterior system, such as the protein products of swallow (swa), exuperantia (exu), and staufen (stau) [43, 44]. After fertilization, bcd mRNA spreads further posterior, forming a gradient along the A–P axis [43, 45]. Bcd protein is thought to diffuse from its predominantly anterior source to form an exponential anterior-to-posterior gradient (Fig. 2a) [46–48]. Bcd has been shown to regulate zygotic target genes in a concentration-dependent manner [49–56]. In addition, it represses translation of the ubiquitous maternal caudal (cad) mRNA, establishing a posterior gradient of Cad protein [57–61]. This gradient spans the middle third of the embryo while Cad is present at uniformly high levels in more posterior regions (Fig. 2a).

Fig. 2

Maternal gradients and French Flags. a–c Three maternal systems regulate the expression of gap genes: a The anterior system is based on the Bcd gradient, which regulates gap gene transcription in a concentration-dependent manner and also establishes the posterior gradient of Cad through translational repression. b The posterior system is based on the Nos gradient, whose only function is to repress the translation of maternal hb mRNA in the posterior region of the embryo to form an anterior Hb protein gradient. c The terminal system is based on Tor signaling from both terminal ends of the embryo, which induces the expression of the terminal gap genes tll and hkb at both poles of the embryo. Expression profiles are based on integrated data from the FlyEx database [164, 166], except for Nos, which is illustrated by a mirrored Bcd gradient due to the absence of quantitative Nos expression data. d Wolpert’s French Flag model: A morphogen is produced at a source (shown in green), diffuses through the tissue (without protein degradation) and is degraded at a sink (pink), at the other end of the tissue. Specific concentration thresholds in the resulting linear gradient (T1, T2) are detected by cells (or nuclei) in the tissue, which switch on alternative target genes (represented by blue, white, and red), which in turn lead to distinct differentiation pathways in each region of the embryo. In this model, development is seen as a two-step process: First, positional information is implemented by the morphogen gradient (step 1). Subsequently, cells in the tissue passively interpret this information (step 2). Concentration thresholds in the gradient correspond exactly to borders of downstream expression territories. e A revised French Flag, incorporating target domain shifts and increasing precision over time. New evidence shows that maternal gradients are not sufficient to determine precise downstream boundary positions on their own. Instead, cross-regulation among target genes leads to (a) shifts in boundary positions over time and (b) an observed increase in the precision with which boundaries are placed. In this model, there is no longer a precise correspondence between concentration thresholds in the gradient and the final position of target domain boundaries

The posterior system works in a similar way: mRNA of the main posterior determinant nanos (nos) diffuses and becomes trapped at the posterior pole of the embryo [62]. Only its posteriorly localized pool is actively translated [63–65]. This is thought to establish a posterior-to-anterior gradient of Nos protein (Fig. 2b). In contrast to Bcd, Nos does not function as a transcriptional regulator (and thus does not affect gap genes directly), but instead acts as a translational repressor of the uniformly distributed maternal hunchback (hb) mRNA, establishing an anterior Hb protein gradient (Fig. 2b) [66–72]. Translational regulation of maternal hb is likely to be Nos’ only essential contribution to segmentation gene expression, since embryos from mothers mutant for both nos and hb are viable [68–70].

The maternal gradients of Bcd and Hb specify the position of gap domain boundaries in a concentration-dependent manner [50–52, 55, 56, 73]. In 1968, Lewis Wolpert suggested a model, based on an analogy to the French Flag, of how such positional specification can be achieved (Fig. 2d) [3, 4]. He proposed that there are specific concentration thresholds in the gradient, which can be detected by cells in the tissue. The cells thus ‘interpret’ the gradient by initiating expression of different sets of target genes, depending on whether they experience regulator concentrations above or below a given threshold. This provides a straightforward and testable hypothesis for a global patterning mechanism in which the maternal gradient imposes positional information onto its target tissue.
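
The threshold idea can be illustrated with a minimal sketch. It assumes an exponential gradient (the shape described above for Bcd; Wolpert's original scheme used a linear source-sink gradient) and reads it out with two thresholds. The function names, the decay length of 0.2 egg lengths, and the threshold values T1 = 0.3 and T2 = 0.1 are hypothetical choices for illustration only, not measured quantities.

```python
import math


def exponential_gradient(x, c0=1.0, decay_length=0.2):
    """Morphogen concentration at fractional A-P position x (0 = anterior pole).

    c0 and decay_length are illustrative assumptions, not fitted values.
    """
    return c0 * math.exp(-x / decay_length)


def french_flag(x, t1=0.3, t2=0.1, c0=1.0, decay_length=0.2):
    """Return the territory a nucleus at position x adopts by pure thresholding."""
    c = exponential_gradient(x, c0, decay_length)
    if c >= t1:
        return "blue"   # concentration above the upper threshold T1
    if c >= t2:
        return "white"  # concentration between T2 and T1
    return "red"        # concentration below the lower threshold T2


def boundary_position(threshold, c0=1.0, decay_length=0.2):
    """Position where the gradient equals `threshold`: x = L * ln(c0 / T)."""
    return decay_length * math.log(c0 / threshold)


if __name__ == "__main__":
    for x in (0.0, 0.3, 0.9):
        print(x, french_flag(x))
    print("blue/white border at", round(boundary_position(0.3), 3))
    print("white/red border at", round(boundary_position(0.1), 3))
```

In this purely feed-forward reading, each border sits exactly where the gradient crosses a threshold, which is the correspondence that the revised model in Fig. 2e gives up.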

Wolpert repeatedly used the positioning of gap domain boundaries as an example of the French Flag mechanism [74–76]. However, other authors have criticized this proposition as not being robust, since it depends too strongly on precise measurement of gradient concentrations (see [77, 78], and the appendix of [74]). Even Wolpert himself has stressed the importance of local regulatory interactions [74, 79]. Alternative models were proposed, in which gradient-based patterning is complemented by cross-regulation among downstream targets [77, 78, 80, 81]. Current evidence indicates that such target gene cross-regulation is indeed essential for the patterning function and robustness of the gap gene network [5, 15, 16, 22, 23, 82].
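
The robustness concern can be made concrete with a small calculation. This is a sketch under stated assumptions: an exponential gradient c(x) = c0 · exp(−x/L) read out by a fixed threshold T, so that the boundary lies at x = L · ln(c0/T). Scaling the amplitude c0 by a factor f then moves every threshold-defined boundary by L · ln(f), whatever the value of T. The decay length L = 0.2 egg lengths is an illustrative assumption, and the function name is hypothetical.

```python
import math


def boundary_shift(amplitude_ratio, decay_length=0.2):
    """Shift (in egg lengths) of a threshold-defined boundary when the
    gradient amplitude is multiplied by `amplitude_ratio`.

    Follows from x = L * ln(c0 / T): replacing c0 by f * c0 adds L * ln(f).
    decay_length is an assumed, illustrative value.
    """
    return decay_length * math.log(amplitude_ratio)


if __name__ == "__main__":
    for ratio in (0.9, 1.0, 1.1, 2.0):
        shift_percent = 100 * boundary_shift(ratio)
        print(f"amplitude x{ratio}: boundary moves {shift_percent:+.2f}% egg length")
```

Under these assumptions, a 10% change in amplitude displaces a boundary by roughly 2% egg length, and a doubling by roughly 14%. A pure threshold readout passes such gradient fluctuations directly on to boundary positions.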

Terminal gap genes and the terminal maternal system

In contrast to the long-range gradients described above, which are involved in patterning the segmented, central region of the embryo, the terminal system is based on localized signaling through the Torso (Tor) MAP-kinase cascade at both poles of the embryo (Fig. 2c; reviewed in [83]). Tor signaling acts predominantly through activation of head gap genes (discussed below) [55, 84] and the terminal gap genes tailless (tll) and huckebein (hkb) [85–87]. Activation of the latter is achieved through localized relief from constitutive repression [88–93] and depends on the strength of the Tor signal [94–98]. The expression domains of tll and hkb are missing in loss-of-function mutants of the terminal system [94, 96, 99] and are expanded centrally in gain-of-function alleles of Tor signaling [100]. Bcd and the D–V system play an important part in the regulation of the anterior tll and hkb domains [94, 101, 102]. In contrast, posterior expression of tll and hkb largely depends on the terminal system [94, 99, 103], with the notable exception of a subtle fine-tuning effect of the posterior system on the extent of tll and hkb de-repression [104]. These domains are not affected at all by any other gap genes [95, 99, 105], and therefore provide an independent, external input to the rest of the gap gene network. For this reason, their regulation will not be discussed further below.

Gap genes: phenotypes, gene structure, and protein products

This review mainly focuses on the gap genes hunchback (hb), Krüppel (Kr), knirps (kni), and giant (gt) involved in patterning of the segmented trunk (gnathal, thoracic, and abdominal) region of the embryo. Gap genes were initially defined based on their mutant phenotypes, which exhibit deletions in one or two contiguous regions of the embryo covering multiple segments [34]. Only hb has a maternal component. Embryos without zygotic hb lack the labial and all thoracic segments and show defects in the posterior abdomen [34, 36, 106, 107]. Mutants that lack both maternal and zygotic hb have a more severe phenotype: they have no gnathal and thoracic segments, and exhibit mirror duplications of anterior abdominal and enlargement of posterior abdominal segments [106, 108]. This phenotype can be rescued if a single copy of zygotic hb is supplied paternally [106]. Kr null mutants show deletion of thoracic and anterior abdominal segments as well as frequent mirror duplications in the abdomen [34, 35, 109–111]. kni mutant embryos show defects in the head plus all but the most posterior abdominal segments [34, 36, 112–115]. Finally, strong gt alleles show defects in the head and the fifth to seventh abdominal segments [37, 116–118]. All of these phenotypes only appear about 10–20 min after the onset of gastrulation.

Unlike the clustered Hox genes, gap genes are dispersed throughout the genome (Table 1). Each trunk gap gene is located on a different chromosome arm [34–37, 66, 118–120] (only tll and hkb map to the same arm as hb [95, 121–123]). Like other genes that are expressed during the blastoderm stage, gap genes are all unusually compact: zygotic transcripts are short (Table 1; about 1–3 kilobases (kb), with at most one or two short introns [66, 95, 101, 120, 123–125]). Such compact gene structure seems to be required for gap gene expression during the extremely short mitotic cycles of the early blastoderm stage, as the much longer maternal and late zygotic transcript of hb (about 6 kb; Table 1) and the knirps-related (knrl) gene (a duplication of kni, which seems to be functionally redundant but contains a much larger intron) only become expressed during the extended interphase of cleavage cycle 14A [114, 126].

Table 1 Gap genes, transcripts, and proteins

All Drosophila gap genes encode transcription factors (Table 1): Hb, Kr, Kni, Tll, and Hkb contain zinc-finger DNA-binding domains [66, 67, 95, 120, 123, 124, 127, 128]. Kni and Tll belong to the steroid receptor superfamily [120, 123]. Gt belongs to the basic leucine zipper (bZip) family [125]. All gap proteins show predominantly nuclear sub-cellular localization [61, 67, 129–132]. The transcription factors encoded by gap genes usually act as transcriptional repressors (see, for example, [100, 133–147]), although there is evidence for activation in some specific cases [102, 134, 135, 148, 149].

Apart from being involved in segment determination, most gap genes have additional roles later in development: hb, Kr, tll, and hkb are involved in neurogenesis [105, 132, 150–155]. Kr is required for the development of the Malpighian tubules and trachea [111], larval photoreceptor organs [156], muscles [157], and extraembryonic tissue [130]. kni is involved in tracheal [158], gut [159, 160], and wing-vein development [161, 162]. hkb is required for gut development [87, 95, 102].

Gap gene expression and regulation

In the blastoderm embryo of Drosophila melanogaster, the trunk gap genes hb, Kr, kni, and gt are expressed and regulated in two clearly distinguishable phases (Fig. 3) [19]: Early gap gene expression is established through strictly feed-forward regulation by maternal gradients, and each gap gene is regulated independently. At this stage, expression is highly variable; gap domain boundaries sharpen, but their positions do not shift over time [19]. During cleavage cycle 13, as gap proteins start accumulating in significant amounts, gap–gap cross-interactions begin to introduce feedback regulation to the system. These mostly repressive cross-regulatory interactions are involved in sharpening and maintaining gap domain boundaries [163], but also lead to dynamic shifts in the position of expression borders during cycle 14A [15, 16, 22–24, 61]. The regulatory logic of the system becomes much more complex at this stage as gap gene expression patterns become dependent on each other. After providing a brief description of gap gene expression patterns, I will analyze each of these two separate regulatory stages in detail.

Fig. 3

Early versus late gap gene regulation. Gap gene regulation can be divided into two distinct phases: early regulation of gap mRNA domains is based on maternal gradients only, while late regulation of protein domains involves gap–gap cross-regulatory interactions. The position of gap domains along the major, or antero-posterior (A–P) axis of the embryo is shown schematically as colored boxes. Only the trunk region of the embryo (approx. 35–95% A–P position) is included in the diagram. Anterior is to the left, posterior to the right. Background color represents activating inputs by Bcd and Cad. Top panel: arrowheads represent activating; T-bars represent repressive inputs responsible for setting specific domain boundaries. Bottom panel: arrows and T-bars represent activating and repressive gap–gap cross-regulation, respectively. Circular arrows represent auto-activation. The thickness of the T-bars corresponds to repressive strength. Question marks indicate missing or ambiguous evidence, or other open questions regarding gap gene regulation (see text for details)

Expression patterns

Quantitative mRNA expression patterns for Kr, kni, and gt at the early blastoderm stage have been published in [19], early hb mRNA expression has been analyzed quantitatively in [56], and protein expression patterns during later cycles are described in detail in [61]. A comprehensive data set of gap protein expression patterns, at high temporal and spatial resolution, is available online from the FlyEx database (http://urchin.spbcas.ru/FlyEx) [164–166]. Additional mRNA expression patterns at lower temporal resolution are available from the Berkeley Drosophila Genome Project (BDGP) in situ database (http://www.fruitfly.org/cgi-bin/ex/insitu.pl) [167–169]. Moreover, the BDGP is developing a database of three-dimensional, quantified mRNA expression patterns in the early Drosophila embryo [170–172].

Transcription is initiated at slightly different times for each gap gene during the early blastoderm stage (Fig. 4). The earliest reported expression patterns are transient localized domains of kni and Kr, which appear during pro-, meta-, and anaphase of cleavage divisions 9 and 10 respectively (Fig. 4, inset) [19, 173]. These early domains vanish again during telo- and interphase, only to reappear during the subsequent mitosis. The function (if any) and regulation of these early domains is unknown.

Fig. 4

Early gap gene expression. mRNA distribution is visualized by fluorescent in situ hybridization for Kr, kni, and gt during early blastoderm stage (cycles 11–13). The inset shows transient early Kr expression during mitosis 11. Embryo images are from [19], shown with anterior to the left, dorsal up. Plots show individual one-dimensional expression profiles for each gene from the middle 10% along the dorso-ventral (D–V) axis at late cycle 13, illustrating the large embryo-to-embryo variability of the patterns at this stage. Relative mRNA concentration is plotted against position along the A–P axis (in %, where 0% is the anterior pole) (see [19, 165] for details on data quantification)

The earliest detectable expression patterns of gap genes during interphase are those of hb [56] and tll [94], which both appear during cleavage cycle 9. Some embryos initiate expression of gt during cycle 11, while most only show detectable gt expression during cycle 12 [19, 118, 131]. Kr can also be first detected during cycle 12 [19, 174]. The last gap gene to become expressed during interphase is kni. Some authors have reported its appearance during interphase of cycle 12 [113, 175] while others have only been able to detect it during mitosis 12 and early cycle 13 [19].

What all early gap mRNA domains have in common is that their initial expression is weak and appears as a dotted nuclear signal (Fig. 4) [19]. During early cycle 13, levels of transcription increase dramatically, and nuclear export leads to increasing accumulation of gap gene mRNAs in the cytoplasm, where they are translated [19]. Moreover, early expression of Kr, kni, and gt is highly variable, as positions of early gap domain boundaries at the mRNA level differ by as much as 10–15% egg length between embryos of the same age (Fig. 4, bottom row) [19]. In contrast, early expression of hb appears to be surprisingly precise at cleavage cycle 11 already [56] (see also below).

Zygotic protein products of gap genes appear later than their respective mRNA domains. Kr and Gt proteins become detectable during cycle 12, while Kni only appears during cycle 13 [61, 176]. The accumulation of zygotic Hb protein is difficult to monitor, as it is chemically indistinguishable from maternal Hb. While the maternal Hb gradient gradually transforms into its zygotic expression pattern in the anterior half of the embryo during cleavage cycles 10–13 [61, 67], at least some maternal Hb protein persists until the onset of cellularization [177]. Terminal gap gene products Tll and Hkb have only been detected in early cycle 13 ([61]; J. Jaeger, unpublished). However, the much earlier appearance of tll mRNA suggests that they may already be present before that.

Gap gene expression during the late blastoderm stage is very dynamic (Fig. 5). After their initial establishment, gap domain borders sharpen [163] and those of Kr, kni, and gt in the posterior region of the embryo shift anteriorly during cleavage cycle 14A [15, 16, 22, 23, 61], while the posterior domain of hb only appears during early cycle 14A [66, 67, 126, 178]. Similarly, the dynamics of gap gene expression change dramatically in the anterior of the embryo during this stage. The broad and relatively uniform anterior expression of hb refines into a stripe at the position of parasegment 4 (PS4) and more irregular and weaker expression further anterior [66, 67, 126, 178]. The anterior domain of gt splits into two stripe-like domains and an additional dorsal patch of expression appears anterior to these [118, 131, 176]. The ventral, anterior domain of kni, which is not involved in segment determination [115], expands dorsally at its posterior margin to form an L-shaped pattern during mid-cycle 14A [113, 175]. Finally, anterior and posterior domains of Kr appear, which also do not play any role in segmentation [130, 150, 179, 180].

Fig. 5

Late gap gene expression showing dynamic shifts in gap domain positions. Protein expression patterns are shown for Hb, Kr, Kni, and Gt at eight time classes during cycle 14A (T1–T8) [61]. Plots show integrated one-dimensional expression patterns from the middle 10% along the D–V axis over time, illustrating the anterior shift in boundary position for all expression domains posterior of the central Kr domain. Relative protein concentration is plotted against position along the A–P axis (in %, where 0% is the anterior pole). Embryo images and integrated data for plots are from the FlyEx database [164, 166], shown with anterior to the left, dorsal up (see [165] for details on data quantification)

Early regulation of gap genes by maternal gradients

Since gap gene mRNAs appear before gap proteins (and do not play any role in gap–gap cross-regulation), initial regulation of localized expression must depend exclusively on maternal gradients. While gene expression in head gap domains is activated by the terminal system (see below) [55, 84], the only maternal gradients that are known to directly regulate gap gene transcription in the trunk region are the activator gradients of Bcd and Cad as well as the repressor gradient of Hb (Fig. 2a, b) [19]. Early gap gene regulation depends on a delicate balance between activation and repression (summarized in Fig. 3, top panel).

Cad activates the posterior gt domain, which is absent or very strongly reduced in embryos mutant for maternal and zygotic cad [103, 181], and—in concert with Bcd—the abdominal domain of kni, which is absent in embryos lacking both maternal Bcd and Cad [181, 182]. Expression of hb and Kr is not affected in cad mutants [103] or embryos over-expressing cad [183].

Bcd activates the anterior domains of gt and hb, which are absent in embryos from bcd mutant mothers [67, 131, 176, 181]. In the case of hb, activation occurs through Bcd binding sites in the hb regulatory region [50, 52]. The evidence is far more complicated for activation of Kr by Bcd: Early studies indicated that Kr is activated by ubiquitous maternal transcription factors [184], while Bcd was thought to repress Kr since the central Kr domain expands anteriorly in embryos from bcd mutant mothers [108, 185, 186]. However, exactly the same expansion can be seen in gt; hb double mutants. This indicates that the effect is indirect [187], as both anterior gt and hb domains are absent in a bcd mutant background [67, 176]. Later molecular studies identified a regulatory element of Kr containing multiple Bcd binding sites whose expression depends on the presence of Bcd [188, 189]. This suggests activation of Kr by Bcd. The fact that Kr expression is still present in embryos without Bcd can be explained either by an activating effect of Hb at low concentrations [108, 190, 191] or redundant activation of Kr by Cad [15] (see also below).

Maternal Hb is required for robust early expression of hb [56]. In addition, it represses Kr, kni, and the posterior domain of gt: It binds to the regulatory region of Kr [189] and Kr expression expands anteriorly in hb mutants [163, 179, 185]. The abdominal domain of kni expands anteriorly in zygotic mutants of hb; expression in its expanded domain is much stronger in embryos lacking both maternal and zygotic Hb [108]. Both abdominal kni and posterior gt domains are lacking in embryos with Hb present in the posterior region of the embryo [113, 120, 131, 175, 176, 187]. In contrast, Hb does not seem to have an effect on the anterior domain of gt. This could either be because the anterior and the posterior domains of gt are regulated by different enhancer elements, implementing different regulatory mechanisms [192–194], or because Bcd and Cad modulate the effect of Hb on gt where they are present [19] (see also below). A similar dependence on third factors has been demonstrated for the effect of Hb on the regulation of stripes 2 and 3 of the pair-rule gene even-skipped (eve), where Hb activates expression in stripe 2 due to modulation by Bcd, while it represses stripe 3 on its own [195–198].

Regulation of target genes by Bcd and Hb is concentration-dependent [50, 51, 55, 56, 73, 190]. How is this achieved at the molecular level? Two alternative explanations have been provided: Activation of some Bcd target genes depends on the number and affinity of Bcd binding sites. Regulatory elements of the head gap gene orthodenticle (otd) (see below) and hb contain a mixture of both high- and low-affinity Bcd binding sites [52, 53], while the regulatory region of kni contains a tightly spaced array of six high-affinity sites [181]. However, a more comprehensive survey of Bcd target genes found no correlation between Bcd binding site number and affinity and the position of the target gene’s boundary along the A–P axis [194]. In this case, boundary position depends on the context of the Bcd binding sites, i.e., the presence of additional binding sites for third factors—such as Hb or Kr—in a regulatory element. Such context-dependence has also been found in an equivalent survey on Hb targets [73]. The importance of genomic context is further corroborated by the fact that many homo- and heterotypic combinations of binding sites are significantly enriched in regulatory regions of segmentation genes [199].

In contrast to the concentration-dependent effect of Bcd and Hb, Cad only activates gap genes in the posterior of the embryo, where its concentration is high and constant across space (Fig. 2a) [61]. Although Cad is required for the normal expression of these genes, there is no evidence that it is actively involved in positioning any early gap domains.

The evidence presented above strongly suggests that multiple gradients are required for the placement of most gap domains. This is further supported by the fact that domains of segmentation genes (and the fate map of the embryo in general) shift less in mutants with varying doses of bcd than would be expected if they depended on Bcd alone [49]. It has been proposed that regulatory synergism between maternal Hb and Bcd could account for this effect [200, 201]. However, the exact molecular nature of this synergism remains unclear.
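
To make the "expected" shift concrete, the following minimal sketch computes where a boundary would fall if it were read out from a single exponential activator gradient at a fixed threshold, and how far it would move when the gradient amplitude (gene dosage) is scaled. The exponential form, the length constant of 20% of the A–P axis, the threshold value, and all function names are illustrative assumptions, not values taken from the studies cited above.

```python
import math

# Illustrative assumptions: exponential gradient, fixed-threshold readout.
LENGTH_CONSTANT = 20.0  # hypothetical decay length, in % A-P axis


def boundary_position(dose, threshold=0.1, length_constant=LENGTH_CONSTANT):
    """Position x (% A-P) where dose * exp(-x / length_constant) equals threshold."""
    return length_constant * math.log(dose / threshold)


def predicted_shift(dose_ratio, length_constant=LENGTH_CONSTANT):
    """Boundary displacement expected when the gradient amplitude is scaled."""
    return length_constant * math.log(dose_ratio)


reference = boundary_position(1.0)  # reference dosage
doubled = boundary_position(2.0)    # doubled dosage
print(round(doubled - reference, 2), round(predicted_shift(2.0), 2))
```

Under these assumptions the shift depends only on the length constant and the dosage ratio, so observed shifts that are smaller than this prediction point to inputs other than the activator gradient alone.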

Alternatively, the reduced shifts in bcd dosage mutants could be explained by Bcd not reaching its steady state until late during the blastoderm stage [202, 203]. However, there is currently no evidence supporting this proposition, and it has been demonstrated that the gradient of nuclear Bcd protein remains stable throughout the relevant stages of development [48, 61].

Maternal gradients can position target gene expression boundaries in two different ways: Activator gradients induce boundaries with the same polarity as the gradient itself, while repressor gradients position counter-polar boundaries (Fig. 6). Accordingly, Bcd can only set posterior boundaries of gap domains, while repression by Hb is the only available mechanism for positioning anterior borders (see Fig. 3, top panel). For example, in the abdominal domain of kni, repression by Hb positions the anterior boundary [19, 108]. Bcd appears to be responsible for establishing the posterior boundary [181], although this border is only partially developed before repression by terminal gap genes leads to the full retraction of kni from the posterior pole of the embryo during cycle 13 [19, 61].

Fig. 6

Two ways of setting expression domain boundaries. Such boundaries can be set either by an activation threshold (left), which implies the same polarity for the regulator gradient and the regulated boundary, or by repression (right), which implies opposite polarity for regulator and regulated target
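
The two modes of boundary placement can be sketched numerically. The sketch assumes a single anterior-high exponential profile that serves both as the activator (setting a posterior boundary) and, purely for simplicity, as a stand-in for an anterior-high repressor (setting an anterior boundary). The thresholds, the length constant, and the function names are hypothetical.

```python
import math


def gradient(x, length_constant=20.0):
    """Anterior-high exponential profile; x is position in % A-P axis."""
    return math.exp(-x / length_constant)


def expressed(x, act_threshold=0.05, rep_threshold=0.2):
    """Target is on where the activator is above and the repressor below threshold."""
    activator = gradient(x)  # stand-in for a Bcd-like activator
    repressor = gradient(x)  # stand-in for an Hb-like repressor (assumed same shape)
    return activator > act_threshold and repressor < rep_threshold


domain = [x for x in range(0, 101) if expressed(x)]
anterior_border = domain[0]    # set by the repressor threshold
posterior_border = domain[-1]  # set by the activator threshold
print(anterior_border, posterior_border)
```

In this toy setting the activator can only close the domain on the side where its concentration falls, while the repressor closes it on the side where its concentration rises, which is the polarity argument made in the text.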

In light of this, there is a problem for positioning early gap domain boundaries in the central and anterior region of the embryo where the concentration of Hb changes very rapidly during cycles 10–13 [19]. It is unclear how a balance between Bcd activation and Hb repression can be achieved in this region to position, for example, both boundaries of the central Kr domain. Despite the rapidly changing concentration of Hb, early boundaries of Kr remain at a constant position during cycles 12 and 13 (Fig. 4). Mathematical models of early gap gene regulation corroborate the fact that Hb repression is insufficient for placing these borders [19].

To avoid these problems, it has been suggested that Kr is repressed at high and activated at low concentrations of Hb (see Fig. 3, bottom panel). Such a concentration-dependent switch between activation and repression has been observed in assays with cell lines carrying reporter constructs that monitor the regulatory effect of transcription factors such as Kr [134] or Engrailed (En) [135]. Cells were exposed to varying levels of regulator concentration. However, it is difficult to establish whether such an effect occurs at physiologically relevant regulator concentrations. Although mathematical models incorporating such a switch can lead to a gap-like (bell-shaped) target gene expression profile [204], these models still fail to reproduce the stability of Kr boundaries over time in the presence of a rapidly changing Hb repressor gradient [19].
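
A minimal sketch of such a switch, and of its sensitivity to changes in the regulator gradient, is given below. It assumes that activation and repression are both Hill functions of a single anterior-high exponential regulator profile, with a low activation constant and a higher repression constant. These functional forms, all parameter values, and the function names are assumptions for illustration and are not taken from the models cited above.

```python
import math


def hill(c, k, n=4):
    """Saturating response of order n with half-maximal concentration k."""
    return c**n / (k**n + c**n)


def target_profile(amplitude, k_act=0.05, k_rep=0.3, length_constant=20.0):
    """Target expression at positions 0..100 (% A-P) for a given regulator amplitude."""
    profile = []
    for x in range(0, 101):
        regulator = amplitude * math.exp(-x / length_constant)
        # Activated at low concentration, repressed at high concentration.
        profile.append(hill(regulator, k_act) * (1.0 - hill(regulator, k_rep)))
    return profile


def peak_position(profile):
    """Index (= position in % A-P) of maximal target expression."""
    return max(range(len(profile)), key=lambda i: profile[i])


early = target_profile(amplitude=1.0)
late = target_profile(amplitude=2.0)  # regulator level has doubled
print(peak_position(early), peak_position(late))
```

The profile is bell-shaped in both cases, but the domain moves posteriorly when the regulator amplitude increases, which illustrates why a switch of this kind does not by itself yield boundaries that stay in place while the regulator gradient changes.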

In summary, the evidence presented above suggests that known maternal gradients are not sufficient to account for early gap gene regulation, and we may still be missing a maternal regulator required for the establishment of early gap domain boundaries [19].

Gap gene cross-regulation and gap domain shifts

As mentioned above, gap gene regulation can be subdivided into an early (maternal-only) and a late phase (including gap gene cross-repression). Due to its complexity, it is useful to further subdivide the latter into five separate regulatory mechanisms (Fig. 7): (a) broad activation of gap genes by maternal gradients of Bcd and Cad; (b) gap gene auto-activation; (c) strong mutual repression between gap genes that show complementary expression patterns (hb and kni; Kr and gt); (d) weaker, asymmetric repression between overlapping gap genes (Hb on gt; Gt on kni; Kni on Kr; Kr on hb; and Hb on Kr); and (e) repression by the terminal gap genes tll and hkb in the pole regions of the embryo. In the following sub-sections, I will discuss each of these mechanisms in turn.

Fig. 7

The five main regulatory mechanisms for late gap gene regulation: a Gap genes are activated by maternal Bcd and Cad in broad regions of the embryo. b Auto-activation leads to intensification and sharpening of domain boundaries in specific gap domains. c Strong cross-repression between gap genes with mutually exclusive expression domains leads to the basic staggered arrangement of gap domains (alternating cushions hypothesis). d Weaker cross-repression between gap genes with overlapping domains of expression leads to anterior shifts in boundary positions over time. e Repression by terminal gap genes establishes the posterior boundaries of several gap domains and excludes gap gene expression from the un-segmented terminal regions of the embryo. Horizontal axis, background color, gap domains, and regulatory links as in Fig. 3. Colored picture elements highlight those domains involved in or affected by a specific mechanism

Late activating contributions by Bcd and Cad

I have already described that activation by Bcd plays an important role in establishing early boundaries of gap gene domains, while activation by Cad does not contribute to positional specification. During cleavage cycle 14A, both of these activating contributions continue to occur, but not even Bcd is significantly involved in the placement of domain boundaries anymore [15, 16]. Instead, activation by Bcd and Cad contributes to the maintenance of gap gene expression (Fig. 7a), until about 10–15 min before gastrulation when the Bcd gradient starts to rapidly decay [46, 61]. At the same time, Cad disappears from the abdominal region due to transcriptional repression by Hb [103, 182, 205], and its expression domain refines into a narrow posterior stripe [61], which is regulated by gap and pair-rule genes [103, 205]. This does not contradict the general rule that maternal co-ordinate genes are not regulated by gap and pair-rule genes. Late zygotic expression of cad plays a very different role than that of maternal Cad: it is involved in determining the identity of the posterior-most abdominal segment in a homoeotic-gene-like fashion [206]. The late decrease in overall maternal activation is reflected by decreasing levels of gap proteins right before the onset of gastrulation [16, 61].

Auto-regulation

Early theoretical analyses of segment determination postulated a prominent and essential role for auto-activation in gap gene regulation [80, 81]. In contrast, more recent studies suggest that auto-regulation only plays a minor part in gap gene regulation (Fig. 7b). Auto-activation by itself cannot be involved in positioning of domain boundaries as it only amplifies differences in expression levels which are already present. Instead, it contributes to sharpening and maintenance of domain borders [15]. Moreover, it does not seem to be strictly essential for correct gap gene expression (although it is clearly present in the embryo) since models of the gap gene network that lack auto-regulation show perfectly normal expression patterns [17].
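
The argument that auto-activation amplifies existing differences without creating new ones can be illustrated with a one-variable sketch. It assumes a gene product g with sigmoid self-activation and linear decay, integrated with a simple Euler scheme. The equation, the parameter values, and the function name are hypothetical and chosen only so that the system has two stable states (g = 0 and g = 2) separated by an unstable threshold at g = 0.5.

```python
def simulate_auto_activation(initial, decay=0.4, dt=0.01, steps=10000):
    """Euler integration of dg/dt = g^2 / (1 + g^2) - decay * g."""
    g = initial
    for _ in range(steps):
        production = g * g / (1.0 + g * g)  # sigmoid self-activation
        g += dt * (production - decay * g)
    return g


# Two nuclei that start slightly below and slightly above the threshold.
low_start = simulate_auto_activation(0.4)
high_start = simulate_auto_activation(0.6)

# A spatially uniform starting state stays uniform: no boundary is created.
uniform = [simulate_auto_activation(0.6) for _ in range(5)]
print(round(low_start, 3), round(high_start, 3), len(set(uniform)))
```

A small initial difference is driven apart into an off state and an on state, which corresponds to sharpening and maintenance, whereas identical starting values remain identical, so the position of the border must come from elsewhere.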

Experimental support for auto-activation is strongest for hb: Early and late stages of zygotic hb expression are driven by two distinct promoters (P2 and P1, respectively), whose transcripts vary in their first exon but encode identical proteins [126, 178]. Early zygotic expression from P2 occurs in a broad anterior domain and depends on activation by Bcd (see above). Robustness, but not positioning, of this early hb expression domain also requires maternal Hb [56]. In contrast, localized late expression from P1 in its PS4 stripe depends on earlier hb expression [107, 178, 200] but not on Bcd [207]. Either maternal or early zygotic Hb on its own is sufficient for auto-activation, as PS4 expression is normal in embryos lacking early zygotic expression from P2 [207], and in maternal mutants with a single paternal copy of hb [70]. PS4 expression is strongly expanded in embryos mis-expressing hb [178]. Finally, a predicted Hb binding site is present in the hb P1 promoter [208]. Note that auto-activation is not required for expression of the posterior hb domain, which is driven by both P1 and P2 promoters [107].

The evidence is less clear for auto-activation of other gap genes. The central domain of Kr is narrowed and weakened [209], and the intensification of gt domains during cycle 13 is delayed [176] in mutants of these genes expressing non-functional proteins. Moreover, recent computational studies predict that both Kr and Gt bind to some of their own regulatory elements [193].

In the case of Kr, kni, and the posterior hb domain, some authors have suggested auto-repression [178, 188, 210, 211]. Reporter assays using the two redundant Kr regulatory elements driving expression in the central domain reveal that one element—the one containing Kr binding sites—shows much weaker reporter activity than the other one [188]. In the case of kni, auto-repression is supported by the fact that reporter gene expression driven by kni regulatory elements is up-regulated in a kni mutant background [175]. Similarly, reporter gene expression in the posterior hb domain is expanded in hb mutants, and decreased in embryos over-expressing hb [178]. However, the evidence for gap gene auto-repression is weak and circumstantial, and the potential regulatory role for such negative auto-feedback remains unclear.

Repression between complementary gap genes

The basic staggered arrangement of trunk gap domains consists of two complementary pairs of expression patterns—those of hb and kni, as well as Kr and gt—which are out of phase with respect to one another (Figs. 5, 7c). This pattern is maintained and stabilized by strong mutual repression between the members of each of these complementary pairs of genes creating positive (or double-negative) regulatory feedback [187, 212]. This has been called the ‘alternating cushions’ mechanism, as one gap domain excludes—and thus buffers against—another. It is strongly supported by experimental evidence.
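
Why mutual repression stabilizes mutually exclusive domains can be seen in a two-gene sketch. It assumes two generic gap gene products, a and b, each produced at a rate that falls with the concentration of the other and each decaying linearly. The equations, parameter values, and function name are illustrative assumptions and do not correspond to any fitted model of the complementary pairs discussed in the text.

```python
def simulate_mutual_repression(a0, b0, max_rate=3.0, n=4, dt=0.01, steps=5000):
    """Euler integration of da/dt = max_rate / (1 + b^n) - a, and likewise for b."""
    a, b = a0, b0
    for _ in range(steps):
        da = max_rate / (1.0 + b**n) - a  # a is repressed by b
        db = max_rate / (1.0 + a**n) - b  # b is repressed by a
        a, b = a + dt * da, b + dt * db
    return a, b


# Whichever product starts with a slight advantage excludes the other.
a_wins = simulate_mutual_repression(1.2, 1.0)
b_wins = simulate_mutual_repression(1.0, 1.2)
print([round(v, 2) for v in a_wins], [round(v, 2) for v in b_wins])
```

Each nucleus ends up expressing one gene strongly and the other hardly at all, which is the double-negative (effectively positive) feedback that maintains complementary expression.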

Repressive feedback between hb and kni is suggested by the following: The abdominal kni domain expands anteriorly in hb mutants [108, 175, 212] while kni is repressed in regions of embryos where hb is mis-expressed [73, 187, 212] or in embryos where Hb is present in the posterior region [113, 120]. Posterior expansion of kni in hb mutants has never been observed, which may be due to redundant repression by Gt and Tll in this region [15]. Very low levels of Hb are sufficient for effective repression of kni [73, 190]. It has been suggested that this repression may be indirect, through repression of zygotic cad by Hb [182]. This is contradicted by the fact that kni expression is still observed in mutants lacking both maternal and zygotic Cad [103]. A direct interaction of Hb and kni is further supported by the fact that Hb binds to the regulatory region of kni [213], a molecular interaction that depends on co-factors of the Polycomb group [136, 139].

The effect of Kni on hb is more subtle. Only a slight expansion of the posterior hb domain can be detected in kni mutants, while the anterior hb domain remains unaffected [163, 212, 214]. Double mutants of Kr and kni, however, show complete de-repression of hb in the central region of the embryo [212], indicating redundant repression of hb by these two factors. Furthermore, hb is repressed in regions of embryos mis-expressing kni [212, 215–217].

It has been noted that Kr and gt expression patterns are always complementary, in wild-type and various mutant backgrounds [131, 190]. In Kr mutants, both anterior and posterior domains of gt expand into the region of the central Kr domain, but do not meet in the middle [118, 125, 131, 176]. Moreover, the posterior domain of gt expands further anterior in bcd; Kr double mutants than in bcd mutants alone [190]. Finally, reporter gene expression from an enhancer driving expression in the posterior gt domain expands anteriorly in a Kr mutant background [192]. While there is only a very subtle and late effect on Kr expression in gt mutants [147, 176, 185, 218], mis-expression of gt abolishes Kr expression very effectively and the resulting embryos show a phenotype that is strikingly similar to the Kr mutant phenotype [144, 176, 187]. Moreover, the central Kr domain expands more strongly to the anterior in hb; gt double mutants than in hb mutants alone [187]. Finally, Gt has been shown to bind to multiple regulatory elements of Kr [125].

Repression between overlapping gap genes

In addition to the repressive feedback between mutually exclusive gap genes described above, there is experimental evidence for additional repressive interactions between gap genes with overlapping expression domains (Fig. 7d). For a long time, the function of these interactions remained mysterious, and they seemed to be redundant with repression between complementary gap genes. Recent studies using mathematical models of the gap gene network suggest that repressive interactions between overlapping gap genes regulate anterior shifts of gap domain boundaries during cleavage cycle 14A [15–17, 22–24]. These shifts are independent of nuclear movements [61, 171], and can cover more than 15% of the embryo’s length in the case of the posterior border of posterior gt [61].

Mathematical models allow us to identify precisely how such cross-repression can lead to boundary shifts, a task which would be extremely challenging based on traditional experimental approaches alone. Posterior of the central Kr domain, where such shifts are observed, repression from the posterior to the anterior neighbor is much stronger than the other way around. For instance, Gt represses kni, but Kni does not repress gt. This leads to an asymmetric cascade of repressive feedback with posterior dominance (Fig. 7d).

This cascade involves the following interactions, which are all supported by experimental evidence: The appearance of the posterior hb domain during early cycle 14A is made possible by Tll activation [218, 219] (which is probably indirect, via repression of kni [217]), and the absence of repression by Gt [125, 147, 176, 187]. Hb then starts to repress gt, and causes its retraction from the posterior pole [118, 176]. Gt in turn accumulates in the posterior part of the abdominal kni domain. This is possible since Kr, a strong repressor of gt [118, 125, 131, 176], has shifted anteriorly due to increased repression by Kni [147, 163, 179]. Gt down-regulates kni [125, 176], inducing an anterior shift in kni’s posterior border. Meanwhile, the anterior border of kni is shifting as well due to the retraction and sharpening of the anterior hb domain (Hb strongly represses kni; [108, 175, 187, 212]). Therefore, the anterior boundaries of both abdominal kni and posterior gt shift as an effect of the shift (or sharpening) of the posterior boundaries of the central Kr and the anterior hb domain.

Mathematical models suggest that this complicated chain of repressive interactions leads to the observed compaction and shift of the domains of Kr, kni, and gt in the central to posterior region of the embryo [5, 15, 16, 23, 24]. Note that such positional shifts due to gap–gap cross-repression are in direct contrast to the French Flag mechanism proposed for the gap gene system by Wolpert (Fig. 2e) [5].

In general, repression between overlapping neighbors is much weaker than that between gap genes with mutually exclusive expression patterns. This is to be expected because several nuclei express both neighboring gap genes simultaneously in each transition zone between domains. This imposes an upper limit on the strength of repression, as too strong an interaction would lead to mutual exclusion. This is probably the reason why the genetic evidence on many of these interactions remains quite ambiguous.

Repression of gt by Hb is indicated by the fact that, in hb mutants, the posterior gt domain fails to retract from the posterior pole of the embryo around mid-cycle 14A [118, 131, 176], while no gt expression can be detected in embryos over-expressing hb [131, 176, 187, 190]. As in the case of kni, repression of gt by Hb depends on co-factors of the Polycomb group [136]. In contrast, expression of the posterior hb domain is not affected in gt mutants [147, 176] or embryos mis-expressing gt [125, 187, 220].

Repression of kni by Gt has been reported by some authors, but not by others. While a posterior expansion of the abdominal kni domain in gt mutants was reported in one study [176], this effect was not seen in another [175]. Similarly, one study [125] reported reduced expression of abdominal kni in embryos over-expressing gt, while another [187] saw no such effect. Evidence on repression of gt by Kni is similarly ambiguous. There are slight defects of the posterior border [118, 176] and expression levels of the posterior gt domain are reduced [131, 176] in kni mutants. However, since kni is not expressed in the region of the observed defects, they are likely to be indirect.

There is little doubt that Kni represses Kr. The central Kr domain expands posteriorly into regions with reduced or lacking Kni activity in mutants [147, 163, 179, 185]. There is a Kni binding site in the Kr regulatory region, which overlaps with a Bcd activator site [221]. Repression appears to be weak, however, as mis-expression of kni fails to reduce levels of Kr in its central domain [215]. In contrast, there has been some confusion over the effect of Kr on kni. It has been proposed that Kr is required for kni activation, since expression of kni and kni reporter constructs is strongly reduced in Kr mutants [222]. However, this effect turned out to be indirect—via de-repression of gt—as kni expression is completely restored in Kr;gt double mutants [125].

Kr and hb are the only pair of overlapping gap genes that show mutual repression (Fig. 7d). Again, there is some ambiguity in the genetic evidence. While some authors have reported a posterior expansion of the anterior hb domain and its late PS4 expression in Kr mutants [107, 163, 212, 220], a quantitative study of hb expression failed to confirm this effect [214]. In any case, this interaction seems to be at least partially redundant with repression of hb by Kni, as Kr;kni double mutants show a complete de-repression of hb in the central region of the embryo [212]. Repression of Kr by Hb is suggested by an anterior expansion of the central Kr domain (or expression of corresponding Kr reporter constructs) in hb mutants [108, 163, 179, 185, 189, 223]. This expansion can be rescued by ectopic expression of hb in these mutants [216]. The interaction is probably direct, as multiple Hb binding sites have been identified in the Kr regulatory region [189]. Both of the above interactions are weak, since Kr and hb overlap across large regions of the embryo in wild-type and different mutant backgrounds [186, 195]. Moreover, Kr expression is still present in embryos over-expressing hb [187].

It has been suggested that in addition to its repressive effects, Hb can also activate Kr at low concentrations (see above and Fig. 3, bottom panel). Expression in the central Kr domain is strongly reduced in hb mutants [108, 179] and is completely absent in embryos lacking both Bcd and maternal Hb [108, 190, 191]. Reintroduction of increasing dosages of hb into the latter leads to restoration of Kr expression in a concentration-dependent manner [190, 191]. Furthermore, there is a posterior expansion of Kr in embryos over-expressing hb [108]. However, all of these effects can be explained equally well by an indirect effect, through de-repression of kni in hb mutants, which then represses Kr [15]. Studies based on mathematical models favor this alternative mechanism and show that concentration-dependent activation of Kr by Hb is not required for correct gap gene expression [15–17, 19, 24]. At this point, both alternative explanations are equally consistent with the available evidence and expression studies in hb;kni double mutants will be required to clarify the issue.

Repression by terminal gap genes

A third layer of gap gene repression is provided by the terminal gap genes, which convey the regulatory effect of the maternal terminal system [87]. They are required to exclude trunk gap gene expression from the un-segmented pole regions of the embryo and are involved in establishing the posterior borders of the abdominal kni as well as the posterior gt and hb domains (Fig. 7e). In addition, the terminal gap gene tll may be required for activation of the posterior hb domain.

With one possible exception, the terminal gap genes have strong repressive effects on trunk gap gene expression. The evidence is quite clear, although little attention has been paid to hkb so far and its effects on Kr and kni remain to be investigated. Binding sites for Tll have been found in the regulatory regions of hb, Kr, and kni [178, 213, 221]. It represses Kr and kni in concert with the co-repressor encoded by brakeless (bks) [224]. Embryos that mis-express tll in the central region show no expression of Kr, kni, or gt [100, 175, 187, 217]. Only gt was assessed, and found to be abolished, in a similar experiment mis-expressing hkb [95]. Although Kr expression is not affected in tll mutants alone [185, 218], embryos mutant for both tll and the posterior system show posterior expansion of the central Kr domain; this expansion extends all the way to the posterior pole if these embryos also lack hkb [87]. The abdominal domain of kni expands posteriorly both in tll and tll hkb double mutants, but it has not been established whether the expansion is larger in the latter case [99, 175, 222]. Posterior gt shows delayed retraction in tll mutants, and completely fails to retract from the posterior pole in tll hkb double mutants [99, 131, 176]. Finally, posterior hb fails to retract from the pole in hkb mutants [99, 219], while it is strongly reduced in tll and tll hkb double mutants [99, 218, 219].

In contrast to the other trunk gap genes, the posterior domain of hb is present and expanded to the anterior in embryos over-expressing tll [100, 217]. This suggests that Tll activates hb expression in its posterior domain. However, this interaction is probably indirect, since posterior hb is present in tll;kni double mutants [217]. Furthermore, it remains unclear how this activating effect overcomes translational inhibition by Nos (see above). Either the Nos gradient has disappeared by this stage of development, or increasing amounts of hb mRNA are able to overcome translational repression by Nos. Quantitative measurements of the Nos gradient, as well as more careful studies using hb regulatory constructs, will be required to resolve this issue.

Head gap genes

While head patterning is not completely independent of segment determination in the trunk [225], it involves additional head gap genes, namely otd, empty spiracles (ems), and buttonhead (btd) [226–232], as well as an early gap-like expression domain of sloppy paired (slp) [233]. In contrast to the trunk gap genes, expression of the head gap genes is directly regulated by the terminal maternal system [55, 84, 227, 233–236], with additional activating contributions from Bcd [53, 226, 227, 229, 233, 234]. Although these studies indicated that Bcd activation is concentration-dependent, two more recent publications report that head gap gene expression is not seriously disrupted in embryos with a more or less uniform distribution of Bcd [55, 84]. Moreover, in contrast to the trunk, there is little evidence for gap–gap cross-regulatory interactions [233, 234, 236, 237], and head gap genes appear to act in a more or less parallel and independent manner [238, 239]. Furthermore, head gap domains, such as those of slp, btd, and the anterior domain of kni, are regulated by the maternal D–V system [175, 233, 234, 240].

Other genes with gap-like expression domains

Other genes are expressed in the blastoderm embryo in gap-like domains [167–169]. Of these, only a small number have been studied experimentally so far: nubbin (nub; also called pdm1), pdm2 [241–244] and castor (cas; also called ming) [245, 246], for example. pdm genes are regulated by gap proteins [154, 241, 244, 247], and have been shown to affect pair-rule gene expression [244]. However, in contrast to hb, Kr, kni, and gt, mutations in these genes do not lead to a gap-like phenotype and have no effect on the expression of other gap genes [244]. Therefore, they are not considered essential components of the gap gene network and will not be discussed further here.

Molecular mechanisms

So far, our discussion of gap gene regulation has remained largely at the genetic (or gene-network) level. In general, I have discussed how specific regulatory interactions (repressive or activating) affect gap gene expression without considering molecular details such as chromatin structure, or cis-regulatory elements (CREs) and the transcription factor binding sites they are composed of. Although some progress has been made towards understanding gap gene regulation at the level of regulatory sequences, our grasp of the molecular mechanisms involved is far less coherent and complete than our genetic knowledge of the system.

Zygotic gene expression before gastrulation depends on the mediator complex involved in chromatin remodeling [248]. Apart from this, very little is known about chromatin-level mechanisms of gap gene regulation and I will focus on transcriptional regulation through CREs instead.

The main conceptual problem when studying eukaryotic transcription in molecular detail is that we do not yet understand many functional and mechanistic aspects of CREs (see, for example, [249, 250]). We do not know why many of these elements are modular (i.e., located on a compact stretch of DNA), while others are dispersed across many kilobases of DNA. We cannot yet reliably predict which sets of transcription factor binding sites constitute a functional enhancer, and which ones do not. We do not have any detailed understanding of how such enhancer elements interact and synergize in the regulation of whole genes. Finally, we do not have much quantitative evidence on how transcription factor occupancies at specific binding sites in a CRE affect gene expression, and whether this relationship is a simple one, as is often assumed in current assays.

For these reasons, we do not yet have a clear and satisfactory molecular understanding of the regulation of any of the gap genes discussed above. On the other hand, each of these genes can be used to illustrate some important regulatory principles that we do know about, as well as the difficulties in how to put these insights into a broader regulatory context.

The evidence presented in this section is mainly based on reporter assays in which specific stretches of regulatory sequence are combined with a heterologous promoter and a reporter gene (encoding, for example, β-galactosidase or green fluorescent protein, GFP), which are tested in transgenic animals. This is complemented by gel-shift and DNase protection (footprinting) assays to identify specific transcription factor binding sites (see [251]). More recently, attempts have been made at determining the binding specificity of all maternal co-ordinate and gap genes [252–254], and large-scale computational screens have been used to identify and analyze CREs (usually based on a combination of binding site cluster detection and identification of regulatory sequences which are conserved across species) [73, 192–194, 199, 211, 255, 256]. In addition, ChIP-on-chip data are now available which indicate that maternal co-ordinate and gap transcription factors bind to thousands of regulatory sequences across the entire Drosophila genome [257, 258].

As mentioned earlier, hb is transcribed from two different promoters, which vary in the first exon of their transcripts but not in the protein they encode (Fig. 8a) [126, 178]. The upstream P1 promoter has a brief open reading frame in its first exon, which has been implicated in translational regulation although its function remains unclear [126]. Maternal transcription originates exclusively from P1 [126, 178, 259]. A 1.1-kilobase (kb) region surrounding the P1 transcription start site, and containing multiple predicted GAGA factor binding sites, is necessary and sufficient to drive hb expression during oogenesis [259]. In contrast, early zygotic expression in the anterior half of the embryo is driven by the P2 promoter, which lies in the first intron of the P1 transcript [126]. A 123-bp element about 200 bp upstream of the P2 promoter is both necessary and sufficient for early anterior hb expression [50, 51]. This regulatory element contains several weak and strong binding sites for Bcd [50, 52] and Hb [260]. Late zygotic expression in the posterior hb domain and PS4 shows contributions by both P1 and P2 promoters and is under control of a regulatory element that lies 3 kb upstream of the P1 promoter [178, 208]. This element contains several predicted Kr [208] and Tll [178] binding sites. The presence of additional regulatory sequences between the upstream element and the P1 promoter is suggested by ChIP-on-chip data [257], but their function (if any) remains unknown. In summary, maternal and early zygotic hb regulation occur through entirely distinct molecular mechanisms, and hb can be considered as two independent genes encoding the same protein at these stages. In contrast, late zygotic transcription occurs through both promoters involving a shared upstream CRE. It is not clear how the switch between early and late regulation is achieved.

Fig. 8

Molecular mechanisms of gap gene regulation. Transcripts (start site is indicated by arrow, exons by grey boxes, and introns by thin triangular lines) and protein coding sequence (black boxes), as well as cis-regulatory elements (CREs; thick black bars) involved in gap gene regulation are shown schematically for hb (a), Kr (b), kni (c), and gt (d). Solid and dashed curved arrows in a indicate early regulation by separate CREs and late regulation by a common CRE, respectively. Inset in b shows repression by competitive binding, c shows repression by interactions between CREs (kni_kd is composed of 223-bp and 64-bp sub-elements; Hb-binding to the 223-bp element masks Bcd-activation in the 64-bp element in the posterior of the embryo), and d shows that strong repression of gt by Hb (required for the anterior boundary of the posterior domain) must be overcome for correct expression in the anterior domain. Genomic positions are not drawn to scale (see text for details).

In addition to expression in its central domain during the blastoderm stage, Kr shows a very complex expression pattern at later stages of development. Accordingly, its regulatory region is also complex. Kr regulatory sequences extend from 1.3 kb downstream of its transcriptional start site (including the single, short intron) up to 17 kb upstream of it [188]. Within this large region, there are specific CREs for each of the different expression domains [189, 223]. The extent of these CREs and how they interact remain controversial. There are two at least partially redundant elements (CD1, CD2) driving expression in the central domain (Fig. 8b). It remains unclear why two CREs are present and how they interact in Kr regulation. Such redundancy of CREs does not seem to be limited to Kr. Redundant CREs (called ‘sibling’ or ‘shadow’ enhancers) are now being discovered in many gene regulatory regions, including those of several other gap genes [256]. Footprint assays revealed binding sites for Bcd and Hb [189] as well as for Kni, Tll [221], and Gt [125] in both of these elements, while Kr sites are only present in CD2 [188]. In most cases, repressor sites overlap with Bcd activator sites, suggesting repression by competitive binding (see inset in Fig. 8b) [221].

Dissection of the 4.4-kb upstream region of kni has uncovered two repressive CREs that are required for setting boundaries of the abdominal kni expression domain [175, 213]. There are two discrete sub-elements responsible for transcriptional activation in the upstream region of kni (in the kni_kd element): The 64-bp element contains six binding sites for Bcd and mediates Bcd-dependent reporter expression, whereas the 223-bp element contains six Cad binding sites and mediates Cad-dependent reporter expression in the posterior part of the embryo (Fig. 8c) [181]. When these two CREs are combined, the anterior expression of the 64-bp element becomes eclipsed by Hb-mediated repression through the 223-bp element [181]. Here, in contrast to Kr, repression is achieved by interactions between CREs, rather than competitive binding of transcription factors to overlapping binding sites (see inset in Fig. 8c). The molecular mechanism for this interaction remains unclear. In addition, a CRE driving anterior kni expression (kni −5), as well as an intronic element driving both anterior and an imprecise, extended posterior pattern (kni +1), were identified using computational predictions [193, 211].

CREs for gt expression were only identified relatively recently using computational approaches. Three such elements drive reporter gene expression in the posterior (gt −3) and distinct anterior domains (gt −6, gt −10), respectively, while another element (gt −1) reproduces endogenous gt expression in both anterior and posterior domains (Fig. 8d) [192–194, 211, 256]. It is unknown how these elements interact, why both domain-specific and multi-domain enhancers are present, and how strong repression by Hb (required for positioning the early posterior domain) is overcome in the anterior of the embryo in the gt −1 reporter construct or in regulation of the endogenous gt gene (see inset in Fig. 8d) [19, 193].

Several recent studies based on computational modeling have attempted to predict and analyze expression of reporter constructs [211, 256, 261] or whole endogenous gap genes [204, 262] based on regulatory sequences and transcription factor concentrations. However, these predictions must be considered preliminary at best at this point. The accuracy of the predicted patterns requires further improvement: predicted boundaries are often missing or appear at a significantly different position than those measured experimentally [211, 256]. In addition, one of these studies proposes regulatory mechanisms that are in severe conflict with the genetic evidence presented above (tll is repressed by other gap genes; Tll represses hb; Kni represses gt; Kr’s main activator is Cad) [211]. Other transcriptional models [204, 256, 261, 262] provide more plausible insights into gap gene regulation but, in contrast to gene network models [15–20, 22, 23], are not yet able to reproduce the dynamically shifting patterns of gap gene expression in the blastoderm embryo. This emphasizes our limited grasp of gap gene regulation at the molecular level. Further work on quantitative, dynamical models of transcriptional regulation will be required to resolve this issue.

Patterning precision and size regulation

Since segmentation gene patterns eventually determine the position of morphological body segments, they must be positioned precisely. So far, we have only considered developmental precision with regard to where, when, and how gap domain boundaries are placed, sharpened, and maintained in a ‘typical’ or ‘average’ embryo. Precise patterning, however, also requires that variability in boundary positions be minimized across embryos in a population. To achieve this, the patterning system must exhibit stability or robustness in the presence of genetic and environmental variation. Since growth rates and embryo morphology can vary across a population, the system must also be able to maintain expression domains at the same relative positions in embryos of different shape and size.

In 2002, a study by Houchmandzadeh et al. [214] found that the posterior boundary of the anterior Hb domain exhibits surprisingly little positional variability between embryos at the late blastoderm stage, while the corresponding spatial error in the maternal Bcd gradient is large. In addition, the relative position of the Hb boundary remained constant in embryos of different sizes, while no such size regulation could be detected in the Bcd gradient [214]. Similar results were reported for the pair-rule gene Eve and Bcd in a later study [263]. Furthermore, while the Bcd gradient is affected by temperature changes (as it is established by diffusion of its mRNA and/or protein), Hb precision is not [214]; in fact, hb (and eve) expression are quite unaffected if a large temperature gradient is applied across the embryo using a microfluidic device [264, 265]. Finally, the precision of Hb is maintained in mutants for all three maternal systems, other gap genes, and even in embryos lacking whole chromosome arms [214]. The only exceptions to this are certain alleles of the anterior system gene staufen (stau), which show strongly increased variability in the position of the posterior boundary of Hb [214]. This led to suggestions, based on theoretical considerations, that Hb precision could be due to transport of hb mRNA by Stau protein [266], or an unknown maternal posterior gradient which interferes with Bcd activation [267–269]. However, there is currently no experimental evidence to support either of these proposed mechanisms.

In contrast, a study using reporter constructs consisting of three concatenated Bcd binding sites found that such reporters can show sharp posterior boundaries with only very slightly increased spatial variability compared to hb [270]. Even a heterologous anterior gradient based on the yeast GAL4 transcription factor induced precise reporter gene expression [270]. None of these reporter constructs are affected by regulators other than their respective maternal protein gradients. Therefore, these results suggested that such gradients alone are capable of setting precise and sharp target gene boundaries. Precise early expression of hb at cleavage cycle 11 (before other gap proteins can be detected) provides further evidence that Bcd is sufficient to provide precise positioning [56]. In addition, some of the Bcd variability measured earlier [214] turned out to be due to methodological artifacts, and embryo-to-embryo variability measured in vivo (using a fusion of Bcd with GFP) exhibited surprisingly little spatial error in the central region of the embryo [54]. The same authors also measured the input/output ratio between Bcd and Hb protein levels in blastoderm nuclei and recovered a sigmoid distribution with very little variance, suggesting a tight correlation between concentration levels of Bcd and those of Hb. Moreover, disruption of Hb precision in stau mutants is correlated with increased variability of the Bcd gradient in these embryos [271]. Finally, there is now evidence that Bcd does exhibit size regulation within and among populations of Drosophila melanogaster [271, 272].

Yet, for many reasons, it remains highly unlikely that Bcd is indeed sufficient to establish precise positioning of gap domain boundaries. Spatial variability in the Bcd gradient is still higher than that of hb [273] or other gap domain boundaries [61] at the late blastoderm stage. Moreover, sensitivity analysis—based on the Berg-Purcell theory of bacterial chemotaxis [274]—shows that Bcd input on hb would have to be integrated over almost 2 h for it to be able to achieve the observed precision [54]. In contrast, the establishment of the anterior hb domain occurs within 20–30 min in the embryo [56, 61, 66, 67]. During this time, precision of gap gene expression increases significantly: early gap mRNA domains (with the exception of hb [56]) show very large positional variability, and only become more precise once gap–gap cross-regulation has been initiated [19, 61, 173]. At the same time, the distribution of spatial variability in the expression domains of gap genes and the pair-rule gene eve becomes increasingly de-correlated with spatial errors in the Bcd gradient (which grow steadily with lower concentrations towards the posterior of the embryo) [263]. Finally, and most significantly, none of the studies purporting to show precise regulation by Bcd take gap–gap interactions into account, although we know, for example, that at the relevant stage of development hb is repressed by Kr and Kni [212]. Such cross-regulatory interactions have been known for a long time to affect the regulation of gap domain boundaries [163], and therefore cannot be excluded from any serious analysis of patterning precision in the gap gene system.
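
The scaling behind the integration-time argument can be written compactly. The following is only a sketch of the Berg-Purcell limit with numerical prefactors omitted; the symbols are notation introduced here for illustration (D: diffusion constant of Bcd, a: linear size of the binding site, c: local Bcd concentration, T: integration time, δc/c: relative read-out error), and no parameter values from [54] are reproduced.

```latex
\frac{\delta c}{c} \;\gtrsim\; \frac{1}{\sqrt{D\,a\,c\,T}}
\qquad\Longrightarrow\qquad
T \;\gtrsim\; \frac{1}{D\,a\,c\,(\delta c / c)^{2}}
```

Because the required time grows with the inverse square of the tolerated error, halving the acceptable relative error quadruples the integration time. This is why a modest demand on precision can translate into an integration time much longer than the 20–30 min available.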

Two recent studies confirm this and provide a mechanism for the increasing precision of gap gene expression patterns based on gap–gap cross-regulatory interactions [22, 23]. First, they show that Hb precision is reduced to that of Bcd in double mutants for Kr and kni (note that only single gap gene mutants were considered in [214], since all gap genes are on different chromosome arms; Table 1). This establishes that gap genes are important for Hb precision. Second, they use dynamical models of the gap gene network, which reproduce the observed precision of Hb (and five additional gap domain boundaries) when exposed to variation in Bcd concentration. The authors perform a numerical analysis of these models, which establishes that Kr and kni are responsible for this reduction of expression noise. They show that this is due to regulatory compensation: Since Bcd activates both hb and its repressors Kr and kni, increasing activation by Bcd is compensated by increasing repression by Kr and Kni (and vice versa) [22, 23]. Equivalent mechanisms were found for other gap domains.
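
The logic of regulatory compensation can be illustrated with a deliberately simple calculation. The sketch below is not one of the dynamical models used in [22, 23]; it is a hypothetical, single-nucleus toy in which the function names, the linear net input, the sigmoid response, and all parameter values are assumptions chosen for illustration. It compares how much hb output varies across embryos when the local Bcd level fluctuates by 10 %, with and without a Bcd-activated repressor standing in for Kr and Kni.

```python
import math

def hb_output(bcd, a=4.0, r=0.0, k=0.0, h=2.0):
    """Toy hb activity in one nucleus for a given local Bcd level.

    All parameters are hypothetical:
    a: strength of hb activation by Bcd
    r: strength of hb repression by a Kr/Kni-like repressor
    k: how strongly Bcd raises the level of that repressor
    h: activation threshold
    """
    repressor = k * bcd                 # repressor is itself activated by Bcd
    u = a * bcd - r * repressor - h     # net regulatory input to hb
    return 1.0 / (1.0 + math.exp(-u))   # sigmoid response

def output_spread(bcd_levels, **params):
    """Range of hb output across embryos that differ in local Bcd level."""
    outputs = [hb_output(b, **params) for b in bcd_levels]
    return max(outputs) - min(outputs)

# Local Bcd level varying by +/-10 % between embryos (arbitrary units).
bcd_levels = [0.45, 0.50, 0.55]

# Thresholds are chosen so that both cases give half-maximal hb at bcd = 0.5.
no_feedback = output_spread(bcd_levels, a=4.0, r=0.0, k=0.0, h=2.0)
with_feedback = output_spread(bcd_levels, a=4.0, r=3.0, k=1.0, h=0.5)

print(f"hb spread without repressors: {no_feedback:.3f}")
print(f"hb spread with repressors:    {with_feedback:.3f}")
```

In this toy, extra activation by Bcd is partly cancelled by extra repression, so the hb output varies less than the Bcd input would suggest. Setting r to zero, loosely analogous to removing Kr and kni, restores the full sensitivity to Bcd variation.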

While it appears that robustness of gap gene expression depends on zygotic regulatory interactions, it is the Bcd gradient that establishes size regulation. The length scale of this gradient was shown to adjust to embryo size within a wild-type laboratory population [271], and relative positions of gap gene and eve expression patterns are constant in embryos of D. melanogaster populations (collected from the wild) that differ significantly in size [272]. Genetic crosses between flies of these two populations show that this effect is entirely maternal, and is not influenced by zygotic feedback. Size regulation also occurs between certain (but not all) species of flies: While the closely related D. simulans and D. sechellia do not show scaling of gap gene patterns [272], such scaling has been found for Bcd, gap and pair-rule patterns in some very small (D. busckii) and some very large (Lucilia sericata, Calliphora vicina) fly embryos [275, 276]. Bcd proteins are of similar size between species, and gradients formed by Lucilia or Calliphora Bcd scale to the correct host embryo size if expressed in D. melanogaster [276]. Dextran injection shows that the cytoplasm of these different embryos does not impart different diffusive properties [275]. Instead, gradient scaling depends on conserved sequences in the Bcd protein required for nuclear localization and protein degradation [276]. Based on this and the observation that Bcd is rapidly imported into nuclei in embryos of D. melanogaster, it has been suggested that scaling is achieved through regulation of protein degradation [275, 276] and/or rapid nuclear import of Bcd protein [48, 276, 277].
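
Why a size-adjusted length scale preserves relative boundary positions can be seen in a short calculation. The sketch below assumes an exponential gradient read out at a fixed concentration threshold, which is a simplification; the function name, the embryo lengths, and all parameter values are hypothetical and serve only to illustrate the argument.

```python
import math

def boundary_position(length_um, scale_with_size=True,
                      lambda_fraction=0.2, fixed_lambda_um=100.0,
                      c0=1.0, threshold=0.1):
    """Position (in micrometres) where c0 * exp(-x / lam) falls to `threshold`.

    All parameter values are hypothetical.
    scale_with_size=True : length scale is a fixed fraction of embryo length
    scale_with_size=False: length scale is the same in every embryo
    """
    lam = lambda_fraction * length_um if scale_with_size else fixed_lambda_um
    return lam * math.log(c0 / threshold)

for length in (400.0, 500.0, 600.0):
    scaled = boundary_position(length) / length
    unscaled = boundary_position(length, scale_with_size=False) / length
    print(f"L = {length:.0f} um: relative boundary "
          f"{scaled:.3f} (scaling), {unscaled:.3f} (no scaling)")
```

When the length scale is proportional to embryo length, the relative boundary position is the same in every embryo. With a fixed length scale, the boundary sits at the same absolute distance from the anterior pole and therefore at a smaller relative position in larger embryos.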

The evolution of the gap gene network

Drosophila melanogaster is a long-germband insect. This mode of development is a derived character trait, which only occurs in some higher, holometabolous insects (these insects have a distinct larval stage with subsequent pupation, while hemimetabolous insects show gradual transformation of the larvae into the adult imago during successive moults; see Fig. 9a) [11, 12, 278]. In contrast, all other segmented animals, including vertebrates, annelid worms and most arthropods (including insects), grow segments sequentially after gastrulation (short-germband segmentation). This ancestral, sequential mechanism is based on oscillatory temporal patterns of Notch signaling and its downstream targets, such as homologues of the pair-rule gene hairy (h). Such oscillatory patterns have been observed in vertebrates (reviewed in [26]), annelids [279], and arthropods such as spiders [280–282], centipedes [283, 284], and the cockroach Periplaneta americana, a hemimetabolous insect [285] (Fig. 9a). This may either indicate a common origin of segmentation [286, 287], or convergent co-option of the Notch signaling cascade into the segmentation process in all these phyla [288].

Fig. 9

The evolution of the gap gene system. a A simplified phylogenetic tree for the arthropods is shown to the left (based on [323, 360, 361]) indicating relationships between taxa containing species in which gap genes have been studied in some detail. The prevalent mode of segment determination is shown in the first column (S short-, L long-germband). The presence or absence of an oscillator based on Notch-signaling is indicated in the second column. Evidence for or against gap-like expression patterns and phenotypes for the trunk gap genes hb, Kr, kni, and gt is indicated in the remaining two columns to the right (see key for abbreviations). b A simplified phylogenetic tree for the diptera (based on [362]) is shown to the left, indicating relationships between dipteran families containing species in which gap genes have been studied in some detail. The presence or absence of maternal gradients is indicated in the first column (see key for abbreviations). Only higher (cyclorrhaphan) flies have a Bcd gradient. The relative position of gap domains [from left to right in Drosophila: gt, hb (anterior), Kr, kni, gt, and hb (posterior)] and the number of pair-rule (eve) stripes before gastrulation are shown schematically to the right. There are two convergent branches, which have evolved an extreme form of long-germband development: Mosquitoes (Culicidae, top) and higher flies (Phoridae, Syrphidae, Tephritidae, and Drosophilidae, bottom) show seven eve stripes and posterior gt/hb domains before gastrulation. In contrast, midges (Psychodidae/Scatopsidae) lack posterior gt/hb and only develop 3–6 eve stripes during the blastoderm stage. Note that the posterior domains of hb and gt have swapped positions (double arrow) in mosquitoes. Question marks indicate unknown gap gene expression patterns (see main text for details)

However, Notch signaling is not involved in segment determination in holometabolous insects such as Drosophila [14], or (surprisingly) the short-germband beetle Tribolium [286] (Fig. 9a). In this latter species, the pair-rule genes themselves form an oscillatory feedback loop driving the sequential appearance of expression stripes [289]. This indicates that the gene networks governing segment determination in Tribolium—despite exhibiting short-germband dynamics—are derived compared to those in hemimetabolan short-germband insects.

Long-germband development can be seen as a heterochronic shift of segment determination to stages before gastrulation [12]. The transition from short-germband to long-germband development has occurred repeatedly during insect evolution [12] and is thought to be an adaptation to fast embryonic development [11, 290]. Some authors have suggested that this process is associated with the co-option or recruitment of gap genes into the segment determination process [13, 14, 290, 291] (the most conserved, and thus probably ancestral, role of gap genes is in head patterning and neurogenesis [154, 292–298]). In long-germband insects, gap genes provide spatially specific regulatory input for the regulation of pair-rule stripes, which replaces the regulation of such stripes by oscillatory temporal mechanisms involved in short-germband segment determination.

However, the evolutionary origins and timing of gap gene recruitment remain unclear [12, 299]. There is almost no evidence on gap gene expression and regulation outside insects (Fig. 9a). Gap genes do not play a role in segmentation of centipedes [291, 296], and hb is only expressed after segments have already formed in the crustacean Artemia franciscana [300]. In contrast, hb is required for segmentation in the spider Achaearanea tepidariorum, where it is expressed in a complex, dynamic pattern of stripes, and its knock-down by RNA interference (RNAi) leads to the loss of multiple segments [298]. Similarly, Kr shows a gap-like expression pattern in this species [301]. Current evidence does not allow us to distinguish whether the segmentation function of these gap genes was lost in centipedes and crustaceans, or convergently acquired in chelicerates and insects.

Somewhat more detailed evidence is available within the insects (Fig. 9). In short-germband species such as crickets and grasshoppers [292, 295, 302–304], the milkweed bug Oncopeltus fasciatus [293, 294, 305, 306] or the flour beetle Tribolium castaneum [297, 307–311], trunk gap genes are expressed in broad domains with roughly the same order along the A–P axis as in Drosophila. Small-scale mutagenesis screens in Tribolium uncovered several gap phenotypes [312, 313], one of which (the jaws mutant) is caused by a mutation in Tc-Kr [310]. Similar gap-like phenotypes have been observed in RNAi knock-down of hb, Kr and gt in Oncopeltus [293, 294, 305, 306], as well as hb in Gryllus bimaculatus [295] and Locusta migratoria [304].

In addition, mutations in the mille pattes (mlpt) gene of Tribolium also cause gap-like phenotypes [314]. This gene is not involved in segment determination in Drosophila, where it is known as tarsalless (tal) or polished rice (pri) [315, 316]. This suggests that gap genes may not only be recruited but also be lost during evolution of long-germband development. Another interesting aspect of mlpt is that it encodes a polycistronic mRNA, which codes for several very short peptides of unknown function [314–316].

Still, there is considerable doubt that the function of gap genes is conserved in short-germband insects. RNAi knock-down of hb in the cricket Gryllus bimaculatus [295], of hb and Kr in Oncopeltus [293, 294], and of hb, Kr, kni and gt in Tribolium [297, 309–311] indicates that gap genes may be primarily involved in hox gene regulation, growth zone maintenance, or head patterning, rather than the determination of trunk segments through their effect on pair-rule genes (Fig. 9a). Moreover, RNAi knock-down of Oncopeltus gt does not affect the expression of other gap genes, despite it showing a clear gap-like phenotype, while kni knock-down does not show any phenotype at all in this species [306]. In summary, the evidence remains ambiguous, and more systematic analyses (in terms of both species and gene sampling) will be required for a better understanding of gap gene function in these insects.

In contrast, gap genes are clearly involved in segment determination in long-germband hymenopteran insects such as the parasitic wasp Nasonia vitripennis or the honeybee Apis mellifera (Fig. 9a). Nasonia mutants lacking hb, as well as Nasonia and Apis embryos exposed to Kr or gt RNAi knock-down show gap-like phenotypes [177, 317, 318]. Wild-type zygotic expression patterns of hb, Kr, kni, and gt in Nasonia, as well as Kr and gt in Apis, closely resemble those of Drosophila [103, 177, 317, 318]. Moreover, several interactions such as repression of Kr by Gt, of hb by Kr, or activation of the posterior domains of kni and gt by Cad are present in both Nasonia and Drosophila [103, 177, 317, 319].

Other aspects of gap gene expression in hymenopterans differ from Drosophila in interesting ways: Maternal gradients of the product of otd1, one of the two Nasonia orthologs of the head gap gene otd, replace Tor signaling in the terminal maternal system at both poles of the embryo [319, 320]. otd1 also activates the anterior tll domain in Apis, while the posterior domain seems to be established exclusively by mRNA localization [321]. In addition, a maternal gradient of Gt protein is present, which prevents expression of Kr in the anterior region of the embryo [317]. Maternal expression of gt is also detected in Apis, but its mRNA is not localized anteriorly as it is in Nasonia [318].

It appears that the striking similarities in gap gene expression and function between hymenopterans and Drosophila reflect convergent evolution, rather than evolutionary conservation: coleopterans (beetles) and lepidopterans (butterflies/moths)—both placed between hymenopterans and dipterans in recent phylogenies [322, 323]—show a large range of variation between long- and short-germband types of segment determination (Fig. 9a). While Tribolium is a short-germband insect (see above), other beetle species show intermediate or long-germband modes of development [324]. Unfortunately, very little is known about the gene networks involved in segment determination in these species.

The same wide range of variation was observed in those few lepidopteran species that have been studied so far (Fig. 9a): both short- and long-germband modes of development occur in the (very derived) embryos of the silkworm Bombyx mori and the tobacco hawkmoth Manduca sexta, respectively [325–329]. Consistent with this, the posterior domain of hb only appears after gastrulation in Bombyx [327], while it is present before gastrulation in Manduca [325]. The anterior domain of hb is very similar to Drosophila in both species [325, 327], and Kr expression is also conserved in Manduca [325].

Although all dipterans are long-germband insects, there are significant differences in regulatory inputs from maternal co-ordinate genes and in gap gene expression between species. Drosophila shows an extreme form of long-germband development, in which all gap domains and pair-rule stripes are formed before gastrulation (Fig. 9b). This arrangement appears to be conserved in the cyclorrhapha, the group of higher flies (Brachycera) to which Drosophila belongs: the dung fly Themira minor (family: Sepsidae) [330], the medfly Ceratitis capitata [331], the house fly Musca domestica (Muscidae) [332], various species of blowflies (Calliphoridae) [275, 276, 333], the hoverfly Episyrphus balteatus (Syrphidae) [334–336], and the hump-backed or scuttle fly Megaselia abdita (Phoridae) [334, 337, 338] all show seven pair-rule stripes before gastrulation, and gap gene expression patterns that are virtually identical to those of Drosophila (Fig. 9b).

Little functional evidence is available for gap–gap cross-regulation, but RNAi experiments have shown that many aspects of gap gene regulation by maternal factors are conserved among cyclorrhaphans: Bcd activates hb in drosophilids [208, 339], Musca (through the P2 promoter as in Drosophila) [340, 341], Megaselia (again via P2) [338, 342], and Episyrphus [336]. Anterior expression of tll in Musca involves Bcd, in concert with the dorso-ventral and terminal maternal systems [341, 343]. In Episyrphus, the terminal system activates tll and hkb, in addition to its role in regulation of the head gap gene otd [336]. Finally, Episyrphus Cad activates the posterior domains of kni and gt, as it does in Drosophila [336].

On the other hand, there are also important regulatory differences. These are evidently required in light of the fact that maternal inputs show considerable variability among cyclorrhaphan flies (Fig. 9b): otd (which encodes a homeobox transcription factor with the same affinity as Bcd) is expressed maternally in tephritid fruit flies [344], but not in Drosophila [227, 228] or Episyrphus [335]. Furthermore, while Megaselia lacks a maternal contribution to cad expression (Fig. 9b) [345], Episyrphus has no maternal hb [336], and Cad plays a much more prominent role in gap gene regulation in this fly. Episyrphus embryos exposed to cad RNAi show no trunk segments at all [335], and Cad is required not only for expression of kni and gt but also for hb and tll in the posterior of the embryo [336]. Similarly, the terminal system plays a more important role in Episyrphus than in Drosophila, as it not only regulates expression of otd, but also of cad, kni, and gt in the anterior region [336]. Finally, embryos lacking Bcd in Drosophila [346, 347] and Musca [348] show anterior deletions, but no mirror-abdomen (bicaudal) phenotypes as observed in equivalent embryos of lower cyclorrhaphan flies such as Megaselia [338, 342] and Episyrphus [336]. This is not surprising for Episyrphus, which lacks the maternal hb contribution that maintains embryo polarity in Drosophila bcd mutants, but it also suggests a comparatively minor patterning role for maternal Hb in Megaselia.

Similar to higher flies, the malaria mosquito Anopheles gambiae (Culicidae) shows seven pair-rule stripes and expression in all gap gene domains before gastrulation (Fig. 9b) [349]. However, significant differences in maternal co-ordinate and gap gene expression suggest that this form of extreme long-germband development is very probably convergent to that in higher flies. Non-cyclorrhaphan flies (including dance flies, horse flies, midges, and mosquitoes) do not have a bcd gene (Fig. 9b) [342, 350–353]. The identity of the anterior determinant, whose existence is strongly suggested by classical experiments using embryo centrifugation and UV irradiation in chironomid midges [11], remains unknown. Neither otd nor hb is expressed maternally in Anopheles (Fig. 9b) [349] as they are in Tribolium [354]. Mosquitoes also show transient anterior localization of nos, in addition to its conserved posterior function [349, 355–357]. Moreover, gap gene expression is not entirely conserved between the two evolutionary branches, since the posterior domains of gt and hb have swapped positions in Anopheles compared to Drosophila (Fig. 9b) [349].

Expression data from basally branching dipterans such as Psychodid or Scatopsid midges corroborate the convergent nature of long-germband development in mosquitoes and higher flies. The moth midge Clogmia albipunctata (Psychodidae) only shows 6, and the phantom midge Coboldia fuscipes (Scatopsidae) only 3–5 stripes of the pair-rule gene eve before gastrulation (Fig. 9b) [334, 337, 358]. Moreover, while anterior gap gene expression is well conserved, Clogmia does not exhibit any significant posterior expression of gt, and its posterior hb domain only forms after gastrulation (Fig. 9b) [337, 358]. This reduction and delay of posterior patterning in basal dipterans suggests that both mosquitoes and higher flies have independently acquired gt expression as well as heterochronic shifts toward earlier hb and eve expression in the posterior region of the embryo.

These delays in posterior segmentation gene expression are reminiscent of (but not equivalent to) the sequential addition of segments observed during short-germband development. Although some posterior expression features only form after gastrulation in basal dipterans, there is no tissue growth involved in their establishment [12].

Another feature reminiscent of sequential segmentation is the anterior shifts in gap domain positions described above [16, 61]. These shifts are conserved among dipterans since they occur in Episyrphus [336], as well as in Clogmia, where they are significantly more pronounced than in Drosophila [358]. Similar (although periodically repeating) traveling waves of gene expression can be observed during vertebrate somitogenesis [26] and centipede segmentation [283, 284], and are very probably also occurring in embryos of spiders [280–282] and cockroaches [285]. More detailed and comprehensive studies of gap gene expression and regulation in insects outside the Diptera will be required to reveal whether there is a true mechanistic connection between delays and shifts in posterior gap gene expression in flies and the ancestral short-germband mode of development.

Conclusions

In this review, I have attempted to provide a comprehensive overview of our current knowledge of gap gene regulation in development and evolution. For the trunk gap genes hb, Kr, kni, and gt, this knowledge is more detailed and complete than for any other developmental gene regulatory network. By now, we have a solid understanding of how regulatory interactions between maternal co-ordinate and gap genes produce the observed expression dynamics. Only minor ambiguities and gaps remain in the evidence: Does Hb affect Kr by activation and repression at different concentrations? How is translational repression by Nos overcome in the posterior hb domain? How are stable early boundaries established in light of rapid changes in hb concentration? Are we missing a posterior repressor required for the establishment of early gap domain boundaries or the control of precisely placed expression boundaries? How are head gap genes regulated? All of these remaining issues can be resolved by existing experimental and computational approaches.

On the other hand, some fundamental and intriguing questions remain: Our understanding of the molecular mechanisms underlying gap gene regulation is sketchy at best. We still cannot reliably predict expression dynamics from regulatory sequence, since it is difficult to identify those sets of transcription factor binding sites that are essential for particular expression features. We do not understand why apparently redundant CREs are present, or how CREs interact with each other in the regulation of endogenous genes. A better, quantitative understanding of eukaryotic transcription is essential to connect the genetic regulatory mechanisms that are the focus of this review with molecular processes at the level of the genome. Novel experimental approaches to monitor chromatin dynamics and binding site occupancy in CREs, combined with data-driven mathematical modeling of CRE interactions and function, will be required to investigate these problems.

Another intriguing issue concerns gap phenotypes and their relation to underlying molecular events: Segmental deletions observed in gap mutant phenotypes most often do not coincide with the extent of the corresponding gap gene expression domains. It has been argued on theoretical grounds that this is due to gap–gap cross-regulation, such that the absence of one gap transcription factor affects not only its own domain of expression but also those of neighboring genes [80, 81]. Furthermore, many gap gene mutants exhibit segmental duplications and inversions. In this case, it has been suggested that such phenotypes can be understood only if ratios between protein levels, rather than absolute concentrations of individual gap proteins, are considered to be relevant for positional specification [359]. However, the exact mechanistic basis of these propositions remains unclear.
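The distinction between a readout based on absolute concentrations and one based on ratios can be made concrete with a toy calculation. The following Python sketch is purely illustrative and is not the model proposed in [359]: the Gaussian domain shapes, the domain positions and widths, the threshold, and the uniform 30% reduction in protein levels are all hypothetical assumptions, as are the function names. It compares where a boundary would be placed by a fixed threshold on one protein with where it would be placed by the point at which two neighboring proteins are present at equal levels.

```python
import numpy as np

def gaussian_domain(x, center, width, peak=1.0):
    """Bell-shaped expression domain along the A-P axis (hypothetical shape)."""
    return peak * np.exp(-0.5 * ((x - center) / width) ** 2)

def boundary_absolute(x, a, threshold):
    """Most posterior position at which protein A still exceeds a fixed threshold."""
    return float(x[a > threshold].max())

def boundary_ratio(x, a, b):
    """Most anterior position at which B exceeds A, i.e. where A/B drops below 1."""
    return float(x[np.argmax(b > a)])

# Position along the anterior-posterior axis in percent egg length (assumed grid).
x = np.linspace(0.0, 100.0, 1001)

# Two neighboring, overlapping domains of hypothetical gap proteins A and B.
wt_a = gaussian_domain(x, center=40.0, width=8.0)
wt_b = gaussian_domain(x, center=60.0, width=8.0)

# Hypothetical perturbation: both protein levels uniformly reduced by 30%.
scale = 0.7
mut_a, mut_b = scale * wt_a, scale * wt_b

threshold = 0.5  # assumed fixed concentration threshold for the absolute readout

wt_abs = boundary_absolute(x, wt_a, threshold)
mut_abs = boundary_absolute(x, mut_a, threshold)
wt_ratio = boundary_ratio(x, wt_a, wt_b)
mut_ratio = boundary_ratio(x, mut_a, mut_b)

print(f"absolute readout: wild-type {wt_abs:.1f}, perturbed {mut_abs:.1f}")
print(f"ratio readout:    wild-type {wt_ratio:.1f}, perturbed {mut_ratio:.1f}")
```

Under these assumptions, the boundary set by the absolute threshold moves anteriorly when protein levels drop, while the boundary set by the ratio stays where it was. This only shows that the two kinds of readout respond differently to the same perturbation; it does not address how ratios might be measured by target genes, nor does it reproduce duplication or inversion phenotypes.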

Finally, we do not yet have a very good understanding of the causal flow of regulatory information in complex, feedback-driven processes such as the regulation of gap domain shifts. What we do know is that this process involves interactions among all gap genes, and is therefore a network-level property of the system. A better, quantitative understanding of such properties will be required to understand the regulatory dynamics of gap gene expression, and how these dynamics influence the evolution of segment determination across different species of insects. Such an understanding can only be gained by quantitative studies combining genetic approaches with data-driven modeling of gene network dynamics.

These challenges illustrate the two central points I wanted to make in this review: First, it is undoubtedly worth taking a second, quantitative and more detailed look at biological systems that appear to have been studied to exhaustion. The more we learn about gap genes and their developmental and evolutionary context, the more interesting and important new questions we uncover. It is not mere details that remain to be discovered in these times of ‘omics’ and systems biology: Answering questions such as those described above will lead to fundamental insights and novel conceptual tools for developmental and evolutionary biology.

This leads me to the second point I am trying to make: The gap gene system—with all its biological features that have been described here, and its incomparable wealth of experimental evidence—provides a unique opportunity to study the role of gene regulatory networks in development and evolution in an integrative and quantitative manner. How do dynamic expression patterns emerge from the collective regulatory interactions within the network? What are the molecular mechanisms underlying these interactions? How do changes in regulatory mechanisms affect gene expression? Or in other words, how does random change at the level of the genome translate into non-random changes in phenotype? I have no doubt that much pioneering work to address these important issues will be based on studies of the gap gene network.