Introduction

Owing to their ease of breeding, relatively low housing costs, amenability to genetic manipulations, and generally better acceptance for use as laboratory animals than larger species, rodents play an essential role in preclinical research in general, and in diabetes research in particular. Spontaneous, chemically, surgically or environmentally induced mouse models of obesity, type 1 and type 2 diabetes have greatly contributed to our understanding of the underlying mechanisms of these diseases. The advent of gene-targeting methodologies, together with the sequencing of the mouse genome, has enabled scientists to study the biological function and pathological role of genes implicated in metabolic regulation. Yet, we often question whether findings obtained in mice are relevant to human pathophysiology. This question is legitimate and the answer is far from simple. In our opinion, mice are extremely valuable tools in diabetes research, as long as we acknowledge their limitations. The question is not so much whether mouse models are useful in diabetes research but rather under which conditions they can provide relevant information. With this question in mind, we have attempted: (1) to review the most commonly used experimental tests to assess glucose and energy homeostasis in mice; (2) to provide some guidelines regarding the design, analysis and interpretation of these tests; and (3) to identify important caveats and confounding factors that must be taken into account when drawing conclusions from the data generated. We certainly do not claim to set the rules for the field, but simply propose guidelines stemming from our experience with our own research programmes and with the Rodent Metabolic Phenotyping core facility (Metabolic Disease Innovate Solutions) that we have established at the Montreal Diabetes Research Center (Centre de Recherche du Centre Hospitalier de l’Université de Montréal [CRCHUM], Montreal, QC, Canada).
Such points of view are inherently debatable and open for discussion, which we hope this review will stimulate. We have not discussed the different types of animal models available for metabolic physiology and diabetes studies, topics that have been extensively reviewed by others [1,2,3]. Not covered, either, are the factors that drive reproducibility in preclinical research, an extremely important and topical issue recently and eloquently discussed by Drucker [4] and Flier [5].

Assessment of glucose homeostasis in mice

GTTs and insulin tolerance tests

GTTs and insulin tolerance tests (ITTs) are first-line experiments to assess glucose and insulin tolerance, respectively, since they are relatively easy to perform and minimally invasive. They consist of measuring blood glucose levels in response to a bolus administration of glucose (GTT) or insulin (ITT). Although they do not require special skills or previous surgery, and they enable rapid investigation of the impact of an intervention on glucose homeostasis, interpretation of these tests can be strongly influenced by variable experimental conditions and data analysis. Further, tolerance tests provide limited mechanistic insights to explain a metabolic phenotype.

Considerations when conducting tolerance tests

Several variables have to be considered when performing tolerance tests, including nutritional state (fasted vs fed), time of the day, route of administration, glucose/insulin dose and the level of restraint required for blood sampling. These experimental conditions have been tested and discussed in detail [6,7,8,9,10,11].

A general recommendation that has emerged is to subject animals to a short fast (~6 h); avoiding prolonged (overnight) fasting is especially important in mice because of their high metabolic rate and the fact that they consume most of their daily calories during the dark cycle. The body weight loss and metabolic stress induced by an overnight fast deplete liver glycogen stores and increase insulin-stimulated glucose transport, which can have confounding effects [8, 12].

The recommended doses of glucose and insulin to detect robust differences between experimental groups are 1 to 2 g/kg (i.p. or oral) and 0.5 to 1 U/kg, respectively (Table 1), depending on the strain. It is important to keep in mind that, during an oral GTT, the rate of glucose clearance depends on gastric emptying, intestinal glucose absorption and the potentiating effect of incretins on insulin secretion. The i.p. or i.v. routes of administration bypass these processes.

Table 1 Recommendations for tests and data presentation when assessing glucose homeostasis in mice

Because glucose disposal is mostly dependent on lean mass, it is also important to consider whether glucose or insulin dose should be adjusted based on body weight or lean body mass (this has been discussed in detail [7, 10, 11]). When tolerance tests are performed in animals of different body weight (mostly owing to differences in fat mass), adjusting glucose or insulin dose based on body weight introduces a bias: obese animals receive a larger dose relative to their lean mass, which exaggerates glucose intolerance in obese models [6, 13]. For this reason, it is more accurate to normalise glucose or insulin dose based on lean mass. However, this requires lean mass to be measured before the test using MRI, a stressful manipulation that requires restraint or anaesthesia and may introduce confounding effects. Moreover, this approach is limited in the context of longitudinal studies that require repeated GTTs or ITTs, during a diet intervention for instance. A potential alternative is to use a fixed dose of glucose or insulin based on the body weight of the control group [6, 13], as performed in humans.
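The dosing strategies discussed above can be compared numerically. The following is a minimal sketch in Python, not a validated protocol: the function names, the example body weights and lean masses, and the choice of applying the same 2 g/kg figure to lean mass are all illustrative assumptions.

```python
from statistics import mean


def dose_per_body_weight_mg(body_weight_g, dose_g_per_kg=2.0):
    """Glucose dose scaled to total body weight (g x g/kg gives mg)."""
    return body_weight_g * dose_g_per_kg


def dose_per_lean_mass_mg(lean_mass_g, dose_g_per_kg_lean=2.0):
    """Glucose dose scaled to lean mass; the per-kg figure is an assumption."""
    return lean_mass_g * dose_g_per_kg_lean


def fixed_dose_mg(control_body_weights_g, dose_g_per_kg=2.0):
    """One dose for every animal, based on the mean control body weight."""
    return mean(control_body_weights_g) * dose_g_per_kg


# Hypothetical animals: the obese mouse differs mostly in fat mass.
control = {"body_weight_g": 25.0, "lean_mass_g": 20.0}
obese = {"body_weight_g": 45.0, "lean_mass_g": 22.0}

for name, mouse in (("control", control), ("obese", obese)):
    dose = dose_per_body_weight_mg(mouse["body_weight_g"])
    per_lean = dose / mouse["lean_mass_g"]
    print(f"{name}: {dose:.0f} mg by body weight, {per_lean:.2f} mg per g lean mass")
```

With these made-up numbers, body-weight dosing gives the obese animal 90 mg against 50 mg for the control, that is, roughly 4.1 vs 2.5 mg per gram of lean mass, which illustrates the bias described above.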

GTT/ITT data analysis

Analysis of data from GTTs and ITTs is relatively straightforward. It should always be performed on raw data rather than percentages of the basal value, to avoid misinterpretation in cases where the experimental groups have different basal glucose or insulin levels. Changes in glucose levels over time during a GTT or ITT should be analysed using two-way ANOVA (and not at a given time point), and AUC should be presented for GTT to draw reliable conclusions (Table 1). It is important to subtract the basal value when calculating the AUC. The rate of glucose disappearance can also be calculated as the slope of the decreasing line of blood glucose levels (K_ITT) during an ITT [14].
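The baseline-subtracted AUC can be sketched as follows. This is an illustrative Python example using the trapezoidal rule, which is a common choice but is our assumption here; the function name and the glucose values are hypothetical.

```python
def incremental_auc(times_min, glucose_mmol_l):
    """Trapezoidal AUC of glucose after subtracting the time-0 (basal) value.

    Values that fall below basal contribute negative area (net incremental AUC).
    """
    basal = glucose_mmol_l[0]
    auc = 0.0
    for i in range(1, len(times_min)):
        dt = times_min[i] - times_min[i - 1]
        left = glucose_mmol_l[i - 1] - basal
        right = glucose_mmol_l[i] - basal
        auc += 0.5 * (left + right) * dt
    return auc


# Two hypothetical animals with the same excursion but different basal glucose.
times = [0, 15, 30, 60, 120]
mouse_a = [6, 16, 14, 10, 6]
mouse_b = [8, 18, 16, 12, 8]

print(incremental_auc(times, mouse_a))  # mmol/l x min
print(incremental_auc(times, mouse_b))
```

Both animals yield the same incremental AUC, whereas an AUC computed without subtracting the basal value would differ between them solely because of their different starting glucose levels.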

Basal glucose and insulin levels measured at time 0 of GTTs have been extensively used to calculate indexes of insulin sensitivity in mice, including HOMA. This index was initially established to empirically estimate insulin sensitivity in epidemiological and clinical studies. The HOMA equation is derived from human studies and is defined by the product of blood glucose and blood insulin levels in the fasted state, divided by a normalising factor derived from ‘normal’ insulin and glucose levels in an ‘ideal’ individual. In rodents, this index is poorly predictive of insulin sensitivity [15] and its use is generally not recommended. It is, however, conceivable that the HOMA index might be useable with derivation of an equation specific to mice and/or mouse strains in fasted conditions.
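For reference, the human-derived HOMA equation described above is usually written as follows. The symbols G_0 (fasting glucose, mmol/l) and I_0 (fasting insulin, µU/ml) are our notation; the normalising factor of 22.5 corresponds to the product of 4.5 mmol/l glucose and 5 µU/ml insulin in the 'ideal' individual of the original human model. It is shown only to make the normalising factor explicit, not as a recommendation for use in mice.

```latex
\mathrm{HOMA\text{-}IR} = \frac{G_0 \times I_0}{22.5}
```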

Application of murine-derived GTT/ITT data

The only solid conclusions that can be drawn from GTT and ITT relate to glucose tolerance and insulin resistance, respectively. Measuring insulin and/or C-peptide levels during a GTT might provide valuable information on insulin secretion as long as blood can be sampled without stressing the animals. A higher AUC for glucose and/or a lower rate of glucose disappearance vs a control group studied side-by-side indicates glucose intolerance. Likewise, a lower rate of glucose disappearance during an ITT is indicative of insulin resistance. However, the converse is not true: a lack of differences between the groups during an ITT does not mean that the animals have similar levels of insulin sensitivity; ITTs are not sensitive enough to draw this conclusion. Insulin sensitivity can only be reliably assessed at steady-state levels of glucose and insulin (see below).

Tolerance tests are poor predictors of the glucoregulatory mechanisms underlying a particular phenotype. For instance, many studies show normal or minor changes in glucose tolerance despite a significant decrease or even complete absence of insulin secretion during a GTT [16,17,18,19,20,21,22,23], while the opposite is also true [24]. The main variable contributing to this discrepancy between insulin secretion and glucose excursion is glucose effectiveness, a measure that represents the capacity of glucose to regulate its own disposal independently of insulin. This can be estimated using a frequently sampled IVGTT (FS IVGTT; described in detail below). During a GTT, ~50% of glucose disposal is independent of insulin secretion or sensitivity in rodents and humans [16, 25]. Importantly, glucose effectiveness is regulated by various metabolic signals, including gut hormones [26], and is altered in rodent models of obesity and diabetes [27]. Thus, glucose tolerance during a GTT results from insulin secretion, sensitivity and glucose effectiveness. Since gut signals (e.g. glucagon-like peptide-1 [GLP-1], fibroblast growth factor 19 [FGF19]) regulate glucose effectiveness [26], the relative contribution of glucose effectiveness to glucose tolerance is dependent on the route of glucose administration (i.p. vs oral).

ITTs were initially designed to measure the activity of the hypothalamic–pituitary–adrenal (HPA) axis in response to insulin-induced hypoglycaemia (during the 90–120 min period post-injection) in humans and were adapted to measure insulin sensitivity using frequent sampling during the 15 min period following insulin injection [28]. Most ITTs performed in mice use the early and late responses to insulin to estimate insulin sensitivity. Considering that the half-life of insulin is ~10 min, glucose excursions at later time points mostly depend on the counterregulatory response to hypoglycaemia (glucagon, catecholamines and cortisol) rather than changes in insulin sensitivity. In addition, the hypoglycaemia threshold triggering counterregulation may vary from one mouse model to another. Thus, it is more accurate to only consider the response to insulin within the first 20 min for analysis and K_ITT estimation [28].
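Restricting the K_ITT estimate to the early response can be sketched as below. This is an illustrative Python example under stated assumptions: it uses the common convention of fitting a least-squares line to log-transformed glucose and expressing the negative slope in % per min, which may differ from the exact procedure in the cited references; the function name, the 20 min default window and the data are hypothetical.

```python
import math


def k_itt(times_min, glucose_mmol_l, window_min=20.0):
    """Rate of glucose disappearance (%/min) from the early ITT response.

    Fits ln(glucose) vs time by least squares, using only samples taken
    at or before window_min, and returns -slope x 100.
    """
    points = [
        (t, math.log(g))
        for t, g in zip(times_min, glucose_mmol_l)
        if t <= window_min
    ]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two samples within the window")
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    return -(sxy / sxx) * 100.0


# Hypothetical ITT: exponential fall for 20 min, then counterregulatory rebound.
times = [0, 5, 10, 15, 20, 30, 60]
glucose = [8.0 * math.exp(-0.03 * t) for t in times[:5]] + [5.0, 6.5]

print(round(k_itt(times, glucose), 2))                  # early window only
print(round(k_itt(times, glucose, window_min=60), 2))   # late points included
```

Including the later time points lowers the estimate in this made-up data set because the rebound flattens the fitted slope, which is the distortion the 20 min window is meant to avoid.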

For all the reasons described above, it is essential to accurately report the experimental conditions of tolerance tests (fasting duration, glucose and insulin doses, administration route, blood sampling site, use of restraint), conduct rigorous data analysis, and interpret the results with caution before drawing robust conclusions.

In summary, tolerance tests can generate valuable results but suffer from limited sensitivity. More powerful tests to interrogate specific mechanisms, such as insulin secretion, sensitivity and glucose effectiveness, are described in the following sections.

Insulin and glucose clamps

The hyperinsulinaemic–euglycaemic clamp

Also known as the ‘insulin clamp’, the hyperinsulinaemic–euglycaemic clamp (HIEC) was developed by DeFronzo et al [29] to measure insulin sensitivity in humans and was subsequently adapted for use in conscious mice. The procedure consists of inducing a state of stable hyperinsulinaemia by infusing a fixed and constant dose of insulin concomitantly with a variable glucose infusion to maintain euglycaemia. In these conditions, the glucose infusion rate (GIR) directly reflects whole-body glucose disposal. The M/I index of insulin sensitivity is calculated as the ratio of GIR (M) to circulating insulin levels (I) during the steady-state period. Additional information can be obtained using isotopic glucose tracers during the clamp to estimate tissue- and organ-specific insulin sensitivity (glucose uptake and endogenous glucose production).
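The M/I calculation can be sketched as follows. This is a minimal Python illustration; the function name, the units and the steady-state values are hypothetical, and averaging the samples over the steady-state period is our assumption about how M and I are summarised.

```python
from statistics import mean


def m_over_i(gir_steady_state, insulin_steady_state):
    """M/I index: mean GIR divided by mean circulating insulin,
    both taken from samples within the steady-state period."""
    return mean(gir_steady_state) / mean(insulin_steady_state)


# Hypothetical steady-state samples from one clamp.
gir = [40.0, 42.0, 38.0]          # mg/kg/min
insulin = [500.0, 520.0, 480.0]   # pmol/l

print(m_over_i(gir, insulin))
```

Because insulin appears in the denominator, two groups with identical GIR but different clamped insulin levels will differ in M/I, which is one reason why clamped insulin values need to be reported.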

Key considerations when conducting HIEC tests

Several key methodological variables must be considered when performing HIEC tests in mice (Table 1). These include the nutritional state, blood sampling sites and initial insulin priming and infusion rates, as discussed by Ayala et al [12]. For instance, blood sampling from the tail significantly increases circulating catecholamines, leading to decreased insulin sensitivity compared with arterial sampling, which requires less restraint. In addition, insulin sensitivity is increased after 18 h fasting compared with a 5 h fast. Finally, the insulin priming dose and infusion rates affect insulin clearance, as well as hepatic vs peripheral insulin action.

Other important issues often arise in the design of a clamp. For example, our core facility recently performed clamps in transgenic mice with reduced fasting blood glucose levels compared with controls. In the transgenic group, glucose levels were clamped to the level in the control group. However, alternatives, such as using the average blood glucose of both groups, would have been acceptable, as long as they are precisely reported.

More complicated is the issue of insulin levels, since these cannot be measured extemporaneously during the clamp and, therefore, are not adjusted in real time. By design, insulin levels during the steady-state period of the HIEC should be similar between groups. In practice, however, this is not always the case, even if endogenous insulin secretion is suppressed by co-infusion of somatostatin. This is due to differences in insulin clearance. For instance, we [30] and others [31] have shown that lipopolysaccharide-induced endotoxaemia dramatically increases insulin levels during HIEC, potentially because of decreased hepatic insulin clearance. Similarly, glucose and lipid co-infusion in rats decreases the clearance of exogenous human insulin during a subsequent HIEC [32]. Such differences in insulin levels during the clamp make comparisons between groups very difficult unless the insulin dose is adapted in the experimental group to match blood levels of insulin in the control group [31] or comparisons are only performed between groups with similar levels of circulating insulin. Alternatively, one can circumvent differences in insulin levels by calculating the sensitivity index, which takes into account changes in insulin levels (clamped – basal) rather than absolute values [33]. In all cases, it is essential that basal and clamped glucose and insulin values are reported [34].

The hyperglycaemic clamp

The hyperglycaemic clamp (HGC), also known as the ‘glucose clamp’, is considered the gold standard for measuring glucose-stimulated insulin secretion (GSIS) in vivo. It consists of infusing a variable rate of glucose (i.v.) to maintain hyperglycaemia at a target value (usually between 12 and 18 mmol/l) and measuring insulin and C-peptide levels during this steady-state hyperglycaemia. In addition, a bolus of arginine can be administered at the end of the clamp to assess the maximal arginine-induced insulin response (AIRmax).

Importantly, the clamp can be designed as a one-step or two-step protocol; the latter allows for measurement of beta cell glucose responsiveness at different levels of hyperglycaemia (e.g. 8 and 16 mmol/l) during the same test, as we have previously described [35]. The different measures obtained during steady-state hyperglycaemia can be used to estimate insulin clearance (C-peptide/insulin), M/I and the disposition index (DI; an index of beta cell function based on insulin secretion corrected for insulin sensitivity, calculated by multiplying the M/I index by C-peptide levels). However, M/I and DI estimations are limited because they are based on the assumption that, in a given mouse model, the relationship between insulin sensitivity and secretion is hyperbolic [36]. Since it is technically difficult to perform both a HIEC and HGC in the same animal (both are typically terminal experiments), M/I and DI are typically obtained from the same HGC. Therefore, they are not independent from each other and should be interpreted with caution [37].
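The three indices derived from a single HGC can be sketched as below. This is an illustrative Python example that follows the definitions given above (insulin clearance as C-peptide/insulin, M/I as GIR/insulin, DI as M/I multiplied by C-peptide); the function name, units and values are hypothetical.

```python
def hgc_indices(gir, insulin, c_peptide):
    """Indices from steady-state HGC values (all taken from the same clamp)."""
    m_i = gir / insulin
    return {
        "insulin_clearance": c_peptide / insulin,
        "m_over_i": m_i,
        "disposition_index": m_i * c_peptide,
    }


# Hypothetical steady-state means for one animal.
indices = hgc_indices(gir=30.0, insulin=600.0, c_peptide=1200.0)
for name, value in indices.items():
    print(name, value)
```

Written out, DI here equals GIR multiplied by the C-peptide/insulin ratio, so it shares both its insulin and C-peptide terms with the other indices; this makes visible why measures obtained from the same clamp are not independent of each other.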

Key considerations when conducting HGC tests

As with HIEC, several experimental variables should be considered when designing HGC studies, including nutritional state or blood sampling sites (Table 1). For example, compared with fed animals, GIR and insulin secretion in 16 h fasted mice are decreased [38].

HGC vs GTT

The greater sensitivity of HGC compared with GTT is highlighted by studies on the role of the fatty acid receptor G-protein-coupled receptor 40 (GPR40; also known as free fatty acid receptor 1 [FFA1]) in GSIS. Using oral GTT, IPGTT or IVGTT, we [19, 39] and others [40, 41] found that glucose tolerance and insulin secretion were similar in chow-fed GPR40-deficient mice and wild-type (WT) littermates. In contrast, using HGC, we found that insulin secretion was impaired in GPR40 knockout (KO) mice [38].

Although clamp methodology is sensitive and powerful, it is technically challenging. Dual venous and arterial catheterisation in mice is difficult to perform and often results in loss of animals (because of occluded or leaky catheters). For this reason, some groups only use a venous catheter for infusions and collect blood samples from the tail. As mentioned above, blood sampling from the tail, which involves animal restraint, directly impacts stress levels and glucoregulatory responses and, thus, introduces confounding factors. In addition, obese models (diet-induced and genetic) do not tolerate surgery well and may not recover pre-surgery body weights, potentially affecting insulin sensitivity and/or secretion. Thus, it is essential to precisely report the experimental conditions used for clamp studies (blood sampling site, restraint used, insulin dose and levels, blood glucose profile and GIR) [34]. If clamps cannot be performed under optimised conditions, GTT and ITT remain valuable alternative methodologies to generate interpretable results, as long as the experimental conditions are well reported and the data analysed accurately.

FS IVGTT

The FS IVGTT, initially developed by Bergman et al [42] in humans, provides an indirect measure of insulin sensitivity based on glucose and insulin levels obtained during the test. The FS IVGTT has been used in dogs and rats but was only recently validated in mice [20, 27]. In addition to measuring insulin sensitivity and secretion, glucose effectiveness can be estimated from FS IVGTTs. As such, it represents a powerful method for concomitantly measuring the three processes involved in glucose tolerance in the same animal. Alonso et al [27] observed strong heterogeneity in the first phase of GSIS during FS IVGTT within the same BL/6J mouse substrain. They also showed strong differences in glucose effectiveness in pathological models (diet-induced obesity [DIO] and ob/ob mice). Although this test has many advantages, the challenging dual catheterisation (venous and arterial) and frequent blood sampling make it difficult to use routinely in mice.

In Fig. 1, we provide a step-by-step flowchart for selecting the appropriate in vivo tests, which can be followed by isolation of tissues or organs (e.g. pancreatic islets or muscle) for ex vivo studies.

Fig. 1 Proposed flowchart for the investigation of glucose and energy homeostasis in mice. The first simple step for metabolic phenotyping is to measure body weight, fasting blood glucose and plasma insulin levels and to perform a GTT that includes insulin measurements. If body weight is affected, we refer the reader to the guidelines proposed by Tschöp et al [43] to investigate energy balance in rodents. If glucose tolerance is found to be altered, changes in insulin sensitivity and/or secretion can be tested using ITTs and/or the gold standard, HGC, respectively. However, ITT is not very sensitive and, hence, a lack of change in insulin tolerance does not rule out insulin resistance. HIECs can be used to interrogate insulin sensitivity at the whole-body level or in a tissue-specific manner using glucose tracers. ITT, HIEC and HGC tests can be combined with ex vivo studies (e.g. analysis of insulin secretion in isolated islets, glucose transport in tissues, beta cell mass measurements) to better define the underlying mechanisms.

Assessment of energy homeostasis in mice

Energy metabolism

Alterations in glucose homeostasis often result from changes in energy balance and body composition. Therefore, investigating whole-body energy homeostasis is critical to understanding the mechanisms underlying changes in glucoregulatory responses. Several types of metabolic cages are now available for in-depth analysis of energy balance through measurements of energy expenditure, respiratory quotient, locomotor activity and food intake.

Although indirect calorimetry technology (and its sensitivity) has significantly improved, there is no single approach to mouse metabolic phenotyping and several considerations must be taken into account to generate reliable results and meaningful interpretations. These considerations have been recently discussed and guidelines have been proposed for metabolic phenotyping [43,44,45]. Here we summarise some key aspects that require consideration.

First, housing conditions (temperature, single- vs group housing, position of the cage on the rack and type of diet) can dramatically affect animal energy homeostasis. For instance, recent studies highlighted the impact of conventional housing temperature at ~22°C vs ~30°C (which is thermoneutral for mice) [46] on feeding [47], glucoregulatory responses [48] and DIO [49]. Second, body composition has a major impact on energy expenditure analysis. Most studies normalise energy expenditure to total body or lean mass, which may introduce confounding factors. Although still debated [44], one recommendation is to analyse group and body composition effects on metabolic rates using ANCOVA [43, 50, 51]. Third, accurate measurement and analysis of food intake should take into account several variables, including the type of diet, pattern of feeding behaviour, and spillage and foraging. Commercial low-fat diets (purified) are more reliable controls for high-fat diet (HFD) studies because of their standardised composition compared with ‘chow’ diets, the dietary composition of which can differ from batch to batch. When using HFDs, the nature and type of fat should also be considered. Although food intake can be accurately measured manually, automated systems enable the tracking of circadian feeding patterns, frequency and quantity of food ingested. Reporting daily food intake values that have been averaged over a period of time is not informative, especially when animals are group housed. Cumulative food intake, calorie efficiency (amount of body weight gained based on calories consumed over time) and loss of calories in the faeces are also important variables to measure. Finally, paired and yoked feeding, both of which allow differences in diurnal feeding pattern between groups to be circumvented, are powerful strategies to determine whether or not changes in energy balance and body weight are dependent on the amount of calories consumed. 
Also important to consider are the acclimation period to new cages or to changes in husbandry conditions (single vs group housing) and the nutrient content of bedding [52].
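The ANCOVA approach mentioned above, in which energy expenditure is modelled with group as a factor and body composition as a covariate rather than dividing by mass, can be sketched as follows. This is an illustrative Python example using ordinary least squares in plain Python; it assumes a common slope across groups (no group-by-mass interaction), provides point estimates only, and uses synthetic data and hypothetical function names. Formal inference (F tests, confidence intervals) would normally be done with a statistics package.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ancova_fit(energy_expenditure, lean_mass, group):
    """Fit EE = intercept + group_effect * group + slope * lean_mass.

    group is coded 0 (control) or 1 (experimental); group_effect is the
    difference between groups adjusted for lean mass.
    """
    rows = [[1.0, float(g), float(lm)] for g, lm in zip(group, lean_mass)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, energy_expenditure)) for i in range(3)]
    intercept, group_effect, slope = solve_linear(xtx, xty)
    return intercept, group_effect, slope


# Synthetic data: the experimental group is heavier and has a true adjusted
# effect of +0.05 (arbitrary energy expenditure units).
lean = [18, 20, 22, 24, 22, 24, 26, 28]
grp = [0, 0, 0, 0, 1, 1, 1, 1]
ee = [0.2 + 0.02 * lm + 0.05 * g for lm, g in zip(lean, grp)]

intercept, group_effect, slope = ancova_fit(ee, lean, grp)
print(round(intercept, 3), round(group_effect, 3), round(slope, 3))
```

In this synthetic example the fit recovers the adjusted group effect even though the two groups differ in lean mass, which is the situation where simple normalisation to body or lean mass can mislead.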

Considering the number of variables that must be taken into account and controlled for, metabolic phenotyping is a complex and challenging endeavour. For this reason, it is essential to report detailed descriptions of procedures and protocols, perform accurate data analysis and interpret the results in a given experimental context. Finally, when studying energy metabolism in a genetically modified model, it is important to keep in mind that the gene of interest may regulate sensory (taste, smell, sight), locomotor and/or gastrointestinal functions, thereby affecting feeding behaviour; these must be considered to avoid misinterpretation of the phenotype [53].

Some caveats and confounding factors in metabolic studies using mice

As described in the previous sections, most tests used to examine glucose and energy homeostasis were initially developed in humans and have rarely been validated in rodents. Yet, major differences exist between these species, such as metabolic rates, contribution of glucose effectiveness to glucose disposal, and feeding behaviour, influencing the design and interpretation of these tests. One of the major advantages of using mice lies in their highly standardised breeding and housing conditions and genetic background. This homogeneity increases the power of an experiment by lowering the inter-individual variability as compared with human studies, in which lifestyle, ethnicity and genetic variations may have a strong impact on a given response. On the other hand, the relatively standardised experimental conditions used for laboratory animals may also introduce some bias or confounding factors.

Age and sex

For cost-related reasons, most studies in the field of metabolism (except, of course, on ageing) are performed in young animals (8–12 weeks). Yet, obesity and type 2 diabetes are strongly associated with ageing, and several differences have been reported between young and old rodents. For example, the beta cell replicative capacity strongly decreases with age in mice [54, 55]. Further, we [56, 57] have shown that young rats do not develop insulin resistance in response to nutrient infusion (in contrast to older animals), suggesting that adaptive responses to nutrient surfeit are age-dependent. Even HFD studies often underestimate the impact of age since the typical length of HFD is 8–16 weeks, starting in 5–6-week-old animals. For these reasons, studies pertaining to chronic metabolic diseases should ideally be performed in older animals and/or using short- (4–6 weeks) and long-term (12–20 weeks) dietary interventions so as not to overlook potential age-dependent adaptive mechanisms.

Likewise, for reasons related to housing costs and the difficulty of synchronising and/or determining the oestrous cycle phase in female mice, many metabolic studies only include male mice. This introduces a bias since oestrogen is well known to have beneficial effects on metabolism. Oestrogen acts on the brain and peripheral tissues to reduce food intake while increasing energy expenditure, and enhances insulin sensitivity and beta cell function (reviewed in [58]). Together, these effects likely contribute to the well-known protection against the development of obesity, insulin resistance and hyperglycaemia in HFD-fed female vs male mice, and the sexual dimorphism of metabolic diseases in humans [59]. In addition, there are significant differences in drug/xenobiotic pharmacokinetics in male vs female animals [60]. The importance of including male and female animals has been recently recognised by the National Institutes of Health to improve reproducibility in preclinical research [61], and guidelines have been proposed to study the mechanisms of sex differences in metabolic research [62].

Environment

Although mice are typically bred and housed under highly controlled environmental conditions, many environmental factors can affect their physiology and behaviour including diets, husbandry, handling, noise/vibrations, temperature and circadian rhythm (reviewed in [47]). A prime example of this is the protection from diabetes observed in NOD mice when housed on the top of the rack compared with those housed on the lower rack shelf [63].

Amongst environmental factors, the impact of the gut microbiota on energy homeostasis and the aetiology of metabolic diseases is receiving increasing attention [3]. Sequencing of the gut microbiota and interventions altering gut microflora have shown that microbiota composition modulates insulin sensitivity and the response to DIO. In addition, elegant studies recently showed that the interaction between the microbiome and the host is dependent on the genetic background, leading to a complex crosstalk between microbiota and host in disease development (reviewed in [64]). This is a highly complex issue that, under most circumstances, is impossible to control for experimentally. However, the potential impact of the microbiota should be taken into account when otherwise unexplainable differences in the response to a particular manipulation are observed between, for example, mice from the same strain and background but from different vendors, or experiments performed in the same mouse colony but in different animal facilities.

Genetic background and breeding strategies

Type 2 diabetes and obesity have a highly complex genetic architecture (reviewed in [65]), with individual SNPs only contributing a minor fraction to the overall risk. Thus, one may question the relevance of using inbred animals as models for these polygenic diseases. On the one hand, one would want to examine the role of individual genes or variants in a model that is as close as possible to the genetic variety of humans (keeping in mind that outbred animals are still not as genetically diverse as humans). On the other hand, keeping everything else as equal as possible facilitates the identification of a phenotype associated with the gene of interest. For this reason, genetic studies in mice are often performed in inbred strains. However, many studies have shown that the impact of a genetic manipulation is highly influenced by the animal’s genetic background. For instance, db mutations only lead to obesity in the C57BL/6J background but induce obesity and chronic hyperglycaemia in C57BLKS/J animals [66]. A recent study by Sittig et al [67] showed that the same genetic mutation in 30 genetic backgrounds can lead to dramatically different, sometimes completely divergent, phenotypes. Of particular interest, only three out of 30 mouse strains harbouring a null mutation of Tcf7l2 (a genetic risk factor for type 2 diabetes [65]) showed significant reduction in blood glucose levels measured throughout the day. In addition, Tcf7l2 mutant mice showed behavioural phenotypes that varied depending on the strain. While raising new questions on the importance of gene–gene interactions, these findings highlight important limitations of studies in genetically modified inbred animals. Were financial and timing aspects not limiting, it would be ideal to study the impact of genetic interventions in both inbred and outbred animals, or at least in several different inbred strains. This type of approach would strengthen the outcomes and conclusions of studies. 
The clustered regularly interspaced short palindromic repeats (CRISPR) technology has opened new avenues for generating gene mutations more quickly and could thus be helpful in this regard.

Despite the limitations of inbred strains, phenotypic differences between strains have been useful for the identification of new genes or gene polymorphisms involved in beta cell function or glucose homeostasis using transcriptomic analyses [68], linkage studies allowing genetic mapping of qualitative and quantitative trait loci, or genome-wide association studies (GWAS) (reviewed in [69]).

Inbred strains display major differences in glucoregulatory responses and susceptibility to DIO. For example, dilute brown non-agouti (DBA) mice show heightened insulin secretory response to glucose and body weight gain when fed a HFD compared with C57BL/6, Friend virus B-type (FVB) or A/J mice [17, 68, 70]. The C57BL/6 strain has been widely used because of its high susceptibility to develop obesity and hyperglycaemia under HFD compared with other strains. However, there are several BL/6 sub-strains, including the commonly used BL/6J (often from Jackson Laboratories) and BL/6N (often from Charles River and Taconic). Several studies have now demonstrated important metabolic differences between these two sub-strains that strongly influence the development of obesity and diabetes. The BL/6J mice harbour a mutation in the Nnt gene that is responsible for impaired GSIS under chow-fed conditions [21, 71]. Importantly, BL/6 mice heterozygous for the Nnt mutation (BL/6NJ) are characterised by impaired GSIS compared with BL/6NN animals on a HFD [22], although they show reduced body weight gain and glucose intolerance compared with BL/6J [72]. These findings have major implications when using the cre-loxP system. Most engineered cre mice are available at the Jackson depository and maintained on the BL/6J background. In contrast, floxed mice are usually generated using BL/6N- or 129-derived embryonic stem cells. The resulting cre-floxed animals have mixed BL/6J and BL/6N or 129 genetic backgrounds. Uncontrolled genetic background and unknown status of the Nnt mutation (JJ, NJ or NN) will undoubtedly affect beta cell function and thus introduce confounding factors, strong phenotypic heterogeneity and difficulties in replicating findings. This is a major issue in the field that has been recently discussed by Fontaine and Davis [73]. In the literature, there are several examples of inconsistent or opposite mouse phenotypes that are likely related to genetic background. 
For instance, Steneberg et al [74] initially reported that mice deficient in GPR40 were protected from obesity-induced hyperinsulinaemia and glucose intolerance under HFD conditions whereas mice overexpressing GPR40 in the pancreas were prone to diabetes, suggesting that antagonism of GPR40 could be beneficial for the treatment of diabetes. In contrast, subsequent studies by us [39] and others [40, 41] using three independent GPR40 KO strains (backcrossed with BL/6N or BL/6J) and a transgenic BL/6J mouse overexpressing GPR40 in beta cells [75] led to opposite conclusions. More recently, discrepant results were reported regarding the anti-inflammatory role of the n-3 fatty acid receptor GPR120 (also known as free fatty acid receptor 4 [FFA4]) during high-fat feeding; Oh et al [76] found that the anti-inflammatory and beneficial effects of dietary n-3 fatty acids on glucose homeostasis were absent in GPR120 KO mice of an unspecified genetic background fed a HFD. In contrast, two subsequent studies showed that dietary n-3 fatty acids were equally effective in GPR120 KO mice and WT littermates on a mixed (129 and BL/6J) or pure (BL/6N) background [77, 78]. Finally, studies on the role of uncoupling protein 2 (UCP2) in insulin secretion also led to diametrically opposite conclusions (reviewed in [79]). The initial UCP2 KO mice generated on a mixed background (BL/6J and 129) showed improved GSIS. However, when these animals were backcrossed with inbred strains (BL/6J, A/J or 129), a decrease in GSIS was observed [79]. Interestingly, Seshadri et al recently demonstrated that UCP2 expression is temporally regulated in beta cells throughout the day and that this pattern is required for the temporal control of glucose-induced ATP production, normal rhythms of GSIS and, ultimately, glucose tolerance [80]. We believe that it is important to encourage authors and editors to adequately report breeding strategies and detailed genetic backgrounds.
An ideal approach would be to phenotype experimental animals on pure and mixed backgrounds and report potential differences.
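To give a sense of how much mixed background can persist when a line is backcrossed, the short Python sketch below computes the textbook expectation that the donor-strain share of the genome halves with each backcross generation. It is only an illustration under stated assumptions: the function name is hypothetical, the N1 = F1 numbering convention is assumed, and the calculation ignores the donor segment linked to the selected allele, which persists much longer than unlinked regions.

```python
from fractions import Fraction


def expected_donor_fraction(generation):
    """Expected fraction of donor-strain genome at backcross generation N.

    Assumptions of this sketch: N1 is the F1 hybrid (1/2 donor genome), each
    further backcross to the recipient strain halves the donor share, and
    loci linked to the selected allele are ignored.
    """
    if generation < 1:
        raise ValueError("generation must be >= 1")
    return Fraction(1, 2) ** generation


# Print the expected donor share at a few illustrative generations.
for n in (1, 2, 5, 10):
    share = expected_donor_fraction(n)
    print(f"N{n}: {float(share):.4%} donor genome expected")
```

These are averages across many animals; individual mice deviate from the expectation, which is one more reason to report the number of backcross generations together with the breeding strategy.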

Different breeding schemes can be used to generate experimental whole-body or conditional KO animals using the cre-loxP system. In all circumstances, control animals must be littermates, implying that homozygous KO mice should never be intercrossed to generate experimental mice. Accordingly, it is not acceptable to use WT controls purchased from a supplier; WT controls must be littermates. The absence of a phenotype in female KO animals does not preclude a phenotype from arising during gestation. Therefore, it is safer to use heterozygous female KO mice for breeding. Cre homozygosity may introduce confounding factors and/or cre-associated toxicity [81], so it is recommended to cross cre-negative female mice with cre-hemizygous male mice [82]. Finally, it is important to consider multiparity in the breeding scheme and litter size, both of which may influence the metabolic phenotype of the offspring [83].
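As an illustration of how a breeding scheme determines which littermate groups are available, the following Python sketch computes expected offspring genotype frequencies for one hypothetical cross: a cre-negative female and a cre-hemizygous male, both heterozygous for the floxed allele. The function names and allele labels ("fl", "+", "cre", "-") are assumptions made for this example, and the floxed locus and cre transgene are assumed to be unlinked and to segregate in simple Mendelian fashion.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross_locus(mother, father):
    """Expected offspring genotype frequencies at a single locus.

    Each parent is a 2-tuple of alleles; each allele is transmitted with
    probability 1/2 (simple Mendelian segregation).
    """
    freqs = Counter()
    for maternal, paternal in product(mother, father):
        genotype = tuple(sorted((maternal, paternal)))
        freqs[genotype] += Fraction(1, 4)
    return dict(freqs)


def cross_two_loci(mother, father):
    """Combine two loci, assuming they assort independently (unlinked)."""
    flox = cross_locus(mother["flox"], father["flox"])
    cre = cross_locus(mother["cre"], father["cre"])
    return {
        (f_geno, c_geno): f_p * c_p
        for (f_geno, f_p), (c_geno, c_p) in product(flox.items(), cre.items())
    }


# Hypothetical cross: cre-negative fl/+ female x cre-hemizygous fl/+ male.
mother = {"flox": ("fl", "+"), "cre": ("-", "-")}
father = {"flox": ("fl", "+"), "cre": ("cre", "-")}

expected = cross_two_loci(mother, father)
for (flox_geno, cre_geno), p in sorted(expected.items()):
    print("/".join(flox_geno), "/".join(cre_geno), p)
```

Under these assumptions, this cross yields six genotypes within the same litters, including WT, cre-only, floxed-only, heterozygous KO and homozygous KO animals. The homozygous KO and each of the cre-only, floxed-only and WT controls are expected in only one in eight pups, which should be factored into colony size.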

Additional issues with the cre-loxP system and inducible models

The cre-loxP strategy involves flanking the endogenous gene or a portion of it with loxP sites, which can affect gene expression. A well-known example is the Glut4 (also known as Slc2a4)-floxed mouse in which the recombined Glut4-floxed gene was expressed at lower levels, specifically in adipose tissues, compared with WT controls [84]. This could be due to the loxP sequences and/or the neoR selection cassette that is part of the construct. It is therefore critical to verify the expression of the recombined floxed gene and to excise the neoR cassette used for embryonic stem cell selection using flippase. Also, the genome contains degenerate loxP sites that can be targeted by cre, leading to off-target recombination of genes in a tissue-specific manner, as described in the heart using the αMyHC-cre mouse (B6.FVB-Tg(Myh6-cre)2182Mds/J) [81].

There are also issues regarding the efficiency, specificity and phenotype of cre transgenics, which have been recently discussed in the context of beta cell- and brain-specific cre lines [85, 86]. For example, the Nestin-cre (B6.Cg-Tg(Nes-cre)1Kln/J) that targets the brain is known to leak into multiple peripheral tissues. In addition, Nestin-cre [85] and RIP-cre (B6.Cg-Tg(Ins2-cre)25Mgn/J) [87] mice are phenotypically different to their WT littermates. Other examples of ectopic expression include the Pdx1-cre (Tg(Pdx1-cre)89.1Dam) and RIP-cre strains, which are designed for pancreas and beta cell KO, respectively, but leak in the brain [85, 86, 88]. The leakiness of cre expression in RIP-cre strains might be caused by the truncated insulin promoter in the transgene construct. Indeed, the full-length insulin promoter used in the MIP-creERT (B6.Cg-Tg(Ins1-cre/ERT)1Lphi/J) strain restricts cre expression to the beta cell [88, 89]. A strategy to circumvent the leakiness issue is to generate a ‘knockin’ cre strain such that cre expression is under the control of an endogenous promoter, with the caveat that this deletes one of the two endogenous alleles, which may result in lower gene expression. This strategy has recently been used to express cre from the endogenous Ins1 gene, resulting in faithful cre expression in beta cells without leakiness into the brain [90], and from the proglucagon promoter, resulting in cre expression in GLP-1-producing alpha cells, intestinal L cells and neurons [91]. Cre efficiency can be influenced by the breeding or genetic background of a mouse. For instance, cre expression in Albumin-cre mice (B6.Cg-Tg(Alb-cre)21Mgn/J) is silenced with an increasing number of generations [92]. Along the same lines, cre-mediated excision can be affected if the cre-driving promoter is sensitive to changes in metabolic status or diet. 
Douglass et al [93] recently reported that GFAP-creER (B6.Cg-Tg(GFAP-cre/ERT2)505Fmv/J)/Ikkb (also known as Ikbkb)-floxed mice had minimal gene deletion in hypothalamic astrocytes when chow fed, while the efficiency of cre-mediated excision of inhibitor of kappaB kinase β (IKKβ) was dramatically enhanced after HFD.

Recent studies have shown that several cre lines used in beta cell research ectopically express the human growth hormone (hGH) minigene, commonly used as a polyadenylation sequence in transgene constructs [89, 94]. Expression and secretion of hGH by mouse beta cells has a significant impact on beta cell function and mass and protects animals from streptozotocin (STZ)-induced hyperglycaemia [89, 94]. This also has implications for other cre models, as hGH expression in the hypothalamus was shown to be responsible for the metabolic phenotype in Nestin-cre mice [95].

Finally, it is important to consider that drugs used to induce cre expression, such as tamoxifen and doxycycline, have off-target effects that may be toxic [82, 96,97,98]. As such, it is recommended to test for potential adverse effects and to give all experimental groups the lowest suitable dose of tamoxifen or doxycycline, using the most appropriate route of administration (e.g. oral tamoxifen is considered less toxic than i.p. injections).

For all the aforementioned reasons, it is strongly recommended to include all three control groups (cre, floxed and WT) in tissue-specific KO studies, or alternatively to verify, at the outset, that these control groups have a similar phenotype. Also, including heterozygous KO animals can help strengthen conclusions by establishing a relationship between gene dosage and phenotype severity. It is essentially impossible to avoid all the confounding variables associated with mouse genetic studies but most of them can, and should, be controlled for.

Conclusions

In this review, we attempted to address the question, under which conditions can mouse models provide relevant information for metabolic research? We focused on some aspects that we consider of particular importance, while deliberately not covering others that are just as important but were recently discussed elsewhere, such as sample size, randomisation and, more generally, applying similarly rigorous standards to the design, execution and reporting of preclinical studies to those used in clinical research [4, 5].

The pitfalls and limitations highlighted herein should certainly not discourage the use of mice for metabolic studies. Mice are an extremely useful model, especially at early stages of the discovery process, be it for functional genomics, target identification and validation, in vivo validation of test compounds or early toxicology studies. Yet, these pitfalls and limitations should be known, acknowledged and, whenever possible, circumvented or controlled for. In addition to the recommendations summarised in the text box, the reporting of animal experiments should be transparent, accurate and complete. In that sense, we encourage editors and reviewers to request that detailed information be provided by authors regarding the design and interpretation of studies using mice. Diabetologia has recently implemented a mandatory preclinical checklist to that effect.

[Text box: summary of recommendations]

Finally, and perhaps most importantly, before broad conclusions can be made regarding the role of a particular gene or impact of given experimental manipulations, it is essential that the findings are reproduced using complementary approaches, i.e. genetic loss- and/or gain-of-function, genetic rescue and pharmacological tools (inhibitors, antagonists and agonists), when available, and in different animal models rather than a single mouse strain.

The research environment and publication process have evolved in a way that favours depth over breadth and encourages (even incentivises) overinterpretation of the findings and inflation of the conclusions so as to convince the reader or reviewer of the translational potential of the work [99]. As a first step towards reversing this dangerous trend, may we all (including the authors) agree to: (1) provide detailed and accurate reporting of the methodologies; (2) acknowledge the caveats and limitations of a particular rodent model or metabolic test; and (3) omit such sentences as ‘these findings identify a novel pathway for the treatment of metabolic diseases’ from our future publications?