Background

Obesity is defined as an excess of body fat. It is a complex disorder influenced by a number of genetic and environmental factors. The number of overweight and obese individuals, both children and adults, has increased dramatically worldwide [1]. Pakistan, with a total population of 184.35 million in 2012–13, is the 6th most populous country in the world [2]. According to the Global Burden of Disease Study, Pakistan ranked 9th out of 188 countries in terms of obesity [3].

Traditionally, before the advent of high-throughput genotyping methodologies, the contribution of genes to disease risk was recognized through the increased risk of disease in the proband’s relatives. The genetic component was then expressed as heritability estimates or variance components. Rapid developments in high-throughput genetic technologies have since led to genome-wide association studies (GWASs) [4]. GWASs analyze common variation by genotyping a large number of SNPs (~ 0.5–1 million) in a case-control study design. The results are then used to determine which of these SNPs reach genome-wide significance for association with the outcome (mostly the disease) [5]. One problem with common variants is their small effect sizes (the contribution of a SNP to the genetic variance of a trait), so that they account for only a small fraction of the variance in disease risk. Familial clustering of complex diseases suggests that the heritable risk factors have large effect sizes; a GWAS is unable to detect such variants because of their very low frequency. The situation is further complicated by epistatic effects resulting from the interaction of variants in different genes. Epistatic effects confound the search for new loci because their probability is the product of the probabilities of the individual low-frequency variants [6].

The genetics of complex disease is inherently statistical because the phenomenon under study (e.g., obesity), being a complex disorder, is itself probabilistic. Various statistical methods are therefore needed to draw meaningful results from a dataset. Commonly used procedures include risk prediction measures (relative risk, odds ratio), family analyses (liability threshold models) and regression methods (linear/logistic regression) [7]. These methods rest on assumptions that can be very different and even incompatible. In GWASs, the inclusion of a large number of SNPs leads, in theory, to more accurate gene identification because it is based on the frequency of individual risk alleles. However, this theoretical advantage is reduced either by multiple testing correction (due to the inclusion of many SNPs) or by the increased degrees of freedom. The use of a weighted score test (WST) or gene score with only one degree of freedom has been suggested to handle this limitation [8].

Gene score is defined as the sum of all the risk alleles of the selected variants present in each study participant. However, this approach faces a problem when some of the SNPs are positively associated with the outcome of interest (i.e., increase the risk of disease) while others are protective (i.e., decrease the risk of disease). To overcome this limitation, SNP coding is adjusted before a gene score test so that all the alleles are positively correlated with the outcome [9]. Another problem in gene scoring is that information from all of the SNPs is used even though some SNPs have low and others high effect sizes, which reduces the study power. The current ways to deal with these issues include modified forward multiple regression (MFMR, which has higher power to detect weak genetic effects and a limited number of false positives); Bonferroni correction (used to counteract the problem of multiple comparisons, particularly when many SNPs are included simultaneously; the p-value cutoff for statistical significance is 0.05 divided by the number of SNPs); false discovery rate (FDR, a method of conceptualizing the rate of type I errors in null hypothesis testing when conducting multiple comparisons); and randomization tests (significance tests whose false rejection rate is always equal to the significance level of the test) [10].
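The two multiple-comparison corrections named above can be sketched in a few lines of Python. This is only an illustration: the text does not say which FDR procedure is meant, so the Benjamini-Hochberg step-up procedure is assumed here, and the function names and the p-values in the usage note are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance cutoff: alpha divided by the number of tests."""
    return alpha / n_tests


def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where a test is rejected at FDR level q.

    Assumption: FDR control via the Benjamini-Hochberg step-up procedure.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k whose ordered p-value satisfies p_(k) <= (k / m) * q.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected
```

For four hypothetical p-values 0.001, 0.02, 0.03 and 0.5, the Bonferroni cutoff is 0.05/4 = 0.0125, so only the first test passes, whereas the Benjamini-Hochberg procedure rejects the first three. This illustrates why FDR-based corrections are usually less conservative than Bonferroni.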

Research on the genetics of obesity in Pakistan has been scarce, and most of it has focused on monogenic forms. We chose SNPs that are either in candidate genes of the energy regulation pathway or GWAS hits. Ten SNPs were chosen for the current investigation because it is a pilot study and we were limited by resources. Care was taken that the chosen SNPs had intermediate minor allele frequencies (MAFs) and had previously been shown to predispose to obesity in other ethnicities. A gene score approach has not previously been tried for these SNPs in Pakistani subjects, and we are the first to apply this approach to our ethnic group. We therefore aimed to examine what difference the use of a genetic risk score makes in comparison to the individual risk variants.

Methods

Study subjects

The study was a case-control observational study and included 475 subjects (250 cases and 225 controls). Subjects were recruited from various cities of Punjab, Pakistan. The recruitment details and the inclusion and exclusion criteria have been described in detail elsewhere [11]. The inclusion criteria were the BMI and WHR cutoffs previously defined for Asian populations (for cases: BMI > 23 kg/m² as overweight and > 26 kg/m² as obese; for controls: BMI < 23 kg/m²). Exclusion criteria for both cases and controls included pregnancy, presence of malignancies and recent infections. The study was approved by the institutional ethics committee (Ethical Committee, School of Biological Sciences, University of the Punjab, Pakistan); subjects gave written informed consent, and all procedures were carried out in compliance with the Declaration of Helsinki.

Anthropometric measurements, blood sampling and biochemical analyses

Body weight (kg), height (m), and waist and hip circumference (cm) were measured according to standard procedures as described previously [12]. BMI (body mass index, kg/m²) and WHR (waist-to-hip ratio) were calculated for each study subject. Blood samples were taken after 8–12 h of fasting; half of each sample was used for DNA isolation and the other half was used to obtain serum. Serum was separated by centrifuging gel vacutainers at 14,000 rpm for 10 min, collected in sterilized Eppendorf tubes and screened for infectious agents (HBV, HCV, HIV). Any positive samples were discarded, and safe samples were used for the lipid profile determination. Serum total cholesterol (TC), triglycerides (TG), high density lipoprotein cholesterol (HDL-C), and low density lipoprotein cholesterol (LDL-C) were measured using commercially available kits (Spectrum Diagnostics, Egypt). An Epoch microplate reader (BioTek Instruments, Highland Park) was used for all optical density measurements.
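The two derived indices follow directly from their definitions (BMI is weight in kg divided by height in metres squared; WHR is waist divided by hip circumference). The sketch below is only illustrative: the function names are hypothetical, the 23 and 26 kg/m² cutoffs are those quoted in the study subjects section, and treating a BMI of exactly 23 as non-overweight is an assumption, since the text does not specify the boundary case.

```python
def body_mass_index(weight_kg, height_m):
    """BMI in kg/m^2: weight divided by height squared."""
    return weight_kg / height_m ** 2


def waist_to_hip_ratio(waist_cm, hip_cm):
    """WHR: waist circumference divided by hip circumference (same units)."""
    return waist_cm / hip_cm


def bmi_category(bmi):
    """Classify a BMI value using the Asian cutoffs quoted in the text.

    Assumption: values of exactly 23 or 26 fall in the lower category.
    """
    if bmi > 26:
        return "obese"
    if bmi > 23:
        return "overweight"
    return "control range"
```

For example, a hypothetical subject weighing 85 kg at 1.70 m has a BMI of about 29.4 kg/m² and would be classified as obese under these cutoffs.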

Genotyping

Genomic DNA was isolated from blood leukocytes using the Wizard® Genomic DNA purification kit (Promega, USA). DNA was quantified using a NanoDrop spectrophotometer (ND-8000, USA) and diluted to a concentration of 5 ng/μl. The variants included common SNPs in either energy regulation (candidate) genes or GWAS-implicated (non-candidate) genes (Additional file 1: Table S1). The SNPs were genotyped by PCR-RFLP, tetra-ARMS PCR or TaqMan methods. The leptin (LEP) gene SNP G2548A, the leptin receptor (LEPR) SNP Gln223Arg and the fatty acid binding protein 2 (FABP2) SNP Ala54Thr were genotyped by PCR-RFLP, and the FTO gene SNP by tetra-ARMS PCR. The remaining SNPs were genotyped by TaqMan allelic discrimination assay: rs3923113 near growth factor receptor bound protein (GRB14), rs16861329 in sialyltransferase 6 galactosidase 1 protein (ST6GAL1), rs1802295 in vacuolar protein sorting associated protein (VPS26A), rs7178572 in high mobility group protein 20 A (HMG20A), rs2028299 in adaptor related protein complex (AP3S2) and rs4812829 in hepatocyte nuclear factor (HNF4A). The reaction mixture composition and PCR conditions have been described previously [11,12,13,14].

Gene score (GS) calculation & statistical analysis

For the GRB14 and ST6GAL1 SNPs the major alleles were the risk alleles, while for the rest the minor alleles were the risk alleles. SPSS was used to construct the gene score of the included variants. The SNPs were coded as 0, 1, and 2 for the presence of no, one and two risk alleles, i.e., homozygous protective, heterozygous and homozygous risk genotype, respectively. A new variable named ‘Gene Score’ was computed in SPSS by adding up the number of risk alleles across all the SNPs in each subject (e.g., if a subject has the allele profile for all variants as 0, 0, 1, 2, 0, 1, 1, 2, 0, 1, 1, 2, and 0, the gene score would be 0 + 0 + 1 + 2 + 0 + 1 + 1 + 2 + 0 + 1 + 1 + 2 + 0 = 11). The distribution of the gene score in cases and controls was examined with a normal distribution curve, and the effect on anthropometric and biochemical traits was checked using linear regression, with obesity or lipid traits as the dependent variable and gene score as the independent variable. The analyses were adjusted for confounders including age, gender, socioeconomic status (SES), hypertensive, diabetic, CVD status, etc. The difference in mean gene score between cases and controls was tested with the independent sample t-test. Because multiple SNPs were included, a corrected p-value (0.05/10 = 0.005) was used as the significance cutoff.
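The unweighted gene score described above can be sketched as follows. The study computed it in SPSS; this Python version is only an illustration of the same counting rule. It assumes that genotypes arrive as minor-allele counts (0, 1 or 2) keyed by gene symbol, which is a hypothetical input format, and all function and variable names are illustrative.

```python
# The ten genes whose SNPs were genotyped, as listed in the Genotyping section.
SNP_GENES = ["LEP", "LEPR", "FABP2", "FTO", "GRB14",
             "ST6GAL1", "VPS26A", "HMG20A", "AP3S2", "HNF4A"]

# Per the text, the major allele is the risk allele for these two SNPs only.
MAJOR_ALLELE_IS_RISK = {"GRB14", "ST6GAL1"}


def risk_allele_count(gene, minor_allele_count):
    """Convert a minor-allele count (0, 1, 2) into a risk-allele count."""
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor_allele_count must be 0, 1 or 2")
    if gene in MAJOR_ALLELE_IS_RISK:
        # Flip the coding so that 2 always means two risk alleles.
        return 2 - minor_allele_count
    return minor_allele_count


def gene_score(minor_allele_counts):
    """Sum risk alleles over all ten SNPs for one subject.

    minor_allele_counts: dict mapping gene symbol -> minor-allele count
    (hypothetical input format).
    """
    return sum(risk_allele_count(gene, minor_allele_counts[gene])
               for gene in SNP_GENES)
```

Under this coding, a hypothetical subject homozygous for the major allele at every SNP still scores 4, because the major alleles of the GRB14 and ST6GAL1 SNPs count as risk alleles. The maximum possible score is 2 × 10 = 20.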

Results

The study subject characteristics have been published previously (Table 1) [11]. Information on the reference SNPs, including name, respective gene and minor allele frequency in the cases and the controls, is given in Additional file 1: Table S1. Table 1 shows that all the parameters except height differed significantly between the cases and the controls, as tested by the independent sample t-test. The lipid profile parameters deviated from normal ranges in the cases as compared to the controls, with TC, TG and LDL-C significantly increased and HDL-C significantly decreased. The genotyping call rates for all the SNPs were ~ 98%.

Table 1 Study Population Anthropometric and Biochemical characteristics [1]

Gene score distribution in the cases and the controls

The comparison of the gene score between cases and controls is shown in Fig. 1. The curve is shifted to the right in the cases, indicating that a greater number of cases possessed a higher gene score, whereas the majority of controls had a lower number of risk alleles. The mean gene score (Table 1) differed significantly between controls (8.35 ± 2.07) and cases (9.1 ± 2.26) (p = 2 × 10⁻⁴). As ten SNPs were included in the analysis, the maximum number of risk alleles an individual could possess is twenty. The descriptives of the gene score are summarized in Table 1.
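The reported group difference can be roughly checked from the summary statistics alone. The sketch below rests on several assumptions: the ± values are standard deviations, the gene score was available for all 250 cases and 225 controls, the unequal-variance (Welch) form of the t statistic is used rather than whatever option SPSS applied, and the p-value uses a normal approximation, which is reasonable only because the samples are large. The function names are hypothetical.

```python
import math


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic for a difference in means, without assuming equal variances."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


def two_sided_p_normal(t_statistic):
    """Two-sided p-value using the normal approximation to the t distribution."""
    return math.erfc(abs(t_statistic) / math.sqrt(2))


# Cases: 9.1 ± 2.26 (n = 250); controls: 8.35 ± 2.07 (n = 225).
t_stat = welch_t_from_summary(9.1, 2.26, 250, 8.35, 2.07, 225)
p_value = two_sided_p_normal(t_stat)
```

Under these assumptions the t statistic is a little below 3.8 and the p-value is of the order of 10⁻⁴, which is consistent with the reported p = 2 × 10⁻⁴ and below the corrected cutoff of 0.005.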

Fig. 1

Histograms of gene score in cases and controls. The histograms show the distribution of the risk allele count of the study participants, with a normal curve overlaid; the top half shows the controls and the bottom half the cases. The gene score is plotted on the x-axis and the frequency on the y-axis. The bars show the frequency of study subjects in each group at each gene score.

Comparison of the effect of gene score and individual variants on obesity

To check whether the gene score approach improves the association of the genetic component with obesity as compared to the individual variants, we performed a linear regression analysis. The individual variants had either marginal or no significant association with obesity, whereas the gene score was highly significantly associated with obesity. The p-values in Table 2 indicate the strength of association of the single SNPs and of the gene score.

Table 2 Comparison of Association of Individual SNPs and Gene Score with Obesity

The effect of the gene score on anthropometric parameters

The effect of the gene score on anthropometric parameters is presented in Tables 3 and 4. Table 3 summarizes the increase in the mean value of each parameter with increasing gene score, and Table 4 shows the quantitative increase per additional risk allele. Table 3 shows that weight, BMI, WC, HC and WHR increased as the number of risk alleles (i.e., the gene score) increased. This is illustrated in Fig. 2, which plots the relationship between the gene score and the anthropometric traits. The effect of the gene score on the selected anthropometric traits appeared to be quantitative and is shown in Table 4. The beta effect is the per-risk-allele increase in a parameter, and the p-value shows whether this increase is significant. The per-allele increase in weight, height, HC and WHR was not statistically significant, while a highly significant increase was observed for BMI and WC.
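The beta effect described above is the slope of a linear regression of the trait on the gene score. The sketch below shows the unadjusted, single-predictor case only; the study's models were additionally adjusted for confounders, which this sketch omits. The function name and the data in the usage note are hypothetical.

```python
def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    The slope is the change in y per one-unit increase in x, i.e. the
    per-risk-allele beta when x is the gene score.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross-products of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

With hypothetical gene scores 6, 8, 9, 10 and 12 and BMI values 22.0, 23.0, 23.5, 24.0 and 25.0 kg/m², the fitted slope is 0.5, meaning that each additional risk allele corresponds to a 0.5 kg/m² higher BMI in this made-up data set.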

Table 3 Effect of Gene Score on the anthropometric traits
Table 4 Quantitative Effect of the gene score (risk allele) on selected traits, expressed as the beta effect (Standard error)
Fig. 2

Effect of gene score on the anthropometric traits. The figure shows the gene score on the x-axis and mean values of the respective parameters on the y-axis. The slope of each line indicates the trend of the parameter with increasing gene score.

The effect of the gene score on biochemical parameters

The change in biochemical traits with increasing gene score is given in Table 5, and the per-risk-allele increase in the mean values is shown in Table 6. An increasing number of risk alleles makes the lipid profile more dyslipidemic, as indicated in Table 5 by the increase in TC, TG and LDL-C and the decrease in HDL-C with increasing gene score. The quantitative effect of the gene score on the selected biochemical traits in Table 6 indicates a strongly significant change in the levels of all the lipid parameters with each additional risk allele. A Bonferroni correction was applied to the analyses, and a corrected p-value (0.005) was used to test the significance of the association of the gene score because ten SNPs were included. The gene score was significantly associated only with BMI, WC and all the lipid traits.

Table 5 Effect of Gene Score on the biochemical traits
Table 6 Quantitative Effect of the gene score (risk allele) on biochemical traits, expressed as the beta effect (Standard error)

Discussion

The use of a genetic risk score is not a completely new idea. In many developed countries it is used alongside conventional risk factors in the risk scoring of heart disease to decide on appropriate therapeutic options [15], and a recent study in Pakistan also proposed the use of this approach [16]. It has been used for risk scoring of many polygenic disorders; however, its use is relatively new for obesity. A recent study found a significant association of a genetic risk score for 32 loci with obesity in obese subjects with major depressive disorder [17]. A 32-locus genetic risk score was also found to be a statistically significant predictor of body mass index and obesity in White subjects from the Atherosclerosis Risk in Communities (ARIC) cohort [18], whereas another study reported the association of a gene score with serum triglyceride levels in morbidly obese Mexican subjects [19].

We used the gene score to study the combined effect of risk alleles on obesity. This is a robust approach, particularly when the sample size is small, as it gives information regarding the additive effects of multiple variants in different genes in the same individual. The effect size assigned to each variant is independent of the effect estimated from the current small study, so the power problem is somewhat overcome. We selected ten variants from different genes, which means an individual could have a maximum of twenty risk alleles. Using the gene score approach, we found a significant difference in the mean gene score between cases and controls, which indicates a role for this index in the development of disease.

Among the anthropometric traits, the gene score was significantly associated with BMI and WC only. These are indices of central and abdominal obesity, and the association of the gene score with these parameters shows that the effect of each risk allele may be small on its own but that, when combined, the alleles can affect the overall fat distribution and disturb fat metabolism, resulting in an increase in BMI and WC. It is important to note, however, that although we could detect association with these two indices only, the trends of HC and WHR were in the same direction. The lack of a statistically significant association may be due either to the small sample size or to the possibility that other variants not included in the study influenced the effect of the SNPs.

Many lipid/lipoprotein abnormalities are prevalent in obesity and are collectively termed dyslipidemia. These dyslipidemias are often hyperlipidemias, in which the majority of lipids are shifted towards the upper limit of the normal range or above it. Obesity-associated dyslipidemia is characterized by an increase in total cholesterol (TC), triglycerides (TG) and low density lipoprotein cholesterol (LDL-C) and a decrease in high density lipoprotein cholesterol (HDL-C), with the changes in TG and HDL-C being the most consistent and pronounced. One study considered fat distribution an important factor in determining the differential distribution of TG, HDL and lipoproteins in both sexes and indicated that the lipid profile in obese persons is an important factor for progression to cardiovascular diseases [20,21,22,23]. We observed that the associations with the lipid traits became less significant when adjusted for BMI, and the association with TG was no longer significant. This suggests that the associations were partly mediated by BMI. It is thus unclear whether dyslipidemia or obesity is causal for the other, or whether the genes have pleiotropic effects on both traits.

The genetic contribution to obesity is now well established, and remarkable progress has been made in elucidating the role of genetics in the development of obesity. There is a long list of candidate and non-candidate genes known to be associated with obesity, which we have comprehensively reviewed previously [24]. We selected only a few variants from this list and from other sources retrieved through various search engines, because the available resources supported only the current analysis. Given this constraint, we tried to select a representative set of variants from a number of genes so that this pilot study could indicate whether it is worthwhile to study the role of these variants in obesity in Pakistani subjects. The study is limited by its relatively small sample size, the inability to include more SNPs in the analyses and the use of different genotyping strategies for different SNPs. To address the first two limitations, future studies should be planned to identify a panel of common variants associated with obesity in the Pakistani population. The limitation of differing genotyping techniques was partly overcome by stringent control of the genotyping call rate; genotyping was repeated wherever a discrepancy was found. However, better results might have been obtained if all variants had been genotyped by the same technique.

Conclusion

As complex disorders result from the interaction of a number of factors, special attention is needed to manage and treat them; obesity is one such major problem. In the current era of research, it is not appropriate to rely only on serum biochemical parameters to treat the problem after symptoms have appeared. Rather, information from different genes known to play a role in obesity in the concerned ethnic group should be used to calculate the risk an individual carries of developing obesity in the future, and at-risk individuals should be treated so as to prevent progression to obesity.