Despite remarkable progress in our understanding of the genetic architecture of cardiometabolic diseases, which account for substantial direct and indirect health care costs, it is unclear whether genetic information is associated with health services utilization and costs. The speed of discovery of genetic variants for common diseases has increased rapidly, leading to a debate about how this information might be used to improve screening for and stratified treatment of common disorders. One potential application is the use of genetic data to improve identification of disease susceptibility in initially healthy individuals and to focus preventive measures on those at the highest risk of future disease [1, 2]. The present study builds upon this work by investigating whether genetic data are related to health services utilization.

Many previous studies have tried to explain variation in health services utilization, including predictive modelling of health care costs and risk-adjustment models for health insurers; these typically include socio-demographic variables, clinical conditions or diagnoses, medication data and self-rated health status [3–5]. The role of genetic data has received little attention in health services research, despite its consistent association with morbidity in some areas. For example, a strong association of PNPLA3 rs738409, a single nucleotide polymorphism (SNP) in the patatin-like phospholipase domain-containing 3 (PNPLA3) gene, with liver fat content, fibrosis, cirrhosis and non-alcoholic fatty liver disease (NAFLD)-related hepatocellular carcinoma has been reported [6–9]. More recently, TM6SF2 rs58542926 has been associated with liver fat fraction, severity of steatosis, and fibrosis [10, 11].

Previous work suggests that hepatic steatosis might lead to several comorbidities including the metabolic syndrome [12, 13] and cardiovascular diseases [14, 15]. These conditions result from a complex interplay between genetic and environmental factors and pose a huge economic burden on health care systems. Consequently, hepatic steatosis has also been related to increased future health care costs and hospitalization [16]. Because clinical conditions associated with hepatic steatosis produce substantial health care costs, it is reasonable to assume that genetic variants in the PNPLA3 and TM6SF2 genes, as promoters of liver fat, might be associated with health care utilization. We assumed different mechanisms by which these SNPs might affect health services utilization. Accordingly, we tested whether the following mediators might, at least in part, explain potential associations between SNPs and health services utilization outcomes: hepatic steatosis on ultrasound, serum alanine aminotransferase, ferritin, metabolic syndrome, waist circumference, body mass index, triglycerides, high-density lipoprotein (HDL) cholesterol, blood pressure, serum glucose, and glycated hemoglobin (HbA1c) [17–21].


Study participants

The presented data were derived from the population-based Study of Health in Pomerania (SHIP). The study design has previously been described in more detail [22]. In brief, a multistage random sample was drawn from the population aged 20 to 79 years of West Pomerania, a north-eastern coastal region of Germany. The examinations were conducted between 1997 and 2001. Of the 7008 initially sampled subjects, 6265 were eligible for the study and 4308 participated (response proportion 68.8 %). We excluded participants with positive findings for hepatitis B surface antigen or presence of anti-hepatitis C virus antibodies (n = 28). In addition, we had to exclude participants with DNA of insufficient genotyping quality (n = 227); missing data on the hepatic ultrasound measurement (n = 70); missing self-reports of liver cirrhosis (n = 6), hospitalization (n = 17), or smoking (n = 19); and missing values for ferritin (n = 20), creatinine (n = 21), or waist circumference (n = 7). The analysis sample included 3759 subjects. All participants gave written informed consent and the study was approved by the ethics committee of the University of Greifswald. Weighting was used to adjust for bias due to differences in responses, probabilities of selection, and discrepancies between data from official statistics and our samples with regard to demographic and geographical distributions [23].

Data collection and measures

Information on socio-economic characteristics, lifestyle habits, medication use and health services utilization was collected by trained and certified medical staff during a standardized computer-assisted personal interview. The number of outpatient visits was measured using two questions: (1) Have you visited a physician (general practitioner or specialist) in the past year? and (2) If “yes,” how often did you visit a particular physician in the past year? Subjects responded to a list of 18 different types of physicians and specialists. The analyses were restricted to general health services and excluded visits to dentists. Inpatient service use was measured by asking the participants whether they had been hospitalized at least once in the past 12 months and, if they answered affirmatively, by further probing for the number of days hospitalized during the past year [16].

All participants underwent an extensive medical examination including the collection of a blood sample and a sonographic examination of the abdomen, performed by trained and certified physicians using a 5 MHz transducer and a high-resolution instrument (Vingmed VST Gateway, Santa Clara, CA). The sonographers were unaware of the participants’ clinical and laboratory characteristics. A hyperechogenic pattern was defined as the presence of an ultrasonographic contrast between hepatic and renal parenchyma [16]. Hepatic steatosis was defined as the presence of a hyperechogenic liver pattern.

Educational attainment was estimated by recording years of schooling completed. Income was included as “equivalized” household income (in €), applying the Luxembourg Income Study recommendation to divide the household income by the square root of the number of household members [24]. Alcohol consumption was assessed using a beverage-specific quantity-frequency measure [25]: number of days with alcohol consumption (beer, wine, spirits) and average daily alcohol consumption for such a day over the past month. Average daily consumption (in grams of pure ethanol per day) was calculated by multiplying frequency and amount, using beverage-specific standard ethanol contents [25]. Study participants provided information about whether they had ever smoked cigarettes regularly (never, past only, current). Waist circumference was measured to the nearest 0.1 cm using an inelastic tape midway between the lower rib margin and the iliac crest in the horizontal plane, with the subject standing comfortably with weight distributed evenly on both feet. Obesity was defined as a body mass index (BMI) ≥ 30 kg/m². Systolic and diastolic blood pressures were measured on the right arm of rested and seated participants using a digital blood pressure monitor (HEM-705CP, Omron Corporation, Tokyo, Japan). The second and third blood pressure measurements were averaged and used for analysis. A non-fasting venous blood sample was obtained from all study participants between 07:00 AM and 04:00 PM [26]. Serum ferritin levels were determined by an immunoturbidimetric assay (Cobas Micra Plus, F. Hoffmann-La Roche Ltd). Serum low-density lipoprotein (LDL) cholesterol and HDL cholesterol were precipitated and measured photometrically (Boehringer). A total cholesterol/HDL ratio ≥ 5 indicated dyslipidemia [27]. Triglycerides and glucose were determined enzymatically using reagents from Roche Diagnostics (Hitachi 717, Roche Diagnostics, Mannheim, Germany). Serum alanine aminotransferase (ALT) was measured photometrically (Hitachi 704 and 171, Roche, Mannheim, Germany) [16]. HbA1c was determined by high-performance liquid chromatography (Bio-Rad Diamat Analyzer, Munich, Germany). The creatinine concentration (Jaffé method) was determined on a Hitachi 717 (Roche Diagnostics, Mannheim, Germany). The estimated glomerular filtration rate (eGFR) was calculated according to the MDRD formula and expressed in mL/min/1.73 m² [28].
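The two derived covariates described above, household income scaled by the square root of household size and average daily ethanol intake from a quantity-frequency measure, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the litre-based amount unit, the 30-day reference period, and in particular the alcohol-by-volume fractions are hypothetical placeholders and not the beverage-specific standard ethanol contents used in the study [25].

```python
import math

ETHANOL_DENSITY_G_PER_ML = 0.789  # density of pure ethanol

# ASSUMPTION: illustrative alcohol-by-volume fractions, NOT the study's values.
ILLUSTRATIVE_ABV = {"beer": 0.05, "wine": 0.11, "spirits": 0.35}


def equivalized_income(household_income_eur, household_size):
    """Household income divided by the square root of the number of members."""
    if household_size < 1:
        raise ValueError("household_size must be at least 1")
    return household_income_eur / math.sqrt(household_size)


def daily_ethanol_grams(drinking_days, litres_per_drinking_day, days_in_period=30):
    """Average grams of pure ethanol per day over the reference period.

    drinking_days: beverage -> number of days with consumption in the period
    litres_per_drinking_day: beverage -> average litres consumed on such a day
    """
    total_grams = 0.0
    for beverage, days in drinking_days.items():
        # grams of ethanol per litre of beverage (hypothetical ABV values)
        grams_per_litre = ILLUSTRATIVE_ABV[beverage] * 1000.0 * ETHANOL_DENSITY_G_PER_ML
        # frequency x amount x ethanol content
        total_grams += days * litres_per_drinking_day[beverage] * grams_per_litre
    return total_grams / days_in_period


if __name__ == "__main__":
    print(equivalized_income(4000.0, 4))
    print(daily_ethanol_grams({"beer": 10}, {"beer": 0.5}))
```

Dividing by the square root of household size, rather than by household size itself, reflects the assumption that larger households share some costs.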

The metabolic syndrome was defined as the presence of at least three of the following five components [29]: (1) central obesity (waist circumference ≥94 cm in men and ≥80 cm in women); (2) reduced HDL: non-fasting HDL <1.03 mmol/L in men and <1.29 mmol/L in women, drug treatment for reduced HDL is an alternate indicator [30]; (3) elevated blood pressure: systolic ≥130 mmHg and/or diastolic ≥85 mmHg or antihypertensive drug treatment; (4) hypertriglyceridemia: non-fasting plasma triglycerides ≥2.3 mmol/L or drug treatment for elevated triglycerides [30]; (5) hyperglycemia: non-fasting glucose level of ≥8.0 mmol/L (≥144 mg/dL) or drug treatment of elevated glucose [30].
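The component rule above translates directly into a small classifier. The following Python sketch uses the non-fasting thresholds stated in the text; the `Participant` record and its field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical record layout; field names are illustrative only.
    male: bool
    waist_cm: float
    hdl_mmol_l: float
    systolic_mmhg: float
    diastolic_mmhg: float
    triglycerides_mmol_l: float
    glucose_mmol_l: float
    on_hdl_drug: bool = False
    on_antihypertensive: bool = False
    on_triglyceride_drug: bool = False
    on_glucose_drug: bool = False


def metabolic_syndrome_components(p):
    """Return the five components as booleans, using the thresholds in the text."""
    return {
        "central_obesity": p.waist_cm >= (94 if p.male else 80),
        "reduced_hdl": p.hdl_mmol_l < (1.03 if p.male else 1.29) or p.on_hdl_drug,
        "elevated_bp": (p.systolic_mmhg >= 130 or p.diastolic_mmhg >= 85
                        or p.on_antihypertensive),
        "hypertriglyceridemia": p.triglycerides_mmol_l >= 2.3 or p.on_triglyceride_drug,
        "hyperglycemia": p.glucose_mmol_l >= 8.0 or p.on_glucose_drug,
    }


def has_metabolic_syndrome(p):
    """Metabolic syndrome = at least three of the five components present."""
    return sum(metabolic_syndrome_components(p).values()) >= 3
```

Returning the individual components as well as the summary flag is convenient here because both the syndrome and its single components are examined as candidate mediators.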

The definition of other comorbidities was based on self-reported physician’s diagnosis or self-reported use of medication. Medical history included a recall of physician’s diagnosis of a list of 15 chronic conditions including diabetes, myocardial infarction, angina pectoris, congestive heart failure, obesity, arthritis, osteoporosis, chronic obstructive pulmonary disease, neurological disease (such as multiple sclerosis or Parkinson’s disease), upper gastrointestinal disease (ulcer, hernia, reflux), stroke, anxiety, and depression. Comorbid health status was measured by the Functional Comorbidity Index (FCI), which is a summary measure of comorbid diseases selected and weighted according to their association with physical functioning [31].


Genotyping was performed using the Human SNP 6.0 Array (Affymetrix, Santa Clara, CA, USA). Hybridisation of genomic DNA was performed according to the manufacturer’s standard recommendations. Genotypes were determined using the Birdseed2 clustering algorithm. For quality control purposes, several control samples were added. On the chip level, only subjects with a genotyping rate on QC probe sets (QC call rate) of at least 86 % were included. All remaining arrays had a sample call rate > 92 %. Imputation of genotypes in SHIP was performed with the software IMPUTE [32] v0.5.0 based on HapMap II CEU (rs738409) or IMPUTE v2.2.2 based on the 1000 Genomes v3 ALL populations reference panel (rs58542926). Because rs738409 and rs58542926 were not directly genotyped on the array but were available in the imputed dataset, the best-guess genotypes of these SNPs were used for the subsequent analyses by assigning to each individual the genotype with the highest probability after imputation. The imputation quality score, measured as the ratio of observed to expected genotype variance, was 0.96 for rs738409 and 0.99 for rs58542926; quality scores range from 0 to 1, and a value of 1 indicates nearly perfect imputation quality.
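The best-guess assignment described above amounts to taking the most probable of the three imputed genotypes. A minimal Python sketch follows; the function name, the tuple layout of the probabilities, and the tie-breaking rule (first genotype wins) are assumptions for illustration and do not describe the IMPUTE output format or the study's actual pipeline.

```python
def best_guess_genotype(probabilities, labels=("CC", "CG", "GG")):
    """Return the genotype label with the highest imputed probability.

    probabilities: three posterior genotype probabilities, ordered as `labels`.
    Ties are resolved in favour of the first label (an arbitrary choice).
    """
    if len(probabilities) != len(labels):
        raise ValueError("need one probability per genotype label")
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return labels[best_index]


if __name__ == "__main__":
    # PNPLA3 rs738409 uses alleles C/G; TM6SF2 rs58542926 uses alleles C/T.
    print(best_guess_genotype((0.02, 0.90, 0.08)))
    print(best_guess_genotype((0.97, 0.03, 0.00), labels=("CC", "CT", "TT")))
```

Hard-calling discards the residual uncertainty in the imputed probabilities, which is less of a concern when imputation quality is close to 1, as reported for both SNPs.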

Statistical analyses

Categorical data were expressed as percentages; continuous data were expressed as arithmetic mean (SD). Annual numbers of outpatient visits and inpatient days, income, alcohol intake and ferritin were log-transformed and the geometric means (SD) are reported, as they approximately followed a log-normal distribution. Since the number of annual outpatient physician visits exhibited a skewed and discrete distribution, the traditional linear (least squares) regression model was inappropriate [33]. Therefore, a negative binomial model was used to evaluate the association of PNPLA3 rs738409 and TM6SF2 rs58542926 with the number of outpatient visits [33]. Coefficients were transformed to rate ratios [i.e. exp(β)], which describe the relative change in the outcome; (exp(β) − 1) × 100 gives the percent change. The number of inpatient days among those with a hospital stay was estimated using a zero-truncated negative binomial regression and expressed as rate ratios [33]. We examined the relation of PNPLA3 rs738409 and TM6SF2 rs58542926 with the risk of hospitalization using a logistic regression model. Results from logistic regressions were expressed as odds ratios (OR) with corresponding 95 % confidence intervals (CI). We adjusted regression models for variables known to be correlated with health services use. Model 1 was adjusted for age, sex, education, income, alcohol intake, smoking status, and eGFR. Model 2 added the FCI. Next, we investigated whether a potential association of PNPLA3 rs738409 and TM6SF2 rs58542926 with health services utilization might be mediated by hepatic steatosis on ultrasound, serum alanine aminotransferase, ferritin, metabolic syndrome, waist circumference, body mass index, triglycerides, high-density lipoprotein (HDL) cholesterol, blood pressure, serum glucose, and HbA1c [17–21]. We therefore fitted models that additionally included each of these intermediate factors (models 3 to 13 in Tables 2 and 3).
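To make the reporting scale concrete, the following Python sketch converts a coefficient from a log-link count model into a rate ratio and a percent change. The function names are hypothetical, and the Wald interval (β ± 1.96 × SE) is used purely for illustration; the analyses in this study report bias-corrected intervals instead.

```python
import math


def rate_ratio(beta, standard_error, z=1.96):
    """Return (ratio, lower, upper) for a log-link coefficient.

    ASSUMPTION: a Wald-type interval is used here for illustration only.
    """
    return (
        math.exp(beta),
        math.exp(beta - z * standard_error),
        math.exp(beta + z * standard_error),
    )


def percent_change(ratio):
    """Express a rate ratio as a percent change relative to the reference group."""
    return (ratio - 1.0) * 100.0


if __name__ == "__main__":
    # Made-up coefficient and standard error, not taken from the study.
    ratio, lower, upper = rate_ratio(beta=0.30, standard_error=0.10)
    print(round(percent_change(ratio), 1),
          round(percent_change(lower), 1),
          round(percent_change(upper), 1))
```

A rate ratio of 1.53, for example, corresponds to a 53 % higher expected count than in the reference genotype group.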

Internal validation of the models was performed using 10-fold cross-validation [34]. The performance of the regression models was evaluated using Nagelkerke’s R² [34] (all models), root mean squared error and mean prediction error (number of outpatient visits, number of inpatient days) [3], and the area under the receiver operating characteristic curve (AUC) (hospitalization) [34]. The AUCs of models without and with SNPs were compared using the method of DeLong and Clarke-Pearson for correlated data [35]. We reported the median of all assessments of the Nagelkerke R², mean prediction error, and AUC across the 10 cross-validations [34]. We performed bootstrap validation of all regression models using 300 replications as a sensitivity analysis [36]. Because the findings of the bootstrap validation were almost identical to those from the cross-validation, we reported only the results from the cross-validation procedure. Bias-corrected 95 % confidence intervals (CI) were reported [37]. Stata 13.1 SE was used for statistical analyses (Stata Corporation, College Station, TX, USA).
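The validation quantities named above can be written down compactly. The Python sketch below is not the Stata code used for the analyses; it only illustrates, with hypothetical function names, how observations might be split into 10 folds and how root mean squared error, Nagelkerke's R² (from the null and fitted log-likelihoods) and the AUC (as the proportion of correctly ordered case/non-case pairs) are defined. The fixed random seed is an assumption made for reproducibility of the example.

```python
import math
import random
import statistics


def k_fold_indices(n, k=10, seed=0):
    """Randomly partition the indices 0..n-1 into k folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def nagelkerke_r2(loglik_null, loglik_model, n):
    """Cox-Snell R2 rescaled by its maximum attainable value."""
    cox_snell = 1.0 - math.exp(2.0 * (loglik_null - loglik_model) / n)
    maximum = 1.0 - math.exp(2.0 * loglik_null / n)
    return cox_snell / maximum


def auc(labels, scores):
    """Probability that a random case scores higher than a random non-case."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = sum(1.0 if c > d else 0.5 if c == d else 0.0
               for c in cases for d in controls)
    return wins / (len(cases) * len(controls))


def median_over_folds(values):
    """Summarize a per-fold performance measure by its median."""
    return statistics.median(values)
```

In use, a model would be fitted on nine folds, evaluated on the held-out fold with the functions above, and the ten held-out values summarized with `median_over_folds`.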


Socio-economic factors, lifestyle habits, clinical factors and comorbidities by PNPLA3 rs738409 and TM6SF2 rs58542926

The socio-economic, lifestyle and clinical characteristics according to genotypes of PNPLA3 rs738409 and TM6SF2 rs58542926 are shown in Table 1. Regarding PNPLA3 rs738409, hospitalizations during the last year and the annual number of inpatient days were higher in minor allele homozygous subjects (GG) than in major allele homozygous subjects (CC). Minor allele homozygous subjects also had a higher prevalence of hepatic steatosis and increased ALT, were older, had lower waist circumferences, and suffered less frequently from angina pectoris and arthritis than major allele homozygous subjects. The two groups were similar regarding the number of annual outpatient visits, sex, level of education, income, alcohol intake, smoking, and other clinical variables.

Table 1 Characteristics of study participants by PNPLA3 rs738409 and TM6SF2 rs58542926

Furthermore, compared with major allele homozygous subjects, heterozygous subjects (CG) more frequently exhibited hepatic steatosis and increased ALT and had a lower waist circumference. The two groups did not differ regarding the probability of hospitalization during the last year, the annual number of inpatient days, or other socio-economic, lifestyle, and clinical factors (Table 1). Minor allele homozygous subjects (TT) of TM6SF2 rs58542926 more often had hepatic steatosis and had more annual outpatient visits and inpatient days than major allele homozygous subjects (CC). Minor allele homozygous subjects also had lower educational attainment, higher ferritin, lower LDL and total cholesterol, and lower triglyceride levels. Heterozygous subjects (CT) of TM6SF2 rs58542926 more frequently had hepatic steatosis and increased ALT, and had lower LDL cholesterol, lower total cholesterol, and higher triglycerides than major allele homozygous subjects.

Regression analyses of PNPLA3 rs738409 and TM6SF2 rs58542926 with outpatient services utilization and hospitalization

Results of regression models for the association between PNPLA3 rs738409 and health services utilization and hospitalization are summarized in Table 2. Logistic regression analysis revealed that minor allele homozygous subjects had higher odds of hospitalization than major allele homozygous subjects (OR 1.51; 95 % CI: 1.02–2.15) after adjustment for age, sex, education, income, alcohol intake, smoking status, waist circumference, and eGFR (Table 2, model 1). Further adjustment for comorbid conditions did not attenuate the OR (Table 2, model 2). In contrast, genotype groups did not differ with regard to the annual numbers of outpatient visits and inpatient days after full adjustment. In addition, heterozygous subjects did not differ from major allele homozygous subjects regarding the annual number of outpatient visits, odds of hospitalization, or the annual number of inpatient days.

Table 2 Associations of PNPLA3 rs738409 with health utilization and hospitalization (n = 3759)

We assumed that the association between PNPLA3 rs738409 and health services utilization might be mediated by hepatic steatosis, ALT, ferritin, the metabolic syndrome, waist circumference, BMI, triglycerides, HDL, systolic blood pressure, serum glucose or HbA1c (Table 2, models 3 to 13). We found that the point estimates for the association of PNPLA3 rs738409 with health services utilization were largely unchanged after adjustment for any of these hypothesized mediators (except for serum glucose), which provides little evidence that these measured phenotypes are intermediate steps on the causal pathway from PNPLA3 rs738409 to hospitalization.

Table 3 Associations of TM6SF2 rs58542926 with health utilization and hospitalization (n = 3759)

Next, we regressed health services utilization outcomes on TM6SF2 rs58542926 (Table 3). Heterozygous subjects (CT) and minor allele homozygotes (TT) had an approximately 53 % (95 % CI: 18.3–67.6) and 68 % (95 % CI: 28.3–134.6) higher number of annual outpatient visits than major allele homozygotes after adjustment for age, sex, education, income, alcohol, and eGFR (Table 3, model 1). Further adjustment for comorbid conditions slightly increased effect sizes (Table 3, model 2). To evaluate mediation, we tested whether including the assumed intermediate variables changed the effect sizes (Table 3, models 3 to 13). Point estimates were slightly attenuated after adding the metabolic syndrome, waist circumference, triglycerides and serum glucose to model 1. Likewise, TM6SF2 rs58542926 was an independent predictor of length of hospital stay. Minor allele homozygous subjects had a 123 % (95 % CI: 33.1–276.0) and heterozygous subjects an 85 % (95 % CI: 18.9–188.8) higher number of annual hospital days than major allele homozygotes (Table 3, model 1). Inclusion of the metabolic syndrome and its single components attenuated point and variance estimates (Table 3, models 3 to 13).

Incremental predictive power of PNPLA3 rs738409 and TM6SF2 rs58542926

To examine the incremental predictive power of the SNPs and to correct for optimistic prediction, we conducted cross-validation of all regression models. We computed measures of predictive performance for model 2, model 2 plus each of the SNPs, and model 2 plus both SNPs (Table 4). Adding both SNPs to model 2 resulted in only small changes in the R² values, mean squared errors, absolute prediction error, and AUC.

Table 4 Predictive power of models with established health utilization predictors without and with SNPs: results of the 10-fold cross validation


Previous studies have reported a consistent association of PNPLA3 rs738409 and TM6SF2 rs58542926 with hepatic steatosis [7–11, 38, 39], which results in increased future health care expenditures [16]. Accordingly, we hypothesized that PNPLA3 rs738409 or TM6SF2 rs58542926 might be useful to identify subjects who would have increased health services utilization. Therefore, we correlated PNPLA3 rs738409 and TM6SF2 rs58542926 directly with different measures of health services utilization in a general population sample. We observed that the minor allele of PNPLA3 rs738409 was associated with increased odds of hospitalization. Similarly, both minor allele homozygous and heterozygous subjects of TM6SF2 rs58542926 exhibited a higher number of annual outpatient visits and more hospital days. We also investigated mediation by variables related to fatty liver disease and components of the metabolic syndrome. The regression coefficients of PNPLA3 rs738409 and TM6SF2 rs58542926 were essentially unchanged upon adjustment for measured hepatic steatosis, ALT or ferritin but were attenuated after inclusion of features of the metabolic syndrome. This indicates that components of the metabolic syndrome might be intermediate variables on the pathway from PNPLA3 rs738409 and TM6SF2 rs58542926 to health services utilization. Inclusion of the SNPs only marginally improved the predictive performance of models predicting the number of outpatient visits, hospitalization, and the number of inpatient days.

The reasons for the significant association of PNPLA3 rs738409 and TM6SF2 rs58542926 with outpatient and inpatient services utilization even after controlling for multiple covariables are not entirely clear. One possible explanation is that the SNPs are associated with unmeasured phenotypes that affect health care outcomes and were not included in our regression models. Some assessments of chronic diseases were based on self-report data from our standardized interview, which are less specific and might result in misclassification, further increasing the likelihood of detecting false-positive associations. Another possible explanation is that PNPLA3 rs738409 and TM6SF2 rs58542926 are not directly responsible for the association but are in linkage disequilibrium with other SNPs, which might have introduced genetic confounding.

As far as we know, this is the first population-based study that uses genetic data as correlates of health services utilization and investigates whether adding genetic information to a predictive model of established health services predictors improves prediction. The findings gave some indication that individual genetic data might be useful in screening for excess health services utilization and costs. Access to health care is unrestricted in Germany and the cost of treatment is covered by either private or statutory health insurance. Screening based on genetic information has not been introduced in Germany. However, decisions regarding genetic testing have to take into account that the inclusion of genotype information in our multivariable models with established predictors of health care utilization did not improve prediction, and that an external replication of our findings is needed to support our data.

The ethical and economic implications of genetic screening need to be examined before any recommendations can be made [40]. False-positive screening results can lead to additional diagnostic tests or cause unnecessary anxiety and thereby a needless increase in health care use [41]. It is essential that genetic screening and its consequences are transparent and adequately understood by the target population [42, 43]. A serious ethical issue of genetic testing is the potential for discrimination and stigmatisation of individuals and groups. To assess genetic screening, economic evaluations need to be performed [44].

Our study has several strengths and limitations that need to be considered. Major strengths of this study include the sample size, the comprehensive clinical characterization, strict quality control procedures and a standardized protocol as well as trained and certified staff for all data acquisition. However, due to the non-fasting status of our participants a non-standard definition of the metabolic syndrome had to be applied [30] and it is not clear how this could have affected our estimates. Health care utilization outcomes were assessed using questionnaire self-reports. The usual limitations of this approach apply, especially underreporting in self-reports and incomplete assessment of services. However, relative effects, which are of primary interest in this study, may be less biased than absolute numbers. Claims data containing the direct medical cost would help to improve the validity of the findings. Furthermore, the study was performed in a Caucasian, European population. It is not clear whether the observed findings are generalizable to other populations. Subjects who refused to take part in the study might have differed from participants with regard to health services utilization and their genetic profile, which might have introduced selection bias. Further possible limitations of this study are the imputation of SNPs and the lack of information about the specific causes of hospitalization.


In conclusion, the study illustrated that genetic information might be associated with health services utilization. Further studies in independent cohorts are needed to replicate our findings.