Dysglycaemia and South Asian ethnicity: a proteomic discovery and confirmation analysis highlights differences in ZAG

To (1) explore and verify differences in the plasma proteome of white European (WE) and South Asian (SA) adults with normal glycaemic control (NGC) or non-diabetic hyperglycaemia (NDH) and to (2) validate these findings using a separate WE and SA cohort at a high risk of NDH. Mass spectrometry analysis was performed on fasted samples from 72 WE or SA men with NGC or NDH. These results were verified using specific biochemical assays and validated by repeating the analysis in an additional cohort of 30 WE and 30 SA adults. Proteomic results were analysed using independent samples t test and univariate analysis. The targeted assay results were analysed using generalised linear models with adjustment for appropriate covariates including age, BMI, fasting plasma glucose, high-density lipoprotein-cholesterol, triglycerides and sex. Only zinc-alpha-2-glycoprotein (ZAG) significantly differed between both ethnicities and glycaemic control groups. ZAG-specific biochemical assays verified the lower circulating ZAG in SAs (41.09 versus 37.07 (mg L−1); p = 0.014), but not the difference between NGC and NDH groups (p = 0.539). Validation of the ethnicity difference in a separate cohort confirmed that, after adjustment for covariates, ZAG was lower in SAs (p = 0.018). There was no association between ZAG and glycaemic control in the validation cohort. Our analyses identified that ZAG is lower in SAs compared to WEs, but its difference between glycaemic control statuses was uncertain. Further research is needed to establish whether lower ZAG in SAs is associated with, or prognostic of, health outcomes, particularly regarding the risk of dysglycaemia.


Introduction
Type 2 diabetes mellitus (T2DM) is a chronic metabolic disease characterised by peripheral insulin resistance and inadequate relative insulin secretion, resulting in hyperglycaemia (WHO 1994). T2DM encompasses approximately 90% of all cases of diabetes (ADA 2002) and is preceded by a high-risk category termed non-diabetic hyperglycaemia (NDH). In NDH, blood glucose levels are elevated above healthy ranges, but do not meet diagnostic criteria for T2DM (Tabák et al. 2012). NDH not only increases an individual's risk of developing T2DM, but also cardiovascular disease (Danaei et al. 2006).
South Asian (SA) individuals (predominantly Indian, Pakistani and Bangladeshi) account for 20% of the world's population and constitute the largest ethnic minority group within the UK (ONS 2015). SAs have more pronounced insulin resistance, a two- to four-fold higher risk of developing T2DM, faster rates of progression from NDH to T2DM and a higher risk of myocardial infarction, compared to white European (WE) individuals (Gholap et al. 2011; Harding 2003; McKeigue et al. 1988; Tuomilehto et al. 1984). Furthermore, these occur at a younger age and lower body mass index (BMI) (Chiu et al. 2011). Reasons for this elevated risk are not fully understood, but genetic and epigenetic factors and unhealthy lifestyle practices have all been suggested (Sattar and Gill 2015). However, specific physiological pathways that link the SA phenotype to greater risk of metabolic dysfunction need further elucidation in order to better understand mechanisms leading to an elevated risk of T2DM in SA communities.
Proteomic analyses are increasingly used to identify novel markers of disease in order to improve personalised medicine (Anderson 2002). Proteomics is defined as "the use of quantitative measurements at protein-level of gene expression which characterises biological processes and gives insight to mechanisms of the control of gene expression" (Anderson and Anderson 1998). Previous analyses of proteins that differ with increasing insulin resistance, T2DM and an increasing burden of diabetes complications have identified alterations in the circulating concentrations of several proteins, which could be contributing factors to disease pathophysiology (Hwang et al. 2010; Yeh et al. 2016; Zürbig et al. 2012). However, to date, proteomics technologies have not been widely harnessed to aid our understanding of ethnic health disparities. Applying such approaches has the potential to identify novel proteins and biological pathways that will enable new insight into the risk of metabolic dysfunction in high-risk ethnic groups.
Subsequently, the aims of this study were three-fold. First, to explore the plasma proteome of WE and SA adults with normal glycaemic control (NGC) or NDH using discovery mass spectrometry analysis to identify novel proteins that may be important in elucidating why SAs have a high risk of metabolic disease; second, to verify differences between ethnicities and glycaemic control statuses using targeted biochemical assays; third, to validate findings in a separate cohort of WE and SA adults with a high risk of dysglycaemia.

ADDITION-Leicester cohort
Data for this cross-sectional observational analysis were obtained using baseline samples from the ADDITION-Leicester cohort (Webb et al. 2010), part of the international ADDITION study (Lauritzen et al. 2000). Briefly, ADDITION-Leicester involved a population-level diabetes screening programme followed by a single-blind cluster randomised trial, designed to assess the cost-effectiveness and efficacy of an intensive multi-factorial management programme for those with newly diagnosed T2DM. The inclusion criteria for ADDITION-Leicester were WE adults aged 40-75 years and SA or Afro-Caribbean adults aged 25-75 years. Exclusion criteria included individuals with known T2DM, any terminal illness with likely prognosis less than 12 months, females who were pregnant or lactating, or individuals with any psychological disorders that would hinder the ability to give informed consent.
Glycaemic control was established through a standard 75 g oral glucose tolerance test taken at the baseline visit of the study (Diabetes-UK 2017), which was performed following an overnight fast and with participants having refrained from alcohol and strenuous physical activity for 48 h. Impaired fasting glucose (IFG) was defined as a fasting plasma glucose (FPG) concentration between 6.1 and 6.9 mmol·L−1 and impaired glucose tolerance (IGT) was defined as 2-h plasma glucose (2hPG) between 7.8 and 11.0 mmol·L−1. Those with IFG and/or IGT were defined as having NDH (ADA 2015).
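As an illustration of the definitions above, the classification rule can be sketched in Python. This is a minimal sketch rather than the study's own code: the function name, the inclusive treatment of the range bounds and the separate label for values above either range are assumptions made here.

```python
# Thresholds as stated in the text (mmol/L); bounds are treated as inclusive (assumption).
IFG_RANGE = (6.1, 6.9)    # fasting plasma glucose (FPG)
IGT_RANGE = (7.8, 11.0)   # 2-h plasma glucose (2hPG)


def classify_glycaemic_status(fpg: float, two_h_pg: float) -> str:
    """Return 'NDH' when IFG and/or IGT is present and 'NGC' when both
    values fall below the ranges. Values above either range are labelled
    separately (hypothetical label, not taken from the text)."""
    if fpg > IFG_RANGE[1] or two_h_pg > IGT_RANGE[1]:
        return "above NDH range"
    ifg = IFG_RANGE[0] <= fpg <= IFG_RANGE[1]
    igt = IGT_RANGE[0] <= two_h_pg <= IGT_RANGE[1]
    return "NDH" if (ifg or igt) else "NGC"
```

For example, `classify_glycaemic_status(6.5, 6.0)` returns `"NDH"` because the fasting value alone meets the IFG definition.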
Anthropometric measurements were performed using standardised procedures and by trained members of staff. Height was measured to the nearest 0.1 cm using a stadiometer, whilst body weight (to the nearest 0.1 kg) was measured in light indoor clothing. These measurements were then used to calculate BMI (kg·m−2).
FPG and 2hPG levels were measured on a specific clinical chemistry analyser (Abbott Laboratories, Maidenhead, UK). Fasting lipid levels were measured using specific biochemical assays (Dade Behring Dimension analyser, Newark, USA). Fasting triglyceride (TG) levels were measured following the enzymatic hydrolysis of glycerol and high-density lipoprotein cholesterol (HDL-c) was measured after the isolation of other non-HDL-c apolipoproteins. Glucose and lipid parameters were measured on the same day, in the same laboratory conditions and quality control assessments were conducted daily.

STAND-UP cohort
Data for this cross-sectional observational analysis were also obtained using baseline samples from the STAND-UP trial (April 2015-March 2017; conducted at the Leicester Diabetes Centre, Leicester General Hospital and the British Heart Foundation Glasgow Cardiovascular Research Centre, University of Glasgow). The methods and results of the trial have been reported elsewhere (Yates et al. 2018). Briefly, this was a multi-component randomised crossover laboratory investigation into the impact of breaking up prolonged sitting with regular 5-min bouts of standing or light walking on post-prandial metabolism in older WE and SA adults. A total of 30 WE adults and 30 SA (Indian, Pakistani, Bangladeshi or other) adults were included in the trial. A fasted plasma sample from this cohort was used for this subsequent post-hoc analysis, with participants having fasted overnight for 12 h and having avoided alcohol for two days and vigorous exercise for three days prior to the sample being drawn.
Criteria for acceptance into this study were individuals aged between 65 and 79 years, who were able to walk without assistance and were able to understand and communicate in English. Individuals were excluded from the study if they (1) performed purposeful exercise (determined as ≥ 75 min of self-reported vigorous exercise per week), (2) had any psychological condition that would impair consent and subsequent study participation or (3) were using steroids or glucose-lowering therapies.
FPG measures were obtained at the beginning of each trial condition following an overnight fast. The values obtained across the three conditions were averaged and this value was used as a continuous measure in this analysis. Postprandial glucose values were obtained from a standardised meal (8 kcal/kg of body weight) that had a macronutrient content reflective of a typical western diet (13% protein, 52% carbohydrate and 35% fat).
Fasting HDL-c was quantified during the familiarisation assessment (approximately two weeks prior to the first experimental condition). Similar to FPG, fasting TG was quantified at the beginning of each trial condition and the values averaged. All biochemical analysis was performed at the British Heart Foundation Glasgow Cardiovascular Research Centre, University of Glasgow.
Anthropometric measurements were performed using standard procedures by trained members of staff. Height was measured to the nearest 0.5 cm, whilst body weight (to the nearest 0.1 kg) was measured in light indoor clothing. These measurements were then used to calculate BMI (kg·m−2).

Discovery proteomic analysis
For the present study, 72 men from the screening phase of ADDITION-Leicester were selected into four equal groups: (1) WE-NGC, (2) WE-NDH, (3) SA-NGC and (4) SA-NDH. These groups were matched for age and BMI, and only men were selected to reduce heterogeneity amongst subgroups. Discovery proteomic analysis was performed as part of a service by the Proteomic Facility, Institute of Psychiatry, King's College London.
For the proteomics analysis, trypsin-digested albumin and immunoglobulin immuno-depleted samples were labelled with 10-plex isobaric tandem mass tags (TMT, Thermo Fisher Scientific) and then fractionated by isoelectric focusing, before further fractionation and separation by reverse phase chromatography (Easy NanoLC system; Thermo Fisher Scientific). Samples were then delivered to the mass spectrometer at a flow rate of 300 nL·min−1 over 2 h to elute peptides. Peptides were delivered by electrospray into an Orbitrap Velos Pro instrument (Thermo Fisher Scientific) (for the full process, see Supplementary Material; Additional Proteomics Methods (Pearsey 2020b)). Raw data were processed using Proteome Discoverer 1.4 and the Swiss-Prot Human Protein Database was searched using Mascot Algorithm version 2.2.06 to obtain information on protein sequence and function. The eight sets of 12 raw files were then processed by Multi-Dimensional Protein Identification Technology analysis using Proteome Discoverer Daemon 1.4. The concentrations of the identified proteins were reported as median-normalised values.
The proteins identified that significantly differed between ethnic groups and/or glycaemic control statuses were explored through Cytoscape using the ClueGO application to assess protein mechanistic pathways and protein-protein interactions. Report settings selected for analysis were GO Biological Process, Cellular Components and Molecular Function. Also selected were Kyoto Encyclopaedia of Genes and Genomes (KEGG), Reactome pathway and interaction analysis and WikiPathways. Proteins that were either up- or downregulated acted as the clusters, with the significance of the pathways identified being set as probability (p)-values of ≤ 0.05. Where no results were returned for the criteria set in Cytoscape, g:Profiler (https://biit.cs.ut.ee/gprofiler/gost; ELIXIR, Estonia) was used to return protein reports.
Only proteins that differed between both ethnicities (SA versus WE) and groups of glycaemic control status (NGC versus NDH) were taken forward for verification using targeted biochemical assays. A list of the other proteins identified can be seen in the Supplementary Material, with Rank 1 proteins being identified in all 72 samples whilst Rank 2 proteins were not identified in every sample.

Targeted biochemical assay analysis
Targeted assay analysis was performed on the ADDITION-Leicester cohort samples in order to verify the discovery proteomic findings. Subsequently, the same assay was used for the STAND-UP cohort samples to validate the previous findings. Plasma concentrations of zinc-alpha-2-glycoprotein (ZAG) were determined using ELISA (Oxford Biosystems, Milton Park, Oxfordshire, UK), performed according to the manufacturer's instructions. The analytical limit of detection of the assay was 0.673 mg·L−1. Samples were run in duplicate and those with an intra-assay coefficient of variation ≥ 20% were reanalysed. Different plasma aliquots were used from the initial discovery proteomic analysis to ensure results were not impacted by repeated freeze-thaw cycles.
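The duplicate quality-control rule described above can be sketched as follows. This is an illustrative sketch only: it assumes the coefficient of variation is computed as the sample standard deviation of the two replicates divided by their mean, which the text does not specify, and the function names are hypothetical.

```python
from statistics import mean, stdev

CV_THRESHOLD_PERCENT = 20.0  # reanalysis threshold stated in the text


def duplicate_cv_percent(replicate_1: float, replicate_2: float) -> float:
    """Intra-assay CV (%) for a duplicate measurement, using the sample
    standard deviation (n - 1 denominator; an assumption of this sketch)."""
    values = [replicate_1, replicate_2]
    return 100.0 * stdev(values) / mean(values)


def needs_reanalysis(replicate_1: float, replicate_2: float) -> bool:
    """True when the duplicate CV meets or exceeds the 20% threshold."""
    return duplicate_cv_percent(replicate_1, replicate_2) >= CV_THRESHOLD_PERCENT
```

With made-up replicate readings of 40.0 and 42.0 mg·L−1 the CV is roughly 3.4%, so the sample would be accepted under this rule.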

Proteomics
Independent samples t tests were used to compare differences between individual groups and univariate models were used to compare differences in identified proteins between all groups (WE-NGC, WE-NDH, SA-NGC and SA-NDH). Fisher's Least Significant Difference (LSD) post-hoc analysis was employed to determine which pairs of means were statistically different. Univariate analysis was repeated adjusting for important covariates including age, BMI, HDL-c and TG.
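For readers less familiar with the test, the statistic behind an independent samples t test can be sketched as follows. The sketch assumes the equal-variance (pooled) form; the text does not state which form was used, the analyses themselves were run in SPSS, and any input values supplied to this function are placeholders rather than study data. The p-value would then be read from a t distribution with the returned degrees of freedom, which is not computed here.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Student's t statistic and degrees of freedom for two independent
    groups, assuming equal variances (pooled variance estimate)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_value = (mean(group_a) - mean(group_b)) / standard_error
    return t_value, df
```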

Targeted assay analysis
Generalised linear models (GLMs) were used to analyse the data, with ZAG (mg·L−1) as the dependent variable and with ethnicity (WE or SA), glycaemic control (NGC or NDH) and continuous glucose measures (FPG or 2hPG) as independent variables. Model 1 was unadjusted, whilst Model 2 was adjusted for age, BMI, HDL-c, TG and ethnicity (for glucose models) or FPG (for ethnicity models). Analyses using samples from the STAND-UP cohort were further adjusted for sex. For the ADDITION-Leicester cohort, glucose was analysed both as a categorical (NGC vs NDH) and as a continuous variable (FPG and 2hPG). The relationship between ZAG and glycaemic control was only assessed using glucose as continuous variables (FPG and 2hPG) within the STAND-UP cohort. HDL-c and TG were chosen to be included as covariates in the analysis because circulating levels of ZAG are thought to affect lipid transport (Lei et al. 2017; Zhu et al. 2014).
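The structure of an adjusted model of this kind can be sketched in plain Python. The sketch makes several assumptions that are not taken from the text: it treats the GLM as Gaussian with an identity link (in which case the fit reduces to ordinary least squares), it uses only ethnicity, age and BMI as predictors for brevity, and the data are synthetic, noise-free values constructed so that the known coefficients are recovered. The function name is hypothetical and the authors' own analysis was run in SPSS.

```python
def fit_linear_model(rows, outcome):
    """Ordinary least squares via the normal equations (X'X)b = X'y.
    rows: covariate lists without an intercept; outcome: list of floats.
    Returns [intercept, coefficient_1, coefficient_2, ...]."""
    X = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, outcome))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Synthetic illustration (NOT study data): columns are SA (0 = WE, 1 = SA), age, BMI.
rows = [(0, 50, 25), (1, 60, 27), (0, 65, 30), (1, 45, 24), (0, 55, 28), (1, 70, 26)]
zag = [50.0 - 5.0 * sa + 0.1 * age - 0.2 * bmi for sa, age, bmi in rows]
coefs = fit_linear_model(rows, zag)  # intercept, ethnicity, age, BMI
```

In this layout the second coefficient is the covariate-adjusted difference in the outcome between the two ethnicity codes, which is the quantity the adjusted ethnicity models report.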

Significance of findings
Statistical analyses were performed using commercially available software (SPSS 25.0, IBM SPSS, Inc., USA). P-values of ≤ 0.05 were considered statistically significant for the proteomics analysis and targeted analyses. For the proteomics analyses, only proteins that were significantly different across the two comparisons of ethnicity and glycaemic control categories were selected. As this combination of significant results would occur only once every 400 proteins by chance, correction for multiple testing was not undertaken. Nevertheless, the results from the proteomics analysis should be viewed with caution and interpreted in relation to the verification and validation results.
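The "once every 400 proteins" figure follows from multiplying the two significance thresholds. The short sketch below reproduces that arithmetic under the assumption that the ethnicity and glycaemic control comparisons are independent and that each has a 5% false-positive rate when there is no true difference; the protein counts are those reported in the results, and the function name is hypothetical.

```python
ALPHA = 0.05     # significance threshold used for each comparison
N_ALL = 331      # proteins identified in the discovery analysis
N_RANK1 = 155    # proteins identified in all 72 samples (Rank 1)


def joint_false_positive_rate(alpha_a: float, alpha_b: float) -> float:
    """Chance that both comparisons are significant for a protein with no
    true difference, assuming the two comparisons are independent."""
    return alpha_a * alpha_b


joint = joint_false_positive_rate(ALPHA, ALPHA)
print(f"joint rate: {joint:.4f} (about 1 in {round(1 / joint)})")
print(f"expected chance findings among {N_ALL} proteins: {N_ALL * joint:.2f}")
print(f"expected chance findings among {N_RANK1} Rank 1 proteins: {N_RANK1 * joint:.2f}")
```

Under these assumptions, fewer than one protein would be expected to pass both comparisons by chance alone, which is the reasoning behind not applying a formal multiple-testing correction.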

Proteomic analysis of samples from a type 2 diabetes mellitus screening trial
To explore differences in the plasma proteome of WE and SA individuals with NGC or NDH, bottom-up liquid chromatography mass spectrometry was performed on baseline plasma samples from 72 male participants, separated into four equal groups, matched for age and BMI. The median (IQR) age and BMI of all individuals included in this study were 58.5 (46.0, 67.7) years and 27.6 (25.9, 29.9) kg·m−2, respectively. Participant characteristics for each group can be found in Table 1. A total of 331 proteins were identified in the analysis, 155 of which were identified in all samples (Rank 1). Forty-two of these proteins differed between glycaemic control statuses, five of which appeared in both ethnicities (apolipoprotein A-I, haptoglobin, Inter-alpha-trypsin inhibitor heavy chain H1, Serum paraoxonase/lactonase 3 and ZAG) (data available at Pearsey 2020c). Twenty-four proteins differed between ethnicities, three of which appeared in both glycaemic control statuses (Carboxypeptidase B2, Ig gamma-1 chain C region and ZAG) (data available at Pearsey 2020c). Functional analysis identified the proteins that significantly differed between glycaemic control statuses as being primarily involved in the regulation of immune responses, peptidase activity and regulating primary and cellular metabolic processes including regulating protein metabolism. Similarly, proteins that differed between ethnicities showed roles in regulating biological processes and regulating immune responses, including the humoral immune response. All proteins largely exist within the extracellular region or within the blood microparticle. A full breakdown of the proteins' GO terms can be found in the Supplementary Material (Pearsey 2020d).
Only ZAG was significantly different across both groups of glycaemic control (fold change = 0.89; p = 0.001) and ethnicity (fold change = 0.90; p = 0.003), with ZAG being lower in NDH and SA, respectively. LSD comparisons showed that only the difference between WE-NDH and SA-NGC was not statistically significant (p = 0.769) and differences remained consistent when adjusted for age, BMI, HDL-c and TG (p = 0.001) (see Supplementary Fig. 1a, b) (data available at Pearsey 2020e). GO-annotation analysis of ZAG through ClueGO shows it primarily functions in transporter and ribonuclease activity, which contribute to regulating peptidase (particularly hydrolase) activity, regulating biological and metabolic processes and coordinating an immune response to a stimulus. Additional reporting from CORUM (Giurgiu et al. 2019) also shows an interaction between ZAG and prolactin-inducible protein (PIP). GO-annotation analysis of the cellular localisation of ZAG revealed it has been located in the extracellular region, plasma membrane and nucleus.

Verification of samples using targeted analysis specific to zinc-alpha-2-glycoprotein
To assess whether the results of our proteomic analyses could be replicated using biochemical assays specific to ZAG, we obtained a fresh aliquot of fasting plasma belonging to the same participants selected for proteomic analyses (n = 68; fresh plasma aliquots were not available for four participants) and measured ZAG using a specific ELISA (Biosystems; Oxford, UK). ELISA analyses confirmed that circulating ZAG concentrations were higher in WEs compared to SA individuals (41.09 vs. 37.07 mg·L−1; p = 0.014; n = 68) (Fig. 1a). Differences were maintained after adjusting for age, BMI, FPG, HDL-c and TG (p = 0.020) (Fig. 1b). There were no differences in plasma ZAG concentrations between individuals with NGC and NDH (Fig. 1c, d). Similarly, in unadjusted models using the whole study population, there were no associations between ZAG and circulating plasma glucose in either FPG or 2hPG after an oral glucose tolerance test (Table 2). However, when models were adjusted for age, ethnicity, HDL-c and BMI there was a positive association between circulating ZAG and FPG (β = 3.40 [0.45, 6.35]; p = 0.024) (Table 2).

Validation of findings using targeted analysis specific to zinc-alpha-2-glycoprotein
To validate the differences in ZAG between WEs and SAs, we performed ELISA (Biosystems; Oxford, UK) on samples from the STAND-UP study of older adults (n = 60; WE = 30, SA = 30). Participant characteristics can be found in Table 3. ZAG concentrations were higher in WEs compared to SA individuals (51.25 vs. 46.70 mg·L−1; p = 0.089; n = 60) (Fig. 2a), with the difference becoming statistically significant when adjusting for age, sex, BMI, FPG, TG and HDL-c, with ZAG being 7.28 [1.23, 13.33] mg·L−1 lower in SAs compared to WEs (p = 0.018) (Fig. 2b). Neither FPG nor 2hPG was found to have an association with circulating ZAG in the validation cohort (Table 4).
Table 4 Associations between circulating ZAG concentrations and FPG or 2hPG (n = 60). Data are β-coefficients with 95% CI, presenting the difference in circulating ZAG (mg·L−1) per unit difference in glucose (mmol·L−1).

Discussion
In this study, whilst discovery proteomics identified a number of proteins that differ between ethnicities and glycaemic control statuses, showing influence over a number of metabolic and biological processes that could contribute to disease risk, only plasma ZAG concentration differed between both. ZAG was lower in SAs compared to WEs and the difference remained when our proteomics analysis was verified in the same cohort and validated in a different cohort using a targeted ELISA method. In contrast, the lower ZAG values observed in those with higher glucose levels through proteomic analysis were not confirmed with either verification or validation. ZAG is a 40-43 kDa adipokine that is encoded by the AZGP1 gene (Hassan et al. 2008b) and is found to be present in the majority of body fluids (Burgi and Schmid 1961; Rolli et al. 2007).
In particular, ZAG has a proposed role in lipid metabolism within adipose tissue, exerting both autocrine and paracrine effects (Pelletier et al. 2013), and may have possible applications in obesity and obesity-related disease therapy (Severo et al. 2020). ZAG's role in lipid metabolism is due to its recent categorisation as a potential lipid-mobilizing factor (Bing et al. 2004), binding primarily to the β-3-adrenoreceptor on the surface of adipocytes, resulting in the conversion of adenosine triphosphate to cyclic-adenosine monophosphate and ultimately leading to increased lipolysis (Russell and Tisdale 2012).
This may have direct relevance to SAs, who have been shown to have reduced lipid oxidation when exercising (Hall et al. 2010), and elevated visceral fat, compared to age- and BMI-matched WEs (Sniderman et al. 2007). Subsequently, differences in lipid metabolism in SAs have been suggested to mediate the increased cardio-metabolic risk in SA individuals and may highlight the need to perform more moderate physical activity than WE individuals to achieve similar cardio-metabolic health profiles (Celis-Morales et al. 2013). From the findings of this study, it could be hypothesised that ZAG may play a role in this response, although whether it actively mediates metabolic dysfunction through impaired lipid metabolism in SAs or is simply a marker of its development requires further investigation.
The equivocal findings for an association between ZAG and glycaemic control are consistent with the limited evidence available to date. Our own recent systematic review and meta-analysis of 14 studies found that ZAG is lower in individuals with dysglycaemia (NDH, T2DM, metabolic syndrome or poly-cystic ovary syndrome); however, there was a high level of heterogeneity and differences in ZAG disappeared when accounting for differences in BMI (Pearsey et al. 2020a). Outside of circulating ZAG, ZAG mRNA has been positively associated with glucose transporter (GLUT)-4 mRNA expression in adipose tissue (Balaz et al. 2014) and evidence suggests that ZAG may act on other tissues, in particular human skeletal muscle cells, to aid improved glucose utilization through increased GLUT-4 expression (Russell and Tisdale 2010). However, experimental mechanistic evidence has not consistently translated into evidence of an association within observational research, with most studies reporting a null association (Ceperuelo-Mallafre et al. 2015, 2009; Garrido-Sanchez et al. 2012).
Our analyses have several strengths. Firstly, there have been limited investigations employing proteomics to explore populations composed of more than one ethnic group and, to our knowledge, very few that have directly compared ethnicities. Secondly, this was the first study to identify ZAG as a protein that significantly differed between ethnicities, which we have both verified and validated to reinforce our initial findings. As ZAG has been described in a number of disease and health disparities, our findings could help elucidate why SAs are at greater risk of certain chronic diseases compared to other ethnicities, particularly WEs. This could include cardiovascular diseases, as ZAG is recognised to be lower in cases of coronary artery disease (Liu et al. 2019) and lowered in cases of atherosclerosis development (Huang et al. 2019), which is also recognised to be more prevalent in SA individuals (Ghaffar et al. 2004). Finally, the inclusion of a cohort consisting of both males and females in the subsequent validation analysis ensures our findings are generalisable.
It should be acknowledged though that there are also certain limitations. Proteomics includes the identification of hundreds of proteins, which together with the limited sample size, could result in a type 1 error and thus explain the discrepancy between the results from the untargeted and targeted analysis for associations between ZAG and glycaemic control. This discrepancy could also reflect methodological differences between mass spectrometry and ELISA for ZAG quantification. These include, but are not limited to, the inability of ELISA to distinguish protein structural variants and protein post-translational modifications (Nedelkov 2005), and the complex nature of plasma, in which interactions between biological matrix components can interfere with absolute protein quantification (Gorovits et al. 2014). The results for ethnicity can be considered more robust as they were confirmed through verification (within sample) and validation (out of sample) targeted testing. However, it is possible that the differences observed were due to residual confounding or confounding by unmeasured factors. It could also be possible that differences are due to reverse causation (e.g. poor metabolic health in SAs acts to lower ZAG rather than vice-versa).
In conclusion, our findings show that circulating ZAG is lower in SA individuals than WE individuals, which should be interpreted as highlighting a novel biomarker that could be involved in mediating or predicting an increased risk of metabolic disease in SA communities. However, as our analysis was simply designed to highlight possible proteins that may be important markers of ethnic health and considering an association between ZAG and glycaemic control was not confirmed, further research is required to assess whether ZAG is causally related to metabolic dysfunction in SAs.

Disclaimer
The views expressed are those of the authors and not necessarily those of the Diabetes Research Centre, NIHR Biomedical Research Centre, NHS or the Department of Health Sciences at University of Leicester.
Informed consent All eligible participants provided informed consent. For both studies, consent was given for samples to be used in future analysis that was not part of the trial initial outcomes.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.