Identification of novel genetic variants associated with short stature in a Baka Pygmies population

Human growth is a complex trait determined by genetic factors in combination with external stimuli, including environment, nutrition and hormonal status. In the past, several genome-wide association studies (GWAS) have collectively identified hundreds of genetic variants having a putative effect on determining adult height in different worldwide populations. Theoretically, a valuable approach to better understand the mechanisms of complex traits as adult height is to study a population exhibiting extreme stature phenotypes, such as African Baka Pygmies. After phenotypic characterization, we sequenced the whole exomes of a cohort of Baka Pygmies and their non-Pygmies Bantu neighbors to highlight genetic variants associated with the reduced stature. Whole exome data analysis revealed 29 single nucleotide polymorphisms (SNPs) significantly associated with the reduced height in the Baka group. Among these variants, we focused on SNP rs7629425, located in the 5′-UTR of the Hyaluronidase-2 (HYAL2) gene. The frequency of the alternative allele was significantly increased compared to African and non-African populations. In vitro luciferase assay showed significant differences in transcription modulation by rs7629425 C/T alleles. In conclusion, our results suggested that the HYAL2 gene variants may play a role in the etiology of short stature in Baka Pygmies population.


Introduction
Human growth and, in particular, adult height are undoubtedly multifactorial processes, involving genetic, hormonal, nutritional and other environmental factors (Waldman and Chia 2013). Regulation of human adult stature has been of particular interest to geneticists, evolutionary and cultural anthropologists, as well as to pediatricians focused on growth disorders (Lettre 2011). Adult height is a prime example of a highly polygenic complex trait with a relatively high hereditability (≥ 69%) (Pemberton et al. 2018;Sohail et al. 2019). Polygenic traits are known to evolve differently from monogenic ones, through slight but coordinated shifts in the frequencies of a large number of alleles, each with a small effect (Sohail et al. 2019). Because a major proportion of adult stature is dependent upon an intact growth hormone (GH) and insulin-like growth factor I (IGF-I) axes, much attention in previous studies has been devoted to abnormalities related to these growth factor patterns. GH, GH-binding protein (GHBP) and IGF-I are among the key molecules involved in human growth and their abnormal secretion is often found in human growth disorders (Wit et al. 2016;Andradeet al. 2017). However, because of the phenotypic Matteo Zoccolillo and Claudia Moia share first authorship and contributed equally to this work.

Electronic supplementary material
The online version of this article (https ://doi.org/10.1007/s0043 9-020-02191 -x) contains supplementary material, which is available to authorized users. complexity, variants in components of the GH/IGF-I axes can only explain a small part of the variability of normal human growth (Durand and Rappold 2013). Indeed, recent genome-wide association studies (GWAS) identified approximately 700 common variants with a putative effect on determining adult height (Wood et al. 2014;Andrade et al. 2017).
From a clinical point of view, the term idiopathic short stature (ISS) is adopted when no recognizable cause of growth impairment is found despite an adequate diagnostic workup (Cohen 2014). Essentially, ISS is characterized by a height more than two standard deviations (SD) below the mean for sex and age without other clinical features in a child born with a normal birth size, encompassing familial short stature and constitutional delay of growth (Wit et al. 2008). Patients with ISS show a normal GH secretion in response to provocative stimuli. However, mean serum levels of IGF-I and GHBP are below the population average, suggesting partial insensitivity to GH (Wit et al. 2008;Kang 2017). Remarkably, a similar insensitivity to IGF-1 was scored in African Pygmies populations (Bozzola et al. 2009). African Pygmies, together with some other populations spread around the world, now commonly known as pygmoid populations, are characterized by an adult short stature, with males exhibiting an average height of about 150 cm or less in conjunction with reasonably well-conserved body proportions (Migliano et al. 2007;Meazza et al. 2011;Verdu 2016).
African Pygmies live in equatorial rain forest and share an economy based on hunting and gathering, exhibiting characteristic culture and behavior features (Le Bouc 2017). Genetic studies indicate a quite clear distinction between Pygmies and non-Pygmies populations (Verdu et al. 2009). Pygmies populations are distributed across equatorial Africa in two main clusters: one in East Africa (e.g., Uganda) including the Batwa and Efe groups, and the other one in West Africa (e.g., Cameroon) including the Kola and the Baka populations. Substantial admixtures between Pygmies and non-Pygmies may have occurred for a long time ). However, the degree of admixture varies in a same region, e.g., Kola Pygmies from Southwest Cameroon show a relatively higher level of admixture compared to Baka Pygmies from Southeast Cameroon (Verdu et al. 2009). Despite this fact, genetic studies indicate a quite clear distinction between Pygmies and non-Pygmies (Verdu et al. 2009;Patin et al. 2014).
Over the years, several evolutionary hypotheses have been proposed to explain the short stature of Pygmies (Perry and Dominy 2009). These included the adaptation to food scarcity, difficulties of thermoregulation in the dense tropical forest with warm and humid conditions and trade-off between growth cessation and age at first reproduction caused by a high mortality rate (Bailey 1991;Perry et al. 2014). However, in recent years, the limitations of physiological data as well as scarce demographic, epidemiological, and paleoanthropological evidence have led to the use of genetics approaches (Becker et al. 2011), including Whole Exome Sequencing (WES), to identify the genetic determinants influencing Pygmies' short stature. Using high-density single nucleotide polymorphism (SNP) chip data, several studies of population genetics studies have found candidate chromosomal regions associated with short stature, including genes encoding for factors involved in the IGF-I axis, the iodine-dependent thyroid hormone and the bone homeostasis/skeletal remodeling pathways (Jarvis et al. 2012;Hsieh et al. 2016;Mendizabal et al. 2012). Lachance and collaborators (2012) searched for signals of positive selection in five high-coverage Western African Pygmies genomes and suggested that short stature may be due to selection of genes involved in development of the anterior pituitary gland, as well as in the crosstalk between the adiponectin and insulinsignaling pathways.
In the present study, we describe a WES analysis based on phenotypically characterized Baka Pygmies and Bantu non-Pygmies subjects, with the aim to identify potential genetic variations associated with the Pygmies' short stature. Based on these results, we suggest that a variant of the HYAL2 gene may have a role in the determination of short height in the Baka Pygmies population.

Sample collection and genotyping scheme
Blood samples were collected from 84 Pygmies (35 males and 49 females) and 20 Bantu (6 males and 14 females) subjects, in serum-separator tubes and in Tempus Blood RNA tubes (ThermoFisher Scientific, Waltham, MA, USA). All samples were maintained at refrigerated temperature for 3 days, then frozen for transport and stored at -20 °C until DNA extraction. Subjects were orally informed and those who gave their consent underwent clinical evaluation and blood withdrawal for genetic studies. The criteria for the enrolment of the Bantu control population were matched for age and sex, sympatry and clinical exclusion of phenotypically apparent diseases.
The genotype scheme considered initially 84 blood samples from Pygmies and 20 from Bantu individuals. Auxological data (weight, height, BMI) were available for 27 Pygmies and 20 Bantu, respectively. WES was then performed on a representative selection of eight Pygmies and five Bantu, at the lower or upper limits of the height distributions of these populations, respectively. The most significant genetic candidates WES results were then finally validated on a cohort of 76 Pygmies and 15 Bantu individuals.
The pairwise genetic difference was estimated for all populations by calculating Wright's F statistics (F st ) (Wright 1951). Data sets used for F st estimation were derived from the 1000 Genome database with the following adjustments: African without Americans of African Ancestry and European without Finnish in Finland, East Asian without Vietnam, and Northern and Western Ancestry.
Differences in genotype frequencies associated with different phenotypes were tested for each autosomal biallelic variant selected by F st , with sparse Partial Least Squares regression (sPLS) (Chun and Keleş 2010). For sPLS, the number of components included in the model was set to two. The number of variables kept for the first component, which determines the strength of the variant selection, was set to 400.

Sequence validation
The Fluidigm 48 × 48 Access Array IFC system (Fluidigm, San Francisco, CA, USA) was used to validate 46 variants identified by WES and associated with short stature in Baka Pygmies. In detail, specific primer pairs (sequences available upon request) were designed to amplify flanking regions of each variant. Then universal primers sequences (5′-ACA CTG ACG ACA TGG TTC TACA and 5′-TAC GGT AGC AGA GAC TTG GTCT) were ligated at the 5′ termini to all PCR products. The amplification PCRs were performed on 76 Pygmy and 15 Bantu subjects using 50 ng of genomic DNA, 1 × FastStart High-Fidelity Reaction Buffer with MgCl 2 , 5% DMSO (v/v), dNTPs (200 µM each), universal primers (1 µM each), 1 × Access Array loading reagent and FastStart High-Fidelity Enzyme Blend (Roche, Basel, Switzerland). Subsequently, thermal cycling on a Fluidigm FC1 Cycler was performed according to the manufactures' condition. Sanger sequencing analyses for rs7629425 variant were performed on 10 ng of genomic DNA using the following primers: (5′-AAA GGC ATT CAG GTC CAG TG; 5′-AAT AAG CAG GTG TTT GGG GA).

Generation of pCMV-Luc, pCMV-HYAL2-WT, and pCMV-HYAL2-MUT plasmids
pCMV-Luc plasmid was generated from firefly pGL3derived luciferase gene HindIII and BamHI cloning into pcDNA 3.1 vector (Invitrogen, Carlsbad, CA, USA). Starting from pCMV-Luc, two DNA fragments of 50 nucleotides of 5′-UTR of HYAL2 gene containing either rs7629425 reference (C) or alternative alleles (T) were amplified using a three-round PCR and then cloned into pCMV-Luc, adjacent to the firefly luciferase gene, thus originating plasmids pCMV-HYAL2-WT and pCMV-HYAL2-MUT, respectively (Suppl. Figure 1). In the first PCR, the firefly luciferase gene, including the restriction site for BamHI, was amplified from pCMV-Luc together with a small sequence containing the last 20 nucleotides of the 5′-UTR of HYAL2 variants using these primer pairs (i.e., 1 for reference or two for alternative allele). The second PCR using primers four and three amplified the first PCR amplicons adding the full portion of interest of the 5-'UTR as well as the HindIII restriction site; the last PCR using primers four and five added random nucleotides upstream of HindIII and BamHI to improve next restriction digestion's efficiency. All primers sequences are reported in Supplementary Table 3. PCRs were performed using the following thermal profile: 98 °C for 3 min; 98 °C for 20 s, 55 °C for 40 s, 72 °C for 60 s (five cycles); 98 °C for 20 s, 65 °C for 40 s, 72 °C for 60 s (20 cycles); final extension at 72 °C for 10 min.
Following purification with Agencourt AMPure XP beads (Beckman Coulter, Brea, CA, USA), analysis with either Agilent DNA High-Sensitivity Kit for BioAnalyzer (Agilent, Santa Clara, CA, USA) or 0.8% agarose gel electrophoresis and validation with Sanger sequencing on both DNA strands, PCR products and pCMV-Luc vector were digested with HindIII and BamHI enzymes (New England Biolabs, Ipswich, MA, USA). PCR fragments were gel-purified and cloned into digested pCMV-Luc vector thus giving pCMV-HYAL2-WT (containing rs7629425 C allele) or pCMV-HYAL2-MUT (containing T allele) plasmids. Following heat-shock transformation in E.coli One-Shot Top10 (Invitrogen) competent cells, colony PCR was performed to select positive clones, then plasmids were extracted using Promega Wizard Plus SV Minipreps DNA Purification System (Promega, Madison, WI, USA). Recombinant clones, containing pCMV-HYAL2-WT and pCMV-HYAL2-MUT plasmids, were verified by Sanger sequencing using primers Luc1 and Luc3 (Eurofins GATC Biotech, Germany).

Phenotypic features of the investigated populations
Our study included 104 adult semi-nomadic individuals living in the Reserves of Dja and Lobo in Southeast Cameroon who were enrolled during two fieldworks conducted between 2007 and 2009. Their camps were hard to reach by jeep on remote roads. The majority of the individuals was illiterate and spoke different dialects. To obtain informed consent from the subjects enrolled in the study, we relied on local interpreters-nurses. They were thus orally informed and those who gave their consent underwent clinical evaluation and blood withdrawal for both serological and genetic studies. The criteria for the enrolment of the Bantu control population were age and sex match, sympatry and clinical exclusion of phenotypically apparent diseases. On the basis of biological and cultural anthropological fieldwork experience in the investigated communities, the population was categorized as hunter-gatherers (Pygmy) or farmers (non-Pygmy). The enrolled subjects were divided in two populations constituted of 84 Baka Pygmies and 20 Bantu non-Pygmies.
We first investigated if height and weight traits exhibited significant differences between the two populations. Of all the subjects, phenotypic measurements were available for 27 Baka and 20 Bantu subjects, respectively. As summarized in Table 1, the two populations exhibited similar BMI values, while height and weight were significantly lower in the Pygmies. In particular, the mean stature of Baka Pygmies males was significantly lower, with a mean standard deviation score (SDS) of − 3.96 compared to − 1.24 in the sympatric Bantu samples (p < 0.05, one-way ANOVA). Interestingly, differences were less prominent in SDS for females, i.e., − 2.09 and − 1.02, for Baka Pygmies and Bantu, respectively (p < 0.05, one-way ANOVA).

Whole exome sequencing (WES) analysis and functional classification of the genetic variants
To identify genetic variants involved in the regulation of height in the Baka Pygmies population, a WES analysis was performed. Representative samples of eight Baka Pygmies and five Bantu individuals were selected close to the lower or upper limits of their height distribution, respectively (Fig. 1). Their genomic DNA was extracted from whole blood followed by exome library preparation, next-generation Remarkably, Baka Pygmies exhibited a significant increase in novel identified SNPs compared to Bantu subjects (i.e., 515.25 ± 63.7 vs. 330.4 ± 39.5, p = 0.014, one-way ANOVA), while numbers of novel insertions and deletions (In/Del) were roughly similar (Supplementary Table S1).
To determine the potential impact of the identified variants, the functional predictor SnpEff algorithm was employed (Table 3). As a result, 23-25% of the variants in both populations were defined as modifiers (including non-coding variants, 3′-and 5′-UTR, inter-and intra-genic variants) or variants affecting non-coding genes. Around 42% of the variants were classified as of low impact, while about 38% as of moderate impact (non-disruptive variants that might change protein effectiveness, missense variant, in-frame deletion); of note, around 1% of variants were predicted as high impact (i.e., producing truncation, or loss of protein function). In particular, Baka Pygmies exomes displayed 354 variants of high impact not shared by the control population: specifically, 30 were involved in start codon loss, 15 in stop codon loss, 148 in stop codon gain, 134 in frameshift mutations and 27 in splice events. In contrast, Bantu exomes showed 142 high-impact variants: 12 were involved in start codon loss, three in stop codon loss, 69 in stop codon gain, 49 frame-shift mutations and 9 splice events.

Minor allele frequency (MAF), Fixation Index (Fst) and sparse partial least square (sPSL) regression analysis
To evaluate if MAF analysis was informative to distinguish the investigated populations, the correlation of allele frequencies for all SNPs shared between Baka Pygmies, Bantu, African and European populations was evaluated. As expected, the correlation between the Baka Pygmies and European populations was lower than between Baka Pygmies and Africans (r 2 = 0.782 and r 2 = 0.915, respectively).  A similar trend was observed for Bantu, comparing with European (r 2 = 0.774), East Asian (r 2 = 0.793) and with African (r 2 = 0.892) ( Table 4). Then the likely impact and extent of genetic distance according to Wright's population differentiation statistic, F st (Pair-Wise Fixation Index) were evaluated. The determination of F st , starting from the entire set of 88,830 autosomal SNPs, estimated the difference in terms of genetic structure between Baka Pygmies and Bantu individuals, and assessed for each identified variant to what extent it was involved in the genetic discrimination of the two populations. Furthermore, to gain a wider insight into the extent of genetic distance, F st analysis was extended to African, East-Asian and European populations included in the 1000 Genomes database, the largest public catalogue of human variation and genotype data. As a result, mean F st values ranged from 0.010 to 0.380 and the weighted F st from 0.030 to 0.250, indicating that the Baka Pygmies group was closer to Bantu one compared to European and East-Asian populations. In Bantu, mean F st values and the weighted F st showed similar trends with the other populations (Table 5). Fixing F st > 0.30, 2232 SNPs significantly discriminated Baka Pygmies and Bantu samples. The most discriminating variants, corresponding to the 0.1 percentile (i.e., F st > 0.60), were found in 224 genes (Fig. 3); furthermore, the 0.01 percentile cutoff (F st > 0.70) highlighted 22 genes (Supplementary Table S2). Interestingly, none of these variants was previously described to be associated with Pygmies short stature studies.
To select the putative variants prevalently associated with Pygmies' short stature, sPLS was applied to the genetic datasets comprising 2232 variants, previously identified using the F st analysis and further reduced to 2108 bi-allelic variants. Among these, 397 variants separately clustered Baka Pygmies, compared to Bantu, African and worldwide populations (Fig. 4). Next, by hierarchical clustering, 29 SNPs associated with known genes were selected ( Table 6).

Identification of candidate variants for pygmy short stature
These 29 selected variants were then validated in a larger cohort composed of 76 Baka Pygmies and 15 Bantu individuals. To address this point, the Fluidigm 48 × 48 Access Array enrichment system was employed, highlighting that nine variants, among the 29 tested, were significantly associated with Baka Pygmies (data not shown). In particular, five variants mapped within 5′-or 3′-UTR, four in coding regions comprising two missense mutations (Table 7). Again, none of these variants has been previously associated with human height trait determination. A promising variant, significantly associated with short stature in Baka individuals (p = 0.032, Chi-Square, Table 6) was rs7629425, a C/T SNP located in the 5′-UTR of HYAL2 gene, nine nucleotides upstream of exon two coding sequence (Fig. 5). The C to T nucleotide variation was predicted to affect Sp3 or Rfx1 transcription binding, respectively (PROMO 3.0 https ://algge n.lsi.upc.es/, data not shown). In particular, Sp3 transcription factor was reported to affect several biological processes including ossification (Göllner et al. 2001). Then, a luciferase reporter assay was performed, comparing transcriptional activities of the two alleles through eukaryotic expression vectors (see "Materials and methods"). After HeLa cells transfections with allelespecific plasmids, normalized luciferase measurements at 24 h post-treatment (p.t.) indicated a significant increase in the gene reporter expression in correspondence of the alternative T allele compared to the reference one (n = 6, p = 0.024, one-way ANOVA).

Discussion
In the present contribution, we describe the results of a WES analysis of individuals from two Cameroonians populations, specifically hunter-gatherers Baka Pygmies and their neighbor Bantu non-Pygmies farmers in an attempt to identify genetic variants associated with short stature. Previous studies have collected clinical and biological data of Pygmies, related to their lifestyle, culture and environment and various groups identified candidate genes contributing to their short stature phenotype (Lachance et al. 2012). These are in particular HESX1 (which encodes a homeobox containing transcriptional repressor that plays a critical role in development of the anterior pituitary, the site of growth hormone synthesis and secretion), APPL1 (which is involved in crosstalk between adiponectin and insulin signaling pathways), ASB14 (which encodes a SOCS box protein), and the sperm-motility gene DNAH12. However, the causative mechanisms for Pygmies' short stature still remain a debated topic. Moreover, from a clinical point of view, understanding the molecular mechanisms involved in Pygmies' short stature is important not only from an evolutionary point of view, but also because it could provide additional information regarding a potential novel cause of children with idiopathic short stature (Kang 2017).
In a WES genetic approach identifying genes associated with short stature in Baku Pygmies, a primary aspect to consider is the inclusion of a proper control population for data comparison. To this purpose, while Baku is a relatively homogeneous population relying on an economy based on hunting and gathering, it also has a complex socioeconomic relationship with the farming neighbor population, known as Bantu (Rozzi et al. 2015). From an evolutionary point of view, Baka Pygmies and Bantu populations shared a common ancestor, but started diverging as separate populations around 60,000 years ago. However, substantial admixtures between these populations have occurred in the last 5-1000 years .
Starting from the genetic material obtained from phenotypically selected Baka Pygmies and Bantu individuals, we performed Pair-Wise Fixation Index and sparse partial least square regression method analysis for positive selection of variants associated with short stature. Following gene prioritization, validation of initial reads in a larger cohort of individuals and functional annotation of candidate genes harboring genetic variants enriched in the Baka Pygmies population, we hypothesized that the identified rs7629425 variation, located in the 5′-UTR region of Hyaluronoglucosaminidase 2 (HYAL2) gene, might have a role in the determination of short height in Pygmies. HYAL2 encodes for a GPI-anchored cell surface protein that degrades hyaluronan, one of the major glycosaminoglycans of the extracellular matrix. Hyaluronan and its degradation fragments are thought to be involved in cell proliferation, migration and differentiation (Lepperdinger et al. 2001). Furthermore, the gene encodes two alternatively spliced transcript variants, which differ only in the 5′-UTR, supporting the idea that this region may have an important regulatory function on gene transcription. HYAL2 depletion in conditional knockout mice showed that this gene was essential for the catabolism of hyaluronan. In the absence of HYAL2, extracellular hyaluronan accumulated and, in some cases, may lead to cardiopulmonary dysfunctions (Chowdhury et al. 2013). In a different study, it has been reported that mice lacking HYAL2 displayed variably penetrant developmental defects, including skeletal and cardiac anomalies (Triggs-Raine and Natowicz 2015). Importantly, HYAL2 is localized on human chromosome region 3p21.3 described to be associated with short stature in Pygmies, containing loci associated with growth hormone, insulin and insulin-like growth factor signaling pathways, as well as immunity and neuroendocrine signaling involved in reproduction and metabolism (Jarvis et al. 2012). This region also includes the positional candidate gene DOCK3, which is known to be associated with height variation in Europeans, and CISH, a negative regulator of cytokine signaling known to inhibit growth  hormone-stimulated STAT5 signaling (Jarvis et al. 2012). The 5′-UTR localization of rs7629425, nine nucleotides upstream of the first Methionine residue, was hypothesized to differently affect the binding of transcription factors. To functionally evaluate the effects of C-T variation in regulating transcription, a reporter Luciferase assay was performed indicating a significant and most prominent transcriptional activity for the alternative T allele. In summary, our results indicate the importance of a WES approach to generate data for identifying functionally important genetic variants associated with complex traits like stature in humans. In particular, we found enrichment of a rare variant in the non-coding 5′-UTR region of HYAL2 gene in the Baka Pygmies population, in a chromosomal region previously described to be linked to this peculiar phenotype. Future studies will be devoted to further investigate the molecular effects of the identified nucleotide variant on the cascade of genes involved in the regulation of body stature.