Evaluation of loci to predict ear morphology using two SNaPshot assays

Human ear morphology prediction with SNP-based genotypes is growing in forensic DNA phenotyping and is scarcely explored in Pakistan as a part of EVCs (externally visible characteristics). The ear morphology prediction assays with 21 SNPs were assessed for their potential utility in forensic identification of population. The SNaPshot™ multiplex chemistries, capillary electrophoresis methods and GeneMapper™ software were used for obtaining genotypic data. A total of 33 ear phenotypes were categorized with digital photographs of 300 volunteers. SHEsis software was applied to make LD plot. Ordinal and multinomial logistic regression was implemented for association testing. Multinomial logistic regression was executed to construct the prediction model in 90% training and 10% testing subjects. Several influential SNPs for ear phenotypic variation were found in association testing. The model based on genetic markers predicted ear phenotypes with moderate to good predictive accuracies demonstrated with the area under curve (AUC), sensitivity and specificity of predicted phenotypes. As an additional EVC, the estimated ear phenotypic profiles have the possibility of determining the human ear morphology differences in unknown biological samples found in crimes that do not result in a criminal database hit. Furthermore, this can help in facial reconstruction and act as an investigational lead. Supplementary Information The online version contains supplementary material available at 10.1007/s12024-022-00545-7.


Introduction
The externally visible traits of humans are complex, resulting from polygenic inheritance [1][2][3].Human ear morphology is signified as a highly polymorphic and polygenic trait that exhibits continuous phenotypic distribution and serves as an important target in forensic DNA phenotyping studies [4].The variability exists among phenotypes of lobe sizes and states, degree of ear protrusion and the difference in helix shape, tragus and antitragus morphology in each individual [5].In forensics, external ear morphology has been used since Bertillon (1893) for personal identification from photographic images, videos, or ear prints in forensics [6].An otoscopic forensic opinion has the status of scientific evidence which is admitted by Polish Courts [7].Earlobe attachment can be highly useful in disaster victim verification [8,9].The medico-legal importance of the ear is due to its stable structure and rigidity in burnt bodies which further enables facial reconstruction [10].Moreover, it is useful in the identification of drowning cases of mutilated faces [9,11,12].
Understanding the genetic aetiology is important for ear morphogenesis [13], forensic genetics [14] and diagnostics [15].The first comprehensive study investigated the pinna trait in the Latin American population and identified seven loci for variations in human ear morphology using genomewide association studies (GWAS) [16].Another GWAS for variant association with lobe attachment in multi-ethnic groups (Europeans, Americans cohorts) identified 49 significant loci associations [4].The genetic variations like SNPs insertion-deletion variants, block substitution and inversion variants may cause amino acid substitutions which alter the functional property of the protein [3].This results in morphological changes and distinct phenotypes [17].
Previously developed phenotyping assays used a variety of reported techniques to obtain genotypic data including TaqMan assays [18,19], next-generation sequencing (NGS), Ion Ampliseq technology [20] and whole-genome sequencing (WGS) [21].However, whole-genome sequencing is an expensive technique and not suitable for the specific traits of interest involving limited genes.Multiplex analyses coupled with the mini sequencing technique offer a targeted approach for retrieval of specific phenotypes of interest [22][23][24][25].The phenotypic variation in population caused by genetic variation must be added to modelling parameters [17].Regression analyses are performed to model the structure (identify the pattern) seen within the dataset following odd ratios [26].
Several phenotyping methods with the multiplex genetic panels and prediction models have been proposed, for example, IrisPlex [27], HIrisPlex [28] and HIrisPlex-S [29].Furthermore, progress has been made in inferring height [30], baldness [31] [32], freckles [20], hair thickness [33], age [34] and facial morphology from biological samples [35,36].Forensic DNA phenotyping (FDP) is the prediction of these externally visible characteristics (EVCs) from DNA traces [37][38][39].The importance of forensic DNA analysis for criminal investigation is quite evident in the Zainab Murder case [40,41].It was confirmed through DNA testing when the 814th sample of suspects showed similarity with the reference sample in the database.In the absence of reference DNA, a DNA phenotyping study can be useful in narrowing down the pool of suspects and can potentially provide more details about the appearance of individuals than eyewitnesses can.It is used as an intelligence tool rather than to confirm individual identity [42].
In Pakistan, only one study is available focusing on the genetic determination of lobe attachment ear phenotype for Southern Punjab subjects [43].Another study was found regarding the DNA-based prediction of eye colour in the Swat population [44].No published data is available for other phenotypes of the ear.Much attention is paid to the diagnostic and genetics of hearing loss in Pakistan [45].Whereas multiplex panels for EVCs prediction are often tested majorly in Europeans [46,47], Eurasians Americans [48] and Koreans [49].The utility of forensic DNA phenotyping is in its infancy in Pakistan.The frequency of ear morphological characteristics is well documented [14,[50][51][52].
A Nikon D5600 camera was used to photograph each ear along with the individual's head in the Frankfort horizontal plane described by Meijerman et al. [53].Phenotypes were assessed by high-quality photographs and closely observing the individual ear.The approval of this study was obtained from the ethical review committee of the University of Health Sciences, Lahore, Pakistan.Healthy males and females of age 18-40 years without ear abnormalities were considered in the study.DNA was extracted with an in-house standard protocol of phenol-chloroform isoamyl alcohol [54], and quantitative analysis was performed using Qubit 3 Fluorimeter with a double-stranded DNA broad range assay kit (Thermo Fisher Scientific) according to the manufacturer's directions [55].

SNP genotyping assay
Primer 3 plus was used to design 21 primer pairs and their respective single-base extension primers using the default parameters of the software program, targeting similar melting temperatures of 60 °C and similar GC contents.Primer sequences are detailed in Table 1 along with final PCR and SBE primer concentrations for both multiplexes [56].The melting temperature and amplicon size were analysed in silico on the UCSC genome browser [57].The potential performance of multiplex PCR primers was screened on Autodimer [58] to detect any hairpin and primer dimer formation.Both forward and reverse single-base extension primers were designed, and either one of them was added to the final multiplex system.Poly T-tails have been added to the 5′ end of the SBE primers to ensure complete capillary electrophoresis separation between the SBE products of multiplexes.Optimization of all primers was performed using gradient PCR.Multiplex PCR was performed in a 10-µl final reaction volume containing 1 × Qiagen PCR Multiplex Mix (Hilden, Germany), primer concentrations as specified in Table 1 and 5 ng of DNA.Thermal cycling was performed on a Veriti 96 well thermocycler (Applied Biosystems).The multiplex PCR conditions were as follows: 95 °C for 15 min, 30 cycles of 95 °C for 30 s, 60 °C for 90 s, 72 °C for 60 s and the final extension at 60 °C for 30 min.For removal of unincorporated primers and dNTPs, 3 µl of amplified product was purified with 1 µl Exosap (ExoproStart™) at 37 °C for 1 h and 75 °C for 15 min.Before performing a multiplex extension reaction, all SBE primers were verified for their proper working efficiency by executing the singleplex extension reaction with the corresponding template.The multiplex single-base assay reaction was prepared with final concentrations of 1 × SNaPshot™ ready mix (Thermo Fisher), SBE primer concentrations as stated in Table 1 and 1 µl of purified PCR product, in a Veriti 96-well thermocycler (Applied Biosystems) following thermocycling conditions: 96 °C for 2 min, 25 cycles of 96 °C for 10 s, 50 °C for 5 s and 60 °C for 30 s.The extension product was purified by the addition of 1 µl of SAP enzyme (Applied Biosystems), followed by incubation for 70 min at 37 °C and 20 min at 72 °C.

Capillary electrophoresis and allele calling
The purified extension product (1 µl) was mixed with 10 µl Hi-Di formamide and 0.4 µl Genescan-120 Liz size standard and run on a 3130xl Genetic analyser (Applied Biosystem) after rapid heating of the reaction mix at 100 °C for 2 min and cooling for 2 min.The analyser has POP-7 as the sieving polymer, on a 36-cm capillary length under an injection voltage of 2.5 kV for 10 s and with a running time of 500 s at 60 °C using the default run module and E5 dye set.Allele calling and analysis of results were performed with GeneMapper™ ID software version 3.1.

Statistical analysis
The output files generated through SNaPshot™ were analysed to assess levels of association between phenotype and genetic variation across all individuals typed.

Population analysis
All variant data were tested for Hardy-Weinberg expectations using the HWE calculator of Micheal H. Court's (2005-2008) online calculator Excel-based HWE test and SNPstats (https:// www.snpst ats.net/ start.htm).Linkage disequilibrium testing was performed with the online software SHEsis [59].

Association testing
To predict the probability of an ordered outcome of lobe sizes, tragus size, antitragus sizes, posterior and superior helix rolling, antihelix folding, antihelix superior crus, Darwin tubercle and crus helix expression, ordinal logistic regression was applied.The multinomial logit was performed on phenotypes of lobe states and ear protrusions being not in the order form.The multiple SNP association testing was performed using R programming through the multinomial regression.For each phenotype, one multinomial logistic regression or ordinal regression was applied, whatever is fitted.The significance regression coefficient of the respective genotype, i.e., SNP, was described by a Wald statistics-based p-value, with the threshold of 0.05.For better interpretation, the fitted model transforms the regression coefficients into the odds ratio.An odds ratio (OR) measures the effect of SNPs over the respective phenotype.An odds ratio equal to one indicates the change in SNP level in genotype has no effect on the phenotype, and odds ratio greater or below than one indicates the change in SNP level in genotype has an effect on the phenotype.95% confidence    The performance of the fitted ordinal and multinomial regression model using the area under the receiver operating characteristic (ROC) curves was evaluated for final prediction accuracy on the dataset.The AUC basically can be considered as the probability that the test correctly identifies the phenotype.It is the integral of ROC curves ranging from 0 to 1. Additionally, the sensitivity of the model, specificity, negative predictive value (NPV), positive predictive value (PPV) and maximal probability approach was assessed.The threshold of probability for ear phenotypes prediction was tested ranging from p-value > 0.05 to > 0.09.

Population data
The percentage distribution of phenotypes is shown in Supplementary Fig. S1.The association testing was performed on all SNPs in order to draw the all-possible information in pilot-scale preliminary work.One rare SNP marker (rs3827760) was excluded at the level of statistical analyses (monomorphic in our dataset).Deviation from Hardy-Weinberg equilibrium was noted for few SNPs shown in Table S2 as the p-values < 0.05 were not consistent with HWE.Details of LD analysis are shown in Fig. 1.All the SNPs were not in Linkage disequilibrium.The ear traits and phenotypes are shown in Fig. 2.

SNaPshot™ multiplex SNP genotyping assay and screening of genotypes
The two genotyping assays were based on the principle of multiplex PCR followed by multiplex single-base extension assays using SNaPshot™ chemistry: Plex-1 assay encompassed of 10 SNPs whereas Plex-2 included 11 SNPs.Amplicons were designed to be 300 bp or smaller in length.According to the quality of amplicons, primer concentrations and annealing temperatures were optimized.Both the PCR and SBE multiplexes were optimized to achieve the balanced Plex-1 and Plex-2 SNP genotype profile (Figs. 3 and 4).The activity of SBE primer was verified by executing a single Plex extension reaction with a corresponding template PCR product.DNA input in assays was 5 ng.All expected peaks were detected, sized properly with accurate genotyped with uniform strength as shown in Figs. 3 and 4. Each peak was fragmented into genotypes and was interpreted following the peak(s) present at that site, with a single peak indicating homozygous genotype for that allele and double peaks indicating a heterozygote genotype for that SNP.Peaks with a relative fluorescence unit (RFU) value below 50 were excluded.

SNP associations testing in Punjab population
As demonstrated in   g g g g g g g g g g g g g g g g g Fig. 2 Ear phenotypes.The ear trait was classified as 1 lobe size (small, medium, large); 2 lobe attachment (attached lobe, intermediate attachment, free; 3 antitragus ( absent, average, prominent); 4 tragus size (absent, average, prominent); 5 posterior helix rolling (under folded, partial folded and over folded); 6 superior helix rolling (under folded, partial folded and over folded); 7 antihelix folding (under folded, partial, over folded); 8 antihelix superior crus (flat, average and extended); 9 Darwin tubercle (absent, degree of presence and prominent); 10 crus helix expression (small, prominent and extended); 11 ear protrusion (small, medium and large)  The highest statistical significance was obtained for two SNP (rs13397666, rs7567615) which explains the variation in superior helix rolling.The subjects' genotype change from GG to AG in rs13397666 is OR = 2.24 times (p-value = 0.041) and the genotype change from AA to GG in rs7567614 2.4 times (p value = 0.02) more likely to get over folded superior helix rolling.
Four genetic predictors (rs7428, rs684523, rs1619249, rs263156) were significantly associated with posterior helix rolling.The individual genotype alters from CC to CT in rs7428 which is 0.505 times (p-value = 0.032), from CC to TT in rs7428 which is 0.432 times (p-value = 0.021) and from AA to AC in rs26315 which is 0.505 times (p-value = 0.049) less likely to get over folded posterior helix rolling.The genotype changed from TT to CT in rs684523 making it 1.922 times (p-value = 0.038) and from CC to TT in rs1619249 making it 5.609 times (p-value = 0.028) more likely to get prominent posterior helix rolling.
Two SNPs (rs2080401, rs260674) were significantly associated with antihelix folding.The subject genotype change from CC to AC in rs2080401 with 2.496 times (p-value = 0.016) more likely to have over folded antihelix folding.The genotype change from GG to AA in SNP rs260674 is 0.190 times (p-value = 0.041) less likely to get prominent antihelix folding.
The highest statistical significance was obtained for seven SNPs (rs17023457, rs7567615, rs1960918, rs9866054, rs10192049, rs13427222, rs1878495) which explains variation in antihelix superior crus.The individual genotype change from AA to GG in rs7567615 is 2.182 times (p-value = 0.031) and from CC to TT in rs1960918 is 3.420 times (p-value = 0.005) more likely to have extended antihelix superior crus.The individual genotype change from CC to CT in rs17023457 is 0.232 times (p-value = 0.027), from AA to GG in rs9866054 0.393 times (p-value = 0.048), from GG to AG in SNP rs1342722 0.260 times (p-value = 0.238), from GG to AA in rs10192049 0.391 times (p-value = 0.028), from GG to AA in rs13427222 0.267 times (p-value = 0.037) and from AA to CC in rs1878495 0.444 times (p-value = 0.048) less likely to get prominent antihelix superior crus.
Two SNPs (rs13397666, rs260674) were significantly associated with Darwin tubercle.The individuals' genotype change from GG to AG which is rs1339766 is 3.471 Fig. 3 SNP genotyping electropherogram analysis in Plex-1.10 SNPs generated a customized report with genotype results (including size, height, peak area).The vertical-coloured boxes are bins created automatically by the software using an extension product created with SNaPshot Kit.Each bin defines the minimum and maximum allowable size for each allele and identifies each peak and assigns the corresponding allele.Polymorphisms were identified based on peak size and colour.The C/T heterozygote allele of rs10212419 (SNP1), the homozygote allele T/T for rs17023457 (SNP2), the homozygote A/A allele of rs13397666 (SNP3), the homozygote G/G allele of rs7567615 (SNP 4), the homozygote T/T allele of rs3827760 (SNP5), the heterozygote C/T allele of rs7428 (SNP6), the T/T homozygote allele (SNP7), the A/A homozygote allele of rs868157 (SNP 8), the C/T heterozygote of rs7873690 (SNP9), and the C/T heterozygote allele of rs1960918 (SNP10) are shown in the electropherogram of the above sample.The C/C homozygote allele (SNP1), the homozygote allele T/T (SNP2), the homozygote A/A (SNP3), the homozygote G/G allele of (SNP 4), the homozygote T/T allele of (SNP5), the heterozygote C/T allele of rs7428 (SNP6), the G/G Homozygote allele (SNP7), the A/A homozygote allele of (SNP 8), the C/T heterozygote of (SNP9), the C/T heterozygote allele of (SNP10) are shown in the electropherogram of the sample below times (p-value = 0.049) more likely to have prominent Darwin tubercle.The genotype change from GG to AA in rs260674 is 0.125 times (p-value = 0.029) less likely to get prominent Darwin tubercle.Three genetic predictors (rs7428, rs2080401, rs7873690) were significantly associated with crus helix expression.The individuals genotype change from CC to CT in rs7428 is 0.528 times (p-value = 0.036), from CC to AA in rs2080401 is 0.510 times (p-value = 0.048) and from TT to CC in rs7873690 is 0.476 times (p-value = 0.048) less likely to get extended crus helix expression.Two SNPs (rs263156, rs1878495) were significantly associated with ear protrusion.The genotype changed from AA to CC in rs1878495, with 0.474 times (p-value = 0.041) and from AA to CC in rs263156 with 0.474 times (p-value = 0.041) less likely to get large protrusion.The complete details of Table 2 are discussed in the supplementary material of Table S2.

Discussion
Our phenotypic characteristics were comparable with previous studies [16,52].The Ear-Plex was designed on a similar pattern to IrisPlex and HIrisPlex-S [60] [29].It was verified that all SBE primers worked correctly by executing a single Plex extension reaction with the corresponding template PCR product.This step was important when considering low-level peak height that may be susceptible to dropout when multiplexed as demonstrated in previous studies [61].DNA input around 5 ng in assays was reported previously in studies [31].
We did not prefer to use very low concentrations of DNA to avoid heterozygote imbalance and allelic dropout issues [20].Some of the obtained allelic peak height imbalances were as expected which is influenced by differences in intensity levels of the four fluorescence dyes used to label the four bases in the primer extension reaction of the SNaPshot™ chemistry.This is unavoidable unless moving away from fluorescence-based SNP-typing technologies [22].The high peak height was resolved by reducing the concentration of the respective primer.Minor shifts in the electrophoretic mobility were observed due to the incorporated base at the end of each probe and to the POP-7™ polymer.However, these shifts did not interfere with the analysis because poly-T tails increased probe spacing as consistent with other reported studies [22].
A few samples evidenced one PCR product peak with more than one colour due to pull-ups.The peak in blue produced a secondary peak in black or green; this problem is probably due to bleed-through.We use same SNPs for multiple phenotype variants because a single SNP can affect multiple phenotypes.The proposed method elucidates the underlying associations.Their genetic underpinnings were highlighted as those SNPs were related to the same trait of interest and that is ear morphologies.It gives insights to SNP-phenotype associations and helps to find pleiotropic loci as well.The individual's genotype change from GG to AA in rs13427222 was 0.246 times (p-value = 0.022) less likely to have free lobes.The rs13427222 association with the attached ear lobe was previously reported in another study [16].The individual genotype change from AA to GG in rs7567615 was 2.454 times (p-value = 0.020) more likely to cause superior helix rolling as was previously reported [16].The possible reason might be that these genetic variations are common in the Asians, Americans and Europeans.As Europeans are considered genetically closer to Pakistanis, we hypothesized that some of the previously reported loci of ear morphology might also be associated with ear morphology in the Punjabi Pakistani population [62,63].Our other eighteen variants are linked to different ear phenotypes compared to the phenotypes reported by Adhikari et al.Compared to our methodology, the study reported by Adhikari et al. used a different methodology to link the ear phenotypes to the genetic variants [16].This suggests that further validation through functional studies would be required to confirm the link of the genetic variants to the predicted phenotypes in the Punjabi population of Pakistan.The statistical nonsignificance of SNPs for the trait of interest suggests that those SNPs might not play a role in Pakistani population ear morphology and are non-informative.The Punjab population of Pakistan is highly conserved due to consanguinity [64] compared to the European or American admixed populations [65].The genetic variations may not be common in Asians to account for sample size.
We oversampled young individuals in our study.In the future, however, anthropometric measurements could be taken into account for better accuracies.Analysis of full genes sequences may be important to achieve good accuracy Based on our data, we proposed p > 0.7 as the optimal threshold which allows for increased prediction accuracy.There are fluctuations in prediction accuracies from excellent prediction, highly predictive and reasonably good predictive phenotypes.This suggests that epigenetic factors, insertion-deletion and repeated variations, pleiotropic and epistasis might be contributing to phenotypic traits.Notably, the higher values of AUC indicate the statistical model being used has higher accuracy because data was split into the training and testing sets to avoid any overfitting.Another possible reason is that as multinomial logistic regression is a useful categorical classifier and has been employed for the prediction of eye, hair and skin colour, there is a real risk of over-fitting data with small sample sizes.It is important to have enough data to avoid overfitting.Future work will be directed on a large sample size to avoid this aspect.The result obtained is a novel step towards providing Pakistan norms including data which provide the medico-legal scientist with robust classification statistics that can be easily applied when they are confronted with ear or ear prints.

Conclusion
Ear morphologies can be predicted from biological samples using multiplex PCR assays combined with SNaPshot™ chemistry and predictive modeling, as developed in this study.A set of 21 SNPs were analysed for association with ear morphologies and revealed significant results.The study confirms independent SNP association for rs13427222 with lobe attachment prediction and rs7567615 with helix rolling in our Punjab population as previously reported in other studies of ear morphologies.However, in our study, the SNaPshot assays are shown to be good predictive for ear phenotypes in the representative Punjabi population of Pakistan.Importantly, the DNA prediction model showed higher accuracy for superior helix rolling, Darwin tubercle and lobe size prediction.Combining these SNPs into one assay for inferring hair, skin, eye colour and ear phenotypes of the Pakistani population simultaneously would be an ideal strategy for developing a phenotypic profile of multiple traits from an unknown source sample.

Fig. 1 Fig. 1
Fig. 1 Linkage disequilibrium plots (LD) plot in all subjects.This figure shows the LD value in D (a) and r 2 (b) between each SNP for controls.LD values in D (c) for cases and r. 2 are shown in (d) between each SNP.Each diamond contains an LD value between the

Fig. 4
Fig.4SNP genotyping electropherogram analysis in Plex-2.Samples were run with an ABI 3130xl Genetic Analyzer, using POP-7 on a 36-cm capillary length array.The electropherograms correspond to extended genotype showing 11 fragments.Peak size and colour vary according to polymorphism.The size (bp) of the primer with combined nucleotide is shown on the x-axis.RFU (relative fluorescence unit) of the peak is presented on the y-axis.The A/A homozygote allele of rs3818285 (SNP1), the heterozygote allele C/T for rs6845263 (SNP2), the heterozygote A/G allele of rs2378113 (SNP3), the heterozygote A/C allele of rs10923574 (SNP 4), the homozygote T/T allele of rs1619249 (SNP5), homozygote G/G allele of rs9866054 (SNP6), the A/C heterozygote allele of rs263156(SNP7), the A/A homozygote allele of rs260674s (SNP

2 AUCFig. 5
Fig. 5 The area under the ROC curve for each fitted model represents the prediction of lobe size, and Darwin tubercle showed excellent predictions whose AUCs are 0.956 and 0.9245.The highly predictive accuracy obtained for superior helix rolling (0.915), posterior

Table 1
SNP markers included in the Ear-Plex system for ear morphology prediction ordered according to prediction rank with molecular details and genotyping

Table 1 (
And the trained model was also tested on test data.Ninety per cent of samples was called as 'known group' or training sets in which phenotypes were known and 10% samples were used in the testing set also known as 'blind sample' in which phenotype was not known.

Table 2
.376 times more likely to get free ear lobes.The individuals' genotype change from CC to CT in rs1960918 is 0.493 times (p-value = 0.042), from GG to AG

Table 3
Closely related populations may show differences in allele frequencies affecting the significance of certain predictors and consequently affecting prediction results.Therefore, further studies on the prediction models should involve sample sets from various ethnicities in the Punjabis of Pakistanis, which may improve prediction accuracies.An additional argument is including age-and genderdependent morphological changes in prediction modelling of appearance traits.It is still unclear how sex can affect ear phenotypes.