The heritability of vocal tract structures estimated from structural MRI in a large cohort of Dutch twins

While language is expressed in multiple modalities, including sign, writing, or whistles, speech is arguably the most common. The human vocal tract is capable of producing the bewildering diversity of the 7000 or so currently spoken languages, but relatively little is known about its genetic bases, especially in what concerns normal variation. Here, we capitalize on five cohorts totaling 632 Dutch twins with structural magnetic resonance imaging (MRI) data. Two raters placed clearly defined (semi)landmarks on each MRI scan, from which we derived 146 measures capturing the dimensions and shape of various vocal tract structures, but also aspects of the head and face. We used Genetic Covariance Structure Modeling to estimate the additive genetic, common environmental or non-additive genetic, and unique environmental components, while controlling for various confounds and for any systematic differences between the two raters. We found high heritability, h2, for aspects of the skull and face, the mandible, the anteroposterior (horizontal) dimension of the vocal tract, and the position of the hyoid bone. These findings extend the existing literature, and open new perspectives for understanding the complex interplay between genetics, environment, and culture that shape our vocal tracts, and which may help explain cross-linguistic differences in phonetics and phonology. Supplementary Information The online version contains supplementary material available at 10.1007/s00439-022-02469-2.

. The list of landmarks and semilandmarks • Table S2. The list of all phenotypic measures (PMs) • Table S3. Pairs of very highly correlated PMs •  Table S4. The number of PMs supporting the 2r vs the 1r models •  Table S5. The full results •  Table S6. Relationship between the ΔAIC with cut-off 2 and the likelihood ratio test (LRT) at α-level 0.05 •  Table S7. Correlations between the estimates of interest •  Table S8. Classes of strength of evidence for narrow-sense heritability h 2 •  Table S9. Classes of strength of evidence for c 2 •  Table S10.Classes of strength of evidence for d 2 • Table S11. The distribution of the PMs of h 2 class at least IV

Supplementary references
Other supplementary materials for this manuscript include the separate full analysis report (a compressed HTML document). The computer code for performing the analyses (R, OpenMX, lavaan and Rmarkdown) and the full results (HTML), are available in the GitHub repository https://github.com/ddediu/vt-heritability under a GPLv3 license. The MATLAB code of VTANALYZER is available in the GitHub repository https://github.com/ScottMoisik/VTANALYZER under an MIT license. The primary data is available upon request from the Netherlands Twin Register (https://tweelingenregister.vu.nl/information_for_researchers/working-with-ntrdata).

Supplementary texts.
Text S1: Sources of missing data.
There are multiple reasons for which various landmarks and semilandmarks might be missing from particular MRI scans, an important one being signal disruption from braces, alongside the actual coverage of the scanning region. The latter affects many measurements (especially those involving the cervical spine). Looking at missing landmark counts, we see the following patterns (given as % of total landmarking sessions, where a session is either of the two raters' landmarking of one scan): • Cervical spine: there is an increasing cascade from C2 to C7, as follows: C2 (absent from 1.4% of the sessions), C3 (10.5%), C4 (29.2%), C5 (69.5%), C6 (93.2%) and C7 (99.6%); • Dentition: missing landmarks cascade from the back to the front, with paired landmarks (left/right) showing similar absence rates, suggesting the effects of braces and of the anterior extension of the scanning region: second molars (absent from 19.3% of the sessions), second premolars (25.1%), canines absent (25.6%), and incisors (24.8%); • Non-dentition anterior landmarks: these might have been impacted by braces, like the anterior dentition landmarks, or the scanning region being set too posterior to capture some protruding structures of the face or even too superior to capture landmarks such as the menton; it is telling that the nasion (the frontal and nasal bone meeting point), however, is absent from only 0.2% of the sessions, reflecting the constraints of MRI brain scanning: menton (absent from 18.7% of sessions), anterior nasal spine (11.3%), and pogonion (14.5%); • Larynx and hyoid: like for the landmarks of the cervical vertebrae, lower landmarks have more severe absence rates:: apex of the epiglottis (absent from 5.4% of the sessions), hyoid (17.7%), corniculate tubercle (37.7%), and petiole of the epiglottis (48.0%).
Please note that, by definition, the landmarks necessary for VTANALZER to work (head top, head back, nose tip, head side (right), and head side (left)) are present in all sessions. Furthermore, it can be seen that, in general, landmarks closer to the brain have very low absence rates (sella, basion, odontoid, condyles, nasal spine posterior, uvula, gonion, and inferior border of the eye orbits), again reflecting the design of the original studies.

Text S2: Description of the Phenotypic Measures (PMs).
The phenotypic measures (PMs) listed in Table S2 were derived from the (semi)landmarks listed in Table S1 using an automated method implemented using MATLAB and the support of freely available scripts for performing various geometric operations across the various measures Schwarz (2022). The objective in designing the PMs was to provide a comprehensive coverage of the vocal tract dimensions and shape, in as much as the landmark set would allow, with Howells (1973) providing general inspiration, and with many of the measurement definitions drawn directly from several sources. Cervical spine measurements CS3H, CS4H, CS5H, CS6H, CS7H, and CS2A are based on Grave et al. (1999). Measurements HBC3, PNNP, HNSL, SPUN, DICD, DIPD, DIMD are based on Alfwaress et al. (2015), with DICD, DIPD, and DIMD also following Reenen and Allen (1987). Measurements for the skull, including several distances (SBAS, SBAN, SBAP, SNNA, SNNP, SNAM, and SSEN) and angles (ABSN, ACSN, AASN, AANP, APNS, APNP, ASCG, ASCG, ANSF, ANNF, and ALNS) are based on Solow et al. (1982). Finally, we also drew inspiration from Nishimura et al. (2006) in defining the measurements characterizing the horizontal and vertical lengths of the vocal tract, SVTh, SVTh*, SVTv, SVT*. The other measures are based on our own design or represent simple extensions of measures listed here using slightly different landmarks (e.g., we added HBC2 and HBC4 as extensions of HBC3 noted above).

Text S3: Modeling rater agreements and disagreements.
The two raters coded the same MRI scans independently, following the same training guidelines and using the same platform. However, they did discuss complex cases between them. To model the rater effects, we need to decide if the raters have the same error variance. While the fully unconstrained model fitted to the data would arguably be the least a priori biased, this model includes many parameters to estimate, which is potentially problematic given our sample size. Before seeing the data, based only on the organization of the training and of the landmarking process, supplemented by personal observations and debriefing interviews with the raters during and after the landmarking, the most constrained model (i.e., same latent means and error variances) would be favored. After having obtained the data (here, the estimates of the PMs for each of the two raters and not the actual (semi)landmarks), we first examined the correlations and the differences between the two raters for the corresponding participants and measures, and we found that the two raters tend to agree on the means of the measurements and to have medium to high correlations, suggesting that the latent means might indeed be equal. Finally, for each PM, we fitted two GCSM models in OpenMX (see below for details), one allowing different error variances for the two raters (the "2r" model) and another constraining the error variances to be equal (the "1r" model). These models were compared using both Akaike's Information Criterion (AIC; corrected with the "Parameters Penalty", which uses the number of free parameters in the model rather than the degrees of freedom) and the likelihood ratio test. Crucially, we used the overall results across the measures to decide between which one of the 1r or the 2r models to apply uniformly to all measures, therefore not inflating the number of multiple tests performed between measures. We found that none of the measures supported the 2r model, with the two models being equivalent for a majority of the measures, and the few remaining ones supporting the 1r model (see Table S3). Taking all these together and preferring simplicity, we decided to implement the 1r model throughout (see Figure 2 in the paper).

Text S4: Re-implementation in lavaan.
As an extra check, we re-implemented the GCSM model in lavaan (Rosseel 2012); for full details, please see the full HTML report. However, this re-implementation differs from the one in OpenMX mainly in that the covariates were not included in the GCSM model, but were regressed out from the PMs previous to fitting the GCSM model. Moreover, we implemented both the ACE and the ADE models, and we chose between them using Akaike's Information Criterion, AIC, and not by comparing the latent phenotypic correlations rMZ ans 2rDZ. lavaan returns the standardized estimates with standard errors, 95% confidence intervals, and p-values, but for the A, C and D components, we also performed model comparison, using AIC and the likelihood ratio test for nested models, between the "full" model and the "reduced" model without the component of interest. With these, we obtained, for all the PMs, the fitted model (ACE or ADE) and the standardized point estimates, standard errors, 95%CIs, p-values and (for A, C and D) the ΔAIC and p-value of the LR test, which we compared with their corresponding main OpenMX estimates.
First, even if the choice between the ACE and ADE genetic models are based on different approaches, for 127 (87.0%) PMs, the chosen genetic model is the same, but for virtually all the remaining PMs, the lavaan ACE and ADE models are virtually indistinguishable (and that for some, even the OpenMX latent rMZ and 2rDZ are also very close), suggesting that the differences between the genetic models are very superficial.

Fig. S1.
Inter-rater agreement, ICC(C,1), by domain (left) and type (right), ordered by increasing median agreement from top to bottom. We shows that actual points as well as the boxplots. Generated automatically using R 4.1.3 (https://www.r-project.org/).

Fig. S2.
Pairwise scatterplots between the estimates of the variance components (h 2 , c 2 , d 2 and e 2 ) and inter-rater agreement, ICC (C,1), showing the actual points as well as the linear (blue) and loess (red) fits (with 95% confidence intervals as the gray areas). Generated automatically using R 4.1.3 (https://www.r-project.org/).  S4. Midsagittal view of h 2 class I PMs (i.e., PMs with very strong evidence for high narrow-sense heritability coupled with high inter-rater agreement). This is Panel A of Figure 3 in the main text. Please see Figure 3 in the main text for conventions.

Fig. S5.
Midsagittal view of h 2 class II PMs (i.e., PMs with relatively high narrow-sense heritability, coupled with good inter-rater agreement). This is Panel B of Figure 3 in the main text. Please see Figure 3 in the main text for conventions and Text S2 and Table S2 for the information about the PMs.

Fig. S6.
Midsagittal view of h 2 class III PMs (i.e., PMs with relatively high narrow-sense heritability, but lacking inter-rater agreement). This is Panel C of Figure 3 in the main text. Please see Figure 3 in the main text for conventions and Text S2 and Table S2 for the information about the PMs.

Fig. S7.
Midsagittal view of h 2 class IV PMs (i.e., PMs with circumstantial evidence of narrow-sense heritability, irrespective of inter-rater agreement). Please note that ANSF is not shown. This is Panel D of Figure 3 in the main text. Please see Figure 3 in the main text for conventions and Text S2 and Table S2 for the information about the PMs.

Fig. S8. Mandible view of h 2 class I PMs.
Fig. S8. Mandible view of h 2 class I PMs (i.e., PMs with very strong evidence for high narrow-sense heritability coupled with high inter-rater agreement). This is Panel E of Figure 3 in the main text. Please see Figure 3 in the main text for conventions and Text S2 and Table S2 for the information about the PMs.

Fig. S9.
Mandible view of h 2 class II PMs (i.e., PMs with relatively high narrow-sense heritability, coupled with good inter-rater agreement). This is not shown in Figure 3 in the main text. Please see Figure 3 in the main text for conventions and Text S2 and Table S2 for the information about the PMs.

Fig. S10. Mandible view of h 2 class III PMs.
Fig. S10. Mandible view of h 2 class III PMs (i.e., PMs with relatively high narrow-sense heritability, but lacking interrater agreement). This is Panel F of Figure 3 in the main text. Please see Figure 3 in the main text for conventions and Text S2 and Table S2 for the information about the PMs.

Fig. S11. Mandible view of h 2 class IV PMs.
Fig. S11. Mandible view of h 2 class IV PMs (i.e., PMs with circumstantial evidence of narrow-sense heritability, irrespective of inter-rater agreement). This is Panel G of Figure 3 in the main text. Please see Figure 3 in the main text for conventions and Text S2 and Table S2 for the information about the PMs.

Fig. S12. Hard palate view of h 2 class I PMs.
Fig. S12. Hard palate view of h 2 class I PMs (i.e., PMs with very strong evidence for high narrow-sense heritability coupled with high inter-rater agreement).This is not shown in Figure 3 in the main text. Please see Figure 3 in the main text for conventions and Text S2 and Table S2 for the information about the PMs.

Fig. S13. Hard palate view of h 2 class II PMs.
Fig. S13. Hard palate view of h 2 class II PMs (i.e., PMs with relatively high narrow-sense heritability, coupled with good inter-rater agreement). This is not shown in Figure 3 in the main text. Please see Figure 3 in the main text for conventions and Text S2 and Table S2 for the information about the PMs.

Fig. S14. Hard palate view of h 2 class III PMs.
Fig. S14. Hard palate view of h 2 class III PMs (i.e., PMs with relatively high narrow-sense heritability, but lacking inter-rater agreement). This is not shown in Figure 3 in the main text. Please see Figure 3 in the main text for conventions and Text S2 and Table S2 for the information about the PMs.

Fig. S15. Hard palate view of h 2 class IV PMs.
Fig. S15. Hard palate view of h 2 class IV PMs (i.e., PMs with circumstantial evidence of narrow-sense heritability, irrespective of inter-rater agreement). Please note that ANSF is not shown. This is Panel H of Figure 3 in the main text. Please see Figure 3 in the main text for conventions and Text S2 and Table S2 for the information about the PMs.  Table S1. The list of landmarks and semilandmarks.  The point should be positioned in the midsagittal plane and located directly below the pituitary infundibulum (stalk). This landmark is always on the anterior aspect of the bone in comparison to the gnathion, which is found on the inferior aspect.

menton (m) No
A point on the lower border of the mandibular symphysis in the midsagittal plane.
This landmark is always on the inferior aspect of the bone in comparison to the pogonion, which is found on the anterior aspect.
apex inferior (ai) No Root apex of the right-sided mandibular central incisor. Right side only. Can be found by starting in the midsagittal plane and then searching laterally to the right for the sagittal plane which contains the best image of this incisor.
apex superior (as) No Root apex of the right-sided maxillary central incisor. Right side only. Can be found by starting in the midsagittal plane and then searching laterally to the right for the sagittal plane which contains the best image of this incisor.

incision inferior (ii)
No A point on the incisive edge of the right-sided mandibular central incisor.
Right side only. Can be found by starting in the midsagittal plane and then searching laterally to the right for the sagittal plane which contains the best image of this incisor.
incision superior (is) No A point on the incisive edge of the right-sided maxillary central incisor.
Right side only. Can be found by starting in the midsagittal plane and then searching laterally to the right for the sagittal plane which contains the best image of this incisor. Locate it by first searching in the sagittal view along the mandible at a part where the teeth are visible and then scanning laterally until the condyle can be identified with certainty. Keep searching laterally until the condyle disappears from view and mark its location.

Name (abbr) Semi Description Notes gonion left (gol) No
condylion right (cdr) No A point on the most lateral surface of the right mandibular condyle.
Locate it by first searching in the sagittal view along the mandible at a part where the teeth are visible and then scanning laterally until the condyle can be identified with certainty. Keep searching laterally until the condyle disappears from view and mark its location.
canine left (cal) No A centrally located point on the gingival margin of the lingual surface of the left-sided maxillary canine (C).
Defines the start landmark of the hard palate coronal canine semilandmarks.
canine right (car) No A centrally located point on the gingival margin of the lingual surface of the right-sided maxillary canine (C).
Defines the end landmark of the hard palate coronal canine semilandmarks.
second premolar left (pml) No A centrally located point on the gingival margin of the lingual surface of the left-sided maxillary second premolar (PM2).
Defines the start landmark of the hard palate coronal second premolar semilandmarks.
second premolar right (pmr) No A centrally located point on the gingival margin of the lingual surface of the right-sided maxillary second premolar (PM2).
Defines the end landmark of the hard palate coronal second premolar semilandmarks.
endomolare left (eml) No A centrally located point on the gingival margin of the lingual surface of the left second molar.
Defines the start landmark for the hard palate coronal M2 semilandmarks and the start landmark of the maxillary dental arch semilandmarks. endomolare right (emr)

No
A centrally located point on the gingival margin of the lingual surface of the right second molar.
Defines the end landmark for the hard palate coronal M2 semilandmarks and the end landmark of the maxillary dental arch semilandmarks.
corniculate tubercle (ct) No A point on the posterior and superior-most surface of the corniculate tubercles in the midsagittal plane.
May not be visible. Defines the end landmark of the pharynx wall semilandmarks.

nasopharynx (np) No
A point on the posterior pharyngeal wall within the nasopharynx that intersects a line drawn between the atlas and nasal spine posterior landmarks.
Defines the start landmark of the pharynx wall semilandmarks. Care must be taken to locate the thin mucosal covering of the hard palate which defines the contour that is traced with the semilandmarks. It can be helpful to look in parallel slices to see how the mucosa looks in relation to the tongue, which often is pressed against the hard palate.

Name (abbr) Semi Description Notes
hard palate coronal 2nd molar (hpc-m2) Yes A curve formed from semilandmarks defined between endomolare left and endomolare right. Comprises 12 sample points. The individual semilandmark points are constrained to move only in the y-direction (axial freedom).
Care must be taken to locate the thin mucosal covering of the hard palate which defines the contour that is traced with the semilandmarks. It can be helpful to look in parallel slices to see how the mucosa looks in relation to the tongue, which often is pressed against the hard palate.
hard palate coronal 2nd premolar (hpc-pm2) Yes A curve formed from semilandmarks defined between second premolar left and second premolar right.
Comprises 15 sample points. The individual semilandmark points are constrained to move only in the y-direction (axial freedom).
Care must be taken to locate the thin mucosal covering of the hard palate which defines the contour that is traced with the semilandmarks. It can be helpful to look in parallel slices to see how the mucosa looks in relation to the tongue, which often is pressed against the hard palate.
hard palate coronal canine (hpc-c) Yes A curve formed from semilandmarks defined between endomolare left and endomolare right. Comprises 15 sample points. The individual semilandmark points are constrained to move only in the y-direction (axial freedom).
Care must be taken to locate the thin mucosal covering of the hard palate which defines the contour that is traced with the semilandmark. It can be helpful to look in parallel slices to see how the mucosa looks in relation to the tongue, which often is pressed against the hard palate.   Table S2. The list of all phenotypic measures (PMs) used here, with their short four-character names (the star * helps deduplicate otherwise identical short names, such as SVTv and SVTv*), their anatomical domain and type, their full description (which makes reference to the landmarks and semilandmarks defined in Table S1), and their full names. Please note that 4 PMs turned out to be perfectly duplicated (i.e., they measured exactly the same things twice) and were subsequently removed from the analysis: column Remove shows "Yes" for the 4 exact duplicated PMs that were not considered (and shows, in parentheses, the corresponding PM that was kept). Column Warning shows the warning score for those measures with a score ≥ 3, as well as the main reasons for the score (these derive from manual and automatic checks and may seem rendudant as in, for example, "seems skewed" and "moderately skewed", the first derived subjectively from looking at the PM's histogram and QQ-plot, the second being based on the actual skewness of the distribution against [-1,1] for high skewness, and against [-1,-0.5] ∪ [0.5,1] for moderate skewness). Scores ≥ 5 represent serious interpretation warnings and were. in fact, dropped from the final results. The table is ordered by domain, type and short name. The short names of the PMs considered in the analysis are in bold italic.     Table S4. The number of PMs supporting the 2r vs the 1r models. Table S4. The number of PMs supporting the 2r (i.e., different error variances between raters) vs the 1r (i.e., equal error variances) models. We used three ways of quantifying, for a given PM, which of the two models is better supported (or if they are deemed equivalent): looking at the direction and the difference in AIC points (ΔAIC) and comparing it against 2 (a rather weak difference between models) or against 10 (a very strong difference), and looking for a significant (at α-level 0.05) likelihood ratio test. For each of these methods (the rows) we give the number (and %) of PMs that support the 2r, neither, or the 1r model (the columns). N.B., both 2r and 1r assume equal latent means between raters.   For each PM, we give the narrow sense heritability h 2 , the shared environment c 2 or dominance d 2 components, and the non-shared environment and error e 2 ; for each component we give: the point estimate (the 95% confidence interval in parentheses), the model comparison likelihood ratio test nominal p-value (and the Holm-corrected p-value in parentheses). We also give the inter-rater agreement ICC(C,1) as a point estimate (and with 95% confidence interval). For h 2 and c 2 /d 2 , we give the class to which the PM belongs in terms of the strength of evidence in favour of narrow sense heritability, shared environment or dominance (ordered decreasingly from the most convincing in class I to virtually no evidence in class V). The genetic model fitted (ACE or ADE) helps interpret the 'c 2 or d 2 ' column. The table is ordered by decreasing h 2 class and estimate. See Text S2 and Table S2 for the information about the PMs.  Table S6. Relationship between the ΔAIC with cut-off 2 and the likelihood ratio test (LRT) at α-level 0.05. Table S6. Relationship between the ΔAIC with cut-off 2 and the likelihood ratio test (LRT) at α-level 0.05. For each of variance components (rows), we give the number of PMs (also as %) for which the two criteria agree that the component is significantly different from 0 (column 'TT'), for which they disagree with ΔAIC judging it significantly different from 0 but LRT not (column 'TF'), for which they disagree with ΔAIC judging it not significantly different from 0 but LRT does (column 'FT'), and for which they agree in judging it not significantly different from 0 (column 'FF'). The only disagreement is for c 2 for PM SSEN, where ΔAIC = 1.86 and LRT's p = 0.049; given the limited nature of this single case, it can be concluded that the two criteria are virtually identical for our data.