The Effect of Lingual Resistance Training Interventions on Adult Swallow Function: A Systematic Review

Lingual resistance training has been proposed as an intervention to improve decreased tongue pressure strength and endurance in patients with dysphagia. However, little is known about the impact of lingual resistance training on swallow physiology. This systematic review scrutinizes the available evidence regarding the effects of lingual resistance training on swallowing function in studies using Videofluoroscopic Swallowing Studies (VFSS) with adults. Seven articles met the inclusion criteria and underwent detailed review for study quality, data extraction, and planned meta-analysis. Included studies applied this intervention to stroke and brain injury patient populations or to healthy participants, applied different training protocols, and used a number of outcome measures, making it difficult to generalize results. Lingual resistance training protocols included anterior and posterior tongue strengthening, accuracy training, and effortful press against the hard palate, with varying treatment durations. VFSS protocols typically included a thin barium stimulus along with one other consistency to evaluate the effects of the intervention. Swallowing measures included swallow safety, efficiency, and temporal measures. Temporal measures significantly improved in one study, while safety improvements showed mixed results across studies. Reported improvements in swallowing efficiency were limited to reductions in thin liquid barium residue in two studies. Overall, the evidence regarding the impact of lingual resistance training for dysphagia is mixed. Meta-analysis was not possible due to differences in methods and outcome measurements across studies. Reporting all aspects of training and details regarding VFSS protocols is crucial for the reproducibility of these interventions. Future investigations should focus on completing robust analyses of swallowing kinematics and function following tongue pressure training to determine efficacy for swallowing function.
Electronic supplementary material: The online version of this article (10.1007/s00455-019-10066-1) contains supplementary material, which is available to authorized users.


Introduction
Lingual resistance training has emerged as an intervention for the rehabilitation of swallowing impairment, based on the fact that reduced tongue pressures have been found in adults with neurogenic dysphagia [1][2][3]. A recent systematic review by McKenna et al. [4] found converging evidence that gains in tongue strength can be expected after a course of isometric lingual strength training, but concluded that it remains unclear whether these strength gains generalize to improvements in swallow function. The intent of this systematic review is to look deeper into reported changes in swallowing function following lingual resistance training. We were specifically interested in scrutinizing research using videofluoroscopic swallowing studies (VFSS) to measure changes in swallowing function, in evaluating and methodologically comparing the VFSS protocols that have been used, and, if possible, in synthesizing results across studies.
The tongue plays an important role in swallow function as it is composed of an intricate muscle structure allowing for fast and flexible posturing during oral functions [5][6][7]. During swallowing, its intrinsic and extrinsic muscles function synergistically to aid in bolus containment, loading, and the generation of a driving force exerted on the bolus to propel and squeeze it through the oropharynx [8][9][10][11][12][13][14]. Tongue strength has been investigated in a number of patient populations, including Parkinson disease [15][16][17], head and neck cancer [18][19][20][21], oculopharyngeal muscular dystrophy [22,23], acquired brain injury [2], and cerebrovascular accident [1,3]. Acute neurological impairments, such as stroke and brain injury, along with other progressive impairments, such as Parkinson disease, are known to be associated with high rates of swallowing impairment or oropharyngeal dysphagia [24]. In these patient populations, the tongue may fail to contain the bolus in the mouth or generate the necessary force to propel the bolus into the pharynx in a coordinated and controlled manner. Potential functional consequences of tongue weakness include impairments in swallow timeliness and airway closure resulting in penetration and/or aspiration (safety concerns) and the accumulation of residual material in the oropharyngeal cavities (efficiency concerns) [25].
Recent research has shown promising results for tongue strengthening exercises in building tongue strength and endurance in both healthy and disordered populations [1, 26-30], which has led to the increasing uptake of lingual resistance training protocols in swallowing rehabilitation [31]. However, whether improvements in swallowing function occur remains less clear. We were interested in further scrutinizing the available evidence regarding the effects of lingual resistance training on swallowing function. Our research questions were: (1) Which lingual resistance training protocols have been used to target improved swallowing function in adults? (2)

Methods
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [32] was used to guide development and methodology of this systematic review.

Search Strategy
An information specialist assisted in conducting a systematic search of the literature in AMED (Allied and Complimen

Eligibility Criteria
Studies were included if they met the following criteria: (1) studied human subjects over the age of 18 years, (2) provided a tongue pressure intervention, (3) completed a baseline VFSS to outline swallow function pre-intervention, (4) performed a post-intervention VFSS to measure intervention effects on swallowing, (5) had abstract and full-text available in English, and (6) were published in a peer-reviewed journal. The review was restricted to randomized controlled trials, controlled studies, case-control studies, cohort studies, and case series designs. Single case studies were excluded from this review as we aimed to examine articles using representative samples of reasonable size. Studies characterizing tongue pressure without delivery of any intervention targeting the tongue were excluded. Additionally, studies validating new technologies or devices for measuring tongue pressure were excluded, as the aim of this review was to determine the effects of intervention, not technological development.
Tutorials, educational reports, literature reviews, systematic reviews, book chapters, and conference abstracts were excluded because of their lack of prospective intervention design. Studies with pediatric populations and animals were also excluded from this review because our purpose was to investigate effects of lingual resistance training on the swallowing function of adults. We excluded studies with populations who had received surgical interventions to the head and neck, as we were interested in outcomes on functions of unaltered muscular anatomy. Similarly, studies that included patients who had received chemotherapy or radiation were excluded from this review as these treatments may further exacerbate swallowing impairment due to side effects including muscle fibrosis and neuropathy.

Study Selection
As shown in Fig. 1, the original search yielded a total of 1327 records, of which 472 were duplicates. After removal of these duplicates, two reviewers independently assessed the titles and abstracts of all retrieved records and determined their eligibility for potential inclusion. Cohen's Kappa and percentage of inter-rater agreement were calculated to evaluate the level of agreement between both raters [33]. When ratings were conflicting, the article was retained for full-text review. Full-text articles of both accepted and conflicting ratings were then reviewed independently by both reviewers in order to determine whether the article should be included in the systematic review. If an article was not selected, a reason for exclusion was documented based on eleven rejection criteria (see Fig. 1). Disagreements in final ratings of full-text articles were resolved by consensus between both raters.

Risk of Bias (Quality) Assessment
A methodological quality assessment of individual studies was completed independently by each reviewer to evaluate the validity of study design and reporting methods. Risk of bias evaluation was completed using the Cochrane Collaboration's Tool for Assessing Risk of Bias [34]. The criteria assessed were selection bias (random sequence generation, allocation concealment), performance bias (blinding of participants and personnel), detection bias (blinding of outcome assessment), attrition bias (incomplete outcome data), and reporting bias (selective reporting). It was of particular interest to document whether a study included sufficient detail to permit replication when describing the intervention protocol, the VFSS procedures used (e.g., frame rate, stimuli used, number of trials), or the methods of VFSS analysis used (i.e., duplication of VFSS rating, and the use of valid and reliable operational definitions and assessment tools for VFSS analysis). Each item on the Cochrane Collaboration's Tool for Assessing Risk of Bias was scored with a "Y" for yes if susceptible to bias in that category, "N" for no if not susceptible to bias in that category, and "U" for unsure/other if raters could not determine appropriate scores, if the criteria were not applicable, or if this was not reported for that particular category.
Quality in reporting was also scored using the NIH Quality Assessment Tool for Before-After (Pre-Post) Studies with No Control Group [35] to evaluate the internal validity of eligible studies with quasi-experimental, pre-post-intervention designs: (1) Studies that had the least risk of bias were classified as "good", (2) those susceptible to bias were considered "fair" if this bias was not sufficient to invalidate their results, and (3) those as "poor" if they had a significant risk of bias. In cases where there was a disagreement in ratings, reviewers met and discussed their ratings until they achieved consensus.

Data Extraction Process
Data extraction was completed independently by a single rater for full articles that met all inclusion criteria outlined above. A form was developed to standardize and capture the relevant data from each article (Online Appendix B). Data extraction included the following: (1) study design; (2) patient population descriptions (age, sex, etiology); (3) sample size; (4) proportion of males and females; (5) use of matched controls; (6) intervention details; (7) tongue strengthening device used, if applicable; (8) tongue intervention protocol (repetitions, frequency, duration); (9) VFSS protocol (stimuli, volumes and trials, frames/s); (10) outcome measure: swallow safety; (11) outcome measure: swallow efficiency; and (12) other visuo-perceptual or temporal swallowing parameters measured on VFSS. Results from each study including statistical analyses of changes in swallowing function after intervention were also extracted.
Figure 1 provides an overview of the selection process for included studies. Of the 855 studies identified for preliminary screening of titles and abstracts, 817 were rejected after failing to meet inclusion criteria. At abstract screening, the inter-rater agreement was 96.5% with a Cohen's Kappa statistic of 0.41. Although high percent agreement was achieved, only moderate inter-rater agreement at the abstract screening level was suggested by the Cohen's Kappa result, due to one rater's tendency to rate items as unsure [36]. When examining ratings at the full-text level, levels of inter-rater agreement were 94.7% with a Cohen's Kappa of 0.86, indicating almost perfect agreement between raters [36].
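The gap between high percent agreement and only moderate Cohen's Kappa can be illustrated with a short calculation. The sketch below is a minimal illustration, not the computation performed in this review: the function name `agreement_stats` and the counts in `hypothetical_table` are assumptions made for the example, and the table uses only two rating categories (include, exclude) rather than the include/exclude/unsure ratings described above.

```python
def agreement_stats(table):
    """Return (percent agreement, Cohen's kappa) for a square table of counts.

    table[i][j] = number of records that rater A placed in category i
    and rater B placed in category j.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of records on the diagonal.
    observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: product of the two raters' marginal proportions.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts, NOT data from this review.
# Rows: rater A (include, exclude); columns: rater B (include, exclude).
hypothetical_table = [
    [15, 20],
    [15, 950],
]

percent_agreement, kappa = agreement_stats(hypothetical_table)
print(f"percent agreement = {percent_agreement:.3f}, kappa = {kappa:.2f}")
```

With these hypothetical counts, percent agreement is 96.5% while Kappa is about 0.44. Because both raters exclude most records, chance agreement is already high, so a small number of disagreements on the rare category lowers Kappa substantially.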

Literature Retrieval
The most common reason for article exclusion (N = 487) was that tongue measurements were not collected before and after interventions involving tongue-specific exercises; rather, tongue measurements were obtained to characterize impairment at a single time point, for other interventions not focused on the tongue, or to guide the development of new tools/technologies (e.g., video-ultrasonography, Glide-Scope®, algorithms and neural networks). Another common reason for exclusion was use of the single case study design (N = 105), regardless of whether tongue pressure training was utilized. Ultimately, a total of seven articles met the inclusion criteria for review and data extraction.
Table 1 summarizes the quality assessment of all included studies using the Cochrane Risk of Bias Tool. Selection bias was found in four studies, where participants were enrolled either via convenience sampling or by consecutive recruitment without randomization. Quality assessment highlighted a high degree of performance bias in all studies included, where blinding of study participants to their allocated intervention group (if applicable) and of the personnel performing the intervention was not reported. Of the seven studies reviewed, two reported that they included some level of blinding of outcome assessor, where the clinicians rating the VFSS were blinded to participant [2,3]. Detection bias was present in the remaining five studies, as there was no mention of blinding of the raters in the study for any outcome measures. Attrition bias was relatively low, with all participants completing the full intervention in four [1,2,29,37] of the seven studies reviewed. Two studies [3,38] were deemed to have a high risk of attrition bias as more than 20% of their participants were lost to follow-up. One study [39] did not report final sample size, and therefore it was not possible to determine whether any participants were lost to follow-up.
Finally, reporting bias was deemed to be high in three studies [37][38][39] that did not provide any information regarding the stimuli used to assess swallowing function on VFSS. The quality assessment completed using the NIH tool deemed four studies to be "poor" in quality, two as "fair", and one study as "good" (see Table 2). Four main reasons for low quality were found: (1) small sample sizes [1,2,29,39]; (2) lack of clarity with regards to the intervention/service provided and whether this was delivered consistently to all patients [1,29,37,39]; (3) concerns about the use of valid and reliable outcome measures [2,3,37-39]; and (4) lack of blinding of those providing the intervention or analyzing the data [1,29,37-39].

Patient Characteristics
Patient characteristics can be found in Table 3. Three different patient population groups were included: stroke [1,3,37-39], acquired brain injury [2], and healthy participants [29]. Sample sizes varied widely across studies, ranging from six participants [2] to 29 participants [38]. Studies included both male and female participants; however, most studies included a larger proportion of males compared to females. The ages of participants enrolled across studies ranged from 32 [2] to 90 years [1]. Only three articles [3,37,38] included a group for comparison; however, the comparison group received some form of dysphagia intervention termed conventional or traditional exercise or an alternative tongue intervention protocol in all three cases.

Question 1: Training Protocols
Tongue exercises included anterior and posterior tongue strengthening, tongue pressure accuracy training, and oral motor exercises of the tongue including effortful press against the hard palate. The Iowa Oral Performance Instrument (IOPI) was the primary device used for lingual resistance training across the studies identified [1-3, 29, 37, 38]; however, one study used no tool at all [39]. A large variation was also found in the treatment durations, which included 4, 5, 6, 8, and 12 weeks, with all protocols lasting at least 4 weeks. Exercises were typically repeated 30 or more times per session, while their frequency was outlined as two, three, or five times per week. Exercises were completed solely in clinic [2,3,37,38] or with some form of clinical guidance from a speech-language pathologist or an occupational therapist along with self-directed home training [1,29,39] (see Table 4). Additionally, intervention for the treatment groups was not always limited to tongue interventions and included conventional dysphagia therapy techniques [37][38][39] such as effortful swallowing, thermal tactile stimulation, facial

VFSS Protocols
The seven studies included for review employed a broad range of VFSS protocols to assess swallow function post-treatment (see Table 5). None of the included studies reported the frame rate at which their VFSS studies were captured and recorded. Frame rate has been noted to interfere with the integrity of VFSS analysis if below 15 frames per second, particularly with respect to identifying penetration-aspiration events [40]. It is important to note that not all studies disclosed their VFSS protocols, which would hinder replication of results; however, those that reported their VFSS protocols included a thin barium stimulus to evaluate the effects of the intervention [1-3,29]. Information about barium concentration and brand of barium used was reported in all studies reporting VFSS protocol. Other stimuli included in the VFSS protocols varied across studies: "thickened" stimuli [3], "puree" stimuli [2], and semisolid stimuli [1,29]. No study utilized a solid consistency as part of their VFSS protocol.

Swallowing Biomechanics
Each study along with its inclusion criteria, study design, and a list of the outcome measures collected is shown in Table 6. Change in swallowing physiology was reported as an outcome of interest in six of the studies included in this review. The most commonly collected outcome measures of swallowing physiology to determine changes pre- and post-treatment were temporal measurements. Temporal measures included: oral transit time, pharyngeal transit time, stage transition duration, oral transit duration, oral clearance duration, pharyngeal transit duration, pharyngeal clearance duration, pharyngeal response duration, duration of hyoid maximum elevation, duration of hyoid maximum anterior excursion, duration to upper esophageal sphincter (UES) opening, duration of UES opening, and total swallowing duration. Of these studies, three used the Videofluoroscopic Dysphagia Scale (VDS) as the tool of measurement; these studies did not report scores per parameter but instead reported compiled scores out of 100 across all parameters [41]. The VDS tool characterizes swallowing impairment based on ordinal scales for 14 parameters related to the oral and pharyngeal stage of swallowing, including some physiological measures (e.g., trigger of pharyngeal swallow, laryngeal elevation, pharyngeal transit time), and appears to be popular in Korea.

Swallowing Safety
All studies reported swallowing safety as an outcome of interest following lingual resistance training. The Penetration-Aspiration Scale (PAS) [42] is an 8-point scale grading the depth of penetration and aspiration of the bolus into the laryngeal vestibule along with subject response. The PAS was the primary tool used to quantify swallowing safety [1-3,29], while the aspiration parameter on the VDS was used by the remaining studies [37][38][39] to assign scores related to presence of laryngeal vestibule invasion, supraglottic penetration, and subglottic aspiration. One article [37] reported swallowing safety using both the VDS and PAS scales. Reports of penetration-aspiration were provided using either ordinal scales or percentage estimates of the amount of the bolus aspirated in these studies.
Table 6 (fragment), inclusion criteria for study [2]: (1) Dysphagia secondary to acquired brain injury following a motor vehicle accident; (2) Impaired swallowing safety, i.e., scores less than or equal to 3 on the PAS with thin liquids; (3) Post-swallow residues in the valleculae or pyriform sinuses with either thin and/or spoon-thick liquids measured using a 4-point ordinal scale
[1,29]. Residue was quantified using a number of ordinal scales, including the following:

Question 3: Other Measures
Other measures used to determine the effects of lingual resistance training included magnetic resonance imaging (MRI) to evaluate the total lingual volume [1,29], a dietary intake questionnaire [1], and the Quality of Life in Swallowing Disorders Questionnaire (SWAL-QOL) [45] to quantify changes in swallowing-related quality of life [1].

Tongue Pressure Generation
Isometric Tongue Pressures: Isometric tongue-palate pressures post-treatment were measured as an outcome of interest in a total of six studies, at either the anterior region, posterior region, or both regions. In all four studies assessing outcomes in the anterior region [1,2,29,38], improvement was found and a statistically significant increase in pressures was reported in three studies.
The fourth of these studies [2] used single subject methods for reporting results, and reported that 5/6 participants achieved improvement, defined as three successive sessions in which pressures exceeded a Cohen's d effect size threshold of 0.6 versus baseline. For the posterior region, four studies reported significantly increased posterior isometric pressures between baseline and post-treatment measures [1,3,29,38] and a fifth [2] reported improvements in all participants using the Cohen's d effect size criterion. One study reported statistically significant increases in peak isometric pressures; however, no information was provided regarding the placement of the air-filled bulb used to measure these pressures [29].
Swallowing Pressures: Tongue pressures collected using a three-bulb array attached along the midline of the hard palate were also reported in two studies [1,29]. These pressures differ from isometric tongue pressures as they were collected during VFSS while patients were swallowing [29]. In the other study, significant increases in swallowing pressures were reported in at least one of three trials of 3 ml thin liquid, 10 ml thin liquid, and semisolid bolus conditions [1]. One other study collected tongue pressure measurements during swallowing tasks by utilizing saliva swallows. Half of their participants demonstrated increased saliva swallowing pressures beyond the effect size boundary for at least three consecutive sessions [2].
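The single-subject criterion described for study [2] (three successive sessions above an effect size threshold of 0.6 versus baseline) can be sketched as a simple run-length check. This is a minimal illustration under stated assumptions: the source does not specify how the effect size was computed per session, so the sketch assumes each session value is standardized against the baseline mean and baseline standard deviation. The function name `meets_criterion` and all pressure values are hypothetical.

```python
from statistics import mean, stdev


def meets_criterion(baseline, sessions, threshold=0.6, run_length=3):
    """Return True if `run_length` successive sessions exceed the threshold.

    Assumption (not stated in the source): the per-session effect size is
    (session value - baseline mean) / baseline standard deviation.
    """
    base_mean = mean(baseline)
    base_sd = stdev(baseline)  # sample standard deviation of baseline values
    run = 0
    for value in sessions:
        effect_size = (value - base_mean) / base_sd
        # Extend the run on success, reset it on any session below threshold.
        run = run + 1 if effect_size > threshold else 0
        if run >= run_length:
            return True
    return False


# Hypothetical tongue pressures in kPa, NOT data from any included study.
baseline_kpa = [30, 32, 31, 29]
improving = [31, 33, 30, 34, 35, 36]      # ends with three successive gains
inconsistent = [33, 30, 34, 30, 35, 30]   # gains never occur three in a row

print(meets_criterion(baseline_kpa, improving))
print(meets_criterion(baseline_kpa, inconsistent))
```

Requiring successive sessions rather than a single session above the threshold guards against counting one unusually strong measurement as improvement.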

Swallowing Outcomes
Temporal Measures: No statistically significant changes in timing measures of swallowing were found in two [3,29] of the three studies that collected them. In one study [1], oral transit duration (defined as time from beginning of posterior bolus movement until arrival of bolus head at ramus of mandible) significantly improved for the 3 ml liquid bolus condition in one of three bolus trials. Similarly, a significant effect was found for pharyngeal response duration (defined as time from beginning of hyoid excursion until hyoid returns to rest) for the 3 ml liquid and 10 ml liquid bolus conditions, also observed on one of three bolus trials per consistency. No additional physiological measures collected showed statistically significant changes (e.g., duration of pharyngeal response, UES opening, time to UES opening, hyoid maximum elevation, and hyoid maximum anterior excursion).
Swallowing Safety: Swallowing safety pre- and post-treatment on VFSS was measured using PAS in most studies, with mixed results. In two studies, no significant improvements (i.e., decreases) in PAS were found for thin [1,3], nectar [3], or pudding consistencies [1]. Significantly decreased (i.e., improved) PAS values were reported in three studies, two of which provided information regarding bolus stimuli used to assess swallow safety. As the remaining studies used the VDS tool and did not dissociate scores related to the swallow safety parameter from other parameters when reporting results, overall effects of the intervention on swallowing safety alone could not be extracted.
Swallowing Efficiency: A statistically significant reduction in vallecular residue was noted in NRRS scores for thin liquid stimuli in one study [3], while no significant differences were found for nectar-thick stimuli or for any other bolus type in the remaining studies. All studies, except one [1], reported no significant decreases in either oral cavity or pyriform sinus residue. In this study by Robbins et al. [1], the authors concluded that mean oropharyngeal residue scale scores changed significantly for three bolus conditions (3 ml effortful, 3 ml liquid, and 10 ml liquid); however, repeated measures were not accounted for.
Other Measurements: Oral phase parameters (lip closure, bolus formation, mastication, apraxia, tongue to palate contact, premature bolus loss, and oral transit time) and pharyngeal phase parameters (trigger of pharyngeal swallow, vallecular residue, laryngeal elevation, pyriform sinus residue, coating on the pharyngeal wall, pharyngeal transit time, and aspiration) as captured on the VDS were reported to significantly improve in all three studies using this tool. Data relating to each specific parameter were not reported in any of these studies. Detailed information regarding all reported outcomes, including results for MRI measures, quality of life measures, and dietary intake questionnaires, is given in Table 7.

Risk of Bias
This review systematically examined the strength and quality of evidence for using lingual resistance training as an intervention to impact swallow function as measured using VFSS. The seven studies selected for review had mixed quality with four rated as "poor" on the risk of bias tools selected. Of note, performance bias was common as either blinding of participants and personnel during treatment or the blinding of individuals rating the VFSS to participants and time-point relative to intervention was not reported in any of the selected studies. Additionally, performance bias was rated as high in studies that did not appropriately handle the statistical analysis of a categorical (PAS) or ordinal outcome scale (VDS). Selection bias was also found in five studies as convenience sampling was mainly used with no randomization or concealment to treatment conditions. Furthermore, small sample size was identified as a limitation in four [1,2,29,39] of the seven included studies. This increases the risk of bias as it undermines the reliability of the results, leading to a lack of confidence that any statistically significant effect reflects a true effect at the population level.
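As one illustration of analyzing an ordinal outcome without treating its scores as interval data, the sketch below applies an exact two-sided sign test to paired pre- and post-treatment scores. This is an assumption-laden example, not the analysis used in this review or in any included study: the sign test is only one of several rank-based options, the function name `sign_test_p` is hypothetical, and the scores are invented for the example.

```python
from math import comb


def sign_test_p(pre, post):
    """Exact two-sided sign test p-value for paired ordinal scores.

    Only the direction of each within-participant change is used, so no
    assumption is made about the spacing between scale points.
    """
    # Drop ties, since they carry no information about direction.
    diffs = [after - before for before, after in zip(pre, post) if after != before]
    n = len(diffs)
    if n == 0:
        return 1.0
    negatives = sum(1 for d in diffs if d < 0)
    k = min(negatives, n - negatives)
    # Probability of a split at least this extreme under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical paired scores on an 8-point ordinal scale (lower is better),
# NOT data from any included study.
pre_scores = [5, 6, 4, 7, 3, 5, 8, 4]
post_scores = [3, 4, 4, 5, 2, 5, 6, 5]

p_value = sign_test_p(pre_scores, post_scores)
print(f"two-sided sign test p = {p_value:.3f}")
```

In this hypothetical sample, five of the six non-tied participants improved, yet the test gives p of roughly 0.22, which also shows how little statistical power a sample of this size provides.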
Another common reason for lower quality ratings was concern about the validity and reliability of the outcome measures used. In a recent study by Swan et al. [46] using the COnsensus-based Standards for the selection of health Measurement Instruments (COSMIN) process [47] to evaluate the psychometric quality of swallowing assessment tools, the VDS was found to have limited reliability, content validity, and indeterminate hypothesis testing (or item construct validity), while the PAS revealed conflicting findings in terms of reliability and intermediate content validity and hypothesis testing. Additionally, the lack of clarity with regards to intervention descriptions, delivery, and protocol adherence impacted quality assessment [1,29,37,39]. The large heterogeneity in the patient populations, protocols used
Table 7 (fragment):
VDS: Significant improvements in both oral and pharyngeal phase of VDS for experimental and control groups (p < 0.000), and also between groups (p < 0.000)
PAS: Significant decrease in PAS for both groups (p < 0.000); No significant differences between groups (p = 0.0471)
Park et al. [38], VDS (oral phase; pharyngeal phase; total score): Statistically significant differences in both the oral (p < 0.01) pharyngeal (p < 0.05) stages, and the total score (p < 0.

Patient Characteristics and Outcome Measures
This review identified mixed evidence that tongue intervention specifically impacts swallowing safety or efficiency in isolation; however, improved swallowing function (either safety and/or efficiency) was reported in six of seven studies reviewed [1][2][3][37][38][39]. The only study that did not find significant improvement in swallowing function was one that recruited healthy older adults [29]. These results are not surprising given that the participants recruited did not present with swallowing impairments in the first place. Additionally, swallowing pressures and anterior and posterior tongue strength were reported to significantly improve from baseline in all studies utilizing the IOPI as a measurement and training tool, even for the healthy older adult population; however, this did not have a direct relationship with either safety or efficiency changes. Although positive evidence was found for the impact of lingual resistance training on swallow function, this is confounded by the heterogeneity of patient populations, training protocols, swallow function measurement, and other outcome measures seen across the selected studies. While the majority of patients who underwent lingual resistance training interventions in the included studies had a primary etiology of stroke, there was variation in the type of stroke (ischemic, hemorrhagic) and time post-onset of stroke (4 weeks to > 48 months). These differences in the studies recruiting stroke patients and the heterogeneity of patient population in other studies threaten the assumption that participants had similar swallowing impairment profiles to begin with, and may explain the variations seen in swallowing outcomes. Additionally, the observed variation in swallowing improvement may be attributable to differences in intervention protocols utilized, including training frequency (2-5 times per week), duration (4-12 weeks) and task repetition (30-90 times).

Limitations
Although this review followed the PRISMA guidelines, it is not without its limitations. Firstly, a choice was made to exclude unpublished and grey literature from our literature search, which may explain the limited number of studies included in the review. Furthermore, studies were only included if VFSS measures were taken pre- and post-intervention, which resulted in the exclusion of studies which utilized only post-treatment VFSS or other instrumental assessments (e.g., fiberoptic endoscopic evaluation of swallowing) to determine effects of this type of intervention. The reason behind this was that we hoped to perform a meta-analysis to reach stronger conclusions regarding use of this intervention for swallowing, using quantitative analyses on extracted VFSS measures. Quantitative analyses were not possible for many reasons, particularly poor reporting of VFSS frame rate and stimuli (bolus consistencies and volumes) used during assessments. Finally, as limited translational resources were available, only English-language studies were included in this review. Despite this limitation, only one non-English study was excluded at the abstract screening stage and no studies were excluded at the full-text screening stage.

Conclusions
Overall, this systematic review described the effects of lingual resistance training interventions on swallowing function. Consistent with previous reviews [4], positive evidence was found in terms of impact of these interventions on tongue pressures, along with mixed results for swallowing safety and efficiency. It is important to note that variability in the methodology with this intervention did not allow for quantitative meta-analysis or definitive conclusions. A lack of standardization in methods for VFSS measurement of outcomes across studies was found to be a particular barrier to data synthesis and meta-analysis. Controlled observational studies with larger sample sizes are still needed to provide clinical rationale for use of lingual resistance training in different clinical populations with dysphagia. Future investigations should focus on conducting instrumental evaluations and robust analyses using psychometrically sound instruments following lingual resistance training to provide stronger evidence of the efficacy of such training for improved swallow function.