Background

Asthma is a chronic lung disease caused by airway inflammation. Asthma is the most common pediatric chronic disease [1, 2] affecting 7.1 million (9.6 %) of American children [3, 4]. Asthma is the primary diagnosis for 1/3 of pediatric emergency department visits [5], and the most frequent reason for preventable pediatric hospitalization [6] and school absenteeism due to chronic conditions [7]. In 2008, 9.3 billion dollars, or 8 % of the total direct healthcare cost for all children, were spent on pediatric asthma [1].

About 80 % of pediatric asthma patients have symptom onset before age six [8, 9], most of them before age three [10–12]. However, only about 1/3 of children with at least one episode of asthmatic symptoms by age three will have asthma at age six and over [10, 13–18]. Asthma is under-diagnosed in 18–75 % of asthmatic children [19–23]. Overdiagnosis of asthma is also prevalent. Eleven percent of patients in primary care using inhaled corticosteroids, the most potent and consistently effective long-term control medication for asthma [24, 25], have no indication for the medication [26]. It is desirable to construct an accurate model to predict whether a child will develop asthma in the future. In support of the potential of predictive models, a published predictive model for asthma development has already been shown to outperform a physician’s diagnosis of asthma in young children, which had a low sensitivity of 29 % and a low positive predictive value of 23 % [27]. Such a model can provide several benefits.

First, appropriate asthma treatment can prevent serious asthma complications. A delay (median = 3.3 years) in diagnosis is experienced by 2/3 of asthmatic children [28–34] and is associated with suboptimal or no treatment for asthma [19, 20, 28, 35, 36], presenting a major clinical and public health concern [29, 37]. Many children, including 37 % of the 32 million American children on Medicaid in 2013, miss regular check-ups [38]. By identifying children at high risk for asthma and scheduling more frequent follow-up with a clinician familiar with asthma, the clinician can diagnose asthma in a timely manner and start asthma treatment earlier [39]. This has long-term benefits including fewer respiratory symptoms [40–47], reduced maintenance dose of asthma control medication [43, 48], fewer medication side effects [24], less need for secondary medications [40, 42–44, 46–48], reduced overuse of antibiotics [29], fewer asthma exacerbations [31, 41–53], less school absenteeism [45, 47], fewer caregiver work days lost [53], lower healthcare costs [24, 43, 48, 50, 53], preserved lung function avoiding airway remodeling (i.e., permanent alterations in the airway structure) [31, 41, 43, 44, 47, 48, 52, 54–58], less need for rehabilitation [50], lower risk of death from asthma [31, 50, 59], increased chance that the patient outgrows his/her asthma [60], and improved quality of life [14, 54]. Moreover, timely asthma treatment can benefit both children with severe asthma and children with mild asthma [49, 61–63].

Second, asthma is a subjective, clinical diagnosis in children under five [14, 64, 65]. Clinicians have difficulty diagnosing asthma in young children [27, 64, 66]. Most children under five cannot cooperate reliably with objective lung function measurements. Also, there is no genetic marker or diagnostic test that can reliably diagnose asthma [64, 67, 68]. Using a predictive model can help physicians better diagnose asthma [69, 70], particularly in children under five.

Third, the information provided by a predictive model can contribute directly to children’s quality of life. A low predicted risk for asthma can alleviate concerns of the child and his/her caregivers [71]. A high predicted risk may help the child and caregivers understand symptoms, improve treatment adherence, and adjust lifestyle and living conditions to avoid exposing the child to environmental contaminants and allergens [45, 72].

Fourth, proposed preventive interventions for asthma [73–82] such as suplatast tosilate are under intensive research worldwide [83]. Disease risk ascertainment of enrollees is critical in studying efficacy of preventive interventions in randomized clinical trials [84]. An accurate predictive model can ensure enrollment of children at risk and facilitate re-analysis of earlier trials for more accurate estimates of efficacy.

Fifth, risk stratification through application of a predictive model can help clinicians and researchers properly weigh benefits against harms, costs, and inconvenience of preventive interventions for asthma [71, 85].

To facilitate asthma diagnosis and prevention, researchers have developed multiple models for predicting asthma development in children. In this paper, we provide a systematic review of these models. We present the existing models’ strengths, limitations, knowledge gaps, and opportunities for improvement in modeling. We discuss specific responses to selected gaps and limitations in the hope of stimulating future research on this topic. A list of acronyms used is provided at the end of this paper.

Methods

This study follows the principles of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guideline [86]. All study co-authors provided input to the study protocol’s design.

Information sources

This systematic review of published literature on predictive models for asthma development in children is limited to the period through June 3, 2015. Eight databases were used: PubMed, EMBASE, CINAHL, Scopus, the Cochrane Library, the Association for Computing Machinery (ACM) Digital Library, IEEE Xplore, and OpenGrey. EMBASE includes proceedings of 1000 conferences each year. The ACM Digital Library and IEEE Xplore are two major computer science literature databases covering journals, magazines, newsletters, and conference proceedings. OpenGrey is a database on grey literature. All citations were imported into the EndNote X7 reference management software.

Eligibility criteria

Inclusion criteria: Judgment of each retrieved reference’s relevance was based on pre-defined inclusion criteria ensuring that the article’s primary focus was on predictive models for asthma development in children including ≥2 attributes. To be considered a qualified report on a predictive model, the article must report Area Under the receiver operating characteristic Curve (AUC) summarizing sensitivity and specificity, accuracy, or ≥2 of the following four performance metrics: sensitivity, specificity, positive predictive value, and negative predictive value. Reporting only one of the latter four performance metrics is insufficient for demonstrating model performance, as the model can be tuned specifically to maximize one metric by sacrificing other metrics, e.g., through a tradeoff between sensitivity and specificity.
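The tradeoff described above can be illustrated with a minimal Python sketch. The class name `ConfusionCounts`, the function `performance_metrics`, and all counts are hypothetical and chosen only for illustration; they do not come from any reviewed study. The two sets of counts mimic a strict and a lenient decision threshold applied to the same imaginary cohort of 1000 children, 100 of whom later develop asthma.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing predictions with the true outcome (hypothetical)."""
    tp: int  # predicted asthma, developed asthma
    fp: int  # predicted asthma, did not develop asthma
    fn: int  # predicted no asthma, developed asthma
    tn: int  # predicted no asthma, did not develop asthma


def performance_metrics(c: ConfusionCounts) -> dict:
    """Return the four metrics named in the inclusion criteria, plus accuracy."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "accuracy": (c.tp + c.tn) / total,
    }


# Illustrative counts only: same cohort, two different decision thresholds.
strict = ConfusionCounts(tp=40, fp=45, fn=60, tn=855)
lenient = ConfusionCounts(tp=90, fp=450, fn=10, tn=450)

for name, counts in (("strict", strict), ("lenient", lenient)):
    summary = {k: round(v, 2) for k, v in performance_metrics(counts).items()}
    print(name, summary)
```

In this made-up example the lenient threshold raises sensitivity from 0.40 to 0.90 while specificity falls from 0.95 to 0.50, which is why a single reported metric says little about overall model performance.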

Exclusion criteria: Non-English references and conference abstracts were excluded. Unlike full-length conference papers, conference abstracts provide insufficient detail of the study for meaningful review.

Search strategies and study selection

The search strategies were developed by GL, MJ, and two medical librarians (DS and MM) trained in systematic review searches. The search queries used in the eight databases are listed in Additional file 1. Search results were limited to human subjects and children (birth to 18 years) as outlined in Additional file 1.

For each retrieved reference, two independent reviewers (GL and MJ) evaluated the title and abstract to determine potential relevancy. For each potentially relevant reference, the full text was evaluated to make a final inclusion decision. The final literature review included articles meeting the pre-defined inclusion criteria. GL and MJ’s independent review results achieved a strong level of agreement (kappa = 0.97). Disagreements about inclusion of individual articles were addressed by discussion among GL and MJ, and if needed a third reviewer (BS).
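For readers unfamiliar with the agreement statistic, the following Python sketch shows how Cohen's kappa is computed for two reviewers making include/exclude decisions. The function name `cohens_kappa` and the counts are hypothetical; the counts are made up to give a value near the reported one and are not the review's actual screening tallies.

```python
def cohens_kappa(both_yes: int, a_only: int, b_only: int, both_no: int) -> float:
    """Cohen's kappa for two raters and a binary (include/exclude) decision.

    both_yes: both reviewers include; a_only / b_only: only one includes;
    both_no: both exclude.
    """
    n = both_yes + a_only + b_only + both_no
    observed = (both_yes + both_no) / n           # observed agreement
    a_yes = (both_yes + a_only) / n               # reviewer A inclusion rate
    b_yes = (both_yes + b_only) / n               # reviewer B inclusion rate
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)  # chance agreement
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Illustrative counts only (assumption, not data from this review).
kappa = cohens_kappa(both_yes=70, a_only=2, b_only=2, both_no=926)
print(f"kappa = {kappa:.2f}")
```

Because kappa corrects for agreement expected by chance, it is more informative than raw percent agreement when, as in literature screening, most references are excluded by both reviewers.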

Data extraction and quality assessment

Two independent reviewers (GL and MJ) extracted the following article details using a data abstraction spreadsheet: purpose for making the prediction, study population, population size, methods used for building predictive models, predictors used, and the models’ performance. These two reviewers also assessed each included article’s quality using the following eight questions adapted from the Critical Appraisal Skills Programme (CASP) clinical prediction rule checklist [87]:

Q1: Is the predictive model clearly defined?

Q2: Did the population from which the predictive model was derived include an appropriate spectrum of patients?

Q3: Was the predictive model validated in a different group of patients?

Q4: Were the predictor variables and the outcome evaluated in a blinded fashion?

Q5: Were the predictor variables and the outcome evaluated in the whole sample selected initially?

Q6: Are the statistical methods used to construct and validate the predictive model clearly described?

Q7: Can the performance of the predictive model be calculated?

Q8: Was the estimate of the predictive model’s performance precise?

The CASP clinical prediction rule checklist was designed specifically for evaluating the quality of predictive modeling studies. Any discrepancy in review assessment was resolved by discussion between GL and MJ, and if needed a third reviewer (BS).

Results

As shown in Fig. 1, the literature search returned 13,101 references in total, of which 74 were potentially relevant after review of titles and abstracts and underwent full-text review. Of those fully reviewed, 32 references describing 30 predictive models met inclusion criteria and are discussed in this paper. The other 42 were excluded because they did not primarily focus on predictive models for asthma development in children including ≥2 attributes. All included articles are studies on predictive models. No systematic reviews or randomized controlled trials were found. In this section, we describe the state of the art of predictive models for asthma development in children. A summary of the predictive models for asthma development in children is given in Table 1. Our narrative description and the content of Table 1 are based on article details extracted into the data abstraction spreadsheet, with additional information to provide context. For question Q3 used for assessing article quality, the answer is “no” for nine included articles [17, 65, 71, 88–93] and “yes” for the other 23 included articles. For each of the other seven questions (Q1, Q2, and Q4–Q8), the answer is “yes” for each included article.

Fig. 1 Flowchart of the article selection process

Table 1 Categorization of existing predictive models for asthma development in children

Predictive models developed for the general child population

Twenty-three models for predicting asthma development have been developed for the general child population. These models fall into the following categories: clinical index [94–100], logistic regression [17, 88, 90–92, 101, 102], cumulative risk score [89, 103], severity score [104], combination of two attributes [105], and machine learning models [106–111]. Seventeen models target children at or under age four [17, 88–92, 94–101, 103–105, 111]. Six of 24 studies target children with wheezing or coughing symptoms [17, 88, 98, 100, 101, 111]. Sixteen models used predictors collected from a (parental) questionnaire [89, 90, 94–98, 100–104, 106–110] and family history [88–91, 94–103, 107]. Three models used genetic information [92, 101, 105].

Different studies used differing prediction targets, candidate predictors, and populations, affecting the predictors included in the final predictive models. Age and gender were used in the predictive models in Clough et al. [17] and Balemans et al. [90, 91], respectively, but were non-predictive for the prediction target in Zhang et al. [88]. Eczema, maternal smoking, and rhinitis were used in the predictive models in Castro-Rodríguez et al. [94–100], Balemans et al. [90], and Castro-Rodríguez et al. [94–96, 98, 99], respectively, but had no independent significance for the prediction target in Kurukulaaratchy et al. [89]. Food allergy was used in the predictive models in Chang et al. [97, 99, 100, 105], but did not highly correlate with the prediction target in Kurukulaaratchy et al. [89].

Castro-Rodríguez et al. [94] published in 2000 one of the first works on predictive modeling for asthma development in children, where two clinical indices were built: the loose asthma predictive index and the stringent asthma predictive index. Both asthma predictive indices have since been externally validated, with results comparable to those of the initial study [95, 96, 98]. In addition, both asthma predictive indices have since been updated by several researchers: (1) Guilbert et al. [12] in 2004 updated the stringent asthma predictive index by replacing the predictor of allergic rhinitis with allergic sensitization to aeroallergens and allergic sensitization to milk, eggs, or peanuts [97]. (2) Singer et al. [98] in 2013 updated the original asthma predictive index by replacing the predictor of blood eosinophilia with elevated fraction of exhaled nitric oxide (FeNO) to avoid invasive blood sampling. (3) Amin et al. [99] in 2014 updated the original asthma predictive index by using the predictors of frequent wheezing, parental asthma, allergic sensitization to ≥1 aeroallergens, a history of eczema, wheezing without a cold, allergic rhinitis, and allergic sensitization to milk or egg.

Most models (17 of 23) for predicting asthma development for the general child population have low accuracy, typically with a sensitivity, positive predictive value, or AUC much less than 80 %. There are several exceptions, all with unknown performance for the situation of interest to this review:

(1) The model built by Klaassen et al. [101] achieved an AUC of 0.86. In the study, prevalence of future asthma development was adjusted in the validation set through stratified sampling. It is unclear how the model would perform in the general child population, where the prevalence of future asthma development remains unmodified.

(2) Chatzimichail et al. performed five studies and built one machine learning model per study [106–110]. Each study used many candidate predictors and built a model achieving an accuracy ≥95 %. Each study excluded patients with missing data representing 24 % of all patients, incurring a large selection bias. The five models predict persisting asthma in children already diagnosed. In this review, we are interested in models predicting asthma development in children who have not received an asthma diagnosis. The five studies illustrate the potential benefits of including multiple attributes and using machine learning methods in building models.

Besides the above exceptions, there are two other studies that built models with unknown performance for the situation of interest to this review. First, for children at age two, Devulapalli et al. [104] in 2008 conducted a case–control study and developed a severity score to predict asthma development at age 10. The study matched children with recurrent bronchial obstruction (≥2 episodes) to children without bronchial obstruction. Since having recurrent bronchial obstruction increases a child’s asthma risk, the matching process greatly inflated the prevalence of future asthma development in the study population. It is unclear how the model would perform in the general child population, where the matching process is absent.

Second, for children aged 6–24 months with ≥3 episodes of physician-diagnosed wheezing treated with bronchodilators or corticosteroids, Elliott et al. [112] in 2013 used single-breath FeNO >30 parts per billion (p.p.b.) to predict persistence of wheezing at age three. The prediction method achieved an AUC of 0.86, a low sensitivity of 77 %, a specificity of 94 %, a positive predictive value of 95 %, and a low negative predictive value of 73 %. Obtaining single-breath FeNO measurements requires sedating the child, special equipment, and special technical expertise. It is unknown how feasible the method will be in predicting asthma development in children. The study used a highly selected population that may have a high pre-test probability of continued wheezing at age three.

Predictive models developed for the primary care setting

Six models for predicting asthma development in children have been developed for the primary care setting. In this setting, we prefer predictors that are accurate, non-invasive, and easy and inexpensive to obtain [90].

Among all models for predicting asthma development in children in the primary care setting, two [27, 113] have been externally validated [96, 114] with results comparable to those of the initial studies. With one exception [71], all models target children at or under age four. Three of six models used predictors collected from a parental questionnaire [9, 27, 71]. All models are based on logistic regression, target children with asthma-like symptoms such as wheezing, and used family history information. No model used genetic information.

Different studies used differing prediction targets, candidate predictors, and populations, affecting the predictors included in the final predictive models. Age was used in the predictive models in Eysink et al. [65, 71, 113, 114], but was non-predictive for the prediction target in Vial Dupuy et al. [18]. Gender was used in the predictive models in Caudri et al. [9, 27, 113, 114], but was non-predictive for the prediction targets in Vial Dupuy et al. [18, 71]. Parental asthma was used in the predictive models in Vial Dupuy et al. [18, 71, 113, 114], but had no independent significance for the prediction target in Caudri et al. [9, 27]. Parental education was used in the predictive model in Caudri et al. [9, 27], but was not collected in the studies in Vial Dupuy et al. [18, 65, 71, 113, 114].

For children aged 0–4 at the first time of having asthma-like symptoms in the primary care setting, Caudri et al. [27] in 2009 built a logistic regression model to predict asthma development at age 7–8. The model achieved a low AUC of 0.74, a low sensitivity of 36 %, a specificity of 91 %, a low positive predictive value of 32 %, and a negative predictive value of 92 %. In comparison, in the year when asthma-like symptoms were first reported, a physician’s diagnosis of asthma had a low sensitivity of 29 %, a specificity of 88 %, a low positive predictive value of 23 %, and a negative predictive value of 91 %. Thus, the model performed better than a physician’s diagnosis of asthma.

Most models (five of six) for predicting asthma development in children in the primary care setting have low accuracy, typically with an AUC much less than 80 %. There is only one exception with unknown performance for the situation of interest to this review. The model built by Eysink et al. [65] achieved an AUC of 0.87 using a case–control design matching IgE-positive children to IgE-negative children. The matching process excluded most children as they were IgE-negative. Since being IgE-positive increases a child’s asthma risk, the matching process greatly inflated the prevalence of future asthma development in the study population. It is unclear how the model would perform in routine clinical practice, where the matching process is absent. On a typical, clinically relevant child population in primary care, we would expect the model built in Eysink et al. [65] to perform worse than that built in van der Mark et al. [71], because the predictors used in the former are roughly a subset of those used in the latter while both models were developed using the same statistical method. The model built in van der Mark et al. [71] achieved a low AUC of 0.73.

Predictive models developed for bronchiolitis patients

Asthma is highly associated with bronchiolitis, a disease primarily of children under age two. Bronchiolitis is inflammation of bronchioles, the smallest air passages in the lungs. In cases of asthma between ages 4 and 5.5, 31 % are heralded by clinically significant bronchiolitis during infancy that incurred an outpatient clinic visit, emergency department visit, and/or hospitalization [115]. By age two, >1/3 of children have experienced clinically significant bronchiolitis [116]. Between 14 and 40 % of these children will eventually be diagnosed with asthma [117, 118], with the risk persisting into adulthood [117, 119–125]. In general, experiencing clinically significant bronchiolitis increases a child’s asthma risk 2–10 times [115, 117, 119–126].

For bronchiolitis patients, various predictors of recurrent wheezing and emerging asthma have been identified in the research literature [39, 69, 84, 119, 124, 127–154]: male gender, race, type of virus causing bronchiolitis, atopic dermatitis, family history of asthma, parental atopy, repeated wheezing at ages 0–1 and 1–2, early sensitization to common food and inhalation allergens, elevated blood eosinophils (blood eosinophilia), low serum vitamin D level, birth length, high birth weight, high weight gain from birth to hospitalization for bronchiolitis, serum eosinophil-derived neurotoxin level at 3 months after hospitalization for respiratory syncytial virus (RSV) bronchiolitis, high maternally derived RSV neutralizing antibody level in cord blood, breastfeeding <3 months, moisture in the home environment, exposure to secondhand smoke, no daycare attendance, exposure to high levels of dog allergen, swimming in chlorinated pools before age two, and the following factors during (RSV) bronchiolitis: elevated IgE values, quantity of RSV-specific IgE produced, high serum eosinophil cationic protein concentration, nasal eosinophil, high CCL5 (previously known as RANTES) level in nasal epithelia, signs of airflow limitation, monocyte interleukin-10 (IL-10) level, creola bodies in the sputum, and low serum level of soluble CD14.

For children at age two previously hospitalized for bronchiolitis during infancy, Mikalsen et al. [93] in 2013 built a logistic regression model to predict asthma diagnosis at age 11. Four predictors collected from a parental questionnaire were used: recurrent wheezing, parental atopy, parental asthma, and atopic dermatitis. The model achieved a low sensitivity of 65 %, a specificity of 82 %, a low positive predictive value around 50 %, and a negative predictive value around 89 %.

Discussion

Existing predictive models for asthma development in children have several limitations. We now describe these limitations and identify several opportunities to improve predictive models for asthma development in children.

Using clinical data

Most existing predictive models for asthma development in children were developed using medical research data collected specifically for the study, typically through a parental questionnaire [71]. Medical research data represent an ideal scenario atypical in practice, as they are much more robust (complete, consistent) than clinical data routinely collected in the electronic medical record in clinical practice. Also, medical research data often include additional variables not routinely collected in clinical practice. To be useful in routine clinical practice, a predictive model for asthma development in children should be developed using clinical data rather than medical research data. Such a model is suitable for implementation in an electronic medical record as a decision support tool.

Making prediction at the right time

Most existing predictive models for asthma development in children make predictions at a time unsuitable for making clinical impact. This reduces these models’ clinical value.

Usually, a physician can use a predictive model for asthma development to facilitate asthma diagnosis and/or prevention only if the child comes to seek medical attention [27]. A patient healthcare visit, ideally for asthma-like symptoms [18], is the best time for the physician to prescribe preventive interventions for asthma and to schedule follow-up visits. However, most existing predictive models for asthma development make predictions outside of a patient healthcare visit, often when the children are at a fixed age [27]. Also, if a child is not having asthma-like symptoms at that time, it would be difficult to motivate the child and his/her parents to comply with preventive interventions and follow-up visits [55]. So far, none of the existing predictive models for asthma development in children works for all types of patient healthcare visits (outpatient clinic visit, emergency department visit, and hospitalization).

Most existing predictive models for asthma development in children were developed for relatively old children, with a median age between two and four. This age is too late for effective application of preventive interventions for asthma. Many preventive interventions are intended to modify the natural course of asthma, particularly to prevent airway remodeling and eosinophilic inflammation. Airway remodeling and eosinophilic inflammation have not occurred in children with asthma-like symptoms before age two, but are already present in asthmatic children at age two [37, 155].

To be useful in routine clinical practice, a predictive model for asthma development in children should make predictions during patient healthcare visits (possibly for asthma-like symptoms) and before children reach age two. Ideally, the model should work for all types of patient healthcare visits. In general, children at high risk for asthma should be identified as early as possible [10, 17, 18, 156]. However, this does not mean that every preventive or treatment intervention for asthma should be started immediately when a child is first predicted to be at high risk for asthma. Yoshihara [37] suggested that starting inhaled corticosteroids before age one is possibly too early and likely to have no effect on the natural history of asthma. Instead, early intervention with anti-inflammatory medications such as inhaled corticosteroids should possibly occur between ages one and three.

Improving prediction accuracy

As mentioned in the introduction, predictive models for asthma development in children are developed to facilitate asthma diagnosis and prevention. Asthma is a non-communicable disease occurring in a minority of children. Medications that can potentially prevent asthma have side effects [55]. It is costly and unethical to give such a medication to a large proportion of children, particularly young children, for asthma prevention if they will not benefit from the medication [40, 55, 156]. The case for other interventions for asthma prevention or treatment is similar.

To be clinically valuable, a predictive model for asthma development in children needs to have both high positive predictive value and high sensitivity [157]. High positive predictive value ensures that a child with high predicted risk is indeed likely to develop asthma. High sensitivity ensures that the model can identify most children who will develop asthma in the future.

As reviewed in the Results section, every existing predictive model for asthma development in children has a low AUC, a low sensitivity, and/or a low positive predictive value, typically all much less than 80 %. At present, no such model can attain accuracy high enough for routine clinical use [39, 71, 84]. It remains an open problem to improve the accuracy of predicting asthma development in children. There are several potential approaches for improving accuracy, including using machine learning methods, using large data sets and exhaustive variable sets, and focusing on a child population with a high prevalence of future asthma development. We now describe these approaches individually.

Using machine learning methods

Most existing predictive models for asthma development in children are based on the statistical method of logistic regression. Except for those described in Chatzimichail et al. [106–111], the other existing predictive models are based on either risk score or combination of risk factors. As is the case with predictive modeling in general, machine learning methods such as support vector machines and random forests often achieve higher prediction accuracy than risk score, combination of risk factors, and logistic regression [158]. It would be interesting to compare various machine learning methods for predicting asthma development in children. Traditionally, risk score, combination of risk factors, and logistic regression have two advantages over machine learning models: easier to use and easier to interpret [159]. Through integration into a decision support tool, machine learning models can be made easy to use. Recently, a new method was developed to automatically explain the prediction results of any machine learning model without losing prediction accuracy [160]. After overcoming the barriers of difficulty in use and model interpretability, machine learning models would have no major disadvantages compared to risk score, combination of risk factors, and logistic regression.

Using large data sets and exhaustive variable sets

With rare exceptions [9, 27, 92], existing predictive models for asthma development in children were developed using small data sets including (typically much) fewer than 2000 children. In general, a predictive model’s accuracy improves as the training data set becomes larger, particularly if the model uses many predictors. By using data of more children to train the predictive models for asthma development in children, we are likely to improve the predictive models’ accuracy.

Many risk factors for asthma development are known in the literature [94, 161–169]. However, with few exceptions [91, 102, 106–111], most existing predictive models for asthma development in children use ≤10 attributes. By using an exhaustive set of variables coupled with a large number of children, we are likely to further improve the predictive models’ accuracy.

Focusing on a child population with a high prevalence of future asthma development

The positive predictive value of a model for predicting development of a disease depends critically on the prevalence of future development of the disease. The model’s positive predictive value improves as the prevalence increases [170, 171]. If the prevalence is low, which is the case for asthma in the general population, the model’s positive predictive value will not be close to 1 even if the model has both high sensitivity and high specificity [171]. This is easy to understand. In the general child population, most children are not prone to develop asthma in the future. Thus, the signal for future asthma development is weak and difficult to detect.
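The dependence of positive predictive value on prevalence follows from Bayes' theorem and can be sketched in a few lines of Python. The function name `positive_predictive_value` and the input values (90 % sensitivity, 90 % specificity, and the three prevalence levels) are illustrative assumptions, not figures from any reviewed study.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """PPV from Bayes' theorem: P(disease | positive prediction)."""
    true_pos = sensitivity * prevalence               # diseased and flagged
    false_pos = (1 - specificity) * (1 - prevalence)  # healthy but flagged
    return true_pos / (true_pos + false_pos)


# Illustrative inputs only: a hypothetical model with 90 % sensitivity and
# 90 % specificity applied to populations with different prevalence.
for prevalence in (0.05, 0.10, 0.30):
    ppv = positive_predictive_value(0.90, 0.90, prevalence)
    print(f"prevalence = {prevalence:.2f} -> PPV = {ppv:.2f}")
```

Under these assumptions, the same hypothetical model has a positive predictive value of about 0.32 at 5 % prevalence, 0.50 at 10 %, and 0.79 at 30 %, which is the motivation for focusing on a higher-prevalence subpopulation.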

To address this issue and improve the predictive model’s positive predictive value, we can focus on a subset of children with a high prevalence of future asthma development rather than apply the model to the general child population [170]. The subset of children experiencing clinically significant bronchiolitis is one such good subset for several reasons. First, as mentioned at the beginning of the “Predictive models developed for bronchiolitis patients” section, this subset of children not only has a high prevalence of future asthma development, but also includes a significant portion of children who will eventually develop asthma. Second, in this subset of children, attributes related to clinically significant bronchiolitis can provide additional information to help improve the prediction accuracy. Third, bronchiolitis mainly occurs before age two. As explained in the “Making prediction at the right time” section, a healthcare visit for bronchiolitis is a good time to predict a child’s risk of developing asthma in the future.

So far, only one model has been developed for predicting which bronchiolitis patients will develop asthma in the future [93]. This model focuses on children at age two previously hospitalized for bronchiolitis during infancy and has two major shortcomings. First, the prediction is made when the child is two years old and outside of a healthcare visit. As explained in the “Making prediction at the right time” section, this is not a good time to make a prediction. Second, among all children experiencing clinically significant bronchiolitis, only ~10 % (3 % of the general child population) are hospitalized for bronchiolitis [115, 116]. Hence, the model can identify only a small portion of children who will eventually develop asthma [115]. This narrow applicability limits the model’s usefulness.

To overcome these two shortcomings, it would be desirable to develop models for children experiencing clinically significant bronchiolitis and predict, during patient healthcare visits for bronchiolitis, which patients will develop asthma in the future. Among all children with clinically significant bronchiolitis, a subgroup analysis based on the type of healthcare visit (outpatient clinic visit, emergency department visit, and hospitalization) could evaluate how models perform on different subgroups of children. In this case, the subgroup of bronchiolitis patients in the emergency department observation unit can be either handled separately or combined into the subgroup of hospitalized patients [115].

As mentioned in Luo et al. [172], to build such predictive models, we should use risk factors for asthma development known in the literature [94, 161–169] rather than only those for bronchiolitis patients. These risk factors include both patient characteristics and environmental factors [173]. As one predictive model does not fit all [103], we should develop separate predictive models for children presenting with bronchiolitis at <6, 6–12, and 13–24 months of age [174]. As boys and girls have different likelihoods of developing asthma, it could be desirable to develop separate predictive models for different genders [175].

Using an appropriate definition of asthma

Different predictive models for asthma development in children used differing asthma definitions and predicted asthma development by various ages. This diversity impacts estimated asthma prevalence rates and the models’ prediction results [176]. At present, there is no consensus on the optimal asthma definition or age cutoff [157].

For developing a predictive model for asthma development in children, we would advocate starting from a conservative asthma definition that ensures the existence of asthma with high likelihood. One such definition is used in Schatz et al. [177]: a patient is considered to have asthma if he/she has (1) at least one ICD-9 diagnosis code of asthma (493.xx) or (2) two or more asthma-related medication dispensings (excluding oral steroids) in a 1-year period, including β-agonists (excluding oral terbutaline), inhaled steroids, other inhaled anti-inflammatory drugs, and oral leukotriene modifiers. Using a conservative asthma definition helps identify the predictors of true asthma and estimate the risk for true asthma. Then, if necessary, we can broaden the scope of this definition in various ways and see how the predictive model performs with different definitions.
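As a minimal sketch, the two criteria above could be applied to patient records roughly as follows. The record layout, the drug-class labels, and the function names are hypothetical. The sketch assumes that each dispensing has already been mapped to a drug class, that oral steroids and oral terbutaline carry non-qualifying labels, and that "in a 1-year period" means two dispensings at most 365 days apart.

```python
from datetime import date

# Hypothetical labels for the qualifying medication classes.
# Oral steroids and oral terbutaline are assumed to carry other labels.
QUALIFYING_CLASSES = {
    "beta_agonist",
    "inhaled_steroid",
    "other_inhaled_anti_inflammatory",
    "oral_leukotriene_modifier",
}


def has_asthma_code(icd9_codes):
    """Criterion (1): at least one ICD-9 asthma diagnosis code (493.xx)."""
    return any(code.strip().startswith("493") for code in icd9_codes)


def has_repeated_dispensing(dispensings, window_days=365):
    """Criterion (2): two or more qualifying dispensings within one year.

    `dispensings` is a list of (date, drug_class) pairs. It suffices to
    compare neighbours in sorted order, since the closest pair is adjacent.
    """
    dates = sorted(d for d, drug_class in dispensings
                   if drug_class in QUALIFYING_CLASSES)
    return any((later - earlier).days <= window_days
               for earlier, later in zip(dates, dates[1:]))


def meets_conservative_definition(icd9_codes, dispensings):
    """True if either criterion of the conservative definition holds."""
    return has_asthma_code(icd9_codes) or has_repeated_dispensing(dispensings)


# Made-up example: no asthma code, but two qualifying dispensings in 5 months.
example = meets_conservative_definition(
    ["466.19"],
    [(date(2010, 1, 1), "beta_agonist"), (date(2010, 6, 1), "inhaled_steroid")],
)
print(example)
```

Broadening the definition, as suggested above, would then amount to changing the set of qualifying classes or the window length and re-evaluating the model.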

A child who will ever develop asthma can benefit from both timely asthma diagnosis and preventive interventions for asthma, even if he/she may outgrow his/her asthma later in life [60, 178]. Hence, we would advocate defining the prediction target (i.e., the dependent variable) as ever development of asthma by a certain age rather than active asthma at a certain age. To help select an appropriate cutoff age for asthma development, we can plot the cumulative rate of ever development of asthma vs. age [8, 16, 167]. The age at which the cumulative rate of ever development of asthma starts to level off can be an appropriate cutoff point, as it ensures including most children who will ever develop asthma.
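One way to make "starts to level off" operational is to look for the first age after which the cumulative rate rises by no more than a small amount per year. The sketch below takes that approach. The threshold of 0.01 per year and the example curve are hypothetical and would need to be chosen for the data at hand.

```python
def leveling_off_age(ages, cumulative_rates, max_yearly_increase=0.01):
    """Return the first age from which the curve stays nearly flat.

    `ages` must be strictly increasing. The curve counts as nearly flat
    from an age onward if every later interval rises by at most
    `max_yearly_increase` per year. Returns None if that never happens.
    """
    if len(ages) != len(cumulative_rates) or len(ages) < 2:
        raise ValueError("need at least two matching (age, rate) points")
    points = list(zip(ages, cumulative_rates))
    # Slope of the cumulative curve over each interval between ages.
    slopes = [(r1 - r0) / (a1 - a0)
              for (a0, r0), (a1, r1) in zip(points, points[1:])]
    for i in range(len(slopes)):
        if all(s <= max_yearly_increase for s in slopes[i:]):
            return ages[i]
    return None


# Made-up cumulative rates of ever development of asthma by age (years).
ages = [1, 2, 3, 4, 5, 6, 7, 8]
rates = [0.02, 0.05, 0.08, 0.10, 0.115, 0.12, 0.122, 0.123]
cutoff = leveling_off_age(ages, rates)
print(cutoff)
```

A visual check of the plotted curve remains advisable, since a threshold-based rule can be sensitive to noise in the estimated rates.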

Main findings

Substantial effort has been invested in predictive models for asthma development in children. Although considerable progress has been made, much remains to be done for these models to be useful in clinical practice. We have identified several limitations and open problems in predictive modeling for asthma development in children. In particular, prediction accuracy is inadequate. We have provided some preliminary thoughts on how to address these limitations and open problems. This establishes a foundation for future research on this topic.

So far, no study has deployed a predictive model for asthma development in children in clinical practice and demonstrated the model’s impact on clinicians’ behavior and clinical outcomes [157]. It would be desirable to develop an accurate predictive model and then deploy it in clinical practice to measure its clinical impact, beginning at a single institution and later expanding to multiple institutions. This is essential for ensuring the model’s generalizability and for the model to be widely accepted by clinicians.

Limitations

This systematic review has several limitations. First, by excluding articles not written in English, we may have missed predictive models for asthma development in children published in other languages. Second, there may be other predictive models for asthma development in children that have never been published and hence are missed in this systematic review. Third, few studies directly compare predictive models on the same child population. Performance metrics such as the AUC should not be used to directly compare predictive models across different child populations. Fourth, there is no clear gold standard for the prediction target of asthma development in children. Even if the approach described in the “Using an appropriate definition of asthma” section is used to define the prediction target, the resulting definition would still be imperfect. For instance, no existing method can tell exactly which children under five have asthma, as asthma is a subjective, clinical diagnosis in this age group [14, 64, 65]. Without a gold standard definition of asthma development, it is difficult to compare the performance of different predictive models. Thus, investigation of and consensus on the appropriate definition of asthma development are needed for future efforts on developing new predictive models to be clinically and widely meaningful.

Conclusions

We systematically reviewed the literature on predictive models for asthma development in children. Existing models have several limitations. In particular, the prediction accuracy of every existing model is inadequate for clinical use. Future studies will need to address these limitations to achieve optimal predictive models. More specifically, to be useful in routine clinical practice, a good predictive model should use clinical data, make predictions at a time suitable for making clinical impact, have high accuracy, and use an appropriate definition of asthma.