The results of our study show that the agreement between Rome III and Rome IV criteria in diagnosing a child with an FGID is minimal (κ = 0.34; 95% CI 0.16 to 0.53). In our study sample, none of the children fulfilled Rome III criteria for functional dyspepsia, whereas 11.5% fulfilled Rome IV criteria for functional dyspepsia. In addition, significantly fewer children were diagnosed with functional constipation when using the Rome IV criteria (Rome III 31.3% vs Rome IV 13.5%, p<0.001).
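Overall agreement of this kind can be computed from a 2×2 cross-tabulation of each child's classification (any FGID: yes or no) under the two sets of criteria. What follows is a minimal Python sketch, not the analysis actually used in this study. It assumes binary classifications and uses the common large-sample approximation for the standard error of κ. The counts are hypothetical, and the function names `cohen_kappa_2x2` and `mcnemar_exact_p` are illustrative. The text does not state which test produced p<0.001 for functional constipation, so the exact McNemar test is included only as an assumed example of one standard choice for comparing paired proportions.

```python
import math


def cohen_kappa_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float, float]:
    """Cohen's kappa for two binary classifications of the same children.

    a: positive under both criteria sets (Rome III+ and Rome IV+)
    b: positive under the first set only (Rome III+, Rome IV-)
    c: positive under the second set only (Rome III-, Rome IV+)
    d: negative under both
    Returns (kappa, lower 95% bound, upper 95% bound). The bounds use the
    simple large-sample standard error approximation (an assumption here).
    """
    n = a + b + c + d
    p_observed = (a + d) / n
    # Marginal prevalence under each set of criteria.
    p_first = (a + b) / n
    p_second = (a + c) / n
    # Agreement expected by chance, given only the marginal prevalences.
    p_expected = p_first * p_second + (1 - p_first) * (1 - p_second)
    kappa = (p_observed - p_expected) / (1 - p_expected)
    se = math.sqrt(p_observed * (1 - p_observed) / (n * (1 - p_expected) ** 2))
    return kappa, kappa - 1.96 * se, kappa + 1.96 * se


def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant cells b and c.

    Assumed test choice for illustration; only discordant pairs are informative.
    """
    n_discordant = b + c
    if n_discordant == 0:
        return 1.0
    k = min(b, c)
    # Binomial(n_discordant, 0.5) lower tail, doubled for a two-sided test.
    tail = sum(math.comb(n_discordant, i) for i in range(k + 1)) / 2 ** n_discordant
    return min(1.0, 2 * tail)


# Hypothetical counts for 100 children; these are NOT the study's data.
kappa, low, high = cohen_kappa_2x2(a=20, b=10, c=5, d=65)
print(f"kappa = {kappa:.3f} (95% CI {low:.3f} to {high:.3f})")
print(f"McNemar exact p = {mcnemar_exact_p(b=10, c=5):.3f}")
```

With these hypothetical counts, the sketch gives κ = 0.625 (95% CI approximately 0.45 to 0.80) and an exact McNemar p of about 0.30. Other standard-error formulas or bootstrap methods would give somewhat different intervals.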
The minimal overall agreement in the diagnosis of FGIDs may be the result of changes in diagnostic criteria between Rome III and Rome IV. A study performed in Colombia, which compared the prevalence of FGIDs according to Rome IV with a previous study using Rome III, also found a lower prevalence of FGIDs according to Rome IV [7].
Consistent with the findings of that study, we found a large difference in the prevalence of functional dyspepsia, likely related to changes in diagnostic criteria [7]. In contrast with the Rome III criteria, the Rome IV criteria do not require patients to describe pain as the predominant symptom, and they introduce two subtypes of functional dyspepsia: epigastric pain syndrome and postprandial distress syndrome. Epigastric pain syndrome is characterized by epigastric pain that is not modified by bowel movements or flatus, whereas postprandial distress syndrome includes children with bothersome postprandial fullness or early satiation that prevents them from finishing a regular meal. Based on this, children diagnosed with functional dyspepsia under Rome III would likely fulfill the criteria for epigastric pain syndrome under Rome IV [14]. As all children in our study who were diagnosed with functional dyspepsia according to Rome IV criteria had the postprandial distress syndrome subtype, it is not surprising that none of them fulfilled the Rome III criteria for functional dyspepsia.
However, the lower prevalence of functional constipation according to Rome IV found in the current study is unlikely to have been caused by changes in the diagnostic criteria. The only change made in the Rome IV criteria was shortening the required duration of symptoms from 2 months (Rome III) to 1 month. Because any child whose symptoms meet the 2-month requirement also meets the 1-month requirement, this change should have resulted in a similar or higher number of children fulfilling the Rome IV criteria for functional constipation. In contrast to our study, an Italian study on the intra-rater agreement of Rome III and Rome IV criteria found no difference in prevalence (Rome III 22%; Rome IV 21%) and good agreement between both sets of criteria (calculated κ = 0.71) [15]. In the Italian study, children attending a medical consultation, or their parents, completed both questionnaires consecutively within 10 minutes with the help of a research assistant. The difference in agreement between the two studies may reflect differences in study samples, but differences in study methods may also explain the better agreement found in the Italian study. First, patients with gastrointestinal complaints severe enough to consult a pediatrician may be more focused on both their symptoms and the questionnaire than children surveyed at school. Second, children completing both questionnaires consecutively are more likely to remember the answer they gave to the same question a few minutes earlier. Third, in the Italian study, a research assistant helped children and their parents complete both questionnaires. Understanding whether the help of the research assistant or of the parents was key to obtaining better agreement could be instrumental in recommending the use of questionnaires in children for clinical or research purposes.
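Similar prevalence under two sets of criteria does not by itself imply good agreement, because the same marginal prevalences are compatible with very different overlaps at the level of individual children. The following minimal Python sketch assumes that only the two reported prevalences (22% and 21%) are known and that the cross-tabulation is not. It computes the range of κ those marginals allow. The function names `kappa_from_overlap` and `kappa_range` are illustrative, and the calculation is not taken from the Italian study.

```python
def kappa_from_overlap(p_first: float, p_second: float, p_both: float) -> float:
    """Cohen's kappa for two binary classifications, given the two marginal
    prevalences and the proportion of children positive under both."""
    p_neither = 1 - p_first - p_second + p_both
    p_observed = p_both + p_neither
    # Chance-expected agreement depends only on the marginals.
    p_expected = p_first * p_second + (1 - p_first) * (1 - p_second)
    return (p_observed - p_expected) / (1 - p_expected)


def kappa_range(p_first: float, p_second: float) -> tuple[float, float]:
    """Smallest and largest kappa compatible with the given prevalences.

    Kappa increases with the overlap p_both, which can be no larger than the
    smaller prevalence and no smaller than max(0, p_first + p_second - 1).
    """
    lowest_overlap = max(0.0, p_first + p_second - 1)
    highest_overlap = min(p_first, p_second)
    return (kappa_from_overlap(p_first, p_second, lowest_overlap),
            kappa_from_overlap(p_first, p_second, highest_overlap))


# Prevalences reported for the Italian study: Rome III 22%, Rome IV 21%.
low, high = kappa_range(0.22, 0.21)
print(f"kappa could range from {low:.2f} to {high:.2f}")
```

With these prevalences, κ could lie anywhere between about −0.27 and 0.97. The reported κ of 0.71 therefore carries information about individual-level agreement that the prevalence figures alone do not.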
An alternative explanation for the minimal agreement found in our study could be that questionnaires may not be the best instrument to measure the presence of an FGID. The use of questionnaires to diagnose FGIDs in children may in itself result in unreliable measurements and low levels of agreement. Of the children completing all questionnaires, 39/135 (29%) did not answer the questions as instructed and were therefore excluded from the analysis; this alone calls the reliability of the questionnaire into question. These children may not have followed the instructions because they misunderstood them, because of insufficient reading comprehension, or because they did not pay attention to them. A previous study by van Tilburg et al. assessed the intra-rater reliability of FGID diagnoses in 18 children using the QPGS-RIII [16]. Children completed the questionnaire during their outpatient pediatric gastroenterology clinic visit and again at home within 2 weeks. The authors found kappa values ranging from 0.22 to 0.78, although they noted that, given the low number of cases, these results should be considered preliminary. They also reported low agreement (kappa values ranging from −0.10 to 0.34) between child and physician diagnoses. This raises the question of whether questionnaires are a reliable tool to diagnose FGIDs in children. Children may simply not be interested in answering the questions on a questionnaire, or may get bored along the way and answer randomly. In line with this, Ozgenc et al., who administered the QPGS-III to 48 children in face-to-face interviews twice within a 2-week interval, reported high intra-rater reliability (kappa values ranging from 0.86 to 0.99) [17]. However, in our study, excluding the questionnaires of children who did not comply with the instructions should have partly reduced the possible bias caused by children who answered questions randomly.
Another reason for the low levels of agreement we found may be that children have relatively poor recall of symptoms [18, 19]. Since our population consisted of apparently healthy school-going children, they may not have paid attention to their (possible) symptoms, which could have led them to report different symptoms on each questionnaire.
Strengths of this study include its novelty and the assessment of adequate questionnaire completion, an aspect that has not previously been reported in children completing questionnaires to diagnose FGIDs. Moreover, our results are based on children's self-completion of questionnaires outside a medical setting, which limits parental bias as well as selection bias. However, multiple limitations should be considered. Our study included a relatively small sample with a relatively high level of attrition, consisting of Spanish-speaking children within a specific age range (11–18 years old) at one public school in Cali, Colombia. Therefore, these results cannot be generalized to all age groups, languages, or other geographic areas. In addition, children were not formally evaluated for organic disease, and consequently, some of the diagnoses may be inaccurate. Furthermore, we were not able to compare the prevalence or rates of agreement for diagnoses that were not prevalent in our study sample according to Rome III and/or Rome IV criteria (e.g., aerophagia and IBS-diarrhea). Because of the low prevalence of individual FGIDs and our small sample size, the reported agreement for individual FGIDs must be interpreted with caution and should be considered preliminary. Still, these data may be valuable for conceptualizing the problem and for guiding sample size calculations in future studies.
In conclusion, we found minimal overall agreement in diagnosing FGIDs according to Rome III and Rome IV criteria. The largest differences in prevalence were seen in the diagnoses of functional constipation and functional dyspepsia. This may be partly explained by the changes in diagnostic criteria. However, the limitations of using questionnaires to measure prevalence have to be taken into account. We believe these results highlight the need for research on the reliability and validity of self-reported questionnaires in studies of pediatric FGIDs.