Congenital hypothyroidism (CH) is a treatable disorder, and the major goal of treatment is normal cognitive development. Even after 40 y of experience, several questions about optimal treatment remain: how fT4 and TSH values should be evaluated (1), to what extent over-treatment (OT) and under-treatment (UT) impair normal cognitive development, how these conditions should be defined, and what the optimal dosages of levothyroxine (L-T4) are. Recent data showed that high-dose L-T4 treatment may lead to a lower cognitive outcome at 14 y in both mild and severe CH (2). In contrast, several longitudinal studies from the 1990s showed that low L-T4 dosages may lead to a normal cognitive outcome despite late initiation of therapy and long periods of TSH elevation (3,4,5). In a recent study, we found that OT during the first 24 mo is a greater risk for an adverse cognitive outcome at age 11 than UT (6). In that study and a previous one, we also demonstrated that in CH, fT4 is a better biomarker of euthyroidism than TSH (6,7). Treatment can be individualized by determining fT4 and TSH steady-state concentrations (SSCs), which provide individual target values and ranges (6,7). In the present study, we investigated whether individual fT4 SSCs are more useful than age-specific fT4 and TSH reference ranges for evaluating follow-up values. We also evaluated the effect of OT during the first 5 mo, the period of the maximal cortical growth spurt, on cognitive outcome at ages 1.8, 6, and 11 y. Finally, we analyzed the impact of high initial L-T4 dosages on initial fT4 concentrations.


Sample Classification by Individual fT4 SSC Range and Age-Specific Reference Ranges

The 95% age-specific reference ranges for fT4 and TSH were 14.4–30.2 pmol/l and 0.42–6.02 mU/l, respectively. The distribution of the 1,170 follow-up samples from the period 0–24 mo over the categories euthyroid, OT, and UT differed considerably among the three classification methods (P < 0.001): 771, 219, and 180 samples, respectively, relative to the fT4 SSCs; 957, 110, and 103 relative to the age-specific fT4 reference range; and 481, 130, and 559 relative to the age-specific TSH reference range. Figure 1 shows the OT and UT follow-up samples, classified relative to the individual fT4 SSC range, against the fT4 and TSH age-specific reference ranges. The fT4 and TSH age-specific reference ranges missed 50 and 41%, respectively, of the OT episodes identified by the fT4 SSCs. Consequently, the reference ranges also identified fewer over-treated patients than the fT4 SSCs did. Group OT/Over (Table 1) contained 38 patients and group OT/No Over 11 patients; the fT4 age-specific reference range method thus missed 11 of the 49 over-treated patients (22%). In the repeated measurements ANOVA (Table 1), the interaction term age*fT4 SSC was significant (P = 0.007), whereas the interaction term age*fT4 age-specific reference range was not (P = 0.190). The groups OT/Over and OT/No Over had lower intelligence quotients at age 11 (IQ11s) and significantly larger decreases in development scores over the period 1.8–11 y than the groups No OT/Over and No OT/No Over.
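The sample counts reported above are easier to compare when expressed as percentages of the 1,170 samples. The short Python sketch below only re-expresses numbers already given in the text; the dictionary `COUNTS` and the helper functions are illustrative names, not part of the study's analysis. It does not reproduce the episode-level 50% and 41% figures, which require sample-by-sample data.

```python
# Reported numbers of follow-up samples (euthyroid, OT, UT), period 0-24 mo.
COUNTS = {
    "fT4 SSC": (771, 219, 180),
    "fT4 age-specific range": (957, 110, 103),
    "TSH age-specific range": (481, 130, 559),
}


def percentages(counts):
    """Express each category as a percentage of the method's total, to 0.1%."""
    total = sum(counts)
    return tuple(round(100 * c / total, 1) for c in counts)


def missed_percentage(missed, flagged_by_ssc):
    """Percentage of SSC-identified cases that a comparator method did not flag."""
    return round(100 * missed / flagged_by_ssc, 1)


for method, counts in COUNTS.items():
    assert sum(counts) == 1170  # all three methods classify the same samples
    print(f"{method}: {percentages(counts)}")

# Patient level: 11 OT/No Over patients out of 38 + 11 patients over-treated by the SSC method.
print(missed_percentage(11, 38 + 11))
```

The fT4 SSC method labels 18.7% of the samples as OT, against 9.4% for the age-specific fT4 range and 11.1% for the TSH range; the patient-level figure, 22.4%, rounds to the 22% quoted above.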

Figure 1

Visualization of differences in sample classification between the steady-state concentration (SSC) method and the age-specific reference range method. Symbols represent follow-up fT4 and TSH determinations in CH patients during the period 0–24 mo after initiation of therapy (1,170 blood samples in total). Euthyroid determinations are omitted for clarity: plotted are only the determinations classified, relative to the fT4 SSCs, as over-treated (OT, crosses) or under-treated (UT, open circles), defined as fT4 concentrations above the individual +2 SD limit or below the individual −2 SD limit, respectively. Horizontal (log TSH) and vertical (fT4) solid lines represent the age-specific reference limits, based on a cohort of 208 healthy infants aged 9 d–24 mo. The striking feature of Figure 1 is the large number of high TSH values among the OT samples, i.e., the failure of high fT4 levels to suppress TSH, illustrating that feedback suppression of TSH by high fT4 levels is dysfunctional in CH during the first 24 mo. Only 34% (n = 51) of the 152 OT samples with a TSH value > −2 SD originated from the initial period.


Table 1 Effect of two different sample classification methods (SSC method and age-specific reference range method) on eventual cognitive outcome (mean (SEM))*,a

Influence of Over- and Under-Treatment on Development Scores

The characteristics of the three OT 0–5 mo groups are shown in Table 2. Mean fT4 concentration and total OT duration were higher in the severe OT group than in the other two groups. The analysis of development scores with the groups OT/UT 0–5 mo and OT/UT 6 mo to 6 y, and their interactions with age, as predictors (Table 3) showed lower IQ11 scores for the five OT groups than for the No OT group. In contrast to the No OT group, all five OT groups showed decreasing cognitive scores over the period 1.8–11 y; the decreases were statistically significant in the two severe OT 0–5 mo groups and in the group with mild OT 0–5 mo combined with OT 6 mo to 6 y. The lowest IQ11 was found for severe OT 0–5 mo without OT 6 mo to 6 y (73.1 (13.6)), the highest for no OT in the period 0–6 y (112.5 (15.4)). Patients with severe OT 0–5 mo had a 5.5-fold higher risk (95% CI: 1.63–18.21) of an IQ11 < 85 than the other patients (P = 0.006). OT 0–5 mo was associated with the decrease in development scores over the period 1.8–11 y (r = 0.433, P < 0.001), and OT 6 mo to 6 y with the decrease over the period 6–11 y (r = 0.340, P = 0.007) (Table 3). The same analysis for UT (Table 3) showed differences in IQ11 scores and significant changes in development scores from 1.8 to 11 y, but these were not related to the severity of UT 0–5 mo. At testing at ages 1.8, 6, and 11 y, 5, 3, and 1 patients, respectively, were hypothyroid relative to the SSC method, and 1, 5, and 2 patients, respectively, were hyperthyroid. Thyroid status at testing did not significantly influence overall test results (data not shown).

Table 2 Characteristics of the OT 0–5 mo groups (mean (SD))
Table 3 Mean development scores (SEM) at three ages and mean changes in development scores from 1.8 to 11 y (SEM) (95% CI) for the OT and UT groups 0–5 mo and 6 mo to 6 y*

Effect of Initial High and Low L-T4 Dosages on fT4 Concentrations

Figure 2 illustrates the consequences of high and low initial L-T4 substitution for fT4 concentrations in the initial period. With low initial dosages, fT4 concentrations in most cases rose slowly to the individual SSC range; with high initial dosages, they often exceeded this range. In the first three follow-up samples (n = 183), the risk of OT was 3.7-fold higher (95% CI: 1.4–9.3; P = 0.007) for initial L-T4 dosages > 10 µg/kg than for dosages ≤ 10 µg/kg. Free T4 concentrations in these samples, collected at 4.4 (2.1), 9.2 (3.9), and 17.0 (10.3) days after therapy initiation, were positively related to the initial L-T4 dose (P < 0.001) and negatively related to the minimal initial fT4 deficit (P < 0.001). Table 4 shows the mean fT4 concentrations (in SDS relative to the individual SSCs, with 95% CI) obtained by substituting ascending values of initial L-T4 dose (µg/kg) and minimal initial fT4 deficit (pmol/l) in the regression equation fT4 (SDS) = −2.30 + 0.66 × initial L-T4 dose − 0.33 × minimal initial fT4 deficit. The combinations of initial L-T4 dosage and minimal initial fT4 deficit that may lead to initial over-treatment, i.e., to fT4 values > +2 SDS relative to the individual SSCs, are printed in bold.
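The regression equation above can be evaluated directly to see which dose and deficit combinations push the predicted mean fT4 above +2 SDS. The Python sketch below is a minimal illustration assuming only the three coefficients quoted in the text; the function names are hypothetical, the result is a point estimate that ignores the 95% CI, and it is not a dosing tool. The published Table 4 remains the reference.

```python
# Coefficients of the reported regression equation:
# fT4 (SDS) = -2.30 + 0.66 * initial L-T4 dose - 0.33 * minimal initial fT4 deficit
INTERCEPT = -2.30
DOSE_COEF = 0.66       # SDS per µg/kg initial L-T4 dose
DEFICIT_COEF = -0.33   # SDS per pmol/l minimal initial fT4 deficit
OT_LIMIT_SDS = 2.0     # fT4 above +2 SDS of the individual SSC = over-treatment


def predicted_ft4_sds(dose_ug_per_kg: float, deficit_pmol_l: float) -> float:
    """Predicted mean fT4 (SDS relative to the individual SSC) of the first three samples."""
    return INTERCEPT + DOSE_COEF * dose_ug_per_kg + DEFICIT_COEF * deficit_pmol_l


def predicts_overtreatment(dose_ug_per_kg: float, deficit_pmol_l: float) -> bool:
    """True when the predicted mean fT4 lies above +2 SDS (point estimate only)."""
    return predicted_ft4_sds(dose_ug_per_kg, deficit_pmol_l) > OT_LIMIT_SDS


def dose_at_ot_limit(deficit_pmol_l: float) -> float:
    """Initial dose (µg/kg) at which the predicted mean fT4 reaches exactly +2 SDS."""
    return (OT_LIMIT_SDS - INTERCEPT - DEFICIT_COEF * deficit_pmol_l) / DOSE_COEF


for deficit in (0, 5, 10):
    print(deficit, round(dose_at_ot_limit(deficit), 1))
```

For minimal initial fT4 deficits of 0, 5, and 10 pmol/l, the predicted mean reaches +2 SDS at about 6.5, 9.0, and 11.5 µg/kg, respectively. This is an arithmetic consequence of the point estimates only.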

Figure 2

Examples of fT4 concentrations after low and high initial L-T4 dosages. Four individual initial fT4 curves are shown: two cases with low initial L-T4 dosages and without initial OT (panel a: mild CH, initial L-T4 dose 7.2 µg/kg; panel b: severe CH, initial L-T4 dose 6.0 µg/kg) and two cases with high initial L-T4 dosages and with initial OT (panel c: mild CH, initial L-T4 dose 10.3 µg/kg; panel d: severe CH, initial L-T4 dose 11.2 µg/kg). Black lines: individual fT4 concentrations; gray lines: mean individual fT4 SSC ± 2 SD, determined from fT4 control values after the initial period (i.e., > 1.5 mo until 11 y). SSC, steady-state concentration.


Table 4 Free T4 concentrations (in SDS) of the first three follow-up samples, at 4.4, 9.2, and 17.0 days, by initial L-T4 dose and minimal initial fT4 deficit^a


The results of our study suggest that in CH treatment, adequate detection and avoidance of over-treatment are of great importance, and that this is best achieved with fT4 SSCs. In our study, the TSH and fT4 age-specific reference ranges recommended in the recent ESPE guidelines (1) missed 41 and 50%, respectively, of the over-treatment episodes identified by the fT4 SSCs, and the fT4 age-specific reference range missed 22% of the over-treated patients, because reference ranges are too wide to signal all over-treatment episodes (Figure 1). This had consequences for eventual cognitive outcome. The 11 over-treated patients missed by the fT4 age-specific reference range had significantly decreasing development scores from 1.8 to 11 y, suggesting real over-treatment, whereas the three patients classified as over-treated by the fT4 age-specific reference range but not by the SSCs, as well as the nine patients classified by both methods as not over-treated, did not show decreasing development scores in this period. We conclude that reference ranges in CH, with only two fixed values for the upper and lower limit, cannot properly distinguish between normal and abnormal values for the individual patient, whatever limits are chosen. Our data indicate that severe over-treatment during the first 5 mo leads to normal development scores at 1.8 and 6 y, but to considerably lower IQs at 11 y. Patients with severe over-treatment 0–5 mo had a 5.5-fold higher risk of an IQ11 < 85 than those with mild or no over-treatment 0–5 mo. Over-treatment in the period 6 mo to 6 y seemed to contribute to the IQ decrease from 6 to 11 y. For under-treatment, no negative effect on cognitive development scores could be established. Patients treated with initial L-T4 dosages > 10 µg/kg had a 3.7-fold higher risk of over-treatment within the first 17 d than those treated with lower initial dosages.
These data suggest that during the first neonatal months, CNS development is very sensitive to over-treatment with thyroid hormone, more so than to later over-treatment. Animal models (8,9,10,11,12) show that early over-treatment may enhance neural growth and development, but that continued over-treatment may even lead to an irreversible arrest of neural development. A similar pattern was seen in our severely over-treated patients: at age 1.8 y they had normal to above-normal development scores, but their scores decreased thereafter. In contrast, under-treatment in this period may lead to delayed but reversible neural development with later catch-up (13). In CH, initial low-dose treatment seems preferable to initial high-dose treatment, contrary to the advice in most recent studies (14,15,16,17,18) and in the ESPE guidelines (1). A slow fT4 rise with under-treatment seems preferable to a fast rise with over-treatment (Table 4), because we previously demonstrated that fast TSH normalization does not lead to higher IQ11s (6). Low-dose treatment allows fT4 to rise gradually to its SSC value and to remain at that level, provided that medication is not substantially changed. A further advantage of initial low-dose treatment is that the individual fT4 SSC can be estimated during the initial period (7). This is not possible with initial high-dose treatment, because fT4 values are often not yet stable during that period. Initial high-dose treatment was introduced in the 1980s to shorten the period of neonatal hypothyroidism (16), and L-T4 doses between 10 and 14 µg/kg were considered safe and effective for this purpose (17). The normal to above-normal cognitive development scores at young age in CH infants treated with higher L-T4 doses, found by us as well as by others (14,15,18,19), strengthened the conviction, also emphasized in the ESPE guidelines (1), that high L-T4 dosages and maintenance of TSH in the normal range are necessary for a normal eventual outcome.
The sharp decline in IQ from 6 to 11 y in our study is unusual and therefore unexpected, because in healthy subjects IQ at age 6 is a good predictor of IQ at ages 11 and 13 (20,21). Rovet's longitudinal study, showing decreasing development scores in CH children from infancy to adolescence (22), supports our data. A consequence of cognitive scores changing over time is that no meaningful conclusions on eventual cognitive outcome can be drawn from data that do not include adolescent patients (14,15,18,19). The results of the only cross-sectional study of adolescent CH patients published so far (2), in which patients were treated according to recent guidelines with low TSH values during the first year (mean 1.0 mU/l) and had lower cognitive outcomes at age 14 irrespective of the type of CH, are consistent with our results. Moreover, the longitudinal studies from the 1990s (3,4,5) showing normal cognitive outcomes in patients initially treated with low doses, despite long periods of under-treatment and late initiation of therapy, support our finding of normal cognitive outcomes in the patients who were not over-treated. These studies also demonstrate that low initial L-T4 dosages can result in normal adolescent IQs.

The most important issues in CH treatment are early initiation of therapy and avoidance of over-treatment. This requires frequent monitoring, L-T4 dose adaptation, and correct interpretation of fT4 and TSH concentrations. We recommend starting with a low L-T4 dose that takes the estimated minimal initial fT4 deficit into account (Table 4), thus minimizing the risk of over-treatment, i.e., an fT4 concentration > +2 SDS, and taking a first follow-up sample at day 3 or 4. The L-T4 dose can then be reduced if the initial increase in fT4 is such that a rise above 30 pmol/l can be foreseen over the next days. We advise collecting follow-up samples in the period 10–60 d at 10–14 d intervals. Once fT4 has stabilized, the fT4 SSC can be estimated, even when TSH is still elevated, providing a target value to aim at (7). With five samples, the mean fT4 SSC can be estimated with its 95% CI at a precision level of 90% (7). During further follow-up, as more fT4 determinations valid for the assessment of the fT4 SSC (i.e., within the TSH interval 0.5–10 mU/l) become available, the individual fT4 SSC range can be established. Such a range is crucial for signaling over- and under-treatment. Correct assessment of thyroid status by fT4 is even more important in CH than in healthy subjects, because in CH TSH is less predictable (6,7) (Figure 1) and thus less informative about thyroid status than in the normal situation. TSH SSCs are less well defined, with a larger variation around the individual mean SSC (Table 5), and signal over-treatment less well than fT4 SSCs. Therefore, fT4, rather than TSH, should be the main factor guiding sample evaluation and therapy decisions. Two-thirds of our CH patients had a TSH SSC above the normal range, i.e., > 2.5 mU/l (7). This means that for the majority of them, the lower TSH limit will be above 0.05 mU/l, the lower limit given in the ESPE guidelines (1).
We therefore recommend keeping TSH in the upper half of the TSH SSC range (3.3–5.9 mU/l) in order to avoid over-treatment.
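The sampling advice above can be written down as a simple schedule plus a check on how many determinations are valid for the SSC. The Python sketch below is a hypothetical illustration: the function names are invented, the 12-day default interval is an assumption chosen from the recommended 10–14 d, and the dose-reduction rule (a foreseen rise above 30 pmol/l) is not modeled because the text does not specify how the rise is projected.

```python
def follow_up_days(first_day: int = 4, start: int = 10, end: int = 60, interval: int = 12) -> list[int]:
    """Hypothetical sampling days after therapy start: a first sample on day 3 or 4,
    then samples every 10-14 days from day 10 up to day 60 (12 d is an assumed default)."""
    if first_day not in (3, 4):
        raise ValueError("first follow-up sample should be taken on day 3 or 4")
    if not 10 <= interval <= 14:
        raise ValueError("interval should be 10-14 days")
    return [first_day] + list(range(start, end + 1, interval))


def valid_for_ssc(tsh_mu_l: float) -> bool:
    """A determination counts toward the fT4 SSC only if TSH is within 0.5-10 mU/l."""
    return 0.5 <= tsh_mu_l <= 10


def enough_for_ssc(tsh_values: list[float], minimum: int = 5) -> bool:
    """True once at least `minimum` valid determinations are available (five in the text)."""
    return sum(valid_for_ssc(t) for t in tsh_values) >= minimum


print(follow_up_days())  # [4, 10, 22, 34, 46, 58]
```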

A limitation of our study is that fT4 was measured in 27 different laboratories. The adult reference ranges of these laboratories varied, but the differences were so small that we considered them acceptable. Furthermore, the patients' fT4 values, which with a few exceptions were determined in their own local laboratory throughout the study, were expressed as SDS relative to the individual SSCs, making them independent of the fT4 assay used. The various fT4 assays therefore did not change the conclusions. Another limitation is the limited sample size in a cohort with various treatment schedules, resulting in small treatment groups. The diversity of treatment schedules, however, allowed a comparison of the effects of low and high treatment on cognitive outcome at age 11, which would not have been possible in a cohort with a single treatment protocol. A future cohort study with more CH patients, treated according to different protocols and repeatedly tested psychologically until adulthood, might reveal the optimal treatment protocol. The feasibility of such a study, however, is uncertain in view of ethical concerns.



The study group consisted of 61 children (19 boys) with CH, born between February 1993 and July 1996 and followed from birth until 11 y; this is the same cohort described previously (6,7,23,24). The study was approved by the privacy committee of the CH Screening Board and by the Erasmus MC review board. The original cohort, recruited through questionnaires sent by the national CH coordination office to all local pediatricians, consisted of 98 patients, of whom 71 were selected and 27 were excluded: four infants of mothers with known thyroid abnormalities; five infants at risk of abnormal psychomotor development (asphyxia, meningitis, chromosomal abnormality, severe hypotonia syndrome, prematurity); one infant who died; and 17 patients with mild CH, because testing capacity was limited and we expected more developmental abnormalities in severe than in mild CH (6,23,24). After written informed parental consent, the study group consisted of 27 children with severe CH and 34 with mild CH. Severe CH was defined as total inability to produce T4, mild CH as partial inability (6,8). All patients were psychologically tested at 1.8 (0.4) years (mean (SD)), 46 at 6.0 (0.6) years, and 55 at 11.0 (0.7) years. Reasons for not re-examining all patients at ages 6 and 11 y have been described previously (6,9).

Treatment and Monitoring

The local pediatrician made all decisions on the treatment regimen. Until 1991, an initial daily L-T4 dose of 5–6 µg/kg with an upper limit for T4 of 180 nmol/l was recommended; in 1992, the recommendation changed to 9–12.5 µg/kg for the initial L-T4 dose and 35 pmol/l for the upper limit of fT4. Follow-up data on fT4, TSH, and L-T4 dosages (n = 2,524) were collected. Only 1.1% of the samples suggested poor adherence to medication. On average, 31 (range 18–78) control measurements per patient were performed in the period 0–6 y, of which 10 (range 5–17) were performed in the period 0–5 mo. Two groups were formed based on the timing of treatment start, early (≤ 12 d, n = 28) and late (> 12 d, n = 33), and two groups based on the initial daily L-T4 dosage, low (≤ 10 µg/kg, n = 40) and high (> 10 µg/kg, n = 21). Mean initial L-T4 dosage was 8.6 (2.2) µg/kg. In 17 patients, the initial L-T4 dose was increased from 7.5 (2.5) to 9.4 (2.3) µg/kg during the initial period (≤ 1.5 mo after therapy start). The maximal initial L-T4 dose was defined as the increased dose or, when the L-T4 dosage was not increased, the initial dose.
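The grouping rules and the definition of the maximal initial dose can be stated compactly in code. The Python sketch below assumes that "treatment start" is the day of life on which L-T4 was started and that a dose increase, if any, is recorded as a single value; the `Patient` record and function names are hypothetical, and the example values are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    start_day: int                          # day of life on which L-T4 was started (assumed meaning)
    initial_dose: float                     # initial daily L-T4 dose, µg/kg
    increased_dose: Optional[float] = None  # dose after an increase within 1.5 mo, if any


def start_group(p: Patient) -> str:
    """Early (<= 12 d) or late (> 12 d) start of treatment."""
    return "early" if p.start_day <= 12 else "late"


def dose_group(p: Patient) -> str:
    """Low (<= 10 µg/kg) or high (> 10 µg/kg) initial daily dose."""
    return "low" if p.initial_dose <= 10 else "high"


def maximal_initial_dose(p: Patient) -> float:
    """The increased dose if the dose was raised in the initial period, else the initial dose."""
    return p.increased_dose if p.increased_dose is not None else p.initial_dose


example = Patient(start_day=9, initial_dose=7.5, increased_dose=9.4)  # illustrative values
print(start_group(example), dose_group(example), maximal_initial_dose(example))
```

Note that the low/high grouping uses the initial dose, not the maximal initial dose, so a patient raised from 7.5 to 9.4 µg/kg stays in the low-dose group.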

Laboratory Tests

Free T4 and TSH were measured in 27 different laboratories, whose measurement quality is monitored by the National External Quality Assurance Program (the Netherlands, SMLK) (25). At the time of this study, adult reference values were 11–25 pmol/l for fT4 and 0.4–4.3 mU/l for TSH.

Individual Steady-State Concentrations (SSC’s)

Individual fT4 and TSH steady-state concentrations (SSCs) were defined as the concentrations at which both fT4 and TSH remain constant, despite ongoing processes that might change them, such as medication, thyroxine clearance, and shifts in the equilibrium between the different deiodinase pathways (26). In the euthyroid situation, individual fT4 and TSH SSCs are in equilibrium with each other. In healthy euthyroid subjects and in subjects with stable thyroid disease, individual SSCs represent the mean value of a series of determinations under standard conditions, with an individual SSC range about half as wide as that of the total population (27,28,29). To assess SSCs in CH, we included samples that we assumed to represent a stable, euthyroid state: samples obtained in the postinitial period (i.e., > 1.5 mo after therapy start until 11 y) with TSH within the interval 0.5–10 mU/l. From the resulting series of determinations, samples outside the central 95% of the series were excluded, so that only samples inside both the individual fT4 and the individual TSH SSC range were included (Table 5). As described previously, individual fT4 and TSH SSCs decreased slightly over the period 0–11 y and were largely independent of the L-T4 dosage (7). In the initial period, fT4 SSCs were somewhat higher than in the postinitial period, but the difference was not significant: 22.5 (3.0) vs. 21.7 (2.6) pmol/l, respectively (P = 0.07) (7). The minimal initial fT4 deficit (pmol/l) was calculated for each patient as the difference between the −2 SD limit of the fT4 SSC range of the whole cohort, i.e., 16.6 pmol/l (Table 5), and the individual pretreatment fT4.
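The SSC procedure above (select postinitial samples with TSH 0.5–10 mU/l, drop samples outside the central range for fT4 or TSH, then take mean ± 2 SD) can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes that "outside the 95%" means outside mean ± 2 SD in a single trimming pass on a linear TSH scale, and all function names and example values are hypothetical.

```python
from statistics import mean, stdev

COHORT_FT4_SSC_LOWER = 16.6  # pmol/l, -2 SD limit of the cohort fT4 SSC range (Table 5)


def eligible(samples):
    """Keep postinitial samples (> 1.5 mo after therapy start) with TSH 0.5-10 mU/l.

    Each sample is a tuple (months_since_start, ft4_pmol_l, tsh_mu_l).
    """
    return [s for s in samples if s[0] > 1.5 and 0.5 <= s[2] <= 10]


def _inside(values, x, k=2.0):
    """True if x lies within mean +/- k SD of values (assumed trimming rule)."""
    m, sd = mean(values), stdev(values)
    return m - k * sd <= x <= m + k * sd


def estimate_ft4_ssc(samples):
    """Return (mean, lower, upper) of the individual fT4 SSC range in pmol/l."""
    pool = eligible(samples)
    if len(pool) < 3:
        raise ValueError("too few eligible samples")
    ft4 = [s[1] for s in pool]
    tsh = [s[2] for s in pool]
    # Keep only samples inside the central range for both fT4 and TSH.
    kept = [s[1] for s in pool if _inside(ft4, s[1]) and _inside(tsh, s[2])]
    if len(kept) < 2:
        raise ValueError("too few samples left after trimming")
    m, sd = mean(kept), stdev(kept)
    return m, m - 2 * sd, m + 2 * sd


def minimal_initial_ft4_deficit(pretreatment_ft4_pmol_l):
    """Cohort -2 SD SSC limit minus the patient's pretreatment fT4 (pmol/l)."""
    return COHORT_FT4_SSC_LOWER - pretreatment_ft4_pmol_l


samples = [
    (1.0, 30.0, 0.1),                      # initial period: excluded
    (2.0, 20.0, 2.0), (3.0, 21.0, 3.0), (4.0, 22.0, 2.5),
    (6.0, 23.0, 3.0), (8.0, 21.0, 2.0), (10.0, 22.0, 2.5),
    (12.0, 40.0, 2.0),                     # fT4 outlier: removed by trimming
    (14.0, 19.0, 15.0),                    # TSH > 10 mU/l: excluded
]
print(tuple(round(v, 1) for v in estimate_ft4_ssc(samples)))  # (21.5, 19.4, 23.6)
```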

Table 5 Mean values (SD) (95% CI) of fT4 and TSH SSCs of the 61 congenital hypothyroidism patients of the study group

Comparison of fT4 SSCs With Age-Specific fT4 Reference Ranges

In total, 1,170 follow-up samples were collected from the 61 patients during the first 24 mo after therapy start. Each measured fT4 value was classified as euthyroid when it was within the individual fT4 SSC ± 2 SD range, and as OT or UT when it was above or below this range, respectively. This classification was compared with the classification based on the 95% age-specific fT4 and TSH reference ranges, constructed from 208 samples of healthy infants aged 9 d to 24 mo who were admitted to our hospital for minor surgical procedures (30). Over-treatment relative to the age-specific fT4 reference range was defined as an fT4 concentration above the upper limit of this range. For each classification method, two groups were formed based on whether or not over-treatment had occurred in the period 0–24 mo: the groups OT and No OT for the fT4 SSCs, and the groups Over and No Over for the fT4 age-specific reference range. These groups were combined in a contingency table: OT/Over, OT/No Over, No OT/Over, and No OT/No Over.
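The two classifications and the resulting contingency groups can be illustrated with a short Python sketch. It assumes that each patient's SSC is summarized by a mean and SD, and that the single fT4 upper limit reported in the Results (30.2 pmol/l) applies throughout 0–24 mo; the TSH range is not modeled, and the function names are hypothetical. Expressing fT4 in SDS of the individual SSC also shows why the classification does not depend on the laboratory assay.

```python
FT4_REF_UPPER = 30.2  # pmol/l, assumed upper limit of the fT4 age-specific range for 0-24 mo


def ft4_sds(ft4, ssc_mean, ssc_sd):
    """fT4 expressed in SDS relative to the patient's own SSC."""
    return (ft4 - ssc_mean) / ssc_sd


def classify_by_ssc(ft4, ssc_mean, ssc_sd):
    """'OT' above +2 SDS, 'UT' below -2 SDS, otherwise 'euthyroid'."""
    sds = ft4_sds(ft4, ssc_mean, ssc_sd)
    if sds > 2:
        return "OT"
    if sds < -2:
        return "UT"
    return "euthyroid"


def contingency_group(ft4_values, ssc_mean, ssc_sd, ref_upper=FT4_REF_UPPER):
    """Patient-level group from all fT4 values of the period 0-24 mo."""
    ot = any(classify_by_ssc(v, ssc_mean, ssc_sd) == "OT" for v in ft4_values)
    over = any(v > ref_upper for v in ft4_values)
    return ("OT" if ot else "No OT") + "/" + ("Over" if over else "No Over")


# Hypothetical patients: a low SSC is flagged only by the SSC method,
# a high SSC only by the fixed reference limit.
print(contingency_group([22.0, 27.0], ssc_mean=21.5, ssc_sd=2.0))  # OT/No Over
print(contingency_group([28.0, 31.0], ssc_mean=28.0, ssc_sd=2.0))  # No OT/Over
```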

Severity of Over- and Under-Treatment

For each OT/UT episode in the period 0–6 y, severity was calculated as the product of the distance (in SDS) between the fT4 value and the individual fT4 SSC +2 SD limit (OT) or −2 SD limit (UT) and the duration of the episode (in months), giving the unit SDS·month. The severity factor (SF) per patient, i.e., the sum of these episode severities, was calculated for the age periods 0–5 mo and 6 mo to 6 y. Based on SF OT/UT 0–5 mo, three groups were formed: (i) Severe, SF > 2 SDS·month; (ii) Mild, SF between 0 and 2 SDS·month; and (iii) No OT/UT, SF = 0. For the period 6 mo to 6 y, two groups were formed: Yes or No OT/UT.
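The severity factor is a sum of distance × duration products, followed by a threshold rule. The Python sketch below assumes one representative distance per episode (the text does not say how distances within an episode are combined) and treats an SF of exactly 2 as Mild; the function names and example episodes are hypothetical.

```python
def episode_severity(distance_sds, duration_months):
    """Severity of one OT/UT episode in SDS·month.

    distance_sds: distance (SDS) beyond the individual +2 SD (OT) or -2 SD (UT) limit;
    one representative value per episode is assumed.
    """
    return abs(distance_sds) * duration_months


def severity_factor(episodes):
    """Sum of episode severities for one patient and one age period."""
    return sum(episode_severity(d, t) for d, t in episodes)


def group_0_5_mo(sf):
    """Severe (SF > 2), Mild (0 < SF <= 2) or No OT/UT (SF = 0)."""
    if sf > 2:
        return "Severe"
    if sf > 0:
        return "Mild"
    return "No OT/UT"


def group_6_mo_to_6_y(sf):
    """Yes if any OT/UT occurred in the period, otherwise No."""
    return "Yes" if sf > 0 else "No"


episodes = [(1.5, 1.0), (0.5, 2.0)]  # hypothetical OT episodes in the period 0-5 mo
sf = severity_factor(episodes)
print(sf, group_0_5_mo(sf))  # 2.5 Severe
```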

Development Tests

At age 1.8 y, the infants were tested with the Bayley Development Scale (31); only the mental development index (MDI) scores were used in the analysis. At ages 6 and 11 y, the short version of the Revised Amsterdam Child Intelligence Test (RAKIT) (32,33) was applied, a reliable (r > 0.95), normed Dutch intelligence test for children aged 4–11.6 y, with IQ scores that are stable across ages. The RAKIT-short consists of six subtests: Exclusion (visual reasoning), Word Knowledge (passive knowledge of word meaning), Disc Placement (visual-spatial-motor task), Learning Names (verbal-memory-association task), Hidden Figures (visual analysis), and Word Fluency (fast word production). Thyroid function was checked around each test. At all ages, control groups of healthy children were included: 207 at age 1.8 (8), 37 at age 6 (9), and 25 at age 11 (6). Cognitive development scores of the study group and control groups were corrected for socio-economic status (SES) and ethnicity, as these characteristics may influence scores (32,34,35). After correction, study group scores were standardized against the control group scores, which were set at a mean of 100 and an SD of 15.
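The final standardization step is a linear rescaling relative to the control group. The Python sketch below assumes scores that are already corrected for SES and ethnicity, and uses the sample SD of the controls; whether the study used a sample or population SD is not stated, so that choice is an assumption, as are the function name and example scores.

```python
from statistics import mean, stdev


def standardize(scores, control_scores):
    """Rescale scores so that the control group has mean 100 and SD 15.

    Both inputs are assumed to be already corrected for SES and ethnicity;
    the sample SD of the controls is an assumed choice.
    """
    m, sd = mean(control_scores), stdev(control_scores)
    return [100 + 15 * (s - m) / sd for s in scores]


controls = [90.0, 100.0, 110.0]  # hypothetical corrected control scores
print(standardize([110.0, 95.0], controls))  # [115.0, 92.5]
```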

Statistical Analysis

Statistical analysis was performed with SPSS version 23.0 (SPSS, Chicago, IL). Chi-square tests were used to analyze differences in sample classification between the fT4 SSCs and the reference ranges. A repeated measurements ANOVA was used to compare the cognitive development scores of the four contingency table groups over time, with the four groups and age as independent variables and the cognitive development scores at ages 1.8, 6, and 11 y as the dependent variable. This analysis gives unbiased results even with (randomly) missing values. In all analyses, development scores were adjusted for type of CH (mild/severe) and start of treatment (early/late). An ANOVA was used to compare the clinical characteristics of the 0–5 mo treatment groups. A repeated measurements ANOVA was used to estimate the effect of the factors OT/UT 0–5 mo and OT/UT 6 mo to 6 y on the cognitive development scores at the three ages. Logistic regression was applied to determine the risk of an IQ11 < 85 for the OT 0–5 mo groups and the risk of OT after high initial L-T4 dosages; for the latter analysis, the first three control samples per patient after initiation of therapy were selected (n = 183). Finally, linear regression was used to analyze the dependence of fT4 concentrations on initial L-T4 dosage and minimal initial fT4 deficit. A two-tailed P < 0.05 was considered significant for all tests.

Statement of Financial Support

This study was financially supported by unrestricted grants from the Theia Foundation, Jan Dekker & Ludgardine Bouwman Foundation, and Erasmus University Rotterdam Association Trust Fund.


None of the authors have competing financial interests.