The short-term health benefits of breastfeeding for mother and child are widely acknowledged [1]. Several studies have also linked a history of breastfeeding with long-term maternal and child health outcomes, including reduced risk of type 2 diabetes [2], metabolic syndrome [3], hypertension and myocardial infarction [4] in mothers, and reduced risk of obesity [5] and type 2 diabetes [6] and lower blood pressure [7] and cholesterol levels [8] in children. Such studies have often relied on maternal recall of breastfeeding history. Although recall has been found to be accurate shortly after delivery [9, 10], less is known about its long-term accuracy. Two previous studies have evaluated a recall interval of more than two decades; one comprised only college-educated women [11], while the other was based on a fairly small sample [12]. More data on the accuracy of recall are therefore needed to support the interpretation of epidemiological findings. The objective of the present study was to assess the accuracy of maternal recall of breastfeeding 20 years after delivery and to examine potential predictors of inaccurate reporting.


We compared two sets of data collected from the same group of Norwegian women, with a recall time ranging from 20.2 to 22.5 years. First, breastfeeding data were collected prospectively in 1986–1989 by health professionals during the child's first year of life; these are hereafter referred to as recorded breastfeeding data (our reference method). Second, recalled breastfeeding data were collected by means of a brief questionnaire mailed to the women in 2008, i.e. about 20 years after the birth of their child (our test method).

All mothers gave informed written consent. The study was approved by the Regional Committee for Medical and Health Research Ethics and by the Norwegian Data Inspectorate.

Study population

Participants were selected from a population-based prospective observational study conducted in 1986–89 in the cities of Trondheim and Bergen, Norway, hereafter referred to as the parent study. The background and design of that study have been described in detail previously [13]. Briefly, it was designed to study the tendency of mothers to repeat patterns of fetal growth and birth outcomes in consecutive pregnancies. Caucasian women with singleton pregnancies and one or two previous births were included. Exclusion criteria were multiple pregnancy, gestational age > 20 weeks at enrollment, non-Caucasian ethnicity and language incompatibility.

Women were screened and enrolled in the study around gestational week 17 following referral from their primary health care provider [13]. At the first visit, maternal age (years), highest level of education, family status, smoking history, body height and pre-pregnancy weight were recorded. At birth, the offspring's gender, gestational age (days), birth weight (grams), length (cm) and head circumference (cm) were recorded.

Of the 1,044 women who participated in the parent study, 63 had died or had withdrawn their consent, leaving 981 women eligible for our study (Figure 1). Of these, 47 could not be traced, so 934 women were invited to participate in the recall study. A one-page questionnaire was mailed to the mothers in 2008, some 20–22 years after the delivery of their index child, i.e. the child enrolled in the parent study. The questionnaire covered parity, the child's birth weight, duration of breastfeeding in months, and the child's age when solid foods and milk other than breast milk were introduced.

Figure 1

Cohort profile. Flow-chart of the participants in the parent and recall study of breastfeeding duration among Norwegian women.

A total of 579 women (participation rate 62.0%) returned the questionnaire, of whom 12 were excluded because they did not state whether they had breastfed, or stated that they had breastfed but did not give any duration. For 193 of the 567 women who gave complete recalled breastfeeding data, the recorded breastfeeding data in the parent study were incomplete: in all of these cases, the records showed that the mother had breastfed at one or more follow-ups, but no cessation of breastfeeding was recorded. Complete breastfeeding duration data from both the parent and the recall studies were thus available for 374 women, of whom 29 were still breastfeeding at the final follow-up 13 months after delivery. The analyses in which breastfeeding duration was treated as a categorical variable are therefore based on 374 women (Figure 1). In the analyses in which breastfeeding duration was treated as a continuous variable, the 29 women who were still breastfeeding at 13 months were excluded because their exact duration beyond 13 months was unknown, leaving 345 women. All the women in our study were known to have attempted to breastfeed.

Among the 355 women who were invited to the recall study, but who did not respond, 263 had complete breastfeeding data in the parent study. Data on breastfeeding duration and background maternal and child characteristics for the non-responders with complete recorded breastfeeding data are presented in Table 1.
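The participant counts reported above can be reconciled with a few lines of arithmetic. The Python sketch below simply chains the numbers given in the text and in Figure 1; the function name `participant_flow` is illustrative and is not part of the study's analysis.

```python
def participant_flow():
    """Chain the participant counts stated in the text (Figure 1)."""
    parent_study = 1044
    eligible = parent_study - 63          # deceased or withdrew consent -> 981
    invited = eligible - 47               # could not be traced -> 934
    responded = 579
    non_responders = invited - responded  # -> 355
    complete_recall = responded - 12      # no breastfeeding status or no duration -> 567
    both_sources = complete_recall - 193  # no recorded cessation in parent study -> 374
    continuous = both_sources - 29        # still breastfeeding at 13 months -> 345
    return {
        "eligible": eligible,
        "invited": invited,
        "non_responders": non_responders,
        "response_rate_pct": round(100 * responded / invited, 1),
        "categorical_sample": both_sources,
        "continuous_sample": continuous,
    }


print(participant_flow())
```

The two analytic samples at the end of the chain are why the categorical analyses use n=374 while the continuous analyses use n=345.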

Table 1 Maternal and child characteristics of responders, non-responders and eligible participants

Recorded breastfeeding data

In the parent study, mothers and children attended regular public health follow-ups in their respective communities at 6 weeks, 3 months, 6 months, 9 months and 13 months after delivery. At each visit a nurse asked the woman whether she was currently breastfeeding her baby ("Does the infant receive breast milk?" and "If the infant does not receive breast milk, at what age (of the infant) did you discontinue breastfeeding?"). Mothers who had stopped breastfeeding reported the total duration in weeks at the first two visits and in whole months at the other three. The duration of any breastfeeding was defined as the total number of weeks or months during which the child received any breast milk, irrespective of the concomitant introduction of other fluids and solid foods. A breastfeeding record was deemed complete if it contained an entry for non-breastfeeding or cessation of breastfeeding at any of the interviews. If cessation was recorded at several follow-ups, the entry from the follow-up closest to the child's date of birth was used. The infant's age at the introduction of milk other than breast milk and of solid foods was also recorded.
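The completeness rule and the choice among several cessation entries can be expressed as a small function. This is a minimal sketch under an assumed data layout: the dictionary of visit labels and the name `recorded_duration_months` are hypothetical, and the weeks-to-months factor of 0.25 is the one given under Statistical analyses.

```python
# Follow-up visits in chronological order. Durations were reported in weeks
# at the first two visits and in whole months at the other three.
VISITS = ("6 weeks", "3 months", "6 months", "9 months", "13 months")
WEEK_VISITS = {"6 weeks", "3 months"}
MONTHS_PER_WEEK = 0.25  # approximation stated in the Statistical analyses


def recorded_duration_months(entries):
    """Return recorded breastfeeding duration in months, or None if incomplete.

    `entries` (hypothetical layout) maps a visit label to the total duration
    reported at cessation (0 for a non-breastfeeding entry). Visits where the
    mother was still breastfeeding, or had no entry, are absent or None.
    """
    for visit in VISITS:  # earliest visit first = closest to the birth date
        value = entries.get(visit)
        if value is not None:
            if visit in WEEK_VISITS:
                return value * MONTHS_PER_WEEK
            return float(value)
    return None  # no cessation recorded at any visit: incomplete record


print(recorded_duration_months({"3 months": 8, "6 months": 2}))   # 2.0
print(recorded_duration_months({"6 weeks": None, "9 months": 7}))  # 7.0
print(recorded_duration_months({}))                                 # None
```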

Recalled breastfeeding data

A total of 374 women with complete recorded breastfeeding data in the parent study returned the one-page questionnaire. The questions about breastfeeding and its duration were: "Did you breastfeed your son/daughter when he/she was a baby?" and "For how many months did you breastfeed?" We asked for the age at weaning in months because we anticipated that recall in weeks or days would be too inaccurate. Furthermore, epidemiological studies that use recalled breastfeeding duration as an exposure usually ask for it in months [14, 15]. In this paper, breastfeeding duration refers to any breastfeeding.


In some previous studies the accuracy of recalled breastfeeding duration was associated with various maternal and child characteristics [9, 11, 12, 16–19]. Based on those findings, we considered the following maternal covariates: age at study entry, pre-pregnancy body mass index (BMI, kg/m2), education (primary school, secondary school, college/university or unknown), and smoking status (ever vs. never smokers). Offspring covariates were birth weight, preterm birth, small for gestational age (SGA; birth weight < 10th percentile for gestational age) [20], birth order, gender, and the child's age at the introduction of cereals and of milk other than breast milk.

Statistical analyses

The study outcome was the number of completed months of breastfeeding. Since one week is approximately ¼ month, durations initially reported in weeks were converted to months by multiplying the number of weeks by 0.25.

To evaluate the representativeness of the study sample, responders in the recall study were compared with non-responders using chi-square tests and independent-samples t-tests or, for variables that were not normally distributed, the nonparametric Mann–Whitney test, in which ties were split equally.
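"Ties were split equally" refers to how tied pairs enter the Mann–Whitney U statistic: each tied pair contributes one half. The dependency-free sketch below shows this; the p-value uses the large-sample normal approximation without a tie correction, which is an assumption, since the text does not state how the p-values were computed. Function names are illustrative.

```python
import math


def mann_whitney_u(x, y):
    """U statistic for sample x: x_i > y_j scores 1, a tie scores 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5  # tie split equally between the two samples
    return u


def mann_whitney_p(x, y):
    """Two-sided p-value from the normal approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    u = mann_whitney_u(x, y)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


print(mann_whitney_u([1, 2], [2, 3]))  # 0.5 (only the 2 vs 2 tie scores)
```

Because every pair is scored once from each side, U for x and U for y always sum to n1 × n2, which is a quick sanity check on any implementation.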

We calculated an intraclass correlation coefficient (ICC) between the recorded and recalled breastfeeding data, defined as the ratio of the between-subject variance to the total variance; an ICC above 0.75 was considered a strong correlation [21]. This absolute-agreement coefficient was calculated overall and for subgroups defined by variables previously suggested to be associated with recall of breastfeeding in other populations. The Wilcoxon signed-rank test was used to determine whether recalled breastfeeding duration differed significantly from the recorded duration. To evaluate the discrepancies between recalled and recorded breastfeeding duration, we used Bland-Altman plots [22]. This type of plot displays the difference between the two methods against their mean; it shows the magnitude of disagreement, identifies outliers and reveals any relationship between the recall error and breastfeeding duration.
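An absolute-agreement ICC can be computed from a two-way analysis of variance of the women × methods table. The sketch below assumes the two-way random-effects, single-measure form often written ICC(A,1); the text does not state which ICC model was selected in the software, so this is one plausible choice rather than the authors' exact computation. The function name is illustrative.

```python
def icc_absolute_agreement(pairs):
    """Two-way random-effects, absolute-agreement, single-measure ICC.

    `pairs` is a list of (recorded, recalled) durations, one tuple per woman.
    """
    n, k = len(pairs), 2
    grand = sum(a + b for a, b in pairs) / (n * k)
    subject_means = [(a + b) / k for a, b in pairs]
    method_means = [sum(p[j] for p in pairs) / n for j in range(k)]
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_methods = n * sum((m - grand) ** 2 for m in method_means)
    ss_total = sum((v - grand) ** 2 for p in pairs for v in p)
    ss_error = ss_total - ss_subjects - ss_methods
    ms_subjects = ss_subjects / (n - 1)
    ms_methods = ss_methods / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # A systematic difference between methods (ms_methods) lowers the
    # absolute-agreement ICC even when the rank order is preserved.
    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_methods - ms_error) / n
    )


print(icc_absolute_agreement([(1, 1), (3, 3), (6, 6)]))  # perfect agreement
print(icc_absolute_agreement([(1, 2), (3, 4), (6, 7)]))  # constant 1-month bias
```

In the second call, every woman "recalls" exactly one month more than recorded: a consistency ICC would still be 1, but the absolute-agreement ICC falls below 1, which is the property that makes it suitable for comparing recall with records.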

Logistic regression was used to assess possible predictors of misreporting breastfeeding duration by more than one month. Variables with a P-value ≤0.10 in a simple (unadjusted) model were further analysed in a multivariable model adjusting for potential confounders. Using the change-in-estimate method [23], a variable was considered a confounder if adjusting for it changed an odds ratio (OR) by 10% or more. Variables without such an effect were removed from the model one at a time in a stepwise manner. Because only 29 women underreported their breastfeeding duration by more than one month, multivariable analyses to predict underreporting could not be performed.
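The change-in-estimate rule with one-at-a-time removal can be sketched as below. The callback `fit_or`, which would refit the logistic model and return the exposure OR for a given covariate set, is hypothetical, as are the toy factors in the example; measuring the relative change on the OR scale against the current model's OR is one common convention and an assumption here. The first print uses the crude and adjusted ORs reported in the Results (2.1 and 1.2) purely as arithmetic.

```python
import math


def relative_change(or_before, or_after):
    """Relative change in an odds ratio, e.g. 2.1 -> 1.2 is about 0.43."""
    return abs(or_after - or_before) / or_before


def backward_change_in_estimate(fit_or, covariates, threshold=0.10):
    """Drop covariates one at a time while removal changes the OR by < threshold.

    fit_or(covs) must return the exposure OR adjusted for the list `covs`
    (hypothetical callback wrapping a logistic regression fit).
    """
    kept = list(covariates)
    while kept:
        current = fit_or(kept)
        # Change in the exposure OR caused by removing each covariate in turn.
        changes = {
            c: relative_change(current, fit_or([v for v in kept if v != c]))
            for c in kept
        }
        weakest = min(changes, key=changes.get)
        if changes[weakest] >= threshold:
            break  # every remaining covariate meets the 10% confounder rule
        kept.remove(weakest)
    return kept


# Toy model (hypothetical): each covariate multiplies a crude OR of 2.1.
FACTORS = {"milk_before_4_months": 0.6, "bmi": 0.98, "education": 0.99}


def toy_fit_or(covs):
    return 2.1 * math.prod(FACTORS[c] for c in covs)


print(round(relative_change(2.1, 1.2), 2))                   # 0.43
print(backward_change_in_estimate(toy_fit_or, list(FACTORS)))  # ['milk_before_4_months']
```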

Breastfeeding duration was grouped according to a categorization scheme previously used in a study of maternal recall of breastfeeding duration [11]. However, since all the women in our sample were known to have attempted breastfeeding, and a fairly large group had breastfed for ≥13 months, we had to modify the scheme. Our categories were 0, >0–3, 4–6, 7–9, 10–12 and ≥13 months. The degree of misclassification was assessed by cross-tabulation. To evaluate agreement within each category, kappa statistics were calculated for each category versus all other categories combined, in 2×2 tables. For the ordinal multi-category scale, we computed a quadratic weighted kappa, which gives greater weight to large differences between categories than to small ones. Weighted kappa was given by $\kappa_w = \left(\sum w f_o - \sum w f_c\right) / \left(n - \sum w f_c\right)$, where the sums run over all cells of the table, $f_o$ and $f_c$ are the observed and chance-expected frequencies in each cell, $n$ is the total number of observations, and $w = 1 - (i - j)^2/(k - 1)^2$ is the weight for the cell concerned, with $i - j$ the difference between its row category and its column category on the scale (the number of categories of disagreement) and $k$ the number of points on the scale [24, 25]. Strength of agreement was evaluated according to Landis and Koch [26].
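The weighted kappa formula above translates directly into code. This is a minimal sketch with illustrative function names; rows are taken as recorded categories and columns as recalled categories, an orientation that does not affect kappa.

```python
def weighted_kappa(table):
    """Quadratic weighted kappa for a k x k table (rows: recorded, columns: recalled).

    kappa_w = (sum w*f_o - sum w*f_c) / (n - sum w*f_c), with agreement
    weights w = 1 - (i - j)**2 / (k - 1)**2. Requires k >= 2.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    sum_wfo = sum_wfc = 0.0
    for i in range(k):
        for j in range(k):
            w = 1 - (i - j) ** 2 / (k - 1) ** 2
            sum_wfo += w * table[i][j]                  # observed frequency
            sum_wfc += w * row_tot[i] * col_tot[j] / n  # chance-expected frequency
    return (sum_wfo - sum_wfc) / (n - sum_wfc)


def one_vs_rest(table, c):
    """Collapse a k x k table into the 2 x 2 table 'category c' vs 'all others'."""
    n = sum(sum(row) for row in table)
    both = table[c][c]
    row_c = sum(table[c])
    col_c = sum(row[c] for row in table)
    return [[both, row_c - both], [col_c - both, n - row_c - col_c + both]]


print(weighted_kappa([[20, 5], [10, 15]]))  # 0.4
```

With k = 2 every off-diagonal weight is 0, so applying `weighted_kappa` to a `one_vs_rest` table gives the ordinary (unweighted) Cohen's kappa used for the per-category comparisons.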

Analyses were performed with PASW (SPSS) for Windows, version 18.0 (SPSS Inc., Chicago, IL, USA) and Stata for Windows, release 11 (StataCorp, College Station, TX, USA).


At study entry, the 374 women were on average 29.1 years old (SD 3.9) (Table 1). The mean time since delivery was 21.2 years (SD 0.6) (data not shown). For four of five mothers (80%), the index child was their second child (Table 1). Approximately one in four of the women had college or university education. According to data from the parent study, all women had attempted breastfeeding. Compared with responders, non-responders were less likely to have college or university education and less likely to be never smokers, had a shorter mean breastfeeding duration, and introduced milk other than breast milk earlier.

Almost two thirds (64%) of the mothers recalled their breastfeeding duration to within one month of the recorded duration (Table 2). More than three times as many women overreported as underreported their breastfeeding duration by more than one month (n=95 vs. n=29, p < 0.001). Five out of six women (83%) recalled their breastfeeding duration to within two months and 90% to within three months (data not shown). In the parent study, the median recorded breastfeeding duration was 9.0 months (interquartile range (IQR) 3.5) among women who underreported by more than one month and 5.0 months (IQR 5.8) among women who overreported by more than one month. Among women who recalled their breastfeeding duration to within one month, the corresponding median was 6.0 months (IQR 6.0) (data not shown). The median absolute difference between recalled and recorded breastfeeding duration was larger among women who overreported by more than one month than among women who underreported by more than one month (2.3 months [IQR 2.0] vs. 2.0 months [IQR 2.0], p<0.001) (data not shown).
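The recall-error groups used above can be formed from the signed difference, recalled minus recorded duration. The text does not state which test produced p < 0.001 for 95 vs. 29 women; an exact two-sided sign (binomial) test is one reasonable choice and is shown here as an assumption. Counting a difference of exactly one month as "within one month" is also an assumption. Function names are illustrative.

```python
import math


def recall_group(recalled, recorded, margin=1.0):
    """Classify recall error; a difference of exactly `margin` months counts as within."""
    diff = recalled - recorded
    if diff > margin:
        return "over"
    if diff < -margin:
        return "under"
    return "within"


def sign_test_two_sided(n_over, n_under):
    """Exact two-sided binomial (sign) test of H0: over- and underreporting equally likely."""
    n = n_over + n_under
    k = min(n_over, n_under)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


print(recall_group(7.0, 5.0), recall_group(5.5, 5.0), recall_group(3.0, 5.0))  # over within under
print(sign_test_two_sided(95, 29) < 0.001)  # True
```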

Table 2 Recall error of breastfeeding duration by maternal and child characteristics (n=345)

Maternal recall of the initial feeding practice (ever vs. no breastfeeding) agreed with the parent study records from 20 years earlier for 97.9% of the women. Among the 39 women who had breastfed for less than 1.5 months according to the parent study, eight (21%) recalled that they had not breastfed their index child. Median breastfeeding duration was 6.0 months (IQR 6.0) in the parent study and 7.0 months (IQR 5.0) in the recall study (data not shown). The median difference between recalled and recorded breastfeeding duration was 0.5 months (IQR 2.0, p<0.001). The overall intraclass correlation coefficient was high (ICC=0.82, p<0.001). Across subgroups of selected maternal and child characteristics, agreement between recorded and recalled duration of any breastfeeding varied from good to excellent, with the lowest ICC among women who introduced milk other than breast milk before the infant was 4 months old (ICC 0.69, p<0.001) and the highest among women with a pre-pregnancy BMI ≥25 kg/m2 (ICC 0.93, p<0.001) (Table 2).

We also assessed possible predictors of overreporting breastfeeding duration by more than one month. In the unadjusted analyses, women who had breastfed for 6 months or less according to the parent study were more likely to overreport than those who had breastfed for more than 6 months (OR 2.1; 95% CI 1.3, 3.4; p=0.003), but the association was no longer significant when introduction of milk other than breast milk and maternal pre-pregnancy BMI were added to the model (adjusted OR 1.2; 95% CI 0.6, 2.4). However, introduction of milk other than breast milk before the child was 4 months old remained significantly associated with overreporting by more than one month in the full model (adjusted OR 2.2; 95% CI 1.1, 4.2; p=0.022). No other variables were independent predictors of overreporting.

The Bland-Altman plot showed that most of the differences were positive, i.e. recalled breastfeeding duration tended to be longer than the recorded duration (Figure 2). The limits of agreement were wide and spanned both negative and positive values, implying that women both under- and overestimated their breastfeeding duration in the recall study compared with the recorded data in the parent study. However, the plot did not indicate that the differences increased with increasing breastfeeding duration. The plot also showed that over- and underestimation were extreme in some cases: 23 women (6.7%) had a difference between recalled and recorded duration that fell outside the mean ± 2 SD.

Figure 2

Bland-Altman plot. Differences between recalled and recorded breastfeeding duration vs. the mean of the two breastfeeding durations (n=345). Limits of agreement: mean difference ± 2 standard deviations (SD) = 0.774 ± 2 × 1.882 months, i.e. −2.99 to 4.54 months.
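The limits of agreement in the caption follow directly from the mean and standard deviation of the paired differences. The sketch below computes the quantities behind a Bland-Altman plot (without drawing it), using the ±2 SD convention of the caption; the function name and the returned dictionary layout are illustrative.

```python
import statistics


def bland_altman(recalled, recorded, z=2.0):
    """Bias, SD and limits of agreement (bias +/- z*SD) for paired measurements."""
    diffs = [a - b for a, b in zip(recalled, recorded)]
    means = [(a + b) / 2 for a, b in zip(recalled, recorded)]  # x-axis of the plot
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    lower, upper = bias - z * sd, bias + z * sd
    n_outside = sum(1 for d in diffs if d < lower or d > upper)
    return {"means": means, "diffs": diffs, "bias": bias, "sd": sd,
            "limits": (lower, upper), "n_outside": n_outside}


# Limits implied by the summary values in the figure caption (months).
bias, sd = 0.774, 1.882
print(round(bias - 2 * sd, 3), round(bias + 2 * sd, 3))  # -2.99 4.538
```

Plotting `diffs` against `means` reproduces the layout of Figure 2, and `n_outside` corresponds to the count of women beyond the limits reported in the text.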

Breastfeeding duration by categories

Using the categories 0, >0–3, 4–6, 7–9, 10–12 and ≥13 months, 245 women (65.5%) classified their breastfeeding duration correctly (Table 3). Another 113 women (30.2%) misclassified the duration by one category, and 16 women (4.3%) by two or more categories (data not shown). The proportion of women who misclassified their breastfeeding duration was highest in the 4–6 months category (39.5%) and lowest in the ≥13 months category (17.2%) (Table 3). Across the three middle categories, the proportion of women who overreported decreased, and the proportion who underreported increased, as the recorded breastfeeding duration increased. Agreement was statistically significant (p<0.001) for all categories of breastfeeding duration, with kappa coefficients ranging from 0.47 to 0.72 for the separate categories (Table 3). The overall weighted kappa across all categories was 0.85 (95% CI 0.82–0.89), corresponding to 'almost perfect' agreement [24].
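Classifying durations into the six categories and counting how many categories apart recall and record fall can be sketched as follows. The binning of fractional durations is an assumption: the sketch uses the half-open intervals (0, 3], (3, 6], (6, 9], (9, 12] and > 12 months, because the text does not say how values such as 3.5 months were assigned. Names are illustrative.

```python
import bisect
from collections import Counter

# Upper bounds (months) of the categories 0, >0-3, 4-6, 7-9, 10-12; anything
# above 12 falls in the last category (>=13). Assumed half-open intervals.
UPPER_BOUNDS = [0, 3, 6, 9, 12]
LABELS = ["0", ">0-3", "4-6", "7-9", "10-12", ">=13"]


def category(months):
    """Index (0-5) of the duration category for a duration in months."""
    return bisect.bisect_left(UPPER_BOUNDS, months)


def misclassification_distance(recalled, recorded):
    """Count pairs by how many categories apart recall and record fall."""
    return Counter(abs(category(a) - category(b)) for a, b in zip(recalled, recorded))


print([LABELS[category(m)] for m in (0, 2.5, 3.5, 12, 13)])
# ['0', '>0-3', '4-6', '10-12', '>=13']
print(misclassification_distance([7, 2, 13], [6, 2, 5]))
```

A distance of 0 corresponds to correct classification and a distance of 1 to misclassification by one category, matching the tallies reported in this section.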

Table 3 Breastfeeding duration by categories and distribution of recall error (n=374)


Despite a median overestimation of about two weeks, this study of 374 Norwegian mothers showed that they recalled fairly accurately how long they had breastfed their child 20 years earlier. The only independent predictor of overreporting by more than one month was the age of the child when milk other than breast milk was introduced. We found no statistically significant association between recall error and maternal education [9, 12, 16], gender of the child [12] or parity [12]. This is in agreement with previous studies and suggests that any inaccuracy in maternal recall was non-differential. However, some comparisons may have had low statistical power, which calls for caution in interpreting the results.

Recall was a fairly accurate measure of the mothers' initial feeding method (ever vs. no breastfeeding), with an agreement of 97.9%. Even among the 39 women who breastfed for less than six weeks (1.5 months), four out of five had consistent results in the recall and parent studies. The eight women with inconsistent results recalled that they had not breastfed at all, whereas the records from the 6-week follow-up indicated that they had breastfed for a few weeks (range 1–5 weeks) after delivery. One may speculate that they had forgotten, confused this birth with another, or did not consider their brief period of breastfeeding important enough to mention in the recall study.

The accuracy of long-term (> 10 years) maternal recall has been investigated in a cohort of Jerusalem residents [12] and among Australian [16] and college-educated US women [11]. The sample sizes in all of these studies (n < 150) were small compared with the present one. Other authors have evaluated shorter-term (≤10 years) recall [9, 10, 17, 19, 27–30]; except for the study by Cupul-Uicab et al. [10], all reported fairly small sample sizes. Accordingly, we believe this to be the first long-term maternal recall study of breastfeeding duration in a large population-based sample of women in whom breastfeeding was common and normally of long duration.

As in several previous studies, the median recalled breastfeeding duration was longer than the duration recorded in the parent study [12, 16, 18, 19]. Nevertheless, our median difference between recalled and recorded breastfeeding duration was smaller than in comparable studies, even though our average recall period was the same or longer [12, 16]. Almost two thirds (64%) of our study women recalled their breastfeeding duration to within one month, and 83% to within two months, of the duration recorded in the parent study. This accuracy was slightly lower than that reported by Eaton-Evans and Dugdale after a recall interval of three years (79% and 95% within one and two months, respectively) [9], but better than that reported by Tienboon after an interval of 15 years (35% and 59%, respectively) [16]. Discrepancies of one month in either direction could be attributed to rounding errors, while larger discrepancies could possibly be explained by mothers recalling the breastfeeding duration of a different child. Yet we found no association between parity and misreporting breastfeeding duration by more than one month.

For comparison, the correlation coefficients in our study were slightly lower than those reported among Mexican women two to four years after delivery [10] and among Canadian women after eight years of follow-up [18], but higher than that reported among college-educated US women after a recall period of more than 34 years [11].

There was some misclassification when breastfeeding duration was analysed in categories. Still, more than 95% of the women were either correctly classified or misclassified by only one category. The overall weighted kappa of 0.85 suggested almost perfect agreement [24], far higher than that reported by Promislow et al. [11]. Furthermore, the proportion of women who correctly classified their breastfeeding duration was higher in our study (66% vs. 54%). Whereas the latter study comprised only college-educated women, ours included women of all educational levels. Nevertheless, we found no association between education and misreporting by more than one month.

The fact that the highest proportion of misclassification was found in the middle categories may reflect floor and ceiling effects [31]. By design, women in the ≥13 months category could not overreport their duration, since there was no higher category. Correspondingly, because of the categorization scheme we chose, women who breastfed for three months or less had little room to underreport; they could do so only by recalling no breastfeeding at all.

The strengths of our study are the comparatively large sample of women from a population where breastfeeding is the accepted norm and of long duration, and the reasonably high response rate (62%). Further strengths are the long recall period and the prospective, standardised recording of breastfeeding data by health professionals. Moreover, whereas some previous studies have presented their findings as categorical data only [17, 32], we report our outcome both as a continuous and as a categorical variable.

One limitation of our study is that the background characteristics of responders differed slightly from those of non-responders. We therefore cannot rule out the possibility that responders recalled their breastfeeding practices more accurately than non-responders would have. In addition, even though the response rate was reasonably high, we could not include all responders in our analyses because of missing recorded breastfeeding data in the parent study. Our sample may therefore not be fully representative of the general Norwegian population.

By design, the parent study did not include primiparae [13], which may be considered a second limitation. Breastfeeding duration tends to be correlated from one infant to the next [33]; multiparous women who have breastfed two or more children for similar lengths of time may therefore report duration more accurately than women with only one child. Third, participation in the parent study implied that the women attended some additional health check-ups during pregnancy and the child's first year [13]. We therefore cannot rule out that our participants were more health-conscious, had a stronger intention to breastfeed and were more focused on their pregnancy and post-partum period. Fourth, about one third of the responders in the recall study had incomplete data on breastfeeding duration in the parent study: the records showed that the mothers had breastfed at one or more follow-up visits, but cessation of breastfeeding was not recorded. Among these women, fewer children (4.7%) had been born small for gestational age [20]. One may speculate that the public health nurse recorded the cessation date more scrupulously for children who were small for gestational age, but it is unclear if and how this would affect maternal recall many years later. Fifth, among the responders with both recorded and recalled breastfeeding data, a higher proportion of children (26%) were classified as small for gestational age. However, additional analyses did not indicate that mothers of children who were small for gestational age recalled breastfeeding duration more accurately than other mothers. Finally, breastfeeding has been the norm in Norway since the late 1960s, with more than 90% of mothers breastfeeding for at least one week [34]. Breastfeeding initiation was thus probably close to 100% at the time of the parent study.
Our results may therefore not be entirely applicable to populations with lower breastfeeding rates. Even so, and despite any possible selection of our study population, the high agreement between recalled and recorded breastfeeding duration supports the use of recalled breastfeeding duration as an exposure variable in epidemiological studies of adverse maternal health outcomes later in life, as in a recent Norwegian study [35].


The results of this Norwegian study show that, 20 years after delivery, mothers' recall of their child's initial feeding method and of breastfeeding duration was fairly accurate. Generalising our results to populations with different breastfeeding behaviour may, however, not be entirely appropriate. Further studies should examine the potential effect of misreported breastfeeding duration on estimated associations between breastfeeding and later health outcomes.