The inclusion of qualitative data in systematic reviews is an area of ongoing methodological development [1–3], with particular problems arising for reviews attempting to synthesise quantitative with qualitative data. The Cochrane qualitative methods group [2] suggests four areas in which development is needed: (1) searching, (2) critical appraisal, (3) synthesis/summary, and (4) loss of research context. This paper aims to contribute to development in the synthesis of qualitative and quantitative data. Alternative models and vocabularies of synthesis are emerging [3–9], but standard methods for combining different data types from the qualitative and quantitative research traditions have not yet been agreed [8].

Innovative methods are often developed during the course of research, but in general, papers report methods only briefly. As a result, the material that could inform learning is more often to be found in filing cabinets than in journals. In this paper we aim to distinguish between "the trivial and non-trivial points of divergence" [4] (p. 31) by providing worked examples of two methods of evidence synthesis (thematic and textual narrative) tested in one systematic review.


A systematic review of lay views about infant size and growth was undertaken as part of a series of interlinked reviews examining the evidence for associations between early growth and a number of later outcomes. The systematic review of views included both qualitative and quantitative studies.

Study methods and findings are reported in greater detail elsewhere [10–13]. Standard systematic review methods were employed, following guidance from the Centre for Reviews and Dissemination [14] and from an advisory group with backgrounds in public health, paediatrics, infant nutrition, qualitative and quantitative methods, and systematic reviewing, as well as representatives from user groups. Twelve databases were searched using terms for growth, height, weight and infancy as well as appropriate methodological terms. In total, 2,694 abstracts were retrieved, of which 19 studies met the inclusion criteria for the review.

Two researchers independently extracted findings by interrogating each study using the following questions developed from the aims of the review:

  1. What is healthy growth/size?
  2. How important is growth/size to participants?
  3. What concepts are used to define healthy growth/size?
  4. How do participants assess growth/size?
  5. Where does growth lie among priorities for child health?
  6. What information influences views/behaviour?
  7. Who influences views/behaviour?

Directly reported participant data (e.g. verbatim quotations or scores on attitudinal scales) and author interpretations were recorded separately, to retain the richness or 'thickness' of the contributing data. 'Thickness' in this context refers to the kinds of relatively detailed descriptions and contextual material which help the reader to make judgements about the trustworthiness of the data, particularly when applying it to different contexts [15, 16]. Study characteristics and quality assessment were summarised (for examples see Table 3). There is vigorous debate on whether qualitative research can be assessed using standard quality criteria, or whether this process is contrary to the nature of qualitative enquiry [17]. The controversy over the use of critical appraisal in systematic reviews that include qualitative data lies beyond the scope of this article. Views range from those who believe that critical appraisal is core to qualitative synthesis [18] to those who, like Barbour [19], consider that critical appraisal of qualitative research can be reductionist. It is notable, however, that there is general agreement that a checklist approach to critical appraisal can bring its own problems, particularly in relation to transparency in assessing interpretative work. We took the view that applying quality criteria rigidly would be likely to exclude relevant studies that had failed to comply with a particular reporting regime. Thus, all studies meeting our inclusion criteria were included, and quality appraisal was used at the data synthesis stage to contribute to judgements about strength of evidence.
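For readers who manage extraction electronically, the recording step described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the tool used in the review: the record layout (`ExtractedFinding`), the helper `collate_by_question` and the study identifiers are hypothetical, and the example records are placeholders rather than data from the included studies. Only the seven questions are taken from the text.

```python
from collections import defaultdict
from dataclasses import dataclass

# The seven extraction questions, as listed in the text.
QUESTIONS = {
    1: "What is healthy growth/size?",
    2: "How important is growth/size to participants?",
    3: "What concepts are used to define healthy growth/size?",
    4: "How do participants assess growth/size?",
    5: "Where does growth lie among priorities for child health?",
    6: "What information influences views/behaviour?",
    7: "Who influences views/behaviour?",
}


@dataclass
class ExtractedFinding:
    """One extracted finding (hypothetical record layout)."""
    study_id: str
    question: int               # key into QUESTIONS
    participant_data: str       # directly reported, e.g. quotation or scale score
    author_interpretation: str  # the study authors' reading, kept separate


def collate_by_question(findings):
    """Group findings under the extraction question they answer."""
    collated = defaultdict(list)
    for finding in findings:
        if finding.question not in QUESTIONS:
            raise ValueError(f"unknown question number: {finding.question}")
        collated[finding.question].append(finding)
    return dict(collated)


# Placeholder records only; these are not data from the review.
example = [
    ExtractedFinding("study_A", 4, "<verbatim quotation>", "<author interpretation>"),
    ExtractedFinding("study_B", 4, "<scale score>", "<author interpretation>"),
    ExtractedFinding("study_A", 7, "<verbatim quotation>", "<author interpretation>"),
]
by_question = collate_by_question(example)
```

Keeping `participant_data` and `author_interpretation` as separate fields mirrors the decision to record the two kinds of material separately, so that either can be inspected on its own at the synthesis stage.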

Table 3 Example study summaries

Two methods were proposed for synthesis of findings, textual narrative and thematic, both of which the advisory group agreed were appropriate to our needs. The first, the textual narrative approach, involves a commentary reporting on study characteristics, context, quality, and findings, with the scope, differences and similarities among studies used to draw conclusions across the studies. The second, the thematic approach, groups data into themes. Given the relatively small number of studies located, it was feasible to test both methods. Findings from the review are provided briefly for illustration, but the focus of this paper is on the process of synthesis and a comparison of the methods used. The two syntheses ran in tandem, as the thematic synthesis needed time for response and comparison between reviewers.


Worked Example 1 – Textual Narrative Synthesis

Factors identified by the research team from the research literature as likely to affect views on infant growth were used to define a number of sub-groups. These were:

  1. Relationship between participant and infant (e.g. mothers, other family members, health professionals, unrelated others)
  2. Weight status of participant
  3. Ethnicity of participant
  4. Age of infant
  5. Views about infants considered 'high risk' at birth, i.e. those born too small or too early, or who were placed in a neonatal intensive care unit (NICU)
  6. Weight/growth status of infant after birth
  7. Mode of infant feeding (breast fed, bottle fed, weaned)

Using agreed versions of the quality appraisal and extracted data, a textual narrative synthesis was undertaken by a single researcher (PL). Each study within a sub-group was described in a commentary reporting on study characteristics, context, quality, and findings. The scope, differences and similarities among studies were used to draw conclusions across the studies (the synthesis). Drawing conclusions across studies was not always possible due to study heterogeneity and lack of data. A worked example of the process is shown in Table 1.

Table 1 Stepwise textual narrative synthesis
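The organising step of the textual narrative synthesis (studies placed in sub-groups, each described by a commentary on characteristics, context, quality and findings) can also be sketched in code. The following Python is an illustrative sketch only. The study records are placeholders, the function names are hypothetical, and the `min_studies` threshold is our own stand-in for what was, in the review itself, a matter of researcher judgement about heterogeneity and lack of data rather than a fixed rule.

```python
from collections import defaultdict

# Placeholder study records, not data from the review. Each record carries the
# four commentary fields named in the text and the sub-groups it informs.
STUDIES = [
    {"id": "study_A", "subgroups": ["mothers", "high-risk infants"],
     "characteristics": "<design and sample>", "context": "<setting>",
     "quality": "<appraisal summary>", "findings": "<extracted findings>"},
    {"id": "study_B", "subgroups": ["mothers"],
     "characteristics": "<design and sample>", "context": "<setting>",
     "quality": "<appraisal summary>", "findings": "<extracted findings>"},
    {"id": "study_C", "subgroups": ["health professionals"],
     "characteristics": "<design and sample>", "context": "<setting>",
     "quality": "<appraisal summary>", "findings": "<extracted findings>"},
]


def commentary(study):
    """Build the per-study commentary from the four reported fields."""
    return (f"{study['id']}: characteristics={study['characteristics']}; "
            f"context={study['context']}; quality={study['quality']}; "
            f"findings={study['findings']}")


def narrative_by_subgroup(studies, min_studies=2):
    """Collect commentaries per sub-group and flag thinly populated sub-groups.

    min_studies is an assumed, illustrative threshold: a sub-group with fewer
    studies is flagged as unable to support cross-study comparison.
    """
    grouped = defaultdict(list)
    for study in studies:
        for subgroup in study["subgroups"]:
            grouped[subgroup].append(commentary(study))
    return {
        subgroup: {"commentaries": texts,
                   "enough_to_compare": len(texts) >= min_studies}
        for subgroup, texts in grouped.items()
    }


synthesis = narrative_by_subgroup(STUDIES)
```

A study may appear in more than one sub-group, as `study_A` does here. The flag only marks where comparison is plainly impossible; drawing the conclusions themselves remains an interpretative task.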

Findings – Textual Narrative Synthesis

We noted that unrelated members of the public tended to prefer infants of mid-range body sizes, but the evidence to support this observation was thin. Families of children with poor growth were acutely aware of growth as a problem; they monitored growth and discussed it with others. They desired "normal" growth in their child, and looked for ways that they could interpret the infant's growth as normal (for example finding members of the extended family who were of similar body shape). The most common method of assessing size in all sub-groups was by comparison with others, although the use of growth charts and physical measurement were also important for those with children with poor growth including babies born too small or too early. However, growth and size in themselves were low among concerns about such 'high risk' babies. The predominance of those with 'high risk' infants may explain our conclusion that growth was low among priorities for mothers of younger infants (aged 0–3 and 3–6 months). Among older children (more than 12 months) with poor growth there was concern among parents. Parents wanted to see good growth in their children, but they also considered love, attention, good health and good diet as important.

We judged that we had insufficient data to draw conclusions about the views of family members other than mothers or of health professionals, or to compare the views of participants of different weight or ethnicity, or views toward breast fed versus bottle fed infants.

Worked Example 2 – Thematic Synthesis

Thematic synthesis was undertaken by two researchers, LA and PL. Findings from all studies were collated under the 7 questions used in data extraction. Each researcher independently conducted a thematic analysis using these findings. On initial discussion of themes, researchers judged that there was repetition between the data extraction questions, and that data referred to four broad areas of enquiry:

  1. Understanding healthy growth/size
  2. Assessment of growth/size
  3. Concerns about growth/size
  4. Influences on views, behaviour, interpretations of growth/size

Data and themes were grouped into these areas and emerging themes were then considered for relevance, presence across studies, 'thickness' and duplication. This process was repeated until researchers were satisfied that all data could be interpreted within these themes and an agreed version reached. A worked example of the process is shown in Table 2.

Table 2 Stepwise thematic synthesis
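Two of the checks applied to emerging themes, presence across studies and comparison between the two researchers' independent analyses, lend themselves to a simple tabulation. The Python below is a hedged sketch of such a tabulation, not the procedure we followed, which was carried out through discussion. The function names, study identifiers and theme labels are all hypothetical placeholders, and judgements of relevance and 'thickness' are not something this kind of counting can capture.

```python
from collections import Counter


def theme_presence(coding):
    """Count how many studies each theme appears in.

    coding maps a study identifier to the set of theme labels one
    researcher assigned to that study.
    """
    counts = Counter()
    for themes in coding.values():
        counts.update(set(themes))  # each study counts once per theme
    return counts


def compare_coders(coding_a, coding_b):
    """Split themes into those both researchers found and those only one found."""
    themes_a = set().union(*coding_a.values())
    themes_b = set().union(*coding_b.values())
    return {
        "shared": themes_a & themes_b,
        "only_a": themes_a - themes_b,
        "only_b": themes_b - themes_a,
    }


# Placeholder codings only; study and theme labels are invented for illustration.
coder_1 = {
    "study_A": {"theme_1", "theme_2"},
    "study_B": {"theme_1"},
    "study_C": {"theme_3"},
}
coder_2 = {
    "study_A": {"theme_1"},
    "study_B": {"theme_1", "theme_4"},
    "study_C": {"theme_3"},
}

presence = theme_presence(coder_1)
comparison = compare_coders(coder_1, coder_2)
```

Themes listed under `only_a` or `only_b` are the natural starting points for discussion between researchers, and the cycle of discussion and regrouping would repeat until an agreed version is reached.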

Findings – Thematic Synthesis

Across the thematic synthesis, the predominant concern of participants was normality. This was seen in the creation of norms of growth and of models to explain difference. These drew on physical, observable characteristics, but also included physical but unobservable dimensions (such as underlying health status) and non-physical dimensions (such as emotional care). Where growth differed from the norm and a plausible explanation could not be found, for example among families of those with faltering growth [20], growth became an important concern for parents.

Data from across studies could be usefully combined in this method, for example in listing all the sources of influence on behaviour or views found. Family, other parents and friends, information from the infant themselves, health professionals, clothing sizes, magazines, books, radio, TV and participants' religious beliefs were all important to some, but the relative importance of these could not be explored.

Strengths and limitations of our study

While the data extraction and thematic synthesis were undertaken by two researchers working independently, only one of these researchers (employed to work on the qualitative aspect of the review) worked on the narrative synthesis, with a second researcher discussing the work as it progressed. Whether the findings might be different with more than one researcher working on both syntheses, with researchers not involved in the data extraction doing the syntheses, or with the syntheses being carried out in a different order are themselves researchable (if rather expensive) questions, as is the issue of whether the immersion of one researcher in the data at every stage is a strength (as we believe it to be) or a source of bias.


Reassuringly, the conclusions to which these analyses led us about lay perspectives were largely similar across the thematic and textual narrative syntheses. Whether using a different research team, or a larger number of reviewers, would have produced different results is itself a researchable question. However, in this case conclusions from both analyses were dominated by the importance of having babies that were a 'normal' size, leading to interest in monitoring growth in a number of ways and, sometimes, to concern that there was an underlying problem leading to 'abnormal' growth. While the general conclusions were the same, the process and the implications of the two types of synthesis differed.

Strengths and Weaknesses of Textual Narrative Synthesis Methods

A textual narrative approach typically groups studies into more homogeneous groups. This technique has been particularly successful in synthesising different types of research evidence (e.g. qualitative, quantitative, economic). Examples include a number of reviews carried out by the Evidence for Policy and Practice Information and Co-ordinating Centre (EPPI-Centre) [21–23], reviews of tobacco use and exposure to tobacco smoke [24], reviews of ultrasound in pregnancy [25] and of communication between health care professionals and patients about prescribing [26].

In our review, the textual synthesis proved a useful way to describe difference in the included studies, making explicit the diversity in study designs and contexts. The textual narrative review also described gaps in the literature, both by showing where evidence was absent and by making an evaluation of the strength of evidence in different areas. Using this method enabled us to comment on, for example, the ethnic uniformity of participants, and the lack of evidence collected regarding mode of feeding.

However, transparency remained a problem. For example, decisions about which sub-groups to use for synthesis of individual studies rely on judgements, albeit ones which can be informed by the scientific literature and by lay views. While we sought to make the decision making process clear, interpretation and judgement, which are not fully susceptible to external scrutiny, lie at the heart of the process.

Strengths and Weaknesses of Thematic Synthesis

The strengths of the thematic synthesis lie in its potential to draw conclusions based on common elements across otherwise heterogeneous studies. This synthesis is potentially more accessible for the reader than a textual synthesis. Conclusions from this thematic synthesis fulfil an important research aim of qualitative research in generating hypotheses, an area to which traditional systematic reviews are poorly suited [27].

However, pooling findings in the thematic synthesis risks masking the shortcomings of the individual studies that make up the review. Although descriptions of study characteristics and quality appraisal were presented alongside synthesised findings, the synthesis process obscured these in the conclusions. We believe that further debate about the reliability of this approach would be useful. On the one hand, the hypotheses that emerge from this synthesis draw on a broader body of views than any single study (as in a meta-analysis) and may therefore increase reliability; on the other, we risk making strong conclusions based on a group of studies none of which is in itself reliable on the grounds of quality or diversity of context. This method may also be poor at examining contradictions, as well as commonalities, in the data and at highlighting gaps in the evidence.


The selection of synthesis method for systematic reviews such as this may depend on the aims of the synthesis. For the purpose of generating future research hypotheses, the thematic synthesis appears to hold the greatest potential, describing common themes and providing a possible structure for new research. In contrast, the textual narrative synthesis might be better suited to reviews which aim to describe the existing body of literature, identifying the scope of what has been studied, the strength of evidence available, and gaps that need to be filled.