Introduction

Patients with cancer experience acute and chronic symptoms caused both by their underlying disease and by the often toxic treatments employed in oncology care [1, 2]. Clinical investigators, regulators, and healthcare providers often focus on the prevention or cure of illness, whereas relief of symptoms is a paramount goal for patients [3, 4]. Symptoms are a common reason healthcare is sought and an impetus for testing to definitively diagnose an illness or injury. Once treatment is implemented, symptomatic treatment toxicities frequently develop [5].

While numerous studies have examined the impact of various treatments and medications on symptoms in patients receiving cancer treatment, a limitation in comparing these outcomes is the lack of a common or “core” set of patient-reported symptoms, consistently measured across studies. For example, one study may report an improvement in nausea with a specified treatment and another study may report worsening fatigue, leaving the clinician and patient comparing treatment efficacies and weighing side effects based upon disparate outcomes.

These differences in symptom outcomes between studies can be attributed in part to the use of varying symptom assessment questionnaires and strategies. While there is value in focusing on specific symptoms in particular clinical contexts (e.g., pain in patients with prostate cancer metastatic to bone), there is also value in assessing a common set of symptoms, both to characterize the broader impact of disease and treatment on the patient experience and to enable cross-trial comparisons and aggregation of data. Therefore, in 2011, the National Cancer Institute sponsored a scientific meeting to identify a standard core set of patient-reported symptoms to be recommended across cancer clinical trials, as well as existing questionnaires that are appropriate for assessing these symptoms. As a component of this initiative, a multi-disciplinary team conducted a systematic review of the literature to identify studies that measured the prevalence and severity of symptoms in patients undergoing cancer treatment.

Search methods

A systematic electronic search of PubMed was performed using the search terms “multiple symptoms” and “cancer” (Fig. 1). The search was limited to adults over the age of 18 years, English language articles, and the years 2001 to 2011. Search outputs including the abstract were reviewed independently by seven authors, and the full text of papers deemed potentially relevant for inclusion based on a priori criteria (i.e., containing evidence characterizing symptoms in patients being treated for cancer) was examined. Hand searches of papers in reference lists were also performed. Studies were included if they evaluated multiple symptoms in persons receiving active cancer treatment regardless of cancer site, stage, treatment, or geographic location. The initial search retrieved 55 publications, of which 19 were excluded as case studies [6–8], studies of single symptoms [9–12], symptoms at the end of life [13, 14], presenting symptoms at the time of cancer diagnosis [15, 16], or were otherwise not research reports focused on multiple symptoms in persons with cancer [17–24].

Fig. 1 Search methodology

In 2009, Esther Kim et al. [25] published a literature synthesis of cancer symptomatology in 18 studies. This paper and four additional literature reviews [26–29] identified in our systematic search were excluded from this analysis. Instead, the findings by Esther Kim et al. (which also contained these four papers) were compared to our findings (see “Discussion”). Unlike the Esther Kim et al. synthesis, which only included studies that used the MD Anderson Symptom Inventory (MDASI), Memorial Symptom Assessment Scales (MSAS), or the Symptom Distress Scale (SDS), the present analysis was intentionally not limited to any specific assessment instrument, so that a wide array of symptoms could be collated and analyzed across a broader range of studies.

Nine publications that failed to report symptom statistics [30–38] and one publication that was a secondary study of a sample already included in this analysis [39] were excluded, resulting in 21 studies included in this analysis [40–60]. Potentially relevant data were abstracted for the review including disease/treatment, sample, instrument used to measure symptoms, and symptom prevalence and severity. Where information about prevalence and severity was not presented, the authors were contacted by e-mail to obtain the necessary detail. Data were managed using Excel.
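The selection counts reported above can be tallied as a quick consistency check. The sketch below uses only the numbers stated in the text; the grouping labels are paraphrases introduced here for illustration.

```python
# Counts taken from the search description in the text.
retrieved = 55

# Exclusion groups (labels are paraphrased for this sketch).
exclusions = {
    "case studies, single-symptom, end-of-life, presenting-symptom, "
    "or non-research reports": 19,
    "prior literature synthesis plus four additional reviews": 5,
    "no symptom statistics reported": 9,
    "secondary study of an already included sample": 1,
}

# Studies remaining after all exclusions.
included = retrieved - sum(exclusions.values())
print(included)  # 21
```

The result matches the 21 studies carried forward into the synthesis.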

Analysis and synthesis methodology

Table 1 provides a summary of the 21 studies that evaluated multiple symptoms in adult oncology patients receiving active treatment. Twelve of these studies used a cross-sectional design and nine employed a longitudinal approach that included randomized clinical trials, a cross-over design, or structured interviews. The length of follow-up in the longitudinal studies ranged from 7 days to 18 months, with assessment time points typically concurrent with important milestones such as treatment cycles or return to the home community following allogeneic hematopoietic stem cell transplantation (HSCT).

Table 1 Evidence table

An aggregation of the demographics for the studies is presented in Table 2, and the prevalence and severity of symptoms by paper are presented in Table 3. To allow for comparisons between the cross-sectional and longitudinal studies, where possible, mean prevalence was computed by averaging the reported values across the available time points in the longitudinal studies. When this was not possible due to missing data, the baseline values were entered into the analysis, as noted in Table 3.
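One way to express this rule is sketched below. The function name, the use of `None` to mark a missing time point, and the decision to fall back to baseline whenever any time point is missing are assumptions made for illustration; the text does not specify exactly when averaging was judged not possible.

```python
from statistics import mean
from typing import Optional, Sequence


def study_prevalence(time_points: Sequence[Optional[float]]) -> float:
    """Collapse a longitudinal study's prevalence values into one number.

    time_points[0] is the baseline value; None marks a missing time point
    (a hypothetical convention for this sketch).
    """
    if not time_points or time_points[0] is None:
        raise ValueError("a baseline value is required")
    if any(value is None for value in time_points):
        # Assumed fallback: use the baseline when any later value is missing.
        return time_points[0]
    # Otherwise average across all reported time points.
    return mean(time_points)


# Hypothetical values, not taken from any included study.
print(study_prevalence([40.0, 50.0, 60.0]))  # 50.0
print(study_prevalence([40.0, None, 60.0]))  # 40.0
```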

Table 2 Characteristics of papers included in synthesis
Table 3 Prevalence and severity of symptoms by paper

A similar procedure was used for tabulating severity, although mean severity by symptom was often not reported. In addition, many of the studies only reported those symptoms in which the severity was classified as moderate to severe. Because there was variation in the range of the severity scales among the instruments used in the studies, mean values were linearly transformed to a 0–10 scale to facilitate comparisons. Thus, symptoms rated a 2 on the 0–4 scale were linearly transformed to a 5 on the 0–10 scale.
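The rescaling can be illustrated with a short sketch. The text states only that the transformation was linear and gives the 0–4 example; the min-max form below, which also handles scales that do not start at 0, is an assumption, and the function name is hypothetical.

```python
def rescale_to_0_10(score: float, scale_min: float, scale_max: float) -> float:
    """Linearly map a score from [scale_min, scale_max] onto a 0-10 scale."""
    if scale_max <= scale_min:
        raise ValueError("scale_max must be greater than scale_min")
    return 10.0 * (score - scale_min) / (scale_max - scale_min)


# The example from the text: a 2 on a 0-4 scale becomes a 5 on the 0-10 scale.
print(rescale_to_0_10(2, 0, 4))  # 5.0
```

A score already reported on a 0–10 scale is left unchanged by this mapping.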

Lastly, to assess whether the most commonly used instruments asked about the symptoms most compelling to patients, thresholds based on the number of patients queried for each symptom were constructed. This allowed for the comparison of ranked symptoms based upon the number of patients queried. Symptoms were ranked by frequency and severity at three thresholds: those assessed in 100 or more people, those assessed in 500 or more people, and finally those assessed in 1,000 or more people. The lowest threshold of 100 people had the most symptoms ranked, while the threshold of 1,000 people limited the number of symptoms. Some symptoms found in this synthesis were not queried in at least 100 patients and were therefore not included in this rank ordering of symptoms by threshold level.

Characteristics of these studies

Data were extracted from the reported studies to develop a pooled sample of 4,067 cancer patients in whom the prevalence and severity of individual symptoms were reported. Individual studies contributed from 16 to 1,433 participants; nine (43 %) studies had a sample size of fewer than 100. All of the investigations employed convenience samples. Eleven (52 %) of the studies were conducted in the USA while the remainder were multinational as listed in Table 2. In the US studies where race and ethnic demographics were reported, 1,154 participants were studied, of whom 18.5 % were non-white. Studies from other nations did not distinguish race/ethnicity. In total, the pooled sample across the 21 studies was composed of 38 % males and 62 % females, with a mean age of 58 years (range 18 to 97 years).

A majority (62 %) of these studies assessed symptoms in homogeneous samples with respect to tumor site (predominantly breast and lung cancer), while 38 % of the included studies utilized samples with mixed diagnoses and treatment regimens. Table 2 summarizes the characteristics of participants in the included studies by cancer disease site, stage, and treatment type. Persons with breast cancer were the single largest group by cancer site (approximately 25 % of patients) followed by lung (approximately 20 %). In the pooled sample, 51 % of patients were classified as stage I–II and 46 % as having metastatic disease. The large majority of study participants were treated with chemotherapy, radiation, and/or surgery. Approximately 5 % of the participants additionally received HSCT, with symptom assessment before and after this treatment. For these two longitudinal studies, the frequency and severity of symptoms were averaged across the various assessment times for this synthesis.

Symptom assessment instruments

Eighteen instruments listed in Fig. 2 and structured interviews were used in the 21 studies included in this synthesis. Instruments included those measuring single symptoms, multi-symptom inventories, and single symptom items drawn from HRQOL or health status measures. The MD Anderson Symptom Inventory (MDASI) was the most commonly used instrument in the studies analyzed (n = 9 studies; 43 %), while the Functional Assessment of Cancer Therapy (FACT-G), Hospital Anxiety and Depression Subscale (HADS-D), Medical Outcomes Survey Short Form-36 (SF-36), and Symptom Distress Scale (SDS) were each employed in two studies. The remaining instruments were represented just once.

Fig. 2 Instrument list

Symptom summary

Forty-seven symptoms were identified across the 21 studies and were then categorized into 17 logical groupings. While many studies only reported the most prevalent or severe symptoms, some reported every symptom acknowledged by patients. In an attempt to be as inclusive as possible, all symptoms reported were included in this synthesis.

A summary of the prevalence and mean severity rates (when available) for these various symptoms is provided in Table 4. This table is organized in descending order according to how frequently each symptom grouping (presented in italics) was measured across studies. Thus, all of the studies in this synthesis measured and reported frequency and/or severity of the symptoms of fatigue and pain, 91 % of the studies similarly reported symptoms of sleep issues, while only 4.5 % reported on symptoms of hair loss or other appearance issues.

Table 4 Summary of pooled prevalence and pooled mean severity rates (when available)

The pooled prevalence and mean severities reported in Table 4 were calculated by aggregating all queries for a specific symptom across studies to provide a pooled prevalence per symptom. Where possible, symptoms of the same nature but labeled differently, such as anorexia and decreased appetite, were combined into one symptom. This was possible when only one term was used in a study, providing one statistic; however, some symptom terms that could be synonymous with each other were often reported as separate items in the same study. Examples include the terms “fatigue”, “lack of energy”, and “weakness”. Each is listed separately in this synthesis to assure the capture of constructs that may be related but distinct. Of note, while all of the studies measured and thus reported the symptom of fatigue, 59.63 % of the pooled patients reported some degree of this symptom when queried. Further, the severity of fatigue when assessed with such large pooled data was rated 4.62 on a 0–10 scale.
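A minimal sketch of one plausible pooling calculation follows. It assumes that "aggregating all queries" means dividing the total number of patients reporting a symptom by the total number queried, which is equivalent to a sample-size-weighted mean of the study-level prevalences; the text does not spell out the formula, and the function name and input values are hypothetical.

```python
from typing import Sequence, Tuple


def pooled_prevalence(studies: Sequence[Tuple[int, float]]) -> float:
    """Pool prevalence across studies.

    Each entry is (patients queried, prevalence in percent) for one study.
    Returns patients reporting / patients queried, in percent.
    """
    total_queried = sum(n for n, _ in studies)
    if total_queried == 0:
        raise ValueError("no patients were queried")
    total_reporting = sum(n * pct / 100.0 for n, pct in studies)
    return 100.0 * total_reporting / total_queried


# Hypothetical studies: 100 patients at 50 % and 300 patients at 70 %.
print(pooled_prevalence([(100, 50.0), (300, 70.0)]))  # 65.0
```

Under this assumption a large study moves the pooled figure more than a small one, which differs from a simple unweighted average of the study percentages (60 % in the example).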

Symptom prevalence

Given the large spread of sample sizes, the variety of instruments utilized, and the identification of 47 separate symptoms presented in Table 3, it was determined that the most appropriate method to rank symptoms in a systematic, unbiased, and interpretable fashion was to compare symptom prevalence by aggregated sample size thresholds. Three thresholds were constructed for combined samples of at least 100 patients, at least 500 patients, and at least 1,000 patients who were queried for the symptom. Thus, the most inclusive threshold was 100, taking into account all symptoms for which at least 100 patients across studies were queried. Excluded from this analysis were symptoms such as irritability, which was assessed in a total of only 85 patients in two studies, and headache, which was included in only one study of 31 patients.
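The threshold ranking can be sketched as a filter followed by a sort. In the example below, the nocturia figures (507 patients queried, 74.8 % prevalence) come from the text; `symptom_x` and `symptom_y` are hypothetical placeholders, and the type and function names are introduced here for illustration only. The comparison uses "at least" the threshold.

```python
from typing import List, NamedTuple


class SymptomStat(NamedTuple):
    name: str
    n_queried: int      # pooled number of patients asked about the symptom
    prevalence: float   # pooled prevalence in percent


def rank_by_threshold(stats: List[SymptomStat], threshold: int) -> List[str]:
    """Rank symptoms by prevalence, keeping those queried in >= threshold patients."""
    eligible = [s for s in stats if s.n_queried >= threshold]
    ranked = sorted(eligible, key=lambda s: s.prevalence, reverse=True)
    return [s.name for s in ranked]


stats = [
    SymptomStat("nocturia", 507, 74.8),    # figures reported in the text
    SymptomStat("symptom_x", 3000, 60.0),  # hypothetical placeholder
    SymptomStat("symptom_y", 85, 90.0),    # hypothetical placeholder
]

for threshold in (100, 500, 1000):
    print(threshold, rank_by_threshold(stats, threshold))
```

With these inputs, nocturia ranks first at the 100 and 500 thresholds and drops out at 1,000, while the placeholder queried in only 85 patients is never ranked despite its high prevalence.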

As depicted in Table 5, the rank order of prevalence changed depending upon the threshold level selected. For a threshold of 100 patients, nocturia was the most frequently reported symptom, followed by fatigue and cough. When the threshold was raised to a minimum of 500 patients assessed for the symptom, cough was replaced by insomnia/disturbed sleep. Finally, when the threshold was raised to 1,000 patients, fatigue, insomnia/disturbed sleep, and pain were the three most reported symptoms. Thus, as will be discussed later, while nocturia was the most prevalent symptom with 74.8 % of patients reporting it, it was queried in only 507 patients in two studies [45, 49].

Table 5 Rank ordering of symptom prevalence by sample threshold

Symptom severity

A similar approach was taken in evaluating symptom severity, and the results are presented in Table 6. At the sample threshold of 100 patients, worry had the highest mean severity, followed by sexual dysfunction and edema. By comparison, when the higher threshold of 500 patients was selected, most of the symptoms ranked at the 100-patient threshold were removed, and by the 1,000-patient threshold all but one symptom had been replaced, with the top three being fatigue, insomnia/disturbed sleep, and anorexia/appetite changes. As with symptom prevalence, fewer studies assessed worry (three studies) [41, 46, 51] and sexual dysfunction (two studies) [45, 51], resulting in a smaller patient pool, yet the compelling severity reported may indicate the need for further assessment as a routine symptom measure.

Table 6 Rank ordering of symptom severity by sample threshold

Discussion

Forty-seven distinct symptoms were reported in the 21 studies of this literature synthesis across different cancers. Most studies employed instruments that assessed for symptoms of fatigue and weakness (100 % of studies), pain (100 % of studies), sleep issues (91 % of studies), anorexia and weight loss (91 % of studies), GI issues such as nausea and vomiting (81 % of studies), affect issues such as depression and irritability (76 % of studies), and respiratory issues such as cough and dyspnea (76 % of studies). Fewer studies assessed symptoms related to urinary elimination issues (19 % of studies), skin and wound issues (19 % of studies), hot flashes/sweating (14 % of studies), sexual dysfunction (10 % of studies), fever (14 % of studies), and hair loss/appearance (5 % of studies), yet the prevalence and severity of some of these symptoms were greater than those of the more commonly assessed symptoms.

This disparity is most obvious when evaluating the prevalence and severity by threshold levels. Nocturia is an example of an infrequently measured symptom that patients report as highly prevalent (74.8 %) and moderately severe (4.07 on a scale of 0–10). Further analysis of the two studies that assessed for nocturia reveals that one study was completed in the USA and the other in Turkey, that adequate sample sizes of 220 and 287 were included, and that various cancer sites were involved, with lung being the predominant cancer [45, 49]. Given the heterogeneous nature of the studies, this finding suggests that nocturia is a symptom that has been infrequently assessed, but which necessitates further systematic evaluation and consideration as a symptom to assess in a broad range of cancers. Similar arguments can be made for sexual dysfunction [45, 51] and cough [45, 46, 48, 51, 53], each of which was reported by patients in disparate studies (nation, cancer site, and sample sizes) as highly prevalent and of moderate to high severity.

Findings from this literature synthesis are most useful when compared with the research synthesis by Esther Kim et al. [25]. Many similarities between the two syntheses exist, including the similar mean age of 58 and 59 years, a predominance of heterogeneous cancer diagnoses (62 % and 50 %), and, when reported for US studies, an approximate 18 % non-Caucasian sample. Differences include a greater range of sample sizes in this present analysis (16–1,433 compared to 26–527) and a greater proportion of females in this analysis (62 % vs. 48 %). Whereas Esther Kim et al. limited their review to only studies that employed the MDASI, MSAS, or the SDS, this present analysis included studies employing any symptom assessment tools.

Table 7 presents a comparison of the most prevalent symptoms in this synthesis to the top 10 identified by Esther Kim et al. These are presented in descending prevalence for this current synthesis. All but three of the symptoms (nocturia, outlook, and weakness) included in this synthesis were also included in the Esther Kim et al. synthesis. Unfortunately, the synthesis by Esther Kim et al. only published the aggregation of the top 10 symptoms. As presented in this table, while the vast majority of symptoms were assessed (as indicated by the instrument used), comparisons cannot be drawn because the pooled prevalence numbers for all but the top 10 symptoms were not published. In comparing our findings to those of Esther Kim et al., similarities in the prevalence of fatigue, insomnia, dry mouth, tiredness, feeling nervous, distress, and depression were noted; however, differences observed between the rates of irritability, pain, and worry are evident. Further, highly prevalent symptoms such as nocturia, lack of energy, outlook, cough, anorexia, dyspnea, and difficulty concentrating did not make Esther Kim et al.’s list of the most prevalent symptoms. The Kim study did not aggregate severity data.

Table 7 Comparison of prevalence statistics between this literature synthesis and Esther Kim et al.

Limitations and future directions

As with any literature synthesis, findings must be tempered by acknowledging publication bias, as only published manuscripts were included in this literature synthesis. Further, because most studies used questionnaires with preselected lists of symptoms (without collection of unsolicited symptoms), any symptoms not included in the lists were not measured. Therefore, systematic under-reporting of symptoms which could be prevalent and severe in a population is possible in the included studies. An example of this is nocturia, which when elicited was prevalent and severe, but was infrequently systematically assessed in studies. Future studies should include a two-step methodology, starting with qualitative work in which patients in a given population are interviewed in groups or individually to determine likely prevalent and severe symptoms, followed by questionnaire administration in a larger sample. Over-representation of studies in breast, colorectal, and lung cancer may have yielded a disproportionate influence on the results, and work parallel to this paper supported by the NCI is evaluating prevalence and severity of symptoms by cancer type and stage.

It is unclear whether symptoms reported in this analysis are attributable to the morbidity of cancer, to side effects of treatment, to accumulated toxicities of prior treatment, or to comorbidities. As previously noted, this is a synthesis of cross-sectional and longitudinal studies in which symptoms in the longitudinal studies were averaged across the measurement times, and no control was made for cross-sectional sampling related to the time of symptom measurement (i.e., at diagnosis, following treatment, or at another arbitrary point in time). In addition, attribution is beyond the scope of this paper, but is a salient consideration because the impetus for measuring symptoms in a given clinical trial may be contingent on the cause of the symptom. For example, a trial seeking to evaluate whether a cancer-related symptom improves with active treatment may yield a negative result if the principal driver of measured symptoms is toxicity or comorbidity. Therefore, any given clinical trial aiming to measure symptoms should provide a rationale specific to the population and interventions regarding why particular symptoms were selected, and their suspected cause and hypothesized direction of change.

This synthesis is focused on the prevalence and severity of symptoms and did not assess measures of health-related quality of life domains such as enjoyment of life or physical functioning. Future work will evaluate these areas.

Conclusion

Symptoms are prevalent and severe among patients with cancer. Therefore, any clinical study seeking to evaluate the impact of treatment on patients should consider including measurement of symptoms. Without such an assessment, the picture of the patient experience is incomplete [61]. Symptoms may be due to various etiologies, and understanding their trajectories in a given context is essential to profiling both the benefits and harms of treatment. This study demonstrates that a discrete set of symptoms is common across cancer types. This set may serve as the basis for defining a “core” set of symptoms to be recommended for elicitation across cancer clinical trials, particularly among patients with advanced disease. Indeed, the NCI is currently engaged in such an activity, which serves as the impetus for this review. It is notable that a number of existing multi-symptom questionnaires already include a set of pre-specified symptoms. It is the authors’ hope that the data included in this review will assist in the design of future studies and questionnaires, and in improved methods for assessing the prevalence and severity of symptoms in cancer populations.