Background

Psoriasis is a chronic, non-contagious skin disease that commonly leads to the appearance of red, scaly patches on the skin. Psoriatic arthritis is a chronic, disabling inflammatory disease associated with psoriasis. In patients with psoriatic arthritis, the immune system attacks the body's own joints, leading to joint destruction associated with cartilage deterioration, bone damage and joint fusion. The prevalence of psoriasis is around 2-3% of the world population. It causes considerable morbidity, significantly affecting the quality of life of those suffering from the disease[1-5]. Psoriasis is linked with psychological distress[6], depression[7, 8], pain and physical disability[9]. In addition, it carries significant economic implications, owing to the direct costs of management and the costs associated with productivity losses[10-13]. Furthermore, there is some evidence to suggest that psoriasis and psoriatic arthritis may be associated with the development of heart disease, cancer and infections leading to premature death[14-17].

There are a number of systemic treatments for psoriasis and psoriatic arthritis which have been examined in numerous randomised controlled trials (RCTs)[18, 19]. However, while RCTs are considered the gold standard for evidence-based decision making, it has been argued that observational studies have an important role in the measurement of effectiveness, longer-term outcomes, rare adverse events, and other outcomes requiring a more naturalistic study environment, for example the measurement of resource use and health-related quality of life (HRQOL)[20, 21]. RCTs are generally designed to test efficacy and safety. Although efficacy and effectiveness both address the issue of whether a particular intervention works or not, efficacy assesses whether an intervention works under optimal circumstances, whereas effectiveness assesses whether an intervention works in usual care. Effectiveness is meant to be a more pragmatic measure that addresses the utility of a drug as it is actually employed in practice; to measure effectiveness it is therefore necessary to mirror a real-world environment as much as possible. RCTs often use narrow inclusion criteria and exclude patients with specific co-morbidities. In addition, sample sizes can be restricted and follow-up periods short. Such design characteristics mean that RCTs often have low external validity (the extent to which results can be generalised to the wider population), which limits their use in guiding treatment in routine clinical practice.

An observational study, by definition, is a study in which the investigators do not seek to intervene, only to observe the course of events. Changes or differences in one characteristic (e.g. whether or not people received systemic treatment) are studied in relation to changes or differences in other characteristics (e.g. whether or not HRQOL improved), without action by the investigator. Such studies have high external validity but lower internal validity than RCTs. Results are more generalisable, but it is more difficult to attribute differences in outcomes between comparison groups to the particular intervention or characteristic under observation, because of potential differences in baseline patient characteristics or because of losses to follow-up. It is therefore important for observational studies to be well designed and constructed and to employ techniques to minimise susceptibility to bias. Of the three types of observational study (cohort, cross-sectional and case-control), the cohort study stands at the top of the hierarchy of clinical observational evidence, as it measures events in temporal sequence and can thereby more easily distinguish cause from effect. It is the most appropriate method to measure the incidence of specific events, the natural history of the disease, changes in health states and use of healthcare resources.

Observational studies can play an important role in the decision-making process. The National Institute for Clinical Excellence stresses that decision-makers need to assess and appraise all the available evidence, regardless of whether it has been derived from a RCT or an observational study. In the United States, comparative effectiveness research (i.e. the direct comparison of existing health care interventions to determine which work best for which patients and which pose the greatest benefits and harms) assesses effectiveness in patients typical of day-to-day clinical care; the focus is therefore on 'real life' studies rather than RCTs. Such comparative effectiveness research is being employed by the government to improve the quality of health care whilst reducing the rising costs. Both approaches have their strengths and weaknesses, and it is important for decision-makers to understand these when using the evidence to inform the appropriate use of interventions in routine clinical practice[22]. Response to treatment in patients with psoriasis is unpredictable, and patients often become resistant to treatment. This leads to individualised treatment regimes. The restrictive nature of RCTs would not necessarily highlight the outcomes that would be seen in usual clinical practice, where patients are often exposed to a number of different treatment regimes before a response is achieved. Also, as some of the treatments are associated with potentially serious side-effects, longer-term observational studies can provide important additional information to a variety of stakeholders, including clinicians, payers, providers and patients, when weighing up the risks and benefits of treatment.

We carried out a comprehensive review of large-scale, prospective cohort studies conducted on patients with psoriasis and psoriatic arthritis. Our aims were to (a) summarise the design characteristics, the interventions or aspects of the disease studied and the outcomes measured, and (b) investigate the methodological quality of the included studies.

Methods

We included prospective cohort studies of at least 100 adults with psoriasis or psoriatic arthritis. We included 'treatment' studies that focused on a particular intervention, drug or group of drugs, with any comparison, and 'non-treatment' studies that assessed the impact of psoriasis or psoriatic arthritis on morbidity, mortality, resource use or HRQOL. We excluded all studies with an experimental element (RCTs, open-label studies and open-label extensions). We also excluded retrospective studies, cross-sectional studies, studies of patients aged under 18 years and unpublished studies. We employed a cut-off of 100 patients to define 'large scale' because (a) a recent health technology assessment of the management of psoriasis employed this cut-off for observational studies[23] and (b) other studies have used this cut-off to define large-scale studies[24].

A systematic electronic literature search was conducted to identify published reports using the following databases: PUBMED (1965 to 2009), MEDLINE (1989 to 2009), the Cochrane Library (which includes Cochrane reviews, other reviews, clinical trials, methods studies, technology assessments and economic evaluations), the Centre for Reviews and Dissemination Database (which includes the Database of Abstracts of Reviews of Effects, the NHS Economic Evaluation Database and the Health Technology Assessment Database) and one internet search engine (Google). Additionally, the following databases were searched for ongoing and planned studies: the TRIP (Turning Research into Practice) database, the National Research Register, ClinicalTrials.gov, Current Controlled Trials, the Early Warning System and the Salford database of psoriasis trials. Search terms combined disease terms (psoriasis and psoriatic arthritis) with study types (cohort, epidemiologic, follow-up, longitudinal, prospective, registries, Phase IV, observational). Studies were restricted to those in humans and published in the English language. For example, the search string used in PUBMED and MEDLINE was ("Psoriasis/drug therapy"[Mesh] OR "Psoriasis/economics"[Mesh] OR "Psoriasis/epidemiology"[Mesh] OR "Psoriasis/prevention and control"[Mesh] OR "Psoriasis/statistics and numerical data"[Mesh] OR "Psoriasis/therapy"[Mesh] OR "Arthritis, Psoriatic/drug therapy"[Mesh] OR "Arthritis, Psoriatic/epidemiology"[Mesh] OR "Arthritis, Psoriatic/prevention and control"[Mesh] OR "Arthritis, Psoriatic/therapy"[Mesh]) AND ("Cohort Studies"[Mesh] OR "Epidemiologic Studies"[Mesh] OR "Follow-Up Studies"[Mesh] OR "Longitudinal Studies"[Mesh] OR "Prospective Studies"[Mesh] OR "Registries"[Mesh] OR "Clinical Trials, Phase IV as Topic"[Mesh] OR "open label"[All Fields] OR "observational").

Titles and abstracts from the initial search were reviewed to identify relevant papers. A full paper review was then conducted on all those that met the general inclusion criteria. Reference lists of relevant studies were also hand searched to identify additional data. Contact with authors was not deemed necessary for the questions posed in this systematic review. Of those papers thought to be eligible, data on study characteristics were extracted and tabulated. Information collected included study design, objectives, patients, outcome measures, results, statistical methods and funding sources.

A quality assessment was conducted on all included studies. Although there are now guidelines on the reporting of observational studies (Strengthening the Reporting of Observational Studies in Epidemiology - STROBE)[25], which guide authors on how to present their data, there are no consensus guidelines on the quality assessment of such studies. A multitude of tools exist that claim to assess the validity of published observational studies[26]. We devised our own quality assessment tool based on a number of papers, including the Downs and Black scoring system[27], the STROBE statement[25] and a recent systematic review of measures for assessing quality and susceptibility to bias in observational studies[26]. Each study was assessed against a list of 18 questions outlined in Table 1. All results were summarised descriptively.

Table 1 Quality assessment tool

Results

A total of 1018 papers were identified from the combination of searches (Figure 1). Fifty-eight papers were obtained for full paper review, of which 35 were identified as eligible for inclusion in the review. Reasons for exclusion included: experimental study design[28-43], sample size too small[44], patients originally recruited from a clinical trial[3, 45-51], retrospective identification of patients[14, 15, 52-54] and description of medications used with no attempt to assess outcomes[55].

Figure 1

Flow diagram of included studies.

The thirty-five papers relate to 16 observational studies, of which five were registry studies (Table 2). Nine were studies of psoriasis, six of psoriatic arthritis and one combined both conditions. Of the ten treatment-related observational studies, only four evaluated biological agents, with the other six examining traditional therapies. Only two provided a comparison between two treatments. The six non-treatment-related observational studies examined a number of characteristics of psoriasis and psoriatic arthritis, including mortality, morbidity, disease progression, cost of illness and aspects of HRQOL. Follow-up periods ranged from 3 months to 26 years.

Table 2 Characteristics of Included Studies

Table 3 outlines the clinical, patient-reported and cost measurements described in each of the studies. The main clinical outcome measure used in the psoriasis studies was the Psoriasis Area and Severity Index (PASI), with some studies using the self-administered version of this measurement (SPASI). In the psoriatic arthritis studies the most common clinical measurements were those relating to tender and swollen joint counts, with three of the studies using the disease activity score (DAS28) based on 28 tender and swollen joint counts. Eleven studies incorporated patient-reported outcomes into their analysis. The most common patient-reported outcome was the health assessment questionnaire (HAQ), used in six studies, followed by the SF-36, used in five studies. One study used a questionnaire on experiences with skin complaints (QES) and one used the Dermatology Life Quality Index (DLQI). Only two studies assessed health utilities, using either the EQ-5D or the SF-6D, and again only two studies reported information on costs.

Table 3 Clinical, patient-reported and cost measurements reported in included studies

Overall, the quality of the included cohort studies, measured against the checklist of 18 questions, ranged from 41% to 89% (taking into account questions that were not applicable to certain studies) (Table 4). Figure 2 outlines the proportion of the 16 studies that met each of the quality assessment criteria. The studies in general did well on a number of quality assessment questions, including having clear objectives, documenting selection criteria, providing a representative sample, defining the interventions/characteristics under study, defining and using appropriate outcomes, describing results clearly and using appropriate statistical tests (where described). However, the studies fell short on a number of other quality assessment criteria. Only one study reported a sample size calculation or reported whether the sample size was sufficient for the study objectives. Only a third described potential selection bias. Around 50% described potential confounders, and only a third adjusted for these potential confounders. Also, although over 60% reported losses to follow-up, less than a third made any adjustments for them in the analysis. Only around 60% of all studies identified and described a comparison group. Overall, the proportion of studies meeting each quality assessment criterion ranged from 10% (sample size calculation and sufficient power) to 100% (patient characteristics described, validity of outcomes and results clearly described).
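The overall quality scores above are percentages of applicable checklist items met, with non-applicable questions excluded from the denominator. As a minimal illustrative sketch of this calculation (the example answers below are hypothetical, not drawn from Table 4):

```python
# Illustrative sketch only: scoring one study against an 18-item checklist,
# where each answer is True (criterion met), False (not met) or None (not
# applicable). N/A items are dropped from the denominator, as in Table 4.

def quality_score(answers):
    """Return the percentage of applicable checklist criteria met."""
    applicable = [a for a in answers if a is not None]
    if not applicable:
        raise ValueError("no applicable criteria")
    return 100 * sum(applicable) / len(applicable)

# Hypothetical study: 18 items, 2 not applicable, 10 of the 16 applicable met.
example = [True] * 10 + [False] * 6 + [None] * 2
print(quality_score(example))  # 62.5
```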

Table 4 Quality assessment of included studies
Figure 2

Proportion (%) of studies meeting each of the quality assessment criteria.

Discussion

Three important conclusions can be drawn from this systematic review of large-scale, prospective, observational studies conducted in patients with psoriasis or psoriatic arthritis. First, very few large-scale, prospective, observational studies have been conducted, given the burden of these diseases on society and the recent introduction of biologic agents onto the market, and only two assessed a drug-versus-drug comparison. Psoriasis is the most prevalent autoimmune disease in the United States. It affects 125 million people worldwide (2-3% of the total population). Between 10 and 20% of people with psoriasis will develop psoriatic arthritis[56]. These conditions cause significant morbidity and have been associated with an increased risk of mortality compared with the general population[17]. They significantly affect a patient's HRQOL and ability to carry out normal activities[4], and the cost burden to society is substantial. In the United States psoriasis alone costs society $11.25 billion annually, with work loss accounting for 40% of this cost burden[57]. The recent introduction of biological therapies represents an important addition to the approaches used in the treatment of psoriasis and psoriatic arthritis; however, very few studies have assessed these agents in real-life situations, compared with the more traditional treatments, where many patient and provider factors not present in clinical trial environments can impact on effectiveness[58]. Also, in some countries these agents are registered for use in specific target groups of patients for whom evidence of efficacy and safety is not provided by currently published clinical trials[58]. Finally, clinical trial data only provide short-term evidence of efficacy and safety in a highly selected group of patients. For all these reasons, large-scale, long-term observational studies in real-life situations are needed to guide appropriate clinical and policy decision making.

Second, given the importance of collecting health economic data in a real-world environment[20], very few observational studies collected data on economic outcomes or patient utilities. In the general hierarchy of clinical evidence in healthcare decision making, RCTs remain the gold standard for evaluation. However, there are a number of situations where such studies may be unnecessary, inappropriate, impossible or inadequate[21]. The measurement of the effectiveness of a treatment, the longer-term outcomes of treatment (clinical and patient-reported), the true incidence of adverse events, and the resource use associated with treatment and its side-effects are all situations where a RCT design is inadequate. RCTs often use patients, treatments and healthcare professionals that are all atypical and, in addition, are often short-term. Resource use and patient utilities observed in RCTs may not reflect those likely to be observed in regular clinical practice, not least because closer monitoring of patients in a trial may lead to events being detected and treated sooner than would otherwise be the case. This higher level of care may result in a small number of patients not experiencing high-cost events that would be seen in everyday practice. In economic terms this is important, since economic data are often highly skewed: the removal of a few observations with very high costs can have a large effect on overall health economic results. Also, RCTs are often conducted in specialist centres. The recorded resource consumption seen in the trial will therefore reflect the practice policies of this particular health care setting, which may be very different from usual clinical practice. It is in such situations that observational cohort studies would provide more appropriate and informative health economic information, if conducted and analysed rigorously.

Third, of those studies included in this review, overall quality assessment was in general satisfactory; however, the majority of studies failed to take into account and adjust for potential biases caused by the lack of randomisation. Studies scored poorly on describing potential selection biases, identifying a comparison group, adjusting for confounders and losses to follow-up, and providing adequate sample size calculations. The key question posed in cohort studies is the comparison of outcomes between two groups of patients (e.g. those responding to treatment vs. those not responding to treatment). Just over 60% of the studies in this review actually defined a comparison group, be it the general population or a more restricted internal or external population. For those studies not providing a comparison, it is almost impossible to assess whether the results occurred by chance. Of those reporting a comparison group, most studies reported potential selection bias; however, only half accounted for confounders and only a third accounted for losses to follow-up. In those studies not addressing these issues of potential bias, results are likely to have very low internal validity. Adjusting for the potential bias caused by lack of randomisation is critical to the validity of cohort studies[59-61].

When interpreting the results of this systematic review it is important to note three issues. First, it is difficult to search systematically for observational studies, as search strategies that are both sensitive and specific do not exist for the major electronic databases. To overcome this problem we conducted a wide search and hand-searched the reference lists of key papers. Second, consensus guidelines on the reporting of observational studies (STROBE) have only recently been introduced[25]; therefore, for many studies published prior to these guidelines, it is often difficult to identify whether the paper is a true observational study. Many studies stated they were observational but in fact incorporated an experimental or 'open-label' element. Third, the cut-off of 100 patients to define 'large scale' may have meant that other important observational studies were excluded. However, only one study was excluded on the basis of sample size[44].

Large-scale, prospective cohort studies are not the only non-randomised method for capturing real-world health economic data. They are, however, if conducted rigorously, one of the best approaches to use, especially for non-rare outcomes over a relatively short period. A number of cross-sectional and case-control studies assessing cost, effectiveness and HRQOL have been conducted in patients with psoriasis and psoriatic arthritis. Cross-sectional studies are useful for assessing prevalence and describing specific characteristics of the disease, for example clinical and demographic characteristics, patient and provider perceptions of effectiveness, tolerability and compliance. However, unless they incorporate a retrospective element into their design, they are unable to distinguish between cause and effect and are therefore inappropriate for the measurement of effectiveness and health economic outcomes associated with an intervention. Retrospective elements of observational studies, for example retrospectively identifying patients or retrospective data collection (as in case-control studies), introduce an additional level of bias and are therefore often used for more descriptive studies or for hypothesis generation that can then be studied in a prospective observational study.

Looking outside true observational designs to studies which are non-randomised but incorporate an experimental element, we find a number of 'open-label' trials, some aiming to assess longer-term outcomes and others aiming to assess effectiveness in a more naturalistic setting. These studies are not observational, although many claim to be. They are experimental in that patients have been selected for inclusion in the trial and administered the trial treatment. Given that these studies are not governed by any consensus guidelines on reporting or quality control, the potential for error or bias is high and the results should be interpreted with caution. Included in these designs are 'open-label' extension studies. Patients in these studies represent a highly select group who have not only been selected on the basis of the original RCT, but have also completed the randomised element of the trial and agreed to participate in the extension study. Such selection processes not only introduce significant bias, but also lower even further the generalisability of the results to a wider population. In such studies, the use of inferential statistics to allow for the possibility that sampling or random error is the reason for an observed difference is crucial. However, in most extension studies assessing effectiveness in psoriasis or psoriatic arthritis, no such inferential statistics have been carried out[62-64]. Also included are 'open-label' studies which adopt a non-randomised approach from the start of the study. Again, these studies should be interpreted with caution for two main reasons: first, treatment is experimental and has therefore been selected by an investigator who is not independent of the study; and second, the patient will know which treatment they are being given. Both will introduce inadvertent bias into the outcome assessment.
Furthermore, it is essential that such studies conform to the same rigorous methods expected of true observational studies, in that the bias created by non-randomisation should be defined, explored and adjusted for. Currently, apart from one 'open-label' phase IV study assessing health economic outcomes which does account for confounding[34], most of the others do not[29, 36, 41].

Conclusion

There is a clear need for well-designed, large-scale, prospective observational studies in the field of psoriasis and psoriatic arthritis, particularly to assess the impact of traditional and biological agents on economic and patient-reported outcomes, and the factors that influence them, such as resistance and adherence, in a real-world environment. Several population-based registries are currently being set up for both psoriasis and psoriatic arthritis[58, 65-67]. However, while such registries will no doubt provide invaluable evidence on the long-term risks and benefits of new and old treatments, they fall short of providing adequate information on health economic outcomes. The recommended core datasets for registries include effectiveness measures[58, 67] and HRQOL measures[67], but no patient utilities and insufficient information with which to measure health care resource use or work productivity. Future observational studies measuring such outcomes would be a welcome addition to the scientific literature in this area and would provide invaluable information to patients, clinicians and policy makers.