Epidemiology can be defined as the study of the occurrence and distribution of disease and its determinants [1]. In a broader approach, the areas of research in epidemiology include disease definition, occurrence, causation, outcome, management, and prevention. The occurrence of a disease may be studied in relation to factors that can identify or predict the disease (diagnostic factors) or are thought to influence its occurrence (eg, prognostic or etiologic factors). Furthermore, the association between a particular intervention and a change in the occurrence of the disease is of importance [2].

In a recent review on rheumatic diseases, Gabriel and Michaud [3•] acknowledged that until recently, very few studies had been conducted on the epidemiology of gout. Nevertheless, the rising incidence [4, 5], the burden of disease associated with gout, and the frequency of gout as a comorbidity in patients with multiple morbidities highlight the importance of gaining a better understanding of the etiology, pathology, and management of gout through epidemiologic studies.

This article focuses on the challenges of epidemiologic studies that aim to estimate the occurrence of gout in terms of prevalence or incidence. Such studies require large sample sizes, especially in heterogeneous populations. However, it is not only sample size that needs consideration when conducting epidemiologic studies on the occurrence of gout. This article considers some other deliberations. First, the definitions used in studies of gout are discussed, as well as the differences between diagnostic and classification criteria. Next, the criteria regularly used in epidemiologic studies are reviewed and critically appraised. Finally, several additional challenges encountered when interpreting results of epidemiologic research on gout are addressed.

Definition of Disease

The gold standard to diagnose gout is to demonstrate the presence of monosodium urate monohydrate (MSU) crystals in synovial fluid at the time the patient experiences a gout attack [6]. However, this is easier said than done in clinical practice, as gout patients are often seen by general practitioners [7] or specialists other than rheumatologists, who rarely perform a synovial fluid analysis to demonstrate urate crystals [8]. Reasons for nonperformance include lack of expertise, limited access to polarizing microscopes, lack of time, or the concern of getting a “dry tap” [9]. If a joint aspiration is performed, several difficulties remain, as both false-positive and false-negative results may occur [10]. In some cases, cholesterol crystals may appear as needle-shaped birefringent crystals [11, 12]. Furthermore, it is debatable whether one can diagnose a patient as having gout when only one or two MSU crystals are seen, or whether one should perform a second joint aspiration if no crystals are detected the first time. On the other hand, Lumbreras et al. [13] showed that when observers are trained in crystal detection and identification, their results are usually consistent.

Although synovial fluid examination remains best practice in clinical medicine, in the context of epidemiologic studies, this may not be feasible, especially in view of the intermittent nature of gout and the sample sizes that are needed in population studies.

In studies, the terms gout flare, chronic gout, and acute gout are regularly used. However, a comparison of their definitions, where these are provided at all, suggests that different authors use these terms inconsistently. To facilitate the comparability of study results, clear definitions of the various manifestations and stages of gout are therefore needed.

Taylor et al. [14••] proposed key components of a standard definition of gout flares using the Delphi methodology. The final list of elements includes a swollen, tender, and warm joint; patient self-report of pain and global assessment; time to maximum pain level; time to complete resolution of pain; functional status; and an acute-phase marker. This definition specifically aims to be used in clinical trials.

Another frequently used term is chronic gout. Interestingly, although a core set of outcome domains to be used in clinical trials was proposed by OMERACT (Outcome Measures in Rheumatology Clinical Trials) [15], the group did not actually provide a clear definition of chronic gout. Instead, it agreed on serum urate, gout flare recurrence, tophus regression, joint damage imaging, health-related quality of life, musculoskeletal function, patient global assessment, participation, safety, and tolerability as core outcome domains to be assessed in trials on gout.

For Choi et al. [16], acute gout is typically intermittent, whereas chronic tophaceous gout develops after years of acute intermittent gout. However, they point out that tophi can also be part of the initial presentation. This is in line with the American College of Rheumatology (ACR) criteria for acute gout, which incorporate the item “more than one attack of acute arthritis” as well as suspicion of a tophus.

Others state that after a certain undefined number of attacks, a patient has reached a stage called recurrent gout. Then the attacks come more frequently and last longer. If a patient cannot recover from the flares, it becomes chronic gout. In that case, there is an almost permanent state of inflammation and pain. Some authors have a more inclusive definition of chronic gout, incorporating all patients who have had more than one attack of acute gout. Acute gout, in that case, is synonymous with gout flare. The underlying reasoning is that once a patient has had a flare, a persistent metabolic disorder exists.

In addition to these deliberations, one might think of gout as a continuum of increasing severity. Currently, a clear definition of severity is lacking. A first step would be to define the domains that are of importance when deciding on the severity of gout. Of interest is a recent study that explored which variables are associated with patients’ as opposed to physicians’ assessment of gout severity [17]. It was found that physicians base their judgment of severity on the presence of tophi, frequency of gout attacks during the past year, recent serum uric acid levels, and rheumatologist utilization, whereas patients’ judgment of severity is associated with concerns regarding gout during an attack and the time since the last gout attack. It will be a challenge to try to measure each of these domains and to define thresholds to distinguish levels of severity based on the selected domains. This may also include addressing issues such as total load of uric acid and total load of tophi.

In summary, the question remains whether intermittent or recurrent gout must be distinguished from chronic gout and, if so, at which point acute or recurrent gout develops into chronic gout. Can we speak of chronic gout after a number of attacks or after a specific period of recurrent attacks, or only when a patient has bone destruction and chronic synovitis, even during remission of acute flares? Should we make a distinction between chronic gout and tophaceous gout, and should we distinguish different levels of severity of gout?

Currently, no clear answers to these questions exist, and the lack of insight into the natural course of gout among all patients complicates this issue. Some patients may not recall attacks of symptomatic gout, and patients with tophaceous gout may no longer experience acute attacks. This underlines the problem of assessing the true prevalence of gout. Moreover, the gold standard for diagnosing gout does not discriminate between different “stages” of gout. Clearly, a need exists for consensus on definitions that help distinguish the different manifestations of the disease.

Diagnostic Criteria Versus Classification Criteria

The title of this article may seem contradictory because in epidemiologic studies, classification criteria rather than diagnostic criteria are applied. Classification criteria aim to define homogeneous groups of patients with a particular disease. These criteria can be used to select patients for clinical (interventional) studies, to compare the results of clinical trials, or to assess the occurrence of a disease in epidemiologic studies [18]. In contrast to diagnostic criteria, classification criteria do not have the purpose of early detection of a disease in an individual patient [19]. Instead, classification criteria are used to detect established cases.

As with diagnostic tests, the usefulness of criteria is assessed by calculating their sensitivity and specificity. Sensitivity is the percentage of individuals with a certain disease correctly classified as “ill” (true positives). Specificity is the percentage of individuals without the disease correctly labeled as “not ill” (100% minus the percentage of false positives). If the sensitivity and specificity of criteria are both 100%, diagnostic and classification criteria are the same [19]. Note that diagnostic criteria would require sufficient sensitivity in early stages of the disease to enable early diagnosis. However, the nature of medicine makes it unlikely that there will ever be tests that offer 100% sensitivity and specificity. Therefore, misclassification poses a challenge, and the type of misclassification that is least desirable will depend on the setting in which the test is applied.
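The two definitions above can be written out as simple arithmetic on a two-by-two table. The following is a minimal sketch in Python; the class name, function names, and counts are hypothetical and serve only to illustrate the calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Criteria result cross-tabulated against the reference standard."""
    true_pos: int   # diseased, criteria fulfilled
    false_neg: int  # diseased, criteria not fulfilled
    false_pos: int  # not diseased, criteria fulfilled
    true_neg: int   # not diseased, criteria not fulfilled


def sensitivity(c: ConfusionCounts) -> float:
    """Share of diseased individuals correctly classified as ill."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: ConfusionCounts) -> float:
    """Share of non-diseased individuals correctly classified as not ill."""
    return c.true_neg / (c.true_neg + c.false_pos)


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = ConfusionCounts(true_pos=70, false_neg=30, false_pos=42, true_neg=158)
print(f"sensitivity = {sensitivity(example):.2f}")
print(f"specificity = {specificity(example):.2f}")
```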

In health care, physicians must identify which disease a patient has rather than whether a disease exists at all [19]. They do not want to misdiagnose a patient and may prefer high sensitivity combined with acceptable specificity. In contrast, epidemiologic studies in large populations aim to study homogeneous groups that are likely to have the diagnosis of interest and that include few false positives. Researchers must therefore balance specificity and sensitivity and will often sacrifice some sensitivity in exchange for better specificity.

Of course, any misclassification, which is common in classification criteria, is undesirable [19]. An approach to minimize misclassification is the use of cut-off points. In deciding on a cut-off point, one has to choose between a sensitive approach, which admits more false positives, and a more specific approach, which results in a more homogeneous group but more false negatives [19].
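The trade-off at different cut-off points can be made concrete with a small sketch. It assumes, purely for illustration, that each person is scored by the number of criteria items fulfilled and is classified as a case when the score reaches the cut-off; the scores and the function name are hypothetical.

```python
def sens_spec_at_cutoff(diseased_scores, not_diseased_scores, cutoff):
    """Return (sensitivity, specificity) when score >= cutoff counts as a case."""
    true_pos = sum(score >= cutoff for score in diseased_scores)
    true_neg = sum(score < cutoff for score in not_diseased_scores)
    return true_pos / len(diseased_scores), true_neg / len(not_diseased_scores)


# Hypothetical number of criteria items fulfilled per person.
diseased = [4, 5, 6, 6, 7, 7, 8, 8, 9, 10]
not_diseased = [1, 2, 2, 3, 3, 4, 4, 5, 6, 7]

for cutoff in (4, 6, 8):
    sens, spec = sens_spec_at_cutoff(diseased, not_diseased, cutoff)
    print(f"cut-off {cutoff}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

In this toy data set, a low cut-off captures every diseased person at the cost of many false positives, whereas a high cut-off excludes all non-diseased persons but misses more than half of the true cases.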

Classification criteria often are developed by comparing groups with the disease of interest with control patients having other (usually related or resembling) diseases that should be taken into account in the differential diagnosis. However, one must keep in mind that if these criteria are applied in population studies, the positive predictive value (PPV) may decrease, especially when the prevalence of the disease of interest is low. The PPV is defined as the number of individuals with a true positive test result divided by all individuals with a positive test result (true positives + false positives). In other words, it indicates the probability that in case of a positive test, the patient truly has the specified disease. The value of PPV depends on the prevalence of the disease of interest in the particular setting and will decrease when the prevalence goes down, due to the increasing number of false positives.
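The dependence of the PPV on prevalence follows directly from its definition. A minimal sketch in Python is given below; the sensitivity and specificity of 90% are assumed values for illustration and do not refer to any particular set of criteria.

```python
def positive_predictive_value(sens: float, spec: float, prevalence: float) -> float:
    """PPV = true positives / (true positives + false positives)."""
    true_pos = sens * prevalence
    false_pos = (1.0 - spec) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Assumed test characteristics, held constant while prevalence falls.
SENS, SPEC = 0.90, 0.90

for prevalence in (0.50, 0.10, 0.02, 0.01):
    ppv = positive_predictive_value(SENS, SPEC, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}")
```

With these assumed values, the PPV falls from 90% when half of those tested have the disease to under 10% when only 1 in 100 does, although the criteria themselves are unchanged.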

Thus, when applying the criteria for gout, a disease with a relatively low prevalence at the general population level, it is important to keep in mind that the estimated prevalence may be overestimated due to the unintended inclusion of false positives.
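The size of such an overestimate can be sketched by computing the apparent prevalence, that is, the proportion of a population that fulfills the criteria, from an assumed true prevalence. The sensitivity, specificity, and prevalence values below are assumptions chosen for illustration only.

```python
def apparent_prevalence(sens: float, spec: float, true_prevalence: float) -> float:
    """Proportion classified as cases: true positives plus false positives."""
    true_pos = sens * true_prevalence
    false_pos = (1.0 - spec) * (1.0 - true_prevalence)
    return true_pos + false_pos


# Assumed values: 90% sensitivity, 95% specificity, true prevalence 1.5%.
observed = apparent_prevalence(sens=0.90, spec=0.95, true_prevalence=0.015)
print(f"true prevalence 1.5%, apparent prevalence {observed:.1%}")

# With perfect specificity but imperfect sensitivity the bias reverses.
underestimate = apparent_prevalence(sens=0.70, spec=1.0, true_prevalence=0.015)
print(f"true prevalence 1.5%, apparent prevalence {underestimate:.2%}")
```

Under these assumptions, a specificity of 95% is enough to make the apparent prevalence roughly four times the true value, because the false positives are drawn from the very large non-diseased majority.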

Overview of Criteria

In this section, criteria to assess the prevalence of gout in epidemiologic studies are described. For this purpose, PubMed was searched using the search terms “gout,” “incidence,” “prevalence,” and “epidemiology.” Only original articles describing the prevalence and incidence of gout were considered. The EULAR (European League Against Rheumatism) criteria, which are purely diagnostic criteria and intended for use in individual patients with arthritis and not for use in groups, are excluded.

In 1963, the Rome criteria for gout were proposed during a symposium on population studies (Table 1). These and the 1966 New York criteria, which are a modification of the Rome criteria, are based on expert opinion and were intended for application in epidemiologic studies [18]. They rely heavily on the presence of tophi and the observation of MSU crystals in synovial fluid, which causes some feasibility issues. This is probably why both criteria sets are rarely used in large epidemiologic studies. The Rome criteria have only been used in two population studies assessing the prevalence or incidence of gout, once as an interview [20] and once as a questionnaire [21], and both times in combination with a physical examination. The New York criteria have been used in several population studies in the same way as the Rome criteria [21–24].

Table 1 Classification criteria for gout

It should be noted that for use in population surveys, the items that make up criteria likely need to be rephrased into questions that are answerable by patients using questionnaires or participating in interviews.

Currently, the most frequently used methods to identify people with gout in epidemiologic studies are the ACR criteria (formerly the American Rheumatism Association criteria), applied using an interview approach, and ICD-9 coding.

The ACR criteria for gout were developed to achieve a uniform system for reporting and comparing data from studies (Table 1) [6]. They were developed by comparing different sets of criteria among gout patients and patients with classic rheumatoid arthritis of 2 years’ or less duration, definite or classic rheumatoid arthritis of more than 2 years’ duration, pseudogout, or acute septic arthritis. All patients had been diagnosed by rheumatologists. As such, the ACR criteria for gout focus on acute arthritis of primary gout and can be used in single patients as well as in population surveys [6].

In large studies on the occurrence of gout, the ACR criteria are often applied by interviewing patients with or without a standardized questionnaire, or by chart review of medical records. Although the ACR criteria were developed for the diagnosis of acute gout, they also have been used to identify so-called chronic gout, when patients fulfill the item tophi or radiographic abnormalities. Compared with the Rome and New York criteria, the ACR criteria rely less on the presence of tophi or identification of MSU crystals and even allow classification based on clinical criteria alone. Malik et al. [25] applied the ACR, New York, and Rome criteria in patients who had joint effusions in the setting of a rheumatology clinic. They asked patients whether they had experienced any of the clinical features of these three sets of criteria. The researchers found the highest specificity (89%) and PPV (77%) for the Rome criteria. However, the criteria were slightly less sensitive (67%). The New York criteria showed sensitivity and PPV of 70% and specificity of 83%. The ACR criteria (6 of 12 clinical items) had 70% sensitivity and 79% specificity and a PPV of only 66%. Clearly, one should not extrapolate such findings to an epidemiologic population study, because the PPV varies with the pretest probability, which, as mentioned previously, is highly dependent on the prevalence of the disease. Janssens et al. [26] compared the ACR criteria with synovial fluid analysis as a gold standard in monoarthritic patients presenting to primary care. Only patients who were suspected of having gout were included in the study. They found a PPV and a sensitivity of 80%, while specificity was 64%. According to Janssens et al. [26], these findings stress the importance of interpreting with caution the results of gout studies that made use of the ACR criteria.
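Why the clinic-based PPVs cannot be carried over to a population study can be sketched by solving the PPV formula for the prevalence implied by each reported triplet of sensitivity, specificity, and PPV, and then recomputing the PPV at a much lower prevalence. The figures are those quoted above from Malik et al. [25]; the population prevalence of 1.5% is an assumption for illustration, and because the published percentages are rounded, the results should be read as approximate.

```python
def implied_prevalence(sens: float, spec: float, ppv: float) -> float:
    """Solve the PPV formula for the prevalence in the validation sample."""
    false_pos_rate = 1.0 - spec
    return ppv * false_pos_rate / (sens * (1.0 - ppv) + ppv * false_pos_rate)


def positive_predictive_value(sens: float, spec: float, prevalence: float) -> float:
    """PPV = true positives / (true positives + false positives)."""
    true_pos = sens * prevalence
    false_pos = (1.0 - spec) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# (sensitivity, specificity, PPV) as quoted in the text from Malik et al. [25].
reported = {
    "Rome": (0.67, 0.89, 0.77),
    "New York": (0.70, 0.83, 0.70),
    "ACR": (0.70, 0.79, 0.66),
}
ASSUMED_POPULATION_PREVALENCE = 0.015  # assumption, for illustration only

for name, (sens, spec, ppv) in reported.items():
    clinic = implied_prevalence(sens, spec, ppv)
    population_ppv = positive_predictive_value(sens, spec, ASSUMED_POPULATION_PREVALENCE)
    print(f"{name}: implied clinic prevalence {clinic:.0%}, "
          f"PPV at assumed population prevalence {population_ppv:.1%}")
```

All three triplets imply that roughly one third of the clinic sample had gout, a pretest probability far above anything expected in a general population survey.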

A common method used to estimate the prevalence of gout is the use of large medical databases in which diseases are registered by ICD-9 coding. Examples of databases used in gout research include patients’ medical record systems, administrative claims, and insurance programs. Advantages of such databases are the large numbers available at low cost and the efficient time investment. Disadvantages are that it remains unclear how the diagnosis was made by the variety of health professionals involved and that the results are difficult to generalize because the denominator is often unclear. Malik et al. [27] evaluated whether the accuracy of the ICD-9 code for gout could be documented in three databases (National Patient Care Database, Pharmacy Benefits Management Database, and the Clinical and Administrative Database) by identifying patients with two ICD-9–coded encounters for gout during a 6-year time period. They found that the majority of gout diagnoses recorded by ICD-9 could not be validated by identifying the items of the ACR, New York, or Rome criteria in medical records. This discrepancy may be caused by inadequate documentation in medical records, inaccurate diagnostic coding, or the inappropriateness of current criteria. According to Malik et al. [27], poor documentation in medical records rather than inaccurate diagnostic coding is the more likely explanation. Harrold et al. [28] analyzed a random sample of medical records of patients with two or more coded diagnoses of gout from four managed care plans. The PPV of two or more ambulatory claims (during a time period of 5 years) for a diagnosis of gout was assessed using the investigators’ rating of the presence or absence of definite or probable gout as the gold standard. The PPV turned out to be 61%. Increasing the required number of visits to three or four did not substantially improve the PPV. Explanations for the disappointing PPV include the ambiguity of a diagnosis of gout compared with, for example, a firmer diagnosis of myocardial infarction; the assignment of an ICD code before the diagnosis was firmly established; the underutilization of synovial fluid analysis; and inadequate documentation in medical records [28].

Self-reported disease, sometimes supplemented with information from other medical sources, is often applied in epidemiologic studies of gout. However, it is difficult to distinguish between questionnaires that inquire about physician-diagnosed gout and questionnaires that inquire about manifestations typical of gout based on the existing criteria described above. Furthermore, such questionnaires vary in the time frame in which gout occurred, which can be one or more attacks at some point in the past or several attacks in a specific (limited) period of time preceding the survey. Miller et al. [29] analyzed the agreement between self-reported diseases and ICD-9 coding. These authors indicated that self-report is fairly reliable. However, only 50% of self-reported cases of arthritis could be confirmed by ICD coding [29]. Reasons for this lack of agreement may be that responders interpreted a question incorrectly or reported a diagnosis that was never actually established or that they recalled inaccurately. Conversely, if patients are not receiving medication or other treatment, the diagnostic code for a certain condition may not be written down in the record [29]. Miller et al. [29] pointed out that acute events that occurred in the past and conditions that are episodic in nature are not always captured if the reviewing period is too short.

It should be noted that questionnaires are often operationalized through an interview approach. Although self-completed questionnaires may cover a large population in a relatively short time period at low cost, the downside is a possible low response rate. Using an interview approach, it is possible to ensure that all questions are answered in the correct manner. However, this method is more prone to interviewer bias and interviewer variability [1]. Bergmann et al. [30] reported that the agreement between a face-to-face interview and a self-administered questionnaire was moderate (κ, 0.61). Less serious, less defined, or less persistent diseases such as gout may be perceived by patients as not being important enough to report in questionnaires [30].
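The κ statistic quoted above measures agreement beyond what would be expected by chance. The sketch below shows the calculation for two yes/no assessments of the same people; the counts are hypothetical and are not taken from Bergmann et al. [30].

```python
def cohens_kappa(both_yes: int, only_first_yes: int,
                 only_second_yes: int, both_no: int) -> float:
    """Cohen's kappa for two binary ratings of the same individuals."""
    n = both_yes + only_first_yes + only_second_yes + both_no
    observed = (both_yes + both_no) / n
    # Chance agreement from the marginal "yes" and "no" proportions.
    first_yes = (both_yes + only_first_yes) / n
    second_yes = (both_yes + only_second_yes) / n
    expected = first_yes * second_yes + (1 - first_yes) * (1 - second_yes)
    return (observed - expected) / (1 - expected)


# Hypothetical counts: interview (first) versus questionnaire (second).
kappa = cohens_kappa(both_yes=40, only_first_yes=10,
                     only_second_yes=10, both_no=140)
print(f"observed agreement 90%, kappa = {kappa:.2f}")
```

In this example, the two methods agree for 90% of respondents, yet κ is considerably lower because much of that agreement would be expected by chance when most respondents answer "no" on both occasions.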

Of interest might be the diagnostic rule for acute gouty arthritis recently developed by Janssens et al. [31•]. It is intended to be applied in primary care and to obviate joint aspiration. Based on validated clinical variables, and using synovial fluid analysis as a reference test, a multivariate logistic regression model was developed. Subsequently, the authors developed two models based on external knowledge and availability of the tool in clinical practice. Their final model includes seven variables: male sex, previous patient-reported arthritis attack, onset within 1 day, joint redness, first metatarsophalangeal joint involvement, hypertension or one or more cardiovascular diseases, and serum uric acid level exceeding 5.88 mg/dL (0.35 mmol/L). Although developed for use in primary care, the diagnostic rule may be useful in a research setting. However, it is not known how well this model performs in a population study (work in progress).

Interpretation of Results

In addition to the above considerations that are critical in the appraisal of data on the occurrence of disease, several other issues merit consideration when appraising the results of such studies [32]. First is the question of which type of epidemiologic measure of occurrence was applied. The nature of the disease will influence which study design and measure of occurrence are most informative [1]. As gout is characterized mainly by acute flares of arthritis, the point prevalence estimate (Table 2) is not likely of primary interest. In fact, in a cross-sectional population study, the chance that someone will suffer from a gout attack at exactly the time of the survey is low. In this case, the period prevalence, which represents individuals who experienced one or more episodes over a specified period preceding the survey (Table 2), will be more informative. Only if “chronic gout” were defined in terms of persistent joint inflammation, presence of irreversible joint damage, or presence of tophi would the point prevalence be of interest.
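The difference between the two prevalence measures for an intermittent disease can be sketched with a few flare records. The cohort, dates, and function names below are hypothetical; a person counts toward the point prevalence only if a flare is ongoing on the survey date, and toward the period prevalence if any flare overlaps the preceding 365 days.

```python
from datetime import date, timedelta


def point_prevalence(flares_by_person, survey_date):
    """Share of people with a flare ongoing on the survey date."""
    cases = sum(
        any(start <= survey_date <= end for start, end in flares)
        for flares in flares_by_person.values()
    )
    return cases / len(flares_by_person)


def period_prevalence(flares_by_person, survey_date, days=365):
    """Share of people with at least one flare in the window before the survey."""
    window_start = survey_date - timedelta(days=days)
    cases = sum(
        any(start <= survey_date and end >= window_start for start, end in flares)
        for flares in flares_by_person.values()
    )
    return cases / len(flares_by_person)


# Hypothetical cohort of 10 people; most never had a flare.
cohort = {f"person_{i}": [] for i in range(1, 11)}
cohort["person_1"] = [(date(2023, 3, 1), date(2023, 3, 8))]
cohort["person_2"] = [(date(2023, 11, 20), date(2023, 11, 27))]
cohort["person_3"] = [(date(2021, 5, 1), date(2021, 5, 10))]  # before the window

survey = date(2023, 12, 1)
print(f"point prevalence:  {point_prevalence(cohort, survey):.0%}")
print(f"period prevalence: {period_prevalence(cohort, survey):.0%}")
```

In this toy cohort, nobody has an active flare on the survey date, so the point prevalence is zero even though two of the ten people had an attack during the preceding year.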

Table 2 Conceptual framework of incidence and prevalence in studies of gout

Another important concept in epidemiologic studies is the incidence or incidence rate, which refers to the number of new cases of gout in a population (Table 2). Cumulative incidence refers to new cases of gout per year divided by all members of a cohort (ie, a closed population) who are at risk (ie, who never experienced any signs of gout before the observation period). In contrast, incidence density refers to new cases of gout per person-year in a dynamic population, such as the inhabitants of a region or municipality (Table 2).
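The two incidence measures differ only in their denominators, which the following sketch makes explicit. All counts and follow-up times are hypothetical and chosen for easy arithmetic.

```python
def cumulative_incidence(new_cases: int, at_risk_at_start: int) -> float:
    """New cases during the period divided by cohort members at risk at its start."""
    return new_cases / at_risk_at_start


def incidence_density(new_cases: int, follow_up_years) -> float:
    """New cases divided by the total person-years of observation."""
    return new_cases / sum(follow_up_years)


# Closed cohort: 2000 people free of gout at baseline, 6 new cases in 1 year.
ci = cumulative_incidence(new_cases=6, at_risk_at_start=2000)
print(f"cumulative incidence: {ci * 1000:.1f} per 1000 persons per year")

# Dynamic population: 3000 people observed for 1.5 years each on average
# (people move in and out, so individual follow-up times would normally differ).
follow_up = [1.5] * 3000
density = incidence_density(new_cases=9, follow_up_years=follow_up)
print(f"incidence density: {density * 1000:.1f} per 1000 person-years")
```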

It is also important to carefully consider the population that has been studied [2]. As for any epidemiologic study, the sample should be representative of the population of interest; this requires insight into the sampling frame and participation rate. The participation rate should ideally be at least 80%; however, a rate between 60% and 80% with a description of the nonresponders is often considered acceptable. Furthermore, to guarantee the representativeness of the samples in studies on gout, it is important to take into account sources of (selection) bias, such as the age, sex, and race of the study population. Determining the prevalence in a predominantly older male population will overestimate the occurrence of gout in the general population. One also should be aware of confounding factors such as alcohol consumption, body mass index, and comorbidities. Obesity, diabetes, and hypertension are common among patients with gout [33], and the prevalence of the metabolic syndrome is higher in patients with gout than in those without [34, 35].
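The effect of sample composition on a prevalence estimate can be illustrated with direct standardization, a common remedy that is not discussed in this article and is introduced here only as an example. The strata, counts, and population shares below are hypothetical.

```python
# Each stratum: (label, sample size, gout cases in sample, share of target population).
# All numbers are hypothetical; the sample deliberately over-represents older men.
strata = [
    ("men, 65 years or older", 500, 30, 0.08),
    ("men, under 65 years", 200, 4, 0.41),
    ("women, 65 years or older", 200, 4, 0.10),
    ("women, under 65 years", 100, 1, 0.41),
]


def crude_prevalence(rows) -> float:
    """Cases divided by sample size, ignoring the composition of the sample."""
    return sum(cases for _, _, cases, _ in rows) / sum(n for _, n, _, _ in rows)


def standardized_prevalence(rows) -> float:
    """Stratum-specific prevalences weighted by target-population shares."""
    return sum(cases / n * share for _, n, cases, share in rows)


print(f"crude prevalence in sample: {crude_prevalence(strata):.1%}")
print(f"standardized prevalence:    {standardized_prevalence(strata):.1%}")
```

In this example, the crude estimate is about twice the standardized one solely because half of the sample consists of older men, the stratum with the highest prevalence.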

Conclusions

Although the pathophysiology of gout is relatively well understood, it is surprisingly difficult to define good classification criteria for use in large population studies to validly assess the prevalence and burden of gout. This is partly due to the typically intermittent nature of the disease, which limits the ability to use the demonstration of MSU crystals as the gold standard in large epidemiologic studies.

Also challenging is the absence of clear insight into the natural course of the disease, which would require better definitions of the manifestations and stages of gout. Although there is general agreement that gout is likely the result of a longstanding metabolic disorder that eventually leads to clinically manifest gout, it is less clear how many patients will progress to tophaceous gout and develop joint damage.

In view of the aforementioned considerations, literature data on the prevalence of gout are surprisingly consistent. In developed countries, estimates vary between 1% and 2% [36]. Nevertheless, as discussed previously, different levels of misclassification must have occurred in these studies. This will hamper interpretation of the results, especially in light of risk factors and comorbidities associated with gout. More precise estimates of the prevalence and burden of gout require addressing the validity of classification criteria and proper definitions of the various manifestations and stages of severity of gout.

McAdams et al. [37] recently reported that self-report of physician-diagnosed gout has good reliability and sensitivity and that this method may therefore be appropriate for epidemiologic studies.