Background

Why is a systematic review needed?

Gradients in the incidence of a disorder across time and across place can provide powerful clues to help unravel the aetiology of that disorder [1]. Numerous studies over the last few decades have presented original data on the incidence of schizophrenia, and several scholarly reviews have collated these studies [2–5]. Until recently, reviews of observational and experimental studies have used standard narrative approaches. However, over the last decade there has been a growing appreciation that reviews should be based on data that are as complete and as free of bias as possible [6]. Data from treatment or intervention studies have often been pooled using meta-analysis [7]. Even without pooling of data, the orderly sorting of data with meta-analytic techniques can provide useful insights into the structure of the relevant literature [8]. The need for such an exercise with respect to the incidence of schizophrenia was recognized by Jablensky [5] in an influential review: "Ideally, a meta-analysis involving a standardized recalculation of the rates from many previous studies should generate a distribution allowing one to estimate with some probability the extent to which populations differ" (p 219). We believe that a systematic review of the incidence of schizophrenia will help inform the research community, and thus contribute to the generation of hypotheses about the aetiology of schizophrenia.

Ways to measure the incidence of schizophrenia

The incidence of a disease is a measure of the number of new cases that occur in a population over a given period of observation (see Additional File 1 for a glossary of definitions of terms and abbreviations used in this review). Traditional incidence studies (henceforth referred to as core studies) establish a rate based on the general population residing within a defined catchment area. National or health district catchment areas have often provided convenient boundaries regarding access to services and minimal out-of-area leakage. However, it should be noted that the boundaries chosen for incidence studies may not be optimal for the detection of variations of the disorder within or between various populations. In other words, there is no reason to believe that the genetic or environmental factors associated with the incidence of schizophrenia respect political or health-district boundaries. Nor should we expect that these risk factors are uniformly distributed within such boundaries.

Apart from general population-based studies, there are other studies that provide insights into the incidence of schizophrenia in subgroups of the general population. One type of population subgroup is the cohort defined by the year of birth. Subjects born in a particular time period (for example, 1966) can be identified and then tracked prospectively in order to identify the number of individuals who subsequently develop schizophrenia by a certain age. Cohorts can also be defined retrospectively. For example, a researcher may use a population register to identify all adults within a defined catchment area (for example, national boundary) who were born within a certain time period. Cross-linking the population register with a mental health register can be used to identify cohort members who received treatment for schizophrenia up until a certain date. Cohort studies can provide information on the incidence of a disorder up to various ages within that cohort.

In addition to cohorts, the incidence of schizophrenia can also be examined in subgroups of the population defined by other criteria. Reporting the incidence of schizophrenia in subgroups of the population is standard practice in core incidence studies. For example, most studies report incidence rates for persons (that is, the general population) and then males and females separately (that is, subgroups of the population). In recent years there has been interest in subgroups defined by migrant status and risk of schizophrenia. Some studies report incidence rates in certain subgroups only (for example, twins, the deaf, those who belong to a certain religion or ethnic group, those aged 65 years or older). We have labelled these studies as Other Special Groups. Rates from migrant and Other Special Groups will be assessed separately in this review.

Studies from countries with comprehensive cross-linked national registers have been able to present incidence data based on person-years. Rather than present the number of new cases per 100,000 per year, these studies present the number of new cases that have accumulated over several years against a person-years denominator. This denominator takes into account the size of the underlying population sample, its age structure and mortality over the duration of observation. These studies will be considered separately from the core studies.
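The difference between a crude annual rate and a person-years rate can be illustrated with a minimal sketch. The function names and all figures below are hypothetical and are not drawn from any included study; the crude rate is computed under the simplifying assumption of a fixed population over the recruitment period.

```python
def crude_annual_rate(new_cases, population, years, per=100_000):
    """New cases per `per` population per year, assuming a fixed population."""
    return new_cases / (population * years) * per


def person_years_rate(new_cases, person_years, per=100_000):
    """New cases per `per` person-years of observation."""
    return new_cases / person_years * per


# Hypothetical example: 150 new cases over 2 years in a catchment of 500,000.
crude = crude_annual_rate(new_cases=150, population=500_000, years=2)

# Hypothetical register-based denominator: deaths and ageing out of the
# at-risk range leave 980,000 person-years rather than 1,000,000.
by_person_years = person_years_rate(new_cases=150, person_years=980_000)

print(f"crude annual rate: {crude:.1f} per 100,000")
print(f"person-years rate: {by_person_years:.1f} per 100,000 person-years")
```

With the same numerator, the two rates differ only through the denominator, which is why the two types of rate are not directly interchangeable.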

Finally, a systematic review such as this has to decide where to 'draw the line' between studies primarily designed to assess the incidence of schizophrenia, versus those studies primarily designed to compare incidence data with respect to the presence or absence of a candidate risk factor. For example, a study may compare the incidence of schizophrenia in a cohort exposed to different levels of urbanicity at birth, season of birth and family history [9]. Data related to the incidence of schizophrenia can often be found in studies exploring candidate nongenetic risk factors such as prenatal famine [10], influenza [11] and obstetric complications [12]. Such studies can provide additional data related to the incidence of schizophrenia but will not be covered in this review.

Key research questions related to the incidence of schizophrenia

What is the range of incidence rates?

The most influential study of the incidence of schizophrenia has been the WHO 10 Nation study [13]. This landmark study, which employed uniform methodology across sites, provided incidence data from eight sites (seven nations). When narrow criteria for schizophrenia were used (CATEGO S+), the incidence ranged from seven to 14 per 100,000 (Aarhus, Denmark to Nottingham, UK, respectively), while the range for ICD9 schizophrenia was 16 to 42 per 100,000 (Honolulu, Hawaii to the urban Chandigarh, India, respectively). Both definitions found at least a two-fold difference between the highest and lowest sites, and this difference for the broad (but not narrow) definition was statistically significant. Regardless of whether the incidence rates for the eight sites were significantly different, the authors of the study drew attention to the relatively narrow range of incidence rates identified in this study [13].

The results of the WHO 10 Nation study have often been misinterpreted as providing strong proof that the incidence of schizophrenia does not vary between sites. For example, Crow has stated, "The evidence points to the singular conclusion that, contrary to almost any other common condition, the incidence of schizophrenia is independent of the environment and a characteristic of human populations." (p119) [14]. In a recent narrative review, Jablensky [15] restates the issue more concisely: "The general conclusion is that according to the great majority of studies, the prevalence and incidence rates of schizophrenia are similar across populations. However, a small number of populations have been identified that clearly deviate from this central tendency. The magnitude of these deviations is modest compared with the difference observed across populations with regard to other multifactorial diseases such as diabetes, ischaemic heart disease or cancer, where 10- to 30-fold differences in prevalence across populations are not uncommon." (p212). This systematic review will examine the central tendency and spread of data related to the incidence of schizophrenia.

Sex differences

Several reviews have drawn attention to sex differences in key epidemiological features of schizophrenia [16–18]. Recently, a systematic review and meta-analysis reported a higher incidence of schizophrenia in men versus women [17]. This systematic review will compare the distributions of rates for males versus females, and examine the central tendency and distribution of the male to female risk ratio.

Urban versus rural place of residence

There is evidence that urban settings are associated with higher rates of schizophrenia. While there are post-onset factors such as selective migration that may contribute to a higher prevalence of schizophrenia in cities, it is not clear if urban living is associated with a higher risk of developing schizophrenia. Building on the classic studies of Faris and Dunham [19], several recent studies have reported that being born in an urban versus rural region was associated with an increased risk of developing schizophrenia [9, 20–23]. In addition, there is evidence from a Danish study that the number of years spent in an urban area during childhood increased the risk of developing schizophrenia [24]. However, because of the general urbanization of most nations over recent decades, it is difficult to disentangle the critical window of exposure for urbanicity [25]. A study from the Netherlands [26] looked at urbanicity of residence around the time of first admission as well as urbanicity at time of birth. This study reported that for those born in rural regions, urban residence around time of onset was not associated with an increased risk of schizophrenia. However, urban birth with or without urban place of residence around the onset of illness was associated with an increased risk. This systematic review will separate out incidence studies from urban, rural and mixed urban/rural settings and compare the distributions of these rates.

Migrant status

Since the pioneering studies by Ödegård in 1932 [27], the association between migrant status and increased risk of schizophrenia has stimulated a great deal of research and a wealth of creative explanatory hypotheses [28]. More recently the issue has been put back on the agenda by research from the United Kingdom showing higher rates in both Afro-Caribbean migrants [29] and other migrant groups [30]. While studies from the Netherlands [31], Denmark [32] and Sweden [33] have supported an increased risk of schizophrenia in migrants, not all studies have found this association [34, 35]. This systematic review will compare the distributions of rates for native-born individuals versus migrant groups, and examine the central tendency and distribution of the migrant to native-born risk ratio.

The influence of methodology on incidence rates

Systematic reviews can explore possible sources of heterogeneity in data sets by sorting the data according to various rules and comparing the resulting distributions. Design features clearly need to be taken into account when comparing rates derived from different types of incidence studies.

When comparing rates derived from different catchment areas, it is important to note the population age structure within each site. The age of onset of schizophrenia is not randomly dispersed across the lifespan (peak onset in the second and third decades of life), thus sites with a younger population will yield higher incidence rates. In order to deal with this issue, some studies use techniques to adjust the raw incidence rate against a standard or reference population age structure, that is, age standardization. This provides a more valid way to compare the incidence of a disorder in two or more populations with different age distributions.
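The effect of age structure can be sketched with direct standardization, one common technique for this adjustment (the included studies did not necessarily use this method, and many did not report their method at all). The age bands, populations, case counts and equal-weight standard population below are all hypothetical.

```python
def crude_rate(cases, population, per=100_000):
    """Crude rate: all cases over the whole population."""
    return sum(cases) / sum(population) * per


def directly_standardized_rate(cases, population, standard, per=100_000):
    """Weight each age-specific rate by the standard population's age share."""
    total = sum(standard)
    weighted = sum(
        (c / n) * (w / total) for c, n, w in zip(cases, population, standard)
    )
    return weighted * per


# Hypothetical age bands: 15-34, 35-54, 55+.
# Both sites share the same age-specific rates (30, 10 and 2 per 100,000),
# but site A has a much younger population than site B.
site_a_pop, site_a_cases = [600_000, 300_000, 100_000], [180, 30, 2]
site_b_pop, site_b_cases = [200_000, 300_000, 500_000], [60, 30, 10]
standard = [1, 1, 1]  # hypothetical reference population, equal weights

crude_a = crude_rate(site_a_cases, site_a_pop)
crude_b = crude_rate(site_b_cases, site_b_pop)
std_a = directly_standardized_rate(site_a_cases, site_a_pop, standard)
std_b = directly_standardized_rate(site_b_cases, site_b_pop, standard)

print(f"crude:        A = {crude_a:.1f}, B = {crude_b:.1f}")
print(f"standardized: A = {std_a:.1f}, B = {std_b:.1f}")
```

In this constructed example the crude rates differ roughly two-fold even though the underlying age-specific risk is identical, and the difference disappears once both sites are referred to the same age structure.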

Apart from corrections related to age standardization, completeness of case identification is a critical feature in incidence studies. Thus one would predict that studies that cover a wider spectrum of potential recruitment sites (for example, inpatient and outpatient settings, general practitioners, other health care providers) would be able to identify more new cases. Method of diagnostic confirmation (for example, face-to-face interview versus chart diagnosis), diagnostic criteria (CATEGO S+ versus ICD9) and age range (all ages versus 15 to 54 years) can also influence the rate of schizophrenia [13].

In the field of systematic reviews, scores are often allocated to reflect desirable features related to the validity of the study [36]. In the field of descriptive epidemiology, the use of quality scores to rank order studies is relatively untested [37–39]. There are several features of quality scores that need to be kept in mind. The criteria used need to be easily operationalized and reliable, thus they tend to involve simple, categorical judgments (criteria met versus not met). Unfortunately, it has been shown that important items such as "Is the sample size adequate?" cannot be reliably operationalized [38]. It is self-evident that the ability to judge the quality of the study is contingent on the quality of the reporting of that study.

Finally, some commentators have noted that more recent studies have reported lower rates, suggesting that the incidence of schizophrenia is changing over time [40–42]. We explored this issue in the data set, and predicted that more recent studies would report rates that were lower compared to older studies.

Aims and hypotheses of this study

As part of a wider systematic review of the incidence and prevalence of schizophrenia, here we present those data and analyses related to the incidence of schizophrenia. We will present the methods of the analyses and the key characteristics of the included studies (divided into core studies, cohort and Other Special Groups). We will describe the distribution of rates and then sort these rates in order to examine several main hypotheses. Based on the literature described above, we predict that: (a) males will have higher rates compared to females; (b) studies based on urban sites will have higher incidence compared to studies from rural or mixed urban-rural sites; and (c) migrant groups will have higher rates compared to native born. In addition, the influence of selected methodological features (quality score, year of study, scope of coverage, method of case finding, diagnostic criteria, age range, the use of age-standardization) will also be explored.

The over-riding aim of this research is to ensure that the highest quality evidence informs the scientific discourse about the incidence of schizophrenia. Therefore, we have made the data used in this systematic review freely available to the research community. The data used in the analyses are available in both Access and Excel files attached to the electronic version of this paper.

Methods

Identification of studies

As part of the wider study of the incidence and prevalence of schizophrenia, a broad search string ((schizo* OR psych*) AND (incidence OR prevalence)) was used in MEDLINE, PsycINFO, EMBASE and LILACS. Titles and abstracts, if available, were reviewed in order to exclude irrelevant studies. Potentially relevant papers were accessed in order to review the full text. The references cited by each potentially relevant paper, as well as the citations in major review papers and book chapters, were scrutinized in order to locate additional potentially relevant papers. Posters were presented at two international schizophrenia conferences [43, 44] in order to encourage researchers to contribute studies, especially studies from the 'grey literature' (for example, conference reports, theses, government reports, unpublished studies). Subsequently, letters or emails were sent to the senior authors of papers that met the inclusion criteria. In these letters, which included the most recent list of included studies, authors were asked to inform us of missing papers and unpublished data.

Included studies

We included studies first published between January 1965 and December 2001 that reported primary data on the incidence of schizophrenia (according to any diagnostic criteria), drawn from either a general population sample or a subgroup of the population. Where multiple publications presented identical data, the most informative version of the study was included and the other related papers were excluded (full list available on request). At the time of submission, papers that had not yet been located were allocated to the 'Awaiting Assessment' category.

Data extraction

Once a study was included, data were extracted related to study-level variables (for example, authors, year of publication, site, urban/rural status, recruitment duration and years covered, case finding method, method of confirming diagnosis, diagnostic criteria), and rate-level variables (for example, rates for persons, males, females, rate of different diagnostic criteria, different age-specific rates). Full rate-level data for these variables can be found in the Additional File 6 (Excel format) and Additional File 7 (Zipped Access format). Two or more of the authors checked all data used in the analysis. When disagreements arose, these were resolved by consensus. If required, we contacted the authors for clarification of issues.

Studies were given quality points for certain features (for example, greater coverage, use of diagnostic criteria, quality of diagnostic method, thoroughness of reporting, etc). Details of the quality score used in this review are provided in Additional File 2.

It is important to remain mindful that one study may generate many items of information on the incidence of schizophrenia. Some of the rates are discrete (non-overlapping) such as those related to sex, epoch (for example, 1976 versus 1977) or location (Chandigarh urban versus Chandigarh rural).

Some of the rates overlap, such as those for age range (all age versus age-specific age bands), diagnostic criteria (ICD criteria and DSM criteria), and epoch (1976 alone versus average for 1975–1980).

Sorting the rates by the application of sequential filters

In order to deal with the range of study level and rate level variables identified by the systematic review, the data were categorized and then filtered by a series of criteria. For example, the first filter parsed rates from the included studies into four categories: (a) core studies, (b) migrant studies, (c) cohort studies, and (d) Other Special Groups. Next, a study-level filter was applied in order to isolate discrete data from multiple studies that overlapped in both time and place. This was required in order to eliminate several papers counting the same subjects more than once. We selected one paper from overlapping papers according to rules that prioritized the most 'informative' paper. Priority was given to studies with: (a) larger catchment areas, (b) larger general populations, and (c) longer duration of recruitment.

For studies that presented rate level data for the same individuals according to various criteria, a third filter was designed in order to select one representative rate for inclusion in the cumulative distribution. Once again, we selected one rate according to rules that prioritized the most 'informative' rates. Priority was given to rates that covered (a) the widest age range, (b) crude incidence rates rather than age-standardized rates (because more papers presented only raw rates and those that did present age-standardized rates rarely provided information about the method of standardization), (c) diagnostic criteria that were more prevalent in the included studies (for example, as ICD criteria were most often used, for studies that presented rates according to ICD and any other criteria, we chose the ICD rate; for studies that presented multiple CATEGO diagnoses, we selected the broadest SPO+ clinical). Details of the study-level and rate-level selections are provided in Additional Files 6 and 7. These rules were applied by two or more of the authors, and when disagreements arose, the decision was made by consensus.
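The rate-level filter can be sketched as a priority ordering over candidate rates from one study. This is an illustrative sketch only: the `Rate` record, its field names and the strictly lexicographic ordering of the three rules are assumptions made for the example, and in the review the rules were applied by the authors with disagreements resolved by consensus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """Hypothetical record for one reported rate (per 100,000)."""
    value: float
    age_min: int
    age_max: int
    age_standardized: bool
    criteria: str  # for example "ICD", "DSM", "CATEGO"


def priority(rate, preferred_criteria="ICD"):
    """Sort key: (a) widest age range, (b) crude over age-standardized,
    (c) the diagnostic criteria most prevalent in the included studies."""
    return (
        rate.age_max - rate.age_min,
        not rate.age_standardized,
        rate.criteria == preferred_criteria,
    )


def select_representative(rates):
    """Return the single most 'informative' rate among overlapping rates."""
    return max(rates, key=priority)


# Hypothetical overlapping rates reported by one study.
candidates = [
    Rate(12.0, 15, 54, False, "ICD"),  # narrower age range
    Rate(16.0, 0, 99, True, "ICD"),    # age-standardized
    Rate(15.0, 0, 99, False, "DSM"),   # less prevalent criteria
    Rate(17.0, 0, 99, False, "ICD"),   # widest range, crude, ICD
]
chosen = select_representative(candidates)
print(chosen)
```

Encoding the rules as a tuple makes the order of precedence explicit: a later rule only breaks ties left by the earlier ones.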

Presentation and analyses of the data

For practical purposes, it is not possible to show every numerator, denominator and rate for the studies in printed tables, thus we will provide the total number of rates per study, and, where available, show the largest rate level numerator and its associated denominator for persons (or, if this is not available, the range for males and females).

The distribution of incidence rates is presented in cumulative plots. The distribution of the data is shown in rank order for incidence rate (lowest to highest ranks) with the cumulative percent of rates shown on the vertical axis. The plots show the 50% (median), and 25% and 75% quantiles (within which lies the inter-quartile range). We wish to draw the reader's attention to several features of these graphs. Firstly, the central, near-linear segment of the cumulative distributions may extend beyond the inter-quartile range (for example, from the 10%–80% quantiles), thus shape features (where the tails start, the range of the linear central segment) can be more informative than traditional inter-quartile ranges. Secondly, steeper segments of the cumulative plots are underpinned by rates that have a narrow distribution, while flatter (that is, more horizontal) segments of the distribution are underpinned by rates that are relatively more dispersed. Finally, some distributions are derived from more data than others. Regardless of slope, that is, steep or flat, if many rates underpin segments of the distributions, these segments warrant closer investigation, especially in analytic contexts, compared to sparsely plotted segments of the distribution.
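The construction of such a cumulative plot can be sketched as follows. The rates are hypothetical, and the "inclusive" interpolation rule used for the quartiles is an assumption of the sketch rather than a description of the software used in the review.

```python
import statistics


def cumulative_percent(rates):
    """Pair each rate, in rank order, with the cumulative percent of rates."""
    ordered = sorted(rates)
    n = len(ordered)
    return [(rate, 100 * (rank + 1) / n) for rank, rate in enumerate(ordered)]


# Hypothetical incidence rates per 100,000.
rates = [22, 5, 15, 8, 45, 10, 30, 12, 18]

points = cumulative_percent(rates)
q25, q50, q75 = statistics.quantiles(rates, n=4, method="inclusive")

for rate, percent in points:
    print(f"{rate:>5} {percent:6.1f}%")
print(f"25%, 50% and 75% quantiles: {q25}, {q50}, {q75}")
```

Plotting the first element of each pair on the horizontal axis against the second on the vertical axis gives the cumulative curve; closely spaced rates produce a steep segment and widely spaced rates a flat one.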

The data will be presented for persons, as well as for males and females separately. The main analyses will be based on discrete (non-overlapping) core rates. Readers familiar with meta-analyses may expect to see figures displaying confidence intervals and pooled data, as well as read information related to formal tests exploring heterogeneity in the data. For several reasons, the data in this review do not lend themselves to these types of analyses. Among the 68 discrete core studies (see below), only four [31, 45–47] supply confidence limits, and no other measures of spread (for example, standard errors) are reported. We were able to derive the standard error of the crude annual incidence rate from studies that reported the corresponding numerator, denominator, and duration of recruitment. Based on these calculations, we were only able to generate standard errors for 117 of the 373 (31.4%) rates, drawn from only 45 (45.0%) of the discrete core studies. Faced with such a restricted pool of standard errors, the ability to assess the heterogeneity of rates in a manner generalizable across all core studies is compromised. In addition, the issues that underlie the decision to combine data from randomized controlled trials or risk-factor epidemiological studies are of less relevance to incidence rates. For example, should incidence studies from very large populations (for example, one study from former East Pakistan has a population denominator of 51 million) [48] be allowed to exert several hundred times more influence on analyses than 'smaller' studies?
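One way in which a standard error can be derived from a numerator, denominator and duration of recruitment is sketched below. The formula used in the review is not stated here; the sketch assumes that the case count follows a Poisson distribution, so that its standard error is the square root of the count. All figures are hypothetical.

```python
import math


def crude_rate_and_se(cases, population, years, per=100_000):
    """Crude annual rate and its standard error under a Poisson assumption."""
    person_time = population * years
    rate = cases / person_time * per
    se = math.sqrt(cases) / person_time * per  # assumption: Poisson counts
    return rate, se


# Hypothetical study: 100 new cases in a population of 250,000 over 4 years.
rate, se = crude_rate_and_se(cases=100, population=250_000, years=4)

# Approximate 95% interval from the normal approximation.
lower, upper = rate - 1.96 * se, rate + 1.96 * se
print(f"rate = {rate:.1f}, SE = {se:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

A study that reports only a rate, without its numerator, denominator or duration, cannot be handled this way, which is consistent with standard errors being available for only a minority of rates.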

However, merely presenting the graphical displays of rate distributions does not allow a formal test of the hypotheses outlined above. No matter how appealing the descriptive approach, we are inevitably drawn into inferential statistics, and are forced to choose the least biased and most transparent method to compare the distributions. The analysis of the data in this systematic review poses two particular issues. Firstly, rates drawn from the same study tend to be more alike than those from different studies. Thus any comparison between distributions needs to control for the fact that one study may generate more than one rate (that is, the within-study variance). Secondly, because the distribution of the rates was often positively-skewed, the data required log-transformation. In order to aid interpretation of the distribution of the rates, Tables 1, 2 and 3 present three measures of central tendency: the median, the arithmetic mean and the harmonic mean (which is a better indicator of the central tendency of the log-transformed data). As a consequence of these two issues, the statistical comparisons between distributions may not always concur with the visual interpretation of the raw distributions.
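The three measures of central tendency named above can be computed for a positively skewed sample as in the following sketch. The rates are hypothetical and chosen only to show how a long upper tail pulls the arithmetic mean away from the other two measures.

```python
import statistics

# Hypothetical, positively skewed incidence rates per 100,000.
rates = [8, 10, 12, 15, 20, 40, 90]

median = statistics.median(rates)
arithmetic_mean = statistics.mean(rates)
harmonic_mean = statistics.harmonic_mean(rates)

print(f"median          = {median:.1f}")
print(f"arithmetic mean = {arithmetic_mean:.1f}")
print(f"harmonic mean   = {harmonic_mean:.1f}")
```

In this sample the two highest rates dominate the arithmetic mean, whereas the median and harmonic mean stay close to the bulk of the data.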

Table 1 Characteristics (quantiles and moments) of incidence rates per 100,000, for rates for persons, males and females, and male to female rate ratio
Table 2 Characteristics (quantiles and moments) of incidence rates per 100,000 by sex and urbanicity
Table 3 Characteristics (quantiles and moments) of incidence rates per 100,000, by Migrant Status, and Migrant to Native-born rate ratio by sex

Results

The electronic search identified 834 papers, while manual reference checking identified an additional 249 references. We received responses from 52 authors (see Acknowledgements for full list), who provided an additional 41 references (Figure 1). At the level of potentially relevant papers, 74% were identified from electronic sources. We identified 98 studies in Languages Other Than English (LOTE). After translation, 10 were included in the study. The subsequent culling and final distribution of the papers is shown in Figure 1.

Figure 1. Flow diagram (selection strategy) of included studies. LOTE, Languages other than English.

The rates were based on an estimated 176,056 potentially overlapping incident cases. Key details of the included studies are shown in Tables S1–S4, which can be found in Additional File 4. The systematic review identified 100 core studies [13, 31, 33, 40, 41, 45–61, 63, 65–127, 148, 180–192]. The 158 included studies were drawn from 32 countries. One study [13] provided rates for seven countries, and another provided rates for two countries [74]. Details of these studies are shown in Table S1 in Additional File 4.

There were 24 migrant studies [31, 33, 35, 121, 128–147], 23 cohort studies [9, 10, 21, 23, 42, 149–165, 193] and 14 studies that reported the incidence of schizophrenia in Other Special Groups [166–179]. Key features of these studies can be found in Tables S2, S3 and S4 in Additional File 4.

The review identified four studies that presented standardized incidence rates (rate ratio of specific group versus a reference group) [40, 174, 180, 181]; however, rates from several of these studies also contributed to the core rates [40, 180, 181] and one study contributed to Other Special Groups [174]. Of the included studies, 13 included at least one rate with person-years as the denominator [9, 21, 31, 42, 74, 93, 121, 130, 133, 152, 163, 167, 174]. Person-year incidence rates have been used in descriptive and model-based analyses alongside crude annual rates. While this has the potential to distort results (an annualised person-year rate will almost never agree with a notionally corresponding crude annual rate), it is hoped that the distortion is minimal.

Of the 1,457 rates in the included studies, 501 (34.4%), from 42 studies, employed age-standardization [13, 31, 35, 45, 46, 50, 56, 68, 74, 76, 80, 88, 91, 102, 107, 109, 110, 115, 116, 118, 124, 127, 128, 130, 133, 135–138, 141, 142, 144–150, 168, 181, 187, 188]. As with person-year incidence rates, age-standardized rates have been used alongside crude rates. The impact of age-standardization versus raw rates will be addressed below.

Core studies

Out of 100 core studies, 39 studies report rates only for persons [33, 40, 45, 47, 48, 55–57, 59, 61, 73, 75, 79, 86, 87, 116, 91, 93–96, 99, 103, 104, 107–109, 112–114, 118, 120–122, 124–127, 183, 192], 16 report rates only for males and females [41, 63, 68, 74, 82, 102, 110, 111, 116, 117, 119, 123, 187–190], and 45 studies report rates for persons, males, and females [13, 31, 46, 49–54, 58, 60, 65–67, 69–72, 76–78, 80, 81, 83–85, 88–90, 92, 97, 98, 100, 101, 105, 106, 115, 148, 180–182, 184–186, 191].

The studies provided between one and 96 rates per study; the WHO 10 Nation study was the most informative study [13]. Of the 509 rates derived from the core studies, 79% relied on chart or register-based diagnosis, a further 7% relied on the application of diagnostic criteria applied to written case note material, and only 11% relied on face-to-face interview (the remaining rates used other methods or this information was not specified). The most commonly used diagnostic criterion for the rates in the core studies was one of the ICD classifications (60%), while 6% used a CATEGO-derived classification, 4% used DSM, and 4% used RDC. The remaining 28% used two criteria serially (for example, ICD9 chart diagnosis followed by DSM-III-R diagnoses) or did not specify the diagnostic criteria.

Discrete-core studies

Out of 100 core studies, 68 studies provided at least one discrete rate [13, 31, 33, 41, 45–61, 65, 67, 69–80, 82–86, 88, 91–104, 106, 107, 111, 112, 114, 116, 121, 123–126, 188, 189]. The remaining 32 studies included rates that overlapped, by year and place, completely or partially with discrete core rates. The 68 discrete-core studies were drawn from 27 countries.

Thirty-seven discrete-core studies report rates only for persons [31, 33, 45–49, 55–57, 59, 61, 65, 71, 73, 75, 76, 78, 79, 85, 86, 88, 91, 93–96, 99, 103, 104, 107, 112, 114, 121, 124–126, 180, 190], 13 report rates only for males and females [41, 50, 52, 53, 67, 74, 82, 102, 111, 116, 123, 188, 189], and 18 studies report rates for persons, males, and females [13, 51, 54, 58, 60, 69, 70, 72, 77, 80, 83, 84, 92, 97, 98, 100, 101, 106].

Of the 68 discrete-core studies, 41 (60.3%) relied on chart or register-based diagnosis, a further nine (13.2%) relied on the application of diagnostic criteria applied to written case note material, and 15 (22.1%) relied on face-to-face interview (the remaining core studies used other methods or this information was not specified). In terms of the 373 associated rates, 276 (74.0%) relied on chart or register-based diagnosis, 28 (7.5%) relied on the application of diagnostic criteria applied to written case note material, and 45 (12.1%) relied on face-to-face interview.

The most commonly used diagnostic criterion for the 68 discrete-core studies was one of the ICD classifications (51.5%), while 5.9% used a CATEGO-derived classification, 11.8% used DSM, and 2.9% used RDC. The remaining 27.9% either used another criterion, or used multiple criteria serially (for example, ICD9 chart diagnosis followed by DSM-III-R diagnoses), or did not specify the diagnostic criteria.

Of the 373 discrete-core rates, 25.2% (n = 94), coming from 13 studies, were age-standardized [31, 45, 46, 50, 56, 74, 80, 91, 102, 107, 116, 124, 188].

Figure 2 shows the distribution of rates for persons, males and females, based on discrete-core data. In order to aid visual representation, the figure has been truncated at 100 per 100,000; however, the quantiles and moments that characterize the distribution are based on all rates. Table 1 shows the quantiles and moments of these distributions. For persons, based on 170 rates, the median incidence rate was 15.2 per 100,000, and the 10% and 90% quantiles ranged from 7.7 to 43.0 per 100,000 (a 5.6-fold difference).

Figure 2. Cumulative plots of the incidence rates per 100,000, by sex (plot truncated at 100).

Sex ratio

Unexpectedly, we noted that studies that only report incidence rates for persons appeared to report higher rates than studies that reported rates for males and females only or studies reporting males, females and persons (for figures and tables related to these analyses, see Additional Files 3 and 5 respectively). This feature needs to be kept in mind in analyses that split by persons, males and females.

Figure 3 shows the distribution of the rate ratio for males versus females. These ratios are matched, and derived from within-study male to female ratios. Values greater than one indicate a higher incidence of schizophrenia in males compared to females. The rate ratio curve is normally distributed. While there were nine studies with rate ratios less than one [13, 41, 51, 60, 69, 70, 72, 83, 106, 111], over 84% of rates lie above one, with a median value (10% to 90% quantiles) of 1.40 (0.9, 2.4). In a random-effects model for the logarithm of the rates, the sex-difference is significant (F 1,30 = 76.8, p < 0.001).
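The matched, within-study rate ratio can be sketched as follows. The paired rates are hypothetical; each pair stands for the male and female rates reported by the same study for the same site and period.

```python
import statistics

# Hypothetical (male rate, female rate) pairs per 100,000, one per study.
paired_rates = [(20, 10), (14, 10), (9, 10), (12, 10), (24, 10)]

# Each ratio is formed within a study, so male and female rates that share
# the same case-finding method and diagnostic criteria are compared.
ratios = [male / female for male, female in paired_rates]

median_ratio = statistics.median(ratios)
share_above_one = sum(r > 1 for r in ratios) / len(ratios)

print(f"median male:female rate ratio = {median_ratio:.2f}")
print(f"share of ratios above one     = {share_above_one:.0%}")
```

Because the ratio is taken within each study, between-study differences in methodology largely cancel out, which is the motivation for using matched ratios rather than comparing pooled male and female rates.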

Figure 3

The cumulative percentage of the rate ratio of the incidence of schizophrenia in males versus females.
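
The matched, within-study male to female rate ratios described above can be sketched as follows. The function name `sex_rate_ratios`, the study labels and all rates are hypothetical and serve only to show how the ratios, the share of ratios above one, and the median ratio relate to each other.

```python
from statistics import median


def sex_rate_ratios(study_rates):
    """Compute male-to-female rate ratios within each study.

    `study_rates` maps a study label to a (male_rate, female_rate) pair,
    both per 100,000. Labels and values here are hypothetical.
    """
    return {study: male / female for study, (male, female) in study_rates.items()}


# Made-up rates, for illustration only (not data from the review).
example = {
    "A": (20.0, 10.0),
    "B": (15.0, 10.0),
    "C": (9.0, 10.0),
    "D": (14.0, 10.0),
}
ratios = sex_rate_ratios(example)

# Proportion of studies in which the male rate exceeds the female rate.
share_above_one = sum(r > 1 for r in ratios.values()) / len(ratios)
median_ratio = median(ratios.values())
```

Because each ratio is formed within a single study, differences in design between studies affect the numerator and the denominator alike.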

Urbanicity

We identified 21 discrete-core studies with rates from urban sites [31, 33, 54, 61, 69–71, 75, 78, 80, 84, 91, 93, 97, 98, 100, 103, 106, 121, 124, 125], three studies from rural sites [57, 59, 60], and 42 studies with mixed urban-rural catchment areas [41, 45–49, 230, 50–53, 55, 56, 58, 65, 67, 72–74, 76, 77, 79, 82, 83, 85, 86, 88, 92, 94–96, 99, 101, 102, 104, 107, 111, 112, 114, 116, 123, 188, 189]. Two further studies provide rates for both urban and rural categories [13, 126].

Table 2 and Figure 4 summarize the characteristics of the rates by sex and urbanicity. Because of the within-study configuration of both sex and urbanicity, and the paucity of studies on the rural sector, significance tests can only be constructed for differences between the urban and mixed categorizations.

Figure 4

Cumulative plots of the incidence rates per 100,000, and urbanicity of site (plot truncated at 200).

The difference in harmonic means is significant for persons (F 1,50 = 6.06, p = 0.02); non-significant for males (F 1,28 = 2.7, p = 0.11); and of borderline significance for females (F 1,28 = 4.2, p = 0.05).
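
For readers unfamiliar with the harmonic mean, the following minimal sketch contrasts it with the arithmetic mean on a small positively skewed set of rates. The values are made up, and the sketch illustrates only the definition of the statistic, not the model used to derive the F tests reported here.

```python
from statistics import harmonic_mean, mean

# Made-up, positively skewed rates per 100,000 (not data from the review).
rates = [10.0, 20.0, 40.0, 200.0]

# Harmonic mean: n divided by the sum of reciprocals of the rates.
hm = harmonic_mean(rates)

# Arithmetic mean, for comparison; it is pulled upward by the high rate.
am = mean(rates)
```

In this example the single high rate raises the arithmetic mean far more than the harmonic mean, which is one general property of the statistic when a distribution has a long upper tail.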

Migrant status

We identified 24 migrant studies from five countries [128], Germany (n = 1) [35], Sweden (n = 1) [33], The Netherlands (n = 3) [31, 129, 130], and the United Kingdom (n = 18) [121, 131–147]. One study [144] was excluded from the analysis because of overlapping by time and place. Table S2 (see Additional File 4) presents a list of migrant studies with key descriptive variables, incidence rates and within-study rate ratios. The table highlights the variable definitions of first and second generation migrants used in these studies – an issue that could be more closely scrutinized in the future.

Overall, migrant groups displayed an elevated incidence of schizophrenia compared to their native-born populations (Table 3 and Figure 5). Note that there are more rates for migrant groups than for native-born populations, reflecting the fact that many studies reported rates for several different migrant groups. The migrant to native-born rate ratio median (10% to 90% quantiles) was 4.6 (1.0 to 12.8) (Table 3 and Figure 6). In particular, the migrant groups had higher rates for persons in the upper half of the distribution. The difference in harmonic means between migrants and native-born populations is significant for persons (F 1,13 = 51.8, p < 0.001); for males (F 1,8 = 27.1, p < 0.001); and for females (F 1,8 = 10.4, p = 0.01).

Figure 5

Cumulative percentage of incidence of schizophrenia per 100,000 in persons by migrant status.

Figure 6

Cumulative percentage of rate ratio between migrants and natives in persons, males and females.

Methodological features

Additional figures (Figures S1–S8) and detailed tables (Tables S6–S16) related to the following section can be found in Additional Files 3 and 5, respectively.

Eight of the 68 studies showed variation in quality score within studies, but always by two points at most on the scale. Thus we used the average quality score per study as our measure of a study's quality. When divided into terciles for quality score, the distribution of rates did not differ significantly (F 2,65 = 0.34, p = 0.72). Concerning methods of case identification, only eight studies used community-based surveys in order to ascertain cases [13, 45, 46, 55, 57, 78, 88, 127]. When the rates were arranged by method of case identification, the rate distributions did not differ significantly (F 3,64 = 0.14, p = 0.93). Fifteen of the 68 discrete-core studies relied on face-to-face interviews in order to confirm the diagnosis [13, 31, 45, 56, 57, 59, 65, 73, 76, 78, 82, 88, 101, 104, 125]. When the rates were arranged by method of diagnostic confirmation, they did not differ significantly (F 3,62 = 0.17, p = 0.92). Curiously, the studies that went to the effort of using face-to-face diagnostic interviews tended to yield incidence rates similar to those from studies using chart/case records (median 14.0). The studies with systematic review of case notes produced numerically higher incidence rates (median 21.0).

Twelve studies reported reliability data on their diagnostic techniques [13, 33, 45, 54, 65, 71, 78, 83, 96, 102, 121, 124], and one study reported details of a 'leakage study' in order to identify potential missed cases [78]. Eight studies [46, 53, 54, 61, 72, 74, 101, 117] provided more than one method of diagnostic assessment; however, in this review we had to choose only one method in accordance with the rules outlined above. When the rates were arranged by diagnostic criteria, there were no significant differences amongst the distributions (F 4,63 = 0.04, p = 0.99).

Surprisingly, when the rates were divided into those that were age-standardized versus raw rates, these distributions did not differ significantly (F 1,48 = 0.24, p = 0.63), nor did age range impact significantly on the rate distributions (F 2,36 = 0.80, p = 0.42). Only one of the methodology variables revealed a significant difference – year of first intake, a measure related to possible secular change. The difference among harmonic means was significant (F 2,18 = 13.3, p < 0.001), with significant differences among all three harmonic means (in particular 1949–1975 versus 1976–1983, p < 0.001 unadjusted; 1976–1983 versus 1984–1995, p = 0.005 unadjusted).

Cohort and Other Special Group studies

We identified 23 discrete cohorts that have reported either cumulative incidence proportion or other measures of the occurrence of schizophrenia [9, 10, 21, 23, 42, 149–165, 193]. Some of these cohorts have generated multiple publications reporting the incidence of schizophrenia over time. These cohorts came from nine sites: Australia (n = 2) [149, 150], Denmark (n = 3) [9, 151, 193], Finland (n = 5) [42, 152–155], Israel (n = 2) [156, 157], Italy (n = 1) [158], Sweden (n = 2) [21, 159], The Netherlands (n = 2) [10, 23], USA (n = 3) [163–165], and the United Kingdom (n = 3) [160–162]. Because of differences in the age of the cohorts, the rates are not readily comparable; however, the highest cumulative incidences have been reported from Finland and the United States.

In addition, we identified 14 studies that reported the incidence of schizophrenia in the following subgroups of the population: over age 65, twins, various ethnic and/or religious subgroups, the offspring of certain Jewish immigrants, students, deaf individuals, and workers in a radiation contamination zone [166–179]. These studies came from 11 countries: Denmark (n = 2) [166, 167], Israel (n = 1) [168], Mauritius (n = 1) [169], New Zealand (n = 1) [170], Norway (n = 1) [171], Romania (n = 1) [172], Russian Federation (n = 1) [173], Sweden (n = 1) [174], USA (n = 1) [179], Ukraine (n = 1) [175], and the United Kingdom (n = 3) [176–178].

Key details of the Cohort and Other Special Groups can be found in the tables included in Additional File 4.

Discussion

A total of 1,457 rates from 158 studies were identified in this systematic review of the incidence of schizophrenia; it is therefore understandable that the reader interested in this area may 'starve amidst plenty'. After the application of various filters, inspection of the distribution of the rates was informative. Most distributions were data-rich, and many had central segments densely underpinned by data.

The distribution of rates

For persons in discrete-core studies, 55 different studies provided 170 rates. The median value for rates was 15.2 per 100,000. The distribution was positively skewed, consistent with previous comments on the distribution of the incidence of schizophrenia [5]. Excluding the upper and lower 10% of the distribution, rates ranged from 7.7 to 43.0 per 100,000. Because schizophrenia has a relatively low incidence rate, zero provides an absolute 'floor' for the rate distribution. It is of interest to note that while the lower tails of the distributions were usually data-sparse, many of the distributions had upper tails that contained more than 25% of the rates. Several of the distributions had marked positive skewness. Thus, while Jablensky commented that "a small number of populations have been identified that clearly deviate from this central tendency" [5], the data suggest that the number of studies that 'deviate' may not be so small, and that this deviation is predominantly for high rates, rather than low rates.

Mindful that we are dealing with a disorder with a relatively low incidence rate, it is a moot point whether the range is narrower or wider than that of other low-incidence disorders. For example, Type I diabetes occurring before age 15 also has a relatively low incidence. However, a WHO-sponsored systematic review of studies that used a shared definition of Type I diabetes described the range of 10 to 40 per 100,000 as "prominent worldwide variation" [194]. So, comparable incidence ranges may be described as "lacking in variation" by some commentators, and "markedly variable" by others. Regardless of the adjectives used to describe the distribution of rates, researchers must strive to understand the factors that drive this variation.

Sex and schizophrenia

Based on studies identified in this systematic review, rates were generally higher in males compared to females. The rate distributions suggest that the magnitude of the sex difference is relatively constant across most studies; however, for studies that report a higher incidence of schizophrenia (for example, greater than 32 per 100,000), the size of the male excess may be smaller. The median male/female rate ratio was 1.40 (inter-quartile range = 1.1 to 1.8), which is similar to a pooled risk ratio derived from core, migrant and cohort-derived incidence data [17]. We note that nine studies [13, 41, 51, 60, 69, 70, 72, 83, 106, 111] reported a rate ratio of less than one (that is, higher incidence of schizophrenia in females versus males). We examined the major characteristics of these studies, but could not identify any consistent feature that distinguished these studies from those with rate ratios greater than one.

It should be noted that the studies that underpin the male to female rate distribution have been collected over several decades from many different nations, and have been based on many different design features. The fact that the distribution of the male to female rate ratio is a normal distribution is visually striking. While it is feasible that these studies share systematic biases that may have influenced the findings (for example, excluding older age groups), in order to 'wash out' this finding, a substantial number of new studies reporting a male to female ratio of less than one would need to be published.

Overall, the data indicate that the incidence of schizophrenia is higher in men than in women.

Urbanicity and the incidence of schizophrenia

Based on the studies identified in this systematic review, the distribution of rates derived from urban catchment areas was higher compared with that based on mixed urban-rural catchment areas (p = 0.02 for persons, p = 0.11 for males, and p = 0.05 for females). The small number of rates available for rural catchment areas does not allow a more precise urban versus rural comparison.

This finding provides support for the hypothesis that the incidence of schizophrenia is higher in cities compared to mixed urban/rural catchment areas. Our findings are in keeping with recent cohort and case control studies based on place of birth [9, 23]. However, within the mixed urban-rural studies, we were not able to infer exactly how 'rural' the mixed sites were. Indeed, we allocated studies to the mixed category if there was any possibility that rural sectors were included. This bias would have made any true difference between urban versus mixed urban/rural more difficult to detect.

Migrant status and the incidence of schizophrenia

Rate distributions based on migrant groups showed a relatively consistent pattern of higher rates than the distributions based on native-born individuals. This association was also found when rates for males and females were examined separately (p < 0.001 for males, p = 0.01 for females). Based on migrant versus native-born rate ratios, the median rate ratio for persons was 4.6 (10% to 90% quantiles = 1.0 to 12.8).

It is important to note that migrant studies are prone to a range of methodological issues. These include factors related to differential pathways to care, diagnostic inaccuracies (language and cultural practices may hinder accurate diagnosis), potential confounding due to socioeconomic factors and problems in determining the numerator and denominator for the calculation of rates. Harrison and colleagues have argued that because of these potential biases, confidence in the link between migrant status and schizophrenia requires consistent findings based on different methods and from different sites [146]. While the studies analyzed in this systematic review would share common biases, they are drawn from many different sites, include many different migrant groups and differ on a range of methodological features. Thus the increased relative risk for migrants warrants added weight.

Design features of the studies

While the analyses related to the quality score and methodology are interesting, we urge caution in the interpretation of these results. Researchers interested in comparing the psychometric properties of face-to-face interviews versus chart diagnoses, or the DSM-III-R versus ICD-9 diagnostic criteria are best directed to studies that were primarily designed to address these issues. These methodological issues are important, but are more efficiently addressed by studies based on prevalent rather than incident cases. However, mindful of the above issues, we found that the distributions of rates did not significantly differ when categorized by (a) quality of the study, (b) different methods of diagnostic confirmation, (c) different diagnostic criteria, (d) the presence or absence of age-standardization, and (e) different age ranges. The distributions of rates seem relatively impervious to these design features. However, we found that rates from more recent studies generated distributions that encompassed lower rates compared to those distributions from older studies.

Cohort and Other Special Groups

In recent years several studies have reported the incidence of schizophrenia in well-described birth cohorts [161, 163–165, 195]. While the incidence rates from these studies are not readily comparable to core incidence studies, the cohort studies are making important contributions to risk factor research [196]. It is of interest to note that several of the authors of cohort studies have drawn attention to the relatively high cumulative incidence proportions being found in their studies [42, 154, 163, 164]. For example, the Finnish birth cohort at age 31 estimated the cumulative incidence of DSM-III-R schizophrenia between 0.73% and 1.08% [195].

The Other Special Group studies share few, if any, features. Perhaps the common feature was that the authors felt that the incidence of schizophrenia may have differentiated the particular groups under observation from the general population. Interestingly, several of these studies suggest that certain minority ethnic/cultural groups may have an elevated incidence of schizophrenia [168–170, 179], which may complement the findings from migrant studies.

Caveats

Systematic reviews are secondary research ('research on research'). The object of scrutiny is not the incidence of schizophrenia per se, but the literature on this topic. As such, this type of research cannot be used to 'prove' a hypothesis about the underlying category of observation. However, the compiled data can encourage the generation of new hypotheses that can then be tested prospectively, with new data. In particular, systematic reviews can draw attention to factors that may underlie heterogeneity of the data.

Despite all of our efforts to ensure a complete dataset, reviews such as this are bound to miss studies, and/or to have data entry errors. We encourage readers to inform us of missing studies or errors in the data. Updated lists of relevant studies and raw data will be available from the authors. In the absence of clear guidelines, many of the rules we used to filter studies and extract data were ad hoc. In the future, researchers may wish to re-analyze the data set using different criteria, and explore sensitivity analyses related to these choices.

It is clear from the broad range of studies identified in this systematic review that there is no one 'perfect' design for measuring the incidence of schizophrenia – different studies have different strengths. Most of the studies provide rates that are in reality the 'treated incidence' of schizophrenia (that is, the count of individuals who are identified by services and treated). Factors related to service availability and the pathways to care vary considerably. Apart from the obvious issue of availability of services, recent research has shown that community education about psychosis can lead to sudden increases in the number of 'new' cases presenting to services [197]. However, we can reasonably infer that the studies included in this systematic review probably underestimate the true (underlying) incidence of schizophrenia.

There are many additional analyses that could be undertaken with these data. In future papers we plan to explore the relationship between incidence rates and ecological features of the catchment area (for example, population age structure, social and financial indices). The impact of age adjustment on the rates (which has been addressed in previous narrative reviews [198]) will also be examined more closely. However, in a commentary, Berlin [199] cautions the scientist as follows: "Meta-analysis is like any other form of data analysis in that it requires strict adherence to methodological guidelines, careful planning, the use of a priori definitions and analytic strategies, and extremely careful interpretation that does not go beyond the limits of the data" (p 387). We endorse Berlin's recommendations.

Conclusions

This systematic review of the incidence of schizophrenia reveals a complex and varied epidemiological landscape [5]. While the median incidence rate for persons was 15.2 per 100,000, the 10% to 90% quantiles cover over a five-fold range of rates (7.7 to 43.0 per 100,000). The distribution is positively skewed, with many studies reporting rates in the upper range. The incidence of schizophrenia is higher in men compared to women, higher in urban sites compared to mixed urban/rural sites, and higher in migrants compared to native-born individuals.

Traditional Linnean taxonomy classifies living creatures into various hierarchical categories. By way of analogy, this systematic review has approached the broad genus of studies related to the incidence of schizophrenia, and classified rates from these studies to the epidemiological equivalents of species and subspecies. We believe that the orderly array of these data has been informative and, in some instances, visually appealing (like butterfly collections!). We encourage researchers to further explore these data in order to help generate testable hypotheses.

Goethe noted that data are the natural enemy of hypotheses. In a Darwinian sense, the provision of data alters the landscape such that 'unfit' hypotheses should be less prosperous. In other words, the addition of new data should assist in the culling of less heuristic hypotheses. By assembling and sorting rates related to the incidence of schizophrenia, we hope that we have enriched the epidemiological environment and that these data will facilitate the survival of the most heuristic hypotheses.

Additional files

  1. Glossary: Word document

  2. Quality Reporting Scale: Word document

  3. Additional figures: Word document

    • Figure S1. Cumulative plots of the incidence rates per 100,000, by features of reporting by sex
    • Figure S2. Cumulative plots of the incidence rates per 100,000, by tercile rank of the quality score
    • Figure S3. Cumulative plots of the incidence rates per 100,000, by Case-Finding Method
    • Figure S4. Cumulative plots of the incidence rates per 100,000, by Confirmation of Diagnosis
    • Figure S5. Cumulative plots of the incidence rates per 100,000, by Diagnostic Assessment Method
    • Figure S6. Cumulative plots of the incidence rates per 100,000, by Sex and Age Standardization
    • Figure S7. Cumulative plots of the incidence rates per 100,000, by Age Range
    • Figure S8. Cumulative plots of the incidence rates per 100,000, by Years of Intake

  4. Description of studies: Word document

    a. Table S1: Summary table of incidence of schizophrenia: Core studies
    b. Table S2: Summary table of incidence of schizophrenia: Migrant studies
    c. Table S3: Birth cohort studies providing incidence in person years, cumulative incidence or cumulative percent
    d. Table S4: The incidence of schizophrenia in Other Special Groups

  5. Additional tables and detailed results: Word document

    • Table S5: Characteristics (quantiles and moments) of incidence rates per 100,000, for rates for persons, males and females, and male to female rate ratio
    • Table S6: Characteristics (quantiles and moments) of incidence rates per 100,000, by Sex and Urbanicity
    • Table S7: Characteristics (quantiles and moments) of incidence rates per 100,000, by Migrant Status, and Migrant to Native-born rate ratio by sex
    • Table S8: Characteristics (quantiles and moments) of incidence rates per 100,000, by Sex, and by features of reporting by sex
    • Table S9: Characteristics (quantiles and moments) of incidence rates per 100,000, by terciles of the Quality Score
    • Table S10: Characteristics (quantiles and moments) of incidence rates per 100,000, by Method of Finding Cases
    • Table S11: Characteristics (quantiles and moments) of incidence rates per 100,000, by Confirmation of Diagnosis
    • Table S12: Characteristics (quantiles and moments) of incidence rates per 100,000, by Diagnostic Assessment Method
    • Table S13: Characteristics (quantiles and moments) of incidence rates per 100,000, by Sex and Age Standardization
    • Table S14: Characteristics (quantiles and moments) of incidence rates per 100,000, by Age Range
    • Table S15: Characteristics (quantiles and moments) of incidence rates per 100,000, by year of first Intake

  6. Complete data set in Excel format.

  7. Complete data set in Access format (zipped).