Background

Floods are among the most common and severe natural disasters. They can cause direct economic and property losses, physical injuries, deaths, and psychological harm. Posttraumatic stress disorder (PTSD) is common among victims of traumatic events such as traffic accidents[1, 2], violent crimes[3], hurricanes[4], earthquakes[5, 6], and floods [7-10]. PTSD is a severe and complex disorder precipitated by exposure to psychologically distressing events, and it is characterized by persistent intrusive memories of the traumatic event, persistent avoidance of stimuli associated with the trauma, and persistent symptoms of increased arousal[11].

Floods occur frequently in China. A severe flood that struck China's Hunan province in 1998 left hundreds of thousands of residents homeless and damaged many infrastructural and agricultural projects. It is important to promptly identify flood victims who are likely to develop PTSD so that the government can take timely measures to protect their health. Currently, there are no PTSD prediction models that can be applied to flood victims. The aim of this study was therefore to identify determinants of PTSD and to develop a risk score model to predict PTSD among flood victims.

Methods

Study area and population

The 1998 floods in China affected over 180 million people. It is estimated that the flood displaced 18.393 million people; destroyed 6.85 million houses; caused 4,150 deaths; and yielded a direct economic loss of about $32 billion (New Report from Ministry of Health, China, 1999). Hunan was the most severely affected province. Victims who had been directly exposed to the 1998 flood in Hunan formed our target population. The study area covered the catchment area of the Dongting Lake (north of Hunan) and the west of Hunan.

The catchment area of the Dongting Lake is located south of the middle reaches of the Yangzi River in southern China. It is usually warm, humid, and rainy during summer. The area, which is flood-prone, experienced soaked and collapsed floods in 1998. It consists of 31 counties; covers an area of 31,000 km²; and has an estimated human population of 11.3 million. Residents who live in this area share similar natural conditions and socio-economic and health status. The majority of them are farmers with low levels of education. The area within the west of Hunan covers 7 counties affected by the flash floods of 1998. These counties also share similar socio-demographic characteristics.

We used a multi-stage stratified cluster sampling method to select study subjects. First, we randomly selected 7 of the 31 counties that suffered soaked and collapsed floods (Yueyang, Lingxiang, Huarong, Qianlianghu, Ziyang, Anxiang, Datonghu) and 1 of the 7 counties that experienced flash floods (Longshan). Then, using a systematic sampling approach, we sampled 50% of townships in the selected counties, 50% of villages in the selected townships, and 50% of households in the selected villages. All members of the selected households who were aged 16 years or older, had experienced the flood, and were willing to be interviewed were invited to participate in our study.
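The sampling stages described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the nested `frame` layout, the function names, and the toy data are hypothetical, and "systematic 50% sampling" is interpreted here as a random start followed by every second unit.

```python
import random


def systematic_half(units, rng):
    """Systematic 50% sample: random start (0 or 1), then every second unit."""
    start = rng.randrange(2)
    return units[start::2]


def multistage_sample(frame, n_counties, rng):
    """Select households from a nested sampling frame.

    frame: {county: {township: {village: [household ids]}}} (hypothetical layout).
    """
    households = []
    # Stage 1: simple random sample of counties.
    for county in rng.sample(sorted(frame), n_counties):
        townships = frame[county]
        # Stages 2-4: systematic 50% samples of townships, villages, households.
        for township in systematic_half(sorted(townships), rng):
            villages = townships[township]
            for village in systematic_half(sorted(villages), rng):
                households.extend(systematic_half(villages[village], rng))
    return households


# Toy frame: 2 counties x 4 townships x 4 villages x 10 households.
frame = {
    f"county_{c}": {
        f"town_{c}_{t}": {
            f"village_{c}_{t}_{v}": [f"hh_{c}_{t}_{v}_{h}" for h in range(10)]
            for v in range(4)
        }
        for t in range(4)
    }
    for c in range(2)
}

households = multistage_sample(frame, n_counties=1, rng=random.Random(0))
print(len(households))  # 1 county * 2 townships * 2 villages * 5 households = 20
```

To mirror the stratification by flood type, the function would be called once per stratum, for example with `n_counties=7` on the frame of soaked and collapsed flood counties and `n_counties=1` on the frame of flash flood counties.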

Flood type and severity

Floods were classified into 3 types: soaked flood, collapsed embankment flood, and flash flood. Soaked floods, also called drainage-problem floods, occur when regular drainage systems cannot handle high precipitation levels. Collapsed embankment floods, also called river floods, are caused by a river overflowing its regular boundaries, often as a result of high precipitation levels. Flash floods usually result from high-intensity local rainfall[12].

The severity of the flood suffered was also divided into 3 categories: mild (affected area <50%), intermediate (affected area 50%-75%), and severe (affected area >75%), according to the standard set by the Chinese flood management authority.

Data collection

The survey was conducted between January and May 2000. Forty trained interviewers, who worked at the local Centres for Disease Control and Prevention and had a bachelor's degree or higher, carried out face-to-face interviews using a questionnaire to obtain demographic data, to ascertain PTSD, and to measure personality and psychological characteristics. The interviewers received on-site supervision from psychologists. The project was approved by the Research Ethics Board of Central South University, and all subjects agreed to participate in the investigation. All interviews took place in the subject's own home, in a private room with no other person present. Interviews lasted about 20 minutes. To facilitate the study, we contacted each subject by telephone before the interview and rescheduled if the proposed time was not convenient.

The diagnosis of PTSD was made according to the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) criteria[11], which included 17 symptoms scored as 0 = none, 1 = slight, 2 = moderate, 3 = severe, and 4 = extreme. Symptoms with scores ≥ 2 were defined as positive. The 17 symptoms of PTSD were further divided into 3 groups, representing the 3 diagnostic criteria B, C, and D. Criterion B symptoms represented the re-experiencing cluster, and subjects were defined as B symptom positive if they showed one or more positive items in the B group. Criterion C symptoms represented the avoidance cluster, and subjects were defined as C symptom positive if they showed 3 or more positive items in the C group. Criterion D symptoms represented the hyperarousal cluster, and subjects were defined as D symptom positive if they showed two or more positive items in the D group. In addition, there were criteria A and E for the diagnosis of PTSD. Criterion A represented exposure to an extreme traumatic stressor involving direct personal experience of an event, witnessing an event, or learning about unexpected or violent death, serious harm, or threat of death or injury experienced by a family member or another close associate (A1); in addition, the person's response to the event must involve intense fear, helplessness, or horror (A2). All subjects in our study witnessed the 1998 flood and experienced the threat of death or injury from the flood. Also, all the probable PTSD-positive subjects met the A2 criterion. Criterion E required the disturbance to last more than 1 month. Subjects were diagnosed as having PTSD if Criteria A, B, C, D, and E were all met. We assessed all symptoms, including the time and duration of occurrence. The PTSD questionnaire had been tested in Chinese populations and shown to be valid and reproducible [8].
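The decision rule above can be written as a short Python sketch. It assumes that an item counts as positive when its score is 2 (moderate) or higher, and that the 17 items are ordered by DSM-IV cluster (5 re-experiencing, 7 avoidance, 5 hyperarousal); the item order in the actual questionnaire and all names below are assumptions made for illustration.

```python
# DSM-IV symptom clusters: 5 re-experiencing (B), 7 avoidance (C), 5 hyperarousal (D).
# The position of each item in the questionnaire is assumed, not taken from the study.
B_ITEMS = range(0, 5)
C_ITEMS = range(5, 12)
D_ITEMS = range(12, 17)
POSITIVE_THRESHOLD = 2  # assumption: moderate (2) or worse counts as positive


def count_positive(scores, items):
    """Number of items in a cluster scored at or above the positive threshold."""
    return sum(1 for i in items if scores[i] >= POSITIVE_THRESHOLD)


def probable_ptsd(scores, criterion_a, lasted_over_one_month):
    """Apply criteria A-E to 17 symptom scores (each 0-4)."""
    if len(scores) != 17 or any(s not in range(5) for s in scores):
        raise ValueError("expected 17 scores, each between 0 and 4")
    b_positive = count_positive(scores, B_ITEMS) >= 1
    c_positive = count_positive(scores, C_ITEMS) >= 3
    d_positive = count_positive(scores, D_ITEMS) >= 2
    return (criterion_a and b_positive and c_positive
            and d_positive and lasted_over_one_month)


# Hypothetical subject: 1 positive B item, 3 positive C items, 2 positive D items.
example = [2, 0, 0, 0, 0,  3, 2, 2, 0, 0, 0, 0,  2, 4, 0, 0, 0]
print(probable_ptsd(example, criterion_a=True, lasted_over_one_month=True))  # True
```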

All interviewers participated in a 2-day training program, which focused on the questionnaires. A working manual was provided to ensure that all interviewers had the same understanding of the questionnaire. The completed questionnaires were checked by the study coordinator (one coordinator in each county). If a questionnaire was found to be incomplete or inconsistent, the interview was repeated with the same subject to reduce missing data as much as possible.

Statistical analysis

We randomly divided the data set into two groups: one group (group 1: approximately 70% of the sample) for the creation of the prediction model and the other group (group 2: approximately 30% of the sample) for the validation of the prediction model.
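A random split of this kind can be sketched as below. The function name, the fixed seed, and the exact 70% cut are assumptions for illustration; the study reports only an approximate 70/30 division, so the group sizes produced here need not match the reported ones.

```python
import random


def split_sample(subject_ids, development_fraction=0.7, seed=0):
    """Shuffle subject ids and cut them into development and validation groups."""
    shuffled = list(subject_ids)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * development_fraction)
    return shuffled[:cut], shuffled[cut:]


# 25,478 is the number of subjects with complete data reported in the study.
group1, group2 = split_sample(range(25478))
print(len(group1), len(group2))  # 17835 7643
```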

We first used forward stepwise logistic regression analysis to select the predictive variables. The dependent variable was PTSD (yes = 1, no = 0). Based on professional judgement and the literature [13-16], we entered the following potential predictors into the initial regression model: age (X1; 16-34 years = 1; 35-54 years = 2; 55 years or older = 3), gender (X2; male = 0; female = 1), education (X3; illiterate = 3; elementary school = 2; high school or higher = 1), occupation (X4; farmer = 1; non-farmer = 0), type of flood (X5; soaked = 1; collapsed = 2; flash flood = 3), severity of flood (X6; mild = 1; moderate = 2; severe = 3), flood experience (X7 = X7.1 + X7.2 + X7.3 + X7.4 + X7.5 + X7.6. X7.1: were you trapped and waiting for rescue during the flood, yes = 1, no = 0; X7.2: were you seriously injured during the flood, yes = 1, no = 0; X7.3: were your relatives or friends seriously injured during the flood, yes = 1, no = 0; X7.4: did you witness others drowning during the flood, yes = 1, no = 0; X7.5: was this flood your first experience of floods, yes = 1, no = 0; X7.6: was your house damaged by the flood, yes = 1, no = 0), and mental status before the flood (X8 = X8.1 + X8.2 + X8.3 + X8.4. X8.1: would you consider yourself tense or highly strung, yes = 1, no = 0; X8.2: do you often feel life is very boring, yes = 1, no = 0; X8.3: do you often feel lonely, yes = 1, no = 0; X8.4: are you easily hurt when people find fault with you or your work, yes = 1, no = 0). Category codes for each potential predictor were assigned according to the level of PTSD prevalence in each category.
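The coding scheme above can be illustrated with a short Python sketch. It assumes the age bands are 16-34, 35-54, and 55 or older, and it uses a hypothetical subject whose individual yes/no answers are invented for the example; the helper names are not from the study.

```python
def code_age(age_years):
    """Code age into X1, assuming bands 16-34 = 1, 35-54 = 2, 55 or older = 3."""
    if age_years < 16:
        raise ValueError("subjects were aged 16 years or older")
    if age_years < 35:
        return 1
    if age_years < 55:
        return 2
    return 3


def composite(answers):
    """Sum yes (1) / no (0) answers into a cumulative score such as X7 or X8."""
    if any(a not in (0, 1) for a in answers):
        raise ValueError("answers must be coded 0 or 1")
    return sum(answers)


# Hypothetical subject; the individual yes/no answers are illustrative only.
subject = {
    "X1": code_age(38),                     # age 38 falls in the second band
    "X2": 0,                                # male
    "X3": 3,                                # illiterate
    "X5": 3,                                # flash flood
    "X6": 2,                                # moderate severity
    "X7": composite([1, 0, 0, 0, 0, 1]),    # X7.1 .. X7.6
    "X8": composite([1, 1, 0, 1]),          # X8.1 .. X8.4
}
print(subject)
```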

To develop a simple risk score, each risk factor identified through multivariate logistic regression was assigned an integer coefficient. Integers were chosen to be approximately proportional to the estimated continuous coefficients from the logistic model. Assignment of points to risk factors was based on a linear transformation of the corresponding β regression coefficient: the coefficient of each variable was divided by the lowest β value and rounded to the nearest integer[17]. A subject's total risk score was the sum of the resulting points. Group 2 was used to confirm the accuracy of the risk score model by calculating the area under the ROC curve. We then assessed the sensitivity, specificity, crude agreement (CA), positive predictive value (PPV), and negative predictive value (NPV) of the risk score model at different cut-off values for subjects in group 2 and for all subjects, using the diagnosis based on DSM-IV criteria as the gold standard. The CA was the sum of true positives and true negatives divided by the total number of subjects. A prediction model was considered to have no diagnostic value if CA = 0 and to be invariably correct if CA = 1. SPSS 13.0 was used for all data analysis.
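The point-assignment step can be sketched as follows. The β values below are hypothetical placeholders, not the coefficients estimated in this study, and the sketch assumes that the total score is the sum of each variable's points multiplied by its coded value.

```python
def integer_points(betas):
    """Divide each coefficient by the smallest one and round to the nearest integer."""
    smallest = min(betas.values())
    if smallest <= 0:
        raise ValueError("this sketch assumes all coefficients are positive")
    # Note: Python's round() sends exact .5 ties to the nearest even integer.
    return {name: round(beta / smallest) for name, beta in betas.items()}


def risk_score(points, coded_values):
    """Total score: points times coded value, summed over the predictors."""
    return sum(points[name] * coded_values[name] for name in points)


# Hypothetical coefficients, for illustration only.
betas = {"X1": 0.30, "X2": 0.15, "X7": 0.62}
points = integer_points(betas)
print(points)  # {'X1': 2, 'X2': 1, 'X7': 4}

print(risk_score(points, {"X1": 2, "X2": 0, "X7": 2}))  # 2*2 + 1*0 + 4*2 = 12
```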

Results

A total of 8 counties, 40 towns, 310 villages, 13,450 households, and 29,285 individuals aged 16 years and older were selected for the study. Among the 29,285 subjects, 1,128 (3.9%) refused to participate, 1,035 (3.5%) were not interviewed, 1,644 (5.6%) had incomplete data, and 25,478 had complete data, yielding a response rate of 87.0%. A total of 2,336 subjects were probable PTSD positive, yielding a probable positive rate of 9.2%. Of the 25,478 subjects in the final analysis, 17,846 (70%) were randomly assigned to group 1 and 7,632 (30%) to group 2. There was no significant difference in baseline characteristics or probable PTSD rates (P > 0.05) between the two study groups (Table 1).

Table 1 Sample distribution and probable PTSD-positive rates in 2 groups

Table 2 shows the results of the stepwise logistic regression analysis among group 1 subjects. Seven variables entered the prediction model, namely, age (X1), gender (X2), education (X3), type of flood (X5), severity of flood (X6), flood experience (X7), and mental status before flood (X8). The logistic probability model was as follows:

Table 2 Significant PTSD predictive variables included in the logistic model

Based on the β regression coefficient of each variable, the risk score was calculated with Singh's method[17]. The final risk score model is as follows:

For example, if a subject is 38 years old (X1 = 2); male (X2 = 0); illiterate (X3 = 3); experienced a moderate flood (X6 = 2); suffered a flash flood (X5 = 3); and has scores of 2 and 3 for flood experience (X7) and mental status before flood (X8), respectively, his total risk score will be as follows:

The area under the ROC curve for group 2 was 0.853 for both the logistic probability model and the risk score model (Figure 1). This means that the risk score model, which is much simpler and easier to use, yields results similar to those of the logistic probability model.

Figure 1 The ROC curve of logistic probability model and risk score model for group 2 subjects.
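For readers who want to reproduce an area under the ROC curve from individual risk scores, one minimal approach is the pairwise (rank-based) definition sketched below. The scores and labels are toy values, not study data, and the function name is hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive case scores higher than
    a random negative case, with ties counted as one half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Toy data: risk scores and PTSD status (1 = probable PTSD) for six subjects.
scores = [20, 35, 50, 65, 80, 95]
labels = [0, 0, 1, 0, 1, 1]
print(round(roc_auc(scores, labels), 3))  # 8 of 9 pairs ordered correctly: 0.889
```

The pairwise loop is quadratic in the number of subjects, so for a sample the size of group 2 a rank-based or library implementation would be more practical.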

Table 3 compares the validity of the risk score model at different cut-off values (based on individual total risk scores) in different groups. A cut-off value between 63.5 and 69.5 appears acceptable, yielding a sensitivity of 80.5% to 89.4%, specificity of 63.4% to 75.0%, CA of 65.8% to 75.5%, PPV of 19.8% to 24.5%, and NPV of 97.4% to 98.3% in group 2 (Table 3).
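The validity measures reported in Table 3 can be computed from risk scores and reference diagnoses as in the sketch below. It assumes that a score above the cut-off is a positive prediction; the data are toy values and the function name is hypothetical.

```python
def validity_at_cutoff(scores, labels, cutoff):
    """Sensitivity, specificity, crude agreement, PPV and NPV at one cut-off.

    A score above the cut-off is treated as a positive prediction (assumption).
    """
    tp = fp = tn = fn = 0
    for score, has_ptsd in zip(scores, labels):
        predicted = score > cutoff
        if predicted and has_ptsd:
            tp += 1
        elif predicted:
            fp += 1
        elif has_ptsd:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Undefined measures (empty denominator) are returned as None.
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "crude_agreement": ratio(tp + tn, tp + fp + tn + fn),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Toy data: six subjects, cut-off 42.5 gives TP = 3, FP = 1, TN = 2, FN = 0.
scores = [20, 35, 50, 65, 80, 95]
labels = [0, 0, 1, 0, 1, 1]
result = validity_at_cutoff(scores, labels, cutoff=42.5)
print(result["sensitivity"], result["ppv"])  # 1.0 0.75
```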

Table 3 The validity of predictive model under different cut-off value in different populations (%)

Based on individual risk scores calculated from our risk score model, we can predict the probability of PTSD occurrence. For the case mentioned above (risk score = 107), if 65 is selected as the cut-off value, we will consider this individual to be at high risk of developing PTSD. The higher the risk score, the greater the probability that the individual will develop PTSD.

Discussion

PTSD is a common psychological disorder in disaster-affected populations. It has been widely used to evaluate the psychological impact of natural disasters and accidents[1, 2, 6-10]. To our knowledge, this is the first study to explore the prediction of PTSD with a risk score model among flood victims in a large population. The risk score method has been widely used for prediction or screening of disease because of its simplicity and ease of interpretation [17-21]. In our study, a risk score model was established from the β regression coefficients of a logistic regression analysis that included 7 predictive variables. These variables were demographic characteristics (X1, X2, X3), type of flood (X5), severity of flood (X6), flood experience (X7), and mental status before flood (X8). To make the prediction model simpler and easier to understand, we combined X7.1-X7.6 into X7 and X8.1-X8.4 into X8, using the cumulative score as their value. The areas under the ROC curve for the logistic probability and risk score models were very similar, but since the risk score model is much simpler and easier to use, we recommend its use in PTSD prediction among flood victims. The suitability of the risk score model is further supported by a sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of 84.0%, 72.2%, 23.4%, and 97.8%, respectively, at the cut-off value of 67.5 in group 2.

To improve the model's predictive value, we used 4 mental status-related variables (X8.1-X8.4) as potential predictors representing mental status before the flood, in addition to age, gender, education, and severity of flood suffered, which were important predictors in previous studies [13, 14, 16]. Previous findings on the strength of the association between mental health status before trauma and PTSD have differed [13, 14, 16], and retrospective reports may be influenced by current symptoms. Nevertheless, our results still yielded valuable information: mental status prior to the flood was an effective predictor of PTSD in our study.

Because our study focused on the prediction of PTSD rather than screening, only demographic characteristics (X1, X2, X3), type of flood (X5), severity of flood (X6), flood experience (X7), and mental status before flood (X8) were included in our predictive model (early symptoms of PTSD were not included). We listed sensitivity, specificity, PPV, and NPV at different cut-off values so that public health workers can choose the appropriate cut-off value for their PTSD predictions. For example, if the goal is to find as many PTSD cases as possible, one could use 39.5 as the cut-off value, which raises the sensitivity to 99.9%. If the goal is to reduce false positives, one could use 97.5 as the cut-off value, which yields a specificity of 98.8%.
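Choosing a cut-off to meet a target can be sketched as a simple search. The example below looks for the highest cut-off that still reaches a required sensitivity; the candidate cut-offs (midpoints between consecutive observed scores), the toy data, and the function names are all assumptions made for illustration.

```python
def sensitivity(scores, labels, cutoff):
    """Share of true cases whose score lies above the cut-off."""
    case_scores = [s for s, y in zip(scores, labels) if y == 1]
    if not case_scores:
        raise ValueError("need at least one positive case")
    return sum(s > cutoff for s in case_scores) / len(case_scores)


def highest_cutoff_with_sensitivity(scores, labels, target):
    """Highest candidate cut-off whose sensitivity is at least the target."""
    distinct = sorted(set(scores))
    # Candidate cut-offs: midpoints between consecutive distinct scores.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    acceptable = [c for c in candidates
                  if sensitivity(scores, labels, c) >= target]
    return max(acceptable) if acceptable else None


# Toy data: risk scores and PTSD status (1 = probable PTSD) for six subjects.
scores = [20, 35, 50, 65, 80, 95]
labels = [0, 0, 1, 0, 1, 1]
print(highest_cutoff_with_sensitivity(scores, labels, target=0.99))  # 42.5
print(highest_cutoff_with_sensitivity(scores, labels, target=0.60))  # 72.5
```

Taking the highest acceptable cut-off keeps specificity as high as the sensitivity target allows; the mirror-image search (lowest cut-off meeting a specificity target) follows the same pattern.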

Our PTSD prediction model was validated with a separate sample, which supports the reliability of its performance. All the predictive variables included in the model can be easily obtained through a simple questionnaire after a flood. Compared with other PTSD screening models, which included some early PTSD symptoms as predictors [13-16, 22], our model showed lower sensitivity and specificity. However, it can be used to predict the possible occurrence of PTSD immediately after a flood. Our model is therefore of practical value in public health programs.

Our study used a retrospective survey method to investigate the impacts of flood. As a result, recall bias and information bias could occur. However, because interviewers did not know who had PTSD or who had not at the time of interviewing, the recall bias and information bias, if any, may have occurred randomly.

Another limitation of our study is the fact that the diagnosis of PTSD was made using a questionnaire administered by interviewers. Although the interviewers received on-site supervision from psychologists, the diagnosis of PTSD may not be accurate. In view of this, all suspected cases were diagnosed as 'probable PTSD'.

The potential predictive variables considered in our model were selected based on professional judgement and the literature. Other variables, such as economic loss, property damage, and family history of mental illness, were considered in some preliminary analyses of our data. However, since those variables were not statistically significant in univariate analyses, we did not include them in the multivariable logistic regression analysis. Although our model has not been proved to be optimal, it is a practical and useful model for PTSD prediction, at least for now. Its performance in other populations needs further investigation.

Conclusions

The risk score-based predictive model for PTSD developed in this study has an acceptable predictive value with favourable applicability, and can be used to identify persons at risk of PTSD during floods.