Background

Stage data are needed to define clinically homogeneous cohorts, adjust for the extent of disease spread, study real-world treatment effectiveness and costs, and inform regional decision-making [1]. Accurate staging, when linked to treatment and outcome data, informs the effectiveness and quality of cancer treatments and guides healthcare planning for resource mobilization and implementation [1]. The absence of stage data makes it harder to maintain a representative cancer cohort, to minimize the bias caused by excluding patients with unknown stage, and to achieve an adequate sample size for robust statistical analyses [2].

Capturing population-based stage data in ‘big data’ is often limited by practical and financial constraints. For example, the International Cancer Benchmarking Project used multiple national cancer registries to understand cancer stage and survival patterns [3]. The registries varied in the completeness of stage data across primary cancer sites; upwards of 50% of patients were excluded due to missing stage data in this international comparison of cancer survival [4, 5]. As a result, many countries are aiming to improve their population-based stage data collection using a range of methods and data sources [1, 2, 6–8].

Validated algorithms that identify metastatic disease using routinely collected healthcare data may provide one solution to missing stage data in studies using population-based, administrative data [9, 10]. Benchimol et al. have published general guidelines for developing and validating algorithms that assign disease status using administrative healthcare data [9]. Overall, many studies fail to report appropriately on algorithm performance, including revalidation, to present at least four metrics of diagnostic accuracy (e.g. sensitivity, specificity, agreement), or to provide confidence intervals [9]. Little published research has evaluated algorithm performance across cancer sites; developing high-quality algorithms requires gold-standard staging data to validate them properly and ensure accuracy prior to use. Whyte et al. evaluated 28 algorithms to identify metastatic disease status in three administrative data cohorts of treated colorectal, breast, and lung cancer patients in the United States [11]. The algorithms performed differently depending on cancer site, the underlying prevalence of metastatic disease, the choice of timeframe, and the diagnosis codes used [11]. This is consistent with other diagnostic algorithms, whose performance has also been shown to depend on the data sources used.

Gastric cancer (GC) is the third leading cause of cancer-related mortality worldwide [12, 13]. Most patients in North America present with metastatic disease at diagnosis [14, 15], with similar stage distributions reported in the United Kingdom [16–18]. Although not all countries capture this information routinely, the ability to identify stage IV patients in population-based registries is crucial. Therefore, this study linked detailed TNM staging data from a province-wide chart review with routinely collected healthcare data to develop an algorithm to identify individuals with metastatic disease in a cohort of GC patients.

Methods

Study population

GC patients aged 19 and older and diagnosed between April 1, 2005 and March 31, 2008 were identified in the Ontario Cancer Registry. Patients with multiple cancers, no corresponding hospital chart, a tumour located primarily in the oesophagus, or non-adenocarcinoma tumours were excluded. The project received Research Ethics Board approval at Sunnybrook Health Sciences Centre and adhered to all privacy and confidentiality regulations of ICES. Individual patient consent was not required. ICES is a prescribed entity under section 45 of Ontario’s privacy law (PHIPA), enabling us to study the health and health outcomes of individuals for the purpose of analysis or compiling statistical information with respect to the management of, evaluation or monitoring of, the allocation of resources to, or planning for all or part of the health system.

Data sources

A province-wide chart review was conducted at over 100 institutions between November 2009 and November 2011. Information from multiple endoscopy, radiology, and pathology reports per patient was aggregated. Data abstraction from operative reports was completed by a surgical resident in 2013. Chart review data were linked to routinely collected healthcare and vital status data at ICES in 2013. All hospitalizations, emergency department (ED) visits, and physician visits were captured from the Canadian Institute for Health Information Discharge Abstract Database and Same Day Surgery Database, the National Ambulatory Care Reporting System, and the Ontario Health Insurance Plan database.

Metastatic disease

Reference standard

The 7th edition American Joint Committee on Cancer/Union for International Cancer Control TNM staging system was used [19]. TNM stage data from patient hospital charts were used as the reference standard. Stage data were collected from the 180 days prior to the diagnosis date registered in the Ontario Cancer Registry through the 180 days following diagnosis or the date of surgical resection (whichever came later), using a modified Collaborative Staging system approach. Clinic, diagnostic imaging, endoscopy, surgery, and pathology records were used to identify metastatic disease. Patients were considered stage IV (M1, positive for metastatic disease) if evidence of metastatic disease was identified in any portion of the medical record, and M0 (stages I–III) otherwise.
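To make the abstraction window concrete, the following is a minimal Python sketch of the rule as we read it; the function name is ours, and "whichever came later" is interpreted as the later of the two candidate end dates, which should be checked against the chart-review protocol.

```python
from datetime import date, timedelta

def staging_window(diagnosis_date, resection_date=None):
    """Chart-review window for staging evidence: opens 180 days before
    the registry diagnosis date and closes at the later of 180 days
    after diagnosis or the surgical resection date (if any)."""
    window_start = diagnosis_date - timedelta(days=180)
    window_end = diagnosis_date + timedelta(days=180)
    if resection_date is not None and resection_date > window_end:
        window_end = resection_date
    return window_start, window_end

# Example: a resection eight months after diagnosis extends the window.
print(staging_window(date(2006, 1, 15), date(2006, 9, 20)))
# -> (datetime.date(2005, 7, 19), datetime.date(2006, 9, 20))
```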

Algorithms

Three sets of administrative data algorithms to identify stage IV GC [19], defined as the presence of metastatic disease at diagnosis, were created using combinations of information from hospitalization records, ED visits, and outpatient physician visits. A positive identification of metastatic GC was determined using three sets of eligible International Classification of Diseases (ICD) version 9 and 10 diagnosis codes (a complete list is provided in Additional file 1: Table S1). The included diagnoses ranged from conservative (secondary malignancy codes only, e.g. ICD-9 code 196) to inclusive (any non-gastric malignancy diagnosis, e.g. ICD-10 C codes excluding digestive organs). In the first set of algorithms, patients were identified as metastatic if they had one or more hospitalizations with an eligible diagnosis code. In the second set, patients with metastases were identified using hospitalization records (one or more) and outpatient records (two or more). In the third set, patients with metastases were identified if they had one or more hospitalization or outpatient records. Three time periods were also considered for each algorithm: three months pre- and post-diagnosis, six months pre- and post-diagnosis, and three months pre-diagnosis with no end to follow-up post-diagnosis. These criteria were chosen based on the types of data in our administrative data holdings, on previous studies defining metastatic disease using similar data, and on the properties of diagnostic algorithms using administrative data in other settings. We performed a sensitivity analysis restricting the cohort to those who received a surgical resection. In total, 45 algorithms were evaluated.
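As an illustration of this logic (not the analysis code used in the study), the sketch below implements one algorithm family under a simplified record layout. The record fields and code list are stand-ins: the real code lists are in Additional file 1: Table S1, and we read the second algorithm set's thresholds as one or more hospitalizations or two or more outpatient/ED records, which should be verified against the original definitions.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Record:
    record_type: str   # "hospitalization", "ED", or "outpatient"
    icd_code: str      # ICD-9 or ICD-10 diagnosis code
    service_date: date

# Stand-in for the conservative list (secondary malignancy codes);
# the actual lists appear in Additional file 1: Table S1.
CONSERVATIVE_CODES = ("196", "197", "198", "C77", "C78", "C79")

def flag_metastatic(records, diagnosis_date,
                    eligible_codes=CONSERVATIVE_CODES,
                    days_pre=180, days_post=180,
                    min_hospitalizations=1, min_outpatient=2):
    """Flag a patient as M1 if eligible diagnosis codes appear on enough
    hospitalization or outpatient/ED records within the time window."""
    start = diagnosis_date - timedelta(days=days_pre)
    end = diagnosis_date + timedelta(days=days_post)
    hosp = outpt = 0
    for rec in records:
        if not (start <= rec.service_date <= end):
            continue
        if not rec.icd_code.startswith(eligible_codes):
            continue
        if rec.record_type == "hospitalization":
            hosp += 1
        else:
            outpt += 1
    return hosp >= min_hospitalizations or outpt >= min_outpatient
```

Varying the code list, the record-source thresholds, and the time window over their respective options generates the family of algorithm variants evaluated here.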

Statistical analysis

Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy were calculated for each algorithm. Accuracy was measured using the following equation: Accuracy = (TP + TN) / (TP + TN + FP + FN) [9]. Ninety-five percent confidence limits on the estimates of sensitivity, specificity, PPV, NPV, and accuracy were calculated using percentiles of the distribution of 5000 bootstrap replicates sampled with replacement. Demographic characteristics and the tumour stage, lymph node status, and TNM stage of true positives, false positives, true negatives, and false negatives were described for each algorithm. Content validity was evaluated by comparing the percentage of patients who died in the year following diagnosis.
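A minimal Python sketch of these calculations follows; it is illustrative rather than the study's analysis code, and it assumes paired binary labels where True denotes M1.

```python
import random

def diagnostic_metrics(truth, pred):
    """Diagnostic accuracy metrics from paired reference-standard
    (truth) and algorithm (pred) labels, where True means M1."""
    tp = sum(1 for t, p in zip(truth, pred) if t and p)
    tn = sum(1 for t, p in zip(truth, pred) if not t and not p)
    fp = sum(1 for t, p in zip(truth, pred) if not t and p)
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

def bootstrap_ci(truth, pred, metric, n_boot=5000, alpha=0.05, seed=1):
    """Percentile bootstrap CI: resample patients with replacement,
    recompute the metric, and take the alpha/2 and 1 - alpha/2
    percentiles. Assumes every replicate has non-degenerate counts."""
    rng = random.Random(seed)
    n = len(truth)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        t = [truth[i] for i in idx]
        p = [pred[i] for i in idx]
        estimates.append(diagnostic_metrics(t, p)[metric])
    estimates.sort()
    return (estimates[int(n_boot * alpha / 2)],
            estimates[int(n_boot * (1 - alpha / 2)) - 1])
```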

Results

Overall, 2366 patients were included; 54.3% had metastatic disease at diagnosis according to the chart review (Table 1). Sensitivity, specificity, and accuracy of the algorithms are reported in Table 2. Sensitivity ranged from 50.0 to 90.0%, specificity from 27.6 to 92.5%, and accuracy from 61.5 to 73.4%. Sensitivity and specificity were maximized when the algorithm used the most conservative list of metastatic disease diagnosis codes, used both hospitalization and outpatient records as data sources, and was run on administrative data from the six months prior to and following diagnosis. Sensitivity decreased and specificity increased slightly for all algorithms when the cohort was restricted to patients who received surgical resection (Additional file 2: Table S2). Excluding patients with unknown metastatic disease status (4.3%) did not change the results (data not shown). Concordant and discordant classifications between the algorithms and the reference standard are reported in Additional file 3: Table S3.

Table 1 A description of the reference standard M1 and non-M1 cohort
Table 2 Properties of evaluated algorithms

Table 3 describes the algorithm that maximized sensitivity and specificity (algorithm #12). According to this algorithm, the prevalence of metastatic GC was 45%. Of the 1285 patients classified as M1 by the reference standard, 31% were misclassified by this administrative healthcare data algorithm; 20% of the metastatic group identified by the algorithm were false positives, and 32% of the group identified as M0 were false negatives. One third of the false positives and false negatives had an unknown stage at diagnosis according to the reference standard. Correctly classified metastatic patients were more likely to have died within a year of diagnosis than those incorrectly classified.

Table 3 A description of concordant and discordant classifications for algorithm 12, the algorithm that maximized sensitivity and specificity

Using the algorithm with the highest positive predictive value (algorithm #1), 11% of those identified as having metastatic disease were misclassified. Ninety percent of patients misclassified by this algorithm were stage III (55.5%) or unknown stage (34.6%), and 66% had a T4a or T4b tumour. Overall, as the positive predictive value of the algorithm decreased, the proportion of node-negative patients with smaller tumours and earlier-stage disease misclassified as metastatic increased (data not shown).

Discussion

This study evaluated 45 algorithms using routinely collected healthcare data to identify metastatic disease in a population-based cohort of GC patients. None of the algorithms classified patients with excellent accuracy relative to the reference standard. The algorithm that maximized sensitivity and specificity identified metastatic disease through one or more hospitalization or outpatient records carrying a diagnosis from the conservative code list in the six months before and after diagnosis.

Our algorithm accuracy differed from the few comparable studies in the literature, likely as a result of differences in study design or in the underlying prevalence of metastatic disease. We observed lower accuracy than a study of colorectal cancer algorithms by Brooks et al. [20]. Whyte et al. reported better accuracy for their algorithms identifying metastatic disease in breast cancer, and similar accuracy for algorithms in lung and colorectal cancer [11]: sensitivity and specificity estimates ranged from 46 to 77% and 83 to 99% for breast cancer, 50 to 67% and 68 to 83% for lung cancer, and 54 to 77% and 70 to 91% for colorectal cancer [11]. Whyte et al. did not define the length of their follow-up period or explain why the total number of patients varied across algorithms, and they included only patients treated within a private healthcare system [11]. Both Whyte et al. and Brooks et al. studied only patients who received treatment. Breast and colorectal cancer both have a much lower prevalence of metastatic disease at diagnosis than GC, which may affect accuracy. Our findings resemble those of an algorithm developed by Lash et al. to identify colorectal cancer recurrence, in which patients correctly identified by the algorithm were more likely to be younger and to die within a shorter timeframe [21].

The best algorithm choice depends on the research purpose [22]. For example, maximizing accuracy may be the priority when estimating the prevalence of metastatic disease, when representativeness of the identified cohort is not important. Maximizing specificity may be the priority when it is essential that patients included in a study of metastatic disease truly have metastases. We recommend a conservative approach using relevant diagnosis codes recorded close to the diagnosis date. This approach, and the other algorithms reported in this study, should be tested in an additional external cohort, including one that better reflects current clinical populations and treatments. The properties of the algorithms in this study may generalize to similar high-fatality cancer cohorts, such as pancreatic and oesophageal cancer. The algorithms may be used by other investigators and policy-makers to estimate the extent of misclassification, and in formal bias analyses to adjust effect estimates [23]. Alternatively, given that none of the algorithms demonstrated exemplary accuracy, integrating multiple algorithms using methods such as majority vote and Boolean operations, as sketched below, may be another way to implement them in practice [24].
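To illustrate these combination strategies (an approach from the cited literature [24], not a method evaluated in this study), a brief sketch follows, assuming each algorithm returns one binary M1 call per patient.

```python
def majority_vote(flags):
    """M1 if more than half of the algorithms flag the patient."""
    return sum(flags) > len(flags) / 2

def boolean_and(flags):
    """M1 only if all algorithms agree; favours specificity."""
    return all(flags)

def boolean_or(flags):
    """M1 if any algorithm flags the patient; favours sensitivity."""
    return any(flags)

# Three hypothetical algorithm calls for one patient:
calls = [True, False, True]
print(majority_vote(calls), boolean_and(calls), boolean_or(calls))
# -> True False True
```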

Our study is limited by our choice of reference standard, which may have resulted in misclassification of metastatic disease for some patients. The prevalence of metastatic disease was 54% in our study, with a median survival of six months, consistent with distributions reported in the literature [14, 25]. We performed a sensitivity analysis restricted to the cohort of patients with a surgical resection, who would have better quality pathologic staging data available in their charts; in this cohort the true prevalence of metastatic disease was lower and the positive predictive value of the algorithms decreased. We also attempted to address administrative data quality issues by creating three sets of algorithms according to data reliability (hospitalization data being most reliable) and by using three sets of diagnosis codes.

Conclusions

We suggest that algorithms using administrative healthcare data are imperfect replacements for population-based staging data, supporting the need for system-level data collection; however, they do yield moderately accurate results. Where population-based data collection is infeasible, a global understanding of misclassified patients and of administrative algorithm properties is important for assessing potential selection bias.