The National Cancer Database Conforms to the Standardized Framework for Registry and Data Quality

Background Standardization of procedures for data abstraction by cancer registries is fundamental for cancer surveillance, clinical and policy decision-making, hospital benchmarking, and research efforts. The objective of the current study was to evaluate adherence to the four components (completeness, comparability, timeliness, and validity) defined by Bray and Parkin that determine registries’ ability to carry out these activities to the hospital-based National Cancer Database (NCDB). Methods Tbis study used data from U.S. Cancer Statistics, the official federal cancer statistics and joint effort between the Centers for Disease Control and Prevention (CDC) and the National Cancer Institute (NCI), which includes data from National Program of Cancer Registries (NPCR) and Surveillance, Epidemiology, and End Results (SEER) to evaluate NCDB completeness between 2016 and 2020. The study evaluated comparability of case identification and coding procedures. It used Commission on Cancer (CoC) standards from 2022 to assess timeliness and validity. Results Completeness was demonstrated with a total of 6,828,507 cases identified within the NCDB, representing 73.7% of all cancer cases nationwide. Comparability was followed using standardized and international guidelines on coding and classification procedures. For timeliness, hospital compliance with timely data submission was 92.7%. Validity criteria for re-abstracting, recording, and reliability procedures across hospitals demonstrated 94.2% compliance. Additionally, data validity was shown by a 99.1% compliance with histologic verification standards, a 93.6% assessment of pathologic synoptic reporting, and a 99.1% internal consistency of staff credentials. Conclusion The NCDB is characterized by a high level of case completeness and comparability with uniform standards for data collection, and by hospitals with high compliance, timely data submission, and high rates of compliance with validity standards for registry and data quality evaluation.

The National Cancer Database Conforms … the quality of data collected by cancer registries is the framework described by Bray and Parkin 3,4 in 2009.
The Bray and Parkin registry and data quality framework was developed with four unique domains: completeness, comparability, timeliness, and validity. 3,4Completeness represents the extent to which all the incidences of cancer occurring in the population are included in a registry. 3,4ompleteness is crucial for ensuring that estimates approximate the true value in the population. 3,4Comparability represents the extent to which statistics generated for different populations, using data from different sources and over time, can be compared. 3,4Comparability is achieved using standardized guidelines on classification procedures, maintaining consistency for coding cancer cases. 3,4Timeliness relates to the rapidity through which a registry can abstract and report reliable cancer data, which is crucial for decisionmaking. 3,4Validity represents the proportion of cases in a dataset with a given characteristic that truly has that attribute, which is crucial for relevant interpretation of estimates calculated using the data. 3,4[7] The processes that ensure data quality of both populationand hospital-based cancer registries in the United States of America (USA) have been consistent for several decades and include standardization of data-field definitions, quality checks executed during data abstraction, and case monitoring after submission (Fig. 1).The principal aim of a population-based cancer registry is to record all new cases in a geographic area or state, with an emphasis on epidemiology and public health. 8,9By contrast, a hospital-based registry is designed to improve patient quality of care at the institutional level. 8,9Both population-and hospital-based cancer registries adhere to uniform procedures during the record abstraction and coding process to ensure accuracy but serve different purposes.
The reporting of cancer cases to the population-based central cancer registry (CCR) is mandated by legislation in the USA and territories. 10,11][12] The reporting of cancer cases within a hospital is mandated by the hospital-based National Cancer Database (NCDB) to maintain accreditation from the Commission on Cancer (CoC). 13,14Although the Bray and Parkin quality control criteria were written primarily with population-based registries in mind, we propose their use for large hospital-based registries, such as the NCDB.
Cancer surveillance programs collaborate to standardize definitions of relevant cancer data items and closely monitor estimates of cancer trends and outcomes calculated using different data sources. 9Each cancer surveillance program works with oncology data specialist (ODS)-certified cancer registrars who are educated, trained, and certified in abstracting cancer data following established definitions and rules. 9,15Although these processes, among many others, have demonstrated consistency over time, they also are dynamic and undergo periodic revisions to incorporate advances in cancer care and ensure the availability of contemporary cancer data. 9,15he NCDB is a hospital-based cancer registry and contains approximately 40 million records, collecting data on patients with cancer since 1989. 16,17The NCDB is jointly maintained by the American College of Surgeons CoC and the American Cancer Society. 13,17To earn voluntary CoC accreditation, a hospital must meet quality of patient care and data quality standards. 13Hospitals are evaluated on their compliance with the CoC standards on a triennial basis through a site visit process to maintain levels of excellence in the delivery of comprehensive patient-centered care. 13he CoC standards are designed to ensure that the processes of the hospital's cancer program support multidisciplinary patient-centered care. 13Adherence to these standards is required to maintain accreditation in the CoC.The standards demonstrate a hospital's investment in structure along a full continuum from cancer prevention to survivorship. 13verall, approximately 1500 CoC-accredited hospitals submit data to the NCDB each year. 16The NCDB collects data from patients in all phases of first-course treatment in cancer care and cancer surveillance and includes the addition of roughly 1.5 million records with newly diagnosed cancers annually. 14,16,17Reportable cancer diagnoses will originate from single-and multi-institution cancer registries. 18The fundamental purpose of the NCDB is to capture data designed to improve patient outcomes. 18vidence-based quality measures representing clinical best practice are reported from the NCDB through interactive benchmarking reports. 13This includes the Rapid Cancer Reporting System (RCRS), a web-based tool designed to facilitate real-time reporting of cancer cases. 13lthough registrars who submit data to the NCDB are involved in aspects of both the population-based registries and the hospital-based registries, not all quality procedures performed by registrars pertain to the NCDB (Table 1).Quality procedures identified by Bray and Parkin that are relevant only to population-based cancer registries include assessment of age-specific curves, incidence rates of childhood cancers, mortality incidence ratio stability, number and average sources per case, and death certificate methods. 11Death certificate-only analyses are performed routinely across all population-based registries. 11Death certificate analysis as a quality indicator does not directly affect the NCDB.However other quality procedures are performed after data submission as part of data aggregation, quality assessment, and reporting within the NCDB.
The NCDB is part of a multi-agency, National Cancer Registry community in the USA that works collaboratively to ensure that consistent, high-quality cancer data can be applied across diverse utilities (Fig. 1).This surveillance community comprises the central cancer registries, including the Centers for Disease Control and Prevention (CDC), National Program of Cancer Registries (NPCR), and the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute (NCI); the National Cancer Registrars Association (NCRA); and the CoC. 19The North American Association of Central Cancer Registries (NAACCR) is also part of this community and serves a vital role as a consensus organization. 11The NAACCR facilitates standardization of data definitions, abstraction and coding rules, quality procedures, and registry certification, which in turn ensures uniform registry processes and establishes data quality standards. 11Instructions to support standardized data definitions, abstraction, and coding rules, as well as quality procedures, are detailed in key manuals and documents. 11n assessment of existing quality processes and procedures is fundamentally important to ensuring that the best possible data are being used to inform cancer practices and policies.The principal aim of this study was to assess the quality of cancer data collected by the NCDB using the Bray and Parkin framework.

Completeness
Completeness, defined as a measure of representation, is the extent to which all the incident cancer cases occurring in the population are included in the registry.Case-finding procedures are considered critical to both cancer registry coverage and survival accuracy.Completeness includes nine quality procedures (Table 1). 3,4ecause of the legislative mandate to report cancer cases to population-based cancer registries in the USA, population-based cancer registries are regarded as the gold standard for data completeness. 11We evaluated data completeness within the NCDB by comparing the number of incident cancer cases from participating central registries included in the United States Cancer Statistics (USCS), the official federal cancer statistics. 12These statistics include cancer registry data from the CDC's NPCR and the NCI SEER program. 12he National Cancer Database Conforms … The USCS internal quality control file includes cases from all 50 states and the District of Columbia, providing information on demographic and tumor characteristics.12 Cancers diagnosed at a Veterans Affairs hospital were excluded from the NCDB analysis.Cases were further limited to malignant disease except for benign and borderline brain and other nervous system cancers and female in situ breast cancers.Only male and female cancers diagnosed within the USA between 2016 and 2020 were included.The percentage of cancer cases captured within the NCDB from 2016 to 2020 were compared against prior reports, which included diagnostic years 2012 to 2014. 14mparisons were made by primary disease site using the SEER definitions of the World Health Organization (WHO) International Classification of Diseases for Oncology, thirdedition (ICD-O-3) site recodes.20 Additional stratification included sex, diagnosis year, patient age, race/ethnicity, and state of diagnosis corresponding to the patient's residence.
Outcomes for other measures of completeness that affect all registries (Table 1) have been previously reported. 21ncidence case ascertainment for the NCDB is continuously verified with CoC special studies, which are required for accreditation, and specifically capture additional data on previously submitted cancer diagnoses.This provides an extra level of detail and audit of abstraction accuracy.][24] This type of auditing may be extended to assess registry completeness.

Comparability
The study ensured comparability by using standardized international guidelines on coding and classification procedures for cancer data abstraction. 3,4Cancers reported to the NCDB are identified by the WHO ICD-O-3 topography, morphology, behavior, and grade codes. 257][28] Staging standards are defined by the American Joint Committee on Cancer (AJCC). 29The rules for coding include timing relative to initiation of treatment.Clinical staging includes the extent of cancer information before initiation of definitive treatment or within 4 months after the date of diagnosis, whichever is shorter. 29,30Pathologic staging includes any information obtained about the extent of cancer through completion of definitive surgery or within 4 months after the date of diagnosis, whichever is longer. 29,30Secondary diagnosis codes are captured by the cancer registry as International Classification of Diseases, 10th Revision codes. 30The CoC also requires registries to submit up to 10 comorbid conditions to the NCDB.These conditions influence the health status of the patient and treatment complications. 30n interactive drug database maintained by SEER facilitates the proper coding of treatment fields. 31The rules for diagnostic confirmation require the reportability of both clinically diagnosed and microscopically confirmed tumors. 30Clinically diagnosed tumors are those with the diagnosis based only on diagnostic imaging, laboratory tests, or other clinical examinations, whereas microscopically confirmed tumors include all tumors with positive histopathology. 11,30Cancer registries reference both "ambiguous terms at diagnosis" to determine case reportability and "ambiguous terms describing tumor spread" for staging purposes. 30or reportability, the NCDB follows rules for class of case to describe the patient's relationship to the facility.Rules exist for the reporting of multiple primary tumors to the NCDB. 32These solid tumor rules are aimed at promoting consistent and standardized coding by cancer registrars and are intended to guide registrars through the process of determining the correct number of primary tumors. 32

Timeliness
No international guidelines for cancer registry data submission timeliness exist, although the cancer surveillance community has specific timeliness standards for their respective registries. 11Timeliness of NCDB data submission was assessed using compliance with CoC standard 6.4 (Table 1). 13

Validity
Validity is defined by Bray and Parkin 3,4 as the proportion of cases in a dataset with a given characteristic that has this characteristic.Data validity is maintained through procedures specific to quality control that are integral to the registry and tied to CoC standards 3.2, 4.3, 5.1, and 6.1 for CoC accreditation (Table 1). 13ccreditation for anatomic pathology by a qualifying organization is a component of standard 3.2, designed to further structure quality assurance protocols. 13Histologic verification also is assessed in compliance with CoC standard 3.2 and ensures that each hospital provides diagnostic imaging services, radiation oncology services, and systemic therapy services on site with accreditation by a qualifying organization for anatomic pathology. 13ompliance with CoC standard 4.3 is assessed for internal consistency, which ensures that all case abstraction is performed by cancer registrars who hold current certification by the NCRA. 13,15This ensures that registrars use, maintain, and continue their formal education through NCRA and thus continue working toward correct interpretation and coding of cancer diagnoses. 13,15tandard 5.1 requires College of American Pathologists 33 synoptic reporting and for each hospital to perform an annual internal audit, confirming that at least 90 % of all cancer pathology reports are in synoptic format. 13he database validity criteria for re-abstracting, recoding, and reliability procedures identified by Bray and Parkin are measured in compliance with CoC standard 6.1.Additionally, data edits are integrated to maintain quality control. 11hese electronic logical rules evaluate internal consistency of values or data items. 11For instance, a biologic woman with a diagnosis of prostate cancer will fail edits.Edits are currently maintained by NAACCR based on edits originally developed by SEER. 34The NAACCR Edits' Metafile comprises validation checks applied to cancer data. 34The CDC develops and maintains software (EditWriter and GenEDITS Plus) for registries to obtain edit reports on their cases using the standards maintained by NAACCR. 34,35The NCDB assigns scores that are applied to the call for data and to RCRS reporting requirements, causing a case to be rejected or accepted into either dataset. 36An edit score of 200 will cause a record to be rejected from the NCDB. 36ll data were analyzed using SAS version 9.4 (SAS Institute, Cary, NC, USA) 37

RESULTS
The exclusion and inclusion criteria resulted in 9,269,442 cases from the USCS and 6,828,507 cases from the NCDB.Compared with the USCS, the official cancer statistics, 39 the NCDB demonstrated 73.7 % completeness of cancer cases diagnosed in the USA between 2016 and 2020 (Table 2).Among the top 10 major cancer sites, breast cancer in males and females had the highest coverage, at 81.9%, and the lowest coverage was found for melanoma of the skin in males and females, at 52.0% (Table 2).In aggregate, coverage steadily increased from 73.0% in 2016 to 74.3% in 2020 (Table 3).Age group comparisons showed the lowest coverage (61.1%) for the patients 85 years of age or older, with the highest coverage for those 20-74 years of age (73.1-80.4%)(Table 3).Race and ethnicity comparisons showed coverage to be 68.4% for white patients, 73.7% for black patients, 41.0% for American Indian/Alaskan Native patients, 70.7% for Asian/Pacific Islander patients, and 56.4% for Hispanic patients (Table 3).Finally, by state, Arkansas demonstrated the lowest coverage (24.0%), and North Dakota demonstrated the highest coverage (98.9%) (Table 4).
For timeliness, CoC standard 6.4 was assessed on the requirement for timely data submission, with compliance at 92.7% (Table 5). 13This standard has three components.The first criterion assesses compliance with monthly data submissions of all new and updated cancer cases. 13The second criterion ensures that all analytic cases are submitted to the NCDB's annual call for data. 13The third criterion requires hospitals at least twice each calendar year to review the quality measures performance rates, which are affected by timeliness of data submission. 13lidity was assessed on compliance with CoC standards 3.2, 4.3, 5.1, and 6.1 at more than 90% (range, 93.6-99.1%)(Table 5).The compliance rate for CoC standard 6.1, which requires review of at least 10% of cases each year and CoC hospitals to establish a cancer registry quality control plan, was 94.2%. 13The re-abstracting and recoding auditing approaches involve data captured by the registry compared with data collected by a designated auditor. 11Compliance with histologic verification standards was high, at 93.6% for CoC standard 5.1 pathologic synoptic reporting and 99.1% for CoC standard 3.2 accreditation for anatomic pathology by a qualifying organization.The synoptic format must be structured and must include all core elements reported in a "diagnostic parameter pair" format. 13Each diagnostic parameter pair must be listed together in synoptic format at one location in the pathology report. 13Compliance with CoC standard 4.3 was at 99.1%.This standard for credentials may additionally include participation in reliability studies designed to measure abstractor and coder compliance with existing coding rules. 11eproducibility is a goal in assessing the reliability study measures to help identify ambiguity or inadequacy of existing data definitions and rules as well as education needs. 11dits checks at the time of data submission are part of the NCDB validity criteria and are covered in the Bray and Parkin criteria. 3During the 2023 annual call for data, which began in March 2023, the NCDB processed 12,151,768 records consisting of 2021 diagnoses and follow-up resubmissions from prior years.Of the total, 71,854 cases failed the NCDB edits score, representing less than 1 %.

DISCUSSION
The current study characterized the NCDB data quality in all four domains defined by Bray and Parkin, 3,4 including high rates of completeness, comparability, timeliness, and validity.The cancer registry stakeholder community, demonstrated in Fig. 1 collaborates to standardize abstraction practice with universal coding definitions.The CoC accreditation standards layer an additional component to quality assurance with regard to histologic verification, registry staff credentials, synoptic reports, and inclusion of submission timeliness.Altogether, nearly all framework that applies to the hospital-based NCDB, identified by the Bray and Parkin criteria, is maintained with results indicative of consistency and stability over time.
The CoC standards for data quality that we examined are associated with high compliance and are a necessary component to maintain accreditation by the CoC.Cancer hospitals of the CoC are diverse by region, patient case mix, and volume, yet still display unified adherence to compliance with metrics designed to promote high quality of data.Many of the countries that previously reported on national registry data quality have universal health care coverage with a single or two-tiered national provider. 5,6orway has an 11-digit personal identification assigned to all newborns and people residing in the country. 5In contrast, the USA has a complex system of insurance options and eligibility criteria that patients navigate on their own or through their employer.The USA has no national patient identifier, and the gathering of cancer data could be further complicated by the variability in electronic health record systems, which may not be interoperable.Despite these challenges, registrars that submit data to the NCDB demonstrate the effectiveness of quality control mechanisms developed in partnership with the registry stakeholder community, yielding high-quality data.Hospitals are required to follow standard processes and procedures to abstract and report data to the NCDB, including treatment information, and are therefore a valuable resource for evaluating cancer treatment patterns.Although central registries capture treatment information, this varies by state and therefore is not routinely available in the public facing NPCR and SEER data.
This study had limitations to be noted.First, the NCDB does not capture data beyond those hospitals accredited by the CoC.The USA has approximately 6000 hospitals, 40 with variable definitions and practices.Through this study, we determined that the NCDB captures 73.7% of cancer patients in the USA compared with national data.
A second limitation was that the NCDB does not collect direct patient identifiers, including name.The patient's name is necessary to run the NAACCR algorithm used by population-based registries to identify Hispanic identity, demonstrated to be of lower coverage in the NCDB.
Finally, the NCDB is not designed to assess changes in clinical practices or quality of care in real time, although with the launch of RCRS, more timely evaluation of sudden changes in cancer care and outcomes, such as those that occurred during the first months of the COVID-19 pandemic, is increasingly feasible.Mandatory concurrent data abstraction rules are in place and required of hospitals accredited by the CoC.Data submission rules are currently in place that require all new and updated cancer cases to be submitted monthly. 13Additional progress with timeliness is expected as the CoC standards for concurrent abstraction are adjusted to include the diagnostic and first treatment phase of care.There are plans for future studies to evaluate the completeness, comparability, validity, and timeliness of RCRS data and the feasibility of using real-time data in research.
Advances in cancer control are information dependent.As new data sources and analytic platforms become available, it is imperative that data quality be considered alongside data availability to ensure information validity and reliability.The data quality standards described in this report and adhered to by the NCDB facilitate reporting to hospital administration personnel for decision-making, researchers and epidemiologists, and quality analysts, as well as to governments that mandate reporting of cancer.Registry data must be comprehensive, granular, and valid.High-quality data allows use of the NCDB during the CoC accreditation process to include reports on quality-of-care measures and patient outcomes assessments.The NCDB provides a comprehensive view of cancer care in the USA within CoC-accredited hospitals.OPEN ACCESS This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material.If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

DISCLOSURE
The findings and conclusions in this article are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.Daniel J. Boffa, MD, MBA, FACS, received a stipend to attend a panel discussion from Iovance in May 2022.The remaining authors have no conflicts of interest.
National Cancer Registry quality processes.The quality of cancer data in the United States is supported by a large, multi-agency, National Cancer Registry stakeholder community in the United States that works collaboratively to ensure consistent, high-quality cancer data that can be applied across diverse utilities.These National Cancer Registry stakeholders standardize cancer data definitions, abstraction and coding rules, and registry-based quality procedures as well as registrar education, training, and certification.These national standards are monitored at the hospital level through compliance with quality procedures during the record abstraction and coding process as well at the national level during the process of data aggregation for quality and reporting.AJCC American Joint Committee on Cancer, CDC Centers for Disease Control and Prevention, CoC Commission on Cancer, NAACCR North American Association of Central Registries, Inc.; NCDB National Cancer Data Base, NCRA National Cancer Registrars Association, SEER Surveillance, Epidemiology, and End Results Program, STORE Standards for Oncology Registry Entry, SSDI Site-Specific Data Item, WHO World Health Organization

TABLE 1
Assessment of NCDB registry and data quality according to Bray and Parkin criteria

Table 1
(continued) USCS United States Cancer Statistics, NAACCR North American Association of Central Registries, Inc.; CoC Commission on Cancer, NCRA National Cancer Registrars Association, WHO World Health Organization, ICD International Classification of Diseases, SSDI Site-Specific Data Item, AJCC American Joint Committee on Cancer, STORE Standards for Oncology Registry Entry, SEER Surveillance, Epidemiology, and End Results Program, CDC Centers for Disease Control and Prevention, RCRS Rapid Cancer Reporting System a Procedures followed by all registrars for purposes of reporting to population-based registries that may not have a direct impact on reporting to the NCDB The National Cancer Database Conforms … or SEER Surveillance Research Program, National Cancer Institute SEER*Stat software version 8.4.2. 38

TABLE 2
The National Cancer Database Conforms … Comparison of incidence for completeness by disease sites in 2016-2020

Table 2 (
NCDB National Cancer Database; NA not applicable; NOS not otherwise specified a Totals include all breast disease, both males and females, miscellaneous primaries, and invalid primaries not defined in the SEER site recode ICD-O 3/WHO 2008 definitions not shown in the table.

TABLE 4
Comparison of incidences for completeness by patient state for all cancer sites in 2016-2020

Table 4
(continued)USCS United States Cancer Statistics, NCDB National Cancer Database a These states did not meet the requirements for USCS publication criteria for diagnosis year 2020.

TABLE 5
Program compliance with Commision on Cancer Data and Registry Quality Accreditaiton Standards, based on Commission on Cancer Accreditation Site Visits 2022 (n = 329 programs) a Accreditation for anatomic pathology by a qualifying organization b Newly accredited hospitals are not rated on standard 6.4 until their first re-accreditation visit resulting in the discrepant N.