Accurate and comprehensive healthcare data are vitally important for a variety of purposes, as clearly stated in the newly released article examining diagnostic coding in intensive care patients [1]. These data may be used for local assessments or evaluations within a healthcare system, such as for specific outpatient conditions or inpatient hospital events. The data may also be used regionally or nationally for assessing performance within or across healthcare systems. Finally, although such comparisons are enormously difficult, administrative data may be used across national boundaries to assess international differences in healthcare and disease.

Administrative healthcare databases are uniquely suited to epidemiological studies of disease, particularly for studying the incidence or outcome of rare diseases that are impossible to study locally or within traditional cohort studies [2]. Such data are also uniquely suited to understanding secular trends in disease and examining healthcare resource consumption for planning the future of healthcare with respect to diseases and financial allocations.

Healthcare databases are most frequently developed for the purpose of assessing the quality of healthcare, often for a specific disease or within a specific healthcare delivery system. In the field of critical care medicine, there are databases such as Project Impact Critical Care Medicine (PICCM), the Acute Physiology and Chronic Health Evaluation (APACHE) system, the French intensive care databases Collège des Utilisateurs de Bases de données en Réanimation (Cub-Réa) and OutcomeRea, and the UK Intensive Care National Audit and Research Centre (ICNARC) Case Mix Program Database. Condition-specific registries have been developed with some success, such as with the US National Registry of Cardiopulmonary Resuscitation [3], the PROGRESS sepsis registry [4] and the institutional Harborview Medical Center ARDS Registry [5].

Outside critical care, data are collected for primarily administrative purposes. Examples include the Medicare Provider Analysis and Review database (MedPAR), the National Hospital Discharge Survey (NHDS) and the Healthcare Cost and Utilization Project (HCUP) – all established by the US government – and databases maintained by the University HealthSystem Consortium and Kaiser Permanente, to mention just two. As a general rule, corporate databases are proprietary while government data are publicly available, with some corporations offering the ability to combine regional and healthcare system data into a unified database [6].

Healthcare databases have been an essential component of understanding and improving critical care worldwide. Investigators have utilized primary administrative data to increase our knowledge of specific diseases, particularly through epidemiological studies. In addition, the development of the APACHE score, the Simplified Acute Physiology Score and the Mortality Probability Model has permitted determination of risk-adjusted outcomes for critically ill patients, and these scores are now routinely utilized for assessing healthcare quality. As with many healthcare databases, their use has expanded from the original intent to permit novel research investigations in important areas of healthcare. For example, the APACHE database has permitted examination of the relationship between hospital volume and outcomes of mechanically ventilated patients [7], the HCUP databases have permitted examination of longitudinal trends in pulmonary artery catheterization [8], and the ICNARC, Cub-Réa and NHDS databases have provided novel information regarding sepsis and factors that influence its incidence and outcome [9–16].

As expected, there are significant limitations to all administrative and healthcare data. Often these relate to the breadth of data collected, which is frequently determined by the expected use of the database. For example, APACHE data include detailed information on clinical physiology and laboratory abnormalities, while HCUP data include detailed information on the source of admission, diagnoses, procedures and financial costs of care. Perhaps most importantly, for databases that rely upon administrative coding, there may be significant limitations in data quality.

Misset and colleagues examined diagnostic coding for patients in the OutcomeRea database and found a poor correlation between the coding performed at the time of hospitalization and subsequent expert coding, as well as a poor correlation between two experts assigning diagnostic codes from reviewing the medical record [1]. It is unclear whether these results are related to the OutcomeRea database, to local coding practices or training, to national effects specific to France, or to influences of critical care or critical care medical conditions. Regardless, the results raise concerns about the accuracy of administrative coding, and particularly about the accuracy of post hoc administrative coding of medical records. Additional studies are needed to answer these questions and to validate coding strategies in individual databases.
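Agreement between two sets of diagnostic codes for the same patients is often summarized with a chance-corrected statistic such as Cohen's kappa. The sketch below illustrates that calculation only; it is not a description of the method used by Misset and colleagues, which is not specified here. The function name `cohen_kappa` and the diagnostic labels are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def cohen_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two coders (Cohen's kappa).

    codes_a and codes_b hold one diagnostic code per patient,
    in the same patient order.
    """
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both coders must code the same non-empty set of patients")

    n = len(codes_a)

    # Observed agreement: fraction of patients given the same code by both coders.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n

    # Expected agreement by chance, from each coder's marginal code frequencies.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)

    if expected == 1.0:
        # Both coders used a single identical code throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical data: codes assigned at hospitalization vs. by a later expert review.
bedside = ["sepsis", "sepsis", "ARDS", "pneumonia", "sepsis", "ARDS"]
expert = ["sepsis", "pneumonia", "ARDS", "pneumonia", "ARDS", "ARDS"]

kappa = cohen_kappa(bedside, expert)
print(round(kappa, 2))
```

In this invented example the coders agree on four of six patients, yet kappa is considerably lower than the raw agreement because some of that agreement would be expected by chance alone.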

As a critical care community, we desperately need well-conceived, comprehensive and accurately collected healthcare databases. Investigators and oversight entities have achieved some success in meeting this need outside the United States, such as with OutcomeRea and ICNARC. In contrast, there is a remarkable paucity of critical care data collected within the United States. Databases such as NHDS, HCUP, APACHE and PICCM may partially serve this purpose, yet their data are limited either in location (for example, few participating institutions), in scope (for example, focus on specific medical conditions) or in breadth of data collected. For purposes including healthcare quality, research and education, we must develop comprehensive databases that combine the best features of these existing databases with accuracy and appropriate breadth of data collection. We must begin this process now, using advocacy and collaboration to achieve our goals.