The quality, accuracy and completeness of the information in medical records is fundamental to good patient care and to quality improvement activities: “you cannot improve what you cannot measure” [1]. The transition from paper-based records to Electronic Medical Records (EMRs) has led to expectations that electronic health care data can and will be used to measure and improve the quality of care provided to patients [2, 3].

However, the quality of the data entered in EMRs as part of routine care needs to be assessed; missing or inconsistent data may make the measurement of quality problematic [4]. Many physicians continue to simultaneously use both paper and electronic medical records, or “hybrid” charts, [5] leading to uncertainty as to where data reside. Problems already encountered in both Canadian and international primary care EMR settings include inconsistent or missing diagnostic coding and risk factor designation, “dirty data” (misspelled words, inconsistent word strings, free text strings instead of structured data), missing “metadata” or description of data content (for example, referral to “Dr Smith”, where physician specialty is not listed) and data entered in inconsistent or incorrect database fields [4, 613].

Data quality factors can be categorized as data completeness (are all the data present?), data reliability (are data recorded in the same way across practices and over time?) and data validity (are the data correct?) [14, 15].

In a recent literature review of EMR data use for health outcome research, only 24% of studies had a data validation component, and most studies used paper based records as the gold standard [16]. The least common method of validation was the use of administrative data [16, 17]. The majority of data validation studies were not conducted in the primary care setting [16]. There were difficulties with the reference standards used to evaluate EMR data quality; some studies relied on unvalidated standards (surveys, interviews); current “gold standards” such as paper notes may be problematic [14, 18, 19]. As one editorial noted, “there are no agreed reference standards for reporting data quality in primary care and this limits measurement of data quality in electronic patient records” [20]. While there are many possible ways to measure EMR data quality and many areas that can be measured, [8, 21, 22] systematic reviews of data quality assessment have noted a focus on diagnostic data, laboratory testing, risk factors and demographic information, with limited information on data quality regarding preventive services [7, 15].

We recently studied the effects of the first two years of EMR implementation in the practices of eighteen community-based family physicians in Toronto [23]. We measured preventive services targeted by Ontario’s pay for performance program. The pay-for-performance targets were based upon the percentage of a physician’s eligible enrolled patients being provided with Pap smears, mammograms, influenza vaccinations, fecal occult blood screening and primary vaccinations of children under two within a specified timeframe [2426]. We found no difference in the change of service provision between physicians implementing EMR and a group who continued to use paper records [23].

As part of the study, we evaluated aspects of the quality of the data present for those preventive services in the medical records by comparing results obtained from medical record audits against external data sources. The external sources were provincial administrative claims-based data housed at the Institute for Clinical Evaluative Sciences (ICES). ICES is an independent, not-for-profit health services research organization funded by the Ministry of Health and Long Term Care of Ontario. Ontario has population-based coverage for eligible physician, laboratory, diagnostic imaging and hospital services through the Ontario Health Insurance Plan. A copy of fees submitted to the Ministry of Health and Long Term Care of Ontario is maintained in anonymized administrative databases at ICES.

As noted above, medical record reviews and audits have traditionally been considered to be the “gold standard” for certain health care services, and have been used to validate administrative data [27, 28]. The administrative data we used to determine the rates of preventive services have not been validated using this method. However, we assumed that these administrative data captured a relatively constant proportion of services provided over the time during the period we studied. For example, if 75% of influenza vaccinations were available in administrative data in 2006, we would expect that approximately 75% would be available in 2007. We know of no positive or negative changes during the time period studied that would have affected billing proportions captured in administrative databases. We therefore used the change in preventive services reported in the administrative data for the study practices as the reference standard and compared this change to results from the chart audits. This gave us a method of measuring the reliability of data recorded in EMRs over time as compared with administrative data.

Our null hypothesis was that data for preventive services are reliably entered in clinical records over time during EMR implementation; there are no statistically significant or clinically important differences between changes in EMR data and changes in administrative data.

The research question was: does the change in preventive services in medical records before and during the first two years of EMR use agree with the change in preventive services in administrative data?



The eighteen physicians in this study had previously participated in a pay-for-performance study [29] and data on their characteristics and performance for 2004 and 2005 were available. They changed to a blended capitation model at the end of 2004, in which patients formally register (or roster) with a family physician. These physicians were exposed to pay-for-performance for preventive services in 2005 and began EMR implementation early in 2006. All participating physicians used the same EMR software (Nightingale On Demand®). We studied the change in preventive services in the two years prior to EMR implementation (2004 and 2005) and the first two years of EMR implementation (2006 and 2007).

The principal investigator was also a participant in this study.

Outcome measures

The study end point was whether or not a preventive service was documented within the required time period for an eligible patient. The target patient population consisted of all eligible enrolled patients. Documentation that the patient received the service through another health care provider was acceptable. Information on services and exclusion criteria is presented in Table 1; children’s vaccinations were not examined, as billing codes in administrative databases also include vaccinations other than the five used in the study.

Table 1 Eligibility criteria, exclusion criteria, and required period for preventive service provision

The denominators were the number of patients eligible for each service who were rostered to the physicians in each cohort by March 31st of each fiscal year (for example, March 31st 2005 for the 2004 fiscal year). Physicians report the preventive service performance levels they have achieved to the Ministry of Health and Long Term Care on March 31st.

The numerators were the number of eligible rostered patients who received a service in the 30 months prior to March 31st of each year for Pap smears, mammograms or fecal occult blood testing, or an influenza vaccination from October 1st to December 31st of the prior year. The rate of service was defined as the proportion of eligible patients receiving a service at least once in the previous 30 months (Pap smears, mammograms, fecal occult blood tests) or in the previous fall (October 1st to December 31st) for influenza vaccination.

We calculated a composite process score [30]. This is calculated by using the total number of medical records audited for eligible patients for each physician as the denominator, and the total number of services recorded in the audits as the numerator. We did not obtain fecal occult blood testing for 2004 as part of our previous study on pay for performance; fecal occult blood testing was not part of the Ontario pay for performance program until 2006. This measure was not included in the composite score due to lack of complete data. The composite process score therefore included mammography, Pap smears and influenza vaccinations from 2004 to 2007.

Data sources

Chart audits

We primarily audited the electronic medical records. However, when data were unavailable in an EMR, we retrieved data from the paper chart. We determined that 40 charts per service per provider would be required to achieve a study power of 80% to detect a clinically important increase in service provision of 5% or higher, with an alpha level of .05. To further enhance statistical power, we audited 50 charts per year, per service, per physician.

Five data auditors abstracted data. The research coordinator initially audited ten charts for each service in two practices and reviewed this with the principal investigator. The coordinator then trained each data auditor, and reviewed at least ten charts for each service. The data were independently entered in an Epi Info database [31] by two data entry clerks. Each clerk entered a training sample of at least ten charts for each service. A randomly selected 10% sample of data for each service, each year, and each physician was re-audited and entered in the database; we used the Kappa statistic to compare the two audits.

Administrative audits

We obtained administrative data for the entire practices from ICES, using the following datasets:

  • The ICES Physician Database (IPDB) for information on physician country of graduation;

  • The Corporate Provider Database (CPDB) for information on physicians’ Ontario Health Insurance Plan billing number;

  • The Ontario’s Registered Persons Database (RPDB) for patient age (as of August 31st 2007), gender and immigration recency by date of OHIP registration [32];

  • The Client Agency Program Enrolment (CAPE) tables for information on patient enrolment in each physician’s roster;

  • Statistics Canada data on neighborhood income, linked to patients’ residential postal code for estimates of income quintiles [32];

  • The Canadian Institute for Health Information’s Discharge Abstract Database for hospital discharge diagnoses;

  • The Ontario Health Insurance Plan for billing and diagnostic data to identify patient visits and diagnoses;

  • The Ontario Diabetes Database (ODD) for diabetics [28];

  • The Ontario Asthma Database (OASIS) [33];

  • The Ontario Congestive Heart Failure Database [34];

  • The Ontario Chronic Obstructive Pulmonary Disease Database [35];

  • The Ontario Hypertension Database [27];

  • The Ontario Myocardial Infarction Database [36];

  • The Ontario Cancer registry for information on breast or colorectal cancer

  • The Ontario Breast Screening Program (OBSP) for information on mammography

A detailed description of the billing codes for the administrative data is provided in Additional file 1.


We first calculated the composite process score for each year, for both medical record audits and administrative data. An equal number of charts had been randomly audited for each service in the medical record; we therefore assigned an equal weight to each service in the administrative dataset (which contained data for the entire practice) for the composite score calculation. We then adjusted for differences in patient age [37] using logistic regression. We used the Generalized Estimating Equation to adjust for the clustering structure of the data in regression models.

Next, we compared the composite process score found in medical record audits and administrative data for each year using the chi-square test. We then compared the year over year change in the composite process score (that is, the percent of change found in the medical record audits and the percent of change found in the administrative data) using logistic regression.

A difference of 5% in the change of services between medical record audits and administrative data was considered clinically important.

Analyses were performed with the use of SAS software, version 9.2 (SAS Institute). All tests were two sided and p values less than 0.05 were considered statistically significant.

The study was approved by the University of Toronto’s Research Ethics Board; the Sunnybrook Research Ethics Board approved the use of ICES data. All physicians provided written informed consent.


Physician characteristics and patient characteristics are presented in Table 2 and 3, respectively.

Table 2 Characteristics of physicians
Table 3 Characteristics of the total practice population

A comparison of the composite scores obtained from the medical record audits and from administrative data is presented in Table 4. There was a statistically significant greater increase of 4.2% (95% CI 2.0%–6.4%) in the change in services found in medical record audits following the introduction of pay for performance in 2005. However, this did not reach the previously identified clinically important level of 5%. There was a statistically significant and clinically important larger decrease in services in the first year of EMR when these services were measured using medical record audits rather than administrative data. Measured services declined by 8.7% (95% CI −11.0%– –6.4%) more when measured by medical record audits. There was no statistically significant or clinically important difference in change in the following year

Table 4 Comparison of medical record audits and administrative data, composite score for mammography, Pap smears and influenza vaccinations

There was a significantly smaller proportion of services found in the medical record audits as compared with the administrative data in the year that the EMR was introduced (2006). There were more services found in the medical record audits for the two years prior to EMR (2004 and 2005), and no difference was found in the following year (2007).

Table 5 presents data for individual services derived from medical record audits and administrative data. These are presented graphically in Figure 1 (for Pap smears and mammograms) and Figure 2 (for influenza vaccinations and fecal occult blood tests).

Table 5 Individual services levels derived from medical records audits and from administrative data
Figure 1
figure 1

Mammogram and Pap smear service levels derived from medical record audits and from administrative data.

Figure 2
figure 2

Influenza vaccination and fecal occult blood service levels derived from medical record audits and from administrative data.

The intracluster correlation for each service in medical record audits was generally small, at 0.036 for influenza vaccination, 0.0197 for fecal occult blood testing, 0.0189 for mammography, and 0.009 for Pap smears. Increasing patient age was a significant factor for the provision of a preventive service (data not shown). Using medical record audits, there was a decrease in the provision of mammograms, pap smears and influenza vaccinations between 2005 and 2006 (p < 0.001); there was no significant change in fecal occult blood testing (p = 0.12).

Administrative data showed no significant change between 2005 and 2006 for Pap smears (p = 0.52) or mammography (p = 0.48); there was a decrease in influenza vaccination (p < 0.0001) and an increase in fecal occult blood testing (p < 0.0001).

The overall agreement between the sample of medical records that were re-audited was acceptable (kappa 0.954).


We present a novel method of assessing an aspect of EMR data quality. We compared the rate of change of services over time in the EMR with the rate of change in administrative databases. We found a greater decrease in preventive services in the first year of EMR when data were obtained from medical record audits as opposed to administrative data sources. The discrepancy between the two methods points to the possibility of missing data for preventive services in the medical records.

Reliability reflects data stability, and is necessary (but not sufficient) for data validity [15]. Our administrative data for Pap smears were based on billing codes submitted by laboratories and physician billing codes as detailed in Additional file 1. Laboratory billings would not have been affected by EMR implementation. Administrative data for mammograms were based on radiology billing codes and data from the Ontario Breast Screening Program, as shown in Additional file 1. These also were independent of EMR implementation.

The change in influenza vaccinations was similar in the medical record audits and in the administrative data, perhaps reflecting fewer problems with documentation. Documenting an influenza vaccination does not require looking two years back for the presence of the service, as mammography or Pap smears do; therefore, there may be a less complex workflow associated with recording this service during the move to EMR. There were delays in vaccine delivery in 2006 and 2007, which could account for the lower levels of vaccination found in both the medical record audits and the administrative data during the fall season of those two years.

Challenges with the data needed to measure quality have been reported elsewhere in the literature [7, 8, 11, 14, 15]. Roth [42] found that only one-third of the indicators needed for a quality assessment program could be easily extracted from EMRs, and that there were difficulties associated with provider data entry habits and differences across different EMR applications [42]. The structure of the EMR is more complex than that of the paper chart: physicians may not be entering data in consistent or expected locations, making it difficult to extract [42]. Physicians and auditors may have challenges in navigating the electronic medical record. Data from external sources may be scanned in and may not be extractable electronically [43]. Physicians may continue to use both paper and electronic medical records, [5] scattering data across two different systems and possibly increasing the amount of incomplete or duplicated data in audits. Research and quality improvement projects using EMR data will need to consider the quality of data entered in the EMR, as well as issues specific to the EMR application used [42].

Baron described the implementation of a mammography recall program within an innovative, fully computerized primary care group practice [43]. The system was initially unable to properly audit mammograms and to produce accurate lists of patients to be recalled; mammograms were scanned in but were not recognized by the EMR. Baron described the development and implementation of practice processes to “tag” incoming mammograms so that patients could be properly categorized as having or not having had a mammogram within the previous two years [43]. Essentially, the practice cleaned and restructured their mammography data so that data were reliably entered in an area of the EMR where they could be audited.

The quality of information (accuracy, reliability, completeness) has been found to be associated with empirical measures of success in implementing information technology in the business literature [44]. Unreliable information makes a system less useful, impacting implementation efforts and decreasing the net benefits that could be obtained from the technology [45].

Measuring performance depends on accurate documentation [1, 46, 47]. Once reliable and valid data have been entered into the EMR, interventions that have been found to increase performance, such as audits and feedback to clinicians, [48, 49] point of care prompts for needed interventions, [4951] and reminder letters to patients [52, 53] can be effectively implemented. We found a lack of improvement in preventive service documentation associated with the early stages of EMR implementation [23]. It is possible that elements of those negative results were due to problems with data quality during the early EMR implementation efforts.


This study was limited to a group of selected physicians in Toronto. However, all physicians in this study were practicing in community-based settings, similar to the majority of family physicians in Ontario [54]. We studied a single commercially available EMR application, and results may differ for different EMRs. Nonetheless, a recent review of data in a national primary care EMR database using nine different EMR applications found that data quality problems were pervasive across all platforms [4]. Administrative data and patient level data in the EMR could not be linked; we compared practice level data using randomly selected EMR charts. Nonetheless, there is no a priori reason to suspect that there are systematic differences between the two samples.


In conclusion, we found that, in the early phase of EMR implementation, data for the preventive services we measured were not reliably entered over time in the medical records we audited when compared to provincial administrative data. Data reliability should be assessed if EMR-based data are used to measure and improve quality.