Usefulness of a large automated health records database in pharmacoepidemiology
In the present study, using a large automated health records database, we investigated the incidence of cardio-cerebrovascular events, diabetes new-onset events, and dialysis initiation events in hypertensive patients, and examined the effects of antihypertensive medications on these incidences.
Materials and methods
We conducted a search of an automated health records database that contained anonymous information from the health insurance claims and the results of laboratory tests at 15 medical facilities across Japan. The study cohort was defined as patients who were diagnosed with hypertension and who visited a medical institution in the registration period. Events were defined by diagnosis, medication history, and laboratory test results.
We obtained a cohort of 20,686 patients diagnosed with hypertension. The mean (standard deviation, SD) age in the cohort was 67.9 (13.2) years, and the follow-up period was 2.56 (1.42) years. The total incidence rates per 1,000 person-years in the present study population showed good agreement with rates in reported cohort studies: 8.10 (5.6–11.1) for cerebrovascular events, 1.27 (0.5–7.4) for cerebral hemorrhage, 6.57 (4.6–8.9) for cerebral infarction, 0.46 (0.1–1.0) for subarachnoid hemorrhage, and 1.75 (1.6–4.4) for myocardial infarction. The standardized incidence rates of cardio-cerebrovascular events, diabetes new-onset events, and dialysis initiation events were 9.73, 20.94, and 5.99 events/1,000 person-years, respectively.
In terms of the incidence of the investigated events in hypertensive patients, the study results suggested that the automated health records database data were as valid and reliable as data from other epidemiological studies.
Keywords: Automated health records database; Pharmacoepidemiology; Hypertension; Cardio-cerebrovascular events; Diabetes mellitus
In Japan, prospective cohort studies based on the drug registry have been conducted as post-marketing surveillance according to Good Post-marketing Study Practice. As these studies were designed by pharmaceutical companies to investigate their own marketed products, they have not included comparative arms and were conducted over different periods of time.
In the USA and Europe, databases are compiled from prescription information and electronic health records and are used for post-marketing surveillance of medical products. In Japan, database services that use claims information are available. The Claims Database of the Japan Medical Data Center is compiled from the claims information of health insurance societies. The Claims Database can provide disease names and prescription records even if a patient switches to another health clinic, but it cannot provide test results or patient outcomes. In contrast, large automated health records databases that use electronic medical charts include patient outcomes and test results, which increases the accuracy of the information. Large automated health records databases therefore have potential advantages for epidemiological studies, although collecting data in a unified format from many medical institutions requires considerable effort.
In the present study, we conducted a preliminary study using an automated health records database to demonstrate the usefulness of such a database in pharmacoepidemiology. Using this database, we focused on cardio-cerebrovascular events in hypertensive patients for the main purpose of validating it. The selection of these diseases was based on the assumption that the database can be validated by comparing the results of this study with other epidemiological data on cardio-cerebrovascular events. We then calculated the incidence of diabetes in hypertensive patients and in those patients who initiated dialysis, and examined the effect of antihypertensive medications on these incidences. Our aim is to assess the usability of large automated health records databases through these analyses.
Materials and methods
In the present study, we conducted a search of an automated health records database provided by Medical Data Vision Co., Ltd. (MDV). The database is an electronic health records-based database that contains anonymous information from the health insurance claims for about one million patients since January 2003 and on the results of blood tests and other laboratory tests for about 410,000 patients since January 2004 at 15 medical facilities across Japan. This database contains anonymized information, including patient background information such as age, gender, and relevant medical department, as well as disease name on the prescription, and information on medications, surgery, injections, tests, diagnosis procedure combination (DPC) claims, and results of blood tests and other laboratory tests.
The study cohort was defined as patients in the database who were diagnosed with hypertension and who visited a medical institution at least once during the period from January to December 2006, which was taken as the registration period. The start date of follow-up was set when an antihypertensive drug was prescribed for the first time in the registration period. The cohort was followed up for the incidence of various events to December 2009. The follow-up for the cohort was initiated on the date of first visit and was continued until the date of the final visit. Completed cases were defined as those for which there were observations from 2006 to December 2009, while withdrawn cases were defined as those where there was dropout prior to December 2009. Hypertensive patients in the database were identified according to International Classification of Diseases, 10th revision (ICD-10) (Supplementary Table 1). However, patients were excluded from the study cohort if they had any antitumor drug or anti-human immunodeficiency virus (HIV) drug administered before or on the day of registration. Antitumor drugs and anti-HIV drugs were defined according to the first 4 digits of YJ code, which is a computerized receipt code.
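The inclusion and exclusion rules above can be sketched in code. This is a minimal illustration, not the authors' extraction procedure: the record layout (`Patient`, `Prescription`), the helper names, and the prefixes in `EXCLUDED_YJ_PREFIXES` are all hypothetical, because the actual YJ prefixes for antitumor and anti-HIV drugs are not listed in the text. The sketch also assumes that the registration date is the first visit inside the registration period.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical placeholders: the real 4-digit YJ prefixes for antitumor and
# anti-HIV drugs are not given in the text.
EXCLUDED_YJ_PREFIXES = {"AAAA", "BBBB"}

REGISTRATION_START = date(2006, 1, 1)
REGISTRATION_END = date(2006, 12, 31)


@dataclass
class Prescription:
    yj_code: str
    prescribed_on: date


@dataclass
class Patient:
    patient_id: str
    has_hypertension_dx: bool  # assumed to be derived from ICD-10 codes
    visit_dates: list
    prescriptions: list = field(default_factory=list)


def registration_date(patient):
    """Return the first visit inside the registration period, or None."""
    visits = [d for d in patient.visit_dates
              if REGISTRATION_START <= d <= REGISTRATION_END]
    return min(visits) if visits else None


def is_eligible(patient):
    """Apply the inclusion and exclusion rules to one patient."""
    reg = registration_date(patient)
    if not patient.has_hypertension_dx or reg is None:
        return False
    # Exclude patients given an excluded drug before or on the registration day.
    return not any(
        rx.yj_code[:4] in EXCLUDED_YJ_PREFIXES and rx.prescribed_on <= reg
        for rx in patient.prescriptions
    )
```

A patient who receives an excluded drug only after registration remains in the cohort under this reading, since the exclusion applies to drugs administered before or on the day of registration.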
This study was approved prior to implementation, after the protocol was reviewed by the ethics committee of Kyoto University Graduate School of Medicine. Individual patients could not be identified because all the collected data, including the name of each medical institution, were anonymized and unlinkable.
Data collected from the database
The following information was collected from the study cohort database: each subject’s gender, age, date of registration, follow-up end date (day of last visit before December 2009), date of diagnosis with hypertension (day on which the patient was diagnosed with hypertension for the first time), complications, medications, blood test results, and presence/absence of extracorporeal dialysis. Complications and diseases diagnosed before or on the day of registration, except for having past history of cardio-cerebrovascular disease, were classified according to ICD-10 into diabetes, hyperlipidemia, cerebrovascular disease, myocardial infarction, angina pectoris, cardiac failure, renal disease, hyperuricemia, and other macroangiopathy (Supplementary Table 1). We classified patients into six groups based on their medications, according to the first 4 digits of the YJ code at time of registration: the angiotensin II receptor blocker (ARB) group, calcium channel blocker (CCB) group, angiotensin-converting enzyme inhibitor (ACE inhibitor) group, other antihypertensive drugs (others), a two-drug-use group, and an over-two-drug-use group. Other medications recorded included antidiabetic drugs, antihyperlipidemic drugs, and antiplatelet/anticoagulant drugs, and these were recorded according to the first 4 digits of the YJ code. Blood test result information collected was HbA1c, fasting blood sugar, total cholesterol, high-density lipoprotein (HDL) cholesterol, low-density lipoprotein (LDL) cholesterol, serum creatinine, and triglycerides. Apparently abnormal values in blood test results were excluded from the analysis.
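The six-group classification by medication at registration could be implemented along the following lines. This is a sketch under stated assumptions: the prefixes in `YJ_PREFIX_TO_CLASS` are placeholders (the real 4-digit YJ prefixes are not given in the text), and the number of drugs is assumed here to be counted by distinct drug class, which may differ from how the authors counted concomitant drugs.

```python
# Hypothetical mapping from 4-digit YJ prefix to antihypertensive class.
# The prefixes are placeholders, not real YJ codes.
YJ_PREFIX_TO_CLASS = {
    "P001": "ARB",
    "P002": "CCB",
    "P003": "ACE inhibitor",
    "P004": "others",
}


def classify_regimen(yj_codes):
    """Assign one of the six medication groups from the YJ codes at registration.

    Returns None when no antihypertensive drug is recognised.
    """
    classes = {
        YJ_PREFIX_TO_CLASS[code[:4]]
        for code in yj_codes
        if code[:4] in YJ_PREFIX_TO_CLASS
    }
    if not classes:
        return None
    if len(classes) == 1:
        return next(iter(classes))  # single-class group: ARB, CCB, ACE inhibitor, or others
    if len(classes) == 2:
        return "two-drug use"
    return "over-two-drug use"
```

For example, a patient holding codes that begin with `P001` and `P002` at registration would fall into the two-drug-use group.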
Definition of outcomes
For cerebrovascular events, data on cerebral infarctions except cardiogenic embolism (CI), hypertensive intracerebral hemorrhage (ICH), and subarachnoid hemorrhage (SAH) were extracted from the database by ICD-10 code based on the diagnosis information at hospitalization, and the first day of diagnosis after registration was determined to be the date of onset. For cardiovascular events, data on myocardial infarction, angina pectoris, and cardiac failure were extracted from the database by ICD-10 code based on the diagnosis information at hospitalization, and the first day of diagnosis after registration was determined to be the date of onset. Cardio-cerebrovascular events were defined as combinations of cerebrovascular events and cardiovascular events, and the data were analyzed with the date of onset defined as the day when either a cerebrovascular event or a cardiovascular event occurred for the first time.
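The onset date of the composite cardio-cerebrovascular event follows directly from the two component definitions. A minimal sketch, assuming that each patient's hospitalization diagnosis dates have already been extracted by ICD-10 code and that "after registration" means strictly later than the registration date (the function names are illustrative):

```python
from datetime import date


def first_onset_after(registration, diagnosis_dates):
    """Return the first diagnosis date after registration, or None."""
    later = [d for d in diagnosis_dates if d > registration]
    return min(later) if later else None


def composite_onset(registration, cerebrovascular_dates, cardiovascular_dates):
    """Return the date on which either component event first occurred."""
    candidates = [
        first_onset_after(registration, cerebrovascular_dates),
        first_onset_after(registration, cardiovascular_dates),
    ]
    candidates = [d for d in candidates if d is not None]
    return min(candidates) if candidates else None
```

Patients for whom `composite_onset` returns `None` had no composite event and would be treated as censored at the end of their follow-up.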
For events of new-onset diabetes, a treatment-based definition and a definition based on the results of HbA1c tests were used. The diabetes mellitus (DM) medication event was defined as one where an antidiabetic drug was newly administered during the follow-up period, and the date of event onset was defined as the day on which the treatment with the antidiabetic drug was started. Antidiabetic drugs were defined according to their YJ code (first 4 digits). “HbA1c 6.5%” was defined as an event where the HbA1c level was measured at 6.5% or greater, and the date of event onset was defined as the day when the measurement was performed. In addition, “HbA1c 6.1%,” which is a new criterion for the diagnosis of diabetes, was defined as an event where the HbA1c level was measured at 6.1% or greater. Because HbA1c was measured during concomitant use of antidiabetic drugs, the events defined by HbA1c measurement identified patients in whom blood glucose control with antidiabetic therapy was difficult. Patients who had been diagnosed with diabetes before registration were excluded from the analysis of events of new-onset diabetes.
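The two kinds of diabetes event definition can be written as small functions. This is an illustrative sketch, not the authors' code: it assumes that antidiabetic prescription dates have already been selected by YJ code, that HbA1c results are available as `(date, value)` pairs in percent, and that only records dated after registration count as follow-up events.

```python
from datetime import date


def first_antidiabetic_start(registration, rx_dates):
    """DM medication event: first antidiabetic prescription after registration."""
    during = [d for d in rx_dates if d > registration]
    return min(during) if during else None


def first_hba1c_at_or_above(registration, measurements, threshold=6.5):
    """HbA1c event: date of the first measurement at or above the threshold (%).

    measurements is an iterable of (measurement_date, hba1c_percent) pairs.
    """
    hits = [d for d, value in measurements
            if d > registration and value >= threshold]
    return min(hits) if hits else None
```

Calling `first_hba1c_at_or_above` with `threshold=6.1` gives the "HbA1c 6.1%" event; the default corresponds to "HbA1c 6.5%". Patients already diagnosed with diabetes before registration would be removed before either function is applied.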
Newly introduced dialysis events were defined as those where dialysis was performed in patients who had not undergone dialysis before registration, and the date of event onset was defined as the day on which dialysis was performed for the first time. Data analysis was also carried out on the incidence of newly introduced dialysis events in patients diagnosed with diabetes at registration.
The crude incidence rate of each event (per 1,000 person-years) was calculated from the number of events and the total follow-up period. In addition, incidence rates standardized by age and gender were calculated according to 2005 census figures. For each event, the adjusted hazard ratio with 95% confidence interval (95% CI) for each usage pattern of antihypertensive drugs was calculated using Cox regression analysis of the time to onset of the event. Covariates included gender, age, inpatient/outpatient status at registration, complications (diabetes, hyperlipidemia, cerebrovascular disease, myocardial infarction, angina pectoris, cardiac failure, renal disease, hyperuricemia, and macroangiopathy except cardio-cerebrovascular disease), antihypertensive drugs at registration (CCB, ARBs, other single-use drugs, combination use of two drugs, and combination use of over two drugs), and other medications (antidiabetic drugs, antihyperlipidemic drugs, and antiplatelet or anticoagulant drugs) at registration. The Cox proportional hazards regression was performed using SAS software version 9.2 (SAS Institute Inc., Cary, NC, USA).
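The rate calculations can be illustrated with a short sketch. The crude rate follows directly from the definition above. For the standardized rate, the text does not state the method, so direct standardization (a weighted average of stratum-specific rates, with weights taken from a standard population such as the census) is assumed here. The function names and data layout are illustrative.

```python
def crude_rate_per_1000(events, person_years):
    """Crude incidence rate per 1,000 person-years."""
    return 1000.0 * events / person_years


def directly_standardized_rate(strata, standard_population):
    """Directly standardized incidence rate per 1,000 person-years.

    strata: {stratum: (events, person_years)}, e.g. keyed by (age band, gender)
    standard_population: {stratum: count}, e.g. census counts per stratum
    Assumes every stratum has nonzero person-years.
    """
    total_weight = sum(standard_population[s] for s in strata)
    weighted_sum = sum(
        standard_population[s] * crude_rate_per_1000(events, person_years)
        for s, (events, person_years) in strata.items()
    )
    return weighted_sum / total_weight
```

With two made-up strata having rates of 10 and 20 per 1,000 person-years and standard-population weights of 3 and 1, the standardized rate is (3 × 10 + 1 × 20) / 4 = 12.5 per 1,000 person-years. These numbers are for illustration only and are not taken from the study.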
Results and discussion
We obtained from the database a cohort of 20,686 patients diagnosed with hypertension during the period from January to December 2006. The proportion of completed cases at December 2009 was 59.5% (12,299), and the mean (SD) follow-up period was 2.56 (1.42) years. Because of the way hospital health records are handled, the database does not include patient information after a patient moves to another clinic or hospital. Almost 60% of patients were still under observation at December 2009, and the mean observation period in the study cohort was 2.56 years with a 3-year observation period; this suggests that most patients in the cohort continued therapy in the same hospital and did not move to another clinic. Patients with chronic hypertension appear unlikely to move to other medical sites. In such cases, it is possible to apply the data in this database to a population-based survey.
[Table: The distribution of demographic variables in the cohorts. Columns: Total study population; Risk study for diabetes; Risk study for hemodialysis. Rows: Number, N (%); Mean ± SD age, years (67.9 ± 13.2; 67.7 ± 13.9; 67.9 ± 13.2); Gender, N (%); In/outpatients, N (%); Concomitant and previous diagnosis, N (%); Antihypertensive drug use at entry period, N (%); Other concomitant drug use at registration, N (%), including antithrombotic or anticoagulant drug.]
Investigation of the validity of the frequency of cerebrovascular events
[Table: Incidence rates for cerebrovascular or cardiovascular event patients. Columns: Number of events (%); Incidence rate (per 1,000 person-years). Rows include cardiovascular and cerebrovascular events.]
To examine how representative the database population was, we compared the disease frequencies with data from a previous epidemiological survey in Japan (Table 2). The Hisayama survey is a widely cited, large-scale cohort study in Japan. For hypertensive patients in the Hisayama survey, Arima et al. reported standardized incidence rates of the cerebrovascular events ICH, CI, and SAH for each grade of hypertension. In the present study, the data were not compared by severity of hypertension, nor evaluated by subtype of CI. However, if we assume that selection bias exists only in patients with grade 1 or grade 2 hypertension, considering that the cohort comprised outpatients being treated for hypertension, there were no significant deviations. The JIKEI-Heart study is a randomized clinical trial comparing valsartan with non-ARB treatment in 3,081 Japanese hypertensive patients. The KYOTO-Heart study is a randomized clinical trial comparing valsartan with non-ARB treatment in 3,031 Japanese patients with uncontrolled hypertension. The results from the present database research also showed no significant deviations when compared with these clinical trial results. These results suggest that the population in the database used in this study is close to those of a previous epidemiological survey and clinical studies in Japan and is valid.
Investigation of the validity of the frequency of diabetes and dialysis initiation events
[Table: Incidence rates for diabetic events and newly introduced dialysis. Columns: Number of events (%); Incidence rate (per 1,000 person-years). Rows: DM medication group (antidiabetic drug use); HbA1c 6.5% group; HbA1c 6.1% group; Newly introduced dialysis; Newly introduced dialysis in diabetes patients.]
The number of patients with the event of newly introduced dialysis was determined to be 365 (Table 3). The standardized incidence rate was 5.99 events/1,000 person-years. The number of hypertensive patients with a complication of diabetes at registration in whom dialysis was newly introduced was 230. The standardized incidence rate was 12.08 events/1,000 person-years. The incidence rate for newly introduced dialysis in our study was higher than that in the JIKEI-Heart study and the KYOTO-Heart study. One reason for the deviation may be attributable to a possible high capture rate in our study, because the database included data from several dialysis therapy hospitals, while hospitals in the above clinical trials did not (Table 3).
Risk of cerebrovascular or cardiovascular event and hypertension treatment at registration
[Table: Hazard ratios for cerebrovascular or cardiovascular events. Columns: Antihypertensive drug use at registration; Adjusted hazard ratio versus CCB, HR (95% CI). Rows include 2 drugs use and over 2 drugs use.]
This study is the first pharmacoepidemiologic study to use a large automated health records database in Japan. Health records databases are useful in evaluating clinical event rates in hypertensive patients. Using the database and adjusting for confounding factors, it is possible to evaluate the relationship between antihypertensive drug medications and clinical events. Carrying out database surveys is an important method in pharmacoepidemiologic research, and we believe that database research can apply to other chronic disease surveys. However, there are some limitations with database surveys using automated health records databases. Firstly, although the data on complications, treatment history, and laboratory test results in the database were adjusted to the degree possible, it may not have been possible to completely adjust for confounding by indication, whereby a particular antihypertensive drug is administered to high-risk patients. As the database does not have vital sign measurements, such as blood pressure, weight, body mass index (BMI), and heart rate, this is a particularly important limitation in hypertension research. Secondly, the database has hospital-based data and not population-based data. In our case for hypertension, although there were few patients who moved to other clinics or hospitals, in other chronic disease surveys this point should be checked carefully. Other limitations were absence of confirmation of the clinical diagnosis, lack of random assignment to treatment groups, and lack of some information.
In conclusion, although it may be difficult to perform valid analyses in studies of highly specialized diseases and treatments due to selection biases, the present study indicates that automated health records databases in Japan can be applied to pharmacoepidemiologic studies of general diseases treated in various medical institutions. In the future, it will be possible to conduct comparative studies between treatment groups and nontreatment groups.
Conflict of interest
This study was sponsored by Nippon Boehringer Ingelheim Co., Ltd. The cost of using the database was borne by Nippon Boehringer Ingelheim Co., Ltd., and the data were lent to Kyoto University. The study executors, Hirokuni Hashikata, Akio Koizumi, and Kouji Harada, do not have any conflict of interest with Nippon Boehringer Ingelheim Co., Ltd. and Medical Data Vision Co., Ltd. Tatsuo Kagimura is an employee of Nippon Boehringer Ingelheim Co., Ltd. Masaki Nakamura is an employee of Medical Data Vision Co., Ltd.
- 3. Arima H, Tanizaki Y, Yonemoto K, Doi Y, Ninomiya T, Hata J, Fukuhara M, Matsumura K, Iida M, Kiyohara Y. Impact of blood pressure levels on different types of stroke: the Hisayama study. J Hypertens. 2009;27:2437–43.
- 4. Mochizuki S, Dahlof B, Shimizu M, Ikewaki K, Yoshikawa M, Taniguchi I, Ohta M, Yamada T, Ogawa K, Kanae K, Kawai M, Seki S, Okazaki F, Taniguchi M, Yoshida S, Tajima N. Valsartan in a Japanese population with hypertension and other cardiovascular disease (Jikei Heart Study): a randomised, open-label, blinded endpoint morbidity–mortality study. Lancet. 2007;369:1431–9.