Key Points for Decision Makers

Applying artificial intelligence (AI) to claims data can help identify and reduce common biases in healthcare, such as doctor and omitted variable biases.

The ability of AI to detect intricate patterns in claims data pooled with other sources can lead to better care (e.g., detection of underdiagnosed diseases) and insurance coverage processes.

Patient confidentiality, methodological transparency, and potential discrimination are important issues to consider when using AI with claims data.

1 Introduction

Healthcare claims data provide structured information on patient interactions with the healthcare system, such as treatments given, providers used, billed amounts, and prescriptions filled. They consist of the billing codes that healthcare providers, such as physicians and hospitals, submit for payment by commercial and government health plans. Since they are collected regularly for administrative purposes, healthcare claims provide a relatively inexpensive source of information, over long time periods, for large numbers of patients. Moreover, as our payment structures change and patients ask more of their providers, the volume of available claims data is likely to increase rapidly.

The large sample sizes can be useful for studying rare conditions, and the longitudinal dimension of the data allows researchers to examine treatment adherence and effects over time. However, healthcare claims typically provide only limited information on clinical severity, patients’ health status, and other variables of interest [1]. This can be ameliorated by supplementing claims data via linkages with other data sources, such as census data to account for neighborhood income or electronic medical records (EMRs) to obtain more clinical information [2]. These linkages open the door for a number of research possibilities, as relevant healthcare data can come in a variety of unstructured formats, such as text, pictures, audio, or video files.

While linked datasets permit a more comprehensive comparative analysis of alternative health technologies and interventions [4], incorporating the additional dimensionality and complexity of the data into the analysis can present challenges for standard statistical analysis [5]. In particular, explicit functional relationships in novel combinations of data may be unknown ex ante. Recent developments in artificial intelligence (AI) research have led to computer algorithms and systems with the ability to learn and extract complex patterns from raw data [6]. For example, machine learning algorithms, such as random forests and deep convolutional neural networks, often detect intricate relationships between input and output variables to improve out-of-sample predictions. Natural language processing algorithms can be used to parse and interpret written text. AI algorithms have been applied successfully in many domains of business, government, and science [7].

There are important differences in emphasis between standard statistics and AI algorithms. Standard statistics emphasize our understanding of underlying mechanisms (i.e., fitting a specific model and hypothesis testing). While AI algorithms can often detect unforeseen relationships and complicated nonlinear interactions within the data, their emphasis is typically on prediction accuracy (e.g., identifying the best treatment or course of action) [5].

Multiple approaches have been developed to address concerns about the interpretability of AI algorithms. For example, recent research has developed measures of the importance of specific covariates based on how much each contributes to the prediction accuracy of the model(s) [8, 9]. Like standard statistical methods, these importance measures can be biased based on the data used or the variables included in the model. However, they do allow researchers to tie predictive AI models back to intuitive relationships and existing knowledge. Another approach is to develop methods that combine the flexibility of AI with the interpretability of simpler models, such as optimal classification tree methods. These approaches are particularly valuable in healthcare settings, where it can be important to understand the reasoning behind a decision, such as a diagnosis or treatment.
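As an illustration of the importance measures described above, the following minimal sketch computes one common variant, permutation importance: the drop in out-of-sample accuracy when a single covariate is shuffled. It is not necessarily the measure used in [8, 9]. The synthetic data, the nearest-centroid classifier, and all function names are assumptions made for illustration only.

```python
import random

def make_data(n, seed):
    """Synthetic data (assumed): the label depends on 'signal' only, never on 'noise'."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        signal = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        label = 1 if signal + rng.gauss(0.0, 0.5) > 0 else 0
        X.append([signal, noise])
        y.append(label)
    return X, y

def fit_nearest_centroid(X, y):
    """Stand-in model: assign each row to the class whose mean vector is closest."""
    centroids = {}
    for label in (0, 1):
        members = [row for row, target in zip(X, y) if target == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]

    def predict(row):
        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))
        return min((0, 1), key=sq_dist)

    return predict

def accuracy(predict, X, y):
    return sum(predict(row) == target for row, target in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average drop in accuracy when one covariate column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(predict, shuffled, y))
        importances.append(sum(drops) / n_repeats)
    return importances

X_train, y_train = make_data(2000, seed=1)
X_test, y_test = make_data(1000, seed=2)  # held-out sample for evaluation
model = fit_nearest_centroid(X_train, y_train)
importances = permutation_importance(model, X_test, y_test)
print("importance of signal:", round(importances[0], 3))
print("importance of noise: ", round(importances[1], 3))
```

In this sketch the covariate that drives the outcome receives a large importance score while the irrelevant covariate scores close to zero. With correlated covariates, as is typical in claims data, such scores can be harder to interpret.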

In this paper we discuss some recent and potential applications of AI to healthcare claims data. AI has the potential to improve the detection of diseases and treatment adverse effects, identify and reduce diagnostic errors or personal biases, and improve the monitoring of insurance costs and fraud. However, this potential is associated with risks and challenges. Patient confidentiality, methodological transparency, and potential discrimination are important issues to consider when using AI with claims data.

2 Making the Most of Healthcare Claims Data

‘Artificial intelligence,’ broadly defined, is the ability of machines to perform tasks characteristic of human intelligence. In this paper we focus on machine learning algorithms, which are a subset of AI. Rather than researchers assuming explicit functional relationships between variables, machine learning algorithms are designed to learn relationships from the data. These learned relationships can be exploited to improve data analytics relying on claims data in numerous ways. Some examples are presented in this section.

2.1 Pooling Knowledge to Provide Better Care

A significant advantage of AI over standard statistical analysis is its ability to examine large multidimensional data with many variables. A statistician may easily be overwhelmed by a large number of potentially usable variables. AI algorithms, in contrast, are often able to identify which variables are important and detect the optimal combination of these variables for the task at hand.

By combining claims data with other information sources, such as patients’ laboratory results and EMRs, the knowledge and experience of a wide range of medical practitioners can be pooled together to detect complex patterns indicative of certain illnesses. The identification of these patterns can improve the early detection of diseases and the detection of underdiagnosed or rare diseases, provide more accurate diagnoses, or lead to a more proactive offering of personalized preventive services.

In the case of conditions that are underdiagnosed, for example, a sample of patient claims data (combined with laboratory values, specialist visits, and physician notes) can be analyzed by physicians to determine which of the patients should have been diagnosed and at what point in time. This process can then be used to train algorithms to detect other instances that should be flagged and identify early markers and indicators of future disease onset. For example, recent research has used AI to predict type 2 diabetes mellitus using claims data [11]. More generally, there have been a number of applications using AI with other types of data (e.g., CT scans, or genome sequencing in precision medicine) to support diagnoses, either through the prediction of disease onset or the identification of drug-resistant strains of disease (e.g., drug-resistant tuberculosis [12]).

Another promising application of AI is the detection of treatment adverse effects. Adverse effects are typically hard to discover in standard human trials because of relatively small sample sizes. Large claims data may help discover these adverse effects, in particular when combined with the pattern recognition power of AI. Recent research has used machine learning algorithms to detect adverse drug reactions using EMR data [13]. Healthcare administrative data can be linked with EMR data to add more clinical information [2], and may improve detection. More generally, as discussed by Dadson et al. [14], AI has been used to detect drug–drug interactions based on the text in government and scientific reporting systems and may be used with genomic data to determine genetic factors associated with particular adverse effects.

AI has also been used broadly to systematically learn from past experience. Examining patterns in pooled data can improve the monitoring of services, such as hospital readmissions or diseases contracted on site. For example, machine learning analysis on patients’ electronic health records, demographics, medical histories, admission and hospitalization histories, and likely exposure to the Clostridium difficile bacterium was shown to increase the accuracy in predicting the risk of hospital infections [15].

2.2 Identifying and Reducing the Effects of Biases

AI also has the potential to identify and reduce the effects of common biases in healthcare, such as doctor biases and omitted variable bias.

2.2.1 Doctor Bias

Doctors may exhibit biases in their propensities to prescribe particular drugs. These biases may result from the limited time doctors have to make a diagnosis, the information doctors are exposed to (e.g., limited samples, medical press, or patient history), doctors’ own, potentially self-reinforcing, experience (e.g., repeated diagnoses), or they may result from specific cognitive biases or personality traits (e.g., aversion to risk or ambiguity) or forecasting biases (e.g., over-confidence or belief in the law of small numbers) [16].

As described earlier, AI used on claims data pooled with information from other sources can provide support for better care. Taking advantage of more knowledge and varied experience can reduce doctors’ biases in decision-making. Doctors could be provided with a statistical diagnosis tool using models calibrated with data. For example, AI algorithms could scan patient histories, data, and relevant literature as a doctor is entering observations and notify the doctor of potentially useful information. In some settings, the diagnostic tool could also incorporate prior expert knowledge and decision-making into a predictive AI algorithm [17]. Although this tool would be limited to the extent that the relevant data can be encoded and made machine-readable, it could complement the doctor’s judgement with unbiased information [18].

2.2.2 Omitted Variable Bias

Claims data often contain information about specific treatments not available via randomized trials [19]. Using this information, algorithms can identify covariates correlated with both outcomes and treatment that previous researchers did not realize were important. This can reduce bias by making it less likely that pertinent variables are omitted from the statistical analysis. In addition, AI prediction tools feature a variety of methods to avoid overfitting and exclude superfluous variables.

Despite its richness, there is some potentially relevant information that may be missing or limited in claims data, even after pooling it with other sources, which can lead to omitted variable bias. The lack of lifestyle characteristics information is often cited as a main limitation of claims data [20]. Information on how the recommended treatment was followed by the patient or the ‘quality of care’ may also be missing. The ability of AI to find complex patterns in the data can potentially approximate this missing information via combinations of the variables that are available. This can improve the matching of treated and untreated patients, which in turn helps correct for treatment selection biases in retrospective studies of treatment efficacy or safety [21].

For example, it is notoriously difficult to compare the effect of different treatments based on retrospective studies. The decision to prescribe one treatment over another is generally informed by a doctor’s evaluation, and the factors that affect that evaluation, such as the patient’s disease severity, other co-morbidities, and history of compliance, may be unobservable to the researcher. If, say, one treatment tends to be used for more severe cases, comparing its efficacy (or safety) against another treatment without controlling for this tendency will bias the results in favor of the treatment that is typically used for easier cases.

Propensity score matching is one statistical technique used to limit this bias. The method essentially relies on a two-step process in which the probability that the doctor will prescribe a given treatment is estimated in the first step and then used in the second step to account for differences in patient characteristics that affected the prescribing decision. An intuitive way to think about the first step is that it is an attempt to mimic (or predict) the doctor’s prescribing decision based on patient information. The better the prediction in the first step of the procedure, the smaller the bias in the overall comparison of treatments. Some researchers have advocated the use of a ‘chain of proxies’ often found in claims data to improve prediction (e.g., old age may serve as a proxy for co-morbidity or cognitive decline) [22]. Through the detection of intricate patterns amongst these proxies, AI methods can improve predictions even further. Recent research demonstrates that significant bias can remain in many claims data applications after using conventional methods and that the use of AI can significantly reduce, and essentially eliminate, such biases [21, 23, 24].
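The two-step logic can be sketched on simulated data. The example below is a minimal illustration, not the procedure used in the cited studies. It assumes a single covariate (disease severity) that is fully observed, a constant true treatment effect of 1.0, and a simple logistic model in the first step where an AI method could be substituted. All names and parameter values are hypothetical.

```python
import bisect
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def simulate(n, true_effect=1.0, seed=0):
    """Assumed data: sicker patients are more likely to be treated and fare worse."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        severity = rng.random()
        treated = 1 if rng.random() < sigmoid(3.0 * (severity - 0.5)) else 0
        outcome = true_effect * treated - 3.0 * severity + rng.gauss(0.0, 0.5)
        rows.append((severity, treated, outcome))
    return rows

def fit_propensity(rows, steps=300, lr=1.0):
    """Step 1: predict the prescribing decision (logistic regression, gradient ascent)."""
    b0, b1 = 0.0, 0.0
    n = len(rows)
    for _ in range(steps):
        g0 = g1 = 0.0
        for severity, treated, _ in rows:
            err = treated - sigmoid(b0 + b1 * severity)
            g0 += err
            g1 += err * severity
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return lambda severity: sigmoid(b0 + b1 * severity)

def naive_difference(rows):
    """Raw comparison of mean outcomes, ignoring who was selected into treatment."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def matched_difference(rows, propensity):
    """Step 2: pair each treated patient with the untreated patient of closest score."""
    controls = sorted((propensity(s), y) for s, t, y in rows if t == 0)
    control_scores = [p for p, _ in controls]
    diffs = []
    for s, t, y in rows:
        if t != 1:
            continue
        p = propensity(s)
        i = bisect.bisect_left(control_scores, p)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(controls)]
        j = min(candidates, key=lambda k: abs(control_scores[k] - p))
        diffs.append(y - controls[j][1])
    return sum(diffs) / len(diffs)

rows = simulate(4000)
naive = naive_difference(rows)
matched = matched_difference(rows, fit_propensity(rows))
print("naive estimate:  ", round(naive, 2))
print("matched estimate:", round(matched, 2))
```

Because the treatment is given more often to severe cases, the naive comparison understates its effect, while the matched comparison lands much closer to the assumed true effect. In real claims data severity is rarely observed directly, which is where richer first-step prediction from proxies becomes relevant.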

2.3 Improving Insurance Coverage Processes and Fairness

AI has had an influence on each step of the processing pipeline for insurance coverage: claim submission, claim adjudication, and fraud monitoring. Its influence is likely to continue well into the future.

An efficient and accurate intake process for claims submissions can reduce burdens on patients and the entire medical system. For example, AI image recognition can be used to facilitate the process of submitting and coding claims. One insurance company has trialed a procedure to automate claim approvals. Patients electronically submit photos of their hospital bills and, within moments, receive notification of receipt, approval, and credit to their account [25]. The cost savings associated with this automation process may allow insurance companies to scale up and provide more coverage. Some insurance companies have developed algorithms that recommend and incentivize healthy habits and behaviors to policyholders, such as exercise and nutrition strategies [26]. If successful, these may avoid claims submissions for preventable illnesses altogether.

Claim adjudication involves checking coverage, limits, contracts with providers, pharmacies, appropriate diagnosis, and procedure coding. Complex and suspicious claims trigger secondary, usually manual, processing steps. AI can save time via the efficient processing of normal claims (e.g., through increased automation in the settlement of claims based on their complexity and known patient history) [27]. In addition, AI can be used for early detection of abnormal price patterns among healthcare providers. This may help insurance companies monitor their costs and understand whether price increases reflect actual quality improvements.

Increasing the accuracy of triaging claims through AI can reduce losses due to fraud (e.g., a provider billing for services that were not actually provided or the provision of unnecessary medical services to generate insurance payments). While the share of actual fraudulent claims is generally very low [28], these can add up to very significant dollar amounts. Just counting savings to Medicare, the Health Care Fraud Unit of the US Department of Justice’s Criminal Division recovered and transferred billions of dollars in the 2017 fiscal year [29]. Several data mining efforts have been undertaken with increasing levels of sophistication to improve fraud detection [30]. A Google Patent search for “healthcare machine learning fraud detection” reveals over 7700 results [31]. The general challenge is to limit false positives, which can lead to losses in reputation of targeted firms as well as enforcement efforts that are channeled inappropriately, while increasing the proportion of fraud detected. The rules to detect individual fraud are now relatively simple to implement. Recently, more complex network fraud pattern recognition algorithms have been developed with the potential to significantly reduce the costs for insurers [32].
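The false-positive challenge follows from simple arithmetic when fraud is rare. The sketch below uses purely hypothetical rates (a 1% fraud share, a 90% detection rate, and two alternative false-positive rates) to compute the share of flagged claims that are actually fraudulent. None of these figures come from the cited sources.

```python
def flagged_claim_precision(fraud_rate, detection_rate, false_positive_rate):
    """Share of flagged claims that are truly fraudulent (Bayes' rule)."""
    true_flags = fraud_rate * detection_rate
    false_flags = (1.0 - fraud_rate) * false_positive_rate
    return true_flags / (true_flags + false_flags)

# Hypothetical rates chosen only to illustrate the base-rate effect.
precision_loose = flagged_claim_precision(0.01, 0.90, 0.05)
precision_tight = flagged_claim_precision(0.01, 0.90, 0.01)
print(f"5% false-positive rate: {precision_loose:.1%} of flags are fraud")  # 15.4%
print(f"1% false-positive rate: {precision_tight:.1%} of flags are fraud")  # 47.6%
```

Under these assumed rates, most flagged claims are legitimate even with a seemingly accurate classifier, which is why reducing false positives matters as much as raising the detection rate.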

3 Risks and Challenges

While there is great potential for AI applications with claims data, this potential is associated with risks and challenges. Issues related to confidentiality, clarity of scope, appropriate methodology, and transparency of results need to be addressed.

3.1 Data Provenance: Confidentiality and Quality

Most ethical problems related to confidentiality can be dealt with by anonymizing the data. In the past, removing social security numbers, phone numbers, names, and most date of birth information was sufficient to anonymize data. However, the recent emergence of third-party datasets (e.g., social media and cell phone tower access records) has provided additional avenues to triangulate datapoints and identify individuals within the data [33]. This creates additional challenges and a potential need for better de-identification techniques that ensure that shared information cannot be used to approximate personal information even when combined with pattern detection techniques.

3.2 Data Use: Interpretation of Methodological Approaches

As is the case with standard statistical methods, AI algorithms built on biased data will lead to biased predictions. Supervised machine learning algorithms, for instance, learn relationships between outcome and predictor variables after being trained on a set of examples for which the true outcome is known. Algorithms will have trouble classifying new examples that are very different from what they have seen before. In machine learning this problem is referred to as “hasty generalization” [34]. While this problem is not unique to AI, the speed at which individuals are now able to analyze large datasets has raised concerns that some believe evidence of causal relationships is unnecessary, that large databases by themselves are enough, and that the numbers will speak for themselves [35]. While complex correlation patterns are useful for making predictions, they may be of limited use for explaining why certain events occur [36]. The importance measures [8, 9] and highly interpretable AI models [10] discussed earlier can be useful in this regard.

Using AI or any statistical tool requires careful consideration of both the underlying data and the results. There are inherent biases in most claims databases due to sample selection (e.g., the characteristics of the insured population and varying standards of care), such that results may not be generalizable to other population groups. For instance, evaluating the effect of a drug that requires a very demanding treatment on a subpopulation with a high adherence rate may overestimate the effectiveness of the drug on the general population. In addition, the decisions made in real-life applications based on an algorithm’s results can have lasting effects (e.g., if an individual is incorrectly declined insurance because their profile shares some characteristics with at-risk groups) [39].

Supplementing claims data with information from other sources may limit the potential bias from unobserved variables. As discussed earlier, particularly rich data may also be used to triangulate and proxy for information that is still missing. However, rather than relying on technology to solve potential issues ex post, recent research has investigated whether technology can help gather more exhaustive, rich, and less biased datasets ex ante. For example, the blockchain technology associated with bitcoin is being considered as a potential solution to securely store healthcare data. Individuals would have ownership over their own personal data and would potentially receive financial incentives to make it widely available. Blockchain technology could also help document the provenance, veracity, and selection of the data used in studies. This could greatly reduce the widespread difficulty in replicating published medical research. Relatedly, this documentation could reduce the ability to cherry-pick results [40].

3.3 Usage: Prevention of an Adverse Welfare Impact from Discrimination

Just as AI algorithms can help detect a disease in its earliest stage, they can also help predict future health costs and therefore be used, in theory, to adjust insurance premiums based on combinations of personal characteristics. If an insurer can predict illness reliably, competition could lead to lower premiums for predictably healthier individuals and higher premiums for predictably sicker ones, even if the characteristics used for prediction are beyond an individual’s control.

Even if such a process could be appropriately implemented, it may be economically inefficient. An individual who is uncertain about their future sickness would want to smooth their future income across possible states of nature (e.g., sick vs. not sick). The individual would want insurance, but if the insurance company has reliable forecasting technology, competition could undermine that person’s ability to obtain insurance, leaving the individual worse off.
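The inefficiency argument above can be written as a stylized expected-utility comparison. The notation is an assumption introduced here for illustration: wealth w, a loss L if sick, sickness probability p, and a concave (risk-averse) utility function u. The last line assumes the insurer predicts sickness perfectly before the contract is signed, so that each individual is charged their own expected cost.

```latex
% Assumed notation: wealth w, loss L if sick, sickness probability p, concave utility u.
\begin{align*}
\text{No insurance:}\quad
  & EU_0 = p\,u(w - L) + (1 - p)\,u(w) \\
\text{Pooled insurance at the actuarially fair premium } pL:\quad
  & EU_1 = u(w - pL) \\
\text{Jensen's inequality for concave } u:\quad
  & u(w - pL) \ge p\,u(w - L) + (1 - p)\,u(w)
    \;\Rightarrow\; EU_1 \ge EU_0 \\
\text{Perfect prediction (premium } L \text{ if sick, } 0 \text{ otherwise):}\quad
  & EU_2 = p\,u(w - L) + (1 - p)\,u(w) = EU_0
\end{align*}
```

Viewed before the prediction is made, fully tailored premiums return the individual to the uninsured level of expected utility, which is weakly below the level achieved with pooled coverage.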

The extent to which such insurer strategies are already used and how much should be prohibited by law is often the subject of debate, particularly since similar issues appear in other industries. For example, research on credit access has examined the use of variables correlated with race and gender when evaluating loan requests and the potential bias against certain groups [42, 43].

At issue in these cases is not the methodology itself but rather its use. Let us assume, for example, that loans are evaluated by a group including individuals with biases (e.g., against immigrant or older loan applicants). If an algorithm is trained to mimic the decisions of these individuals (i.e., predict the loan approval decision) it will implicitly reproduce these biases. However, if the algorithm is trained to predict the future return of the investments made (i.e., the loans), the algorithm may show that unbiased decisions lead to better outcomes and may therefore provide better recommendations. In that case the AI algorithms might, for example, better align loan decisions with firms’ long-term profit interests while reducing biases [44].

AI methodologies are therefore not inherently good or bad, biased or clean. They are what we make of them. Interestingly, AI and claims data can be used to analyze the scope of the potential insurance discrimination problem, by examining the insurance premium distribution and identifying the groups that benefit under different scenarios of harmonization versus tailoring of premium rates.

4 Conclusion

Linkages of claims data with additional datasets provide rich sources of healthcare information. Given the complexity and scale of these datasets, it is unsurprising to observe that healthcare data have frequently been used in tandem with AI methods (Table 1). These methods can detect intricate patterns within complex data, improving their own performance through experience. For example, using claims data, laboratory results, and EMRs, the experiences of a large set of medical practitioners can be consolidated and used to detect patterns indicative of certain illnesses.

Table 1 Examples of benefits and challenges

While these disease patterns can be used to improve the quality of care, this knowledge can also be used by doctors as a relatively unbiased complement to their decision-making. AI has the potential to correct for biases due to omitted variables as well, using the complex patterns detected in the data to approximate the missing information with the information that is available. This has significant potential for use in numerous applications, such as retrospective studies comparing the effectiveness of different treatments.

If insurers, providers, and regulators can work together to increase the linkages between datasets, in a way that respects privacy, the capabilities driven by AI can lead to cost savings and quality improvements throughout the healthcare industry.

There are risks and challenges associated with the increased application of AI to healthcare claims and related datasets (Table 1). For example, AI has had a strong influence in the health insurance coverage process. Claim submission, claim adjudication, and fraudulent claim detection have either already improved or have the potential to improve due to the use of AI. However, the predictive ability of AI can also be used to adjust insurance premiums in undesirable ways based on combinations of personal characteristics. More generally, other potential issues involve the confidentiality of data and appropriate use of AI tools.

These risks and challenges are not insurmountable, however. They only imply that, as we increasingly apply more tools from AI, we need to be careful and responsible.