Our two case studies, on medical diagnostics/malpractice and on corporate valuation/the business judgment rule, analyze recent advances in ML prediction tools from a legal point of view. Importantly, they go beyond the current discussion around the data protection requirements of explainability to show that explainability is a crucial, but overlooked, category for the assessment of contractual and tort liability concerning the use of AI tools. We follow up in Sect. 4 on explainability proper, additionally implementing an exemplary spam classification, to discuss the trade-off between accuracy and explainability from both a technical and a legal perspective.
Medical diagnostics is a field where technology and the law patently interconnect. The medical standard of care defines contractual and tort liability for medical malpractice. However, this standard is itself shaped by state-of-the-art technology. If doctors fail to use novel, ML-driven methods, which are required by the applicable standard of care, liability potentially looms large. Conversely, if they apply models that result in erroneous predictions, they are equally threatened by liability. Importantly, both questions are intimately connected to explainability, as we argue below.
The rise of ML diagnostics
Indeed, ML models using a variety of techniques are increasingly entering the field of medical diagnostics and treatment [overview in Topol (2019)]. For example, researchers developed a model, based on reinforcement learning, for the diagnosis and optimal treatment of patients with sepsis in intensive care. The model, termed “AI clinician”, analyzes the patient’s state and selects appropriate medication doses (Komorowski et al. 2018). On average, the AI clinician outperformed human intensive care experts. In a large validation data set, different from the training data set, patients’ survival rate was highest when the actual doses administered by human clinicians matched the AI clinician’s predictions. In another recent study, a deep neural network was trained to detect Alzheimer disease based on brain scans (Ding 2018). Not only did the network outperform human analysts by an important margin; it also detected the disease an average of 75 months before the final clinical diagnosis. This makes effective early treatment available to patients much earlier. Supra-human performance could also be established in the detection of retinal diseases (De Fauw 2018) and in heart attack prediction (Weng et al. 2017).
ML is increasingly used in cancer diagnosis and treatment, too (Kourou et al. 2015). This is often due to advances in pattern recognition in images, a field in which deep neural networks perform particularly well. Researchers trained a convolutional neural network (CNN) to identify skin cancer on the basis of images of skin lesions (Esteva et al. 2017). The CNN matched the performance of 21 board-certified dermatologists in the classification of two varieties of skin cancer (the most common and the deadliest one). Similarly, a Chinese ML model was reported to have beaten a team of 15 experienced doctors in brain tumor recognition (Press 2018). However, not all that glitters is gold: IBM’s Watson had, according to news reports, difficulties in correctly identifying cancer patients (Ross and Swetlitz 2018), and hospitals were reported to cut back collaborations with IBM (Jaklevic 2017). This shows that rigorous field validation is necessary before ML models can be safely used in medical contexts—an issue that informs our analysis of the legal prerequisites for the adoption of such models.
As the previous section has shown, predictions made by medical AI models are, of course, not fully accurate in every case. Hence, we shall ask what factors determine whether such a potentially erroneous model may be used by a medical doctor without incurring liability. Furthermore, with ML technology approaching, and in some cases even surpassing, human capacities in medical diagnostics, the question arises whether the failure to use such models may constitute medical malpractice. The relevant legal provisions under contract and tort law differ from country to country. Our legal analysis refers in particular to German and US law; nevertheless, general normative guidelines can be formulated.
Adoption For the sake of simplicity, we assume that all formal requirements for the use of the ML model in medical contexts are met. In April 2018, for example, the US Food and Drug Administration (FDA) approved IDx-DR, an ML tool for the detection of diabetes-related eye disorders (US Food and Drug Administration 2018). While liability for the adoption of new medical technology is an obvious concern for medical malpractice law (Katzenmeier 2006; Greenberg 2009), the issue has, to our knowledge, not been discussed with an explicit focus on the explainability of ML models. The related but different question whether the avoidance of legal liability compels the adoption of such a model has rarely been discussed in the literature (Froomkin 2018; Greenberg 2009) and, to our knowledge, not at all by the courts. The answer to both questions crucially depends on whether it is considered negligent, under contractual and tort liability, (not) to use ML models during the treatment process. This, in turn, is determined by the appropriate medical standard of care.
Generally speaking, healthcare providers, such as hospitals or doctor’s practices, cannot be required always to purchase and use the very best products on the market. For example, when a new, more precise version of an x-ray machine becomes available, it would be ruinous for healthcare providers to be compelled always to buy such new equipment immediately. Therefore, they must be allowed to rely on their existing methods and products as long as these practices guarantee a satisfactory level of diagnostic accuracy, i.e., as long as they fall within the “state of the art” (Hart 2000). However, as new and empirically better products become available, the minimum threshold of acceptable accuracy moves upward. Otherwise, medical progress could not be reflected in negligence norms. Hence, the content of the medical standard, whose fulfilment excludes negligence, is informed not only by experience and professional acceptance, but also (and in an increasingly dominant way) by empirical evidence (Hart 2000; Froomkin 2018).
Importantly, therefore, the acceptable level of accuracy could change with the introduction of new, more precise, ML-driven models. Three criteria, we argue, should be met for this to be the case. First, the use of the model must not, in itself, lead to medical malpractice liability. This criterion, therefore, addresses our first question concerning liability for the positive use of ML technology in medicine. For reliance on the model to be justified, there must be a significant difference between the performance of the model and human-only decision making in its absence. This difference must be shown consistently in a number of independent studies and be validated in real-world clinical settings—which is often lacking at the moment (Topol 2019). The superiority of the model cannot be measured only in terms of its accuracy (i.e., the ratio of correct predictions over all predictions); rather, other performance metrics, such as sensitivity (a measure of false negatives) or specificity (a measure of false positives), also need to be considered. Depending on the specific area, a low false positive or false negative rate may be as desirable as, or even more important than, superior accuracy (Froomkin 2018; Topol 2019; Caruana 2015). For example, false negatives in tumor detection will mean that the cancer can grow untreated—quite likely the worst medical outcome. The deep learning model for detecting Alzheimer disease, for example, did have both higher specificity and sensitivity (and hence higher accuracy) than human radiologists (Ding 2018). In addition, the superiority of the novel method must be plausible for the concrete case at hand for its use to be legitimate in that instance. Under German law, for example, new medical methods meet the standard of care if the marginal advantages vis-à-vis conventional methods outweigh the disadvantages for an individual patient (Katzenmeier 2006); the same holds true for US law (Greenberg 2009).
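The performance metrics just mentioned can be made concrete with a short calculation. The following sketch derives accuracy, sensitivity, and specificity from confusion-matrix counts; the numbers are invented for illustration only:

```python
# Illustrative confusion-matrix counts for a hypothetical diagnostic model
# (the numbers are invented for demonstration only).
tp, fn = 90, 10   # true positives, false negatives (missed diseases)
tn, fp = 880, 20  # true negatives, false positives (false alarms)

accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of correct predictions overall
sensitivity = tp / (tp + fn)                 # low value -> many false negatives
specificity = tn / (tn + fp)                 # low value -> many false positives

print(f"accuracy={accuracy:.3f}, "
      f"sensitivity={sensitivity:.3f}, specificity={specificity:.3f}")
```

With these hypothetical counts, a model with 97% accuracy still misses one in ten actual cases (sensitivity 0.90), which illustrates why accuracy alone cannot determine the standard of care in fields such as tumor detection.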
This has important implications for the choice between an explainable and a non-explainable model: the non-explainable model may be implemented only if the marginal benefits of its use (improved accuracy) outweigh the marginal costs. This depends on whether the lack of explainability entails significant risks for patients, such as risks of undetected false negative treatment decisions. As Ribeiro et al. (2016) rightly argue, explainability is crucial for medical professionals to assess whether a prediction is made based on plausible factors or not. The use of an explainable model facilitates the detection of false positive and false negative classifications, because it provides medical doctors with reasons for the predictions, which can be critically discussed (Lapuschkin et al. 2019; Lipton 2018) (see, in more detail, the next section). However, this does not imply that explainable models should always be chosen over non-explainable models. Clearly, if an explainable model performs as well as a non-explainable one, the former must be chosen [for examples, see Rudin (2019) and Rudin and Ustun (2018)]. But, if explainability reduces accuracy—which need not necessarily be the case, see Rudin (2019) and below, Sect. 4—the choice of an explainable model will lead to some inaccurate decisions that would have been taken correctly under a non-explainable model with superior accuracy. Therefore, in these cases, doctors must diligently weigh the respective marginal costs and benefits of the models. In some situations, it may be possible, given general medical knowledge, to detect false predictions even without having access to the factors the model uses. In this case, the standard of care simply dictates that the model with significantly superior accuracy should be chosen.
If, however, the detection of false predictions, particularly of false negatives, requires or is significantly facilitated by an explanation of the algorithmic model, the standard of care will necessitate the choice of the explainable model. Arguably, this will often be the case: it seems difficult to evaluate the model predictions in the field without access to the underlying factors used, precisely because it will often be impossible to say whether a divergence from traditional medical wisdom is due to a failure of the model or to its superior diagnostic qualities.
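The kind of reason-giving an explainable model can provide may be sketched with a minimal linear classifier. The feature names, weights, and inputs below are entirely invented for illustration and do not stem from any of the cited studies:

```python
import math

# Hypothetical weights of a simple, inherently explainable linear model
# for skin-lesion risk (all names and values are invented).
weights = {"lesion_diameter_mm": 0.35, "asymmetry_score": 0.8,
           "border_irregularity": 0.6, "patient_age": 0.01}
bias = -4.0

def predict_with_reasons(features):
    """Return a risk score plus the per-feature contributions behind it."""
    contributions = {name: weights[name] * value for name, value in features.items()}
    score = 1 / (1 + math.exp(-(bias + sum(contributions.values()))))
    # Sort so the strongest reasons for the prediction come first.
    reasons = sorted(contributions.items(), key=lambda kv: -abs(kv[1]))
    return score, reasons

score, reasons = predict_with_reasons(
    {"lesion_diameter_mm": 7.0, "asymmetry_score": 2.5,
     "border_irregularity": 1.5, "patient_age": 60})
print(f"risk={score:.2f}")
for name, contribution in reasons:
    print(f"  {name}: {contribution:+.2f}")
```

A doctor reviewing such output sees not only the risk score but also which factors drove it, and can therefore argue with the model when a dominant factor contradicts professional experience.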
Hence, general contract and tort law significantly constrains the use of non-explainable ML models—arguably, in more important ways than data protection law. Only if the balancing condition (between the respective costs and benefits of accuracy and explainability) is met should the use of the model be deemed generally legitimate (but not yet obligatory).
Second, for the standard of care to be adjusted upward, and hence the use of a model to become obligatory, it must be possible to integrate the ML model smoothly into the medical workflow. High accuracy does not translate directly into clinical utility (Topol 2019). Hence, a clinical suitability criterion is necessary, since ML models pose particular challenges in terms of interpretation and integration into medical routines, as the Watson case showed. Again, such smooth functioning in the field generally includes the explainability of the model to an extent that decision makers can adopt a critical stance toward the model’s recommendations (see, in detail, below, Sect. 3.1.2, Use of the model). Note that this criterion is independent of the one just discussed: it does not involve a trade-off with accuracy. Rather, explainability per se is a general precondition for the duty (but not for the legitimacy) to use ML models: while it may be legitimate to use a black box model (our first criterion), there is, as a general principle, no duty to use it. A critical, reasoned stance toward a black box model’s advice is difficult to achieve, and the model will be difficult to implement in the medical workflow. As a general rule, therefore, explainability is a necessary condition for a duty to use the model, but not for the legitimacy of its use. The clinical suitability criterion is particularly important in medical contexts where the consequences of false positive or false negative outcomes may be particularly undesirable (Caruana 2015; Zech et al. 2018; Rudin and Ustun 2018). Therefore, doctors must be in a position to check the reasons for a specific outcome. Novel techniques of local explainability of even highly complex models may provide for such features (Ribeiro et al. 2016; Lapuschkin et al. 2019).
Exceptionally, however, the use of black box models with supra-human accuracy may one day become obligatory if their field performance on some dimension (e.g., sensitivity) is exceptionally high (e.g., > 0.95) and there is hence a reduced need to critically engage with the model within its high-performance domain (e.g., avoidance of false negatives). While such extremely powerful, non-explainable models still seem quite a long way off in the field (Topol 2019), there may one day be a duty, restricted to their high-performance domain, to use them if suitable routines for cases of disagreement with the model can be established. For example, if a doctor disagrees with a close-to-perfect black box model, the case may be internally reviewed by a larger panel of (human) specialists. Again, if these routines can be integrated into the medical workflow, the second criterion is fulfilled.
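The review routine suggested above can be summarized as a simple decision rule. The sketch below is hypothetical; the 0.95 sensitivity threshold merely mirrors the example given in the text:

```python
# Hypothetical escalation routine for disagreement with a high-performing
# black box model (all names and the threshold are illustrative only).
HIGH_PERFORMANCE_SENSITIVITY = 0.95

def next_step(model_prediction, doctor_assessment, validated_sensitivity):
    """Decide how to proceed when doctor and model agree or disagree."""
    if doctor_assessment == model_prediction:
        return "proceed with shared diagnosis"
    if validated_sensitivity >= HIGH_PERFORMANCE_SENSITIVITY:
        # Within the model's validated high-performance domain, disagreement
        # triggers internal review by a larger panel of human specialists.
        return "refer to specialist panel for review"
    # Outside that domain, the doctor's reasoned professional judgment prevails.
    return "follow doctor's documented assessment"

print(next_step("malignant", "benign", 0.97))  # → refer to specialist panel for review
```

The point of such a routine is procedural: it preserves reasoned human disagreement while keeping the close-to-perfect model's recommendation from being silently discarded.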
Finally, third, the cost of the model must be justified with respect to the total revenue of the healthcare provider for the latter to be obliged to adopt it (Froomkin 2018). Theoretically, licensing costs could be prohibitive for smaller practices. In this case, they will, however, have to refer the patient to a practice equipped with the state-of-the-art ML tool.
While these criteria partly rely on empirical questions (particularly the first one), courts are in a position to exercise independent judgment with respect to their normative aspects (Greenberg 2009). Even clinical practice guidelines indicate, but do not conclusively decide, a (lack of) negligence in specific cases (Laufs 1990).
In sum, to avoid negligence, medical doctors need not resort to the most accurate products, including ML models, but only to state-of-the-art products that reach an acceptable level of accuracy. However, this level of accuracy should be adjusted upwards if ML models are shown to be consistently superior to human decision making, if they can be reasonably integrated into the medical workflow, and if they are cost-justified for the individual healthcare provider. The choice of the concrete model, in turn, depends on the trade-off between explainability and accuracy, which varies between different models.
Use of the model Importantly, even when the use of some ML model is justified or even obligatory, there must be room for reasoned disagreement with the model. Concrete guidelines for the legal consequences of the use of the model are basically lacking in the literature. However, we may draw on scholarship regarding evidence-based medicine to tackle this problem. This strand of medicine uses statistical methods (for example, randomized controlled trials) to develop appropriate treatment methods and to displace routines based on tradition and intuition where these are not upheld by empirical evidence (Timmermans and Mauck 2005; Rosoff 2001). The use of ML models pursues a similar aim. While it typically does not, at this stage, include randomized controlled trials (Topol 2019), it is also based on empirical data and seeks to improve on intuitive treatment methods.
Of course, even models superior to human judgment on average will generate some false negative and false positive recommendations. Hence, the use of the model should always only be part of a more comprehensive assessment, which includes and draws on medical experience (Froomkin 2018). Doctors, or other professional agents, must not be reduced to mere executors of ML judgments. If there is sufficient, professionally grounded reason to believe the model is wrong in a particular case, its decision must be overridden. In this case, such a departure from the model must not trigger liability—irrespective of whether the model was in fact wrong or right in retrospect. This is because negligence law does not sanction damaging outcomes, as strict liability does, but attaches liability only to actions failing the standard of care. Hence, even if the doctor’s more comprehensive assessment is eventually wrong and the model prediction was right, the doctor is shielded from medical malpractice claims as long as the reasons for departing from the model were justified on the basis of professional knowledge and behavior. Conversely, not departing from a wrong model prediction would breach the standard of care if, and only if, the reasons for departure were sufficiently obvious to a professional (Droste 2018). An example may be an outlier case, which, most likely, did not form part of the training data of the ML model, cf. Rudin (2019). However, as long as such convincing reasons for model correction cannot be advanced, the model’s advice may be heeded without incurring liability, even if it was wrong in retrospect (Thomas 2017). This is a key insight of the scholarship on evidence-based medicine (Wagner 2018). The reason for this rule is that, if the model is indeed provably superior to human professional judgment, following the model will on average produce less harm than a departure from it (Wagner 2018).
Potentially, a patient could, in these cases, direct a product liability claim against the provider of the ML model (Droste 2018).
Particularly in ML contexts, human oversight, and the possibility to disagree with the model based on medical reasons, seems of utmost importance: ML often makes mistakes humans would not make (and vice versa). Hence, the possibility of reasoned departure from even a supra-human model creates a machine-human-team, which likely works better than either machine or human alone (Thomas 2017; Froomkin 2018). Importantly, the obligation to override the model in case of professional reasons ensures that blindly following the model, and withholding individual judgment, is not a liability-minimizing strategy for doctors.
Short summary of case study 1
Summing up, ML models are approaching, and in some cases even surpassing, human decision-making capacity in the medical realm. However, the legal standard of care should be adjusted upward by the introduction of such models only if (1) they are proven to consistently outperform professional actors and other models, (2) they can be smoothly integrated into the medical workflow, and (3) their use is cost-justified for the concrete healthcare provider. Conditions (1) and (2) are intimately related to the explainability of the model. If condition (1) is met, the use of the model per se does not trigger medical liability. If all three are met, failure to use it leads to liability. Finally, when a model’s use is justified, and even when it is obligatory, doctors are compelled, under negligence law, to exercise independent judgment and may disagree with the model, based on professional reasons, without risking legal liability.
Mergers and acquisitions
Similar issues arise when machine learning models are used to predict the transaction value of companies (Zuo et al. 2017). Again, the standard of care managers have to meet with respect to due diligence and other preparatory steps to a merger, but also in the decision proper, is itself shaped by state-of-the-art technology, namely information retrieval techniques. If managers fail to use novel, ML-driven methods, which are required by the applicable standard of care, the question of liability arises. One particular feature of transactional contexts (M&A), however, lies in the high transaction costs of mergers and acquisitions. Therefore, the business judgment rule applied in this context is seen to require rather conservative models. This might imply that methods with low false positive rates (i.e., methods that rarely recommend a transaction incorrectly) should be preferred for risk analysis and due diligence. Once ML models surpass a certain threshold of predictive value, reducing false positive rates, it may become mandatory for directors to use these models to avail themselves of the business judgment rule, i.e., the rule that protects managers from liability if their predictions turn out to be wrong. It may even be that false negative rates—even though much less likely to raise concerns—could become relevant for managers, albeit more from a managerial success than a liability point of view.
The rise of ML valuation tools
Indeed, ML models using a variety of techniques are increasingly entering the field of mergers and acquisitions, too (Jiang 2018). A first set of papers uses machine learning to analyze which factors have driven mergers in the past. For example, a group of researchers developed a model that applies recent machine learning techniques to so-called earnings call transcripts in order to analyze the role of different types of corporate culture in the number and speed of mergers (Li 2018)—without, however, reporting on the later failure of the mergers analyzed. Earnings call transcripts are summaries of conversations between CEOs (in part also CFOs) and analysts, which (indirectly) show what is considered essential at the most professional level of conversation. The results suggest that a corporate culture focused on innovation can be distinguished from one focused on quality, and that two conclusions can be drawn. First, firms with a corporate culture focused on innovation are more likely to be acquirers than firms with a corporate culture focused on quality. Second, firms with the same or a similar corporate culture (from the same of the two categories named, or also within them) tend to merge more often and at lower transaction costs. While the first conclusion is rather descriptive and of less immediate impact on a legal assessment, and while the second set of conclusions is not really astonishing, at least the core distinction is more refined than those typically found in the corporate literature. Moreover, transaction costs are certainly one of the key informational parameters to be analyzed and taken into account in an assessment under the business judgment rule (see below).
In a second set of articles, researchers focused on particular countries—for instance Japan—because of the strong varieties-of-capitalism element in corporate mergers, partly still before machine learning, partly with machine learning based on a large-scale data set concentrating on the Japanese corporate setting (Shibayama 2008; Shao et al. 2018).
Even the earlier study would potentially profit considerably from machine learning. The two studies differ, but both rely on rather substantial data sets, one on cash flow analyses (still in part to be conducted). The other study reached the (non-machine-learning-based) conclusion that knowledge transfer—so important as a motive for mergers—does indeed lead to increased innovation (with respect to the development of drugs). According to the authors, however, this is not universally true: it holds less for mergers of equals (at least in Japan) than for mergers of non-equals (big and small), and innovation then takes place mainly in the first few years (two to three). The data set used in the study might form an ideal test case for a comparative assessment of the respective strengths and reliability of conventional analyses in the area of mergers (focusing on one industry in one big country) and machine-based analyses.
However, in practical but also in legal terms, the development of machine learning in mergers and acquisitions seems to lag behind that in medical care (and malpractice) rather considerably. This is implied by the relatively low number of empirical publications and applications in this field. Hence, it is safe to infer that, by sheer numbers, mergers supported by machine learning account for only a very small fraction of mergers. A duty to use these techniques is therefore imaginable only in the future. Moreover, the approaches used so far comprise both supervised and unsupervised learning techniques, with only very recent discussion and evaluation of their comparative advantages (Jiang 2018). Finally, the decisive question of how failure rates compare with and without the use of machine learning remains basically unanswered. This shows that rigorous field validation is necessary before ML models can be safely used in corporate/merger contexts—an issue that informs our analysis of the legal prerequisites for the adoption of such models. It also shows that machine learning designs still have to be better adapted (also) to the questions that are legally decisive, namely to the question of failure, which is core for the most important legal issue, i.e., liability (and, of course, also a core question in economic terms).
Legal liability: the business judgment rule
With ML technology approaching, and potentially (in the future) even surpassing, human capacities in due diligence and other decisional steps to be taken in mergers and acquisitions, namely in the choice and evaluation of a target firm, the question arises whether the failure to use such models may constitute a violation of the duty of care. For setting the standard of care in such decisions, its attenuation via the so-called business judgment rule has to be taken into account. The relevant legal provisions under corporate law (again) differ from country to country. We again offer a legal analysis referring in particular to German and US law; nevertheless, general normative guidelines can be formulated.
Adoption For setting the legally required parameters correctly, a model has to start from the rather nuanced legal basis of such decision taking—in which avoiding false positives would seem to be more imperative than avoiding false negatives. Contrary to what has been explained for ML in medicine, the use of ML does not require permission by a competent authority but lies within the decision power of managers; for the second question, however, the situation is similar. Again, the question whether the avoidance of legal liability compels the adoption of such a model has (to our knowledge) been discussed neither in the literature nor by the courts. The answer to this question crucially depends on whether it is considered negligent, under corporate law standards such as Section 93 of the German Stock Corporation Code (Aktiengesetz), not to use such models.
Germany and the US both recognize a very demanding standard of professional care that boards and managers have to exercise—Germany in an even more outspoken way (Hopt and Roth 2015)—and both jurisdictions attenuate this strict standard of care in genuine business decisions via application of the so-called business judgment rule. This rule has also been introduced in Germany—on the model of US law—in section 93 para. 1(2) of the Stock Corporation Code, different, for instance, from UK law (Kraakman 2017; Varzaly 2012). In principle, this rule is aimed at giving managers more leeway in such decisions and at fostering courageous decision taking in genuine business matters, granting rather extended discretion to boards/managers whenever the decision made does not appear to be utterly mistaken. This also guards against hindsight bias in evaluating managerial decisions. The rule nevertheless has its limits. Among the most important are the prerequisite of legality—legal limits have to be respected, namely those of penal law, as in the famous Ackermann case of Deutsche Bank (Kraakman 2017)—and, still more important in our context, the requirement that the informational basis for a decision be retrieved in a sufficiently professional way. Courts thus restrict substantive scrutiny to the outer limits, but substitute such scrutiny with stricter procedural rules—namely on information retrieval. A core limiting factor for the adoption and use of ML is the regulation of the structure of a merger (in the narrow sense) or acquisition (takeover) under European, German and US law. It requires not only full information of the board (within the business judgment rule), but also ample disclosure and co-decision taking by other bodies within the companies—thus making explainability of the process and of the results reached via ML paramount (Grundmann 2012).
Given this legal basis, one can, in principle, refer to the three criteria developed above for medical malpractice (and their discussion) also for the corporate merger setting. In the corporate setting, it would, first, also not constitute improper use of resources if management resorted to ML techniques only once a significant difference between the performance of the model and human-only decision making in its absence can be shown—again in a number of independent studies, also validated in real-world settings. In contrast to the medical context, however, false positives and false negatives tend in principle not to carry the same weight in corporate mergers and acquisitions. Rather, for the reasons given (transaction costs), a low false positive rate would in principle have to be the prime standard and target of such an assessment. Again, this shows that explainability is key: to evaluate whether a prediction might be incorrectly positive, an explanation will generally be required. Hence, the trade-off between accuracy and explainability explored in the section on medical malpractice recurs in the corporate setting, too. Second, for the standard of care to be adjusted upward, it must be possible to integrate the ML model smoothly into the procedure of a merger and acquisition undertaking, namely disclosure and the possibility of independent decision taking in several distinct bodies. Again, smooth functioning therefore includes the explainability of the model to an extent that decision makers can adopt a critical stance toward the model’s recommendations. Third, the cost of ML is, however, less important. Companies need not refrain from a transaction if they cannot reasonably afford the costs of ML—as no third party is affected by the decision to an intolerable extent. Rather, management has to make an overall cost-benefit analysis, in which the cost of ML will be one, but only one, factor.
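The priority of a low false positive rate can be operationalized by calibrating a conservative decision threshold on past transactions. In the following sketch, the scores, outcomes, and the 25% cap are invented for illustration:

```python
# Invented (model score, deal actually successful) pairs for past candidate
# transactions; a false positive is a recommended deal that later failed.
history = [(0.95, True), (0.90, True), (0.85, False), (0.80, True),
           (0.70, False), (0.60, False), (0.55, True), (0.40, False)]

def false_positive_rate(threshold):
    """Share of unsuccessful deals the model would still have recommended."""
    negatives = [score for score, success in history if not success]
    return sum(score >= threshold for score in negatives) / len(negatives)

# Pick the lowest candidate threshold whose historical FPR stays under the cap,
# i.e., the least conservative rule that still meets the risk target.
cap = 0.25
threshold = min(t for t in (0.5, 0.6, 0.7, 0.8, 0.9)
                if false_positive_rate(t) <= cap)
print(f"chosen threshold={threshold}, FPR={false_positive_rate(threshold):.2f}")
```

Raising the threshold trades recommended deals (more false negatives) for fewer wrongly recommended ones, matching the conservative stance the business judgment rule suggests here.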
Given the high stakes usually at play in corporate mergers and acquisitions, the cost of ML will rarely be dispositive.
Use of the model In the corporate setting, questions concerning the use of the model are shaped still more prominently by the factual basis of the subject matter and by the structure of the legal rules, namely the business judgment rule. These peculiarities render the results reached above for medical malpractice still more important and unquestionable in corporate law. First, mergers and acquisitions are generally not very successful in the long term (Jiang 2018)—according to some studies, the failure rate five years after the transaction is carried through amounts to up to 50%. Moreover, the business judgment rule not only allows discretion whenever proper research (information retrieval) has been accomplished, but also requires managers to evaluate the information found. Thus, in terms of content, the discretion is large; in terms of procedure, only intolerable deviation from information properly retrieved constitutes a ground for liability. Hence, managers may in most cases override the machine recommendation. Of course, managers will also take into account factors that are not legal in the narrow sense, namely the preferences of the body of shareholders and other constituencies.
The overall conclusion would seem to be the same as in the malpractice cases. Particularly in ML contexts, human oversight, and the possibility to disagree with the model based on entrepreneurial reasons, seems of high importance: ML makes mistakes humans would not make (and vice versa, potentially even more often one day). Hence, in corporate contexts, too, a machine-human team emerges that likely works better than either machine or human alone. Importantly, the possibility to disagree often presupposes explainability. Furthermore, withholding individual judgment is not a liability-minimizing strategy for managers, either. Therefore, there are no incentives to refrain from a reasoned departure from the model.
Short summary of case study 2
Summing up, ML models are approaching, and in the future potentially even surpassing, human decision-making capacity in the corporate mergers realm, too. However, the legal standard of care should be adjusted upward by the introduction of such models only if (1) they are proven to consistently outperform professional actors and other models, and, due to disclosure requirements and multi-party decision making, (2) they are used as interactive techniques that can be explained to and critically assessed by humans. Given the size and timeframe of the transactions, integration into the process is basically not a concern (different from the malpractice cases). As in those cases, managers must be obliged to exercise independent judgment and may disagree with the model, based on professional reasons, without risking legal liability.
Key results from case studies 1 and 2: contractual explainability
Both the case study on medical malpractice and the one on corporate mergers show that, from a legal point of view, explainability is often at least as important as predictive performance in determining whether the law allows or even requires the use of ML tools under professional standards of care. Most notably, this conclusion is independent of the debate on the extent to which the European GDPR does or does not require the explainability of certain algorithmic models (see above, Sect. 2). The criteria developed for assessing the necessity of explainable AI stem, legally speaking, not from data protection law but from specific domains of contract and tort law. As such, they are not confined to the EU, but apply equally in all countries with negligence regimes, in contract and tort law, similar to our reference jurisdictions: the US and Germany. Since there has been a considerable degree of convergence in EU contract law in the past decades (Twigg-Flessner 2013), the guidelines we develop apply, at least broadly, both in US and in EU law. In the third case study, we therefore examine in more detail the interaction, and trade-off, between performance and explainability from a technical perspective.