Introduction

The rapid growth in military use of artificial intelligence (AI) promises to transform the conduct of warfare. Accordingly, it has prompted significant debate among policy makers, scholars and the international community, most notably among States parties to the Convention on Prohibitions or Restrictions on the Use of Certain Conventional Weapons Which May Be Deemed to Be Excessively Injurious or to Have Indiscriminate Effects (1980). For more than eight years, those States have debated whether and how to mount a regulatory response to the rapid development of increasingly autonomous weapon systems (AWS) which will employ AI and related technologies to supplement or even, to an extent, replace human decision-making on the battlefield (Defense Science Board, 2016, p. 13).

That debate has so far failed to provide any firm, substantive guidance to States about how to regulate the use of AI on the battlefield, other than by reinforcing the presumption that international humanitarian law (IHL), the body of international law which governs the conduct of armed conflict, continues to apply to the use of AI-enabled AWS as it applies to the use of other military technologies. Consequently, the obligation of States parties to Additional Protocol I to the 1949 Geneva Conventions (1977) (API) to review new weapons for compatibility with the State’s international legal obligations remains ‘the only binding mechanism the international community has to force states to assess whether the employment of these weapons, means and methods, which build on fundamentally new technologies, raises any significant concerns from a humanitarian perspective’ (Boulanin & Verbruggen, 2017, p. 6).

The nature of AI and AWS complicates legal reviews of weapons in a number of ways (Haugh et al., 2018), but one capability in particular that may be offered by future AWS will raise unique concerns for reviewers: the capacity to learn in situ. That is, the emerging capacity of an AI-enabled AWS to adapt to its surrounding environment and optimise its outputs over time without direct human input.

This paper surveys current understandings of the weapons review challenges presented by in situ learning and outlines some concepts which may inform appropriate changes to existing weapons review processes. Among other points, it argues that the most widely supported proposal, that of iterative, post-deployment field reviews of learning weapon systems, is legally permissible only within limits which depend on the degree of legal risk posed by the weapon system’s capacity to learn.

The technology of learning machines

The technology of autonomous systems, particularly weapon systems, has been discussed extensively elsewhere (Watson & Scheidt, 2005; McFarland, 2015), as has the long-running controversy about exactly which capabilities qualify as ‘autonomous’ in the context of a legal discussion (Horowitz, 2016). For the purposes of this paper, machine autonomy only provides a broad context for a discussion of learning systems and it is not necessary to engage closely with the debate about the precise nature and use of AWS. It is sufficient to say that an autonomous weapon system is broadly one that ‘once activated, can select and engage targets without further intervention by a human operator’ (US Department of Defense, 2012, p. 13). That is, the essential quality of an AWS is that it operates, to a certain extent, on its own, with little or no direct intervention by a human operator.

AI, in the context of weapon systems, is an approach to developing software systems (including those which underpin AWS) such that they exhibit behaviour that may be described as ‘intelligent’. Often, that involves replicating some aspect of biologically observable intelligence such as cognition, decision-making or, most importantly here, learning. The rationale is that biological entities can often learn to operate effectively in complex environments about which they have relatively little knowledge, and it is hoped that software-based machines which replicate the functionality of biological brains will likewise be able to perform complex tasks in dynamic environments about which their designers and operators might have only limited knowledge. It is broadly accepted that effective use of AI will be essential to designing weapon systems which can perform complex tasks autonomously in combat environments (Farrant & Ford, 2017, p. 399; Office of the Secretary of Defense, 2018, p. 13).

Machine learning is an application of AI which utilises data to improve and optimise system outputs over time with minimal human intervention (IBM, 2020; Farrant & Ford, 2017, p. 398). Most of the technical details of machine learning are beyond the scope of this paper. However, from a legal point of view, the critical considerations are as follows.

First, depending on which algorithm is being used, many systems which employ machine learning (including many of those underpinning AWS) are ‘trained’ by repeatedly being provided with large quantities of data which represent the information they are likely to encounter in operation. They then use heuristics to tune their behaviour when exposed to that data. The resulting relationship between the system’s inputs (such as data gathered from sensors) and outputs (actions the AWS takes in response) is not necessarily optimal or error-free, as curating data sets that capture every possible element of an operating environment is nearly impossible (Saidulu & Sasikala, 2017). Dynamic operating environments are inherently unpredictable and may present details against which the system has not been trained. Unknown or atypical inputs can result in system outputs that may not be completely understood by human operators.
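
To make the point concrete, the following minimal sketch (written in Python with the scikit-learn library purely for illustration; the data, classes and numbers are invented and bear no relation to any real weapon system) shows how a model trained on a narrow data set will still produce an output, often a highly confident one, when presented with an input unlike anything in its training data.

```python
# Illustrative sketch only: a classifier trained on a narrow data set still
# returns a (confident) label for an input far outside its training data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical training data: two well-separated clusters of 2-D 'sensor readings'.
X_train = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.3, size=(100, 2)),   # class 0
    rng.normal(loc=[3.0, 3.0], scale=0.3, size=(100, 2)),   # class 1
])
y_train = np.array([0] * 100 + [1] * 100)

model = LogisticRegression().fit(X_train, y_train)

# An input resembling nothing seen in training still receives a label,
# typically with near-certain predicted probability.
novel_input = np.array([[50.0, -40.0]])
print(model.predict(novel_input), model.predict_proba(novel_input))
```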

Second, training/learning can happen at different stages of a machine’s lifecycle. It can be confined to the development phase (‘offline learning’), which involves a more static system that does not continue to learn once in operation (Farchi et al., 2021). Alternatively, learning can be allowed to continue after the machine is put into operation (‘online learning’) which involves a more dynamic system that continues to change and learn in situ. The second type of learning is the focus of this paper.
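
The distinction can be shown in a few lines of code. The sketch below (Python/scikit-learn, chosen simply because it exposes both patterns; the data are random placeholders) contrasts an offline learner, fitted once and then frozen, with an online learner whose parameters continue to change as new observations arrive after deployment.

```python
# Illustrative sketch only: offline ('batch') learning vs online (in situ) learning.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(1)
X, y = rng.normal(size=(200, 4)), rng.integers(0, 2, size=200)

# Offline learning: the model is fitted once, before deployment, then frozen.
offline_model = SGDClassifier().fit(X, y)

# Online learning: the model keeps updating as new observations arrive in use.
online_model = SGDClassifier()
online_model.partial_fit(X, y, classes=np.array([0, 1]))   # initial training

X_new, y_new = rng.normal(size=(10, 4)), rng.integers(0, 2, size=10)
online_model.partial_fit(X_new, y_new)   # behaviour continues to change in situ
```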

Weapons reviews

Weapons review programmes are formal processes whereby States assess new weapons for compatibility with the State’s international legal obligations prior to employing those weapons in armed conflict (Jevglevskaja, 2015; McClelland, 2003; Vestner & Rossi, 2021, pp. 523–529). Weapons reviews provide value to States in numerous ways: inter alia, they are a means of complying with obligations borne by parties to law of war treaties, they provide field commanders with assurance of the legality of weapons and ammunition, they are a resource for responding to queries on matters of legality, and they build awareness of legal considerations among weapon developers and other interested parties (Parks, 2005, pp. 105–107). For States parties to API, weapons reviews are also a formal treaty obligation, as set out in Article 36:

In the study, development, acquisition or adoption of a new weapon, means or method of warfare, a High Contracting Party is under an obligation to determine whether its employment would, in some or all circumstances, be prohibited by this Protocol or by any other rule of international law applicable to the High Contracting Party.

Unfortunately, the practice of weapons reviews appears to fall short of its potential. There is little evidence of widespread practice of weapons reviews by parties to API (Farrant & Ford, 2017, pp. 392, 401), with only a handful of States being known to conduct reviews (notably, the United States has developed a rigorous weapons review process which predates the signing of API, although the US is not a party) (Daoust et al., 2002, p. 354; International Committee of the Red Cross, 2006, p. 946). Nor can it be said that a weapons review obligation has crystallised into customary international law (Jevglevskaja, 2018).

Challenges in reviewing learning weapons

International law does not mandate any specific process for conducting a review, or the precise scope of matters to be considered in a review (Daoust et al., 2002, p. 352). It is left to each State to develop a review process that enables the State to meet its legal obligations (one example being the process adopted by Australia (2018)). That is where the challenge lies for States attempting to review AI-enabled AWS which learn in situ. There is widespread agreement among States and commentators that traditional approaches to the legal review of weapons will be insufficient to determine the legality of weapon systems that learn on the battlefield and that new processes will need to be developed (Sanders & Copeland, 2020; Backstrom & Henderson, 2012; Lewis, 2019). Developing those new processes promises to be very difficult due to at least three significant technical and legal challenges which have not yet been overcome.

First, difficulties arise from the nature of the software which will drive advanced, learning AWS. Machine learning algorithms are highly complex, and much of the functionality of the software may arguably be beyond the scope of a weapons review, as a reviewer is concerned only with matters which affect a State’s ability to meet its international legal obligations. However, in situ learning capabilities present at least one characteristic challenge to weapons reviewers: how to ascertain and legally assess the behaviour of software which, by design, can optimise its behaviour in response to events on the battlefield. We use the term optimise here, as opposed to modify, because the purpose of learning systems is to improve outputs over time. By definition, to modify is to change or transform, whereas to optimise is to improve efficiency. A learning system aims to optimise its outputs over time in pursuit of static goals: the goals of the system do not change, but the methods for achieving them are optimised over time.

Take, for example, an AWS that locates and shoots targets. An in situ learning capability applied to this system may mean that the system is capable of learning the optimal way to shoot a moving target. The output of the AWS will still be ‘locate and shoot’; that has not changed. However, how the system achieves this output has been optimised – perhaps the system aims the shot slightly ahead of the target because it has learnt that a moving target at X distance travelling at Y speed requires a shot fired in Z direction at T time. There is some level of unpredictability here, particularly when considering the dynamic nature of the operating environment; however, the fundamental purpose and the required output of the system are ultimately known. This distinction is critical when considering the unpredictability of learning systems. While emergent behaviours (behaviours which appear at the system level following implementation of the capability (Di Marzo Serugendo et al., 2006)) will be unpredictable, the fundamental purpose and goals of the system remain unchanged.
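
That distinction between a fixed goal and an optimised method can be sketched in code. The toy example below (Python; all quantities are invented placeholders, not a real fire-control model) tunes a single ‘lead’ parameter to reduce miss distance, while the objective itself never changes.

```python
# Illustrative sketch only: the goal (minimise miss distance) is fixed;
# learning tunes *how* that goal is pursued (a single 'lead' parameter).
def miss_distance(lead: float, target_speed: float = 5.0,
                  time_of_flight: float = 2.0) -> float:
    """Distance by which a shot misses if aimed 'lead' units ahead of the target."""
    ideal_lead = target_speed * time_of_flight   # where the target will actually be
    return abs(ideal_lead - lead)

lead = 0.0    # initial, unoptimised behaviour
step = 1.0
for _ in range(100):                      # simple in situ optimisation loop
    for candidate in (lead - step, lead + step):
        if miss_distance(candidate) < miss_distance(lead):
            lead = candidate              # the method improves over time...
    step *= 0.9
# ...but the objective function itself (the system's goal) never changes.
print(round(lead, 2), round(miss_distance(lead), 4))
```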

Assessing the behaviour of any advanced AWS in a complex and dynamic combat environment is difficult even if the system’s responses to battlefield events are known and predictable. For AI-enabled AWS, which have the capacity to dynamically optimise their outputs with little to no human intervention, the number of possible sequences of action and reaction increases, and the task of reliably predicting the behaviour of the system, and hence its capacity to be used in compliance with specific legal obligations (Boulanin, 2015), becomes more difficult (Gillespie, 2015, p. 52; Vestner & Rossi, 2021, p. 544).

It should be noted that unintended behaviours or unintended consequences in the operation of a system are concerns that extend to most systems, autonomous or otherwise. There is almost always a risk that a capability or a human may behave in an unexpected way, whether due to system malfunction or human error. We accept these risks, to a certain extent, on the assumption that appropriate mitigations have been put in place to minimise the likelihood of such anomalies. Risk tolerances are not a novel concept, particularly in a military context.

Second, AWS present a novel challenge to reviewers in that they are likely to require assessment of rules of targeting law, which are not normally part of a weapons review (Lawand, 2006, p. 928). Functions which have traditionally been performed by human beings, such as selecting targets and assessing their legal status, may instead be performed, at least in part, by the AWS itself (McFarland, 2020, ch. 3). Accordingly, Boothby (2018, p. 40) notes that ‘a weapon review of a weapon system employing autonomous attack technology should not limit itself to a consideration of the usual weapons law criteria…but should also consider whether the technology, and its intended manner and circumstances of use, enable the rules of targeting law to be complied with.’ That is, a State which operates AWS must still observe its targeting law obligations, and to the extent that the State chooses to rely on an AWS to perform targeting functions, the State must ensure the AWS will operate in a manner consistent with targeting law (Boulanin, 2015, p. 5). Sanders and Copeland (2020) expand: ‘the requirement to assess any AWS functionality that engages the reviewing state’s IHL obligations—such as distinguishing between lawful and unlawful targets—will require careful consideration of issues such as the degree of human input to targeting decisions, the acceptable standard of machine compliance, and the measures for ensuring ongoing compliance with IHL.’

That reference to ‘the acceptable standard of machine compliance’ is important. It is incumbent on a reviewing State to determine the standards of compliance required to pass a legal review (International Weapons Review, 2019). In respect of the critical function of assessing the lawful status of a potential target, for example, Backstrom & Henderson (2012, pp. 494–495) report that the human-applicable standard ‘is not clear-cut. The standard expressed by the International Criminal Tribunal for the Former Yugoslavia is that of “reasonable belief”. In their rules of engagement, at least two states have adopted the standard of “reasonable certainty”. A third approach, reflected in the San Remo Rules of Engagement Handbook is to require identification by visual and/or certain technical means’. That raises questions about the standards which should then apply to compliance with rules of targeting law when targeting functions are performed by an AWS (Copeland & Reynoldson, 2017, p. 103). Can subjective human standards such as ‘reasonable belief’ be applied to machines? If they can be so applied, should they? Or should machines be held to a higher standard? Other commentators posit, in respect of AWS specifically, that ‘[t]he standards of compliance must, as a minimum, be equivalent to a trained human operator, and, where possible, exceed the human standard’ (International Weapons Review, 2017) or that ‘the acceptable reliability threshold will depend upon the effect of failure on a system’s ability to comply with any given rule’ (Farrant & Ford, 2017, p. 410).

In situ learning adds another layer of complexity to the matter of setting a standard of compliance in that it is necessary to ensure that the required standard, once it is determined, continues to be met as the AI-enabled AWS optimises its behaviour on the battlefield.

Third, as Copeland & Reynoldson (2017, p. 101) note, it may be difficult to access all the information necessary to thoroughly review a learning AWS due to challenges around explainability of AI algorithms:

[The normal review process] relies on an assumption that all relevant empirical information relating to the weapon, including military, technical, health, and environmental data will be available for consideration during the review process. The challenge this poses for weapons with AI (that are capable of machine learning) is that the AI must be able to convey the reasoning behind its decisions and actions in an understandable way to the legal reviewer.

That need for access to the reasoning adopted by systems to be reviewed presents problems at this early stage of AI development, when some of the most capable AI algorithms are so complex and opaque that even their creators cannot explain in detail how they reach specific decisions (IBM, 2021). Consequently, there is tension between the need to optimise AWS decision-making performance and the need to understand the underlying processes.

Overcoming these challenges will require progress in both the technical (testing and evaluation (T&E) and validation and verification (V&V) of complex learning software systems (Gillespie, 2015)) and legal (States’ weapons review processes) dimensions of weapon behaviour assurance. Many commentators take the view that development and testing processes will need to blend somewhat with legal review processes such that both technologists and lawyers contribute to the development and legal assessment of learning AWS from the earliest stages of development through the capability lifecycle (Defense Science Board, 2016, p. 35; Vestner & Rossi, 2021, pp. 533–534; Copeland & Reynoldson, 2017, p. 109). The next section outlines some concepts which might inform developments in the legal dimension of States’ weapon review efforts.

Adapting review processes to learning weapons

One may wonder why questions about adapting Article 36 review processes must arise at all in a discussion about reviewing a new weapon system technology. Article 36 is not technology-specific and the substantive legal obligation to be met is the same for AI-enabled AWS as it is for any other type of weapon. While there is clearly significant technical work to be done in terms of developing the T&E and V&V processes which will support reviews of learning systems, surely it will be possible for legal personnel to simply apply the output of improved T&E/V&V processes to existing review procedures, as they do for other types of weapons, albeit expanded to include rules of targeting law where relevant?

On a plain reading of Article 36, doing so would require that, throughout the ‘study, development, acquisition or adoption’ (Boothby, 2018, p. 35) phases of the weapon system’s lifecycle and prior to deploying the AWS on the battlefield, reviewers anticipate and assess all the legally significant behaviours that the AWS would be expected to learn and exhibit in its intended circumstances of use, and reach a sufficient level of confidence that those behaviours would remain consistent with the State’s legal obligations. Reviewers would therefore be required to review the nature of the learning algorithm itself, to ensure that the entire set of behaviours which the algorithm may allow the AWS to adopt would lie within the set of behaviours which are consistent with the State’s legal obligations. They would need to anticipate all stimuli which the AWS might encounter which would cause it to modify its behaviour, as well as anticipate the nature of the modifications which the AWS’s highly complex learning software would make to the weapon system’s behaviour, and then assess with confidence whether those changes would themselves remain within the bounds of what is legally acceptable.

That task might well prove to be technically infeasible. The dynamic nature of the battlefield environment and the difficulty of anticipating the actions of a sophisticated adversary, as well as the software and hardware complexity of an advanced learning AWS, may effectively rule out a simple ex ante assessment process for all but the most tightly constrained learning systems. In most cases, the number of possible combinations of stimuli which the system could encounter and actions it may take in response will almost certainly rule out exhaustive testing of the system’s behaviour. The scale of the problem may even exclude the possibility of a confident ex ante estimation of the chance of remaining in compliance with the law over the long term once deployed.
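
The scale of the problem can be conveyed with a deliberately simplified, purely illustrative calculation (the figures are invented): if a system can distinguish only s classes of stimulus and must respond at each of n successive decision points in an engagement, the number of distinct stimulus sequences alone is

\[
\underbrace{s \times s \times \cdots \times s}_{n \text{ decision points}} = s^{n}, \qquad \text{e.g. } s = 10,\ n = 20 \;\Rightarrow\; 10^{20} \text{ possible sequences,}
\]

a space far too large to test exhaustively, even before adversary behaviour and environmental variation are taken into account.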

Accordingly, momentum is building behind the idea that periodic post-deployment legal reviews would be required, extending across the entire lifecycle of the AWS, to ensure that an AWS has not learned to behave in a manner which would violate the operating State’s legal obligations (Farrant & Ford, 2017, p. 406; Defense Science Board, 2016, p. 50; Ford, 2017, p. 461). Vestner and Rossi’s (2021, pp. 544–545) view is representative of this line of thinking:

One suggestion that is gaining growing consensus is to conduct “runtime verifications” on online machine learning systems in order to keep up with the system adaptations to the environment. This would shift the focus toward a continuous evaluation during the full life cycle of the system, first and foremost the operational phase. The system would be certified prior to deployment to attest its suitability for use in a limited set of scenarios, while the incompleteness of the processes and potential unintended scenarios would be acknowledged. Developers should then continuously monitor performance and report to regulators to allow re-evaluation and take corrective measures if necessary. By doing so, the system is assessed in real operating conditions, and violations of its properties and specifications are detected and addressed while the system is running. It would also solve two problems at the same time. First, continuous verification during system operation would ensure that although its unpredictability is accepted, this would not lead to violations of the system’s specifications and applicable law. Second, it would provide those responsible for system functioning with a tool to retain control over it. This is relevant from an operational standpoint since no commander would entrust a system operating outside its sphere of control with critical functions.

Under this paradigm, the initial state of the AWS (before learning any new behaviours) would be reviewed prior to deployment, but learned behaviours would only be reviewed later, as the AWS is used in combat or other operations.

The idea of periodic post-deployment reviews (‘runtime verifications’) may also be justified if one takes the view that a weapon which learns new behaviours effectively becomes a different weapon, distinct from the weapon as it was when reviewed prior to any in situ learning taking place (the degree of difference which would justify a post-deployment review is discussed further in a later section). Article 36 of API states that the review obligation applies to ‘new’ weapons, but States which are known to conduct weapons reviews generally take the view that ‘new’ is to be interpreted to include weapons which have been modified such that their behaviour is altered in a way which may be legally significant (Ministry of Defence Development, 2016, p. 4; International Committee of the Red Cross, 2006, p. 938 note 21, p. 952 note 82). The ICRC agrees, arguing that the Article 36 obligation covers ‘an existing weapon that is modified in a way that alters its function, or a weapon that has already passed a legal review but that is subsequently modified’ (International Committee of the Red Cross, 2006, p. 938). While presumably the modifications in question would traditionally have been changes made manually to a weapon system by human beings, there does not appear to be any problem in extending the concept of ‘modification’ to changes in behaviour made automatically by the system itself in pursuit of optimal outputs.

Post-deployment reviews would also support the deploying State’s compliance with its obligation under API art 57(2)(a)(i) to ‘[d]o everything feasible to verify that the objectives to be attacked are neither civilians nor civilian objects and are not subject to special protection’. Given the tendency of AI systems to err in ways that may be unexpected and inscrutable to their human designers and operators, this obligation could be interpreted to require a degree of human oversight of the targeting process.

Still, adopting periodic post-deployment reviews is potentially problematic, because weapons reviews are intended to be prospective and preventative. As Parks (2005, p. 114) notes:

The words ‘new’, ‘acquisition or adoption’ indicate that the requirement in Article 36 is prospective rather than necessarily retroactive. The caveat ‘necessarily’ is included in the preceding sentence as it should not be inferred that were a weapon illegal per se or prohibited by treaties to which the state in question is a State Party, it would gain a ‘free pass’ as to future use because the weapon or munition in question was acquired prior to a state’s ratification of Additional Protocol I.

Prima facie, periodic reviews after deployment may appear to risk undermining the preventative purpose of weapons reviews, and specifically to violate Article 36’s requirement that the legal review be conducted during the ‘study, development, acquisition or adoption’ of the weapon system. If learned weapon system behaviours are reviewed periodically only once they are adopted on the battlefield, it is possible that, by the time the behaviour is assessed, the weapon system may have already been operating in an unreviewed manner.

However, a review programme which extends beyond initial deployment of the weapon system appears to be consistent with the intent of Article 36, provided that the prospective nature of each review is not lost. The periodic review regime would have to be coordinated with the AWS learning algorithm such that each review, including the initial one prior to deployment, would address the present behaviour of the weapon system plus the set of behaviours the system might reasonably be able to adopt prior to the next review. Reviewers would need to be sufficiently certain that the weapon system would not be likely to adopt any behaviours beyond the scope of each review prior to the next review taking place. Provided that condition could be satisfied, the sequence of reviews would collectively satisfy the requirements of Article 36.

Coordination between the learning algorithm and the review schedule is likely to mean that constraints will have to be imposed on the degree to which the system can learn new behaviours (Office of the Secretary of Defense, 2018, p. 10):

For the most demanding adaptive and non-deterministic systems, a new approach to traditional [testing, evaluation, validation and verification] will be needed. For these types of highly complex autonomous systems, an alternate method leveraging a run-time architecture that can constrain the system to a set of allowable, predictable, and recoverable behaviors should be integrated early into the development process.
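
One way to picture such a run-time architecture is as a wrapper that accepts learned updates only while they remain inside a behaviour envelope assessed at the most recent legal review, and otherwise freezes learning and flags the system for re-review. The sketch below (Python; all class names, parameters and thresholds are hypothetical and chosen only for illustration) is one minimal way such a constraint might be expressed.

```python
# Illustrative sketch only: learned updates are accepted only while they stay
# inside an envelope assessed at the last legal review; anything outside
# freezes learning in a recoverable state and flags the need for re-review.
from dataclasses import dataclass

@dataclass
class ReviewedEnvelope:
    """Bounds on legally significant behaviour signed off at the last review."""
    max_engagement_range: float     # hypothetical bound, e.g. metres
    min_target_confidence: float    # hypothetical bound, e.g. required probability

class ConstrainedLearner:
    def __init__(self, envelope: ReviewedEnvelope):
        self.envelope = envelope
        self.engagement_range = envelope.max_engagement_range * 0.5
        self.target_confidence = 0.99
        self.review_required = False

    def propose_update(self, new_range: float, new_confidence: float) -> bool:
        """Apply a learned update only if it stays inside the reviewed envelope."""
        if self.review_required:
            return False                # learning stays frozen until re-reviewed
        within_envelope = (
            new_range <= self.envelope.max_engagement_range
            and new_confidence >= self.envelope.min_target_confidence
        )
        if within_envelope:
            self.engagement_range = new_range
            self.target_confidence = new_confidence
            return True
        self.review_required = True     # recoverable state: flag for a new review
        return False

learner = ConstrainedLearner(ReviewedEnvelope(2000.0, 0.95))
print(learner.propose_update(1500.0, 0.97))   # True: inside the reviewed envelope
print(learner.propose_update(2500.0, 0.90))   # False: outside, triggers re-review
print(learner.review_required)                # True
```

Coordinated with a review schedule of the kind described above, such a mechanism would help keep the behaviours an AWS can adopt between reviews within the set already assessed.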

Scholars have also identified several technical safeguards which may be required. Meier (2016, p. 130) argues for protection against errors and interference:

The weapons system must have appropriate safeguards in place to terminate activity and seek operator input should the system fail to perform as intended; and it must include robust safeties and anti-tamper measures.

The Defense Science Board (2016, p. 26) points to measures to help ensure that AI decisions are explainable:

If the system includes learning, interactions with human operators would be facilitated if the system came with a design and training pedigree to help human teammates and supervisors anticipate novel system behaviors as the system evolves in reaction to past experiences and training.

Boothby (2018, p. 27) cautions against use of learning algorithms that might enable an AWS to learn to become less restrictive in applying force than its operators intend:

If additional artificial intelligence were to be applied to such a system, with the consequence that the system learns how to detect protected persons or objects such as civilians or civilian objects more reliably, this would seem to be acceptable both legally and ethically. By contrast, an artificial intelligence system that permits the weapon system to loosen pre-set constraints that reflect targeting law will of course be legally and presumably ethically unacceptable. Between these relative extremes, testing should be employed to determine exactly how the weapon system’s learning process is limited and whether the results of all possible learning outcomes will be such that the platform’s decisions, and the precautions taken in reaching them, comply with targeting law rules.

Overall, the requirement is that weapon reviewers’ assessments of the legally significant aspects of the behaviour of a learning AWS must remain accurate until the next review is conducted.

Optimisation vs. adoption of new behaviours

An earlier section of this paper discussed the distinction between optimisation of behaviour (improving efficiency in the pursuit of a given goal) and modification of behaviour (more extensive changes which go beyond merely improving efficiency). This distinction is a useful basis for determining when a learning system needs to be subjected to an additional legal review, provided that ‘optimisation’ and ‘modification’ are defined with legal considerations in mind. That is, in order to avoid the need for an additional review, legal risks would need to act as constraints on the optimisation process.

If optimisation is understood as improving the means by which a static goal is pursued with the constraint that the improvements do not increase the risk of legally proscribed behaviour, then that optimised behaviour would not, strictly speaking, require another legal review. (Of course, it may be considered prudent to conduct legal ‘runtime verifications’ regardless). For example, a learning ‘locate and shoot’ system might be constrained not to learn to engage a target in closer proximity to civilians than its original pre-deployment review considered it would, regardless of whether doing so might improve the chance of successfully destroying the target.
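
The distinction might be operationalised along the following lines. The sketch (Python; the single ‘civilian standoff’ threshold is a hypothetical simplification of what would in practice be a much richer set of reviewed parameters) treats a learned change as permissible optimisation only if it does not increase the legally relevant risk assessed at the original review, irrespective of any gain in effectiveness.

```python
# Illustrative sketch only: classify a learned change as 'optimisation'
# (within reviewed risk bounds) or 'modification' (requiring a fresh review).
REVIEWED_MIN_CIVILIAN_STANDOFF = 500.0   # metres, hypothetical value assumed at the original review

def classify_learned_change(new_min_civilian_standoff: float) -> str:
    """Classify a learned change in engagement behaviour.

    'optimisation'  -- stays within the risk bounds assessed at the last review
    'modification'  -- exceeds them, so a fresh legal review would be required
    (Whether the change improves effectiveness is irrelevant to this test.)
    """
    if new_min_civilian_standoff >= REVIEWED_MIN_CIVILIAN_STANDOFF:
        return "optimisation"
    return "modification"

print(classify_learned_change(650.0))   # 'optimisation'
print(classify_learned_change(300.0))   # 'modification'
```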

Learning which goes beyond such constrained optimisation would necessitate a new legal review. For example, learned behaviour which changes the type of risk or increases the degree of risk posed to civilians would prompt consideration of whether the operating State is still taking ‘all feasible precautions in the choice of means and methods of attack’ (API, art 57(2)(a)(ii)) with a view to avoiding civilian harm. That is a question which would require human intervention in the nature of a legal review.

The need for communication and standardisation

Even with sufficient coordination between learning algorithms and legal review schedules, and even if the required T&E/V&V advances can be made, challenges will remain. In the short term, practical difficulties with conducting post-deployment reviews of weapons will have to be overcome. Can reviewers be assured of continuing, timely access to AWS after deployment, given the exigencies of combat environments? Can the required expertise and resources reliably be made available in the field, given the cutting-edge nature of the software and hardware which will need to be reviewed? On the assumption that each individual AWS may learn different behaviours based on its individual experiences, how can it be made feasible to thoroughly review each individual unit in what will be, in many cases, a very large force of autonomous systems? Some research is being conducted into automated testing and evaluation processes for learning AWS, so that weapon systems could report on developments which would trigger the need for a review. Under such an arrangement, ‘[a] system would report on its learning and modifications throughout its lifecycle, allowing the end user to understand whether a new legal review might be needed’ (Boulanin & Verbruggen, 2017, p. 24).
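
A minimal sketch of such automated self-reporting might look as follows (Python; the fields, thresholds and the idea of a single drift metric are hypothetical simplifications): the system logs each legally significant learned change, and the log is what prompts human reviewers to decide whether a fresh legal review is needed.

```python
# Illustrative sketch only: a learning audit log that records parameter changes
# and flags when the accumulated drift may warrant a new legal review.
import json
from datetime import datetime, timezone

class LearningAuditLog:
    def __init__(self, report_threshold: float = 0.1):
        self.entries = []
        self.report_threshold = report_threshold   # relative parameter drift

    def record(self, parameter: str, old_value: float, new_value: float) -> None:
        drift = abs(new_value - old_value) / max(abs(old_value), 1e-9)
        self.entries.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "parameter": parameter,
            "old_value": old_value,
            "new_value": new_value,
            "exceeds_threshold": drift > self.report_threshold,
        })

    def review_recommended(self) -> bool:
        """True if any logged change is large enough to warrant reviewer attention."""
        return any(e["exceeds_threshold"] for e in self.entries)

log = LearningAuditLog()
log.record("engagement_range_m", 1500.0, 1520.0)   # small drift: logged only
log.record("target_confidence", 0.95, 0.80)        # large drift: flags a review
print(json.dumps(log.entries, indent=2))
print(log.review_recommended())                    # True
```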

However, automated processes would only form one part of a complete solution and further challenges will arise over the longer term. With AI technologies advancing rapidly, military applications will continue to multiply, as will the complexity of the systems being developed. Presumably T&E and V&V processes will also evolve, but will review procedures developed today continue to be suitable?

Given these challenges, it is encouraging that there are signs of willingness among States and other members of the international community to share information on weapons reviews and potentially work towards establishing universal standards and mechanisms. As Kotlik (2020) notes:

...several instances of information sharing have taken off in the last five years, including a Weapon Review Forum convened by the United Kingdom, a project to update the ICRC’s Guidance on new weapons, and an informal process launched by the UN Office for Disarmament Affairs, in cooperation with UNIDIR. In addition, some States have raised the issue in the context of the Group of Governmental Experts of the Convention on Conventional Weapons (CCW), where Australia and Israel recently shared information about their reviewing processes. A working paper submitted by Argentina emphasised that information sharing could lead to the adoption of standardised universal mechanisms, a reduction of the gap between mechanisms used by weapons-producing countries and those that acquire them, and the enhancement of control over the emergence of new weapons.

In the context of the CCW discussions on AWS, the United States has also been open about sharing its weapons review procedures. As early as 2015, the US Mission to International Organizations in Geneva (2015) proposed that CCW discussants develop an ‘interim outcome document that sets forth what is entailed by a comprehensive weapons review process, including the policy, technical, legal and operational requirements that would apply if a state were developing [AWS].’ Such a document would set out best practices for reviewing AWS and perhaps provide a basis for developing an approach to the review of learning systems. Cochrane (2020) and Poitras (2018) have made similar arguments.

As Boulanin (2015) points out, cooperation among States on reviewing AWS could also reinforce the weapons review regime more broadly: ‘Increased transparency on weapon review procedures could become a virtuous circle in many respects. It could contribute to the development of interpretative points of guidance on the implementation of Article 36 and consequently strengthen international confidence in the mechanism.’ With so little evidence of States parties to API meeting their obligation under Article 36 to review new weapon systems for legal compatibility, this would be a welcome development which would do much to promote respect for IHL.

Conclusion

The rapid growth in the market for AWS is forcing lawyers, technologists and policy makers to consider novel and difficult questions about the practice of warfare vis-à-vis the role of human involvement in combat operations (de Preux, 1987, p. 427, para. 1476) and the regulation of artificially intelligent weapon systems.

In some respects, the regulatory challenges presented by AWS reflect those being encountered in other fields where AI is making inroads: how to ensure attributability and human accountability, how to address challenges such as algorithmic biases (Danks & London, 2017) and lack of explainability of AI-made decisions, how to ensure that an adequate degree of human control (Article 36, 2013, p. 1; Crootof, 2016) is retained, and so on.

Other challenges are unique to autonomy in a military context, with the matter of ensuring ongoing compliance with IHL and other applicable legal frameworks being particularly prominent. Meeting this challenge will require a blending of legal and technological skills and processes, most immediately in the form of new procedures for conducting legal reviews of new weapons as required by Article 36 of API.

More broadly, the growth of AWS, AI and learning machines has led to calls for the ‘digitalisation’ of legal frameworks governing armed conflict (Vestner & Rossi, 2021, p. 553) and there are some signs of renewed academic interest in the search for a generalised theory of the interaction between law and technology (Cockfield, 2003; Bernstein, 2007). It may well be that efforts such as those will one day guide the uptake of intelligent machines, including AWS, but in the meantime the unique challenges of AI and machine learning have focused attention on the importance of legal reviews as a means of ensuring compliance with the law, and it is crucial that they be conducted rigorously by all States which operate AWS.