Holmes, when he articulated a way of thinking about law that departed from the prevalent deductive formalism of his day, traced an outline recognizable in twenty-first-century computer science. The nineteenth-century understanding of legal reasoning, which Holmes thought at best incomplete, had been that the law, like an algorithm, solves the problems given to it in a stepwise, automatic fashion. A well-written law applied by a technically competent judge leads to the correct judgment; a bad judgment owes to a defect in the law code or in the functioning of the judge. Holmes took a contrasting view: the judge considers a body of information, in the form of existing decisions and also, though the judge might not admit it, in the form of human experience at large, and in that body discerns a pattern. The pattern is the law itself. As computer science has developed from algorithms to machine learning, it, too, has departed from models that find satisfactory explanation in formal proof. In machine learning, the input is data, just as in law, on Holmes’s view, the input is experience; and, in both, the task to be performed upon a given set of inputs is to find patterns therein. Thus, in two different fields at different times, a transition has occurred from logic applied under fixed rules to a search for patterns.

In the present chapter, we consider more closely the inputs—experience and data; in Chapter 4 we will consider how, in both law and machine learning, patterns are found to make sense of the inputs; and in Chapter 5 we turn to the outputs, which, as we will see, are predictions that emerge through the search for pattern.

3.1 Experience Is Input for Law

To what materials does one turn, when one needs to determine the rules in a given legal system? Holmes had a distinctive understanding of how that question is in fact answered. In The Common Law, which was published sixteen years before The Path of the Law, Holmes started with a proposition that would join several of his aphorisms in the catalogue of jurists’ favorites: “The life of the law has not been logic; it has been experience.”1 This proposition was further affirmation of Holmes’s view that logic, on its own, only gets the jurist so far. More is needed if a comprehensive understanding of the legal system is to be reached. Holmes proceeded:

The felt necessities of the time, the prevalent moral and political theories, intuitions of public policy, avowed or unconscious, and even the prejudices which judges share with their fellow-men, have had a good deal more to do than syllogism in determining the rules by which men should be governed. The law embodies the story of a nation’s development through many centuries, and it cannot be dealt with as if it contained only the axioms and corollaries of a book of mathematics.2

We see here again the idea, recurrent in Holmes’s writing, that law is not about formal logic, that it is not like mathematics. We also see an expansion upon that idea, for here Holmes articulated a theory of where law does come from. Where Holmes rejected syllogism—dealing with the law through “axioms and corollaries”—he embraced in its place the systematic understanding of experience. The experience most relevant to the law consists of the recorded decisions of organs having authority over the individual or entity subject to a particular legal claim—judgments of courts, laws adopted by parliaments, regulations promulgated by administrative bodies, and so on.

Holmes understood experience as wider still, however, for he did not invoke only formal legal texts but also “prevalent moral and political theories, intuitions of public policy… even the prejudices which judges share with their fellow-men.”3 The texts of law, for Holmes, were part of the relevant data but, taken on their own, not enough to go on.

In response to Holmes’s invocation of sources such as political theory and public policy, one might interject that surely some texts have undoubted authority, even primacy, over a given legal system—a written constitution, to give the surest case. In Holmes’s view, however, one does not reach the meaning even of a constitution through logic alone. It is to history, there as well, that Holmes would have the lawyer turn:

The provisions of the Constitution are not mathematical formulas that have their essence in form, they are organic, living institutions transplanted from English soil. Their significance is vital, not formal; it is to be gathered not simply by taking the words and a dictionary but by considering their origin and the line of their growth.4

That Holmes was a keen legal historian is not surprising.5 When he drew attention to “a nation’s development through many centuries,” this was directly to his purpose and to his understanding of the law. For Holmes, experience in its broadest sense goes into ascertaining the law.

3.2 Data Is Input for Machine Learning

As we suggested in Chapter 2, a common misperception is that machine learning describes a type of decision-making algorithm: that you give the machine a new instance to decide, that it does some mysterious algorithmic processing, and then it emits an answer. In fact, the clever part of machine learning is in the training phase, in which the machine is given a dataset, and a learning algorithm converts this dataset into a digest. Holmes talked about a jurist processing a rich body of experience from which a general understanding of the law took form. In the case of modern machine learning, the “experience” is the data; the general understanding is in the digest, which is stored as millions of finely tuned parameter values. We call these values “learnt parameters.” The learnt parameters are an analogue (though only a rather rough one) to the connection map of which neurons activate which other neurons in a brain.
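
To make the notion of a “digest” concrete, the following is a minimal sketch in Python (our own illustration, not drawn from any particular system, and with invented data), in which an ordinary least-squares fit stands in for the learning algorithm and a short vector of numbers stands in for the millions of learnt parameters a modern system would hold:

    import numpy as np

    # Toy training dataset: each row is an instance, each column a feature variable.
    X_train = np.array([[1.0, 0.0],
                        [0.0, 1.0],
                        [1.0, 1.0],
                        [0.0, 0.0]])
    y_train = np.array([1.0, 0.0, 1.0, 0.0])  # the outcome observed for each instance

    # "Training": a least-squares fit stands in for the learning algorithm.
    # Its entire product is the vector of learnt parameters.
    learnt_parameters, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

    print(learnt_parameters)  # the whole "digest" of the training experience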

The training dataset—the “experience” from which the system learns—is of vital importance in determining the shape that the system eventually assumes. Some further words of detail about the training dataset thus are in order.

Computer scientists describe the training dataset in terms of feature variables and outcome variables. To see how these terms are used, let us take an example of how we might train a machine to classify emails as either spam or not spam. The outcome variable in our example is the label “spam” or “not-spam.” The feature variables are the words in the email. The training dataset is a large collection of emails—together with human-annotated labels (human-annotated, because a twenty-first-century human, unlike an untrained machine, knows spam when he sees it). In the case of legal experience, the facts of a case would be described as feature variables, and the judgment would be described as an outcome variable.
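
By way of illustration only, the spam example might be set up as follows in Python, assuming the scikit-learn library; the emails and their human-annotated labels are invented:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # Training dataset: the words of each email are the feature variables,
    # the human-annotated labels are the outcome variables.
    emails = [
        "win a free prize now",
        "meeting agenda for tomorrow",
        "claim your free money today",
        "lunch on thursday?",
    ]
    labels = ["spam", "not-spam", "spam", "not-spam"]

    vectorizer = CountVectorizer()            # turns the words into numeric features
    X_train = vectorizer.fit_transform(emails)

    classifier = MultinomialNB()              # the learning algorithm
    classifier.fit(X_train, labels)           # training produces the learnt parameters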

There is a subfield of machine learning, so-called “unsupervised” machine learning, in which the dataset consists purely of feature variables without any outcome variables. In other words, the training dataset does not include human-annotated labels. The learning process then consists in finding patterns in the training dataset. Unsupervised machine learning corresponds to Holmes’s broader conception of experience as including “prevalent moral and political theories” and the whole range of factors that might shape a jurist’s learning. Classifications are not assigned to the data a priori through the decision of some formal authority. They are instead discerned in the data as it is examined.
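
A correspondingly minimal sketch of the unsupervised case, again assuming scikit-learn and using invented data, shows the absence of outcome variables: the groupings are discerned in the feature variables alone:

    import numpy as np
    from sklearn.cluster import KMeans

    # Feature variables only; there is no column of labels.
    X = np.array([[0.1, 0.2],
                  [0.2, 0.1],
                  [5.0, 5.1],
                  [5.2, 4.9]])

    model = KMeans(n_clusters=2, n_init=10, random_state=0)
    clusters = model.fit_predict(X)  # classifications discerned in the data itself
    print(clusters)                  # cluster numbers, e.g. [0 0 1 1], not given labels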

After the machine has been trained, i.e. after the machine has carried out its computations and thus arrived at learnt parameter values from the training dataset, it can be used to give answers about new cases. We present the machine at that point with new feature variables (the words in a new email, which is to say an email not found in the training dataset), and the machine runs an algorithm that processes these new feature variables together with the learnt parameters. By doing this, the machine produces a predicted outcome—in our example, an answer to the question whether that new email is to be labeled “spam” or “not-spam.” We will consider further below (in Chapter 5)6 the predictive character of machine learning, which is shared by Holmes’s idea of law.
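
Continuing the spam illustration (again with scikit-learn and invented data), the trained classifier is handed a new email and returns a predicted label by combining the new feature variables with its learnt parameters:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # A condensed version of the training step from the earlier sketch.
    emails = ["win a free prize now", "meeting agenda for tomorrow"]
    labels = ["spam", "not-spam"]
    vectorizer = CountVectorizer()
    classifier = MultinomialNB().fit(vectorizer.fit_transform(emails), labels)

    # Prediction: a new email, not found in the training dataset.
    new_email = ["free prize, claim it now"]
    X_new = vectorizer.transform(new_email)   # same feature representation as training
    print(classifier.predict(X_new))          # e.g. ['spam']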

Data, especially “big data,” is the grist for machine learning. The word data is apt. It comes from the Latin datum, “that which is given,” the past participle of dare, “to give.” The dataset used to train a machine learning system (whether or not classifications are assigned to the data in the dataset a priori) is treated as a given in this sense: the dataset is stipulated to be the “ground truth”—the source of authority, however arbitrary. A machine learning system doesn’t question or reason about what it is learning. The predictions are nothing more than statements in the following form: “such and such a new case is likely to behave similarly to other similar cases that belong to the dataset that was used to train this machine.” It was an oft-noted inclination of Holmes’s to take as a given the experience from which law’s patterns emerge.7 The central objection commonly voiced about Holmes’s legal thinking—that he didn’t care about social or moral values—would apply by analogy to the predictions derived from data. We will explore this point and its implications in Chapters 6–10 below.

In typical machine learning, the training dataset thus is assembled beforehand, the parameters are learnt, and then the trained machine is put to use. Holmes’s concept of law follows a similar path. The collected experience of society (including its written legal texts) may be likened to the training dataset. The learnt experience of a jurist may be likened to the parameter values in a machine learning system. The jurist is presented new questions, just as the machine (after training has produced the learnt parameters) is presented new feature variables, and, from both, outputs are expected.

Jurists will naturally keep accumulating experience over time, both from the cases they have participated in and from other sources. In a particular variant of machine learning, a machine likewise can undergo incremental training once it has been deployed. This is described as online learning, denoting the idea that the machine has “gone online” (i.e., become operational) and continues to train. On grounds of engineering simplicity it’s more common, so far, to train the machine and then deploy it without any capability for online learning.8
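
For the incremental variant, a hedged sketch using scikit-learn’s SGDClassifier (one of the estimators that supports incremental updates through its partial_fit method; the data is invented) might look like this:

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    model = SGDClassifier(random_state=0)

    # Initial training batch, before deployment; the set of possible classes
    # must be declared on the first incremental call.
    X_first = np.array([[0.0, 1.0], [1.0, 0.0]])
    y_first = np.array([0, 1])
    model.partial_fit(X_first, y_first, classes=np.array([0, 1]))

    # Later, once the system has "gone online", further experience arrives and
    # the learnt parameters are updated without retraining from scratch.
    X_later = np.array([[0.2, 0.9], [0.9, 0.2]])
    y_later = np.array([0, 1])
    model.partial_fit(X_later, y_later)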

There is perhaps an aspect of Holmes’s understanding of the law that does not (yet) have any counterpart in machine learning, even its online variant: a legal decision is made in anticipation of how it will be used as input for future decisions. No such anticipatory aspect is present in machine learning in its current state of the art. We will explore this idea in Chapter 9.

3.3 The Breadth of Experience and the Limits of Data

Another distinction is that the experience Holmes had in mind is considerably broader than the typical training datasets used in machine learning, and it is less structured. The machine learning system is constrained to receive inputs in simple and rigid formats. For example, a machine receives an input in the form of an image of prespecified size or a label from a prespecified (relatively) small set of possibilities; its output is an image or a label of the same form. The tasks that machine learning can handle, in the present state of the art, are those where the machine is asked to make a prediction about things that are new to the machine, but whose newness does not exceed the parameters of the data on which the machine was trained. Machine learning is limited in this respect. It is limited to data in a particular sense—data as a structured set of inputs; whereas the experience in which jurists find the patterns of law is of much wider provenance and more varied shape.

Machine learning, however, is catching up. There is ongoing research on how to incorporate broad knowledge bases into machine learning systems, for example knowledge about the world obtained from Wikipedia. Any very large and highly variegated dataset could be an eventual training source, if machine learning research reaches that goal. The case reports of a national legal system would be an example, too, of the kind of knowledge base that could be used to train a machine learning system. To the extent that computer science finds ways to broaden the data that can be used to train a machine learning system, the training dataset will come that much more to resemble Holmes’s concept of experience as the basic stuff in which the patterns are found—texts of all kinds, and experience of all kinds.

Now, we turn to finding patterns, which is to say how prediction is arrived at from the data that is given.

Notes

  1.

    Holmes (1881) op. cit. Prologue, p. xii, n. 9.

  2.

    Id.

  3.

    When writing for the Supreme Court on a question of the law of Puerto Rico, Justice Holmes reiterated his earlier idea about experience, here concluding that the judge without the experience ought to exercise restraint. The range of facts that Holmes identified as relevant is similar to the range he had identified forty years earlier in The Common Law:

    This Court has stated many times the deference due to the understanding of the local courts upon matters of purely local concern… This is especially true in dealing with the decisions of a Court inheriting and brought up in a different system from that which prevails here. When we contemplate such a system from the outside it seems like a wall of stone, every part even with all the others, except so far as our own local education may lead us to see subordinations to which we are accustomed. But to one brought up within it, varying emphasis, tacit assumptions, unwritten practices, a thousand influences gained only from life, may give to the different parts wholly new values that logic and grammar never could have gotten from the books. Diaz et al. v. Gonzalez et al., 261 U.S. 102, 105–106, 43 S.Ct. 286, 287–88 (Holmes, J.) (1923).

    Legal writers, in particular positivists, “have long debated which facts are the important ones in determining the existence and content of law.” Barzun, 69 Stan. L. Rev. 1323, 1329 (2017). Holmes’s writings support a broad interpretation of “which facts…” he had in mind, and he was deliberate when he said that it is only “[t]he theory of our legal system… that the conclusions to be reached in a case will be induced only by evidence and argument in open court, and not by any outside influence”: Patterson v. Colorado ex rel. Att’y Gen., 205 U.S. 454, 462 (1907) (emphasis added).

  4.

    Gompers v. United States, 233 U.S. 604, 610 (1914).

  5.

    See Rabban (2013) 215–68.

  6.

    Chapter 5, pp. 54–57.

  7.

    See further Chapter 10, pp. 114–119.

  8.

    Kroll et al., op. cit., n. 76, at 660, point out that online learning systems pose additional challenges for algorithmic accountability.