Abstract
Einstein’s razor, a corollary of Ockham’s razor, is often paraphrased as follows: make everything as simple as possible, but not simpler. This rule of thumb describes the challenge that designers of a legal system face: to craft simple laws that produce desired ends, but not to pursue simplicity so far as to undermine those ends. Complexity, simplicity’s inverse, taxes cognition and increases the likelihood of suboptimal decisions. In addition, unnecessary legal complexity can drive a misallocation of human capital toward comprehending and complying with legal rules and away from other productive ends. While many scholars have offered descriptive accounts or theoretical models of legal complexity, most empirical research to date has been limited to simple measures of size, such as the number of pages in a bill. No extant research rigorously applies a meaningful model to real data. As a consequence, we have no reliable means to determine whether a new bill, regulation, order, or precedent substantially affects legal complexity. In this paper, we begin to address this need by proposing an empirical framework for measuring relative legal complexity. This framework is based on “knowledge acquisition”, an approach at the intersection of psychology and computer science, which can take into account the structure, language, and interdependence of law. We then demonstrate the descriptive value of this framework by applying it to the U.S. Code’s Titles, scoring and ranking them by their relative complexity. We measure various features of a Title, including its structural size, the net flow of its intra-title citations, and its linguistic entropy. Our framework is flexible, intuitive, and transparent, and we offer this approach as a first step in developing a practical methodology for assessing legal complexity.
Notes
Recent evidence points to a potential bipartisan political constituency in favor of at least basic overtures toward simplicity. H.R. 946: Plain Writing Act of 2010 (Signed by President Obama on October 13, 2010) is designed “[T]o enhance citizen access to Government information and services by establishing that Government documents issued to the public must be written clearly, and for other purposes”.
This data set was provided by the Cornell Legal Information Institute and can be accessed at http://hula.law.cornell.edu/uscode_xml_dist/usc-xml-2010-10-28/. The United States Code features a total of fifty Titles. However, Title 34 (Navy) has been repealed. With the recent approval of Title 51 (National and Commercial Space Programs), the United States Code will once again feature a total of fifty active Titles. All code and additional replication materials are available at https://github.com/mjbommar/us-code-complexity.
End users of the Code are actors who interact directly with its text. End users include not only sophisticated parties, such as lawyers and lawmakers, but also laypersons, public interest groups, and businesses.
A “slip law” is the first print of a new law in pamphlet form, usually available 2–3 days after enactment. The Government Printing Office (GPO) offers a useful description of this process; see http://www.gpoaccess.gov/plaws/about.html.
While we do not incorporate these regulations into our analysis, we recognize that their incorporation would paint a more complete picture of the relevant legal landscape. Through a process similar to the compilation of the United States Code, federal regulations are subsequently compiled by topic in the Code of Federal Regulations (C.F.R.). Of course, administrative regulations and the United States Code are not the only sources of federal legal materials. There also exist additional materials such as judicial decisions, executive orders, revenue rulings, etc.
As Title 9 is the smallest Title in the United States Code, it allows us to clearly indicate these distinctions that would otherwise be obscured by the size of the tree for other Titles.
While Chapter 1 is explicitly labeled, the remaining Chapters are located at the same horizontal level of the hierarchy.
Since Tn is a tree as in Fig. 1, As must be |V| − 1.
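The property behind this note is the standard fact that a tree on |V| vertices has exactly |V| − 1 edges. A minimal Python sketch, using a made-up miniature hierarchy (the element names and layout are hypothetical, not the real structure of any Title), counts both quantities and shows the relationship:

```python
# Hypothetical miniature hierarchy (not the real layout of any Title):
# each key is an element, and its value holds that element's children.
toy_title = {
    "Title": {
        "Chapter 1": {"Section 1": {}, "Section 2": {}},
        "Chapter 2": {"Section 201": {}},
    }
}


def count_vertices_and_edges(tree):
    """Return (vertices, edges) for a tree stored as nested dicts."""
    vertices = 0
    edges = 0
    stack = [tree]
    while stack:
        level = stack.pop()
        for _name, children in level.items():
            vertices += 1            # count the element named by this key
            edges += len(children)   # one edge from it to each child
            stack.append(children)
    return vertices, edges


n_vertices, n_edges = count_vertices_and_edges(toy_title)
print(n_vertices, n_edges)
```

For any hierarchy stored this way, the second number is always one less than the first.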
All code and additional replication materials are available at https://github.com/mjbommar/us-code-complexity.
The online appendix can be accessed here: http://computationallegalstudies.com/measuring-legal-complexity-appendix/.
One obvious weakness of our proposed measure of linguistic complexity is its failure to capture the underlying semantics. As this is an introductory effort, we invite future work focused on this particular dimension of the question.
We acknowledge the extensive literature on text complexity (e.g., Flesch and Gould 1949; Kincaid et al. 1975; Si and Callan 2001). Much of this work, however, is directed at reading comprehension of standard sentences and paragraphs. The United States Code is a specialized document with its passages separated by the unique presentation formatting used to display statutes.
We offer this as a ceteris paribus proposition across the millions of words contained in the United States Code.
We selected tokens rather than alternative length measures, such as pages, because we believe tokens are far less likely to be affected by formatting conventions.
There exist additional potential complications. For example, in some instances, longer words are more specific and thus their use can result in less ambiguity.
It should be acknowledged that compression ratios are a common alternative to entropy measures. However, due to the large variation in compression algorithms and their implementation-specific behaviors, we felt that simple Shannon entropy was a more reproducible measure than compression.
This is the red, green, blue or RGB value. A pure black canvas has an RGB value = #000000.
In the random signal case, each pixel is assigned a random color. The pseudocode for this assignment requires a randomly generated string of numbers, where each number corresponds to an RGB value and the length of the string equals the number of pixels on the canvas.
In expectation, given an initial random assignment of pixel colors and a reasonably large canvas, there is likely to be at least some clustering of RGB values. This implies that at least some form of reduced representation is possible. However, this compression will be nominal.
In the context of message compression, the fragment “orange in of going the not large kick more end to …” does not easily lend itself to reduced form representation.
In the case of the uniform signal, the first fragment “dog” is the only new information content that is imparted to the end user. With only the first fragment and the total length of the message the signal could be quickly compressed.
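The contrast between the uniform and the random signal in the preceding notes can be made concrete with a small Python sketch. Everything here is an illustrative assumption: the 100 × 100 canvas size, the fixed random seed, and the use of the standard-library `zlib` compressor, which merely stands in for "some reduced representation" and is not the measure adopted in this paper.

```python
import random
import zlib

WIDTH, HEIGHT = 100, 100          # hypothetical canvas size
N_BYTES = WIDTH * HEIGHT * 3      # one byte per R, G, B channel

# Uniform signal: every pixel is pure black (#000000).
uniform_canvas = bytes(N_BYTES)

# Random signal: every channel of every pixel is drawn independently.
rng = random.Random(0)            # fixed seed so the run is repeatable
random_canvas = bytes(rng.getrandbits(8) for _ in range(N_BYTES))


def compression_ratio(data):
    """Compressed size divided by original size (smaller = more compressible)."""
    return len(zlib.compress(data, 9)) / len(data)


uniform_ratio = compression_ratio(uniform_canvas)
random_ratio = compression_ratio(random_canvas)
print(f"uniform: {uniform_ratio:.4f}  random: {random_ratio:.4f}")
```

The uniform canvas shrinks to a tiny fraction of its size, while the random canvas stays at roughly its original size, consistent with the observation that compression of a random signal is nominal.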
While a number of alternative and more sophisticated forms of entropy exist, the original Shannon entropy measure is the most straightforward and is still commonly used in the information science literature. Thus, for the purpose of comparing the distribution of words within Titles, we apply the Shannon entropy. For additional work on entropy see Tsallis (1988) and Rényi (1961).
Following upon common practice in the field of information retrieval and computational linguistics, we use the stopword list from the Natural Language Toolkit (NLTK) available at http://www.nltk.org/.
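As a rough illustration of the word-level Shannon entropy discussed in these notes, the following Python sketch computes H = Σ p(w) log2(1/p(w)) over the empirical word distribution after stopword removal. The tokenizer and the tiny `TOY_STOPWORDS` set are assumptions made for a self-contained example; they are hypothetical stand-ins for the NLTK list and for whatever tokenization the replication code actually uses. The varied message is only the visible part of the fragment quoted earlier.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a stopword list; the note above refers to the
# NLTK list, which is not bundled here so the sketch stays self-contained.
TOY_STOPWORDS = {"the", "of", "a", "to", "in", "and", "or", "is"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def shannon_entropy(tokens):
    """Shannon entropy (in bits) of the empirical token distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def word_entropy(text, stopwords=TOY_STOPWORDS):
    """Entropy of the text's word distribution after dropping stopwords."""
    return shannon_entropy([t for t in tokenize(text) if t not in stopwords])


uniform_message = "dog " * 12
varied_message = "orange in of going the not large kick more end to"
print(word_entropy(uniform_message), word_entropy(varied_message))
```

A message that repeats one word carries zero bits per word, whereas a message of distinct words attains the maximum, log2 of the number of distinct words that remain.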
Of course, if an element contains no citations whatsoever, then the protocol above collapses to only the first two rules. However, given many elements of the Code do contain citations, we embed this consideration into our analysis.
This “walk” is by no means a random walk. Rather, it could better be described as a special case of graph traversal. These extended citation paths can grow to be quite lengthy. The maximum path length, between 46 USC §51510 and 7 USC §87e, requires thirty-two separate steps to complete.
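One plausible way to picture such a traversal is a breadth-first search over a directed citation graph, where the path length is the number of citation hops. The sketch below is an assumption-laden toy: the section labels and edges in `toy_citations` are invented, and reading "maximum path length" as the longest shortest path between any ordered pair is an interpretation, not something this note confirms.

```python
from collections import deque

# Hypothetical citation edges: "X": ["Y"] means section X cites section Y.
toy_citations = {
    "s1": ["s2"],
    "s2": ["s3", "s4"],
    "s3": ["s5"],
    "s4": [],
    "s5": ["s1"],
}


def citation_distance(graph, source, target):
    """Number of citation hops on a shortest directed path, or None."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == target:
            return hops
        for cited in graph.get(node, []):
            if cited not in seen:
                seen.add(cited)
                queue.append((cited, hops + 1))
    return None


def longest_shortest_path(graph):
    """Largest finite citation distance over all ordered pairs of sections."""
    nodes = set(graph) | {c for cited in graph.values() for c in cited}
    distances = (citation_distance(graph, a, b) for a in nodes for b in nodes)
    return max(d for d in distances if d is not None)


print(citation_distance(toy_citations, "s1", "s5"))
print(longest_shortest_path(toy_citations))
```

Because the edges are directed, the distance from one section to another generally differs from the distance in the reverse direction, and some pairs are not connected at all.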
As an additional complication, note that when a named Act like the IRC of 1986 is cited, one must consult a short name list in order to determine where the Act was codified.
Instead, the citation graph disobeys the hierarchical or vertical tree and memorializes various horizontal connections between elements.
In the strongly connected component of the graph, there is a directed path from each vertex in the graph to every other vertex. In the weakly connected component, there is an undirected path from each vertex in the graph to every other vertex.
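The two notions of connectivity can be illustrated on a toy directed graph. The Python sketch below is a simple reachability-based approach chosen for readability rather than efficiency (it is not claimed to be the method used in the replication code): a vertex's strongly connected component is the set of vertices it can reach that can also reach it, and its weakly connected component is what it reaches once edge directions are ignored. The graph `toy` is hypothetical.

```python
def reachable(graph, start):
    """Set of vertices reachable from start by following directed edges."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def reverse(graph):
    """Graph with every edge direction flipped."""
    flipped = {node: set() for node in graph}
    for node, targets in graph.items():
        for t in targets:
            flipped.setdefault(t, set()).add(node)
    return flipped


def components(graph, strong):
    """Partition the vertices into strongly or weakly connected components."""
    back = reverse(graph)
    vertices = set(graph) | set(back)
    both = {v: set(graph.get(v, ())) | back.get(v, set()) for v in vertices}
    remaining, result = set(vertices), []
    while remaining:
        v = next(iter(remaining))
        if strong:
            comp = reachable(graph, v) & reachable(back, v)
        else:
            comp = reachable(both, v)
        result.append(comp)
        remaining -= comp
    return result


# Hypothetical graph: a, b, c form a cycle; d cites a; e is isolated.
toy = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"], "e": []}
strong = components(toy, strong=True)
weak = components(toy, strong=False)
print(sorted(map(sorted, strong)), sorted(map(sorted, weak)))
```

In the toy graph, vertex d belongs to the weak component of the cycle but not to its strong component, because nothing in the cycle cites d back.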
A given section can feature internal references to other internal provisions. For purposes of this measurement, we do not distinguish this case from the more general case of interdependence.
It is important to highlight the wide set of potential composite complexity measures that one could contemplate. The purpose of this article is to set forth some of the core components that might be contemplated in a future application.
In this case, this is akin to assigning each measure a weight of \(\frac{1}{3}\).
While mere averaging has a certain attraction, it also represents a somewhat arbitrary approach. Given that we do not have any specific theoretical grounds that justify a departure, we have chosen this naïve approach.
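To make the equal-weight idea concrete, the following Python sketch averages three per-measure ranks with weights of 1/3 each. All inputs are invented: the Title names and scores in `toy_scores` are hypothetical, and averaging ranks (rather than raw or standardized scores) is an assumption made for illustration, since these notes do not specify which quantity is averaged.

```python
# Hypothetical, already size-normalized scores for three made-up Titles.
# Tuple order: (structure, interdependence, language).
toy_scores = {
    "Title X": (0.80, 0.10, 0.60),
    "Title Y": (0.20, 0.90, 0.70),
    "Title Z": (0.50, 0.40, 0.30),
}


def ranks(values):
    """Rank 1 = largest value; ties are not handled in this sketch."""
    order = sorted(values, key=values.get, reverse=True)
    return {name: position for position, name in enumerate(order, start=1)}


def composite_rank_score(scores, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted mean of each Title's per-measure ranks."""
    per_measure = [
        ranks({name: vals[i] for name, vals in scores.items()})
        for i in range(len(weights))
    ]
    return {
        name: sum(w * r[name] for w, r in zip(weights, per_measure))
        for name in scores
    }


composite = composite_rank_score(toy_scores)
for name in sorted(composite, key=composite.get):
    print(name, round(composite[name], 3))
```

Passing a different `weights` tuple shows how sensitive the final ordering is to the weighting choice, which is the arbitrariness the note acknowledges.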
In this case, “normalization” implies that, in every component of the composite measure, the size of the Title is controlled for in one respect or another. Therefore, all measures highlighted in Table 12 feature a “per section” or some other analogous form of standardization.
Again, these are structure, interdependence and language.
References
Achen CH (1978) Measuring representation. Am J Polit Sci 22:475–510
Ansolabehere S, Snyder JM Jr, Stewart C III (2001) Candidate positioning in US House elections. Am J Polit Sci 45:136–159
Arrow KJ (1963) Social choice and individual values. Yale University Press, New Haven
Austen-Smith D, Banks JS (1996) Information aggregation, rationality, and the Condorcet jury theorem. Am Polit Sci Rev 90:34–45
Barton BH (2008) Judges, lawyers, and a predictive theory of legal complexity. University of Tennessee Legal Studies Research Paper No. 31
Bates JE, Shepard HK (1993) Measuring complexity using information fluctuation. Phys Lett A 172(6):416–425
Becker GS (1983) A theory of competition among pressure groups for political influence. Q J Econ 98(3):371–400
Bibel LW (2004) AI and the conquest of complexity in law. Artif Intell Law 12(3):159–180
Bittker BI (1974) Tax reform and tax simplification. U Miami L Rev 29:1
Black D (1948) On the rationale of group decision-making. J Polit Econ 56(1):23
Bommarito MJ II, Katz DM (2010) A mathematical approach to the study of the United States code. Physica A 389(19):4195–4200
Bommarito MJ II, Katz DM (2009) Properties of the United States code citation network. arXiv preprint arXiv:0911.1751
Bonanno C, Collet P (2007) Complexity for extended dynamical systems. Commun Math Phys 275(3):721–748
Boose JH (1989) A survey of knowledge acquisition techniques and tools. Knowl Acquis 1(1):3–37
Boose JH, Gaines BR (1990) The foundation of knowledge acquisition. Academic Press Professional, San Diego
Bose R (2002) Information theory, coding and cryptography. Tata McGraw-Hill Education
Boulet R, Mazzega P, Bourcier D (2011) A network approach to the French system of legal codes—part I: analysis of a dense network. Artif Intell Law 19(4):333–355
Bourcier D, Mazzega P (2007) Toward measures of complexity in legal systems. In: Proceedings of the 11th international conference on artificial intelligence and law. ACM, pp 211–215
Bourcier D, Mazzega P (2007b) Codification law article and graphs. In: Lodder AR, Mommers L (eds) Legal knowledge and information systems. IOS Press, pp 29–38; ISBN 978-1-58603-810-6
Buchanan JM, Tullock G (1965) The calculus of consent: logical foundations of constitutional democracy, vol 100. University of Michigan Press, Ann Arbor
Buckley JJ (1984) The multiple judge, multiple criteria ranking problem: a fuzzy set approach. Fuzzy Sets Syst 13(1):25–37
Cecil MA (1999) Toward adding further complexity to the internal revenue code: a new paradigm for the deductibility of capital losses. U Ill L Rev 1083–1139
Cimiano P, Hotho A, Staab S (2005) Learning concept hierarchies from text corpora using formal concept analysis. J Artif Intell Res 24:305–339
Cox GW, McCubbins MD (2007) Legislative leviathan: party government in the House. Cambridge University Press, Cambridge
Csiszar I (1991) Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems. Ann Stat 19(4):2032–2066
Donaldson SA (2003) Easy case against tax simplification. Va Tax Rev 22:645
Downs A (1957) An economic theory of democracy. Harper & Brothers, New York
Dworkin R (1986) Law’s empire. Harvard University Press, Cambridge
Eckenrode RT (1965) Weighting multiple criteria. Manag Sci 12(3):180–192
Einstein A (1934) On the method of theoretical physics. Philos Sci 1(2):163–169
Epstein RA (1995) Simple rules for a complex world. Harvard University Press, Cambridge
Epstein RA (2004) The optimal complexity of legal rules. Law School, University of Chicago. Olin Working Paper No. 210
Eustice JS (1989) Tax complexity and the tax practitioner. Tax L Rev 45:7
Fainmesser I, Fershtman C, Gandal N, Panunzi F (2005) A consistent weighted ranking scheme with an application to NCAA college football rankings. Centre for Economic Policy Research
Feldman DP, Crutchfield JP (1998) Measures of statistical complexity: why? Phys Lett A 238(4):244–252
Feltovich PJ, Spiro RJ, Coulson RL, Myers-Kelson A (1995) Reductive bias and the crisis of text (in the law). J Contemp Legal Issues 6:187
Ferstl EC, von Cramon DY (2007) Time, space and emotion: fMRI reveals content-specific activation during text comprehension. Neurosci Lett 427(3):159–164
Flesch R, Gould AJ (1949) The art of readable writing. Harper, New York, p 196
Flournoy A (1994) Coping with complexity. Loyola of Los Angeles Law Rev 27(3):809
Francesconi E (2011) A learning approach for knowledge acquisition in the legal domain. In: Sartor G, Casanovas P, Biasiotti M, Fernández-Barrera M (eds) Approaches to legal ontologies. Springer, Netherlands, pp 219–233
Frisch D (2011) Commercial law’s complexity. Geo Mason L Rev 18:245
Ganapathi V, Vickrey D, Duchi J, Koller D (2012) Constrained approximate maximum entropy learning of markov random fields. arXiv preprint arXiv:1206.3257
Gibbard A (1973) Manipulation of voting schemes: a general result. Econometrica 41(4):587–601
Golan A, Judge G, Perloff J (1997) Estimation and inference with censored and ordered multinomial response data. J Econom 79(1):23–51
Halford GS, Busby J (2007) Acquisition of structured knowledge without instruction: the relational schema induction paradigm. J Exp Psychol Learn Mem Cogn 33(3):586
Hamming RW (1986) Coding and information theory. Prentice-Hall, Englewood Cliffs
Harsanyi JC (1955) Cardinal welfare, individualistic ethics, and interpersonal comparisons of utility. J Polit Econ 63:309–321
Holsapple CW, Raj V, Wagner WP (2008) An experimental investigation of the impact of domain complexity on knowledge acquisition (KA) methods. Expert Syst Appl 35(3):1084–1094
Iria J (2009) A core ontology of knowledge acquisition. In: Aroyo L, Traverso P, Ciravegna F, Cimiano P, Heath T, Hyvönen E, Mizoguchi R, Oren E, Sabou M, Simperl E (eds) The semantic web: research and applications. Springer, Berlin, pp 233–247
Jaynes ET (1957) Information theory and statistical mechanics. Phys Rev 106(4):620
Kades E (1997) Laws of complexity and the complexity of laws: the implications of computational complexity theory for the law. Rutgers L Rev 49:403
Kaplow L (1995) A model of the optimal complexity of legal rules. J Law Econ Organ 11:150
Kincaid JP, Fishburne RP Jr, Rogers RL, Chissom BS (1975) Derivation of new readability formulas (automated readability index, fog count and flesch reading ease formula) for navy enlisted personnel (No. RBR-8-75). Naval Technical Training Command Millington TN Research Branch
Kintsch W, Van Dijk TA (1978) Toward a model of text comprehension and production. Psychol Rev 85(5):363
Kirman AP, Zimmermann JB (2001) Economics with heterogeneous interacting agents, vol 503. Springer, Heidelberg
Kolmogorov AN (1965) Three approaches to the quantitative definition of information. Probl Inf Transm 1(1):1–7
Koppelman SA (1989) At-risk and passive activity limitations: can complexity be reduced. Tax L Rev 45:97
Lall A, Sekar V, Ogihara M, Xu J, Zhang H (2006) Data streaming algorithms for estimating entropy of network traffic. ACM SIGMETRICS Perform Eval Rev 34(1):145–156. ACM
Landauer R (1988) A simple measure of complexity. Nature 336:306–307
Landauer R (1996) The physical nature of information. Phys Lett A 217(4):188–193
Lazer D, Pentland AS, Adamic L, Aral S, Barabasi AL, Brewer D, Van Alstyne M (2009) Life in the network: the coming age of computational social science. Science (New York, NY) 323(5915):721
Lloyd S, Pagels H (1988) Complexity as thermodynamic depth. Ann Phys 188(1):186–213
Long SB, Swingen JA (1987) An approach to the measurement of tax law complexity. J Am Tax Assoc 8(2):22–36
Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, Cambridge
Mazzega P, Bourcier D, Bourgine P, Nadah N, Boulet R (2011) A complex-system approach: legal knowledge, ontology, information and networks. In: Sartor G, Casanovas P, Biasiotti M, Fernández-Barrera M (eds) Approaches to legal ontologies. Springer, Netherlands, pp 117–132
McCaffery EJ (1990) Holy grail of tax simplification. Wis L Rev 1267–1322
McKelvey RD (1976) Intransitivities in multidimensional voting models and some implications for agenda control. J Econ Theory 12(3):472–482
McKelvey RD (1986) Covering, dominance, and institution-free properties of social choice. Am J Polit Sci 30:283–314
Mitchell M (2009) Complexity: a guided tour. Oxford University Press, Oxford
Nigam K, Lafferty J, McCallum A (1999) Using maximum entropy for text classification. In: IJCAI-99 workshop on machine learning for information filtering, vol 1, pp 61–67
Ohm P (2009) Computer programming and the law: a new research agenda. Vill L Rev 54:117
Ostrom E (1990) Governing the commons: the evolution of institutions for collective action. Cambridge University Press, Cambridge
Pagallo U (2010) As law goes by: topology, ontology, evolution. In: Casanovas P, Pagallo U, Sartor G, Ajani G (eds) AI approaches to the complexity of legal systems. complex systems, the semantic web, ontologies, argumentation, and dialogue. Springer, Berlin, pp 12–26
Page SE (2008) Uncertainty, difficulty, and complexity. J Theor Polit 20(2):115–149
Paul DL (1997) Sources of tax complexity: how much simplicity can fundamental tax reform achieve. NCL Rev 76:151
Phelan DR (2009) The effect of complexity of law on litigation strategy. In: Masson A, Shariff MJ (eds) Legal strategies. Springer, Berlin, pp 335–351
Pitt MM, Slemrod J (1989) The compliance cost of itemizing deductions: evidence from individual tax returns. Am Econ Rev 79:1224–1232
Pollock E, Chandler P, Sweller J (2002) Assimilating complex information. Learn Instr 12(1):61–86
Poole KT, Rosenthal H (1991) Patterns of congressional voting. Am J Polit Sci 35:228–278
Quade D (1979) Using weighted rankings in the analysis of complete blocks with additive block effects. J Am Stat Assoc 74:680
Rényi A (1961) On measures of entropy and information. In: Fourth Berkeley symposium on mathematical statistics and probability, pp 547–561
Riker WH (1962) The theory of political coalitions, vol 578. Yale University Press, New Haven
Rook LW (1993) Laying down the law: canons for drafting complex legislation. Or L Rev 72:663
Rothkopf MH, Pekeč A, Harstad RM (1998) Computationally manageable combinational auctions. Manag Sci 44(8):1131–1147
Ruhl JB (2008) Law’s complexity: a primer. Ga St UL Rev 24:885
Sanderson M, Croft B (1999) Deriving concept hierarchies from text. In: Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval, ACM, pp 206–213
Schenk DH (1989) Simplification for individual taxpayers: problems and proposals. Tax L Rev 45:121
Schennach SM (2005) Bayesian exponentially tilted empirical likelihood. Biometrika 92(1):31–46
Schnotz W, Kürschner C (2008) External and internal representations in the acquisition and use of knowledge: visualization effects on mental model construction. Instr Sci 36(3):175–190
Schuck PH (1992) Legal complexity: some causes, consequences, and cures. Duke Law J 42:1–52
Schuck PH (2000) The limits of law. Westview Press, Boulder
Sen A (1970) Collective choice and social welfare. Holden Day, San Francisco
Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27:379–423
Shannon CE (1951) Prediction and entropy of printed English. Bell Syst Tech J 30(1):50–64
Shoham Y, Leyton-Brown K (2009) Multiagent systems: algorithmic, game-theoretic, and logical foundations. Cambridge University Press, Cambridge
Si L, Callan J (2001) A statistical model for scientific readability. In: Proceedings of the tenth international conference on Information and knowledge management. ACM, pp 574–576
Slemrod J (2005) The etiology of tax complexity: evidence from US state income tax systems. Public Financ Rev 33(3):279–299
Slemrod JB, Blumenthal M (1996) The income tax compliance cost of big business. Public Financ Rev 24(4):411–438
Soofi ES (2000) Principal information theoretic approaches. J Am Stat Assoc 95(452):1349–1353
Spiro RJ, Jehng JC (1990) Cognitive flexibility and hypertext: theory and technology for the nonlinear and multidimensional traversal of complex subject matter. Cogn Educ Multimed Explor Ideas High Technol 163–205
Stoop R, Stoop N, Bunimovich L (2004) Complexity of dynamics as variability of predictability. J Stat Phys 114(3–4):1127–1137
Surrey SS (1969) Complexity and the internal revenue code: the problem of the management of tax detail. Law Contemp Probl 34:673–710
Sweller J, Chandler P (1994) Why some material is difficult to learn. Cogn Instr 12(3):185–233
Tang A, Jackson D, Hobbs J, Chen W, Smith JL, Patel H, Beggs JM (2008) A maximum entropy model applied to spatial and temporal correlations from cortical networks in vitro. J Neurosci 28(2):505–518
Tress W (2009) Lost laws: what we can’t find in the United States code. Golden Gate UL Rev 40:129
Tsallis C (1988) Possible generalization of Boltzmann–Gibbs statistics. J Stat Phys 52(1–2):479–487
Tukey JW (1957) Sums of random partitions of ranks. Ann Math Stat 23:987–992
Tullock G (1995) On the desirable degree of detail in the law. Eur J Law Econ 2(3):199–209
Weingast BR, Marshall WJ (1988) The industrial organization of Congress; or, why legislatures, like firms, are not organized as markets. J Polit Econ 96:132–163
White MJ (1992) Legal complexity and lawyers’ benefit from litigation. Int Rev Law Econ 12(3):381–395
Wright RG (2000) Illusion of simplicity: an explanation of why the law can’t just be less complex. Fla St UL Rev 27:715
Cite this article
Katz, D.M., Bommarito, M.J. Measuring the complexity of the law: the United States Code. Artif Intell Law 22, 337–374 (2014). https://doi.org/10.1007/s10506-014-9160-8
DOI: https://doi.org/10.1007/s10506-014-9160-8