Fuld (2000) argued that function allocation ‘is a useful theory but not a practical method’ (p. 231). We point out that the very purpose of science is theory development and the organisation of knowledge in the form of testable explanations, rather than the provision of practical methods. In this article, we regard the Fitts list as a scientific theory; from this perspective, its aim is to explain (or predict) allocation-of-function decisions already made, not to guide engineering decisions. A function allocation theory should have broad generalisability, applying to a rich variety of real human–machine systems, while at the same time accurately describing which functions should be (or currently are) allocated to human and machine.
Below, we invoke philosophy of science to help in judging the adequacy of the Fitts list as a scientific theory, and to explain why the Fitts list has been such a persistent factor throughout the history of function allocation. Three of the most commonly used axiological values in evaluating the appropriateness of scientific theories are precision, generality, and simplicity (Cutting 2000; Popper 1959; Speekenbrink 2005), although other values are regularly included as well. For example, Kuhn (1977) listed five criteria, sometimes designated as ‘The Big Five’: accuracy, consistency, scope, simplicity, and fruitfulness. In this article, we use a similar, but more comprehensive, set of criteria for appraising theories (models), which was composed for the cognitive sciences. This set was originally proposed by Jacobs and Grainger (1994), and later adapted by Pitt et al. (2002):
(a) Plausibility: Are the assumptions of the model plausible?
(b) Explanatory adequacy: Is the theoretical explanation reasonable and consistent with what is known?
(c) Interpretability: Do the model and its parts make sense? Are they understandable?
(d) Simplicity: Does the model capture the phenomenon in the least complex manner?
(e) Descriptive adequacy: Does the model provide a good description of observed data?
(f) Generalisability: Does the model predict well the characteristics of new, as yet unobserved data?
The Fitts list can be argued to fulfil these criteria for appraising scientific models, each of which is addressed below.
Plausibility
The Fitts list makes various simplifying assumptions. It does not model the psychological needs of the human (affective and emotional requirements, job satisfaction, motivation, fatigue, stress, working under time pressure), temporal effects (learning, contextual variations), individual differences, safety, economic utility, availability, maintainability, the rapid evolution of technology, social values, the iterative design process, task complexity and the interconnectedness between functions, or the organisational and cultural context (e.g. Chapanis 1965; Clegg et al. 1989; Drury 1994; Greenstein and Lam 1985; Hancock and Scallen 1996; Price 1985; Sanders and McCormick 1987). Furthermore, the possibility that there will be tasks that neither machines nor humans can do well, or that both can do equally well, is ignored (Clegg et al. 1989; Price 1985). The fact that the Fitts list does not take dynamic allocation into account has also been pointed out by many: ‘Frustration with the MABA–MABA approach led to a very simple insight. Why should function, tasks, etc. be strictly allocated to only one performer? Aren’t there many situations where either human or computer could perform a task acceptably? …This insight led to identification of the distinction between static and dynamic allocation of functions and tasks’ (Rouse 1994, p. 29, as quoted by Inagaki 2003). Hancock and Scallen (1996) argued against this acontextuality of the list by stating that ‘at all points in the design process, the allocation problem is chronically underspecified. That is, there is never sufficient knowledge of the situation so that all tasks can be described in Fitts-like terms and apportioned respectively’ (p. 27).
Researchers dissatisfied with the general nature of the Fitts list have proposed extended and fine-grained models of function allocation. Today, numerous fine-grained function allocation models can be found in the literature (for overviews, see Older et al. 1997; Parasuraman et al. 2000), including variations and extensions of the Fitts list (Bekey 1970; Chapanis 1960; Ip et al. 1990; Sanders and McCormick 1987; Swain and Guttman 1980; US Department of Defense 1987), qualitative or quantitative multi-criteria analyses (Meister 1987; Papantonopoulos 2001), expected value analyses (Sheridan and Parasuraman 2000), flow charts to assist in the design process (Malone and Heasly 2003), mental workload analyses and psychophysiological techniques (Hancock and Chignell 1988; Pope et al. 1995; Prinzel et al. 2003; Reising and Moss 1986; Wei et al. 1998), intent inferencing models (Geddes 1985; Govindaraj and Rouse 1981; see Parasuraman et al. 1992 for an overview), cognitive models (Corker et al. 1997; Degani et al. 1999; see Parasuraman 2000 for an overview), network optimisation (Shoval et al. 1993), and queuing theory (Chu and Rouse 1979; Rouse 1977; Wu et al. 2008). The focus has shifted towards dynamic task allocation (Byrne and Parasuraman 1996; Debernard et al. 1992; Greenstein and Lam 1985; Hancock and Scallen 1996; Kantowitz and Sorkin 1987; Parasuraman et al. 1996; Rencken and Durrant-Whyte 1993; Rieger and Greenstein 1982; Rouse 1988; Scerbo 2007; Sharit 1996), and it is increasingly recognised that automation is not a zero-sum game but can be designed for different levels of human or machine authority and for different processing stages, such as information acquisition, information analysis, decision making, and action implementation (Endsley and Kaber 1999; Parasuraman et al. 2000).
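To give the flavour of one such quantitative approach, an expected value analysis in the spirit of Sheridan and Parasuraman (2000) can be sketched as follows (a schematic rendering in our own notation, not the authors’ exact formulation): a function is allocated to the machine when the expected value of automated performance exceeds that of human performance,

\[
E[V_{\text{auto}}] = \sum_{o} P(o \mid \text{automation})\, U(o) \;>\; \sum_{o} P(o \mid \text{human})\, U(o) = E[V_{\text{human}}],
\]

where the sum runs over the possible outcomes o (e.g. a correctly handled failure, a missed failure, a false alarm), P(o | ·) is the probability of outcome o under each allocation, and U(o) is the utility or cost of that outcome.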
Function allocation models are often evaluated in terms of the number of requirements fulfilled (cf. Older et al. 1997). Consequently, it is tempting to increase a model’s complexity so that it captures greater variability, for example by including dynamic allocation, trade-offs, and iterative design. However, the fine-grained function allocation models specified above tend to be restricted in scope. In contrast to the Fitts list, they address specific areas, such as when to switch between human and machine as a function of human workload and task accuracy, and specific applications, for example, the ground collision avoidance system tested on fighter aircraft (Hardman et al. 2009). Our observations here are in line with a review of quantitative models in automation by Parasuraman (2000), which concluded that ‘the price of quantification may be a reduction in generality’ (p. 945).
Furthermore, many of the newly proposed function allocation models have limited validity, having been evaluated in laboratory environments only. As Hollnagel and Cacciabue (1999) rightly pointed out, it is necessary to stay in touch with reality: ‘Investigations that are driven by laboratory and experimental concerns all too easily end up by looking at phenomena that are derived from the theories and models alone. While such investigations may be valuable to determine whether the theories are good theories, in the sense that they can be used to make predictions, they do little to determine whether the theories are valid, i.e. whether they are about real phenomena’ (p. 5). Importantly, several quantitative function allocation models require complicated calculations even for simple tasks, and do not take the contextual reality into consideration (Parasuraman et al. 1992). For example, Wu et al. (2008) used a queuing network-model human processor to dynamically control the delay times between messages of in-vehicle systems presented to car drivers. Their approach relied on intricate calculations from a cognitive model to provide a numeric estimate of human workload as a function of age, speed, and curvature of the road, as well as a message controller determining optimal delay times between messages. Although their approach provides a precise quantitative estimate of workload, it can be questioned whether their calculations remain valid outside the laboratory environment, where drivers are subjected to many environmental influences.
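For illustration only, the general idea behind such workload-adaptive message scheduling can be sketched in a few lines of Python. This is a deliberately simplistic toy model (the function names, weights, and threshold are our own assumptions), not Wu et al.’s queuing network-model human processor:

```python
# Toy sketch of workload-adaptive message scheduling (illustrative only;
# all weights and the threshold are assumptions, not Wu et al.'s model).
from collections import deque


def estimated_workload(speed_kmh, curvature, driver_age):
    """Return a crude workload index in [0, 1] from the driving context."""
    w = 0.004 * speed_kmh + 3.0 * curvature + 0.003 * max(driver_age - 30, 0)
    return min(w, 1.0)


class MessageController:
    """Hold in-vehicle messages while estimated driver workload is high."""

    def __init__(self, threshold=0.6):
        self.threshold = threshold
        self.queue = deque()

    def submit(self, message):
        self.queue.append(message)

    def poll(self, speed_kmh, curvature, driver_age):
        """Release queued messages only when workload drops below threshold."""
        if estimated_workload(speed_kmh, curvature, driver_age) >= self.threshold:
            return []  # hold back: driver is estimated to be overloaded
        released = list(self.queue)
        self.queue.clear()
        return released


controller = MessageController()
controller.submit("Low fuel")
print(controller.poll(speed_kmh=120, curvature=0.02, driver_age=55))  # [] (held)
print(controller.poll(speed_kmh=50, curvature=0.0, driver_age=55))    # ['Low fuel']
```

A real system would, of course, derive the workload estimate from a validated cognitive model rather than from a hand-tuned linear rule; the point of the sketch is only to show how dynamic allocation conditions machine behaviour on an estimate of the human’s state.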
We argue that it is illusory and objectionable to expect that a scientific model should capture all the variables described above. Function allocation models should not gain credence merely because they include so many variables that any possible case can be described. The inappropriate tendency of researchers to strive for perfect-fit models has also been recognised by Roberts and Pashler (2000): ‘The use of good fits as evidence is not supported by philosophers of science nor by the history of psychology; there seem to be no examples of a theory supported mainly by good fits that has led to demonstrable progress’ (p. 358). Scientific models are always imperfect to a certain degree in their attempt to maintain predictive validity and to parsimoniously capture the phenomenon of interest (e.g. MacCallum 2003).
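The statistical point behind Roberts and Pashler’s argument can be made tangible with a minimal numerical sketch (our own illustration, assuming a noisy linear process): a sufficiently flexible model fits any observed data perfectly, yet tends to generalise worse than a simpler one.

```python
# Numerical illustration of the good-fits argument (our own toy example):
# a 10-parameter polynomial fits 10 noisy points essentially perfectly,
# but is usually erratic between the observed points, so a plain line
# generalises better to new observations of the same process.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = 2 * x + rng.normal(0, 0.3, x.size)        # true process: a noisy line

line = np.polyfit(x, y, deg=1)                # simple model, 2 parameters
wiggle = np.polyfit(x, y, deg=9)              # flexible model, 10 parameters

x_new = np.linspace(0.05, 0.95, 10)           # new, as yet unobserved data
y_new = 2 * x_new + rng.normal(0, 0.3, x_new.size)

for name, coef in (("line", line), ("deg-9 poly", wiggle)):
    fit_mse = np.mean((np.polyval(coef, x) - y) ** 2)
    gen_mse = np.mean((np.polyval(coef, x_new) - y_new) ** 2)
    print(f"{name}: fit MSE = {fit_mse:.3f}, generalisation MSE = {gen_mse:.3f}")
```

The flexible model wins on descriptive adequacy (criterion e) but loses on generalisability (criterion f), which is precisely the trade-off at issue for heavily parameterised function allocation models.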
What is important in terms of scientific adequacy is whether the assumptions made by a model are plausible. The assumptions of the Fitts list are plausible because they have managed to capture the most important regularity of automation: if the machine surpasses the human, the function must be automated; if not, it does not make sense to automate. The Fitts list states that the primary (but not necessarily the only) driving force behind automation should be performance: precision, power, speed, cost. These are factors that Sheridan (2004) called ‘the obvious advantages of automation’ (p. 163), while Wickens (1992) similarly explained that the purpose of automation is improving performance, namely: ‘performing functions that the human operator cannot perform because of inherent limitations … performing functions that the human operator can do but performs poorly or at the cost of a high workload … augmenting or assisting performance in areas in which humans show limitations’ (pp. 531–532).
Explanatory adequacy
The Fitts list is internally consistent in the sense that its 11 statements are diverse and non-contradictory. The Fitts list has a solid theoretical basis because it was developed ‘on the basis of what psychologists know at the present time about the limiting characteristics of human capacity and performance’ (p. 5), including overload, stress, fatigue, inattention, boredom, and short-term memory, and it used an information-processing approach (or communication theory in the terms used in the report), a dominant paradigm within cognitive psychology and human factors research (Proctor and Vu 2010). Even some of the strongest critics of the Fitts list recognised that comparing human and machine is, in theory, an elegant solution to the allocation of functions and that ‘the facts to be found in all the existing versions of the Fitts list are all correct’ (Jordan 1963, p. 162).
The rejection of comparability
A number of researchers have criticised the theoretical foundations of the Fitts list by arguing against its elementaristic (atomistic, reductionistic, materialistic, mechanistic or information processing) character that forces a description of humans based on machine capabilities and human limitations. They have suggested that the Fitts list implies separation and comparability of human and machine, and that complementarity is what is important instead (Campbell and Essens 1996; Goom 1996; Fallon 2006; Jordan 1963; Hoffman et al. 2002; Hollnagel and Bye 2000; Kantowitz and Sorkin 1987). As Hancock (2009) noted: ‘For a variety of reasons, although this endeavour is well intentioned, this bipartite approach is unlikely to succeed either in principle or in practice. In principle it is a fallacious approach since it acts to dichotomize human and machine in the very instances where the human–machine linkage should be the unit of concern’ (p. 100).
A growing chorus of researchers favouring the complementarity viewpoint have turned to theories that appraise overall function congruence and function matching with the aim of fulfilling higher-order commitments such as maintaining control and resilience (Dekker and Hollnagel 1999; Dekker 2011; Hollnagel 2004; Hollnagel et al. 2006; McCarthy et al. 2000). The focus of these theories is on the complexity and emergent behaviour of systems and on the importance of reciprocal relationships and complementarity (as well as joint work, teamwork, team play, partnership, cooperation, collaboration, joint performance, respect or symbiosis) between human and machine (Bye et al. 1999; Christoffersen and Woods 2002; Dekker 2011; Downs et al. 1988; Grote et al. 1995; Hancock 1993; Hoc 2001; Leveson 2004; Malin et al. 1991). A more extreme form of these theories entails the complete rejection of the notion of an a priori allocation of functions. In a series of articles, Dekker and Woods (2002), Dekker and Hollnagel (2004) and Hollnagel and Woods (2005) rejected function allocation completely, and the Fitts list in particular, on the grounds that it relies on the so-called ‘substitution myth’ (a term originally proposed by Sarter et al. 1997, p. 1) and the ‘false idea that people and computers have fixed strengths and weaknesses’ (Dekker and Woods 2002, p. 241). They argued that ‘capitalizing on some strength of computers does not replace a human weakness. It creates new human strengths and weaknesses—often in unanticipated ways’ (Dekker 2005, p. 162). Dekker and Woods recommended that ‘system developers abandon the traditional ‘who does what’ question of function allocation’ (p. 243) and consider how to turn automated systems into effective ‘team players’ that coordinate work. These provocative commentaries by Dekker and others represent the apex of a move away from human-in-the-loop control and borrowed engineering models towards supervisory and cognitive control of increasingly complex systems (Hancock 2009; Hollnagel and Cacciabue 1999; Sheridan 2000, 2004). As explained by McCarthy et al. (2000), the field has seen a ‘shift from a reductionist separation of qualitatively different humans and machines, to an attempt at their integration in socio-technical systems and other systemic approaches’ (p. 198). This ‘giant swing away from simpler human functions used with proceduralized equipment to much more complex cognitive enterprises’ (Meister 1999, p. 222) is driven by a raft of new technologies (Byrne and Gray 2003), in particular the computer, which have changed the role of human operators from manual control to monitoring and directing of automation (Sheridan 2004).
Theories focusing on complementarity are undoubtedly useful because they provide broad insight into the variables that need to be considered in an iterative multivariate design process. However, they do not provide explicit answers as to whether a function should be automated or not. They are also relatively immune to scientific scrutiny, as they cannot be compared in terms of goodness of fit and the degree of falsifiability. In response to Dekker and colleagues, Lintern (in press) argued that abandoning a concrete interest in function allocation cannot be taken seriously if one wants to engage with engineers and other design communities. Sheridan (2004) has also pointed out that the meaning of human–machine cooperation is yet to be worked out in terms useful to humans. It is noteworthy that the authors of the Fitts report already acknowledged the importance of a systems approach, but also recognised the criterion problem, and that a reductionist strategy is required: ‘Requirements such as safety and efficiency define the goal, or ultimate criteria, for which the system is designed. However, the researcher usually cannot deal directly with ultimate criteria but must seek intermediate or proximate indices-of-merit for various parts of the system’ (p. xii).
The paradox of comparability
Some researchers have attempted to invalidate the Fitts list on theoretical grounds by pointing out that when human functions are described in mechanical terms, it is in principle always possible to build a machine that could perform them more efficiently than the human (Hancock and Scallen 1996). This inescapably leads to the design philosophy to ‘design the man out of the system’ (Jordan 1963, p. 162). As Jordan further explained, ‘to the extent that man becomes comparable to a machine we do not really need him any more since he can be replaced by a machine’ (p. 162), and as Reason (1987) put it, ‘the credibility of Fitts List foundered on a simple paradox: If a task could be described exactly (i.e. in mathematical terms), then a machine should perform it; if not, it could only be tackled using the ill-defined flexibility of a human being’ (p. 468). Ironically, ahead of his critics, Fitts (1962) had already recognised the same paradox: ‘If we understand how a man performs a function, we will have available a mathematical model which presumably should permit us to build a physical device or program a computer to perform the function in the same way (or in a superior manner). Inability to build a machine that will perform a given function as well as or better than a man, therefore, simply indicates our ignorance of the answers to fundamental problems of psychology’ (p. 34).
It can be argued that the paradox is fallacious, since the Fitts report (1951) explicitly acknowledged that humans surpass machines in aspects that are uniquely human and cannot be described mechanistically. For example, it was stated that ‘automatic computers are superior in speed and accuracy to human brains in deductive reasoning, but no success has been attained in constructing a machine which can perform inductive reasoning’ (Fitts 1951, p. 8), and that ‘human engineering, if it is to escape the dilemma of the old time and motion study engineering, must guard against exclusive use of the ‘machine’ model in its theory of human behavior’ (Fitts 1951, p. v, quote from T Gordon in the editorial foreword by MS Viteles). In other words, the criticism that the Fitts list implies that technology determines the language of attributes (Dekker and Woods 2002) or that ‘technology (with the right capabilities) can be introduced as a simple substitution of machines for people’ (Woods 2002, p. 15) is false, precisely because the unique heuristic capabilities of humans are such a central theme of the list.
Interpretability and simplicity
The comprehensibility of the Fitts list is perhaps one of the key reasons behind its success. It does not contain complex equations, interconnected functions, or other forms of complexity. According to Sheridan (2004), ‘no other allocation model has replaced it in terms of simplicity and understandability’ (p. 60). The only simpler function allocation formulations we could find were: ‘humans should be left to deal with the ‘big picture’, while the computer copes with the details’ (Sheridan 1997, p. 91), and ‘men are flexible but cannot be depended upon to perform in a consistent manner whereas machines can be depended upon to perform consistently but they have no flexibility whatsoever’ (Jordan 1963, p. 163), both representing the Fitts list in a reduced form.
Descriptive adequacy
The categorisation in the Fitts list is qualitative (not numeric), but it indicates the direction of the effect for specific human–machine comparisons. It is therefore more specific than many other function allocation methods, such as flow charts, which mention variables that should be taken into consideration but do not provide explicit answers with respect to what to automate and what not to automate.
The predictions of the Fitts list are in line with empirical data about how automation is usually implemented in actual human–machine systems, such as in aviation, robotics, and car driving. Indeed, ‘in present systems, the machines (computers) usually take care of data acquisition and automatic controls, whereas the operators are left with the tasks of state identification, diagnosis, planning and decision making’ (Hollnagel and Cacciabue 1999, p. 3), and this allocation is so embedded in our modern-day thinking that it can be regarded as obvious (Sheridan 2004). As Sheridan and Verplank (1978) stated over 30 years ago (see also Sheridan 2004), it is the lowest-entropy tasks in particular (routine, repetitive tasks) that are automated, whereas the high-entropy tasks are left to the human operator, which is in agreement with the Fitts list. This automation principle was already discussed in the Fitts report: ‘In general, machines excel humans in the kinds of things we have already turned over to them in our society—especially tasks requiring great strength, and tasks of a very routine nature’ (p. 8).
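As an illustrative gloss on the entropy argument (our notation, not Sheridan and Verplank’s): if the situations s that a task can present are modelled as a discrete probability distribution p(s), the task’s Shannon entropy is

\[
H = -\sum_{s} p(s) \log_2 p(s),
\]

which is low for routine, repetitive tasks (probability mass concentrated on a few recurring situations) and high for tasks dominated by rare, unanticipated situations. The observed pattern is that the former are automated while the latter are left to the human.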
Generalisability
The Fitts list applies to a range of different functions, both physical and mental. Furthermore, and more arguably, the Fitts list is generalisable over time. When it was published in 1951, there were few computers (note the vacuum tube in Fig. 2) and the human factors discipline had only recently been established. Some have argued that the list is no longer valid because machines have surpassed humans in many categories not mentioned in the original Fitts report (Chapanis 1965; Kantowitz and Sorkin 1987; Parasuraman et al. 2008). Indeed, computers have become a billion times faster since 1951 (e.g. NUDT’s Tianhe-1A, with a speed of 2.5 petaflops, deployed in 2010, and IBM’s Sequoia, with 20 petaflops, expected later this year; see also Moore’s law; Kurzweil 2005; Moore 1965), and respond to signals on sub-picosecond time scales (e.g. optical gates, Hulin et al. 1986; atomic clocks with an accuracy of 1 ns per day). Inductive reasoning has been introduced into computers in the form of machine learning, and statistical prediction by computers competes with human judgement (Grove et al. 2000). Computers now surpass humans in various perceptual and cognitive activities under certain circumstances, including playing chess, face recognition (O’Toole et al. 2009), lip-reading (Hilder et al. 2009), and answering basic knowledge questions (Ferrucci 2010).
Despite all these developments, the promises of strong artificial intelligence set forth in the 1960s have not been fulfilled. In highly automated systems, the role of the human is to keep track of the bigger picture by perceiving patterns, reasoning inductively, and improvising (Sheridan 2004), which is in accordance with the Fitts list. Even in aviation, one of the most automated disciplines (Sheridan 2004, p. 14), it is anticipated that the role of the human pilot will remain important for the foreseeable future (Mulder 2009). This is in line with what the Fitts report predicted 60 years ago: ‘It appears likely, that for a good many years to come, human beings will have intensive duties in relation to air navigation and traffic control’ (Fitts 1951, p. 11).