The Society of Automotive Engineers distinguishes between five levels of automation with different degrees of autonomy (Fig. 1). With the basic driver support features of levels 1 and 2, ADAS systems can accomplish limited tasks, although the driver must constantly supervise the car: at level 1 in what is called hands-on mode, at level 2 in hands-off mode. Advanced automation starts at level 3, in which the car is able to regulate its behavior autonomously in specific situations in eyes-off mode. At level 4, in mind-off mode, the driver can focus on something else, and at level 5 there is no need for a driver at all; the steering wheel becomes optional. The degrees of automation in today's cars range from collision avoidance systems or lane assistants at levels 1 and 2 to recent prototypes of autopilots entailing a suite of interconnected ADAS systems for highway usage at level 2 or 3, and the first driverless shuttles at level 4 being operated commercially by Waymo in strictly geofenced areas in Arizona and California. Whether or not the use of fully autonomous vehicles (level 4 or 5) will ever be viable in all areas of the world is a matter of debate, and since 2019 more and more companies have begun to withdraw their market predictions as the complexities of the task have become clearer (Stilgoe 2020). Nonetheless, even if autonomous systems are not (yet) capable of totally replacing the driver, the relevant hardware is already installed in millions of cars and new software is constantly being developed. Semi-autonomous cars, which are already being produced by almost all manufacturers, generate enormous amounts of data due to the multitude of sensors they possess, and they require powerful processors that handle these data in real time. Even recent car models at lower levels of automation are sophisticated computer systems.
To gain an understanding of autonomous cars, it no longer suffices to view them as vehicles of traffic that serve the demands of transportation. They are not only combustion or electric engines but also media complexes, supercomputers with a host of different interfaces, highly developed adaptive systems, machines for data processing, and context-sensitive environmental technologies. Equipped with machine learning, they are able to react to the requirements of their environment and the behavior of passengers as well as that of passers-by; they both anticipate possible events and project estimates into the future by modeling their surroundings in different degrees of granularity. Machine learning is one element of this process, but given the recent hype, it is important to keep in mind that the process of generating an alternativity for decisions does not depend upon machine learning.
All ADAS systems and their complex interrelations are meant to optimize interactions with the environment, but the environment in which an autonomous system operates is never a given, isomorphic representation of the world. Rather, at least for more complex combinations of ADAS systems at higher levels of automation, the environment is a fragmentary and operational model that has been created by a specific correlation of sensors with different capabilities, filter algorithms that analyze sensor data, processes of machine learning that optimize pattern recognition, and operating decision modules. By focusing on this intersection of different technologies, this paper explores the epistemological constellation in which the autonomous car's environment is constituted by processes of world modeling that, as is shown below, incorporate probabilities that are fundamental for microdecisions. Microdecisions are decisions within and for the model, even if their results can be perceived as actions of the car. But it is not possible to deduce from observation the process leading to the action. It is necessary to describe microdecisions on a different scale and with reference to their microtemporality.
In the decision modules of autonomous cars, information from different sources is collected. Use is made of not only algorithmically filtered sensor data about the environment but also odometric data about speed and acceleration, localization, routing, maps, traffic information, etc. This module is tasked with merging this information and extracting decisions that meet requirements of safety, security, and reliability. Due to the heterogeneity of these data, formalizing this process is extremely challenging. At the core of these technologies lies a mutual dependency of decisions, probabilities, and the virtuality of world models. This dependency is central to how these technologies transform the world in which we live. For this reason, it is important to understand at least the basic technological procedures employed here, instead of taking them for granted, and consequently to go into some details that might initially seem obscure. This 'close reading of technology' is not meant to imply any kind of determinism. It is justified because it demonstrates that the core of these technologies cannot be understood with a solely technical vocabulary. Autonomous technologies bring forth a new, yet-to-be-analyzed and fundamentally political constellation of microtemporalities and environmental spaces, of probabilities, data, and decisions.
Within the complex and manifold interrelations of the technical elements that constitute an autonomous car, microdecisions may have different functions. As the following example shows, modules for decision-making can be specifically assigned the task of mediating between the environment, the car's reactions, and its sensory input. Decision-making and algorithmic systems installed in cars are, of course, proprietary and not accessible to the public. To provide an overview of their functionality, I will refer to a research project called xDriver, developed by Zdzisław Kowalczuk and Michał Czubenko at the Gdańsk University of Technology, Poland (2011). This project, which consists of an algorithmic setup for a self-driving car, is not representative of the many different approaches to algorithmic decision-making in environmental technologies, especially because it tries to integrate affective computing into the autonomous system. It can, however, serve as an example of the general structure of decision algorithms. The project's main research objective is to develop an autonomous system of decision-making by modeling cognitive and psychological attributes of human drivers. This system is highly adaptive to changing environmental conditions because constant feedback loops between its components "calculate the estimates of the impact factor of prospective reactions" (Czubenko et al. 2015: 574). Its basic sensor-based decision algorithm is shown in Fig. 2.
As the algorithmic flow-chart in the figure shows, decision-making is one module in a set of different interconnected tasks that depend upon each other in strictly determined ways. Such an algorithm, understood as a set of rules to be followed to transform an input into an output, consists of dozens of subroutines that manage different tasks and depend upon input from the environment. Of interest in this context is that the module "optimization of decision" is integrated into two interrelated feedback loops with the environment: as a consequence of the decision, the current state of the car is modulated by a reaction. This change of status is instantly fed back into the decision-making process: the car constantly monitors the effects of its decisions on its own status, which covaries with changes in the scenario. These changes, caused by the car, are part of the second, larger feedback loop, in which the car perceives the environment, builds a virtual model, and comes to a decision based on projections about the status of the identified objects.
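The two nested feedback loops described above can be sketched schematically. The following is a minimal toy illustration in Python, not the actual (proprietary) architecture: all module names, values, and thresholds are invented.

```python
# Minimal sketch of the two nested feedback loops described above.
# All module names and numerical values are hypothetical illustrations,
# not the actual (proprietary) architecture of any car.

def perceive(environment, car_state):
    """Outer loop: sensor data are always relative to the car's own state."""
    return {"gap_m": environment["lead_car_m"] - car_state["position_m"]}

def build_model(percept):
    """Assemble fragmentary sensor data into an operational world model."""
    return {"gap_m": percept["gap_m"], "closing": percept["gap_m"] < 30.0}

def decide(model):
    """Decision module: choose a reaction based on the current model."""
    return "brake" if model["closing"] else "hold"

def react(car_state, decision):
    """Inner loop: the reaction changes the car's state, which is
    instantly fed back into the next decision cycle."""
    speed = car_state["speed_mps"] - (2.0 if decision == "brake" else 0.0)
    return {"position_m": car_state["position_m"] + speed * 0.1,
            "speed_mps": speed}

environment = {"lead_car_m": 50.0}
state = {"position_m": 0.0, "speed_mps": 25.0}
for _ in range(10):  # each cycle closes both loops at once
    model = build_model(perceive(environment, state))
    state = react(state, decide(model))
```

Each pass through the loop couples the car's changed state with a newly perceived environment, so the decision of every cycle is based on the already past state produced by the previous one.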
This twofold involvement of the decision in two mutually dependent feedback loops shows that decision-making is not only part of a larger ensemble of algorithmic patterns, but also that the module in which the status of the car and its reactions are woven into the environment depends upon continuous input from the outside. As a result, the current state and the decision about the reaction are always temporally connected. The decision is based on an already past state of the environment, which is then directly fed back into the decision for the next cycle. What seems trivial becomes decisive once we consider the microtemporality of these processes. The mutual dependency of all influencing factors results in a constant re-adjustment of the car's decision-based behavior.
The autonomous system must take a model of the car's environment as a basis to make numerous decisions in order to resolve specific situations: braking or not braking, turning right or left, changing lanes or not, and, in the future, perhaps even passing another vehicle or not. These acts cannot be understood solely as the execution of deterministic algorithms that always provide the same output for the same input by specifying a threshold at which a defined reaction must be triggered in the case of a specific event. Nor can they be explained by reference to an instance of consciousness that decides on the basis of knowledge, experience, or perception. Rather, if we stipulate that autonomy in this context means that a car is able to adapt to environmental challenges, then any conception of purely algorithm-based solutions becomes problematic. Autonomous systems need to have a multiplicity of exit-points linked to specific sequences of interaction. Exit-points are facilitated by algorithmic if–then conditions that evaluate external data compiled by sensors. To take an algorithmic view implies that if the system is exposed to the same conditions, it will react in exactly the same way. However, in a car, an autonomous system must be able to respond to all relevant events in the environment. Its autonomy requires that there be several options open to it. Otherwise, the system would be deterministic, not operationally or strategically autonomous, and would not be able to adaptively adjust to unpredictable and unsafe environments. In fact, the algorithms of autonomous systems include probabilistic models because deterministic responses for all scenarios are, in practice and in principle, impossible. The unpredictability of the environment necessitates a probabilistic approach to algorithms that allow for self-adjustment and can have multiple inputs, for example neural networks or Bayesian networks with decision trees that determine different outcomes based on states of the system.
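The contrast between a single deterministic threshold and a probabilistic weighing of several exit-points can be illustrated with a deliberately simplified sketch; all sensors, numbers, and weights here are invented for illustration.

```python
# Hypothetical illustration: a single deterministic rule versus a
# probabilistic decision over several exit-points. All numbers invented.

def deterministic_rule(distance_m):
    # A single hard-coded exit-point: same input, same output, always.
    return "brake" if distance_m < 20.0 else "continue"

def probabilistic_decision(sensor_readings):
    # Several noisy readings yield an *estimate* of the obstacle distance
    # plus a measure of uncertainty; the decision weighs several exit-points.
    n = len(sensor_readings)
    mean = sum(sensor_readings) / n
    var = sum((r - mean) ** 2 for r in sensor_readings) / n
    scores = {
        "brake":    max(0.0, (20.0 - mean) / 20.0),  # urgent if very close
        "slow":     max(0.0, (40.0 - mean) / 40.0),  # cautious at mid-range
        "continue": max(0.0, mean / 60.0),           # keep going if far away
    }
    # Greater disagreement between sensors shifts weight toward caution.
    scores["slow"] += var / 100.0
    return max(scores, key=scores.get)
```

The first function always maps the same input to the same output; the second keeps several options open and chooses among them according to estimated states of the environment, which is the minimal sense in which the text speaks of probabilistic exit-points.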
Consequently, it becomes difficult to conceive of autonomous decision-making as the transformation of input into output on the basis of algorithms such as if–then conditions, decision trees, or flow-charts. Probabilistic algorithms and predictive neural networks are well suited to adapt to variable inputs and environmental uncertainties. Nonetheless, as I want to argue, it is important to describe these processes on a different level because, in the simplest possible terms, (micro)decisions cannot be epistemologically reduced to algorithms. Even if we take into account that algorithms are not opposed to probabilistic and predictive models, they are temporally non-specific and not bound to a specific medium. For this reason, I suggest that we find a new language of description and a heuristic instrument that would take into account the openness and nondeterminateness necessary for decisions.
Decisions require time. If a decision happened immediately, it would have been decided in advance and would be deterministic. Temporality is thus key to understanding how these technologies operate. The temporality of microdecisions is an effect of the sheer mass of calculations and the speed of automated processing. Microdecisions exceed human capacities because their numbers and speed can only be accomplished by computers: their quantity is their quality. The algorithmic processes that underlie microdecisions can in principle also be performed by humans because, as formalizations, algorithms can be processed by a machine or a human at any speed with exactly the same results. By definition, time is not critical for algorithms, only for the technologies that apply them. An algorithm returns the same results regardless of whether it is processed by a human brain or by a digital computer, and of whether this takes a microsecond or an hour. But the medium-specific speed and quantity of microdecisions are not substitutable. Time is critical for microdecisions. They need to be performed at timescales below human capabilities. Like their temporality, their scope, too, is always a question of sheer computational power, which enables microtemporalities in which decisions can be decoupled from human agency. Like all computational processes, they only become effective when processed in times and quantities that are inaccessible to humans. If decisions are drawn in this microtemporality, it becomes necessary to negotiate what it means to decide and to be sovereign.
To explain this microtemporal dimension of automated decision-making and the reciprocal and recursive entanglement of autonomous technologies with their environments, I would like to provide an example that helps reconstruct the framework of alternativity underlying microdecisions. In September 2016, the car manufacturer Tesla released Update 8.0 for the operating system of the onboard autopilot, which is limited to highway applications. The update included a new algorithm for processing the signals received from the vehicle's built-in radar. The fact that Tesla speaks of an "autopilot" is not to be mistaken for evidence of self-driving capacities at level 4 or 5. Rather, autopilot here means the combination of different ADAS systems: collision warning, autosteer lane centering, self-parking, automatic lane changes, the ability to summon the car, and adaptive cruise control. In this regard, Tesla offers a good example of the constraints, challenges, and potentials of driving assistants, although it is certainly not the only competitor in the field.
Manufactured by Bosch, the mid-range radar sensor (MRR), which is mounted on the underside of the car, is also used by other automakers. Tesla, though, was the first company to introduce a new feature: with algorithms optimized by machine learning, the onboard processor of the 2016 Model X, an NVIDIA DRIVE PX 2, extends its data analysis to the motion of vehicles ahead of the car driving in front of the Tesla. Beginning with this update, the system uses the fact that the radar signal is reflected between the underbody of the preceding car and the road surface to detect the approximate shape and movement of objects in front of it, even if they are invisible to the driver and the car's visual sensors (Tesla 2016).
This new application does not rely on improved hardware but on optimized algorithmic processing of available sensor data. It extends the car's reaction radius to events that are invisible to the driver. This implies that the intervals of intervention of the car and the driver do not coincide. Two months after the update, a high-speed accident occurred between two non-Teslas on a motorway in the Netherlands, fortunately without any injuries. A dashcam in an uninvolved Tesla Model X recorded the accident. The video, which the driver published online, shows how the Tesla's onboard autopilot reacts to the imminent collision of the two cars in front of it, even before the collision takes place. The Tesla brakes automatically before the driver even has the chance to recognize that something is going to happen, let alone intervene. A warning signal can be heard in the video and the car starts to slow down, but at this moment nothing unusual can yet be seen on the highway. Seconds later, we realize that the Tesla has, by means of the new algorithm, predicted the collision before it happened, calculating the speed and movement of the vehicle that was two cars in front of the Tesla and thus invisible to the Tesla's driver. Had the Tesla not responded automatically in this short interval, it might have driven directly into the resulting accident.
This interval of intervention is accessible only to the car, not to the driver. Only after the event do we understand not only that the time to react was below the threshold of human attention and responsiveness (though the driver could potentially have anticipated the event), but also that the driver could not have reacted because he could not see anything. The car, on the other hand, anticipates an accident that will only become visible to the driver in its consequences. In other words, the intervals of intervention of the car and the driver do not overlap. The car's algorithms calculate the likelihood of a future in which it would be involved in the accident. In an extremely short time span, far shorter than any possible human response time, it must decide, based on sensor data, between this future and a response that will help avoid this future. The autonomous car brakes before the accident even happens, let alone becomes visible to the driver. It responds to a potential event because its ADAS systems compare the speed and direction of all three cars, calculating the likelihood of the accident and reacting accordingly. This reaction is a deterministic consequence of the calculated probability of the collision. The algorithms are deterministic in their process, i.e., they always deliver the same output for the same input. Hence, if the threshold is exceeded, the ADAS systems initiate the braking process, depending on other variables such as the weather, the road surface, and the cars driving behind.
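A drastically simplified sketch can illustrate the kind of calculation involved. Tesla's actual algorithms are proprietary, so all variable names, values, and the threshold below are hypothetical.

```python
# Hypothetical sketch of a collision prediction that includes the "hidden"
# car two vehicles ahead, detected via the radar bounce described above.
# All values and the threshold are invented for illustration.

def time_to_collision(gap_m, closing_speed_mps):
    """Seconds until a gap closes; infinite if the gap is not closing."""
    return gap_m / closing_speed_mps if closing_speed_mps > 0 else float("inf")

def should_brake(gap_lead_m, lead_speed, gap_hidden_m, hidden_speed,
                 own_speed, ttc_threshold_s=2.0):
    # The radar bounce yields a state estimate for the hidden car as well.
    ttc_lead = time_to_collision(gap_lead_m, own_speed - lead_speed)
    ttc_hidden = time_to_collision(gap_lead_m + gap_hidden_m,
                                   own_speed - hidden_speed)
    # If either projected future falls below the threshold, brake now:
    # the decisive calculation concerns a state that is not yet visible.
    return min(ttc_lead, ttc_hidden) < ttc_threshold_s

# The hidden car has almost stopped (5 m/s) while our car and the lead
# car still travel at 30 m/s: the system brakes before anything is visible.
braking = should_brake(gap_lead_m=30.0, lead_speed=30.0,
                       gap_hidden_m=15.0, hidden_speed=5.0,
                       own_speed=30.0)
```

The point of the sketch is that the braking decision follows deterministically once the projected time-to-collision crosses the threshold, while the projection itself concerns a probable, not yet observable future.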
The probability of world models (SLAM)
In spite of the givenness of the reaction, it must be possible, in principle, for a decision to turn out differently, because only then is the autonomous system able to interact with an environment that continuously demands new adaptations of behavior. If the vehicle had no alternatives, it would not be able to respond to, and interact with, its unpredictable environment. Its autonomy necessitates not only several exit-points but also an openness toward other options and an implementation of a probability calculus at the level of its technological architecture. The key questions are where, in such a process, microdecisions are drawn, how alternatives are created technically, and how alternatives are compared with each other. In the following, I will try to answer these questions by referring to one specific architecture of autonomous systems. There are other architectures, but the examples described here allow us to further refine the heuristic of microdecisions (Fig. 3).
To understand how autonomous vehicles interact with their environments, and thereby project the alternativity that underlies microdecisions, it is important to distinguish between a strategic, a tactical, and an operational level. This subdivision and the corresponding illustration were developed in the context of a research project on the architecture of self-driving cars at the Technical University Braunschweig in Germany. The three levels correspond to the top three rows; below them lies the level of sensor-based data acquisition. While the strategic level is concerned with the navigation of the car between two locations and the operational level with the execution of driving maneuvers, the tactical level encompasses methods of locating the car in its surroundings and analyzing the situation. Algorithms are inevitably involved at all levels. At the strategic level, they consist, for instance, in calculating routes, and at the operational level, in steering and maneuvering the vehicle. At the tactical level, however, algorithms are used to create probabilistic world models and corresponding options for action.
The figure shows how sensor data flow into the "Feature Abstraction and Model-Based Filtering" module and from there further into "Context/Scene Modeling" and "Road-Level Environmental Modeling." On the left side, alongside the externally supplied data, which include street maps and traffic reports, there are three levels of world modeling. Modeling, whether of worlds or scenes, in this context is not the complete representation of the outside world in the mind of the machine, as it were, but the assembly of fragmented sensor data into a viable, i.e., operable, model of the environmental factors relevant to the system. Any modification or reintroduction of sensors requires an adaptation of these algorithms, as in the Tesla example, where radar data were filtered anew. The different capacities of sensors must be adjusted for the detection of the environment. For example, lidar or "light detection and ranging" (which is not used by Tesla because it is very expensive) uses point-clouds of lasers to accurately capture three-dimensional models. Lidar has a limited range, but it determines the contours of nearby objects by measuring the distance to these objects, while optical cameras are unreliable in close-up situations but provide a long range. In this regard, each sensor technology collects data constituting a sensor-specific environment, and these data can then be merged with data about other environments. This process is called world modeling. Depending on the equipment of the vehicle, the applications range from the calculation of the distance to other road users to 360-degree modeling.
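One elementary building block of merging such sensor-specific environments is inverse-variance weighting of two noisy measurements of the same quantity. The following sketch uses invented variances for a lidar-like and a camera-like distance estimate; it illustrates the principle, not any manufacturer's actual fusion pipeline.

```python
# Minimal sketch of merging two sensor-specific measurements of the same
# distance by inverse-variance weighting, a basic building block of sensor
# fusion. The variances below are invented for illustration.

def fuse(estimate_a, var_a, estimate_b, var_b):
    """Return the fused estimate and its (smaller) variance."""
    w_a = 1.0 / var_a
    w_b = 1.0 / var_b
    fused = (w_a * estimate_a + w_b * estimate_b) / (w_a + w_b)
    fused_var = 1.0 / (w_a + w_b)
    return fused, fused_var

# Hypothetical readings: a precise lidar contour and a noisier camera
# estimate of the same obstacle distance.
lidar_m, lidar_var = 12.2, 0.04
camera_m, camera_var = 13.0, 1.0

distance, variance = fuse(lidar_m, lidar_var, camera_m, camera_var)
# The fused value stays close to the more reliable lidar reading, and the
# fused variance is smaller than either individual variance.
```

The design point is that fusion does not pick one sensor over the other: each sensor-specific environment contributes in proportion to its reliability, and the merged model is more certain than any single source.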
There are different approaches to modeling, and I will focus here on one specific method, SLAM, which was important for the historical development of the first self-driving cars. This approach has the advantage of being well documented, while many other technologies are proprietary. Regarding current developments, modeling for operational design domains (ODDs) is a standard approach. The ODD approach determines the conditions of operating the car in advance. It consists of pre-given spatial and operational limits based on pre-recorded maps and pre-collected data of the strictly defined domains in which an autonomous car is supposed to operate. It also defines situations in which the car cannot operate safely without the driver's support. These domains can consist of a geofenced area, an approach pursued by Waymo and Uber for example, or of specific modes of behavior, for example driving on a highway or in bad weather. If the conditions are not met, the car returns the driving tasks to the driver. For robotics (the historical context of the development of SLAM), the challenge consists in engaging with uncertain and unknown environments in general, something that is replaced in ODDs by prefabricated maps and definitions. SLAM, in contrast, was developed to find a solution for environmental uncertainty and has also conceptually contributed to a new understanding of autonomy. While an ODD might include SLAM modeling for given situations, the originary claim of SLAM is to operate under conditions of uncertainty (for example, in extraterrestrial terrain).
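The logic of an ODD gate can be sketched in a few lines. The domain conditions below (geofence coordinates, speed limit, weather categories) are invented placeholders, not any manufacturer's actual criteria.

```python
# Hypothetical sketch of an ODD gate: the system drives autonomously only
# while all predefined domain conditions hold. All limits are invented.

ODD = {
    "geofence": {"min_lat": 33.0, "max_lat": 34.0,
                 "min_lon": -112.5, "max_lon": -111.5},
    "max_speed_mps": 30.0,
    "allowed_weather": {"clear", "cloudy"},
}

def within_odd(lat, lon, speed_mps, weather):
    """Check every pre-given spatial and operational limit of the domain."""
    g = ODD["geofence"]
    in_area = (g["min_lat"] <= lat <= g["max_lat"]
               and g["min_lon"] <= lon <= g["max_lon"])
    return (in_area and speed_mps <= ODD["max_speed_mps"]
            and weather in ODD["allowed_weather"])

def driving_mode(lat, lon, speed_mps, weather):
    # Outside its domain, the system returns the driving task to the driver.
    return "autonomous" if within_odd(lat, lon, speed_mps, weather) else "handover"
```

Unlike SLAM, nothing here deals with uncertainty: every condition is defined in advance, which is precisely the contrast the text draws between the two approaches.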
Scene or world modeling in each case includes procedures that, in different degrees of granularity, contextualize the isolated sensory data about the environment and the condition of the vehicle, interweave them into the model of a world, and locate the vehicle. As I will explain in the following, a model created with the help of SLAM is neither identical to the topological space nor a representation of objects; it is extracted from data about the probabilities of states and events, which are themselves extracted from the constantly changing relations between the car and what is registered by its sensors. The car operates on nothing other than probabilities and has no access to an "objective world" that could be represented as a map. As will turn out, the mapping of a robot's environment, which is the preliminary step of every world model, is itself an act of calculating probabilities. As a model of probabilities, the vehicle's world model is virtual in the sense that each probability encompasses a multiplicity of alternate futures. World modeling, in the context of such technologies, is always the modeling of possible worlds that are merged in a single model that contains them virtually.
To move in the complex environments of road traffic, an autonomous vehicle must continuously register the states (shape, position, and movement) of the surrounding objects and locate itself in relation to them. It does not have access to a view from the outside but must constantly recalculate its own location and possible reactions to its environment. Since both the vehicle and other road users are mobile, the environmental relationships are constantly changing. Because the vehicle cannot know which position it is currently in, neither its location nor its relationship to other objects is given. In other words, the technical challenge consists in dealing safely with the uncertainty of the environment combined with the unpredictability of the behavior of other road users. Safety in dealing with this double uncertainty (of the system regarding its own state as well as the environment) is a key component of the strategic and operational autonomy of the vehicle.
The problem of environmental orientation that I am raising here was discussed in robotics and artificial intelligence research around 1990 under the name of "simultaneous localization and mapping" (SLAM). This research posited that the initial condition of a robot is its lack of information about its environment (Durrant-Whyte 1987; Smith and Cheeseman 1986). This environment can be mapped only by moving and collecting data with the robot's available sensors, but these sensors are just as prone to error as the odometry of the robot's own state. All the data that the robot registers about its environment are relative to its own position and, therefore, dependent on its location, which in turn is needed to locate itself on the map that is to be created. As the robot moves and measures its environment both at the point of origin and during movement, it acquires different sets of data about the environment depending on its positions and the available sensors. These datasets can then be compared by probabilistic techniques that broadly fall under the rubric of Bayesian filters, named after the eighteenth-century mathematician Thomas Bayes. This results in probability values for the new position of the robot as well as for the shape, position, and movement of surrounding objects. As Max Kanderske and Tristan Thielmann write, the "model of the world cannot be deterministic (it cannot be computed with certainty from an initial state), but has to be probabilistic" (Kanderske and Thielmann 2019: 121). All data collected by the robot are relative to its position, and its position can only be estimated in relation to the environment. In short, in order to locate itself, the robot needs a map, and to create a map, it must locate itself (Burgard et al. 2016: 1134). The two tasks can only be solved simultaneously: localization and mapping are inseparable. On the basis of the collected data, an always fragmentary map or model is constructed. This map contains only probability values. It is a construction of a possible, but probable, world.
This uncertainty, though, can be transformed into an operational probability calculus. In an authoritative 1987 essay, "Uncertain Geometry in Robotics," the engineer Hugh F. Durrant-Whyte (1987) suggests that environmental uncertainty can be processed by algorithms with the help of probabilistic methods. From around 1990 onward, various algorithmic techniques emerged in robotics to solve this problem of dealing with uncertainty (e.g., Kalman filtering, particle filter localization, and later FastSLAM). All of them attempt to merge data collected from the available sensors into a world or scene model. The fact that the word "belief" is often used to denote this model demonstrates that the autonomous system is deemed neither to achieve objective knowledge nor to accurately determine its situation (Thrun et al. 2006a, b: 3). The challenge is that the contours captured by sensors change with each movement of the robot. Accordingly, the model is always bound to the site of sensory observation and has an operative function: "The world modeling system serves as a virtual demonstrative model of the environment for the whole autonomous system" (Beyerer et al. 2010: 138). The model is a map that does not represent the outside world; rather, it shows what is relevant in relation to the robot with respect to what is registered by its sensors.
Because of the SLAM problem, world or scene models consist of nothing other than probability values about the attributes of the environment (plus, possibly, externally supplied data about traffic, roadmaps, or GPS, which is not accurate enough for handling local navigation). These methods mark a conception of the environment as technically permeated by uncertainty. This uncertainty has two components: the robot's ignorance regarding its own position and the unpredictability of the environment's dynamics. This duplication of uncertainty is the epistemological core of SLAM.
These methods combine sensory detection and filtering algorithms to solve the SLAM problem. Based on the data about environmental relations, the central processing unit of the robot calculates, through mathematical filters, the probabilities of the possible positions and movements of the tracked objects in space (Thrun et al. 1998; Matthaei and Maurer 2015). Algorithmic filters are based on the comparison of sensor data collected at different positions and at different times. The superimposition of these data results in probabilities. Roughly speaking, the analysis of sensor data consists in calculating probable attributes of the environment, as well as the dynamics of movement, by means of Bayesian filters from the measured distances and contours of objects at different times in relation to the robot. A Bayesian filter compares the model of the environment at time t−1 with the sensor data measured from another position at time t. Referring to the superposition of both measurements, the robot calculates the probability of its own localization as well as that of objects in the environment. Since the present t can always turn out to be different from the future calculated from the past t−1, this probabilistic approach corresponds to the virtuality of a possible but probable world. The identification of specific objects, such as children or traffic signs on the roadside, and the calculation of appropriate responses (tasks predominantly undertaken by optical cameras and algorithms optimized by machine learning) are only a secondary step to this virtualization of the environment.
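The predict-update cycle of a Bayesian filter, which compares the model at t−1 with the measurement at t, can be sketched as a one-dimensional histogram filter. The toy world and the sensor probabilities below are invented for illustration.

```python
# Minimal 1D histogram Bayes filter: the robot's "belief" about its own
# position is a probability distribution, never a certain location.
# World layout, motion model, and sensor probabilities are invented.

world = ["wall", "door", "door", "wall", "wall"]   # hypothetical landmarks
belief = [1.0 / len(world)] * len(world)           # total ignorance at start

def update(belief, measurement, p_hit=0.8, p_miss=0.2):
    """Weight each position by how well it explains the reading at time t."""
    weighted = [b * (p_hit if world[i] == measurement else p_miss)
                for i, b in enumerate(belief)]
    total = sum(weighted)
    return [w / total for w in weighted]

def predict(belief, step=1):
    """Project the belief of time t-1 forward along the (cyclic) motion
    before the next measurement is superimposed on it."""
    n = len(belief)
    return [belief[(i - step) % n] for i in range(n)]

belief = update(belief, "door")   # robot senses a door
belief = predict(belief, 1)       # robot moves one cell to the right
belief = update(belief, "door")   # senses a door again
# The belief now peaks at index 2: superimposing the measurements of t-1
# and t localizes the robot at the second of the two adjacent doors.
```

Each cycle narrows the distribution without ever collapsing it to certainty, which is exactly the sense in which the model remains a map of probability values rather than a representation.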
Mobile autonomous systems did not become operational until around 2005, when computing capacities were sufficient to optimize algorithms through machine learning and to evaluate in real time data acquired by improved sensors, in particular the new lidar method. At the cutting edge of these technologies were the three Grand Challenges launched in 2004, 2005, and 2007 by the Defense Advanced Research Projects Agency (DARPA) and the US Department of Defense (Iagnemma and Buehler 2006). The challenge of the first two races was for an autonomous car to drive a predetermined route through the Californian Mojave Desert without a driver or remote control. In the first DARPA Grand Challenge in 2004, the most successful vehicle, a prototype from Carnegie Mellon University, only managed twelve of the 227 km. Just one year later, the autonomous car Stanley, developed by the Stanford Racing Team and the engineer and AI researcher Sebastian Thrun, won the second challenge, applying the technical convergence of new sensor technologies with probabilistic algorithms and machine learning as a standard procedure. Today, this approach dominates the development of autonomous cars, robots, and drones in constantly evolving versions.
The prototype Stanley, whose further development Junior took second place two years later in the DARPA Urban Challenge, employs a whole range of sensors: a roof-mounted rotating lidar module, optical cameras, a radar, and a GPS module. The first step in these early attempts was to synthesize the data from these different sensors into a world model by means of the algorithmic filters of the SLAM method and, in a second step, to optimize the maneuvering by means of the new possibilities of machine learning (Thrun et al. 2006a, b). This double approach eliminated the need for an a priori topological map and allowed maneuvering and navigation even in rough terrain or in the presence of other road users whose behavior cannot be predicted.
Since the DARPA Grand Challenges, the algorithmic methods of localization and the systems of spatial tracking used in autonomous cars (a sensory composite of optical, infrared, ultrasound, and thermal imagers, together with sonar, radar, laser, and lidar) do not simply represent the captured space but rather register the outlines and distances of objects through different wave spectra. However, since both the vehicle and the objects are potentially mobile, no attempt is made to specify more than probabilities of their positions. These sensors provide relations between the vehicle and its surroundings, which, according to SLAM, can only be registered by comparison at different times and at different positions, which is to say, registered by and through movement. The complex sensors of these technologies are not focused on the mapping of an ontology in which all objects are registered on the basis of given coordinates, as on a virtual map. Rather, objects are registered via the constantly changing relations of a virtual environment, including probabilities and improbabilities. This virtuality is transformed into an operational model, which in turn enables time-critical interaction based on microdecisions that are bound to probabilities. Out of the mass of probabilities, these decisions choose those that are relevant for the car and necessitate specific behavior. Autonomy, in this context, is the capacity to choose options. It necessarily depends upon technologies that bring forth these options and thus instantiate microdecisions.
Even if the ADAS systems of today's cars do not operate on world models as interconnected and complex as those of the DARPA example (no 360-degree modeling, for example), the localized and situated sensing that is necessary for ADAS systems nonetheless operates on the basis of probabilities of the state of the environment. The worlds that are modeled are not representations but, rather, necessarily constructions of what may be relevant for operating under given conditions of uncertainty.