Imaging for the Automotive Environment
There is increasing interest in the development of advanced driver assistance systems (ADASs), driven in large part by the need to enhance safety and the driving experience for the occupants of passenger vehicles. Cameras (often multiple cameras) are a critical sensor component of such systems and provide a rich source of information that is often not possible from other sources. This chapter discusses some of the key aspects surrounding the use of cameras in the automotive environment. It covers the key legislative and commercial drivers and technical requirements, before discussing characteristics of fisheye lenses and how these are deployed in typical camera system architectures. The chapter then discusses aspects relating to image and video quality, a topic that has been much studied in consumer applications but only relatively recently considered in the automotive environment. Finally, the application of cameras as a key technology in the evolution of autonomous vehicles is considered.
Keywords: Image Quality Assessment; Autonomous Vehicle; Advanced Driver Assistance System; Video Quality Assessment; National Highway Traffic Safety Administration
Definition of Key Terms
ADAS: Advanced driver assistance system
CMOS: Complementary metal-oxide semiconductor
CMS: Camera monitoring system
CTA: Cross traffic alert
DSP: Digital signal processor
FOV: Field of view
HDR: High dynamic range
IQA: Image quality assessment
ISP: Image signal processor
ITU: International Telecommunication Union
OEM: Original equipment manufacturer
VGA: Video Graphics Array
VRU: Vulnerable road user
Introduction: Application Context and Environment, Why Cameras?
An attentive awareness of the immediate and evolving automotive environment is vital to the drivers of the world’s more than one billion cars. The complexity of the vehicular environment has increased from a world of infrequent low-speed vehicles with low-speed maneuvers and forgiving hazards to one of dense high-speed transport environments with a diversity of low error margin behaviors among all road users. Car companies, in tandem with national legislators, have sought to use advanced driver assistance system (ADAS) technology to enhance the safety and driving experience of vehicle occupants. It is vital that such technology provides a maximum amount of information to the driver with a minimum of distraction or redirection of concentration. The most recent technology that has been demonstrated to provide the richest dataset by an intuitive modality is that of the electronic automotive vision systems.
Car companies’ markets for electronic automotive vision systems are driven by customer use cases. The earliest of these to be addressed is the case where a driver performs a predominantly reversing maneuver and the driver’s view is occluded by the vehicle. This utility has evolved to the degree that, as of May 2017, the US Department of Transportation’s National Highway Traffic Safety Administration (NHTSA) has mandated (NHTSA 2014) that all vehicles of <10,000 lbs must incorporate a rear visibility system, while further stating that such requirements are the only measures which address the backover risk identified by the Cameron Gulbransen Kids Transportation Safety Act of 2007 (US Government 2007). Furthermore, NHTSA has recommended that a corresponding display unit with a luminance of 500 cd/m² be adopted to present this information to a driver. This and similar legislative developments are elaborated later; suffice it to say that automotive video technologies that began as optional comfort assistance applications have evolved into mandated safety electronics systems. These technologies require “glass-to-glass” (i.e., lens to display) considerations for applications, the latest of which include “camera monitoring system” (CMS) mirror replacement systems, which will be discussed in more detail below.
Mass-produced automotive camera technologies began with CCD (charge-coupled device) image sensors in aftermarket applications, but these had a fundamental disadvantage that the sensor and the associated processing elements could not be packaged together in a monolithic integrated circuit. CMOS technologies superseded CCD allowing the sensor and its associated processing elements to be packaged compactly and relatively cheaply together, including the output driver electronics used to drive an NTSC video to a simple display. A progression of use cases and associated technology challenges followed the proliferation of OEM rearview cameras in mass production (Denny 2014) after Europe’s first automotive CMOS cameras were developed for Land Rover by Connaught Electronics Ltd. (subsequently acquired by Valeo). Customer expectations grew. This created several challenges.
The first among these challenges was simply that customers lacked semantics for their expectations that could be articulated as general empirical design standards, and this was particularly apposite as customer expectations relating to low light performance, color fidelity, camera size, and cost became product differentiators.
A useful heuristic for resolving this ambiguity, and one that has supported the development of image quality standards for automotive viewing systems, is that “image quality should be ‘FUN’,” an acronym denoting fidelity, utility, and naturalness and connoting design considerations and standards.
In addition, all of this must be performed in real time, several tens of times per second across a diversity of rapidly changing lighting and temperature regimes.
The original CCD sensor and ISP (image signal processor) combinations were based on incompatible semiconductor packaging technologies and thus had to be packaged separately, leading to a “2-chip solution.”
However, a smaller, single package was more desirable due to packaging constraints and also to allow tighter integration between the ISP computations and dynamic sensor performance optimization, and this was facilitated by CMOS as the underlying semiconductor technology is similar to that used in the ISP hardware.
Finally, as ISPs became more complex and the sensor became more sensitive, it was necessary to separate the elements again and the sensors moved back to a single package, while the ISP was implemented in a separate IC package and/or indeed in centralized multi-camera ECUs (electronic control units), sometimes in combination with other sensor fusion architectures.
The key sensor/lens technologies to improve included sensitivity, resolution, and dynamic range, and these are elaborated below.
Sensitivity improvements were largely due to improvements in pixel design and manufacturing process technologies, which allowed smaller pixels with less noise to convert light more reliably into electronic signal while addressing noise problems related to semiconductor physics. Also, the temperature needed to induce particular levels of noise in an image sensor increased, so the sensor technologies were able to operate in a broader range of automotive use cases with an increased tolerance to the self-heating generated by smaller packaging and greater processing power.
As the silicon area is the main cost determinant of image sensors and a smaller pixel size allows more pixels in that area, it was possible to get an increased resolution of a scene. The resolution increase was driven by application requirements to cover a larger field of view within a camera’s scene which in turn drove the development of automotive fisheye lenses.
The dynamic range of a camera system reflects its capacity to discern structure at both high light levels and low light levels in a scene. The original automotive camera systems struggled with 45 dB levels of dynamic range, which meant that even in a simple scene a camera had to decide between overexposing and thus “whiting out” bright structure in order to present structure in relatively dark areas and underexposing and “blacking out” dark structure in order to present structure in relatively bright areas. This made even simple scenes difficult, as backgrounds would disappear if reversing a vehicle in a shaded area or twilit hazardous areas would disappear during a parking maneuver with the approach of vehicular headlamps.
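To put the 45 dB figure in perspective, the decibel value can be converted to a linear contrast ratio. The sketch below assumes the usual image-sensor convention of 20·log10 of the ratio between the brightest and darkest resolvable signal. The function names and the example luminance values are illustrative assumptions, not data from any particular sensor.

```python
import math


def db_to_ratio(db: float) -> float:
    """Convert a dynamic range in dB to a linear brightest/darkest ratio."""
    return 10.0 ** (db / 20.0)


def ratio_to_db(ratio: float) -> float:
    """Convert a linear brightest/darkest ratio to dB (20*log10 convention)."""
    return 20.0 * math.log10(ratio)


def scene_fits(l_dark: float, l_bright: float, sensor_db: float) -> bool:
    """True if the scene's luminance ratio fits within the sensor's range."""
    return ratio_to_db(l_bright / l_dark) <= sensor_db


if __name__ == "__main__":
    print(f"45 dB is a contrast ratio of about {db_to_ratio(45.0):.0f}:1")
    # Hypothetical luminances (cd/m^2) for a shaded area and a sunlit surface.
    print("Fits in 45 dB:", scene_fits(5.0, 5000.0, 45.0))
```

Under this convention a 45 dB sensor spans a ratio of under 200:1, so a scene with a 1000:1 luminance ratio (60 dB) forces exactly the choice between "whiting out" and "blacking out" described above.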
The combination of these innovations has facilitated the development of automotive vision systems whose front end (camera electronics/image signal processing/optomechanics) can deliver high-quality real-time data to the complex background electronics and corresponding vehicular display systems.
The improvements in image capture and processing technology can be considered in terms of the questions “How good is the image?” and “How far can we take it?”
The former question opens the area of image quality assessment and the latter the considerations of complex backend processing and autonomous driving applications. These too are introduced in this chapter.
Automotive Display Requirements
Display systems have become increasingly common in modern automobiles. Many early applications have included infotainment, satellite navigation, and interface to vehicle parameterization and diagnostics. The complexity of automotive displays has increased in recent years, progressing from relatively small analog displays to tablet-sized digital touchscreen displays. In recent years, displays have also been used to display video from automotive camera systems. Automotive display systems have therefore to cope with a wide variety of applications, all with their own strict requirements. The automotive environment itself also poses significant challenges, all of which determine the design, vehicle integration, and performance requirements.
Display Performance Requirements
The automotive environment places specific performance requirements on displays, particularly when used to display camera images. In this context, displays must be considered as safety devices and not just as infotainment systems.
High brightness and contrast ratio – it should be possible to view the screen clearly, even in the presence of glare (e.g., due to direct sunlight through the vehicle windows). Often, car manufacturers recess the display into the dashboard, to avoid direct sunlight. Also, many automotive displays dim automatically at nighttime, to reduce driver distraction. A value of 500 cd/m² has been recommended for basic reversing maneuvers (NHTSA 2014).
Low viewing angle dependence – the display viewing angle will change depending on the mounting position of the display and the driver/passenger sitting position. The display brightness and color reproduction should therefore not be affected by the viewing angle.
Good color reproduction – particularly when used to display camera images. The color gamut must be sufficient for accurate reproduction of real-life scenes (e.g., accurate reproduction of warning signs, brake lights, turn signals, etc). Color inversion at high viewing angles is a significant disadvantage.
High durability – automotive displays have a significantly longer life expectancy than many displays in consumer applications. If a cover glass is used, it must be optically bonded to the display, to avoid glare. For larger displays, this process is difficult and expensive. As a result, automotive displays tend not to have cover glass.
Fast response time – particularly throughout the required temperature range of automotive applications (−40 to 105 °C). If the response is slow, ghosting effects can occur. This is not acceptable, particularly when displaying camera images.
High refresh rate – in some camera applications, up to 60 frames per second must be supported, to ensure smooth display at high speeds.
Low power consumption – the lower the power consumption, the higher the overall fuel efficiency of the vehicle.
Should not lighten or have trailing effects when touched – particularly important for touchscreen applications.
The size of automotive displays ranges from low-cost 3.45″ up to 12.3″ in high-end vehicles. There are a number of factors which influence the size of the display. These include cost, physical constraints of the vehicle cockpit, and legislative restrictions. Given the well-established layout of the vehicle cockpit, there are a limited number of places to mount a vehicle display. For most applications, the display is mounted in the center console or, in some high-end vehicles, replaces the instrument cluster behind the steering wheel (e.g., 2013 Daimler S-Class). As the screen size increases, the number of operations it performs usually increases. For example, if you have a large display, there is no space to have separate multimedia or climate controls. These functions are instead combined in a large, touchscreen device.
The enclosed space of the car cockpit also limits the required size of the display. For the typical driver in the average vehicle, the viewing distance is typically less than 1 m.
In recent years, automotive companies have been exploring the concept of eMirror or camera monitor systems (CMSs), where camera and display systems replace conventional rearview mirrors. Requirements for the size and location of CMS displays are described in ISO 16505 (ISO 2014). Conventional vehicle mirrors must be capable of showing a legally defined field of view (as defined by UN document ECE-R46 (UNESC 2012) in Europe and FMVSS-111 (NHTSA 1999) in the US jurisdiction), with a defined magnification factor. In a conventional vehicle rearview mirror, the magnification factor is a function of the mirror curvature. For a planar mirror, the magnification factor is 1.0. For a convex mirror, the greater the curvature, the smaller the magnification factor – objects appear smaller than they are in reality.
In a CMS mirror replacement system, the magnification factor is instead defined by the ratio of the display viewing angle (which is a function of the distance between the driver’s eyes and the display and the orientation of the display) and the field of view of the camera lens. Therefore, in a CMS system, the size of the display used is tightly coupled with the camera lens field of view and the location and orientation of the display within the cockpit. For example, if you increase the camera field of view (to reduce the size of blind zones), the display must either become larger or closer to the driver. There is also currently no consensus on the CMS display location within the vehicle cockpit. In recent years, there have been several concept cars where the display has been mounted in the door or in the vehicle A-pillars. There are also examples where the display is mounted in the dash or either side of the steering wheel. It is therefore likely that there will be a large variety of display locations and sizes used for CMS systems in the years to come.
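The coupling between display size, viewing distance, and camera field of view can be sketched numerically. The example below is a simplified illustration only: it assumes the display is viewed head-on, that the full horizontal camera field of view is mapped across the full display width, and that magnification is the plain ratio of the two angles as described above. The function names and example dimensions are hypothetical, and the sketch does not reproduce the detailed definitions in ISO 16505.

```python
import math


def display_angle_deg(display_width_m: float, viewing_distance_m: float) -> float:
    """Horizontal angle subtended at the eye by a display viewed head-on."""
    return math.degrees(2.0 * math.atan(display_width_m / (2.0 * viewing_distance_m)))


def magnification(display_width_m: float, viewing_distance_m: float,
                  camera_hfov_deg: float) -> float:
    """Ratio of the display viewing angle to the camera field of view."""
    return display_angle_deg(display_width_m, viewing_distance_m) / camera_hfov_deg


def required_display_width_m(target_magnification: float, viewing_distance_m: float,
                             camera_hfov_deg: float) -> float:
    """Display width needed to reach a target magnification."""
    half_angle = math.radians(target_magnification * camera_hfov_deg) / 2.0
    return 2.0 * viewing_distance_m * math.tan(half_angle)


if __name__ == "__main__":
    # Hypothetical values: a 15 cm wide display at 60 cm, 40 degree camera.
    m = magnification(0.15, 0.60, 40.0)
    print(f"magnification: {m:.2f}")
    # Widening the camera FOV to 60 degrees at the same magnification:
    print(f"required width: {required_display_width_m(m, 0.60, 60.0):.3f} m")
```

Holding the magnification fixed, a wider camera field of view or a longer viewing distance both increase the required display width, which is the trade-off noted in the text.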
Display resolution must be sufficient to preserve observable details within the scene. Many systems currently on the market have approximately VGA resolution (720×480 pixels, slightly wider than the 640×480 of VGA proper), viewed at a distance of approximately 1 m from the driver. This is generally acceptable for basic infotainment and rearview camera applications. However, the trend in the automotive market is moving toward (relatively) higher-resolution cameras (1–2 megapixels) and larger displays. Also, for applications such as CMS mirror replacement systems, where cameras are used to display objects at greater distances, preservation of detail will become more critical.
The maximum required resolution of a display is ultimately a function of the viewing distance and the angular resolution of the human eye. For example, Apple claims that the resolution of its Retina Display is sufficiently high that the human eye cannot distinguish individual pixels at a given viewing distance. Apple’s criterion is based on the assumption that the average person with 20/20 vision has an angular resolution of 1 arc minute. When Steve Jobs introduced the iPhone 4, he stated that a resolution of 330 pixels per inch (PPI) was required for a Retina Display on a device held 10–12 in. (roughly 25–30 cm) away (Sydell 2010). Scaling this figure inversely with distance from 10 in. (25.4 cm) to a viewing distance of 80 cm (which is typical of a driver viewing a display on the center console), a display would require a resolution of about 104 PPI.
Applying this to an example of a 10.1″ display with 16:9 aspect ratio, the display resolution must be at least 915×515 pixels so that the human eye cannot resolve pixels at a normal viewing distance. This is therefore the minimum screen resolution for viewing systems such as CMS. Equally, increasing screen resolution beyond this limit will give limited performance improvement, as this screen resolution already matches the angular resolution of the human visual system.
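The arithmetic behind the 104 PPI and 915×515 figures can be reproduced with a short sketch. It scales the quoted reference of 330 PPI at 25.4 cm inversely with viewing distance, and for comparison also computes the density implied directly by a 1 arc minute angular resolution. The function names are illustrative assumptions.

```python
import math


def ppi_scaled(viewing_distance_cm: float, ref_ppi: float = 330.0,
               ref_distance_cm: float = 25.4) -> float:
    """Scale a reference pixel density inversely with viewing distance."""
    return ref_ppi * ref_distance_cm / viewing_distance_cm


def ppi_one_arcmin(viewing_distance_cm: float) -> float:
    """Pixel density at which one pixel subtends 1 arc minute at the eye."""
    pixel_pitch_cm = viewing_distance_cm * math.tan(math.radians(1.0 / 60.0))
    return 2.54 / pixel_pitch_cm  # 2.54 cm per inch


def min_resolution(diagonal_in: float, aspect_w: int, aspect_h: int, ppi: float):
    """Width and height in pixels of a display with the given diagonal and density."""
    diag_units = math.hypot(aspect_w, aspect_h)
    width_in = diagonal_in * aspect_w / diag_units
    height_in = diagonal_in * aspect_h / diag_units
    return width_in * ppi, height_in * ppi


if __name__ == "__main__":
    print(f"scaled from reference: {ppi_scaled(80.0):.1f} PPI")
    print(f"strict 1 arc minute:   {ppi_one_arcmin(80.0):.1f} PPI")
    w, h = min_resolution(10.1, 16, 9, 104.0)
    print(f"10.1 inch 16:9 at 104 PPI: {w:.1f} x {h:.1f} pixels")
```

The two estimates differ by a few PPI because the quoted reference density at 10 in. is slightly coarser than a strict 1 arc minute criterion would require; either way the result is on the order of 100 PPI for a center console display.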
The typical aim of visual camera systems is to display a vehicle’s blind zones to the driver. A vehicle’s blind zones are the areas around the vehicle that cannot be seen directly by the driver by looking forward or by using any of the vehicle’s standard rearview mirrors (internal and external) from the normal sitting position. The sizes of these areas are determined by the size and design of the vehicle and mirrors and will vary according to the car model and manufacturer.
The short-range nature of the vehicle blind zones means that the requirements for camera systems to monitor these areas are (a) the camera must cover a broad lateral area of the vehicle, (b) the range of vision is short (compared to, e.g., longer-range front cameras), and (c) the resolution of the camera should be such that the spatial resolution is highest in the areas of interest. These requirements immediately suggest that a fisheye camera is suitable for such short-range vision systems.
While rear-facing cameras can be used to cover some blind zones, the use of standard lenses has limitations. As shown in Figs. 1 and 2 from Hughes et al. (2009), standard lens camera systems (e.g., 45° field-of-view (FOV) lenses) are unable to fully cover the blind zone of some SUVs. A standard lens camera with FOV of 45° can only cover perhaps 1 m of the SUV rearward blind zone. Figure 5 from Hughes et al. (2009) illustrates how the use of a wide-angle lens camera system (e.g., > 100° vertical FOV lenses) enables the entire SUV rearward blind zone to be covered.
While wide-angle lenses greatly increase the area in blind zones that can be covered, problems arise due to the deviation of wide-angle lens cameras from the rectilinear pinhole camera model, due to geometric distortion effects caused by lens elements.
Because of this distorted representation of the real-world scene on-screen, there is the potential for the driver to not recognize obstacles and vulnerable road users (VRUs). Additionally, the distortion may cause the driver to misjudge distance to objects, due to the nonlinearity of the view presented. Thus, camera calibration and fisheye compensation are important tasks for automotive camera applications. Not only do they make images captured by the camera more visually intuitive to the human observer, they are often also necessary for computer vision tasks that require the extraction of geometric information from a given scene. Hughes et al. (2009) describe in detail how the geometric fisheye effects can be compensated.
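As a rough illustration of what fisheye compensation involves, the sketch below maps a point in a fisheye image to the position it would have under the rectilinear pinhole model. It assumes an idealised equidistant fisheye projection (image radius proportional to the incident angle), which is only one of several fisheye models and is not necessarily the one used in the cited work. The function name and parameters are hypothetical, and a real system would use calibrated lens parameters.

```python
import math


def undistort_point(xd: float, yd: float, cx: float, cy: float, f: float):
    """Map a fisheye pixel to its rectilinear (pinhole) position.

    Assumes the equidistant fisheye model r_d = f * theta and the pinhole
    model r_u = f * tan(theta), with (cx, cy) the distortion centre and f
    the focal length in pixels.
    """
    dx, dy = xd - cx, yd - cy
    r_d = math.hypot(dx, dy)
    if r_d == 0.0:
        return cx, cy  # the centre is unaffected by radial distortion
    theta = r_d / f  # incident angle recovered from the fisheye radius
    if theta >= math.pi / 2.0:
        raise ValueError("ray at or beyond 90 degrees has no pinhole image point")
    r_u = f * math.tan(theta)
    scale = r_u / r_d
    return cx + dx * scale, cy + dy * scale


if __name__ == "__main__":
    # Hypothetical 640x480 image with f = 200 pixels.
    for x in (320.0, 400.0, 480.0, 560.0):
        print((x, 240.0), "->", undistort_point(x, 240.0, 320.0, 240.0, 200.0))
```

Points move radially outward under this mapping, and increasingly so toward the image periphery, which is why a corrected wide-angle image must either be cropped or rendered onto a much larger canvas.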
Using parameters from fisheye camera calibration, the video streams from the four cameras can be undistorted (fisheye corrected), rectified (perspective correction to show the plane of the ground), and merged to create a seamless top-view image of the vehicle’s immediate environment. For a top-view application, an area of 2–4 m from the vehicle body is displayed to the driver.
There is a natural move toward the reuse of existing sensors for tasks other than simple viewing, as this leads to a cost reduction for automotive manufacturers. With this in mind, several detection functions have recently been added using fisheye cameras as the main signal source (Savage et al. 2013).
Many factors influence the overall quality of the signal produced by an automotive camera system and ultimately its usefulness in the automotive context. These factors can be loosely divided into scene-dependent factors (lighting, occlusion) and scene-independent factors, e.g., image capture, compression, and transmission, and previous sections in this chapter have already discussed a number of these. In the context of automotive camera systems, the question naturally arises: how do we define “quality” in automotive applications, and how do we measure it in a meaningful and reproducible way? This section will briefly consider some of the issues around image and video quality and discuss relevant research in the area, particularly in the area of objective image and video quality assessment and how this research field has developed.
The fundamental concepts of image and video quality are well established in the consumer electronics world, particularly in the context of multimedia transmission and storage. With the increasing prevalence of generation, transmission, and display of video with a combination of mobile devices and social media, the volume of high-resolution images and video generated each year continues to grow (Bovik 2013). In the consumer space, quality of an image has been traditionally equated with “perceptual quality” and more particularly with “naturalness,” i.e., how “natural” does an image appear to be or, put another way, how faithfully does it reproduce the real world in the eyes (and mind) of the viewer? The development of compression algorithms for multimedia content in particular has driven many of the developments in defining and measuring perceptual image and video quality (Chiariglione 2012). However, in the automotive space, the expectation of what constitutes good quality is not so straightforward, and there is no single clear definition available (Hertel and Chang 2007; Winterlich et al. 2013). One of the reasons is that in automotive applications, images and video are required for two different purposes: display to the driver (e.g., reverse assist cameras, blind spot monitoring) and advanced driver assistance systems, e.g., pedestrian detection and lane departure warning, where machine vision algorithms are used to automatically detect objects of interest in the scene. These different applications require different scene characteristics for optimal performance, and the notion of what constitutes “good quality” for one does not necessarily equate to the same thing for the other. Even in the case of visual display only, there are often contradictory requirements.
For example, Hertel and Chang (2007) distinguish between the “naturalness” of a scene (which is close to the consumer interpretation) and the “usefulness” of a scene; for example, the usefulness of a scene may be increased through artificially enhancing local contrast and object edges, thus rendering objects easier to see by the driver, yet these operations reduce the “naturalness” of the scene and therefore its “perceptual quality.” While the definition of what constitutes “quality” in the automotive space is still somewhat open, it is instructive to look at the development of methods for quality assessment and consider where these may or may not be applicable in the automotive environment.
The standard method for perceptual quality assessment is subjective testing using human subjects (ITU-T 1999; ITU-R 2012). However, this process is expensive, is time consuming, and often produces inconsistent results. Therefore, there is much interest in the development of automatic image quality assessment (IQA) and video quality assessment (VQA) algorithms that attempt to automatically predict perceptual quality of images and video (e.g., Seshadrinathan and Bovik 2009; Lin and Kuo 2011; Deng et al. 2014). Image quality assessment (IQA) algorithms are often classified into the following categories: (1) “Full-reference” (FR) algorithms require an original or reference image which is assumed to be undistorted and against which the test image is to be compared, using a suitable set of measures or features extracted from the pixel data. (2) “No-reference” (NR) algorithms do not require an explicit reference and only use the distorted test image to determine quality. Such algorithms are especially useful in, e.g., real-time applications where a suitable reference may not be available. (3) “Reduced-reference” (RR) algorithms do not require a copy of the full original image but rather calculate a number of measures from the distorted image and compare them with the same measures extracted from the original image.
One of the most widely used FR algorithms is the structural similarity (SSIM) index and its variations (Wang and Bovik 2002; Wang et al. 2004; Wang and Li 2011). This algorithm is based on the notion that the human visual system has developed in order to focus in particular on perceiving structure in images, and therefore a “good-quality” image is one in which structural information is preserved. The SSIM index extracts three measures from the reference and distorted images, namely, structure, luminance, and contrast. These three measures are then combined to give a single figure representing the level of similarity between the distorted image and the reference image. While FR methods such as the SSIM index achieve good performance in predicting perceptual quality of images, they have limited utility in applications where a reference image is not available, including the automotive environment where typically only a degraded image is available to the driver or a machine vision algorithm; in such cases, an NR (or RR) algorithm may be needed (Hemami and Reibman 2010; Wang and Li 2011). Many NR methods are designed to be distortion specific (Marziliano et al. 2004; Sadaka et al. 2008; Zhu and Milanfar 2009; Liu et al. 2010; Bovik 2013). Specifically in the automotive environment, Winterlich et al. (2013) describe an objective NR metric for perceptual blur, for the case where the images are acquired using wide-angle fisheye lenses, commonly used in automotive systems, which introduce additional radial distortion.
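The way the SSIM index combines its measures can be illustrated with a minimal sketch. For brevity it evaluates the index once over a whole image, given as a flat list of 8-bit gray levels, whereas practical implementations compute it in local windows and average the results. It uses the commonly quoted simplified form in which the contrast and structure terms are merged, with constants C1 = (0.01·L)² and C2 = (0.03·L)². The function name and test data are illustrative assumptions.

```python
def ssim_global(x, y, dynamic_range=255.0):
    """Single-window SSIM between two equally sized lists of gray levels."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Population variances and covariance over the whole image.
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Small constants stabilise the ratios when means or variances are near zero.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    luminance = (2.0 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast_structure = (2.0 * cov + c2) / (var_x + var_y + c2)
    return luminance * contrast_structure


if __name__ == "__main__":
    reference = [10, 50, 90, 130, 170, 210, 250, 30]
    flattened = [sum(reference) / len(reference)] * len(reference)  # structure lost
    print("identical:", ssim_global(reference, reference))
    print("flattened:", ssim_global(reference, flattened))
```

An identical image scores 1, while an image with the same mean brightness but no remaining structure scores close to 0, which reflects the emphasis on structural information described above.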
There is also much interest in the development of distortion-agnostic methods that do not make many assumptions about the nature of the distortion. Many such methods are based on the use of “natural scene statistics” (NSS) (Saad and Bovik 2012a; Bovik 2013). The rationale behind the use of NSS lies in the fact that such natural statistics are often changed by distortion in characteristic ways (Wang and Bovik 2011). NR algorithms often make use of training or other machine learning algorithms that “learn” the relationship between distortion and perception using large databases of opinion scores derived from subjective tests. Among recently developed high-performing NR IQA algorithms are the BRISQUE (Mittal et al. 2012) and NIQE (Mittal et al. 2013); the former is trained on a large database of opinion scores obtained from subjective tests, while the latter attempts to achieve the goal of blind IQA without training on subjective scores.
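One concrete example of a natural scene statistic is the distribution of locally normalised luminance values. The sketch below computes mean-subtracted, contrast-normalised coefficients, the kind of quantity to which NSS-based methods fit statistical models. It is a simplified illustration that uses a small box window with clamped borders rather than the Gaussian weighting typically used in published methods, and the function name, window radius, and stabilising constant are assumptions.

```python
import math


def mscn(image, radius=1, c=1.0):
    """Mean-subtracted, contrast-normalised coefficients of a 2-D gray image.

    `image` is a list of equal-length rows. Each pixel is normalised by the
    mean and standard deviation of the box window of the given radius around
    it (clamped at the borders); `c` avoids division by zero in flat regions.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            window = [
                image[r][k]
                for r in range(max(0, i - radius), min(rows, i + radius + 1))
                for k in range(max(0, j - radius), min(cols, j + radius + 1))
            ]
            mu = sum(window) / len(window)
            sigma = math.sqrt(sum((v - mu) ** 2 for v in window) / len(window))
            out[i][j] = (image[i][j] - mu) / (sigma + c)
    return out


if __name__ == "__main__":
    patch = [[10, 20, 30], [40, 50, 60], [70, 80, 200]]
    for row in mscn(patch):
        print([round(v, 3) for v in row])
```

Because the coefficients are locally normalised, they are unchanged by a uniform brightness offset, so changes in their distribution tend to reflect distortions such as blur, noise, or compression rather than overall exposure.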
Most research to date has focused on algorithms for image quality assessment; however, a number of algorithms for video quality assessment (VQA) have also been developed. Among full-reference VQA algorithms, examples include the algorithms described in Pinson and Wolf (2004), Seshadrinathan et al. (2010), Seshadrinathan and Bovik (2010), and Vu et al. (2011). Some of these algorithms explicitly incorporate temporal analysis, including the calculation of motion vectors, and hence can be computationally demanding (Bovik 2013). As noted earlier, in applications such as automotive where a reference video is not available, there is a need for NR or RR VQA algorithms. One approach to the development of NR VQA algorithms is to extend NR IQA algorithms through adding a temporal dimension, e.g., via information on the differences between frames (Saad and Bovik 2012b).
In addition to the use of images and video for human attention, the second area of application of images and video in automotive systems is in machine vision for driver assistance, including automatic detection of objects of interest in the environment, e.g., pedestrians, other vehicles, lane markings, etc. Such machine vision systems have been developed extensively, to the extent that they are now commonplace in cars; representative surveys of common applications can be found in Gerónimo et al. (2010), Dollár et al. (2012), and Loce et al. (2013). In the context of automotive machine vision applications, particularly those that involve detection of some object of interest, the issue of image and video quality is important because of the need to understand the impact of various degradations on detection performance. Knowledge of these relationships can be used to guide system design and configuration; for example, it could be used to guide the choice of compression ratio in video transmission systems in order to ensure robust detection performance across a wide range of conditions (Hase et al. 2011; Winterlich et al. 2014). As with quality for visual inspection purposes, in driver assistance systems, NR quality assessment algorithms are particularly useful if quality predictions must be made in real time. Winterlich et al. (2014) presented a pedestrian detection system based on the use of blind quality assessment approaches to categorize video frames according to distortion type, combined with a multi-classifier detection framework trained on individual distortions to optimize overall system performance. This work considered both sensor noise, modeled as additive white Gaussian noise (AWGN) of the kind typically introduced at image capture (Schoberl et al. 2012), and JPEG compression.
Future directions in IQA and VQA research (Bovik 2013; Winterlich et al. 2014) include consideration of 3D image quality, greater consideration of color perception, and the use of high-dynamic-range cameras and the implications this may have particularly on NR algorithms, as these are heavily based on scene statistics, which vary with sensor dynamic range.
The dream of ubiquitous autonomous cars has been with us for many years; however, a commercially available fully autonomous vehicle still lies some years in the future and will require significant technological and societal advances before it becomes simply “the car.”
However, until very recently, the technology to implement a fully autonomous vehicle was not available at a cost that could result in mass production of autonomous vehicles for ordinary civilians. Military and aerospace have had highly autonomous systems for many years, but at a cost that would be prohibitively expensive for commercial civilian applications and only in limited environments and situations that are carefully controlled. Civilian transport has seen increasing levels of automation, starting with simple cruise control and moving to more intelligent “adaptive cruise control” (Shaout and Jarrah 1997). While simple speed and distance control can be achieved relatively easily, control of the steering wheel is a much more technically challenging prospect. One of the earliest groups to tackle the problem was that of Alberto Broggi in the University of Parma, Italy (Broggi et al. 1999). With the advent of the Google driverless car (NYT 2010) and other similar announcements from most of the major automotive manufacturers, the race is on to be the first to market with the world’s first truly reliable, safe, and trustworthy fully automated self-driving vehicle. In the meantime, the path to full automation will see many new vehicles coming on the market boasting increased levels of automation, offering features such as motorway driving, self-parking, valet parking, and other smart and helpful functions. The watershed moment of seeing a commercially available vehicle without a steering wheel is coming; it’s just that nobody really knows when that will happen. While the technical challenges are immense, the political, social, and insurance-based challenges of allowing driverless cars on public roads may be equally significant (Anderson et al. 2014).
It is clear that any autonomous vehicle will require high levels of intelligence, high levels of automation, and an ability to accurately, reliably, and comprehensively sense its environment. While a range of sensors is currently used to sense the environment, such as reversing cameras, radar, ultrasonic sensors, and lidar, the primary role of these sensors is driver assistance, and the final decision about the reliability of the information coming from the sensors largely resides with the driver. The current approach is to allow vehicles to take control only in very specific situations, with the driver still able to intervene should anything go wrong.
From the perspective of camera sensors, the utility of the images will also change because the focus will shift from delivering streams of high-quality images to a human viewer to delivering high-quality images for machine learning applications. However, as outlined in the previous section, the definition of quality depends on the specific requirements of the application, so an image that looks good to a human viewer may not be the best representation of a scene for a machine learning algorithm.
The role of the camera will be important, because while other sensors can determine the presence and position of objects in the environment, none can compete with the camera in its ability to classify objects in the environment. One of the shortcomings of conventional visible spectrum cameras is that they require a minimum level of illumination to function, which limits their usability at nighttime or in inclement weather such as heavy rain or dense fog.
For many years, people have dreamed of a future where cars could drive themselves. Considering the amount of money being invested by the automotive industry in sensors, algorithms, and prototype hardware for autonomous vehicles, it is looking increasingly likely that the dream is soon to become reality. However, autonomous vehicles may have a few legislative and legal barriers to cross before they can be unleashed on public roads (particularly where they will be required to interact with human-controlled vehicles), aspects of which have been anticipated by Westhoff (2012) in the German jurisdiction.
Looking to the future, a key trend in automotive systems is the move toward fully autonomous vehicles. All major vehicle manufacturers are investigating and promoting some form of vehicle autonomy. This has started with automated parking, which is already available on the market, and will reach full fruition in future decades with fully autonomous driving, where the driver need only enter their destination. To match this, there is a rapidly growing requirement that vehicles have sufficient intelligence and sensing capability to achieve the situational awareness needed for efficient and safe operation.
Cameras are, and will continue to be, a critical element in such systems. In terms of machine vision and machine learning algorithms, camera systems will be used to add intelligence to autonomous driving systems, in the form of object detection, depth estimation, vehicle localization, and mapping. Equally importantly, and perhaps a little underappreciated by the industry, cameras will also be fundamental to the interface between the driver and the autonomous vehicle. They can provide natural information to the driver about the environment, unlike any other sensor type. In the coming years before full vehicle autonomy, this will be particularly critical for helping the vehicle and the driver handle the transition from automated driving to manual driving. In the context of autonomous vehicles, cameras will enable full and true situational awareness for the vehicle driver.
With this in mind, this chapter has discussed a number of issues relating to the use of cameras in automotive systems. The rationale for the development of automotive camera systems was first discussed, and broad technical and commercial requirements were outlined. Then, a discussion on the deployment of wide-FOV lenses was given. The topic of image and video quality – relatively new in the automotive field – was then considered. Finally, the chapter addressed autonomous vehicles and the role of cameras in the development of such systems.
- Anderson JM, Kalra N, Stanley KD, Sorensen P, Samaras C, Oluwatola OA (2014) Autonomous vehicle technology: a guide for policymakers. RAND Corporation report. http://www.rand.org/pubs/research_reports/RR443-1.html
- Broggi A, Bertozzi M, Fascioli A, Bianco C, Piazzi A (1999) The ARGO autonomous vehicle’s vision and control systems. Int J Intell Control Syst 3(3):409–441
- Deng C, Ma L, Lin W, Ngan KN (eds) (2014) Visual signal quality assessment. Springer International Publishing, Switzerland. ISBN 978-3319103679
- Denny P (2014) Imaging challenges for ADAS systems – the road ahead is long and winding; a tier1 perspective for sensor suppliers. Proceedings of IS Auto 2014. Brussels, Belgium
- Hase T, Hintermaier W, Frey A, Strobel T, Baumgarten U, Steinbach E (2011) Influence of image/video compression on night vision based pedestrian detection in an automotive application. In: Proceedings of IEEE 73rd vehicular technology conference (VTC spring), Budapest, 2011, pp 1–5
- Hertel D, Chang E (2007) Image quality standards in automotive vision applications. In: Proceedings of IEEE intelligent vehicles symposium, pp 404–409
- International Organization for Standardization, ISO16505 (2014) 3rd progress report of ISO/TC22/SC17/WG2 (ISO 16505)
- ITU-R Recommendation BT.500-13 (2012) Methodology for the subjective assessment of the quality of television pictures. International Telecommunication Union
- ITU-T Recommendation P.910 (1999) Subjective video quality assessment methods for multimedia applications. International Telecommunication Union
- Kane S (2011) 2012 family cars with self-parking technology. http://www.thecarconnection.com/news/1067819_2012-family-cars-with-self-parking-technology. Accessed Jan 2015
- National Highway Traffic Safety Administration §571.111 standard no. 111; rearview mirrors (1999)
- National Highway Traffic Safety Administration (2014) Federal motor vehicle safety standards; rear visibility; final rule, 49 CFR part 571
- New York Times. http://www.nytimes.com/2010/10/10/science/10google.html?_r=0. Accessed 20 Jan 2015
- Saad M, Bovik AC (2012b) Blind quality assessment of videos using a model of natural scene statistics and motion coherency. In: Proceedings of annual Asilomar conference signals systems computers, Monterey, Nov 2012, pp 332–336
- Sadaka N, Karam L, Ferzli R, Abousleman G (2008) A no-reference perceptual image sharpness metric based on saliency-weighted foveal pooling. In: Proceedings of IEEE international conference on image processing, San Diego, 2008, pp 369–372
- Savage D, Hughes C, Horgan J, Finn S (2013) Crossing traffic alert and obstacle detection with the Valeo 360Vue® camera system. In: 9th ITS European congress, Dublin, 4–7 June 2013
- Schöberl M, Brückner A, Foessel S, Kaup A (2012) Photometric limits for digital camera systems. J Electron Imaging 21(2):020501-1–020501-3
- Seshadrinathan K, Bovik A (2009) Video quality assessment. In: Bovik AC (ed) The essential guide to video processing. Elsevier, New York
- Sydell L (2010) Live blog: Steve Jobs Introduces the iPhone 4. http://www.npr.org/blogs/alltechconsidered/2010/06/07/127530049/live-blogging-apple-s-developers-conference. Accessed 20 Jan 2015
- United Nations Economic and Social Council. Proposal for the 04 series of amendments to regulation no. 46 (devices for indirect vision). ECE/TRANS/WP.29/2012/87
- US Government, Cameron Gulbransen Kids Transportation Safety Act of 2007, (Public Law 110–189, 122 Stat. 639–642), § 4 (2007)
- Valeo. http://www.valeo.com/medias/upload/2014/04/31542/valeo-innovative-back-over-protection-system-wins.pdf. Accessed Jan 2015
- Vu PV, Vu CT, Chandler DM (2011) A spatiotemporal most-apparent-distortion model for video quality assessment. In: Proceedings of IEEE international conference on image processing, Brussels, 2011, pp 2505–2508
- Westhoff D (2012) Fahrzeugautomatisierung im rechtlichen Kontext. VDI-Berichte Nr. 2166, p 263, 28. VDI/VW-Gemeinschaftstagung Fahrerassistenz und Integrierte Sicherheit
- Winterlich A, Zlokolica V, Denny P, Kilmartin L, Glavin M, Jones E (2013) A saliency weighted no-reference perceptual blur metric for the automotive environment. In: Proceedings of international workshop on quality of multimedia experience (QoMEX), Klagenfurt am Wörthersee, Austria, 2013, pp 206–211
- Winterlich A, Denny P, Kilmartin L, Glavin M, Jones E (2014) Performance optimization for pedestrian detection on degraded video using natural scene statistics. SPIE J Electron Imaging 23(6):061114-1
- Zhu X, Milanfar P (2009) A no-reference sharpness metric sensitive to blur and noise. In: International workshop on quality of multimedia experience (QoMEX), San Diego, 2009, pp 64–69