The use of computerized systems in cars is not new. Safety and driver-assistance features began appearing in the 1970s with the anti-lock braking system (ABS), followed by the electronic stability program (ESP) and the standardization of on-board diagnostics (OBD) in the 1990s. These systems were already able to collect and process data to check the performance of various car systems (e.g. emission control; early warning of malfunctions by way of the dashboard “Check Engine” light). Since then, numerous additional in-car technologies such as event data recorders (EDRs) and the OBD-II standard have been developed and are now fitted almost as standard. Connectivity often complements these existing in-car technologies and further expands the data collection capabilities of a car [1, 2].
2.1 What Is Connectivity of Smart Cars?
The connectivity of smart cars refers to their ability to exchange information between the car and its surroundings. A distinction can be drawn between data collected and stored inside the car, which are accessible only through a physical connection, and data that are transmitted. The interactions range from car to car, car to infrastructure and car to devices up to car to service providers and manufacturers, usually referred to as car2x connectivity. In the car-to-car case, connectivity describes the digital exchange between cars: the idea is to send relevant traffic or road information to other cars nearby. The car can also communicate with the infrastructure and receive information about road conditions (e.g. construction sites) or other external objects (e.g. traffic lights or traffic signs). Furthermore, cars can set up a wireless connection to other devices, for instance to help the driver navigate. Finally, the car can connect with service providers and manufacturers to send error reports or receive a reminder for the next check-up [2,3,4]. Consequently, car2x connectivity produces a vast amount of data and exacerbates the issue of data collection.
2.2 Which Data Are Collected?
Typically, three groups of collected data can be differentiated: data of the car, data of the car occupants and data about the environment of the car.
Every car has several identifiers which are transferred with every communication of the respective device. These can include the vehicle identification number (VIN), mobile device identifiers, SIM card identifiers, media access control (MAC) addresses, Bluetooth identifiers and radio frequency identification (RFID) tags. So-called telematics data of a car include information on its location and changes of location (e.g. geo tracking data, route, speed). Telematics can be compared to a black box that records how, when and where a person drives. Cars are also equipped with many sensors which record the operational state and functioning of individual components during the drive (inter alia the engine, gearbox, brakes, tire pressure, exhaust fumes). In addition, cars can be equipped with event data recorders (EDRs). They collect data shortly before, during and after a car accident and, for instance, store the direction of movement, longitudinal acceleration or the status of the brakes [1, 5, 6].
The car also collects a multitude of information relating directly to its occupants. Using the car’s connected components, for example, requires registration with the provider (car manufacturer) and the creation of a user account. For registration, data such as name, address or date of purchase can be required. Additionally, internal sensors might use biometric detection systems to capture physical or biological characteristics in order to identify the driver. The car can store personalized information from voice and data communication, such as text messages, and remember habits of the driver, for instance music choice or seat and mirror position. The car might even be able to assess the physical fitness of the driver by analysing heartbeat, breathing or head and eye movements [1, 6].
The car also collects information about its environment, be it the physical surroundings or the human environment. This includes information about upcoming obstacles, blind spots and traffic signs, or even the social network of the driver: if drivers connect their phone to the car, for example, the car might gain access to their address book. Smart cars can also function as Wi-Fi hotspots and thereby collect identification and usage data of other devices and of their users and owners [1, 2].
Although many of the collected data are not directly linked to a person, in many cases personal details can be revealed through other information (e.g. workplace information through geo tracking data). From a legal point of view, personal data are, according to Article 4(1) GDPR, any information relating to an identified or identifiable natural person. Under this legal definition, a person is identifiable when he or she can be identified by reference to an identifier such as a name, an identification number, location data or one or more factors specific to, inter alia, the physical, genetic, economic or cultural identity of the natural person in question. As smart cars use several identifiers for communication, these identifiers consequently constitute personal data [1, 7]. Furthermore, the combined analysis of several attributes, which by themselves do not make a person identifiable, can turn the information in question into personal data. This is especially true where location data, explicitly referenced in Article 4(1) GDPR, are used.
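The point that individually harmless attributes can single out a person when combined can be illustrated with a small sketch. It borrows the notion of an anonymity set from the k-anonymity literature (a framing not taken from the sources cited here) and runs on fabricated toy records with hypothetical attribute names (`car_model`, `home_area`, `work_area`). It only counts how many records share a given combination of attributes; it says nothing about how Article 4(1) GDPR is applied in law.

```python
from collections import Counter

# Fabricated toy records (hypothetical attributes): each single value
# below is shared by at least two drivers.
trips = [
    {"car_model": "Model A", "home_area": "North", "work_area": "Centre"},
    {"car_model": "Model A", "home_area": "North", "work_area": "Harbour"},
    {"car_model": "Model A", "home_area": "South", "work_area": "Centre"},
    {"car_model": "Model B", "home_area": "North", "work_area": "Centre"},
    {"car_model": "Model B", "home_area": "South", "work_area": "Harbour"},
]


def anonymity_set_sizes(records, attributes):
    """Count how many records share each combination of the given attributes."""
    return Counter(tuple(r[a] for a in attributes) for r in records)


def unique_records(records, attributes):
    """Return the attribute combinations that match exactly one record (k = 1)."""
    sizes = anonymity_set_sizes(records, attributes)
    return [combo for combo, count in sizes.items() if count == 1]


if __name__ == "__main__":
    # Each attribute on its own never isolates a single driver ...
    for attrs in (["car_model"], ["home_area"], ["work_area"]):
        print(attrs, "->", len(unique_records(trips, attrs)), "unique")
    # ... but the combination of all three does.
    combined = ["car_model", "home_area", "work_area"]
    print(combined, "->", len(unique_records(trips, combined)), "unique")
```

In this toy data no single attribute isolates a driver, yet every combination of all three attributes matches exactly one record, which mirrors the way geo tracking data that reveal home and work areas can make an otherwise anonymous trip record identifiable.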
The types of data collected and the purposes for which they are used gain new momentum in the context of modern data collection and processing technologies that can be subsumed under the term big data, which is the topic of the next section.
2.3 What Is Big Data?
Big data is a controversial buzzword which is used by a variety of stakeholders (e.g. private sector, public sector, science, and press and media) to characterize modern tendencies of data collection and processing in the networked, digitized, information-driven world. It is not a precise scientific concept, but rather a highly contested idea that differs depending on the context.
Although there is no uniform definition of big data, many definitions revolve around three major aspects of the phenomenon: volume, variety and velocity. Volume refers to the vast amounts of data that are being generated and accumulated, variety to the different types of data and data sources that are brought together, and velocity to the capacity for real-time analysis based on elaborate algorithms, machine learning and statistical correlations. Over time, the three V’s have been expanded to cover further important aspects, in particular veracity and value. The former refers to the correctness and accuracy of information, the latter to the societal value that big data analyses may or may not offer [9,10,11].
The data included in big data analyses might comprise any type of structured or unstructured (text, image, audio or video) data. These data might be collected from public datasets (e.g. administrative data and statistics about populations, geography, economic indicators, education etc.), from businesses, web pages, newspapers, emails, online search indexes and social media, or from any kind of sensor (mobile sensors, such as those carried on the body or mounted on drones, as well as stationary sensors such as CCTV cameras or Wi-Fi/Bluetooth beacons).
Big data analyses are used for several purposes that can be grouped under the terms descriptive statistics and inductive statistics. The former relates to analyses of data sets with high information density that aim to measure phenomena or detect trends. The latter relates to the analysis of large data sets with low information density in order to reveal relationships and dependencies and to predict outcomes and behaviour. However, one important characteristic of big data that spans all areas of application is that its analyses are not limited to specific purposes. Instead, the continuous analysis of data is supposed to generate new purposes for which the existing data can be used.
As a result, many observers agree that big data is a disruptive technology with possible implications for all economic and policy areas (transport, energy, education, security, health, research, taxation, etc.) and that it represents a particularly weighty shift that will affect society as a whole [12, 14, 15].
2.4 Who Profits from These Data?
Regarding smart cars, many promises are made to the public about the potential benefits of big data.
New technologies in cars promise drivers advances in safety and convenience. Through intensive car2x communication and background analysis of the collected data, accident prevention, better traffic management (better traffic light control, avoidance of traffic jams and so on), notifications of discounts (special deals at a nearby petrol station or restaurant, etc.) and many other benefits are promised. These are meant not only to make travelling safer, but also to return monetary benefits to car owners and to provide more comfort while being less damaging to the environment [16, 17].
Big data opens up new prospects of control for the state. Courts, financial authorities and law enforcement agencies could use the generated data for purposes of criminal prosecution, hazard prevention or the collection of public revenue. Much like many other big data applications, smart mobility concepts are framed around the promised societal surplus. Such promises include that traffic controls enhanced by big data analytics will be more economical, ecological, efficient, cost-effective, comfortable and secure. This may be achieved by an array of sensors that are spread all over a city and allow the continuous collection of various data [16, 17]. Meanwhile, many cities around the world have already introduced smart city concepts to innovate and enhance city life through lower costs and less environmental pollution. These concepts, however, vary in scope and depth, ranging from pioneering cities such as Stockholm and Amsterdam, which rely on individual agencies or research bodies, to comprehensively networked and highly centralized smart cities such as Singapore [18, 19]. For many years, the rising number of cars has caused problems for the maintenance and expansion of city infrastructure, especially for automobile traffic. In times of strict budgets, municipalities and government agencies welcome these new opportunities as a means of achieving more cost-effective and ecologically sound urban infrastructure and land-use planning.
The interrelation of big data applications and smart cars needs to be understood in the broader context of digitized, networked, sensor-laden environments. Therefore, the development of smart car services should not be understood only in the isolated context of catchwords such as smart mobility and smart traffic controls. Rather, the whole environment, including all its artefacts such as infrastructures, buildings and inhabitants, should be regarded as both a provider of data and a user of data-driven analyses. The main interest of industry lies in monetizing the data generated in such environments, either to improve existing business or to develop new business models. Manufacturers and garages can use the car’s diagnostic and performance information to improve their products or develop new business models (e.g. customer relationship management, marketing and after-sales services). The use of these data also opens up new business fields such as traffic information, fuel price databases, driver apps or hotel booking systems. Service providers might offer real-time navigation or maintenance services based on telematics. The advertising industry can also profit from the vast amounts of data and initiate personalized advertising. Insurance companies may offer their customers personalized insurance rates based on their tracked individual driving behaviour [2, 20,21,22].
2.5 Potential Risks
The generated data offer a variety of information about the users and are therefore open to misuse. These data are collected inter alia in the interest of car manufacturers, suppliers, garages, insurers, courts, financial authorities, law enforcement agencies and municipalities. Interfaces transfer the data out of the connected car unnoticed; users either cannot prevent this or are not aware of it. Every car will leave a digital trace from which detailed profiles of the movements, behaviour and personality of the driver, passengers and any other person within range of the sensors can be deduced. This creates potential for surveillance, and unauthorized persons might be able to gain access to the car by exploiting security vulnerabilities. Furthermore, companies might use these data for insurance or credit decisions or to reject customers’ warranty or guarantee claims [6, 22, 23].
However, these characteristics do not apply only to smart cars; rather, they can be seen as an illustration of the potential risks of big data in the age of the internet of things. The ability of smart devices to connect and the resulting system of systems (for example smart homes or smart cities) offer many opportunities to collect personal data and to use them for further purposes. And while data protection is still predominantly considered an individual right, proponents of big data analyses often frame their initiatives in terms of the societal benefits that big data promises in a variety of sectors (aside from traffic management, a special focus is on the health care sector) [15, 24].
Regardless of whether personal data are included in the underlying datasets [which may or may not be the case, cf. 25], the results of any big data analysis might very well affect certain individuals as well as groups or even society at large. The Article 29 Working Party draws particular attention to the issues of insufficient data security, loss of transparency for users, inaccurate, discriminatory or otherwise illegitimate analysis results, and increased possibilities of government surveillance. Group discrimination along racial lines, for example, unlike the overt and now illegal racial profiling practices of past decades, might simply result from decisions (for example, in credit scoring or risk assessment) based on algorithms that evaluate the data in a biased and inaccurate way [27, 28].
When the legislative procedure for the GDPR officially started in January 2012, critics of big data hoped it would provide a legal solution. The following section offers some insight into whether the GDPR has achieved this and how it seeks to protect individual rights in the digital age.