
1 Introduction

The company Amazon advertises that its voice assistant Alexa “brings hands-free living into your home and can help you stay entertained and connected”.Footnote 1 Voice assistants of this type can perform a variety of functions, from small tasks such as launching an app or playing music to organising a smart home. In such smart homes, a number of different domestic functions, such as lighting, heating and various household appliances, can be controlled by means of a digital assistant, in particular via smartphone and the respective apps.Footnote 2 Voice assistants can also create shopping lists on demand, keep a calendar of appointments and provide reminders for them.Footnote 3 For children in particular, the devices offer opportunities for playing but also for learning.Footnote 4 Market statistics suggest that between 2022 and 2027 the volume of the global smart speaker market will continue to grow: in 2022, the market volume was around 171 million units, and by 2027 this number is expected to reach almost 309 million.Footnote 5 Although voice assistants like Alexa, Siri or the Google Assistant have become familiar tools in everyday life, their use has given rise to a number of questions and problems in the context of the right to privacy that have not yet been fully addressed. Artificial intelligence software, especially voice-controlled and deep learning software, is becoming more and more common in our daily lives, and upcoming generations of lawyers will have to deal with the specific problems that these systems bring with them. The aim of this chapter is to identify and legally analyse the problems that arise in the context of family dynamics when smart speakers are used, especially in the area of data protection. This also shines a necessary spotlight on an area of legal education that tends to be easily overlooked or considered unimportant: children’s rights.

The contribution aspires to give a first overview of the topic; its aim is to provide a starting point for further discussion.

2 A Brief Introduction to the Functioning of Voice Assistants

Voice assistants are software systems that allow users to interact with computers using spoken language.Footnote 6 As such, they belong to the broader class of virtual assistance systems, of which the most popular commercial examples, such as Apple's Siri, Amazon's Alexa and Google's Assistant, are widely known and used in private households.Footnote 7 They support a wide range of tasks of varying complexity, from simple ones such as setting a timer or an alarm or using a calendar, to more complex ones such as reading or writing text messages or controlling third-party apps (e.g. streaming services), to controlling connected smart home devices (e.g. light switches and thermostats).

Although these systems are widespread and common, it is difficult to obtain precise information about their exact software architecture.Footnote 8 However, some authors introduce generalized models of an Intelligent Personal Assistant,Footnote 9 demonstrating how such systems must generally be structured to meet the requirements of listening, understanding, and responding meaningfully. According to the model shown in Fig. 1, the user communicates with the system via an interface such as a smartphone or smart speaker, which records what the user said and channels the assistant’s response. Between these two steps, the assistant leverages advanced data processing methods to process the input and provide a meaningful response. Automatic Speech Recognition (ASR) is used in the first step to translate the recorded audio data into text form. The data can now be read by the system, and the meanings of words and sentences are extracted by Natural Language Understanding (NLU), resulting in a semantic representation of the input. Depending on the completeness and feasibility of the interpretation, a dialog manager or an action selector formulates a response based on the context of the previous interaction and an available knowledge base. If the exact meaning of the user’s statement cannot be identified, is incomplete, or is not feasible, the dialog manager initiates further interaction with the user, for example in the form of an error message or a query to clarify the statement. If it is an executable command, the action selector can provide a response or even directly access connected peripherals or apps such as smart light switches or the calendar function. If a semantic answer is selected, it must follow the reverse path of the input and be translated by speech synthesis into a statement understandable to the user.

Fig. 1 Voice assistant schematic model, based on concepts referred to in Fn. 15. (Flow diagram: the user speaks to a user interface, which forwards the input to the voice service cloud; there it passes through automatic speech recognition, natural language processing and semantic interpretation, leading to automated hardware, external apps and a knowledge base, and finally to the best outcome.)
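As a rough illustration of this processing chain, the pipeline can be sketched in a few lines of Python. All function names and the keyword-based “understanding” below are hypothetical simplifications; real assistants use large trained models rather than string matching:

```python
# Toy sketch of the ASR -> NLU -> dialog manager / action selector -> TTS chain.
# Audio is simulated as text; every component is a deliberate simplification.

def asr(audio: str) -> str:
    """Automatic Speech Recognition stub: normalizes the 'recorded' input."""
    return audio.lower().strip()

def nlu(text: str) -> dict:
    """Natural Language Understanding stub: extracts a crude semantic intent."""
    if "timer" in text:
        return {"intent": "set_timer", "complete": "minute" in text}
    if "light" in text:
        return {"intent": "lights_on", "complete": True}
    return {"intent": None, "complete": False}

def act(intent: str) -> str:
    """Action selection: here only canned confirmations, no real peripherals."""
    return {"set_timer": "Timer set.", "lights_on": "Lights switched on."}[intent]

def dialog_manager(semantics: dict) -> str:
    """Act on complete commands, otherwise report an error or ask back."""
    if semantics["intent"] is None:
        return "Sorry, I did not understand that."
    if not semantics["complete"]:
        return "For how many minutes should I set the timer?"
    return act(semantics["intent"])

def tts(response: str) -> str:
    """Speech synthesis stub: would convert the text back into audio."""
    return f"[spoken] {response}"

def assistant(audio: str) -> str:
    return tts(dialog_manager(nlu(asr(audio))))
```

For example, `assistant("Set a timer")` triggers the clarifying query described above, because the NLU stub marks the intent as incomplete.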

It can be seen that most of the required computing power is not provided by the device itself but is handled by a cloud service such as the Amazon Voice Services (AVS).Footnote 10 On the one hand, this enables the production of small, compact and inexpensive end devices. On the other hand, it requires a permanent Internet connection via which the cloud service continuously exchanges data with the end device, at least if the device is to be more than just a speaker. A difficult task is therefore the recognition of the so-called “wake words” (“Hey Google”, “Siri”, “Alexa”, etc.), which has to be carried out on the end devices with their low computing power.Footnote 11 Only when this ‘catch phrase’ has been recognized is the voice assistant supposed to be activated and to start transmitting data to and communicating with the Voice Service Cloud.
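The gating logic that this wake-word design implies can be sketched as follows; this is an illustrative simplification (real devices run a small acoustic model over an audio buffer, not string matching, and the one-utterance re-arm rule is an invented assumption):

```python
# Sketch of on-device wake-word gating: only audio following a recognized
# wake word leaves the device; everything else is discarded locally.

WAKE_WORDS = ("alexa", "hey google", "siri")  # illustrative list

def gate(audio_stream):
    """Yield only the audio chunks that follow a recognized wake word."""
    armed = False
    for chunk in audio_stream:
        if armed:
            yield chunk   # this is what would be sent to the voice service cloud
            armed = False  # re-arm after one utterance (simplifying assumption)
        elif any(w in chunk.lower() for w in WAKE_WORDS):
            armed = True   # wake word detected: start transmitting

stream = ["private dinner talk", "Alexa", "set a timer", "more private talk"]
transmitted = list(gate(stream))  # only the utterance after the wake word
```

The privacy problem discussed later in this chapter arises precisely when the detection step (`any(w in chunk...)` here) fires on similar-sounding words and the gate opens unintentionally.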

Even though each developer uses different approaches to develop their voice assistant, resulting in systems that vary in structure, architecture, and performance, artificial intelligence and machine learning methods are used in all of the above-mentioned steps.Footnote 12 This also means that the individual components of a voice assistant are data-driven models whose performance depends not only on their internal structure, but also on being trained with a large amount of data in order to function.Footnote 13 The more complex the task of the machine learning model, the more attention must be paid to the data with which the model is trained; variations that are to be expected in the application are already taken into account here.Footnote 14 For ASR, for example, these can be, without claiming to be exhaustive, different speech styles, speeds and colourings, speakers, voice pitches, background noise and much more.

Training in this case basically means that the model is initially fed with data and thus adapts to the given dataset. This enables the model to respond to user input and compare it to already known patterns. How exactly the training proceeds depends on the chosen methods, which can be roughly divided into two broad groups: supervised and unsupervised learning. In supervised learning, input and output data are known and the model forms a rule to map the input data to the output data.Footnote 15 If the outputs associated with the input values are not known, only unsupervised learning methods can be considered. However, these methods cannot perform specific recognition; rather, they only find similarities in the data and associate them with each other.Footnote 16
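The contrast between the two regimes can be shown with a deliberately tiny sketch. The word-vote “model” and the overlap-based grouping below are illustrative inventions, not techniques used by any real assistant:

```python
# Supervised: labelled pairs (utterance, intent) are known; the "model"
# learns a word -> intent mapping from them and applies it to new input.
labelled = [("play some music", "music"), ("play a song", "music"),
            ("switch on the light", "light"), ("lights off please", "light")]

model = {}
for utterance, intent in labelled:
    for word in utterance.split():
        model.setdefault(word, intent)  # first label seen for each word wins

def predict(utterance):
    """Map new input to a known output label by majority vote."""
    votes = [model[w] for w in utterance.split() if w in model]
    return max(set(votes), key=votes.count) if votes else None

# Unsupervised: no labels are available; we can only group utterances that
# are similar to each other (here: sharing at least one word with a group).
def cluster(utterances):
    groups = []
    for u in utterances:
        for g in groups:
            if set(u.split()) & set(g[0].split()):
                g.append(u)
                break
        else:
            groups.append([u])
    return groups
```

Note that `cluster` can only say “these inputs belong together”; unlike `predict`, it cannot name what the group means, which mirrors the limitation described above.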

Recently, many methods of language processing have been replaced by artificial neural networks. These are modelled on the biological structure of the brain and consist of interconnected nodes called neurons.Footnote 17 Several of these nodes together form a layer, and several layers form the net. In the simplest case, the feed-forward network, the layers have a logical order and the neurons of one layer are only connected with those of the neighbouring layers.Footnote 18 Since speech processing also depends on context, i.e. on the previous inputs into the network, special networks are used for this purpose: recurrent networks with a memory function. This memory function is realized by feedback within the neural network.Footnote 19 Such neural networks are trained by comparing the input and the corresponding output and adjusting the weights accordingly.
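The compare-and-adjust training step can be made concrete with the smallest possible example: a single artificial neuron (a perceptron) learning the logical AND function. Real speech networks have millions of weights across many layers, but the principle of comparing the produced output with the expected one and nudging the weights is the same in spirit:

```python
# A single neuron trained on the logical AND function (minimal example).

def train(samples, epochs=20, lr=0.1):
    """Perceptron learning: compare output with target, adjust weights."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            out = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - out      # compare the output with the target...
            w[0] += lr * err * x1   # ...and adjust each weight in proportion
            w[1] += lr * err * x2
            b += lr * err
    return w, b

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train(AND)

def neuron(x1, x2):
    """The trained neuron: fires (1) only when both inputs are active."""
    return 1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0
```

A recurrent network differs only in that part of the neuron's output is fed back in as an additional input on the next step, which gives the net its memory of previous inputs.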

Another point to address is the knowledge base, which stores a wide range of information, from ready-made answers to common questions based on “if-then rules”,Footnote 20 to user profiles and information.Footnote 21 In this field, a shift can be observed from a strictly static knowledge base with a predefined set of rules towards a dynamic one that adapts to the user's previous interactions.Footnote 22 Machine learning algorithms can be used on these knowledge bases to detect users with similar queries and to cluster whole groups in order to adapt the assistant even better to the behaviour of the user. Common to all these systems is their ability to store a vast amount of data about the user's behaviour when they are used by consumers, in order to enable the software to learn from previously generated data for future interactions.
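A minimal sketch of such a knowledge base, assuming a hypothetical design that combines static if-then rules with a history-based dynamic part, might look like this:

```python
# Hypothetical knowledge base: static "if-then" rules plus a dynamic part
# that records every interaction and adapts to repeated queries.

STATIC_RULES = {
    "what is your name": "I am your assistant.",
    "hello": "Hello! How can I help?",
}

class KnowledgeBase:
    def __init__(self):
        self.history = []  # user profile: grows with every single interaction

    def answer(self, query):
        self.history.append(query)           # everything the user says is kept
        if query in STATIC_RULES:            # predefined if-then rule fires
            return STATIC_RULES[query]
        repeats = self.history.count(query)
        if repeats > 1:                      # dynamic adaptation to behaviour
            return f"You asked this {repeats} times; saving it as a favourite."
        return "Let me look that up."
```

The point relevant for the legal analysis that follows is visible in `self.history`: even this toy design accumulates a complete record of the user's utterances as a side effect of answering them.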

3 The Legal Framework in the Context of the Use of Voice Assistants

This short explanation of the functioning of voice assistants paves the way for the following part, in which the requirements that the right to privacy sets for the familial use of smart speakers as an interface for the voice assistant will be analysed.

3.1 The Use of a Smart Speaker and the Right to Privacy as a Human Right

The right to privacy can be found in international,Footnote 23 as well as regional human rights treaties,Footnote 24 in the Universal Declaration of Human Rights,Footnote 25 and in national constitutions.Footnote 26 The right serves as a kind of catch-all basic right and protects, among other things, against interference with the private sphere. In the context of this contribution the right to privacy is of special interest, as it encompasses the right to informational self-determination.Footnote 27 This includes the right to decide upon one's own image and one’s spoken word, in fact everything that is necessary for self-presentation in public.Footnote 28 The right also includes the determination of data and information about one's own person on the internet, with the result that the right to privacy represents the basis for data protection under ordinary statutory law.Footnote 29

The use of smart speakers can result in interferences with the right to privacy,Footnote 30 particularly regarding the protection of personal data as an important aspect of that right.Footnote 31 It is therefore instructive to discover where the data and information given to the voice assistant via the smart speaker end up. Data protection law, and in particular the GDPR, may for example prevent or restrict the use of servers in non-EU third states.Footnote 32

The functioning of a smart speaker, as explained above, furthermore “depends not only on [its] internal structure, but also on being trained with a large amount of data in order to function”.

Of juridical interest is therefore not only where the data goes, but also when and what is recorded at all. A smart speaker that is supposed to hear the catch phrase will have to listen in all the time.Footnote 33 Of legal interest, for this reason, is the extent to which the above statement is actually true, namely that “[o]nly when this word has been recognized the voice assistant is supposed to be activated and start transmitting data to and communicating with the Voice Service Cloud.” In this context it has been shown that smart speakers such as the Google Assistant react not only to “Ok Google”,Footnote 34 but also to similar-sounding words or sentences. As a consequence, conversations and everyday situations might be recorded and forwarded to the respective provider even though this was never intended.Footnote 35 Such malfunctions lead to major interferences with the right to privacy, and the individual has a right against the provider to expect that these recordings are deleted.Footnote 36 This leads to the question whether, and to what extent, the right to privacy itself might create obligations of the customers towards private individuals who live in or visit the household where the device is used, but who are not party to the contract with the provider. This chapter especially highlights the situation of children who, as minors, need special protection, which may include protection against their parents.

Although there is case law at both national and EU level that assumes the direct applicability of the principle of equal treatment/prohibition of discrimination,Footnote 37 this case law cannot simply be extended to the right to privacy. However, the lack of a direct binding effect of the right to privacy for private individuals (in our case the parents) does not exclude a binding effect overall. As can be seen in the jurisprudence of the ECtHR,Footnote 38 fundamental rights impose an obligation on states to ensure that the respective right, here the right to privacy, is also observed between private individuals.Footnote 39 States can fulfil such a positive obligation by introducing corresponding legislation regulating the private relationship.Footnote 40

The question of the applicability of the right to privacy within a family adds another level to that discussion and brings with it special problems that are not merely legal: while national courts have to some degree already had to consider children’s rights in relation to smart speakers,Footnote 41 cases dealing with the general permissibility of the use of smart speakers and any associated parental care obligations have not yet made it to the courts.

This is hardly surprising, because children do not easily sue their parents independently, which is why disputes about the rights of the child usually find their way into proceedings before national courts dealing with the divorce of the parents or related custody disputes. At present, there are no decisions in which a child has taken legal action against their own parents on the basis of the parents’ digital behaviour (be it through the use of smart speakers, or in the context of sharenting, i.e. when parents share pictures and videos of their children on the internet and the relevant platforms).Footnote 42

Legislation implementing the protective obligations towards children in relation to their parents in the context of the use of smart speakers must take the following into account: children are still developing; the child’s right to privacy therefore includes the protection and consideration of these evolving capacities when regulating topics concerning children.Footnote 43 This protection of their personal development should not only be enforced against the behaviour of strangers (in our case the enterprises that provide smart speakers) but also where parental behaviour threatens to harm the child’s development.Footnote 44

3.2 Concretisation of the Right to Privacy in Legislation Relating to Children

The following section will look at how and to what extent the aforementioned aspects have been implemented in legislation such as the GDPR (Sect. 3.2.1) and the specific legislation governing the online sector, in particular the AI Act (Sect. 3.3).

3.2.1 Children, Their Parents, Smart Speakers and the GDPR

The GDPR refers to children in its recitals,Footnote 45 as well as in some of its provisions,Footnote 46 and makes corresponding specifications for the use and processing of their data. It regulates the capacity of children to give consent, which is assumed from the age of 16,Footnote 47 as well as the special consideration of the interests of the child within the framework of the balancing of interests in data processing, but also the obligation “to provide [information regarding data processing and the associated rights] in a precise, transparent, comprehensible and easily accessible form in clear and simple language”.Footnote 48 In order to give children as carefree a start in adult life as possible, the GDPR grants them their own (even though insufficient as it stands)Footnote 49 right to be forgotten in Article 17.

The GDPR envisages that data processing is generally permissible if the holder of the right has given their consent. For children, therefore, data processing is permissible with the consent of the parents (as their legal guardians), provided the child cannot yet give consent themselves. The same principles can be applied to the use of smart speakers.Footnote 50 However, cases remain problematic in which, for example, the smart speaker records by mistake and/or these recordings are not deleted. Here it has been shown that, depending on the provider, it is difficult to enforce the right to deletion provided for in the GDPR.Footnote 51

Regulating parents’ use of smart speakers is, however, the wrong approach with regard to data protection for their children. Although parents are partly responsible for violations of data protection because they use the product, the actual infringement itself comes from third parties (in this case the provider of the voice assistant). It is easier to regulate such private third parties, as they cannot rely on a comparably strong right such as the parental right to raise their children. Regulating parents with regard to the use of a smart speaker in connection with their children would to a certain extent confuse perpetrator and victim. For the regulation of providers, by contrast, the GDPR is decisive, and more effective implementation and enforcement is necessary to allow a more thorough protection of children’s right to privacy.

3.3 Specific Legislation Governing the Online Sector

In contrast to the GDPR, which addresses fundamental data protection concerns, the legislation governing the online sector specifically addresses various forms of artificial intelligence and the challenges that online service providers present, as well as the risks associated with them.Footnote 52

In this context the EU Commission's proposal for an Artificial Intelligence Regulation plays a key role. Artificial intelligence is one of the core elements of smart speakers. With the further development of so-called deep learning AI in connection with smart speakers, there are also new challenges for data protection, which the EU is trying to address with its new regulation.Footnote 53 The Regulation aims to address risks of specific uses of AI. “In doing so, the AI Regulation will make sure that Europeans can trust the AI they are using.”Footnote 54 In particular, with the AI Regulation the European Union wants to ensure that AI systems placed on the European single market are safe to use, both in terms of existing legislation in the area of fundamental rights and in relation to the values of the European Union. In addition, the legal act aims to contribute to the more effective implementation of existing legislation relevant to fundamental rights in the area of artificial intelligence and to ensure that security standards are established and complied with in the use of these systems.Footnote 55 In recital number 15, the Commission recognises the many benefits that AI can bring, but at the same time it focuses on the rights of private individuals, especially children, who could be endangered by the use of AI and whose fundamental rights should be better protected by the regulation. It states that the “technology can also be misused and provide novel and powerful tools for manipulative, exploitative and social control practices”.Footnote 56 In a risk-based approach, the Regulation distinguishes three types of AI use to be regulated: (1) “prohibited practices in the field of artificial intelligence”, (2) “high-risk AI systems” and (3) “other AI systems”. First, Art. 5 of the Regulation prohibits certain practices that entail an unacceptable risk.
With reference to the parent–child situation, this specifically includes systems that deliberately “exploit the vulnerability of a particular group of persons on account of their age”.Footnote 57 Art. 6 of the Regulation, which defines so-called high-risk systems, also applies in the family situation, as electronic toys in particular fall into this category. Other AI systems are only marginally covered by the Regulation, which merely requires certain transparency rules (Art. 52 of the Regulation) and private codes of conduct (Art. 69 of the Regulation) when they are used.Footnote 58 Smart speakers can fall into both the category of unacceptable-risk AI and the category of high-risk AI, since in the case of the Apple and Amazon systems in particular, the respective naming of the AI (Apple: “Siri” and Amazon: “Alexa”) can create the feeling, especially in children, that they are interacting with a real person.Footnote 59 This fact can be used by the respective companies for manipulation for advertising and consumption purposes. Amazon in particular states that the information generated through the use of smart speakers is used to optimise the offer for the respective user and to increase the profit of the platform.Footnote 60 In order to prevent providers from locating their services in third countries to circumvent the scope of the Regulation, the European legislator has defined the geographic scope of application as broadly as possible, so that all end devices located within the EU are covered. This is especially important for the key players in this field, such as Amazon, Apple and Google, which are all US companies.

In its draft, the Commission aims for a horizontal approach between the people whose privacy is to be protected and the “providers” and “users”, who are to take the necessary measures to protect privacy rights when developing or using AI in a professional context.Footnote 61 The Regulation assumes that those affected are structurally inferior to providers and users. It therefore specifically does not cover the situation of “personal use” of smart speakers and the possible consequences that such use could entail. The Regulation places the responsibility for AI systems solely on the professional service providers. It is all the more astonishing that it only provides for possibilities of action for the EU institutions themselves and does not grant the affected parties any rights of their own, or even complaint mechanisms. There are also no claims for damages or injunctive relief. Unlike the General Data Protection Regulation, which has a direct anchor in the right to informational self-determination,Footnote 62 it is difficult to derive an independent right to “trusting interaction with AI” from the general regulations on human and fundamental rights. Under this new legislation, even though it could close an existing gap in data protection law concerning new technology, individuals, and in our case children, can neither act against their parents nor take legal action against the service provider in case of a violation of their rights. The future will show whether enforcement by public authorities alone will be sufficient to achieve the “ecosystem of trust” in the field of AI conjured up by the Commission.Footnote 63

4 Lessons Learned from Private Video and Audio Surveillance?

Since, despite the existence of special legislation in the area of AI, the protection of personal data has not yet been sufficiently and adequately regulated, the question arises whether it is possible to adopt standards from similar areas of law and the accompanying case law and transfer them to the situation at hand. To this end, the chapter analyses a situation in which AI, specifically in connection with children, has already appeared in a legally relevant way. We also consider the situation of private video surveillance using dash cams, which can be legally comparable to the situation of smart speakers in family settings, as there is also surveillance without the consent of the filmed party.

4.1 Services Directly Aimed at Children

One of the most famous cases concerning smart toys is that of the doll “My friend Cayla”,Footnote 64 which was pulled from the market by the German Federal Network Agency for a violation of §90 (1) TKG (old version), the doll being an “espionage device”.Footnote 65 It had a microphone and a speaker that could be controlled via smartphone using Bluetooth. The design also included the possibility for children to ask questions, which were sent to a server in the USA and analysed there to enable the doll to respond. At heart, the German legislation was not designed to deal with this type of case. However, the German authority applied the legal rules on telecommunications surveillance and illegal broadcasting equipment to this atypical case. Already in this case from 2017, the responsibility was transferred to the manufacturers to create systems that are better suited to comply with data protection standards. According to §§ 90, 115 TKG (old version), there was also the possibility to impose an obligation on buyers of unauthorised transmitting equipment to render it unusable,Footnote 66 placing the responsibility not only on the service provider but also on the parents.Footnote 67

This case law concerning a smart toy cannot be directly applied to smart speakers, as under German law in particular the doll only fell under the definition of the offence because both the microphone and the loudspeaker were not visibly installed.Footnote 68 The particular problem with smart speakers and children, however, is that children probably cannot grasp the concept of an AI system that is able to talk to them like a human. Also, looking at the privacy aspect, smart speakers collect very similar, if not more, personal information about their users than a smart or connected toy.Footnote 69 Particularly in the relationship between parents and children, binding and clear regulations should be created in the future, which not only impose obligations on manufacturers or providers with regard to the data protection of their customers, but which also oblige the private users of such systems to regularly check these rules in relation to third parties.Footnote 70 Especially in the parent–child situation, a “safe harbour” should apply to the personal data of the children as well as to the right to the spoken word, so that children can grow up without data protection-related “baggage”.Footnote 71

4.2 Case Law on Dash-Cams

A further aspect of smart speakers in connection with children is the saving of data without their consent, since they are not yet able to understand the concept of such an AI system and the consequences associated with it. In this context, the case law on dash cams will also be reviewed, as these likewise collect data without consent. This is particularly interesting because Amazon intends to use its Alexa voice assistant in cars in the future.Footnote 72 A so-called dash cam is a pre-installed camera, or merely a smartphone with the camera function switched on, that is mounted on the windscreen of a private car. It records all traffic situations with the objective of providing a record of what happened in the event of a traffic-related incident. In some cases, such recordings were ruled inadmissible as evidence in civil proceedings, as the recording itself violates § 1 (1) BDSG.Footnote 73 However, in 2015, the AG Nienburg approved the use of such a recording for the first time.Footnote 74 The court's line of reasoning was mainly based on a balancing of interests between the personal rights of the accused and the interest of the witness in preserving evidence by making camera recordings. The court argued that the defendant's need for protection was minor, as only the vehicle, but not the passengers, had been recorded and it had only been a short, occasion-related recording. What is problematic about this, as various other courts have also found, is that the recording did not start on an occasion-related basis but was permanent and general, since the dash cam had probably been switched on and recording for the entire car journey.Footnote 75 Regarding smart speakers, a similar problem arises, since due to the necessity of the so-called catch phrase, permanent monitoring becomes necessary in order to be able to use the offered service accordingly.
A smart speaker is thus always “listening”, and specifically not on an occasion-related basis, in order to recognize the signal word. Following the AG Nienburg's line in favour of smart speaker providers, the admissibility of such devices would depend on a balancing of interests between the market interests of the provider and the interest of the user in protecting his or her data. However, in this context in particular, it must be considered that the respective user has decided to purchase the corresponding device and has thereby implicitly consented to the permanent “listening”, which means that the same requirements are not to be placed on the balancing as in the case of “third-party monitoring” by a dash cam. The position is different, as already seen, when considering the situation of children, who in the overwhelming majority of cases do not know what they are consenting to and have not opted in to such an AI system. In such cases, the protection of their privacy, similar to the case law on unauthorised dash cam recordings, should clearly be paramount.

5 Conclusion

This chapter has explored the functioning of voice assistants and the accompanying legal challenges relating to their use in families. These have proven to be especially problematic with regard to the right to privacy of children who are indirectly affected by the voice assistant used by their parents. Even though it does not regulate the parent–child relationship, the new AI Regulation is a major step towards improving data protection in this area at the EU level in addition to the GDPR, regulating providers and their duties towards customers. The biggest criticism of the existing legislative proposal is the lack of concretisation of the actual prohibitions and obligations for the relevant parties. What is most important here is transparency with regard to the type and handling of the data collected. So-called privacy-by-design approaches, i.e. data protection precautions pre-installed in the system of the smart device that apply automatically and do not have to be activated first, are a viable option for providers of such smart devices to achieve compliance with data protection law. Only limited lessons can be learned from the comparison with other data-sensitive measures taken by private individuals; a specific approach to voice assistants and their consequences for (intra-family) privacy is needed.