Monitoring the emergence of forbidden knowledge is a prerequisite for its successful governance. This means studying emerging machine learning technologies in different areas and sectors and identifying the dual-use characteristics of potentially harmful applications. Following pre-existing decision frameworks for dual-use technologies in other areas (Miller 2018; Tucker 2012c), one can differentiate between the magnitude of potential harms resulting from forbidden knowledge, the imminence of potential harms, the ease of access to forbidden knowledge, the level of skill needed to gain forbidden knowledge and the awareness about the emergence or malicious use of forbidden knowledge (see Table 1). Beyond this categorization of forbidden knowledge in machine learning research, further questions can be asked to assess potential publication risks: whether harms take the form of structural risks or of direct consequences for individuals, other living entities, the environment or non-living things; what type of harm is imminent, i.e. whether it affects physical or mental health, economic stability, human rights, the environment etc.; how likely the occurrence of a particular harm is; whether it is ephemeral or permanent; what possibilities of responding to a specific harm exist; whether the source of harm is traceable; and whether potential harms can be redressed (Crootof 2019). Additionally, forbidden knowledge can be yielded by research in academia as well as by research done at corporations. Machine learning research in particular is increasingly done at corporations (Perrault et al. 2019). However, it is important to note that industry papers are not lagging behind academic papers concerning social impact considerations, showing that notions of an ethical “inferiority” of industry do not bear scrutiny (Hagendorff and Meding 2020).
The decisive question is: how difficult is it for individuals with malicious intentions to weaponize a machine learning application in practice? They need to be aware of certain technological features and to possess the necessary skills as well as the necessary resources. The last two aspects depend on the availability of ready-made products or platforms. The more difficult it is to reproduce a certain technical capability on the basis of the available papers, code, models, datasets, hardware etc., the more talent, skill or resources are necessary to utilize the capability. Since talent as well as advanced skills are scarce, one might expect the likelihood of abuse scenarios to drop. In fact, however, the opposite trend prevails: the likelihood of abuse scenarios increases. While many institutions like OpenAI, Facebook, Microsoft or others do not publish the full-size models of hazardous applications in the first place, freely accessible Internet platforms are emerging elsewhere, providing, in a simplified way, exactly those services that should be kept away from the public.
With “Grover”, the Allen Institute for Artificial Intelligence offers a platform where fake news can be created on any given topic (Zellers et al. 2019). The same holds true for a platform from Adam King, which can be accessed via https://www.talktotransformer.com, or for the language model HAIM of the machine learning startup AI21 Labs, which can be accessed and used via https://www.ai21.com/haim. On its website, the organization Lyrebird offers the possibility to create a voice recording of any given text (for security reasons, so far only with one's own voice). Furthermore, should Adobe’s program “VoCo” become freely purchasable, everyone will be able to edit audio files or fake voices, provided that a few minutes of voice recordings are available as training data. On thispersondoesnotexist.com, software from Nvidia creates deceptively real facial images of people who do not exist. Finally, with the application "FakeApp" anyone can create DeepFakes.
Hence, it seems obvious that technologies that can be instrumentalized for modern disinformation campaigns are becoming more widely available. It is to be expected that a process of “deskilling”, meaning a gradual decline in the knowledge required, driven by the rise of ready-made software bundles or how-to manuals, affects the use of machine learning methods and applications, with the result that the number of individuals who can use potentially harmful machine learning technologies grows constantly. This process of “deskilling” is combined with a “democratization” of machine learning and its prerequisites such as training datasets or software libraries. Open access and open source give rise to many innovations while rendering tools freely available to ever larger crowds. At the same time, the number of (non-expert, amateur) individuals with potentially malign intentions who are willing to repurpose dual-use technologies in a harmful context grows as well.
The mere idea of using or processing data with machine learning methods in a certain way can already pose the threat that individuals with malicious intentions realize this particular idea themselves. This is what Bostrom (2011) calls “idea hazard”. The “idea hazard” is accompanied by the “data” or “product hazard” (Ovadya and Whittlestone 2019), where machine learning applications themselves or their respective outputs pose a danger. All the mentioned hazards can be conveyed by openly accessible research papers, containing more or less detail about particular software solutions. Open access, however, is not a “binary variable” (Brundage et al. 2018). There are gradations in the way research results can be shared, for instance, by adopting a staged publication release during which various (negative) effects of the released applications are monitored. Tangible information sharing rules relate to the kind of information or code that is exchanged or selectively made public. Documents about applications that risk causing harm when used “in the wild” in an uncontrolled manner can be limited to rough descriptions of the achievement or simple proofs of concept. The next step is to publish pseudocode, parts of the code or machine learning algorithms without the necessary hyperparameters. Ultimately, papers can contain appendices with fully working exploits or the complete code together with the trained model as well as tutorials (see Table 2).
Each level of sharing information raises the risk that third parties can use the research for malicious or criminal purposes. The fewer details that are shared by researchers, the higher the need for technical expertise on the part of third parties who wish to reproduce or harness the original achievements. Apart from the extent of shared technical details, the availability or reproducibility of forbidden knowledge depends on many further conditions: the monetary cost of acquiring certain hard- or software, whether code is commented and how detailed those comments are, whether code is compiled or raw, whether details about the types of hardware used are known and so forth.
Areas of forbidden knowledge
In the following section, the paper sheds light on different areas of forbidden knowledge. Examples of machine learning-based applications that were retracted or never published will be described. Various fields of application will be discussed, including contexts like sexuality, social manipulation, algorithmic discrimination, “artificial general intelligence” as well as further areas where sensitive information can be produced or acquired.
Regardless of the particular area, one can differentiate between machine learning applications that aim at single individuals (e.g. “gaydar” applications) and applications that aim at a wider social context or whole societies (e.g. “artificial general intelligence”). Furthermore, one has to distinguish between single inventions or individual research works that result in forbidden knowledge (e.g. research on automatically winning commercial games) and the dynamics of many small inventions or consecutive research activities that gradually produce forbidden knowledge (e.g. research on deep fakes). Moreover, applications that gather or detect sensitive information (e.g. digital suicide risk detectors) have to be differentiated from applications that generate or make up fake sensitive or discrediting information (e.g. text generators). In addition, either information about machine learning applications themselves or information in the form of an application’s output can acquire the status of forbidden knowledge.
Research on synthetic media and the publication of corresponding findings and insights is delicate. This was perfectly shown by researchers at OpenAI. They developed a text generator called GPT-2 (Radford et al. 2019b) which is so powerful that they decided to follow a staged release policy (Clark et al. 2019; Radford et al. 2019a; Solaiman et al. 2019). OpenAI has teamed up with several partner universities studying the human susceptibility to artificially generated texts, potential misuse scenarios or biases in the produced texts. The original decision not to release the full-fledged text generator was fueled by fears that GPT-2 could significantly lower the costs of disinformation campaigns or simplify the creation of spambots for forums or social media platforms. Although OpenAI admits to having found only “minimal evidence of misuse” via its “threat monitoring” (Solaiman et al. 2019), doubts are justified as to whether this monitoring, which was mainly focused on online communities, was really reliable, since OpenAI and its partners can mostly monitor current public plans or cases of misuse, but not non-public or potential future misuse scenarios or advanced persistent threats.
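How low the barrier to use becomes once model weights are published can be illustrated with a minimal sketch, assuming the Hugging Face transformers library and its hosted “gpt2” checkpoint (the smaller model size that was openly released); the prompt and sampling parameters are purely illustrative and not part of OpenAI’s release process.

```python
# Minimal sketch: generating text from the openly released (small) GPT-2 checkpoint.
# Library ("transformers") and model identifier ("gpt2") are assumptions of this sketch.
from transformers import pipeline

# Downloads the publicly available GPT-2 small weights on first use.
generator = pipeline("text-generation", model="gpt2")

prompt = "Scientists announced today that"
samples = generator(prompt, max_length=60, num_return_sequences=3, do_sample=True)

for i, sample in enumerate(samples, 1):
    print(f"--- sample {i} ---")
    print(sample["generated_text"])
```

A few lines of this kind suffice to produce fluent continuations of an arbitrary prompt, which is precisely the kind of deskilling dynamic described above.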
In the overall view, machine learning technologies make it possible to automatically create any kind of media, be it images (Karras et al. 2017), videos (Thies et al. 2016), audio recordings (Bendel 2017) or texts (Radford et al. 2019a). The quality of the media created is constantly improving, so that previously accepted principles, such as “seeing is believing” or “hearing is believing”, have to be abandoned. Whether the content corresponds with actual events does not matter. While researchers try to catch up and find solutions to reliably detect fake samples produced by generative adversarial networks (Rössler et al. 2019; Valle et al. 2018), it remains true that generative models make it remarkably easy to generate or edit media. Despite those technical solutions to detect synthetic media and approaches to educate humans on detecting machine manipulated media (Groh et al. 2019), a further, quite strict idea is to limit the availability of trained generative models. Against this background, it is astounding how unquestioningly papers describing leap innovations in the generation of fake media, especially videos, have been published in recent years—although many research groups, for instance, the one behind Face2Face, did not release their code (Fried et al. 2019; Ovadya and Whittlestone 2019; Thies et al. 2015, 2016, 2018, 2019). Synthetic videos, no matter if they are generated through Face2Face, DeepFakes, FaceSwap or NeuralTextures, can have all sorts of negative consequences, from harm to individuals, national security, to the economy and democracy (Chesney and Citron 2018). Fake porn is used to intimidate journalists, fake audios to mimic CEOs and commit fraud, fake pictures to trick other people into disclosing sensitive information (Harwell 2018; Satter 2019) and so on. Beyond these obvious risks, the improvement of synthetic media also makes it possible for people to claim that real footage is fake, falsely denying its authenticity. In this context, technical solutions to detect synthetic media should also operate in the opposite direction, meaning that they should be able to detect recordings that are real.
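For the defensive direction just mentioned, a minimal sketch of a generic detection setup is given below: fine-tuning a pretrained convolutional network as a binary real-versus-fake image classifier. The folder layout, the choice of ResNet-18 and the training settings are illustrative assumptions, not the architectures used in the cited detection works.

```python
# Minimal sketch: fine-tuning a pretrained CNN to distinguish authentic from
# synthesized images. Data path and hyperparameters are hypothetical placeholders.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Expects two sub-folders, e.g. data/train/real and data/train/fake (assumed layout).
train_set = datasets.ImageFolder("data/train", transform=transform)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)  # two classes: real vs. fake

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(3):  # illustrative number of epochs
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```

Such a classifier can, in principle, also be read in the opposite direction, flagging recordings that are likely authentic, which corresponds to the requirement formulated above.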
Issues of social manipulation via machine learning applications came to public awareness especially after reports about the role of Cambridge Analytica in the successful Vote Leave campaign in the UK and the 2016 US presidential election. Some of the methods used during the elections trace back to research in the field of psychometrics. Here lies the origin of methods where individuals’ psychological profiles are automatically extracted from their (harmless) digital footprints via machine learning to influence their behavior or attitudes. Researchers demonstrated that very few of the data points a particular individual generates suffice to make accurate predictions about personality traits (Kosinski et al. 2014, 2015; Lambiotte and Kosinski 2014; Youyou et al. 2015), which can in turn be used for improved persuasive techniques, called “micro-targeting”. Micro-targeting, for instance, can significantly raise the click-through rates of personalized online advertisements (Matz et al. 2017). However, it is not just advertisements. Several companies exploit techniques where psychometrics and machine learning are combined to conduct “behavioral change programs”, blurring the lines between the military and civic use of (dis-)information campaigns or “psy-ops” (Ramsay 2018). The scientists involved in the related research openly speak of “considerable negative implications” (Kosinski et al. 2013) of their work. Psychometrics research, which builds the foundation for methods of social manipulation, was metaphorically called a “bomb” (Grassegger and Krogerus 2016). Others dubbed the methods used by Cambridge Analytica and similar organizations “weapon-grade communication techniques” (Cadwalladr 2019), which clearly points to the dangers certain machine learning applications can pose.
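The general shape of the pipeline reported in the cited psychometrics literature can be sketched schematically: a sparse user-by-like matrix is compressed with a dimensionality reduction step and a trait score is regressed onto the resulting components. The sketch below uses synthetic placeholder data; matrix dimensions, the number of components and the linear model are illustrative assumptions, not the published setup.

```python
# Schematic sketch: predicting a (synthetic) trait score from a user-by-like matrix
# via SVD plus linear regression. All data is randomly generated placeholder data.
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_users, n_likes = 1000, 5000

# Binary matrix: entry (i, j) = 1 if user i "liked" item j (synthetic placeholder).
likes = (rng.random((n_users, n_likes)) < 0.01).astype(float)
# Placeholder target, e.g. a self-reported personality trait score per user.
trait = rng.normal(size=n_users)

X_train, X_test, y_train, y_test = train_test_split(likes, trait, random_state=0)

svd = TruncatedSVD(n_components=100, random_state=0)
X_train_red = svd.fit_transform(X_train)
X_test_red = svd.transform(X_test)

model = LinearRegression().fit(X_train_red, y_train)
print("Held-out R^2:", model.score(X_test_red, y_test))
```

On random data the held-out score is naturally near zero; the point of the sketch is merely that the methodological ingredients are standard, freely available components.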
The emergence of (unintended) algorithmic discrimination can be a reason for retracting machine learning applications. To name just two examples: in 2016, Microsoft released a chatbot called “Tay”. Shortly after its release, it was bombarded with racist and sexist language by trolls. Since “Tay” is based on machine learning algorithms, it picked up and automatically reproduced the discriminatory language (Misty 2016). After one day, Microsoft had to retract the application. This popular example shows the danger of failing to anticipate that machine learning applications can be manipulated by adversarial inputs, or of failing to equip those applications with meta-rules, meaning programmer-defined boundaries that software agents are not allowed to overstep (Wallach and Allen 2009); a minimal sketch of such a meta-rule follows below. Another example where failures to prevent discrimination led to the retraction of a machine learning-based tool is Amazon’s experimental hiring software. The software used machine learning techniques to score job candidates (Dastin 2018). It discriminated against women, since it was trained on patterns in applications submitted over a 10-year period, most of which came from men. Amazon had to shut down the project after they found out about its shortcomings.
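The meta-rule idea can be illustrated with a minimal sketch in which a generative agent’s output is checked against programmer-defined boundaries before it is released. The blocklist and the generate_reply() stub are purely hypothetical; real systems typically combine such filters with trained toxicity classifiers rather than simple term lists.

```python
# Minimal sketch of a "meta-rule" wrapper: outputs of a dialogue agent are checked
# against programmer-defined boundaries before release. Everything here is a
# placeholder for illustration only.
BLOCKED_TERMS = {"blocked_term_1", "blocked_term_2"}  # placeholder boundary definition

def generate_reply(user_message: str) -> str:
    # Stand-in for the output of a learned dialogue model.
    return f"Echo: {user_message}"

def violates_meta_rules(text: str) -> bool:
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)

def respond(user_message: str) -> str:
    reply = generate_reply(user_message)
    if violates_meta_rules(user_message) or violates_meta_rules(reply):
        return "I prefer not to respond to that."
    return reply

print(respond("Hello there"))
```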
Those two examples, to which many more could be added (Bolukbasi et al. 2016; Hagendorff 2019b), depict cases of discrimination through technology. Another strand of discrimination occurs when technology is used to assist discrimination. One can, for instance, think of using machine learning methods to sift through datasets containing demographic, profiling, biometric, medical or other behavioral data to gain insights about racial, sexual or cognitive differences between different groups of persons. It must be stressed that in this context, research on the nature versus nurture debate can be very problematic, not only due to potentially malicious interests of the researchers involved, but also due to the inability of the public to deal with corresponding research findings and due to the political consequences the publication of particular findings would likely have. The same holds true with regard to machine learning-based research on mental illnesses or intelligence. For instance, researchers showed that social media profiles, especially Facebook posts, can be used to predict depression (Choudhury et al. 2013; Eichstaedt et al. 2018). Those insights can be used for the common good, but also for purposes of unjust social sorting or discrimination (Lyon 2003).
Issues related to sexuality are another area where machine learning applications can cause widespread harm. For instance, a software called “DeepNude” found rapid sales, allowing users to automatically render pictures (of women) into nude photos. Shortly after its release, the developers stopped offering the software (Quach 2019), but one can still use it via various online platforms. The decision to stop selling the software obviously did not stop its further dissemination. Another case where a particular machine learning application is more than just prone to abuse is the use of deep neural networks to detect sexual orientation from facial images. This was first demonstrated in the famous paper by Kosinski and Wang (2018). The study raised a lot of criticism (Todorov 2018). Some of its findings were later confirmed by a replication study (Leuner 2019), although the results still leave open the question of the extent to which the prediction of sexual orientation is driven by biological features, such as facial morphology, or by differences in presentation, grooming and lifestyle. Notwithstanding that, and even if the study merely engenders the expectation that a person’s sexual orientation can be derived from facial features, John Leuner, who conducted the replication study, rightly claimed that the research “may have serious implications for the privacy and safety of gay men and women” (Leuner 2019), a sentence which is nearly identical to the claim of Kosinski and Wang, who write that their findings “expose a threat to the privacy and safety of gay men and women.” (Kosinski and Wang 2018) To prevent this threat to a certain degree, Leuner did not disclose the source of the data he collected for his study. Conversely, Kosinski and Wang stress that abandoning the publication of their findings “could deprive individuals of the chance to take preventive measures and policymakers the ability to introduce legislation to protect people.” (Kosinski and Wang 2018) The researchers hope that upcoming or current post-privacy societies are “inhabited by well-educated, tolerant people who are dedicated to equal rights” (Kosinski and Wang 2018). This may sound naïve, especially in view of current political trends and rising group-focused enmity. Therefore, the misuse of the aforementioned research is a considerable concern, calling for stronger caution when publishing research results on machine learning applications designed to reveal or generate traits connected to sexuality or sexual orientation.
Further sensitive fields
What holds true for sexuality is equally applicable to further sensitive fields where machine learning techniques are applied to detect forbidden knowledge about an individual’s intelligence, political views, ethnic origin, wealth, propensity to criminality, religiosity, drug use or mental illnesses. Regarding the latter, Facebook, for instance, has repeatedly launched initiatives for suicide and self-harm prevention. By merely analyzing likes, comments or other interactions on its platform, Facebook can “sense” suicide plans and, via overlays with information on suicide prevention, help affected persons or persons related to them. Due to its sensitive nature, this tool was not released in Europe and therefore represents another case of forbidden knowledge (Keller 2018). Information about one’s suicide risk, for instance, is traditionally protected by privacy norms (Veghes et al. 2012). Those norms were first and foremost based on restricting access to or controlling the dissemination of personal information, for example via concepts of contextual integrity (Nissenbaum 2010; Tavani 2008). In the face of existing machine learning techniques, those methods are obsolete (Belliger and Krieger 2018; Hagendorff 2019a). Now, intimate personal information like sentiments or personality traits can be automatically extracted not only from social media profiles (Youyou et al. 2015), but also from personal websites or blogs (Marcus et al. 2006; Yarkoni 2010), pictures (Segalin et al. 2017), smartphone usage (Cao et al. 2017; LiKamWa et al. 2013) and many more. Furthermore, particularly sensitive applications for reading one’s mind, for rudimentary brain-to-brain interfaces or even for decoding dreams are being developed (Horikawa and Kamitani 2017; Jiang et al. 2019). This new, machine learning-based research stands in a long tradition of trying to control, read or manipulate individuals’ minds with different technologies (Wheelis 2012).
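How readily such extraction tools can be applied today may be illustrated with a deliberately harmless sketch: inferring sentiment from short, social-media-style texts with an off-the-shelf pretrained model. The choice of the Hugging Face pipeline and its default model are assumptions of this sketch; trait or mental-health prediction of the kind discussed above would require purpose-built, far more sensitive models.

```python
# Minimal sketch: off-the-shelf sentiment extraction from short texts.
# The pipeline name and default model are assumptions; texts are invented examples.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # downloads a default pretrained model

posts = [
    "I can't remember the last time I felt this exhausted.",
    "Had a wonderful day hiking with friends!",
]

for post, result in zip(posts, classifier(posts)):
    print(f"{result['label']:9s} ({result['score']:.2f})  {post}")
```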
Apart from machine learning technologies which aim at single individuals, there are applications that have effects on a more general, societal level. Examples of such applications, which can likewise fall under the category of forbidden knowledge, include innovative artificial trading agents used for market manipulation (Wellman and Rajan 2017), software to conduct automated spear phishing (Seymour and Tully 2016), “AI 0-days” or other massive vulnerabilities in machine learning procedures themselves, as well as methods for automated software vulnerability detection (Brundage et al. 2018), classified surveillance technologies, the combination of data from fleets of earth-observing satellites with news sources, mobile devices, social media platforms and environmental sensors (Kova 2019) or even machine learning-based applications built to assist with or conduct torture (McAllister 2017). In addition to such rather obvious areas where forbidden knowledge may occur, machine learning applications have also been held back in less obvious places as a result of risk assessments. For instance, a 2019 publication (Brown and Sandholm 2019) demonstrated how “Pluribus”, a machine learning-based program, is stronger than professional human players in six-player no-limit Texas hold’em poker. With a short reference to the fact that the “risk associated with releasing the code outweighs the benefits” (Brown and Sandholm 2019), the researchers decided to release only the pseudocode, but not the complete program, so as not to harm the poker community as well as online poker companies. Ultimately, not only in poker, but in any online game where players can win money, it is to be expected that computational agents based on machine learning can be used to win money illegitimately. The fact that software developers decide not to publish programs in this context is just another symptom of an increasing amount of forbidden knowledge in machine learning.
Moreover, the creation of “artificial general intelligence” is associated with the fear of technology developing an uncontrollable momentum of its own (Bostrom 2014; Omohundro 2008, 2014; Tegmark 2017). That is why some researchers demand a halt to all research efforts aiming at “artificial general intelligence”—despite the fact that discussions around “artificial general intelligence” are often quite far-fetched and speculative. Nevertheless, this does not mean that current technologies are completely free of the risk of becoming uncontrollable. When Facebook, for instance, developed an “intelligent” bot for negotiation purposes (Lewis et al. 2017), the negotiation software drifted from English into a language or dialect of its own that humans cannot understand. This phenomenon of computational agents developing code words for themselves (Das et al. 2017; Lazaridou et al. 2016; Mordatch and Abbeel 2017) prompted the Facebook researchers to shut down the negotiation bot (Wilson 2017) in order to stay in control of what the system was communicating.