Interperforming in AI: question of ‘natural’ in machine learning and recurrent neural networks

  • Tolga Yalur
Open Access
Student Section


This article offers a critical inquiry into contemporary neural network models as an instance of machine learning, from an interdisciplinary perspective of AI studies and performativity. It shows the limits placed on the architecture of these network systems by the misemployment of ‘natural’ performance, and, from a performative approach, it offers ‘context’ as a variable instead of a constant. The article begins with a brief review of machine learning-based natural language processing systems and continues with a focus on the relevant model of recurrent neural networks, which is applied in most commercial research, such as that of Facebook AI Research. It demonstrates that the logic of performativity, an integral part of human performance and languaging, is not taken into account in recurrent nets, and it argues that recurrent network models, in particular, fail to grasp human performativity. This logic works similarly to the theory of performativity articulated by Jacques Derrida in his critique of John L. Austin’s concept of the performative. Applying Derrida’s work on performativity, and on linguistic traces as spatially organized entities that allow for this notion of performance, the article argues that recurrent nets fall into the trap of taking ‘context’ as a constant, of treating human performance as a ‘natural’ fix to be encoded rather than as performative. Lastly, the article applies its proposal more concretely to the case of Facebook AI Research’s Alice and Bob.


Keywords: Performativity; Machine learning; Natural language processing; Recurrent neural networks; Derrida; Facebook

1 Introduction

A never-ending goal of the research on human–machine interaction has been to achieve a state where humans and computers may have natural conversations. In recent decades, there have been overwhelming advances in the competency of computers to recognize, parse and understand language and images, and to generate responses in the context of conversations, especially with the help of the new practical deep neural network models developed in the relatively independent laboratories of MIT and DFKI. In particular, when it comes to the use of these state-of-the-art models in daily life, it is imperative to take into account the developments in the laboratories of major commercial technology firms, such as Facebook and Google. In these laboratories, there are quite a few models that understand natural language in constrained domains and goal-oriented dialog contexts, no matter how stilted the digitized voices of these models might sound. Yet, even with the current advanced models developed in these laboratories, we still see artificial agents that generate speech in a limited number of situations for a limited number of purposes, such as reserving a place in a restaurant in the case of Google’s Duplex, or playing Jeopardy! in the case of IBM’s Watson (Danaher 2018). In particular, machine learning-based natural language processing systems are still struggling to model simple human-like communication, such as conversations. They do not go beyond constant contextual constraints to engage in a flowing conversation that can force the artificial agent (bot or AI assistant) to adjust to human agents’ natural linguistic systems, rather than having those natural linguistic systems adapt to the artificial agent.

This paper unpacks the most commonly and commercially used NLP model, recurrent nets or recurrent neural networks (RNN), to question the treatment of the “natural” in the process of designing and developing artificial agents in commercial laboratories, such as Facebook AI Research. Recurrent nets are designed for pattern recognition in data, such as text, numerical data, or images. The algebraic functions of these nets are enhanced by the repetitive insertion of similar data inputs so as to approximate the way human memory operates when using language in the context of conversation. In other words, RNNs take the input as something they have already recognized. For instance, a present case of an input would be someone asking Amazon’s Alexa to play a particular song on Spotify (“Alexa, can you play Billie Eilish’s ‘Bad Guy’”); Alexa completes this task by parsing the words and recognizing the traces of each word and of the entire sentence structure by means of its previously recorded audio dataset, as well as grammar structures. In this sense, these models have two sorts of inputs at work: the one from the past (encoded data) and the one at the present (input data), which together determine how recurrent nets are supposed to respond to new inputs. This procedure is taken to be the same as what humans do in real life and is termed ‘natural language processing’. This paper questions the treatment of the “natural” in RNN models in particular, and in machine learning-based NLP models in general. As we will see, the “recurrent” or repetitive dimension of language learning requires an examination of the performative aspects of language traces (as opposed to the formal abstraction of models that appeal to the “natural”).

The article begins with a description of the NLP architecture deployed in research on machine learning, with an overview of the particular neural networks that have been found applicable for developing algorithmic agents for commercial purposes. The next section outlines the processes through which language gets recorded and indexed into big datasets and employed by recurrent neural networks based on machine learning. Then, in the third section, the article takes a theoretical detour through Jacques Derrida’s reworking of performativity, based on his critique of John L. Austin’s work on constative and performative utterances, to underline the trouble with the natural and to propose the performative as a better concept for grasping the functions of recurrent nets. The last section then applies this theoretical argument more concretely to the case of Alice and Bob, two prototypical AI bots (known as “Turkers”) developed by Facebook AI Research, and underlines some of the key problems with RNNs in particular.

2 The question concerning the “natural”: human likeness

Human-like performance is a common denominator in broad definitions of AI. One of the most cited books in the field, Russell and Norvig’s Artificial Intelligence: A Modern Approach, elaborates a number of leading definitions and finds that ‘doing like a human’ and ‘human rationality’ are common characteristics (Russell and Norvig 2016). This is no surprise, since these definitions follow in Alan Turing’s footsteps, and Turing’s ideal computer was based on imitating human-like speech (Turing 1950). Various versions of Turing’s famous test illustrate why certain schools approach AI as an embodiment with human-like traits, and thus treat these projected embodiments of human-like traits as natural fixes. Contemporary research on the development of AI relies on a similar understanding of the human mind and sign systems, which, in most current cases, show a synthesis of language and articulation, imitation and data processing.

Natural language processing is a comprehensive practice of predominantly statistical and broadly computational techniques used in the process of analyzing, parsing and representing language (Allen 2006). NLP research generally started in the 1950s thanks to Turing’s work, which provided the research with a basic criterion, the Turing Test, for successful intelligence models (Chowdhury 2003). NLP techniques have evolved from earlier models that processed sentences in minutes in the 1950s to the search engines that process large texts in seconds in the 2000s (Cambria 2014). NLP techniques are an integral part of digital tools, from computers to smartphones, and carry out tasks for them at multiple levels, such as parsing language, sentence breaking, part-of-speech tagging (POS), named-entity recognition (NER), optical character recognition (OCR) and machine translation.

Recent NLP research concentrates on machine learning methods due to the growth of machine learning mechanisms and algorithms in applied fields, such as pattern recognition. In earlier models, NLP tasks relied on simply trained support vector machines (SVM) to find the boundary between (two) groups of linguistic data (e.g., sentences) that maximizes the distance to the nearest data points (e.g., words), disregarding many points as outliers. In the last decade, however, NLP tasks have been supported by various neural networks with sophisticated vector representations, and the research has yielded powerful outcomes from using these networks on NLP tasks. This increasing use of neural networks is a result of machine learning methods that automate the representation of language through learning, going beyond traditional machine learning techniques that rely heavily on manual touches (Fig. 1) (Socher and Perelygin; Collobert et al.).
Fig. 1

Simple visualization of supervised and unsupervised learning processes. NLP systems are based on a common process of initial training of their network models. In this process, engineers supervise the learning function of the system’s network and the production of linguistic output after processing an input. Breaching points between linguistic actions (talking) and performance goals (talking in a particular context for a particular task) are fulfilled manually (Conneau et al. 2017). Unsupervised learning is another method deployed. In this case, after a process of supervised learning, an NLP network learns the logic of computation and processes data autoregressively, that is, by using the computations of the previous step in the computations of the next step. Here, while certain tasks and performance goals are given for the initial supervision, breaching points between linguistic actions (talking) are connected. For recent examples, see Kelly and IBM (2018).

At first glance, this interactive approach to research brings forward not only the importance of machine learning, but also the significance of conducting human-like interactions (Bennett et al. 2003). However, ‘interaction’ is a form/trace mediated exchange that is separated from action, from how humans feel and from how humans use multimodal activity. A closer look at the infrastructure of NLP models thus opens up the question of the ‘natural’ as in the human world. Once this question is asked, it will be argued that misassociations with the ‘natural’ place unnecessary limits on the architecture of NLP systems. Furthermore, in mistreating the human-like as a fixed norm, as in the unchanging laws of physics, NLP systems also incorporate the ‘context’ component into machinic models as a constant as opposed to a variable.

There are multiple machine learning methods that do the work of processing various forms of data through dimensions that are composed of representations. Recent methods and models that have been employed to master the “natural” come forward with regard to the design of commercial AI products. The model of recurrent neural networks (RNNs) has been put into operation in the designs by Facebook’s and Google’s AI teams. Facebook’s Mechanical Turker Descent researchers Yang et al. and Google Duplex engineers propose the interactive process as a better method to produce natural language since humans learn a language in a natural environment (Yang et al. 2017; Leviathan and Matias 2018). It is therefore imperative to scrutinize the architecture of recurrent nets, a fundamental feature of major machine learning research on language processing. We will unpack the relations between recurrent nets and the “natural” to clarify the trouble with the ‘context’ component of these nets.

2.1 Recurrent neural networks

Whether recorded data are distributed similarly to the ‘human world’ or not is a prominent issue in NLP systems. Provided that the data are not significantly different from those found in the human world, they are hypothesized to be reliable. The distributional hypothesis is related to the context component of any NLP system. Distributional vectors represent words so that words with similar meanings have similar vectors, and the hypothesis makes the strong assumption that such words will only be uttered in similar contexts. The distributional vectors function to grasp the neighbors of a particular word within borders known as windows. This procedure, called “word embedding”, allows similarities between vector representations to be computed through measures such as cosine similarity. Usually, these embeddings are obtained through initial training on large datasets, with subsidiary goals such as predicting a word used in a particular context (Mikolov et al. 2010). Context is pictured as a psychological force that ‘works’ to get semantic information from the data. Because windowing reduces the number of dimensions, word embeddings are useful in deciding on the context for NLP tasks. Machine learning-based NLP models in general, and RNNs in particular, represent not only words but also sentences by employing these distributed representations (word embeddings).
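To make the windowing and cosine comparison concrete, the following is a minimal sketch in Python. It is not the method of Mikolov et al.: instead of trained neural embeddings it uses plain co-occurrence counts, and the toy corpus, the function names and the window size are all assumptions made for illustration. It shows only that two words sharing neighbors inside the window receive similar vectors.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus, invented for illustration only.
corpus = [
    "alexa play the song",
    "alexa play the track",
    "the dragon guards the dungeon",
]

# The window size is fixed in advance: an assumption of this sketch.
vectors = cooccurrence_vectors(corpus, window=2)
sim_song_track = cosine(vectors["song"], vectors["track"])
sim_song_dragon = cosine(vectors["song"], vectors["dragon"])
print(sim_song_track > sim_song_dragon)
```

In this toy setting “song” and “track” occur with the same neighbors and so score higher than “song” and “dragon”. Note that the window is a parameter chosen before any data are seen, which is one concrete sense in which context can enter such a model as a constant.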

RNNs perform the same computation sequentially, that is, they do the same computation at each step of the sequence, and each step depends on the computation of the previous step (Fig. 2). Previous computations function as memory traces that explain the present state of the information. Popular NLP tasks, such as textual prediction in search engines and machine translation, use this modeling (Mikolov et al. 2010).
Fig. 2

A simple recurrent neural network and the unfolding in time of its forward computation. The sequence in its totality is supported by the time steps. X_t stands for the language given as input to the RNN at time t, and S_t represents the hidden state, the network’s memory component that re-adjusts data from previous steps. In NLP applications, X_t usually embodies embeddings as explained above, but it is also a representation of the content. The value of S_t depends on the present input and the previous state, that is, the meaning of the present input, encoded in S_t, depends on the previous meanings as well as on X_t. As new inputs (X_{t+1}) are inserted, the regressed meaning (the value of S_{t+1}) slightly changes, and it will depend on S_t. Finally, O_t is the output of the processed language. A simple example of this algebraic model is speech-to-text conversion, in which a machine transcribes speech (input) to text (output) based on the previous transcriptions stored in its dataset, and the new transcription is integrated into its dataset.

Source: Gao et al.
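The recurrence described in the Fig. 2 caption can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the architecture of any system discussed here: the input, hidden state and output are written as x_t, s_t and o_t, all three are scalars rather than vectors, and the tanh nonlinearity and the weights U, W and V are conventional textbook choices that the article itself does not specify.

```python
import math


def rnn_forward(inputs, U, W, V, s0=0.0):
    """Run a scalar Elman-style recurrence and return (hidden states, outputs)."""
    states, outputs = [], []
    s_prev = s0
    for x_t in inputs:
        # The new hidden state mixes the present input with the previous state.
        s_t = math.tanh(U * x_t + W * s_prev)
        # The output is read off the hidden state.
        o_t = V * s_t
        states.append(s_t)
        outputs.append(o_t)
        s_prev = s_t
    return states, outputs


# Hypothetical weights, chosen by hand for illustration (not learned).
U, W, V = 0.5, 0.8, 1.0

# Two sequences that end with the same input but differ in their history.
states_a, outputs_a = rnn_forward([1.0, 0.0, 1.0], U, W, V)
states_b, outputs_b = rnn_forward([0.0, 0.0, 1.0], U, W, V)
print(states_a[-1], states_b[-1])
```

The two runs receive the same final input, yet their final hidden states differ because the earlier steps differ. This is the sense in which previous computations act as memory traces for the present state.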

Because RNNs process linguistic information sequentially, their algorithms can recognize sequentiality as embedded in language. RNNs parse language into units of a sequence, and these units represent characters, words or sentences. In essence, an RNN uses a lexical semantic approach to such units. Depending on its use, each new unit might change its semantic meaning based on the previous unit (as in compound words). RNNs can capture various meanings through their re-adjustable and sequential function, modeling texts of any length from words to longer documents (Tang et al. 2015).

2.2 Mastering the “natural”

Various versions of the Turing Test have shown why research on AI strives to compose models with natural human characteristics such as language. In the representative version of the test, there are three participants: an interrogator, a human and a machine. If the interrogator cannot tell which is the human and which is the machine, or if the machine can trick the interrogator, it passes the test as “intelligent.” Turing proposed his digital computer model as a self-learning machine using an audio dataset recorded to tape. The earlier models of imitative AI relied on the mediation of data between the machine and the human, or the mediation between “intelligences” on the basis of lingual signs (see Shannon and Weaver 1949).

Turing’s proposal for intelligence did not explicitly refer to an AI’s understanding of natural language in a dialog but, rather, to how language could be learned and imitated to pass the test. In this vein, the Turing design concentrates on mere imitation without recurrence. RNNs, however, are designed not only to imitate the natural, but also to approximate and fixate the natural through repetition. Their design incorporates learning the architecture of language. Performing beyond imitation requires repetition of similar patterns in different contexts. It thus demands an embedded ability to learn structures in language. RNN algorithms repeat learned patterns, put small pieces together to make a meaningful whole, use this larger structure in another step, and then reiterate the process; this helps them build exhaustive structures of words, sentences and dialogs from a limited dataset. The infrastructure of this particular NLP model highlights these themes of linguistic performance and contextualization, as well as an orientation toward an ultimate goal (for any NLP framework, this process of objectifying a task with a goal is known as incentivization; in other words, agents are intentionalized). The whole is associated with differing treatments of gathering, processing and generating data.

RNN architecture focuses on language as “natural”. Despite innovative and powerful techniques, we argue that the concept of “natural” needs to be problematized by taking a performative approach to language. Jacques Derrida’s articulation of performativity offers an analytical tool for such a goal. Linguistic performativity provides us with a variable of context. In this way, rather than offer a fixated meaning of “conversation as context”, one can see the differing meanings of each word, sentence and text not only within a given dataset, but also in their interactively generated equivalents. As such, we will now turn to a survey of relevant aspects of Jacques Derrida’s view of the performative. The results show a problem immanent to the RNN architecture in particular and, it is suggested, NLP systems in general.

3 Performativity and contextual possibilities

Every sign, linguistic or nonlinguistic, spoken or written… in a small or large unit, can be cited, put between quotation marks; in so doing it can break with every given context, engendering an infinity of new contexts in a manner which is absolutely illimitable. This does not imply that the mark is valid outside of a context, but on the contrary that there are only contexts without any center or absolute anchoring [ancrage]. This citationality, this duplication or duplicity, this iterability of the mark is neither an accident nor an anomaly, it is that (normal/abnormal) without which a mark could not even have a function called “normal.” What would a mark be that could not be cited? Or one whose origins would not get lost along the way? (Derrida 1988, 12)

A central concept of the theory of language as imitated and reiterated performance is performativity, highlighted by Jacques Derrida in ‘Signature Event Context’ (1988). Derrida critiques John L. Austin’s concept of performative utterances in How To Do Things With Words, where Austin has put forward that performativity in languaging is based on action that includes ‘context’ and ‘consciousness’ and human use of ‘illocutionary’ force. Austin identifies two forms of interrelated utterances: constative utterances that define something and performative utterances that refer to the act of performing what is said. In his view, statements such as “This computer is old,” or “I bought a new computer,” are either descriptive or reported, rather than performative. Austin’s notion of ‘performative utterances’ relies on indulging in an act such as a marriage that is confirmed by uttering words, “taking to be one’s wife”, “To name the ship is to say (in the appropriate circumstances) the words ‘I name &c.’ When I say, before the registrar or altar, etc., ‘I do’, I am not reporting on a marriage, I am indulging in it” (Austin 1962, 6).

Derrida concentrates not on action but on signs and speech traces. In this way, he turns to the issue of “the conscious presence of the intention of the speaking subject in the totality of his speech act,” which, he claims, is a highly significant ingredient of performativity (Derrida 1988, 14). He uses the term ‘author’ for the origin and the norm of the language (which he tends to associate with writing) and, further, illustrates how the author’s intention can be de-contextualized in different uses of language by the speaking subject’s conscious intention and memory traces. All elements, Derrida stresses, are replete with ‘reiteration’ and ‘recitation.’ For speech to be successful or unsuccessful, language must be cited and reiterated. Repetition of constatives, in Derrida’s view, slightly alters the meaning of the speech acts in ways that are context sensitive. This is Derrida’s perspective on performativity.

Unlike Derrida, Austin denies that ‘the mediation of thinking’ is central to language. His aim was to trace language to the performative. Accordingly, he addresses, not speech traces, but actual people who build and launch ships, get married, make avowals, make philosophy and other similar ‘circumstances’. He subordinated the ‘semiotic’ to ‘speech acts’ whose illocutionary and perlocutionary force depend on circumstances. Eventually, he probably came to see that when one starts with speech acts, the performative/constative distinction is untenable. The constative cannot oppose the performative; in other words, constative acts are also performative. Austin’s concern was with human activity in which wordings play a part. He focused on languaging as a multimodal connector of people and things. As an activity, Austin’s view of languaging does not reduce to language processing.

Derrida was aware that communication is not only the mediation of thinking, but also what he calls an ‘original movement’ of traces of the previous meanings of the same. The success of the performative lies in the movement within any given contextual limitation. Derrida stresses that Austin’s “ideal regulation… excludes the risk [contextual difference of meaning] as accidental, exterior, one which teaches us nothing about the linguistic phenomenon being considered” (1988, 14). Austin emphasized that, uttered by an actor on a stage, a performative utterance becomes void and deprived of any possibility of a variable meaning understood contextually. Derrida finds that this claim does not include the possibility that performative utterances, along with any other acts, can be “quoted.” Austin overlooks the fact that quotation can de-contextualize, abnormalize, and parasitize the original, which weakens the argument that actually uttered words can map onto elemental variables (1988, 21–22). Derrida emphasizes that this exception embodies a powerful alteration, a difference, of general iterability or citationality. Performance becomes a success, something new, with the variability, and iterability, of context. This general iterability means that an utterance is coded:

Could a performative utterance succeed if its formulation did not repeat a “coded” or iterable utterance, or in other words, if the formula I pronounce in order to open a meeting, launch a ship or a marriage was not identifiable as conforming with an iterable model, if it were not then identifiable in some way as a “citation”? (Derrida 1988, 18)

For Derrida, the reiterability, or re-citability, of any semiotic sign/trace is an inherent characteristic of language. The origin and intention become insignificant as this reiteration proceeds, allowing the sign to be used for new possibilities. Iterability does not simply mean that any performative utterance is a repetition of a norm. Different contexts yield different results, and a reiteration cannot be pure. This general iterability assumes that a residual remains in the return of the same, in the process of variation between each item re-uttered. Each re-uttered item will vary in the process of constating. The differential residual makes impossible any re-presenting or absolute return of the original, in which the spectral author, or the linguistic sign’s past, is displaced since it becomes insignificant.

Our inquiry thus concerns, building on Derrida, how speech traces can be used performatively. Derrida’s view of the performative concerns ‘written’ marks that exist independently of performance. In his light, whereas traces cannot perform, living systems do so. Derrida’s concern with traces overlaps with our concern with machine-derived and manipulated traces. Language processing-based computational models deal with traces, not with Austin’s concerns about what language does to living people. Accordingly, the question as to how computational means can be used to classify and manipulate traces and elements is significant. We argue that they can be treated as performative and tagged to elements and variables to pick out ‘speech act types’ and types of contexts.

We conclude that as long as context as a variable is not included in NLP models, there is no possibility of the new. Austin’s theory applies to the limited performance of the AI agents within given norms and goals; Derrida’s critical approach, however, opens up a contextual or even creative approach to language. The only invention would be the re-invention of the speech act, followed by a supervised narrowing that is sensitive to context. In the case of Facebook’s and Google’s AI research, the proposal is that of simple contexts run by enormous data, and this is because the researchers conceive of unusual events as usual in big data (Yang et al. 2017; Michaely et al. 2017; Oord et al. 2017). FAIR’s categorization is still flexible due to the supervised learning process through repetition. The data structure of RNNs is inevitably encoded at the level of algorithm, grammar, subject–object relation and the editorial code, and thus the possibility of new contexts is always limited by these operations. Hence, RNNs function through contextualization, as evidenced by their linguistic nature. This restrains their ability to investigate the critical domain upon which contextual variations parasitize the original data and produce the new. What this means for us is that even those intentions and contexts that are not included in the master algorithm of RNNs can be included through the artificial agent’s entrance into the game of linguistic performativity that constitutes the agent’s ability to do what it is not encoded to do.

4 The case of Alice and Bob

Catch only what you’ve thrown yourself, all is mere skill and little gain; but when you’re suddenly the catcher of a ball thrown by an eternal partner with accurate and measured swing toward you, to your center, in an arch from the great bridge-building of God: why catching then becomes a power - not yours, a world’s.

Rainer Maria Rilke

In the case of Mechanical Turker Descent, Facebook researchers understand natural language as a large dataset of interactive speech traces that can all be structured into speech-to-text or text-to-speech utterances. Although these double structures of speech are obviously more flexible than the convolutional neural networks that Yann LeCun (Conneau et al. 2016) was analyzing at the time of FAIR’s launch, they still come down to basic concepts that can be analyzed by specifying a relation between iterability and intention. This means that those things that are included in the given datasets can be successfully repeated by an intentional (goal-oriented) AI agent, the Turker. Yet, this success is limited to a single repetition of the patterns in the fixed dataset. Interactive learning, in augmenting the patterns in the dataset, necessitates recursion over different contextual uses of, say, the same element. However, this variability of contextual uses of inputs is not included as a key component of RNNs. Context is only taken as a basic conversation.

Performativity of a natural language is a threshold through which new contexts, differential acts, and presences can take place. Although apparent recursivity is the aim of NLP programming, this process has overlooked linguistic performativity. Linguistic performativity lies at the center of machine learning-based NLP and clarifies a train of thought that underlines the constraints of such attempts to project the natural, that is, the performative, into the artificial. What is evident in recurrent nets is the scarcity of the relevant components and the underrepresentation of the contextual dimensions at work in linguistic performativity. We will now clarify this problem by a simple insertion of linguistic performativity and will provide a basic articulation of the missing component of context between the recursive and the performative through the case of FAIR’s Alice and Bob.

4.1 The trouble with Mastering the Dungeon

The Mechanical Turker Descent’s architecture uses this mode of sequential RNN infrastructure for its machine learning model (Fig. 3). The artificial agents of this infrastructure learn collaboratively through sequential rounds of playing with language. In the gamified environment of the ironically named Mastering the Dungeon, AI agents compete with each other as they proceed in this setting. After each round, agents’ data are processed, accumulated and shared with each other so that their linguistic functions might operate better in their interactions with human agents. In the simple game, a dragon resides in this dungeon and is trained by humans to interact with objects, from swords to other characters. The dragon is an instance of the Turker agents, and the dungeon is a representative environment for mastering natural language through the use of sequential RNNs. This setting allows for communication and interaction not only with the environment but also among the agents. The advantage of this machine learning model is that it merges learning with performing language. To break down this performance, MTD allows a learning artificial agent to interact, collaborate, and communicate in regard to the data that are embedded in this network of the dungeon. The network of MTD provides the agents with an ability to create new words, verbs, and sentences within the constraints of the dungeon.
Fig. 3

The competitive-interactive Mechanical Turker Descent (MTD) algorithm. In each round, Turkers compete to produce the best training data.

Credit: Facebook AI Research

The objective of Facebook’s project was to see whether it was possible for AI agents “to engage in start-to-finish negotiations with other bots or people while arriving at common decisions or outcomes” (Lewis et al. 2017). Their goals were to negotiate for a number of objects like hats, books, and balls. AI agents are supposed to learn how to negotiate with humans without letting them know that they are talking to a machine, or without exposing them to a mechanical performance of language. For this, as mentioned above, the constraint to be expanded at the heart of the neural network model is the conversational flow within the contextual constraints of the dungeon.

The more challenging test, however, was to let AI agents talk to each other, instead of to humans, without knowing that they were machines. The agents, in this case, were called “Bob” and “Alice”, and they quoted some word embeddings like “to me” and “the” many times to exchange certain numbers of balls and books (Fig. 4). Instead of enumerating the amount of these bonuses, they isolated the numbers with words like “the” (Lewis et al. 2017). The sequences following one after the other added repeated words and deferred the context of the task in the technically natural way expected from humans. Particular sayings (or acts) emerge between Alice and Bob and are later reiterated by re-evoking the past (i.e., tracing previously encoded word embeddings).
Fig. 4

The AI agents “Alice” and “Bob” developing a new English language in the intentionalized process of negotiation. MTD’s Alice and Bob share the architecture shown in Fig. 2 above, with an inter-directional encoder that parses word embeddings into sequences of hidden states. The architecture concentrates on the agents’ acts, models these acts and forecasts the possible present acts by tracing previous sets of acts encoded in its dataset.

Credit: Facebook AI Research

The conundrum in this particular case, and arguably in the RNN models in general, is the way the context is treated. From the perspective of linguistic performativity, context, which is fixated as dialog in neural networks, becomes a variable itself. The word and sentence embeddings of the network depend on the varying context. In each recurrence, the context moves along with the network, as each repeating word is added to the sentences. Each time a word is added as a new component to the conversation with the incentive/intention to get close to a particular goal, any success is deferred. This contextual deferring, however, naturally produces a form of linguistic performance that is only relevant to the task in itself. As the interaction takes place within the deferring context, it endows the word embeddings with deferred meanings to arrive at the balls or the books.
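The claim that the context moves with each recurrence can be illustrated with a minimal sketch of a single recurrent update of the standard Elman form, h_t = tanh(W_x x_t + W_h h_{t-1}). Everything concrete in it is an assumption made for illustration: the two-dimensional embeddings, the fixed weights, and the word sequence are chosen by hand and are not taken from FAIR's model, which learns its parameters from data.

```python
import math
from typing import Dict, List

# Hypothetical 2-dimensional word embeddings, chosen by hand for illustration.
EMBEDDINGS: Dict[str, List[float]] = {
    "balls": [1.0, 0.0],
    "to": [0.0, 1.0],
    "me": [0.5, 0.5],
}

# Hypothetical fixed weights; a trained network would learn these from data.
W_INPUT = [[0.5, -0.3], [0.2, 0.8]]
W_HIDDEN = [[0.6, 0.1], [-0.4, 0.7]]


def matvec(matrix: List[List[float]], vector: List[float]) -> List[float]:
    """Multiply a small matrix by a vector using plain Python lists."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def step(hidden: List[float], word: str) -> List[float]:
    """One Elman-style recurrence: h_t = tanh(W_x x_t + W_h h_{t-1})."""
    from_input = matvec(W_INPUT, EMBEDDINGS[word])
    from_hidden = matvec(W_HIDDEN, hidden)
    return [math.tanh(a + b) for a, b in zip(from_input, from_hidden)]


def encode(words: List[str]) -> List[List[float]]:
    """Return the hidden state after each word, starting from a zero state."""
    hidden = [0.0, 0.0]
    states = []
    for word in words:
        hidden = step(hidden, word)
        states.append(hidden)
    return states


# A repetitive utterance in the spirit of the Alice and Bob exchange.
states = encode(["balls", "to", "me", "to", "me", "to", "me"])
for word, state in zip(["balls", "to", "me", "to", "me", "to", "me"], states):
    print(word, [round(value, 3) for value in state])
```

The same word, “to”, yields a different hidden state at each of its occurrences, because each state is computed from the state that preceded it. In that narrow sense the context is carried along and altered by every repetition, while the rule that carries it (here the two weight matrices) stays constant throughout the exchange.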

What Facebook’s researchers consider any given new to be is an entity in a certain language system made up of a text-to-speech function, and this performative operation restricts the possibility of enabling a deferring context that would otherwise be unconstrained. Some aspects of this problem could conceivably be corrected if the objective of this research were something other than designing entities for practical tasks in human–machine interaction.

5 Conclusion

In casting off the possibility of the interactive performance, FAIR’s research underrepresents the contextual spirals that are needed for the master game plan. The problem with NLP is that it treats the world as made up of constants, in the sense of “natural” used in physics. This paper has argued that humans do not live in such a world; we live, perform, reiterate and recite traces, and language is no exception. The ways we inhabit the world allow what Alan Turing calls imitation, and what Jacques Derrida calls repeating with differences. Each time we repeat a linguistic trace in the present, we do it differently from what we did in the past. The context, or space, revolves around the words and sentences in relation to their previous uses. NLP’s language systems fail to grasp the contextual spirals, an integral part of human performance and languaging, since speech traces, as a consequence of actions/activity, have to be performative.

Recurrent nets, a leading protocol in NLP research, might generate a certain ability and performativity that was not available before, but the statistical network models are problematic in relation to the context component, and this is shown by the interperformative consequence in the case of Alice and Bob. Proposing an alternative machine learning model is beyond the scope of this article, since the problem is not internal to RNN models themselves but lies in the treatment of ‘natural’ in recognition models in general. Given the constitutive limitations of these models, one would therefore expect them to include context as a variable and linguistic performativity as a method to test the success of the artificial agents. Otherwise, as the case of Alice and Bob shows, these artificial agents would quite rapidly evolve into incomprehensible aliens.


  1. FAIR published their early MTD research findings on the Facebook Code Blog, available at



I would like to express my gratitude to my beloved one, who both visibly and invisibly interperformed with me in numerous spaces before, during and after the development of this manuscript. I should also thank my professor Denise Albanese (George Mason University) for her invaluable help in the process of initial revisions of the manuscript.


This research did not receive any specific grant from funding agencies in the public, commercial or non-profit sectors.


  1. Allen JF (2006) Natural language processing. Encyclopedia of cognitive science
  2. Austin JL (1962) How to do things with words. The William James lectures delivered at Harvard University in 1955. Clarendon Press, Oxford
  3. Bennett IM, Babu BR, Morkhandikar K, Gururaj P (2003) US Patent no. 6,665,640. US Patent and Trademark Office, Washington, DC
  4. Chowdhury GG (2003) Natural language processing. Ann Rev Inf Sci Technol 37(1):51–89
  5. Conneau A, Schwenk H, Barrault L, Lecun Y (2016) Very deep convolutional networks for natural language processing. arXiv preprint
  6. Conneau A, Kiela D, Schwenk H, Barrault L, Bordes A (2017) Supervised learning of universal sentence representations from natural language inference data. arXiv preprint. arXiv:1705.02364
  7. Danaher J (2018) Toward an ethics of AI assistants: an initial framework. Philos Technol 31(4):629–653
  8. Derrida J (1988) Signature event context. Limited Inc. Northwestern University Press, Evanston
  9. Gao M, Shi G, Li S (2018) Online prediction of ship behavior with automatic identification system sensor data using bidirectional long short-term memory recurrent neural network. Sensors 18:4211
  10. Kelly K, IBM (2018) What’s next for AI? Q&A with the co-founder of Wired Kevin Kelly. IBM Blog.
  11. Leviathan Y, Matias Y (2018) Google Duplex: an AI system for accomplishing real-world tasks over the phone. Google AI Blog.
  12. Lewis M, Yarats D, Dauphin YN, Parikh D, Batra D (2017) Deal or no deal? Training AI bots to negotiate. Facebook Code.
  13. Michaely AH, Zhang X, Simko G, Parada C, Aleksic P (2017) Keyword spotting for Google assistant using contextual speech recognition. In: Automatic speech recognition and understanding workshop (ASRU), 2017 IEEE. IEEE, pp 272–278
  14. Mikolov T, Karafiát LM, Burget JC, Khudanpur S (2010) Recurrent neural network based language model. In: Proceedings of interspeech, vol 2, p 3
  15. Oord AVD, Li Y, Babuschkin I, Simonyan K, Vinyals O, Kavukcuoglu K, Casagrande N (2017) Parallel WaveNet: fast high-fidelity speech synthesis. arXiv preprint. arXiv:1711.10433
  16. Russell S, Norvig P (2016) Artificial intelligence: a modern approach (global 3rd edition). Pearson, Essex
  17. Shannon CE, Weaver W (1949) The mathematical theory of communication. Urbana, IL
  18. Tang D, Qin B, Liu T (2015) Document modeling with gated recurrent neural network for sentiment classification. In: Proceedings of conference of empirical methods natural language processing, pp 1422–1432
  19. Turing AM (1950) Computing machinery and intelligence. Mind 59(236):433–460
  20. Yang Z, Zhang S, Urbanek J, Feng W, Miller AH, Szlam A, Weston J (2017) Mastering the Dungeon: grounded language learning by mechanical Turker Descent. arXiv preprint. arXiv:1711.07950

Copyright information

© The Author(s) 2019

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors and Affiliations

  1. Cultural Studies, George Mason University, Fairfax, USA
