Hybrid-augmented intelligence: collaboration and cognition
The long-term goal of artificial intelligence (AI) is to make machines learn and think like human beings. Due to the high levels of uncertainty and vulnerability in human life and the open-ended nature of problems that humans are facing, no matter how intelligent machines are, they are unable to completely replace humans. Therefore, it is necessary to introduce human cognitive capabilities or human-like cognitive models into AI systems to develop a new form of AI, that is, hybrid-augmented intelligence. This form of AI or machine intelligence is a feasible and important developing model. Hybrid-augmented intelligence can be divided into two basic models: one is human-in-the-loop augmented intelligence with human-computer collaboration, and the other is cognitive computing based augmented intelligence, in which a cognitive model is embedded in the machine learning system. This survey describes a basic framework for human-computer collaborative hybrid-augmented intelligence, and the basic elements of hybrid-augmented intelligence based on cognitive computing. These elements include intuitive reasoning, causal models, evolution of memory and knowledge, especially the role and basic principles of intuitive reasoning for complex problem solving, and the cognitive learning framework for visual scene understanding based on memory and reasoning. Several typical applications of hybrid-augmented intelligence in related fields are given.
Keywords: Human-machine collaboration; Hybrid-augmented intelligence; Cognitive computing; Intuitive reasoning; Causal model; Cognitive mapping; Visual scene understanding; Self-driving cars
1 Introduction
The unprecedented development of artificial intelligence (AI) technology (Marr, 1977; Russell and Norvig, 1995) is profoundly changing the relationships and interactive modes between humans and between humans and their physical environments and society (McCarthy and Hayes, 1987; Holland, 1992). With the help of AI, solving various problems of high complexity, uncertainty, and vulnerability in every field of engineering technology, scientific research, and human social activities (Eakin and Luers, 2006; Martin, 2007; Gil et al., 2014; Ledford, 2015) and continuously promoting the development of society and socioeconomics have become the cherished goals of science and technology. AI is an enabling technology leading to numerous disruptive changes in many fields (Minsky, 1961; Stone et al., 2016). Using AI technology reasonably and effectively can greatly promote valuable creativity and enhance the competitiveness in both humans and machines. Thus, AI is no longer an independent, isolated, and self-cycling academic system, but a part of the human evolutionary process.
In recent years, deep learning methods have developed rapidly with the boost in computer data acquisition, storage, and calculation capabilities (Hagan et al., 2002; Sun et al., 2014). A new boom in AI has been triggered, especially in high-demand fields such as cloud computing (Youseff et al., 2008), big data (O’Leary, 2013), wearable devices (Son et al., 2014), and intelligent robots (Thrun et al., 1998), all of which promote the development of AI theory and technology.
The development of AI can be described using a three-dimensional (3D) space, which includes strength, extension, and capability. Strength refers to the intelligence level of AI systems, extension refers to the scope of the problems that can be solved by AI systems, and capability refers to the average solution quality that AI systems can provide. General AI systems can do unsupervised learning deftly based on experience and knowledge accumulation. However, general AI cannot be realized with a simple combination of computing models and algorithms from AI methods. DeepBlue (Campbell et al., 2002), Watson (Rachlin, 2012; Shader, 2016), and AlphaGo (Silver et al., 2016) are AI systems that have achieved great success in challenging human intelligence in some fields by relying on the powerful processing ability of computers. However, these systems cannot yet evolve to a higher intelligence level by virtue of their own thought processes. There is still a gap between those systems and general AI with regard to a high self-learning ability (Simon, 1969; Newell and Simon, 1972; Selfridge, 1988).
Intelligent machines have become the intimate companions of humans, and the interaction and cooperation between humans and intelligent machines will become integral to the formation of our future society. However, many problems that humans face tend to be highly complex, uncertain, and open-ended. Because the human is the service object and the ultimate arbiter of the ‘value judgment’ of an intelligent machine, human intervention in the machine has been consistent throughout the evolution of these systems. In addition, even if sufficient or infinite data resources are provided for AI systems, human intervention cannot be ruled out of intelligent systems. Many problems remain to be solved in AI, for example, how to understand the nuances and fuzziness of human language in human-computer interaction systems, and especially how to avoid the risks or even harms caused by the limitations of AI technology in some important applications, such as industrial risk control (de Rocquigny et al., 2008), medical diagnosis (Szolovits et al., 1988), and the criminal justice system. To solve these problems, human supervision, interaction, and participation must be introduced for verification purposes. On the one hand, this improves the confidence level in intelligent systems and yields human-in-the-loop hybrid-augmented intelligence; on the other hand, it makes optimal use of human knowledge. Therefore, in this paper, we highlight the concept of hybrid-augmented intelligence, which combines human cognitive ability with the capabilities of computers in fast operations and mass storage. The definitions are as follows:
Definition 1 (Human-in-the-loop hybrid-augmented intelligence) Human-in-the-loop (HITL) hybrid-augmented intelligence is defined as an intelligent model that requires human interaction. In this type of intelligent system, the human is always part of the system and consequently influences the outcome: when the computer gives a low-confidence result, the human gives a further judgment. HITL hybrid-augmented intelligence also readily allows for addressing problems and requirements that may not be easily trained or classified by machine learning.
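The deferral rule in Definition 1 can be sketched in a few lines. This is a minimal illustration, not a method given in this survey: the function name `hitl_decide`, the `Decision` record, the `ask_human` callback, and the threshold value 0.8 are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Decision:
    label: str
    confidence: float
    decided_by: str  # "machine" or "human"


def hitl_decide(
    scores: Sequence[float],
    labels: Sequence[str],
    ask_human: Callable[[Sequence[float], Sequence[str]], str],
    threshold: float = 0.8,  # assumed cut-off, to be tuned per application
) -> Decision:
    """Accept the machine's answer when it is confident, otherwise defer to a human."""
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and of equal length")
    best = max(range(len(scores)), key=lambda i: scores[i])
    confidence = scores[best]
    if confidence >= threshold:
        return Decision(labels[best], confidence, "machine")
    # Low-confidence result: the human gives the further judgment.
    return Decision(ask_human(scores, labels), confidence, "human")


def reviewer(scores, labels):
    """Stand-in for a human reviewer (hypothetical)."""
    return "benign"


print(hitl_decide([0.95, 0.05], ["malignant", "benign"], reviewer))
print(hitl_decide([0.55, 0.45], ["malignant", "benign"], reviewer))
```

In a real system the human's answers would typically also be logged and fed back as training data, which is what keeps the human inside the learning loop rather than only at its output.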
Definition 2 (Cognitive computing based hybrid-augmented intelligence) In general, cognitive computing (CC) based hybrid-augmented intelligence refers to new software and/or hardware that mimics the function of the human brain and improves a computer’s capabilities of perception, reasoning, and decision-making. In that sense, CC based hybrid-augmented intelligence is a new framework of computing whose goal is more accurate models of how the human brain/mind senses, reasons, and responds to stimuli, especially how to build causal models, intuitive reasoning models, and associative memories in an intelligent system.
In addition, because of issues with qualification (Thielscher, 2001) and ramification (Thielscher, 1997), not all problems can be modeled; i.e., it is impossible to enumerate all the prerequisites of an action, or to enumerate all the branches following an action. Machine learning cannot understand real-world environments, nor can it process incomplete information and complex spatial and temporal correlation tasks better than the human brain does. It is impossible for a formal system of machine learning to describe the interaction of the human brain across the spectrum of non-cognitive factors and cognitive functions or to emulate the high plasticity of the brain’s nervous system. The brain’s understanding of non-cognitive factors is derived from intuition and influenced by empirical and long-term knowledge accumulation (Pylyshyn, 1984). All these biological characteristics of the brain can help enhance the adaptability of machines in complex dynamic environments or on site, promote machine abilities in processing incomplete and unstructured information and in self-learning, and inspire the building of CC hybrid-augmented intelligence.
CC frameworks can combine the modules for complex planning, problem solving, and perception as well as actions. These frameworks can possibly provide an explanation for some human or animal behaviors and study their actions in new environments, and they could build AI systems that require much less calculation than existing systems.
2 Human-computer collaborative hybrid-augmented intelligence
2.1 Human intelligence vs. artificial intelligence
Humans can learn, speak, think, and interact with the environment to perform actions and study. The capacity for human movement also depends on such learning mechanisms. The most ingenious and important ability of human beings is to learn new things. The human brain has the ability for self-adaptation and knowledge inference, which transcends experience. In addition, humans are gregarious, and their cooperation and dynamic optimization show that collective intelligence is much better than that of any individual. In short, human intelligence is creative, complex, and dynamic (Guilford, 1967; Sternberg, 1984). The creativity of human beings means that human intelligence is skillful in abstract thinking, reasoning, and innovation, creating new knowledge and making associations. The complexity of human intelligence implies the structural complexity and connective plasticity of the neural system inside the human brain, and the complexity inherent to a series of intuitive, conscious, and thinking mechanisms. At present, there is no common conclusion regarding the mechanism of human intelligence, but it is precisely because of the complex structure of the human brain that human intelligence is better at dealing with incomplete and unstructured information. The dynamic nature of human knowledge evolution and learning ability makes humans more adept at learning, reasoning, collaborating, and other advanced intelligence activities.
2.2 Limitations of existing machine learning methods
Machine learning (Nilsson, 1965; Michalski et al., 1984; Samuel, 1988) makes it possible to predict the future through the patterns of past data. In short, machine learning can be considered to be the automation of predictive analyses and it generates models based on computer data. When dealing with a new task, the system makes a corresponding judgment according to a data-based model, which is a ‘training & test’ learning mode. This learning mode depends entirely on the machine’s performance and learning algorithms (Bradley, 1997). In fact, the process of using machine learning to deal with complex, dynamic, and unstructured information (Wang et al., 2017) is much more complex than that of the human process because a machine has to make choices between data sources and options, while a human can quickly make a decision according to slight differences in the tasks and the complex relationships among the data.
Machine learning relies excessively on rules, which results in poor portability and scalability. Thus, it can work only in an environment where there are tight constraints and limited objectives, and it cannot process dynamic, incomplete, and unstructured information. Hybrid-augmented computational intelligent systems can be constructed from artificial neural networks (Yegnanarayana, 1994), fuzzy reasoning (Mizumoto, 1982), rough sets, approximate reasoning (Zadeh, 1996), and optimization methods such as evolutionary computation (Fogel, 1995) and group intelligence (Williams and Sternberg, 1988). By integrating different machine learning methods and adaptive techniques, such systems overcome individual limitations and achieve synergies to some degree, but they are still incapable of exercising common sense, solving time-varying complex problems, and using experience for future decisions (Jennings, 2000). Indeed, no matter how far machine learning develops, it is impossible for a machine to complete all the tasks in human society on its own. In other words, humans cannot rely completely on machine learning to carry out all work, such as economic decision making, medical problem solving, and mail processing.
Humans are capable of extracting abstract concepts from a small number of samples. Even though deep neural networks (DNNs) have made great progress in recent years, it is still difficult to make a machine do this the way a human does. Nevertheless, Lake et al. (2015) used Bayesian learning methods so that a machine can learn how to write letters like a human from a small amount of training data. Compared with traditional machine learning methods, which require a great deal of training data, this method requires only a rough model, and then uses a reasoning algorithm to analyze the case and update the details of the model.
In computer science, the growth in the amount of data is a source of ‘complexity’ that must be tamed via algorithms or hardware, whereas in statistics, the growth in the amount of data brings ‘simplicity’ in a statistical sense, which often provides more support for reasoning, leading to stronger, asymptotic results. At a formal level, the gap is made evident by the lack of a role for computational concepts (such as ‘runtime’) in core statistical theory and the lack of a role for statistical concepts (such as ‘risk’) in core computational theory. Therefore, machine learning with a stronger reasoning capacity requires more integration of computational and inferential aspects at the foundational level (Jordan, 2016).
2.3 Human-in-the-loop hybrid-augmented intelligence
The Internet provides an immense innovation space for HITL hybrid-augmented intelligence. Some researchers regard Internet information processing as the processing of highly structured and standardized semantic information, which they believe computers can handle as long as human knowledge is properly marshaled. In fact, the Internet is full of disorganized, messy fragments of knowledge (Wang et al., 2017), and much of it can be understood only by humans. Therefore, machines cannot complete all the tasks of Internet information processing. Human intervention is still needed on many occasions.
HITL hybrid-augmented intelligence needs to cover the basic functions of computable interaction models, including dynamic reconstruction and optimization, autonomy and adaptivity during interactive sharing, interactive cognitive reasoning, and methodologies for online evaluation. HITL hybrid-augmented intelligence can effectively realize the concept of human-computer communication, especially at the conceptual level of knowledge, where computers can not only provide intelligent-ware in different models, but also talk to human beings at the conceptual level of knowledge.
Several problems remain to be solved in HITL hybrid-augmented intelligence:
how to break through the human-computer interaction barrier, so that machines can be trained in the intelligence circuit in a natural way,
how to combine human decision making and experience with the advantages of machine intelligence in logical reasoning, deductive inference, and so on, so that a man-machine collaboration of high efficiency can be realized,
how to build cross-task, cross-domain contextual relations, and
how to construct task- or concept-driven machine learning methods which allow machines to learn from both massive training samples and human knowledge, to accomplish highly intelligent tasks by using the learned knowledge.
HITL hybrid-augmented intelligence is able to process highly unstructured information, generating more accurate and more credible results than what can be derived from a single AI system.
3 Hybrid-augmented intelligence based on cognitive computing
In nature, human intelligence is undoubtedly the most robust. The construction of hybrid-augmented intelligence based on CC, which uses research on effective cooperation mechanisms between biologically inspired information processing systems and modern computers, could possibly provide a novel method to solve the long-term planning and reasoning problems in AI.
3.1 Computing architecture and computing process
The construction of CC hybrid-augmented intelligence should take into consideration computing architecture and computing processes. That is to say, the kind of computing architecture and the kind of computing process needed to complete the calculation must be decided.
Modern computers are based on the von Neumann architecture. The computing process is based on the fact that computing tasks can be formulated by a symbolic system. Running a modern computer is a process of calculation by a formal model (software) in the von Neumann architecture, which can achieve complete and undifferentiated copies of data. Different solutions (software programs) to different problems are required. Once the model is established, its computational capabilities and the tasks it faces are determined.
The computing architecture of a biological intelligence is based on the brain and nervous system. The calculation process of biological intelligence is the process of constantly adapting to an environment or a situation, that is, applying risk judgments and value judgments. The biological intelligence’s information processing mechanism has two aspects. One is a natural evolutionary process, which requires the biological intelligence system to be able to model the status of the environment and of itself and then provide an ‘interpretable model’, which forms the measurement of risk and value. The other is ‘selective attention’ (Moran and Desimone, 1985), which provides an efficient mechanism for comprehensive judgments of risk or value and screening key factors in complex environments, such as children looking for a father’s face in the crowd after school. In many cases, risk and value judgments are based on continuously cycling thinking activities of prediction and choice on the basis of cognitive models, and verification thinking activities evolve and improve the cognitive models, such as summing up an abstract or formulaic experience as a theorem. Biological intelligence is a process of evolution; in addition to the common characteristics, it presents individual differences such as individual experience (memory), value orientation (psychological factors), and even ‘nerve expressions’ of microscopic differentiation. For instance, the same face may be represented differently in different human brains.
Experience indicates that for different tasks, the computing process of a biological intelligence can possibly be separated from its computing architecture. Yet, sometimes these two parts cannot be separated from each other (how to identify this separation is also worthy of study). For a computing architecture, the cognitive model of biological intelligence can be used to complete the ‘modeling’ process and formalize its representation. Then, taking advantage of modern computers (computing devices), an effective collaborative calculation can be realized. For the computing process, a neuromorphic model can be constructed to emulate the biological brain in structure and processing. Therefore, the critical step in forming an effective CC framework is to develop the hybrid-augmented intelligence inspired by biological intelligence.
3.2 Basic elements of cognitive computing
The above CC process requires the construction of a causal model to explain and understand the world. The causal model is used to update the prior probability (the prediction) to a posterior probability in light of the observation; association analysis is completed based on the probability analysis of given data; and time/space-based imagination or prediction (such as spatial variation over time) provides understanding, supplementation, and judgment of the environment or situation. Action sequences are planned to maximize future rewards, and prior knowledge is applied to enrich the reasoning from small-scale data to achieve good generalization ability and fast learning speed.
The basic questions to be addressed in CC are therefore:
how to realize brain-inspired machine intuitive reasoning,
how to construct a causal model to explain and understand the world,
how to use the causal model to support and extend learned knowledge through intuitive reasoning, and
how to construct the knowledge evolution model, i.e., how to learn to learn and how to acquire and generate knowledge rapidly through the combination of knowledge.
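The role of prior knowledge in reasoning from small-scale data, described at the start of this section, can be illustrated with a conjugate Bayesian update. This is a minimal sketch under stated assumptions: the Beta-Bernoulli model, the function names, and all numbers are chosen for illustration and do not come from this survey.

```python
def beta_update(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing Bernoulli outcomes."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Three observations only: one success, two failures.
observed = {"successes": 1, "failures": 2}

# An informative prior (assumed: earlier experience suggests about 0.8)
# versus a flat prior that encodes no prior knowledge.
informed_prior = (8.0, 2.0)
flat_prior = (1.0, 1.0)

informed_post = beta_update(*informed_prior, **observed)
flat_post = beta_update(*flat_prior, **observed)

print("informed prior mean:    ", beta_mean(*informed_prior))
print("informed posterior mean:", beta_mean(*informed_post))
print("flat posterior mean:    ", beta_mean(*flat_post))
```

With so few observations, the informed estimate moves only moderately away from the prior, while the flat-prior estimate is driven almost entirely by the three samples. This is the sense in which prior knowledge can stabilize learning when data are scarce.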
3.3 Intuitive reasoning and causal model
3.3.1 Intuition and cognitive mapping
Intuition is a series of processes in the human brain including high-speed analysis, feedback, discrimination, and decisions (Fischbein, 2002). Studies have shown that the average accuracy of human intuitive judgment is higher than that of non-intuitive judgment (Salvi et al., 2016). Humans make many decisions through intuition in their daily lives, such as judging the proximity of two objects, perceiving the unfriendliness of another’s tone, and choosing one’s partner or a book. Intuitive decision making is not just done by common sense. It involves additional sensors to perceive and become aware of information from outside.
Intuition can be divided into three processes, namely selective encoding, selective combination, and selective comparison (Sternberg and Davidson, 1983; Sternberg, 1984). Selective encoding involves sifting out relevant information from irrelevant information. However, selective encoding is still insufficient for a human to achieve accurate understanding. Selective combination is also needed to combine the encoded information in some way and form reasonable internal relations with other information as a whole. Thus, selective combination involves combining what might originally seem to be isolated pieces of information into a unified whole that may or may not resemble its parts. Selective comparison involves relating newly acquired information to old information that one already has. When people realize the similarity to a certain degree between the old information and new information, people can use this similarity to achieve a better understanding of the new information.
Therefore, intuition helps humans make rapid decisions in complex and dynamic environments. Besides, it greatly reduces the search space in the process of solving problems and makes the human cognitive process more efficient.
Intelligence represents a model of characterization and facilitates a better ultimate cognition. One kind of cognitive ‘pattern’ that arises in the mind can be thought of as a world model constructed based on prior knowledge. This model contains three kinds of relationships: interaction, causality, and control. The world model can be considered a cognitive map of the human brain, which resembles an image of the environment. It is a comprehensive representation of the local environment, including not only a simple sequence of events but also directions, distance, and even time information. This concept of a cognitive map was first proposed by Tolman (1948). A cognitive map can also be represented by a semantic web (Navigli and Ponzetto, 2012). From the aspect of information processing theory, a cognitive map (or cognitive mapping) is a dynamic process with steps of data acquisition, encoding, storage, processing, decoding, and using external information (O’Keefe and Nadel, 1978).
Humans’ intuitive reasoning is closely related to the prior knowledge processing ability of the brain. This ability is about abstraction and generalization rather than rote memorization of prior knowledge. It is precisely because of this ability that human intuition can make rapid risk mitigation decisions based on the world model in the human brain.
3.3.2 Machine implementation of intuitive reasoning
Although a machine has the power of symbolic computation and a storage capacity that the human brain cannot match, it is hard for existing machine learning algorithms to realize the concepts mentioned above, such as a cognitive map, decision space searching, and cost of space, like a human brain.
If the intuitive response can be considered as finding the global optimal solution in the search space, intuition can be regarded as the initial iteration position of the solution. This position is valid with high probability. The initial iteration position is not important when solving a simple problem. However, when solving complex problems, the advantages of intuitive reasoning over traditional machine reasoning methods become apparent. In the latter case, traditional machine reasoning methods are likely to fall into local minima (Ioffe, 1979), while intuitive reasoning can provide a reasonable initial iteration position so that it can avoid the local minima problem to a great extent.
In practical terms, the solution space is often complex, non-convex, or even structurally indefinable (Hiskens and Davy, 2001). Therefore, the selection of an initial iterative position is critical and can even decide whether the final result is the global optimal solution or not. In common machine learning methods, the initial iteration position is usually obtained at the expense of the generalization abilities of the algorithm, for example by introducing strong assumptions (Lippmann, 1987) or increasing human intervention (Muir, 1994). Constructing brain-inspired machine intuitive reasoning methods can help avoid the problem of local minima and improve the generalization abilities of AI systems. Then, we can establish models for problems with uncertainty.
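The effect of the initial iteration position on a non-convex problem can be seen in a toy case. The following is a minimal sketch: the objective f(x) = x^4 - 3x^2 + x, the step size, and the two starting points are assumptions chosen for illustration, with the second starting point standing in for an 'intuitive' initial guess.

```python
def f(x):
    """A one-dimensional non-convex objective with two minima."""
    return x**4 - 3 * x**2 + x


def grad_f(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1


def descend(x0, lr=0.01, steps=2000):
    """Plain gradient descent starting from x0."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x


poor_start = descend(2.0)    # converges to the local minimum near x = 1.13
good_start = descend(-0.5)   # converges to the global minimum near x = -1.30

print(poor_start, f(poor_start))
print(good_start, f(good_start))
```

The same algorithm with the same step size reaches a clearly lower objective value from the second start. Nothing in the descent rule itself changed; only the initial position did, which is the role this section assigns to intuition.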
As seen from the above discussion, intuitive reasoning depends on the heuristics and reference points. The heuristic information is derived from experience, i.e., the prior information, which determines the direction of the problem solving. The choice of the reference point depends on other related factors, which determine the initial iteration position of the solution. Intuitive decision making does not seek to find the absolute solution of the target solution position, but to assess whether or not the deviation from the reference point is more conducive to the avoidance of loss. In reality, intuitive judgments tend to show the characteristics of the minimal cost (or minimal risk) based on the ‘reward and punishment’ rule. Therefore, intuitive reasoning can be simulated by machines. The hybrid-augmented intelligence based on CC requires optimally integrating the two reasoning mechanisms, i.e., intuitive reasoning (Tversky and Kahneman, 1983) and deductive reasoning (Dias and Harris, 1988), based on mathematical induction.
Although AlphaGo has adopted a more general framework, it still involves a great deal of manually encoded knowledge. Designing specific encoding schemes for specific problems is the most common way to describe a problem to be solved in previous and present AI research. However, encoding methods are often manually designed for a particular purpose and do not ensure optimality. Besides, AlphaGo does not have the ability of associative memory. Nevertheless, AlphaGo combines intuition (Go sense) with explicit knowledge (rules and chessboard) through the nonlinear mapping of deep learning and the jumping of Monte Carlo tree search (Browne et al., 2012), which is of great value to research into novel AI technologies.
In the research on hybrid-augmented intelligence, more attention must be paid to other methods of learning and reasoning, such as deep learning based reinforcement learning (Mnih et al., 2015), recurrent neural network based methods (Mikolov et al., 2010), and differentiable neural computers (Graves et al., 2016).
3.3.3 Causal model
The causal model in CC can track the development spatiotemporally by cognitive inference at the physical level as well as at the psychological level, which means the learning procedure is guided by the mental state.
Temporal and spatial causality widely exist in many AI tasks, especially in object recognition. For example, Chen D et al. (2016) proposed a tracking system for video applications. Because the spatial causality between the target and the surrounding samples changes rapidly, the ‘support’ in the proposed system is transient. Thus, a short-term regression is used to model the support. The short-term regression is related to support vector regression (SVR), which exploits the spatial causality between targets and context samples and uses spatial causality to help locate the targets based on temporal causality.
3.4 Memory and knowledge evolution
3.4.1 Implementation of memory by artificial neural networks
The human learning mechanism (Nissen and Bullemer, 1987; Norman and O’Reilly, 2003) is based on memory, which is the foundation of human intelligence. The human brain has an extraordinary memory capability and can identify individual samples and analyze the logical and dynamic characteristics of the input information sequence. The information sequence contains a great amount of information and complex temporal correlations. Therefore, these characteristics are especially important for developing models for memory. Memory is as important as computation for cognitive functions, if not more so. Effective memory is able to greatly reduce the cost of computation. Take the cognition of human faces as an example. The human brain can complete this cognitive process from a few photos or even a single photo of a face. This is because a common cognitive basis of human faces has been formed in the brain and the common features of the face have been kept in mind, so that identifying the new or unique features of a face is the only remaining task. In contrast, machines require many training samples to reach the same level as a human. Memory matters for machines in other tasks as well. For example, in natural language processing tasks such as question answering, a method is needed to temporarily store the separated fragments. Another example is to explain the events in a video and answer related questions about the events, where an abstract representation of the events in the video must be memorized. These tasks require modeling dynamic sequences over time and properly forming long- and short-term memories of historical information.
However, information is converted into binary code and written into memory for computer devices. The capacity for memory is completely determined by the size of the storage devices. In addition, storage devices are hardly capable of processing data, which means storage and calculation are completely separated physically and logically. Therefore, it is difficult for existing AI systems to achieve an associative memory function. For future hybrid-augmented intelligence systems, a brain-like memory ability is in great demand (Graves et al., 2013) so that the machine can imitate the human brain’s long- and short-term memories. For example, we could form a stable loop that represents some basic feature, and maintain it for some period of time in a part of the artificial neural network (Williams and Zipser, 1989).
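One classical way to obtain an associative memory from a recurrent network is a Hopfield network, in which stored patterns become stable states of the recurrent dynamics. It is used here only as an illustrative stand-in for the brain-like memory discussed above; it is not the model of the works cited in this section, and the function names and patterns are assumptions.

```python
def train_hopfield(patterns):
    """Hebbian weight matrix for a list of +1/-1 patterns of equal length."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    w[i][j] += p[i] * p[j] / n
    return w


def recall(w, cue, sweeps=5):
    """Update units one at a time until the state settles on a stored pattern."""
    state = list(cue)
    n = len(state)
    for _ in range(sweeps):
        for i in range(n):
            field = sum(w[i][j] * state[j] for j in range(n))
            state[i] = 1 if field >= 0 else -1
    return state


pattern_a = [1, 1, 1, 1, -1, -1, -1, -1]
pattern_b = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train_hopfield([pattern_a, pattern_b])

# A corrupted cue: pattern_a with its first element flipped.
noisy_a = [-1, 1, 1, 1, -1, -1, -1, -1]
print(recall(weights, noisy_a) == pattern_a)
```

Here storage and computation are not separated: the same weights that hold the patterns also perform the retrieval, and a partial or corrupted cue is enough to recover the whole pattern.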
Differentiable neural networks combine the advantages of the structured memory of a traditional computer with the learning and decision-making capabilities of a neural network, so they can be used to solve complex structured tasks that a traditional neural network cannot. The core of the differentiable neural network is the controller, a deep neural network that carries out intelligent reads from and writes to memory and makes reasoning decisions (Mnih et al., 2013; 2015; Lillicrap et al., 2016). For a given condition, the differentiable neural network makes inferences and autonomous decisions based on the relevant experience and knowledge in memory, and constantly remembers, learns, and updates its strategies to complete the processing of complex data and tasks.
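A key ingredient of such memory-augmented networks is that reading from memory is soft, so that it can be trained by gradient descent. The following is a minimal sketch of a content-based read, assuming cosine similarity followed by a softmax; the controller network is omitted, and the function names, the `sharpness` parameter, and the memory contents are illustrative assumptions rather than details from the cited works.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (small constant avoids division by zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def content_read(memory, key, sharpness=10.0):
    """Return soft read weights over memory rows and the weighted read vector."""
    weights = softmax([sharpness * cosine(row, key) for row in memory])
    width = len(memory[0])
    read = [sum(w * row[k] for w, row in zip(weights, memory)) for k in range(width)]
    return weights, read


memory = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
key = [0.9, 0.1, 0.0]  # a query emitted by a (hypothetical) controller

weights, read = content_read(memory, key)
print(weights)
print(read)
```

Because every step is a smooth function of the key and the memory contents, gradients can flow through the read, which is what lets a controller learn what to look up rather than being given fixed addresses.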
3.4.2 Knowledge evolution
A probabilistic model is established for prior knowledge.
The evolution model is expected to combine pieces of knowledge and update them.
The evolution model is expected to determine whether to go further with an existing strategy or to try other methods, which is a self-proven process.
In the process of validation, the evolution model is expected to generate a rich understanding of the environment by taking advantage of causal models through intuitive reasoning and experience at the physical or psychological level (Lake et al., 2016). Such understanding forms the basis of a verification capacity.
3.5 Visual scene understanding based on memory and inference
The visual system plays a crucial role in understanding a scene through the visual center. The perception of the environment constructs a cognitive map in the human brain. Combining rules and knowledge stored in the memory with external information, such as map navigation information, a driver can make his/her driving decision and then control the vehicle following the decision.
Similarly, a brain-inspired automatic driving framework can be constructed, drawing on the memory and inference mechanisms of the human brain. When someone is driving, the brain can form a primary perception, that is, a basic description of the environment, with a single glance. These perception results are then integrated with the knowledge of the situation and the related rules in memory to construct the knowledge map of the traffic scene.
4 Competition-adversarial cognitive learning method
4.1 Generative model and adversarial network
The competition-adversarial cognitive learning method combines the generative model with the adversarial network (Xiao et al., 2016) and can effectively represent the intrinsic nature of the data. This learning framework combines supervised learning with unsupervised learning to form an efficient cognitive learning method. Adversarial training was first proposed by Szegedy et al. (2013) and Goodfellow et al. (2014a). The main idea of this learning framework is to train a neural network to correctly classify both normal examples and ‘adversarial examples’, which are inputs intentionally designed to confuse the model.
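To make the notion of an adversarial example concrete, the following minimal sketch crafts one with the fast gradient sign method described by Goodfellow et al. (2014a). As a simplifying assumption, the model is a fixed logistic-regression classifier rather than a neural network, and the weights, input, and `epsilon` are made-up values chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, b, x):
    # Probability that x belongs to class 1 under a logistic-regression model.
    return sigmoid(w @ x + b)

def loss_grad_wrt_input(w, b, x, y):
    # Gradient of the cross-entropy loss with respect to the input x.
    return (predict_proba(w, b, x) - y) * w

def fgsm(w, b, x, y, epsilon):
    # Fast gradient sign method: move every input coordinate by epsilon
    # in the direction that increases the loss.
    return x + epsilon * np.sign(loss_grad_wrt_input(w, b, x, y))

# Hypothetical model and input (illustrative values only).
w, b = np.array([2.0, -1.0]), 0.0
x, y = np.array([0.3, 0.1]), 1.0

x_adv = fgsm(w, b, x, y, epsilon=0.4)
p_clean = predict_proba(w, b, x)      # above 0.5: classified as class 1
p_adv = predict_proba(w, b, x_adv)    # below 0.5: the bounded perturbation flips the decision
```

Adversarial training would then add pairs such as `(x_adv, y)` to the training data so that the model learns to classify both the clean and the perturbed input correctly; that training loop is not shown here.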
Approaches to machine learning can be roughly divided into two categories: generative and discriminative methods. Models obtained by the two methods are called generative models and discriminative models, respectively. The generative model learns the joint probability distribution P(X, Y) of samples and generates new data according to the learned distribution, whereas the discriminative model learns the conditional probability distribution P(Y|X) directly. A generative model can be used for unsupervised as well as supervised learning. In supervised learning, the conditional probability distribution P(Y|X) is obtained from the joint probability distribution P(X, Y) according to the Bayes formula, P(Y|X) = P(X, Y)/P(X); many observation models have been constructed on this basis, such as the naive Bayes model (Lewis, 1998), the Gaussian mixture model (Rasmussen, 2000), and the Markov model. An unsupervised generative model learns the essential characteristics of real data, gives the distribution characteristics of the samples, and generates new data according to the learned probability distribution. In general, the number of parameters of the generative model is far smaller than the size of the training dataset. Thus, the generative model can discover data interdependency and manifest the high-order correlation of the data without label information.
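The two uses of a generative model mentioned above, classification through the Bayes formula and generation of new data, can be sketched on a small discrete example. The joint table below is invented for illustration; in practice it would be estimated from training data.

```python
import numpy as np

# Hypothetical joint distribution P(X, Y): rows index x in {0, 1, 2},
# columns index y in {0, 1}. The entries sum to 1.
joint = np.array([[0.30, 0.10],
                  [0.10, 0.20],
                  [0.05, 0.25]])

def posterior(joint):
    # Bayes formula: P(Y|X) = P(X, Y) / P(X), with P(X) = sum over y of P(X, y).
    marginal_x = joint.sum(axis=1, keepdims=True)
    return joint / marginal_x

def sample(joint, n, rng):
    # Generation: draw (x, y) pairs from the learned joint distribution.
    flat = rng.choice(joint.size, size=n, p=joint.ravel())
    return np.unravel_index(flat, joint.shape)

post = posterior(joint)               # post[x, y] = P(Y = y | X = x)
predicted = post.argmax(axis=1)       # most probable label for each x
xs, ys = sample(joint, 5, np.random.default_rng(0))
```

A discriminative model would instead fit `post` directly and could not be used for the sampling step, which is the practical difference between the two families.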
Generative adversarial networks (GANs) (Goodfellow et al., 2014b; Denton et al., 2015; Radford et al., 2015) were proposed to improve the training efficiency of the generative model and to solve problems that earlier generative models fail to handle. A GAN consists mainly of two parts: a generative network that generates samples and a discriminator that identifies the source of each sample. When new samples generated by the generative network and real-world samples are fed into the discriminator, the discriminator tries to distinguish the two kinds of samples as accurately as possible, while the generative network tries to generate new samples that the discriminator cannot tell apart from real ones (Mirza and Osindero, 2014; Salimans et al., 2016; van den Oord et al., 2016). Generative adversarial learning is inspired by the zero-sum game in game theory (Nash, 1950). During training, the parameters of the generative model and the discriminative model are updated alternately (one is updated while the other is fixed), each so as to maximize the other’s error rate. Thus, the two parts compete with each other in an unsupervised manner, and ideally a generative model that closely matches the data distribution is obtained.
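The alternating update scheme can be sketched on a deliberately tiny one-dimensional problem. Everything here is an assumption made for illustration: the generator is the affine map G(z) = a·z + b, the discriminator is D(x) = sigmoid(w·x + c), the real data are drawn from a Gaussian with mean 3, and the generator uses the non-saturating objective (maximize log D(G(z))) rather than the strict minimax form. The sketch shows the mechanics of the alternation only; such simple dynamics may oscillate, and no convergence is claimed.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def discriminator_step(d, g, x_real, z, lr):
    # Gradient ascent on mean[log D(x_real)] + mean[log(1 - D(G(z)))],
    # with the generator parameters held fixed.
    w, c = d
    a, b = g
    x_fake = a * z + b
    p_real = sigmoid(w * x_real + c)
    p_fake = sigmoid(w * x_fake + c)
    grad_w = np.mean((1.0 - p_real) * x_real) - np.mean(p_fake * x_fake)
    grad_c = np.mean(1.0 - p_real) - np.mean(p_fake)
    return (w + lr * grad_w, c + lr * grad_c)

def generator_step(d, g, z, lr):
    # Gradient ascent on mean[log D(G(z))] (non-saturating objective),
    # with the discriminator parameters held fixed.
    w, c = d
    a, b = g
    p_fake = sigmoid(w * (a * z + b) + c)
    grad_a = np.mean((1.0 - p_fake) * w * z)
    grad_b = np.mean((1.0 - p_fake) * w)
    return (a + lr * grad_a, b + lr * grad_b)

def train(steps=200, batch=64, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    d, g = (0.0, 0.0), (1.0, 0.0)  # initial (w, c) and (a, b)
    for _ in range(steps):
        x_real = rng.normal(3.0, 0.5, size=batch)  # assumed real data
        z = rng.normal(size=batch)
        d = discriminator_step(d, g, x_real, z, lr)   # update D, G fixed
        z = rng.normal(size=batch)
        g = generator_step(d, g, z, lr)               # update G, D fixed
    return d, g

d, g = train()
```

In the first iterations the discriminator learns that larger values are more likely to be real, which in turn pushes the generator offset `b` upward toward the real data.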
4.2 Generative adversarial networks in self-driving cars
First, constructing a reliable and safe unmanned vehicle system requires learning a variety of complex road scenes and extreme situations. In reality, however, the collected data cannot cover all road conditions. Therefore, complex and vivid scenes with more traffic elements need to be constructed to train a more robust unmanned system online, by combining the generative adversarial network with the traffic knowledge base and the structure of the cognitive map.
With models that can generate similar data from limited labeled data, machine learning systems will no longer rely heavily on manually labeled data. Hence, an unsupervised computing architecture that relies on only a small amount of manually labeled data can be constructed for efficient competitive adversarial learning.
5 Typical applications of hybrid-augmented intelligence
AI technology is creating numerous new products and changing the way people work, study, and live in almost every aspect. It has become a powerful driving force for the sustained growth and innovative development of the social economy. In this section we introduce some typical applications of hybrid-augmented intelligence.
5.1 Managing industrial complexities and risks
Managing industrial complexities and risks is a typical application of hybrid-augmented intelligence. In the networked era, managing the complexity and inherent risks of industry in a modern economic environment has become a daunting task for many sectors (Shrivastava, 1995). Due to the dynamic nature of the business environment, industrial environments face extensive risks and uncertainties. In addition, advances in our information society and sociocultural environment have greatly raised the importance of enterprise risk control, business process socialization, business social networks, and the configuration of social technology. The socialization of business is a process defined, specified, and implemented by an organization for the purpose of achieving implicit or explicit business benefits. Socialization and business social networks require establishing a specific business-driven social structure, or a specific business configuration. They facilitate the flow of information and knowledge (primarily through advanced technologies such as the Internet and AI) and contribute to business intelligence. In particular, the external networks of an enterprise affect its competitiveness not only directly, but also indirectly, by influencing internal resources such as total assets and levels of technical expertise (Hu et al., 2010; 2013).
5.2 Collaborative decision-making in enterprises
Collaborative decision-making is critical to almost all businesses (Hoffman, 1998; Ball et al., 2001). The free exchange of ideas in an enterprise is likely to create more innovative products, strategic solutions, and lucrative business decisions (Fjellheim et al., 2008).
5.3 Online intelligence learning
To provide personalized tutoring, the online hybrid-augmented intelligence learning system can construct a sense-making model dynamically and plan different learning schemes according to the different abilities and responses of learners. The core of the system is to transform traditional education into a customized and personalized learning system, which will profoundly change the formation and dissemination of knowledge.
5.4 Medical and healthcare
The application of cognitive medical hybrid-augmented intelligence systems with human-computer interaction, medical imaging, biosensors, and nano surgery will bring revolutionary change to the medical field.
5.5 Public safety and security
A typical example of an anomaly prediction task is sentiment analysis (Zhao et al., 2010). With the development of social networks, it has become possible to analyze sentiment from Internet data. Sentiment analysis is an effective means of predicting the occurrence of abnormal events (public safety events). Facing massive unstructured Internet data, however, humans cannot predict abnormal events without the aid of AI. This requires the prediction module of the security system to process large-scale data automatically and hand the results to humans, who make the further judgment. In anomaly detection and subsequent disposition, there is a similar interaction mechanism between the human and the security system. Thus, a security system based on human-computer collaborative hybrid-augmented intelligence is formed.
At present, surveillance cameras are deployed almost everywhere and can provide massive video streams for monitoring public security. Due to the lack of manpower, these videos are not fully used. Hybrid-augmented intelligence based on CC can detect suspicious events and persons in massive data (e.g., dangerous carry-on items, anomalous postures, and anomalous crowd behaviors). For results with low confidence or significant impact, experts get involved, interact with the security system, and make further judgments using their intuition and domain knowledge. Meanwhile, the cognition model can leverage the experts’ feedback to improve its analytical ability for video understanding, so that a better and faster system can finally be achieved for the prediction, detection, and subsequent disposition of anomalous events.
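The interaction pattern described here, in which automatic results with low confidence or significant impact are handed to experts whose feedback is then reused, can be sketched as a simple routing rule. This is an illustrative sketch only: the `Detection` fields, the 0.8 threshold, and the queue names are hypothetical and do not come from any particular security system.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    event_id: str
    label: str          # label proposed by the automatic analysis
    confidence: float   # model confidence in [0, 1]
    high_impact: bool   # whether a wrong decision would be costly

def route(det, threshold=0.8):
    # Low-confidence or high-impact results go to a human expert;
    # everything else is handled automatically.
    if det.high_impact or det.confidence < threshold:
        return "expert_review"
    return "automatic"

def triage(detections, threshold=0.8):
    queues = {"automatic": [], "expert_review": []}
    for det in detections:
        queues[route(det, threshold)].append(det)
    return queues

def record_feedback(training_set, det, expert_label):
    # Expert judgments are kept as labeled examples so that the
    # model can later be retrained on them.
    training_set.append((det.event_id, expert_label))
    return training_set

# Hypothetical detections from a video-analysis module.
detections = [
    Detection("e1", "normal", 0.97, False),
    Detection("e2", "abnormal posture", 0.55, False),
    Detection("e3", "dangerous item", 0.92, True),
]
queues = triage(detections)
feedback = record_feedback([], queues["expert_review"][0], "normal")
```

The threshold controls the trade-off between expert workload and the risk of acting on an unreliable automatic result; how it should be set is application-specific and is not addressed by this sketch.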
5.6 Human-computer collaborative driving
The automatic driving system (Varaiya, 1993; Waldrop, 2015) is a highly integrated AI system and has been a research hotspot in recent years. Currently, fully automatic driving still faces difficult technological challenges. The concept of human-computer collaborative driving was first put forward in the 1960s (Rashevsky, 1964). Along with the development of intelligent transportation systems, 5G communication technologies, and vehicle networking, human-computer collaborative driving has become more and more robust and advanced (Im et al., 2009).
The key problems of man-machine collaborative driving are how to realize machine perception and judgment, the exchange of information between the machine and human cognition, and decision-making (Saripalli et al., 2003). Therefore, how to coordinate the two ‘drivers’ to achieve safe and comfortable driving is a pressing fundamental problem for the hybrid-augmented intelligence man-machine collaborative driving system.
Man-machine collaborative driving can also provide a way for the automatic driving intelligence system to learn to drive. The system learns the actions of human drivers, including driving behavioral psychology, from the process of man-machine collaborative driving.
5.7 Cloud robotics
In recent years, robots have been widely used in industrial manufacturing (Johnson et al., 2014; Schwartz et al., 2016), life services (Walters et al., 2013; Boman and Bartfai, 2015; Hughes et al., 2016), military defense (Gilbert and Beebe, 2010; Barnes et al., 2013), and other fields. However, traditional robots can follow only simple instructions, which makes it difficult for them to update knowledge among themselves and to interact with humans, and hence difficult to carry out complex tasks. How to enhance the intelligence of an individual in a multi-robot collaborative system is therefore a major challenge for multi-robot collaborative augmented intelligence.
In addition, entertainment is an important application of hybrid-augmented intelligence. In recent years, technologies such as augmented reality and virtual reality have been widely used in the game industry. Games such as Pokemon Go enhance human participation by superimposing virtual game scenes on users’ real scenes, which promotes the development of the game industry and contributes to technological progress. Moreover, social platforms such as Facebook and WeChat, shopping websites, and other entertainment websites push related information to users based on personal preference analysis, which can become more effective and accurate by introducing human-computer collaborative hybrid-augmented intelligence.
Intelligent machines have become human companions, and AI is profoundly changing our lives and shaping the future. Ubiquitous computing and intelligent machines are driving people to seek new computational models and implementation forms of AI. Hybrid-augmented intelligence is one of the important directions for the growth of AI.
Building human-computer interaction based HITL hybrid-augmented intelligence, by combining the perception and cognitive capabilities of humans with the computer’s capabilities to calculate and store data, can greatly enhance an AI system’s decision-making capability, the level of cognitive sophistication required to handle complex tasks, and its adaptability to complex situations. Hybrid-augmented intelligence based on CC can address the problems of planning and reasoning that AI research has long faced, through intuitive reasoning, experience learning, and other hybrid models.
In this survey, the importance of the development of human-computer cooperative hybrid-augmented intelligence and its basic framework are described on the basis of discussing the limitations of existing machine learning methods. The basic problems of hybrid-augmented intelligence based on CC such as intuitive reasoning, causal modeling, memory, and knowledge evolution are discussed, and the important role and basic approach of intuitive reasoning in complex problem solving are described. The visual scene understanding method based on memory and reasoning is also presented. Finally, typical applications of hybrid-augmented intelligence in the fields of managing industrial complexities and risks, collaborative decision-making in enterprises, online intelligent learning, medical and healthcare, public safety and security, human-computer collaborative driving, and cloud robotics are introduced. We encourage both the industry and academia to investigate and enrich hybrid-augmented intelligence, in both theory and practice.
We are grateful to the reviewers for their valuable comments which helped us improve the manuscript.
- Ando, R.K., 2007. BioCreative II gene mention tagging system at IBM Watson. Proc. 2nd BioCreative Challenge Evaluation Workshop, p.101–103.
- Ando, R.K., Dredze, M., Zhang, T., 2005. TREC 2005 genomics track experiments at IBM Watson. 14th Text REtrieval Conf., p.1–10.
- Ball, M.O., Chen, C.Y., Hoffman, R., et al., 2001. Collaborative decision making in air traffic management: current and future research directions. In: Bianco, L., Dell’Olmo, P., Odoni, A.R. (Eds.), New Concepts and Methods in Air Traffic Management. Springer Berlin Heidelberg, Berlin, Germany, p.17–30. http://dx.doi.org/10.1007/978-3-662-04632-6
- Barnes, M.J., Chen, J.Y.C., Jentsch, F., et al., 2013. An overview of humans and autonomy for military environments: safety, types of autonomy, agents, and user interfaces. Proc. 10th Int. Conf. on Engineering Psychology and Cognitive Ergonomics: Applications and Services, p.243–252. https://dx.doi.org/10.1007/978-3-642-39354-9_27
- Cimbala, S.J., 2012. Artificial Intelligence and National Security. Lexington Books, Lanham, USA.
- Denton, E.L., Chintala, S., Fergus, R., et al., 2015. Deep generative image models using a Laplacian pyramid of adversarial networks. Proc. 28th Int. Conf. on Neural Information Processing Systems, p.1486–1494.
- Dias, M.G., Harris, P., 1988. The effect of make-believe play on deductive reasoning. Br. J. Devel. Psychol., 6(3):207–221. http://dx.doi.org/10.1111/j.2044-835X.1988.tb01095.x
- Dounias, G., 2003. Hybrid computational intelligence in medicine. Proc. Workshop on Intelligent and Adaptive Systems in Medicine.
- Eakin, H., Luers, A.L., 2006. Assessing the vulnerability of social-environmental systems. Ann. Rev. Environ. Resourc., 31:1–477. http://dx.doi.org/10.1146/annurev.energy.30.050504.144352
- Fischbein, H., 2002. Intuition in Science and Mathematics: an Educational Approach. Springer Science & Business Media, Berlin, Germany.
- Fogel, D.B., 1995. Evolutionary Computation: Toward a New Philosophy of Machine Intelligence. Wiley-IEEE Press.
- Gilbert, G.R., Beebe, M.K., 2010. United States Department of Defense Research in Robotic Unmanned Systems for Combat Casualty Care. Report No. RTO-MP-HFM-182, Fort Detrick, Frederick, USA.
- Goodfellow, I.J., Shlens, J., Szegedy, C., 2014a. Explaining and harnessing adversarial examples. ePrint Archive, arXiv:1412.6572.
- Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., et al., 2014b. Generative adversarial nets. Advances in Neural Information Processing Systems, p.2672–2680.
- Graves, A., Wayne, G., Danihelka, I., 2014. Neural Turing machines. ePrint Archive, arXiv:1410.5401.
- Guilford, J.P., 1967. The Nature of Human Intelligence. McGraw-Hill, New York, USA.
- Hagan, M.T., Demuth, H.B., Beale, M.H., et al., 2002. Neural Network Design. PWS Publishing Co., Boston, USA.
- Hilovska, K., Koncz, P., 2012. Application of artificial intelligence and data mining techniques to financial markets. ACTA VSFS, 6:62–76.
- Hoffman, R., 1998. Integer Programming Models for Ground-Holding in Air Traffic Flow Management. PhD Thesis, University of Maryland, College Park, USA.
- Holland, J.H., 1992. Adaptation in Natural and Artificial Systems: an Introductory Analysis with Applications to Biology, Control and Artificial Intelligence. MIT Press.
- Honey, C.J., Thivierge, J.P., Sporns, O., 2010. Can structure predict function in the human brain? NeuroImage, 52(3):766–776. http://dx.doi.org/10.1016/j.neuroimage.2010.01.071
- Hu, P., Wen, C.L., Pan, D., 2013. The mutual relationship among external network, internal resource, and competitiveness of enterprises. Sci. Res. Manag., V(4):90–98 (in Chinese).
- Hughes, D., Camp, C., O’Hara, J., et al., 2016. Health resource use following robot-assisted surgery versus open and conventional laparoscopic techniques in oncology: analysis of English secondary care data for radical prostatectomy and partial nephrectomy. BJU Int., 117(6):940–947.
- Im, D.Y., Ryoo, Y.J., Kim, D.Y., et al., 2009. Unmanned driving of intelligent robotic vehicle. ISIS Proc. 10th Symp. on Advanced Intelligent Systems, p.213–216.
- Janis, I.L., Mann, L., 1977. Decision Making: a Psychological Analysis of Conflict, Choice, and Commitment. Free Press, New York, USA.
- Lake, B.M., Ullman, T.D., Tenenbaum, J.B., et al., 2016. Building machines that learn and think like people. Behav. Brain Sci., 22:1–101.
- Lillicrap, T.P., Hunt, J.J., Pritzel, A., et al., 2016. Continuous control with deep reinforcement learning. ePrint Archive, arXiv:1509.02971.
- Martin, J., 2007. The Meaning of the 21st Century: a Vital Blueprint for Ensuring Our Future. Random House.
- Mikolov, T., Karafiát, M., Burget, L., et al., 2010. Recurrent neural network based language model. Conf. of the Int. Speech Communication Association, p.1045–1048.
- Mirza, M., Osindero, S., 2014. Conditional generative adversarial nets. ePrint Archive, arXiv:1411.1784.
- Mnih, V., Kavukcuoglu, K., Silver, D., et al., 2013. Playing Atari with deep reinforcement learning. ePrint Archive, arXiv:1312.5602.
- Moran, J., Desimone, R., 1985. Selective Attention Gates Visual Processing in the Extrastriate Cortex. MIT Press, Cambridge, USA.
- Newell, A., Simon, H.A., 1972. Human Problem Solving. Prentice-Hall, Englewood Cliffs, USA.
- Noh, H., Hong, S., Han, B., 2015. Learning deconvolution network for semantic segmentation. IEEE Int. Conf. on Computer Vision, p.1520–1528.
- O’Keefe, J., Nadel, L., 1978. The Hippocampus as a Cognitive Map. Clarendon Press, Oxford.
- Park, C.C., Kim, G., 2015. Expressing an image stream with a sequence of natural sentences. Advances in Neural Information Processing Systems, p.73–81.
- Pylyshyn, Z.W., 1984. Computation and Cognition: Toward a Foundation for Cognitive Science. The MIT Press, Cambridge, Massachusetts, USA.
- Rachlin, H., 2012. Making IBM’s computer, Watson, human. Behav. Anal., 35(1):1–16.
- Radford, A., Metz, L., Chintala, S., 2015. Unsupervised representation learning with deep convolutional generative adversarial networks. ePrint Archive, arXiv:1511.06434.
- Rashevsky, N., 1964. Man-machine interaction in automobile driving. Prog. Biocybern., 42:188–200.
- Rasmussen, C.E., 2000. The infinite Gaussian mixture model. Advances in Neural Information Processing Systems, p.554–560.
- Salimans, T., Goodfellow, I., Zaremba, W., et al., 2016. Improved techniques for training GANs. Advances in Neural Information Processing Systems, p.2226–2234.
- Selfridge, O.G., 1988. Pandemonium: a paradigm for learning. National Physical Laboratory Conf., p.511–531.
- Simon, H.A., 1969. The Sciences of the Artificial. MIT Press, Cambridge, USA.
- Sternberg, R.J., 1984. Beyond IQ: a triarchic theory of human intelligence. Br. J. Educat. Stud., 7(2):269–287.
- Stone, P., Brooks, R., Brynjolfsson, E., et al., 2016. Artificial Intelligence and Life in 2030. One Hundred Year Study on Artificial Intelligence: Report of the 2015-2016 Study Panel, Stanford University, Stanford, USA.
- Szegedy, C., Zaremba, W., Sutskever, I., et al., 2013. Intriguing properties of neural networks. ePrint Archive, arXiv:1312.6199.
- van den Oord, A., Kalchbrenner, N., Kavukcuoglu, K., 2016. Pixel recurrent neural networks. ePrint Archive, arXiv:1601.06759.
- Wang, F.Y., 2004. Artificial societies, computational experiments, and parallel systems: a discussion on computational theory of complex social-economic systems. Compl. Syst. Compl. Sci., 1(4):25–35.
- Wang, J.J., Ma, Y.Q., Chen, S.T., et al., 2017. Fragmentation knowledge processing and networked artificial. Sci. Sin. Inform., 47(1):1–22.
- Xiao, C.Y., Dymetman, M., Gardent, C., 2016. Sequence-based structured prediction for semantic parsing. Meeting of the Association for Computational Linguistics, p.1341–1350.
- Yau, S.S., Gupta, S.K.S., Karim, F., et al., 2003. Smart classroom: enhancing collaborative learning using pervasive computing technology. ASEE Annual Conf. and Exposition, p.13633–13642.
- Zhao, Y.Y., Qin, B., Liu, T., 2010. Sentiment analysis. J. Softw., 21(8):1834–1848.