1 Introduction

Over the past decades, significant progress has been made in artificial intelligence (AI), machine learning, and deep learning, and several real-world problems have been successfully solved. This success has led to the emergence of various methods, including fuzzy logic, swarm intelligence, genetic programming, and hybrid approaches such as neuro-fuzzy and genetic fuzzy systems, all of which have contributed to the design and analysis of complex intelligent systems. Among these methods, deep learning techniques such as deep neural networks (DNN) have made major advances in solving problems that have resisted the best attempts of the AI community for many years. The term “deep” is used because the depth of the network is greater than that of conventional neural networks, which are often referred to as shallow networks (Paul and Singh 2015). Conventional neural networks are limited in their ability to process natural data in their raw form. For decades, the construction of pattern-recognition or machine-learning systems required careful engineering and considerable domain expertise to design a feature extractor that transforms the raw data into a suitable internal representation or feature vector from which the learning system (often a classifier) can detect or classify patterns in the input (Ashraf et al. 2020). Compared to typical neural networks with a single hidden layer, a DNN applies representation learning, which allows a machine to be fed with raw data and to automatically discover the representations needed for detection or classification using multiple hidden layers (LeCun et al. 2015). Hence, DNNs have turned out to be very good at discovering intricate structures in high-dimensional data and are thus applicable to many domains of science, business, and engineering.

Although a DNN is an effective approach for handling big data problems, the superior accuracy of the model comes at the cost of high complexity. It is therefore essential to note a few points before employing this type of network. Because a DNN uses more than one hidden layer, it can provide a deeper analytical model; however, each added layer adds computational complexity (Sharma 2019). Further, such networks inherit from traditional neural networks the gradient descent optimization approach for network training. Hence, a DNN frequently encounters the problem of becoming stuck in local minima. In addition to these challenges, the major disadvantage of a DNN is that the model is often criticized as being non-transparent, and its predictions are not traceable by humans owing to its black-box nature (Buhrmester et al. 2019). It is challenging to trust the findings generated by such deep networks. Hence, there is always the possibility of a communication gap between analysts and DNNs (Bonanno et al. 2017; Hayashi 2020). This downside often limits the usability of such networks in real-world problems where verification of the predicted results is a major concern.

To cope with these problems, a few studies in the literature (Aviles et al. 2016; El Hatri and Boumhidi 2018; Zhang et al. 2020a, b, c) have combined a DNN with fuzzy systems to produce a novel deep neuro-fuzzy system (DNFS). Fuzzy systems are structures based on fuzzy techniques oriented toward information processing, and are mainly used in systems where classical binary logic is impossible or difficult to apply. Their main characteristic is a symbolic knowledge representation in the form of fuzzy conditional IF–THEN rules (Czabanski et al. 2017). Therefore, the hybridization of a DNN and fuzzy systems has proven to be an effective way to reduce uncertainty using fuzzy rules.

As an emerging hybrid approach, the use of DNFS has gained enormous popularity among research communities during the past 5 to 6 years in the field of AI. Hence, positive growth in the implementation of this model can be seen in distributed systems, cloud computing, healthcare, and various other areas. However, to the best of the authors’ knowledge, no systematic review has been conducted with the sole focus on highlighting the current progress in the domain of DNFS with detailed facts and figures.

This study, therefore, presents a systematic literature survey of the research work published between 2015 and 2020, with the following major contributions:

First, a comprehensive methodology has been designed to perform an in-depth search systematically by following a revised study mapping process comprising seven phases (shown in Fig. 1).

Fig. 1 Revised study mapping process

This paper presents the basic concept of DNFS and highlights some of the open questions covering the different structural designs introduced in the literature that combine deep neural networks and fuzzy systems.

The study also covers the optimization methods and techniques that have been widely used to train and optimize the parameters of DNFS.

In addition, this paper presents information regarding the intensity of the research conducted in this discipline by performing extensive searches in different scientific databases.

One of the research questions in this paper highlights the applications of DNFS, which is one of the main focuses of this study.

Finally, this survey highlights the research gaps, issues, and challenges that require further attention from researchers. It provides a comprehensive body of knowledge and delivers the current status of this particular field, while suggesting some potential future directions.

The remainder of this paper proceeds as follows. Section 2 highlights related research based on the available survey studies from the literature, whereas Sect. 3 presents the methodology designed to conduct this systematic review. Section 4 answers the research questions set out in our systematic survey by analyzing the synthesized results of identified publications from available sources in the literature. The identified issues, gaps, challenges, and future areas of study are discussed in Sect. 5. Finally, the conclusions of this study are presented in Sect. 6.

2 Related work

Over the past 5 to 6 years, successful attempts to hybridize deep learning and fuzzy systems have attracted researchers to implement such methods in various real-world applications. An extensive body of literature has been published to date, focusing on experimenting with the model in new domains where it has not been implemented before. However, because DNFS is a novel approach, very few survey studies have so far provided an overall insight into this domain. Therefore, the focus of this section is to highlight the survey studies that have been conducted in the domain of DNFS. These survey studies were carefully selected and studied to create a general picture of the present state of DNFS research.

The survey performed by Dorzhigulov and James (2020) mainly focuses on neuro-fuzzy and similar machine learning models from the perspective of functionality and architectures. In their study, the authors presented an overview of fuzzy systems and described the development of the hybridization of fuzzy systems and neural networks. Moreover, deep learning methods such as a DNN can be integrated with fuzzy systems to introduce automated optimization of neural architectures. Therefore, this study described the DNFS architectures, including an adaptive neuro-fuzzy inference system (ANFIS), fuzzy neural networks, fuzzy trees, and overviews of neural architectures that use some fuzzy elements, such as radial basis function networks (RBFN) and a fuzzy adaptive resonance theory map (ARTMAP).

The literature has shown significant interest in the domain of control systems and classification using neuro-fuzzy systems. Most of the neuro-fuzzy systems presented in the literature are software-based solutions that provide improved training algorithms or mathematical and architectural modifications of the model. However, neuro-fuzzy systems still suffer from slow training when dealing with big data, which affects their overall performance. A limited number of studies (Jhang et al. 2018; Khati et al. 2019; Mata-Carballeira et al. 2019) have implemented neuro-fuzzy systems as dedicated high-performance hardware and proposed the use of field-programmable gate array (FPGA) devices. This hardware solution tends to be more efficient and faster, but with a trade-off in flexibility. More recently, only one study (Marlen and Dorzhigulov 2018) has used memristive crossbar arrays with a fuzzy membership function that acts as a resistor, capacitor, and inductor. Hence, the study of Dorzhigulov and James (2020) suggests that in the near future, hardware solutions should be used to improve the performance and speed of these hybrid approaches.

In the same vein, another recent study (Das et al. 2020) explored the different ways in which deep learning is improved with fuzzy logic systems, along with the use of the model in various real-life applications. Using fuzzy theory along with deep learning can improve the performance of models in which the data are noisy, heterogeneous, incomplete, or vague. However, a problem of computational complexity may occur when utilizing fuzzy systems. The availability of software platforms such as the Compute Unified Device Architecture (CUDA) of Nvidia, the Radeon Open Compute (ROCm) ecosystem released by Advanced Micro Devices, Inc. (AMD), and the Math Kernel Library (MKL) by Intel further accelerates deep learning processes. Even so, the computation of fuzzy parameters is time-consuming on the presently available architectures, despite the models providing resistance to noise and searches over a wider space. Alternatively, fuzzy logic can be used alongside standard deep learning models to process the input or output. Models can make use of fuzzified inputs coupled with standard deep learning models such as deep belief networks (DBN) or convolutional neural networks (CNN). This makes it possible to leverage the software platforms above to accelerate DNN training when fuzzy systems are used. The study further suggests exploring better ways to improve the performance of fuzzy deep learning models in the future.

Taking a deeper look into deep learning-based neuro-fuzzy systems, an extensive and appealing treatment can be found in (Singh and Lone 2020). This study builds up the basics for readers, from fuzzy sets to the concepts of fuzzy rules and reasoning, and explains membership functions with the help of real-world scenarios and case studies in simple mathematics. Furthermore, it describes the working of Mamdani fuzzy inference systems, Takagi–Sugeno–Kang (TSK) fuzzy inference systems, and Tsukamoto fuzzy inference systems, along with explanations of how these three models differ from each other. A CNN, natural language processing (NLP), and recurrent neural networks (RNN) were implemented in the subject areas of computer vision and time-series prediction. Different variations of the architectures have been defined that integrate fuzzy systems and deep learning, and insight into these hybrid approaches as intelligent systems in the modern world is provided. In addition, the study simplifies the implementation of fuzzy logic, neural networks, DNFS, and related concepts using Python, which encourages readers to experiment with these machine learning and deep learning methods. It not only builds the fundamentals but also encourages newcomers to the field of AI to implement these methods in their respective research areas.

The fourth and last survey study was conducted by de Campos Souza (2020). This study aims to describe the proposed methodologies and existing and improved techniques, including the implementation of neuro-fuzzy systems in applications such as pattern classification, time-series prediction, fault detection, and various other approaches developed since the year 2000. Moreover, the author provided a well-defined model architecture, describing its problem-solving abilities, mechanisms, training algorithms, and different ways to extract information through fuzzy rules. The major emphasis of the study was to survey neuro-fuzzy systems from the literature that provide supervised learning. The central focus of this survey is to gain in-depth knowledge of neuro-fuzzy systems without the implementation of deep learning. However, a small section of the study focuses on the training of neuro-fuzzy models using deep learning methods to perform the tasks of data classification (Deng et al. 2017), traffic incident detection (El Hatri and Boumhidi 2018), and sentiment analysis (Nguyen et al. 2018). The survey also mentioned a few studies implementing a semi-supervised DNFS for image classification (Xiaowei Gu and Angelov 2018) and remote sensing scene classification tasks (Gu et al. 2018). As future work, this study not only suggests exploring dynamic hybrid model architectures but also advises addressing new learning algorithms.

From this section, it is clear that currently only four survey studies in the literature discuss novel DNFS approaches. It is important to note that these studies did not focus on DNFS only. Their main motivation is to build general concepts about DNFS models along with various similar techniques, such as neuro-fuzzy systems without a deep learning approach. Although these studies provide helpful knowledge for understanding the basic concept of a hybrid model, they do not explore the trends and developments regarding DNFS. Based on these observations, Table 1 provides a summarized comparative analysis of the above-mentioned four survey studies.

Table 1 Comparative analysis of the review studies presented in the literature

The summary of the comparative analysis presented in Table 1 provides confirmatory evidence of a research gap. The current literature does not offer comprehensive, systematic, and more importantly, quantitative research knowledge for researchers who wish to explore the scope and current research progress on DNFS in the area of AI, particularly in hybrid approaches.

Therefore, our systematic survey aims to broaden the knowledge of readers by presenting not only a basic structural understanding but also information regarding the optimization methods and application domains through a quantitative analysis, facts, challenges, scope, and future suggestions. To fulfill the above-mentioned objectives, the next section (Sect. 3) provides the detailed methodology that has been followed to construct this systematic survey.

3 Methodology

The purpose of this systematic literature survey is to provide a comprehensive list of studies related to DNFS reported in the literature between 2015 and 2020. Various approaches to conducting survey studies have been implemented in (Singh and Singh 2020; Yu and Pan 2021; Yu and Sheng 2020). The guidelines followed in this paper are taken from the systematic literature reviews published by Baashar et al. (2020), Schön et al. (2017), and Muhammed et al. (2018). In addition, the revised study mapping process illustrated in Fig. 1 was designed for this review; it combines the preferred reporting items for systematic reviews and meta-analyses (PRISMA) statement (Moher et al. 2009; Safdar et al. 2018) with the systematic mapping process presented by Hussain et al. (2019). It comprises seven phases (as shown in Fig. 1): a preliminary study; formulation of research questions; identification of the search criteria, covering all papers found in the literature search; the screening process; an eligibility and quality assessment of the selected studies; data extraction and compilation; and the final list of articles included in this systematic study after applying the exclusion criteria.

3.1 Phase 1: Preliminary study

A preliminary study is an initial and significant phase of a systematic literature survey. In this phase, we narrowed down the search parameters by obtaining background information on studies related to deep neuro-fuzzy systems. A random search was performed on common Internet search engines using a single domain-specific keyword, such as deep neuro-fuzzy systems. Afterward, we filtered the search results and focused on those studies that helped us to choose search strings strictly related to the domain of our study.

3.2 Phase 2: Formulating research questions

In this phase, we formulated the key research questions that direct us throughout the research and writing process. We determined the quality of the research questions based on the elements of constructiveness, focus, and relevance to a specific area or problem. Extensive research has been conducted on the development of neuro-fuzzy systems (NFS). However, it is challenging to find sufficient guidance in the literature regarding novel deep neuro-fuzzy systems (DNFS) that involve deep learning methods such as basic DNNs. Therefore, the primary motivation behind this study is to prepare a systematic review regarding the development of DNFS, their optimization methods, and their application subjects.

The following are the fundamental research questions:

RQ-1:

What are the fundamental concepts related to deep neuro-fuzzy systems?

Motivation Answering this question will establish the basics and help in understanding the fundamentals of deep neuro-fuzzy systems.

RQ-2:

What approaches have been widely employed for the optimization of deep neuro-fuzzy systems?

Motivation This question aims to identify optimization techniques involved in the training and learning of deep neuro-fuzzy systems.

RQ-3:

What is the intensity of publications in the domain of deep neuro-fuzzy systems in terms of year-wise and directory-wise?

Motivation In this question, the potential of deep neuro-fuzzy systems is investigated using conference proceedings, journal articles, and book chapters.

RQ-4:

What are the most promising and practical application subjects and domain areas where deep neuro-fuzzy systems have been implemented?

Motivation This question investigates studies that highlight the application of deep neuro-fuzzy systems across multiple subject and domain areas.

3.3 Phase 3: Identification of search criteria

The preliminary search was conducted using the Google Scholar search engine, which helped to formulate the research questions for this study. Next, research questions were used in this survey to identify the relevant keywords and search terms/strings related to the DNFS topic. The identification stage of this study further helped us to explore the topic from a broader perspective while employing specific search criteria. In this survey study, five scientific databases and venues (Table 2) were selected for a more advanced search using finalized keywords (Table 3) from the domain of DNFS. The selected databases were chosen because they offer a wide variety of the most essential and highest-impact journals and conference proceedings. Because each scientific database uses different search features and filters to perform a systematic search, it is important to adjust the search string for every scientific database. Table 2 presents the search criteria used to apply the systematic search for this study using the selected scientific databases and search engines.

Table 2 Identified search criteria
Table 3 Keywords and search strings

The search was performed with the aim of obtaining the maximum number of relevant studies published in these databases based on the specific keywords and search strings, combined using the logical operators OR and AND, as presented in Table 3. However, it is important to modify the search strings to meet the unique specifications of each database. The search strings highlighted in Table 3 were used to retrieve research papers for RQ-1, RQ-2, and RQ-4. Meanwhile, RQ-3 is specifically formulated to investigate the intensity and potential of DNFS based on the studies published between 2015 and 2020 using the keywords and search strings shown in Table 3.
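As an illustration of how keywords can be combined with the logical operators OR and AND, the following minimal sketch builds a Boolean search string. The keyword groups and the function name `build_query` are hypothetical placeholders; the terms actually used in this survey are those listed in Table 3, and each database may require its own syntax.

```python
# Hypothetical keyword groups for illustration only; the survey's actual
# keywords and search strings are the ones given in its Table 3.
KEYWORD_GROUPS = [
    ["deep neuro-fuzzy", "deep fuzzy neural network"],
    ["optimization", "application"],
]


def build_query(groups):
    """Join synonyms with OR inside a group and join the groups with AND."""
    clauses = []
    for group in groups:
        quoted = " OR ".join(f'"{term}"' for term in group)
        clauses.append(f"({quoted})")
    return " AND ".join(clauses)


query = build_query(KEYWORD_GROUPS)
print(query)
```

Keeping the groups in a plain list makes it straightforward to adapt the same keywords to the quoting and field-tag conventions of a particular database.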

3.4 Phase 4: Screening process (inclusion and exclusion criteria)

After completing the database search using the identified keywords and search strings, all retrieved papers were screened based on the inclusion and exclusion criteria used to filter and select only the studies that were relevant to answering the research questions, while excluding studies that were not. Table 4 lists the criteria that were followed throughout the screening process to evaluate each paper and decide whether to include or exclude the paper in the systematic literature survey.

Table 4 Inclusion and exclusion criteria for search screening

3.5 Phase 5: Eligibility and quality assessment

In addition to the inclusion and exclusion criteria, it is critical to assess the eligibility and relevance of the primary studies found in the previous stage. In this survey, three quality assessment scores, applied to answer every question, were adopted from Hordri et al. (2017) to examine the eligibility of each individual study based on factors such as the significance of the study, the quality of the results and analysis, and future research guidelines or findings. The scores are 1 (Yes), 0.5 (Partly), and 0 (No). Table 5 describes the quality assessment scores employed to check the eligibility of each DNFS paper in the records.

Table 5 Eligibility criteria in terms of the scores for each paper for a quality assessment
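The scoring scheme can be sketched as follows. This is only an illustration under stated assumptions: the three answers per paper stand in for the quality questions of Table 5, and the cut-off `threshold=2.0` is a hypothetical value, since no explicit threshold is stated here.

```python
# Score values follow the scheme in the text: 1 (Yes), 0.5 (Partly), 0 (No).
SCORES = {"yes": 1.0, "partly": 0.5, "no": 0.0}


def quality_score(answers):
    """Sum the scores of the answers given to the quality-assessment questions."""
    return sum(SCORES[answer] for answer in answers)


def is_eligible(answers, threshold=2.0):
    """Return True if the total reaches the cut-off.

    The threshold is a hypothetical assumption made for this sketch.
    """
    return quality_score(answers) >= threshold


paper_a = ["yes", "partly", "yes"]     # total 2.5
paper_b = ["partly", "no", "partly"]   # total 1.0
print(quality_score(paper_a), is_eligible(paper_a))
print(quality_score(paper_b), is_eligible(paper_b))
```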

3.6 Phase 6: Data extraction and compilation

After appropriately classifying the studies to be included in the systematic literature survey following the steps in Phases 4 and 5, we performed data extraction and compilation to examine and compare the relevant studies. Generally, data extraction and compilation are performed using a variety of available tools and software, including Microsoft Excel (spreadsheets), REDCap, and Google Sheets. We used Microsoft Excel to record data from the publications to answer the research questions and achieve the goals of the study. The following information was extracted from each included study: title, abstract, keywords, authors, publication year, scientific database/venue, publication type (e.g., journal, conference, or book chapter), the technique used (new method, modified, or hybrid approach), and study type (e.g., analysis, survey, or a mixture of both).

3.7 Phase 7: Included (final findings and results)

Figure 2 summarizes the entire revised study mapping process according to the PRISMA guidelines using a flow chart. As illustrated in Fig. 2, a total of 252 studies published from 2015 to 2020 were found during the identification phase using the keywords and search strings (as listed in Table 3) in scientific databases including ACM, IEEE Xplore, Scopus, ScienceDirect, and SpringerLink. The 252 collected records were then screened based on the inclusion and exclusion criteria given in Table 4.

Fig. 2 PRISMA flow chart for selection of the studies in the systematic literature survey

At this stage, 166 relevant studies were included. Preference was given to journal articles, conference proceedings, and book chapters relevant to DNFS, its optimization methods, and its applications in various domains that were written in English and published between 2015 and 2020. A total of 86 studies were excluded at this stage because they were not related to the DNFS domain, were written in languages other than English, were published prior to 2015, or were published as tutorials, short papers, interviews, blogs, or duplicate publications.

The 166 relevant studies screened from the previous phase were further cross-checked for eligibility based on the eligibility criteria specified in Table 5 of phase 5 before being included in this systematic literature survey. The eligibility criteria were designed to assess whether the studies have clearly defined objectives, a clearly presented methodology, a clear experimental process, stated limitations, and findings based on scoring outputs of 1 (yes), 0.5 (partly), and 0 (no). As a result of the quality assessment process, a total of 105 studies remained in the records, whereas 61 studies were excluded because they did not meet the eligibility criteria. Therefore, after following the mapping process, the final 105 collected studies were included in the systematic literature survey to answer the RQs of this study and highlight the research gaps, issues, and challenges of this particular domain.
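The counts reported for the selection process can be cross-checked with simple arithmetic. The short sketch below uses only the figures stated in the text (252 identified, 86 excluded during screening, 61 excluded during the eligibility assessment); the variable names are illustrative.

```python
identified = 252            # records found in the identification phase
excluded_screening = 86     # removed by the inclusion/exclusion criteria
excluded_eligibility = 61   # removed by the quality assessment

screened_in = identified - excluded_screening     # 166
included = screened_in - excluded_eligibility     # 105

print(screened_in, included)
print(f"{included / identified:.1%} of the identified records were included")
```

Under these figures, roughly 41.7% of the identified records reach the final set of included studies.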

Furthermore, Fig. 3 shows the 105 included studies by publication type, i.e., studies published in journals, conference proceedings, book chapters, and preprint servers.

Fig. 3 Included studies based on the publication types

4 Analysis and synthesis of data

This section answers the research questions in this study and provides a better understanding of the collected data.

4.1 RQ-1: What are the fundamental concepts related to deep neuro-fuzzy systems?

A deep neuro-fuzzy system (DNFS) is an advanced concept of hybridization, where deep learning approaches, such as deep neural networks and fuzzy logic approaches, are combined to solve various real-world complex problems involving high-dimensional data. That said, before going into the details of DNFS models, it is essential to build preliminary knowledge for the readers by providing an overview of deep neural networks and neuro-fuzzy systems, which makes it easier to comprehend the idea behind developing DNFS.

Deep neural network (DNN) Deep learning enables multi-layer cognitive models to learn and interpret data with several levels of abstraction, mimicking the way the brain perceives and represents knowledge; hence, it is implicitly capable of capturing complex large-scale data structures. In comparison to various existing state-of-the-art methods, the importance of deep learning approaches is growing rapidly owing to their extraordinary performance in several applications involving visual, audio, social, and medical data (Voulodimos et al. 2018). Conventionally, AI models are trained to perform data processing tasks based on hand-crafted features derived from raw data or features learned from other basic AI models. Using deep learning, computers can automatically learn useful representations and features directly from the raw data, avoiding the challenging step of manually crafting the features (Lundervold and Lundervold 2019). Deep learning techniques have achieved excellent performance in computer vision, automatic speech recognition, and natural language processing. As the term “deep learning” suggests, a DNN model involves a greater number of processing layers than a simple neural network, which is regarded as a shallow learning model. The advancement from shallow to deep learning models has increased the possibility of dealing with complex and nonlinear functions (Shrestha and Mahmood 2019).

Neuro-Fuzzy Systems (NFS) Several NFS have been presented in the literature; among them, the ANFIS model is the most frequently used. ANFIS, introduced by Jang in 1993, is an efficient combination of neural networks and fuzzy logic (Hussain et al. 2015). The key benefit of a neural network is its ability to learn from data. However, such a network is considered a “black box” because it does not clarify how the final outcome is reached. With the help of IF–THEN rules in fuzzy systems, one can interpret the results generated by the model (Kruse and Nauck 1998). The standard fuzzy rule has the following form:

$$\mathrm{IF}\ x\ \mathrm{is}\ A\ \mathrm{and}\ y\ \mathrm{is}\ B\ \mathrm{THEN}\ z=f(x,y)$$
(1)

where A and B are fuzzy sets, and z is a polynomial or a constant (Emad Hussen et al. 2020). Hence, a neuro-fuzzy model can incorporate human knowledge and self-learning competencies that can potentially approximate every situation (Mohd Salleh and Hussain 2016).
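To make the rule form concrete, the following minimal sketch evaluates two rules of this type with a first-order polynomial consequent, z = p·x + q·y + r, and combines them by a firing-strength-weighted average. The Gaussian membership functions, the product conjunction, and all parameter values are illustrative assumptions and are not taken from any of the surveyed studies.

```python
import math


def gaussian(value, center, sigma):
    """Gaussian membership degree of a crisp value, in (0, 1]."""
    return math.exp(-0.5 * ((value - center) / sigma) ** 2)


# Each rule: IF x is A and y is B THEN z = p*x + q*y + r.
# Fuzzy sets A and B are given as (center, sigma); all values are made up.
RULES = [
    {"A": (0.0, 1.0), "B": (0.0, 1.0), "pqr": (1.0, 1.0, 0.0)},
    {"A": (2.0, 1.0), "B": (2.0, 1.0), "pqr": (0.5, -0.5, 3.0)},
]


def infer(x, y, rules=RULES):
    """Weighted average of the rule outputs, weighted by firing strength."""
    weights, outputs = [], []
    for rule in rules:
        # Product of membership degrees implements the "and" (an assumption).
        strength = gaussian(x, *rule["A"]) * gaussian(y, *rule["B"])
        p, q, r = rule["pqr"]
        weights.append(strength)
        outputs.append(p * x + q * y + r)
    return sum(w * z for w, z in zip(weights, outputs)) / sum(weights)


print(infer(1.0, 1.0))
```

At the point (1, 1), both rules fire with equal strength, so the result is the plain average of the two consequents, 2 and 3.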

ANFIS has gained more popularity in the research community than other variations of fuzzy and neuro-fuzzy systems because it has been successfully implemented in various classification, rule-based process control, and pattern recognition applications. It also embeds a learning mechanism that adapts and updates all adjustable parameters of the model with a two-pass learning algorithm, i.e., a forward pass and a backward pass. This algorithm is a hybrid of gradient descent (GD) and a least squares estimator (LSE), which adjusts the antecedent and consequent parameters of the ANFIS model to minimize the error between the actual output and the target output (Salleh et al. 2018).
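The two-pass idea can be sketched on a toy problem. The example below assumes the usual division of labor: in the forward pass the antecedent (membership) parameters are held fixed and the consequent parameters are obtained by least squares; in the backward pass the consequents are held fixed and the antecedent parameters take a gradient step. It is a simplified illustration, not the ANFIS implementation of any cited work: it uses one input, two rules, a small ridge term for numerical stability, and finite-difference gradients in place of backpropagated analytic gradients. All names and values are assumptions.

```python
import math


def gauss(x, c, s):
    """Gaussian membership degree of x for center c and width s."""
    return math.exp(-0.5 * ((x - c) / s) ** 2)


def design_row(x, premises):
    """Regressors of the output, which is linear in the consequents (p_i, r_i)."""
    w = [gauss(x, c, s) for c, s in premises]
    total = sum(w)
    row = []
    for wi in w:
        row += [wi / total * x, wi / total]  # multiply p_i and r_i respectively
    return row


def solve(a, b):
    """Solve a t = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def lse(xs, ys, premises, ridge=1e-8):
    """Forward pass: premises fixed, consequents from regularized normal equations."""
    rows = [design_row(x, premises) for x in xs]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(ata, aty)


def mse(xs, ys, premises, theta):
    """Mean squared error of the model on the data."""
    err = 0.0
    for x, y in zip(xs, ys):
        pred = sum(a * t for a, t in zip(design_row(x, premises), theta))
        err += (pred - y) ** 2
    return err / len(xs)


def gd_step(xs, ys, premises, theta, lr=0.05, eps=1e-5):
    """Backward pass: consequents fixed, one gradient step on (center, width)."""
    flat = [v for pair in premises for v in pair]
    base = mse(xs, ys, premises, theta)
    grads = []
    for i in range(len(flat)):
        bumped = flat[:]
        bumped[i] += eps
        trial = list(zip(bumped[0::2], bumped[1::2]))
        grads.append((mse(xs, ys, trial, theta) - base) / eps)
    stepped = [v - lr * g for v, g in zip(flat, grads)]
    return list(zip(stepped[0::2], stepped[1::2]))


# Toy data: approximate sin(x) on [-2, 2] with two rules.
xs = [i / 10 for i in range(-20, 21)]
ys = [math.sin(x) for x in xs]
premises = [(-1.0, 1.0), (1.0, 1.0)]
theta = lse(xs, ys, premises)
first = mse(xs, ys, premises, theta)
for _ in range(20):
    premises = gd_step(xs, ys, premises, theta)
    theta = lse(xs, ys, premises)
last = mse(xs, ys, premises, theta)
print(first, last)
```

Solving for the consequents in closed form leaves gradient descent with only the nonlinear membership parameters to adjust, which is the usual motivation given for the hybrid scheme.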

Novel hybrid approach of deep neuro-fuzzy system (DNFS) For many years, the scientific community has explored several ways to execute sophisticated algorithms that can learn from data; the main barriers were limited computational power and limited data. Years of development to solve these problems have resulted in an exciting subfield of machine learning called deep learning (DL). A popular model within DL is the DNN (Bonanno et al. 2017). A DNN attempts to learn multiple levels of abstraction and representation to capture complex relations in data. This subfield has grown rapidly over the past few years owing to ever-increasing computational power and access to data. Although DNNs have shown remarkable progress in feature learning from big data, the model is unable to express the uncertainties in the data because of its “black-box” nature (Shwartz-Ziv and Tishby 2017). Despite the usefulness of a DNN, a gap in communication exists between analysts and the DNN. Since the beginning of the big data era, massive amounts of data have been produced daily around the world in science and industry. However, as the volume of data increases, the presence of large amounts of noise and unpredictable uncertainty cannot be ignored, which leads to another critical issue, data ambiguity (Angelov and Gu 2018). This issue is challenging for many autonomous systems because the majority of these machine learning models are designed and trained using labeled data (Bonanno et al. 2017).

The drawbacks of such methods have been addressed by introducing an additional machine learning process into neural networks, namely fuzzy inference, to create an explainable rule-based structure known as a neuro-fuzzy system (NFS). The NFS allows experts to generate rule-based structures. Once the rules are generated, the experts can bias the features generated by the DL by providing feedback to the system. In addition, through these rule-based structures, an analyst can easily understand how a decision has been made by the system (An et al. 2019). Using fuzzy IF–THEN rules, neuro-fuzzy systems such as ANFIS can approximate complex nonlinear problems. These systems can be applied in various applications that involve encoding both objective measurements and subjective information (Bonanno et al. 2017). Therefore, with the emergence of the deep learning concept and the DNN, some researchers have introduced fuzzy inference system elements and modules into such systems to address the possible uncertainties in the raw data, similar to ANFIS. Recently, a few studies have combined the explainable rule-based structure of fuzzy inference with a DNN, as a DNFS, to overcome the black-box problem of the DNN (An et al. 2019). Figure 4 shows a DNFS, which combines the advantages of fuzzy systems and a DNN (Aviles et al. 2016).

Fig. 4 Representation of DNFS by combining the advantages of fuzzy systems and a DNN

DNFS comprises a broad category of hybrid systems that combine the properties of the DNN and fuzzy inference systems in different architectures. The structural designs found in the literature are classified into three categories according to how the fuzzy inference system is integrated with the DNN: sequential, parallel, and cooperative DNFS. The following subsections briefly explain the structures of the three categories and give an example of each from the literature.

4.1.1 Sequential structural designs of deep neuro-fuzzy system

A sequential DNFS is suitable for solving problems involving high linearities, such as time-series data, text documents, sentiment or video classification, and speech recognition. In a sequential structural design, data processing in the fuzzy system and the DNN takes place one after the other, as presented in Fig. 5a, b. In fuzzy theory, a fuzzy set \(A\) in a universe of discourse \(X\) is represented by a membership function \({\mu }_{A}\) taking values in the unit interval, \({\mu }_{A}:X \to [0,1]\). The membership function gives the degree to which a data point \(x\in X\) belongs to \(A\) (Yazdanbakhsh and Dick 2020). The approximate reasoning and decision-making ability of fuzzy logic helps the fuzzy system describe real-world uncertainty effectively. It can work with data that are imprecise, ambiguous, and uncertain (Gallab et al. 2019; Vlamou and Papadopoulos 2019).

Fig. 5 Sequential DNFS: (a) fuzzy systems incorporated with a DNN and (b) a DNN incorporated with fuzzy systems

The first sequential design (Fig. 5a) starts by taking the input features of the data and converting their values into fuzzy sets, which are then processed by the DNN. That is, the inputs first enter the fuzzy system to obtain fuzzy linguistic values, and the neural network then generates the outputs of the sequential DNFS model. The design in Fig. 5b works in the opposite order (Vieira et al. 2004): the DNN assists the fuzzy system in determining the desired parameters when the DNFS model cannot measure the input values directly from the data (Abraham 2001).

To illustrate a sequential DNFS, this review presents the deep fuzzy neural network (DFNN) proposed by Sarabakha and Kayacan (2019), which uses one antecedent fuzzification layer for learning control of nonlinear systems. As illustrated in Fig. 6, the DFNN neurons are organized in an input layer with \({n}_{inp}\) neurons, a Gaussian fuzzification layer \({\mu }_{x}\) with \({n}_{gF}\) neurons, \({n}_{hL}\) hidden layers with (\({n}_{H}\) + 1) neurons in each layer, and an output layer with outputs \({y}_{1},\dots ,{y}_{out}\).

Fig. 6 Illustrates an example of the sequential DNFS

In fuzzy set theory, the degree of truth is defined by a membership function, a curve that maps every point in a specified input space to a membership value. The inputs \({x}_{1},\dots ,{x}_{{n}_{inp}}\) to the DFNN are fuzzified using three Gaussian membership functions with centers \({c}_{gF,1}=-1\), \({c}_{gF,2}=0\), \({c}_{gF,3}=1\) and widths \({\sigma }_{gF,1}={\sigma }_{gF,2}={\sigma }_{gF,3}=1\), as indicated in Eq. (2). The parameter \(c\) defines the center (mean) of the curve and \(\sigma\) defines its width (standard deviation). Figure 7 shows the representation of the Gaussian membership function.

Fig. 7 Gaussian membership function

$${\mu }_{{gF}^{l}}\left({x}_{j}\right)={e}^{-\frac{1}{2}{\left(\frac{{x}_{j}-{c}_{gF,l}}{{\sigma }_{gF,l}}\right)}^{2}}, \quad j=1,\dots ,{n}_{inp} \ \text{and} \ l=1, 2, 3$$
(2)
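The fuzzification in Eq. (2) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation; the function name `gaussian_fuzzify` and the flattened output layout are assumptions made here for clarity.

```python
import numpy as np

def gaussian_fuzzify(x, centers=(-1.0, 0.0, 1.0), sigmas=(1.0, 1.0, 1.0)):
    """Evaluate Eq. (2) for every input x_j and every membership function l."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)         # shape (n_inp, 1)
    c = np.asarray(centers, dtype=float).reshape(1, -1)   # shape (1, 3)
    s = np.asarray(sigmas, dtype=float).reshape(1, -1)    # shape (1, 3)
    mu = np.exp(-0.5 * ((x - c) / s) ** 2)                # shape (n_inp, 3)
    # Flatten to (mu^1(x_1), mu^2(x_1), mu^3(x_1), ..., mu^3(x_n_inp))
    return mu.reshape(-1)

# Two crisp inputs produce six membership degrees
memberships = gaussian_fuzzify([0.0, 1.0])
```

An input that coincides with a center receives membership 1 in that set, and the membership decays smoothly as the input moves away from the center.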

The fuzzified inputs \(\left({\mu }_{{gF}^{1}}\left({x}_{1}\right), {\mu }_{{gF}^{2}}\left({x}_{1}\right), {\mu }_{g{F}^{3}}\left({x}_{1}\right),\dots ,{\mu }_{{gF}^{1}}\left({x}_{{n}_{inp}}\right), {\mu }_{{gF}^{2}}\left({x}_{{n}_{inp}}\right), {\mu }_{{gF}^{3}}\left({x}_{{n}_{inp}}\right)\right)\) are forwarded to the first hidden layer of the DFNN through the network weights \({w}_{1}\). The DFNN hidden layers are fully connected through the network weights \({w}_{i}, i=2,\dots , {n}_{hL}-1\). Finally, the outputs \({y}_{1},\dots ,{y}_{out}\) are calculated from the output of the final hidden layer using the weights \({w}_{{n}_{hL}}\). The network weights \({w}_{i}, i=1,\dots , {n}_{hL}\), are bounded by positive constants \({c}_{W, i }, i=1,\dots , {n}_{hL}\), i.e.,

$${\Vert {w}_{i}\left(m\right)\Vert }_{\infty }\le {c}_{W, i } \quad \forall m, \quad i=1,\dots , {n}_{hL}$$
(3)
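A forward pass through such a network, together with the weight bound of Eq. (3), might look as follows. This is a sketch under stated assumptions rather than the DFNN of Sarabakha and Kayacan: the `tanh` hidden activation, the linear output layer, the omission of bias neurons, and the element-wise reading of the infinity norm are all choices made here for illustration.

```python
import numpy as np

def clip_weights(w, c_w):
    """Enforce Eq. (3), read element-wise: every entry of w lies in [-c_w, c_w]."""
    return np.clip(w, -c_w, c_w)

def dfnn_forward(x, weights, centers=(-1.0, 0.0, 1.0), sigma=1.0):
    """Fuzzify the inputs (Eq. 2), then apply the fully connected layers."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    h = np.exp(-0.5 * ((x - np.asarray(centers)) / sigma) ** 2).reshape(-1)
    for w in weights[:-1]:
        h = np.tanh(w @ h)        # hidden activation (assumed, not from the source)
    return weights[-1] @ h        # linear output layer (assumed)

rng = np.random.default_rng(0)
n_inp, n_hidden, n_out = 2, 4, 1
# Three fuzzy sets per input give 3 * n_inp fuzzified features
shapes = [(n_hidden, 3 * n_inp), (n_hidden, n_hidden), (n_out, n_hidden)]
weights = [clip_weights(rng.normal(size=s), c_w=0.5) for s in shapes]
y = dfnn_forward([0.2, -0.7], weights)
```

In a training loop, `clip_weights` would be applied after every update so that the bound holds for all iterations `m`.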

The learning of this model is divided into two stages: offline pre-training and online training. During offline pre-training, a classical controller executes a series of trajectories and collects a batch of training samples. The DFNN-based controller, denoted \({DFNN}_{0}\), is then pre-trained on these samples to estimate the inverse dynamics of the system. Because \({DFNN}_{0}\) cannot adapt to new conditions, online training is conducted next. During this stage, the DFNN continuously monitors and updates the input of the controller to improve performance. The adaptation information for the DFNN is generated from expert knowledge encoded as rules through a fuzzy mapping. Approximating the inverse dynamics of the system is a typical regression problem; thus, the mean squared error was used as the cost function for both offline and online training. Various other examples of similar deep neuro-fuzzy structure designs can be found in the literature (An et al. 2019; Aviles et al. 2016; Dabare et al. 2019; El Hatri and Boumhidi 2018; Korshunova 2018; Liu et al. 2020a, b; Ramasamy and Hameed 2019; Yeganejou and Dick 2018, 2019).

4.1.2 Parallel structural designs of deep neuro-fuzzy system

In a parallel structural design, the data are processed separately by the fuzzy system and the DNN, and the two results are then fused to obtain the final output, as shown in Fig. 8. The parallel structure pairs a fuzzy system with a hierarchical DNN and derives information from both the fuzzy and the neural representations. The knowledge learned from these two views is then combined to form the final data representation for classification (Chen et al. 2019; Deng et al. 2017).

Fig. 8 Structural design for parallel or fused DNFS

An example of a parallel or fused structure can be seen in the study of Chen et al. (2019), where authors have proposed a hierarchical Pythagorean fuzzy deep neural network (HPFDNN). As illustrated in Fig. 9, the HPFDNN model consists of four phases: fuzzification (Pythagorean fuzzification), neural net, fusion, and learning phases.

Fig. 9 Illustrates an example of parallel/fusion DNFS

Phase 1 - Pythagorean fuzzification phase of HPFDNN In this phase of the model, inputs are linked to several Gaussian membership functions defined in Eq. (2) and Fig. 7 to determine the degree of membership and to specify the input belonging to a certain fuzzy set.

In this phase, \(p\) denotes an input, \(q\) an output, and \(n\) the layer number, so that \({p}_{i}^{(n)}\) is the input of the i-th neuron of the n-th layer and \({q}_{i}^{(n)}\) is the output of the i-th neuron of the n-th layer. The output is obtained through a Pythagorean fuzzification, in which \({\mu }_{i}\) is the membership function and \({r}_{i}\) is the non-membership function:

$${q}_{i}^{(n)}=s\left({p}_{i}^{(n)}\right)={\mu }_{i}^{2}\left({p}_{i}^{(n)}\right)- {r}_{i}^{2}\left({p}_{i}^{(n)}\right)$$
(5)

Phase 2 - Neural net phase of HPFDNN In this phase, the neural net is formed based on the perceptron. In the perceptron, input values are multiplied by the weight of each neuron, and when the degree of the entire input signal exceeds the defined threshold value, a neuron produces the output. This is achieved by computing the sum of the weighted inputs with the threshold neural function (NF) on the sum to produce the output. Subsequently, the DNN converts the inputs into high-level representations by activated neurons, and the neural net phase helps the model acquire neural features. In this phase, a sigmoid activation function (Fig. 10) was used as follows:

Fig. 10 Sigmoid activation function

$${a}_{i}^{(n)}=\frac{1}{1+{e}^{-{p}_{i}^{(n)}}}$$
(6)

where the output of the neural net is \({q}_{i}^{(n)}= {w}_{i}^{(n)} {a}_{i}^{(n)}+ {b}^{(n)}\), that is, the activation weighted by \({w}_{i}^{(n)}\) and shifted by the bias \({b}^{(n)}\).

Phase 3 - Fusion phase of HPFDNN In the proposed HPFDNN model, the fusion of fuzzy and neural nets is processed to obtain the output using the following operation:

$${q}_{i}^{(n)}={q}_{f}^{(n-1)}+ {q}_{n}^{(n-1)}$$
(7)

where \({q}_{f}\) and \({q}_{n}\) represent the outputs of the fuzzy and neural phases, respectively.

Phase 4 - Learning phase of HPFDNN This final phase of the HPFDNN generates a trained DNN model where the output of every layer is used as the input for the upcoming layer as follows:

$${q}^{(n)}={w}_{n-1}^{(n)} {q}^{(n-1)}+ {b}^{(n)}$$
(8)

where \({w}_{n-1}^{(n)}\) is the weight matrix connecting layer \(n-1\) to layer \(n\), and \({b}^{(n)}\) is the bias vector. In addition, a sigmoid function is used in each hidden layer of this phase to map the real-valued, unnormalized output \(q\) into the unit interval as follows:

$${q}^{(n)}= \frac{1}{1+{e}^{-\left({w}_{n-1}^{(n)} {q}^{(n-1)}+ {b}^{(n)}\right)}}$$
(9)
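The four phases in Eqs. (5) to (9) can be traced end to end with a small NumPy sketch. It is illustrative only and is not the HPFDNN implementation of Chen et al. (2019). In particular, the source does not state how the non-membership \(r\) is computed, so the code assumes zero hesitancy, \(r=\sqrt{1-{\mu }^{2}}\), which satisfies the Pythagorean constraint \({\mu }^{2}+{r}^{2}\le 1\); all function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pythagorean_fuzzify(p, center=0.0, sigma=1.0):
    """Phase 1, Eq. (5): squared membership minus squared non-membership."""
    mu = np.exp(-0.5 * ((p - center) / sigma) ** 2)   # Gaussian membership, Eq. (2)
    r = np.sqrt(1.0 - mu ** 2)                        # ASSUMED non-membership (zero hesitancy)
    return mu ** 2 - r ** 2

def neural_phase(p, w, b):
    """Phase 2, Eq. (6): sigmoid activation, then weight and bias."""
    return w * sigmoid(p) + b

def fuse(q_f, q_n):
    """Phase 3, Eq. (7): element-wise sum of the fuzzy and neural views."""
    return q_f + q_n

def learning_layer(q_prev, W, b):
    """Phase 4, Eqs. (8)-(9): affine map followed by a sigmoid."""
    return sigmoid(W @ q_prev + b)

p = np.array([0.0, 1.0, -2.0])
q_f = pythagorean_fuzzify(p)
q_n = neural_phase(p, w=np.ones(3), b=0.0)
q = fuse(q_f, q_n)
out = learning_layer(q, W=np.full((2, 3), 0.1), b=np.zeros(2))
```

Under the zero-hesitancy assumption the fuzzy view simplifies to \(2{\mu }^{2}-1\), so it ranges over \([-1, 1]\); a different choice of \(r\) would change that range but not the structure of the pipeline.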

4.1.3 Cooperative structural designs of deep neuro-fuzzy system

In cooperative deep neuro-fuzzy designs, there are two potential models of DNFS, as illustrated in Fig. 11a, b. In Fig. 11a, the fuzzy interface block converts the crisp input into fuzzy values to provide an input vector to a multi-layer neural network in response to linguistic statements. Then, the DNN is trained to generate the required outputs, and defuzzification of the outputs is performed to convert the fuzzy value into a crisp output value. As shown in Fig. 11b, the fuzzy inference mechanism is determined by a multilayered DNN. Fuzzy systems obtain the computational characteristics of learning offered by a DNN, and in return, the DNN receives the interpretation and clarity of the system representation (Phuong and Kreinovich 2001).

Fig. 11 Structural design of cooperative DNFS: (a) fuzzy deep neural network and (b) deep neuro-fuzzy network

A simple example of a cooperative DNFS was proposed by Yeganejou et al. (2020), who use a CNN for feature extraction and pass the outputs of the final convolutional layer to a fuzzy classifier, as depicted in Fig. 12.

Fig. 12 Illustrated example of cooperative DNFS

With the proposed model, modifications can be made to the CNN based on a dataset or individual needs. In Fig. 12, layers 1–3 are the same as those in the LeNet architecture. Subsequently, the data paths are divided into two parts. In the case of a deep fuzzy structure, the feature maps of layer 3 are extracted and sent for fuzzy clustering. Rocchio’s algorithm was employed to generate the final classification results for the network. The other path leads to a more conventional deep network including subsampling layer 4, a fully connected layer with ReLU neurons, and finally to a fully connected layer with softmax activation functions.

$${O}_{i}=\frac{\mathrm{exp}\left(\overrightarrow{{w}_{i}^{M}} \cdot \overrightarrow{{O}_{fcl}} \right)}{\sum_{n=1}^{N}\mathrm{exp}\left(\overrightarrow{{w}_{n}^{M}} \cdot \overrightarrow{{O}_{fcl}}\right)}$$
(10)

where \({O}_{i}\) is the i-th network output, \(\overrightarrow{{O}_{fcl}}\) is the output from the preceding fully connected layer, \(\overrightarrow{{w}_{i}^{M}}\) is the modified weight vector of the i-th softmax neuron, and \(N\) is the number of neurons in the softmax layer. Here, \(\overrightarrow{{w}_{i}^{M}} \cdot \overrightarrow{{O}_{fcl}}\) represents the logit value of the output layer. Each network output lies in [0, 1], and the outputs sum to 1 for every input.
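The softmax layer of Eq. (10) can be written compactly as below. This is a generic sketch: the function name and example weights are hypothetical, and subtracting the maximum logit is a standard numerical-stability step that is not mentioned in the source and does not change the result.

```python
import numpy as np

def softmax_outputs(W, o_fcl):
    """Eq. (10): rows of W are the weight vectors of the softmax neurons."""
    logits = W @ o_fcl                 # one logit per softmax neuron
    logits = logits - logits.max()     # stability shift; leaves the ratios unchanged
    e = np.exp(logits)
    return e / e.sum()

# Three softmax neurons fed by a two-dimensional fully connected output
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
o = softmax_outputs(W, np.array([2.0, 1.0]))
```

The logits here are 2, 1, and 3, so the third neuron receives the largest probability.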

The fuzzy classifier unit of the proposed model includes a feature-map selection process in which either all elements of a feature map are selected or none are. This process helps to omit redundant feature maps, which leads to better classification accuracy. Next, PCA-based dimension reduction is applied, because Gustafson-Kessel (GK) clustering requires a matrix inversion step and each chosen feature map provides 196 features from a 14 × 14 image. Fuzzy clustering was then executed, and Rocchio’s algorithm (Yeganejou et al. 2020) (presented in Fig. 13) was employed as a classifier.

Fig. 13 Flow of fuzzy Rocchio’s algorithm
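To make the idea of classifying through fuzzy clusters more concrete, the sketch below shows one plausible reading of a Rocchio-style (nearest-prototype) classifier built on fuzzy memberships. It is not the algorithm of Fig. 13: it swaps GK clustering for the simpler fuzzy c-means, omits feature-map selection and PCA, and uses an evenly spaced initialization for reproducibility. All function names are hypothetical.

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0):
    """Fuzzy c-means membership of each row of X in each cluster."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fit_fcm(X, n_clusters, n_iter=50, m=2.0):
    """Alternate membership and center updates (FCM stands in for GK here)."""
    idx = np.linspace(0, len(X) - 1, n_clusters).astype(int)  # assumed init
    centers = X[idx].copy()
    for _ in range(n_iter):
        u = fcm_memberships(X, centers, m) ** m
        centers = (u.T @ X) / u.sum(axis=0)[:, None]
    return centers

def label_clusters(X, y, centers, n_classes):
    """Give each cluster the class with the largest total membership in it."""
    u = fcm_memberships(X, centers)
    votes = np.stack([u[y == k].sum(axis=0) for k in range(n_classes)])
    return votes.argmax(axis=0)

def predict(X, centers, cluster_labels):
    """Assign each sample the label of its highest-membership cluster."""
    return cluster_labels[fcm_memberships(X, centers).argmax(axis=1)]

X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
centers = fit_fcm(X, n_clusters=2)
cluster_labels = label_clusters(X, y, centers, n_classes=2)
pred = predict(np.array([[0.5, 0.5], [10.5, 10.5]]), centers, cluster_labels)
```

In the model described above, the rows of `X` would be the PCA-reduced CNN feature maps rather than raw two-dimensional points.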

The network was trained using a mini-batch variant of stochastic gradient descent with an additional momentum term:

$$\Delta {w}_{ij}\left(t\right)= \mu {\delta }_{j}\left(t\right){p}_{i}\left(t\right)+m\Delta {w}_{ij}(t-1)$$
(11)

where \(\Delta {w}_{ij}\left(t\right)\) represents the update of the ij-th weight after observing the t-th mini-batch of input patterns, \(\mu\) is the learning rate, \({\delta }_{j}\left(t\right)\) is the error for neuron j, \(m\) is the momentum constant, and \(\Delta {w}_{ij}(t-1)\) is the update applied to the ij-th weight after the previous mini-batch. A cross-entropy loss function \({L}_{CE}\) is used as follows:
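The momentum update of Eq. (11) can be sketched as a single step function. The sketch assumes that \({\delta }_{j}\) already carries the sign of the descent direction, so the update is added to the weights; the function name and the numbers in the example are hypothetical.

```python
import numpy as np

def momentum_step(w, delta, p, prev_dw, lr=0.1, momentum=0.9):
    """Eq. (11): dw_ij(t) = lr * delta_j(t) * p_i(t) + momentum * dw_ij(t-1)."""
    dw = lr * np.outer(p, delta) + momentum * prev_dw   # entry [i, j] = lr * p_i * delta_j + ...
    return w + dw, dw                                   # assumes delta points downhill

w = np.zeros((2, 1))                 # two inputs feeding one neuron
dw = np.zeros_like(w)
p = np.array([1.0, 2.0])             # inputs seen in the mini-batch
delta = np.array([0.5])              # error signal for the neuron

w1, dw1 = momentum_step(w, delta, p, dw)      # first mini-batch: no history yet
w2, dw2 = momentum_step(w1, delta, p, dw1)    # second mini-batch: history adds 0.9 * dw1
```

With an identical error signal on both mini-batches, the second update is 1.9 times the first, which shows how momentum accelerates movement along a consistent direction.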

$${L}_{CE}= -{\sum }_{n=1}^{N}{b}_{n}\mathrm{log}(l\left({O}_{n}| \overrightarrow{I}\right))$$
(12)

where \({b}_{n}\) indicates whether the ground-truth class of the current input \(\overrightarrow{I}\) is n, and \(l\left({O}_{n}| \overrightarrow{I}\right)\) is the probability that the deep network assigns to output \({O}_{n}\) for input \(\overrightarrow{I}\).
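For a single input, Eq. (12) reduces to the negative log-probability of the true class. The following sketch assumes a one-hot vector for \({b}_{n}\); the clipping constant `eps` is an implementation detail added here to avoid taking the logarithm of zero.

```python
import numpy as np

def cross_entropy(b, probs, eps=1e-12):
    """Eq. (12): negative sum of b_n * log(predicted probability of class n)."""
    b = np.asarray(b, dtype=float)
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    return -np.sum(b * np.log(probs))

# True class is the second one; the network assigns it probability 0.7
loss = cross_entropy([0, 1, 0], [0.2, 0.7, 0.1])
```

Only the probability assigned to the true class contributes, so the loss falls to zero as that probability approaches 1.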

In addition, other examples of cooperative-type DNFS have been reported in (Greeshma and Bindu 2017; Nguyen et al. 2018, 2019; Samanta et al. 2019).

4.2 RQ-2: What approaches have been widely employed for the optimization of deep neuro-fuzzy systems?

Optimization plays an extremely important role in discovering the best solution from a set of available options with minimal resources. Whether in computing, in engineering, or in an everyday task such as online shopping, optimization methods help to identify the best solution from a wide range of possible options. Various optimization methods, such as gradient descent, stochastic gradient descent, and conjugate gradient methods, have been adopted by machine learning and deep learning techniques for parameter optimization. These methods are known as exact methods. However, metaheuristic algorithms have gained more popularity than exact methods for solving optimization problems owing to their simplicity and the robustness of their results across a wide range of fields, including engineering, industry, transport, and even the social sciences (Hussain et al. 2019). Exact methods are suitable for delivering optimal solutions to smaller problems by following local search mechanisms, whereas metaheuristic-based methods have performed well in finding optimal solutions to large-scale problems thanks to their global search ability (Kolajo et al. 2019). Metaheuristics are further divided into single-solution-based and population-based methods. Population-based (PB) metaheuristics offer a wide range of algorithms, such as evolutionary algorithms (EA) and swarm intelligence (SI)-based metaheuristics. An EA maintains a population of individuals, where each individual represents a search point in the space of possible solutions, and the population is subjected to a collective learning process that transmits information to the next generations. In SI, each member of a swarm works independently on the basis of its stochastic behavior and its observations of the neighborhood or surrounding environment (Kurban et al. 2014).
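The contrast between a local, gradient-based search and a population-based search can be illustrated on a small multimodal function. The sketch below is a toy comparison and does not reproduce any optimizer from the surveyed studies: the one-dimensional Rastrigin function, the elitist mutation-only evolutionary scheme, and all parameter values are assumptions chosen for brevity.

```python
import numpy as np

def rastrigin(x):
    """1-D multimodal test function with global minimum f(0) = 0."""
    return x ** 2 - 10.0 * np.cos(2.0 * np.pi * x) + 10.0

def rastrigin_grad(x):
    return 2.0 * x + 20.0 * np.pi * np.sin(2.0 * np.pi * x)

def gradient_descent(x0, lr=0.001, steps=2000):
    """Local search: follows the slope and stops in the nearest basin."""
    x = x0
    for _ in range(steps):
        x = x - lr * rastrigin_grad(x)
    return x

def evolutionary_search(pop_size=30, generations=100, sigma=0.3, seed=0):
    """Population-based search with Gaussian mutation and elitist selection."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5.0, 5.0, size=pop_size)
    history = [rastrigin(pop).min()]
    for _ in range(generations):
        children = pop + rng.normal(0.0, sigma, size=pop_size)
        both = np.concatenate([pop, children])
        pop = both[np.argsort(rastrigin(both))[:pop_size]]   # keep the best half
        history.append(rastrigin(pop[0]))
    return pop[0], np.array(history)

x_gd = gradient_descent(x0=3.3)          # settles in the local minimum near x = 3
x_ea, best_so_far = evolutionary_search()
```

Started at 3.3, gradient descent converges to the local minimum near 3 and never reaches the global minimum at 0, whereas the population samples many basins at once; because selection is elitist, its best value can never get worse from one generation to the next.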

This section aims to provide an insight into the optimization methods used for optimizing the DNFS in the included studies. A careful investigation of the optimization method is presented in Table 6.

Table 6 Search results for optimization methods

The analysis of the final data revealed that most papers on the DNFS model have employed exact methods for network optimization, as presented in Table 6. Meanwhile, only five studies have employed a single population-based (PB) metaheuristic to optimize DNFS: brain storm optimization (BSO) (Ravi 2020), elephant herd optimization (EHO) (Velliangiri and Pandey 2020), genetic algorithms (GA) (Lee 2020), the crow search algorithm (CSA) (Chandrasekar 2020), and the Jaya optimization algorithm (JOA) (Siva Raja and Rani 2020). In three further studies, the optimization was performed by combining a metaheuristic with another algorithm: the genetic algorithm (GA) with big bang-big crunch (BB-BC) (Chimatapu et al. 2018), biogeography-based optimization (BBO) with Hessian-free (HF) optimization (Zheng et al. 2017), and BBO with a greedy layer-wise training method (Zheng et al. 2016). A few studies describe the model without mentioning any optimization method. Figures 14 and 15 show the distribution of, and the records found in scientific databases for, each optimization method presented in Table 6.

Fig. 14 Overall distribution of the optimization methods used with DNFS

Fig. 15 Distribution of the DNFS optimization methods in scientific databases

Based on Fig. 15, IEEE Xplore has the highest number of publications on DNFS optimization across the different methods. Scopus has the fewest such studies, and SpringerLink has no studies that use optimization techniques other than exact methods. In addition to the distribution of each optimization method across scientific databases, Fig. 16 shows the trend of studies in the DNFS domain employing exact optimization methods.

Fig. 16 Trend of exact methods in the DNFS domain

Similarly, a deeper analysis was carried out to determine when researchers began using population-based (PB) metaheuristics for DNFS optimization, which helps to identify the scope of metaheuristic-based methods within this domain. Figure 17 shows that metaheuristic methods were first applied in the DNFS domain in 2017, and that only eight studies were conducted from 2017 to 2020. Moreover, more of these studies employed EA methods than SI methods in optimizing DNFS.

Fig. 17 Trend of population-based (PB) metaheuristic methods in the DNFS domain

4.3 RQ-3: What is the intensity of publications in the domain of deep neuro-fuzzy systems in terms of year-wise and directory-wise?

According to the primary data extracted from scientific databases, research on integrating fuzzy systems with deep learning techniques such as a DNN, in the form of DNFS, began in 2015 (Laleye et al. 2015) and has attracted the attention of the research community ever since. The model has since been employed to solve problems in various application areas. Although the implementation of such systems is still in its early stages, the rise in the number of publications, as indicated in Fig. 18, cannot be ignored. The figure shows that interest in integrating a DNN with fuzzy systems grew from 2015 to 2016, and the number of publications continued to grow steadily until 2017. After a decrease in 2018, the number of publications on DNFS increased again from 2019 to 2020, covering both application subjects and analyses of DNFS optimization methods. Hence, the intensity of the publications in Fig. 18 suggests that the novel DNFS has a promising scope both at present and in the future.

Fig. 18 Publications of DNFS year-wise

A total of 252 publications from the past 6 years (2015 to 2020) were found in the literature. This does not imply that all publications in the literature have been found. During the DNFS-related keyword search, most of the databases displayed related studies only on the first three pages. To be more careful, we extended our search to a maximum of five pages for each scientific database, and priority was given to studies appearing on these initial five search result pages from ACM, IEEE Xplore, Scopus, ScienceDirect, and SpringerLink. There is therefore still much to be explored in the domain of DNFS, and the intensity of the research conducted in this domain has continued to increase over the past 6 years, as shown in Fig. 18. Figure 19 shows the intensity of publications for DNFS in each scientific database. The most popular venue for published literature related to the DNFS is IEEE Xplore (which also has the highest number of conference proceedings, journals, papers, and book chapters), followed by SpringerLink, ScienceDirect, Scopus, and ACM.

Fig. 19 Publications of DNFS directory-wise

4.4 RQ-4: What are the most promising and practical application subject and domain areas where deep neuro-fuzzy systems have been implemented?

Combining fuzzy systems with a DNN enables the development of AI models that are not only accurate in prediction but also inherently interpretable and understandable to humans. Furthermore, the experimental results presented in (Yazdanbakhsh and Dick 2019) show that the DNFS approach is capable of achieving better accuracy than a DNN with the same level of abstraction/depth. Because basic neuro-fuzzy systems have been major research topics for over 27 years, various surveys and systematic review studies can be found in the literature. However, there is a lack of current research on the novel DNFS hybrid technique (Yeganejou and Dick 2018). Hence, researchers have started exploring the potential of this new domain by implementing the model in various applications ranging from the computing domain to the healthcare, manufacturing, and aviation industries. Most breakthroughs regarding the implementation of DNFS in various application subject domains are highlighted in the following subsections.

4.4.1 Deep neuro-fuzzy system applications in the subject domains of computing

Techniques in the field of AI have made significant contributions to the solution of different real-world problems, including those in the computing domain. Likewise, the novel DNFS has shown potential in solving multiple problems, mainly from this domain. In this subsection, we discuss the studies found in the records that implement the DNFS in different subject domains of computing, including distributed systems, cloud computing, cybersecurity, Internet marketing, software testing, and the classification of image, speech, text, and video.

(i) DNFS application on distributed systems The difficulty of classifying sentiments on Twitter is important for real-world situations such as decision-making and information systems, where customers might obtain relevant information through online reviews. Service ratings can serve as an excellent point for the decision-making process as they provide quick information on the online reviews (Uma 2020). Therefore, an optimization-based fuzzy deep learning classification was proposed in (Uma 2020) and (Bedi and Khurana 2020) for sentiment analysis. The proposed method was developed to solve the misclassification problem in social media reviews. Similarly, taking the advantages of deep learning and fuzzy inference, the authors in (Nguyen et al. 2018) proposed a hybrid fuzzy convolutional neural network (FCNN) with the integration of fuzzy logic and a CNN model for text sentiment classification of Twitter sentiment and movie reviews. The proposed model can resolve ambiguities in data with linguistic labels that are important for emotion detection for sentiment analysis tasks. A comparison of the results between the proposed FCNN and a conventional CNN showed that the proposed FCNN achieves better classification accuracies on emotional data. In another study (Zhou et al. 2014), the authors embedded prior knowledge into the learning structure, making a two-step semi-supervised learning method called fuzzy deep belief networks (FDBN) for sentiment classification. In addition, DNFS has been implemented for congestion control in wireless sensor networks (WSNs) (Monisha and Ranganayaki 2018) and data streaming processing (Mahardhika Pratama et al. 2018).

(ii) DNFS application on cloud computing A few years ago, personal computers were not capable of tackling heavy workloads with vast amounts of data for processing. Hence, processing massive data was (and in some regards still is) a challenging problem until cloud computing was introduced to offer services as a solution to the massive data storage. While attempting to reduce the service rates, the most important aspect was the capability to schedule an upscale of cloud system resources for future or on-demand use. To ensure that cloud services are affordable to the customers, the authors in (Chen et al. 2018a, b) proposed a fuzzy deep neural network (FDNN) model to predict the demand for cloud computing resources. This model can assist customers in deciding the number of resources to be reserved for their computing needs, thereby reducing their operational costs. In another study, the same authors developed a hierarchical Pythagorean fuzzy deep neural network (HPFDNN) model by incorporating the properties of Pythagorean fuzzy logic and DNN (Chen et al. 2019).

(iii) DNFS application on cybersecurity Modern malware is an alarming threat to both individual and organizational security. Over the last few decades, several malware families have been codified and differentiated based on their behavior and functionality. Most machine learning methods work well for the general benign-malicious classification but are unable to distinguish new malware among many classes (Shalaginov and Franke 2017). Therefore, Shalaginov and Franke (2017) proposed a novel deep neuro-fuzzy architecture for multi-label malware classification and fuzzy rule extraction. In addition, smart grids (SGs) are critical and intelligent systems whose networks require cybersecurity methods with advanced intrusion detection and prevention systems (IDPSs). Therefore, to provide the best possible security for SGs, a smart collaborative IDPS was introduced in (Patel et al. 2017) with a fully distributed management system to protect the network from attacks. In (Amosov et al. 2019), a hybrid model of fuzzy logic with convolutional layers was introduced to detect denial-of-service (DoS) attacks in a highly loaded corporate network by recognizing abnormal network traffic.

(iv) DNFS application on Internet marketing Apart from the sentiment analysis, Internet marketing and online advertising are seen as successful approaches for promotional engagement because of their solid and personalized communication capabilities. As a result, many researchers have shown interest in Internet advertisements, which have become an important source of revenue for online businesses. The click-through rate (CTR) is an effective factor for determining the effect of targeted advertising. Therefore, an FDNN was proposed in (Jiang et al. 2018) to predict the advertising CTR. In addition, fuzzy clustering and deep learning were combined in (Yin et al. 2020) to forecast the sales of the new products.

(v) DNFS application in software testing In the field of software engineering, human-based software testing consumes a lot of time and resources. However, software testing is an essential part of the process for validating the performance of a product under different circumstances before it is released to consumers. Therefore, to save cost and time during the testing process, various automated test oracles have been reported in the literature. Among such studies, the authors of (Monsefi et al. 2019) implemented a novel deep neuro-fuzzy approach as a software test oracle. The proposed approach was validated using four different applications and achieved better accuracy in detecting errors. By contrast, in (Liu et al. 2019), a deep-learning and fuzzy oversampling-based model called DeepBalance was used for software vulnerability detection.

(vi) DNFS application on image, speech, and text classification Deep learning has already been applied successfully to tasks such as data, text, and image classification. Likewise, the majority of the articles in the literature on DNFS applications handle these challenges by combining fuzzy systems and a DNN. Korshunova (2018) proposed a convolutional fuzzy neural network approach that uses convolutional, pooling, fully connected, and fuzzy self-organization layers for the classification of real-world objects and image scenes. The approach combines the advantages of a CNN and fuzzy logic to tackle ambiguity in the interpretation of the input sequence. Inspired by the current advancements of CNNs, Greeshma and Bindu (2017) developed a fuzzy deep learning algorithm for single-image super-resolution. This approach uses a fuzzy rule layer along with a deep network to recreate a high-resolution image. In addition, the authors of (Deng et al. 2017) presented an FDNN for classification tasks, such as natural scene image categorization and stock trend prediction. Similar studies for image classification can be found in (Guan et al. 2020; Kunchala et al. 2020; Liu et al. 2020a, b; Manchanda et al. 2020; Tianyu and Xu 2020; Yeganejou and Dick 2018, 2019; Yeganejou et al. 2020; Zhang et al. 2020a, b, c).

In addition to image classification, DNFS has been successfully implemented to perform speech classification. A speech enhancement framework using a fuzzy deep belief network (FDBN) was reported in (Samui et al. 2019). In this model, the network is implemented to perform pre-training with the help of multiple FDBNs to enhance the stability and speed of feature learning. Meanwhile, Xu and Xiao (2018) suggested a theoretical method using a combination of fuzzy optimization with deep learning to cope with the fuzziness of emotions.

Since data are growing at an exponential rate, it is essential to summarize text documents in order to understand their key elements. Many studies have been conducted on summarization methods, and most of them are extractive summarizers. Accordingly, Chopade and Narvekar (2017) presented a hybrid approach of DNN and fuzzy logic systems. In this study, a restricted Boltzmann machine (RBM) is combined with a DNN and phrase-based fuzzy rules to extract the features using a sentence matrix.

(vii) DNFS application on video classification and robotics The study of Nguyen et al. (2019) presented a novel convolutional neuro-fuzzy network, which incorporated a CNN into the fuzzy logic domain to derive high-level features of emotions from the text, audio, and image data. Alternatively, the authors in (Cunha Sergio and Lee 2020) proposed a novel hybrid DNN with ANFIS to interpret the emotions of a video from its visual features and a deep long short-term memory recurrent neural network to produce the related audio signals with an equal emotional impression. Likewise, in another study (Savchenko et al. 2018), the authors implemented a fuzzy analysis with a CNN in still-to-video recognition. Moreover, the model has also been implemented in human action recognition and robotics (Bendre et al. 2020; Chen et al. 2020a, b; Liao et al. 2020; Mohmed et al. 2020; Wu et al. 2020).

Figure 20 shows the intensity of publications for various DNFS applications in the computing domain. Based on Fig. 20, most DNFS applications focus on image, speech, and text classification, followed by video classification and robotics, distributed systems, cybersecurity, cloud computing, software testing, and Internet marketing.

Fig. 20 Intensity of DNFS-related publications for application subjects in the computing domain

4.4.2 Deep neuro-fuzzy system applications in the subject domains of healthcare

Similar to the computing domain, DNFS has been successfully implemented in the healthcare sector to support robotic surgeries and predict various diseases. A deep neuro-fuzzy approach was implemented in (Aviles et al. 2016) to estimate the interaction forces during robotic surgery. A fuzzy hybridized FCNN model was used in (Ramasamy and Hameed 2019) to classify healthcare data. In the study (Davoodi and Moradi 2018), a fuzzy deep model was proposed for mortality prediction in intensive care units (ICUs). For this, a deep framework was developed based on the layered structure of the fuzzy rule base, which can address big data problems. In (Park et al. 2016), the researchers proposed fuzzy deep learning (FDL), a specific estimation method for intra- and inter-fractional variations in many patients. The proposed FDL is built on breathing clustering, predicts precise movements, and decreases the computational cost. Sharma et al. (2020) employed a DNFS as a decision-making system to predict the risk and severity of diseases. The study presented a hybrid diagnosis strategy (HDS) using fuzzy inference and a DNN to detect COVID-19 patients. Moreover, the hybrid DNFS approach has been implemented for tumor and cancer detection and segmentation (Banerjee et al. 2020; Lima et al. 2020; Mudiyanselage et al. 2020; Özyurt et al. 2019; Pitchai et al. 2020; Rahouma et al. 2019; Sengan et al. 2020; Shen et al. 2020; Yang et al. 2020; Zhang et al. 2020a, b, c).

4.4.3 Deep neuro-fuzzy system applications in the subject domains of finance and economics

After the field of financial engineering expanded over the last few years from financial signal analysis to financial prediction methods, this field has become the most important topic among the academic communities and the financial world. Several hybrid intelligent financial prediction systems incorporating neural networks, fuzzy logic, and genetic algorithms have been proposed over the past 20 years. Likewise, to make a worldwide financial prediction, Lee (2020) introduced a chaotic type-2 transient-fuzzy deep neuro-oscillatory network (CT2TFDNN) with retrograde signaling. Other studies (Chandrasekar 2020; Chen et al. 2020a, b; Wang 2020; Xiao 2020) have implemented the DNFS method in Bitcoin price prediction, stock index prediction, and e-commerce platforms.

4.4.4 Deep neuro-fuzzy system applications in the subject domains of traffic flow and incident prediction

In the era of AI today, the use of intelligent software for travel assistance is growing rapidly. An intelligent transportation system (ITS) is an advanced transport management system that incorporates electronic information, AI, global positioning system (GPS) tracking, communications engineering, and other techniques. Generally, traffic flow data has the limitation of complexity and noise interaction. However, compared with the previous deterministic explanation, fuzzy theory is capable of generalizing the original data more logically (An et al. 2019). Hence, a fuzzy-based convolutional neural network (F-CNN) approach was implemented in (An et al. 2019) for predicting traffic flow. This approach uses a fuzzy inference system (FIS) to produce uncertain knowledge about traffic incidents. In addition, a CNN training algorithm is used to learn the characteristics of internal traffic data, traffic accident information, and external information, thus forming an F-CNN prediction model to predict the traffic flow. A few similar studies have employed DNFS for traffic flow prediction and incident detection in (Chen et al. 2018a, b; El Hatri and Boumhidi 2018; Sumit and Akhter 2019; Usman et al. 2020). At the same time, the authors in (Chai et al. 2020; Ivanov et al. 2019) proposed the same model for monitoring unmanned surface vehicles (USVs) and hypersonic vehicles.

4.4.5 Deep neuro-fuzzy system applications in the subject domain of the manufacturing industry

In the welding industry, methods and techniques must consider trends of robotic usage and a large multi-structural architecture to meet the criteria of current development projects in the market. In addition, the technologies in modern manufacturing have led to new developments in welding techniques. Drawing inspiration from AI techniques, the study in (Kesse et al. 2020) proposed an AI-based tungsten inert gas (TIG) welding algorithm that identifies the control parameters and predicts the optimal welding bead width using fuzzy deep learning. Similarly, early warning is important for preventing and controlling industrial accidents. The existing approaches are time-consuming, unreliable, and poorly suited to coping with uncertainty. Therefore, an FDNN was implemented in (Gobinath and Madheswaran 2020; Lin et al. 2020; Yun et al. 2020; Zhang et al. 2020a, b, c; Zheng et al. 2017) to diagnose faults in machines, provide a forecast, and alert managers to possible industrial accidents in advance. In addition, Remya and Sasikala (2019) used a hybridization of back-propagation in deep learning and a fuzzy logic decision tree for rubberized coir fiber classification.

4.4.6 Deep neuro-fuzzy system applications in the subject domain of the aviation industry

In the aviation sector, passenger profiling plays a vital role in maintaining commercial airline security. However, the conventional methods have become inefficient in handling the rapidly increasing amounts of electronic records. Hence, the researchers in (Zheng et al. 2016) proposed a deep neuro-fuzzy approach that integrates ordinary Pythagorean-type fuzzy sets with a deep Boltzmann machine (DBM) to form a Pythagorean fuzzy deep Boltzmann machine (PFDBM). This study further proposed a hybrid learning algorithm combining a biogeography-based optimization (BBO) metaheuristic algorithm to improve the exploration search and a gradient-based method to enhance the exploitation search. Simulations performed on the Air China datasets indicate that the proposed solution offers a high classification accuracy with a great learning ability. In addition, many pattern-analysis tasks can be solved using this approach.

4.4.7 Deep neuro-fuzzy system applications in the subject domains of energy and load forecasting

The incorporation of smart meters in energy management systems has made it easier for electrical companies to access the electricity usage data of their customers. However, extracting and analyzing enormous amounts of data is challenging for these companies. Therefore, researchers have started to utilize various AI techniques to analyze data retrieved from smart meters (Javaid et al. 2019). The same study (Javaid et al. 2019) addressed the energy management of residential buildings. It focused on efficient load and cost optimization by proposing the use of DNFS to handle the uncertain behaviors of consumers with large amounts of data. The final finding of the study confirms the robustness of the proposed model in terms of cost optimization and energy efficiency. Apart from this, one more study was found in the literature that combines fuzzy and deep learning methods to predict the hourly load of the next 7 days. The proposed technique showed a superior performance compared to the traditional load forecasting schemes (Sideratos et al. 2020).

Figure 21 summarizes and visualizes the DNFS research applications in various domains, together with the intensity of publications for each application subject domain. In addition, Fig. 22 visualizes the distribution of records found in each application domain for the papers included in this systematic literature survey.

Fig. 21 The intensity of publications in different application domains of DNFS

Fig. 22 Distribution of records found in each application domain

5 Discussion

As a comprehensive analysis of the studies found through a designed study mapping process, this systematic literature survey reports a total of 105 relevant studies addressing the research questions. This section is organized into two subsections. The first subsection (Sect. 5.1) highlights the research gaps, issues, and challenges found while answering the four research questions. In addition, a few recommendations are presented to help researchers find potential directions for future work. The limitations of this systematic literature survey are presented in the second subsection (Sect. 5.2).

5.1 Research gap, challenges, and future recommendation for RQ1, RQ2, RQ3, and RQ4

Research Question (RQ1) Based on the literature published during the past 5 to 6 years, several variations of DNFS have been proposed since its emergence, and the successful implementation of this emerging model is growing rapidly in a variety of application domains. Because adding a fuzzy layer into a DNN is extremely flexible, it is possible to include it anywhere in the network architecture, depending on the desired behavior of the fuzzy layer. Hence, this study also covered three different structural designs that have been developed in the formation of deep neuro-fuzzy-based models, such as sequential structural designs, parallel structural designs, and cooperative structural designs.
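To make the idea of placing a fuzzy layer at different points in a network concrete, the following minimal sketch contrasts one plausible reading of a sequential design (fuzzy layer feeding a dense layer) with a parallel design (fuzzy and dense branches applied to the same input and then concatenated). The Gaussian membership function, the ReLU dense layer, the fusion by concatenation, and all function names and weights are illustrative assumptions rather than the construction of any surveyed study; the cooperative design is not sketched here.

```python
import math

def gaussian_membership(x, center, sigma):
    """Degree (between 0 and 1) to which x belongs to a Gaussian-shaped fuzzy set."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))

def fuzzy_layer(inputs, centers, sigma=1.0):
    """Map every crisp input to its membership degree in every fuzzy set."""
    return [gaussian_membership(x, c, sigma) for x in inputs for c in centers]

def dense_layer(inputs, weights, biases):
    """Fully connected layer with ReLU activation (weights: one row per output node)."""
    return [max(0.0, sum(w * v for w, v in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def sequential_design(x, centers, weights, biases):
    """Assumed sequential layout: the fuzzy layer output is the dense layer input."""
    return dense_layer(fuzzy_layer(x, centers), weights, biases)

def parallel_design(x, centers, weights, biases):
    """Assumed parallel layout: both branches see x; their outputs are concatenated."""
    return fuzzy_layer(x, centers) + dense_layer(x, weights, biases)

# Hypothetical toy input, fuzzy set centers, and weights.
x = [0.0, 1.0]
centers = [0.0, 1.0]
seq_out = sequential_design(x, centers, [[0.5, 0.5, 0.5, 0.5]], [0.0])
par_out = parallel_design(x, centers, [[1.0, 1.0]], [0.0])
```

In the sequential variant every downstream node depends on the fuzzy layer, whereas in the parallel variant the two branches can be evaluated independently before fusion.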

These novel deep neuro-fuzzy systems are deep networks that combine fuzzy rules and membership degrees with DNN parameters such as the learning rate, number of layers, number of nodes per layer, a huge number of weights, activation functions, and an optimizer. Therefore, despite the various choices of structural design, answering the first research question of this study revealed that most of the studies found during the 6-year period used sequential structural designs rather than parallel or cooperative designs, in order to keep the model simple. However, sequential structures are linear by design and are considered slow compared with the other two structural designs. This linearity becomes a challenge when implementing DNFS in the big data paradigm owing to its deep architecture. By contrast, parallel and cooperative structural designs could be more successful in solving complex real-world problems involving large-scale data because of their flexibility in learning and interpretability.

Hence, in the future, research on the development of DNFS could be further oriented toward efficient hybridization of fuzzy or neuro-fuzzy systems with a DNN to create parallel and cooperative models.

Moreover, because these models have been proposed to tackle big data problems, the computational complexity increases when dealing with huge and complex data owing to the deep architecture. In addition to the suggested structural changes, a few studies from the literature have recommended introducing hardware solutions such as powerful GPUs, FPGAs, and memristors in the future to overcome the problem of computational complexity.

In the same context of big data, it is often challenging to model suitable techniques and methods to deal with streaming data continuously generated by different sources at high speeds. In the past, various AI-based decision-making systems have been presented (Almuammar and Fasli 2019; Lobo et al. 2018; Mahardhika Pratama and Wang 2019; Ullah et al. 2019). However, when it comes to DNFS, only a single study (M. Pratama et al. 2020) has been found in developing this model for the continuous learning of non-stationary data streams. In this study, a deep evolving fuzzy neural network (DEVFNN) with an elastic structure is introduced. This approach helps to make dynamic modifications in fuzzy rules and the depth of the network structure. In the future, more research work should be directed toward constructing DNFS models that can analyze streaming data dynamically to generate instant and reliable outcomes.

Research Question (RQ2) The second research question covered the methods employed to optimize the parameters of the DNFS. Based on the data presented in Table 6 and Fig. 14, it is obvious that most of the studies have used the exact methods such as a gradient descent (GD) algorithm, which is iterative and prone to being stuck in the local minima. Therefore, when facing large-scale data, deep neuro-fuzzy models often deal with slow convergence and poor outcomes (Das et al. 2020). This problem ultimately affects the accuracy of the model during the classification tasks.
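The local-minimum behavior of gradient descent can be illustrated with a small sketch. The one-dimensional function below is a hypothetical stand-in for a training loss (it is not taken from any surveyed study); it has a shallow local minimum near x ≈ 1.13 and a deeper minimum near x ≈ -1.30. The learning rate and step count are likewise assumptions chosen only for the demonstration.

```python
def loss(x):
    """Toy non-convex loss with two minima of different depth."""
    return x**4 - 3 * x**2 + x

def loss_gradient(x):
    """Analytic derivative of the toy loss."""
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, learning_rate=0.01, steps=2000):
    """Plain gradient descent: repeatedly step against the gradient."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * loss_gradient(x)
    return x

# The outcome depends entirely on the starting point.
x_from_right = gradient_descent(2.0)    # settles in the shallow minimum near 1.13
x_from_left = gradient_descent(-2.0)    # reaches the deeper minimum near -1.30
```

Started from x = 2.0, the iteration never leaves the basin of the shallow minimum, even though a much lower loss value exists on the other side of the barrier.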

As a result, several modern optimization techniques under “metaheuristics” have been introduced and implemented in the literature to efficiently optimize machine learning models. However, very limited efforts have been made in the literature from 2017 to 2020 to solve this problem of the local minima with the help of metaheuristic techniques for DNFS models. Moreover, based on our findings for this research question, the majority of studies have used evolutionary-based metaheuristic optimization approaches, whereas only two studies have adopted the swarm intelligence approach. However, according to the research work presented in (Kurban et al. 2014), swarm-based algorithms are generally more accurate and reliable than evolutionary algorithms. In contrast, an analysis based on the study in (Janga Reddy and Nagesh Kumar 2020) states that evolutionary algorithms outperform swarm-based algorithms in terms of finding a near-optimal solution within a reasonable computational time.

In addition, we cannot ignore the “No Free Lunch theorem”, which states that no single optimization algorithm outperforms all others across all possible problems. Therefore, it is challenging to identify one metaheuristic optimization algorithm that can be used to solve classification, time-series, computer vision, natural language processing, and other tasks alike. To date, several attempts have been made to introduce new metaheuristic techniques in the literature. Among these methods, some of the popular algorithms are cuckoo search (CS) (Gandomi et al. 2013), bat algorithm (BA) (Yang and He 2013), grey wolf optimizer (GWO) (Mirjalili et al. 2014), animal migration optimization (AMO) (Li et al. 2014), whale optimization algorithm (WOA) (Mirjalili and Lewis 2016), emperor penguins colony (EPC) (Harifi et al. 2019), mayfly algorithm (MA) (Zervoudakis and Tsafarakis 2020), and equilibrium optimizer (EO) (Faramarzi et al. 2020).

With respect to the issues identified in the above statement, research on optimizing DNFS with metaheuristic-based algorithms still needs significant work in the future. Researchers may in the future investigate and explore the newly introduced metaheuristic optimization methods mentioned above to compare and further improve the performance of DNFS.
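As a rough illustration of why population-based metaheuristics are considered for escaping local minima, the sketch below applies a basic particle swarm optimizer to a toy one-dimensional function that has a shallow local minimum near x ≈ 1.13 and a deeper minimum near x ≈ -1.30. Particle swarm optimization is used here only as a generic example of a swarm intelligence method; it is not claimed to be the algorithm used in the surveyed studies, and the objective, the evenly spaced initialization, and all hyperparameters are assumptions.

```python
import random

def objective(x):
    """Toy non-convex objective standing in for a training loss."""
    return x**4 - 3 * x**2 + x

def particle_swarm(objective, lower=-3.0, upper=3.0, n_particles=21,
                   iterations=100, inertia=0.7, c_personal=1.5,
                   c_social=1.5, seed=0):
    """Basic 1-D particle swarm optimization; returns the best position found."""
    rng = random.Random(seed)
    # Evenly spaced start positions so that the whole interval is covered.
    step = (upper - lower) / (n_particles - 1)
    positions = [lower + i * step for i in range(n_particles)]
    velocities = [0.0] * n_particles
    personal_best = positions[:]
    global_best = min(positions, key=objective)
    for _ in range(iterations):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Velocity blends momentum, pull to own best, and pull to swarm best.
            velocities[i] = (inertia * velocities[i]
                             + c_personal * r1 * (personal_best[i] - positions[i])
                             + c_social * r2 * (global_best - positions[i]))
            # Move the particle and keep it inside the search interval.
            positions[i] = min(upper, max(lower, positions[i] + velocities[i]))
            if objective(positions[i]) < objective(personal_best[i]):
                personal_best[i] = positions[i]
                if objective(personal_best[i]) < objective(global_best):
                    global_best = personal_best[i]
    return global_best

best_x = particle_swarm(objective)
```

Because the swarm samples both basins at once and shares the best position found so far, the returned solution lies in the deeper basin here. This comes at the cost of many more objective evaluations than a single gradient-based run, and, in line with the No Free Lunch theorem, the result on this toy problem says nothing about performance on other tasks.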

Research Question (RQ3) The intensity of publications in the domain of DNFS was carefully examined and addressed in this research question. Based on our extensive search from the online databases, it can be concluded that the research on the development of DNFS first took place in 2015. According to Fig. 18, the domain of DNFS was not explored much during the first four years (2015–2018) of its development. This might have caused difficulties and challenges for researchers to understand this new concept of integrating deep learning with fuzzy systems. Nevertheless, a major rise in the publication of DNFS can be seen from the years 2019 to 2020. The DNFS model has started gaining attention among research communities while successfully solving various problems in the subject areas of computing, engineering, and industry.

Since this domain is still a new area of interest, the majority of the research work has been conducted by implementing the model in different application domains without any major efforts on improving its performance by reducing the computational complexity. Hence, future research should be geared towards exploring efficient ways to improve the training mechanism, hyperparameter tuning (e.g., learning rate and number of hidden layers), fuzzy knowledge base, and structural modifications of the DNFS model.

Research Question (RQ4) With this research question, this study tried to cover the majority of the applications in the domain of DNFS. Since the model has been introduced only recently, few studies were found in the literature that have used DNFS for different application subjects. Figure 21 of Sect. 4.4 shows that the use of DNFS models in the computing area is the leading trend, followed by healthcare, traffic management systems, and the manufacturing industry. Most studies in the computing domain have used the DNFS model for image, speech, and text classification. Video classification and robotics, as well as distributed systems, come next, ahead of other fields such as cybersecurity, cloud computing, software testing, and Internet marketing, as shown in Fig. 20. However, relatively little research using DNFS has been identified in the aviation industry, finance and economics, and energy management.

Therefore, while answering this research question, it can be concluded that there is a vast scope related to the future implementation of the model in technologies under the fourth industrial revolution (4IR), such as AI, blockchain, virtual and augmented reality, cybersecurity, biotechnology, the Internet of Things, digital signal processing, robotics, manufacturing industry, and renewable energy.

5.2 Limitations of this systematic literature survey

A critical analysis of the records found in the literature revealed that only four survey studies have been published on DNFS. However, our study is the first initiative in this domain to present a systematic literature survey. The motivation behind conducting this systematic literature survey was to investigate and identify detailed statistics and figures by performing an in-depth analysis of the records obtained from the literature, and to cover all papers related to the area.

However, this systematic literature review has a few limitations. For instance, during the screening procedure, only studies published with detailed knowledge and written in English were included in this systematic literature survey. As a result, there may be some short papers or publications that are published in other languages, which may have made a positive contribution in this domain but were not analyzed. Furthermore, to maintain the quality and reliability of this systematic literature survey, we had to exclude research papers that claimed their method was a combination of fuzzy systems and deep learning but had poorly defined methodologies. We are aware that these filters might have affected the final findings of the included studies. Nonetheless, the decision to exclude the aforementioned papers was not taken lightly and was conducted based on the inclusion and exclusion criteria (see Table 4), as well as the eligibility criteria (see Table 5). Thus, the goal of this systematic literature review was to identify and highlight the majority of work published in the DNFS domain.

6 Conclusion

This systematic literature survey aims to capture state-of-the-art research in the novel domain of DNFS by following the guidelines of well-written systematic reviews from the literature. As a result, a revised study mapping process comprising seven phases was introduced in this study. Four research questions were designed to lay the foundation of this study and help extract meaningful information from the database to draw a comprehensive picture of the current state of research related to DNFS. A total of 252 studies were retrieved during the first step of the primary search using the selected keywords and search strings. It became obvious that DNFS-based systems are relatively new, with only a few relevant papers found in the literature during the in-depth analysis of the identified publications. However, a total of 105 studies were found during the quality assessment process, which provide answers to the research questions of this systematic literature survey. Moreover, the well-defined answers to the research questions helped to identify the research gaps, issues, and challenges of this particular domain.

In addition, this study addressed possible future directions, including potential structural designs (e.g., parallel and cooperative architectures) to further strengthen the outcomes for solving classification and prediction-related problems. This study also suggested the implementation of modern optimization methods such as metaheuristic techniques to optimize DNFS in the future. It further suggested reviewing the performance of the model by improving the training mechanism, hyperparameter tuning (e.g., learning rate and number of hidden layers), and fuzzy knowledge base. Along with that, this study identified and recommended potential application areas where the DNFS has not yet been deployed, such as virtual and augmented reality, business, education, robotics, manufacturing, renewable energy, and engineering.

Recommendations were made to address the limitations found in the literature to help both researchers and practitioners interested in this particular domain. Therefore, this comprehensive systematic literature survey aims not only to provide researchers with the maximum information about DNFS in a single paper but also to offer a platform for researchers who wish to commence their research and explore the potential of DNFS for future work. Finally, this study discussed the limitations of the systematic literature survey that affected the final results of the included studies, namely that a few studies were excluded because they had poorly defined methodologies, were short papers, or were published in languages other than English.