1 Introduction

With the volume of data produced daily increasing, powerful tools are needed to discover valuable information in large data collections. Data mining, an important research area, was developed to extract helpful information and useful knowledge from data, including text, and it serves as a basis for artificial intelligence (AI) [5]. AI systems learn through machine learning and deep learning, and this learning and training process supports the classification and analysis of results. Machine learning dates back to the 1950s and the work of Alan Turing and Arthur Samuel [62]. The use of AI techniques has since increased in various domains [47]. This paper discusses and analyzes deep learning because it is an important foundation of data science. It is very useful to data scientists who collect, analyze, and interpret large quantities of data; deep learning makes this process quick and simple [51]. In medical science, for example, AI has been used to classify malignant tumours with a non-sequential recurrent ensemble of deep neural network models, whose training and validation accuracy and ROC-AUC scores were satisfactory compared with existing models [38, 43].

The adjective deep in deep learning refers to the use of multiple layers in the network: the algorithms are built as multi-layered models based on various inputs. These deep learning models are often based on Artificial Neural Networks (ANNs), which is why they are called Deep Learning Neural Networks (DLNNs) [39]. Many examples have demonstrated that DLNNs achieve better results than classical models. The ANN is one of the most important data mining tools and has been used as a powerful learning model in various fields of text mining and data mining since the late 1990s [19].

This paper considers the application of neural networks to text processing problems in Natural Language Processing (NLP), which are reviewed because of the importance of categorization and prediction. More attention has been paid to text classification because of the significance of automatically assigning predefined labels to many documents [70]. NLP also applies linguistic analysis techniques to process text; one example, sentiment analysis (SA), also known as opinion mining, has been a core research topic in AI [78]. A systematic review guides researchers who are interested in conducting new research. Following a systematic literature review (SLR) method, all relevant studies from the accessible electronic databases are combined and presented to answer the research questions. An SLR study creates a new viewpoint and helps new researchers learn about advanced technology and a scientific discipline [28]. An SLR also verifies research efforts related to a specific topic [47]. We expect the systematic study to be fully explained at all stages and transparent to other researchers.

This Systematic Literature Review (SLR) follows the guidelines of Kitchenham and Charters [28]. It identifies research papers published from 1997 to 2021 on deep neural networks in applications related to content analysis algorithms. A total of 130 papers were reviewed; after the inclusion and exclusion criteria were applied, only 43 articles remained in the study. The research questions were answered by extracting the appropriate information from these 43 articles and then forming a statistical representation using tables and figures. The results illustrate the trend of research conducted in this area over the past years and suggest new research topics. The present study systematically describes text processing, including NLP, information detection, text mining, visualization of results, and the integration of neural networks, in order to analyze the collected studies.

The paper is organized as follows: Section 2 contains the literature review, Section 3 describes the materials and methods, Section 4 presents the data, Section 5 covers quality control, Section 6 contains the analysis of results, Section 7 discusses challenges and recommendations, Section 8 presents the comparison results and discussion, and Section 9 concludes the paper and outlines future work.

2 Review of literature

Extensive research has been conducted on data mining to extract relevant and significant information from raw and unstructured text. Much work has been done on extracting useful features using various deep learning models for better results in text classification problems. Some of the related work in text categorization is discussed in this section [44].

In this section, we describe some basic concepts that are used later. AI is a general concept that includes ML and deep learning methods. Over the past decades, data mining has been applied to different types of data, including text, numbers, images, audio, and video. This paper focuses on textual data, which many articles have examined. One of the essential methods in text mining is the ANN, which has demonstrated exemplary performance in text prediction and in various classification problems [72].

The Artificial Neural Network is intended to learn which input corresponds to which output. In sentiment analysis, this mapping relates a text to the desired sentiment label, and deep learning models have developed methods for learning vector representations to classify sentiment [64]. Bengio and Mikolov proposed learning techniques for the semantic representation of words in text. The authors generated word embedding vectors that capture semantics and used them to transform words into vectors [58]. A systematic study of neural network classification methods for Arabic text shows that text classification is a way to search for textual information and explore the data [65]. Deep learning techniques have recently attracted the attention of the academic community [9]. According to one survey, deep learning models have been extensively applied in NLP and show extraordinary potential. Deep learning has emerged as a powerful machine learning technique that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results. That survey briefly describes the main deep learning architectures and related techniques that have been applied to NLP tasks and gives a comprehensive study of their current applications in sentiment analysis [75]. Deep learning methods are useful for high-dimensional data and are becoming widely used in many areas of software engineering [36]. The main classification criterion of this paper is to examine models using the confusion matrix, considering only recurrent neural network models related to text processing. A deep learning and text mining framework for accident prevention is one example of an application in society [79].

Deep learning models designed for text classification include fastText and Text-CNN, as well as pre-trained language models such as BERT (Bidirectional Encoder Representations from Transformers) [70]. With the massive growth of the internet, text data has become one of the main formats of big tourism data. Text is an effective and widespread form in which users express judgments and evaluations. Over time, a variety of text mining techniques have been proposed and applied to tourism analysis to improve tourism value analysis models, build tourism recommendation systems, create tourist profiles, and make policies for supervising tourism markets. Big data text mining methods for tourism have made it possible to analyze the behaviour of tourists and to monitor the market in real time. As the key text analysis technique, NLP is experiencing a period of strong progress. Both machine learning and, more recently, deep learning have been widely and successfully applied in NLP, and the successes of these techniques have been further boosted by progress in NLP, machine learning, and deep learning [34].

In addition, text mining, a branch of artificial intelligence, is gaining ground in business and analysis applications. Most of the recent research reviewed in this paper focuses principally on advanced or hybrid deep neural networks to obtain efficient and better results. For this purpose, RNN- and LSTM-based models have been used, and the accuracy of the proposed models has been improved by varying the hyperparameters and using GloVe word embeddings [44].

Deep neural networks (DNNs) have revolutionized the field of NLP. Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), the two main types of DNN architectures, are widely explored to handle various NLP tasks, and studies have shown that hidden size and batch size can make DNN performance vary dramatically. This suggests that optimizing these two parameters is crucial to the good performance of both CNNs and RNNs [71].

Also, an improved method for link prediction in attributed social networks has been presented. Among the newest link prediction methods are embedding methods, which generate a feature vector for each node of the graph and use it to find unknown connections. To justify the proposal, the authors conducted many experiments on six real-world attributed networks for comparison with state-of-the-art network embedding methods [8].

Unlike other articles, this article examines text processing for sentiment analysis across all deep learning methods and compares several methods on the topic of sentiment analysis.

Many studies have addressed text processing and deep neural networks. In this article, all deep learning methods and some machine learning methods are studied, and text mining is not restricted to a single topic. This breadth distinguishes this article from other articles. The following definitions describe classification techniques:

2.1 Concepts and definitions

Text mining

Text mining is an important way to extract meaningful and valuable information from unstructured textual data and helps researchers achieve their goals. By identifying subjects, patterns, and keywords, text mining methods automatically examine a considerable amount of information and extract helpful knowledge from it; they include various methods, such as information retrieval [31, 63].

NLP

Natural language processing (NLP) is the branch of computer science, and more particularly of artificial intelligence (AI), that is concerned with giving computers the ability to understand text and spoken words in much the same way human beings can. NLP is one of the most important technologies of the information age, and its history dates back to the 1950s [52, 58]. NLP is used in classification, clustering, sentiment analysis, text summarization, etc.

Deep learning

Deep learning is a subset of ML in which deep neural networks are employed for feature extraction and analysis on large datasets [59] and in which the computer learns without hand-crafted features or prior human knowledge. Most deep learning models are based on ANNs. The artificial neural network was first published in the 1960s and has strongly re-emerged since 2006. The network is organized into layers that work in coordination: an input layer, hidden (middle) layers, and an output layer, which together perform the calculations. During the training process, the weights of the neurons change to achieve an optimal network. The models applied in NLP include Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN). Deep learning lies at the junction of neural networks, graphical modelling, optimization, AI, pattern recognition, and signal processing. Deep learning architecture has emerged as the algorithm of the century because of its ability to generalize, learn, and meet practical expectations [58]. Figure 1 summarizes and classifies deep learning, machine learning, and related tasks along with their essential characteristics.

Fig. 1 Classification of deep learning and machine learning [39]

Supervised Learning

This type of machine learning is based on using labelled data to train the learning algorithm. The data is labelled in the sense that it consists of pairs: an input, which can be represented as a vector, and its corresponding desired output, which can be described as a supervisory signal [39].

Unsupervised Learning

Unlike supervised learning, this method uses an input data set without any labelled output to train the learning algorithm. No right or wrong result exists for any input object, and there is no human intervention to correct or adjust the model, as there is in supervised learning [39].

RNN

This type of network, the recurrent neural network (RNN), was first created by David Rumelhart in 1986 and developed later. It is a kind of neural network used in speech recognition, NLP, and sequential data processing. These networks do not transmit information in just one direction (from the input layer to the output layer). In an RNN, each node acts as a memory cell and continues operating and calculating. Unlike feed-forward neural networks, RNNs can contain edges that form cycles. Because of this memory, these networks can remember their previous inputs and use them to process the rest of the sequence. LSTM is one of the most popular types of these networks. RNNs are used for time series, text, and audio data. A weakness of the plain RNN is that it cannot retain a sentence over the long term, a consequence of the so-called vanishing gradient problem [61]. The network can be combined with convolutional networks, but it is difficult to train.
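The vanishing gradient problem can be illustrated numerically. The following is a minimal sketch, assuming a scalar RNN of the form h_t = tanh(w * h_(t-1) + x_t); the function name, the weight value, and the input sequence are hypothetical and chosen only for illustration.

```python
import math

def gradient_through_time(w, inputs, h0=0.0):
    """Return d h_t / d h_0 for t = 1..T in the scalar RNN h_t = tanh(w * h_{t-1} + x_t)."""
    h = h0
    grad = 1.0
    history = []
    for x in inputs:
        h = math.tanh(w * h + x)
        # Chain rule: d h_t / d h_{t-1} = w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
        history.append(grad)
    return history

# Hypothetical recurrent weight and a constant input sequence of 20 steps.
history = gradient_through_time(w=0.5, inputs=[0.1] * 20)
print(history[0], history[-1])
```

Because every factor in the product has magnitude below 0.5 in this setting, the influence of the first step on the last one shrinks geometrically, which is why early words are hard to learn from in long sequences.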

An RNN works on the principle that the previous words must be known to predict the next one. As its name suggests, an RNN repeats the same operation on every element of a sequence (or series) of inputs. At each time step t, it takes the input Xt and produces the output ht, which depends on the previous calculations. This output indicates how compatible the output value is with the input variable. Therefore, information is passed from one step to the next, enabling us to model a sequence of vectors. At each time step, the network approximates a probability distribution over all possible next words in a dictionary, given the previous words. The output layer of an RNN is a softmax layer that produces a vector representing this distribution [59].
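The recurrent step described above can be sketched in a few lines of plain Python. This is an illustrative sketch only: the weight matrices W_xh, W_hh, and W_hy, the tanh activation, and the toy sizes are assumptions, not values taken from the reviewed papers.

```python
import math

def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy):
    """One recurrent step: new hidden state h_t and next-word probabilities y_t."""
    pre = [a + b for a, b in zip(matvec(W_xh, x_t), matvec(W_hh, h_prev))]
    h_t = [math.tanh(p) for p in pre]
    y_t = softmax(matvec(W_hy, h_t))
    return h_t, y_t

# Hypothetical toy sizes: vocabulary of 3 words, hidden state of size 2.
W_xh = [[0.5, -0.3, 0.8], [0.1, 0.9, -0.4]]     # hidden x vocabulary
W_hh = [[0.2, 0.0], [-0.1, 0.3]]                # hidden x hidden
W_hy = [[1.0, -1.0], [0.5, 0.5], [-0.7, 0.2]]   # vocabulary x hidden

h = [0.0, 0.0]
for x_t in ([1, 0, 0], [0, 1, 0], [0, 0, 1]):   # one-hot encoded words
    h, y = rnn_step(x_t, h, W_xh, W_hh, W_hy)
print(y)
```

The same three weight matrices are reused at every step; only the hidden state h changes, which is how information from earlier words reaches later predictions.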

LSTM

After RNNs ran into the vanishing gradient problem, LSTM networks were introduced by Hochreiter and Schmidhuber in 1997 as a development of the RNN, with the aim of solving the problem of vanishing or exploding gradients; these networks are designed for sequential data. In a recurrent network, the content is rewritten at each step. LSTM networks also outperform plain recurrent networks in accuracy and speed and can process more data [21], and they are reported to work better than RNNs [26]. First, the list of input sequences must be converted into the vectors expected by an LSTM network, which requires a linguistic representation. Word-embedding methods are used for this; the texts converted into vectors are then fed into the network one by one, and the memory is updated after each word. Finally, a prediction or an appropriate label is produced.
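The memory update performed after each word can be sketched with a scalar LSTM cell. The sketch below uses a commonly taught formulation with forget, input, and output gates; the parameter layout, names, and values are hypothetical and are not taken from the reviewed papers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One step of a scalar LSTM cell; p maps gate names to (w_x, w_h, bias)."""
    def gate(name, activation):
        w_x, w_h, b = p[name]
        return activation(w_x * x_t + w_h * h_prev + b)

    f_t = gate("forget", sigmoid)          # how much of the old memory to keep
    i_t = gate("input", sigmoid)           # how much new information to write
    o_t = gate("output", sigmoid)          # how much of the memory to expose
    c_hat = gate("candidate", math.tanh)   # proposed new memory content

    c_t = f_t * c_prev + i_t * c_hat       # memory update
    h_t = o_t * math.tanh(c_t)             # hidden state passed to the next step
    return h_t, c_t

# Hypothetical parameters and a toy sequence standing in for embedded words.
params = {
    "forget": (0.5, 0.1, 1.0),
    "input": (0.6, -0.2, 0.0),
    "output": (0.4, 0.3, 0.0),
    "candidate": (0.9, 0.5, 0.0),
}
h, c = 0.0, 0.0
for x in [0.2, -0.1, 0.7]:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

The additive form of the memory update (old memory scaled by the forget gate plus gated new content) is what lets the cell carry information across many steps instead of overwriting it at every step.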

Convolutional Neural Network (CNN)

A CNN is a kind of neural network used in AI for NLP, speech, and image tasks. The idea behind this network goes back to the work of Hubel and Wiesel [23]. These networks are deep neural networks consisting of input, hidden, and output layers; at the core of a convolutional neural network is a convolution layer, which gives this type of network its name. The input layer is an array of numbers. This architecture was developed on ImageNet data using the ReLU activation function. Thanks to their higher speed and reduced training time, these networks achieve high accuracy in image recognition [56] and perform feature recognition without human intervention. CNNs do not capture long-term dependencies, so LSTM is preferred for tasks such as language modelling in which such dependencies are essential. The large amount of input data required can be a disadvantage of this network.
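For text, the convolution layer slides a small kernel over the sequence of token vectors. The following is a minimal one-dimensional sketch with ReLU and global max pooling; the function names, kernel, and input values are assumptions made for illustration.

```python
def conv1d_relu(sequence, kernel, bias=0.0):
    """Slide a kernel over a 1-D sequence (no padding) and apply ReLU."""
    k = len(kernel)
    outputs = []
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        value = sum(w * x for w, x in zip(kernel, window)) + bias
        outputs.append(max(0.0, value))  # ReLU keeps only positive responses
    return outputs

def global_max_pool(feature_map):
    """Keep the strongest response of the kernel anywhere in the sequence."""
    return max(feature_map)

# Hypothetical per-token values standing in for one embedding dimension.
tokens = [0.1, 0.9, -0.4, 0.3, 0.8]
feature_map = conv1d_relu(tokens, kernel=[1.0, -1.0, 0.5])
print(feature_map, global_max_pool(feature_map))
```

Each output depends only on a window of three neighbouring tokens, which illustrates why a single convolution layer sees local patterns rather than long-range dependencies.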

Neural network

The Artificial Neural Network (ANN) is composed of a group of perceptrons (neurons) in each layer. An ANN is called feed-forward when the inputs are processed only in the forward direction. An ANN consists of input, hidden, and output layers, and each layer learns its own weights. These networks can be deep or shallow: shallow networks have a single hidden layer (i.e., a layer between input and output), whereas networks with more hidden layers are called deep neural networks. These networks can be used to solve problems with tabular data, image data, and text data. The neural network is a part of AI based on the biological model of humans and animals and is a suitable method for detecting unknown patterns in data across different applications and data types [72].
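A forward pass through such a layered network can be sketched as follows. This is a toy example with one hidden layer; the weights, biases, and activation choices are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    return max(0.0, z)

def dense(inputs, weights, biases, activation):
    """Fully connected layer: activation(W x + b), one row of W per neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(x, layers):
    """Pass x through the layers in order; information only moves forward."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x

# Hypothetical weights: 3 inputs -> 2 hidden units -> 1 output.
network = [
    ([[0.2, -0.5, 0.1], [0.7, 0.3, -0.2]], [0.0, 0.1], relu),
    ([[1.0, -1.5]], [0.05], sigmoid),
]
output = forward([1.0, 0.5, -1.0], network)
print(output)
```

Adding more tuples to the `network` list would turn this shallow network into a deeper one without changing the `forward` function.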

Text classification

Text classification is the process of automatically assigning a collection of documents to specific groups based on the content of the text itself, through the application of particular technologies and algorithms [65].

Word embedding

Since NLP relies on algorithms, words must be represented numerically as input vectors. In traditional NLP, words are represented as one-hot encoded one-dimensional vectors. For example, the sentence “artificial neural network is the best” comprises six words. Each word is assigned a vector in which its own position is set to one and all other positions are zero, so the vectors have length six (100000, 010000, 001000, 000100, 000010, and 000001). Word embedding is a method that helps to analyze the meanings of words. The embedding representation is learned using shallow neural networks [56]. A word embedding is a real-valued vector representing a single word based on the context in which it appears. Together, the embeddings represent a dictionary and have a wide range of applications in NLP. There are several ways to learn word embeddings [14, 18, 29].
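The one-hot encoding of the example sentence can be reproduced with a short sketch. The helper names below are hypothetical and introduced only for illustration.

```python
def one_hot_vectors(sentence):
    """Map each distinct word to a vector with a single 1 at its index."""
    vocabulary = []
    for word in sentence.lower().split():
        if word not in vocabulary:
            vocabulary.append(word)
    size = len(vocabulary)
    return {word: [1 if i == index else 0 for i in range(size)]
            for index, word in enumerate(vocabulary)}

def dot(u, v):
    """Dot product, used here as a simple similarity measure."""
    return sum(a * b for a, b in zip(u, v))

vectors = one_hot_vectors("artificial neural network is the best")
print(vectors["artificial"])                        # [1, 0, 0, 0, 0, 0]
print(dot(vectors["neural"], vectors["network"]))   # 0
```

The dot product of any two different one-hot vectors is zero, so this representation carries no information about how similar two words are. Dense word embeddings address this limitation by placing related words close together in the vector space.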

Sentiment analysis

Sentiment analysis plays a fundamental role in NLP. It includes subjectivity classification, which labels a given text as subjective or objective, and sentiment classification, which classifies a subjective text as positive, negative, or neutral [45].

3 Materials and methods

A literature review can only be considered a systematic review if it is based on research questions [74]. Fig. 2 shows the main stages of the research. In general, the research questions were identified.

Fig. 2 Research steps flow chart

This study follows the guidelines for systematic reviews [28] in reviewing deep neural networks in text processing problems. First, the keyword search strategy was defined, and then the data sources were identified. Next, the inclusion and exclusion criteria for papers were determined. Finally, quality and data analysis were evaluated. These steps are discussed in detail in the following sections.

3.1 Research strategy

The search was performed by narrowing it to the fundamental concepts related to the subject under study. Numerous studies have been conducted on deep learning, machine learning, different neural networks, RNN, and long short-term memory (LSTM), and many published studies fall outside the scope of the present paper. Following the search strategy, the keywords used to collect the studies were NLP, RNN, LSTM, deep learning, machine learning, and optimization; the keyword combinations are listed below.

  • LSTM

  • RNN,LSTM

  • TEXT MINING,LSTM

  • LSTM,BERT,RNN

  • LSTM,DL,ML

  • LSTM,NLP

  • LSTM,WORD EMBEDDING

  • LSTM,BERT

  • LSTM,OPTIMIZATION

  • BERT,TEXT MINING

In addition to abbreviations, the full phrases were also searched. Searching the databases for all the abbreviations shows that more than a thousand related studies have been conducted and made publicly available.

3.2 Inclusion and exclusion criteria

+ Inclusion criteria

The inclusion criteria were chosen as follows and are indicated with (+):

  • + The study should be of the text classification type.

  • + Input data must be textual.

  • + There must be a method of embedding words.

  • + The paper should come from a reputable journal, unless it is part of a text classification study.

  • + The presence of the terms NLP, RNN, long short-term memory (LSTM), text mining, or sentiment analysis in the title or keywords is preferred for inclusion in the study list.

− Exclusion criteria

Exclusion criteria describe papers that add no value to the study and are removed from the study list. Exclusion criteria are indicated with (−):

  • − Papers that do not provide clear information.

  • − Papers whose input data is numerical.

  • − Papers whose subject is a forecasting process.

  • − Papers in which the best-performing method is a machine learning method.

  • − Papers that only include definitions and reviews of the method.

  • − Short papers without a model description and review.

Figure 3 shows how the number of papers changed during selection. Out of 130 papers evaluated in the first stage, 63 articles were removed; consequently, 67 articles remained in the review. Of those, the 43 papers that deal directly with text classification and sentiment analysis are shown in Table 7.

Fig. 3 The process of selecting articles according to inclusion and exclusion criteria

3.3 Research questions

In line with this study, the research questions focus on specific and general aspects of the paper. More specifically, this SLR poses the following four research questions:

  1. What methods of Machine Learning (ML) and deep learning are used in classification and text processing?

  2. Which methods are more accurate in classifying and processing text?

  3. What are the active databases in studies of optimization algorithms, data mining in neural networks, text processing, and Long Short-Term Memory (LSTM)?

  4. What methods are used for word embedding in studies and text processing problems?

3.4 Database resources

Today, digital libraries are suitable for searching for books, magazines, and articles. The present study started in November 2020. Most of the reviewed research is from 2016 to 2020; some is from earlier years. Table 1 shows the journals that provided papers and how frequently they were used, and Fig. 4 shows the frequency of articles from 1997 to 2021. Papers were searched for and retrieved from the Google Scholar, ScienceDirect, and Taylor & Francis digital libraries. Then, the systematic rules necessary to meet the study’s aims were applied. Keyword selection is a key step in any systematic review, because it determines which articles are found.

Table 1 Databases of reviewed papers
Fig. 4 Frequency of articles from 1997 to 2021

4 Data

In this section, we introduce standard text mining datasets used to evaluate proposed models. A variety of datasets have been used for this purpose; they have various features, which we discuss here along with the most widely used ones.

Book Corpus

The dataset consists of 11,038 unpublished books from 16 different genres. The BERT pre-training corpus consists of Book Corpus (800 million words) and English Wikipedia (2500 million words).

Internet Movie Database (IMDB)

IMDB contains information on actors, movies, and TV series. It has more than 100,000 movies and various series. As of October 2018, it had 83 million registered users, whose opinions and sentiments on many movies are recorded [59].

Pima-Indians-diabetes-database (PIDD)

This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases.

Stanford Sentiment Treebank (SST)

This dataset includes 11,855 movie reviews, split into training (8544), development (1101), and test (2210) sets [67].

SST1

It is an extension of MR (Movie Review) but provides five kinds of labels: very negative, negative, neutral, positive, and very positive [66].

SST2

The set consists of two classes, negative and positive. The training, development, and test sets contain 6920, 872, and 1821 samples, respectively [15].

Subjectivity (SUBJ)

The subjectivity dataset contains sentences labelled according to their subjectivity status (subjective or objective) [67].

Wikipedia

It is a free online encyclopedia written by contributors worldwide; it is available in 317 languages and has over 55 million articles.

Twitter

Twitter is an American social network where users post messages known as tweets. Datasets of tweets can be labelled with three sentiment polarities: positive, neutral, and negative [53].

PUBMED

It is a free search engine that primarily accesses the MEDLINE database of references and abstracts on bioscience and biomedical topics.

TREC

TREC is a dataset for classifying fact-based questions. It divides all questions into six categories: location, human, entity, abbreviation, description, and numeric. The training dataset contains 5452 labelled questions, while the testing dataset contains 500 questions [80].

Movie Review

A set of movie reviews was collected from imdb.com in the early 2000s by Bo Pang and Lillian Lee. These reviews were collected and made available as part of their research on NLP [66].

Amazon

This dataset covers Amazon products and services and contains reviews (ratings, text, and helpful votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs).

5 Quality control

Another challenging problem is the evaluation of results. Proposed methods should be evaluated and compared with other methods. Machine learning and data mining methods can be used to identify hidden patterns, and evaluation criteria are available to assess the performance of machine learning algorithms for classification and regression. These criteria must be selected carefully. The types of evaluation criteria are listed in Table 2 [18].

Table 2 Evaluation criteria used in the papers on deep learning and neural network

The pie chart in Fig. 5 shows the evaluation criteria used in the deep learning papers, Table 3 shows the frequency of the evaluation criteria, and the evaluation criteria used in each article are listed in Table 7. The most widely used evaluation criterion for the models is accuracy.

Fig. 5 Pie chart of the evaluation criteria used in the deep learning papers

Table 3 Frequency of evaluation criteria used in articles

6 Analysis of results

The papers that passed the inclusion and exclusion criteria were categorized. Table 7 shows the deep learning methods reviewed in the papers on sentiment analysis, text classification, and text mining. In this research, various models, such as RNN, LSTM, CNN, BERT, and neural networks, were investigated, including hybrid approaches, to show that they have achieved high accuracy in the field. Accuracy is an especially important evaluation factor because it shows the proportion of correctly classified instances to the total number of classified instances. A total of 43 articles that directly discuss sentiment analysis and text classification were classified, along with the methods considered in each paper, the best-performing method among them, the evaluation criteria, and the type of word embedding. Other papers, covering systematic methods, text mining, and explanations of methods and algorithms, were included in the selected articles but not in the analysis category. The analysis of the results showed that deep learning algorithms performed better than machine learning algorithms in the studies on text classification, sentiment analysis, and text mining.
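Accuracy and the related criteria can all be derived from the confusion matrix. The sketch below shows the standard definitions for a binary classifier; the counts are hypothetical and are not results from the reviewed papers.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard metrics derived from a binary confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for a positive/negative sentiment classifier.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
print(metrics)
```

Accuracy alone can be misleading on imbalanced datasets, which is one reason precision, recall, and the F1 score are often reported alongside it.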

The results showed that LSTM alone and the combination of this network with convolutional networks, BERT, attention-based mechanisms, and Bi-LSTM have been used most often and have shown higher accuracy. The superior hybrid methods are categorized in Table 4.

Table 4 Hybrid methods with higher accuracy and performance in text classification

Although the use of machine learning methods is one of the exclusion criteria in this study, the algorithms listed in Table 5 are included because of their application in text processing problems. Their definition and use are discussed in [12, 77].

Table 5 Effective Machine-learning algorithms in classification and text processing

Among the listed algorithms, SVM is the most powerful and has shown the highest accuracy in classification problems. In addition, the word embedding procedures used in the studies and text processing problems are shown in the last column of Table 7.

The methods of word embedding in text processing problems are fully described in the systematic literature review [29] and are shown in Table 6. According to the collected studies, the text processing workflows of deep learning and machine learning are closely related to each other.

Table 6 Word embedding methods for converting text to vectors

Both deep learning and machine learning involve data preparation, feature selection and extraction, selection of the algorithm and learning method, model training on the training data, model testing on the test data, evaluation of model accuracy, model application and final evaluation, and knowledge acquisition from data analysis. From a text processing point of view, the two approaches follow the same procedure; the major difference is that in deep learning more details are considered, the learning process is repeated, and the number of steps is large. Deep learning also provides considerably higher accuracy than machine learning methods but requires more input data. Through their hidden layers, deep learning networks learn from the text, which enables next-word prediction, label recognition, and question answering (Table 7).
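The shared workflow (preparation, training, testing, and evaluation) can be sketched end to end. The example below deliberately uses a very simple word-count classifier as a stand-in for any machine learning or deep learning model; the function names and the toy data are hypothetical.

```python
from collections import Counter

def tokenize(text):
    """Data preparation: lowercase and split on whitespace."""
    return text.lower().split()

def train(examples):
    """Model training: count how often each word occurs under each label."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(tokenize(text))
    return counts

def predict(model, text):
    """Model application: pick the label whose word counts best match the text."""
    scores = {label: sum(words[token] for token in tokenize(text))
              for label, words in model.items()}
    return max(scores, key=scores.get)

def accuracy(model, examples):
    """Evaluation: share of examples whose predicted label is correct."""
    hits = sum(predict(model, text) == label for text, label in examples)
    return hits / len(examples)

# Hypothetical toy data; a real study would use a corpus such as IMDB or SST.
train_data = [
    ("a great and moving film", "positive"),
    ("great acting and a good story", "positive"),
    ("a boring and bad film", "negative"),
    ("bad acting and a dull story", "negative"),
]
test_data = [
    ("a good and great story", "positive"),
    ("dull and boring", "negative"),
]

model = train(train_data)
print(accuracy(model, test_data))
```

Replacing `train` and `predict` with a neural network changes the model but not the surrounding steps, which reflects the point that both families of methods follow the same procedure.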

Table 7 Text classification and sentiment analysis papers

7 Challenges and recommendations

This research focused on AI methods based on text mining and language processing techniques that use deep neural networks. The problem with these networks is that the appropriate number of layers for labelling concepts is found through repeated training and testing; the recurrence of the network helps reach this number, but it takes more time. For future studies, we recommend mathematical models for optimizing the neural network that examine the number of layers and the targeted layer. The speed, computational power, and flexibility of neural networks can also be compared, and hybrid models of machine learning and deep learning could be examined in terms of accuracy and quality.

8 Comparison results and discussion

In this section, 4 types of neural networks are compared. Comparisons of these networks are shown in Table 8.

Table 8 Comparison of relationships between CNN, RNN, and ANN networks

The word yes indicates that a network has the named feature. The word data describes what kind of information can be used in each network, that is, which type of network handles digital data, text, audio, and video. The table also covers the characteristics of reciprocal relations, feedback relationships, parameter sharing, spatial relationships, vanishing and exploding gradients, and Deep Learning Neural Networks for these networks. Moreover, it shows that the evaluation criteria from the confusion matrix are always used to examine the statistical test.

9 Conclusion and future work

We live in the “Information Age,” exposed to considerable information and data. Because an enormous amount of information is produced daily in various processes and fields, and because categorizing concepts manually is problematic, we have used text mining, a concept related to NLP, which can convert a large amount of information into practical knowledge. Text mining can analyze and process text using machine learning and deep learning algorithms, which are sub-branches of AI. The review of articles identified a set of methods used to analyze, predict, label results, and answer questions in text mining. Deep learning, including neural networks, processes texts through many layers, and this survey covers long text sequences. The most widely used evaluation criterion for the models is accuracy, and the SST1, SST2, and IMDB datasets, in that order, have been used most often in the papers.

The neural networks that achieved the highest accuracy include LSTM and RNN networks and the hybrids of LSTM with other models, namely Bi-LSTM, CNN-LSTM, CONV-LSTM, and BERT-LSTM. In addition, after deep learning methods, the support vector machine (SVM) achieved the highest accuracy among machine learning methods.

Several activities remain for future work. Sentiment analysis can be performed with multivariate decision-making methods. MODM and MCDM methods, such as TOPSIS, can be compared to determine the accuracy of the network against confusion matrix methods. The GRU network can also be examined and its accuracy compared with that of the LSTM network.