1 Introduction

At present, the internet is widely used by people around the world, and social media play a vital role in sharing content through it. People express their feelings (opinions) about particular topics on social media, so users can easily learn what others think about topics of interest, e.g., goods, products, or services. Opinions (Wang and Meng 2022) can be analysed and used to assess the quality of a product or service (Pavitha et al. 2022), to identify problems associated with a product, or to improve its quality. However, manually analysing the opinions in thousands of comments is difficult and time-consuming. Researchers have therefore introduced data mining approaches to analyse public opinion. Sentiment analysis (Liu et al. 2022; Haselmayer and Jenny 2017; Liu et al. 2022; Wang and Meng 2022) is a branch of data mining that uses natural language processing and computational linguistics to extract and analyse the subjective information in public opinions from social media and other sources. Deep learning, a family of machine-learning techniques, has outperformed other techniques in terms of precision and performance in image (Khandokar et al. 2021; Islam et al. 2022), video (Islam et al. 2020) and speech (Islam et al. 2021; Hasan et al. 2021; Khanday et al. 2020; Shofiqul et al. 2020; Islam et al. 2021, 2022) classification.

In sentiment analysis, the strength of the expressed feelings is determined through in-depth analysis and is known as the sentiment score. Sentiment analysis is conducted mainly at three levels (Wu et al. 2019): document, sentence, and aspect level. Document-level analysis can only discover the general polarity, not the particular emotion towards each entity (Kumar and Sachdeva 2021). Its goal is to classify a complete opinion text as a single type, e.g., a positive or negative view of a product, an item, or a service (Kumar and Sachdeva 2021). When assessing the overall view of a product, it evaluates whether the product is of good or bad quality. Sentence-level sentiment analysis, on the other hand, handles emotion sentence by sentence with superior subjectivity and objectivity, although it is not appropriate for complex sentences (Hoogervorst et al. 2016). Here the text consists of opinions labelled as positive, negative, or neutral expressions, which has been described as the most comprehensive analysis of a document (Kausar et al. 2019). It is used for single-sentence comments or feedback and is especially useful for today's social-media comments and opinions, where machine-learning approaches should handle each sentiment individually. The last type, aspect-level sentiment analysis, handles negation in simple, short sentences but performs weakly on negation in long and complex sentences (Kumar and Sachdeva 2021). It helps an organization understand people's emotions or opinions more precisely by analysing their comments. In aspect-level sentiment analysis, the sentence content is examined to extract each individual feeling in detail and determine its intensity, using the semantic dependency relationships between essential tagged terms. The emotional values are then integrated over the entire sentence so that the polarity of the statement can be measured.
The overall sentiment rating is then calculated from the variation in expression across the whole message.
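To make the three levels concrete, here is a minimal lexicon-based sketch in Python. It is not a deep learning model: the word scores in `LEXICON`, the negation list, the window size, and all helper names are hypothetical and chosen only for illustration. It shows how one sentence can be neutral overall while its aspects carry opposite polarities, and how sentence scores can be aggregated into a document score.

```python
# Hypothetical toy lexicon: word -> polarity strength (illustrative values only).
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0, "slow": -1.0}
NEGATIONS = {"not", "never", "no"}


def word_scores(tokens):
    """Score each token, flipping the sign when the previous token is a negation."""
    scores = []
    for i, tok in enumerate(tokens):
        score = LEXICON.get(tok, 0.0)
        if i > 0 and tokens[i - 1] in NEGATIONS:
            score = -score
        scores.append(score)
    return scores


def sentence_score(sentence):
    """Sentence level: sum of all word scores in the sentence."""
    return sum(word_scores(sentence.lower().split()))


def aspect_scores(sentence, aspects, window=2):
    """Aspect level: sum of word scores within `window` tokens of each aspect term."""
    tokens = sentence.lower().split()
    scores = word_scores(tokens)
    result = {}
    for i, tok in enumerate(tokens):
        if tok in aspects:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            result[tok] = sum(scores[lo:hi])
    return result


def document_score(sentences):
    """Document level: mean of the sentence scores."""
    return sum(sentence_score(s) for s in sentences) / len(sentences)


def polarity(score):
    """Map a numeric score to a polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


review = ["the screen is great but the battery is terrible", "delivery was not slow"]
print(polarity(sentence_score(review[0])))              # neutral
print(aspect_scores(review[0], {"screen", "battery"}))  # {'screen': 2.0, 'battery': -2.0}
print(polarity(document_score(review)))                 # positive
```

The first sentence scores 0.0 because its positive and negative words cancel, while the aspect-level view assigns +2.0 to "screen" and -2.0 to "battery". A deep learning model would learn such scores from data instead of reading them from a fixed lexicon.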

Traditional machine learning algorithms (Gulati et al. 2022) are faster, but their accuracy is often unsatisfactory. Deep learning methods perform better than traditional machine learning methods for textual sentiment classification (Bharti et al. 2022; Mahendhiran and Kannimuthu 2018).

Employing deep learning (Diwan and Tembhurne 2022; Liu et al. 2022) for sentiment analysis is an innovative approach to understanding and evaluating text data, with several benefits. A main advantage is its high precision and performance (Bharti et al. 2022; Mahendhiran and Kannimuthu 2018). Deep learning models, especially neural networks, can capture intricate patterns and correlations in the data, resulting in higher accuracy than conventional machine learning techniques (Mewada and Dewang 2023). Furthermore, deep learning models learn features automatically, eliminating the need for manual feature engineering. They can adapt to different data types and circumstances, which makes them versatile and suitable for a variety of domains and languages (Liu et al. 2022). These models are also good at capturing the context in which sentiments are conveyed, which is crucial for precise sentiment analysis. Additionally, deep learning models handle large-scale data well, making them suitable for sentiment analysis on platforms such as social media and review systems. Using pretrained models and transfer learning further improves their efficiency and reduces training time (Mewada and Dewang 2023). By integrating multimodal data, deep learning models broaden the scope of sentiment analysis to include text alongside other modalities such as images or audio (Mahendhiran and Kannimuthu 2018), which enriches the analysis and gives a more thorough understanding of the sentiments being conveyed. With real-time or near real-time analytical capabilities, deep learning models enable businesses to monitor and respond to customer sentiment quickly and tailor their strategies accordingly.
Furthermore, these models can keep improving their precision and adaptability as they encounter more data and undergo further training (Bharti et al. 2022). In summary, deep learning-based methods are at the forefront of sentiment analysis (Jia and Wang 2022; Zhang et al. 2021), offering high precision, context-aware insights, and adaptability across a wide range of applications and fields.

Popular methods for analysing public sentiment include CNN (Ezaldeen et al. 2022; Diwan and Tembhurne 2022; Dangi et al. 2022), LSTM (Mittal et al. 2021; Han et al. 2021), BiLSTM (Schuster 1997), GRU (Han et al. 2021), BiGRU (Li et al. 2022; Han et al. 2020), Capsule (Jia and Wang 2022; Zhang et al. 2021; Liu et al. 2022), Capsule-based attention BiLSTM (Dong et al. 2020), Attention (Liu et al. 2022, 2022; Xu et al. 2020), Attention BiGRU (Liu et al. 2022), Attention LSTM (Zeng et al. 2019), Attention CNN (Islam et al. 2021), Hybrid (Liu et al. 2022; Liu et al. 2021; Kumar and Sachdeva 2021; Trusca and Spanakis 2020; Dashtipour et al. 2020), Hybrid CNN (Aslan 2023), Conv-BiGRU (Başarslan and Kayaalp 2023), Attention BERT (Mewada and Dewang 2023), Hybrid Capsule BiLSTM (Mewada and Dewang 2023), Multimodal (Bharti et al. 2022; Mahendhiran and Kannimuthu 2018), Neuro-symbolic AI (Roig Vilamala et al. 2022; Cambria et al. 2020, 2022; Shakya et al. 2021; Bosselut et al. 2021; Tiddi et al. 2020) and efficient network (Dangi et al. 2023).

The use of deep learning for sentiment analysis faces a range of obstacles that must be addressed to improve the effectiveness and reliability of the approach (Zeng et al. 2019). One significant challenge is acquiring a substantial amount of high-quality labeled data (Muhammad et al. 2020), a fundamental requirement for training robust models. Overfitting (Gupta and Sharma 2022), a common issue in which a model performs very well on the training dataset but poorly on unseen data, calls for strategies such as regularization and data augmentation (Abonizio et al. 2021). Adapting models to different domains is challenging because of differences in language, expressions, and sentiment cues. Capturing context and linguistic subtleties, especially negations or modifiers, remains difficult (Kumar et al. 2020). Multilingual sentiment analysis is another major hurdle, requiring models to understand diverse languages with their unique linguistic characteristics. Other pressing challenges include uneven data distribution, model interpretability, real-time processing without loss of accuracy, ethical considerations such as mitigating bias, and continuous adaptability (Gandhi et al. 2023). Addressing these issues would significantly enhance the effectiveness and applicability of deep learning-based sentiment analysis across domains and languages. Researchers and practitioners are actively advancing model architectures, refining data collection methods, and devising new training approaches to address these challenges (Liu et al. 2021; Kumar and Sachdeva 2021; Trusca and Spanakis 2020; Dashtipour et al. 2020).

Table 1 lists the abbreviations used in this article to make the different terminology easier to follow. Its two columns give each abbreviation and its meaning.

1.1 Recent trends in deep learning based sentiment analysis

Recent trending application fields of sentiment analysis include affective computing, aspect extraction, text summarization, knowledge extraction, product recommendation, movie review analysis, language understanding, and opinion mining.

Table 2 summarizes recent trends in sentiment analysis to introduce its trending applications. Its two columns give the name of each trend and its references.

Table 1 List of Abbreviations used in this article
Table 2 Recent trends in Sentiment analysis

1.2 Limitations of recent reviews

There are several review and survey papers on deep learning-based sentiment analysis. Some recent ones are listed in Table 3 with their findings and limitations. A 2023 review (Chan et al. 2023) presented several deep learning methods and their applications in sentiment analysis well, but did not cover an analytical analysis or the limitations of existing deep learning-based sentiment analysis methods. A 2022 review covered approaches and applications for sentiment analysis but did not cover deep learning approaches properly (Govindarajan 2022). Another 2022 paper presents several deep learning approaches, their challenges, and future issues but does not analyse any data (Seema et al. 2022). A 2020 review by Dang provides an overview of deep learning-based sentiment analysis, with findings and experimental results for different deep learning models, but does not clearly present research gaps (Ragini et al. 2018). In 2020, Onan published a review of deep-learning-based opinion mining (Onan 2020). This paper analysed text representation and text embedding to compare the performance of different machine and deep-learning approaches, and finally presented the author's model (RNN with an attention mechanism and GloVe), which achieves 98.29% accuracy. It presents a good analysis of performance results but does not give enough detail on findings, research gaps, or other analyses. Kumar and Jaiswal published a 2020 review based on Twitter data with soft computing (Kumar and Jaiswal 2020). This paper presents a good analysis of different tools, domains, and features but gives no clear and complete information on research gaps and merits. In 2019, Yadav and Vishwakarma described in their review the widely known datasets and their key characteristics, the taxonomy of sentiment analysis models and analysis levels, the deep learning models applicable to them, their accuracies, and a comparison of different deep learning models (Yadav and Vishwakarma 2020). This paper presents the merits and demerits of only some papers on different models, with little information about research gaps and findings. In 2019, Zhou et al. published a review of deep learning for aspect-level sentiment classification covering a survey, vision, and challenges (Zhou et al. 2019). This paper gives a comprehensive study of recent deep learning-based approaches with their summaries and evaluation measures, and also provides some information on research challenges and future research directions. It focuses on descriptions of models and data and on result comparisons, but does not present a sufficient analysis of findings, research scope, and limitations. A comprehensive survey of deep learning-based sentiment analysis was conducted by Zhang et al. (2018). It presented a complete overview of different deep learning models with related figures and equations, and described related papers with their features, tools, and other details, but did not sufficiently analyse research gaps, merits, and limitations. Some review articles address challenges well but do not cover methods, applications, or data (Hussein 2018; Mohammad 2017; Saxena et al. 2022).

Table 3 Recent review for sentiment analysis using deep learning

The above description shows that most authors analysed existing research in a conventional way. Existing surveys describe the field in general terms, with simple descriptions of methods and related works, and a conventional sentiment analysis survey typically follows a similar pattern in its comparative or related-work study, describing existing research by journal name, tools used, performance, type of task, findings, etc. Our review, however, offers several types of critical analysis. It systematically describes the taxonomy, latest implementations, design, and datasets of sentiment analysis, and also explains data pre-processing, performance metrics with related recent references, text embedding with associated references, and the architectures and algorithms of deep-learning frameworks. In the critical analysis section, we describe related research with its methods, advantages, disadvantages, analytical results, and research gaps. Finally, we present the drawbacks, challenges, limitations, and future work in this area. Together, these features make this paper a unique survey. Our main contributions can be summarized as follows:

  1. Presenting a novel discussion of deep learning-based sentiment analysis with its taxonomy, recent applications, and dataset information.

  2. Discussing data pre-processing, performance metrics, text embedding, and deep learning models with their implementation architectures, along with overall drawbacks, challenges, limitations, and future work.

  3. Critically analysing all categories of deep learning-based sentiment analysis methods with their tools, advantages, disadvantages, and findings.

  4. Proposing an improved hybrid deep learning model, CRDC, that combines a capsule network with a bidirectional RNN and a deep CNN, and showing that CRDC achieves the highest accuracy on IMDB (88.15%), Toxic (98.28%), and Crowdflower (92.34% and 95.48%).

  5. Recommending the best model based on comparative and analytical analysis.

As illustrated in Fig. 1, this study is organised as follows. The first part (Section 1) gives background information. Section 2 describes the methodology of different types of deep learning models, with demonstrations. Section 3 critically examines recent research, including methodology, methods, data, findings, and research gaps; it also presents our proposed method and its performance. Section 4 gives the pros and cons of different recent deep learning methods. Section 5 presents the overall challenges. Section 6 provides a discussion with the best-model recommendation. Section 7 covers significance, limitations, and future directions. Section 8 concludes the paper.

Fig. 1 Organization of the paper

1.3 Article selection process

Figure 2 shows a flow diagram of the literature screening process, including how papers were included and excluded at each stage. All papers were chosen from a variety of reputable journals; papers that were not considered during the study period were excluded for the reasons listed in the figure, and final selection required passing both the screening and eligibility stages. Our search initially returned 1000 relevant publications. Removing 50 duplicated references left 950 articles, and screening out a further 750 left 200. Next, we eliminated 58 studies based on the exclusion criteria (e.g., systematic assessments and review reports), keeping only full-text articles related to sentiment analysis. Finally, we read the full text of the remaining 142 papers and excluded 42 of them because they served the same objective as another publication. In the end, 100 papers matched the eligibility requirements for our review.
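The screening counts form a simple subtraction chain. The short Python check below, with stage labels paraphrased from the text, recomputes how many papers remain after each stage and confirms that the reported intermediate numbers (950, 200, 142, 100) are consistent.

```python
# Paper counts reported for each stage of the screening process.
identified = 1000
exclusions = [
    ("duplicates removed", 50),
    ("screened out", 750),
    ("excluded by criteria (e.g., reviews)", 58),
    ("excluded after full-text reading", 42),
]


def remaining_after_each_stage(start, removals):
    """Return the number of papers left after each exclusion stage."""
    left = start
    counts = [left]
    for _, removed in removals:
        left -= removed
        counts.append(left)
    return counts


counts = remaining_after_each_stage(identified, exclusions)
print(counts)  # [1000, 950, 200, 142, 100]
```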

Fig. 2 Flow diagram of literature search for including studies in our systematic review

1.4 Classification of sentiment analysis

Sentiment analysis is performed on human opinions or comments about a product, a service, or an event expressed through social media, websites, emails, or other platforms. Because evaluations or comments may have multiple meanings, sentiment analysis can be challenging. It is performed using text mining with computational linguistics and natural language processing tools, and it can help in understanding an individual's psyche by analysing emotions expressed in online text (Kolkur et al. 2015). Opinion mining and emotion mining are the two most common subcategories of sentiment analysis. Opinion mining includes subjectivity detection and polarity classification, while emotion mining requires emotion detection, polarity classification, and emotion categorization. This study concentrates on opinion and emotion mining. The taxonomy of sentiment analysis from text is shown in Fig. 3.

Fig. 3 Taxonomy of sentiment analysis

Fig. 3 shows that sentiment analysis is subdivided into two types: opinion mining (Mittal et al. 2021; Han et al. 2021) and emotion mining (Bharti et al. 2022; Mahendhiran and Kannimuthu 2018). Each type is further subdivided into categories based on its application fields. Opinion detection (Mittal et al. 2021; Han et al. 2021; Schuster 1997), opinion summarization (Abdi et al. 2021) and argument expression detection (Galassi et al. 2020) fall under opinion mining (Jia and Wang 2022; Zhang et al. 2021; Liu et al. 2022). Emotion detection (Liu et al. 2021; Kumar and Sachdeva 2021), emotion polarity classification (Trusca and Spanakis 2020; Dashtipour et al. 2020) and emotion cause detection (Peng et al. 2021) fall under emotion mining. Our review focuses on deep learning-based opinion and emotion mining.

1.5 Sentiment analysis approach

There are many methods or approaches in sentiment analysis, and this section presents their overall taxonomy. Fig. 4 gives a complete overview of the deep learning approaches for sentiment detection, analysis, and classification analysed in this article. The most recent and popular deep learning methods for sentiment analysis are CNN (Diwan and Tembhurne 2022; Dangi et al. 2022), 2DCNN (Kamyab et al. 2022; Silva et al. 2018), 3DCNN (Shen and Guo 2022), LSTM (Mittal et al. 2021; Han et al. 2021), BiLSTM (Schuster 1997), GRU (Han et al. 2021), BiGRU (Li et al. 2022; Han et al. 2020), Capsule (Jia and Wang 2022; Zhang et al. 2021; Liu et al. 2022), Capsule-based attention BiLSTM (Dong et al. 2020), Attention (Liu et al. 2022, 2022; Xu et al. 2020), Attention BiGRU (Liu et al. 2022), Attention LSTM (Zeng et al. 2019), Attention CNN (Islam et al. 2021), Hybrid (Liu et al. 2021; Kumar and Sachdeva 2021; Trusca and Spanakis 2020; Dashtipour et al. 2020), Multimodal (Bharti et al. 2022; Mahendhiran and Kannimuthu 2018) and Neuro-symbolic AI (Roig Vilamala et al. 2022; Cambria et al. 2020, 2022; Shakya et al. 2021; Bosselut et al. 2021; Tiddi et al. 2020).

Another application of sentiment analysis is finding neutrality and handling ambivalence (Mostafa et al. 2023; Alsayat and Ahmadi 2023). Deep learning approaches have been used to filter (Valdivia et al. 2018) and detect (Valdivia et al. 2017) neutrality in text. Fig. 5a shows how often each method analysed in this article is used; basic CNN and RNN classifiers are clearly the most widely used in sentiment analysis.

Fig. 4 Taxonomy of sentiment analysis methods

Fig. 5b shows the number of studies analysed in this article for each sentiment category; opinion mining and multiple emotion recognition are clearly the most common.

Fig. 5 (a) Quantification of the methods used for sentiment analysis in this article. (b) Number of sentiment categories analyzed in this article

1.6 Application of deep learning based sentiment analysis

In human-computer interaction, sentiment analysis from text is critical, and aspect feature extraction methods are critical in content management. In data mining, sentiment analysis is used for document classification. Sentiment analysis has a wide range of applications, some of which are briefly described here. Concept extraction is commonly employed to mine a user's text from social media, academic material, and other sources. In concept analysis, data and aspect retrieval is used to extract relevant features from text and recover information or aspects. Document classification commonly relies on concepts extracted from various sorts of documents, and concept analysis is likewise used in text categorization through the concept characteristics it retrieves from various forms of text. To recognise emotion or opinion in text, for example in social media posts, multilevel sentiment analysis is used. Stock market investing is an important component of a country's economy, and financial analysis and stock market forecasting have numerous sentiment analysis applications. Sentiment analysis is also applied in review analysis, including product, movie, and restaurant reviews, as well as in political analysis and the prediction of live events. Applications of SA using deep learning are summarized in Table 4.

Table 4 Summary of Application of SA using deep learning

1.7 Types of sentiment classification

Sentiment classification is normally performed at three levels, described below. Deep learning methods are widely used to predict sentiment at each level.

  1. Aspect-level sentiment categorization: Aspect-based sentiment analysis (ABSA) (Do et al. 2019) is a text analysis technique that classifies data by aspect and finds the sentiment associated with each one. By linking particular attitudes to different characteristics of a good or service, ABSA can be used to analyse consumer feedback. Popular methods such as LSTM (Ma et al. 2018), CNN (Meng et al. 2019), RNN (Basiri et al. 2021), Capsule, Attention (Basiri et al. 2021), Hybrid (Basiri et al. 2021) and neuro-symbolic methods (He et al. 2022) have recently been applied to aspect-based sentiment analysis.

  2. Sentence-level sentiment categorization: Sentence-level sentiment detection is one of the key directions in sentiment analysis. Past research focused on identifying a sentence's polarity (e.g., positive, neutral, or negative) using semantic information drawn from the sentence text. Recent methods include BiGRU with CNN and CRF for sentence-level sentiment (Chen et al. 2017), a deep method for sentence-level sentiment analysis (Hassan and Mahmood 2017), and Arabic sentiment analysis (Alsayat and Elmitwally 2020; ELMITWALLY and Alsayat 2020).

  3. Document-level sentiment categorization: Document-level sentiment classification determines the polarity of the text in a document, or in a collection of documents. Recent methods include BiLSTM with CNN for document-level sentiment (Rhanoui et al. 2019) and a rule- and deep learning-based method for document-level sentiment analysis (Ray and Chakrabarti 2022).

1.7.1 Other tasks of sentiment analysis

  1. Sarcasm analysis: Sarcasm detection is a relatively specialized area of NLP research whose goal is to determine whether a particular text is satirical. A context-based feature method for sarcasm recognition uses deep learning and the BERT model on benchmark datasets (Eke et al. 2021), and deep learning has also been combined with ensemble learning for sarcasm detection (Goel et al. 2022).

  2. Aspect extraction and categorization: One approach uses Bidirectional Encoder Representations from Transformers (BERT) to extract multi-domain aspects (Dos Santos et al. 2021). Another uses a deep convolutional neural network for aspect extraction in opinion mining (Poria et al. 2016).

  3. Opinion expression extraction: Aspect-specific opinion expressions contain both the aspect and the opinion expression within the original sentence context, whereas general subjective opinion expressions may include both the aspect and the expression. One approach uses long short-term memory deep neural networks to extract aspects for Arabic review sentiment analysis (Al-Smadi et al. 2019); another analyses hostel reviews (Alsayat 2023).

  4. Trending topic detection: Trending topics can be identified and predicted from streaming data using deep learning (Pathak et al. 2021).

  5. Product recommendation: Deep learning has also benefitted recommender systems that recommend products and other services to customers, e.g., a context-aware recommendation system that uses deep learning to take contextual features into account (Jeong and Kim 2022).

  6. Finance prediction: Deep learning for finance is the practice of applying neural network techniques in many areas of the finance industry. For example, in the digital economy, risk in the financial planning of listed companies has been predicted with an improved back-propagation (BP) neural network (Li et al. 2022).

  7. Live game prediction: A sports game prediction model uses an attention-based LSTM network for both exercise and training (Shen 2022). Deep learning has also been used for the systematic prediction of degrons and E3 ubiquitin ligase interactions (Hou et al. 2022).

  8. Stance detection: Classifying user comments by stance significantly aids rumor identification on social media. Bipartite graph neural networks have been used for Twitter user stance detection (Zhang et al. 2023), and stacked models built on an embedded LSTM have been used to extract arguments and identify stances (Rajula et al. 2022).

1.8 Sentiment analysis architecture

This section describes a general structure for deep learning-based sentiment analysis, shown in Fig. 6, which builds on the few basic processes of standard machine learning techniques for sentiment analysis. The first stage is to collect data from the dataset, then use NLP (Natural Language Processing) tools to process its sentences and tokenize them (divide each string into parts), converting each word to an integer. Next, in the word embedding stage, the integers are converted into real-valued vectors. The output of the embedding stage is passed to a deep learning model. Finally, the parameters of the deep learning model are fine-tuned so that it accurately predicts the sentiment class. In Fig. 6, darker blue marks more widely used components and lighter blue marks less commonly used ones.
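The stages of this architecture can be sketched end to end in plain Python. This is a minimal illustration under simplifying assumptions, not the architecture of any surveyed paper: tokenization is whitespace splitting, the embedding table is randomly initialized and kept fixed, the deep learning stage is reduced to mean pooling followed by a single logistic output unit, and the four training sentences are made up. Only the output layer is trained, by full-batch gradient descent; a real system would also fine-tune the embeddings and place a CNN, LSTM, or similar network between the embedding and output stages.

```python
import math
import random

PAD = 0  # integer id reserved for padding


def build_vocab(sentences):
    """Tokenize and give each distinct token an integer id, starting at 1."""
    vocab = {}
    for sent in sentences:
        for tok in sent.lower().split():
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab


def encode(sentence, vocab, max_len):
    """Map tokens to ids (unknown tokens are dropped), then truncate or pad to max_len."""
    ids = [vocab[t] for t in sentence.lower().split() if t in vocab][:max_len]
    return ids + [PAD] * (max_len - len(ids))


def embed_and_pool(ids, table):
    """Embedding stage: look up a real-valued vector per id, average over non-pad ids."""
    vecs = [table[i] for i in ids if i != PAD]
    dim = len(table[0])
    if not vecs:
        return [0.0] * dim
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]


def predict(x, w, b):
    """Logistic output unit: probability that the text is positive."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def mean_loss(X, y, w, b):
    """Average binary cross-entropy over the examples."""
    total = 0.0
    for x, t in zip(X, y):
        p = predict(x, w, b)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(X)


def train(X, y, lr=0.5, epochs=200):
    """Full-batch gradient descent on the output layer's weights and bias."""
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, t in zip(X, y):
            err = predict(x, w, b) - t  # gradient of cross-entropy w.r.t. the logit
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b


# Made-up toy data: 1 = positive, 0 = negative.
sentences = ["great movie", "really great acting", "boring movie", "really bad plot"]
labels = [1, 1, 0, 0]

vocab = build_vocab(sentences)
rng = random.Random(0)
dim = 8
# Row 0 is the padding vector; rows 1..len(vocab) are the word vectors.
table = [[0.0] * dim] + [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in vocab]
X = [embed_and_pool(encode(s, vocab, max_len=4), table) for s in sentences]

w, b = train(X, labels)
print(encode("great movie", vocab, max_len=4))  # [1, 2, 0, 0]
print(mean_loss(X, labels, [0.0] * dim, 0.0), "->", mean_loss(X, labels, w, b))
```

Padding to a fixed `max_len` mirrors how batches of variable-length texts are fed to deep models; averaging only over non-padding positions keeps the padding vector from diluting the sentence representation.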

Fig. 6 Deep learning architecture for Sentiment Analysis

1.9 Data pre-processing used for sentiment analysis

Researchers preprocess data to make it easier to understand and use. Preprocessing removes discrepancies and duplicates that could otherwise degrade a model's accuracy, and helps ensure that no incorrect or missing values remain as a result of human mistakes or bugs. This section describes the data and pre-processing steps used for sentiment analysis.

1.9.1 Data

Information used in sentiment analysis falls into two categories: datasets and lexicons. Choosing suitable datasets and lexicons for testing and evaluating a method is crucial. A variety of popular lexicons are used in sentiment classification. Table 5 gives an overview of these lexicons, including the name of each lexicon, its authors, the year of production, the number of words, and the set of sentiments.

Table 5 Summary of the lexicon used for polarity detection

A variety of prominent datasets for emotion recognition and polarity detection can be used for sentiment analysis. Table 6 gives an overview of these datasets, including the dataset name, reference, development year, quantity of data, and type of data.

Table 6 Summary of sentiment polarity related dataset on reviews

Table 7 lists links to some open-source datasets together with their size and task.

Table 7 Some open source data

1.9.2 Data pre-processing

Data pre-processing is a crucial step. It removes noisy text and non-informative elements of the original text, such as hashtags, to improve classification accuracy. Textual data from social media sites is considered chaotic and in need of cleansing. Pre-processing reduces the amount of data and is a key stage in text classification: cleaning useless or distracting data out of a dataset produces more accurate results, so pre-processing should be applied to every dataset. Figure 7 illustrates the pre-processing stages. The raw data pre-processing stages and related tasks are as follows:

  1. Tokenization: Each text in the dataset is divided into tokens based on separators such as tab, comma, blank space, or any other separator found in the dataset.

  2. Stemming: Stemming reduces every word to its stem, relying on grammatical rules to remove the suffix from each word.

  3. Normalization: Many users write their posts in an abbreviated, informal format. Normalization converts such terms into their formal written form, making it easier for the computer to recognize that two words written differently have the same meaning.

  4. Irrelevant noise: Data from social networking sites is typically dirty and contains worthless content, such as punctuation marks, digits, special characters, usernames, and URLs. All such irrelevant data must be cleaned out before further processing; doing so improves the efficiency of the classification model.

  5. Stop word removal: Stop words, such as conjunctions and prepositions, are frequent words that carry little information, so they are removed, taking care not to delete crucial terms. In both SA and text emotion recognition, stop words usually have little influence on shifting the sentiment (from positive to negative) or the emotion (from happy to sad).

Fig. 7 Preprocessing of data
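The five pre-processing stages listed above can be chained into a small pipeline. The following is a minimal Python sketch, not the pipeline of any surveyed work: the slang dictionary `SLANG`, the stop-word list `STOP_WORDS`, and the suffix list are hypothetical toy resources, and the suffix stripper is a deliberately naive stand-in for a real stemmer such as Porter's. The stages run in a slightly different order from the list: normalization happens before digits are discarded, so that a token such as "gr8" survives long enough to be mapped.

```python
import re

# Hypothetical toy resources for illustration; real systems use curated lists.
SLANG = {"gr8": "great", "u": "you", "pls": "please"}
# "not" is deliberately absent: negations can flip the sentiment.
STOP_WORDS = {"a", "an", "the", "this", "and", "or", "of", "to", "in", "on", "is", "it"}
SUFFIXES = ("ing", "ed", "ly", "es", "s")  # naive list, not a real stemmer


def remove_noise(text):
    """Lower-case, drop URLs and @usernames, and replace punctuation,
    '#' and other special characters with spaces (digits are kept for now)."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    return re.sub(r"[^a-z0-9\s]", " ", text)


def tokenize(text):
    """Split on any whitespace (blank space, tab, newline)."""
    return text.split()


def normalize(tokens):
    """Map informal spellings to their formal form, then drop tokens
    that still contain digits."""
    tokens = [SLANG.get(t, t) for t in tokens]
    return [t for t in tokens if t.isalpha()]


def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    tokens = normalize(tokenize(remove_noise(text)))
    return [stem(t) for t in remove_stop_words(tokens)]


print(preprocess("@bob The camera is gr8, loving it!!! http://x.co #happy"))
# ['camera', 'great', 'lov', 'happy']
```

The stem "lov" shows the limitation of naive suffix stripping; a rule-based stemmer with more context, or a lemmatizer, would be used in practice.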

1.10 Text embedding

Text must be converted into numerical vectors (embedded) because deep learning models cannot take raw text directly as input. A variety of embedding strategies are available. For each input word, the embedding layer holds a vector representation that reflects the word's semantic relationships with other words. Several common text embedding techniques and related research are briefly explained below. Text representation using deep learning for text classification is shown in Fig. 11. Table 8 presents a short summary of text representation techniques with references.

1.10.1 Word based embedding

  1. WORD2VEC (Mikolov et al. 2013): Word2Vec, developed by Google, learns a distributed representation of each word by training each word against its neighbouring words in the input text. Two architectures can be used: Continuous Bag of Words (CBOW) and Skip-gram.

  2. BOW (Zhang et al. 2010): The bag-of-words (BOW) model is a simplified text representation that describes a text as the bag (multiset) of its words, disregarding grammar and word order but preserving multiplicity.

  3. CBOW (Continuous Bag of Words) (Zhang et al. 2010): In the CBOW architecture, the model predicts the current word from a window of surrounding context words; the order of the context words does not affect the prediction. In the skip-gram architecture, by contrast, the model uses the current word to predict the surrounding window of context words. CBOW is faster, whereas skip-gram is slower but works better for infrequent words.

  4. FASTTEXT (Mikolov et al. 2013): fastText, developed by the Facebook research team, provides 300-dimensional embedding vectors for every word and learns word representations efficiently. Because it exploits character-level information, fastText can also represent rare words.

  5. GLOVE VECTOR (Pennington et al. 2014): GloVe is an unsupervised method that learns a vector for each word from global word co-occurrence statistics. Words are related through their similarity (distance) in the resulting semantic space.
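To make the BOW, CBOW, and skip-gram descriptions above concrete, the following Python sketch builds a BOW count vector and the (input, target) training examples that CBOW and skip-gram learn from. It only generates the training examples; the shallow neural network that Word2Vec actually trains on these pairs is omitted, and the window size of 1 in the example is an arbitrary choice.

```python
from collections import Counter


def bag_of_words(tokens, vocabulary):
    """Count vector over a fixed vocabulary: grammar and word order are
    ignored, but multiplicity (how often each word occurs) is kept."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def cbow_pairs(tokens, window=2):
    """CBOW examples: (surrounding context words, centre word to predict)."""
    pairs = []
    for i, centre in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        pairs.append((context, centre))
    return pairs


def skipgram_pairs(tokens, window=2):
    """Skip-gram examples: (centre word, one context word to predict)."""
    return [(centre, word) for context, centre in cbow_pairs(tokens, window)
            for word in context]


tokens = ["the", "food", "was", "good"]
print(bag_of_words(tokens + ["good"], ["bad", "food", "good", "the", "was"]))
# [0, 1, 2, 1, 1]
print(cbow_pairs(tokens, window=1)[1])  # (['the', 'was'], 'food')
print(skipgram_pairs(tokens, window=1)[:3])
# [('the', 'food'), ('food', 'the'), ('food', 'was')]
```

CBOW produces one example per position, while skip-gram produces one per (centre, context) pair; this is one reason skip-gram training is slower but sees rare words more often.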

1.10.2 Phrase based embedding

  1. SKIP-GRAM (SG) (Ma et al. 2018): Skip-grams are a generalization of n-grams in computational linguistics, in particular language modeling, in which the components (usually words) need not be consecutive in the text under consideration but may leave gaps that are skipped over. They offer one way to overcome the data sparsity problem of conventional n-gram analysis.

  2. SSWE (Tang et al. 2014): Tang et al. developed the sentiment-specific word embedding (SSWE) model, which encodes sentiment knowledge into continuous word representations. It models positive and negative n-grams with the target sentiment distributions [1,0] and [0,1], respectively. It captures both the sentiment and the syntactic context of words.
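The linguistic skip-gram described above can be made precise as a k-skip-n-gram: an ordered sequence of n words from the text in which at most k words in total are skipped. The Python sketch below enumerates them; it follows this general definition from computational linguistics and is not necessarily the exact variant used by Ma et al. (2018). With k = 0 it reduces to ordinary n-grams, which shows how skip-grams extract more observations from the same text and so ease data sparsity.

```python
from itertools import combinations


def skip_grams(tokens, n=2, k=1):
    """Return all k-skip-n-grams: ordered n-word subsequences whose first and
    last words are at most n - 1 + k positions apart (at most k words skipped)."""
    grams = []
    for start in range(len(tokens)):
        # positions that may follow `start` without exceeding k skips in total
        following = range(start + 1, min(len(tokens), start + n + k))
        for rest in combinations(following, n - 1):
            grams.append(tuple(tokens[i] for i in (start, *rest)))
    return grams


sentence = ["the", "food", "was", "not", "good"]
print(skip_grams(sentence, n=2, k=0))  # ordinary bigrams (4 of them)
print(len(skip_grams(sentence, n=2, k=1)))  # 7
```

Note that the 1-skip-bigram ("was", "good") skips over "not"; skip-grams enlarge the feature set, but they can also pair words across a negation, so they are usually combined with ordinary n-grams rather than used alone.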

1.10.3 Sentence based embedding

  1. ELMo (Ulčar and Robnik-Šikonja 2022): ELMo handles the different meanings of a word in a context-dependent manner: the embedding vector of a word depends on the whole sentence in which it appears. It can therefore capture both the syntactic and the semantic features of a word and model its multiple context-dependent meanings (polysemy).

  2. GLoMo (Yang et al. 2018): GLoMo (Graphs from LOw-level unit MOdeling) learns latent graphs in an unsupervised manner. The learned graphs provide context that improves performance on tasks such as sentiment analysis, natural language inference, question answering, and image classification.

  3. Universal Language Model Fine-tuning (ULMFiT) (Nithya et al. 2022): In ULMFiT, a general-domain language model is pre-trained and then fine-tuned on the target domain. It works across tasks that vary in document size, number of documents, and label type, which is why it is called universal. It uses a single architecture and training process for multiple tasks and does not require additional in-domain documents or labels.

  4. OpenAITransformer (Liu et al. 2019): The OpenAI Transformer uses language modeling as the training signal to pre-train a large Transformer in an unsupervised manner. The model can then be fine-tuned on comparatively small supervised datasets to solve a specific task.

  5. BERT (Gao et al. 2019): BERT pre-trains deep bidirectional representations from unlabelled text by jointly conditioning on both left and right context in all layers. The pre-trained model can then be fine-tuned for a wide range of NLP tasks by adding just one output layer.

  6. ContextToVector (Liang et al. 2019): Context2Vec is a recent text embedding method that learns representations of the surrounding sentential context in order to understand the context of a sentence. It handles contextual information well but is computationally expensive.

  7. Sent2Vec (Sentence to Vector) (Agibetov et al. 2018): Sent2Vec is an unsupervised extension of fastText that uses the entire sentence as the context and all vocabulary words as possible class labels.
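In the original Sent2Vec formulation, a sentence embedding is the average of the learned embeddings of the words and word n-grams the sentence contains. The Python sketch below shows only that averaging step; the two-dimensional vectors in `embeddings` are invented for illustration, whereas in Sent2Vec they would be learned with the fastText-style objective described above.

```python
def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sentence_embedding(tokens, embeddings, max_n=2):
    """Average the vectors of every unigram ... max_n-gram of the sentence
    that has an entry in `embeddings` (unknown units are skipped)."""
    units = [g for n in range(1, max_n + 1) for g in ngrams(tokens, n)]
    vectors = [embeddings[u] for u in units if u in embeddings]
    if not vectors:
        raise ValueError("no known words or n-grams in the sentence")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


# Hypothetical toy vectors: dimension 0 ~ "positive", dimension 1 ~ "negation".
embeddings = {"not": [0.0, 1.0], "good": [1.0, 0.0], "not good": [-1.0, 0.0]}
print(sentence_embedding(["not", "good"], embeddings))
```

Because the bigram "not good" has its own vector, it pulls the "positive" dimension of the sentence back to zero; this is how including n-grams lets a simple average capture some local word order.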

1.10.4 Document level embedding

  1. Doc2Vec (Walkowiak et al. 2019): Doc2Vec generates a fixed-length numerical representation of a document of any length. Unlike words, documents do not have a logical structure that can be exploited in the same way, so a different approach is needed. Le and Mikolov's proposal was simple but clever: they extended the word2vec model with an additional document vector (Le and Mikolov 2014).

  2. Latent Dirichlet Allocation (LDA) (Hoffman et al. 2010): LDA is a generative statistical topic model in which unobserved groups explain why some parts of the data are similar. For instance, if the observations are words collected into documents, LDA assumes that each document is a mixture of a small number of topics and that the occurrence of each word is attributable to one of the document's topics.

  3. TF-IDF (Term Frequency Inverse Document Frequency) (Patil 2022): TF-IDF is a statistical measure that reflects how often a word or phrase appears in a document, offset by the number of documents in the dataset or corpus that contain the word. For a term i in document j, the TF-IDF weight is \(W_{ij}=tf_{i,j}\times \log (N/df_i)\), where \(tf_{i,j}\) is the number of occurrences of i in j, \(df_i\) is the number of documents containing i, and N is the total number of documents.

  4. Paragraph Vectors: Distributed Bag of Words (PV-DBOW) (Ji et al. 2019): PV-DBOW is similar to word2vec's skip-gram design. The training task is to predict words sampled from the paragraph using only the paragraph vector: at each iteration of stochastic gradient descent, a text window is sampled, and then a random word from that window is selected as the prediction target.
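The TF-IDF weight \(W_{ij}\) defined above can be computed directly from tokenised documents. The Python sketch below uses raw counts for \(tf_{i,j}\) and the natural logarithm; both are assumptions, since the formula does not fix the logarithm base and many libraries use smoothed variants. A term that occurs in every document receives weight 0, which is how TF-IDF down-weights uninformative words.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: W_ij} dict per tokenised document, where
    W_ij = tf_ij * log(N / df_i) with raw counts and the natural log."""
    n_docs = len(documents)
    # df_i: number of documents that contain term i at least once
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


docs = [["good", "phone", "good"], ["bad", "phone"], ["good", "service"]]
for w in tf_idf(docs):
    print({t: round(v, 3) for t, v in w.items()})
```

In this toy corpus, "bad" and "service" each occur in only one document and therefore receive the highest per-occurrence weight, log 3.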

Table 8 gives an overview of popular text representation techniques.

In summary, Word2Vec, Bag of Words, and GloVe work better than the other methods for word representation. GloVe captures semantic similarity at both small and large vector dimensions. Similarly, Word2Vec and Bag of Words use large vocabularies and relational vectors to represent text. Context-based embedding techniques such as Context2Vec, Skip-gram, and ELMo also exist, but their size and the number of relations they capture among words and phrases are not sufficient to train models that predict as well as word-based embeddings.

Deep learning methods commonly employ Word2Vec (Mikolov et al. 2013), GloVe (Pennington et al. 2014), and FastText (Bojanowski et al. 2017). Word2Vec is efficient in terms of time and space. GloVe, released later, learns global vectors for word representation and was reported to outperform Word2Vec. Unlike Word2Vec and GloVe, FastText represents words with character-level subword units and was also designed for efficient text classification. Furthermore, ELMo (Sarzynska-Wawer et al. 2021) and BERT (Gao et al. 2019) are contextual word embeddings: they address the polysemy problem of general (static) word embeddings by using context information (Meena et al. 2022) (Fig. 8).

Fig. 8 Taxonomy of popular text representation in sentiment analysis using deep learning

Table 8 Summary of popular text embedding in deep learning-based sentiment analysis

1.11 Performance metrics

In this section, we describe the most common performance metrics used in sentiment analysis. Performance measures are calculated from both the actual and the predicted labels. Because sentiment is contextual and varies among people, sentiment evaluation is challenging and is regarded as a complex procedure. Datasets may contain variable and unbalanced opinion content, and inaccurate data annotation may lead to misleading performance figures in sentiment analysis. Comparing results against other benchmarks may therefore be of limited value. Performance metrics reveal how well a method performs on a dataset. Precision, recall, F-value (also called F1 score, F score, or F measure), inter-annotator agreement, k-fold cross-validation, and the Pearson correlation coefficient are some of the performance analysis measures used in sentiment analysis. Each measure quantifies performance by comparing the results obtained from a method with the actual data. The performance metrics are defined as follows:

  1. Accuracy: Accuracy is the ratio of correctly classified instances to the total number of instances, as given in Eq. (1).

  2. Precision: Precision is the proportion of instances predicted as positive that are actually positive, as given in Eq. (2).

  3. Sensitivity or Recall: Recall (sensitivity) is the proportion of actual positive instances that are correctly predicted as positive, as given in Eq. (3).

  4. Specificity: Specificity (SP) is the number of correct negative predictions divided by the total number of actual negatives. It is also known as the true negative rate (TNR) and is given in Eq. (4).

  5. F-value or F1 score, or F Score, or F measure: The F1 score is the harmonic mean of precision and recall and provides a compromise between the two. It is calculated as in Eq. (5); the higher the F1 score, the better the balance between precision and recall.

  6. Loss function: The loss function measures the discrepancy between the predicted and the actual values. The binary cross-entropy loss is given in Eq. (6).

In Eqs. (1) to (5), TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. In Eq. (6), \(y_i\) is the actual label of instance i and \({\bar{y}}_i\) is its predicted probability.

$$\begin{aligned} Accuracy&= \frac{\left( TP+TN\right) }{TP+TN+FP+FN} \end{aligned}$$
(1)
$$\begin{aligned} Precision&= \frac{\left( TP\right) }{TP+FP} \end{aligned}$$
(2)
$$\begin{aligned} Recall/Sensitivity&= \frac{\left( TP\right) }{TP+FN} \end{aligned}$$
(3)
$$\begin{aligned} Specificity&= \frac{\left( TN\right) }{TN+FP} \end{aligned}$$
(4)
$$\begin{aligned} F1&= 2*\frac{\left( Precision*Recall\right) }{Precision+Recall} \end{aligned}$$
(5)
$$\begin{aligned} Loss(y)&=-\left[ y_i \log \left( {\bar{y}}_i\right) +\left( 1-y_i\right) \log \left( 1-{\bar{y}}_i\right) \right] \end{aligned}$$
(6)
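Eqs. (1) to (6) translate directly into code. The Python sketch below computes the five classification metrics from confusion-matrix counts and the binary cross-entropy loss averaged over instances. The counts in the example are made up, and zero denominators (e.g., a classifier that never predicts the positive class) are not handled.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Eqs. (1)-(5) computed from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Cross-entropy between labels y_i in {0, 1} and predicted probabilities,
    averaged over instances; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


# Hypothetical counts: 100 reviews, 50 of them actually positive.
print(classification_metrics(tp=40, tn=45, fp=5, fn=10))
# accuracy 0.85, precision ~0.889, recall 0.8, specificity 0.9, f1 ~0.842
print(binary_cross_entropy([1, 0], [0.9, 0.2]))  # ~0.164
```

The example also shows why accuracy alone can hide problems: recall (0.8) is noticeably lower than specificity (0.9), which the F1 score partly reflects.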

Table 9 lists the performance metrics used in recent deep learning-based sentiment classification studies.

Table 9 List of performance metrics

2 Method

Numerous deep learning models exist for text processing tasks, developed using a variety of artificial neural network techniques. This section explains the advanced and widely used deep learning techniques for sentiment analysis, presenting their processing steps, equations, algorithms, and diagrams. As shown in Fig. 9, a deep learning-based method comprises a few basic steps: the input text is prepared using an embedding approach, fed into a deep learning model, and the model then predicts the result. This is the general framework of sentiment analysis with deep learning methods.

Fig. 9 Deep learning-based model for sentiment analysis

2.1 CNN based method

A CNN is a type of artificial neural network that is commonly used for image and object recognition and classification. LeCun et al. (1998) proposed the CNN architecture in 1998. In this approach, small pieces of the input are transformed into high-level feature vectors by convolution layers (Ezaldeen et al. 2022; Diwan and Tembhurne 2022). The basic configuration of a CNN with its accompanying layers is shown in Fig. 9. The inputs pass through convolution and pooling stages, which produce filtered feature maps and are critical to the network's efficacy, and then through fully connected layers to the output layer. The resulting outputs are mapped by a function such as softmax to produce the final output, which is then used for classification. The unknown parameters (weights) of the intermediate layers between the input and output layers are learned by backpropagation. The overall procedure of CNN-based textual sentiment analysis is shown in Fig. 9a.
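The convolution, pooling, and softmax stages described above can be sketched for text in the style commonly used for sentence classification: each kernel slides over windows of consecutive word embeddings, ReLU is applied, and max-over-time pooling keeps one feature per kernel before a fully connected softmax layer. This pure-Python forward pass is a sketch with random, untrained weights; the backpropagation training mentioned above is omitted, and the sentence must be at least as long as the widest kernel.

```python
import math
import random


def relu(v):
    return max(0.0, v)


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def conv_max_pool(x, kernel, bias):
    """Slide a (width x dim) kernel over consecutive word vectors of x,
    apply ReLU, then keep the largest response (max-over-time pooling)."""
    width, dim = len(kernel), len(x[0])
    feature_map = [
        relu(bias + sum(kernel[r][c] * x[i + r][c] for r in range(width) for c in range(dim)))
        for i in range(len(x) - width + 1)
    ]
    return max(feature_map)


def text_cnn(x, kernels, biases, w_out, b_out):
    """One pooled feature per kernel -> fully connected layer -> softmax."""
    features = [conv_max_pool(x, k, b) for k, b in zip(kernels, biases)]
    logits = [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(w_out, b_out)]
    return softmax(logits)


rng = random.Random(0)
dim = 4
sentence = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(6)]  # 6 embedded words
kernels = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(w)] for w in (2, 3, 4)]
w_out = [[rng.uniform(-1, 1) for _ in range(len(kernels))] for _ in range(2)]  # 2 classes
print(text_cnn(sentence, kernels, [0.0] * 3, w_out, [0.0, 0.0]))  # class probabilities
```

Kernels of widths 2, 3, and 4 act like learned bigram, trigram, and 4-gram detectors. Max pooling keeps only the strongest match anywhere in the sentence, which is why such a CNN captures local phrases well but not long-range dependencies.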

2.2 RNN based method

A recurrent neural network (RNN) is a type of artificial neural network (ANN). RNNs are among the most basic yet powerful neural networks, and they have produced promising outcomes in a variety of applications, which has led to their enormous popularity (Han et al. 2021). RNNs are designed to process sequential data. The concept of internal memory distinguishes RNNs from typical neural networks: derived from feed-forward neural networks (FNNs), RNNs can process input sequences of variable length by using their internal state (memory) (Tealab 2018). An RNN is also known as a feedback neural network. Gated states, also known as gated memory, are a feature of RNN variants such as long short-term memory networks (LSTMs) and gated recurrent units (GRUs) (Mittal et al. 2021; Han et al. 2021). The sequential operation of RNN-based sentiment analysis is shown in Fig. 9b.
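The role of the internal state can be seen in a minimal Elman-style RNN forward pass: the same weights are applied at every time step, and the hidden state \(h_t = \tanh(W_x x_t + W_h h_{t-1} + b)\) carries information forward, so sequences of any length can be processed. The Python sketch below uses hand-picked toy weights; in a sentiment classifier, the last hidden state would typically be passed to an output layer.

```python
import math


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_forward(inputs, w_x, w_h, b):
    """Return the hidden state after every step, starting from h_0 = 0.
    h_t = tanh(W_x x_t + W_h h_{t-1} + b); the same weights are reused at each step."""
    h = [0.0] * len(b)
    states = []
    for x_t in inputs:
        pre = [a + c + d for a, c, d in zip(matvec(w_x, x_t), matvec(w_h, h), b)]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states


# Toy 1-D example: a single "word" signal followed by two empty inputs.
states = rnn_forward([[1.0], [0.0], [0.0]], w_x=[[1.0]], w_h=[[0.5]], b=[0.0])
print([round(s[0], 3) for s in states])  # the memory of the first input decays
```

The decaying values illustrate, in miniature, why plain RNNs struggle to keep information over long sequences, which motivates the gated variants below.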

2.2.1 LSTM based method

Long short-term memory (LSTM) is an extension of the RNN designed to store information over long periods. LSTM can handle long-range features, which helps to increase classification performance. In contrast to the basic internal memory of an RNN, an LSTM has an advanced, gated memory whose contents can be read, written, and erased. As a result, it addresses the vanishing gradient problem that plagues RNNs. The LSTM decides which information should be remembered and which should be forgotten. The sequential operation of LSTM-based sentiment analysis is shown in Fig. 9b. When the context of the input is essential, BiLSTMs have proven to be especially useful, and they are a popular tool for categorising emotions. In a unidirectional LSTM, data flows in one direction only, from the past to the future. In a bidirectional LSTM, data flows both forwards and backwards using two hidden states. As a result, BiLSTMs (Schuster 1997) have a better understanding of the context (Ragheb et al. 2019). BiLSTMs are used to increase the amount of input information available to the network.
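The gated memory described above can be written out as a single LSTM step with the standard forget, input, and output gates plus a candidate memory, together with a bidirectional wrapper that runs a second LSTM over the reversed sequence and concatenates the two hidden states at each position. This pure-Python sketch uses small random weights from the hypothetical helper `random_params` and omits training.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(w, v, b):
    return [sum(a * x for a, x in zip(row, v)) + bias for row, bias in zip(w, b)]


def lstm_step(x_t, h, c, params):
    """One LSTM step; params maps gate name -> (W, b), with W acting on [h_{t-1}, x_t]."""
    z = h + x_t  # list concatenation [h_{t-1}, x_t]

    def gate(name, act):
        w, b = params[name]
        return [act(v) for v in affine(w, z, b)]

    f = gate("forget", sigmoid)       # how much old memory to keep
    i = gate("input", sigmoid)        # how much new content to write
    o = gate("output", sigmoid)       # how much memory to expose
    g = gate("candidate", math.tanh)  # new content
    c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h, c


def lstm(inputs, params, hidden):
    h, c, outputs = [0.0] * hidden, [0.0] * hidden, []
    for x_t in inputs:
        h, c = lstm_step(x_t, h, c, params)
        outputs.append(h)
    return outputs


def bilstm(inputs, fwd_params, bwd_params, hidden):
    """Concatenate forward states with backward states re-aligned to the same positions."""
    fwd = lstm(inputs, fwd_params, hidden)
    bwd = lstm(inputs[::-1], bwd_params, hidden)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]


def random_params(hidden, input_size, seed):
    """Hypothetical helper: small random weights and zero biases for all four gates."""
    rng = random.Random(seed)
    return {
        name: ([[rng.uniform(-0.5, 0.5) for _ in range(hidden + input_size)]
                for _ in range(hidden)], [0.0] * hidden)
        for name in ("forget", "input", "output", "candidate")
    }


words = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.2], [0.5, 0.1, 0.0]]  # 3 embedded words
out = bilstm(words, random_params(2, 3, seed=1), random_params(2, 3, seed=2), hidden=2)
print(len(out), len(out[0]))  # 3 positions, 2 + 2 features each
```

Because the cell state is updated additively (old memory scaled by the forget gate plus new content scaled by the input gate), information can persist across many steps, which is the mechanism behind the improved handling of long-range features.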

2.2.2 GRU based method

GRU stands for Gated Recurrent Unit. The integrated gates of a GRU control which information is passed through to the output, and they can be trained to retain information over longer time periods (Han et al. 2021). Kyunghyun Cho and colleagues introduced GRUs as a gating mechanism for RNNs (Cho et al. 2014). LSTMs and GRUs share a number of similarities, but a GRU contains fewer parameters than an LSTM and has been reported to outperform it on some tasks. A GRU has two gates: the reset gate determines how to combine the new input with the previous memory, while the update gate determines how much of the old memory to keep. Unlike LSTMs, GRUs have no separate cell state; the update and reset gates together control how much of the previous hidden state is forgotten and how it is linked to the new one (Han et al. 2021). The sequential operation of GRU-based sentiment analysis is shown in Fig. 9. The bidirectional gated recurrent unit (BiGRU) (Li et al. 2022; Han et al. 2020) consists of two unidirectional GRUs that work together to solve the problem of context information loss (Zhou and Bian 2019). In effect, a GRU merges the forget and input gates of the LSTM into a single update gate, as seen in Fig. 9(b).
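For comparison, one GRU step can be sketched in the same style. The update gate \(z\) decides how much of the previous hidden state to keep, and the reset gate \(r\) decides how much of it is used when forming the candidate state. The interpolation below follows Cho et al. (2014), \(h_t = z \odot h_{t-1} + (1-z) \odot \tilde{h}_t\), although some presentations swap the roles of \(z\) and \(1-z\). There is no separate cell state, and only three weight sets are needed instead of the LSTM's four, which is where the parameter saving comes from. The weights used here are hypothetical placeholders.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(w, v, b):
    return [sum(a * x for a, x in zip(row, v)) + bias for row, bias in zip(w, b)]


def gru_step(x_t, h, params):
    """One GRU step; params maps 'reset', 'update', 'candidate' -> (W, b),
    each W acting on a concatenation [hidden part, x_t]."""
    def gate(name, vec, act):
        w, b = params[name]
        return [act(v) for v in affine(w, vec, b)]

    r = gate("reset", h + x_t, sigmoid)   # how much past state feeds the candidate
    z = gate("update", h + x_t, sigmoid)  # how much past state to keep
    h_reset = [rv * hv for rv, hv in zip(r, h)]
    h_tilde = gate("candidate", h_reset + x_t, math.tanh)
    return [zv * hv + (1 - zv) * hc for zv, hv, hc in zip(z, h, h_tilde)]


# With all-zero weights and biases, r = z = 0.5 and the candidate is tanh(0) = 0,
# so the step simply halves the previous hidden state.
zero = {name: ([[0.0, 0.0]], [0.0]) for name in ("reset", "update", "candidate")}
print(gru_step([3.0], [1.0], zero))  # [0.5]
```

Driving the update gate towards 1 copies the old state forward unchanged, and driving it towards 0 replaces the state with the candidate; this single gate plays the combined role of the LSTM's forget and input gates.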

2.3 Attention based method

Attention (Islam et al. 2021) is a neural network technique that mimics cognitive attention. It enhances some parts of the input data while diminishing others, on the premise that the network should focus more on the small but significant parts of the data. For example, attention can compare two sentences by arranging them in a matrix, with the words of one sentence as columns and the words of the other as rows, and matching them to identify the significant context (Liu et al. 2022; Liu et al. 2022; Xu et al. 2020). This is particularly useful in machine translation (Bahdanau et al. 2014). To get a sense of how the attention process works, it can be viewed as a single hidden layer in a neural network. The purpose of the attention function is to determine the significance of each hidden state and to produce a weighted sum of all input features. The sequential operation of attention-based sentiment analysis, following Bahdanau's attention method, is shown in Fig. 9c.
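The weighted sum described above can be sketched with additive (Bahdanau-style) scoring: each hidden state \(h_i\) receives a score \(e_i = v^\top \tanh(W_h h_i + W_s s)\) against a query vector \(s\), the scores are normalised with softmax into weights \(\alpha_i\), and the context vector is \(\sum_i \alpha_i h_i\). In sentence-level sentiment classification there is often no decoder state, so \(s\) can be taken to be a learned query vector; that choice, and the toy weights below, are assumptions of this Python sketch.

```python
import math


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def additive_attention(hidden_states, query, w_h, w_s, v):
    """Score each hidden state against the query, normalise the scores with
    softmax, and return (weights, context = weighted sum of hidden states)."""
    projected_query = matvec(w_s, query)
    scores = []
    for h in hidden_states:
        hidden = [math.tanh(a + b) for a, b in zip(matvec(w_h, h), projected_query)]
        scores.append(sum(vi * x for vi, x in zip(v, hidden)))
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(a * h[d] for a, h in zip(weights, hidden_states)) for d in range(dim)]
    return weights, context


# Toy hidden states for "the", "movie", "was", "great" (2-D, hypothetical values).
states = [[0.1, 0.0], [0.2, 0.1], [0.0, 0.1], [0.9, 0.8]]
identity = [[1.0, 0.0], [0.0, 1.0]]
weights, context = additive_attention(states, [0.0, 0.0], identity, identity, [1.0, 1.0])
print([round(a, 2) for a in weights])  # the strongest state ("great") gets the largest weight
```

The context vector can then replace, or be concatenated with, the last hidden state as the input to the sentiment classifier, so that a decisive word contributes more than function words.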

2.4 Capsule based method

A capsule network (CapsNet) is an artificial neural network (ANN) framework for modelling hierarchical relationships, and it can be used in sentiment analysis models. Whereas a normal neural network uses neurons, a capsule network uses capsules. A neuron produces a scalar value, whereas a capsule outputs a vector that encodes the critical information about a feature (in an image, for example) and can therefore keep track of the feature's orientation. Thanks to their routing method, capsule neural networks can reduce training time and accommodate long-range features. The routing process involves three components: primary capsules, secondary capsules, and the squash function. Figure 9d shows a simple representation of a dynamic capsule neural network.
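The squash function and the routing loop can be sketched following the routing-by-agreement procedure of dynamic routing. The Python sketch below takes the prediction vectors \(\hat{u}_{j|i}\) (lower capsule i's prediction for upper capsule j) as given; computing them with learned transformation matrices, and the surrounding layers, are omitted. The squash function shrinks every vector to a length below 1 while keeping its orientation, so the length can be read as the probability that a feature is present.

```python
import math


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def squash(s):
    """Scale s to length |s|^2 / (1 + |s|^2) while keeping its direction."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def dynamic_routing(u_hat, iterations=3):
    """u_hat[i][j] is lower capsule i's prediction vector for upper capsule j.
    Returns the upper-capsule output vectors v_j."""
    n_lower, n_upper, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    logits = [[0.0] * n_upper for _ in range(n_lower)]  # routing logits b_ij
    for _ in range(iterations):
        coupling = [softmax(row) for row in logits]     # c_ij, sums to 1 over j
        v = []
        for j in range(n_upper):
            s_j = [sum(coupling[i][j] * u_hat[i][j][d] for i in range(n_lower))
                   for d in range(dim)]
            v.append(squash(s_j))
        # agreement: increase b_ij when the prediction points the same way as v_j
        for i in range(n_lower):
            for j in range(n_upper):
                logits[i][j] += sum(a * b for a, b in zip(u_hat[i][j], v[j]))
    return v


# Two lower capsules agree on upper capsule 0 and disagree on upper capsule 1.
u_hat = [[[1.0, 0.0], [0.0, 1.0]],
         [[1.0, 0.0], [0.0, -1.0]]]
v = dynamic_routing(u_hat)
print([round(math.hypot(*vj), 3) for vj in v])  # capsule 0 is longer (more "present")
```

Routing shifts coupling weight towards upper capsules whose outputs agree with the incoming predictions, so agreement, rather than a fixed pooling rule, decides how lower-level features are combined.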

2.5 Hybrid method

In hybrid deep learning (Liu et al. 2022), two or more methods are combined to achieve higher accuracy. Such methods may obtain good results but can suffer from higher complexity. Figure 9e shows a simple representation of a hybrid deep learning method. Hybrid deep learning methods handle contextual meaning, semantic dependency, and related challenging issues well (Gao et al. 2019; Dashtipour et al. 2020).

2.6 Neuro symbolic AI method

Neuro-symbolic artificial intelligence (Hitzler et al. 2022) is a research and practice field that integrates machine learning techniques based on artificial neural networks, such as deep learning, with symbolic approaches to computing and artificial intelligence (AI), such as those found in the AI subdiscipline of knowledge representation and reasoning. Although neuro-symbolic AI has a long history, it remained a fairly specialized field until recent advances in deep learning-driven machine learning led to a sharp increase in interest and research activity in this area.

This approach has many applications related to sentiment analysis (Kocoń et al. 2022), such as product recommendation (Golovko et al. 2020), knowledge discovery (Makni et al. 2021), question answering (Oltramari et al. 2021), event prediction (Hassanzadeh et al. 2022), and emotion analysis (Škrlj et al. 2021). Figure 9f shows the basic mechanism of the neuro-symbolic AI method for analysing textual sentiment.

3 Result analysis

This section includes a comparative result analysis and an analytical result analysis. The comparative analysis critically reviews recent existing methods according to their category. The analytical analysis presents our proposed model and compares its results with those of our own implementation.

3.1 Comparative result analysis

Newer approaches that use deep learning architectures outperform the other approaches. Many deep learning systems, trained on large-scale datasets, can handle the syntactic and semantic aspects of texts, and they address the shortcomings of earlier methods with better precision. This section summarises current developments in sentiment analysis. The tables in this section list the data, approach, sentiment analysis model, language, advantages, and disadvantages of each work. There are numerous related works on sentiment analysis. These methods can be classified into the following categories: 1. CNN, 2. RNN (LSTM, GRU), 3. Attention, 4. Capsule, 5. Hybrid deep learning, and 6. Neuro-symbolic AI.

3.1.1 CNN based analysis

Zhang and Wallace (2015) developed a CNN for text categorization. Textual features are extracted easily with CNNs, and recent studies in sentiment analysis have made significant progress (Ezaldeen et al. 2022; Diwan and Tembhurne 2022; Dangi et al. 2022). A recent application of a 2D CNN to conversational sentiment analysis achieved 97.5% accuracy (Wang and Meng 2022). Zhang et al. (2015) used a character-level CNN for text classification in 2015. A very deep CNN has been employed for text classification (Conneau et al. 2016), and a CNN has been used to detect abusive language on Twitter (Park and Fung 2017). However, CNNs are not entirely suitable, since they cannot capture long-range features, crucial dependency features, or the spatial position details of the data. CNN-based sentiment analysis works are widely used, but the network structure can only handle text of a preset (fixed) length and employs a number of network parameters similar to that of traditional machine learning (Zhang et al. 2015; Meena et al. 2022). By contrast, recurrent networks process sequences step by step and can therefore handle texts of any length. Some recent CNN-based techniques for sentiment analysis are shown in Table 10.

3.1.2 RNN based analysis

Recurrent Neural Networks (RNN) (Liu et al. 2022), Long Short-Term Memory (LSTM) (Mittal et al. 2021), Bidirectional LSTM (BiLSTM), GRU (Han et al. 2021), and BiGRU (Han et al. 2020) are commonly used in deep learning approaches. We now review attempts at using deep learning for sentiment classification; as will be seen, genuine performance assessments based on RNNs are sparse. There are numerous RNN techniques for sentiment classification. An RNN can predict outcomes involving long-range features by evaluating past information. Researchers have proposed a variety of RNN modifications, such as LSTM, because the original RNN suffers from problems such as vanishing and exploding gradients. The LSTM (Mousa and Schuller 2017; Cho et al. 2014) can track long-term dependencies in sequences with memory units and gate structures that regulate how the stored data are used and updated, which increases the model's computational efficiency (Mukhtar et al. 2018; Johnson and Zhang 2017). Dai and Le (2015) incorporated sequence learning into a recurrent neural network to pre-train LSTM models. Recently, the BiLSTM model has been used in target-based tasks, such as categorising relations based on numerous evaluation entities. Saeed et al. (2018) created a deep neural network with Bi-GRU and max pooling to accurately categorise overlapping feelings. RNNs (Cho et al. 2014; Lai et al. 2015; Wang et al. 2016) and capsule neural networks (Srivastava et al. 2018; Dong et al. 2020) have largely addressed the aforementioned limitation in managing long-range dependent features of the input. For the sentiment classification task, RNN-based deep learning algorithms have generally outperformed machine learning approaches.
Some modern RNN deep learning-based techniques for sentiment analysis are listed in Table 11.

3.1.3 Attention based approach

Attention mechanisms are quite popular in NLP, as evidenced by a large number of research articles (Liu et al. 2022; Liu et al. 2022; Islam et al. 2021). Bahdanau et al. (2014) published the first work using an attention mechanism for machine translation. To get a sense of how the attention process works, it can be viewed as a single hidden layer in a neural network. A recent method for aspect-based sentiment analysis achieved satisfactory accuracy on the Laptop and Restaurant review datasets (Zeng et al. 2019). The purpose of the attention function is to determine the significance of each hidden state and to produce a weighted sum of all input features. Neural network frameworks combined with the attention mechanism, such as those built on BiLSTM (Tai et al. 2015; Lai et al. 2015), have recently delivered superior performance compared with other approaches (Islam et al. 2021) in target-related tasks such as identifying relationships based on various evaluation objects. Attention mechanisms are also widely employed for grouping elements. Galassi proposed a cutting-edge computational architecture based on widely available lexical sentiment resources. Two types of attention, lexicon-dependent contextual attention and contrasted co-attention, are introduced to improve model efficiency (Galassi et al. 2020; Yanase et al. 2016). Some recent attention-based techniques for sentiment analysis are shown in Table 12.

Table 10 CNN based Sentiment Analysis
Table 11 RNN based Sentiment Analysis
Table 12 Attention based Sentiment Analysis
Table 13 Capsule-based Sentiment Analysis
Table 14 Hybrid model based Sentiment Analysis
Table 15 Neuro Symbolic AI model based Sentiment Analysis

3.1.4 Capsule network based sentiment analysis

Hinton et al. (2011) first proposed the "capsule" concept for CNNs and RNNs. The main function of a capsule network is to establish spatial relationships and orientations on top of a standard neural network and to recognise objects by combining invariance and equivariance. Building on Hinton's work, Sabour (Zhao et al. 2019) proposed a capsule network with a dynamic routing method that uses vector outputs, whose length represents the strength of a feature, instead of the scalar outputs of a typical convolutional neural network. Kim et al. (2020) proposed a simple routing model that effectively reduces the computational complexity of the routing algorithm. Motivated by the novel capsule network's performance, Zhao et al. (Zhang et al. 2018) employed the capsule network in textual tasks and demonstrated its effectiveness on a variety of datasets. Zhao added three strategies to the dynamic routing algorithm to reduce the influence of noisy capsules, which may contain redundant features or may not have been properly trained (Lai et al. 2015; Zhang et al. 2018). The first research within the present modelling framework was on capsule networks (Lai et al. 2015). The difficulty of routing computation has been acknowledged, and Wang combined capsule and attention networks for text-based sentiment classification (Wang et al. 2016). In general, RNN-based deep learning systems using capsule networks have outperformed machine learning alternatives in sentiment analysis (Jia and Wang 2022; Zhang et al. 2021; Liu et al. 2022; Dong et al. 2020). Table 13 lists capsule network-based techniques for sentiment analysis.

3.1.5 Hybrid model based sentiment analysis

Lexicon-based systems suffer from low accuracy and domain dependency, and machine learning methods also rely on domain knowledge. Hybrid sentiment detection techniques combine two or more techniques and are designed to outperform the individual approaches (Dong et al. 2020; Liu et al. 2022, 2021; Alsayat and Ahmadi 2023; Alsayat 2022). Some previous studies have shown that a combination of different sentiment techniques performs better than a single method (Du et al. 2019). Sentiment detection using a hybrid method that combines keyword, lexicon, and machine learning techniques works better because it overcomes the shortcomings of existing methods (low accuracy, lack of handling of syntactic and long-range features, etc.). On smaller datasets, machine learning approaches perform better, while hybrid training strategies deliver better accuracy than traditional machine learning methods (Gao et al. 2019; Dashtipour et al. 2020). The development and improvement of hybrid methods is a current and future focus area in sentiment classification that should lead to more effective automatic emotion recognition systems. Table 14 shows various recent hybrid deep learning-based sentiment analysis techniques.

3.1.6 Neuro symbolic AI based sentiment analysis

Neuro-symbolic AI-based sentiment analysis uses knowledge graphs and deep learning. Recently, a novel context reasoning process based on the idea of neuro-symbolism used knowledge bases to guide the learning of deep learning models (Tiddi et al. 2020); this method achieved 69% accuracy on commonsense data. Another recent method uses neural commonsense knowledge models to generate contextually relevant symbolic knowledge structures on demand (Bosselut et al. 2021). This method achieved 50.1% accuracy, which is better than several of the compared methods. Experimental results on two datasets demonstrate the effectiveness of this neuro-symbolic approach for dynamically building knowledge graphs for reasoning: while offering interpretable reasoning paths for its predictions, it outperforms pre-trained language models and conventional knowledge models by a wide margin. Another method employed neuro-symbolic AI to predict student strategies (Shakya et al. 2021). The authors created a model that combines the benefits of DNNs, Markov knowledge, and symbolic AI in order to forecast strategies; it achieved an accuracy of 96.78%. A recent unsupervised approach uses reproducible subsymbolic techniques, namely auto-regressive language models and kernel methods, to build reliable symbols that transform natural language into a kind of protolanguage and thereby extract polarity from text in a fully interpretable and explainable way (Cambria et al. 2022). Evaluated on a different dataset, STS, this approach yielded significantly better results: its accuracy with the SenticNet lexicon and the neuro-symbolic approach was 90.08%. Another approach applies symbolic and subsymbolic AI tools to the problem of text polarity recognition, integrating top-down and bottom-up learning (Cambria et al. 2020).
This approach develops a new version of SenticNet, a knowledge base for sentiment classification, in particular by integrating logical reasoning with deep learning architectures, and obtained 91.54% accuracy on the LJ-5k dataset. The neuro-symbolic approach of Roig Vilamala et al. (2022) integrates neural and symbolic methodologies to enable training with sparse data and achieved an accuracy of 86.57%; however, its efficiency needs to be improved and its time complexity reduced. Table 15 presents some neuro-symbolic methods for sentiment analysis with their methods, findings, and limitations.

3.2 Types of the modalities in sentiment analysis

Sentiment analysis can be unimodal or multimodal. Modalities are categorized by the data types the model analyses (Zucco et al. 2019). In this section, we introduce the types of modalities in sentiment analysis and their applications (Hazarika et al. 2022; Zadeh et al. 2017). Table 16 gives some representative examples of modality-based sentiment analysis.

3.2.1 Unimodal sentiment analysis

Unimodal sentiment analysis predicts or classifies only one type of data: audio, video, image, or text.

3.2.2 Multimodal sentiment analysis

Multimodal sentiment classification extends classic text-based sentiment classification beyond textual analysis to incorporate audio and visual input (Kaur and Kautish 2022). It may be bimodal, combining two modalities, or trimodal, combining three. Social media now offers an abundance of publicly available data in many forms, including videos and images. As a result, traditional text-based sentiment analysis has evolved into more complex multimodal sentiment classification models. These models serve many applications, including virtual assistants, the analysis of YouTube film reviews (Wöllmer et al. 2013) and news videos, and emotion recognition (also known as emotion classification) such as anxiety monitoring (Keswani et al. 2020; Liu et al. 2013).

As in traditional sentiment analysis, one of the most fundamental tasks in multimodal sentiment analysis is sentiment classification, which categorizes attitudes as positive, negative, or neutral (Pang and Lee 2005). Text, audio, and visual data are complex to assess jointly, so this task requires fusion approaches such as feature-level, decision-level, and hybrid fusion. The effectiveness of these fusion approaches, and of the classification methods used, depends on the variety of textual, auditory, and visual information in the study (Kaur and Kautish 2022).
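The sketch below makes the difference between feature-level and decision-level fusion concrete by implementing both in plain Python. It assumes that an upstream model has already turned each modality into either a feature vector or a vector of class probabilities. The modality names, weights, and numbers are hypothetical, and hybrid fusion (which mixes the two strategies) is not shown.

```python
from typing import Dict, List


def feature_level_fusion(features: Dict[str, List[float]], order: List[str]) -> List[float]:
    """Early fusion: concatenate per-modality feature vectors in a fixed order,
    producing one vector for a single downstream classifier."""
    fused: List[float] = []
    for modality in order:
        fused.extend(features[modality])
    return fused


def decision_level_fusion(probs: Dict[str, List[float]],
                          weights: Dict[str, float]) -> List[float]:
    """Late fusion: weighted average of per-modality class-probability vectors."""
    total = sum(weights[m] for m in probs)
    n_classes = len(next(iter(probs.values())))
    fused = [0.0] * n_classes
    for modality, p in probs.items():
        for c in range(n_classes):
            fused[c] += weights[modality] * p[c] / total
    return fused


# Hypothetical per-modality outputs for the classes [negative, neutral, positive].
probs = {"text": [0.1, 0.2, 0.7], "audio": [0.3, 0.4, 0.3], "visual": [0.2, 0.2, 0.6]}
weights = {"text": 0.5, "audio": 0.25, "visual": 0.25}
print(decision_level_fusion(probs, weights))  # positive class gets about 0.575
```

Feature-level fusion lets one classifier learn cross-modal interactions but needs aligned features. Decision-level fusion keeps the modalities independent and combines only their outputs.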

Table 16 Modalities of Sentiment
Table 17 Similarities and differences among different methods

3.3 Summary of analysed methods

Based on the related work examined in the preceding section, recent work on sentiment classification from text performs well overall. However, it still falls short in handling cohesion, context, semantic meaning, negation, modifiers, and intensifiers. Lexicon-based techniques and dictionaries can handle grammatical syntax, but they suffer from low accuracy, higher time complexity, dictionary dependency, and domain dependence. Unsupervised approaches offer adaptability, flexibility, and ease of use, but they also have higher time complexity and lower accuracy. Machine learning, on the other hand, performs best in terms of speed but is limited in handling semantics and sentence dependency. Modern deep learning-based methods achieve higher accuracy in handling independent textual features. However, deep learning also has constraints, such as properly handling context and grammatical structure. Automatic emotion recognition remains an actively debated topic in academia. All strategies have significant limitations, which encourages researchers to develop new techniques to improve performance. Table 17 presents the similarities and differences among the methods.

3.4 Motivation to propose a method

As discussed in the literature section, existing sentiment analysis methods have many limitations. To address some of them, our proposed method makes the following contributions:

  1. To reduce the high time complexity of sentiment analysis (Ali et al. 2019; Batbaatar et al. 2019).

  2. To handle syntactic dependency in complex sentences (Zouzou and El Azami 2021; Srivastava et al. 2018).

  3. To increase accuracy in sentiment classification (Saravia et al. 2018a; Batbaatar et al. 2019).

In general, a review paper consists of a simple or critical review of different articles. When a well-performing method is proposed alongside the review, the contributions become more effective and the recommendation of a model is easier to justify. In this section, we propose a high-performance deep learning model, CRDC, for sentiment analysis. CRDC (Capsule with Deep CNN and Dual-structured RNN) is a novel approach to multi-label sentiment classification. It combines a hierarchical attention capsule network with a bidirectional recurrent neural network and a convolutional neural network. This section covers the proposed method, including its procedure, dataset information, algorithm, result tables, and figures.

3.4.1 Data and metrics used in analytical analysis

To evaluate the proposed solution, we run experiments on four datasets for multi-label, text-based sentiment categorization.

Here is a brief explanation of these datasets. ER (Saravia et al. 2018b) stands for Emotion Recognition. This dataset is multiclass and multilabel: its 414810 texts cover six emotions (anger, love, fear, sadness, joy, and surprise), and each emotion label corresponds to a specific feeling. MR is a collection of English film reviews from Cornell University, collected from www.rottentomatoes.com. It contains 5331 positive and 5331 negative film reviews, with an average length of 20 words (Pang and Lee 2005). The Toxic dataset contains comments from Wikipedia talk-page edits that human raters labelled for toxic behaviour. Toxicity is divided into six categories, and the dataset contains 159571 comments. A single comment can carry one or more of the six toxicity labels. The IMDB dataset contains 50000 highly polarized movie reviews, 25,000 positive and 25,000 negative (Maas et al. 2011). The CrowdFlower dataset contains 40000 opinions labelled with 12 different emotions. Table 18 summarizes the datasets. In this table, CV means that there is no standard train/test split, so nested 10-fold cross-validation is used.
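Nested 10-fold cross-validation is used for the datasets without a standard split. The following minimal sketch shows one way to organize such a procedure. It assumes a caller-supplied `score` function that trains on one index set and evaluates on another. The names `kfold_indices`, `nested_cv`, and `score` are illustrative and do not come from the authors' code.

```python
import random
from typing import Callable, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]


def kfold_indices(n: int, k: int, seed: int = 0) -> List[Split]:
    """Shuffle indices 0..n-1 once and cut them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train, folds[i]))
    return splits


def nested_cv(n: int, candidates: Sequence[float],
              score: Callable[[List[int], List[int], float], float],
              k_outer: int = 10, k_inner: int = 10) -> List[float]:
    """For each outer fold, pick the hyperparameter with the best mean inner-fold
    score on the outer training part, then score it once on the outer test fold.
    `score(train_idx, eval_idx, param)` is a hypothetical user-supplied function."""
    outer_scores = []
    for outer_train, outer_test in kfold_indices(n, k_outer):
        best_param, best_mean = None, float("-inf")
        for param in candidates:
            inner = kfold_indices(len(outer_train), k_inner, seed=1)
            # Inner indices are positions inside outer_train; map them back.
            mean = sum(
                score([outer_train[i] for i in tr], [outer_train[i] for i in te], param)
                for tr, te in inner
            ) / k_inner
            if mean > best_mean:
                best_param, best_mean = param, mean
        outer_scores.append(score(outer_train, outer_test, best_param))
    return outer_scores
```

The outer test folds are never used for model selection. This is what makes the averaged outer score an unbiased estimate when no fixed test split exists.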

Table 18 Overall summary of Dataset analytical analysis

3.4.2 Proposed CRDC Method

Our model is designed to address some limitations of existing methods. Its performance gain comes from more interpretable feature extraction, obtained by feeding a dilated-CNN-based BiGRU-BiLSTM layer into a routed capsule network. The method handles long-range independent and relevant features with lower time complexity. The dilated CNN extracts features quickly with fewer convolution operations, and the BiGRU-BiLSTM layer handles long-range features properly. The dynamically routed capsule layer, in turn, captures relevant features with less training time. The overall process is shown in Fig. 10. The benchmark data are obtained first and then embedded. The embedded output is passed to the BiGRU(128)-BiLSTM(128) layer. This layer produces a new vector representation from which the dilated CNN-based deep layer extracts important information. The dilated CNN outputs are then passed to the capsule network, whose output is sent to the next layer for prediction. These layers are combined in a multi-level, multi-label sentiment classification module.
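A plain-Python 1-D dilated convolution illustrates the claim that dilated convolution covers a wider context with fewer operations. This is a generic sketch of the technique, not the CRDC layer itself, and the kernel size and dilation rates below are hypothetical.

```python
from typing import List


def dilated_conv1d(x: List[float], kernel: List[float], dilation: int = 1) -> List[float]:
    """'Valid' 1-D convolution (cross-correlation, as deep-learning libraries
    compute it) whose kernel taps are `dilation` positions apart."""
    span = (len(kernel) - 1) * dilation + 1  # input positions seen by one output
    return [
        sum(w * x[t + i * dilation] for i, w in enumerate(kernel))
        for t in range(len(x) - span + 1)
    ]


def stacked_receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Receptive field of stacked stride-1 layers: 1 + sum((k - 1) * d)."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


print(dilated_conv1d([1, 2, 3, 4, 5, 6, 7, 8], [1, 1, 1], dilation=2))  # [9, 12, 15, 18]
print(stacked_receptive_field(3, [1, 1, 1, 1]))  # 9  (ordinary convolution)
print(stacked_receptive_field(3, [1, 2, 4, 8]))  # 31 (dilation doubled per layer)
```

Consider four layers with three weights each. Doubling the dilation rate (1, 2, 4, 8) gives a receptive field of 31 tokens, compared with 9 for ordinary convolution. This is why dilation can reach long-range context without adding layers or parameters.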

For the proposed model and the compared models, we used cross-entropy loss, the Adam optimizer, a learning rate of \(10^{-3}\), a dropout rate of 0.5, a batch size of 32, and ROC for assessment. We implemented recent deep learning models and our proposed CRDC model on four datasets: Emotion Recognition (ER) (Saravia et al. 2018b), Toxic Comment classification (Van Aken et al. 2018), IMDB (Maas et al. 2011), and CrowdFlower (Oberländer and Klinger 2018).
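The training settings above can be made concrete with a small, framework-free sketch. The code collects the stated hyperparameters in a dictionary and implements the Adam update rule in plain Python. The values beta1 = 0.9, beta2 = 0.999, and eps = 1e-8 are the commonly used defaults and are assumptions here, because the text states only the learning rate. The names `CONFIG` and `Adam` are illustrative and do not come from the authors' code.

```python
import math
from typing import List

# Settings stated in the text; the key names are illustrative.
CONFIG = {"loss": "cross_entropy", "optimizer": "adam", "learning_rate": 1e-3,
          "dropout": 0.5, "batch_size": 32}


class Adam:
    """Plain-Python Adam. beta1, beta2 and eps are assumed common defaults."""

    def __init__(self, lr: float = 1e-3, beta1: float = 0.9,
                 beta2: float = 0.999, eps: float = 1e-8) -> None:
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m: List[float] = []  # first-moment (mean) estimates
        self.v: List[float] = []  # second-moment (uncentered variance) estimates
        self.t = 0                # step counter for bias correction

    def step(self, params: List[float], grads: List[float]) -> List[float]:
        if not self.m:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias-corrected mean
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)  # bias-corrected variance
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


opt = Adam(lr=CONFIG["learning_rate"])
print(opt.step([1.0, -1.0], [0.5, -2.0]))  # first step moves each weight by about lr
```

On the first step, bias correction makes the update roughly `lr * sign(gradient)`. The stated learning rate of \(10^{-3}\) therefore bounds how far any weight moves early in training.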

Algorithm 1 gives the main algorithm of the proposed method, presenting the operation and processing of each layer in sequence to produce the output.

Algorithm 1

Main classification algorithm

Fig. 10

Architecture of Proposed CRDC model
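The capsule layer in CRDC is described as dynamically routed. Below is a minimal plain-Python sketch of the standard routing-by-agreement procedure used in capsule networks. It assumes that the preceding layer (in CRDC, the dilated CNN) has already produced the prediction vectors `u_hat[i][j]`. The text does not specify the exact routing variant or iteration count used in CRDC, so the three iterations and toy inputs here are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def squash(s: Vector) -> Vector:
    """Capsule non-linearity: keeps the direction, maps the length into [0, 1)."""
    sq = sum(x * x for x in s)
    if sq == 0.0:
        return [0.0] * len(s)
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * x for x in s]


def softmax(z: List[float]) -> List[float]:
    top = max(z)
    e = [math.exp(v - top) for v in z]
    total = sum(e)
    return [v / total for v in e]


def dynamic_routing(u_hat: List[List[Vector]], iterations: int = 3) -> List[Vector]:
    """u_hat[i][j] is input capsule i's prediction for output capsule j.
    Returns the output capsule vectors v_j."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    v: List[Vector] = []
    for _ in range(iterations):
        c = [softmax(row) for row in b]  # coupling coefficients of each input capsule
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(squash(s_j))
        # Raise the logit where a prediction agrees (dot product) with the output.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(a * o for a, o in zip(u_hat[i][j], v[j]))
    return v
```

In the toy case used below, two input capsules agree on output capsule 0 and contradict each other on output capsule 1. Routing then concentrates the coupling on capsule 0, whose vector length (its "activation") approaches 1, while capsule 1 stays near 0.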

3.4.3 Compared result on manual implementation

This section presents the different models and compares their results with our model on four datasets.

3.4.4 Overview of different deep learning models implemented

We implemented nine commonly used models: CNN, LSTM, BiLSTM, GRU, BiGRU, CNN-LSTM, Capsule, an attention-based model, and our model. Several influential models are used for text-based sentiment analysis. Convolutional Neural Networks (CNNs) use convolutional layers to extract localized features and are particularly effective at capturing the local patterns that matter for sentiment analysis. Long Short-Term Memory (LSTM) models are designed to capture contextual information and long-term dependencies, which makes them well suited to analysing sentiment in text. Bidirectional LSTMs (BiLSTMs) process the input in both forward and backward directions. Because they consider past and future context simultaneously, they understand context better, which improves sentiment analysis. CNN-LSTM, a hybrid model, uses CNNs for initial feature extraction followed by LSTM layers for context understanding, capturing both local and global features of the text. Capsule Networks emphasize hierarchical representations and part-whole relationships, offering an alternative approach to feature extraction for nuanced sentiment analysis. Attention-based models dynamically highlight critical words or segments, focusing on the key sentiment-bearing components of the text. Individually or in combination, these models contribute substantially to achieving high accuracy and insight in sentiment analysis.
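The sketch below illustrates how an attention-based model "highlights" sentiment-bearing words. It computes dot-product attention over a sequence of token hidden states and returns the weights and the pooled context vector. In a trained model the query vector is learned; here both the query and the hidden states are hypothetical toy values, and this is a generic attention-pooling sketch rather than the configuration listed in Table 19.

```python
import math
from typing import List, Tuple

Vector = List[float]


def attention_pool(hidden: List[Vector], query: Vector) -> Tuple[List[float], Vector]:
    """Dot-product attention pooling over token hidden states.
    Returns (attention weights, context vector)."""
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the hidden states.
    context = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(len(hidden[0]))]
    return weights, context


# Three hypothetical token states; the query favours the first feature dimension.
weights, context = attention_pool([[0.0, 1.0], [2.0, 0.0], [0.0, 0.0]], [1.0, 0.0])
print([round(w, 3) for w in weights])  # [0.107, 0.787, 0.107]
```

The second token receives most of the weight because its state aligns best with the query. In a sentiment model, that is the role played by words such as "excellent" or "terrible".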

Table 19 presents the details of the implemented deep learning architectures, including layer information, size, activation function, learning rate, batch size, and number of epochs.

Table 19 Detail on the different deep learning architectures implemented

This section presents an overall analysis of the results, including performance tables, bar graphs, and figures generated from the implementation. Table 20 shows the training performance of our model and the compared models on the different datasets.

Table 20 compares multiple deep learning algorithms, including baselines, with our CRDC model on four small and large text categorization benchmarks for distinguishing multiclass and multilabel emotions from text (accuracy and loss). The key results of each method are listed in order. Results marked with "*" were obtained with our method.

Table 20 Training result comparison of multiple deep learning algorithms with our CRDC model

Table 21 compares the effectiveness of our CRDC model against multiple deep learning algorithms and baselines on four small and large text categorization benchmarks for distinguishing multiclass and multilabel emotions from text (validation set accuracy and loss). The key results of each method are listed in order. Results marked with "*" were obtained with our method.

Table 21 Validation result comparison of multiple deep learning algorithms with our CRDC model

Fig. 11 shows the training accuracy of the different deep learning models and our CRDC model on (a) ER data, (b) IMDB data, (c) CrowdFlower data, and (d) Toxic data, with each model shown in a different colour. Our model performs slightly better than the other models.

Our proposed method attains the highest accuracy on all four datasets: 88.15% on IMDB, 98.28% on Toxic, 92.24% on CrowdFlower, and 95.48% on ER. The proposed method can therefore be used for automatic sentiment analysis in different applications with high performance. Combining a dual-structured RNN and a dilated CNN with a capsule network extracts relevant features while saving time.

Fig. 11

Training accuracy of different deep learning models and our CRDC model on (a) ER data, (b) IMDB data, (c) CrowdFlower data, and (d) Toxic data

Table 22 Compared performance on IMDB dataset
Table 23 Compared performance on Toxic comment dataset
Table 24 Compared performance on CrowdFlower dataset
Table 25 Compared performance on Emotion Recognition dataset

We implemented all models on the four selected datasets. Tables 22, 23, 24, and 25 show the training accuracy of the different deep learning models on each class of the IMDB, Toxic Comment, CrowdFlower, and Emotion Recognition data, respectively. These tables show that our proposed model performs better than the other models. In Table 24, the CrowdFlower class names are abbreviated as follows: A: Worry, B: Anger, C: Hate, D: Sadness, E: Enthusiasm, F: Happiness, G: Boredom, H: Neutral, I: Fun, J: Surprise, K: Relief, and L: Love.

3.5 Confusion matrix result on test data

A confusion matrix is a fundamental tool in machine learning, particularly for classification problems. It gives a detailed and clear picture of a classification model's performance by tabulating its true positive, true negative, false positive, and false negative predictions. Fig. 12 shows the confusion matrices of our model on four datasets: IMDB, ER, CrowdFlower, and Toxic Comment data.
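The following sketch builds a confusion matrix from true and predicted class indices and derives per-class precision, recall, and F1 from it. For a binary task such as IMDB, the four cells correspond directly to the true and false positives and negatives described above. A multi-label task such as Toxic Comment would need one binary matrix per label. The toy labels are hypothetical.

```python
from typing import Dict, List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> List[List[int]]:
    """cm[i][j] counts samples whose true class is i and predicted class is j."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_metrics(cm: List[List[int]]) -> List[Dict[str, float]]:
    """Precision, recall and F1 for each class, read off the matrix."""
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, actually another class
        fn = sum(cm[k]) - tp                       # actually k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results.append({"precision": precision, "recall": recall, "f1": f1})
    return results


cm = confusion_matrix([0, 0, 1, 1, 1, 0], [0, 1, 1, 1, 0, 0], n_classes=2)
print(cm)  # [[2, 1], [1, 2]]
print(per_class_metrics(cm))
```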

Fig. 12

Confusion matrix performance on (a) IMDB data, (b) ER data, (c) Toxic Comment data, and (d) CrowdFlower data

3.6 Time complexity analysis

This research introduces CRDC, a novel hybrid deep learning model that combines a bidirectional RNN and a dilated CNN with a capsule-based hierarchical attention mechanism. The hybrid model is engineered for runtime efficiency. On the test sets of the four datasets, CRDC achieves lower runtime than several of the compared deep learning methods. Instead of a standard CNN, the proposed approach uses dilated convolution to reduce computational complexity while still capturing crucial features. Replacing traditional convolution layers with dilated ones reduces processing complexity and improves feature significance. CRDC is faster than some models that use attention mechanisms with a standard CNN and RNN. However, it is slower than simpler traditional models such as CNN, GRU, and LSTM. Future research will focus on further reducing time complexity through optimization, normalization, and regularization techniques. Table 26 reports the complexity analysis measured during testing on 20% of the IMDB, Toxic Comment, CrowdFlower, and ER datasets.
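Runtime figures such as those in Table 26 depend on hardware and implementation details, so every model should be measured the same way. The following is a minimal timing harness that assumes a hypothetical `predict` callable classifying one sample at a time. It is a sketch of a fair measurement procedure, not the authors' benchmarking code.

```python
import time
from typing import Any, Callable, Sequence


def mean_inference_time(predict: Callable[[Any], Any], samples: Sequence[Any],
                        repeats: int = 3) -> float:
    """Average wall-clock seconds per sample; the fastest of `repeats` passes
    over the test samples is kept to reduce noise from other processes."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        for x in samples:
            predict(x)
        best = min(best, time.perf_counter() - start)
    return best / len(samples)


# Hypothetical stand-in for a model's predict function.
print(mean_inference_time(lambda text: text.lower().split(), ["A great movie"] * 1000))
```

For models running on a GPU, the device would also have to be synchronized before the clock is read, otherwise asynchronous kernels make the measured time look too short.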

Table 26 Time complexity analysis

4 Result comparison of the state of the art methods

In this section, we compare the performance of existing state-of-the-art methods with our model. Table 27 gives the overall performance on the selected datasets: IMDB, Toxic, CrowdFlower, and ER. In this table, A indicates accuracy.

Table 27 Compared the performance of the state of the art method

5 Advantages and problems of deep learning approaches in sentiment analysis

This study shows that although deep learning architectures have achieved outstanding results and important advances in sentiment analysis, they still have some disadvantages. Table 28 summarizes the pros and cons of the most advanced deep learning methodologies.

Table 28 Advantages and problems of deep learning methodologies

6 Challenges in sentiment analysis using deep learning method

Sentiment analysis faces many challenges. The most common are data collection and augmentation, data validation, data pre-processing, developing models with satisfactory performance, and protecting the privacy of data and models. Other challenges include understanding the tone of long sentences, handling semantics and coherence, and analysing multimodal data and complex sentences. Further challenges are handling sarcasm, uncommon emojis, and emoji bias; dealing with idioms, phrases, and comparative sentences; handling negation properly; handling multilingual text; and dealing with short-form text. Deep learning methods have improved considerably, but there is still much scope to develop high-performance models that handle these challenges (Hussein 2018; Mohammad 2017; Saxena et al. 2022).

The challenges for different types of deep learning methods are discussed below. They may motivate researchers to develop new and improved methods for analysing textual sentiment.

  1. Data generation: In public opinion research, manual data generation is difficult (Murugaiyan and Uyyala 2023), so user feedback is typically gathered from social networks or survey results. The biggest challenge in data generation is obtaining permission and validation to make the data accessible for analysis.

  2. Data availability: Most of the data used in sentiment classification is public, while some is private. To evaluate any of these datasets, we must be aware of their labelling, augmentation, and balanced validation (Vatambeti et al. 2023).

  3. Data labelling, legislation, and verification: Sentiment analysis results are shared under tight restrictions. One of the main protocols specifies that the bare minimum of specimens and data from infected people must be collected in the shortest possible time. As a result, evaluation is becoming increasingly challenging. Before analysing data, it is necessary to validate the correctness of its labels and its validity with respect to its source and license (Dake and Gyimah 2023).

  4. Data augmentation: Data augmentation techniques are increasingly used to improve output quality. Data augmentation (Li et al. 2023) still needs new research to support advanced applications. If the original dataset has biases, data augmented from it will inherit them, so choosing the most effective augmentation approach is critical.

  5. Noisy data: Noisy text in a dataset can significantly lower prediction precision. Many studies have shown that dataset noise (Mercha and Benbrahim 2023) reduces classification performance and leads to poor predictions. Handling noisy data properly at the pre-processing stage can improve performance.

  6. Imbalanced data: Classification is especially difficult when the class distribution is highly unbalanced and misclassification costs are high. The complexity of imbalanced classification is exacerbated by factors such as dataset size, label noise, and data dispersion. Imbalanced data can be handled by resampling the sentiment text data and by other techniques (Li et al. 2023).

  7. Data preprocessing: Missing information, erroneous input, inconsistent data, regional formats, incorrect data types, file modification, and missing privacy protection are some of the issues in preparing sentiment data. Data preparation includes data cleaning, integration, transformation, and reduction, among other steps. Data cleaning is a technique for reducing noise and filling information gaps (Alqarni and Rahman 2023).

  8. Data scarcity in training deep learning models, and solutions: Data scarcity is a significant obstacle when training deep learning models for sentiment analysis, because building an accurate and robust model typically requires a substantial amount of labelled data (Alzubaidi et al. 2023). A limited dataset raises several challenges. First, models struggle to generalize. Deep neural networks in particular may memorize training samples instead of learning the underlying patterns, which limits their ability to handle new, unseen data. Second, overfitting becomes a greater concern: the model learns noise and irrelevant patterns from the limited dataset and so performs very well on training data but poorly on unseen data. A small dataset may also introduce biased representations that deviate from the true sentiment distribution and reflect the biases of the training data. Capturing the variability of sentiments expressed across diverse contexts, languages, and writing styles is difficult with limited data, and both model complexity and performance are compromised. Finally, acquiring labelled data for sentiment analysis is costly and time-consuming, demanding significant resources and expertise (Alqarni and Rahman 2023).

    Several strategies can mitigate these challenges. Data augmentation techniques, such as paraphrasing and translation, can diversify the dataset and create additional training samples (Ghorbanali and Sohrabi 2023). Transfer learning, which fine-tunes models pre-trained on related tasks for sentiment analysis, is an effective approach. Semi-supervised learning, active learning, domain adaptation, collaborative data sharing, and expert involvement in annotation are further strategies for optimizing model performance despite data scarcity. Together, these approaches can help develop accurate and robust sentiment analysis models even with limited labelled data.

  9. Handling emoji-based text: Emojis help users express feelings and convey the meaning of a text, but they also make interpretation ambiguous, which creates inefficiencies. The main challenge in handling emojis and emoticons is that they appear in an unformatted and unstructured manner (Bai et al. 2019). Proper pre-processing and emoji-handling tools can achieve good results in sentiment analysis.

  10. Handling long and complex sentences: A major challenge in sentiment analysis is handling long and complex sentences (Bensoltane and Zaki 2023). Most conventional deep learning models, such as CNN and GRU, cannot handle them well, whereas LSTM, capsule, and attention-based methods are better suited to them. However, developing more adaptable methods would be a major contribution to handling complex and long sentences.

  11. Finding and dealing with user bias: User feedback is invaluable for shaping corporate culture, refining sales strategies, and lowering employee turnover. However, because of biases, many corporations and organisations struggle to interpret this information (Rakhecha et al. 2023). These biases can come from the user or from the surveyor, who may not take a former user's comments seriously. Sentiment analysis software can help organisations better understand user opinions from surveys and online job review sites such as Glassdoor. Text analytics can help decipher the true meaning of user comments and analyse emotional responses to identify bias and eliminate human error.

  12. Working on multilingual text: When a mixture of languages enters sentiment classification, all of the issues outlined above become even more complicated. To handle negation, each language needs its own part-of-speech tagger, lemmatizer, and grammatical rules. Because every language is distinct, text cannot simply be translated into a common language such as English to extract information. Addressing these issues for multilingual data requires considerable effort. However, the results are worthwhile, because this effort yields the best possible sentiment classification performance (Khan 2023).

  13. Challenges in handling sarcasm: People use irony and sarcasm in casual chats and memes on social media. When sarcastic comments express negative sentiment, emotion-mining systems may struggle to discern what the response truly implies (Tan et al. 2023). Sarcasm is hard to handle in sentiment analysis because it is subtle, variable, and context-dependent. It operates subtly within language, so it is difficult to identify and interpret consistently across diverse linguistic styles and contexts. Context, including tone and speaker intent, is crucial for accurate detection. The linguistic complexity of sarcasm, which involves irony, understatement, or hyperbole, further complicates analysis. Divergent interpretations arising from cultural differences add to the challenge, as does the ambiguity created when the literal meaning contradicts the intended one. Sarcasm often blends positive and negative sentiment within one statement, which makes sentiment categorization complex (Rahma et al. 2023). Annotating datasets for sarcasm is also difficult because it requires nuanced human judgment. In addition, speakers sometimes deliberately hide their sarcastic intent, which makes automated analysis harder still. Overcoming these obstacles requires advanced natural language processing techniques, contextual analysis, sentiment lexicons, and models capable of understanding subtle linguistic nuances and contextual dependencies.

  14. Challenges in handling negation in sentiment analysis: Negation adds a layer of complexity to sentiment analysis, with challenges that affect the accuracy and reliability of sentiment interpretation. One significant hurdle is identifying the scope of negation within a sentence, that is, determining whether it affects a single word, a phrase, or the entire sentence (Rahman et al. 2023). Double negations amplify the challenge, requiring a nuanced understanding of their cumulative effect on sentiment polarity. Syntactic variation in negation expressions, such as "not," "never," or "neither," adds further intricacy and requires comprehensive recognition and handling. Resolving the ambiguity of certain negations, such as "not bad," demands careful interpretation, because such phrases are hard to classify as slightly positive or negative. Nested negations and the contextual sensitivity of negation add further complexity, because their combined effects on sentiment must be disentangled accurately. Domain-specific expressions of negation also call for specialized handling that tailors sentiment analysis models to the linguistic conventions of each domain. Finally, machine learning methods still struggle to capture subtle negations and complex linguistic structures. Ongoing advances are therefore needed to improve the accuracy of sentiment analysis in the presence of negation (Kaur and Sharma 2023). Addressing these challenges requires a multi-faceted approach that combines advances in natural language processing, context-aware models, and continuous refinement of machine learning algorithms.

  15. Dealing with context, idioms, and comparative sentences: Context plays a pivotal role in negation, because the surrounding context can reverse the sentiment of a single word, so the broader linguistic environment must be considered (Altaf et al. 2023). Idioms are different, because their meaning differs from the literal meaning of their words. Comparative sentences can be challenging because comparative phrases do not always express a viewpoint. Knowledge graphs model the relationships between concepts, words, and emotions. In addition, sarcasm and irony, which often use negation, can significantly alter the intended sentiment and require sophisticated analysis to reveal their true meaning (Tahayna and Ayyasamy 2023).

  16. Handling semantics and syntactic dependency: Semantics is the study of the meaning of words, phrases, and sentence constructions. It shapes our understanding of written text as well as of other people's words in everyday conversation. Dependency parsers link the words of a sentence through grammatical relations, which allows the syntactic interactions between words to be identified (Asudani et al. 2023).

  17. Model development: Accurate classification is especially troublesome when the sentiment class distribution is highly uneven and misclassification costs are high (Khan 2023). Factors such as dataset size, label noise, and data dispersion increase the complexity of imbalanced classification. A deep learning model also runs into difficulties when training data is lacking, data quality is poor, features are irrelevant, training data is unrepresentative, or the model overfits or underfits (Tahayna and Ayyasamy 2023).

6.1 Trustworthy deep learning requirements in text-based sentiment analysis

Creating a trustworthy sentiment analysis system using deep learning for text-based data involves several key requirements to ensure accuracy, reliability, and ethical considerations (Albahri et al. 2023). Here are the essential requirements:

  1. High-Quality Data:

    Ensure a diverse and representative dataset with a wide range of sentiments and contexts to train the deep learning model accurately (Vatambeti et al. 2023).

  2. Annotation of Training Data:

    Add accurate sentiment labels (positive, negative, neutral) to the dataset to facilitate supervised learning and model training (Dake and Gyimah 2023).

  3. Preprocessing and Cleaning of Text:

    Apply preprocessing techniques such as standardizing text, tokenization, removing stop-words, and stemming to clean the data and enhance consistency (Alqarni and Rahman 2023).

  4. Maintaining a Balanced Dataset:

    Maintain a balanced distribution of sentiment classes in the training dataset to prevent bias and enhance the model’s capacity to handle diverse sentiments (Alzubaidi et al. 2023).

  5. Selection of Appropriate Models:

    Choose suitable deep learning models like recurrent neural networks (RNNs), long short-term memory (LSTM), convolutional neural networks (CNNs), or transformer-based models (e.g., BERT, GPT) for effective sentiment analysis (Khan 2023).

  6. Optimization of Hyperparameters:

    Optimize hyperparameters such as learning rate, batch size, and model architecture to enhance the model’s performance and convergence (Bischl et al. 2023).

  7. Training and Validation of the Model:

    Train the model on a portion of the dataset and evaluate its performance on a separate validation set to detect overfitting and confirm that the model generalizes (Haque et al. 2023).

  8. 8.

    Performance Evaluation Metrics:

    Employ accurate evaluation metrics like accuracy, precision, recall, F1-score, or area under the receiver operating characteristic curve (AUC-ROC) to gauge the model’s effectiveness (Habbat et al. 2023).

  9. 9.

    Utilization of Ensemble Methods:

    Consider employing ensemble learning methods to amalgamate predictions from multiple models, thereby improving overall accuracy and robustness of sentiment analysis (Tiwari et al. 2023).

  10. 10.

    Ethical Considerations and Fairness:

    Address bias and fairness concerns in the dataset and model predictions to ensure equitable treatment of diverse demographic groups and steer clear of reinforcing stereotypes (Basir et al. 2023; Wang et al. 2023).

  11. 11.

    Interpretability and Explanation:

    Prioritize models that provide explanations for their predictions to enhance transparency and cultivate user trust in the system’s decision-making (Saha et al. 2023).

  12. 12.

    Regularization and Dropout Techniques:

    Implement regularization techniques like dropout to prevent overfitting and enhance the model’s ability to generalize (Zhou et al. 2023).

  13. 13.

    Ongoing Model Update and Fine-Tuning:

    Regularly update and fine-tune the model with new data to adapt to evolving language and changing sentiments in the text (Sheu et al. 2023).

  14. 14.

    Update and Fine-Tuning:

    Periodically update and fine-tune the model with new data to adapt to evolving language and sentiments in the text (Ding et al. 2023).

  15. 15.

    Testing and Validation in Real-World Scenarios: Testing the model in real-world scenarios to ensure its effectiveness, reliability, and accuracy across various domains and use cases (Jiang et al. 2023; Benarafa et al. 2023).
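Requirement 3 above can be illustrated with a small text-cleaning function. The sketch below assumes English input and uses only Python's `re` module; the stop-word list and the suffix-stripping rule are hypothetical and deliberately tiny (the rule is a crude stand-in, not the Porter stemmer). Negation words are intentionally kept out of the stop-word list, because removing "not" would reverse the sentiment of a phrase such as "not boring".

```python
import re

# Hypothetical, deliberately tiny stop-word list. Negations ("not", "no")
# are kept on purpose because they carry sentiment.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were", "this", "that", "it", "of", "to", "and"}

# Crude suffix stripping for illustration only (NOT the Porter stemmer).
SUFFIXES = ("ing", "ed", "ly", "s")


def normalize(text):
    """Lowercase, drop URLs, and replace everything except letters/apostrophes with spaces."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    return re.sub(r"[^a-z'\s]", " ", text)


def tokenize(text):
    """Split normalized text into word tokens."""
    return re.findall(r"[a-z']+", text)


def crude_stem(token):
    """Strip the first matching suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Normalize, tokenize, remove stop-words, then stem."""
    return [crude_stem(t) for t in tokenize(normalize(text)) if t not in STOP_WORDS]


print(preprocess("The movie was NOT boring, loved it! https://x.co"))
# ['movie', 'not', 'bor', 'lov']
```

The stems `bor` and `lov` show how aggressive stemming can distort words; an established stemmer or lemmatizer is usually preferable in practice.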

By adhering to these requirements, we can develop a trustworthy deep learning-based sentiment analysis system that provides reliable sentiment predictions for text-based data.
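Requirement 8 matters most on imbalanced data, where accuracy alone can be misleading. The dependency-free sketch below computes accuracy together with per-class precision, recall, and F1 and their unweighted (macro) average; the function name `evaluate_predictions` is hypothetical, and AUC-ROC is omitted because it requires predicted scores rather than hard labels.

```python
def evaluate_predictions(y_true, y_pred):
    """Return (accuracy, macro_f1, per_class) where per_class maps label -> metrics."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    macro_f1 = sum(m["f1"] for m in per_class.values()) / len(per_class)
    return accuracy, macro_f1, per_class


# A degenerate model that always predicts the majority class.
y_true = ["pos"] * 8 + ["neg"] * 2
y_pred = ["pos"] * 10
accuracy, macro_f1, per_class = evaluate_predictions(y_true, y_pred)
print(round(accuracy, 2), round(macro_f1, 2))  # 0.8 0.44
```

The majority-class predictor reaches 80% accuracy yet never detects a negative review, which the macro-F1 of about 0.44 exposes.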

7 Limitations and future works in sentiment analysis

Machines cannot yet discern emotion the way a person does. Recent work on sentiment analysis from text has proven quite accurate, yet the handling of coherence, context, semantic meaning, negation, modifiers, and intensifiers in a sentence is still lacking. Context-based approaches are being developed to address this problem. Existing works have shortcomings in managing semantics and word dependencies in sentences. Most of the traditional methodologies mentioned above make limited use of linguistic knowledge, and document aspects can be difficult to interpret in depth. Their learning after feature encoding is also insufficient. Deep learning techniques can handle text features with a wide range of independence. However, the predictions made by a CNN do not encode the position and orientation of an item: the network discards internal information about position and direction and sends all of the information to the same neurons, which may not be capable of handling it. As a result, deep learning has limits in effectively interpreting contextual and grammatical information. Our results show that the proposed CRDC outperforms other deep learning approaches. Furthermore, deep learning methods require a large dataset for model training.
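To make the negation and intensifier problem concrete, the toy scorer below applies two hand-written rules: an intensifier scales the next sentiment word, and a negation flips the polarity of words within a fixed window. The lexicons, weights, and window size are hypothetical illustrations, not taken from this survey or from any published lexicon.

```python
# Hypothetical toy lexicons; the values are illustrative only.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0, "boring": -1.0}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0, "slightly": 0.5}
NEGATIONS = {"not", "no", "never"}


def lexicon_score(tokens, negation_window=3):
    """Sum word polarities, scaling by intensifiers and flipping inside a negation window."""
    total = 0.0
    multiplier = 1.0  # pending intensifier effect for the next non-intensifier token
    negated = 0       # number of following tokens still inside the negation scope
    for tok in tokens:
        in_scope = negated > 0
        negated = max(negated - 1, 0)
        if tok in NEGATIONS:
            negated = negation_window
        elif tok in INTENSIFIERS:
            multiplier *= INTENSIFIERS[tok]
        else:
            polarity = LEXICON.get(tok, 0.0) * multiplier
            total += -polarity if in_scope else polarity
            multiplier = 1.0
    return total


for phrase in (["good"], ["very", "good"], ["not", "good"], ["not", "very", "good"], ["bad"]):
    print(" ".join(phrase), lexicon_score(phrase))
```

These rules score "not very good" at -1.5, more negative than "bad" (-1.0), although a human reader would judge the phrase as only mildly negative, and a fixed window ignores clause boundaries. Such errors are the kind of negation and context handling that the paragraph above identifies as still lacking.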

Furthermore, a movie's sentiment and genre could be predicted simply from its trailer. Multimodal data are increasingly used to assess attitudes, and live (real-time) sentiment analysis of a game, a product, or other entities is now an emerging task.

We plan to develop a cloud-based sentiment analysis system, as shown in Fig. 13. First, data will be collected from different sources, such as Twitter, Facebook, or other social media, through APIs, surveys, or web scraping. The collected data will be stored in a cloud-based storage system. If the data have not been preprocessed, our online text preprocessing module will preprocess and embed them. After preprocessing, a deep learning model will extract textual features, and the extracted features will be classified using a machine learning algorithm. Finally, the predicted results will be presented.
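The planned workflow in Fig. 13 can be sketched as a chain of stages. The sketch below is a hypothetical illustration only: the in-memory store stands in for cloud storage, the records are hard-coded rather than collected through an API, survey, or web scraping, and the keyword-count feature extractor and rule-based classifier are placeholders for the deep learning feature extractor and machine learning classifier described above. The one design point it captures is that preprocessing runs only for records that arrive unprocessed.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Record:
    source: str                          # e.g. "twitter" or "survey" (illustrative)
    text: str
    tokens: Optional[List[str]] = None   # None means the record arrived unprocessed


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for the cloud storage layer."""
    records: List[Record] = field(default_factory=list)

    def put(self, batch: List[Record]) -> None:
        self.records.extend(batch)


def run_pipeline(
    store: InMemoryStore,
    preprocess: Callable[[str], List[str]],
    extract_features: Callable[[List[str]], List[float]],
    classify: Callable[[List[float]], str],
) -> List[Tuple[str, str, str]]:
    results = []
    for rec in store.records:
        if rec.tokens is None:           # preprocess only data that is not yet clean
            rec.tokens = preprocess(rec.text)
        features = extract_features(rec.tokens)
        results.append((rec.source, rec.text, classify(features)))
    return results                       # "presentation" here is just a returned list


# Placeholders for the deep feature extractor and the downstream classifier.
POSITIVE, NEGATIVE = {"love", "great"}, {"hate", "awful"}


def toy_features(tokens: List[str]) -> List[float]:
    return [float(sum(t in POSITIVE for t in tokens)), float(sum(t in NEGATIVE for t in tokens))]


def toy_classifier(features: List[float]) -> str:
    pos, neg = features
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


store = InMemoryStore()
store.put([
    Record("twitter", "I love this phone"),
    Record("survey", "Awful battery", tokens=["awful", "battery"]),
])
for row in run_pipeline(store, lambda s: s.lower().split(), toy_features, toy_classifier):
    print(row)
```

Because each stage is passed in as a function, the placeholder extractor and classifier could be swapped for trained models without changing the orchestration code.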

8 Discussion with recommendation

In this paper, we examined the most notable work on sentiment analysis using deep learning-based architectures. To begin, we discussed sentiments and their various forms, as well as their use and significance in sentiment analysis. Then, we offered a taxonomy of sentiment classification that includes customized feature-based approaches as well as machine-learned feature-based approaches. We reviewed the architectures of prominent deep learning models, such as CNNs, RNNs, LSTM, GRU, attention, and capsule networks, as well as the essential and well-known work that has used these architectures for sentiment classification. We also gave a brief overview of attention-based networks, RNNs, and capsule networks, all of which have recently attracted researchers' interest. We identified sentiment classification tasks and described the deep learning models that can be used to solve them. Compared with other deep learning models, we found that LSTM produced better outcomes. Finally, we examined the primary sentiment classification datasets, their important features, the deep learning models applied to them, and the accuracy (or F1 score) obtained on them. We also discussed the importance of researchers developing new datasets, the drawbacks of using deep learning in sentiment classification, and the advantages and disadvantages of various deep learning and neuro-symbolic techniques. Having analysed over 200 articles, we conclude that sentiment classification using deep learning is a promising research area, based on our extensive examination of multiple deep learning-based approaches and the state-of-the-art results reported in this paper. Recommendations for different tasks in deep learning-based sentiment analysis are presented in Table 29.

Table 29 Recommendation of tasks in deep learning based sentiment analysis
Fig. 13 Cloud-based sentiment analysis

8.1 Impact of practical implications over conceptual implications in terms of this research

A practical implication refers to what would happen in reality if certain conditions were met. For example, when a computer performs sentiment experiments in place of a human analyst, the reliability of the data it acquires has practical ramifications for how reliably humans can judge the usefulness of the detected sentiments. Conceptual research, by contrast, is carried out by observing and evaluating previously available material on a particular issue. It does not involve any real experiments and deals with abstract notions or ideas. Conceptual definitions are nevertheless important in research because they help the reader understand the researcher's meaning, particularly for terms or ideas that may have several, possibly contradictory, interpretations. For this reason, we conceptually reviewed sentiment analysis approaches together with their background and recent applications. We then critically analysed different methods and proposed a method. In terms of practical implications, we implemented different methods and compared our method with recent ones on practical sentiment prediction.

9 Conclusion

This paper suggests that deep learning delivers higher accuracy than other methods in text-based sentiment classification. Although supervised machine learning methods can be fast and reach high accuracy, they can also be time-consuming to develop, and these algorithms cannot handle negation, intensifier, or modifier clauses in a sentence. Deep learning, however, outperforms other methods in terms of accuracy, data size, feature handling, context, and syntax. More precisely, most of the limitations of sentiment analysis methods can be overcome if a knowledge-based mechanism is integrated with deep learning. This survey concludes that RNNs with attention or capsule-based features give better results than other deep learning methods. Neuro-symbolic AI also achieves improved performance.

This paper presents a novel discussion of deep learning-based sentiment analysis with a taxonomy, recent applications, and detailed dataset information. It provides a complete discussion of data preprocessing, performance metrics, text embedding, and deep learning models with their implementation architectures, as well as overall drawbacks, challenges, limitations, and future work. Additionally, we have critically analysed all categories of deep learning-based sentiment analysis methods with their tools, advantages, disadvantages, findings, and research gaps. Our proposed CRDC method combines a capsule network with a bidirectional RNN and a deep CNN. It achieved accuracies of 88.15% on IMDB, 98.28% on Toxic, 92.34% on Crowdflower, and 95.48% on ER.

However, text-based sentiment analysis still faces challenges in handling coherence, negation, intensifiers, and semantic meaning, so the current research trend is to improve existing methods or develop new ones that address them. Many tasks and challenges remain, such as handling multimodal data, multiple data sources, live data, and feedback data. This points to the need to explore new computational techniques that improve the accuracy of sentiment analysis on the social web and in related areas.