A systematic review of social network sentiment analysis with comparative study of ensemble-based techniques

Tiwari, Dimple; Nagpal, Bharti; Bhati, Bhoopesh Singh; Mishra, Ashutosh; Kumar, Manoj

doi:10.1007/s10462-023-10472-w

Open access
Published: 12 April 2023

Volume 56, pages 13407–13461 (2023)

Artificial Intelligence Review

Dimple Tiwari¹,
Bharti Nagpal²,
Bhoopesh Singh Bhati³,
Ashutosh Mishra⁴,⁶ &
Manoj Kumar⁵,⁷ (ORCID: orcid.org/0000-0001-5113-0639)

Abstract

Sentiment Analysis (SA) of text reviews is an emerging concern in Natural Language Processing (NLP). It is a widely used method for analyzing and extracting opinions from text using individual or ensemble learning techniques. The field has clear potential in the digital world and on social media platforms. We therefore present a systematic survey that organizes and describes the current state of SA and provides a structured overview of proposed approaches, from traditional to advanced. This work also discusses SA-related challenges, feature engineering techniques, benchmark datasets, popular publication platforms, and the best algorithms for advancing automatic SA. Furthermore, a comparative study has been conducted to assess the performance of bagging- and boosting-based ensemble techniques for social network SA. Bagging and boosting are two major approaches to ensemble learning, each comprising various ensemble algorithms for classifying sentiment polarity. Recent studies suggest that ensemble learning techniques are well suited to sentiment classification. This analytical study examines bagging- and boosting-based ensemble techniques on four benchmark datasets to provide extensive knowledge about ensemble techniques for SA. The efficiency and accuracy of these techniques have been measured in terms of TPR, FPR, weighted F-score, weighted precision, weighted recall, accuracy, ROC-AUC, and run time. The comparative results reveal that bagging-based ensemble techniques outperformed boosting-based techniques for text classification. This extensive review aims to present benchmark information on social network SA that will be helpful for future research in this field.

1 Introduction

With the steady growth of information technology and social platforms, user-generated content can easily be posted online, and this content carries people's sentiments and emotions toward particular issues. Governments, companies, and individuals are interested in retrieving the sentiments behind these reviews. Unfortunately, given the massive amount of data, it is challenging to determine the polarity of these comments and reviews, and human experts are too expensive to label them manually. Accordingly, SA is gaining popularity as a research topic (Chen and Yang 2011). It is a widely used method for analyzing and extracting opinions from text using individual or ensemble learning techniques. The field has clear potential in the digital world and on social media platforms. The vast content generated on the web is unstructured; SA can process it and convert it into meaningful information. SA is a subfield of NLP that combines computational linguistics, rule-based approaches, and machine learning to extract public opinion from content posted on social platforms, including text, images, and videos. Depending on the requirements of a particular application, sentiment classification is primarily handled at the aspect, sentence, or document level. Aspect-based SA, also known as feature-level SA, extracts multiple features from text reviews. It provides a deep study of reviews and extracts the reviewers' context for a particular domain (Thet et al. 2010; García-Pablos et al. 2018). The aspect-level approach mainly depends on the syntactic features of the text reviews (Che et al. 2015). Sentence-based SA finds the polarity of a particular sentence. Here, words are linked together to form a sentence, and the N-gram technique, which groups words into sequences of one, two, or three, is used to extract the polarity of that sentence. The N-gram technique sometimes fails to capture the relationships between these words. Therefore, dependency trees and typed dependencies have been introduced to address the word-separation problem in text classification (Meena and Prabhakar 2007). Sentence-level classification treats each sentence as a separate unit and assumes that every sentence expresses only one opinion: positive, negative, or neutral (Jagtap and Pawar 2013). In the document-based approach, each document is considered a single unit, and a single opinion is assigned to the whole document. The bag-of-words approach is very popular and provides higher accuracy in handling complexity in document-level SA (Bhatia et al. 2015). Most sentence-level applications try to achieve good accuracy over the whole document (Zhang et al. 2009). SA and opinion mining are two popular fields that help to extract opinionated information from online social platforms. The two terms are commonly used interchangeably. However, some researchers use them for slightly different problems: SA detects whether the sentiment of a review is neutral, negative, or positive, whereas opinion mining analyzes a text's subjectivity (Tsytsarau and Palpanas 2012). Previous research frequently employed machine learning and heuristic-based methods. Heuristic-based methods mainly depend on semantic features and linguistic characteristics, whereas machine learning-based algorithms are classified into unsupervised, supervised, and ensemble learning.
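To make the N-gram and bag-of-words representations mentioned above concrete, the following is a minimal sketch in plain Python. The whitespace tokenizer, the function names, and the example sentence are illustrative assumptions, not the preprocessing used in this study.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def ngrams(tokens, n):
    """Return every contiguous run of n tokens, in order, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(text):
    """Count word occurrences, discarding word order entirely."""
    return Counter(tokenize(text))


# Hypothetical review sentence used only for illustration.
tokens = tokenize("the battery life is not good")
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
```

In this example the bigram `("not", "good")` keeps the negation next to the word it modifies, whereas the unigrams and the bag-of-words counts treat "not" and "good" as unrelated features. Longer-range relationships fall outside any fixed N, which is the limitation that dependency-based methods are meant to address.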

Several articles have been published on SA using different techniques, which creates a need for an in-depth study that summarizes the trends and aspects of SA. A comparative study and a detailed survey were presented by Xia et al. (2011) and Giachanou and Crestani (2016), respectively. Xia et al. (2011) provided a comparative study of ensemble-based techniques for SA but did not cover the advanced ensemble approaches in this field. Giachanou and Crestani (2016) presented an in-depth survey of Twitter SA and summarized previously proposed approaches. However, that survey did not implement any recent techniques for comparative discussion and did not explore the latest developments in the field. Here, we provide a detailed SA survey and present the recent facts and trends related to this field. This study investigated research work from 1996 to 2022 using online repositories and tried to cover all the essential aspects of SA, providing deeper information to upcoming researchers in a single manuscript. Extensive experiments have also been conducted on different domains to identify the best ensemble approach for the sentiment classification task. This analytical study was mainly conducted for sentence-level SA using ensemble machine-learning techniques. The ensembles tested fall into two major categories: bagging and boosting. Eight ensemble learners were implemented, five from the boosting approach and three from the bagging approach. Figure 1 presents the summarized taxonomy of our social network SA survey.
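As a rough illustration of the bagging category named above, the sketch below trains several copies of a toy base learner on bootstrap resamples of the training set and combines them by majority vote. The word-counting base learner, the four training sentences, and all function names are hypothetical choices made for brevity; they are not the learners evaluated in this study.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement (the bagging resample)."""
    return [rng.choice(data) for _ in data]


def train_word_voter(sample):
    """Toy base learner: record how often each word occurs under each label."""
    counts = {}
    for text, label in sample:
        for word in text.lower().split():
            counts.setdefault(word, Counter())[label] += 1
    return counts


def predict_one(model, text, default="neutral"):
    """Sum the per-label counts of the words in text; return the top label."""
    totals = Counter()
    for word in text.lower().split():
        totals.update(model.get(word, {}))
    return totals.most_common(1)[0][0] if totals else default


def majority_vote(labels):
    """Return the label predicted by the largest number of base learners."""
    return Counter(labels).most_common(1)[0][0]


def bagging_predict(models, text):
    """Ask every base learner for a label and combine them by voting."""
    return majority_vote([predict_one(m, text) for m in models])


# Hypothetical labeled sentences, for illustration only.
train = [
    ("great phone love it", "positive"),
    ("terrible battery hate it", "negative"),
    ("love the great screen", "positive"),
    ("hate the terrible camera", "negative"),
]
rng = random.Random(0)  # fixed seed so the resamples are reproducible
models = [train_word_voter(bootstrap_sample(train, rng)) for _ in range(9)]
prediction = bagging_predict(models, "great phone")
```

Each base learner sees a slightly different resample, so individual errors tend to differ and are partly cancelled by the vote. Boosting differs in that learners are trained sequentially, each focusing on the examples its predecessors misclassified.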

In an ensemble approach, multiple learners are combined to obtain more accurate and efficient results than individual learners. Ensemble methods have been used in NLP applications and have proven better than single methods (Zhang et al. 2009). For example, Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) models combined by averaging produce better results than either model alone (Minaee et al. 2019). Although governments, businesses, and individuals are always interested in determining the polarity and sentiment of reviews, no consistent conclusion is available on which methodology is best for this process. Therefore, to reach conclusive results, this study compares eight ensemble techniques on four popular datasets to investigate the performance of ensemble models for SA. The main objective of this study is to explore the latest research on sentiment classification together with a comparative analysis of ensemble-based techniques. To that end, we pose five research questions.
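The averaging idea mentioned above can be sketched generically: each model outputs a probability per class, the ensemble averages those probabilities, and the class with the highest mean is selected. The probability values below are invented for illustration, and the variable names `cnn_probs` and `lstm_probs` are placeholders; this is not the architecture or the numbers reported by Minaee et al. (2019).

```python
def average_probabilities(model_outputs):
    """Average class-probability vectors element-wise across models."""
    n_models = len(model_outputs)
    n_classes = len(model_outputs[0])
    return [
        sum(output[c] for output in model_outputs) / n_models
        for c in range(n_classes)
    ]


def predict_label(model_outputs, labels):
    """Return the label with the highest averaged probability, plus the averages."""
    averaged = average_probabilities(model_outputs)
    best_index = max(range(len(averaged)), key=averaged.__getitem__)
    return labels[best_index], averaged


LABELS = ["negative", "neutral", "positive"]

# Hypothetical per-class probabilities from two separately trained models.
cnn_probs = [0.50, 0.10, 0.40]
lstm_probs = [0.20, 0.10, 0.70]

label, averaged = predict_label([cnn_probs, lstm_probs], LABELS)
```

With these made-up numbers, the first model alone would choose "negative", but the averaged distribution favors "positive" because the second model is considerably more confident. This is the sense in which averaging lets one learner correct another.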

RQ1 What are the different approaches, publishing platforms, and benchmark datasets used by researchers for SA?

The aim is to discover the most popular approaches and datasets used in the field of SA. This would help researchers understand the current state of the area.

RQ2 What are the major challenges researchers face when determining sentiment from text reviews?

The aim is to discuss the challenges in the field of NLP along with their proposed solutions.

RQ3 What are the distinct feature engineering techniques for selecting the essential features from text reviews?

The aim is to explain the various feature engineering techniques for dimensionality reduction of text datasets. To this end, many key research papers were collected from different publishing sites to map popular feature engineering techniques for text datasets.

RQ4 Which emotion theories do researchers use to detect emotions in social content, including text, images, and videos?

The aim is to identify the common emotions present in the emotion sets of prominent theorists. This would provide future researchers with the best emotion set for opinion extraction from social content, including text, images, and videos.

RQ5 Which is the best ensemble technique for sentiment classification, and what are the future opportunities of SA?

The aim is to discover the ensemble technique that provides the best results across all standard measures. Hence, various experiments were conducted on different domains to select the best technique for text classification, which would be helpful for SA-related applications. Future opportunities related to SA are also discussed.

The remaining sections of this study are organized as follows: Sect. 2 presents an extensive literature survey of SA. Section 3 elaborates on the important aspects of SA. Section 4 describes the methodology used for the comparative study. Section 5 presents the comparative results and analysis. Section 6 discusses the future opportunities of SA. Finally, Sect. 7 presents the study's conclusions and addresses some open issues for future research.

2 Literature survey

SA is extensively used to extract people's opinions, emotions, and sentiments toward a particular brand, business, place, or product. As the demand for SA increases, various techniques and approaches have been introduced to classify sentiments. After analyzing the vast literature on sentiment classification, we conclude that SA relies on five significant approaches. Figure 2 presents the classification of the major approaches used by researchers for sentiment classification.

First, the lexicon-based approach uses a manually or automatically generated list of positive, negative, or neutral polarity terms for sentiment classification. It computes the semantic orientation of phrases and words in sentences and documents to reveal the sentiments. Usually, the lexicon-based approach uses adjectives as indicators of semantic orientation (Taboada et al. 2011). Second, the machine learning approach is a widely adopted technique for SA. Most researchers prefer a machine learning-based approach for sentiment classification because of its fast execution and reliable results. Machine learning provides various single learners, namely Naïve Bayes (NB), K-Neighbors (KN), Linear Regression (LR), Support Vector Machine (SVM), and so forth. Third, the graph-based approach selects the nodes and vertices based on the features (reviews and tweets) available in the input materials. Various graph-based models, such as Enterprise Graphs, Hyper-graph, Hashtag Graphs, N-Gram Graph, and Co-Occurrence Graph, are available for an effective SA process (Krishnakumari and Akshaya 2019). Fourth, the ensemble approach combines multiple weak learners to form a powerful learner. Various ensemble learners, namely Random-Forest, Extra-Tree, Meta-Estimator, Ada-Boost, Gradient-Boosting, Light-GBM, Cat-Boost, and Extreme Gradient-Boost, are available to make the sentiment process more effective than the lexicon approach and the single machine learning approach. Fifth, the hybrid approach, the most potent of the five, enhances the capability of a sentiment classification model by integrating machine learning with the lexicon-based approach or by combining multiple machine learning algorithms. A hybrid approach is a novel idea that researchers present to build a richer and more robust model for solving a particular problem. Researchers perform various experiments with different techniques on specific data and try to create a model that is more effective than single and ensemble models. For example, a linguistic dictionary and an SVM were combined to build a hybrid model for classifying the sentiment of political tweets; it achieved 93% accuracy, which is significant and can help politicians plan strategies for future elections (Nandi and Agrawal 2016). Here, we categorize all the previous research into two parts: Sentiment Analysis (SA), which studies the subjective information in the text, and Sentiment Classification (SC), which identifies the opinions in the text and assigns a particular label to them.
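The statement that an ensemble combines multiple weak learners into a powerful learner can be illustrated with a small AdaBoost-style sketch. The weak learner here is a one-threshold "decision stump" on a single numeric feature, and the six data points are invented so that no single stump can separate them. This is a textbook-style simplification under those assumptions, not the Ada-Boost configuration used in the comparative study.

```python
import math


def stump_predict(x, threshold, polarity):
    """Weak learner: predict polarity when x > threshold, else the opposite."""
    return polarity if x > threshold else -polarity


def best_stump(xs, ys, weights):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            error = sum(
                w for w, x, y in zip(weights, xs, ys)
                if stump_predict(x, threshold, polarity) != y
            )
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost_fit(xs, ys, rounds):
    """Train `rounds` stumps, reweighting the examples after each round."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = best_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - error) / error)  # this stump's vote weight
        ensemble.append((alpha, threshold, polarity))
        # Misclassified examples gain weight; correctly classified ones lose it.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def adaboost_predict(ensemble, x):
    """Weighted vote of all stumps; labels are +1 and -1."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical 1-D feature values with labels +1 / -1.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
one_round = adaboost_fit(xs, ys, rounds=1)
three_rounds = adaboost_fit(xs, ys, rounds=3)
```

On this toy data a single stump misclassifies two of the six points, while the weighted vote of three stumps classifies all six correctly, because each round concentrates weight on the points that earlier stumps got wrong.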

2.1 Lexicon-based approach

Lexicon-based approaches work with phrases and opinion words without prior knowledge of labels. Here, collections of phrases are treated as an opinion lexicon, along with negative and positive words. Opinion lexicons determine the orientation of the terms in the text dataset. The lexicon-based approach is divided into two parts: the dictionary-based approach, which judges sentiment based on the phrases available in lexicons, and the corpus-based approach, which extracts the context present in the text. Table 1 reports the list of lexicon-based research from 2011 to 2022.
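A dictionary-based scorer of the kind described above can be sketched in a few lines. The word lists are tiny hand-made placeholders rather than a published lexicon, and the rule that a negator flips only the next word is a simplifying assumption added for illustration.

```python
# Hypothetical miniature opinion lexicon; real lexicons hold thousands of entries.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}
NEGATORS = {"not", "never", "no"}


def lexicon_polarity(text):
    """Label text by summing word polarities; needs no labeled training data."""
    score = 0
    negate = False
    for word in text.lower().split():
        if word in NEGATORS:
            negate = True  # flip the polarity of the very next word only
            continue
        value = (word in POSITIVE) - (word in NEGATIVE)  # +1, -1, or 0
        score += -value if negate else value
        negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For instance, `lexicon_polarity("not good")` yields "negative" because the negator flips the following word, while a phrase such as "not very good" would be scored as positive under this simple rule. Handling that kind of context is what motivates corpus-based methods.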

Table 1 Lexicon-based approach for SA from 2011 to 2022
