Sentiment Analysis of Tweets using Text and Graph Multi-View Learning

With the surge of deep learning frameworks, various studies have attempted to address the challenges of sentiment analysis of tweets (data sparsity, under-specificity, noise, and multilingual content) through text-based and network-based representation learning approaches. However, few studies have combined the benefits of textual and structural (graph) representations for sentiment analysis of tweets. This study proposes a multi-view learning framework (end-to-end and ensemble-based) that leverages both text-based and graph-based representation learning approaches to enrich the tweet representation for sentiment classification. The efficacy of the proposed framework is evaluated over three datasets against suitable baseline counterparts. Across various experimental studies, it is observed that combining the textual and structural views achieves better sentiment classification performance than either view alone.


Introduction
With the growing popularity of Twitter, tweets have been widely used for sentiment analysis studies in recent times. Unlike regular text, sentiment analysis of tweets needs to handle inherent challenges such as under-specificity due to the limited character count (240 characters), informal writing styles, misspellings, code-switching, code-mixed content, etc. Researchers have adopted various approaches, such as sentiment-specific representation learning [1], tweet expansion [2], user relationship characteristics [3], multi-source information [4], and ensembling [5], to mitigate the above challenges. Earlier studies of sentiment analysis primarily focus on textual views; however, recent studies have shown the advantages of exploiting network embedding for sentiment analysis of tweets [2,6-8]. In the studies [2,6], the authors construct a global network from a tweet corpus and learn representations of attributes such as keywords, hashtags, and users for sentiment classification, whereas the studies [7,8] construct a local network using a dependency parse tree of each individual tweet and learn a representation of the tweet for classification. These studies have shown that capturing structural information helps enhance sentiment analysis performance. It is also reported that network embedding is less sensitive to the social media-related noise mentioned above.
Though the above studies show the prospect of enhancing sentiment classification of tweets by incorporating both graphical and textual views, they have a few limitations. The studies of Singh et al. [2] and Lu et al. [6] perform node embedding exclusively over the tweet corpus. The node embedding is subsequently incorporated into the individual textual view to capture the tweet's sequential information. These approaches do not consider incorporating the textual and structural views side by side. On the other hand, Meng et al. [7] and Zhang et al. [8] investigate the use of a dependency parse tree to represent a tweet as a network structure. However, adopting a dependency parse tree may not always be feasible for multilingual (code-switched and code-mixed) content and informal textual constructs. In such cases, a tweet should be represented in a graph structure that is language-insensitive yet captures the semantic relationships of the words when incorporating both structural and textual information. It is also observed that these studies combine the textual and structural views progressively, where the word representations from the textual view are passed on to the structural view to represent the nodes (words). These approaches do not consider complementing both views simultaneously in parallel.
Motivated by the above observations (i.e., the advantages of capturing structural information in the tweet and of using network embedding), this paper proposes a multi-view neural model that exploits both the textual and structural properties in parallel for an improved sentiment analysis system, and attempts to answer two research questions: (i) How informative is a graph-based representation of a tweet compared to a text-based representation? (ii) Do the text-based and graph-based representations complement each other? This study uses the heterogeneous multi-layer network proposed in the study of Singh et al. [2] to represent a tweet. A multi-layer network is a network formed by connecting different layers of networks. For example, a tweet or a collection of tweets can be represented as a heterogeneous multi-layer network by connecting layers of mention relations, hashtag relations, and keyword co-occurrence relations. Figure 1 shows an example of representing a tweet as a heterogeneous multi-layer network. Since the heterogeneous multi-layer network exploits co-occurrence characteristics rather than linguistic structure, it is less sensitive to social media-related multilingual noise [2]. The key difference between this work and Singh et al.'s study is the way the heterogeneous graph is represented in the latent embedding space (refer to Section 3.2). This study can be considered an extension of the study of Singh et al. that investigates whether the graph-based and text-based representations complement each other and whether the heterogeneous graph captures the underlying linguistic information like a graph generated using a language-specific dependency parser.

Fig. 1 An example of representing a tweet as a heterogeneous multi-layer network structure (hashtag, keyword, and mention layers)
In the proposed study, a tweet is represented in two types of views, textual and graphical, and an embedding representation is generated for each view using an appropriate embedding method. In this study, we use Convolutional Neural Network (CNN) [9] and Bidirectional Encoder Representations from Transformers (BERT) [10] based representation learning for the textual view, and Deep Graph CNN (DGCNN) [11] and Segmented-Graph BERT (Seg-BERT) [12] for the graphical view. The representations thus obtained are then integrated using an attention-based aggregator. The efficacy of the proposed model is evaluated against suitable baseline counterparts. From various experimental setups over three datasets, it is evident that the proposed multi-view model provides better sentiment analysis performance than its single-view counterparts. Further, it is also observed that the proposed model is less sensitive to under-specificity, noise, and multilingual content. In summary, this paper has the following contributions:
• Propose a multi-view learning framework to incorporate text and graph views of the tweet.
• Evaluate the performance of the proposed framework for sentiment analysis in comparison with graph-based and text-based representations of tweets.
• Investigate whether the tweet graph must necessarily be a language-dependent graph or can be a language-insensitive heterogeneous graph.
The remainder of the paper is organized as follows. Section 2 presents the literature related to this study. Section 3 presents the proposed investigation. The experimental setup is described in Section 4. The results and observations are analyzed in Section 5. Finally, Section 6 concludes the paper.

Related studies
There exist a few studies that exploit both text and graph views for sentiment analysis tasks. This section presents a brief review of the literature related to the proposed study. Recent studies have started exploiting graph representation-based methods on top of text-based representations for aspect-based sentiment analysis tasks. The studies in [7,8,13] use the Graph Convolutional Network (GCN) to learn node features for aspect-based sentiment classification by transforming the opinionated text into a tree using an English-language dependency parser. Zhang et al. [8] apply a GCN over the dependency tree of the input text, with node features generated using a Long Short-Term Memory (LSTM) model that captures the contextual information of the text. To obtain aspect-specific features, they apply masking over the GCN output to filter out the features of non-aspect words and apply attention over these aspect-aware features for the sentiment classification task. In a similar approach, Meng et al. [7] use BERT embeddings to learn the contextual node features of the GCN. Chen et al. [13] perform aspect-based sentiment classification in a multi-view learning framework; their study employs a GCN over the dependency tree and an LSTM over the word sequence, and concatenates the learned representations for the aspect-based sentiment classification task. The difference between the above studies and our proposed study lies in the application of the network representation. The above studies use a dependency parser to represent the input text in a graph structure; however, it is not feasible to have a dependency tree for every language, as tweets are highly multilingual. Unlike the above studies, this paper exploits the relations of hashtags, mentions, and regular tokens present in tweets as a heterogeneous multi-layer network for sentiment classification of tweets rather than aspect-based sentiment analysis.
In a different but related direction, Lu et al. [6] use GCN and BERT to generate word embeddings. Their study uses a vocabulary graph to generate node embeddings with a GCN and pre-trained BERT embeddings for the text-based representation. The two word embeddings are concatenated to generate the sentence representation via multi-head attention over the input word embeddings for the underlying sentiment classification task. Their approach does not complement the representations of the text and graph views. Yao et al. [14] perform text classification using a GCN by representing the text corpus as a heterogeneous network with documents as one type of node, connected by informative keywords. Their study applies the GCN over a single structure, which requires the training and testing documents to be present in the heterogeneous graph for generating the document representations; hence, it is not feasible to represent unseen tweets for the sentiment classification task. Unlike the above studies, this paper represents tweets for sentiment classification by simultaneously learning representations of the text and graph views.

Proposed study
Given a tweet T with n words (w_1, w_2, w_3, ..., w_n), the objective of this paper is to incorporate the semantic relations of words represented in different views (textual and graph) through a multi-view representation model. The text view is represented using text embedding methods such as CNN and BERT. The graph view is represented using graph embedding methods such as DGCNN and Seg-BERT. Figure 2 shows a high-level architecture of the proposed framework.
In the remaining part of this section, italic lowercase letters (e.g., w_i, s), bold lowercase letters (e.g., x_i, h), and bold uppercase letters (e.g., W) denote scalars, vectors, and matrices, respectively. A tweet T is represented in the text view as a matrix X ∈ R^{n×d}, where X_i (the i-th row of X) is the embedding of the word w_i with dimension d. This study uses FastText embeddings [15] to generate the initial semantic word embeddings; however, the proposed framework can be applied with any word or node embedding method. The semantics of the word sequence relations are captured using a text representation model F_seq that transforms the text view X into a vector z_seq, i.e., z_seq = F_seq(X, θ_seq), where θ_seq denotes the model's learning parameters. Since hashtags and mentions are added by the author of the tweet, capturing the relations of hashtags, mentions, and normal tokens is of great interest, as hashtags and mentions can link tweets to similar topics or themes. To capture the semantic relations of the words, the tweet T is represented in the graph view as a heterogeneous multi-layer graph via an adjacency matrix A ∈ R^{n×n} that accommodates the relations of hashtags, mentions, and normal keywords present in the tweet. The process of representing T as the heterogeneous multi-layer graph is discussed in Section 3.2. The semantic relations of the words are captured using a graph instance representation learning model F_graph that transforms A into a vector z_graph using the corresponding word embeddings X as node features, i.e., z_graph = F_graph(A, X, θ_graph), where θ_graph denotes the model's learning parameters. This study exploits CNN and BERT models as the text representation model (F_seq) to capture the local semantics of tweets, while DGCNN and Seg-BERT models serve as the graph representation model (F_graph) to capture the semantic relations of the tokens in tweets. The text and graph representation models considered in this study are further discussed in Sections 3.1 and 3.2.
Given a text-view representation z_seq and graph-view representation z_graph of a tweet T, the two views are integrated using the Scaled Dot-Product Attention mechanism [16]. The idea is to assign attention weights to the text view and graph view over a query defined as the average of both representations; the purpose of the attention mechanism is to capture the right amount of information from each view to represent the input tweet. The query of the attention is defined as the element-wise average of the z_seq and z_graph representations, i.e.,

q = (z_seq + z_graph) / 2

The attention weights of the text view and the graph view are obtained from the scaled dot products of each view with the query, normalized jointly with softmax:

[α_seq, α_graph] = softmax([q · z_seq / √d, q · z_graph / √d])

The text-based and graph-based representations generated using the above methods can be incorporated in an end-to-end or ensemble fashion to generate the final representation of the tweet for the sentiment classification task. In the end-to-end framework, as the name suggests, the text and graph representation methods are learned together for the tweet classification task, while in the ensemble framework, they are learned individually and fused in parallel to generate the final tweet representation. The two views are integrated by concatenating the weighted representation of each view; the weighted representations help select the informative parts of each view that support generating the final representation of the tweet. The concatenation of the two views can be defined as follows:

z = α_seq z_seq ⊕ α_graph z_graph

The sentiment classifier is trained using a dense layer with the ReLU activation function, defined mathematically as

o = ReLU(W z + b)

where W and b are the weight and bias parameters of the dense layer. We use the Categorical Cross-Entropy loss function defined in Equation (6) and the Adam optimizer as the optimization technique for training the proposed framework:

L = − Σ_{i=1}^{l} Σ_c t_ic log(s_ic)    (6)
where c is the number of sentiment classes, t_ic is the ground truth for class c of tweet i, l is the total number of training samples, and s_ic is the predicted probability of class c for sample i.
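The view-fusion step described above can be sketched as follows. This is a minimal NumPy illustration of the reconstruction given here, not the authors' implementation; the vector shapes and the joint softmax normalisation over the two scores are assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def fuse_views(z_seq, z_graph):
    """Integrate text and graph views with scaled dot-product attention:
    the query is the element-wise average of the two view vectors, each
    view's weight is its scaled dot product with the query (softmax
    normalised), and the weighted views are concatenated."""
    d = z_seq.shape[0]
    q = (z_seq + z_graph) / 2.0                      # query: average of views
    scores = np.array([q @ z_seq, q @ z_graph]) / np.sqrt(d)
    a_seq, a_graph = softmax(scores)                 # per-view attention weights
    return np.concatenate([a_seq * z_seq, a_graph * z_graph])
```

In the end-to-end variant this fusion sits inside a single trainable network, whereas in the ensemble variant z_seq and z_graph come from separately trained models.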

Text representation model
Given a tweet, the text view can be generated using any suitable text embedding method. In this paper, we have investigated text representation using CNN [9] and BERT [10]. This section briefly discusses CNN- and BERT-based embeddings.

Convolution Neural Network
Earlier studies [2,17] report that CNN captures local semantics, especially for short text, better than recurrence-based models for sentiment classification tasks. If a matrix X ∈ R^{n×d} defines a tweet, then the i-th row of X represents the embedding of the i-th word in the tweet. To apply convolution over the matrix X, we consider kernels of size h × d that capture the spatial properties of h consecutive words in the tweet. We apply a filter f at a position t of X using the following expression:

c_t^{(f)} = ReLU(W^{(f)} · X_{t:t+h−1} + b^{(f)})
where W^{(f)} is the kernel matrix of filter f and b^{(f)} is the corresponding bias. We use padding and apply filter f with a stride of 1 to obtain a convolution vector c^{(f)} for the tweet matrix X, whose elements are

c^{(f)} = (c_1^{(f)}, c_2^{(f)}, ..., c_n^{(f)})

After applying max pooling, we obtain a scalar z^{(f)} representing the tweet under filter f, i.e.,

z^{(f)} = maxpooling(c^{(f)})    (9)

We use 128 filters. The 128 values z^{(f)} obtained from the 128 filters are concatenated to obtain the vector representation of the textual view of the tweet. For ease of reference, the whole operation can be defined as

z_cnn = CNN(X, θ)

where θ denotes the required hyper-parameters of the CNN model, such as the number of filters k and the convolution window size h. We apply a 2-layer CNN model with the same parameters over the input X to represent the input tweet. To reduce the size of the z_cnn vector, each z_i vector is further transformed into a scalar by applying global max pooling over z_i.
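A single filter of the text CNN described above can be sketched as follows. The zero-padding scheme and the ReLU activation are illustrative assumptions; only the sliding (h × d) kernel, stride 1, and max pooling follow directly from the text.

```python
import numpy as np

def cnn_filter(X, W, b, h):
    """One text-CNN filter over a tweet matrix X (n words x d dims):
    slide an (h x d) kernel W with stride 1 over zero-padded X, apply a
    ReLU nonlinearity, and max-pool the n responses to a scalar z^(f)."""
    n, d = X.shape
    padded = np.vstack([X, np.zeros((h - 1, d))])   # zero padding keeps n outputs
    c = np.array([np.sum(W * padded[t:t + h]) + b for t in range(n)])
    c = np.maximum(c, 0.0)                          # ReLU
    return c.max()                                  # max pooling -> scalar
```

The full text view z_cnn would concatenate the outputs of 128 such filters.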

Bidirectional Encoder Representations from Transformers
The majority of recent studies on text embedding use Bidirectional Encoder Representations from Transformers, more commonly known as BERT [10]. Earlier studies have used BERT as a pre-trained model [6,7]; however, a pre-trained BERT model is inefficient if it does not match the domain of interest, leading to out-of-vocabulary issues [18]. This study builds BERT from scratch to overcome the inefficiency caused by pre-trained BERT models. Given a tweet representation X ∈ R^{n×d}, the BERT model captures the semantic information of the word sequences by relying only on the attention-weighted representation of the words. The word order is incorporated into the initial word embedding X by adding an element-wise positional embedding. The positional embedding for each word position pos can be defined as

P(pos, 2i) = sin(pos / 10000^{2i/d}),  P(pos, 2i+1) = cos(pos / 10000^{2i/d})

There are l transformer blocks stacked on top of one another in the BERT architecture. The input to the first transformer block is the sum of the word embedding X and the positional embedding P, i.e., Z_0 = X + P.
To capture different aspects of tweet semantics, a transformer block t can have mh multi-head attention layers. For each attention head i ∈ (1, mh) in a transformer block t, three matrices serving as the query, key, and value are generated by dense layers over the input Z_t, and the attention-weighted representation is computed with the Scaled Dot-Product Attention mechanism [16], i.e.,

head_i = softmax(Q_i K_i^T / √d) V_i

where Q_i, K_i, and V_i are linear transformations of the input Z_t through three different weight parameters {W_qi, W_ki, W_vi} ∈ R^{n×n}. The attention-weighted outputs of the multi-head attention layer are concatenated, and a dense layer with the ReLU activation function generates the semantic representation output of the transformer block t, i.e.,

Z_{t+1} = ReLU([head_1; ...; head_mh] W + B)

where W ∈ R^{n×n·mh} and B ∈ R^{n×d} are the weight and bias parameter matrices, and Z_{t+1} is the output of transformer block t. The output of the last transformer block, Z_{l+1}, is considered the final representation of the input tweet T from the BERT model. To represent it in vector space, Z_{l+1} is flattened into a vector z_bert ∈ R^{n·d×1} for sentiment classification. For ease of reference, the whole operation can be defined as

z_bert = BERT(X, θ)

where θ represents hyper-parameters such as the number of encoders l, the number of multi-head attentions mh, and the hidden layer dimension d. We use the same hyper-parameters as the original BERT setup, i.e., l = 8 transformer blocks and mh = 8 multi-head attentions.
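The two building blocks above, the positional embedding and one attention head, can be sketched as follows. The sinusoidal form is the standard transformer formulation and is assumed here (the paper does not spell it out); an even embedding dimension d is also assumed.

```python
import numpy as np

def positional_embedding(n, d):
    """Sinusoidal positional embedding P (n positions x d dims, d even):
    sin on even dimensions, cos on odd dimensions."""
    P = np.zeros((n, d))
    pos = np.arange(n)[:, None]
    i = np.arange(0, d, 2)[None, :]
    P[:, 0::2] = np.sin(pos / 10000 ** (i / d))
    P[:, 1::2] = np.cos(pos / 10000 ** (i / d))
    return P

def attention_head(Z, Wq, Wk, Wv):
    """One scaled dot-product self-attention head over Z = X + P."""
    Q, K, V = Z @ Wq, Z @ Wk, Z @ Wv
    s = Q @ K.T / np.sqrt(K.shape[1])               # scaled scores
    e = np.exp(s - s.max(axis=1, keepdims=True))
    return (e / e.sum(axis=1, keepdims=True)) @ V   # row-wise softmax, then V
```

A full transformer block would run mh such heads, concatenate their outputs, and apply the dense layer with ReLU as in the equations above.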

Tweet graph construction
Following the intuition of the study of Singh et al. [2], a tweet is represented as a heterogeneous multi-layer network by linking the hashtags, mentions, and normal tokens that co-occur in the tweet. Figure 1 illustrates how a tweet is represented as a heterogeneous multi-layer network. Before transforming the tweet into the heterogeneous multi-layer network, preprocessing steps such as removing stopwords and normalizing keywords (converting to lowercase, removing URL links) can be performed; this study removes stopwords and normalizes keywords before the transformation. The heterogeneous multi-layer network uses three types of undirected relations, i.e., mention-mention (MM), hashtag-hashtag (HH), and mention-hashtag (MH) or hashtag-mention (HM), and five directed relations, i.e., keyword → keyword (KK), keyword → hashtag (KH), hashtag → keyword (HK), keyword → mention (KM), and mention → keyword (MK). The directed edges capture the sequence relations of the normal tokens. Accommodating all eight types of relations of a tweet with n tokens yields the adjacency matrix

A = [[B_HH, B_HM, B_HK], [B_MH, B_MM, B_MK], [B_KH, B_KM, B_KK]]

where MH = HM and B_r represents the adjacency matrix of the relation r ∈ {HH, MH, HM, MM, KK, KH, HK, KM, MK}.
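The eight relation types can be assembled into a tweet adjacency matrix roughly as follows. The token-type detection by prefix and the exact edge policy (sequence edges only between consecutive keywords, full connectivity for keyword-hashtag/mention pairs) are simplifying assumptions for illustration, not the authors' exact construction.

```python
import numpy as np

def tweet_graph(tokens):
    """Build the n x n adjacency matrix of a tweet: undirected edges among
    hashtags (#) and mentions (@), a directed edge between consecutive
    keywords (sequence relation), and keyword<->hashtag/mention edges in
    both directions (KH, HK, KM, MK)."""
    n = len(tokens)
    kind = ['H' if t.startswith('#') else
            'M' if t.startswith('@') else 'K' for t in tokens]
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if kind[i] in 'HM' and kind[j] in 'HM':
                A[i, j] = 1                  # HH, MM, MH/HM (undirected)
            elif kind[i] == 'K' and kind[j] == 'K':
                if j == i + 1:
                    A[i, j] = 1              # directed KK sequence edge
            else:
                A[i, j] = 1                  # KH, HK, KM, MK edges
    return A
```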
The key difference of this study from that of Singh et al. [2] is in utilizing the supra-adjacency matrix A ∈ R^{n×n} to represent the structural information of the tweet graph. Singh et al.'s study uses a biased random walker to generate alternative naturalistic word sequences from the tweet graph via language modeling, yielding multifaceted representations of the tweet, and then uses a text embedding model (CNN, LSTM) to represent the random walk sequences and the original tweet in latent space. In such a case, the relations of the nodes in the graph are not fully captured due to the noise introduced by the random walk and language model algorithms.
To address the aforementioned issue, this study explores graph embedding techniques to harness node relationships and capture the latent information of the heterogeneous graph. Recent studies of graph instance representation learning [11,12] have shown promising results in capturing the latent representation of a graph. Therefore, to capture the inherent relations of hashtags, mentions, and normal keywords, this study uses these graph representation learning methods [11,12].

Network expansion
This study investigates whether adding semantically related tokens to the tweet graph can enrich the representation of the tweet. To expand a tweet graph, the semantically related nodes of all tokens in the tweet are retrieved using cosine similarity over the word embeddings generated with the FastText (FT) [15] and Sentiment Hashtag Embedding (SHE) [1] methods. We select the top 20 tokens with the highest cosine similarity scores to the tokens present in the tweet as semantically relevant nodes of the tweet. These 20 nodes are added to the tweet graph by introducing an undirected edge to all the nodes. For ease of reference, this node expansion approach is referred to as semantic Node Expansion (NE).
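The NE retrieval step can be sketched as follows. The candidate vocabulary `vocab` and the embedding lookup `emb` (token to vector, e.g. from FastText) are illustrative stand-ins, and scoring a candidate by its best similarity to any tweet token is an assumption about the ranking.

```python
import numpy as np

def semantic_node_expansion(tweet_tokens, vocab, emb, k=20):
    """Semantic Node Expansion (NE) sketch: score every out-of-tweet
    vocabulary token by its highest cosine similarity to any tweet token
    and return the top-k tokens as new nodes for the tweet graph."""
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)
    scores = {c: max(cos(emb[c], emb[t]) for t in tweet_tokens)
              for c in vocab if c not in tweet_tokens}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```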
Further, we investigate whether adding tokens that are both semantically related and sentiment polarized to the tweet graph can enrich the representation of the tweet. For this study, the semantically similar nodes previously selected through NE are filtered to retain only the sentiment polarized tokens. To select them, this study exploits the SHE method to classify the sentiment of the 20 semantically relevant nodes. The sentiment polarized node expansion then divides the 20 nodes into three sentiment sets, i.e., positive, negative, and neutral, and the dominating sentiment set, i.e., the set in which the majority of the nodes share the same sentiment, is selected for expansion. For ease of reference, this expansion approach is referred to as Sentiment polarized Node Expansion (SNE).
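The SNE filtering step can be sketched as follows; `sentiment_of` is a hypothetical stand-in for the SHE-based sentiment classifier mentioned above.

```python
def sne_filter(candidates, sentiment_of):
    """Sentiment polarized Node Expansion (SNE) sketch: group the NE
    candidate nodes by predicted sentiment (positive/negative/neutral)
    and keep only the dominating set."""
    groups = {}
    for token in candidates:
        groups.setdefault(sentiment_of(token), []).append(token)
    dominant = max(groups, key=lambda s: len(groups[s]))
    return groups[dominant]
```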

Graph representation model
Recent studies on graph instance representation learning [11,12] have shown promising results in capturing the latent representation of a graph. We can apply graph instance representation learning methods such as DGCNN [11] and Seg-BERT [12] over A ∈ R^{n×n} to represent it in vector space for the graph classification task.

Deep Graph Convolution Neural Network
Zhang et al. [11] use a Graph Convolutional Network (GCN) [19] for the graph classification task. Compared to the study of Kipf and Welling [19], which works on a single graph structure, this method is able to represent graphs of arbitrary structures.
They propose an algorithm named SortPooling, similar to the Weisfeiler-Lehman node coloring algorithm [20], for sorting vertex features to learn the global graph topology. Given a graph A ∈ R^{n×n} and a feature matrix (word embeddings) X ∈ R^{n×d}, we can apply multiple stacked GCN layers, where layer t outputs

Z_{t+1} = f(D̃^{-1} Ã Z_t W_t)

where Ã ∈ R^{n×n} is the adjacency matrix with added self-loops (identity matrix), i.e., Ã = A + I, D̃ is the degree matrix of Ã, Z_0 = X, W_t ∈ R^{d×c} are the neural weight parameters shared across all graphs, f is a nonlinear activation, and h is the number of GCN layers. To learn global node features, the outputs of the GCN layers are concatenated row-wise, i.e., Z = Z_{1:h}, and SortPooling is applied over Z, i.e., Z_sp = SortPooling(Z). The output Z_sp is fed to a CNN layer to generate the graph representation via max pooling, i.e.,

z_dgcnn = maxpooling(CNN(Z_sp, θ_graph))

where θ_graph denotes the learning parameters of the CNN. We use the same parameters considered in the text representation model (refer to Section 3.1.1).
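One GCN propagation step of the form reconstructed above can be sketched as follows; the tanh nonlinearity for f is an assumption (DGCNN commonly uses it), and the degree normalisation follows the D̃^{-1} Ã form given here.

```python
import numpy as np

def gcn_layer(A, Z, W):
    """One GCN propagation step: add self-loops (A~ = A + I),
    row-normalise by the degree matrix of A~, propagate node features
    Z through weights W, and apply tanh as the nonlinearity f."""
    A_tilde = A + np.eye(A.shape[0])             # self-loops
    D_inv = np.diag(1.0 / A_tilde.sum(axis=1))   # inverse degree matrix
    return np.tanh(D_inv @ A_tilde @ Z @ W)
```

Stacking h such layers, concatenating their outputs, and applying SortPooling followed by a CNN yields the z_dgcnn representation described above.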

Segmented-Graph BERT
Zhang et al. [12] use the BERT architecture to encode graph information given node features such as word embeddings (X), a latent representation of the adjacency neighborhood matrix (A), a node degree matrix (D), and a node global role matrix (WL) pre-computed using the Weisfeiler-Lehman algorithm [20]. These features are fed together as the input to the BERT model. Hence, we can learn a graph instance representation that, like the normal BERT model, captures the semantic relations of the nodes in the graph:

z_seg-bert = BERT(X, A, D, WL, θ_graph)

where θ_graph denotes the learning parameters of the BERT model. We use the same parameters considered in the text representation model (refer to Section 3.1.2).

Experimental setup

Dataset
To evaluate the efficacy of the proposed framework, this study considers the Societal dataset used in [2,17] for the sentiment classification task. This dataset contains 1,505 under-specified tweets (tweets having fewer than 5 tokens) and 1,626 multilingual tweets (code-mixed Hindi and English). The Societal dataset is curated over four events that happened in India, namely the Kashmir Unrest, the Pathankot Attack, the Surgical Strike, and GSTN. Table 1 shows the characteristics of the training dataset considered in this study.

Baseline classifiers
To evaluate the performance of the proposed framework, we consider four single-view classifiers, i.e., CNN, BERT, DGCNN, and Seg-BERT, and two multi-view classifiers, i.e., T+MLN and VGCN-BERT, as baseline models for comparison.
• CNN: The output z_cnn of the CNN model over the input X is considered as the tweet representation for the sentiment classification task in Equation 5.
• BERT: The output z_bert of the BERT model over the input X is considered as the tweet representation for the sentiment classification task in Equation 5.
• DGCNN: The output z_dgcnn of the DGCNN model over the input X is considered as the tweet representation for the sentiment classification task in Equation 5.
• Seg-BERT: The output z_seg-bert of the Seg-BERT model over the input X is considered as the tweet representation for the sentiment classification task in Equation 5.
• T+MLN: The work of Singh et al. [2] is considered as a baseline method for incorporating graph as well as text information.

Results and Observation
In this section, we investigate the efficacy of the proposed framework over the baseline methods through the two research questions: (i) How informative is a graph-based representation of a tweet compared to a text-based representation? (ii) Do the text-based and graph-based representations complement each other? The efficacy of the proposed framework is investigated over the Societal dataset using a 10-fold cross-validation strategy. Table 2 shows the performance of the classifiers over the Societal and SemEval datasets for the sentiment classification task.
For this analysis, the under-specified and multilingual tweets are excluded from the Societal dataset; they are held out to investigate whether the proposed model is able to address the challenges of social media noise.

How informative is a graph-based representation of a tweet compared to a text-based representation?
The first part of Table 2, i.e., single-view methods, shows the performances of the single-view classifiers. On the Societal dataset, the best performance achieved by a single-view classifier is 77.39% accuracy with an F-Macro of 75.71%, obtained using Seg-BERT over the heterogeneous tweet graph, while the best sentiment classifier built over the text representation, i.e., the text view, achieves up to 77.16% accuracy using CNN. Similarly, on the SemEval 2016 dataset, the graph-based classifier DGCNN achieves up to 74.31% accuracy with a 48.38% F-Macro score, while the text-based classifier CNN achieves a comparable 73.41% accuracy and 47.26% F-Macro score on the same dataset. In contrast, on the SemEval 2013 dataset, the CNN classifier achieves the best performance of 64.42% accuracy with an F-Macro score of 61.98%, while the DGCNN classifier achieves up to 62.03% accuracy with a 56.59% F-Macro score. It is observed that the CNN-based classifier can outperform the BERT-based classifier, primarily due to the small number of training samples. However, on the Societal dataset (the larger experimental dataset), the graph-based BERT classifier (i.e., Seg-BERT) with its additional graph-based information enhances the classifier's performance. The above experiments over different corpus sizes establish that the graph-based representation can achieve performance comparable to the text-based approach. The following section further investigates, using the proposed framework, whether the text and graph views complement each other.

Do the text-based and graph-based representations complement each other?
Table 2 (second part) shows the performance of incorporating graph and text views for the sentiment classification task. It is evident from the multi-view rows of Table 2 that incorporating both the text and graph views significantly improves the performance of the sentiment classifiers, for both the end-to-end and ensemble frameworks, over the single-view methods. The ensemble framework using the CNN and DGCNN methods achieves the best performance, up to 79.34% accuracy and a 77.03% F-macro score, whereas the corresponding end-to-end framework achieves up to 78.70% accuracy and a 76.35% F-macro score. However, the multi-view classifiers using BERT and Seg-BERT could not improve over their individual classifier performances on the Societal and SemEval 2016 datasets. One reason is that Seg-BERT takes both the text and graph information while encoding the graph representation, whereas BERT takes only the text information to encode the sequence representation. Hence, the tweet representation generated using Seg-BERT carries redundant information, and adding the BERT representation in the multi-view framework creates a noisy tweet representation due to the losses incurred while training the multi-view framework. Among the baseline multi-view methods, the T+MLN classifier achieves the best result, up to 76.69% accuracy and a 73.97% F-macro score. It is also observed that the best performances of the single-view and multi-view classifiers on the SemEval 2016 dataset are relatively comparable, whereas a clear gap between the best single-view and multi-view performances is observed on the Societal and SemEval 2013 datasets. One reason for this underperformance is the small corpus size: compared to the Societal and SemEval 2013 datasets, the corpus of the SemEval 2016 dataset is minimal; therefore, the node information in the tweet graphs of this dataset is not fully incorporated. With a larger corpus, the graph representation learning method can exploit the global properties of the nodes; as a result, the performance of the end-to-end and ensemble-based classifiers improves significantly using DGCNN. This study shows that the node properties in the tweet graph can be captured more fully with a larger corpus, and that incorporating text and graph views enriches the tweet representation for the sentiment classification task beyond the individual classifier performances. Therefore, from the above investigation, it is evident that the text and graph views complement each other in representing a tweet for sentiment classification.
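The ensemble variant above can be pictured as a soft-voting combination of the two views' outputs. The following is a minimal sketch, assuming each view's trained model (the text-view CNN and the graph-view DGCNN) already outputs per-class probabilities; the function name and equal default weighting are illustrative assumptions, not the authors' implementation.

```python
def ensemble_predict(p_text, p_graph, w_text=0.5):
    """Soft-voting ensemble of a text-view and a graph-view classifier.

    p_text, p_graph: lists of per-class probability vectors (one per
    tweet) from the two views. Returns predicted class indices after a
    weighted average of the two probability vectors.
    """
    preds = []
    for pt, pg in zip(p_text, p_graph):
        # Weighted average of the two views' class probabilities.
        combined = [w_text * a + (1.0 - w_text) * b for a, b in zip(pt, pg)]
        # Predict the class with the highest combined probability.
        preds.append(max(range(len(combined)), key=combined.__getitem__))
    return preds

# Toy example: 2 tweets, 3 sentiment classes (neg/neu/pos).
p_cnn = [[0.2, 0.3, 0.5], [0.6, 0.3, 0.1]]
p_dgcnn = [[0.1, 0.2, 0.7], [0.3, 0.4, 0.3]]
print(ensemble_predict(p_cnn, p_dgcnn))  # [2, 0]
```

In the second example, the views disagree (CNN favours class 0, DGCNN class 1), and the averaged probabilities resolve the tie in favour of the more confident view.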

Heterogeneous multi-layer network vs. dependency tree
This section investigates whether a language-dependent dependency parser is needed to construct the tweet graph. For this study, we consider an off-the-shelf dependency parser for the English language 3 to construct the tweet graph. Since the SemEval datasets are in English, we use these datasets for the experimental study. The sentiment classification performance is evaluated over two variant representations of the tweet, i.e., the tweet represented using the dependency parser and using the heterogeneous multi-layer network. Figure 3 compares the performance of the single-view and multi-view classifiers over the tweet graph built with the dependency parser and with the heterogeneous graph. It is evident from the figure that the single-view classifiers, i.e., DGCNN and Seg-BERT, achieve better classification accuracy over the heterogeneous-graph representation of the tweet than over the dependency-graph representation. The best performing classifiers (i.e., the ensemble classifiers) are relatively comparable for the two graph representations: the ensemble classifier trained on the SemEval 2013 dataset achieves up to 66% accuracy with the heterogeneous multi-layer network versus up to 65% with the dependency graph, and on the SemEval 2016 dataset up to 75% versus up to 73%, respectively. This study shows that the heterogeneous multi-layer network is language invariant and performs better than the language-dependent word-graph structure. Further, to investigate whether the heterogeneous multi-layer graph is less sensitive to the social media-related noise mentioned above, we evaluate the proposed framework on under-specified and multilingual tweets in the following subsections. In this study, we consider tweets having fewer than five tokens as under-specified tweets. To investigate the performance of the proposed framework on these tweets, the best performing ensemble-based classifier (as observed in Table 2), i.e., the ensemble of the CNN and DGCNN classifiers, and the single-view classifiers are considered. The performances of the single-view classifiers are compared with the end-to-end and ensemble frameworks on the under-specified and multilingual tweets. Furthermore, we investigate the performance of the sentiment classifiers after adding semantically relevant tokens and sentiment-polarized tokens to the tweet graph through the NE and SNE approaches. For this study, the classifiers are not re-trained over the expanded tweet graphs; instead, the representation of a tweet is generated from the expanded graph for comparison. For ease of reference, we use the notation classifier+NE (and likewise classifier+SNE) to indicate that the classifier uses the graph expanded with the NE (or SNE) approach for sentiment classification.
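The expansion step above can be sketched as adding related tokens to the tweet graph before the already trained graph encoder reads it. The following is a minimal illustration, assuming the tweet graph is an adjacency map of tokens and that a global neighbour map ranks semantically related tokens; both data structures and the function name are illustrative assumptions, not the paper's exact implementation.

```python
def expand_tweet_graph(graph, neighbours, k=2):
    """Node Expansion (NE) sketch: enlarge a tweet graph with related tokens.

    graph: {token: set(adjacent tokens)} for one tweet.
    neighbours: {token: [related tokens, ranked by relevance]} from a
    global corpus-level structure.
    Returns a new graph with up to k related tokens attached to each
    original node; the classifier itself is not re-trained.
    """
    expanded = {node: set(adj) for node, adj in graph.items()}
    for node in list(graph):
        for related in neighbours.get(node, [])[:k]:
            # Add the related token as a new node linked to the original.
            expanded.setdefault(related, set()).add(node)
            expanded[node].add(related)
    return expanded

# Toy under-specified tweet graph with two tokens.
g = {"gst": {"bill"}, "bill": {"gst"}}
nbrs = {"gst": ["tax", "economy"], "bill": ["law"]}
print(sorted(expand_tweet_graph(g, nbrs)))
# ['bill', 'economy', 'gst', 'law', 'tax']
```

A sentiment-polarized variant (SNE) would additionally filter the candidate tokens in `neighbours` by their sentiment polarity before attaching them.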

Performance of sentiment classification over under-specified tweets
As mentioned above, to investigate the performance of the proposed framework on under-specified tweets, the best performing ensemble-based classifier (in Table 2), i.e., the ensemble of the CNN and DGCNN classifiers, and the single-view classifiers, i.e., CNN, BERT, DGCNN, and Seg-BERT, are considered for comparison. From Figure 4(a), it is observed that the proposed ensemble-based framework outperforms the individual-view classifiers, achieving up to 70.60% accuracy and a 66.40% F-macro score, while the end-to-end framework achieves up to 69.32% accuracy and a 62.30% F-macro score. Among the single-view classifiers, the CNN classifier performs best, achieving up to 65.93% accuracy and a 60.40% F-macro score, followed by BERT with 64.89% accuracy and a 62.19% F-macro score. Among the graph-based approaches, DGCNN achieves up to 63.51% accuracy and a 58.80% F-macro score, while Seg-BERT achieves up to 61.89% accuracy and a 53.99% F-macro score. This study shows that incorporating both views represents a tweet better than either view alone. Further, after performing semantic Node Expansion (NE) over the under-specified tweet graphs, it is evident from Figure 4(c) that the performance of the classifiers improves significantly: with NE, the performance of DGCNN+NE and Seg-BERT+NE improves to 77.62% and 74.56% accuracy, respectively. Incorporating the text and graph views over the NE graphs in the end-to-end framework (i.e., end-to-end+NE) further improves the accuracy to 78.42%, and the ensemble framework (i.e., ensemble+NE) improves it to 78.82%. Furthermore, after performing Sentiment-polarized Node Expansion (SNE), the best performance achieved is up to 78.85% accuracy using the ensemble classifier, i.e., ensemble+SNE. This shows that adding semantically related sentiment-polarized nodes to the tweet graph can further enrich the tweet representation even without re-training the classifiers. From this study, it is evident that the proposed framework addresses the problem of the under-specificity of tweets by a high margin compared to the performance of the single-view classifiers.

Performance of sentiment classification over multilingual tweets
This section investigates the performance of the proposed framework on multilingual tweets. Similar to the under-specified tweets study, the best performing ensemble-based classifier, i.e., the ensemble of the CNN and DGCNN classifiers, and the single-view classifiers are considered for comparison. It is observed from Figure 4(b) that incorporating both the text and graph views in the ensemble framework achieves up to 71.28% accuracy and a 66.64% F-macro score, while the end-to-end classifier achieves up to 70.59% accuracy and a 66.38% F-macro score. Among the single-view classifiers, the DGCNN classifier achieves the highest scores, 60.86% accuracy and a 55.54% F-macro, followed by Seg-BERT with 60.28% accuracy and a 41.18% F-macro score. The BERT classifier achieves up to 59% accuracy and a 58% F-macro score, while the CNN classifier achieves up to 57% accuracy and a 47% F-macro score. This shows that incorporating both text and graph views represents a tweet better than either view alone. Further, with node expansion of the tweet graph, the improvement in classifier performance is evident in Figure 4(d): the DGCNN and Seg-BERT classifiers over the NE tweet graphs achieve up to 79.41% and 74.1% accuracy, respectively. Incorporating the text representation over the expanded graph using the end-to-end and ensemble frameworks improves the performance further, achieving up to 84.19% and 86.95% accuracy, respectively. Furthermore, with SNE of the tweet graph, the best performance achieved is up to 87.17% accuracy using the ensemble framework and 86.95% using the end-to-end framework. This study shows that the proposed framework of incorporating text and graph views enriches the tweet representation beyond its single-view representations, and the representation can be enriched further by adding semantically related sentiment-polarized nodes to the tweet graph. It is also evident that the proposed framework addresses the problem of multilingual tweets by incorporating both text and graph views in the multi-view learning framework.

Limitations of the proposed framework
Even though the performance of the classifiers improves with SNE, it is observed from Figure 4(c, d) that the classifier performances over the NE and SNE tweet graphs are relatively comparable. The limited additional benefit of SNE could be due to how the sentiment-polarized nodes are filtered: since these nodes are chosen based only on the dominating sentiment, the set of nodes selected using SNE is almost the same size as the set selected using NE. Hence, a mechanism to retrieve more relevant sentiment-polarized nodes for the input tweet could further enhance the performance of the sentiment classification task.
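The dominating-sentiment filter described above can be illustrated with a short sketch. This assumes candidate expansion tokens and a polarity lexicon as inputs; the lexicon, token names, and function name are illustrative assumptions, not the paper's actual resources.

```python
from collections import Counter

def sne_filter(candidates, polarity):
    """SNE sketch: keep candidate tokens carrying the dominating polarity.

    candidates: list of candidate expansion tokens for one tweet.
    polarity: {token: 'pos' | 'neg' | 'neu'} lexicon; unknown tokens
    default to 'neu'.
    """
    pols = [polarity.get(t, "neu") for t in candidates]
    # The dominating sentiment is the most frequent polarity label.
    dominant = Counter(pols).most_common(1)[0][0]
    return [t for t in candidates if polarity.get(t, "neu") == dominant]

lex = {"salute": "pos", "historic": "pos", "tax": "neu"}
print(sne_filter(["salute", "historic", "tax"], lex))
# ['salute', 'historic']
```

Because the filter drops only the minority-polarity candidates, the surviving set is typically close in size to the full NE candidate set, which is consistent with the comparable NE and SNE results noted above.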

Conclusion and Future work
This study proposes a multi-view learning framework for sentiment classification of tweets that addresses under-specificity, noise, and multilingual content by representing tweets using text and graph representation learning methods. To incorporate both text and graph views in the multi-view learning framework, this study explores both end-to-end and ensemble-based classifiers. It is observed from various experimental studies that the performance of the tweet sentiment classifier improves significantly after incorporating both text and graph views, compared with its individual-view classifiers, and that the ensemble-based classifier performs better than the end-to-end classifier when incorporating both views. Further, the proposed framework performs better than its counterparts in addressing multilingual and under-specified tweets. Moreover, after performing node expansion over the tweet graph, the performance of the classifiers improves further through semantic (NE) and sentiment-polarized (SNE) node expansion. Retrieving more relevant sentiment-polarized nodes for the input tweet to further enhance the sentiment classification task is a future scope of this study.

Tweet:
Historic day for the Nation, #GST bill passed in Lok Sabha.#Congratulations to the nation,salute 2the vision of #PM @narendramodi ji

Fig. 2
Fig. 2 Proposed framework for sentiment classification of a tweet by incorporating text and graph views through text and graph representation models. A and X represent the word embedding and adjacency matrices of the input tweet, and α i represents the weighted representation of the graph (G) and text (T) representations
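The weighted fusion indicated in the caption (α i over the graph and text representations) can be sketched as follows. This is a minimal illustration assuming softmax-normalized scalar weights over the two view vectors; whether the original end-to-end model uses exactly this normalization is an assumption.

```python
import math

def fuse(rep_graph, rep_text, a_graph=1.0, a_text=1.0):
    """Return the alpha-weighted combination of the two view vectors.

    rep_graph, rep_text: equal-length feature vectors (lists of floats)
    for the graph (G) and text (T) views. a_graph, a_text: scalar
    attention logits, normalized here with a softmax. In an end-to-end
    model these scalars would be learned jointly with the encoders.
    """
    z = math.exp(a_graph) + math.exp(a_text)
    w_g, w_t = math.exp(a_graph) / z, math.exp(a_text) / z
    return [w_g * g + w_t * t for g, t in zip(rep_graph, rep_text)]

# Equal logits give equal weights, so the fused vector is the mean.
print(fuse([1.0, 0.0], [0.0, 1.0]))  # [0.5, 0.5]
```

Raising `a_text` relative to `a_graph` shifts the fused representation toward the text view, which is how an end-to-end model could learn to trust one view more per task.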

Fig. 3
Fig. 3 Performance of classifiers over the SemEval 2013 and 2016 challenge datasets. End-to-end and ensemble classifiers are combinations of the CNN and DGCNN methods.

Fig. 4
Fig. 4 Performance of classifiers over different under-specified and multilingual tweet categories. End-to-end and ensemble classifiers are combinations of the CNN and DGCNN methods; classifier+NE denotes classifier performance over the node-expansion graph; classifier+SNE denotes classifier performance over the sentiment-polarized node-expansion graph

Table 1
Characteristics of the experimental datasets

Table 2
Performance of sentiment classifiers over the Societal dataset.