1 Introduction

The recent surge of violence involving groups of young people calls for serious discussion. One of the fundamental contexts in which such violence develops is the school, both as an institution responsible for training and the transmission of knowledge and as a relational space between young people and adults [1]. School life is an important stage in a young person's social experience: by experimenting with different ways of interacting, young people learn the rules of behavior and strengthen their cognitive, emotional, and social skills. The school can therefore become the theater of both prosocial and aggressive behaviors, occasional or repeated, which have a profound impact on the development of the individuals involved in various capacities [2]. In fact, peer abuse occurs mainly between classmates or schoolmates, or between people who, voluntarily or not, share time, environment, and experiences [3]. People are hurt when they feel rejected, threatened, or offended. The young victims, adolescents and pre-adolescents, are often ashamed to talk about it with anyone, for fear of being judged negatively or of having their supposed weakness confirmed by others. Bullying has long been under observation, while cyberbullying is a newer and perhaps more hidden form, because it is less conspicuous. It is a subtle manifestation of bullying itself, but no less important. Its diffusion is due to the massive use of information technology, which has allowed the creation of new meeting spaces [4].

Bullying is a specific form of violence which, unlike the normal quarrels between children that end in little more than teasing, acquires persecutory traits. The bully attacks the intended victim with physical and psychological acts in order to subdue them completely, often driving the most fragile victims to extreme gestures or, in any case, opening wounds destined to last a lifetime. Most adolescents have experienced bullying, and one in three of these cases occurs in the school setting [5].

The term cyberbullying refers to acts of bullying, stalking, and abuse carried out through electronic means such as e-mails, chats, blogs, mobile phones, websites, or any other form of communication attributable to the web [6]. Although it comes in a different form, online bullying is still bullying. Circulating unpleasant photos or sending e-mails containing offensive material can hurt much more than a punch or a kick, even if it does not involve violence or other forms of physical coercion. In online communities, cyberbullying can also be group-based, and girls are usually victims more frequently than boys, often through messages that contain sexual allusions. The harasser usually acts anonymously, but sometimes does not bother to hide their identity at all. During the Covid-19 pandemic, with many states adopting prolonged lockdown periods, this form of abuse has taken on even greater weight [7].

Social networks are means through which it is possible to communicate, share information, and stay in constant contact with people near and far. There are many of them, differing in various characteristics aimed at satisfying the needs of some or many, but the purpose remains the same for all: to place the connection between individuals at the center, making it easier and more accessible. Among these, some of the best known and most used are Facebook, Instagram, Twitter, and LinkedIn. Social networks are not limited to instant messaging such as chats, but allow users to create their own profile, manage their network of contacts, and share files of all kinds that persist over time. Electronic bullying mostly occurs through social networks, because the web, with the ability to create and share millions of pieces of content, has introduced a large amount of personal data and information into cyberspace [8]. This information ranges from personal details to tastes, favorite activities, and places visited. Moreover, almost all social networks have rather soft personal data access policies, which allow their advertisers, and not only them, to collect large amounts of data about their users. In many cases, in fact, it is sufficient to enter a name and surname in a search engine or in a social network to learn a person's opinions, romantic and working relationships, and daily activities [9]. The result is the social media paradox: while we can more easily modify and shape our virtual identity, it is also true that, by following the traces left by our different virtual identities, it is easier for others to reconstruct our real identity. This is because the data, comments, and photos that users insert in a social network build a historical memory of their activity and personality that does not disappear even when they want it to.
The Data Protection Act, while helping to prevent the misuse of personal data, does not offer sufficient protection. It is therefore necessary to identify new methodologies capable of detecting possible cases of cyberbullying to intervene promptly and reduce the damage caused by these acts on the psychology of young people [10].

The term sentiment analysis indicates the set of techniques and procedures for the study and analysis of textual information, aimed at detecting evaluations, opinions, attitudes, and emotions relating to a certain entity [11]. This type of analysis has evident and important applications in the political, social, and economic fields. For example, a company may be interested in knowing consumer opinions about its products. Likewise, potential buyers of a particular product or service will be interested in knowing the opinion and experience of someone who has already purchased or used it [12]. A public figure might also be interested in what people think of them: consider a politician who wants to know what people think of their work, in order to monitor consensus ahead of a possible re-election. Of course, there are already tools for detecting consensus and opinions (surveys and statistical polls), but Opinion Mining techniques make it possible to obtain significantly lower detection costs and, in many cases, greater informative authenticity. Indeed, people are not obliged to express opinions; on the contrary, opinions flow freely without any coercion [13].

In recent years, techniques based on Deep Learning have become widespread for extracting sentiment from sources available on the net. Deep learning is a branch of machine learning based on algorithms for modeling high-level abstractions in data. It is part of a family of techniques aimed at learning methods to represent data [14,15,16,17,18]. Recurrent neural networks (RNNs) are a family of neural networks that contain feedback connections, that is, loops within the network structure [19]. The presence of loops makes it possible to analyze time sequences. In fact, it is possible to perform the so-called unfolding of the structure to obtain a feedforward version of the network, of arbitrary length, which depends on a sequence of inputs. What distinguishes an RNN from a feedforward network is therefore the sharing of a state (weights and biases) between the elements of the sequence. What is stored within the network thus represents a pattern that temporally binds the elements of the series that the RNN analyzes [20].

In this work, we will first introduce the general concepts underlying sentiment analysis, and then move on to the analysis of the architecture of algorithms based on recurrent neural networks. Subsequently, a practical case of classification of the polarity of the messages extracted from the WhatsApp chat will be analyzed for the identification of possible acts of cyberbullying. The rest of the chapter is structured as follows: Sect. 2 presents the methodology used to extract knowledge from the data. Section 3 describes the analyzed data and the results obtained with these methodologies, discussing them appropriately. Finally, in Sect. 4 the conclusions are reported.

2 Methodology

2.1 Sentiment Analysis Basic Concepts

The problem of text categorization is to assign labels to texts written in natural language. Text classification has been addressed in Information Retrieval since 1960. The applications are innumerable: searching for content related to a theme, organizing and indexing web pages or other documents, spam filtering, determining the language of a text, and rationalizing pre-established archives. In the 1990s, the development of statistical techniques in artificial intelligence led to a paradigm shift in this area as well. Before this period the problem was mostly solved, in practical applications, through what is called knowledge engineering: the construction by experts of a set of empirical rules, based on keywords or regular expressions and combined through Boolean operators, which classified the text [21].
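As a rough illustration of the knowledge-engineering approach described above, the sketch below hard-codes one rule that combines keyword and regular-expression tests through Boolean operators. The keywords, the pattern, and the label names are hypothetical and only show the shape such rules can take; they are not taken from the cited work.

```python
import re

# Hypothetical hand-written rule: keywords and pattern are illustrative only.
SPAM_KEYWORDS = {"prize", "winner", "free"}
URL_PATTERN = re.compile(r"https?://\S+")


def classify_by_rules(text):
    """Label a text as 'spam' or 'other' with a fixed Boolean rule."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    matched = words & SPAM_KEYWORDS
    has_url = URL_PATTERN.search(text) is not None
    # Rule: (some keyword AND a URL) OR at least two distinct keywords.
    if (matched and has_url) or len(matched) >= 2:
        return "spam"
    return "other"
```

Every change in the target category requires an expert to edit such rules by hand, which is the cost that the example-driven approach discussed next tries to avoid.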

To date, however, the most widespread techniques are those that exploit what is made available by modern machine learning [22]: an algorithm is provided with a series of examples of texts classified by experts, and this returns a mathematical model capable of classifying new texts. Most academic efforts also tend to focus on this technique. The advantages are first and foremost in effectiveness: accuracy is much higher than that obtained through rules-based approaches and is for some problems comparable to that of a human classifier. Furthermore, it is usually much easier and faster for an expert to categorize sample texts than to define, together with a computer scientist, the rules necessary for the categorization: for this there are also economic advantages in terms of the expert’s working time. Furthermore, any refinements or updates of the classifier can be carried out systematically, through new sets of examples.

Recently, new text analysis tools are catching attention, not so much related to the extraction of specific characteristics of the text, but to some status of its author. This definition includes those inquiries by their nature aimed at the subject, such as the analysis of the writer’s opinions and his feelings towards the object of the text. These two objectives, partly overlapping, are known in the literature as Opinion Mining and Sentiment Analysis, respectively. A third problem, in some ways similar and derivative, is the detection of the agreement, or the measure of the degree of agreement between two authors.

In recent years, the development of the Web has offered numerous possibilities for applying these techniques [23]. In fact, the large amount of textual content containing personal opinions of the authors has allowed several research ideas. Ordering these documents for the opinions they express offers several practical possibilities: For example, we could search for the keywords that are most present in negative reviews of a product, before buying it or to improve its sales strategy. Or, we may automatically have a concise assessment of a blog or comment author’s opinion. Furthermore, on a larger scale, it is possible to hypothesize search engines for reviews, which find, classify, and present textual content present on the web that give opinions on a certain object searched for [11].

All these objectives therefore presuppose the identification of subjective contents expressed in a text. The problem is often broken down into two distinct sub-problems:

  • determine whether such subjective contents exist, that is, distinguish objective texts from subjective texts

  • identify the polarity of the sentiment present in subjective texts (positive, neutral, or negative) (Fig. 1).

Fig. 1. Extraction of users’ opinion from social networks.

An objective text is the opposite of a subjective text, and a text with a negative feeling is the opposite of one with a positive feeling; when distinguishing between several topics, however, no topic is the opposite of another. Furthermore, unlike topic, the polarity of sentiment can be framed as a regression problem. For example, we can establish a scale in which −10 corresponds to a negative feeling and 10 to a positive one. Although it is useful to note this difference with respect to other text classification problems, this does not mean that a regression-based approach is the best. On the contrary, the problem becomes more tractable when framed as a multiclass problem with the classes negative, neutral, and positive. Each of these classes typically has a specific vocabulary, different from that of the contiguous classes. It is also important to note that the neutral class (to which we can associate the value 0) does not express the same concept as the absence of subjectivity [13].

The analysis of textual data, within the new Big Data discipline, represents one of the most important horizons in terms of volume and relevance of the information obtainable, and is, in fact, one of the fields in which researchers and companies are currently concentrating their efforts. This interest stems from the fact that, while systems and methods are available to analyze non-textual data, the same cannot be said for textual data. This delay is understandable: tools were first developed to analyze the data historically available, that is, data in structured and numerical form. Furthermore, textual data has acquired real importance only in recent years, thanks to the widespread use of smartphones and the massive entry of social networks into everyday life [12]. The goal today lies precisely in being able to interpret this huge amount of data, generated every day, and extract useful information from it. In general, all industries can benefit from text data analysis. In any case, by textual analysis we do not mean the simple identification of keywords and their frequency, but a much more in-depth activity whose results can be much more precise and useful.

2.2 Extracting Social Networks Information

Social Networks are certainly among the most important phenomena of the contemporary era from a technological and social point of view. The most popular social networks, such as Twitter and Facebook, have revolutionized the way in which a very large and heterogeneous part of the population interacts, communicates, works, learns, and spreads news or, more simply, fills the time during a break or a journey, perhaps by train or bus. Social Networks are virtual platforms that allow us to create, publish, and share user-generated content. It is this last feature that allows us to distinguish social media and Content Communities from Social Networks, that is, platforms where users can share specific content with other members of the community.

For a virtual platform to be correctly called a Social Network, three conditions must be met:

  • there must be specific users of the platform in question

  • these must be linked together

  • there must be the possibility of interactive communication between the users themselves.

To give an example, Wikipedia is a social medium, since its users are not connected to each other; YouTube is a Content Community, since users are connected to each other but external people can also access the contents; Twitter and Facebook are Social Networks, since they satisfy all three of the previous conditions. The most interesting aspect of Social Networks and social media is their ability, in addition to creating completely new and entirely digital relational networks, to generate content, and it is this last characteristic that makes these platforms so interesting. Moreover, we must always keep in mind the importance that these tools are having for social evolution and daily behavior. Consider that about 59% of the world population is now active on Social Networks or social media and that some events, political or cultural, can generate large volumes of interesting data in a few hours.

In recent years, several researchers have used sentiment analysis to extract the opinions of users from social networks. West et al. [24] proposed a model based on Markov random fields for text sentiment analysis. Wang et al. [25] applied data mining to detect depressed users of social networks. They first adopted a sentiment analysis method that uses a hand-crafted vocabulary and rules to calculate each blog’s inclination to depression. Next, they developed a depression detection model based on the proposed method and 10 characteristics of depressed users derived from psychological research. Zhou et al. [26] studied customer reviews after a purchase to manage loyalty. Satisfaction, trust, and promotion efforts were adopted as the inputs of the model and the consumer’s repurchase intention as the output. Five sportswear brands were analyzed by extracting the opinions expressed in the reviews to determine consumers’ intention to repurchase products. In addition, the relationship between the initial purchase intention and the repurchase intention was compared to guide the marketing strategy and brand segmentation. Contratres et al. [27] proposed a recommendation process that includes sentiment analysis of textual data extracted from Facebook and Twitter. Recommendation systems are widely used in e-commerce to increase sales by matching product offerings with consumer preferences. For new users, however, there is no information on which to base adequate recommendations. To address this issue, the texts published by the user on social networks were used as a source of information. The valence of the emotion in a text must nevertheless be considered in the recommendation, so that no product is recommended based on a negative opinion.

Wang et al. [28] tried to extract sentiment from images posted on the Internet based on both image characteristics and contextual information from social networks. The authors demonstrated that neither visual nor textual characteristics are in themselves sufficient for accurate sentiment labeling. They then leveraged both kinds of information by developing sentiment prediction scenarios with supervised and unsupervised methodologies. Kharlamov et al. [29] proposed a text analysis method that exploits a lexical mask and an efficient clustering mechanism. The authors demonstrated that cluster analysis of data from an n-dimensional vector space using the single linkage method can be considered a discrete random process, whose trajectories are defined by sequences of minimum distances. Vu et al. [30] developed a lexicon-based method using sentiment dictionaries with a heuristic data preprocessing mode; this methodology surpassed more advanced lexicon-based methods. Automated opinion extraction from online reviews is not only useful for customers seeking advice, but also necessary for businesses to understand their customers and improve their services.

Liu et al. [31] proposed a deep multilingual hierarchical model that exploits a regional convolutional neural network (CNN) and a bi-directional LSTM network. The model obtains the temporal relationship between the different sentences in the comments through the regional CNN, and obtains the local characteristics of the specific aspects in a sentence and the distance dependence across the entire comment through the hierarchical attention network. In addition, the model improves the gate mechanism-based word vector representation to make the model completely language independent. Li et al. [32] used public opinion texts on specific events from social networking platforms and combined textual information with sentiment time series to obtain a multi-document sentiment prediction. Considering the interrelated characteristics of different social user identities and time series, the authors implemented a time + user dual attention mechanism model to analyze and predict textual public opinion information. Hung et al. [33] applied machine learning methods to analyze data collected from Twitter. Using tweets sourced exclusively from the United States and written in English during the 1-month period from March 20 to April 19, 2020, the study looked at discussions related to COVID-19. Social network and sentiment analyses were also conducted to determine the social network of dominant topics and whether the tweets expressed positive, neutral, or negative feelings. A geographical analysis of the tweets was also conducted.

2.3 Recurrent Neural Network

In problems with interacting dynamics, the intrinsically unidirectional structure of feedforward networks is highly limiting. However, it is possible to start from it and create networks in which the results of the computation of one unit influence the computational process of another. The algorithms based on this new network structure converge in new ways compared to the classic models [19]. A recurrent neural network (RNN) is based on the artificial neural network model but differs from it in the presence of two-way connections. In feedforward networks the connections propagate the signals only in the direction of the next layer. In recurrent networks this communication can also take place from one layer to the previous one, between neurons of the same layer, or between a neuron and itself [20]. This change in the architecture of the neural network affects the decision-making process: the decision made at one instant affects the decision that will be taken at the next.

Fig. 2. RNN architecture with indications of bidirectional flows between layers - unfolding of a recurrent network.

In a recurrent neural network, the present and the recent past both contribute to determining the response of the system, a common feature of human decision-making. The difference from feedforward networks lies in the feedback circuit connected to past decisions: the output of a layer is added to the input of a previous layer, characterizing its processing. This feature gives recurrent networks a memory, which allows them to use information already present in the sequence itself to perform tasks precluded to traditional feedforward networks. The information in memory is accessed by content, and not by location as is the case with a computer’s memory. The information collected in the memory is processed in the next layer and then sent back to its origin in modified form. This information can circulate several times, gradually decaying; information that is crucial for the system, however, can be kept by the network without attenuation over several cycles, for as long as the learning process considers it influential. Figure 2 shows an RNN architecture with indications of bidirectional flows between layers.

The RNN architecture shown in Fig. 2 requires that the weights of the hidden layer be regulated based on the information provided by the neurons from the input layer and by the processing obtained from the neurons of the hidden layer that have been activated. It is therefore a variant of the architecture of an artificial neural network (ANN), characterized by a different arrangement of the data flow: In the RNN the connections between the neurons combine in a cycle and propagate in the successive layers to learn sequences. In the network shown in Fig. 3, the so-called unfolding of the structure is performed to obtain a feedforward version of the network of arbitrary length which depends on a sequence of inputs. The weights and biases of a layer are shared, and each output depends on the processing by the network of all inputs. The number of layers of the unfolded network essentially depends on the length of the sequence to be analyzed.

Fig. 3. Unfolding of a recurrent neural network.

What distinguishes an RNN from a feedforward network is therefore the sharing of weights and biases between the elements of the sequence. The information stored within the network represents a pattern that temporally binds the elements of the series that the RNN analyzes. In Fig. 2 each input of the hidden layer is connected to the output, but it is possible to mask part of the inputs or part of the outputs to obtain different combinations. For example, it is possible to use a many-to-one RNN to classify a sequence of data with a single output, or to use a one-to-many RNN to label the set of subjects present in an image, as shown in Fig. 4.
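The weight sharing and the many-to-one configuration described above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a tanh hidden activation and a single sigmoid output; the function name and all parameter names are hypothetical and do not come from the chapter.

```python
import math


def rnn_many_to_one(inputs, w_xh, w_hh, b_h, w_hy, b_y):
    """Run a simple tanh RNN over a sequence and return one sigmoid output.

    inputs: list of input vectors, one per time step
    w_xh:   input-to-hidden weights (one row per hidden unit)
    w_hh:   hidden-to-hidden recurrent weights (one row per hidden unit)
    b_h:    hidden bias vector
    w_hy:   hidden-to-output weight vector
    b_y:    output bias (scalar)
    """
    hidden = [0.0] * len(b_h)  # initial state
    for x_t in inputs:  # each iteration is one "unfolded" layer
        new_hidden = []
        for i in range(len(b_h)):
            total = b_h[i]
            total += sum(w * x for w, x in zip(w_xh[i], x_t))
            total += sum(w * h for w, h in zip(w_hh[i], hidden))
            new_hidden.append(math.tanh(total))
        hidden = new_hidden  # the same weights are reused at the next step
    logit = b_y + sum(w * h for w, h in zip(w_hy, hidden))
    return 1.0 / (1.0 + math.exp(-logit))  # single output for the whole sequence
```

Because the state is carried from one step to the next, feeding the same elements in a different order generally produces a different output, which is what lets the network capture temporal patterns.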

Fig. 4. a) One-to-many RNN architecture; b) Many-to-one RNN architecture.

During the input processing phase, RNNs keep track in their hidden layers of information on the history of all past elements of the sequence, that is, of the previous instants of time. If the outputs of the hidden layers at different times of the sequence are regarded as the outputs of different neurons of a deep multi-layer neural network, it becomes easy to apply backpropagation to train the network. However, although RNNs are powerful dynamic systems, the training phase is often problematic because the gradient obtained with backpropagation either grows or shrinks at each discrete time step, so that after many steps it can either become too large (explode) or become negligible (vanish).
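The effect can be made concrete with a deliberately simplified scalar example. It assumes that the gradient is multiplied by the same constant factor at every time step, which is a toy stand-in for the product of per-step derivatives in a real network; the function name is hypothetical.

```python
def gradient_scale(recurrent_factor, steps):
    """Scale of a gradient after being multiplied by one factor at every step."""
    scale = 1.0
    for _ in range(steps):
        scale *= recurrent_factor  # one multiplication per unfolded time step
    return scale
```

With a factor of 0.5 the scale after 50 steps is below 1e-12, while with a factor of 1.5 it exceeds 1e8: small deviations from 1 are enough to make the gradient vanish or explode over long sequences.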

3 Data Processing, Results, and Discussion

WhatsApp is a free messaging application used to keep in touch with friends. The fact that it is free and easy to use has made it the most popular instant messaging application. Creating groups is one of the main ways to exploit the potential of WhatsApp: within a group, dialogue can be a useful tool for exchanging information and focusing users on a certain topic. These features have made the application very popular among students, who create groups by class, by topic, or by sports team. To begin, the WhatsApp chats of different school groups were extracted, creating datasets in .csv format. The messages were then cleaned of special symbols, stray characters, and emoticons, which can lead to a wrong classification. To avoid this, special symbols and emoticons were replaced by their meaning. The next operation involved labeling each message as belonging to one of two classes: positive or negative. To ensure sufficient generalization capacity for the algorithm, about 1000 messages were collected, taking care to distribute them as evenly as possible between the two classes.
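The cleaning step described above might look like the following sketch. The emoticon dictionary, the function name, and the choice to lowercase the text are assumptions made for illustration; the chapter does not list the actual mapping used.

```python
import re

# Hypothetical emoticon dictionary; a real one would be much larger.
EMOTICON_MEANINGS = {
    ":)": "happy",
    ":(": "sad",
    ":D": "laughing",
    "<3": "love",
}


def clean_message(text):
    """Replace known emoticons by a word, then drop remaining special symbols."""
    for emoticon, meaning in EMOTICON_MEANINGS.items():
        text = text.replace(emoticon, f" {meaning} ")
    text = re.sub(r"[^\w\s]", " ", text)  # remove leftover punctuation and symbols
    return re.sub(r"\s+", " ", text).strip().lower()  # normalize spaces and case
```

Replacing the emoticon before stripping symbols matters: if the order were reversed, the sentiment carried by ":(" would be lost instead of being turned into the word "sad".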

Before processing the data, it is necessary to subdivide them appropriately [34]. This procedure is needed to avoid overfitting the model to the data provided as input. The purpose of a classification model is to classify correctly an occurrence never seen before by the model. To be sure that the model can do this, its performance must be evaluated on data that have never been presented to the model [35]. The original data with the labeled examples were therefore partitioned into two distinct sets, a training set and a test set. The classification model is trained using the training data, while its performance is evaluated using the test set. The proportion of data reserved for each phase was set at 70% for training and the remaining 30% for testing. This subdivision was made randomly. The quality of the classifier is then evaluated based on the accuracy it achieves on the test data [36, 37].
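A random 70/30 partition of this kind can be sketched with the standard library alone. The function name and the fixed seed are assumptions added for reproducibility of the illustration; the chapter does not say how the shuffle was seeded.

```python
import random


def train_test_split(examples, train_fraction=0.7, seed=0):
    """Shuffle labelled examples and split them into training and test lists."""
    shuffled = list(examples)  # copy, so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]
```

For about 1000 labelled messages this yields roughly 700 training and 300 test examples, with no message appearing in both sets.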

A preliminary step in any computational processing of the text is its tokenization. Tokenizing a text means dividing the sequences of characters into minimal units of analysis called tokens. The minimum units can be words, punctuation, dates, numbers, abbreviations, etc. Tokens can also be structurally complex entities, but they are nonetheless assumed as a base unit for subsequent processing levels. Depending on the type of language and writing system, tokenization can be an extremely complex task. In languages where word boundaries are not explicitly marked in writing, tokenization is also called word segmentation [38].
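For a language with explicit word boundaries, a rough tokenizer can be written with a single regular expression. The pattern below is an illustrative assumption, not the tokenizer used in the study: it keeps numbers such as "10.30" and contractions such as "Don't" together and emits each punctuation mark as its own token.

```python
import re

# Alternatives, tried in order: numbers with separators, words with an
# optional apostrophe part, and any single non-space symbol.
TOKEN_PATTERN = re.compile(r"\d+(?:[.,]\d+)*|\w+(?:'\w+)?|[^\w\s]")


def tokenize(text):
    """Split a text into word, number, and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)
```

Real chat messages contain abbreviations, elongated words, and mixed scripts, so a production tokenizer would need considerably more cases than this sketch covers.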

Another preliminary operation concerns the removal of the so-called stopwords. Stopwords are common words in a text that do not relate to a specific topic. Articles, prepositions, conjunctions, and pronouns are typical examples of stopwords. These words can be found in any text regardless of the subject matter. They are called stopwords because they are eliminated in the search processes of a search engine, since they consume a lot of computational resources and do not add any semantic value to the text [39].

The last preliminary operation concerns stemming, a term used to name the linguistic process that aims to eliminate the morphological variations of a word, bringing it to its basic form [40].
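Stopword removal and stemming can be illustrated with the toy functions below. The stoplist and the suffix list are tiny, hypothetical examples for English, and the suffix stripper is a crude stand-in for a real stemming algorithm such as Porter's, not the procedure used in the study.

```python
# Tiny illustrative stoplist; real stoplists are language-specific and longer.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "you"}
# Suffixes tried in order, longest first.
SUFFIXES = ("ing", "ed", "es", "s")


def remove_stopwords(tokens):
    """Keep only the tokens that are not in the stoplist."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


def naive_stem(word):
    """Strip one common suffix, leaving at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word
```

With these definitions, "insulting", "insulted", and "insults" all reduce to "insult", so the classifier sees a single vocabulary entry instead of three.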

Table 1. Sentiment analysis algorithm based on RNN.

In summary, in the preliminary phase the lexical analysis of the messages is carried out, in which the tokens are extracted, that is, all the sequences of characters delimited by a separator. Then the stopwords are removed, that is, all those words that are very frequent but whose informative content is not relevant. Usually they are articles, conjunctions, prepositions, and pronouns, and they are listed in appropriate stoplists, which vary depending on the language considered. After removing the stopwords, we move on to the stemming phase, in which words are reduced to their respective linguistic roots, thus eliminating the morphological variations. The next step concerns the composition of terms and the formation of groups of words. In fact, some terms, if grouped, improve the expressiveness of the associated concept or in some cases express a different concept from the individual words that compose them. Table 1 shows the algorithm used in this work.

For the setting of the classification model of messages extracted from WhatsApp chats, we used the sequential model of the Keras library. Keras is an open-source neural network library written in Python. It can run on different backend frameworks. Designed to allow rapid experimentation with deep neural networks, it focuses on being intuitive, modular, and extensible [41].

Five classes were imported: Sequential, Embedding, SimpleRNN, Dense, and Activation. The Sequential class is used to define a linear stack of network layers that make up a model. The Embedding layer is used to transform positive integers into dense vectors of fixed size; it can only be used as the first layer in a model. The SimpleRNN layer is used to add a fully connected RNN. The Dense class is used to instantiate a Dense layer, which is the basic fully connected feedforward layer. The Activation layer is used to add an activation function to the sequence of layers. A sigmoid activation function is used, which produces the characteristic S-shaped curve. It is one of the earliest and most often used activation functions.
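A stack built from the five classes named above could be assembled roughly as follows. This is a sketch, not the authors' code: the function name and the three hyperparameter values (vocabulary size, embedding dimension, number of recurrent units) are assumptions, since the chapter does not report them.

```python
def build_sentiment_rnn(vocab_size=10000, embedding_dim=32, rnn_units=32):
    """Assemble Embedding -> SimpleRNN -> Dense(1) -> sigmoid with Keras."""
    # Imported lazily so this file can be loaded on machines without Keras.
    from keras.models import Sequential
    from keras.layers import Embedding, SimpleRNN, Dense, Activation

    model = Sequential()
    # Map each integer word index to a dense vector (must be the first layer).
    model.add(Embedding(input_dim=vocab_size, output_dim=embedding_dim))
    # Fully connected recurrent layer; only its last output is passed on.
    model.add(SimpleRNN(rnn_units))
    # One output unit, squashed to (0, 1) for the positive/negative decision.
    model.add(Dense(1))
    model.add(Activation("sigmoid"))
    return model
```

Calling `build_sentiment_rnn()` requires Keras to be installed. By default `SimpleRNN` returns only the output of the last time step, which matches the many-to-one configuration used for classifying a whole message.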

In the compile procedure we set the loss, the optimizer, and the evaluation metric. As loss function, we used binary_crossentropy, which is especially suited to binary classification problems. This loss function computes the cross-entropy loss between true labels and predicted labels. The RMSProp optimizer was used, and accuracy was used as the metric for performance evaluation. The RMSProp optimization algorithm maintains a moving average of the square of the gradients and divides the gradient by the root of this average. The accuracy is the percentage of correct predictions on a test dataset, that is, the ratio of the number of correct predictions to the total number of input samples. It works well if there is a similar number of examples belonging to each class.
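To make these three ingredients concrete, the sketch below re-implements them in plain Python for scalar values. It is an illustration of the definitions given above, not the Keras implementation; the function names and the default values for the learning rate, decay, epsilon, and decision threshold are assumptions.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def rmsprop_step(weight, grad, avg_sq, lr=0.001, rho=0.9, eps=1e-7):
    """One RMSProp update for a single scalar weight."""
    # Moving average of the squared gradients.
    avg_sq = rho * avg_sq + (1.0 - rho) * grad * grad
    # Gradient divided by the root of that average.
    weight -= lr * grad / (math.sqrt(avg_sq) + eps)
    return weight, avg_sq


def accuracy(y_true, y_prob, threshold=0.5):
    """Ratio of correct predictions to the total number of samples."""
    correct = sum(1 for y, p in zip(y_true, y_prob) if (p >= threshold) == bool(y))
    return correct / len(y_true)
```

In Keras the same three choices are passed by name, for example `model.compile(loss="binary_crossentropy", optimizer="rmsprop", metrics=["accuracy"])`.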

After training the model on the training data, we evaluated its performance on the test set, which the model had never seen before. The model returned approximately 85% accuracy, indicating that an RNN-based model is capable of classifying the polarity of a message correctly in most cases.

4 Conclusion

Cyberbullying is becoming a real social problem and given the young age of the people involved it requires a lot of attention from adults. Young people are now making massive and sometimes excessive use of telematic communication channels. These channels do not have an appropriate control of the contents of the conversations due to the constraints imposed by the respect of privacy. But given the weight assumed by such conversations in the lives of children, it is necessary to think of methodologies that can guarantee vigilance without compromising the freedom of children to have spaces for socialization. Automatic identification of cyberbullying acts on social networks can help set up support policies for victims. In this study, a method based on sentiment analysis was proposed with the use of recurrent neural networks for the identification of the polarity of the message contents of the popular WhatsApp messaging app. The results showed that this methodology can represent a tool for monitoring the contents of conversations between young people.