1 Introduction

Social networks have become one of the main forms of social media where citizens can express opinions on different matters (Ayala, 2014). Twitter, Facebook and Instagram are among the most popular social networks. Twitter had 217 million daily active users by February 2022 (Twitter by the numbers: stats, 2022). Facebook had 2.934 billion monthly active users by July 2022. However, the total number of people who use this network each month went down by roughly 2 million (−0.07%) during the three months prior to July (Facebook Statistics and Trends, 2022). Instagram had over 1.44 billion monthly active users by August 2022 (The Small Business Blog, n.d.).

On Facebook, individuals can stay in touch with each other, receiving e-mails or alerts on their cell phones whenever someone posts a status update (Amidon, 2011). Instagram is used to share both photos and videos, and it also provides tools for modifying pictures (Jackson and Luchner, 2018). Twitter users have been able to post messages of up to 280 characters each since the end of September 2017 (previously the limit was 140 characters). This limit shapes many of the attributes of the language used in tweets (Farzindar & Inkpen, 2015). Twitter users send messages about any topic within the aforementioned limit and lead others to receive their tweets. In addition to posting, Twitter users can reply, retweet and quote (Footnote 1). Neither Facebook nor Instagram allows for a similar textual exchange of messages; instead, they allow annotations to be made on the content. Twitter, which requires its users to be 13 years of age or older, is a tool that is used not only for disseminating information, but also for raising awareness about issues affecting society (Quadri et al., 2018).

Compared to the information handled on Instagram or Facebook, Twitter data is more readily available through a robust Application Programming Interface (API) (Macy et al., 2015; Weller et al., 2014). This allows researchers to obtain a significant volume of tweets with extremely useful metadata, such as the type of tweet, the longitude and latitude (Weber et al., 2021), or another textual description of the location from which the tweet was posted. The hashtags used in the tweets and the relations between tweets, in addition to other data, can also be retrieved (Weber et al., 2021).

The study of social networks makes it possible to explore social trends and behaviors, as well as to help to create a collective image about different realities. Social networks are nowadays a highly disseminated loudspeaker for all kinds of events and news (Beykikhoshk et al., 2015; Karami et al., 2018). These platforms have a great deal of potential in facilitating guided decision-making by governments and institutions.

In this paper we use the social network Twitter to examine the public perception of education systems in different countries.

1.1 Sentiment analysis

Classifying and quantifying the sentiments expressed in social networks allows researchers to analyse opinions that are often not expressed in more formal settings. This is especially apparent when the points of view relate to sensitive issues that are affected by social desirability. The emotional dimension of a controversial issue often develops more dynamically and unpredictably in a social network than in an informal face-to-face discussion (Kušen et al., 2017). Sentiment analysis thus becomes a relevant resource both for capturing the pulse of society on certain matters and for raising awareness and predicting future behaviour (Kalyanam et al., 2016).

Sentiment analysis is a widely studied topic (Mohammad et al., 2018; Kharde & Sonawane, 2016), used in areas as diverse as the stock market, the healthcare system, and business, among others. In the stock market area, Bharathi and Geetha (2017) combine the sentiments of ordinary people, gathered from their news feeds, with data from the Index of the Bombay Stock Exchange in order to anticipate the behavior of the stock market. A model has also been suggested that forecasts stock prices by combining a Self-Organizing Map (SOM) with fuzzy Support Vector Machines (f-SVM) (Naseem et al., 2018). Uhr et al. (2014) apply sentiment analysis to financial market news, implementing a software prototype that computes results at different levels of exploration by splitting text into sentences and subsets of tokens from sentences; the SentimentWortschatz (SentiWS) software (Remus et al., 2010) was used. Kolasani and Assaf (2020) predict the future movement of the United States’ stock market by analysing the sentiment of Twitter posts related to the stock market, where the average sentiment value of the tweets is obtained using a Support Vector Machines (SVM) model. In the healthcare field, Balakrishnan et al. (2021) examine the effectiveness of deep learning methods in identifying sentiment changes in breast cancer patients’ narratives. A sentiment change study was carried out to estimate the change in the satisfaction level of the patients; a bidirectional Long Short-Term Memory (LSTM) deep learning model with sentiment embedding features provided the best result. Hopper and Uriyo (2015) used sentiment analysis to review patient feedback for a group of gynecologists, where time-to-next-complaint procedures were used. In the business area, Suerdem and Kaya (2015) emulate conventional attitude and market research tools by using Naïve Bayes and Support Vector Machines models in the Turkish language context. Rajkumar et al. (2019) use Naïve Bayes and Support Vector Machines models to categorize product reviews as positive or negative.

Specifically, sentiment analysis of Twitter data has been used in a wide range of areas related to public trust, ranging from predicting election outcomes to examining resentment against government policies (Arcila-Calderón et al., 2017; Calderón et al., 2015; Blasco and Coenders, 2020). It has also been used in the study of migration (Arguedas et al., 2020; Arcila-Calderón et al., 2020), disability (Gómez-Marí et al., 2022) and diseases caused by the Ebola virus (Percastre-Mendizábal et al., 2019) or SARS-CoV-2 (Sued & Cebral, 2020).

Sentiment analysis is a procedure that automatically detects attitudes, opinions, or emotions in text. Several approaches can be used to conduct the analysis (Vohra & Teraiya, 2013):

  • A knowledge-based procedure, in which a lexicon of words conveying positive or negative perceptions is built. Some heuristics can also be used to add more words to the initial lexicon (Musto et al., 2014; Bonta et al., 2019; Mehmood & Balakrishnan, 2020).

  • A relationship-based approach, in which the relationships between certain parts of the text, whose sentiment was previously analysed, are used to identify the opinions present in the principal unit. Additionally, all the connections that can exist between elements in relation to their meaning can be examined (Alrasheed, 2021; Janda et al., 2019).

  • Language models implementation. The majority of existing lexicons contain only unigrams along with their sentiment scores. It has been observed that sentiment n-grams formed by combining unigrams with intensifiers or negations sometimes show improved results in the sentiment classification task. This has led to the development of new methods that process frequent n-grams as features in addition to single words (Dey et al., 2018; Walaa et al., 2014; Kaur et al., 2021).

1.2 Motivation and objectives of the study

Our purpose is to answer the following research questions: What is the predominant perception of the education system? Are there differences in these sentiments between the sexes? To answer them we use Twitter. This is due both to the unique characteristics of this social network and to the evidence suggesting that it has been one of the most studied social networks in academic research (Weller et al., 2014; Agarwal et al., 2011; Kharde & Sonawane, 2016). The goals of this research are:

  1. To identify the sentiments expressed on Twitter about education systems, both globally and by gender. The perceptions of education are examined globally on Twitter, considering those countries from which more than 50 tweets were posted and for which the gender of the author and the country of origin of the tweet could be identified during the analysis period.

  2. To detect commonalities and differences between the countries with the highest number of tweets.

  3. To compare the results of points 1 and 2 with reports that may have been carried out at an international level on the perception of the education system by educators and users.

Concerning perceptions of the education system, various survey-based studies have been performed with the aim of gaining in-depth knowledge of students’ perceptions of the educational system. The World Innovation Summit for Education (WISE 2020) published a comprehensive survey of young people’s views (Footnote 2) on their education and future. Also, in the PISA 2018 Report (OECD, 2019a), 15-year-old students were asked, for the first time, how they normally feel about their lives. While these studies focus on the perception of the students themselves, our research covers a much wider age range (the younger and the adult populations expressing their opinion on Twitter).

It should be noted that determining the perceptions that students and teachers have of the education system has been the purpose of several pieces of research. These have aimed to solve problems related to the quality of teaching and learning, as well as issues of unemployment and job dissatisfaction among university graduates. WISE (2020) and Ipsos Group S.A (2021) examined the perception of the students themselves towards the education system. Existing research often uses structured empirical surveys collecting qualitative and quantitative information (Safari & Barigye, Sydney et al., 2021; Kartal et al., 2015).

Public opinions on education, as well as the influence of the media on such opinions, have also been analysed theoretically (Benjamin, 1993). That work called for greater and more effective publicity about the role played by educational centers in society (Benjamin, 1993). A recently published analysis (Cheeti, 2021) used Twitter sentiment analysis to explore the impact of the COVID-19 pandemic on education. However, the geographical dimension was not considered. In our investigation, an analysis by country of origin of the posted tweet, as well as by gender, is carried out.

With respect to gender inequalities, UNESCO (n.d.) states that significant gender differences exist in access, school performance and continuity in education. Girls are at a disadvantage compared to boys in many countries. UNESCO points out that 16 million girls will never receive an education (UNESCO, n.d.). Geographic isolation, poverty, minority status, disability, early marriage and pregnancy, gender-based violence and traditional attitudes about the status and role of women are among the main causes of inequality (UNESCO, n.d.). Given the above, it is relevant to know whether gender differences exist in perceptions of the education system. Dissimilarities between the opinions of men and women have already been detected on various social issues, such as the most important environmental problems (Dreiling & Belkhir, 1997; Mouronte-López & Subirán, 2022), the COVID-19 epidemic (Galasso et al., 2020), corruption (Bauhr & Charron, 2020), or leisure (Fontenelle & Zinkhan, 1993).

2 Materials and methods

2.1 Overview of used resources

2.1.1 Software programs

T-Hoarder tool

T-Hoarder tool (Congosto et al., 2017) was used for the downloading of data from Twitter. This tool is a piece of software that carries out tweet crawling, data filtering, and shows a synopsis of information about Twitter activity on a certain topic. The tool is available for the UNIX operating system, having been developed in Python language (Van Rossum & Drake, 2009).

T-Hoarder uses two APIs provided by Twitter to retrieve data: the REST (REpresentational State Transfer) API and the Streaming API. The REST API makes it possible to perform all kinds of queries on Twitter data synchronously. With this API, only tweets posted in the week prior to the time of the query can be obtained (Congosto et al., 2017).

The Streaming API establishes a socket (Footnote 3) between Twitter and a server through which information is received asynchronously. This API allows us to download tweets in real time. T-Hoarder uses the Tweepy library (Twitter, n.d.), which handles low-level access to the aforementioned API (Congosto et al., 2017).

Using T-Hoarder we download tweets both synchronously and asynchronously, through the REST and the Streaming APIs. With T-Hoarder, the information shown in Table 1 can be obtained for each tweet.

Table 1 Information provided by T-Hoarder for each tweet (Congosto et al., 2017)

In-house software

Several programs were developed in R (n.d.) and Python programming languages, where the following features were implemented:

Data processing

This feature includes all the steps indicated below. More detail on the procedures used can be found in the Overview of used methods Section.

  • Elimination of all duplicate tweets, exclusion of replied, retweeted and quoted category interactions since they did not contain personal opinions. Only tweeting type messages were considered.

    In order to optimise the processing, similarly to Mouronte-López and Subirán (2022), certain substitutions were made: all URLs were replaced by the word “LINK”, the word "USER" took the place of all users’ mentions, while hashtags were replaced by the word “HASHTAG”. Emojis were also translated into text since they contain information about the sentiment of the tweet. The punctuation marks, articles, conjunctions, and words with a length lower than 3 letters were removed since they do not provide any such information. A reduction of words to their root term (stemming) was performed. Additionally, all inflected forms of the same word were replaced by its lemma (lemmatization). The correction of misspelt words was also carried out. Finally, the text was converted into lowercase.

    The program that carried out the mentioned functionalities was implemented in Python language. NLTK and pandas libraries were used.

  • Analogously to Mouronte-López and Subirán (2022), for each processed tweet, the gender of its author was obtained. The longitude and latitude, as well as the textual description of the location indicated in each tweet, were also used to determine the country from which each tweet was posted.

    The program that implemented this feature was developed in Python language. The Nominatim geolocation service and the online service genderize.io were used.

  • For each tweet, the sentiment corresponding to the processed text was obtained. This program was also implemented in Python language. The TextBlob (Footnote 4) library was utilised.

Performing statistical calculations

The following statistical calculations were carried out:

  • Percentage of tweets with positive/negative sentiment in each country.

  • Mean, standard deviation, median and mode of both the sentiment and the subjectivity exhibited by the tweet text in each country.

  • Construction of a graphical representation of the information in two-dimension (heat-map). Several colors were used to show different levels of analogy between countries, considering a specific similarity metric.

  • Obtaining the most frequently used words.

This program was implemented in R language. The R.utils library was used.
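The calculations listed above can be illustrated with a minimal sketch. The authors implemented them in R; the version below is written in Python only to keep one language across the examples, and it is not their code. The function names, the `(country, polarity)` record shape, and the choice of the population standard deviation are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from statistics import mean, median, multimode, pstdev


def summarise_by_country(records):
    """Summarise polarity values per country.

    `records` is an iterable of (country, polarity) pairs; this shape is
    a hypothetical one chosen for the sketch.
    """
    by_country = defaultdict(list)
    for country, polarity in records:
        by_country[country].append(polarity)

    summary = {}
    for country, values in by_country.items():
        n = len(values)
        summary[country] = {
            "n": n,
            # Share of tweets with strictly positive / negative polarity.
            "pct_positive": 100.0 * sum(v > 0 for v in values) / n,
            "pct_negative": 100.0 * sum(v < 0 for v in values) / n,
            "mean": mean(values),
            # Population standard deviation (an assumption; the paper
            # does not say which estimator was used).
            "std": pstdev(values),
            "median": median(values),
            # multimode returns every most-common value as a list.
            "mode": multimode(values),
        }
    return summary


def most_frequent_words(texts, k=10):
    """Return the k most frequent whitespace-separated words."""
    counts = Counter(word for text in texts for word in text.split())
    return counts.most_common(k)
```

The same `summarise_by_country` function could be applied to subjectivity values instead of polarity values, since it only assumes one numeric score per tweet.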

2.2 Overview of used methods

2.2.1 Downloading and processing of tweets

We downloaded only tweets in the English language posted from 3 March to 31 August 2021. This period was chosen because the project was granted in the first quarter of 2021 and results had to be obtained during the last quarter of that year.

For the analysis of sentiments about the education system, tweets including the phrases "education system" and "educational system" were chosen. As previously stated, Twitter users can perform various kinds of interactions. However, because we aimed to detect individuals’ sentiment, only tweets corresponding to the tweeting interaction are analysed.
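The selection rule described above can be sketched as follows. This is an illustrative filter, not the T-Hoarder implementation: the dictionary keys `"text"` and `"kind"`, the value `"tweet"` for original posts, and the case-insensitive substring match are all assumptions.

```python
# The two search phrases named in the text.
PHRASES = ("education system", "educational system")


def mentions_education_system(text):
    """Case-insensitive substring match against the search phrases."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in PHRASES)


def select_tweets(tweets):
    """Keep original tweets (no replies, retweets or quotes) on the topic.

    Each tweet is a dict with hypothetical keys "text" and "kind".
    """
    return [
        tweet
        for tweet in tweets
        if tweet.get("kind") == "tweet"
        and mentions_education_system(tweet.get("text", ""))
    ]
```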

As previously indicated, similarly to Mouronte-López and Subirán (2022), the tweets were processed so that emoticons, usernames, punctuation marks, and links were eliminated. The text was tokenized, stemmed and lower-cased, and only words longer than 2 letters were used for the analysis. Duplicate tweets, if any, were also removed in the procedure. After this process, 45,278 tweets were obtained for the analysis of the education system.
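A simplified sketch of this cleaning step is given below, using only the Python standard library. It is not the authors' program: the regular expressions are assumptions, digits and all non-ASCII characters are dropped along with punctuation, and stemming, lemmatization and spelling correction are omitted to keep the example dependency-free.

```python
import re

# Assumed patterns for links and user mentions.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")


def clean_text(text, min_length=3):
    """Remove links, mentions and non-letters, lower-case, drop short words."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.lower()
    # Anything that is not an ASCII letter or whitespace becomes a space.
    # This removes punctuation, emoticons, emojis and digits; a hashtag
    # keeps its word but loses the leading "#".
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = [token for token in text.split() if len(token) >= min_length]
    return " ".join(tokens)


def deduplicate(texts):
    """Drop exact duplicates while preserving the original order."""
    return list(dict.fromkeys(texts))
```

For instance, `clean_text("@bob The Education SYSTEM is broken!! :( https://t.co/abc #fail")` yields `"the education system broken fail"`; stop words such as "the" survive because this sketch does not remove them.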

2.2.2 Detecting the geographic location from which each tweet is sent and identifying the gender of its author

Analogously to Mouronte-López and Subirán (2022), the online service genderize.io was used to obtain the gender for each tweet automatically. It is an API that predicts the gender of a person from their name, which is recorded in the name field of each tweet.

To identify, based on geolocation data, the country from which each tweet was posted, the Open Street Map (OSM) Nominatim geolocation service was used. Similarly to Mouronte-López and Subirán (2022), we assess the tweeting location not only by inspecting the geolocation field but also by checking the location field in each tweet. This allowed us to maximise the number of tweets whose home country was identified.
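The two lookups can be sketched as request builders and response parsers. This is an assumption-laden illustration rather than the authors' program: taking the first token of the display name as the first name, the `min_probability` threshold, and all function names are hypothetical, and the response fields (`gender` and `probability` for genderize.io, `address.country` for Nominatim) reflect the public documentation of those services as best understood here and should be checked against it. No network call is made in the sketch.

```python
from urllib.parse import urlencode

GENDERIZE_ENDPOINT = "https://api.genderize.io"
NOMINATIM_REVERSE = "https://nominatim.openstreetmap.org/reverse"
NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def first_name(display_name):
    """Assumption: the first token of the display name is the first name."""
    parts = display_name.strip().split()
    return parts[0].lower() if parts else ""


def genderize_url(display_name):
    """Build the genderize.io query URL for a tweet author's name."""
    return GENDERIZE_ENDPOINT + "?" + urlencode({"name": first_name(display_name)})


def parse_gender(payload, min_probability=0.5):
    """Return "male"/"female", or None when the service is unsure.

    The probability threshold is a hypothetical choice for this sketch.
    """
    gender = payload.get("gender")
    if gender is None or payload.get("probability", 0.0) < min_probability:
        return None
    return gender


def reverse_geocode_url(latitude, longitude):
    """URL for resolving coordinates (the geolocation field) to an address."""
    query = {"lat": latitude, "lon": longitude, "format": "jsonv2"}
    return NOMINATIM_REVERSE + "?" + urlencode(query)


def search_url(location_text):
    """URL for resolving free text (the location field) to an address."""
    query = {"q": location_text, "format": "jsonv2",
             "addressdetails": 1, "limit": 1}
    return NOMINATIM_SEARCH + "?" + urlencode(query)


def parse_country(payload):
    """Extract the country name from a single Nominatim result object."""
    return payload.get("address", {}).get("country")
```

A caller would try `reverse_geocode_url` first when coordinates are present and fall back to `search_url` on the textual location, mirroring the two-field strategy described above.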

Of the 45,278 tweets downloaded referring to the education system, 43,050 included the author’s country of origin and gender. 19,726 were produced by women and 23,324 by men. The tweets corresponded to 16,070 unique users.

It should be noted that, for the analysis by country, only those countries with more than 50 tweets for which both the author’s gender and the home country could be identified were selected. For the study at a global level, we use all tweets.

2.2.3 Evaluating similarities between countries

Regarding the messages exchanged on Twitter, the analogy between two countries was estimated by assessing the similarity between the frequencies of the words used. It is assumed that if tweets from two countries are written in a similar way, they will tend to contain the same words with analogous frequencies. In order to evaluate this similarity, we use the Sim2Count metric, which can be defined as:

$$ Sim2Count = 1 - S_{C} $$
(1)

where

$$ S_{C}=\frac{\sum_{i=1}^{m} Country1_{i}\,Country2_{i}}{\sqrt{\sum_{i=1}^{m} Country1_{i}^{2}}\,\sqrt{\sum_{i=1}^{m} Country2_{i}^{2}}} $$
(2)

Country1_i and Country2_i denote the i-th components of the vectors Country1 and Country2, respectively, and m is the number of words considered.

Country1_i represents the frequency of the word i in Country1 (FreqCountry1Word_i) and Country2_i is the frequency of the word i in Country2 (FreqCountry2Word_i). This nomenclature is shown in Table 2.

Table 2 Frequency of utilised words for Country1 and Country2

The cosine similarity metric, SC, has been used frequently in order to examine similarities between texts (Lahitani et al., 2016; Ahmed et al., 2013).
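Equations (1) and (2) can be sketched directly in Python. The function names and the use of `collections.Counter` as the word-frequency vector are assumptions for illustration; words absent from one country are treated as having frequency zero there.

```python
from collections import Counter
from math import sqrt


def word_frequencies(texts):
    """Count whitespace-separated words over all tweets of one country."""
    return Counter(word for text in texts for word in text.split())


def cosine_similarity(freq1, freq2):
    """S_C of Eq. (2) for two word-frequency mappings."""
    vocabulary = set(freq1) | set(freq2)
    dot = sum(freq1.get(w, 0) * freq2.get(w, 0) for w in vocabulary)
    norm1 = sqrt(sum(v * v for v in freq1.values()))
    norm2 = sqrt(sum(v * v for v in freq2.values()))
    if norm1 == 0 or norm2 == 0:
        raise ValueError("cosine similarity is undefined for an empty vector")
    return dot / (norm1 * norm2)


def sim2count(freq1, freq2):
    """Sim2Count of Eq. (1): one minus the cosine similarity."""
    return 1.0 - cosine_similarity(freq1, freq2)
```

As defined by Eq. (1), two countries with identical word-frequency profiles give a Sim2Count of 0 (up to floating-point error), and two countries with no words in common give a value of 1.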

2.2.4 Identifying the polarity of tweets

As previously mentioned, the TextBlob library in Python was used to check the polarity of each tweet. From the processed text obtained for each tweet, this library provides two different values: polarity and subjectivity. The polarity lies in the [–1,1] interval and indicates the sentiment itself, that is, whether it is negative or positive. The subjectivity lies in the [0,1] interval and refers to the amount of personal opinion present in the sentence. The higher the value, the more subjectivity there is in the text (Mouronte-López & Subirán, 2022).
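A minimal sketch of how these two values might be read from TextBlob is shown below. The `sentiment` property with its `polarity` and `subjectivity` fields is part of TextBlob; the labelling rule (strictly positive or negative polarity, with an optional neutral band) and the function names are assumptions, since the paper does not state the thresholds it used.

```python
def label_polarity(polarity, neutral_band=0.0):
    """Map a polarity in [-1, 1] to a label.

    `neutral_band` is a hypothetical tolerance around zero.
    """
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


def analyse(text):
    """Return polarity, subjectivity and a label for one processed tweet."""
    # Imported here so that label_polarity can be used on its own
    # when TextBlob is not installed.
    from textblob import TextBlob

    sentiment = TextBlob(text).sentiment
    return {
        "polarity": sentiment.polarity,          # in [-1, 1]
        "subjectivity": sentiment.subjectivity,  # in [0, 1]
        "label": label_polarity(sentiment.polarity),
    }
```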

3 Results

Tables 3 and 4 show, for the tweets referring to the educational system, the number of unique users and the number of tweets, both in total and by sex. A total of 43,050 tweets, coming from 16,070 unique users, were analysed. Of these users, 60.82% were male and 39.18% female (9,773 men versus 6,297 women). Tables 5 and 6 show analogous information for each analysed country.

Table 3 For interactions related to the educational system, number of unique users
Table 4 For interactions related to the educational system, number of tweets
Table 5 For interactions related to the educational system, number of unique users by country and gender
Table 6 For interactions related to the educational system, number of tweets by country

Tables 7, 8 and 9 show the average, the standard deviation, the median and the mode of the expressed sentiments by sex and country. Figures 1 and 2 show the percentage of tweets with positive and negative sentiments by country. Figures 3 and 4 display, as bar diagrams, the median of the sentiment by country, and disaggregated by sex in each nation. It can be observed that negative perceptions of education systems are prevalent according to the obtained median. This happens for both sexes as well as for all analysed countries, except in India, Canada, Pakistan, Australia, South Africa and Kenya.

Table 7 For interactions related to the educational system, statistical parameters corresponding to expressed sentiments by sex
Table 8 For interactions related to the educational system, statistical parameters corresponding to expressed sentiments by country
Table 9 For interactions related to the educational system, statistical parameters corresponding to expressed sentiments by sex and country
Fig. 1 For interactions related to the educational system, percentage of tweets with a positive sentiment

Fig. 2 For interactions related to the educational system, percentage of tweets with a negative sentiment

Fig. 3 For interactions related to the educational system, median of expressed sentiments by country

Fig. 4 For interactions related to the educational system, median of expressed sentiments by country and sex

Tables 10, 11 and 12 and Fig. 5 depict analogous information referring to the subjectivity of tweets.

Table 10 For interactions related to the educational system, statistical parameters corresponding to expressed subjectivity by sex
Table 11 For interactions related to the educational system, statistical parameters corresponding to expressed subjectivity by country
Table 12 For interactions related to the educational system, statistical parameters corresponding to expressed subjectivity by sex and country
Fig. 5 For interactions related to the educational system, median of expressed subjectivity by country

In Table 9 it can be observed that, according to the median and mode values, men’s perception of the education system is better than that of women in all analysed countries. Because the standard deviation is large relative to the obtained mean value, we consider the median more appropriate for the statistical analysis.

In Table 10 it can be seen that, globally, the medians and modes exhibit similar values for both sexes. Again, the standard deviation is large relative to the average subjectivity. For this reason, the median is a more suitable value for the statistical study.

In Table 11 it can be observed that, considering the median values, four countries present the highest subjectivity. Again, because the standard deviation is large relative to the average subjectivity, the median is considered a more appropriate statistical metric for the study.

In Table 12 it can be noted that, considering the median values, some countries show differences in subjectivity by sex but others do not. Once more, the standard deviation is large relative to the average subjectivity, so we take the median as a more suitable statistical metric for the study (Fig. 6).

Fig. 6 For interactions related to the educational system, median of expressed subjectivity by country and sex

Figure 7 depicts the similarity heat-map for the studied countries utilising the Sim2Count metric. It can be observed that the most similar are Myanmar and Japan and the most divergent in the perception of the education system are the United States and Pakistan, as well as India and Singapore.

Fig. 7 For interactions related to the education system, a heat-map showing the calculated similarity between countries using the Sim2Count metric

Additionally, Fig. 8 displays the most frequently used words globally in tweets, and Table 13 shows analogous information by nation. The frequency of occurrence of words in the tweets is indicated in parentheses.

Fig. 8 For interactions related to the educational system, most frequently used words (articles and auxiliary verbs have been excluded)

Table 13 For interactions related to the educational system, the most frequently used words by country

As examples of countries with a positive and a negative average sentiment, Figs. 9 and 10 show the clouds of the most frequently used words in India and Japan, respectively. It can be noted that in India the graph shows words related to a lack of possessions as well as to wishes, while in Japan the diagram displays words whose meaning is related to protest, complaint or disagreement. All the above will be examined in more detail in the Discussion Section (Fig. 11).

Fig. 9 Most frequently used words in India (articles and auxiliary verbs have been excluded)

Fig. 10 Cloud of the most frequently used words in Japan (articles and auxiliary verbs have been excluded)

Fig. 11 For interactions related to the educational system, most frequently used words (articles and auxiliary verbs have been excluded)

4 Discussion

The study of the perception of the educational system is a matter of considerable interest. The public perception of the quality of education programs has been explored in specific countries (Shrestha, 2013), as has public opinion on Higher Education (University of Scranton, 2004).

The World Innovation Summit for Education (WISE 2020) published a comprehensive survey of young people’s views on their education and future. This survey, conducted by Ipsos Group S.A (2021), was carried out in 20 countries around the world, aiming to understand how young people perceive their education and how prepared and confident they feel about their personal destiny. The study revealed that students really value their training. Around 90% agreed that education is not just a means of learning a career or profession, but has a value in itself. Additionally, nearly 84% of young people thought that learning outside school was as important as learning in the classroom. The analysis also highlights that 80% of young people were “satisfied” or “somewhat satisfied” with their education, although only 27% were “totally satisfied”.

The survey also showed two main areas of improvement that young people would like to see in their education. Firstly, students demanded a more individualised educational approach (more than 60% said that they would like their teachers to provide them with personal advice on career guidance and on ways of learning, as well as to study what is tailored to their needs). Secondly, students demanded more time to learn 21st century skills. In this respect, about four in ten young people (41%) thought that current education lacked training in new technologies (artificial intelligence, coding, etc.) as well as appropriate instruction in communication and organisational skills. Around 44% believed that education did not foster creativity and curiosity.

Also, the Organisation for Economic Co-operation and Development (OECD), in its Programme for International Student Assessment (PISA) (OECD, 2019a), provided a comprehensive and rigorous international assessment of the learning outcomes of 15-year-old students. The results obtained provide important insights into the quality and equity of education systems, allowing educators and policy makers to learn from practices in different countries. The third of the six volumes of this survey focuses on the physical and emotional health of students, as well as the role of teachers and parents in shaping the environment and social life at school. In that volume, for the first time, 15-year-old students were asked how they frequently felt about their lives. Students reported positive affective states (“happy”, “joyful”, “proud”, “content” and “cheerful”), but also negative affective states (“scared”, “miserable”, “fearful” and “sad”).

In the aforementioned survey, more than 85% of students in OECD countries reported that they sometimes or always felt “happy”, “cheerful” or “joyful”. In contrast, around 6% of students reported that they “always felt sad”. It must be noted that, across countries and economies, girls were more likely than boys to report that they “sometimes” or “always” felt sad. The investigation also highlighted that the more time students spent on the Internet outside of school, the more likely they were to report feeling sad or miserable. Across virtually all countries and economies, students were more likely to report positive sentiments when they had a strong sense of belonging to the school and when there was greater cooperation between students. In contrast, students who had experienced bullying were more likely to report sadness. The survey also noted that the gap between students with “low use” and “frequent use” of the Internet outside the classroom in reporting feeling miserable “sometimes” or “always” was 7 percentage points among boys and 13 percentage points among girls, on average across OECD countries. It can be noted that, among the five school climate indices analysed in the survey, those that best predicted positive student feelings in OECD countries were the index of sense of belonging to school, followed by the indices of student co-operation and exposure to bullying.

The study also described life at school as a key aspect of pupils’ existence. School was not only the place where children acquired knowledge, but also where they made friends, established trusting relationships with their teachers and developed an attachment to their place of learning. However, negative reactions to education could also be generated at school. There are also other factors and events outside the establishment, such as economic crises and natural disasters, as well as family-related problems, which can influence pupils’ sentiments.

Existing research indicates that students who feel attached to their school, persevere and love learning report positive affective states, such as enthusiasm, inspiration and happiness. The causes of negative affective states, such as sadness, fear, despair or shame, are often more difficult to detect than those of positive affective states (Anderman, 1999; Weber et al., 2016). Certain student behaviours and attitudes, such as teamwork, self-regulation, generally positive attitudes towards school and life, a sense of belonging to school, possessing hope and receiving love, seem to protect students from experiencing negative emotions (Anderman, 1999; Weber et al., 2016).

Regarding our investigation, which covers a wider age range and type of person, Tables 7 and 8, show some statistical results such as average sentiment, standard deviation, median and modeFootnote 5. It can be observed that negative perceptions of education systems are prevalent according to the obtained median. This happens for both sexes as well as for all analysed countries, except in India, Canada, Pakistan, Australia, South Africa and Kenya. These nations exhibit a higher percentage of positive tweets, as shown in Figs. 1 and 2, which can be put in relation to their income level.

Based on their income levels as of 2022, the aforementioned nations can be classified as follows (World Bank, n.d.):

  • Lower-Middle-Income Economies ($1,046 to $4,095): India, Pakistan and Kenya.

  • Upper-Middle-Income Economies ($4,096 to $12,695): South Africa.

  • High-Income Economies ($12,696 or more): Australia and Canada.
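
The thresholds listed above can be applied programmatically. The sketch below assumes that the figures refer to gross national income (GNI) per capita in US dollars and that any value below $1,046 belongs to a low-income group, which the list above does not include; the function name and the sample values are hypothetical.

```python
def income_group(gni_per_capita):
    """Map a GNI per capita figure (USD) to the income group used in the text."""
    if gni_per_capita >= 12_696:
        return "high"
    if gni_per_capita >= 4_096:
        return "upper-middle"
    if gni_per_capita >= 1_046:
        return "lower-middle"
    return "low"  # assumption: this group is not listed in the text

# Hypothetical figures chosen only to exercise each branch.
for value in (900, 2_000, 6_000, 50_000):
    print(value, income_group(value))
```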

Globally, as Fig. 8 shows, the words “dictatorship” and “slavery” are the most used. The breakdown by country is displayed in Table 13. In all countries except three of those with lower-middle (India, Pakistan and Kenya) or upper-middle (South Africa) incomes, the term “slavery” is one of the most frequently used words. In all nations except India, Pakistan and Kenya, the word “dictatorship” is one of the most frequently posted. In India and South Africa, the word “change” is also highly used, as are “job” and “future” in Kenya. In the six aforementioned countries in which the sentiment towards the education system is positive (India, Canada, Pakistan, Australia, South Africa and Kenya), the term “need” is used prominently. This suggests that in those nations where education is perceived as an opportunity to improve, change and progress, the sentiment towards the education system is more favourable.

It should be noted that, in all countries except three of those with lower-middle income (India, Pakistan and Kenya) or upper-middle income (South Africa), the term “slavery” is one of the most frequently used words in the tweets.

Japan, which is one of the countries showing negative sentiment, as Fig. 3 shows, has traditionally occupied a good position in most international educational performance surveys. It is among the best-placed countries in the Programme for International Student Assessment (PISA), especially in Science and Mathematics (with 529 and 527 points, respectively, in the 2018 edition) (OECD, 2019b). Japan’s results also show great equity, as the impact of socioeconomic status on student performance is below the OECD average.

The Japanese education system is characterised by an emphasis on effort over ability or intelligence. The Japanese belief in achievement through hard work has led to the proliferation of so-called “juku”, after-school classes to reinforce learning. Students work from an early age in this meritocratic system to achieve good results, especially in the exams that give access to the best higher secondary schools and the most prestigious universities (National Center on Education and the Economy, n.d.). This competitiveness could be one of the factors behind the negative perception of the educational system. Figure 10 depicts a word cloud in which the terms “oppose”, “dictatorship”, “boycott” and “demonstrate” are among the most frequently used words. Figure 7 shows that the vocabulary used in Japan, Pakistan, France and Myanmar is very similar. High similarity (SC > 0.97) has also been found between several countries. It should be noted that even if nations are similar in the words used, they may not be analogous in terms of frequency of use, and may therefore show very different sentiment and subjectivity.
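
The last point can be illustrated with a small example. The sketch below assumes that SC denotes a cosine similarity between country vocabularies, which this section does not state explicitly; the two countries and their word counts are invented. It compares a similarity computed on word presence alone with one computed on word frequencies.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def presence(counts):
    """Binary vector: 1 for every word that appears at least once."""
    return {word: 1 for word, n in counts.items() if n > 0}

# Invented word counts for two hypothetical countries (not real data).
country_a = {"dictatorship": 90, "slavery": 80, "need": 5, "change": 5}
country_b = {"dictatorship": 5, "slavery": 5, "need": 90, "change": 80}

same_words = cosine(presence(country_a), presence(country_b))
same_freqs = cosine(country_a, country_b)
print(round(same_words, 3), round(same_freqs, 3))
```

Both invented vocabularies contain exactly the same words, so the presence-based similarity is 1.0, whereas the frequency-based similarity is far lower because the dominant words differ.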

In India, one of the countries with the most positive sentiments towards education, as Fig. 3 shows, access to education is far from universal. According to UNESCO, there are 252,863,750 illiterate Indians over the age of 15 (162,780,856 of whom are women) and the enrolment rate in Higher Education in 2020 was 29.4% (31.3% for women and 27.8% for men). It could be argued that education is therefore regarded as a precious asset and that the assessment of the education system is consequently more positive. Figure 9 shows a word cloud in which “need”, “teach”, “change” and “want” are the most recurrent terms.

Additionally, if the sentiments in each country are disaggregated by sex, large differences are detected between men and women, as shown in Table 9. This is especially noticeable in nations such as India, Canada, Pakistan, Australia, South Africa and Kenya, where the difference between the median sentiments of the two sexes is higher than 0.24 (see Footnote 6).

Regarding the subjectivity detected in the tweets, as shown in Tables 10, 11 and 12, the interactions do not exhibit very high subjectivity, nor do they show very relevant differences between sexes. However, India, Canada, Pakistan, Australia, South Africa and Kenya are the countries that show the highest subjectivity. The largest values correspond, in order, to Pakistan, India, South Africa and Kenya. These four countries have low overall human development indicators (UNDP, n.d.) (particularly India, Pakistan and Kenya), far below those of Australia or Canada, as can be observed in Table 14.

Table 14 More relevant indicators related to this research

The results on the social perception of national education systems indicate that there is considerable room for improvement. Greater involvement of families and the local community in education could contribute to such improvement. Research shows that family involvement in children’s learning is strongly associated not only with their academic success, but also with their socio-emotional development. The OECD, in the aforementioned PISA reports, treats parental involvement as a key indicator of educational effectiveness.

In addition, raising awareness of the social effects of education beyond employability could also favour a good social perception. As our study shows, in low-income countries education is perceived as a valuable asset, as a hope for a better quality of life. The OECD has been working on the development of indicators on the potential social outcomes of education since the 2009 edition of its Education at a Glance report (OECD, 2021). The 2021 edition focuses on the impacts of education on health status and subjective well-being.

Health is an important policy area in OECD countries, as evidenced by the rapid increase in life expectancy over recent decades and in the context of the current COVID-19 pandemic. While education does not impact on health in isolation from other factors or in a unidirectional way, the latest OECD report (2021) shows that, on average, there is a gap by level of education. This gap is exemplified by a difference of 5 years in life expectancy between the population with tertiary education and those with only primary or first stage of secondary education. Life expectancy is a consequence of the economic circumstances in a country, health conditions and other mortality risks that affect individuals throughout their life trajectory. Higher educational attainment provides the means to improve socioeconomic status and is associated with better living habits and healthcare (OECD, 2019c).

In some countries, as previously mentioned, high self-imposed demands and competitiveness are strongly encouraged in academic training, both by families and by teachers. We think that special attention should be paid to the psychological factors that can affect students in those countries.

Sentiment analysis in social networks can be a valuable tool for taking the pulse of certain social issues. In the case of educational systems, studies such as this one can help to guide educational policies, as well as to increase the transparency and efficiency of the education system.

5 Conclusions

Data science is an interdisciplinary field of scientific methods, processes and systems for extracting knowledge from data. It encompasses statistics, data analysis and related methods, with the aim of understanding and analysing current phenomena and problems (Deloitte, 2018). We apply data science to the study of the sentiments that citizens have about an issue as relevant as the education system.

The analysis of public opinion expressed in social networks makes it possible to visualise dominant opinions without the bias of social desirability. This type of analysis, as we have mentioned, has great potential to capture the pulse of society on sensitive issues.

According to the results of our research, a predominantly negative perception of the education system has been found in most of the analysed countries. A positive perception is detected in certain low-income nations. Men exhibit a more positive sentiment than women as well as a higher subjectivity. The countries that exhibit the most positive perceptions are also those that manifest the highest subjectivity.

This research can be continued by extending the analysis to tweets written in some of the most widely spoken languages in the world today. The five most extensively spoken languages in the world (counting not only native speakers) are: English (1,121 million speakers), Mandarin Chinese (1,107 million speakers), Hindi (698 million speakers), Spanish (512 million speakers) and French (284 million speakers) (LINGUA, 2022). TextBlob could still be used to perform the sentiment analysis, provided that the text of each tweet is first translated into English. With TextBlob, translation and language identification can be done through the Google Translate API. Other software tools could also be utilised to directly perform sentiment analysis in languages other than English.
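
The translate-then-analyse workflow described above could be organised as follows. This is a sketch only: the toy lexicon and the dictionary-based `fake_translate` are hypothetical stand-ins for TextBlob's sentiment analyser and for a real translation service, used here so that the example runs without external dependencies. As in TextBlob, polarity is assumed to lie in [-1, 1] and subjectivity in [0, 1].

```python
from typing import Callable, Tuple

# Toy lexicon (hypothetical): word -> (polarity, subjectivity).
TOY_LEXICON = {
    "good": (0.5, 0.5),
    "bad": (-0.5, 0.5),
    "terrible": (-1.0, 1.0),
}

def score_english(text: str) -> Tuple[float, float]:
    """Average polarity and subjectivity over the lexicon words found in text."""
    hits = [TOY_LEXICON[w] for w in text.lower().split() if w in TOY_LEXICON]
    if not hits:
        return 0.0, 0.0
    polarity = sum(p for p, _ in hits) / len(hits)
    subjectivity = sum(s for _, s in hits) / len(hits)
    return polarity, subjectivity

def analyse_tweet(text: str, lang: str,
                  translate: Callable[[str, str], str]) -> Tuple[float, float]:
    """Translate non-English tweets into English, then score them."""
    english = text if lang == "en" else translate(text, lang)
    return score_english(english)

# Hypothetical translator: a lookup table standing in for a real service.
PHRASEBOOK = {("es", "la escuela es buena"): "the school is good"}

def fake_translate(text: str, lang: str) -> str:
    return PHRASEBOOK[(lang, text)]

print(analyse_tweet("the school is bad", "en", fake_translate))
print(analyse_tweet("la escuela es buena", "es", fake_translate))
```

Passing the translator as a parameter keeps the scoring step unchanged if the translation service is replaced, or if a tool that analyses the original language directly is used instead.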

An examination through focus group interviews can also complement the obtained results with qualitative information. To this end, a phenomenological approach could be adopted, one focused on collecting the individual experiences of the participants in relation to a collective phenomenon (perception of the national education system) and establishing different age cohorts.

Additionally, the semantic similarity between tweets can be determined by generating graph-based spatial relationships between their texts. In this approach, the texts are represented as a graph using synonymy relationships. Community detection as well as other network analysis metrics can then be utilised.
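
A rough sketch of this idea follows. It assumes a hand-written synonym table and links two tweets whenever they share at least one synonym-normalised word; connected components are used as a simple stand-in for a proper community detection algorithm. All names and data are hypothetical.

```python
from itertools import combinations

# Hypothetical synonym table mapping each word to a canonical concept.
SYNONYMS = {
    "school": "school", "college": "school",
    "teacher": "teacher", "tutor": "teacher",
    "exam": "exam", "test": "exam",
}

def concepts(text):
    """Set of canonical concepts mentioned in a tweet."""
    return {SYNONYMS[w] for w in text.lower().split() if w in SYNONYMS}

def build_graph(tweets):
    """Adjacency sets: tweets i and j are linked if they share a concept."""
    graph = {i: set() for i in range(len(tweets))}
    sets = [concepts(t) for t in tweets]
    for i, j in combinations(range(len(tweets)), 2):
        if sets[i] & sets[j]:
            graph[i].add(j)
            graph[j].add(i)
    return graph

def components(graph):
    """Connected components found by depth-first search."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.append(node)
            stack.extend(graph[node] - seen)
        groups.append(sorted(group))
    return groups

# Invented tweets (not real data).
tweets = ["my school is strict", "college was hard",
          "the exam was unfair", "another test tomorrow"]
groups = components(build_graph(tweets))
print(groups)
```

The first two tweets are grouped together because “school” and “college” map to the same concept in the table, and the last two because “exam” and “test” do.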

The methodology used in this research can also be used to examine citizens’ perceptions of other current issues.