1 Introduction

News published in the written press about companies originates in the practices and events of those companies. Once published, these news items project an image of the companies, which influences their reputation. Business practices (social, productive, economic, environmental and/or corporate) therefore shape public opinion through what the news says about them, and society, in turn, influences those practices through the image that the news projects. Moreover, society increasingly demands social responsibility from companies and asks them to account for the social and environmental consequences of their actions. One way of measuring these consequences is through the Environmental, Social and Governance (ESG) investment criteria. ESG criteria are a set of standards for corporate behavior that serve as an analytical tool with which companies can try to measure their Corporate Social Responsibility (CSR), i.e., the degree of responsibility that the company adopts toward society (Porter and Kramer 2006). They refer to the environmental, social and corporate governance factors that can be taken into account when investing in a company (Initiative 2005), as these factors affect the company through its corporate image. ESG criteria are therefore a tool for analyzing a company’s environmental and social policies, which can in turn affect its finances through its reputation and image (good or bad).

ESG investment criteria are increasingly relevant to investment decisions. Indeed, they were priority topics at the World Economic Forum and the Davos Forum 2022 (ESG and Sustainable Finance Data Skills and Capacity Building Directory, 2020; Davos 2022: How Businesses Can Deliver on ESG Promises | World Economic Forum, n.d.). For several years, many authors have studied the relationship between the application of ESG criteria and the financial performance of companies. Friede et al. (2015), for example, show through an exhaustive review that applying ESG criteria in companies leads to better financial results. According to Amir and Serafeim (2018), the main motivations for companies to use ESG information are, in order of importance, return on investment, customer demand, product strategy and, lastly, ethical considerations. Brooks and Oikonomou (2018) also address the relationship between ESG criteria and financial performance. They find a positive and statistically significant, but economically modest, link between the two at the company level. They also report an asymmetry in the financial impacts of ESG: the negative financial effects of corporate social irresponsibility are greater than the positive financial effects of corporate social responsibility. Fatemi et al. (2018) conclude that ESG strengths increase company value, whereas ESG concerns decrease it. Finally, Lee et al. (2016) find a significant positive relationship at the company level between environmental responsibility and both financial and operational performance.

In this relationship between ESG investment criteria and the company’s financial results, the company’s reputation or image is a vitally important variable, since it affects consumer satisfaction (Chun 2005). A company’s image can be measured with two indicators: the sympathy that the company generates in society in general, and its good financial results (Raithel et al. 2010). Society receives information about companies from external sources such as word of mouth, news and advertising, and from it forms an image of each company’s reputation (Kossovsky 2012). Consequently, a sentiment analysis (SA) of written news about companies makes it possible to measure the reputation they have in society: news with a positive sentiment toward a company will generate sympathy toward it, improving its reputation.

In this context, SA, a sub-discipline of data mining and computational semantics, is one way of measuring the image projected by news sentiment. According to Pang and Lee (2008), SA is a dynamic and extensively researched subject in the field of natural language processing (NLP). Its main objective is to process computationally the subjectivity of a given text and to analyze the opinions, emotions, evaluations and feelings of individuals. The technique allows a deeper understanding of data gathered from sentiment-rich sources such as news articles, social media platforms, reviews and similar content (Kim 2015). SA is therefore used to extract sentiments and emotions from text in domains ranging from assessing customer satisfaction to understanding political opinions (Mäntylä et al. 2018; Pak and Paroubek 2010).

One limitation of SA is that it scores the degree of positivity or negativity of a sentiment without explaining the reasons behind it: it only indicates how favorable or unfavorable a sentiment is. Consequently, when sentiments are extracted from news articles about companies, the analysis remains incomplete, because the goal is to understand why those sentiments arise. Having identified this limitation, we conducted a bibliometric review. It revealed no existing research examining the meaning of ESG-related terms within written news articles on the basis of previously generated sentiment scores (Liu et al. 2023; Mandas et al. 2023; Park et al. 2022; Salas-Zárate et al. 2017; Zeidan 2022).

The aim of this article, therefore, is to identify from written news those issues related to ESG investment criteria that influence whether a company has a better or worse reputation among consumers.

To achieve this, we first identify news written in the press about a set of companies. Then, using SA techniques, we distinguish the news items that generate positive sentiment from those that generate negative sentiment. Finally, we detect the terms related to ESG investment criteria using Word2Vec models executed in Python. These models make it possible to obtain quantitatively the vector distances between the words analyzed (word embeddings). From them we can identify the words that are closest to, and therefore have greater affinity with (Banawan et al. 2023), the study terms of this research.

Therefore, NLP techniques (the combination of SA and Word2Vec models) make it possible to detect, through the terms extracted from the news, the factors that influence whether a company has a better or worse reputation among consumers. Companies will thus be able to identify, from published news, the terms close to the ESG investment criteria that have a positive or negative influence on their image. This can help them determine which of their ESG-related practices improve their reputation and which worsen it. They can then make strategic decisions that improve their image and, through consumer behavior, their financial results.

2 Methodology

The methodological process applied in this research is shown below (Fig. 1):

Fig. 1

Methodological process

2.1 Database definition

The first step in the methodological process was to choose the business sample. A sample of financially consistent companies was sought. For this purpose, we selected the companies from the Eurostoxx 50 that had the best dividend yield at the search date (May 2021). The eight companies with the best dividend yield were Allianz, BASF, BNP Paribas, Daimler, Engie, Eni, ING and Intesa Sanpaolo (Cotizacion de EURO STOXX 50®—Indice—Resumen—Rentabilidad-Dividendo, n.d.).

2.2 Data extraction

After choosing the companies, the next objective was to retrieve the news written in the press about them. The news items were located in the original source, using the name of each company as the search query. For each company, the 500 most relevant news items per year were selected for the period 2017 to 2021. Where a company had fewer than 500 news items in a given year, all of them were selected. In total, 19,953 news items were downloaded, distributed by year as follows (Table 1):

Table 1 Number of news items analyzed

Therefore, 2500 news items per company were downloaded (500 news items per year for 5 years) except in three cases: Intesa Sanpaolo, with 1768 news items, and ING with 661.
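The per-year selection rule described above can be sketched in Python. This is an illustrative sketch, not the retrieval code used in the study. The record fields company, year and relevance are hypothetical, since the source does not say how relevance was scored. The sketch also makes explicit that eight companies over five years give an upper bound of 8 × 5 × 500 = 20,000 news items.

```python
from collections import defaultdict


def select_top_per_year(items, cap=500):
    """Keep the `cap` most relevant items for each (company, year) pair.

    `items` is an iterable of dicts with hypothetical keys 'company',
    'year' and 'relevance' (higher means more relevant). Groups with
    fewer than `cap` items are kept whole, as in the rule above.
    """
    groups = defaultdict(list)
    for item in items:
        groups[(item["company"], item["year"])].append(item)

    selected = []
    for key in sorted(groups):
        ranked = sorted(groups[key], key=lambda it: it["relevance"], reverse=True)
        selected.extend(ranked[:cap])  # slicing keeps all items when len < cap
    return selected


# Upper bound on corpus size: 8 companies x 5 years (2017-2021) x 500 items.
MAX_ITEMS = 8 * 5 * 500
```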

2.3 Cleaning and classification

Once downloaded, the news items were imported into the data mining software Vantage Point (Liu and Liao 2017), where the data were structured for subsequent export.

2.4 Main corpus creation

Once the data had been cleaned and classified, we had a corpus with which to proceed to the next step, Sentiment Analysis. The aim was to detect the topics that influence the reputation of the companies, both positively and negatively. For this purpose, two news corpora were created: the first made up of the news items that obtained a positive sentiment score, and the second of the news items that obtained a negative one.

2.5 NLP: Sentiment analysis (main corpus)

The news items could then be exported to Orange, a machine learning and data mining suite for data analysis through Python scripting (Demšar et al. 2013). A Sentiment Analysis of the extracted news was performed using the VADER and Hu Liu tools:

  • The Python tool Valence Aware Dictionary and Sentiment Reasoner (VADER) is a Sentiment Analysis framework that uses a lexicon-based approach to determine the sentiment values of a sentence. VADER has proven highly effective in analyzing social media texts, NY Times editorials, movie reviews and product reviews (Abdul-Rahman et al. 2020; Thu and Aung 2018; Shapiro et al. 2020; Yu et al. 2021; Medhat et al. 2014). Its success stems from its ability not only to provide positivity and negativity scores but also to quantify the degree of positivity or negativity of a given sentiment (Tunca et al. 2023; Simplifying Sentiment Analysis Using VADER in Python (on Social Media Text) | by Parul Pandey | Analytics Vidhya | Medium, n.d.).

  • The Hu and Liu lexicon is another commonly used tool, designed specifically for Sentiment Analysis of customer reviews. It produces three resulting categories: Sentiment (a global measure of positivity), Positive and Negative. It was selected because it has mainly been used in studies that do not focus on social media texts. It has also proven effective in analyzing customer feedback and reviews in various domains (Khoo and Johnkhan 2018).

Since both tools are suitable, the first step in measuring the reputation of the companies was a Sentiment Analysis of the published news with both VADER and Hu Liu.
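To make the difference between the two tools concrete, the sketch below contrasts a simplified VADER-style score with a simplified Hu and Liu-style score. It is a toy under stated assumptions, not either tool's implementation:

- The small word lists and valences are hypothetical.
- Only VADER's final normalization of the summed valences, x / sqrt(x² + 15), is taken from the tool. Its rules for negation, intensifiers, capitalization and punctuation are omitted.
- The Hu and Liu-style score (positive minus negative matches per 100 tokens) uses an assumed normalization, not necessarily the one Orange applies.

```python
import math
import re

# Hypothetical toy lexicons. VADER's real lexicon gives each word a valence
# of roughly -4 to +4; the Hu and Liu lexicon is two plain word lists.
TOY_VADER_VALENCE = {"good": 1.9, "great": 3.1, "loss": -1.3, "fraud": -3.0}
TOY_POSITIVE = {"good", "great", "profit"}
TOY_NEGATIVE = {"loss", "fraud", "penalty"}


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def vader_like_compound(text, alpha=15):
    """Sum word valences and squash the total into [-1, 1].

    Only VADER's final normalisation x / sqrt(x^2 + alpha) is kept;
    the tool's rule-based adjustments are deliberately left out.
    """
    total = sum(TOY_VADER_VALENCE.get(tok, 0.0) for tok in tokenize(text))
    score = total / math.sqrt(total * total + alpha)
    return max(-1.0, min(1.0, score))


def hu_liu_like_scores(text):
    """Return (sentiment, positive_count, negative_count) for a text.

    Sentiment = (positive - negative) matches per 100 tokens; this
    normalisation is an illustrative assumption.
    """
    tokens = tokenize(text)
    pos = sum(tok in TOY_POSITIVE for tok in tokens)
    neg = sum(tok in TOY_NEGATIVE for tok in tokens)
    sentiment = 100.0 * (pos - neg) / max(len(tokens), 1)
    return sentiment, pos, neg
```

Both functions return signed scores, so a news item can be labeled positive or negative by each tool independently. This is the input that the sub-corpus rule below relies on.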

2.6 Sub-corpora creation

Once the results of the Sentiment Analysis had been obtained, two distinct sub-corpora were created from the main corpus containing all the news items. The first comprised all the news items that obtained a positive score with both tools (VADER and Hu Liu). The second comprised all the news items that obtained a negative score with at least one of the two tools.
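The rule for building the two sub-corpora can be written as a small function. This is a hedged sketch: the (text, vader_score, hu_liu_score) tuple layout is hypothetical. The source also does not say how news items with a score of exactly zero were handled. Here they are left out of both sub-corpora, which is what the literal rule implies.

```python
def split_by_sentiment(news):
    """Split scored news into positive and negative sub-corpora.

    `news` is a list of (text, vader_score, hu_liu_score) tuples
    (a hypothetical layout). Positive: both scores > 0.
    Negative: at least one score < 0. Items matching neither rule
    (e.g. a zero score and no negative score) are not assigned.
    """
    positive, negative = [], []
    for text, vader, hu_liu in news:
        if vader > 0 and hu_liu > 0:
            positive.append(text)
        elif vader < 0 or hu_liu < 0:
            negative.append(text)
    return positive, negative
```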

2.7 NLP: correlation

2.7.1 NLP: terms related to ESG

  • The terms most related to ESG were identified in each of the two corpora (positive and negative) using Natural Language Processing (NLP) techniques. The study terms were environment, environmentally, social, socially and governance. This was done with Word2Vec models generated and executed in Python, which quantitatively yield the vector distances between terms. A value of zero corresponds to the word vectorially closest to the chosen term, and a value of one to the word furthest away.
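One common way to obtain such a distance from word vectors is cosine distance, defined as one minus the cosine similarity of the two vectors. The source does not state the exact formula it used, so this is an assumption. Under this definition, vectors pointing in the same direction give 0 and orthogonal vectors give 1. Strictly, vectors pointing in opposite directions reach 2.

```python
import math


def cosine_distance(u, v):
    """Cosine distance = 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)
```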

2.7.2 NLP: visual representation

  • The data were then represented visually. The vector distances of the words and their metadata were converted to a tabular structure in Python and imported into the TensorBoard Embedding Projector tool. This produced a visual representation of the set of words that make up the word embedding developed in step 2. The terms obtained were analyzed by comparing both corpora, to detect those that may have a positive or negative influence on the company’s reputation.

After the two corpora (positive and negative) had been generated and cleaned, a Word2Vec model was trained in Python for each corpus. Each model contains the set of vectors of the terms that make up the corpus (word embedding). These vectors give the vector distance between the different terms, so that the terms most similar to one another can be established (Savytska et al., 2021).
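Word2Vec is trained on a list of tokenized sentences. The source does not describe its cleaning steps, so the preprocessing below is only an illustrative assumption. It lower-cases the text, splits sentences at terminal punctuation and keeps alphabetic tokens. Hyphenated tokens are kept whole, such as lower-carbon, which appears among the reported terms.

```python
import re

# Split after '.', '!' or '?' followed by whitespace (an assumed rule).
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# Alphabetic tokens, keeping hyphenated compounds such as "zero-carbon".
TOKEN = re.compile(r"[a-z]+(?:-[a-z]+)*")


def to_sentences(articles):
    """Turn raw news texts into a list of token lists for Word2Vec."""
    sentences = []
    for article in articles:
        for sentence in SENTENCE_END.split(article.lower()):
            tokens = TOKEN.findall(sentence)
            if tokens:
                sentences.append(tokens)
    return sentences
```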

The following terms were analyzed, in order to find the words closest to, and therefore most related to, each of them (the smaller the vector distance, the greater the affinity): “environment,” “environmentally,” “social,” “socially,” and “governance.” These terms were chosen because they make up the acronym ESG (Environmental, Social and Governance). In addition, when applying NLP techniques in Python, it was observed that the words “environmentally” and “socially” appear with high frequency in both corpora. To cover as many terms referring to the ESG concept as possible, these two terms were therefore also analyzed and their corresponding Word2Vec models created.

The main settings used in Python to generate the Word2Vec models were as follows:

  • Vector size: The word vectors used have a dimension of n = 200.

  • Architecture: the algorithm was trained with the skip-gram architecture.

  • Training: negative sampling was used to train the model.

  • min_count: terms with a total frequency of less than five were ignored.

  • Window: The maximum distance between the term to be studied and the word to be predicted within the corpus sentences was five.

  • Epochs: The number of iterations performed on each corpus was 10.
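The settings listed above map directly onto the keyword arguments of gensim’s Word2Vec class (gensim 4.x naming). The sketch below is an assumption-laden reconstruction, not the code in Fig. 2. The source does not name the library it used, and it does not give the number of negative samples, so gensim’s default of 5 is used. Distances follow the convention of this section and are computed as one minus the cosine similarity returned by most_similar.

```python
# Settings from the list above, as gensim 4.x Word2Vec keyword arguments.
# negative=5 is gensim's default; the source only says that negative
# sampling was used, so this value is an assumption.
W2V_PARAMS = {
    "vector_size": 200,  # n = 200
    "sg": 1,             # 1 = skip-gram (0 would be CBOW)
    "hs": 0,             # no hierarchical softmax ...
    "negative": 5,       # ... negative sampling instead
    "min_count": 5,      # ignore terms with total frequency below 5
    "window": 5,         # max distance between target and context word
    "epochs": 10,        # passes over each corpus
}


def train_word2vec(sentences):
    """Train a Word2Vec model on a list of token lists."""
    # Imported here so the settings above can be inspected without gensim.
    from gensim.models import Word2Vec
    return Word2Vec(sentences=sentences, **W2V_PARAMS)


def closest_terms(model, term, topn=20):
    """Return (word, distance) pairs, with distance = 1 - cosine similarity."""
    return [(word, 1.0 - sim) for word, sim in model.wv.most_similar(term, topn=topn)]
```

For example, calling closest_terms(model, "social") on the model trained on the positive corpus would list the neighbors of “social” together with their distances.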

Fig. 2 shows the Python code developed, using the term “social” as an example.

Fig. 2

Python code developed and executed

Subsequently, to provide another perspective, these terms and their related terms were visualized in two dimensions using the TensorFlow Embedding Projector tool (Visualizing Data Using the Embedding Projector in TensorBoard|TensorFlow, 2022). For this purpose, the final Word2Vec models were converted to tabular format in Python and imported into the TensorFlow Embedding Projector for subsequent mapping of the study terms. The main settings applied in this tool were the following:

  • Data option: Word2Vec 10K, as it matches the dimension of n = 200 defined above.

  • Cosine distance: since the data distribution is unbalanced.

  • Number of iterations: 10,000 (stable projection).

  • Projection type: t-distributed stochastic neighbor embedding (t-SNE), since it is well suited to two- and three-dimensional displays (Skublov et al. 2022).

  • Data points: Since these are corpora with many terms, and in order to eliminate unwanted and non-valuable information, the number of points (terms) was reduced to 1000.

Once the Word2Vec models produced in Python have been exported, they are uploaded in tabular format to the online TensorFlow Embedding Projector tool, as shown in Fig. 3.
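The tabular export described above can be produced with a few lines of Python. The Embedding Projector’s Load option accepts two files:

- A TSV file of vectors, with one tab-separated vector per row and no header.
- An optional metadata TSV, with one label per row in the same order. A single-column metadata file has no header.

The function name and the six-decimal formatting are illustrative choices.

```python
def projector_tsv(words, vectors):
    """Build the two TSV texts loaded by the TensorFlow Embedding Projector.

    Returns (vectors_tsv, metadata_tsv): one tab-separated vector per line
    with no header, and one word per line in the same order (a
    single-column metadata file takes no header row).
    """
    if len(words) != len(vectors):
        raise ValueError("words and vectors must have the same length")
    vector_lines = ["\t".join(f"{x:.6f}" for x in vec) for vec in vectors]
    return "\n".join(vector_lines) + "\n", "\n".join(words) + "\n"
```

With a gensim 4.x model, words could be taken from model.wv.index_to_key and vectors from model.wv.vectors. Each returned string would then be written to its own .tsv file before loading.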

Fig. 3

Data loading in TensorFlow Embedding Projector online tool

With the parameters set according to the defined methodology and after 10,000 iterations, we obtain the visual representation of the words related (vectorially closer) to the term “environment” in the positive corpus, as depicted in Fig. 4. For the remaining terms, the steps and settings are identical, with one exception. To observe the words vectorially closer to a term in the negative corpus, the previously generated negative Word2Vec model is loaded in tabular format instead of the positive one. Because the same visualization configuration is used throughout this work, the subsequent figures show only the resulting visual analysis, for ease of reading.

Fig. 4

TensorBoard Embedding Projector menu

As seen on the right-hand side of Fig. 4, the TensorBoard Embedding Projector also lists the terms vectorially closest to the search word (in this case, “environment”). The limitation is that, for the tool to perform adequately within an acceptable computation time, the word sample must be reduced significantly, as the TensorBoard Embedding Projector itself indicates (Fig. 5).

Fig. 5

Filtered in TensorBoard Embedding Projector

Reducing the sample to a maximum of 10,000 words (points) would mean discarding 68% of the terms in the positive corpus and 59% of those in the negative corpus. With so many terms removed, the list of vectorially closest terms provided by the TensorBoard Embedding Projector, and their vector distances, differ from the results obtained on our unfiltered corpora. This is why the NLP analysis was carried out in Python. It takes into account all the terms in our corpora (a wider terminology) and yields a more accurate calculation of the vector distances to the term under study.
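The percentages above imply the approximate size of each vocabulary. If keeping 10,000 points discards 68% of the positive corpus terms, that corpus has roughly 10,000 / 0.32 ≈ 31,250 terms. The negative corpus has roughly 10,000 / 0.41 ≈ 24,400 terms. These figures are back-calculated from rounded percentages and are not reported in the source. The two helper functions are illustrative.

```python
def share_dropped(cap, vocabulary_size):
    """Fraction of terms lost when a vocabulary is cut to `cap` points."""
    return max(0.0, 1.0 - cap / vocabulary_size)


def implied_vocabulary(cap, dropped_fraction):
    """Vocabulary size implied by a cap and the fraction of terms it removes."""
    return cap / (1.0 - dropped_fraction)


# Approximate sizes implied by the 68% and 59% figures in the text.
positive_vocab = implied_vocabulary(10_000, 0.68)  # about 31,250 terms
negative_vocab = implied_vocabulary(10_000, 0.59)  # about 24,390 terms
```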

3 Results and conclusions

The results and conclusions presented in the following sections were derived from the methodology described above. The initial results were obtained from the main corpus by applying SA with VADER and Hu Liu to the news about each company, and the relevant conclusions were drawn from them. The final conclusions were then reached by interpreting the terms extracted with NLP from the sub-corpora, together with their visualization.

4 Results

4.1 NLP: sentiment analysis (main corpus)

Table 2 shows the results of applying Sentiment Analysis to the news corpora, broken down by company and year from 2017 to 2021. The numbers indicate the degree of sentiment obtained by each company each year with the two SA techniques, VADER and Hu Liu. Figures below zero (shown in red) indicate a negative result, i.e., the sentiments extracted from those news items were negative. Conversely, figures above zero indicate that the news items generated positive sentiments or connotations.

Table 2 Sentiment analysis applied to the news corpus

To study the reliability of the two Sentiment Analysis tools, the Pearson correlation coefficient between the VADER and Hu Liu scores was calculated, giving a value of 0.5624. Pearson’s correlation coefficient ranges from minus one to one. A value close to one indicates a strong positive correlation, a value close to minus one a strong negative correlation, and a value close to zero a weak correlation or none. A coefficient of 0.5624 therefore suggests a moderately positive relationship between the VADER and Hu Liu scores.
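The coefficient reported above is the standard Pearson correlation, shown here in plain Python for reference. It would be applied to paired VADER and Hu Liu scores. Forming one pair per company and year in Table 2 is an assumption, since the source does not state how pairs were formed, and those scores are not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

Perfectly proportional scores give 1 and perfectly inverse scores give minus one. A value such as 0.5624 lies between these extremes, which is why it is read as a moderate positive relationship.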

4.2 NLP: correlation

Word2Vec (NLP) techniques were used in each of the two corpora obtained by applying SA (the one formed from news that obtained a positive result and the one formed from news with a negative result). This was done by introducing terms related to ESG in the Python code. The terms were: environment, environmentally, social, socially and governance.

4.3 Environment and environmentally

The first study terms of this work, environment and environmentally, were passed to the Python code. This quantitatively yielded, for the positive and negative corpora, the terms with the smallest vector distance to them (see Table 3). These are the words most closely related to environment and environmentally, because they repeatedly appear near these terms in the sentences of the corpora built from the written press news about the companies.

Table 3 Terms classified by ESG term (environment/environmentally) and corpus

It should be noted that the greater the existing affinity, the closer the vectorial distance is to the value of zero; and, consequently, the lower the affinity, the closer the value will be to one.

4.4 Social and socially

The same process was then carried out, but this time introducing the terms social and socially into the model. The terms that were retrieved according to the vectorial distance in each corpus (positive and negative) are shown in Table 4.

Table 4 Terms classified by ESG term (social/socially) and corpus

4.5 Governance

Finally, the process was repeated, but this time with the third component of the initials ESG, Governance. Once again, the terms that were retrieved according to the vector distance in each corpus (positive and negative) were those shown in Table 5.

Table 5 Terms classified by ESG term (governance) and corpus

To draw conclusions about these terms, they were classified by topic in each corpus (positive and negative). Terms related to ESG investment criteria were grouped together, as were those related to the ECONOMY and those with POSITIVE and NEGATIVE connotations. Terms that did not belong to any of these sections were grouped in the “NON CLASSIFIED TERMS” section, and any term belonging to more than one section appears in all the sections to which it belongs. This process was carried out three times. First, with the terms obtained for “environment” and “environmentally” (Table 3, classified in Table 6). Second, with those obtained for “social” and “socially” (Table 4, classified in Table 7). Finally, with those obtained for “governance” (Table 5, classified in Table 8). The results obtained in each of the three cases are as follows:

Table 6 Terms from Table 3 classified by section and corpus
Table 7 Terms from Table 4 classified by section and corpus
Table 8 Terms from Table 5 classified by section and corpus
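The grouping rule can be expressed as a small function. The source does not describe how terms were assigned to sections, and it may have been done by hand. The section word lists in the example are therefore hypothetical. They only illustrate two points of the rule: a term can belong to several sections, and unmatched terms are collected separately.

```python
def classify_terms(terms, sections):
    """Group terms by section; a term may appear in several sections.

    `sections` maps a section name to a set of member terms (hypothetical
    word lists). Terms matching no section go to "NON CLASSIFIED TERMS".
    """
    result = {name: [] for name in sections}
    result["NON CLASSIFIED TERMS"] = []
    for term in terms:
        matched = [name for name, members in sections.items() if term in members]
        for name in matched:
            result[name].append(term)
        if not matched:
            result["NON CLASSIFIED TERMS"].append(term)
    return result


# Hypothetical section lists, for illustration only.
EXAMPLE_SECTIONS = {
    "ESG": {"lower-carbon", "wellbeing"},
    "ECONOMY": {"cost-effective", "wellbeing"},
    "POSITIVE": {"cost-effective", "cleanest"},
    "NEGATIVE": set(),
}
```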

In order to provide a visual appreciation of the vectorial distances, thanks to the conversion to tabular format using Python and the subsequent import into the TensorFlow Embedding Projector tool, different analyses were carried out on the basis of the new perspectives and/or visual models (Figs. 6, 7 and 8).

Fig. 6

Visualization of the words and clusters associated with the term environment (positive above and negative below)

Fig. 7

Visualization of the words and clusters associated with the term social (positive above and negative below)

Fig. 8

Visualization of the words and clusters associated with the term governance (positive above and negative below)

5 Discussion and conclusions

5.1 NLP: sentiment analysis (main corpus)

To check the reliability of the Sentiment Analysis data, we first analyzed the tools used, VADER and Hu Liu, by calculating the Pearson correlation between their scores. The correlation coefficient of 0.5624 suggests a moderately positive relationship between the two tools. As the two analyses broadly agree, it can be concluded that both techniques are valid for the Sentiment Analysis of news, and therefore the data obtained are reliable.

Another result which allows us to conclude that the data obtained in the Sentiment Analysis are reliable is that negative results were only obtained in 14 out of 45 total cases, i.e., in 31.1%. The companies that the news reports refer to are financially consistent, and those news reports produce sentiments with positive connotations. In other words, financially consistent companies “produce” positive sentiments, and one of the variables for measuring the good reputation or image of a company is its financial consistency (Raithel et al. 2010). From this, it can be concluded once again that the data obtained are reliable.

5.2 NLP: correlation (sub-corpora): terms related to ESG

5.2.1 Environment and environmentally

In the visualization of the term environment, three main clusters can be observed among the positive terms (green box). One of these clusters is composed of the term “environment” together with its related words. This cluster includes a considerable number of related terms and therefore covers a large area. This reflects the importance and influence of the term and its high frequency of appearance in the written press news. Among the negative terms (red box), two main clusters can be seen, indicating lower segmentation. The same interpretation applies: the cluster formed by the term “environment” and its related terms is relevant and, therefore, notable within the negative corpus.

Regarding the terms related to “environment” and “environmentally,” the following stand out. The positive corpus contains many terms associated with ESG investment criteria, several of them with a positive connotation (wellbeing, cleanest, lower-carbon, zero-carbon). The negative corpus, in turn, has only one term associated with ESG criteria, and it has a neutral connotation (socially). The terms associated with the ECONOMY show several characteristics. In the positive corpus, some have a positive connotation (cost-effective, cost-efficient, industry leading, value-add), and several are related to productivity. In the negative corpus, by contrast, some economic terms refer to capital or property (owning, rewarding). Terms associated with intentionality, i.e., actions that can help achieve a desired result, were also detected: influenced, manageable, calculate, geared and facilitating. Finally, the positive corpus contains many terms with positive connotations (8) and none with negative connotations. The negative corpus, despite containing several positive terms (4), has many more negative ones (13).

Several conclusions can be drawn from these results. First, the presence of terms with a positive connotation in the positive corpus and of terms with a negative connotation in the negative corpus confirms the reliability of the data and of the methodological process. Second, terms related to the ESG criteria appear in the positive corpus, meaning that ESG criteria are associated with good practices. Moreover, the large number of terms associated with the economy indicates a close relationship between the environment (keyword) and the economy. This supports the initial thesis that ESG investment criteria are closely linked to the company’s reputation and, therefore, to its financial results. Several of the economic terms extracted from the positive corpus also indicate good results in terms of productivity. They focus on the process, on how things are done, which, linked to ESG terms, can be related to sustainable development. The concept of sustainable development implies limits imposed by the present state of technology and social organization on environmental resources, and by the ability of the biosphere to absorb the effects of human activity (Kates et al. 2005; Geissdoerfer et al. 2016). In contrast, the economic terms in the negative corpus refer to raising capital. Combined with the many terms that indicate intentionality, this can be associated with the “use” of the environment as a reputation-enhancing tool. In other words, it points to greenwashing, the practice of misleading consumers about a company’s environmental performance. Such practices can have negative effects on consumer and investor confidence (Delmas and Burbano, 2011; Strauß, 2022; Mendonça et al. 2023).

Therefore, we have detected the practices related to the environment within the ESG investment criteria that improve and worsen the reputation of companies in the news: those related to sustainable development improve it while those related to greenwashing worsen it.

5.2.2 Social and socially

Regarding the visualization of the data with the term social, among the positive terms (green box), the term social belongs to the main cluster, but does not stand out as an independent cluster. Therefore, it is an important but not crucial term in the various news items analyzed. This visual information is consistent with the analysis of vectorial distances (see Table 4), which also shows that most of the words related to the term social have vector distances greater than 0.5. As for the negative terms (red box), there is no segmentation since there is only one cluster, which includes the term “social.” In this case, as with the positive terms, it is a notable but not crucial term, which coincides with the quantitative analysis corresponding to the vectorial distances.

Once again, the reliability of the data and of the methodological process is confirmed. On the one hand, the positive corpus contains more terms with positive connotations (15) than the negative corpus (9). On the other hand, the negative corpus contains more terms with negative connotations (6) than the positive corpus (1). Almost all the terms associated with ESG investment criteria and the environment appear in the positive corpus (6: environmentally, culturally, lower-carbon, greener, environmental, governance). Only one associated term, governance, appears in the negative corpus. In other words, ESG investment criteria have a positive connotation in the press, which can benefit the company’s image.

The positive terms in the positive corpus can be classified into three large blocks:

  • Terms related to ESG (greener, healthier and nurture).

  • Terms related to the economy (value-add, prosperity, security, wellbeing and cohesion).

  • Terms related to ways of doing or acting (minded, professionally, trustworthy, proactive, conscious, emotionally).

These terms relate mainly to a strong work ethic and to positive environmental, social and economic results. News items that evaluate ESG investment criteria positively therefore relate work ethics to good environmental and social performance and to financial prosperity. The terms with a negative connotation, almost all of which appear in the negative corpus, again indicate intentionality (influenced, manageable, calculate, facilitating, geared) or bad practices (butt, critic, misunderstood, irresponsible, discriminate). Since all these terms come from the keywords social and socially, they can be related to a “use” of the company’s social dimension to achieve a good image, i.e., “socialwashing.” Indeed, Nardi suggests that CSR communication can be decisive in discouraging “socialwashing” (Nardi 2022).

These results suggest that good social practices in companies receive “good press” and, consequently, improve their image, whereas social practices whose sole objective is to improve the company’s image have the opposite effect.

5.2.3 Governance

Finally, with regard to the term governance, four clusters can be observed among the positive terms (green box). Two of these clusters are practically insignificant (“ssga” and “not-for-profit”), and the third (“thresholds,” “glow,” “values,” etc.) has little influence on the main one. The main cluster, which also covers the largest area, contains the term governance together with its related words, making it a notable and influential cluster in the news items analyzed. Among the negative terms (red box), there is only a single cluster, which includes the term governance, so there is no segmentation at all. In this case, and in contrast to the negative terms associated with social, the quantitative analysis of vectorial distances supports the importance of the term governance and its related terms and, therefore, its high frequency of appearance and prominence within the “negative” corpus generated from the analyzed news.
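The “vectorial distances” referred to here are distances between word embeddings. A common choice for Word2Vec vectors is cosine distance (1 minus cosine similarity), and the TensorFlow Embedding Projector's nearest-neighbor panel offers cosine as one of its distance options. The sketch below assumes cosine distance and uses tiny, hypothetical three-dimensional vectors (`TOY_VECTORS`) in place of a trained model; the function names are illustrative and are not the authors' code.

```python
import math

# Hypothetical toy embeddings; a real analysis would use the vectors of a
# Word2Vec model trained on each sub-corpus.
TOY_VECTORS = {
    "governance": [1.0, 0.2, 0.0],
    "accountability": [0.9, 0.3, 0.1],
    "boardrooms": [0.8, 0.1, 0.3],
    "weather": [0.0, 0.1, 1.0],
}


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def nearest_neighbors(keyword, vectors, k=5):
    """Return the k terms closest to `keyword` as (term, distance) pairs."""
    target = vectors[keyword]
    scored = [(term, cosine_distance(target, vec))
              for term, vec in vectors.items() if term != keyword]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]
```

With these toy vectors, the two nearest neighbors of “governance” are “accountability” and “boardrooms,” while the unrelated “weather” lies far away. Under this assumption, a tight cluster around a keyword corresponds to a set of terms with small cosine distances to it.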

When we introduce the term governance, unlike in the two previous cases (environmental/environmentally and social/socially), the differences between terms with positive and negative connotations are not apparent. In fact, in neither of the two corpora are there any terms with negative connotations. Once again, most of the extracted terms can be classified into ESG and ECONOMY.

The terms in the positive corpus relate, on the one hand, to corporate management (accountability, chairmanship, boardrooms) and, on the other, to social responsibility (diversity, responsibility, inclusivity). In other words, they concern the responsible management of companies. The terms in the negative corpus also concern corporate social responsibility (transparency, diversity, engagement, ethical). Among these, transparency and ethical stand out, in clear reference to “clean” management of the company. However, unlike the positive corpus, where these themes appear in a general context, the negative corpus focuses on specific companies and entrepreneurs (Landed-mills, Sarasin, Zeb, Deka, Black-Rock), dealing with the ethical and transparent management of those particular companies. As there are no adjectives or nouns with a negative connotation in this corpus, it is not possible to determine the direction of the criticism, that is, whether it is positive or negative. It can therefore be concluded that when news items refer to corporate governance, the focus is on the ethical and responsible management of certain companies.

5.3 NLP: Correlation—Sub-corpora—visual representation

In terms of data visualization via the NLP technique and Word2Vec models, the results obtained are, as expected, consistent with the graphical representations observed in the TensorFlow Embedding Projector tool. The terms “environment” and “governance” appear in both the positive and negative variants, with overall vectorial distances between 0.325 and 0.474, and always belong to the main or large clusters. Their frequency of use, and consequently their importance and influence in news about companies in the written press, is therefore notable. The term “social,” by contrast, has an overall vectorial distance between 0.441 and 0.561 (except for the words “security” and “media”) and does not always belong to the main or large clusters. Therefore, although it appears in the news, its frequency of appearance, and consequently its influence, is lower than that of “environment” and “governance.” This suggests that companies today give greater importance, within ESG, to the environmental and governance aspects than to the social aspect.
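The distance ranges quoted above summarize how far a keyword lies from its nearest terms, with some outliers (here “security” and “media”) set aside. The following sketch shows one way to compute such a range and to export vectors for the TensorFlow Embedding Projector, assuming cosine distance and a plain `dict` mapping terms to vectors. The function names, the `exclude` parameter and the toy vectors are hypothetical, and the output will not reproduce the values reported in the study. The export follows the Projector's TSV loading convention as we understand it: one tab-separated vector per line, plus a metadata file with one label per line and no header for single-column metadata.

```python
import math

# Hypothetical toy embeddings standing in for a trained Word2Vec model.
TOY_VECTORS = {
    "governance": [1.0, 0.2, 0.0],
    "accountability": [0.9, 0.3, 0.1],
    "boardrooms": [0.8, 0.1, 0.3],
    "security": [0.0, 1.0, 0.0],
}


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / norm


def neighbor_distance_range(keyword, vectors, k=10, exclude=()):
    """Return (min, max) cosine distance from `keyword` to its k nearest
    terms, ignoring any terms listed in `exclude` (e.g. known outliers)."""
    distances = sorted(
        cosine_distance(vectors[keyword], vec)
        for term, vec in vectors.items()
        if term != keyword and term not in exclude
    )[:k]
    if not distances:
        raise ValueError("no neighbors left to summarize")
    return distances[0], distances[-1]


def to_projector_tsv(vectors):
    """Return (vectors_tsv, metadata_tsv) strings: one tab-separated vector
    per line, and one label per line in the same order."""
    vector_lines = ["\t".join(f"{x:.6f}" for x in vec) for vec in vectors.values()]
    label_lines = list(vectors.keys())
    return "\n".join(vector_lines) + "\n", "\n".join(label_lines) + "\n"
```

Passing `exclude={"security"}` narrows the range for “governance” in the toy data, which mirrors how the text reports the range for “social” while setting “security” and “media” aside. The two strings returned by `to_projector_tsv` can be written to files such as `vectors.tsv` and `metadata.tsv` and loaded through the Projector's “Load” dialog.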

In any case, it can be concluded that news items about companies in the written press do address ESG investment criteria. First, when news related to companies deals with the environment, business practices related to sustainable development improve the company’s image, whereas those related to greenwashing worsen it. Second, regarding corporate social practices, good social practices improve the company’s image, while social practices whose sole objective is to improve that image, known as socialwashing, have the opposite effect. Finally, when news items refer to corporate governance, the focus is on the ethical and responsible management of certain companies.

5.4 Implications and limitations of the study and future research

This study has implications for academia, business and society. For academics, as a new methodology, it opens up a new perspective on SA research. In terms of interdisciplinary research, it facilitates collaboration between fields such as linguistics, computer science and the social sciences by combining text analysis and sentiment processing, thus fostering the exchange of knowledge and approaches. Moreover, because it can be applied to a wide range of subjective texts, from news to social media posts, it broadens the scope of research in areas such as psychology, sociology and communication. For companies, it becomes a strategic tool to understand and improve their brand image: by identifying terms that generate negative sentiment, companies can adjust their communication and marketing strategies to address issues and improve how their brand is perceived. It also provides an agile tool for monitoring brand reputation in real time, enabling a rapid response to changing trends and perceptions. As for society, applying SA to various types of texts helps people better understand perceptions, opinions and reactions to issues, products or companies. This promotes greater transparency in the information disseminated and helps people make more informed consumer decisions and engage in informed discussions on social networks and other media. In addition, society can push companies to act more responsibly, as public perception can affect their image and reputation. Finally, this analysis can provide information on emerging social trends, changes in cultural perceptions and evolving attitudes toward different issues, which can be useful for governments, non-profit organizations and other actors in decision-making and strategic planning.

Regarding the limitations of the study, the sentiment associated with certain topics and their related words can evolve over time. This requires the models and analyses to be updated regularly, as they may otherwise become obsolete. In addition, access to the texts to be analyzed may be difficult to obtain. Privacy and ethical concerns must also be considered, as misuse of personal data or misidentification of emotions could lead to unintended consequences. These limitations should be taken into account when applying this methodology, as they could affect the accuracy, applicability and ethics of the results obtained.

Future research in this field could focus on several aspects to improve and broaden its application. First, it could explore how this model can be adapted and updated automatically to reflect changing trends. Second, it could address linguistic and cultural diversity by developing SA models that are applicable to different languages and cultures. Finally, integrating this research with other areas, such as artificial intelligence, psychology or sociology, could provide a deeper understanding of how emotions relate to other human aspects.