Introduction

Large European companies have been required to include non-financial information in their annual public reporting since each Member State transposed the Non-Financial Reporting Directive (Directive 2014/95/EU). The European Commission’s Communication on the European Green Deal included a review of this Directive in order to stress sustainability and improve transparency. Until now, non-financial information has not been sufficiently comparable or reliable, and some of it may be irrelevant. Some companies do not report such information even though stakeholders require it. It is therefore necessary to standardize the disclosure of non-financial information and to increase its comparability and homogeneity. In June 2020, the European Commission requested technical advice from the European Financial Reporting Advisory Group (EFRAG) on the elaboration of possible EU non-financial reporting standards. EFRAG created two specific groups: the project task force on EU non-financial reporting standards (PTF-NFRS), to satisfy this objective, and the project task force on European sustainability reporting standards (PTF-ESRS) (Ortiz-Martínez, 2021). In January 2021, the PTF-NFRS launched a series of outreach events all over Europe to gather the views of stakeholders from different countries on its tentative proposals. One of the conclusions of these events was the need to standardize the language of sustainability. Its final report was published in the first quarter of 2021. The PTF-ESRS drafted the ESRS from June 2021 to April 2022, when the new EFRAG sustainability reporting pillar was ready to tackle the challenge of finishing these draft ESRS, which are currently in the hands of the European Commission. This first set of ESRS has to be published as a delegated act by June 2023.
In December 2022, the new Directive on sustainability reporting was published: the Corporate Sustainability Reporting Directive (CSRD), which establishes that, among SMEs, only listed ones will be required to disclose sustainability statements.

This non-financial information will still be voluntary for non-listed SMEs, even though they are the backbone of the European economy. Many European SMEs nevertheless provide non-financial information: sometimes under pressure from their providers, clients, or suppliers; sometimes because they need it to apply for financial resources; and sometimes because they imitate large companies and try to differentiate themselves from their competitors, as part of a trickle-down or follow-the-leader effect.

Research on the sustainability and non-financial information disclosed by SMEs is scarce (Khoja et al., 2022) and far from sufficient to develop requirements for it, at a time when not only Europe but the whole world is immersed in a race to standardize sustainability reporting. Studying sustainability reporting is also difficult because of its mainly qualitative and narrative nature, which hampers the comparison of information. Furthermore, although most companies use Global Reporting Initiative (GRI) standards, this does not guarantee homogeneous information. Hence, another type of methodology is needed, such as lexical analysis, which studies the text and words used in the sustainability reports of European SMEs. Although the literature on sustainability reporting is extensive, few studies focus on SMEs, which make up the majority of companies in Europe. This paper fills this gap by providing results on the content of sustainability reporting by SMEs and by exploring the specificities of this type of reporting in companies with smaller size and fewer resources. The main objective of this study is to analyze all the sustainability reports issued by European SMEs according to GRI, to check their linguistic characteristics, and to ascertain whether this non-financial disclosure differs depending on the size of the SMEs, which can vary widely. The structure of this paper is as follows: first, the theoretical framework that justifies our analysis and the research questions are presented. This is followed by a description of the methodology and the sample. Finally, the results are presented, and the paper ends with the main conclusions.

Theoretical Framework and Research Questions

In non-financial disclosure specifically, numerous studies have sought to analyze the features of this disclosure through content analysis or disclosure indexes. However, some authors have highlighted the drawbacks of these methodologies (Ortiz & Clavel, 2006). Because non-financial information is mainly narrative, uncodified, and heterogeneous, it is desirable to analyze the texts from a statistical point of view, combining statistical and lexical analysis (Leimdorfer, 1995).

Lexical analysis has been widely used in other fields, such as language, memory, neuropsychiatry, neuropsychology, and computational linguistics, among others (Degaetano-Ortlieb & Teich, 2011). Some studies have applied lexical analysis to the information published to the market by companies, banks, or other entities, but not specifically to sustainability reporting (Adams & Cruz García, 2007; Borsje et al., 2010; D’Northwood, 2017). For example, lexical analysis has been used to analyze a company’s annual reports over a period, focusing on financial information (Ortiz Martinez & Crowther, 2007). It has also been applied successfully to non-financial information (Ortiz-Martínez & Marín-Hernández, 2020), but with a sample of financial services companies.

It has been proved that financial analysis is improved by using linguistic indicators to comprehensively analyze the information issued by companies (Myšková & Hájek, 2017). This is because the study of annual reports as a source of information is usually done from a quantitative point of view. However, they are also an essential source of qualitative information. Using words with sustainability meanings in annual reports offers essential information to stakeholders and is even an indicator of future profitability (Myšková & Hájek, 2017, 2018). Including sustainability information in the financial report answers the challenge of transparency and better communication in the “new millennium” (Gallhofer et al., 1999, 2011).

There is a call to solve the problems that accounting narratives can have (Sydserff & Weetman, 1999). This is another view of the accounting literature, which considers its lexical dimensions through empirical analysis (Sydserff & Weetman, 2002), a methodology extended to CSR disclosures when firms start elaborating non-financial information.

Readability tools applied to company reports have helped determine whether non-financial disclosure is only a strategy to show socially and environmentally friendly behavior, to hide other information, or to use complex textual disclosures. However, the evidence shows that companies that engage more with CSR goals are more likely to comply with transparency and issue reports with higher readability (Bacha & Ajina, 2019). Lexical analyses have also established differences between the features of non-financial and financial information, and thus between voluntary and compulsory disclosures, concluding that CSR information is disclosed less in social media than financial information is in annual reports (Ramananda & Atahau, 2019). Furthermore, regulation (such as the European Directive 2014/95/EU) influences the quantity and quality of sustainability reporting (Ottenstein et al., 2022). In contrast, other studies, such as Ernst et al. (2022), find that regulatory pressure reduces the sustainability performance of SMEs, although that finding is based on a survey, not on an analysis of the reporting. Within non-financial information, environmental literacy, whether voluntary or compulsory, is studied as an essential aspect. Linguistic analysis of this kind finds that texts constructed around environmental issues offer an image of the position of individuals and social groups (Hardy, 2013).

Two different effects have been identified in the literature which shape the disclosure characteristics. These are size and exchange (Atiase, 1987). In terms of size, it is assumed that there is a positive relationship with disclosure (Atiase, 1987) and more so when the companies operate in environmentally unfriendly activities (Stacchezzini et al., 2016). The reporting of non-financial information has been extensively studied in big companies rather than SMEs. There is research that has compared disclosure issued by big companies and by SMEs (Dias et al., 2019) and also studies that point out that SMEs are socially responsible or sustainable but do not disclose this non-financial information because they do not have the necessary financial resources (Baumann-Pauly et al., 2013; Ram et al., 2001). Nevertheless, it is proved that SMEs are involved in sustainability practices to enhance the legitimacy of their operations for their stakeholders (Crossley et al., 2021).

We have found only one previous study that applied lexical analysis to non-financial disclosure by SMEs, and it focused on European financial sector SMEs (Ortiz-Martínez & Marín-Hernández, 2020). One reason for this scarcity may be that SMEs issue sustainability reports voluntarily, mainly due to the influence of big companies (Santos, 2011), for which reporting is compulsory (European Commission, 2016), or because they want to differentiate themselves or gain a competitive advantage (Torugsa et al., 2012), and so there are no databases with which to analyze this disclosure. Also, the non-financial information reported by SMEs complies with Global Reporting Initiative (GRI) standards, the generally accepted standards in this field worldwide (Ortiz & Marín, 2014).

Based on the above arguments, we study sustainability reports issued by European SMEs using statistical methodology and lexical analysis and propose the following research questions:

RQ1: Which European SMEs disclose sustainability reports according to GRI?

RQ2: What is this disclosure and its assurance from the point of view of compliance with the core requirements of GRI?

RQ3: What are these sustainability reports like from the lexical point of view?

RQ4: Are there differences in the reports from the lexical point of view according to company size?

Methodology and Sample

Sample Selection

As this paper aims to study sustainability reporting voluntarily disclosed according to GRI by European SMEs, we obtained the sample from the GRI database. Since GRI non-financial reporting standards are the most widely used worldwide and SMEs in Europe are not required to prepare this information, this database is a good resource. The search tool of the GRI database allows one to look for non-financial reports by firm size and, specifically, for reports issued by SMEs. We ran our search on 11th November 2019 with the following criteria: SMEs as firm size, Europe as region, and GRI-standards as report type. Although previous versions of the GRI standards are available as report types, these are the latest ones, published by GRI on 1st July 2018, which replaced the GRI 4 version (Corresponsables, 2023).

The search returned 116 organizations and 157 reports. Some firms (or other organizations) issued more than one report because these standards cover 2016, 2017, 2018, and even 2019. However, only some of these reports are suitable for a linguistic analysis of their content. Although these reports are supposed to be globally oriented so that anyone worldwide can understand them, 105 out of 157 are written in the companies’ local languages and are not suitable for analysis. Thus, we can only study non-financial information written in English, the most international language. This left 52 reports, of which eight had to be removed because they were published in HTML format, which made it impossible to analyze them with the lexical software. Our final sample therefore comprises 44 sustainability reports issued by European SMEs. From the point of view of the lexical analysis methodology, the sample has an appropriate size, measured by the size of the corpus (number of words or tokens), compared to previous valid studies (Kaity & Balakrishnan, 2019; Kocon et al., 2019; Ortiz Martinez & Crowther, 2007; Ortiz-Martínez & Marín-Hernández, 2020).
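
The filtering steps described above reduce to simple arithmetic, sketched here in Python. The counts are taken from the text; the variable names are illustrative only.

```python
# Counts reported in the sample-selection paragraph; names are illustrative.
reports_found = 157    # reports returned by the GRI database search
not_in_english = 105   # reports written in local languages
html_only = 8          # English reports published only as HTML

english_reports = reports_found - not_in_english  # reports written in English
final_sample = english_reports - html_only        # reports usable by the lexical software
share = final_sample / reports_found              # fraction of all reports analyzed

print(english_reports, final_sample, f"{share:.1%}")  # prints: 52 44 28.0%
```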

Methodology

We used SPSS to analyze the features of the sustainability reports, showing the frequencies in absolute values and percentages. All features were taken from the GRI database. This first analysis will help categorize the features of European SMEs that voluntarily report on sustainability following GRI standards. This is a core first step to knowing the type of SMEs that tackle this challenge.

Then, we studied the content of these reports. The 44 reports written in English and in pdf format are analyzed in this part of the paper. To run a lexical analysis, the reports first had to be converted into TXT files. We used free PDF-to-text software and organized the resulting files consecutively by firm. These TXT files made up the corpus. For this analysis, we used a statistical methodology that allows us to compare disclosures, obtain a corpus’s main characteristics, and find word patterns. We chose the WordSmith Tools 7 software, published by Lexical Analysis Software and Oxford University Press since 1996, and used several of its utilities, which are explained in the results section. Together, these lexical tools allow us to analyze not only the features of the SMEs that report but also what they report about sustainability, and to check whether the texts differ significantly given that the reporters themselves differ.

Also, although all companies in the sample are classified as SMEs by GRI, we noticed significant differences in size, so we reclassified the sample by size. We chose the number of employees as a generally accepted indicator of a company’s size. We obtained the number of employees mainly from the sustainability reports issued each year by each company in the sample. However, sometimes it was not included, even though employees are supposed to be one of the core pillars of a company’s sustainability information. In these cases, we took it from the Bloomberg database. Then, we calculated the quartiles of the number of employees to categorize this variable (Laitinen, 1992; Mckee, 2000). To check the robustness of this categorization, we ran the M test (Sârbu & Pop, 2001). As a result, we obtained a classification of the reports into four groups according to SMEs’ size. These groups were used in the last part of our analysis, in which we checked whether the lexical features of the reports differ depending on SMEs’ size. We studied the group averages of the lexical features by size and applied a T-test to every pair of size quartiles to check whether the averages differed significantly. This gave six null hypotheses of equal averages, one for each pair of size quartiles. Finally, we applied the lexical tools again to the reports classified by company size. Although all the sustainability reports analyzed come from European SMEs, this category includes very different types of firms, so these tools were helpful to check whether the content of the reports depends on size within the SMEs, bearing in mind that a micro-company, for example, is not comparable with a medium-sized company.
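
The quartile grouping by number of employees can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually run for the paper: the head counts are hypothetical, the helper name size_quartile is introduced here, and Python’s default quantile definition may place the cut points slightly differently from other statistical packages.

```python
from bisect import bisect_left
from statistics import quantiles


def size_quartile(employees, cut_points):
    """Return the size group (1 to 4) for a head count, given three quartile cut points.

    A value exactly equal to a cut point is assigned to the lower group.
    """
    return bisect_left(cut_points, employees) + 1


# Hypothetical head counts, NOT the paper's data.
head_counts = [3, 12, 25, 40, 88, 150, 240, 600, 1200, 5000, 9000, 29612]

# Three cut points: first quartile, median, third quartile.
cut_points = quantiles(head_counts, n=4)

# Map every company (identified here by its head count) to its size group.
groups = {n: size_quartile(n, cut_points) for n in head_counts}
```

With twelve hypothetical companies, each size group receives three of them; with real data, ties at a cut point would make the groups slightly uneven.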

Results and Discussion

European SMEs Disclosing English pdf Non-Financial Information According to GRI

Firstly, it is interesting that sustainability reports voluntarily issued by European SMEs, according to GRI, are few (157). Moreover, only 44 (28%) are written in English, disclosed to international stakeholders, are in pdf format, and are ready to be downloaded and analyzed. Hence, the sample is composed of mostly global non-financial information.

In lexical analysis, the size of the sample is measured by the size of the corpus, that is, the number of words or tokens, and, as explained above, our corpus is bigger than others analyzed (Kaity & Balakrishnan, 2019; Kocon et al., 2019; Ortiz Martinez & Crowther, 2007; Ortiz-Martínez & Marín-Hernández, 2020). The first step in the analysis is to categorize the European SMEs that voluntarily disclose non-financial information according to GRI standards, the generally accepted sustainability standards worldwide. These first results show whether SMEs need specific characteristics in order to disclose sustainability reports, and so answer our first research question (RQ1): which European SMEs disclose sustainability reports according to GRI?

Table 1 classifies these 44 reports according to the sector in which the company operates and its country of origin. The sector is broken down in two ways: one more disaggregated and another with grouped sectors. It is worth pointing out that the authors did not choose the characteristics of the sample; rather, these are the results of the GRI search tool when selecting SMEs as firm size, Europe as region, and GRI-standards as report type.

Table 1 The sample: European SMEs disclosing English pdf non-financial information according to GRI

The SMEs that publish the most sustainability reports are from Sweden and Germany (13% of the sample each), followed by the Netherlands and Greece (11.4% each). Companies in the industry sector issue 25% of the total reports, with energy (6.8%) and healthcare products (6.8%) prominent among them. Real estate also accounts for a large share of this disclosure (18.2% of the sample). However, regardless of the different industries in the sample, industry type does not influence the sustainability practices of SMEs, as Edeigba and Arasanmi (2022) state.

Table 2 classifies the 44 sustainability reports by year, adherence level, and whether they are integrated reports following the requirements of the International Integrated Reporting Council (IIRC). More than 50% of the reports in the sample are dated 2018. Almost all of these sustainability reports are prepared in accordance with GRI, but only at the core level (93.2%), and most are not integrated (84.1%). So, European SMEs that issue global non-financial reports typically do so at the core adherence level of GRI and as non-integrated reports. SMEs face different obstacles that block sustainability reporting (Cantele & Zardini, 2020), but the lack of resources does not mean a lack of commitment to environmental improvements (Cassells & Lewis, 2011), even though they comply with requirements only at the core level.

Table 2 The sample: English pdf non-financial information issued by European SMEs according to GRI

Breaking down the assurance features of these reports (Table 3), the primary trend is not to have the non-financial information externally verified (72.7%). When this information is assured, it is at a limited/moderate level (12 out of the 12 verified reports), with a scope of only specified section(s) (8 out of 12), by an accountant (10 out of 12), and specifically by one of the Big Four (9 out of 12). The assurance standard mainly used is ISAE 3000 (10 out of 12). SMEs are not used to verifying their sustainability reporting, mainly because it is a burden and the benefits do not exceed the costs (IFAC, 2021).

Table 3 The sample: non-financial assurance of English pdf non-financial information issued by European SMEs according to GRI

Sustainability Reporting of European SMEs Disclosing English pdf Non-Financial Information According to GRI

First, we use the Wordlist application of the lexical analysis software to study the main features of all 44 reports in the sample (Table 4). Having identified the main features of the European SMEs that voluntarily disclose sustainability reports, and the scope and assurance of those reports, the next goal is to study their content. As the sustainability reports are mainly narrative, lexical tools are a suitable way to tackle this challenge and answer our third research question (RQ3): what are these sustainability reports like from the lexical point of view? If the SMEs are different, the content of their sustainability reports should also be different, unless there are common trends because there is some “standard model” to fill in.

Table 4 Main features of the analyzed sustainability reports

Our sample is composed of 975,870 words, which the application calls tokens. There are 44 reports from 37 different European SMEs (only six companies are issuing two reports in the sample). A deeper lexical analysis of the reports issued by the same company shows no significant differences between them (in Table 4, reports from the same company share the same SME number in the second column).

For example, in the case of SME12, the number of words in its two consecutive sustainability reports is large. However, the column “Types” gives the number of different words in each report, with repeated words removed. The next column, TTR, gives the type/token ratio as a percentage, calculated as the number of different words over the total number of words. This ratio is more comparable when calculated as a standardized TTR, which does not depend on the length of the text. The standardized TTR of the two reports from the same company differs by less than two percentage points, similar to the differences between reports from different companies included in Table 4.
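
The two ratios can be illustrated with a short Python sketch. It assumes that the standardized TTR is the mean of the TTRs of consecutive fixed-size chunks of the text (1,000 tokens is the usual default in concordance software) and uses a simplified tokenizer, so its figures would not exactly reproduce those of WordSmith Tools. The function names are introduced here for illustration.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (a simplification of real tokenizers)."""
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def ttr(tokens):
    """Type/token ratio as a percentage: distinct words over running words."""
    return 100.0 * len(set(tokens)) / len(tokens) if tokens else 0.0


def standardized_ttr(tokens, chunk_size=1000):
    """Mean TTR over consecutive full chunks of chunk_size tokens.

    Tokens left over after the last full chunk are ignored. Returns None
    when the text is shorter than one chunk.
    """
    chunks = [tokens[i:i + chunk_size]
              for i in range(0, len(tokens) - chunk_size + 1, chunk_size)]
    if not chunks:
        return None
    return sum(ttr(chunk) for chunk in chunks) / len(chunks)


# Toy sentence, NOT taken from any report in the sample.
sample = "the report covers the energy use and the water use of the company"
tokens = tokenize(sample)
plain = ttr(tokens)                                # 9 types over 13 tokens
standardized = standardized_ttr(tokens, chunk_size=4)
```

Because every chunk has the same length, the standardized figure does not fall as a report gets longer, which is why it is the better basis for comparing reports of very different lengths.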

The reports differ considerably in length. Some are very long (74,118 words), while others are brief (the briefest has just 1838 words). However, in terms of the standardized TTR, all the reports are very similar, with a ratio from 31.55% to 45.8%. There is only one exception (15.6%), a 2017 report from a French company. So, apart from the length of the reports, relative measurements show few differences, which was not the expected result because the companies come from different countries and operate in different sectors. The sample includes all the European SMEs compiled in the GRI database and, regardless of their different countries of origin, the reports are very similar in relative terms, which points to a general trend shared across Europe and may also be a clue to an international tendency.

Table 5 lists the words most frequently used across all the sustainability reports, in absolute values and percentages (the first and second positions are included as examples). We have to wait until position 18 to find a word that is meaningful for our results: “management”, whose frequency is very low. This word can be used in a financial context as well as a non-financial one. All the following words are less frequent still and have a very low percentage of use. Position 19 is the word “report”, almost at the same level as “management”, which shows that the company discloses this information and improves its transparency. “Reporting”, which conveys the same idea, is also included. All the other most frequently used words refer to a significant aspect of non-financial information: for example, “GRI”, because all these reports are prepared according to these standards, and “employees”, because information about human resources is core. Note also that Table 5 includes the word “financial”, which shows that, although we are analyzing sustainability reports, the financial aspects of the company are not isolated from the non-financial ones.

Table 5 Word analysis of the sustainability reports

The last part of the lexical analysis is presented in Table 6, which contains the results of the concordance tool: the total number of times a word is found in the neighborhood of a chosen word. In our case, the chosen words are the most frequent words in Table 5. When a word is studied in isolation, it is not possible to know whether it refers to the concepts of non-financial information. The concordance is therefore helpful to check the relationships between words and the context in which they are used. For each of the most frequently used words, Table 6 lists the eight words with the strongest relationships (ranked from 1 to 8), together with the number of reports in which the two words appear together (the maximum is the 44 analyzed reports), followed by the total number of times the two words were found together.
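
The two counts reported for each word pair (number of reports and total co-occurrences) can be sketched in Python as below. The sketch assumes a symmetric window of a fixed number of words on either side of the chosen word; the span actually used by the software is not stated here, and both the function name collocates and the toy texts are hypothetical.

```python
from collections import Counter


def collocates(texts, node, span=5):
    """Count the words found within span tokens of node.

    texts is a list of token lists, one per report. Returns two Counters:
    total co-occurrences per word, and the number of texts in which the
    word appears near node at least once.
    """
    total, n_texts = Counter(), Counter()
    for tokens in texts:
        seen_here = set()
        for i, token in enumerate(tokens):
            if token != node:
                continue
            # Words to the left and right of this occurrence of the node word.
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            total.update(window)
            seen_here.update(window)
        n_texts.update(seen_here)
    return total, n_texts


# Two toy "reports", NOT taken from the sample.
texts = [
    "this sustainability report follows gri standards".split(),
    "the annual report and the sustainability report".split(),
]
total, n_texts = collocates(texts, "report", span=2)
```

Ranking the words by total co-occurrences and keeping the first eight would give one column of a table laid out like Table 6.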

Table 6 Concordance of the most frequently used words in the sustainability reports

The words “report”, “annual report”, and “reporting” are mainly associated with “sustainability” and appear together with it in nearly all the reports. Companies are disclosing non-financial information to comply with the stakeholders’ requirements for sustainable information. The word “sustainable” also appears with the words “development” and “business”. This is a way to include sustainability not only in the disclosure of information but also in the business and its management. Moreover, the word “management” is found together with “risk”, “company”, “parent”, “group”, “board”, and “directors”, which are simply words of the day-to-day management of the company. Other word pairs reflect the requirements of non-financial information, such as “GRI” with “standards”, “employees” with “number” or “training” (because these are required data about human resources), or “energy” with “consumption”, since this information has to be disclosed. Finally, some words that seem at first sight to have a financial meaning are confirmed as such by the concordance analysis, for example, “financial” with “statements” or “value” with “fair”.

Differences in Non-Financial Information According to GRI Depending on the Size of the European SMEs

Although all the companies whose reports have been analyzed are SMEs, following the criteria of the GRI database, there are important differences in the size measured through the number of employees. As seen in Table 7, the dispersion of the data is huge, with really small companies (the minimum number of employees is 3) and big ones (the maximum number of employees is 29,612). Therefore, we have calculated the quartiles to establish four groups of companies by size. According to these four groups, we again analyze the lexical features of the reports to check for differences. Therefore, the last step of the analysis is to check the established fourth research question: RQ4 — are there differences in the reports from the lexical point of view according to company size?

Table 7 Descriptive statistics of size measured through number of employees

Table 8 shows the mean number of total words per report by size quartile, together with the mean standardized TTR by the same criterion. While total words give the length of the reports, the standardized TTR measures the share of different (non-repeated) words in the text without the influence of length. The length of the sustainability reports clearly differs for the smallest companies, while there are no important differences among the three bigger quartiles. The smallest companies prepare the shortest reports, with fewer than half the overall average number of words (the overall average is 22,178.86 words, and the average for the reports of the smallest companies is 9894.18). So, the smallest companies are the briefest when publishing about sustainability. At the same time, although the biggest companies prepare the longest reports, they repeat more words. So, the biggest companies appear to lengthen their non-financial reports by reusing the same words (only companies in the first size quartile have a standardized TTR above the average, Table 8).

Table 8 Mean of total words and standardized TTR of the reports by size quartiles

To examine these differences in report length by company size more closely, we ran a T-test of mean differences. Table 9 shows the results of the unpaired-samples test of report length by size quartile. The results are statistically significant for the first three pairs, those that compare the first quartile with each of the others, which means the null hypothesis of equal means is rejected for them. Therefore, there are significant differences in report length between the smallest companies and the other three size quartiles. For the length of the sustainability report, what matters is whether the company is in the first size quartile, that is, among the smallest ones. This agrees with the result obtained when analyzing the average report length by size quartile.
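
The pairwise comparison can be sketched in Python as follows. The sketch assumes Welch’s version of the unpaired t statistic, which does not require equal variances; the text does not say which variant was used, and the report lengths below are hypothetical. It stops at the statistic and its degrees of freedom, leaving the p-value to a statistical table or package.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two independent samples."""
    va = variance(a) / len(a)  # squared standard error of the first mean
    vb = variance(b) / len(b)  # squared standard error of the second mean
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical report lengths (in tokens) per size quartile, NOT the paper's data.
lengths = {
    "Q1": [1800, 9000, 12000, 7000],
    "Q2": [21000, 30000, 18000, 26000],
    "Q3": [24000, 35000, 20000, 28000],
    "Q4": [25000, 40000, 22000, 31000],
}

# Four groups give six pairs, hence six null hypotheses of equal means.
pairs = list(combinations(lengths, 2))
results = {(g1, g2): welch_t(lengths[g1], lengths[g2]) for g1, g2 in pairs}
```

Running six tests on the same data raises the chance of a spurious significant result, so a correction for multiple comparisons could reasonably be applied when reading the p-values.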

Table 9 T test of mean differences of reports’ length by size quartiles

Table 10 shows the most frequently used words in each size quartile (with a frequency greater than or equal to 0.20%). Most of these words are the same regardless of company size: the most frequent words are used in all four quartiles, in three out of four, or at least in two out of four. So, at first sight, the content of the sustainability reports does not differ significantly by size because they use the same words. However, four words are among the most frequent only for the smallest companies: approach, global, water, and environmental. Only companies in the second size quartile use the words financial, board, and statements among their most frequent. The bigger companies, those in the third and fourth size quartiles, most frequently use the words group and development. Companies in the third quartile also use the word interest, and the biggest ones, in the fourth quartile, use the word risk. So, some specific words seem to depend on the company’s size. While the smallest companies use more words related to the environment, the biggest ones focus on development and on the risk that sustainability issues can pose for the company.
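
The selection of frequent words and of the words specific to one size group can be sketched in Python. The 0.20% threshold comes from the text; the function names and the word sets are hypothetical and do not reproduce Table 10.

```python
from collections import Counter


def frequent_words(tokens, threshold=0.20):
    """Return the words whose share of all tokens is at least threshold percent."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word for word, count in counts.items()
            if 100.0 * count / total >= threshold}


def exclusive_words(frequent_by_group):
    """For each group, return the frequent words that no other group shares."""
    exclusive = {}
    for group, words in frequent_by_group.items():
        others = set().union(
            *(w for g, w in frequent_by_group.items() if g != group)
        )
        exclusive[group] = words - others
    return exclusive


# Hypothetical frequent-word sets per size quartile, NOT the paper's Table 10.
frequent_by_group = {
    "Q1": {"gri", "report", "water"},
    "Q2": {"gri", "report", "board"},
    "Q3": {"gri", "report", "group"},
    "Q4": {"gri", "group", "risk"},
}
exclusive = exclusive_words(frequent_by_group)
```

In practice, frequent_words would be applied to the pooled tokens of each quartile’s reports to build the sets that exclusive_words compares.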

Table 10 Word analysis of the sustainability reports by company size

The next step in the lexical analysis is to obtain the concordance of the most frequently used words by size (Tables 11, 12, 13, and 14). The concordance was obtained for all the words that appear among the most frequent in at least three size quartiles (Table 10), which gives seven words: GRI, management, sustainability, report, business, company, and employees. For each size quartile, the tables include the first five words (their position is in the first column of Tables 11, 12, 13, and 14) according to their concordance with each of the seven most frequently used words.

Table 11 Concordance of the most frequently used words in the sustainability reports by company size: Q1
Table 12 Concordance of the most frequently used words in the sustainability reports by company size: Q2
Table 13 Concordance of the most frequently used words in the sustainability reports by company size: Q3
Table 14 Concordance of the most frequently used words in the sustainability reports by company size: Q4

From the concordance analysis, we find that the most frequent words that do not depend on size also appear in the neighborhood of the same words. So, the words that occur together in the reports are the same regardless of company size. For example, the word “GRI” appears in the reports mainly related to standards or disclosures; the word “sustainability” is included in the reports together with the word “report”; the word “business” appears mainly with the words “sustainable”, “ethics”, or “management”; and the word “employees” is included in the reports to show the data required by GRI about the number of employees. Again, these results are very similar to the ones obtained for all the reports without classification based on size (Table 6).

Conclusions

This paper adds essential conclusions to current research on sustainability reporting by SMEs, which is mainly issued voluntarily. First, it is notable that few European SMEs issue these reports according to GRI standards: only 158 reports were obtained from the GRI database for the last 4 years. Moreover, only 44 of these 158 are written in English and available in PDF format. So, although sustainability information is supposed to be addressed to global stakeholders, this does not hold for the reports written in local languages, since the sample comprises SMEs, whose stakeholders are closer to them.

As the study is based on the content of the reports, it is essential to point out that the size of the analyzed corpus makes our sample appropriate compared with previous valid studies (Kaity & Balakrishnan, 2019; Kocon et al., 2019; Ortiz Martinez & Crowther, 2007; Ortiz-Martínez & Marín-Hernández, 2020). Therefore, the results describe how European SMEs disclose sustainability reports, and they extend to the international landscape because the reports are prepared according to GRI.

These results show that these reports are still scarce and that the majority are not integrated. They comply with GRI standards at the core level and are not externally verified. Furthermore, our lexical analysis finds differences in the length of the reports, which is positively related to company size, in line with the previous literature (Atiase, 1987). The significant difference, however, lies in the smallest quartile: the smallest companies prepare the briefest reports, saving resources (Cantele & Zardini, 2020). There are no significant differences in report length among the other size quartiles.

Nevertheless, the word analysis shows that the reports of larger companies repeat words more than those of smaller companies, although the most frequently used words are very similar regardless of size. Finally, the concordance analysis shows no essential differences according to size, because the words used for these non-financial aspects are standardized (Myšková & Hájek, 2018). These results are therefore consistent with Ortiz-Martínez et al. (2023), who state that, regardless of the specific characteristics derived from company size and the possible lack of resources, the results for SMEs align with those obtained for large companies by previous studies of sustainability practices and their disclosure. In our case, however, the question is studied from the angle of lexical analysis. In conclusion, these results suggest that, although bigger companies try to prepare a lengthier sustainability report, it is similar to the one issued by the smallest companies, which is the briefest but has the same core content. Although the reports look different at first sight and their content appears to depend on company size, the lexical analysis indicates that there is some template for writing these sustainability reports, which all companies use. Perhaps filling in this template is perceived as a question of image (Hardy, 2013).

These results have important additional implications. Theoretically, knowledge of the content of sustainability reports by SMEs is essential for developing the European Sustainability Reporting Standards specifically focused on listed SMEs, all the more so while the advisability of voluntary standards for non-listed companies is being discussed. Furthermore, the rapid process of European sustainability standardization has been the international trigger for improving and updating existing standards and frameworks. Thus, these trends in the use of GRI can be borne in mind internationally.

The existence of a template used by SMEs when publishing these reports according to GRI, as found in this paper, can be a helpful benchmark and point of departure. Evidence about the type of report SMEs are filling in is a clue for improving sustainability disclosure and standards, because sustainability disclosure should not be a burden for small companies lacking resources. Disclosure of sustainability reports is also becoming essential for SMEs, as they are the backbone of the economy and play an essential role in the global goal of sustainability. For these companies, however, the requirements should be proportionate. These implications are theoretical, for use by standard setters, and practical, as the standardization of sustainability reporting focused on SMEs will critically affect the majority of enterprises and their advisors in this field. Non-listed SMEs are not legally obliged to disclose sustainability reports. However, they are indirectly affected, as they cannot avoid being required by banks, large companies, or the public sector to provide this information. Moreover, if current practices are not taken into account, these changes will affect the economy and create costs for SMEs.

Therefore, the implications for future research are broad, as the study of sustainability reporting by SMEs is still scarce, not only regarding its format but also its content. Although it is challenging to compile databases of sustainability reports voluntarily disclosed by SMEs, and it is mainly the largest ones that disclose this information, this study is another step toward meeting the challenge of answering the needs of SMEs. One limitation of this paper is the small number of sustainability reports voluntarily published in English according to GRI by European SMEs. Future research will have to address it by enlarging the sample over a longer period and, for example, by including companies from other continents.