Introduction

The mass media hold a significant power to present things in particular ways, primarily through language usage (Fairclough, 1995). This power can influence public opinion and shape individuals’ perspectives on various issues. As a primary source of understanding the world, the media’s power and impact on discourse have clear implications (Talbot, 2007). The current study focuses on news discourse, a genre that is distinct from other forms of discourse in that it has the potential to alter how we view reality. While news reporting is intended to keep the public informed about current affairs, it becomes evident that news representation is not neutral or value-free and that journalistic practices, such as lexical choices, play a pivotal role in influencing the narrative (Fowler, 1991).

Constructing the Grand Ethiopian Renaissance Dam (GERD) on the Blue Nile, starting in 2013, has triggered a huge conflict and intense political confrontation among the Nile basin countries. Both local and international news outlets have played a critical role in this conflict by covering the status of regional diplomacy and the internal political implications of the GERD (Abtew and Dessu, 2019). However, very little research has been conducted on how the media presented the issue. Most studies to date have only looked at the issue from a media perspective, leaving a gap in research on the linguistic analysis of news coverage.

Historically, water sharing among the Nile Basin states has been a source of long-standing political tension. Egypt has dominated control of the Nile for decades, but upstream countries have been challenging this dominance. Egypt believes its historical share of the Nile waters should be guaranteed based on the two water agreements signed in 1929 and 1959, while Ethiopia insists on equitable allocation (Lawson, 2017). Ethiopia has challenged the two agreements as unfair colonial-era treaties. Thus, Ethiopia’s launch of the GERD, the largest hydropower project in Africa, challenged the long-established status quo along the Nile.

This paper is significant for four reasons: the importance of the issue, the perspective adopted, the methodological approach, and the extended analysis period. First, the study delves into the discourse surrounding the construction of the GERD, a highly important and controversial issue. Second, it adopts a critical perspective towards media discourse analysis, an approach that seeks to unveil the influence of political and ideological stances on media bias. Third, by combining critical discourse analysis (CDA) with the analytical tools of corpus linguistics (CL), the study contributes to the evolving body of work at the intersection of these methodologies. Lastly, the extended timeframe (2013–2020) selected for analysis addresses a gap in the existing literature, where many CDA and linguistic studies on news discourse offer detailed examinations of specific news items but cover relatively short periods (Carvalho, 2008).

The primary objective of this paper is to compare the lexical choices made in the news reports within the two datasets at hand. This will be realized by identifying the most salient keywords in each corpus, followed by an in-depth examination of their collocating words, semantic preferences, and the prosodies inferred from these collocations. In addition, the study aims to reveal the ideological implications in the news reporting on the GERD. The study seeks to address the following two research questions:

  1. What are the lexical items collocating with the highest statistically salient keywords related to the GERD in Egyptian and Ethiopian newspapers?

  2. What does the collocation analysis of keywords reveal about the different stances adopted by the two conflicting countries regarding the GERD issue?

Literature review

Mass media studies of the representation of the GERD

Several studies have explored the media representation of the GERD conflict (Bealy, 2014; El-Tawil, 2018; El Damanhoury, 2023; Sayed El Ahl et al., 2023). Bealy (2014) analyzed the frames used in the coverage of the GERD by The Reporter, an Ethiopian private newspaper. Using content analysis, the study identified six frames: “development”, “national image”, “right”, “victimhood”, “mutual benefit”, and “war”. El-Tawil (2018) examined the framing of the GERD construction as a crisis threatening Egypt’s security. The paper analyzed social media posts and Egyptian online newspapers from September 2017 to March 2018 using content analysis and found that the most dominant frames were “conflict” and “problem”, while the theme of “denial” emerged. El Damanhoury (2023) conducted a media study using content analysis to examine the relationship between China, Qatar, and the UK’s proximity to Egypt and Ethiopia. The study analyzed how China Global Television, Al Jazeera English, and BBC framed the GERD conflict between 2019 and 2021. The findings revealed that proximity played a role in the news coverage of the dispute. Sayed El Ahl et al. (2023) conducted a comparative mass media study of online newspapers’ coverage of the GERD issue in 2019. Examining three governmental newspapers, Ahram Online, The Ethiopian Herald, and Sudan News Agency, the study found that each news outlet framed information in a way that served its national interest. Although these studies have identified the emphasized areas in the news coverage of the GERD, a methodological gap exists. Many extant studies rely on qualitative analyses of limited data and overlook critical linguistic analysis for interpreting differences between news outlets.

Corpus-based CDA studies of news representation of the GERD

Few discourse analysis studies, based on corpora, have investigated newspapers’ portrayal of the Grand Ethiopian Renaissance Dam (GERD) using different methodologies. Elieba (2022) compared the lexico-syntactic features within English newspaper discourse in Egypt and Ethiopia from 2011 to 2021. Elieba analyzed content words, sentence types, verb frequency, vocabulary size, nominalizations, keywords, and complement clauses. The study identified significant differences between Egyptian and Ethiopian news reports on GERD but did not employ the CDA framework to discern bias or elucidate ideological implications in lexical choices. Elsoufy and Ibrahim (2022) conducted a corpus-based critical analysis of the coverage of GERD by Ethiopian and Egyptian newspapers. Their study focused mainly on identifying recurring themes and semantic domains without analyzing the lexical selections or keyword collocations. The present research builds on Elsoufy and Ibrahim’s study by conducting a deeper analysis of lexical choices in news discourse to detect bias and examine the discursive strategies that contribute to ideological polarization. This study approaches the topic from a unique perspective that has not been thoroughly explored in existing literature.

Linguistic studies of other discourses on GERD

Some linguistic studies have examined different discourses on the GERD. El Shazly and El Falaky (2023) conducted a cognitive-linguistic analysis of the language of press releases on the GERD. The study explored the representations of Ethiopia’s and Egypt’s hydro-political stances, drawing on statements by the Egyptian Foreign Minister and the Ethiopian United Nations ambassador. Similarly, Siraw (2023) conducted a qualitative document review and discourse analysis of speeches given by state officials in Egypt and Ethiopia. Siraw analyzed the discourse surrounding Nile water, which often leads to conflicting interests. It is worth noting that none of these studies has analyzed the choices of words employed in the officials’ statements.

To sum up, most existing studies investigated how media outlets in Nile riparian countries have framed the issue of the GERD. However, most of these studies, largely falling within the domain of mass communication, lacked a language-centered approach. Notably, only two studies, Elieba (2022) and Elsoufy and Ibrahim (2022), have conducted a linguistic analysis of news data at the lexical level. Furthermore, the present paper differs from relevant research with regard to context and data sources. For instance, some previous studies focused on how the GERD is covered solely by Ethiopian newspapers over one year (Bealy, 2014). Additionally, the present study derives its data from a genre of media different from that of most extant research. Specifically, some previous studies collected data from social media posts (El-Tawil, 2018); press releases (El Shazly and El Falaky, 2023; Siraw, 2023); and TV news (El Damanhoury, 2023). Finally, only Elsoufy and Ibrahim (2022) have combined the analytical frameworks of CDA and corpus linguistics, while most other studies have relied mainly on purposive sampling techniques. This represents a gap in the literature, as argued by Partington and Marchi (2015), who state that most non-corpus-based discourse analysis studies tend to focus on analyzing a small number of texts.

Theoretical framework

This section offers an overview of the theoretical underpinnings informing the study. It begins with an exploration of the Critical Discourse Analysis (CDA) framework, introducing the concept of ideology as a key factor in the critical analysis of news discourse. Next, the section elucidates Fairclough’s (1995) three-dimensional model as the comprehensive approach to CDA, and van Dijk’s (1998) notion of the ideological square as a model for interpreting ideological implications. Finally, the synergy of CDA and Corpus Linguistics is discussed.

Critical discourse analysis (CDA)

Critical Discourse Analysis (CDA) is an approach to examining language use, taking a critical perspective that involves analyzing how social practices shape discourse (Fairclough and Wodak, 1997). Specifically, within the CDA paradigm, discourse analysis becomes critical when socio-cultural and historical contexts surrounding the events and issues being studied are considered (Fairclough, 1995; Richardson, 2007). CDA involves discovering hidden connections, such as the relationship between ideology, power, social actors, and discursive practices (van Dijk, 1993). In addition, CDA possesses a unique quality of being both a theory and a methodology utilized to examine how individuals and institutions use language. Another crucial principle of CDA is that it does not rely on a singular theoretical framework, drawing upon various approaches to textual analysis. Lastly, CDA offers valuable tools for comprehending how discourse shapes and is shaped by ideologies.

Ideology

In the study of news discourse, the term ideology refers to a group of social beliefs, values, and perspectives that influence how information is interpreted (van Dijk, 2001). Ideology can play a significant role in how news events are presented, highlighting certain aspects while ignoring others. The way different social groups are portrayed in news language can reflect the ideological perspectives of the writers. Scholars such as Fairclough (2003) and van Dijk (2006) view ideology as a form of power that controls and organizes socially shared beliefs. In this study, I adopt this critical approach to ideology and seek to examine how underlying ideologies can shape the narrative of news stories.

Fairclough’s approach to CDA

This study adopts Fairclough’s (1995) model of CDA as the general critical approach to discourse analysis. This approach integrates social science and linguistics within a unified theoretical and analytical framework, fostering a dialogue between the two disciplines (Chouliaraki and Fairclough, 1999). According to this model, discourse analysis entails more than textual analysis, involving three analytical focuses: (a) linguistic analysis of the text, known as description, (b) analysis of discourse practice, referred to as interpretation, and (c) analysis of the social practice, referred to as explanation. The first level involves describing the linguistic properties of the text, including vocabulary, grammar, and cohesion. The second focus entails interpreting the text by analyzing changes occurring during production, which can be influenced by journalistic practices. According to Fairclough (1995), it is at this level that implicit meanings and ideologies can be uncovered. The third layer involves analyzing various factors, such as economic, political, and cultural contexts. This level of analysis may involve examining the immediate situational context of an event or the broader social and historical contexts in which the event is embedded.

Van Dijk’s ideological square model

The present study also builds on van Dijk’s (1998) notion of ideological square to interpret the findings. Van Dijk introduced a socio-cognitive approach to ideology analysis in CDA. According to this approach, ideologies are the fundamental conceptual structures used by social groups (social dimension) to organize their shared beliefs (cognitive dimension) and actions. Therefore, the discourses of a group reflect its ideologies, which serve the interests of the group. In the present study, the content that is examined for ideological assumptions is the discourse of the two news corpora’s representations of the GERD. At the same time, the social dimension of ideology is considered by revealing the attitudes and stances of the two conflicting countries.

Van Dijk (1998) presented the notion of the ideological square, which explores the “us” versus “them” portrayal. This idea is based on polarization, which is evident in the positive representation of the in-group and the negative representation of the out-group. When it comes to the media, this model exposes how newspapers tend to categorize participants into good and bad sides during a conflict while speaking positively about their group and expressing negative opinions about their opponents or perceived adversaries. Van Dijk (2011) refers to this phenomenon as the complex meta-strategy of the ideological square.

The present study seeks to identify the discursive representation strategies utilized in the datasets. These strategies are applied in discourse to create a division between in-groups and out-groups, establishing the “us” versus “them” dichotomy (van Dijk, 2000, 2011). These include the “attribution of agency” strategy, in which we attribute positive actions to ourselves or our allies and negative actions to others. Another strategy is “dramatization” or “victimization”, which is a way of exaggerating facts to make one’s argument seem stronger (van Dijk, 2000). Additionally, there is the ideological strategy of “description”, which refers to the level of the description of polarized opinions. Utilizing this strategy entails that in-group good actions and others’ bad ones tend to be described in detail at a specific level.

The synergy of corpus linguistics and critical discourse analysis

The present study combines the eclectic framework of CDA and the analytical tools of corpus linguistics. This synergy is defined by Partington et al. (2013) as the set of studies incorporating the use of computerized corpora in their analyses. A corpus is defined as a collection of electronically accessible and readable language samples that represent a language variety, serving as a data source for linguistic research (Sinclair, 2005). In the critical analysis of news discourse, this combination allows researchers to analyze large amounts of data at multiple linguistic levels and identify repeated lexical and grammatical patterns that shape how events are described. These patterns are often loaded with ideological meanings that may not be easily observable manually.

Consequently, using corpus linguistic tools when studying media discourse can help identify new research questions, eliminate research bias, and recognize both linguistic norms and exceptions (Baker, 2006). Finally, incorporating CL tools addresses a major criticism against CDA approaches, particularly the concern of relying on “cherry-picking” data. According to Widdowson (2004), this term refers to the tendency of some CDA researchers to selectively choose small data sets that align with their research objectives and are more likely to produce results that answer their research questions.

Methodology

Data collection

Creating a reliable corpus requires a balanced and representative dataset specific to the studied language variety (McEnery and Baker, 2015). Therefore, the data collection process involved meticulously selecting news articles based on specific criteria such as corpus size, balance, and representativeness. To reflect the targeted discourse type, all the available news articles published between 2013 and 2020 were collected, resulting in 2655 articles. Two comparable datasets were created by collecting a roughly equal number of articles for each corpus. Only hard news reports were included; opinion articles and editorials were excluded as beyond the scope of the study. Each corpus contained over 500,000 words.

The choice of news reports as the primary data source is deemed appropriate, as it reflects the ideological stances and biases of the news outlets in each country. Choosing the online versions of newspapers was based on their quick and widespread accessibility. Online news mirrors the content of traditional print copies and ensures easy access for readers. Further, using online news to retrieve the present study data provides seamless access to news reports through specialized search engines.

Three newspapers from each country were carefully chosen to build the Egyptian and Ethiopian news corpora. For the Egyptian corpus, selections included Ahram Online, the English-language digital edition of Al-Ahram, Egypt’s largest state-owned newspaper known for extensive circulation and historical significance. Additionally, Egypt Independent, the English version of Al-Masry Al-Youm Daily, and Daily News Egypt, a prominent English-language news website, were incorporated. These two are highly regarded private newspapers in Egypt, offering comprehensive coverage across political, business, and cultural domains. For the Ethiopian corpus, selections comprised The Ethiopian Herald, a state-owned newspaper published by the Ethiopian News Agency, the official news agency of Ethiopia. Additionally, two private outlets were included. The first is The Reporter Ethiopia, the most circulated English-language newspaper owned by the Ethiopian Media Communications Center. The second is Walta Information, a news website owned by Walta Media and Communication Corporate. Operating in both Amharic and English, Walta Information covers many topics. Overall, both Egyptian and Ethiopian data sources were chosen for their provision of reliable hard news reports in English, representation of state-owned and private outlets, and accessibility through Factiva, the search engine utilized for retrieving the articles.

The Factiva database was used to retrieve the articles with the query term Grand Ethiopian Renaissance Dam. This database was selected because its interface allowed for quick retrieval of a large number of news articles from various sources. The retrieved news reports were manually reviewed and formatted to remove unnecessary information such as copyright and markup information. For efficient management, files were saved as plain text in a format consisting of the year of publication followed by a serial number.

Corpus analysis software

Two applications were used for corpus analysis in this study. WordSmith Tools (Scott, 2020) was used to extract the keywords. The main reason for using this application is that it is a multiplatform application with a user-friendly interface that allows for identifying statistically salient and recurring features, such as keywords. Additionally, Sketch Engine (Kilgarriff et al., 2004) was used to explore collocations and their grammatical patterns. The reason for choosing this application is that it can generate a word sketch for any word, which illustrates how the word is used along with its collocates in different grammatical relations, such as subject, object, or modifier. The application also reports the salience of the node’s collocational behavior using a statistical measure called the LogDice score.

Procedure

Keywords, collocations, and statistical metrics

In corpus-based analysis, keywords play a crucial role. A keyword is a word whose frequency in a corpus is significantly higher than its frequency in a reference corpus (McEnery, 2016; Brezina, 2018). Relevant corpus-analysis applications are used to extract keywords based on certain statistical measures. Commonly used metrics include ‘statistical significance’ and ‘effect size’. Statistical significance metrics are useful in determining keyness, but they do not reflect the magnitude of frequency differences (Gabrielatos, 2018, p. 9). Their values are also affected by the size of the two corpora being compared. Therefore, to ensure the validity of the statistical significance metrics used in the current study, two corpora of equal size were employed. Conversely, effect size statistics are independent of corpus size and are essential to determine if the frequency difference is meaningful (Gabrielatos and Marchi, 2012; Gabrielatos, 2018). Both metrics offer insights into corpus ‘aboutness’, but effect size statistics are more suitable for critical analysis (Pojanapunya and Watson Todd, 2018, 2021). To obtain more reliable results, Gabrielatos (2018, pp. 13–15) suggests that researchers decide on the keywords that will be considered for manual analysis based on the range of effect-size values and corresponding statistical significance levels or evidence against a null hypothesis (via BIC scores), as well as the particular focus of the study.

Following Gabrielatos (2018), three metrics were used to calculate keyness in the current study. First, Dunning’s (1993) log-likelihood (LL) ratio, a confidence-based statistical significance test, compares a word’s observed frequencies in the target and reference corpora with the frequencies expected if the word were equally common in both, with a p-value (probability value) indicating significance. The p-value indicates the probability of observing such a difference in relative frequencies by chance alone; a lower p-value threshold admits only words with a stronger presence in the corpus and therefore yields fewer keywords (Baker, 2006; Brezina, 2018). For the current study, a threshold of p < 0.000001 was set. Second, I complemented LL with the Bayesian Information Criterion (BIC). BIC combines LL and corpus size to provide evidence against the null hypothesis ‘H0’, which assumes no real difference in frequency (Calzada Pérez, 2023). Finally, Hardie’s (2014) Log Ratio, an effect-size metric defined as the binary logarithm of the ratio of relative frequencies, was used as the primary metric for sorting the keywords. Figures 1 and 2 display the top keywords from the Egyptian and Ethiopian corpora, ordered by Log Ratio scores. According to Rayson and Garside (2000), Log Likelihood is computed based on a contingency table structure, comparing actual frequencies with expected frequencies under the null hypothesis, that is, the assumption that the distribution of the keyword is the same in both corpora. The expected frequency in each corpus (E1/E2) and the Log-likelihood (LL) value are calculated using the following formulas.

Fig. 1: Top keywords in the Egyptian corpus.

An output of the Wordsmith Tools application, presenting some of the top keywords extracted from the Egyptian corpus and ordered by the Log-Ratio, LL, and BIC statistical metrics.

Fig. 2: Top keywords in the Ethiopian corpus.

An output of the Wordsmith Tools application, presenting some of the top keywords extracted from the Ethiopian corpus and ordered by the Log-Ratio, LL, and BIC statistical metrics.

O1 is the frequency of a word in corpus 1, O2 is the frequency of the same word in corpus 2, N1 is the total number of words in corpus 1, and N2 is the total number of words in corpus 2.

$$E_{1}=\frac{N_{1}\,(O_{1}+O_{2})}{N_{1}+N_{2}}$$
$$E_{2}=\frac{N_{2}\,(O_{1}+O_{2})}{N_{1}+N_{2}}$$
$$LL=2\left(O_{1}\ln\frac{O_{1}}{E_{1}}+O_{2}\ln\frac{O_{2}}{E_{2}}\right)$$
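The three keyness metrics described above can be sketched in a few lines of Python. This is an illustrative sketch, not the WordSmith Tools implementation. The formulas for the expected frequencies and LL follow the definitions given here; the BIC approximation (LL minus the natural logarithm of the combined corpus size, for one degree of freedom), the substitution of 0.5 for a zero frequency in Log Ratio, and the conversion of LL to a p-value via the chi-square distribution with one degree of freedom are assumptions commonly made in the keyness literature rather than details stated in this paper. The function names and the word counts in the usage example are hypothetical.

```python
import math


def keyness(o1, o2, n1, n2):
    """Return (LL, BIC, Log Ratio) for one word.

    o1, o2: raw frequencies of the word in corpus 1 and corpus 2
    n1, n2: total number of words in corpus 1 and corpus 2
    """
    # Expected frequencies if the word were equally common in both corpora
    e1 = n1 * (o1 + o2) / (n1 + n2)
    e2 = n2 * (o1 + o2) / (n1 + n2)

    # A zero observed frequency contributes nothing to the sum
    ll = 2 * sum(o * math.log(o / e) for o, e in ((o1, e1), (o2, e2)) if o > 0)

    # Assumption: BIC approximated as LL - ln(N1 + N2), one degree of freedom
    bic = ll - math.log(n1 + n2)

    # Assumption: a zero frequency is replaced by 0.5 so the ratio stays finite
    rel1 = (o1 or 0.5) / n1
    rel2 = (o2 or 0.5) / n2
    log_ratio = math.log2(rel1 / rel2)
    return ll, bic, log_ratio


def p_value(ll):
    """Assumption: LL follows a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(ll / 2))


# Hypothetical counts: a word occurring 200 times in one 500,000-word corpus
# and 50 times in another corpus of the same size.
ll, bic, log_ratio = keyness(200, 50, 500_000, 500_000)
print(f"LL={ll:.2f}  BIC={bic:.2f}  LogRatio={log_ratio:.2f}  p={p_value(ll):.2e}")

# Same relative frequencies in corpora ten times larger: LL grows tenfold,
# while Log Ratio is unchanged.
ll_big, _, log_ratio_big = keyness(2_000, 500, 5_000_000, 5_000_000)
print(f"LL={ll_big:.2f}  LogRatio={log_ratio_big:.2f}")
```

The second call illustrates why significance scores depend on corpus size whereas the effect-size score does not, which is the rationale for sorting keywords by Log Ratio and using LL and BIC as filters.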

Two lists of the top 50 keywords were compiled from each corpus, including items that express ‘aboutness’ or the main topic of the corpus, while excluding words that express ‘style’, such as grammatical words, names of newspapers, and days of the week. From there, a shortlist of five statistically salient nodes that were most relevant to the focus of the study was considered for further manual collocational analysis. According to McEnery and Hardie (2012, p. 123), “a collocation is a co-occurrence pattern that exists between two items that frequently occur in proximity to one another but not necessarily adjacently”. This type of analysis is especially useful for linking corpus linguistics with CDA as it helps to understand the words’ contextual meanings. Words are considered in collocation when there is a statistical relationship between them, as they occur together more often than what would be expected by chance (Baker et al., 2013). Sketch Engine utilizes the LogDice statistical metric to calculate the salience of a node’s collocational behavior. Salience is determined based on the frequencies of the node, the collocate, and the collocation within a specific frame (Baker et al., 2013, p. 37). LogDice is an effect-size measure based on the Dice Coefficient, which “measures the strength of association between two words (rather than a hypothesis-testing measure that produces a p-value for statistical significance)” (Baker and Levon, 2016, p. 112). Curran (2004) conducted a thorough evaluation of various collocation methods and concluded that LogDice yielded the best results (cited in Baker, 2014, p. 145). Rychly (2008) further notes that the LogDice measure is unaffected by corpus size, as it considers solely the frequencies of the node, the collocate, and their co-occurrence. As a result, it proves useful for extracting collocations and has been effectively integrated into Sketch Engine. Finally, Baker (2014) argues that the Dice Coefficient favors medium-frequency collocates and is particularly helpful in comparing corpora of small size.

In addition, the analysis in the present study focused on identifying the grammatical categories of the collocating words, known as ‘colligations’. This refers to the co-occurrence of words in specific syntactic structures, such as parts of speech. The collocating items have been classified in tables according to their colligational relations and have been ordered by their LogDice saliency scores.
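As a rough illustration of how a LogDice score relates to the three frequencies mentioned above, the sketch below uses the formula usually attributed to Rychly (2008), logDice = 14 + log2(2·f(x,y) / (f(x) + f(y))). The formula is taken from that general definition rather than from this paper, the function name is hypothetical, and the counts in the usage example are invented for demonstration; Sketch Engine additionally restricts the counts to a particular grammatical relation, which this sketch does not model.

```python
import math


def log_dice(f_xy, f_x, f_y):
    """LogDice association score for a node and a collocate.

    f_xy: number of times node and collocate co-occur
    f_x:  frequency of the node
    f_y:  frequency of the collocate
    """
    if f_xy <= 0:
        raise ValueError("the pair must co-occur at least once")
    # 14 is the theoretical maximum, reached when the two words always co-occur
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Invented counts: a node seen 400 times, a collocate seen 100 times,
# and 50 co-occurrences.
print(round(log_dice(50, 400, 100), 2))

# Multiplying every count by ten (a corpus ten times larger with the same
# proportions) leaves the score unchanged.
print(round(log_dice(500, 4000, 1000), 2))
```

Because the score depends only on the ratio between co-occurrences and the combined frequency of the two words, scores can be compared across corpora of different sizes, which is why the collocate tables are ordered by this measure.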

Semantic preference and prosody

The concepts of semantic preference and prosody were integral to the present study’s collocational analysis. Semantic preference refers to the shared semantic categories between a word and its collocates, allowing for a better understanding of meaning by analyzing its commonly associated terms. On the other hand, semantic prosody is the ability of words to establish certain meanings contextually, such as positive, negative, or ironic (Brezina, 2018). Some words tend to convey negative or positive attitudes, and identifying these tendencies in news discourse can provide valuable insights into the ideological meanings behind word choices.

Data analysis

Based on Fairclough’s (1995) and van Dijk’s (1998) approaches to CDA and the analytical tools of CL, the data analysis was carried out at the following levels:

  1. A textual analysis (description level) of the data was done, following the major techniques of corpus linguistics, namely, keywords and collocations. This included four steps.

    a. A list of five statistically salient content keywords was extracted from each corpus using the Wordsmith Tools application.

    b. The Sketch Engine software identified the collocations of each keyword.

    c. The collocating words were categorized in tables according to their grammatical patterns.

    d. The collocates were examined to identify their semantic preferences and prosodies.

  2. The analysis of discourse practice (interpretation level) was conducted by identifying any possible hidden ideological strategies utilized in the news data. This was done by analyzing the lexical choices made in the data, which could reveal potential biases and reflect the political stance of the sources.

  3. Analysis of social practice (explanation level) was done by considering the historical and social contexts of the issue.

Results

Textual analysis (description and interpretation levels)

To answer the research questions, the analysis at the textual level first involved identifying the most salient keywords in each corpus and then providing a detailed analysis of their collocating words. Second, for the discourse practice analysis (the interpretation level), the study investigated the discursive strategies of polarized positive-self and negative-other representation. This level of analysis addresses the study’s overall objective of finding out whether and at what level the representation of the GERD varies between the two datasets at hand.

Egyptian news corpus

Extracting statistically salient keywords

Table 1 shows the five keywords selected for further analysis, ordered by their effect-size statistical saliency (Log Ratio). The nodes “crisis”, “fears”, “interests”, and “affect” indicate a notable focus on the negative effects of the dam construction on Egypt’s share of the Nile waters. Additionally, the node “negotiations” emerges as the most frequent keyword, suggesting recurrent references in the Egyptian news data to diplomatic actions as a crucial aspect of conflict resolution between Egypt and Ethiopia.

Table 1 Keywords in the Egyptian corpus.

Collocational analysis of the node crisis

Table 2 displays the lexical collocations of the keyword “crisis” grouped by their grammatical relationships and ordered by statistical saliency. Sketch Engine has categorized the word sketch of the collocates into four grammatical patterns.

Table 2 Collocates of the node crisis.

The first group consists of words that function as modifiers of the word crisis in the [modifiers of crisis] structure. This pattern includes pre-modifiers characterizing the dam crisis in terms of its type and severity. Adjectives such as economic, current, acute, years-long, ongoing, real, and various are used to pre-modify the term crisis, with a high LogDice score of 7 or more. Attributive nouns also appear as collocates, identifying the type of crisis referred to in the corpus. For instance, the node is pre-modified by the nouns dam, GERD, food, water, electricity, flood, and Nile (LogDice score higher than 6). The second grammatical pattern, [verbs + crisis (as object)], involves the node as the recipient of various actions. The most statistically salient collocate is the verb resolve (LogDice of 11.31). This verb and other collocating verbs such as solve, end, handle, and settle (LogDice score of 8 and higher) suggest a narrative reflecting Egypt’s determined efforts to handle the crisis. The third grammatical pattern features the term crisis as the subject of certain verbs, describing the node as surrounding, facing, being, and having various characteristics. Finally, the [crisis and/or] pattern connects the crisis with other objects, such as conflict, problem, and result. Notably, the term ‘conflict’ stands out as the top-ranking collocate in this pattern, linking the dam to an ongoing dispute between Egypt and Ethiopia.

In terms of meaning, the node “crisis” has diverse collocates with distinct semantic preferences and prosodies. The first set of collocates relates to GERD-induced crises, including terms like food, water, electricity, political, economic, and internal. Additionally, the collocates dam and Nile establish a direct association, characterizing the GERD as a crisis. These modifiers, referencing various crises, indicate a strong ideological bias and reflect the discourse prosody of threat. They suggest that the Egyptian media narratives on the GERD are constructed around a victim role. This exemplifies van Dijk’s (2000) ideological strategy of “victimization”, which emphasizes the distinction between in-groups and out-groups through semantic implications. As van Dijk puts it, when out-groups are negatively depicted and associated with threats, it becomes necessary to represent the in-group as a victim of these threats. The focus on this topic in the Egyptian dataset aligns with Elsoufy and Ibrahim’s (2022) findings that identify “concerns and threats” as the most recurrent theme in Egyptian news reports on the GERD conflict.

Collocational analysis of the node fears

Table 3 illustrates the collocational relationships associated with the node “fears” (both as a noun and a verb).

Table 3 Collocates of the node fears.

The collocates are categorized into three grammatical patterns. The first pattern, [modifiers of fears], includes collocates that describe Egypt’s concerns about the GERD issue, including adjectives like bad, popular, and serious. These collocates have a high statistical saliency score exceeding 6.8. Certain nouns act as pre-modifiers, such as overpopulation, which has the highest saliency (LogDice score of 12.9), indicating a strong collocation with the word fears. In the second grammatical pattern, [verbs + fears (as object)], the keyword occurs as the object of verbs like voice, raise, dispel, and calm. The verb ‘voice’ is the most statistically salient collocate of the node fears in this pattern (with a statistical score of 12.5). This strong collocational relationship indicates the explicit expression of concerns regarding GERD’s potential negative impact. In the third pattern, [fears (as subject) + verbs], the node functions as an agent of the verbs overwhelm, abound, grow, and come, with a LogDice score higher than 6.

The collocates associated with fears in the Egyptian corpus suggest certain semantic preferences. The phrase “overpopulation fears” appears frequently and carries negative discourse prosody by highlighting the idea of ‘population explosion’ in Egypt as a primary cause of concerns about water security and making the negative depiction of the GERD seem more believable. As van Dijk (1998) suggests, arguments become more credible when supported by a sequence of assertions preceding or following them. Another group of collocates, including the words bad, serious, raise, harbor, increase, overwhelm, abound, face, and grow, indicates a semantic inclination towards triggering and intensifying concerns. These terms convey a sense of frustration when discussing the issue of GERD, which suggests that Egypt perceives the situation with heightened panic. They occur in contexts loaded with ideological bias and employ the strategy of “dramatization.” As van Dijk (2000) explains, dramatization involves intentional exaggeration in favor of the group’s interests by using lexical choices that make the situation seem more intense, depicting the in-group as a victim.

Collocational analysis of the node negotiations

Table 4 displays the numerous collocates associated with this keyword in the Egyptian corpus, each revealing diverse semantic preferences and prosodies.

Table 4 Collocates of the node negotiations.

These collocates follow three distinct grammatical patterns. First, the pattern [modifiers of negotiations] includes adjectives such as US-brokered, tripartite, and intensive, with a statistical saliency of 6 or higher. This pattern also features specific nouns functioning as pre-modifiers to the node negotiations, such as dam and Egypt. Second, the grammatical pattern [verbs + negotiations (as object)] assigns the node the role of a goal of several verbs, including resume, continue, stall, and hold, with a LogDice score of 7 or more. Finally, the third grammatical pattern, [negotiations (as subject) + verbs], involves the node occurring as an actor of a set of verbs, such as fail, reach, begin, and remain, with a LogDice score of 7 and above.

The collocates associated with the term “negotiations” carry a noticeable negative semantic tone. Examples include words like stall, suspend, limit, fail, and falter, which collectively convey a discourse prosody associated with failure. The elevated statistical saliency of this set of collocates implies that these lexical choices intensify the positive “us” and the negative “them” distinction. In alignment with van Dijk’s (1998) polarization strategy of “attribution of agency”, these terms attribute a negative disposition to Ethiopia regarding the negotiations. They occur in the Egyptian corpus in co-texts with “negotiations” within highly ideological contexts, contributing to a narrative that places blame on Ethiopia for the failure of the negotiation process while backgrounding Egypt’s role in hindering it.

Collocational analysis of the node affect

Table 5 shows the collocates associated with the keyword “affect,” categorized into three grammatical patterns.

Table 5 Collocates of the node affect.

The first structure, [objects of affect], refers to the collocates that act as recipients of actions, including terms like share, supply, and security (LogDice score of 6 and above). The second pattern, [subjects of affect], comprises a few collocates that refer to the idea of inducing a change, as observed in expressions like “dam completion affects” and “dam affects.” These collocates have a statistical saliency score of 8 and above. The grammatical structure in this subject-verb colligation emphasizes the discourse of ‘threat’ in the Egyptian news data by assigning the agent function to the GERD while positioning Egypt’s interests as the object of the verb affect. This colligation frames the dam project as a direct threat to Egypt. The final colligation pattern, [modifiers of affect], includes adverbs that pre-modify the node, such as negatively, adversely, severely, indeed, drastically, greatly, and significantly (LogDice score of 7 or higher). These modifier collocates contribute to the depiction of the GERD’s role in causing harm to Egypt.

In terms of semantic meanings, the collocates of the word affect can be categorized into distinct groups, each carrying different semantic preferences and prosodies. The first set of collocating modifiers has a semantic preference for describing the degree or intensity of something. This set of modifiers predominantly carries a negative discourse prosody, suggesting a connotation of causing harm. The one exception within this set is the modifier not, which, in association with affect, introduces a positive semantic prosody. The second set of collocates includes words related to the semantic feature of identifying the types and outcomes of something. Several of these collocates directly refer to the impact on Egypt’s share of the Nile waters, encompassing terms such as share, supply, water, quota, flow, amount, source, water share, and the Nile.

Collocational analysis of the node interests

The word sketch of the collocates associated with the term “interests” is shown in Table 6. The first pattern, [verbs + interests (as object)], comprises verbs that act on the node as the object, such as harm, safeguard, and protect, with this group holding a statistical saliency score exceeding 7. The second pattern, [modifiers of interests], features terms that describe the interests, including the adjectives common, mutual, national, strategic, Egyptian, and Sudanese (with a LogDice score of 7 or higher). The modifier common is the most salient collocate of the word interests, with a statistical score of 12.5. The remaining two grammatical patterns are [possessors of interests] and [pronominal possessors of interests]. The collocates under these patterns have a LogDice score exceeding 8.

Table 6 Collocates of the node interests.

In terms of semantic meanings, some collocations, such as the phrases “harm interests” and “affect interests”, bear a discourse prosody that implies threat and reflects the negative representation of the GERD in the Egyptian corpus. Other terms co-occurring with the node have a semantic preference for sharing something, such as common, mutual, national, and joint. Further, collocates such as own, Egyptian, Sudanese, country, everyone, Egypt, people, Sudan, Ethiopia, their, our, and its share the semantic feature of ownership. The term common, with the highest statistical saliency score, positively depicts Egypt’s stance in the dispute, suggesting a commitment to the shared interests of all involved parties. This association can be seen as carrying an ideological implication. It aligns with van Dijk’s (2011) ideological discursive strategy of “description”, where several specific and detailed propositions describe the in-group’s good actions. Finally, the terms water, strategic, and security represent the semantic aspect of the specific interests that Egypt seeks to safeguard.

Ethiopian news corpus

Extracting statistically salient keywords

Table 7 shows the five nodes selected from the list of keywords extracted from the Ethiopian corpus. The terms power, utilization, growth, and nation are used in contexts that outline the primary focus of the Ethiopian corpus, centering around the topic of economic development. Further, within the Ethiopian news data, the keyword “negotiation” reflects a recurrent reference to Ethiopia’s diplomatic efforts.

Table 7 Keywords in the Ethiopian corpus.

Collocational analysis of the node power

The collocates associated with the node power are organized into three colligational patterns, as shown in Table 8.

Table 8 Collocates of the node power.

The first pattern, [modifiers of power], includes terms such as electric, hydroelectric, electrical, hydro, and geothermal, all having a LogDice score of 7 or more. The highest statistically salient collocate under this pattern is the modifier “electric” (LogDice of 12.3). The second structure, [nouns modified by power], comprises several nouns with a LogDice score of 7 and above. The third grammatical pattern, [verbs + power (as object)], shows the node as an object of certain verbs (e.g., generate, export, provide, produce, and supply) with a LogDice score of 6 and above. The analysis shows that the node power is mainly used in co-texts with several words in reference to the energy generated from the Nile dam. The high frequency of occurrence and statistical saliency of these collocates indicate a strong emphasis on the topic of development in Ethiopian news data. In context, these occurrences evoke a positive discourse prosody, underlining the beneficial outcomes attributed to the GERD. This is consistent with the findings of several previous studies, such as Elsoufy and Ibrahim (2022), Bealy (2014), and Sayed El Ahl et al. (2023), which also highlighted a recurrent reference in Ethiopian news to the connection between power generation and economic development.

Collocational analysis of the node utilization

The keyword “utilization” appears in co-texts with a few collocates, categorized into three patterns, as shown in Table 9.

Table 9 Collocates of the node utilization.

The first pattern, [modifiers of utilization], includes adjectives and attributing nouns that describe the different uses of the node within the corpus. These modifiers, such as equitable, fair, unfair, water, and Nile, hold high LogDice scores of 7 or more, indicating a significant connection to the node. The second pattern, [verbs + utilization (as object)], comprises verbs like share and recognize, each having a LogDice score of 9 or higher. Finally, the pattern [utilization and/or] highlights collocates like mobilization and development that hold a high statistical score of 8 or more. These collocates typically appear in contexts that refer to using the dam project for economic development.

The occurrence of the word utilization in the corpus is closely linked to certain meanings that reflect significant discourse prosodies. The first group of collocates shares the semantic feature of designating the nature of something, including modifiers like equitable, reasonable, fair, proper, equal, harmonious, rational, and sustainable. The term “equitable” is the most salient of these, with a LogDice score of 12.4, indicating a strong collocational relationship with utilization. These collocations are used in the corpus within contexts that convey a strong positive prosody for the optimal use of resources. The ideological implications of the node utilization and its collocating words in the Ethiopian corpus are quite prominent, portraying Ethiopia as the in-group striving to achieve rightful goals in building and operating the dam. On the other hand, the word unfair appears in co-texts with utilization, carrying a negative semantic prosody that denotes the one-sided use of something. This particular collocating pair ideologically refers to the negative representation of Egypt in Ethiopian news media.

Collocational analysis of the node growth

Table 10 shows that the node “growth” has collocational relations with numerous words falling into various semantic categories.

Table 10 Collocates of the node growth.

The node “growth”, along with its collocating words, frequently appears in the corpus, particularly in reference to the topic of economic growth, as portrayed in news reports on the issue. The initial set of collocates occurs within the grammatical pattern of [modifiers of growth], where the node is pre-modified by adjectives and attributive nouns, with LogDice scores ranging between 6.6 and 11.9. The second group of collocates occurs in the pattern of [verbs + growth (as object)], portraying the node as the recipient of specific actions, with the collocates following this pattern having a saliency score of 7 or higher. The final grammatical pattern, [growth and/or…], includes nouns such as plan, development, agriculture, and stability.

The most statistically salient collocate of the node growth is the word “economic” (LogDice score of 11.9). This collocational pair clearly indicates the primary focus on the topic of economic development. The association between the node growth and the word economic carries a positive semantic prosody, implying well-being and prosperity. Most collocates of the node growth have a semantic preference for identifying the nature and degree of something, such as fast, rapid, real, overall, sustainable, impressive, consistent, remarkable, and current. These words can be interpreted as emphasizing the image of economic growth that the Ethiopian news media seek to cast upon the GERD project, contributing to a positive discourse prosody.

Collocational analysis of the node nation

Table 11 shows that the keyword “nation” occurs in proximity with several words, indicating diverse semantic prosodies.

Table 11 Collocates of the node nation.

Firstly, the grammatical pattern of [modifiers of nation] includes pre-modifiers, with a high statistical saliency of 7 and above. The second pattern, [verbs + nation (as object)], portrays the keyword as the recipient of actions alongside co-occurring verbs, demonstrating a statistical saliency score of 6 and higher. Next, the node nation appears as the subject of certain verbs in the pattern [verbs with the nation (as subject)], where high statistical scores of 7 and above emphasize the significance of the actions attributed to the nation. Finally, the pattern [nation and/or…] establishes a relationship between the keyword nation and other nouns in the corpus, such as nationality and people, each with a LogDice score exceeding 8.

The collocates of this node display a semantic preference for the defining traits of a nation-state, including terms like populous, dominant, independent, sovereign, nationality, people, and government. These collocates create a positive tone, representing Ethiopia as a powerful and influential country. These lexical choices exemplify van Dijk’s (2000) ideological strategy of “national self-glorification”, where the positive representation of the in-group is routinely implemented by praising the nation’s identity, actions, and attitudes. Certain other collocates, including judgmental adjectives like great, beloved, and prosperous, contribute to portraying the country’s prosperity and positive well-being. In contrast, the term “war-torn” has a negative connotation that implies destruction. This term has appeared frequently in co-text with the node “nation” to portray Ethiopia’s challenging economic circumstances in the past, rationalizing its urgent commitment to completing the GERD project as a prospect for a brighter future.

Collocational analysis of the node negotiation

Table 12 shows that this keyword frequently combines with different words in the Ethiopian corpus, indicating a variety of grammatical and semantic associations.

Table 12 Collocates of the node negotiation.

First, some adjectives and attributive nouns describing negotiation occur in the pattern [modifiers of negotiation], including the words technical, peaceful, AU-led, dam, and diplomatic, all of which have a statistical saliency of 7.2 or higher. The second pattern, [verbs + negotiation (as object)], comprises verbs paired with negotiation as the object, including resume, continue, undertake, and conduct, with a saliency score of 6.4 or more. Finally, the third pattern, [negotiation (as subject) + verbs], involves the node as the subject of verbs, such as fail, center, and require, each with a score exceeding 8.5.

The collocates of “negotiation” in the Ethiopian corpus convey various semantic concepts. Some of these words refer to the parties involved in the negotiation, like trilateral, tripartite, AU-led, Washington-led, and US-led. Others identify the nature of the negotiation, including modifiers such as technical, peaceful, and diplomatic. Certain collocating words, such as complete, recommence, facilitate, and finalize, occur in the corpus within contexts that suggest a positive semantic tone that emphasizes Ethiopia’s role in pursuing diplomatic solutions and refutes Egypt’s accusations of acting unilaterally. Other collocates like “fail” occur in contexts suggesting that the failure of the negotiation is due to Egypt’s practices that solely prioritize its own national interests.

Analysis of social practice (explanation level)

Taking into consideration the historical, political, and social background of the investigated issue is of great importance to the analysis process. Over the years, Egypt and Ethiopia have employed hegemonic and counter-hegemonic strategies, revealing the longstanding dispute between the two countries (Nasr and Neef, 2016). This inherent dispute escalated into an ongoing conflict from 2013 to 2020, the period analyzed in this study. The discourses of the news data under investigation reflect these strategies.

This paper discusses a political conflict between two nations concerning the sensitive issue of water scarcity. The historical context surrounding the GERD sheds light on the differences in how this issue is portrayed within the study data and offers insight into the biased perspectives presented. However, neither Egyptian nor Ethiopian media coverage can be viewed as an entirely objective and accurate representation of the social and historical elements of the GERD crisis. Each side has expressed its own version of reality through media narratives, shaped by its own perspectives and political stances. Each country has portrayed itself as the more powerful, cooperative, and diplomatic side while presenting the other as the aggressor. In summary, the representation of the GERD in Egyptian newspapers aims to maintain Egypt’s dominance over the Nile. In contrast, Ethiopian newspapers introduce a new discourse that challenges Egypt’s hegemony, positioning Ethiopia as the emerging dominant force.

Acknowledging the potential biases and perspectives inherent in media coverage is crucial. Media portrayals are not neutral reflections of reality but are influenced by editorial common sense and ideologies that shape media narratives. Editorial common sense refers to the implicit assumptions and values that guide editorial decisions. Overall, the news reporting process is intricate and multifaceted, involving various aspects of representation. One of the most crucial aspects of news reporting is the impartial and factual representation of events, which should be free from ideological bias. However, achieving this impartial representation can be challenging, particularly when covering contentious issues such as the Grand Ethiopian Renaissance Dam (GERD) project. These issues involve different parties, each with their own perspectives and biases, making it hard to present a balanced and unbiased view of the situation to the audience.

Discussion

This study seeks to explore potential media bias in the language employed to report on the construction of the GERD in online news articles from Egyptian and Ethiopian newspapers. The investigation employs a corpus-based collocational analysis within the framework of Critical Discourse Analysis (CDA) to address two research questions. Detailed discussions of the findings for each question will follow.

Collocates of the statistically salient keywords

In response to the first research question, the lexical analysis revealed differences between the two corpora under investigation. Specifically, the two lists of extracted keywords comprised different items, except for the node negotiations, which appeared in both lists. Furthermore, the analysis of the collocational behaviors of each set of keywords demonstrated differences in the ideas conveyed by the lexical choices. This finding is in line with Elsoufy and Ibrahim’s (2022) research, which revealed that the two sets of salient keywords extracted from their data represented distinct themes and semantic domains.

Within the Egyptian corpus, the list of keywords comprised five nodes: crisis, negotiations, fears, affect, and interests. First, the term “crisis” frequently occurs in the corpus, linked to lexical patterns framing the GERD as a national crisis, assigning blame to Ethiopia, and expressing grave concerns about dam repercussions. This contrasts somewhat with El-Tawil’s (2018) findings, where a prevalent theme of “denial” was identified in Egyptian media coverage, portraying Egypt as capable of handling the dam crisis. Additionally, collocates of the keyword crisis were found in grammatical patterns with predominantly negative discourse prosodies, encompassing crisis types (e.g., water, economic, political), degree and intensity (e.g., acute, ongoing, years-long), and causes and outcomes (e.g., cause, escalate, witness, face, make, exacerbate). The term “fears” emerged as one of the most salient keywords in the Egyptian corpus, appearing in collocational patterns like “voiced fears that”, “Egypt fears”, and “overpopulation fears.” Primarily, these collocations reflected the prevailing negative discourse prosody of panic. Further, the keywords “affect” and “interests” occurred frequently and with high statistical saliency. The node “affect” appeared in frequent lexical co-texts indicative of the negative image of the Nile dam project, such as the phrases “affect the flow”, “affect the supply”, and “affect the historical share.” The negative semantic prosodies of this node’s collocates were manifested through its associations with various pre-modifiers, such as severely, greatly, and significantly. In addition, the node “interests” appeared in the corpus in contexts that referred either to Egypt alone or to the three countries of Egypt, Sudan, and Ethiopia. Its collocates carry both positive and negative discourse prosodies, with collocates like harm and protect indicating negativity, while terms like mutual suggest positive intentions.

The linguistic analysis of the Ethiopian corpus identified five primary keywords: power, nation, utilization, growth, and negotiation. Of all the keywords, “power” had the highest frequency of occurrence. It refers mainly to the energy generated from the dam and is typically linked to terms like hydroelectric, production, supply, and capacity. These terms contribute to the positive portrayal of the GERD in the Ethiopian media as a catalyst for socio-economic progress. This aligns with the research of Elsoufy and Ibrahim (2022), who found that the theme of “economic development” was frequently discussed in Ethiopian news coverage of the GERD. Likewise, “growth” is recurrent in the Ethiopian corpus, often associated with contexts related to development, favorably depicting the dam project. Finally, “utilization” was linked to lexical items related to the topic of fair distribution of the Nile water. These results align with Bealy’s (2014) observations of the recurrent use of themes such as “development”, “right”, and “mutual benefits” in Ethiopian news media.

Finally, the results of the lexical analysis showed that both corpora used the keyword “negotiations” similarly. This term and its associated lexical collocates were frequently used to refer to finding a diplomatic solution. However, each corpus also had instances where this keyword was used in contexts expressing opposing perspectives. Specifically, both corpora used this term and its collocates to blame the other side for the negotiations’ failure.

Lexical choices and the political stances of Egypt and Ethiopia

To address the second research question, the study conducted a lexical analysis of the words that commonly appear in proximity to the statistically salient keywords to understand patterns, associations, and the broader contexts in which the terms are used. Overall, the investigation revealed a notable bias and contrasting perspectives, each representing one of the conflicting countries, where narratives were presented selectively to align with their respective interests. These findings are in line with previous studies on media framing of this issue, including studies by Elieba (2022), Elsoufy and Ibrahim (2022), and Sayed El Ahl et al. (2023).

The study found that the Egyptian and Ethiopian news outlets present a clear ideological divide in their coverage of the GERD, with positive self-representation and negative portrayal of the other. Each dataset contained specific lexical choices that embody instances of van Dijk’s (2000, 2011) discursive strategies, particularly the “us” versus “them” dichotomy. For example, the strategies of “victimization”, “polarized description”, and “attribution of agency” were observed, which primarily attribute negative actions to the out-group while downplaying the in-group’s responsibility for similar actions. The Egyptian media seem biased in portraying the dam project as a crisis and depicting Ethiopia as intentionally seeking to harm Egypt in violation of international law. They also place blame for the unresolved political conflict solely on Ethiopia while portraying Egypt as a victim. Conversely, the Ethiopian media portray Egypt as disregarding Ethiopia’s rights to development and unjustly utilizing the Nile for its own benefit. The Ethiopian data also present bias through the repetitive positive representation of the in-group, which takes the form of national self-glorification.

Conclusion

This paper explored media discourse surrounding the Grand Ethiopian Renaissance Dam (GERD) in depth, revealing notable biases in coverage from both Egyptian and Ethiopian newspapers. In the context of the Egyptian corpus, the collocational patterns show negative discourse prosodies that are linked to fear, shock, and frustration. In contrast, the analysis of collocations in the Ethiopian dataset reveals positive discourse prosodies that foster a sense of consensus among Ethiopians, highlight the positive outcomes of the GERD, and assert Ethiopia’s rights to fair use of the Nile water. The findings also identified certain discursive strategies and lexical items that establish clear in-group and out-group categories marked by polarization. In summary, through lexical choices, Egyptian reports depict the in-group as the powerful side with unquestionable historical rights, while Ethiopian news outlets emphasize the in-group’s good intentions. In contrast, the out-group is negatively portrayed in both Egyptian and Ethiopian reports, described as lacking credibility and taking actions that harm the opposing side.

The paper makes a significant contribution to the literature by providing insights into the representation of GERD’s image in news discourse. Unlike previous studies that primarily focus on media frames, this research delves into lexical choices, using CL techniques to substantiate the findings. Identifying the polarized representations required a deeper investigation of the contexts of the lexical items because the polarization was not based on clear-cut, ideologically biased language. In essence, the study advances our understanding of media representation in the context of the GERD conflict, shedding light on linguistic nuances that contribute to biased portrayals.

Limitations and further research

Several limitations to this study need to be acknowledged. Firstly, although the primary focus of this paper was to investigate media discourse, it did not delve deeply into the media as a discourse genre, lacking an extensive discussion on various dimensions of media representation. Therefore, it would be beneficial for future studies to examine in more detail the influence of newspapers’ standards on the writing process of news stories. Additionally, future research should focus on identifying how journalistic practices and news values are used to judge and measure the newsworthiness of events. It would also be useful to explore the relevance of these practices to the argument of objectivity versus bias in news reporting. Secondly, it is worth noting that the study data was limited to Egyptian news reports written in English. Therefore, it is recommended that more research be conducted to compare Egyptian news reports written in Arabic with other Ethiopian or international reports written in English. Another limitation of the study is that only hard news reports were investigated. Further research should expand its scope to investigate media coverage of this issue in various types of news discourse, including news editorials and opinion articles. It would also be useful to explore other forms of media discourse, like the language of political talk shows.