7.1 Summary

This dissertation proposed a novel, interdisciplinary approach to an open research problem in computer science, computational linguistics, and related disciplines that is of pressing societal relevance: revealing biases in news articles, including subtle yet powerful bias forms such as source selection, the inclusion or omission of information, and word choice. Slanted news coverage, especially on policy issues, can have severely harmful effects, e.g., on public opinion and collective decision-making, such as in democratic elections [237, 259]. Revealing media bias to news consumers can help to mitigate these adverse effects and, for example, support news consumers in making more informed choices.

The first interdisciplinary literature review on media bias showed that automated approaches to reveal media bias have so far provided only superficial or inconclusive results despite achieving high technical performance. Often, these approaches find technically significant but substantially irrelevant “biases” in news coverage. For example, an in-principle effective means to reveal biases is communicating the different slants present in news coverage to news consumers. However, prior approaches pursuing this goal find perspectives that are technically different but not meaningfully distinct. As a consequence, the approaches cannot effectively reveal biases. One key reason for their mixed results is that prior automated approaches analyze—or generally treat—bias as an only vaguely defined concept, for example, as

“subtle differences” [211],

“differences of coverage” [278],

“diverse opinions” [251], or

“topic diversity” [252].

The superficial methodology and suboptimal results become apparent when comparing the approaches to research in the social sciences. There, decades of research on media bias have resulted in models to describe individual bias forms and effective methods to analyze them. The data-driven analyses determine, for example, substantial perspectives (also called frames) by identifying in-text means (also called framing devices) from which these frames empirically emerge. Due to their high effort and required expertise, these and other manual techniques cannot reveal biases in current news coverage, which would be vital during daily news consumption to mitigate the severe bias effects.

To address the shortcomings of automated approaches and manual analyses, we devised person-oriented framing analysis (PFA), a fundamentally different approach to bias identification. Compared to prior automated approaches, PFA does not treat media bias as a single, broadly defined concept but analyzes specific, person-oriented in-text means of bias to detect substantial perspectives indeed present in the analyzed news coverage. Compared to analyses conducted in social science research, PFA does not require manually reading and annotating news articles but automates these tasks entirely.

To automatically identify frames that could previously be reliably identified only using person-oriented, manual frame analyses, we designed PFA to imitate and automate this manual procedure. Specifically, PFA determines how individual news articles reporting on a given event portray the individual persons involved in the event. To achieve this, we introduced two components, which resemble tasks usually conducted by human annotators in person-oriented frame analyses. First, target concept analysis identifies in-text mentions of relevant subjects (here persons) that can be targeted by biases. The component then resolves the persons’ mentions across all articles. Second, frame analysis determines how each mention’s context (here its sentence) portrays the respective subject. By subsequently clustering articles that similarly portray the persons involved in the event, PFA is able to detect substantial frames and the groups of articles exhibiting them.
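The data flow between the two components can be illustrated with a minimal sketch. All article texts, mention strings, and sentiment values below are invented for illustration and do not stem from the actual implementation:

```python
from collections import defaultdict

# 1) Target concept analysis: mentions resolved to persons across articles.
#    Each tuple: (article_id, mention_text, resolved_person)
resolved_mentions = [
    (0, "Donald Trump", "Trump"), (0, "the businessman", "Trump"),
    (1, "Trump", "Trump"), (1, "The Senator from Florida", "Rubio"),
    (2, "Marco Rubio", "Rubio"),
]

# 2) Frame analysis: sentiment of each mention's sentence toward the person
#    (-1 negative, 0 neutral, +1 positive); values here are invented.
mention_sentiment = [-1, -1, 0, 1, -1]

# 3) Aggregate into one polarity vector per article (person -> mean sentiment).
per_article = defaultdict(lambda: defaultdict(list))
for (article, _, person), s in zip(resolved_mentions, mention_sentiment):
    per_article[article][person].append(s)

article_vectors = {
    article: {person: sum(v) / len(v) for person, v in persons.items()}
    for article, persons in per_article.items()
}
print(article_vectors)
# → {0: {'Trump': -1.0}, 1: {'Trump': 0.0, 'Rubio': 1.0}, 2: {'Rubio': -1.0}}
```

Articles whose polarity vectors are similar are then grouped, yielding the framing groups.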

To showcase the functionality and results of each PFA component, we use a running example of news excerpts taken from three articles reporting on an event of the Republican Party’s presidential primaries in 2016. These three articles are among the eight articles we introduced as part of our real-world example to practically demonstrate the research gap in our literature review (Sect. 2.6). Figure 7.1 shows the text excerpts.

Fig. 7.1

Excerpts of articles from our real-world example introduced in Sect. 2.6. Each of the three boxes represents one article, and each of the two boxes within represents one paragraph. For simplicity, some paragraphs are skipped. Otherwise, the texts are unchanged, e.g., the order of the paragraphs is identical to the order in the articles, and typos are retained

We researched methods for the respective natural language understanding tasks of both components. For target concept analysis, we devised—among others—a dataset and method for context-driven cross-document coreference resolution in news articles. Coreference resolution aims to identify which mentions refer to which semantic concepts. In the presence of media bias, this is especially difficult and important since different journalists often refer to the same person, action, or other semantic concept using terms that are typically not synonymous or may even be contradictory in other contexts. Examples of such highly context-dependent coreferences include “intervene” and “invade” or “invading forces” and “coalition forces.” Existing techniques for coreference resolution capably resolve generally valid synonyms and nominal and pronominal coreferences, such as “Biden,” “US president,” and “he.” However, current methods cannot reliably resolve the previously described mentions.

In contrast, our method extracts and resolves such mentions of persons and other semantic concepts across the input set of news articles. In the evaluation, our method achieved high performance for individual persons as analyzed by PFA (F1m = 88.7), thereby outperforming the state of the art (F1m = 81.9). Figure 7.2 depicts the results of our method when applied to the running example. In addition to the mentions of the debate’s candidates shown in the figure, our method found and resolved further context-dependent mentions, such as “the Republican candidate” (for Ted Cruz) and “the business man” (for Donald Trump).

Fig. 7.2

Article excerpts after target concept analysis. Mentions of each candidate are underlined using a candidate-specific style. For simplicity, mentions of infrequent persons, such as “woman” in the left article, and of other concept types our method can resolve, such as actions or objects, are not underlined
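The core intuition behind context-driven coreference resolution—that mentions which are not synonymous in general language may still corefer when their surrounding contexts are similar—can be sketched as follows. This is a deliberately simplified illustration using bag-of-words cosine similarity; our actual method relies on a more elaborate combination of rules and semantic features, and all mention texts, contexts, and the threshold below are invented:

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical mentions with their surrounding context words.
mentions = [
    ("invade", "troops crossed the border overnight in a surprise military operation"),
    ("intervene", "troops crossed the border in a military operation to restore order"),
    ("negotiate", "diplomats met in geneva to discuss a peace agreement"),
]

# Greedy merge: two mentions corefer if their context similarity is high enough.
THRESHOLD = 0.5
chains = []  # each chain: [list of mention texts, merged context counter]
for text, context in mentions:
    vec = Counter(context.split())
    for chain in chains:
        if cosine(vec, chain[1]) >= THRESHOLD:
            chain[0].append(text)
            chain[1].update(vec)
            break
    else:
        chains.append([[text], vec])

print([c[0] for c in chains])  # → [['invade', 'intervene'], ['negotiate']]
```

Despite “invade” and “intervene” not being synonyms, their near-identical contexts merge them into one chain, while “negotiate” forms its own chain.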

For frame analysis, we devised—among others—the first method for target-dependent sentiment classification (TSC) in news articles. While researchers have developed TSC methods for various domains, state-of-the-art methods achieve “useless” [335] classification performance on the news domain (F1m ≤ 44.0). In the established TSC domains, such as product reviews or Twitter posts, authors express sentiment toward a target rather explicitly (“the camera is awesome”). In contrast, sentiment in news articles requires a higher level of interpretation. For example, journalists typically express sentiment implicitly, such as by describing actions performed by a target person (“[...] Trump tried to take property for [from] an ‘elderly woman’ [...],” as shown in the left article of the running example). Prior methods for news TSC avoid this difficulty by focusing on cases with explicit connotations, such as direct quotes or readers’ comments. However, by focusing on such infrequent cases, the methods neglect the majority of an article’s content and perform poorly when applied in real-world applications.

To enable TSC on news articles despite their implicit sentiment connotations and other differences compared to established TSC domains, we created NewsMTSC, the first dataset for news TSC, consisting of over 11k labeled sentences. Afterward, we devised the first TSC model for the news domain. Our deep learning model achieves high TSC performance on news articles (F1m = 83.1). This high performance allows for using our model on real-world news coverage, as Fig. 7.3 shows for the running example. Our model resolved instances that prior news TSC approaches could also resolve due to the sentences’ explicitness (“He’s a good guy [...],” left article; “Marco Rubio is biggest loser,” right article). In contrast to prior approaches, our model additionally resolved implicit connotations, such as “The Senator from Florida came into the debate with momentum [...]” (middle article) and “[...] Trump tried to take property for [from] an ‘elderly woman’ [...]” (left article).

Fig. 7.3

Article excerpts after target-dependent sentiment classification. The candidates’ mentions are colored depending on their local context’s sentiment concerning the respective candidate: green or red for positive or negative sentiment, respectively. Mentions with neutral sentiment are not colored
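Target dependence—classifying sentiment toward a specific mention rather than toward the whole sentence—can be illustrated with a minimal input-encoding sketch. The marker-token format below is a generic illustration, not necessarily the encoding our model uses, and the example sentence is invented (loosely inspired by the running example):

```python
def mark_target(sentence: str, target_start: int, target_end: int) -> str:
    """Wrap the target mention in marker tokens so a classifier can
    condition its sentiment prediction on this specific target."""
    return (sentence[:target_start] + "[TGT] "
            + sentence[target_start:target_end] + " [/TGT]"
            + sentence[target_end:])

sentence = "Marco Rubio is biggest loser after Donald Trump won the debate."

# The same sentence yields two TSC instances, one per target person;
# a model may then predict negative sentiment for one target and
# positive sentiment for the other, on identical surrounding text.
rubio = mark_target(sentence, 0, 11)
trump = mark_target(sentence, sentence.index("Donald"), sentence.index("Donald") + 12)
print(rubio)  # → [TGT] Marco Rubio [/TGT] is biggest loser after Donald Trump ...
print(trump)
```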

Frame clustering completes the PFA approach. PFA employs a clustering technique, such as k-means, to group articles based on how each article portrays the event’s persons. Table 7.1 shows the bias identification results for the eight articles of our real-world example introduced in Sect. 2.6, including the three used in this section’s running example. The second to fourth columns represent prior approaches, both manual and automated. The fifth column (“Frame”) represents the ground truth of framing groups derived using a standard tool in the social sciences, inductive frame analysis. Each cell shows the frame or “perspective” classified by the column’s approach for the row’s article.

Table 7.1 Results of approaches to identify biases in the real-world example. The columns “Headline,” “Political,” “Clustering,” and “Frame” show each article’s central perspective on the event according to the headline’s potential frame, the outlet’s political orientation, an automated clustering technique on word embeddings, and inductive manual frame analysis (ground truth). Please refer to Sect. 2.6 for more information on these approaches. Lastly, “PFA” shows the person-oriented framing groups as identified by the PFA approach. For each approach, the colors of its groups are chosen to maximize congruence with the framing groups of the ground truth. The higher the visual congruence of a column with the “Frame” column, the better the approach
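The frame clustering step can be sketched with a minimal k-means implementation over article-level polarity vectors. The vectors and the fixed initial centroids below are invented for this illustration; in practice, a library implementation with proper centroid initialization would be used:

```python
import math

# Rows: articles; columns: mean sentiment toward each of two persons
# (values invented for illustration, not taken from the real-world example).
vectors = [
    [-0.8,  0.6],  # negative on person A, positive on person B
    [-0.7,  0.5],  # similar framing -> should land in the same group
    [ 0.1,  0.0],  # largely neutral coverage
    [ 0.7, -0.5],  # opposite framing
]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its assigned points."""
    labels = []
    for _ in range(iterations):
        labels = [min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        for c in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(col) for col in zip(*members)]
    return labels

labels = kmeans(vectors, centroids=[[-1.0, 1.0], [0.0, 0.0], [1.0, -1.0]])
print(labels)  # → [0, 0, 1, 2]
```

Each resulting label corresponds to one framing group, i.e., one set of articles portraying the event’s persons similarly.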

None of the prior approaches yielded perspectives coherent with those of the inductive analysis. Albeit not perfect, only the PFA approach yielded person-oriented frames congruent with the frames derived from the inductive analysis. Of course, this running example cannot serve as an evaluation, nor is it intended to. However, the results are representative of the weaknesses of prior work (see also Sect. 2.6) and the strengths of the PFA approach, as our evaluation (see also Sect. 6.9) summarized in the following shows. Before turning to the evaluation, we conclude our running example by introducing our prototype system.

As a proof of concept, we developed Newsalyze, a prototype for bias identification and communication that visualizes the results of the previously described methods. We devised visualizations for non-experts designed to aid readers in daily news consumption. First, an overview aims to help users get a synopsis of news events and the articles reporting on them. Subsequently, an article view shows a single news article, e.g., of an event of interest. We designed both visualizations to not only show the news content but also communicate biases present in the coverage. For example, established news aggregators show events and, for each event, one or more articles reporting on it. Our overview additionally reveals biases, for example, by showcasing articles of up to three person-oriented frames (part “b” in Fig. 7.4) and indicating an article’s slant using a label (parts “a” and “b”). Such indicators are intended to help readers decide which articles to read, e.g., since they might offer different information on the event than already-read articles.

Fig. 7.4

Screenshot of Newsalyze’s overview of news coverage on the US presidential primaries debate used in our real-world example. Similar to popular news aggregators, an article representative of the event coverage is shown at the top (part “a”) and a list of further articles at the bottom (part “c”). Additionally, Newsalyze shows a comparative part (“b”) of up to three articles representative of the identified person-oriented frames. The comparative layout, the selection of these representative articles, and further bias-related information, such as the headline tags in the list of further articles, are intended to aid news consumers in quickly deciding which articles to read by understanding which articles offer which perspective

Although the article excerpts shown in the overview do not directly summarize the frames, they typically allow for quickly grasping the essence of each frame. In our real-world example shown in Fig. 7.4, the headlines and lead paragraphs showcased as Perspectives 1–3 (in part “b”) indicate the potential of Trump and other candidates (left article, e.g., “[...] offering hope to rivals. [...] confounding his bid to emerge as Donald Trump’s chief rival [...]”), the direct event coverage (middle, e.g., “Highlights from New Hampshire”), and the negative framing toward Trump (right, e.g., “Trump calls for a lot worse than waterboarding”). These three perspectives match the frames of our inductive frame analysis conducted in Sect. 2.6.

To evaluate the effectiveness of revealing biases identified automatically by PFA in a real-world news consumption setting, we conducted a user study (n = 160 respondents). We implemented various baselines; for example, one represented the bias-agnostic news overview of popular news aggregators, such as Google News. Another baseline resembled the bias-sensitive news aggregator AllSides, which relies on the left-right dichotomy. Specifically, AllSides determines the bias of an article by using its outlet’s political orientation.

Showing the person-oriented frames identified by PFA increased bias-awareness in respondents significantly, strongly, and consistently. In contrast, the baselines increased bias-awareness not significantly or only under certain conditions. In a single-blind setting, i.e., where respondents did not know which bias identification method was used, only the PFA approaches significantly increased bias-awareness. Their relative effect on bias-awareness compared to the Google News baseline was Est. = 17.5% (p < 0.002). The AllSides baseline could significantly increase bias-awareness only if we revealed to respondents which bias identification method was used. If respondents were not informed, the same baseline achieved an insignificant effect, which was only slightly higher than that of a random baseline. In contrast, our PFA approach consistently led to strong and significant increases in bias-awareness. Moreover, respondents benefited from using our visualizations more than once. In the second half of the study, our PFA approach increased bias-awareness most strongly (single-blind setting, Est. = 26.5 compared to the best baseline’s Est. = 23.5; if revealed, Est. = 28.1 compared to Est. = 12.4).

The study results and a qualitative investigation of the perspectives yielded by each approach indicated that the PFA approach effectively reveals biases by detecting meaningful perspectives indeed present in the news coverage. In contrast, the prior automated approaches suffered in some cases from their superficial methodology, which facilitates the detection of technically different but substantially irrelevant perspectives. To this extent, our study practically confirms the weaknesses of prior work as highlighted in our literature review.

Using technical means can only be part of a holistic solution to address media bias since it is ultimately news consumers who may be influenced by slanted news coverage. Thus, empowering news consumers to critically assess the news in order to mitigate adverse bias effects is, in our view, the higher-level goal. Effective means to achieve this include teaching media literacy to news consumers and strengthening contrasting news formats, such as press reviews. However, such means typically require high effort. For example, even if news consumers have the skills to assess news coverage critically, researching alleged facts and contrasting event perspectives requires substantial effort. This tremendous effort may represent an insurmountable barrier to applying media literacy practices during daily news consumption. Automated approaches for bias communication can help to reduce the manual effort and thus represent a suitable means to enable critical assessment of coverage in daily news consumption.

Ultimately, the PFA approach can contribute to more informed decision-making. By enabling news consumers to effectively and effortlessly contrast substantial news perspectives, our approach contributes an effective means to support news consumers in critically assessing news coverage. We think that automated approaches for bias identification and communication are essential to enable bias-aware news consumption since only automated approaches can reduce the high manual effort required to contrast and critically assess news coverage.

7.2 Contributions

This section summarizes the contributions of this thesis for each of the research tasks presented in Sect. 1.3.

Research Task RT 1

Identify the strengths and weaknesses of manual and automated methods used to identify and communicate media bias and its forms.

To accomplish RT 1, we performed the first interdisciplinary literature review on media bias and approaches to analyze it as devised in the social sciences, computer science, and related disciplines. The review includes almost 200 research publications and related approaches. We found that the automated bias identification approaches proposed so far often yield inconclusive or superficial results, especially compared to the results of decades of research on the topic in the social sciences.

To facilitate interdisciplinary research on media bias, we established a shared conceptual understanding by mapping the state of the art from the social sciences to a framework, which approaches from computer science can target.

Research Task RT 2

Devise a bias identification approach that addresses the identified weaknesses of current bias identification approaches.

To overcome the deficiencies of current automated approaches for bias identification and communication, we introduced a novel approach named person-oriented framing analysis (PFA). Compared to prior automated approaches, PFA does not treat media bias as a single, broadly defined concept but analyzes in-text features representing specific person-oriented bias forms. To achieve this, PFA roughly resembles the manual process of frame analysis as conducted by researchers in the social sciences. In contrast to these analyses, PFA does not require manually reading or annotating news articles but automates these tasks.

As a practical side contribution, we devised a system for crawling and extracting news articles from online outlets. The system can be used before PFA in order to gather news articles of interest. Additionally, the system has proven helpful throughout the research described in the thesis, e.g., to create training and test datasets.

Research Task RT 3

Develop methods for the devised approach and evaluate their technical performance.

To detect person-oriented framing, we devised two analysis components. Our first component aims to identify persons and their mentions across a given set of news articles reporting on an event. Subsequently, our second component aims to determine how the individual persons are portrayed and then groups articles that portray the persons similarly, i.e., that frame the event similarly.

For the first component, we devised a method for context-driven cross-document coreference resolution that is the first to also resolve highly event-specific coreferences as they occur across slanted news articles, e.g., “Mr. Tough Guy” and “John Bolton” [230]. To evaluate this method, we created a test dataset of 50 news articles covering 10 events. Each event contains articles from five outlets spanning the political spectrum, including left-wing, center, and right-wing outlets. When creating the dataset, we aimed to also enable the annotation of subtle and complex concept types, which the coreference resolution should resolve. To ensure reliable annotation despite the concept types’ complexity, we conducted a manual content analysis as established in the social sciences. Our evaluation showed that our method reliably extracts and resolves individual persons (F1m = 88.7 compared to F1m = 81.9 achieved by the best baseline).
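The reported F1m values denote macro-averaged F1 scores, i.e., per-class F1 averaged with equal weight so that rare classes count as much as frequent ones. For reference, a minimal computation over invented labels:

```python
def macro_f1(gold, pred):
    """Macro-averaged F1: compute precision, recall, and F1 per class,
    then average the per-class F1 scores with equal weight."""
    classes = sorted(set(gold) | set(pred))
    f1s = []
    for c in classes:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Invented example with three sentiment classes.
gold = ["pos", "neg", "neu", "neg", "pos", "neu"]
pred = ["pos", "neg", "neg", "neg", "pos", "neu"]
print(round(100 * macro_f1(gold, pred), 1))  # → 82.2
```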

For the second component, we devised a method for target-dependent sentiment classification (TSC) in the domain of news articles. To evaluate this method, we created the first large-scale dataset for TSC in news articles reporting on policy issues. Our dataset consists of over 11k sentences. Each sentence includes at least one person mention and a sentiment label. Using an additional expert annotation on a random subset of the dataset, we confirmed the high quality of our ground truth. Not only does our TSC method improve the state-of-the-art classification performance (F1m = 83.1 compared to F1m = 81.7), but our dataset is also the first to enable TSC on the news domain under real-world conditions. News TSC was previously practically infeasible due to the implicit character of sentiment in news articles compared to the established TSC domains.

Side contributions of our initial research to address RT 3 include a system that extracts 5W1H phrases describing the main event of an article, i.e., who did what, when, where, why, and how. Further, we devised and annotated so-called frame properties, which we had planned to use to categorize how sentences portray persons. This exploratory research showed early on a vital issue of automatically detecting substantial frames or derivatives thereof: the very high initial annotation cost required to create a sufficiently large training dataset. Consequently, we devised the previously mentioned TSC approach to capture the fundamental effect of person-oriented framing, i.e., changes in a person’s sentiment polarity.

Research Task RT 4

Implement a prototype of a bias identification and communication system that employs the developed methods to reveal biases in real-world news coverage to non-expert news consumers.

To evaluate PFA in a setting that resembles real-world news consumption, we developed Newsalyze, a system for bias identification and communication. Besides integrating the previously devised bias identification methods into the system, we developed visualizations for non-expert news consumers. First, an overview allows for getting a synopsis of current events quickly. Second, an article view shows a single news article. We designed the visualizations to suit typical news consumption while complementarily showing bias-revealing information, such as the identified person-oriented frames present in the news coverage on a given event.

Research Task RT 5

Evaluate the approach’s effectiveness in revealing biases by testing the implemented prototype in a user study.

To validate our approach’s holistic and practical effectiveness in revealing biases, we conducted a large-scale user study (n = 160) on 30 news articles. Our studies measured the change in respondents’ bias-awareness after exposure to one of our visualizations or baselines. We designed the studies to approximate real-world news consumption. Not only did the study results show that Newsalyze and the PFA approach significantly, consistently, and strongly increased bias-awareness in non-expert news consumers. By employing a conjoint design in our experiments, we were also able to pinpoint the effects of individual components in the visualizations. We used the conjoint design in the pre-studies, among others, to make a well-founded selection of strong visualization variants for the main study.

In addition to demonstrating the high effectiveness of PFA in a setting that resembles real-world news coverage, the evaluation led to the following conclusions:

  • Not only does PFA increase bias-awareness. Our qualitative investigation of the resulting frames concluded that they are substantial: in contrast to the other approaches, frames detected by PFA in person-centric news coverage are consistently present in the coverage.

  • Target-dependent sentiment classification is a fitting technique for the detection of person-oriented framing. This finding is contrary to results presented in prior literature suggesting that the coarse, one-dimensional sentiment scale might not suffice for substantial bias analysis since it might fail to capture fine-grained nuances of framing.

  • Since the PFA approach can detect substantial frames, we think it might be a suitable approach to complement the analyses conducted in social science research on media bias. There are differences between both use cases. For example, researchers in the social sciences pre-define frames for a specific research question and topic, whereas PFA implicitly defines the frames through its analysis and the resulting article groups. Nevertheless, the PFA approach might be readily usable for exploratory research. For example, researchers could inductively use PFA to detect frames in their data. In this scenario, PFA could serve as a replacement for inductive frame analysis, thereby reducing the manual effort in initial research phases.

7.3 Future Work

We intentionally formulated our research question rather openly and broadly to reflect the young state of the art in computer science. This way, our research question expresses the need to investigate how other disciplines define and analyze bias. Such disciplines traditionally include the social sciences, where media bias has been subject to research for decades, resulting in comprehensive models to describe it and effective methods to analyze it.

How can an automated approach identify relevant frames in news articles reporting on a political event and then communicate the identified frames to non-expert news consumers to effectively reveal biases?

As part of the literature review, we established a shared conceptual understanding by mapping the state of the art from the social sciences to a framework, which computer science approaches can target. Using this framework and the identified weaknesses of prior bias identification and communication approaches, we narrowed our open research question down to a specific research objective (Chap. 3). Our objective focuses on the identification of person-oriented framing:

Devise an approach to reveal substantial biases in English news articles reporting on a given political event by automatically identifying text-based, person-oriented frames and then communicating them to non-expert news consumers. Implement and evaluate the approach and its methods.

Concerning both the broad research question and the specific objective, this section discusses the most important limitations of our research and derives future research ideas to address these limitations.

7.3.1 Context-Driven Cross-Document Coreference Resolution

We devised the first method for context-driven cross-document coreference resolution. Naturally, our method can only serve as a first step in this novel task. Although the method’s design in principle allows for its use outside the scope of PFA, we focused on our research objective when devising and evaluating the method. Our evaluation was not intended to and cannot elucidate how effective our method is in other domains and use cases. Before applying our method in other use cases, we propose conducting a more sophisticated evaluation that follows all standards of coreference resolution evaluation. Specifically, we propose testing the method on established datasets for coreference resolution and comparing it to a larger set of related methods. We also propose creating a larger annotated dataset and thereby addressing various minor findings from the current annotations.

A larger dataset would also enable the training of deep learning models for cross-document coreference resolution. Recent models for single-document coreference resolution achieve strongly increased performance compared to earlier traditional methods [390]. We thus expect that such models could achieve higher performance than our method employing hand-crafted rules. However, as described in Sect. 4.3, the creation of a sufficiently large training dataset would cause high annotation cost.

7.3.2 Political Framing and Person-Independent Biases

Detecting political framing directly, as opposed to detecting its effects on the framing of persons (in PFA expressed through polarity), is beyond the scope of our objective but relevant to the overall research question. While PFA represents an effective and cost-efficient approach to identify substantial frames, its focus on persons implies that the approach may fail if large parts of the coverage on an event are not person-centric. Because of this expected shortcoming, we had initially attempted to identify topic-independent framing categories. Similar to other researchers’ attempts to define or identify such categories directly, the approach turned out to be impractical. Reasons included the high cost of annotating a large dataset, which is required for training models to classify nuanced and subtle framing categories.

Table 7.2 shows an example of framing that PFA cannot detect. Our approach would miss the two content-based frames, e.g., “restrictions are necessary” (first article) and “restrictions damage economy” (second article), because they are not directly related to individual persons. Instead, the articles focus on health or social implications versus economic consequences. To identify framing effects in non-person-oriented topics, we propose extending our approach to additional concept types, such as actions, person groups, countries, and objects. Our method for cross-document coreference resolution already resolves these types (Sect. 4.3.4). To classify the sentiment of these new types using target-dependent sentiment classification, we propose to extend the NewsMTSC dataset.

Table 7.2 Two news excerpts with opposing perspectives on the same event. The perspectives result from highlighting specific aspects of the event, here the negative consequences of loosening versus continuing COVID-19-related restrictions in Germany, March 2020. Free translation from [315, 354]

More closely resembling frame analyses as conducted in the social sciences, another idea is to determine frames directly rather than their effects. Key differences between person-oriented frames identified by PFA and political frames as defined by Entman [79] include that PFA defines person-oriented frames implicitly (each frame is defined by its articles) and inductively (frames are derived during the analysis). In contrast, social science researchers define frames for a specific research question before quantifying them in news coverage. In our view, this line of research represents the most refined approach, relying directly on decades of established social science research. Recent language models have yielded a decisive performance leap in many natural language understanding tasks. However, training these models would incur high costs before they can classify subtle, analysis question-specific, and—viewed from a technical perspective—in part (intentionally) subjective frames (Sect. 6.10).

A pragmatic alternative to political frames might be topic-independent frame derivatives, such as microframes [193], our frame properties (see Sect. 5.2), or frame types [45, 46]. Each of these, however, has its limitations or challenges: typically, they require manual validation or incur high annotation effort. Finally, active learning might be a suitable method to reduce the annotation cost. In active learning, human annotators need to label only a subset of examples, typically those the model to be trained is least certain how to classify. An iterative process of (re)training the model, selecting uncertain examples, and manually labeling them is repeated until the trained model achieves sufficiently high classification performance.
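The iterative process described above can be sketched as pool-based active learning with uncertainty sampling. The following is a minimal, hypothetical sketch: the model is represented only by a `predict_proba` function returning the probability of the positive class, and the human annotator by a `label_fn` callback; a real implementation would plug in an actual classifier and annotation interface.

```python
def uncertainty(prob_positive: float) -> float:
    """Margin-style uncertainty: highest when the model predicts 0.5."""
    return 1.0 - abs(prob_positive - 0.5) * 2.0

def select_for_labeling(pool, predict_proba, k=2):
    """Return the k unlabeled examples the model is least certain about."""
    ranked = sorted(pool, key=lambda x: uncertainty(predict_proba(x)), reverse=True)
    return ranked[:k]

def active_learning_loop(pool, predict_proba, label_fn, retrain_fn, rounds=3, k=2):
    """Repeat: select uncertain examples, label them, retrain the model."""
    labeled = []
    for _ in range(rounds):
        batch = select_for_labeling(pool, predict_proba, k)
        for example in batch:
            labeled.append((example, label_fn(example)))  # human annotation
            pool.remove(example)
        predict_proba = retrain_fn(labeled)  # retrain on all labels so far
    return labeled, predict_proba
```

The key design choice is the uncertainty criterion: here, proximity of the predicted probability to 0.5; entropy-based or committee-based criteria are common alternatives.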

Automated identification of framing effects, or of political frames directly, could also be helpful in social science research. While frame analysis is an established means of analyzing how the media report events and topics, the manual effort prevents conducting such analyses at scale. An automated approach for frame identification could assist researchers, especially in the early phases of their research. For example, PFA could serve as a tool for low-effort inductive exploration of news coverage by revealing which person-oriented frames are present in the data to be analyzed. Once extended to identify frames independent of persons, or political frames generally, our approach could also be helpful in later phases of frame analyses.

Lastly, we propose to inspect the devised methods for biases they may have inherited from their training or fine-tuning data. We inspected both the annotated data and the models’ predictions for structural biases indirectly, e.g., as part of the expert validation and manual error analysis. However, we did not directly probe the models and results for biases. Doing so is advisable, for example, because language models can be prone to gender-induced or other bias-related prediction errors due to their pre-training data [320]. Likewise, our fine-tuning data and the other datasets may contain structural biases despite the implemented quality measures. For example, the expert validation may entail biases since all experts were influenced by Western culture. Means to probe for biases in language models are already discussed in the literature (cf. [223]).
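As a minimal sketch of such a probe, one could measure how often a model’s predictions flip under a counterfactual swap of gendered terms. The `classify` function below is a hypothetical stand-in for one of our models; established probing suites (cf. [223]) are considerably more elaborate.

```python
def swap_terms(sentence: str, pairs) -> str:
    """Replace each term of a pair with its counterpart, token by token."""
    mapping = {}
    for a, b in pairs:
        mapping[a], mapping[b] = b, a
    return " ".join(mapping.get(tok, tok) for tok in sentence.lower().split())

def counterfactual_probe(classify, sentences, pairs=(("he", "she"), ("his", "her"))):
    """Fraction of sentences whose predicted label changes when gendered
    terms are swapped; a high value hints at gender-sensitive predictions."""
    flips = sum(classify(s) != classify(swap_terms(s, pairs)) for s in sentences)
    return flips / len(sentences)
```

A flip rate near zero would indicate that predictions are largely invariant to the swapped attribute; substantially higher rates would warrant a closer look at the training data.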

7.3.3 Bias Identification and Communication

This section uses the findings of our PFA evaluation to discuss conceptual and technical means to address the limitations of the PFA approach and its evaluation.

In our view, an important limitation of the experiments and results is their limited generalizability, e.g., due to the study’s design, its deployment, and the samples of respondents and events. For example, we measured respondents’ bias-awareness using a set of questions. A future study could measure bias-awareness more directly, e.g., by observing respondents’ news consumption behavior over a longer time frame. Like prior work, such a study would assume that effective bias communication encourages news consumers to view and compare more articles than bias-agnostic visualizations do.

We also propose to sample respondents from countries other than the USA. Currently, we cannot generalize the study’s findings to other countries’ populations since political and media landscapes differ across countries. For example, while in the USA the two-party system may lead to more polarizing news coverage, countries with multi-party systems typically have more diversified media landscapes [395]. We expect that these differences affect bias-awareness in general and thus also the effectiveness of our approach. Enabling studies in other countries requires extending the PFA approach to analyze news articles written in other languages. Ideas range from devising language-specific methods, which we expect to involve high research effort but yield high-quality analysis results, to applying neural machine translation before the PFA approach, which would require investigating whether task-specific linguistic properties remain stable, e.g., whether the subtle nuances of word choice are translated well.

The generalizability could also be improved by diversifying the event sample, which in our study consisted of three events and 30 articles. Besides systematically adding more events and articles, another future work idea is to randomly sample events and articles. While a random sample would reduce data selection biases, it would also require a larger respondent sample to compensate for the noise introduced by the increased content diversity.

To further diversify the event sample, we propose investigating which technical changes are required to enable the PFA approach to analyze news from sources other than online newspapers. While we already included a few alternative outlets in our study, such as Breitbart, news published on increasingly consumed social media channels in particular differs from news articles, e.g., in text length and writing style. We expect that these domain differences require adapting the PFA approach. As a visionary outlook to broaden the generalizability of bias-awareness studies further, future approaches could aim to identify and reveal biases targeting not only persons. Ideas for extending the PFA approach range from analyzing other target concept types or political framing (as outlined previously in Sect. 7.3.2) to analyzing other forms of bias, such as bias through picture selection (Sect. 2.2.3).

We also propose to create a framing ground truth dataset using manual frame analysis. Such a dataset would enable measuring the influence of the articles’ content and frames on respondents’ bias-awareness. For example, by relating articles’ content and frames to respondents’ attitudes toward these frames, the so-called hostile media effect and its influence on the visualizations’ effectiveness could be investigated. Additionally, a framing ground truth would enable a technical evaluation of approaches’ frame classification performance.

Our study showed effects of the PFA approach and the bias-sensitive overview but, overall, no effects of the article visualization and respondents’ demographic factors. In the article view, only one visual cue conditionally increased bias-awareness. Besides, the article view suffered from minor user experience (UX) issues and a conceptual issue, which lies in the relativity of bias. Specifically, to reveal bias, our article view should more prominently communicate the frames of articles other than the currently viewed article. A method that could directly be integrated into our approach is to show summaries of other articles or their frames. Another future work idea is to enable users to contrast “facts” across articles, e.g., whether and how the facts of the currently viewed article are stated in other articles. To enable the mapping of such facts, methods for semantic textual similarity (STS) could be used. We already proposed exploratory systems for this line of research, but preliminary results indicate the difficulty of this approach [77, 139].
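As a toy sketch of this fact-mapping idea, a statement from the currently viewed article could be matched against the sentences of another article by pairwise similarity. For self-containedness, the sketch below uses a crude bag-of-words cosine similarity; a real system would substitute a proper STS model here.

```python
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Crude lexical similarity; a stand-in for a proper STS model."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

def match_fact(fact: str, other_article_sentences: list[str]) -> tuple[str, float]:
    """Find the sentence in another article most similar to a given fact."""
    return max(((s, cosine_similarity(fact, s)) for s in other_article_sentences),
               key=lambda pair: pair[1])
```

In practice, a similarity threshold would additionally be needed to decide that a fact is absent from the other article, which is itself a signal of bias through omission of information.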

The respondent sample was too small to yield significant effects of the article visualization and respondents’ attributes. On the one hand, the distributions of respondents’ attributes, such as gender, political orientation, and education level, roughly approximated the respective US distributions. On the other hand, some distributions, such as that of the education level, were thus imbalanced and contained too few observations in individual attribute levels, which prevented their statistical analysis. Increasing the respondent sample would enable statistical analysis of the respondents’ attributes. As discussed previously in this section, a larger respondent sample would also facilitate statistical soundness regarding the effect of the article view by compensating for the view’s high content diversity.

7.3.4 Societal Implications

I conclude this thesis with an open outlook on the societal implications of using PFA and other automated approaches as part of daily news consumption. Understanding such societal implications is traditionally beyond the scope of computer science, and this dissertation cannot elucidate these implications. However, I intend to raise essential questions that could serve as a foundation for further, in-depth research across the topic’s involved disciplines, especially in the social sciences.

  • What are the causal relations of news consumption, readers’ event assessment, and societal decisions? How can approaches for bias identification and communication sustainably support collective decision-making and other societal processes?

  • Besides the previously mentioned model biases, what are other real-world pitfalls of using such automated approaches, and how can we prevent them? For example, of all the outlets available, only a subset can be analyzed in a timely manner. Who could decide which sources automated approaches use for their bias analysis and the subsequent visualization to news consumers? Which criteria are important for such a selection?

The research described in this thesis highlights the effectiveness of taking an interdisciplinary approach to tackle the adverse issues caused by media bias. At the same time, the previously raised questions highlight the need for further interdisciplinary research on media bias. Not only is there a pressing need to do so, but now is also a perfect point in time. For example, news articles, the basis of much research on media bias, are readily available in large quantities since much news is published online. Additionally, the rise of deep learning and recent language models has led to unprecedented advancements in natural language understanding. These advancements have enabled the automated handling of tasks that were previously difficult or nearly impossible. This thesis, for example, devised the first model capable of classifying the subtle and implicit sentiment connotations common in news articles. I hope that with the shared conceptual understanding devised in the literature review and the proposed PFA approach, this thesis facilitates further interdisciplinary research on media bias.

My vision is that bias identification approaches, such as PFA, are integrated into popular news aggregators and news applications to enable bias-sensitive news consumption at scale. Enabling news consumers to critically assess and contrast news coverage can help mitigate the severe effects of systematically biased news coverage. Supporting individuals in making more informed choices is crucial in collective decision-making, such as democratic elections. In times of misinformation campaigns, “fake news,” and other means to intentionally alter the public discourse, bias-sensitive news consumption is of unprecedented relevance. The PFA approach contributes to the required media literacy during daily news consumption by revealing substantial biases without tedious effort.