1.1 Problem

News articles serve as a highly relevant source for information on current events and salient political issues [61, 249, 365]. How an event or issue is covered in the news can decisively impact public debates and affect our collective decision-making. What if news as an essential source of information is biased? This is a highly relevant question since news coverage is very likely to be anything but strictly “neutral.” News is meant to put facts into context and assess events’ implications. It is thus expected or even desirable for news to be biased.

News not being objective is not problematic per se [384]—as long as news consumers among the general public are aware of present biases [64]. Often, this is not the case. Studies have found severely harmful effects of slanted news coverage, e.g., on collective decision-making, public opinion, and policy decisions, such as in democratic elections [237, 259]. Thus, empowering news consumers to assess coverage critically is essential to address the issues caused by media bias. Media literacy is an appropriate means for critical news assessment and balanced interaction with the media. However, while such non-technical means can be highly effective, they require high effort, e.g., to research an event’s articles and contrast their coverage. The high effort may represent an insurmountable barrier, preventing critical assessment in daily news consumption.

To give a practical idea of media bias, Table 1.1 shows excerpts of two related articles from 2003 that illustrate the subtle influence of how individual news events are reported. While both excerpts describe the same event, they portray different perspectives, commonly referred to as (political) frames [79]. Differences in the descriptions arise from including or excluding “facts” and using different words to refer to the actors, actions, objects, and reasons. In Table 1.1, The New York Times framed the Iraqi military as an aggressor threatening (peaceful) surveillance planes, while USA Today omitted the existence of Iraqi fighter jets and vaguely justified the withdrawal of the planes “for safety reasons.” Beyond this example, the media used different overall frames for their coverage of the war itself: while Western media reported on the “War in Iraq,” Iraqi news referred to it as the “War on Iraq.” In sum, in the example depicted in Table 1.1, framing is achieved primarily by two forms of bias: word choice and fact selection.

Table 1.1 Two news excerpts with different, almost opposing perspectives (also called frames) on the same event due to word choice and fact selection. Adapted from [363]

Adding to the complexity of bias and the diversity of its forms and resulting framing, the perception of topics can be altered through various means besides content, language, or generally text [14, 69, 146, 147, 276]. Other means, such as image selection, can similarly affect how news consumers perceive an event. For example, news outlets can choose different photos in their articles that show the same event and persons but in a different context or overall mood. In turn, readers’ perception of the topic likely differs strongly depending on which article they read or picture they viewed.

While differences such as in word choice and selection of facts or photos are easy to notice when contrasting such opposing examples, spotting the resulting slants is very difficult or nearly impossible during daily news consumption. For example, newsreaders typically rely on only one or a few similarly slanted news sources. Even if they are willing and trained to critically assess the news, researching and contrasting articles and facts requires strenuous effort. The high effort may prevent newsreaders from applying such effective yet cumbersome means routinely. Framing may then influence how we perceive specific information and assess events.

The potential negative effects of systematically biased media coverage, especially on policy issues, are manifold and can hardly be overestimated (cf. [24]). For example, a 2003 survey analyzed differences in news coverage on the Iraq War and corresponding perceptions of news consumers [71]. Fox News viewers were most misinformed: over 40% thought that weapons of mass destruction had been found in Iraq, which had been used by the US government as a justification for the war. Another study found a strong influence of news coverage on asylum policy decisions—even stronger than the impact of cultural or economic factors [184].

The problem of slanted coverage is further amplified since most people only rely on a few news sources [128], and news outlets are influenced by other media [379], often by a few central news agencies [33]. This is compounded by the fact that only a few corporations control large parts of the media landscape in many countries. In Germany, for example, only five corporations control more than half of the media [189], and in the USA, only six corporations control 90% [40, 318]. Further adding to the severity of media bias are recent trends in news production and consumption. More news is spread and consumed on other channels where news authors might more often disregard journalistic standards. Examples include social media channels, such as Facebook and Twitter, and alternative news portals, such as Breitbart.

In extreme cases, “fake news” may intentionally present entirely fabricated facts to manipulate public opinion toward a given topic, e.g., during the US presidential elections of 2016 [219]. While fake news is not systematically different from other types of biased news coverage, it represents the end of the spectrum insofar as biased news coverage may give way to gross distortion of facts or outright factually incorrect information. As in the example of Fox News viewers cited previously, the general population then ultimately no longer holds the same views on whether or not certain events have actually transpired. In the remainder of this thesis, the term media bias also entails the extreme but methodologically identical biases and their forms as they occur in fake news.

In sum, systematically biased coverage is a highly relevant and current issue. Biased coverage can decisively alter public opinion and poses severe societal challenges [9, 190, 237]. Empowering news consumers to critically assess coverage is essential, especially on policy issues. Although media literacy represents an effective means to a more balanced interaction with the media, it requires strenuous effort. Ultimately, this high effort prevents critical assessment of media during daily news consumption.

1.2 Research Gap

Enabling the comparison of substantially different perspectives present in news coverage, such as shown in Table 1.1, would facilitate bias-sensitive consumption and assessment of the news. However, identifying such perspectives is currently only possible with strenuous effort using manual techniques since automated approaches cannot reliably detect them.

In computer science and related fields, media bias is a rather young research topic. Although technically more advanced, automated approaches tend to employ simpler models and methodology compared to manual bias analysis. Compared to the opposing views in Table 1.1, automated approaches find perspectives in event coverage that are technically different but often do not represent frames, i.e., meaningfully different perspectives. Ultimately, the approaches cannot enable news consumers to critically assess the news since the perspectives they identify may often be inconclusive, incomplete, or superficial. One key reason for their mixed results is that current approaches analyze, or more generally treat, bias as an only vaguely defined concept, such as

“subtle differences” [211],

“differences of coverage” [278],

“diverse opinions” [251], or

“topic diversity” [252].

The shortcomings of automated approaches and their non-optimal results become apparent when comparing the approaches to research in the social sciences. There, decades of research on media bias have resulted in comprehensive models to describe individual bias forms and effective methods to analyze them. Using established tools such as content or frame analysis, researchers in the social sciences can detect and quantify powerful and difficult-to-detect bias forms [155, 368]. For example, the data-driven analyses determine substantial frames by identifying in-text means (also called framing devices) from which these frames emerge. However, such analyses are conducted mostly manually, require much expertise, cause high cost, and can only be conducted for a few past topics [155, 260].

In sum, critically assessing news coverage on policy issues is a crucial means to mitigate media bias’s adverse effects. However, while reliable techniques for bias and frame identification are available, they cannot be used during daily consumption due to their high effort and required expertise. In contrast, scalable methods for automated data analysis, such as in natural language processing, are available. Current automated approaches to reveal biases, however, predominantly suffer from superficial results, especially when compared to social science research results. In our view, only an interdisciplinary approach can support newsreaders in critically assessing coverage during daily news consumption.

1.3 Research Question

I take the previously identified research gap as a motivation to define the following research question for my doctoral research:

How can an automated approach identify relevant frames in news articles reporting on a political event and then communicate the identified frames to non-expert news consumers to effectively reveal biases?

I intentionally define the research question rather openly to better reflect—compared to prior work in the social sciences—the recency and the relatively young state of the art in computer science. In the course of this thesis, I use the findings of the first interdisciplinary literature review on media bias (Chap. 2) to narrow down the research question to a specific research objective (Sect. 3.3.3). I then propose person-oriented framing analysis (PFA) to tackle the research objective (Sect. 3.4). In the evaluation of the approach using a prototype (Chap. 6) and the conclusion of this thesis (Chap. 7), I discuss the suitability of the conducted research not only concerning the specific research objective but also in the context of the broader research question.

Although the research summarized in this thesis focuses on news articles, the PFA approach can conceptually be applied to any news domain, source, and genre that focuses on persons and adheres to grammar and other linguistic rules. This includes alternative news channels, such as Breitbart, which I also include in the evaluation (Chap. 6). Although the PFA approach is conceptually language independent, I develop methods for bias analysis in English news articles. By using and devising methods for natural language understanding of the English language, I can demonstrate the best possible performance of the PFA approach.

To address the research question, I define the following research tasks:

RT 1: Identify the strengths and weaknesses of manual and automated methods used to identify and communicate media bias and its forms.

RT 2: Devise a bias identification approach that addresses the identified weaknesses of current bias identification approaches.

RT 3: Develop methods for the devised approach and evaluate their technical performance.

RT 4: Implement a prototype of a bias identification and communication system that employs the developed methods to reveal biases in real-world news coverage to non-expert news consumers.

RT 5: Evaluate the approach’s effectiveness in revealing biases by testing the implemented prototype in a user study.

1.4 Thesis

This section gives an overview of the thesis at hand. Section 1.4.1 presents the structure of the thesis and its scientific contributions. Afterward, Sect. 1.4.2 introduces the peer-reviewed publications that this thesis summarizes and states how they are cited.

1.4.1 Structure and Scientific Contributions

Reading this thesis and its chapters in the provided order gives, in my opinion, the most intuitive access to the research summarized in this thesis. At the same time, each chapter is written to be understood without reading the other chapters first. Summaries of information presented in other chapters serve the purpose of good readability of individual chapters without readers having to follow cross-references often. Of course, in addition to these summaries, readers are provided with cross-references to the respective parts of the thesis where they can find more detailed information.

Chapter 1 presents a few of the severe problems caused by slanted news coverage and identifies the research gap that motivated the research described in this thesis. The chapter also introduces the research question that guided the summarized research.

Chapter 2 discusses manual analysis concepts and exemplary studies from the social sciences and automated approaches, mostly from computer science, to analyze and reveal media bias. Each of the two disciplines uses its own terminology and has fundamentally different objectives and approaches. Thus, the chapter first establishes a shared conceptual understanding by mapping the state of the art from the social sciences to a framework that computer science approaches can target. In sum, the chapter identifies the strengths and weaknesses of current approaches for identifying and revealing media bias by presenting the first interdisciplinary literature review on the topic.

  • Addressed research task: RT 1

  • The publications summarized in Chap. 2 are [123, 126]

Chapter 3 discusses the solution design space to address the identified research gap. Then, the chapter devises person-oriented framing analysis (PFA), our approach to identify substantial frames and to reveal slanted news coverage. PFA aims to detect groups of articles that report similarly on an event, i.e., that frame the event similarly, by determining how each article portrays the persons involved in the event.

  • Addressed research task: RT 2

  • The publications summarized in Chap. 3 are [134, 136]

  • Further publications relevant for the research described in this chapter, e.g., which report on earlier or preliminary results leading to the results described in the thesis, are [123, 137, 138]
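
The grouping idea behind PFA can be illustrated with a minimal sketch. It assumes that each article has already been reduced to one polarity score per person in the range [-1, 1]. The function names, the made-up scores, the cosine similarity, and the greedy threshold-based grouping are illustrative assumptions only and are not the methods devised in this thesis.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def group_articles(portrayals, persons, threshold=0.5):
    """Greedily group articles that portray the given persons similarly.

    portrayals: dict mapping article id -> dict mapping person -> polarity
    in [-1, 1] (hypothetical input format, assumed for this sketch).
    """
    groups = []  # each group is a list of article ids; the first id is its reference
    for article_id, scores in portrayals.items():
        vector = [scores.get(p, 0.0) for p in persons]
        for group in groups:
            reference = [portrayals[group[0]].get(p, 0.0) for p in persons]
            if cosine(vector, reference) >= threshold:
                group.append(article_id)
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append([article_id])
    return groups


# Made-up example data: two articles portray Person A favorably and
# Person B unfavorably, a third article does the opposite.
persons = ["Person A", "Person B"]
portrayals = {
    "article_1": {"Person A": 0.8, "Person B": -0.6},
    "article_2": {"Person A": 0.7, "Person B": -0.4},
    "article_3": {"Person A": -0.5, "Person B": 0.9},
}
groups = group_articles(portrayals, persons)
```

In this toy example, the first two articles end up in one group and the third article in a group of its own, which mirrors the notion of articles that frame an event similarly.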

Chapter 4 introduces target concept analysis (TCA), the first component of PFA after natural language preprocessing. Target concept analysis seeks to identify phrases that may be subject to specific biases in a set of news articles reporting on an event. Among others, the chapter introduces the first method for context-driven cross-document coreference resolution. In contrast to prior work, the method is capable of resolving highly topic- and event-specific coreferences that may even be antonyms in general, such as “coalition forces” and “invading forces.”

  • Addressed research task: RT 3

  • The publications summarized in Chap. 4 are [124, 130]

  • Further publications relevant for the research described in this chapter are [131, 132, 133]

Chapter 5 presents frame identification, the second component of PFA. Conceptually, this component seeks to identify the person-oriented framing of individual articles. To approximate person-oriented framing, the method determines how individual sentences portray persons involved in the analyzed news event. Most importantly, the chapter introduces the first large-scale dataset and a novel model for target-dependent sentiment classification (TSC) in the news domain.

  • Addressed research task: RT 3

  • The publications summarized in Chap. 5 are [125, 127, 130]

  • A further publication, which reports on earlier results leading to the results described in the thesis, is [131]
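
The step from sentence-level portrayal to an article-level approximation of person-oriented framing can be sketched as follows. The sketch assumes a hypothetical list of (person, label) pairs, as a target-dependent sentiment classifier might produce for the sentences of one article. The three-class label set, the numeric mapping, and the simple averaging are assumptions for illustration and not necessarily the aggregation used in this thesis.

```python
from collections import defaultdict

# Assumed mapping from sentiment labels to numeric scores.
LABEL_TO_SCORE = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}


def aggregate_portrayal(sentence_labels):
    """Average the sentence-level polarity per person for one article.

    sentence_labels: iterable of (person, label) pairs (hypothetical format).
    Returns a dict mapping person -> mean polarity in [-1, 1].
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for person, label in sentence_labels:
        totals[person] += LABEL_TO_SCORE[label]
        counts[person] += 1
    return {person: totals[person] / counts[person] for person in totals}


# Made-up classifier output for the sentences of a single article.
sentence_labels = [
    ("Person A", "positive"),
    ("Person A", "neutral"),
    ("Person B", "negative"),
    ("Person A", "positive"),
    ("Person B", "negative"),
]
portrayal = aggregate_portrayal(sentence_labels)
```

Averaging is only one possible design choice. It treats every sentence equally, whereas an actual system might weight sentences, e.g., by their position in the article.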

Chapter 6 introduces Newsalyze, our prototype system to reveal biases to non-expert news consumers by using the PFA approach. The chapter first devises visualizations designed to be intuitive and easy to use. The prototype system then integrates the visualizations and the methods devised in the previous chapters. In a large-scale user study, the PFA approach effectively increases bias-awareness in study participants. The prototype reveals substantial biases present in the news coverage, which in part could previously only be identified through manual frame analysis.

  • Addressed research tasks: RT 4 and RT 5

  • The publications summarized in Chap. 6 are [134, 137]

Chapter 7 summarizes the thesis and discusses the strengths and weaknesses of our research to derive ideas for future research on media bias.

Key Contributions

In sum, this thesis makes the following key contributions:

  1. It presents the first interdisciplinary literature review on media bias combining expertise from computer science, the social sciences, and other disciplines relevant to analyzing media bias.

  2. It proposes person-oriented framing analysis, an approach to identify and reveal meaningful frames, rather than only facilitating the visibility of potential perspectives.

  3. It proposes a novel task named context-driven cross-document coreference resolution. This task aims to identify and resolve highly context-dependent coreferences as they occur frequently in slanted coverage. The thesis devises a dataset and method for this novel task.

  4. It devises the first dataset and method for target-dependent sentiment classification (TSC) on news articles.

  5. It introduces a prototype system including bias-sensitive visualizations to reveal media bias to non-expert news consumers by highlighting news articles’ framing.

  6. It presents the results of a user study that approximates real-world news consumption to demonstrate the approach’s effectiveness concerning the change of bias-awareness in respondents.

One key finding of this thesis is that the devised methods and in particular the overview visualizations most effectively help news consumers to become aware of biases present in the news. Moreover, the results suggest that the developed system, Newsalyze, is the first to identify meaningful framing in person-centric coverage. Previously, such framing could only be identified using manual analyses or when already having an extensive understanding of a news topic.

Side Contributions

Individual chapters and sections describe further contributions required for my doctoral research, including approaches for news crawling and information extraction (Sect. 3.5) and main event retrieval from news articles (Sect. 4.2). Another side contribution is the exploratory research on automatically determining how news articles portray individual persons using so-called frame properties that represent topic-independent framing categories (Sect. 5.2).

1.4.2 Publications

To subject my research to the scrutiny of peer review, I have published all major contributions of this thesis in conference proceedings and journals. Four of the publications received an award or were nominated for one. Two of these are directly relevant to this thesis ([131, 133]), and I am the responsible first author of three ([129, 131, 133]). More information on the awarded publications can be found in Appendix A.5.

When writing the thesis, I aimed to achieve a trade-off between a well-readable dissertation (rewriting all my peer-reviewed publications) and a thesis following the strictest citation rules (quoting all sections related to a publication). All publications that are directly relevant to the thesis at hand are shown in Table 1.2. These are the origin of the text and other content I use in this thesis. The first column in Table 1.2 indicates which chapter is based on which publications. When using my own publications for this thesis, I copied the content of the publication and adapted words or larger parts, e.g., for consistent wording, to better fit into the overall structure of the thesis or to reflect recent literature and developments that happened since writing the original publication.

Table 1.2 Overview of the core publications describing the research summarized or used in this thesis. The publication types C, J, and W represent conferences, journals, and workshops, respectively

To acknowledge the fellow researchers with whom I published, collaborated, and discussed ideas, I will use “we” instead of “I” in the remainder of this thesis.