
1 Introduction

Data-driven storytelling embeds data into a narration and usually combines a textual representation with visualizations (Segel and Heer 2010). While magazine-style stories might be most common, other presentation genres exist, such as animations and slide shows, comics, or annotated charts (Segel and Heer 2010). The power of data-driven storytelling lies in making data accessible to a wider audience by guiding readers through analysis insights while inviting them to engage through simple interactions. Textual and visual data descriptions complement each other and, together, form an integrated representation that is both easy to follow and rich in information. For instance, a story on flight data might link a globe or map that shows flight trajectories with explanations of busy routes and airports, but could also provide insights into specific examples like the longest flights.

As an expressive and interactive medium, data-driven stories fit volunteered geographic information well. For data that is volunteered by the public, it is a natural choice to make that data also available and understandable to a broad audience. Figure 7.1 illustrates this concept as closing a circle. Public groups or individuals volunteer geographic information that is then made available on open data platforms (I). While experts have already analyzed such data in various application scenarios (II), we contribute methods for bringing the derived insights back to the general public and decision-makers. From the open data itself (III.a) and the insights of the experts (III.b), we support authoring and automatically generating reports that can be understood by this broad group of users (IV). Computer support and automation are necessary to efficiently create summaries of varying data and to allow personalized reporting. With these reporting solutions, we intend to support non-expert users and foster a dialog between the public, decision-makers, and experts (V). This cycle aligns with the endeavors of others to make volunteered geographic data directly usable to the public, for instance, within project IDEAL-VGI (Chap. 2).

Fig. 7.1
A cycle diagram denotes the process between the open data platform, experts, interactive reports, and the general public. It denotes algorithm and visual data analysis, report generation, understanding of data analysis, and volunteering geographic information in their respective sequences.

Conceptual diagram of the applied analysis cycle; steps I–II in gray mark previous research and results; steps III–IV as black arrows represent the focus of our research, which is intended to facilitate a joint dialog between stakeholders (V)

The challenges of this research lie in the identification and selection of relevant insights, as well as in reporting them as integrated visual and textual representations. The produced reports should, furthermore, invite readers and users to explore the data. We have approached these challenges by first studying the interplay of text and visualization in existing examples of geographic data-driven stories (Sect. 7.2). Then, we investigated solutions that help author reports with close linking between the two representations (Sect. 7.3). Finally, designing automatically generated reports allowed us to provide, alongside a data-driven story, certain support for exploration (Sect. 7.4). While geodata plays a role in all presented research, we also consider the visualization of additional, non-geographic data.

This chapter describes the results of the project vgiReports and summarizes and connects a selection of project-related publications (Latif et al. 2021a,b, 2022a,b) and a preliminary work (Latif and Beck 2019a). Two of these works (Latif et al. 2021a,b) report results from collaborations with other projects of the priority program.

2 The Interplay of Text and Visualization

The way textual and visual descriptions are combined is crucial in data-driven stories. If integrated well, they can avoid a split-attention effect between the two media (Ayres and Sweller 2005) and might even help identify misaligned information (Zheng and Ma 2022). Interactive linking of text and visualization can increase user engagement (Zhi et al. 2019) and guide user attention while supporting less experienced users, in particular, in correctly mapping the text to the data (Barral et al. 2021).

Journalistic outlets provide many high-quality, manually crafted examples of data-driven stories. For instance, The New York Times published more than a hundred carefully designed visual stories and interactive graphics in 2021 (The New York Times 2021). As some of these stories cover geographic aspects, we can leverage them to study how geographic data is successfully reported to a wide audience. Previous research has already studied the structure and sequence of existing stories (Hullman et al. 2013), patterns of visual narrative flow (McKenna et al. 2017), and narrative order in time-oriented stories (Lan et al. 2021). Text in such stories can have different roles, ranging from introductory texts to detailed annotations of the visualization (Segel and Heer 2010). We have focused on a fine-grained analysis of such categories and the explicit and implicit interplay of text and visualization in stories, with a certain focus on geographic aspects. In a first study, we analyzed 22 full stories from a variety of news media (Latif et al. 2021b). A second study looked at a set of 110 paragraph-chart pairs stemming from 77 articles from different news media (Latif et al. 2022b). Using a qualitative methodology in both studies, we investigated the text on the sentence and word level and classified the cases into different categories. Specifically, we addressed the following research questions.

What Are the Reported Analysis Insights, and How Is the Related Data Visually Communicated? (Latif et al. 2021b)

We observed two categories of textual narrative: data-driven text and contextual embedding text. The former directly relates to the data and describes analysis insights. These insights link to the analysis tasks, namely, identify, summarize, and compare. In stories with a geographic focus, location and time are generally key concepts. In particular, locations associated with extreme values as well as those showing very dissimilar behavior (outliers) are identified and explained in the narrative. Likewise, clusters of locations are discussed together to highlight their similarities. Other reported insights include geographic and temporal variations of variable values across a geographic region or time span. The narrative either uses measures of central tendency like mean, median, or mode to summarize them or describes these variations in plain words. Lastly, the narrative compares locations or other data items using part-to-whole contrasts, correlation, or statistical ranking. Apart from these insights, as contextual embedding, a comparable proportion of the narrative blends in the background of the story and data, necessary domain knowledge, and quotes from external sources or people to make the stories self-sufficient units of information. Moreover, the authors of the stories interpret analysis insights, relate them to other data and information sources, and attach judgment. It is important to note that data-driven text and contextual embedding text are often intermingled and cannot always be unambiguously separated. As the textual narrative explains the analysis insights, visualizations act as a complement to show the relevant data. Visualizations can serve a specific purpose, for instance, to provide an overview of the data, to support comparisons, or to highlight details. The use of simple visualizations like maps, visually enriched tables, bar charts, and line plots is more common than that of slightly more advanced ones like distribution plots and scatter plots.
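The identify and summarize insight types observed in these narratives can be sketched as a small data-fact extractor; the function name, the outlier threshold, and the phrasing below are hypothetical and merely illustrate the kinds of statements the studied stories make:

```python
import statistics

def extract_data_facts(values):
    """Derive simple 'identify' and 'summarize' facts from a
    region -> value mapping (hypothetical illustration)."""
    mean = statistics.mean(values.values())
    stdev = statistics.pstdev(values.values())
    # Identify: the location with the extreme (maximum) value
    top_region, top_value = max(values.items(), key=lambda kv: kv[1])
    facts = [f"{top_region} has the highest value ({top_value})."]
    # Identify: outliers, here more than 1.5 standard deviations from the mean
    for region, value in values.items():
        if stdev and abs(value - mean) > 1.5 * stdev:
            facts.append(f"{region} deviates strongly from the mean of {mean:.1f}.")
    # Summarize: central tendency across all regions
    facts.append(f"On average, regions reach a value of {mean:.1f}.")
    return facts
```

A compare task would, analogously, contrast two regions or relate one region's value to the total.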

How Do Textual Narration and Visualization Interplay? (Latif et al. 2021b)

We discovered different kinds of linking strategies that combine visualizations and an associated textual narrative into a single engaging story. First, visualizations are almost always placed close to the text that describes them. Likewise, the sequence of visualizations in a story is important: Overview visualizations often appear first and are followed by detailed visualizations. Second, to strengthen the linking further, textual elements like captions, annotations, or tooltips are employed inside or next to visualizations. These textual elements often explain the key insight of a visual or help users better interpret the visualization. We observed that the use of descriptive annotations even enabled authors to include comparatively complex and non-standard visualizations in their stories. Third, visualizations as a whole or parts of them are explicitly referenced from the textual narrative. Authors sometimes also use the same colors in text and visualization to show connections between the two media.

What Implicit References Exist Between Text and Visualization, and How Do They Relate to the Data? (Latif et al. 2022b)

Implicit references can be defined as connections between a textual narrative and a visualization where both refer to the same data items. For instance, mentions of countries alongside a world map make the country names implicit references. However, such connections are not limited to single entities or values but also include group references (referring to many data points, e.g., the EU) and interval references (referring to numerical ranges). Furthermore, individual references can be grouped together to form higher-order references. We found that these implicit references can correspond to analysis tasks such as identification, summarization, and comparison. Almost half of the implicit references directly matched a chart feature (e.g., axis label, legend, annotation, caption). The other half, however, contained linguistic variations (e.g., inferences, synonyms, abbreviations, stems, or lemmas) or numerical variations (e.g., rounded-off numbers, approximations, computed measures) and are harder to map to the visualized data.
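As a rough sketch of what resolving such variations involves, the following hypothetical matcher accepts exact label matches and small numerical deviations (rounded or approximated values); real pipelines would add lemmatization, synonym lists, and unit handling:

```python
def matches_chart_feature(mention, chart_labels, tolerance=0.05):
    """Check whether a textual mention implicitly references a chart
    feature, allowing simple linguistic and numerical variation
    (hypothetical sketch)."""
    norm = mention.strip().lower()
    # direct match against axis labels, legend entries, annotations, ...
    for label in chart_labels:
        if norm == label.strip().lower():
            return True
    # numerical variation: rounded or approximated numbers
    try:
        value = float(norm.rstrip('%').replace(',', ''))
    except ValueError:
        return False
    for label in chart_labels:
        try:
            ref = float(label.strip().rstrip('%').replace(',', ''))
        except ValueError:
            continue
        if ref and abs(value - ref) / abs(ref) <= tolerance:
            return True
    return False
```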

3 Authoring Interactive Reports

Creating a data-driven story requires effort: Aside from writing the text, the data needs to be analyzed and visually presented. Web technologies provide a good basis for making the content available. However, whereas many content management systems allow placing textual and visual content side by side, they do not support integrating both representations more closely and, through this, creating interactive documents. Filling this gap, various authoring tools and supporting approaches have already been suggested for data-driven storytelling (Tong et al. 2018, Section 3). For instance, Chen et al. (2020) developed a framework to synthesize stories from insights identified using a visual analytics system. It allows an author to arrange insights in different simplified visualizations, annotating and connecting them to tell a story.

Whereas most of these approaches support efficiently generating stories of different kinds, they do not directly address creating explicit and interactive links between text and visualization. In contrast, VizFlow (Sultanum et al. 2021) focuses on links between text and visualization for authoring; while its links are limited to manually created links to image-based features of the visualization, it investigates in more detail how to leverage such links for document layout. Ellipsis (Satyanarayan and Heer 2014) allows authoring staged slide-show stories with annotations that can be bound to data values and adapt with them. Elastic Documents (Badam et al. 2019) does not allow creating links directly but extracts related text and tables and connects them using new visualizations. Furthermore, general approaches for annotating charts with, among other visual marks, textual content are related (Ren et al. 2017).

Focusing on the easy and efficient creation of valuable links between text and visualizations, we have developed Kori (Latif et al. 2022b). The system, as demonstrated in Fig. 7.2, supports both the manual creation of links and automatic suggestions for links. The computed links are based on processing the text and consider the hierarchical structure of references discussed in Sect. 7.2. While an author is composing an interactive story, the system offers unobtrusive suggestions, which can then be inspected and accepted or discarded. For the reader, the links finally act as interactive references and, when triggered, draw the user's attention to the respective portion of the visualization. Not only do they reduce a split-attention effect, but they can also be starting points from which to explore the data further. For the manual creation and adaptation of links, the system offers an interface that requires only a few interactions to define the references. There are two modes of manual construction: First, authors can directly select visual marks in the chart using a direct manipulation mode (e.g., rectangular brush selection). Alternatively, authors can apply a series of filters to select visualized data points. Through these means, authors can effortlessly create references and focus on composing their story.
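A much-simplified sketch of such automatic suggestions: scan a sentence for phrases that match values in the charts' underlying data and propose candidate links. The data layout and field names are invented for illustration; the actual Kori pipeline additionally resolves hierarchical, group, and numerical references:

```python
def suggest_links(sentence, chart_data):
    """Scan a sentence for mentions of visualized data values and
    return candidate text-visualization links (naive substring
    matching; a hypothetical sketch, not the actual Kori pipeline)."""
    text = sentence.lower()
    suggestions = []
    for chart_id, columns in chart_data.items():
        for column, values in columns.items():
            for value in values:
                if str(value).lower() in text:
                    suggestions.append({
                        "phrase": str(value),
                        "chart": chart_id,
                        # filter describing which mark(s) to highlight
                        "filter": (column, value),
                    })
    return suggestions
```

In an authoring interface, each returned suggestion would be shown unobtrusively next to the matched phrase for the author to accept or discard.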

Fig. 7.2
A screenshot represents the user interface of the Kori application. It indicates the chart gallery, suffering states of the United States of America, along with the statistics, link setting, and highlighting options with their properties.

The user interface of Kori. It consists of a chart gallery (1) and an editing interface (2). It supports manual creation of links through simple interactions (3). Users can choose highlighting options and their properties (4)

In a study, we asked 11 participants with diverse backgrounds and experience to create references linking text and visualizations in three examples. In the first and second tasks, they reproduced given links of various kinds with the tool. The third task was more open-ended; the participants were only given a set of visualizations and also had to textually summarize some findings as a short story while linking the text to the visualizations. The results indicated that participants did not have difficulties using the interface and were able to construct meaningful references in all three tasks. Among the 64 automatic suggestions that occurred in the sessions, 48 were correct and 16 incorrect. Participants also used the manual construction mode and rated it as comparable to the automatic suggestion feature, both with a median of 4 on a scale from 1 (worst) to 5 (best). The feedback on the automatic suggestions was mostly positive and confirmed that many recommendations were helpful and did not disturb the users’ workflow. “Smarter” reference detection methods, however, could still improve the experience.

4 Explorative Reporting

Data-driven stories and visual reports of data might be presented as interactive documents but often remain rather static. Users can interactively navigate through the story and retrieve some details on demand, but the documents mostly lack support for starting an explorative analysis that goes beyond the original story. Moreover, the stories do not adapt to personal interests or current data. The explanation for these restrictions is simple: the stories are manually written as static texts. However, by partly automating the generation of the natural-language content, we can provide extended options for explorative data analysis and personalization.

Various techniques exist for natural-language generation (Gatt and Krahmer 2018). Whereas the use of advanced generation methods for the automatic reporting of data is not yet common in journalistic and industrial practice, some research prototypes have already investigated their potential for more adaptive reporting. For instance, such generation techniques have been used to provide guidance in the data exploration process by reporting automatically derived data facts (Srinivasan et al. 2019). But whole stories can be generated as well. Unlike approaches that target fully automatic generation (Shi et al. 2021), we are interested in reports that are still human-authored but can adapt to different data automatically. Used in interactive documents, the generated text blends with visualizations into a data-driven story. In earlier work, we have explored such representations, for instance, to generate profiles of scientific authors (Latif and Beck 2019b) or, in software engineering use cases, to summarize program executions (Beck et al. 2017) and code quality (Mumtaz et al. 2019). Increasingly, the generated reports allow greater flexibility regarding the interactive exploration of the data, which complements explanatory texts and guided data analysis. The general idea can be described as exploranation, mixing exploration with explanation (Ynnerman et al. 2018).
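The simplest such generation technique is a template whose slots and phrasing adapt to the current data selection. The following sketch, with invented wording and data, illustrates how one human-authored sentence can serve many selections; production systems add content selection, aggregation, and surface realization (cf. Gatt and Krahmer 2018):

```python
def generate_summary(region, data):
    """Fill a human-authored template with the current data selection
    (hypothetical phrasing and category names)."""
    total = sum(data.values())
    top = max(data, key=data.get)           # dominant category
    share = data[top] / total * 100
    # the wording itself adapts to the data
    trend = "dominated by" if share > 50 else "led by"
    return (f"In {region}, {total} incidents were recorded, "
            f"{trend} {top} with {share:.0f}% of the total.")
```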

We have now investigated how to apply such approaches to geographic data in the context of different media and usage modalities. These cover novel aspects such as comparative descriptions of selected entities, novel forms of presentation such as adaptive audio guides, and novel blends of interaction and presentation such as chatbots. This set of diverse examples comprises early prototypes that demonstrate promising directions of visual reporting; we have not yet evaluated them in detail or connected them into a more comprehensive framework.

4.1 Maps with Data-Driven Explanations

Maps that show statistical information are widely used in data-driven storytelling. Choropleth maps visualize the variation of one variable for a set of regions (e.g., countries). However, oftentimes, it is desirable to describe the relationship between two variables, which requires the simultaneous visualization of two values per region. For instance, per capita spending on education could be compared to per capita spending on defense to understand different geopolitical roles of countries. An established way of visualizing such bivariate data is to employ graduated symbols overlaid on a choropleth map (Elmer 2012). However, by construction, these bivariate map visualizations are more complex to interpret, and it becomes harder to spot visual patterns. Additional textual explanations might counterbalance this complexity and could hint at interaction effects of the variables that would otherwise go unnoticed. Encoding more variables per region in a more complex visual glyph is doable but would render communication to a wider audience even more challenging.

To visually and textually report at least bivariate geographic data in a more accessible way, we developed Interactive Map Reports (Latif and Beck 2019a). The system employs well-established statistical methods to detect notable relationships, geographic patterns, and outliers in given bivariate data. These insights are automatically transformed into a natural-language narrative that is then presented alongside a bivariate map visualization, as shown in Fig. 7.3. The given example relates, for the states of the USA, the number of fatalities caused by storms to the number of storms to reflect whether the quantity of storms is directly related to the death toll. The textual narrative serves as a guide and explains findings. Small graphics in the text help establish the linking between the two representations. Users can explore the map visualization as they read through the narrative by activating interactive links (printed in boldface). Likewise, while exploring the map, users can either get additional details on a selected geographic region or a comparative text for two selected regions.

Fig. 7.3
A screenshot represents the interactive map report on the fatalities caused by storms in the United States of America in 2017. There are 3 paragraphs, along with a separate explanation for Wisconsin and a comparison of Georgia and South Carolina.

A report describing casualties due to storms in the US as a bivariate map and textual summary (top). Details on a selected region and a comparison of two selected regions are available on demand (cutouts at the bottom)

The system is capable of generating interactive reports for different bivariate geographic datasets. Through a small set of parameters that the user provides about the variables, the geographic region and granularity, and the general terminology, it adapts the generated narrative and visualization.

4.2 Interactive Audio Guides in Virtual Reality

Virtual reality is emerging as an engaging medium for interactive data visualization, and its use for data-driven storytelling has only begun to be explored (Isenberg et al. 2018). The idea of exploranation is also applicable to virtual reality visualizations. However, longer textual narratives, as previously used in documents, are not suitable because reading would counteract immersion. As an alternative, audio can be used, as in virtual reality applications such as games, movies, or virtual museums. Prerecorded audio narration might be played at various stages of the story (e.g., in a game) or activated by a user interaction (e.g., in a virtual museum). The prerecording, however, limits flexibility, and such approaches cannot adapt to changes in the data as a result of interactions.

To support exploranation, our approach Talking Realities (Latif et al. 2022a) combines a data-driven audio narrative with an immersive virtual-reality visualization. The audio narrative is based on the automatic identification of interesting analysis insights. Using speech synthesis services, it is rendered on the fly from generated text and, therefore, adapts to data selections and user interactions. To provide a smooth exploration experience, the narrative should be synchronized with visual animations. To cater to the needs of a larger target user group, Talking Realities advocates three modes with varying levels of guidance. On the one hand, fully guided tours walk users through a predefined sequence of findings with the least freedom to explore. Free exploration, on the other hand, lets users investigate the data visualization without any intervention. In the middle lies guided exploration, which hints at potential perspectives worth exploring. We have tested the approach with different immersive visualizations, ranging from multivariate statistical data to astronomical data. Figure 7.4 shows an example of intercontinental air traffic data projected onto a globe.

Fig. 7.4
An illustration denotes flight data for a duration of 1 year. It highlights the paths of 236 flights over the globe on 2 August 2018. On the right, it highlights the paths of the longest intercontinental and longest long-distance flight.

Scenes and audio explanations (here, transcribed) from our prototype implementing the Talking Realities approach for air traffic data. (Top) A description of the aggregated intercontinental flights for one day. (Bottom) Scenes reporting the longest flight from an airport and most flights to any other airport

4.3 A Chatbot Interface Providing Visual and Textual Answers

Using natural language can make interactions with a machine effortless. Chatbots that reply to textual messages are an example. Instead of going through context menus and then choosing the relevant option, chatbots let us verbalize our requests as we would to another human being. However, research on chatbot interfaces for data analysis and visualization is still in its infancy. Although some systems are already powerful, the use of chatbots can still lead to false expectations, misunderstood questions, and unexpected replies (Tory and Setlur 2019). We believe that a chatbot interface can be a good starting point for making first contact with the data. In response to a user query, the resulting exploranative representation of the data should then enable users to verify and validate the presented facts and to explore related ones.

For the specialized use case of exploring relationships among historical public figures, we have developed VisKonnect (Latif et al. 2021a) together with project WorldKG (Chap. 1). The approach offers a chatbot interface for asking questions about such historical figures. Given a question, it uses a rule-based approach to understand the intent of the question and to extract meaningful entities (e.g., people, places). Based on this information, it formulates a SPARQL query to pull the relevant data from an event knowledge graph (Gottschalk and Demidova 2019). This data is then visualized in multiple linked visualizations, highlighting the timelines of individual and shared events, as well as where these events took place. The visualizations are augmented with a textual explanation that aims at answering the user's question (generated either through simple text templates or the GPT-3 language model). Additionally, related events are listed and serve as interactive links to explore them in the associated visualization. Figure 7.5 demonstrates a query about two well-known scientists and the response generated by VisKonnect.
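A toy sketch of this rule-based step: detect known person entities in the question and assemble a SPARQL query over their shared events. The entity list, the predicate variables, and the schema URL are invented for illustration; the real system resolves entities and properties against the knowledge graph:

```python
def build_sparql(question, known_people):
    """Extract person entities from a question and build a SPARQL
    query for their shared events (hypothetical sketch; predicates
    and the date property URL are placeholders)."""
    people = [p for p in known_people if p in question]
    # one triple pattern per detected person, joined on a shared event
    patterns = "\n  ".join(
        f'?event ?p{i} "{person}" .' for i, person in enumerate(people))
    query = ("SELECT ?event ?date WHERE {\n  "
             + patterns +
             "\n  ?event <http://example.org/date> ?date .\n}")
    return people, query
```

The query results would then feed the linked timeline and map visualizations, while the detected entities parameterize the textual answer.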

Fig. 7.5
An illustration presents a screenshot of a query and its response. The query reads, when did Pierre Curie and Marie Curie marry. On the right is a timeline chart highlighting important events in the lifespan of Pierre and Marie Curie.

VisKonnect answers user questions with a mix of textual reply (left) and explorable visualizations (right). The cutout of the visualization shows a timeline for the two identified scientists; annotations are placed manually for highlighting events that the users might further explore

5 Conclusion and Future Work

Within the presented research, we have empirically investigated in depth how geographic data and related information can be jointly described and linked in textual and visual representations. For creating data-driven stories as integrated reports, we provide authoring support for links that better connect the two representations. While links can be added manually in a flexible and easy-to-use way, our solution also automatically recommends specific links by analyzing the data-driven text. We were able to demonstrate the flexibility and broad applicability of our approach in different reporting solutions: as automatically generated descriptions of statistical maps, as audio guides for immersive visualizations in virtual reality, and as a natural-language interface to a knowledge graph that responds with textual and visual data representations.

With these solutions, we not only guide users through the insights of a data analysis but at the same time invite them to explore the data in depth. Following our overarching goal of looping the volunteered data back to the public, we are now specifically interested in transferring these empirical results, general methods-oriented solutions, and early research prototypes to specific application examples and inviting a broader audience to use them. Ongoing work already targets this, for instance, by investigating a visual reporting solution for personalized, comparative summarizations of hotel reviews.

Our research generally emphasizes that citizen participation in research is not one-directional. Reflecting results back and providing options to explore the data support an even higher level of participation and should be considered in all citizen science projects and data volunteering platforms. Our ideas can be brought together with analysis solutions for volunteered geographic information, and we invite researchers developing such solutions to also investigate this perspective. Still, empirical studies are necessary to explore the effect of visual reporting solutions on the engagement of volunteers and their influence on decision processes.