1 Introduction

Data visualization can be defined as “the representation and presentation of data that exploits our visual perception abilities in order to amplify cognition” [1]. It has the power to transform complex information into stories that inform and inspire action. Data visualization is an effective tool for learning analytics, as it helps to present learners’ data in a way that is easily understandable and intuitive for students, teachers, researchers, and other stakeholders. Through the use of graphs, charts, and other visual aids, it is possible to quickly identify patterns, trends, and relationships within data that may not be immediately apparent through purely numerical data analysis methods.

Visualization in learning analytics has two distinct applications. On the one hand, the use of visual dashboards has become the main vehicle for putting learning analytics into practice. Presenting data in visually appealing and intuitive ways can help promote data literacy among students and other stakeholders, encouraging greater engagement with data and fostering a culture of continuous improvement. On the other hand, learning analytics scientific production heavily relies on data visualization to present research findings in a clear and accessible manner, making it easier for readers from different scholarly backgrounds to understand and act upon research insights. Regardless of the context, the power of visualization in learning analytics lies in its ability to take complex data and turn it into meaningful insights that support better decision-making and drive improvement.

In this chapter, the reader will be guided through the process of generating meaningful and aesthetically pleasing visualizations of different types of datasets using well-known R packages. Relevant plot types will be demonstrated with an explanation of their usage and use cases. Furthermore, learning-related examples will be discussed in detail. For instance, readers will learn how to visualize learners’ logs extracted from learning management systems (LMSs) to show how trace data can be used to track students’ learning activities. Other examples of common research scenarios in which learners’ data are visualized will be illustrated throughout the chapter. In addition to creating compelling plots, readers will also be able to generate professional-looking tables to report descriptive statistics.

2 Visualization in Learning Analytics

Developing visualizations is a challenging task of balancing the cognitive load of users while not compromising on conveying specific insights from data [2]. Visualizations for practice in learning analytics are mostly developed for two main stakeholders: learners and instructors. Depending on the target group, a visualization or a dashboard (i.e., a collection of visualizations depicting multiple indicators) has different goals.

Learner-facing visualizations are meant to make learners aware of their own learning and to provide them with actionable feedback on their learning. Visualizations display learners’ performance on a specific metric and compare it with a reference frame: other peers, desirable learning achievement, or their own progress over time [3]. Sense-making questions triggering reflection can be added to a visualization [4, 5], or some elements of the visualizations can be highlighted and described in words using layered storytelling [6, 7]. Another option is to gamify a dashboard, for example, by using badges [8]. To provide feedback to learners, visualizations can be augmented with links to recommended resources [9], information about specific topics to review to close the achievement gap [6], or explanations of the meaning of visualizations and their implications for the learner [10]. Current learner-facing dashboards mostly show resource use and assessment data [11], compare learners to their peers [12], display descriptive analytics rather than predictive or prescriptive analytics [10], and use self-regulated learning theory as their framework [12, 13]. Some reviews found a positive effect on student outcomes [10], while others reported mixed results [11, 14]. Showing visualizations to learners can change their behavior. For example, social network analysis visualizations have resulted in less cross-group commenting [15], while a visualization comparing individual submission patterns with those of the top 25% of students in a class led to earlier homework submissions [16].

In comparison, the goal of instructor-facing visualizations is to support teachers and their decision-making process by tracking student progress. Two main types can be distinguished. Mirroring or descriptive visualizations provide insights about the learners on an aggregated or an individual level using either descriptive or comparative data. Advising or prescriptive visualizations show not only information about the learners but also alert the instructor to undertake a pedagogical action [17, 18]. Current instructor-facing visualizations mostly display course-wide information about the learners or track group work [14]. These visualizations can support teachers in facilitating student collaboration [19], planning and collecting student feedback on learning activities [20], or obtaining insights into student interactions within an online environment, such as simulations, virtual labs or online games [21, 22]. However, interpreting dashboard information is a challenging task for instructors. Although some teachers use dashboards as complementary sources of information, others act based only on the dashboard information without further investigation [23].

A common point of criticism of learning analytics dashboards is that most of them are not grounded in learning theories [13, 14]. Data-driven evaluations of dashboards focused on dashboard acceptance, usefulness, or usability are more prevalent than pedagogically-focused evaluations [24]. Some approaches were developed to mitigate these issues. The model of user-centered learning analytics systems (MULAS) presents a set of recommendations on four interconnected dimensions: theory, design, evaluation, and feedback, and can be used to guide dashboard development [14]. Another approach is an iterative five-stage Learning Awareness Tools—User eXperience (LATUX) workflow, including problem identification, low-fidelity prototyping, high-fidelity prototyping, pilot studies, and classroom use, that can be used to develop visual analytics [25]. Finally, open learner model research could be used as a source of insights while developing learning analytics visualizations, such as dashboards [9].

3 Generating Plots with ggplot2

In the previous section, we have seen how central visualization is to learning analytics. In the remainder of the chapter, we will learn how to create different types of visualizations that are relevant to different types of data related to teaching and learning. We will mostly rely on ggplot2, a popular data visualization package in R that was developed by Hadley Wickham [26]. It is based on the grammar of graphics [27], which is a systematic way of thinking about and constructing visualizations. The ggplot2 library provides a flexible and intuitive framework for creating a wide range of graphics, from basic scatter plots to complex visualizations with multiple layers. It is known for its ability to produce visually appealing and informative graphics with relatively few lines of code. It enables users to define aesthetics, such as color and size, and add layers, such as points and lines, to create customized and interactive plots. In addition, ggplot2 allows for easy customization of plot features, such as titles, axis labels, and legends.

Overall, ggplot2 is a powerful and versatile tool for data visualization in R, and is widely used by data scientists, statisticians, and researchers in a variety of fields. In this chapter, we will cover the fundamental concepts and techniques of ggplot2, including how to create basic plots, and customize their appearance. We will start by introducing the building blocks of a ggplot2 plot, including aesthetics, layers, and scales. Then, we will create a plot from scratch step by step, showing how to customize its appearance, including how to change theme, colors, and scales. We will then explore the different types of plots that can be created with ggplot2, such as scatter plots, bar charts, and histograms.

Throughout this section, we will use datasets of students’ learning data to demonstrate how to create effective visualizations for learning analytics with ggplot2. Please refer to Chapter 2 of this book [28] to learn more about the datasets used. By the end of this section, you will have a solid foundation in ggplot2 and be able to create basic, yet compelling visualizations to explore your data.

3.1 The ggplot2 Grammar

The ggplot2 library is based on Wilkinson’s grammar of graphics [27]. The main idea is that every plot can be broken down into a set of components, each of which can be customized and combined in a flexible way. These components are:

  • Data: This is the data we want to visualize. It can be in the form of a dataframe, tibble or any other structured data format.

  • Aesthetic mapping (aes): It defines how variables in the data are mapped to visual properties of the plot, such as position, color, shape, size, and transparency.

  • Geometric object (geom): It represents the actual visual elements of the plot, such as points, lines, bars, and polygons.

  • Statistical transformation (stat): It summarizes or transforms the data in some way, such as by computing means, medians, or proportions, or by smoothing or summarizing data, or grouping them into bins.

  • Scale (scale): It maps values in the data to visual properties of the plot, such as color, size, or position.

  • Coordinate system (coord): It defines the spatial or geographic context in which the plot is displayed, such as Cartesian coordinates, polar coordinates, or maps.

  • Facet (facet): It splits the data into subsets and displays each subset in a separate panel. It is often useful for visualizing data with multiple categories or groups.

Through the combination and customization of these components, we can create a wide variety of complex and informative visualizations in ggplot2. The idea behind the graphics grammar is to provide a consistent framework for constructing plots, allowing users to focus on the data and the message they want to convey, rather than on the technical details of the visualization. In the following section, we will create a plot from scratch step by step to become familiar with the most relevant components.
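To make the components above concrete, the following minimal sketch combines all of them in a single call. It uses a small made-up data frame (toy) whose column names follow the chapter's dataset; the Gender facet and the specific scale are illustrative choices, not part of the plot built later in this chapter.

```r
library(ggplot2)

# Hypothetical toy data: column names follow the chapter's dataset,
# but the rows are made up for illustration.
toy <- data.frame(
  AchievingGroup = rep(c("Low achiever", "High achiever"), each = 6),
  ActivityGroup  = rep(c("Low activity", "Moderate activity", "High activity"),
                       times = 4),
  Gender         = rep(c("F", "M"), times = 6)
)

p <- ggplot(toy,                          # data
            aes(x = AchievingGroup,       # aesthetic mapping: position
                fill = ActivityGroup)) +  # aesthetic mapping: fill color
  geom_bar(stat = "count") +              # geom (bars) and stat (count rows)
  scale_fill_viridis_d() +                # scale for the fill aesthetic
  coord_cartesian() +                     # coordinate system
  facet_wrap(~ Gender)                    # facet: one panel per gender

p
```

Each component is added with the + sign, so any of them can be swapped or removed without touching the rest of the call.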

3.2 Creating Your First Plot

We will now create our first plot using ggplot2. Our example deals with a widely studied matter in learning analytics, which is the relationship between online activity and achievement. We will use a bar chart to represent the number of students that have low, moderate and high activity levels in each achievement group (high achievers vs. low achievers). In order to become familiar with the syntax of ggplot2, we will recreate the plot step by step, explaining each of the elements in the plot. Below is the final result we aim at accomplishing (Fig. 1):

Fig. 1
A stacked bar graph compares the number of students and achievement group. A maximum of 30 students are low achievers with low activity, while a minimum of 14 students are high achievers with low activity. Data is approximate.

First plot with ggplot2

3.2.1 Installing ggplot2

Our first step is installing the ggplot2 library. This is usually the first step in any R script that makes use of external libraries.

install.packages("ggplot2")

To import ggplot2 we just need to use the library command and specify the ggplot2 library:

library(ggplot2)

3.2.2 Downloading the Data

Next, we need to import the data that we are going to plot. For this chapter, we are using synthetic data from a blended course on learning analytics. For more details about this dataset, refer to Chap. 2 in this book. The data is in Excel format. We can use the library rio since it makes it easy to read data in several formats. We first install the library:

install.packages("rio")

And import it so we can use its functions:

library(rio)

Now we can download the data using the import function from rio and assign it to a variable named df (short for dataframe).

A 3 line code presents how to define the corrected U R L for the data file and import the data into a data frame.
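Since the URL of the data file is not reproduced here, the sketch below only illustrates the import pattern. It assumes a tiny made-up table written to a temporary CSV file as a stand-in for the real Excel file; with the real data, the path would be replaced by the URL of the file, which import also accepts.

```r
library(rio)

# Hypothetical stand-in for the chapter's file: a tiny table written to a
# temporary CSV. The real dataset is an Excel file hosted online.
toy <- data.frame(
  User           = c("u1", "u2", "u3"),
  AchievingGroup = c("Low achiever", "High achiever", "Low achiever")
)
path <- tempfile(fileext = ".csv")
export(toy, path)

# import() picks the appropriate reader based on the file extension
df <- import(path)
head(df)
```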

We can use the head command to get an idea of what the dataset looks like. To recreate the plot above we will need two columns. The AchievingGroup column indicates whether students are high achievers (top 50%) or low achievers (bottom 50%), according to their final grade. The ActivityGroup column indicates whether students have a high level of activity (top 33%), moderate activity (middle 33%), or low activity (bottom 33%), according to their total number of events in the LMS.

head(df)

# A tibble: 130 x 37
   User      Name   Gender ActivityGroup AchievingGroup Surname Origin Birthdate
   <chr>     <chr>  <chr>  <chr>         <chr>          <chr>   <chr>  <chr>
 1 00a05cc62 Wan    M      Low activity  Low achiever   Tan     Malay~ 12.12.19~
 2 042b07ba1 Daniel M      High activity Low achiever   Tromp   Aruba  28.5.1999
 3 046c35846 Sarah  F      Low activity  Low achiever   Schmit  Luxem~ 25.4.1997
 4 05b604102 Lian   F      Low activity  Low achiever   Abdull~ Yemen  19.11.19~
 5 0604ff3d3 Nina   F      Low activity  Low achiever   Borg    Malta  13.6.1994
 6 077584d71 Moham~ M      High activity High achiever  Gamal   Egypt  13.7.1998
 7 081b100cf Maxim~ M      Moderate act~ High achiever  Gruber  Austr~ 20.12.19~
 8 0857b3d8e Hugo   M      High activity High achiever  Pérez   Spain  22.12.19~
 9 0af619e4b Aylin  F      Low activity  Low achiever   Barat   Kazak~ 14.8.1995
10 0ec99ce96 Polina F      Moderate act~ Low achiever   Novik   Belar~ 9.10.1996
# i 120 more rows
# i 29 more variables: Location <chr>, Employment <chr>,
#   Frequency.Applications <dbl>, Frequency.Assignment <dbl>,
#   Frequency.Course_view <dbl>, Frequency.Feedback <dbl>,
#   Frequency.General <dbl>, Frequency.Group_work <dbl>,
#   Frequency.Instructions <dbl>, Frequency.La_types <dbl>,
#   Frequency.Practicals <dbl>, Frequency.Social <dbl>, ...

3.2.3 Creating the Aesthetic Mapping

Now that we have our data, we can pass it on to ggplot2 as follows:

ggplot(df)

We still do not see anything because we have not selected the type of chart or the variables of the data that we want to plot (Fig. 2). First, let us specify that we want to plot the AchievingGroup column (high vs. low achievers) on the x-axis. Assigning columns of our dataset to different elements of the plot is called constructing an aesthetic mapping. We can do it by calling the aes function from ggplot2, specifying that we want to map the AchievingGroup column to the x-axis, and then passing this call to aes to our plot using the second argument of ggplot:

Fig. 2
A rectangular shaped empty plot.

Empty plot

ggplot(df, aes(x = AchievingGroup))

3.2.4 Add the Geometry Component

We now see that the x-axis has the two possible values of AchievingGroup: “High achiever” and “Low achiever” (Fig. 3). We still need to tell ggplot2 the type of chart we want to use to plot the number of students of each type. To do that we need to add a geometrical (geom) component to our plot in which we specify that we want a bar chart. We do it by adding a + sign after our call to ggplot and calling geom_bar() (the name of the geometry that represents a bar chart).

Fig. 3
Three empty plots. The x axis label the achieving group with the high achiever and the low achiever.

Empty plot with AchievingGroup in x-axis labels

ggplot(df, aes(x = AchievingGroup)) + geom_bar()

Now the plot can actually be called a plot. Notice that we have not specified what we want to plot in the y-axis. When not specified, ggplot2 assumes that we want to use the count of rows (Fig. 4).

Fig. 4
A bar graph compares the number of counts and the achieving group. The high and low achievers have the maximum counts of around 66. Values are approximated.

Basic bar plot showing students by achievement group

We also notice that the bars are in the wrong order. By default, ggplot2 orders the values in an ascending way (alphabetically in the case of text values). If we want to enforce our own order, we need to convert the AchievingGroup column of df into a factor and provide the ordered list of values to the levels argument.

A two line snippet code. It features how the achieving group column is selected from the data frame, converts the selected column into a factor, and specifies the order and levels of the factor.
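The conversion described above can be sketched as follows. The data frame is a made-up stand-in with the chapter's column name, and the order of the levels (low achievers first) is an assumption inferred from the text, which states that the default alphabetical order is not the desired one.

```r
library(ggplot2)

# Hypothetical toy data with the chapter's column name (values are made up)
df <- data.frame(
  AchievingGroup = rep(c("Low achiever", "High achiever"), each = 60)
)

# Convert the column into a factor and enforce our own order through
# the levels argument (assumed order: low achievers first)
df$AchievingGroup <- factor(df$AchievingGroup,
                            levels = c("Low achiever", "High achiever"))

# The x-axis now follows the order of the factor levels
p <- ggplot(df, aes(x = AchievingGroup)) + geom_bar()
p
```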

If we generate our plot again, we see that the bars are now in the order we want them to be (Fig. 5):

Fig. 5
A bar graph compares the number of counts and the achieving group. The high and low achievers have the maximum counts of around 66. Values are approximated.

Basic bar plot showing students by achievement group after transforming the x-axis variable into a factor

ggplot(df, aes(x = AchievingGroup)) + geom_bar()

3.2.5 Adding the Color Scale

We still need to color our bar chart according to students’ activity level. We do that by mapping the fill aesthetic to the ActivityGroup column inside the aes call. When we provide the fill aesthetic, ggplot2 will automatically create the appropriate legend (Fig. 6).

Fig. 6
A stacked bar graph compares the number of counts and the achieving group. A maximum of 30 students are low achievers with low activity, while a minimum of 14 are high achievers with low activity. Data is approximate.

Basic bar plot showing students’ activity level by achievement group and colored by activity level

ggplot(df, aes(x = AchievingGroup, fill = ActivityGroup)) + geom_bar()

Again, we need to change the order of our legend so that it follows the logical semantic order for the activity levels (low-moderate-high):

A 3 line snippet code. It features the activity group column in a data frame converted into a factor with three low, moderate, and high activity levels.

If we generate the plot again, we see now that the legend is in the right order (Fig. 7):

Fig. 7
A stacked bar graph compares the number of counts and the achieving group. A maximum of 30 students are low achievers with low activity, while a minimum of 14 are high achievers with low activity. Data is approximate.

Basic bar plot showing students’ activity level by achievement group and colored by activity level after ordering the legend

ggplot(df, aes(x = AchievingGroup, fill = ActivityGroup)) + geom_bar()

However, the stacks are still not in the right order: the low activity students are at the top of the bar and the high activity students at the bottom, which might be counter-intuitive. To change this, we need to reverse the stacking order using position = position_stack(reverse = TRUE) inside geom_bar (Fig. 8):

Fig. 8
A stacked bar graph compares the number of counts and the achieving group. A maximum of 30 students are low achievers with low activity, while a minimum of 14 are high achievers with low activity. Data is approximate.

Basic bar plot showing students’ activity level by achievement group and colored by activity level after ordering the stacks

A two line snippet code. It features a plot with the columns achieving group and activity group, with the stacking order.
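A minimal sketch of this step, assuming a made-up data frame with the chapter's column names and level labels, could look as follows. It first orders the activity levels and then reverses the stacking order.

```r
library(ggplot2)

# Hypothetical toy data with the chapter's column names (values are made up)
df <- data.frame(
  AchievingGroup = rep(c("Low achiever", "High achiever"), each = 60),
  ActivityGroup  = rep(c("Low activity", "Moderate activity", "High activity"),
                       times = 40)
)

# Order the legend semantically: low, moderate, high
df$ActivityGroup <- factor(
  df$ActivityGroup,
  levels = c("Low activity", "Moderate activity", "High activity")
)

# Reverse the default stacking order of the bar segments
p <- ggplot(df, aes(x = AchievingGroup, fill = ActivityGroup)) +
  geom_bar(position = position_stack(reverse = TRUE))
p
```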

We are getting closer, but the color scheme does not quite match our intended result. To add a color scheme to our plot we need to add a scale layer. In this case, the scale is for the fill property, which is the color of the bars in our chart. There are many ways to specify the color scheme. One option is to use sequential colors from the same palette. For that we add a new layer to our plot named scale_fill_brewer and we pass the palette that we want as an argument. For example, palette number 15 would look like this (Fig. 9):

Fig. 9
A stacked bar graph compares the number of counts and the achieving group. A maximum of 30 students are high achievers with high activity, while a minimum of 14 are low achievers with high activity. Data is approximate.

Bar plot showing students’ activity level by achievement group with sequential color scale

A three line snippet code. It features how to initialize the plot with the achieving group and activity group on the x axis. It creates a stacked bar filled with a set 3 color palette.
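As a hedged sketch of the palette selection described above, the code below passes the palette number to scale_fill_brewer. The data frame is a made-up stand-in with the chapter's column names, and the resulting colors depend on the palette type that scale_fill_brewer uses by default.

```r
library(ggplot2)

# Hypothetical toy data with the chapter's column names (values are made up)
df <- data.frame(
  AchievingGroup = factor(rep(c("Low achiever", "High achiever"), each = 60),
                          levels = c("Low achiever", "High achiever")),
  ActivityGroup  = factor(rep(c("Low activity", "Moderate activity",
                                "High activity"), times = 40),
                          levels = c("Low activity", "Moderate activity",
                                     "High activity"))
)

# The palette can be selected by number (as here) or by name
p <- ggplot(df, aes(x = AchievingGroup, fill = ActivityGroup)) +
  geom_bar(position = position_stack(reverse = TRUE)) +
  scale_fill_brewer(palette = 15)
p
```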

Another option is to provide a manual scale with the colors of our choice. For that we use scale_fill_manual and specify a values vector as an argument. We need to specify as many colors as there are unique elements in our scale. In this case we have three activity groups (low, moderate, and high activity), so we must provide three colors. There are many resources online where you can find palettes or create your own (e.g., Coolors, Adobe Color or Lospec). You have to provide the hexadecimal code of each color or the official color name recognized by R. Below is an example (Fig. 10):

Fig. 10
A stacked bar graph compares the number of counts and the achieving group. A maximum of 30 students are low achievers with low activity, while a minimum of 14 are high achievers with low activity. Data is approximate.

Bar plot showing students’ activity level by achievement group with manual color scale

A three line snippet code. It features how to initialize the plot with the achieving group in the x axis filled with the activity group. It creates a stacked bar filled with three different colors.
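A manual scale along these lines might be written as in the following sketch. The three hexadecimal codes are arbitrary illustrative choices, not the colors used in the figure, and the data frame is a made-up stand-in with the chapter's column names.

```r
library(ggplot2)

# Hypothetical toy data with the chapter's column names (values are made up)
df <- data.frame(
  AchievingGroup = factor(rep(c("Low achiever", "High achiever"), each = 60),
                          levels = c("Low achiever", "High achiever")),
  ActivityGroup  = factor(rep(c("Low activity", "Moderate activity",
                                "High activity"), times = 40),
                          levels = c("Low activity", "Moderate activity",
                                     "High activity"))
)

# Illustrative colors (assumed): one per activity level, in level order.
# R color names such as "gold" would work here as well.
my_colors <- c("#ef476f", "#ffd166", "#06d6a0")

p <- ggplot(df, aes(x = AchievingGroup, fill = ActivityGroup)) +
  geom_bar(position = position_stack(reverse = TRUE)) +
  scale_fill_manual(values = my_colors)
p
```

Because the values vector is unnamed, the colors are assigned to the factor levels in order, so the first color goes to the low activity group.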

Lastly, a very commonly used color scale is Viridis. It is designed so that its colors remain distinguishable for viewers with common forms of color blindness. To use it in our plot we just add scale_fill_viridis_d() (Fig. 11).

Fig. 11
A stacked bar graph compares the number of counts and the achieving group. A maximum of 30 students are low achievers with low activity, while a minimum of 12 are high achievers with low activity. Data is approximate.

Bar plot showing students’ activity level by achievement group with viridis color scale

A three line snippet code. It features how to create a stacked bar plot with the viridis color scale. The x axis labeled achieving group filled with the activity group.

Viridis is the palette we need to replicate our target plot. However, the order of the colors needs to be reversed so that the darkest color represents the highest activity level. We do this by reversing the direction of the palette as follows (Fig. 12):

Fig. 12
A stacked bar graph compares the number of counts and the achieving group. A maximum of 30 students are high achievers with high activity, while a minimum of 12 are low achievers with high activity. Data is approximate.

Bar plot showing students’ activity level by achievement group with viridis color scale

A three line snippet code. It features how to create a stacked bar plot with the viridis color scale in the reversed direction. The x axis labeled achieving group filled with the activity group.
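Reversing the palette can be sketched with the direction argument of scale_fill_viridis_d, as shown below on a made-up data frame with the chapter's column names.

```r
library(ggplot2)

# Hypothetical toy data with the chapter's column names (values are made up)
df <- data.frame(
  AchievingGroup = factor(rep(c("Low achiever", "High achiever"), each = 60),
                          levels = c("Low achiever", "High achiever")),
  ActivityGroup  = factor(rep(c("Low activity", "Moderate activity",
                                "High activity"), times = 40),
                          levels = c("Low activity", "Moderate activity",
                                     "High activity"))
)

# direction = -1 reverses the order in which colors are assigned to levels
p <- ggplot(df, aes(x = AchievingGroup, fill = ActivityGroup)) +
  geom_bar(position = position_stack(reverse = TRUE)) +
  scale_fill_viridis_d(direction = -1)
p
```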

3.2.6 Working with Themes

Now that the geometry and color scheme of the bars look like our initial plot, we notice that there are still some differences. An important one is the grey background of the plot. To change the general appearance of our plot, we may use the ggplot2 themes. Below are some examples (Fig. 13):

Fig. 13
4 stacked bar graphs compare the number of counts and the achieving group. A maximum of 30 students are high achievers with high activity, while a minimum of 12 are low achievers with high activity. Data is approximate.

Bar plot using different themes: theme_dark (top left), theme_classic (top right), theme_void (bottom left), and theme_minimal (bottom right)

A 12 line code. It features how to create 4 stacked bar plots with the viridis color scale in the reversed direction and the themes are dark, classic, void, and minimal. The x axis labeled achieving group filled with the activity group.
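One way to try the four themes without repeating the whole plot is to store the common part in a variable and add each theme to it. The sketch below assumes a made-up data frame with the chapter's column names; the variable names base and plots are illustrative.

```r
library(ggplot2)

# Hypothetical toy data with the chapter's column names (values are made up)
df <- data.frame(
  AchievingGroup = factor(rep(c("Low achiever", "High achiever"), each = 60),
                          levels = c("Low achiever", "High achiever")),
  ActivityGroup  = factor(rep(c("Low activity", "Moderate activity",
                                "High activity"), times = 40),
                          levels = c("Low activity", "Moderate activity",
                                     "High activity"))
)

# Common part of the plot, reused for every theme
base <- ggplot(df, aes(x = AchievingGroup, fill = ActivityGroup)) +
  geom_bar(position = position_stack(reverse = TRUE)) +
  scale_fill_viridis_d(direction = -1)

# One plot per theme
plots <- list(
  dark    = base + theme_dark(),
  classic = base + theme_classic(),
  void    = base + theme_void(),
  minimal = base + theme_minimal()
)

plots$minimal
```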

We have theme_dark with a dark background and border, theme_classic with thick axes and no grid lines, theme_void which is completely empty, and theme_minimal with a minimalistic look. There are more available in the ggplot2 documentation and even more third-party implementations. To recreate our goal plot, we select the theme_minimal. To avoid having to add the theme to all of our plots from now on, we can set a default theme for our whole project by using theme_set:

theme_set(theme_minimal())

Notice how now we get theme_minimal even when we do not specify it in our code (Fig. 14):

Fig. 14
A stacked bar graph compares the number of counts and the achieving group. A maximum of 30 students are high achievers with high activity, while a minimum of 12 are low achievers with high activity. Data is approximate.

Bar plot with theme minimal by default

A three line snippet code. It features how to create a stacked bar plot with the viridis color scale in the reversed direction. The x axis labeled achieving group filled with the activity group.

3.2.7 Changing the Axis Ticks

You may not have noticed that another difference with our goal plot is the ticks on the y-axis. In the goal plot the ticks are placed every 10 students, whereas in our last plot they are placed every 20. Just like we modified the scale of the fill aesthetic when we changed the color of our bars, we can also modify the scale of the y aesthetic to suit our needs. We use the scale_y_continuous layer and try different numbers of breaks (n.breaks) until we find what we like best (Fig. 15):

Fig. 15
3 stacked bar graphs compare the number of counts and the achieving group. A maximum of 30 students are low achievers with low activity, while a minimum of 12 are high achievers with low activity. Data is approximate.

Bar plot with different numbers of y-axis breaks: 15 (left), 3 (middle), and 7 (right)

A 12 line code. It features how to create 3 stacked bar plots with the viridis color scale in the reversed direction and three different breaks at 15, 3, and 7. The x axis labeled achieving group filled with the activity group.
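The comparison of break counts can be sketched as follows, again on a made-up data frame with the chapter's column names. Note that n.breaks is a request rather than a guarantee, so the number of ticks actually drawn may differ slightly from the value passed.

```r
library(ggplot2)

# Hypothetical toy data with the chapter's column names (values are made up)
df <- data.frame(
  AchievingGroup = factor(rep(c("Low achiever", "High achiever"), each = 60),
                          levels = c("Low achiever", "High achiever")),
  ActivityGroup  = factor(rep(c("Low activity", "Moderate activity",
                                "High activity"), times = 40),
                          levels = c("Low activity", "Moderate activity",
                                     "High activity"))
)

base <- ggplot(df, aes(x = AchievingGroup, fill = ActivityGroup)) +
  geom_bar(position = position_stack(reverse = TRUE)) +
  scale_fill_viridis_d(direction = -1)

# Same plot with three different requested numbers of y-axis breaks
p15 <- base + scale_y_continuous(n.breaks = 15)
p3  <- base + scale_y_continuous(n.breaks = 3)
p7  <- base + scale_y_continuous(n.breaks = 7)

p7
```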

We choose 7 breaks to obtain our desired result.

3.2.8 Titles and Labels

Our plot is still missing some slight modifications to be 100% equal to the original one. For instance, the axes’ titles are not the same. To specify the y-axis label, we add a new layer to our plot named ylab and we pass a string with our desired label “Number of students” (Fig. 16):

Fig. 16
A stacked bar graph compares the number of counts and the achieving group. A maximum of 30 students are low achievers with low activity, while a minimum of 12 are high achievers with low activity. Data is approximate.

Bar plot with y-axis label

A five line snippet code. It features how to create a stacked bar plot with the viridis color scale in the reversed direction. The x axis labeled the achieving group filled with the activity group and the y axis labeled the number of students. It features a break at 7 in the y axis.

We do the same for the x-axis using xlab, and for the legend using labs (Fig. 17):

Fig. 17
A stacked bar graph compares the number of counts and the achieving group. A maximum of 30 students are high achievers with high activity, while a minimum of 12 are low achievers with high activity. Data is approximate.

Bar plot with all labels

A seven line code. It features how to create a stacked bar plot with the viridis color scale in the reversed direction. The x axis labeled the achievement group filled with the activity level and the y axis labeled the number of students.
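Putting the three labels together, a minimal sketch could look like this. The data frame is a made-up stand-in with the chapter's column names, and the x-axis and legend titles are taken from the description of the target plot.

```r
library(ggplot2)

# Hypothetical toy data with the chapter's column names (values are made up)
df <- data.frame(
  AchievingGroup = factor(rep(c("Low achiever", "High achiever"), each = 60),
                          levels = c("Low achiever", "High achiever")),
  ActivityGroup  = factor(rep(c("Low activity", "Moderate activity",
                                "High activity"), times = 40),
                          levels = c("Low activity", "Moderate activity",
                                     "High activity"))
)

p <- ggplot(df, aes(x = AchievingGroup, fill = ActivityGroup)) +
  geom_bar(position = position_stack(reverse = TRUE)) +
  scale_fill_viridis_d(direction = -1) +
  scale_y_continuous(n.breaks = 7) +
  xlab("Achievement group") +      # x-axis title
  ylab("Number of students") +     # y-axis title
  labs(fill = "Activity level")    # legend title (named after the aesthetic)
p
```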

More importantly, we are missing the overall title of the plot. To add it we use ggtitle and we pass our intended plot title “Activity level by achievement group”. Keep in mind that, whenever possible, it is better to add a caption to the image rather than a title on the plot. A caption is more accessible for visually impaired users since it is compatible with screen readers. In scientific papers, it is also more common to have a figure caption than a title within the plot. In social media, it is frequent to see the title on the plot, as images are often shared without context. However, many social media platforms allow users to provide an alternative text, which is what screen readers will read as a substitute for the image, and that is also the case in learning analytics dashboards (Fig. 18).

Fig. 18
A stacked bar graph compares the number of counts and the achieving group. A maximum of 30 students are low achievers with low activity, while a minimum of 14 are high achievers with low activity. Data is approximate.

Bar plot with title

An eight line code. It features how to create a stacked bar plot with the viridis color scale in the reversed direction. The x axis labeled the achievement group filled with the activity level, the y axis labeled the number of students, and the title labeled activity level by achievement group.

3.2.9 Other Cosmetic Modifications

Lastly, we need to make some slight modifications to the overall appearance of the plot. We do this through the generic theme function of ggplot2. We first modify the position of the legend by setting legend.position to "bottom". We then increase the size of the axes titles by setting axis.title to element_text(size = 12). Finally, we make the plot title bigger as well and put it in bold by setting plot.title to element_text(size = 15, face = "bold"). With these last changes, we have an exact replica of our original plot (Fig. 19).

Fig. 19
A stacked bar graph labeled activity level by achievement group compares the number of counts and the achievement group. A maximum of 30 students are low achievers with low activity, while a minimum of 12 are high achievers with low activity. Data is approximate.

Bar plot with theme modifications

An 11 line code. It features how to create a stacked bar plot with the viridis color scale in the reversed direction. The x axis labeled the achievement group filled with the activity level, the y axis labeled the number of students, and the title labeled activity level by achievement group.
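The theme settings described above can be sketched as follows, using a made-up data frame with the chapter's column names. The theme_minimal layer is added explicitly here so that the sketch does not depend on a default theme having been set beforehand.

```r
library(ggplot2)

# Hypothetical toy data with the chapter's column names (values are made up)
df <- data.frame(
  AchievingGroup = factor(rep(c("Low achiever", "High achiever"), each = 60),
                          levels = c("Low achiever", "High achiever")),
  ActivityGroup  = factor(rep(c("Low activity", "Moderate activity",
                                "High activity"), times = 40),
                          levels = c("Low activity", "Moderate activity",
                                     "High activity"))
)

p <- ggplot(df, aes(x = AchievingGroup, fill = ActivityGroup)) +
  geom_bar(position = position_stack(reverse = TRUE)) +
  scale_fill_viridis_d(direction = -1) +
  scale_y_continuous(n.breaks = 7) +
  xlab("Achievement group") +
  ylab("Number of students") +
  labs(fill = "Activity level") +
  ggtitle("Activity level by achievement group") +
  theme_minimal() +
  theme(
    legend.position = "bottom",                           # legend below the plot
    axis.title      = element_text(size = 12),            # larger axis titles
    plot.title      = element_text(size = 15, face = "bold")  # larger, bold title
  )
p
```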

3.2.10 Saving the Plot

Since we have obtained the desired result, we may now save it as an image to be able to use it elsewhere. For that, we first need to assign the plot to a variable (e.g., myplot).

A 10 line code. It features how to create a stacked bar plot with the viridis color scale in the reversed direction. The x axis labeled the achievement group filled with the activity level, the y axis labeled the number of students, and the title labeled activity level by achievement group.

We then use ggsave to save the plot to our filesystem. The first argument is the file path where we want to save the plot, including the extension that determines the image format (e.g., "bar.png"). The second argument is the variable that holds our plot (myplot); if we omit it, ggplot2 assumes we want to save the most recent plot. Lastly, we may specify the width, height and resolution (dpi) of our plot. If we are submitting our figure to a scientific journal, we probably need a high-resolution image. If we are using the figure on social media, we do not want the resolution to be so high that the image takes a long time to load.

A single line snippet code presents how to create a plot with a width of 10000, a height of 5000, and dots per inch is 900.
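A minimal sketch of such a call is shown below. The toy plot, the temporary output path, and the use of units = "px" are assumptions for illustration, and the dimensions are deliberately smaller than the ones described above so that the example runs quickly.

```r
library(ggplot2)

# Toy plot standing in for the chapter's myplot (hypothetical data)
myplot <- ggplot(data.frame(group = c("A", "B", "B")), aes(x = group)) +
  geom_bar()

# Write to a temporary folder so the example does not clutter the project
out_file <- file.path(tempdir(), "bar.png")

# width and height are interpreted in pixels here (an assumption);
# the default unit of ggsave is inches
ggsave(out_file, myplot, width = 2000, height = 1000, units = "px", dpi = 300)
```

The image format is inferred from the file extension, so changing "bar.png" to "bar.pdf" would be expected to produce a PDF instead.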

Throughout this section, we have learned how we can create a plot from scratch using only the ggplot2 library and a simple dataset. We have seen the many customization possibilities (theme, scales, titles) that we can achieve using the different plot components without needing to rely on external tools for retouching our final graph. In the next section we will learn about new types of plots that might be more suitable for other types of data and their customization possibilities.

3.3 Types of Plots

The ggplot2 library offers many types of plots (or geoms) that you can choose from to visualize your data in several ways. In this section, we go over some of the most common types and present examples using students’ learning data.

3.3.1 Bar Plot

We have seen how to construct a bar plot in the previous section as an example of how to use ggplot2. But when should we use a bar plot? Bar plots are useful when we want to represent counts or any numerical variable broken down by categories. The y-axis would represent the count (or other continuous numerical variable) and the x-axis would represent the categories. Keep in mind that if the categories follow a natural order, the x-axis should respect it (for example: “Morning”, “Afternoon”, “Evening”; or “Children”, “Adults”, “Elders”). Otherwise, you can just order the x-axis alphabetically or from highest to lowest value in the y-axis (Fig. 20).

Fig. 20
A bar graph compares the number of counts and the achieving group. The high and low achievers have the maximum counts of around 66. Values are approximated.

Basic bar plot of students by achievement group

A two line snippet code. It presents how to create a stacked bar plot with the bars stacked in reverse order and the x axis labeled achieving group.

Remember that you can add a “third dimension” to the plot by using the fill property. This is known as a ‘stacked’ bar chart and helps highlight the proportion of, in this case, students’ activity level (ActivityGroup) (Fig. 21).

Fig. 21
A stacked bar graph compares the number of counts and the achieving group. A maximum of 30 students are low achievers with low activity, while a minimum of 12 are high achievers with low activity. Data is approximate.

Basic bar plot of students by achievement group filled by activity level

A two line snippet code. It presents how to create a stacked bar plot with the bars stacked in reverse order and the x axis labeled achieving group. The color scale for the bars uses the Viridis color palette.

If we care more about the actual numbers than about the proportion of students with each activity level, we can replace the stacked bar chart with one in which each ‘stack’ becomes a bar of its own. This plot is very useful for comparing values among categories. We accomplish this by passing the position argument with the value "dodge" to the geom_bar component (Fig. 22):

Fig. 22
A clustered bar graph compares the count versus the achieving group. Low activity is high at 30 in the low achiever group whereas high activity is high at 27 in the high achiever group. Data is approximate.

Basic bar plot of students by achievement group filled by activity level with position dodge instead of stacked

A 2 line code in R. The ggplot2 code generates a dodged bar plot using the achieving group on the x-axis and the activity group for fill colors. It applies the viridis color scale, with reversed colors.
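The dodged variant might look roughly as follows. This is only a sketch on simulated data: toy_df and the column name AchievingGroup are assumptions, while ActivityGroup and position = "dodge" come from the text.

```r
library(ggplot2)

# Hypothetical stand-in for the chapter's df
set.seed(1)
toy_df <- data.frame(
  AchievingGroup = sample(c("High achiever", "Low achiever"), 130, replace = TRUE),
  ActivityGroup  = sample(c("High activity", "Moderate activity", "Low activity"),
                          130, replace = TRUE)
)

# position = "dodge" places the bars of each fill category side by side
p_dodge <- ggplot(toy_df, aes(x = AchievingGroup, fill = ActivityGroup)) +
  geom_bar(position = "dodge") +
  scale_fill_viridis_d(direction = -1)

p_dodge
```

Removing the position argument restores the default stacked layout, which makes it easy to switch between the two views.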

We can now see that the highest group is represented by the low achievers with low activity, followed by the high achievers with high activity.

3.3.2 Histogram

Histograms allow us to represent the distribution of a single continuous variable. A histogram is essentially a bar chart, but instead of each bar representing the count of a single category, it represents the count of a range of values on the x-axis (known as a bin). Let us, for example, create a histogram of students’ online activity. Specifically, let us look at the distribution of the number of accesses to the main course page.

If we look at our dataset, we can see that the name of the variable that we are interested in is Frequency.Course_view:

head(df)

# A tibble: 130 x 37
   User      Name       Surname  Origin     Gender Birthdate Location Employment
   <chr>     <chr>      <chr>    <chr>      <chr>  <chr>     <chr>    <chr>
 1 00a05cc62 Wan        Tan      Malaysia   M      12.12.19~ Remote   None
 2 042b07ba1 Daniel     Tromp    Aruba      M      28.5.1999 Remote   None
 3 046c35846 Sarah      Schmit   Luxembourg F      25.4.1997 On camp~ None
 4 05b604102 Lian       Abdullah Yemen      F      19.11.19~ On camp~ None
 5 0604ff3d3 Nina       Borg     Malta      F      13.6.1994 On camp~ None
 6 077584d71 Mohamed    Gamal    Egypt      M      13.7.1998 On camp~ Part-time
 7 081b100cf Maximilian Gruber   Austria    M      20.12.19~ On camp~ None
 8 0857b3d8e Hugo       Pérez    Spain      M      22.12.19~ On camp~ None
 9 0af619e4b Aylin      Barat    Kazakhstan F      14.8.1995 On camp~ None
10 0ec99ce96 Polina     Novik    Belarus    F      9.10.1996 On camp~ None
# i 120 more rows
# i 29 more variables: Frequency.Applications <dbl>,
#   Frequency.Assignment <dbl>, Frequency.Course_view <dbl>,
#   Frequency.Feedback <dbl>, Frequency.General <dbl>,
#   Frequency.Group_work <dbl>, Frequency.Instructions <dbl>,
#   Frequency.La_types <dbl>, Frequency.Practicals <dbl>,
#   Frequency.Social <dbl>, Frequency.Ethics <dbl>, Frequency.Theory <dbl>, ...

To create a histogram for this variable we may use the geom_histogram feature of ggplot2. We just pass our dataset and map the Frequency.Course_view variable to the x axis, and we add the geometry geom_histogram (Fig. 23):

Fig. 23
A histogram of count versus frequency total. It plots a fluctuating trend. The frequency at 500 has the highest count of 15. Data is approximate.

Histogram of students’ course page views

ggplot(df, mapping = aes(x = Frequency.Total)) + geom_histogram()

We can provide our own value to the bins argument in geom_histogram to personalize how many bins we want in our plot (Fig. 24):

Fig. 24
A histogram of count versus frequency total. It plots a fluctuating trend. The frequency at 500 has the highest count of 15. Data is approximate.

Histogram of students’ course page view with 50 bins

ggplot(df, mapping = aes(x = Frequency.Total)) +
  geom_histogram(bins = 50)

We can also personalize the color scheme using fill for the background of the bars (Fig. 25):

Fig. 25
A histogram of count versus frequency total. It plots a fluctuating trend. The frequency at 500 has the highest count of 29. Data is approximate.

Histogram of students’ course page view with color, fill and linewidth

A 3 line code in R. This ggplot2 code plots a histogram of the Frequency.Total column from the data frame df, with 20 bins filled in deep pink. It adjusts the number of x-axis breaks to 10.
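A customized histogram along these lines could be sketched as below. The simulated data frame toy_df, the black outline, and the linewidth value are assumptions; the 20 bins, the deep pink fill, and the 10 axis breaks follow the description of the original code.

```r
library(ggplot2)

# Simulated total event counts for 130 hypothetical students
set.seed(1)
toy_df <- data.frame(Frequency.Total = round(rgamma(130, shape = 4, scale = 150)))

p_hist <- ggplot(toy_df, aes(x = Frequency.Total)) +
  geom_histogram(
    bins = 20,            # number of bins
    fill = "deeppink",    # background color of the bars
    color = "black",      # outline color of the bars (assumed)
    linewidth = 0.3       # outline thickness (assumed)
  ) +
  scale_x_continuous(n.breaks = 10)  # ask for roughly 10 x-axis breaks

p_hist
```

The number of bins can change the apparent shape of the distribution considerably, so it is worth trying a few values before settling on one.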

The histogram shows that most students had around 400–500 events, with another peak around 900–1000. Students with more than 1000 events were rare.

3.3.3 Line Plot

Another widely used type of plot is the line plot. Like the histogram, it is appropriate when both the x-axis and the y-axis are numerical and continuous, but it gives us more freedom in what we plot, and it is well suited to plotting several data series together. A very common scenario for a line plot is a timeline in which we wish to visualize the evolution of a certain variable over time. Let us, for instance, plot the students’ daily events in the LMS throughout the course, a common plot in learning analytics dashboards. The dataset that we have been using contains the total count of events per user but not the timestamp of each event, so we need to import the original event data:

A 2 line code in R. The code imports data from an Excel file hosted on GitHub into R using the read_excel function from the readxl package, storing it in the events data frame.

The Events.xlsx file contains all the actions that the students enrolled in this course performed in the LMS (Action) with their corresponding timestamp (timecreated): clicking on a lecture file, viewing the assignment instructions, etc.

head(events)

# A tibble: 95,626 x 7
   Event.context     user  timecreated         Component Event.name Log   Action
   <chr>             <chr> <dttm>              <chr>     <chr>      <chr> <chr>
 1 Assignment: Fina~ 9d74~ 2019-10-26 09:37:12 Assignme~ Course mo~ Assi~ Assig~
 2 Assignment: Fina~ 9148~ 2019-10-26 09:09:34 Assignme~ The statu~ Assi~ Assig~
 3 Assignment: Fina~ 278a~ 2019-10-18 12:05:28 Assignme~ Course mo~ Assi~ Assig~
 4 Assignment: Fina~ 53d6~ 2019-10-19 13:28:37 Assignme~ The statu~ Assi~ Assig~
 5 Assignment: Fina~ aab7~ 2019-10-15 23:38:13 Assignme~ Course mo~ Assi~ Assig~
 6 Assignment: Fina~ 82ed~ 2019-10-18 17:51:43 Assignme~ Course mo~ Assi~ Assig~
 7 Assignment: Fina~ 4178~ 2019-10-18 15:22:56 Assignme~ Course mo~ Assi~ Assig~
 8 Assignment: Fina~ 82ed~ 2019-10-22 13:46:51 Assignme~ The statu~ Assi~ Assig~
 9 Assignment: Fina~ f2e9~ 2019-10-15 14:58:17 Assignme~ Submissio~ Assi~ Assig~
10 Assignment: Fina~ 53d6~ 2019-10-19 13:28:38 Assignme~ Course mo~ Assi~ Assig~
# i 95,616 more rows

Instead of mapping timecreated directly to the x aesthetic, we can plot the number of events per day by mapping as.Date(timecreated) to x and using the geom_line geometry from ggplot2. Notice that, unlike with geom_bar, if we do not provide a y aesthetic and want ggplot2 to count the number of events per day for us, we need to make this explicit by passing the stat argument with the value "count" to geom_line (Fig. 26).

Fig. 26
A line graph of count versus dates. It plots a fluctuating trend with multiple spikes and dips. October 15 has the highest count of 4900. Data is approximate.

Line plot of number of events per day

A single line code in R. This ggplot2 code plots a line graph of event counts over time, with timecreated converted to Date format on the x-axis.

The line plot of students’ events allows us to identify periods of increased activity. We can see that it was low at the very beginning of the course, with some peaks corresponding to the assignment deadlines and one last peak for the final project. When the course is over, activity begins to decrease.

To make our plot more aesthetically pleasing, we can customize the color and line width. We do so by tweaking the color and linewidth properties of the geom_line. We can also fix the axes’ titles as we learned before (Fig. 27). For example:

Fig. 27
A line graph of count versus dates. It plots a fluctuating trend traced in a bright color with multiple spikes and dips. October 15 has the highest count of 4900. Data is approximate.

Line plot of number of events per day with color, linewidth, and custom labels

A 3 line code in R. The ggplot2 code creates a line graph plotting event counts over time, with timecreated converted to Date format on the x-axis. The lines are turquoise, with a line width of 2, and the x and y-axis are labeled accordingly.
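The following sketch illustrates the idea on simulated events. The data frame toy_events, its user and action values, and the date range are invented for illustration; the column names user, timecreated and Action, the stat = "count" argument, the turquoise color and the line width of 2 follow the text.

```r
library(ggplot2)

# Simulated event log: 500 events spread over roughly 60 days
set.seed(1)
start <- as.POSIXct("2019-09-09 00:00:00", tz = "UTC")
toy_events <- data.frame(
  user = sample(c("u1", "u2", "u3"), 500, replace = TRUE),
  timecreated = start + runif(500, 0, 60 * 24 * 3600),
  Action = sample(c("Course_view", "Group_work", "Assignment"), 500, replace = TRUE)
)

# as.Date() drops the time of day, so stat = "count" counts events per day
p_line <- ggplot(toy_events, aes(x = as.Date(timecreated))) +
  geom_line(stat = "count", color = "turquoise", linewidth = 2) +
  xlab("Date") +
  ylab("Number of events")

p_line
```

Days without any event do not appear as zero counts in this approach; the line simply connects the neighboring days that do have events.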

We can also add a point to mark each date using geom_point (Fig. 28):

Fig. 28
A line graph of the number of events versus dates. It plots a fluctuating trend with multiple spikes and dips. Data points are on the spikes and dips. October 15 has more events of 5000. Data is approximate.

Line plot of number of events per day with a point for each day

A 5 line code in R. The ggplot2 code creates a line plot of event counts over time, with timecreated converted to Date format on the x-axis. Lines are turquoise with a width of 1.5, points are purple with a size of 2 and a stroke of 1. The x and y-axis are labeled accordingly.

Besides visualizing the events for all the students of the course, we can pinpoint specific students to follow their progress and offer them personalized support. To do this, we would need to filter our data before handing it over to ggplot2. We can filter the data using the filter function from dplyr, as we learned in Chapter 4 [29]. We first install dplyr if we do not have it:

install.packages("dplyr")

Then, we import it as usual:

library(dplyr)

We can now filter the data and pass it on to ggplot2 (Fig. 29):

Fig. 29
A line graph of the number of events versus dates. It plots a fluctuating trend with multiple spikes and dips. Data points are on the spikes and dips. September 15 has more number of events above 100. Data is approximate.

Line plot of number of events per date for a single student

A 5 line code in R. It filters the event data for a specific user using the pipe operator and filter function. Then, it plots a line graph of event counts over time for that user, with turquoise lines, purple points, and appropriate axis labels.
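A sketch of this filter-then-plot pattern is given below, using simulated events. The data frame toy_events and the user identifier "u1" are hypothetical; the colors, sizes and axis labels follow the descriptions of the original code.

```r
library(dplyr)
library(ggplot2)

# Simulated event log for three hypothetical students
set.seed(1)
start <- as.POSIXct("2019-09-09 00:00:00", tz = "UTC")
toy_events <- data.frame(
  user = sample(c("u1", "u2", "u3"), 500, replace = TRUE),
  timecreated = start + runif(500, 0, 60 * 24 * 3600)
)

# Keep only one student's events, then hand the result to ggplot2
p_student <- toy_events %>%
  filter(user == "u1") %>%
  ggplot(aes(x = as.Date(timecreated))) +
  geom_line(stat = "count", color = "turquoise", linewidth = 1.5) +
  geom_point(stat = "count", color = "purple", size = 2, stroke = 1) +
  xlab("Date") +
  ylab("Number of events")

p_student
```

Note that the pipe is used up to the ggplot call, while the plot components are still combined with the + sign.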

3.3.4 Jitter Plots

In the previous plots we have seen aggregated information for the whole cohort of students as well as information for a single student. However, on some occasions, it is very useful to see the general picture while accounting for possible individual differences. For example, using our original df dataset, we can plot the number of events on the LMS, differentiating between high achievers and low achievers.

One option is to use geom_point to represent each student’s count of events as a single point. To do this, we map the achievement group to the x aesthetic and the total number of events (Frequency.Total) to the y aesthetic (Fig. 30):

Fig. 30
A Jitter plot of the number of events versus the achieving group. It plots data points vertically in low achiever and high achiever groups. The high achiever group has more number of events about 2500. Data is approximate.

Jitter plot of number of events per achievement group using geom_point

A 7 line code in R. This ggplot2 code creates a scatter plot of Frequency.Total against the achieving group from the data frame df. Points are plotted, with appropriate axis labels. The legend is positioned at the bottom, with a smaller text size, and without a title.

However, many points overlap. If we use geom_jitter instead, we take advantage of the horizontal gap between the achievement groups to spread the points and avoid the overlap:

A 7 line code in R. This ggplot2 code creates a jitter plot of Frequency.Total against the achieving group from the data frame df. Jittered points are plotted, with appropriate axis labels. The legend is positioned at the bottom, with a smaller text size, and without a title.
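A minimal sketch of a jitter plot on simulated data is shown below. The data frame toy_df, the column name AchievingGroup, and the width and height values are assumptions; setting height = 0 is a deliberate choice so that only the horizontal position is perturbed and the event counts on the y-axis stay exact.

```r
library(ggplot2)

# Simulated students: achievement group and total number of events
set.seed(1)
toy_df <- data.frame(
  AchievingGroup = rep(c("High achiever", "Low achiever"), each = 65),
  Frequency.Total = round(c(rgamma(65, shape = 5, scale = 180),
                            rgamma(65, shape = 4, scale = 150)))
)

p_jitter <- ggplot(toy_df, aes(x = AchievingGroup, y = Frequency.Total)) +
  geom_jitter(width = 0.2, height = 0) +  # spread points sideways only
  xlab("Achievement group") +
  ylab("Number of events")

p_jitter
```

Because the horizontal displacement is random, the plot looks slightly different on each run unless a seed is set beforehand.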

The plot shows that students that are high achievers generally have a higher number of events than low achievers (Fig. 31).

Fig. 31
A scatter plot of the number of events versus the achieving group. It plots data points randomly throughout the groups. The high achiever group has more number of events about 2500. Data is approximate.

Jitter plot of number of events per achievement group using geom_jitter

3.3.5 Box Plot

When we have too many data points, it is often more useful to visualize summary statistics instead of all the points. Box plots are very useful in summarizing data distributions. We can create a box plot for the number of events per achievement group using geom_boxplot:

A 2 line code in R. This ggplot2 code generates a box plot of Frequency.Total across the different levels of the achieving group from the data frame df. The x-axis represents the achieving groups, and the y-axis represents the number of events.

The lower hinge of each box indicates the 25th percentile, the thick middle line is the median, and the upper hinge is the 75th percentile. The upper whisker extends from the upper hinge to the largest value within 1.5 * IQR (interquartile range) of that hinge, whereas the lower whisker extends from the lower hinge to the smallest value within 1.5 * IQR of it. The points beyond the whiskers represent outliers in the distribution (i.e., values more than 1.5 * IQR away from the hinges). As the jitter plot already hinted, the median number of events is higher in the high achieving group (Fig. 32).
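To make the hinge and whisker rules concrete, the sketch below computes them by hand for a handful of invented event counts and then draws the same values as a box plot. The numbers and the variable names are purely illustrative, and the hand calculation assumes the default quantile method of R.

```r
library(ggplot2)

# Toy event counts for eight hypothetical students (one unusually active)
events <- c(100, 200, 300, 400, 500, 600, 700, 2500)

q1  <- unname(quantile(events, 0.25))  # lower hinge (25th percentile)
q3  <- unname(quantile(events, 0.75))  # upper hinge (75th percentile)
iqr <- q3 - q1                         # interquartile range

# Whiskers reach the most extreme values that lie within 1.5 * IQR of the hinges
upper_whisker <- max(events[events <= q3 + 1.5 * iqr])
lower_whisker <- min(events[events >= q1 - 1.5 * iqr])

# Values beyond the whiskers are drawn as individual points
outliers <- events[events < lower_whisker | events > upper_whisker]

# The same values drawn as a box plot
p_box <- ggplot(data.frame(group = "Toy group", events = events),
                aes(x = group, y = events)) +
  geom_boxplot()

p_box
```

For these toy values the hinges are 275 and 625, so the IQR is 350; the whiskers end at 100 and 700, and the value 2500 is flagged as an outlier.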

Fig. 32
A box whisker graph of the number of events versus the achieving group. The outlier of the high achiever group has more events about 2500. Data is approximate.

Box plot of activity per achievement group

3.3.6 Violin Plot

We can also visualize the distribution of the number of events for each group using violin plots (geom_violin), although these are recommended mainly when we have a large amount of data (Fig. 33):

Fig. 33
A violin plot of the number of events versus the achieving group. The maximum distribution of the high achiever group has more events about 2500. Data is approximate.

Violin plot of total activity per achievement group

A 2 line code in R. The ggplot2 code visualizes the Frequency.Total distribution across the achieving group using a violin plot. The x-axis represents the achieving groups, and the y-axis represents the number of events.

3.3.7 Scatter Plots

The examples we have seen so far have dealt with plotting a single variable alone or divided in categories. Another common scenario is to investigate the direct relationship between two or more variables. Scatter plots are used to visualize how two numerical variables relate to each other. For example, we can use them to see how LMS activity relates to grades (Fig. 34).

Fig. 34
A scatter plot of the final grade versus the number of events. It plots data points that are clustered in the events from 0 to 1500 and it decreases after 1500. Data is approximate.

Scatter plot of number of events vs. final grade

A 3 line code in R. The ggplot2 code creates a scatter plot of the final grade against Frequency.Total from the data frame df. The y-axis represents final grades, and the x-axis represents the number of events.

In the plot, each point represents a student. Points on the right side of the plot represent students with higher activity, while points closer to the left side represent students with lower activity. Likewise, students with low grades are closer to the bottom of the plot, while students with high grades are closer to the top. Overall, we see an upward trend whereby students with higher activity indeed obtain better grades.

We can add another dimension by coloring points according to another variable. For example, we can color the points according to high vs. low achievers (Fig. 35), so we can see where the division between the two groups is:

Fig. 35
A scatter plot of the final grade versus the number of events. It plots data points in two colors for low achievers and high achievers that are clustered in the events from 0 to 1500 and it decreases after 1500. Data is approximate.

Scatter plot of number of events vs. final grade colored by achievement group

A 4 line code in R. The ggplot2 code creates a scatter plot of the final grade against Frequency.Total from the data frame df. Each point is colored according to the achieving group. Axis labels are set for clarity, and the legend title is labeled Achievement to represent the color coding.

We can add yet another dimension by mapping the size aesthetic to another variable, for example Frequency.Group_work which represents the number of events related to group work (Fig. 36).

Fig. 36
A scatter plot of the final grade versus the number of events. It plots data points in two colors for low achievers and high achievers that are clustered in the events from 0 to 1500 and it decreases after 1500. The data points vary by size. Data is approximate.

Scatter plot of number of events vs. final grade colored by achievement group and sized by frequency of group work

A 6 line code in R. The ggplot2 code creates a scatter plot. Frequency.Total is on the x-axis, and the final grade is on the y-axis. Points are filled according to the achieving group and sized based on the frequency of group work. Points are outlined in black.
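A scatter plot with these extra dimensions might be sketched as follows. The simulated data frame toy_df and the column names Final_grade and AchievingGroup are assumptions; Frequency.Total and Frequency.Group_work appear in the chapter's dataset. Shape 21 is used because it is a point shape that has both a fill and an outline color.

```r
library(ggplot2)

# Simulated students with activity counts and a grade loosely tied to activity
set.seed(1)
n <- 130
toy_df <- data.frame(
  Frequency.Total = round(runif(n, 100, 2500)),
  Frequency.Group_work = round(runif(n, 0, 600))
)
toy_df$Final_grade <- pmin(10, 4 + toy_df$Frequency.Total / 500 + rnorm(n))
toy_df$AchievingGroup <- ifelse(
  toy_df$Final_grade >= median(toy_df$Final_grade),
  "High achiever", "Low achiever"
)

p_scatter <- ggplot(toy_df, aes(x = Frequency.Total, y = Final_grade)) +
  geom_point(
    aes(fill = AchievingGroup, size = Frequency.Group_work),
    shape = 21,       # filled circle with a separate outline
    color = "black"   # outline color
  ) +
  xlab("Number of events") +
  ylab("Final grade") +
  labs(fill = "Achievement", size = "Group work events")

p_scatter
```

Mapping more than two or three extra aesthetics tends to make a scatter plot hard to read, so it is usually worth checking whether each added dimension still helps.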

3.4 Advanced Features

3.4.1 Plot Grids

Sometimes, adding all the information in a single plot can be overwhelming and hard to interpret. For example, take a look at the following line plot that shows the number of events per day for each of the course online components (Fig. 37):

Fig. 37
A multiline graph of the number of events versus dates. It plots 12 fluctuating trends for applications, assignments, course view, ethics, feedback, general, group work, instructions, L a types, practicals, social, and theory.

Multiple series line plot

A 5 line code in R. The ggplot2 code creates a line plot of event counts over time, colored by action categories. timecreated is on the x-axis, and the number of events is on the y-axis.

If we had only a few (2–5) lines, the plot would probably look good, but as the number of categories grows, the plot becomes unintelligible. Instead of showing all the lines together, the plot would be easier to understand if each component had its own plot. To do this, instead of mapping the Action column to the color aesthetic, we add a new component to our plot using facet_wrap and we pass the name of the column as a character string ("Action"). We can change the geom_line to a geom_area to enhance the visualization (Fig. 38).

Fig. 38
Twelve area charts are arranged four in a row. Each plot the number of events versus applications, assignments, course view, ethics, feedback, general, group work, instructions, L a types, practicals, social, theory, and dates. The social chart has more events than the others.

Grid of multiple plots

A 5 line code in R. The ggplot2 code creates an area plot showing event counts over time, with timecreated converted to Date format on the x-axis. Each area is filled in turquoise with black outlines. The plot is facet-wrapped by Action, with appropriate axis labels.
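The faceting step can be sketched on simulated events as shown below. The data frame toy_events and its three action types are invented for illustration; the column names, facet_wrap("Action"), geom_area and the colors follow the text.

```r
library(ggplot2)

# Simulated event log with three hypothetical action types
set.seed(1)
start <- as.POSIXct("2019-09-09 00:00:00", tz = "UTC")
toy_events <- data.frame(
  timecreated = start + runif(600, 0, 60 * 24 * 3600),
  Action = sample(c("Course_view", "Group_work", "Assignment"), 600, replace = TRUE)
)

p_grid <- ggplot(toy_events, aes(x = as.Date(timecreated))) +
  geom_area(stat = "count", fill = "turquoise", color = "black") +
  facet_wrap("Action") +   # one panel per value of the Action column
  xlab("Date") +
  ylab("Number of events")

p_grid
```

By default all panels share the same axes, which keeps the components directly comparable; facet_wrap also accepts a scales argument if independent axes are preferred.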

3.4.2 Combining Multiple Plots

In the previous example, we saw how to split a plot into multiple plots. But what happens if we want to combine multiple independent plots? For that purpose, we can use the library patchwork. Install it first if you do not have it already:

install.packages("patchwork")

We import the patchwork library:

library(patchwork)

We have to create the plots that we want to combine and assign each of them to a different variable. We can use previous examples from this chapter and assign them to variables named p1, p2, and p3.

A 14 line code in R. It defines three ggplot2 plots: p1, a scatter plot of Frequency.Total versus final grade; p2, a stacked bar plot of activity group within the achieving group; and p3, a line plot of event counts over time. Each plot is customized with appropriate axis labels.

Now, if we add the three variables together separated by the + sign, the plots will be placed horizontally next to each other (Fig. 39):

Fig. 39
Three graphs arranged horizontally. A. A scatter graph compares the grade versus the total number of events. B. A stacked bar graph compares the number of events versus the achievement group. C. A scatter line graph compares the number of events versus the dates.

Multiple plots stacked horizontally

p1 + p2 + p3

If we use the / character instead, we lay them out vertically (Fig. 40):

Fig. 40
Three graphs arranged vertically. A. A scatter graph compares the grade versus the total number of events. B. A stacked bar graph compares the number of events versus the achievement group. C. A scatter line graph compares the number of events versus the dates.

Multiple plots stacked vertically

p1 / p2 / p3

We can use combinations of both signs and even leave blank spaces as follows (Fig. 41):

Fig. 41
Three graphs are arranged in a grid. A. A scatter graph compares the grade versus the total number of events. B. A stacked bar graph compares the number of events versus the achievement group. C. A scatter line graph compares the number of events versus the dates.

Multiple plots in a grid

(p1 + p2) / (p3 + plot_spacer())
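For readers who want to try these layouts without the chapter's data, the sketch below builds three small stand-in plots and combines them. The data frames and their contents are invented, and only the names p1, p2 and p3 and the patchwork operators come from the text.

```r
library(ggplot2)
library(patchwork)

# Three small stand-in plots built from invented data
set.seed(1)
toy_df <- data.frame(
  events = round(runif(50, 100, 2500)),
  grade  = runif(50, 0, 10),
  group  = sample(c("High achiever", "Low achiever"), 50, replace = TRUE)
)

p1 <- ggplot(toy_df, aes(x = events, y = grade)) + geom_point()
p2 <- ggplot(toy_df, aes(x = group)) + geom_bar()
p3 <- ggplot(toy_df, aes(x = events)) + geom_line(stat = "count")

side_by_side <- p1 + p2 + p3   # plots next to each other
stacked      <- p1 / p2 / p3   # plots on top of each other

# Two plots on the first row; one plot and an empty cell on the second
grid_layout <- (p1 + p2) / (p3 + plot_spacer())

grid_layout
```

The combined object can be saved with ggsave in the same way as a single plot.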

Putting plots side by side can be very useful to compare datasets and discuss the differences. Some publication venues limit the number of figures or pages of their articles, so combining several plots together can be very useful to overcome this limitation.

4 Creating Tables with gt

Earlier in this chapter, we saw multiple types of visualizations that are suitable for diverse scenarios in learning analytics. However, we must not forget the other main way of reporting results or metrics, i.e., tables. When we display a data frame in RStudio, it is presented as a table by default, but we need to be able to extract this table and display it in a dashboard, a report or a scientific article. The library gt can help us with this endeavor. First, install it if you do not have it yet:

install.packages("gt")

We then import it, as usual:

library(gt)

Let us create a table, for example, to display the descriptive statistics of students’ events in the LMS. Using the events dataset, we first count the number of events of each type (Action) per student (user) using group_by and count from dplyr. We then group by Action only and use the summarize function, also from dplyr, to compute the mean and standard deviation of the number of events of each type per student, as we learned in Chapter 5 [30].

A 5 line code in R. It groups events by user and action and counts occurrences. Next, it groups the counts by action, calculating the mean and standard deviation of counts for each action.

# A tibble: 12 x 3
   Action        Mean     SD
   <chr>        <dbl>  <dbl>
 1 Applications  11.1   9.83
 2 Assignment    56.7  34.1
 3 Course_view  195.  152.
 4 Ethics        11.7  10.7
 5 Feedback      24.7  16.2
 6 General       25.7  21.4
 7 Group_work   252.  163.
 8 Instructions  49.8  40.3
 9 La_types      14.5   7.58
10 Practicals    77.1  33.8
11 Social        18.1  19.0
12 Theory        11.1   6.92

Now that we have a data frame with the shape that we like, we can use gt to create the formatted table by simply adding gt to the pipeline of operations (Table 1):

Table 1 Table created with gt
A 5 line code in R. It groups events by user and action, then counts occurrences. Next, it groups the counts by action, calculating the mean and standard deviation of counts for each action. Finally, it displays the summarized data using the gt function from the gt package.

We might add some tweaks by forcing the numerical columns to have two decimals and the first column to be aligned left. You can also apply themes to the table using the library gtExtras (Table 2).

Table 2 Table created with gt with formatting
An 8 line code in R. It groups events, counts occurrences, and then summarizes the counts by action, calculating the mean and standard deviation. It displays the summarized data using the gt function from the gt package, formats numeric values to two decimals, and aligns the columns to the left.
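The whole pipeline, from raw events to a formatted table, might look roughly like the sketch below. The simulated toy_events data frame is an assumption, and fmt_number and cols_align are one plausible way to obtain the two-decimal formatting and left alignment described above, not necessarily the exact calls used for Table 2.

```r
library(dplyr)
library(gt)

# Simulated event log: five hypothetical students, three action types
set.seed(1)
toy_events <- data.frame(
  user   = sample(paste0("u", 1:5), 600, replace = TRUE),
  Action = sample(c("Course_view", "Group_work", "Assignment"), 600, replace = TRUE)
)

toy_table <- toy_events %>%
  group_by(user, Action) %>%
  count() %>%                               # events of each type per student
  group_by(Action) %>%
  summarize(Mean = mean(n), SD = sd(n)) %>% # descriptive statistics per type
  gt() %>%                                  # turn the data frame into a gt table
  fmt_number(columns = c(Mean, SD), decimals = 2) %>%
  cols_align(align = "left", columns = Action)

toy_table
```

Keeping the summary and the formatting in one pipeline means the table is regenerated automatically whenever the underlying data change.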

5 Discussion

The use of data visualization in the context of learning analytics has the potential to greatly enhance our understanding of student behavior and performance. Using tools such as ggplot2, instructors and researchers can create informative and visually appealing plots that highlight important patterns and trends in student activity. These plots provide insights into factors that may be affecting student success, and can therefore inform instructional decisions and help improve student outcomes.

As we have seen throughout the chapter, the appropriate plot often depends on whether we are dealing with categorical or numerical variables and on whether we are plotting one variable or several. Moreover, on some occasions when we need very detailed information, a table might be more informative than a figure. As a summary of the possible visualizations, Table 3 gathers the most commonly used visualization types that we have seen throughout this chapter according to the number of variables and the data type. It also points to the ggplot2 geometry that is used to create each visualization.

Table 3 Summary of the types of visualization for each data type and number of variables

Another way to decide which visualization to use is to think about what kind of story we want to tell or which aspect of our data we want to highlight. Figure 42 shows a flowchart that can help choose the most suitable visualization for our data. There are many other decision charts online made for this purpose. For example, “From Data to Viz” (Footnote 1) leads you to the most appropriate graph for your data, links to the code to build it, and lists common caveats to avoid.

Fig. 42
A process map illustrates the comparison, composition, distribution, and relationship of aspects. It includes horizontal and vertical bar charts, radar charts, line charts, scatter plots, stacked bar charts, stacked area charts, pie charts, violin plots, box plots, jitter plots, and histograms.

Flowchart to decide the most appropriate visualization for your data

Throughout the rest of the book, we will see other forms of data visualization that are inherent to specific learning analytics methods. For example, in Chapter 15 [31], we will learn how to represent students’ discussions in the form of social networks, and in Chapter 10 [32], we will represent students’ sequences of activities using sequence analysis. The foundations learned in this chapter are key to understanding more complex visualizations in learning analytics and are, of course, transferable to other fields as well. We encourage readers to expand their knowledge of data visualization by referring to the recommended resources in the next section. In particular, readers who would like to take their visualizations a step further should consider using shiny (Footnote 2), a web framework for R that allows creating fully interactive web apps for data analysis, such as dashboards.

6 Additional Material