1 Introduction

With its recent advances, DL has gained immense traction in research, industry, and education. As job opportunities related to machine learning are unprecedented, many want to learn about and understand DL technologies.

Fig. 1
figure 1

a Simple input types illustrate the abstract concepts behind RNNs. b An animated, modifiable network architecture shows the data flow. c The prediction visualization shows the network input, prediction, ground truth, and error bars, all animated to communicate their temporal nature. d Text helps explain the training process. e RNNs can be interactively trained. f Training parameters can be interactively explored

While initial progress in DL was mainly possible due to the rise of convolutional neural networks (CNNs), large training data sets, and GPU training in the context of image recognition [1,2,3], other network architectures, such as RNNs, which are able to process sequential data, are becoming increasingly important. At the same time, these more advanced learning architectures are more difficult to comprehend, as they employ concepts that are fundamentally different from classical computer science. Thus, by making the process behind RNNs transparent and easy to understand, research in sequential learning tasks can be accelerated as the field opens up to additional users and contributors.

Along this line, the visualization community has shown how interactive visual explorables can be effective for learning about DL concepts [4,5,6,7]. Since different architectures come with their unique challenges, existing educational applications usually focus on one type of architecture. Unfortunately, the set of existing applications still does not cover RNNs. This is despite the fact that RNNs are widely adopted in tasks such as speech processing [8, 9], handwriting recognition [10], and machine translation [11], among many others. While RNNs are capable of solving such sequential tasks, they also bring their unique architectures and concepts to capture temporal information. As these concepts differ from other network types, RNN education could be of great benefit. To facilitate RNN education, we propose exploRNN, an interactive explorable visualization for RNNs that runs directly in any modern web browser (Fig. 1).

The focus of exploRNN is to make learning about these abstract and complex network types easier, more motivating, and more applicable to real problems. By presenting learning material in a way that is conducive to learning, learners should need fewer unnecessary cognitive resources [12]. These freed-up resources are then available to be used for a deeper understanding of the learning material. We also expect that this would result in a more motivating and joyful learning experience compared to traditional learning methods, such as text. In turn, learners might be willing to spend more time learning, and more learners could be attracted in general, effectively increasing overall knowledge gain. To assess these hypotheses, we compare exploRNN with text-based learning in a between-subjects study with 37 participants. Our evaluation provides insights into when, and under which conditions, visual interactive learning environments can outperform conventional learning material.

Along this line, we make the following contributions:

  • Educational Objectives and Design Challenges for educational RNN visualizations, informing our visualization design.

  • An interactive visualization approach for RNN education, enabling investigation at different levels of granularity.

  • A quantitative, comparative evaluation, investigating the effectiveness of our approach and providing hints for other interactive educational visualizations.

exploRNN can be accessed online at: https://mi-pages.informatik.uni-ulm.de/explornn, contributing to a fast-growing corpus of visualization work in the field of DL. To our knowledge, exploRNN is the first educational visualization interface that is targeted at RNNs, an important and growing class of neural networks (NN). Additionally, our study is the first to compare conventional learning material to a visual, interactive learning environment for DL education.

2 Background: RNNs

We would like to invite readers who want to refresh their knowledge on RNNs to use exploRNN at https://mi-pages.informatik.uni-ulm.de/explornn as an interactive learning experiment. This chapter contains a brief summary of the knowledge that is communicated in exploRNN.

CNNs and multi-layer perceptrons (MLPs), which are used for most classical DL tasks, process data in a feedforward manner. On the contrary, RNNs provide a cyclical architecture in which the output of the previous timestep is used in combination with new inputs to inform the activation of a cell. The main difference in training RNNs is backpropagation through time (BPTT), where the prediction error is propagated not only back through the layers but also within the recurrent connections of the layers.

We visualize the LSTM architecture (cf. Fig. 2). Although this is not the most simple recurrent architecture that exists, it is superior in capturing long-range dependencies, as it mitigates the vanishing gradient problem [13,14,15], and is thus widely used. The main features of an LSTM cell are the gating mechanisms and the cell state. The three gates within an LSTM cell are computed based on the input at time step t, \(x^t\) and the activation of the cell at time step \(t-1\), \(a^{t-1}\) as follows:

\(i^t = sigmoid(W_{ix}x^t + W_{ia}a^{t-1} + b_i)\), what new information to use to update the cell state \(c^t\).

\(f^t = sigmoid(W_{fx}x^t + W_{fa}a^{t-1} + b_f)\), what information in the cell state \(c^t\) can be forgotten.

\(o^t = sigmoid(W_{ox}x^t + W_{oa}a^{t-1} + b_o)\), what part of the cell state \(c^t\) is used to compute the activation.

The cell state at timestep t is then computed as \(c^t = f^t \circ c^{t-1} + i^t \circ tanh(W_{cx}x^t + W_{ca}a^{t-1} + b_c)\), where \(\circ \) is the hadamard product.

While there are other architectures that also use the concept of cell state and modular updating, such as gated recurrent units (GRUs) [16], their underlying idea does not greatly differ. However, since LSTMs were the first to introduce the explained concepts, are more general in their application [17], and often outperform GRUs [18], we focus on conveying the LSTM architecture.

Fig. 2
figure 2

LSTM cell with all operations visualized. The input is added to the output of the previous time step and then used by the three gates for the gate activation

3 Related work

In this section, we first give a brief overview of the explorable explanation literature before elaborating on the corpus of related work in the area of educational visualizations for DL non-experts and RNN visualizations.

3.1 Explorable explanations

Explorable learning environments were invented long before DL raised awareness in the broader public. Their effectiveness was investigated in the line of work by Hundhausen et al.  [19, 20]. Schweitzer and Brown then described design characteristics and an evaluation of active learning settings in classrooms by using visualization [21]. There also exists a line of work on the use of visualization for programming education [22,23,24]. These approaches show how visualization can communicate algorithmic thinking effectively. We combine these ideas with more recent concepts, which have been proposed under the term exploranation in the area of science education [25], where explorable explanations provide benefits for learning.

There are numerous helpful visualizations conveying properties of NN architectures, their functionality, or application scenarios [5, 6, 26]. However, we will focus on educational, explorable visualization approaches that have been proposed for different network types. One of the most prominent interactive educational visualization approaches has been proposed by Smilkov et al.  [7]. In their explainable Tensorflow Playground, one can select the properties of a NN to be trained. They also allow the customization of certain training parameters and deployed their approach as a web-based application. Similarly, in Revacnn, users can explore the activations of a CNN by modifying the network structure and training the network in the browser [27]. While these approaches help teach the most basic concepts of MLPs and CNNs, respectively, more advanced architectures need further, specialized visualizations. Another approach that is closely related to ours, but works on a different type of NNs, namely GANs, is GanLab [4]. They focus on how the generator and discriminator are used adversarially to yield synthetic data that resembles the data distribution it was trained on. However, GANs bring their own visualization challenges, which are fundamentally different from those we found for RNNs. Additionally, neither of these systems was systematically and quantitatively evaluated.

As none of these visualization approaches is designed to help non-experts understand how RNNs function, with their unique concepts of memory and temporal dependence, our aim is to fill this gap in the literature with exploRNN. Additionally, we shed light on the usefulness of such interactive learning environments through our quantitative user study. We specifically examine the difference in learning outcomes across different learning hierarchies [28], the complexity of the learning experience by means of cognitive load [12], and qualitative assessments such as motivation, perceived quality of content, and joy throughout the learning process. We hope that our findings in this area can be an indication for similar learning environments and motivate others to conduct similar experiments.

3.2 RNN visualizations

Apart from educational visualizations, there is another line of visualization work targeted toward investigating RNNs. These approaches are designed mainly for researchers who want to understand and debug their models. An early approach toward visualizing RNNs was proposed by Karpathy et al. , who visualize the activation of RNN cells for expert analysis [29]. Strobelt et al. published LSTMVis, in which the hidden state dynamics of RNNs is investigated [30]. They specifically demonstrate how text understanding can be analyzed through investigating the structure and change of the cell state. They also presented Seq2Seq-Vis, in which sequence-to-sequence models can be probed to reveal errors and learned patterns [31]. Along the same line, Ming et al. introduced RNNVis [32]. They analyze the functionality of individual hidden state units by observing their reaction to specific text segments. With RNNbow, Cashman et al. published a visualization, in which the gradients of RNNs can be analyzed [33]. They attribute the gradient to individual letters in a textual input sequence. This way, researchers can inspect how their models learn. In another approach, Shen et al. proposed visualizations for RNNs [34] operating on multi-dimensional sequence data. Here, developers can inspect hidden unit responses to get insight into different networks. Similarly, Garcia and Weiskopf proposed a visualization for the inspection of hidden states of RNNs [35]. However, all approaches described here are expert tools that help during development. Contrary to this, we aim to convey the general idea of RNNs to novices in this area of DL.

Insights on the effects of using exploration and visualization for learning in general, as well as present educational visualizations for NNs, show how interactive exploration can help a broader audience with access to learning experiences. Therefore, we propose exploRNN, which provides insight into the function of RNNs for users who know the basics of DL, but are laymen in the area of sequential learning. Our evaluation also provides the first comparative analysis of interactive learning environments and classical learning approaches for NN education.

4 Educational objectives

To inform the visualization design of a learning experience, educational objectives are needed, which we defined based on Bloom’s taxonomy [28]. Our target users already understand the fundamental concepts of DL and know about feed-forward NNs. Without this background knowledge, the theory behind those techniques would first need to be explained, which would extend the scope of exploRNN. As our target audience aims to learn the yet unknown concepts of RNNs, we focused on recall , comprehension , and transfer of the learned information. Later, this learning can be applied in the wild to access levels four to six (analyze, evaluate, create) of Bloom’s taxonomy. Formulated on this basis, our educational objectives are:

Justification. Users should know that RNNs, in contrast to other network types, can be used for sequential data. This also includes BPTT, through which RNNs can learn temporal dependencies, which classical feed-forward networks cannot.

Cell Structure. Users should then understand how LSTM cells are built and what functionality their individual components have. Here, the cell gates, as well as the memory element, are of special importance, as they enable the processing of sequential data.

Training Setup. To understand the training process of such networks, users should know important parameters for the setup of RNNs. This includes the network structure, training parameters, and how data are fed to the network.

Task. Finally, to transfer this theoretical knowledge about RNNs to real applications, users should learn about different application areas and data types that can be used with RNNs. In the end, they should be able to describe how RNNs could be used for their own application scenarios.

Similarly to a lecture at a university or a textbook, our learning environment is designed to provide an introduction to RNNs from which interested users can start experimenting with the techniques. Accordingly, our educational objectives not only motivate the importance of RNNs, but are also aimed at providing insights about the input data and related tasks, as well as how the training process and LSTM cells work.

5 Design challenges

Since RNN cells are a special form of NN layers, they open up unique challenges for visualization-based education. We observed both visualization design challenges and technical challenges, which we describe in this section.

5.1 Visualization design challenges

We first discuss the following visualization design challenges that we identified in the context of an interactive learning environment aimed at RNNs and illustrate how they relate to our educational objectives:

Complexity. As mentioned in Sect. 1, one of our central goals is to simplify learning by reducing cognitive load [12]. However, RNNs are typically trained on a large amount of complex data that can be difficult to grasp  [36, 37]. The same holds for network architectures, which are also often too complex to fully comprehend in their entirety  [38, 39]. Consequently, all visualizations must be interpretable and intuitive, but realistic enough to form a compelling use case [40, 41].

Dynamics. An educational system to teach RNN concepts should clarify the dynamics of the sequential data on which these networks operate , as well as the dynamics of the training process . These dynamic processes must be visually communicated, including data type and data processing, both forward (inference) and backward (backpropagation), within the network [42].

Multiscale. RNN structures need to be communicated at different granularities, i.e., network, cell, and cell components . These multiple scales need to be fluidly inspectable, while at the same time, the granularity at which the user currently operates must be communicated [41, 43].

Supervision. In classical learning settings, teacher supervision or other opportunities to seek further information is provided. Contrary to this, exploRNN is designed as a standalone learning environment that does not require external guidance . Thus, supervision has to be substituted by visual guidance [44, 45].

5.2 Technical challenges

Whereas the visualization design challenges are based directly on our educational objectives, the following technical challenges relate to the development of such an interactive, explorable learning environment:

Training Time. Typically, training processes can take up to several days to convergence [46, 47]. However, for an interactive learning experience, waiting days for convergence is not feasible. To provide direct feedback to the user, our networks thus have to converge in minutes instead of hours or days.

Training Steps. Normally, computation is done as fast as possible to minimize the time it takes for the network to converge. However, we want the user to be able to follow the training process and observe individual training steps [44]. Thus, training steps should be separated temporally from the visualization.

Deployment. Modern-day learning is often conducted via online courses, blog posts, or explainable web pages [48]. Although this makes such learning environments accessible to a broad audience, it also limits the technical freedom of such applications [49]. Therefore, educational environments should be deployed to a broad audience, while also providing diverse functionality.

6 Visualization design

In the following, we discuss the visualization design of exploRNN. We explain how we tackle the aforementioned visualization challenges and learning psychology goals while targeting the educational objectives defined in Sect. 4. We first describe the overall visualization concepts we implemented for exploRNN. Then, we elaborate on the different views of our environment in the upcoming subsections.

Scales To show both an overview of the training process and give detailed insight into the computation that is performed within one recurrent cell , we employ an overview first, zoom and filter, then details on demand visualization design, following Shneiderman’s mantra  [43]. Therefore, exploRNN consists of two main views, the network overview (Fig. 1), which displays the training progress on the network scale, and the LSTM cell view (Fig. 6), which allows for a detailed inspection of an LSTM cell. This is in line with our goal of reducing complexity by focusing on individual steps of the learning process rather than presenting everything at once.

Animation Animation has shown to be effective in visualizing data relationships and algorithms  [42, 50, 51]. Furthermore, animation has shown to be associated with fun and excitement [52], which is in line with our goal of making learning more enjoyable . Thus, to visually communicate how the network operates on sequential data, we use animation throughout our visualizations .

Onboarding Novel visualizations and interactive systems can be hard to understand [53]. We designed exploRNN in a way that allows exploration without running the risk of making irreversible errors or needing teacher supervision . However, instructional aids may be important to understand such complex content  [54]. Therefore, we use an onboarding process for our educational environment [55] (cf. Fig. 3). With this process, we aim to further reduce the cognitive load during learning compared to classical learning environments  [56]. For example, the sequential nature of RNNs and the data and tasks that RNNs can be used for are communicated in exploRNN.

Textual explanations In contrast to other learning environments, which show static textual explanations below the main visualization [4, 7], we instead provide such additional information as details on demand  [43]. This way, users can access more information for exactly the components they want to learn more about , while not having to read a lot of text . Our interactively explorable dialog boxes, as shown in Fig. 4, provide information about all important elements of the learning environment. Such dialogs exist for all headings and are anchored through an icon, and for all components of an LSTM cell, which is referred to in our onboarding process.

Fig. 3
figure 3

Onboarding dialogs guide the user through our visualizations, so that no manual introduction is needed, and the user can explore exploRNN on their own. Textual descriptions with highlights provide detailed explanations for individual components. Positioning and arrows reveal associations between dialogs and components

Fig. 4
figure 4

Users can access more detailed explanations for many elements of our visualizations such as training steps, hyper-parameters, and operations in a cell

6.1 Network overview

In the network overview, following the natural reading direction of western cultures, as well as related work on NN architecture visualization [57,58,59], we arrange the network from left to right. On the left, one can see the input type that is currently used to train the network . Centered, we present an abstracted visualization of the network, where users can see how many layers the network contains . On the right, a visualization of the prediction along with the prediction error shows how training progresses . Below these visualizations, information about the training process, controls for the training process, and means to change training parameters are shown .

 Input. To experiment with the network, users can select the input data that is used to train the network from a set of explanatory input types. Data for an interactive and explainable visualization of NNs needs to both explain the network functionality and be easy to understand . Therefore, current educational visualizations use an abstract, two-dimensional distribution of points to train their networks on [4, 7]. With exploRNN, we follow this approach of employing data that is as simple as possible . As RNNs are focused on sequential data, we decided to use periodical mathematical functions and simple text snippets, which map nicely to the sequential nature of RNNs . The functions that can be used as training data in exploRNNare a sinusoidal function, a sawtooth function, an oscillating function, and a composite sinusoidal function and vary in their periodicity. To demonstrate the sequential and dynamic nature of these input functions, we animated those that are in use so that they seem to flow while being input to the network .

In addition to abstract function continuation, we also provide text-based data to train the network on . To allow for interactive training, we employ rather simplistic text samples. These include a recurring character sequence (ababab...) and the well-known text lorem ipsum. Here, we employ a similar design language as with function data, to show that most ideas can be transferred across tasks. By incorporating this text learning scenario, users of exploRNN get to learn and inspect not only abstract problems, but can also experience more realistic scenarios .

 Network. In the network visualization, we want to communicate the recurrent nature of our network , but at the same time, show all layers. Thus, instead of the more frequently used unrolling of RNN layers [60], we add a loop to the layer glyph to symbolize this recurrence. This symbolizes the feedback loop of information output at t back to the input of a cell at \(t+1\) , which enables BPTT.

Fig. 5
figure 5

left: Adding a layer between two existing layers. right: Removing a layer from the network

Our network visualization is animated as data flows through its layers . For the prediction step, dashed lines flow in the forward direction to symbolize forward data processing. For the backpropagation step, they flow backward to resemble the backpropagation of the error . Dashed lines are moving from input to output during the prediction phase, and from output to the first network layer during training, because backpropagation is not applied to the input domain.

Users can also investigate how the training progresses differently depending on the number of recurrent layers in the network . Therefore, layers can be added or removed from the network to be trained, as shown in Fig. 5. As with most explorable components, we explain the implications of this in our introduction, and users can click the next to the network heading.

 Predictions. Commencing the top row of visualizations is the data plot, where we visualize an input sample and the prediction of the network along with its ground truth . Here, multiple data points that are processed by the network one after the other are used to inform a prediction, which is visualized by sliding a gray box over the input data that is currently processed . Additionally, the prediction values slowly build up with animations to clarify that this prediction is building up sequentially . We then use vertical lines in the function plots, which slowly emerge between the prediction and the target value. This vertical line encoding is in analogy with the way we calculate errors, namely, by looking at the prediction values and calculating the difference to the ground truth . The error calculation is embedded temporally between the inference (forward network animation) and backpropagation (backward network animation) phases of the training process . Altogether, through this animated component, while not being interactive itself, users can inspect the results of modifications that have been made in other places .

 Process. According to the typical NN training setup, we divide the training process into three distinct steps: inference (forward), validation (error calculation), and backpropagation (backward). The explanation pane in the lower left of the network overview (see Fig. 1) displays which step is currently executed and provides an explanation of what happens in each of these steps . Through this, the user can learn more about the training dynamics of the network . As described previously, animations in other components complement this dynamic nature of the training process .

 Controls. In the network overview, the network is trained by means of epochs to first provide an overview [43] of the training process . To experiment , users can interact with the control area in the bottom center of our environment . In addition to automatically advancing epochs, which can be controlled with the and buttons, users can also trigger network training for a single epoch, by pressing the button . The training process can always be reset using the button. A back button to go to a previous epoch is not included in exploRNN as this would require saving multiple previous states of the network parameters, which would require significant browser memory. Therefore, and as individual epochs normally do not change the network behavior completely, going back one training epoch during training is not a common operation during neural network training, so we think users will not miss such functionality.

 Training Parameters. To communicate the training setup of an RNN , a trade-off between completeness and simplicity must be made . Thus, we let the user freely choose some training parameters, but employ restrictions for others . As mentioned, users can add or remove individual network layers and use different preset training inputs. In addition, they can change the learning rate, batch size, and noise . The learning rate and batch size allow for exploration of different training settings . Noise can be added to make the training data more realistic, resembling real-world scenarios of imperfect measurements . Parameter changes can be made through sliders, which are positioned on the bottom right. To provide an intuition about the influence of these parameters, we include pretrained models that are loaded during the onboarding steps which explain each individual parameter . Other parameters, such as units per layer or optimization strategies, cannot be changed in our implementation. This trade-off between freedom of exploration and simplicity proved to be effective in educating users about the influence of different training parameters and keeping their cognitive load low .

Hierarchical aggregation can help simplify visualization designs  [61]. Thus, after getting an overview of the network, the user can inspect another hierarchy level in detail, namely individual LSTM cells  [43]. When selecting one of the layers in the network overview a zooming transition onto one of the network layers gradually reveals the structure of an LSTM cell to support the user’s mental image of looking into one of the layers . With this multiscale approach, where users can navigate between views, orientation is important . Therefore, a color coding indicates the current level of detail. This highlight color is blue for the network overview, whereas orange is used for the LSTM cell view. Orange and blue are complementary colors, which makes them easily distinguishable, and they can be differentiated by vision-impaired users [62].

Fig. 6
figure 6

LSTM cell view. Visualization of data flow through the cell. Input to the network and its prediction. Visualization of the training error computation. A gray sliding window indicates which data points are needed to initialize the cell state. Explanations with more detailed steps for the forward direction of data flow. In addition to interactively training the network, users can change the speed at which the visualization for cell steps advances. Just as in the network overview, users can modify training parameters

6.2 LSTM cell view

In the LSTM cell view, we show a detailed visualization of the selected cell on the left, embedded in small pictograms of neighboring cells . On the right of this cell visualization, one can see the input, target, and prediction values of the network, where new points are added as they flow through the cell . Below these visualizations, we show information about the training process, controls for the training process, and means to change training parameters, similar to the network overview .

 Cell Architecture. To convey the functionality of one recurrent unit , we show all computational elements within a cell . Wherever information is combined, we show a icon. Icons for the input (), forget (), and output gates () visualize the gating functionality of an LSTM cell. While all gates that transform the data are depicted with circular icons, the cell state, which represents the saved state of the cell, is represented by a squared icon, illustrating the semantic difference between these components. Each of these cell components can be selected to get a detailed explanation of its functionality, as shown in Fig. 4, marking another level of detail in this visualization .

Data flow is visualized through connecting lines and step-by-step animations of the cell components . Here, elements that process data in the currently visualized computation step are highlighted. As in the network overview, connections moving data are symbolized with dashed lines. Those lines flow forward during inference and backward during backpropagation. This way, we communicate how the hidden state and output of these cells is computed and visualize how the data flows from one to the next operation or gate .

The reverse data flow of BPTT occurs not only once within a cell to backpropagate to the previous layer, but multiple times, for all input time steps . The connections within the cell also clarify that there are two recurrent cycles, one from the output of the cell back to the input, and one within the cell to update the cell state based on its state in the previous iteration . As a result, while other visualizations require unrolling, where time steps are visualized by displaying multiple cells in concatenation [60], we communicate recurrence through step-by-step animation. This removes the ambiguity of stacked layers vs. unrolled cells, which was shown to hinder learners in our first experiments .

 Data Plot. Right of the cell visualization, we show the input data, network prediction, and ground truth all in one graph. In contrast to the network overview, where the network is directly connected to this output graph, the cell is disentangled from this visualization. As the depicted cell typically receives data from previous cells and outputs data to subsequent cells, this visualization, where animation steps are synchronized but not visually connected on both the input and output side, better reflects the network architecture of RNNs . Users can inspect this view during interactively controlled training to see how the network processes input data to make predictions sequentially and how it calculates the training loss in relation to the processing steps within a cell .

 Training Process. The three steps of inference, validation, and backpropagation are just as relevant in the LSTM cell view as they are in the network overview . As the training speed is lower in the LSTM cell view, users can skip part of the data processing and go directly to the processing step of interest . For the forward pass, we add additional explanations for the different processing steps of receiving the layer input, calculating the gate activations, updating the cell state, and outputting the activation value . These explanations are highlighted in synchronization with the processing steps during the forward pass to the data flow in the cell visualization above , allowing users to draw links between the processing steps and the explanations they are interested in .

 Controls. In the LSTM cell view, processing is done by means of compute steps, showing a much more detailed processing pipeline than in the network overview . As in the network overview, the control area can be used to experiment with the training process . The more fine-grained advancement of the visualization is also adopted by the degree to which the animation advances with the forward button, since it only executes the next compute step within a cell . In addition to what can be done in the network overview, the speed of the animations for data processing within this cell can be adjusted, so that users can explore the processing steps at their own pace .

We want to emphasize the buildup of state within a cell based on multiple input time steps. Thus, we show how the network processes these inputs in great detail, whereas we made the animation of the backpropagation take less time than forward processing. As exploRNN is not designed to represent accurate timings anyway, this is our way of visualizing cell processes in detail, while also preserving the ability to observe multiple epochs.

 Training Parameters. Training parameters can be adjusted in the LSTM cell view just as in the network overview, giving the user even more control over the training process and room for experimentation .

To get back to the network overview, one can click anywhere outside of the LSTM cell in Fig. 6.

7 Technical realization

While the visualization design described above has been carefully crafted to meet the educational objectives described in Sect. 4 and the visualization design challenges outlined in Sect. 5.1, its technical realization needs to take into account the technical challenges identified in Sect. 5.2. In this section, we detail how we tackled these technical challenges.

 Training Time. While an RNN for a complex application cannot be trained live in the browser, we simplify the problem in multiple ways. By employing simplistic data sets, the model can converge after relatively few epochs. Additionally, we limit the number of data points that are fed to the network per epoch. Therefore, epochs are processed sufficiently fast for our interactive visualization approach. We also limit the network size to at most seven layers, so that memory consumption and processing time are reduced. In turn, users can see the training progress and get visible prediction improvements after only a few epochs, while one such epoch takes seconds to compute.

 Training Steps. A key aspect of our approach is the decoupling of computation and visualization. Through this decoupling, we are able to show the training steps in an observable manner and enable exploration at the user’s own pace. This helps users understand how the model processes input data and predicts new data points.

 Deployment. To be able to make exploRNN publicly available for a large audience, we implemented it as an interactive browser application using HTML and JavaScript. To train the RNN, we use TensorflowJS [49], for animated visualizations of the trained network, we use P5.js [63]. This way, we are able to provide an interactive, web application that visualizes the training dynamics of RNNs through animation, which is accessible at: https://mi-pages.informatik.uni-ulm.de/explornn/.

8 Limitations

While exploRNN provides a novel environment for learning about RNNs, there is still room for more advanced visualization designs that could be explored in the future. Some of these limitations are explained in the following.

Explanations exploRNN offers a lot of experimentation that is complemented by textual explanations. However, the number of textual explanations that fit into the context of such an educational system, which is designed to provide an overview of this complex topic, is insufficient to fully explain RNNs. For specific questions that are not addressed by our interactive system, we refer to developer documentation and scientific papers.

Drill-Down exploRNN explains RNNs on both a network and a cell level. Apart from seeing the data flow on these granularities and textually describing the components of a cell, visualizing the workings of these components could further benefit the learning experience. However, these components are just mathematical functions to which neither the input nor output have a directly discernable meaning. If we were to, e.g., visualize the internals of a memory component, users could only see matrices of seemingly meaningless numbers flowing through these cells. This would not add any benefit and might even result in confusion about such a visualization. To explain these internal components, novel interpretability techniques might help. Inventing and implementing those is beyond the scope of this work.

Component change To see the influence of individual components in a cell, changing or removing them could be an interesting addition to the workflow. We did not implement this capability for two reasons. First, adding such functionality goes deep into the working of individual cells, which would exceed the learning objective of getting an overview of RNNs and LSTM cells. In turn, we assume that changing single components in individual cells is unlikely to have a measurable and interpretable effect on the overall learning outcome. Second, we would have needed to implement our own DL library for this to be possible, as TensorflowJS has predefined LSTM layer implementations.

Degrees of freedom While users can change some hyperparameters and network settings in our environment, we deliberately do not expose all possible settings to our users. The goal of this limited exploration setting is that users can get an overview of important manipulations to be made, while at the same time not overwhelming our target audience. As for limited explanations, we refer to developer documentation and scientific papers for users that want to explore these details.

Layer types In our implementation, we focused on conveying LSTM cells. However, there are numerous other cell architectures for RNNs. Although we do not think this limited focus hinders learners with understanding RNNs on a high level, it would, nonetheless, be helpful for users specifically interested in certain cell types to include these in exploRNN.

9 User study

To evaluate the effectiveness of our approach, we conducted a user study with 37 participants (30 male, seven female) aged between 21 and 32. Participants were recruited from a DL course at our local university. Our study was a lecture at the end of the course, after students had already learned about feed-forward NNs. Participants were randomly assigned to one of two groups. The exploRNN group received the interactive application, and the text group was presented a text-based learning environment.

To look at learning outcome in detail, our evaluation was divided into the first three distinct, hierarchical cognitive learning goals according to Bloom’s taxonomy [28], namely recall, comprehension, and transfer. We expect higher learning outcomes for the exploRNN group compared to the text group at all three levels. For a closer look at the cognitive processes involved in learning , we also collected data for the three types of cognitive load [12]. Intrinsic cognitive load (ICL) results from the natural complexity that underlies the learning content. Since the difficulty does not differ, there should be no difference between the two groups. Extraneous cognitive load (ECL) is caused by inadequate instruction or presentation of information. Due to the step-by-step presentation of information and the direct connection of textual information and explanatory figures in the exploRNN group, we expect lower ECL for the exploRNNgroup compared to the text group. Lastly, germane cognitive load (GCL) represents the invested learning-related load. GCL is connected to the processes that are needed to construct and automate mental representations [12]. Following the reduced ECL in the exploRNN group, learners should have more free cognitive capacity in working memory to invest in learning-related GCL.

9.1 Hypotheses

Based on the described theory, we hypothesize the following. We expect a higher learning outcome, differentiated by recall , comprehension and transfer in the exploRNN group than in the text group. Furthermore, we expect no differences between the groups for ICL . We expect a lower ECL in the exploRNN group than in the text group . For the GCL, we expect it to be higher in the exploRNN group compared to the text group .

9.2 Method

Our study was split into different steps, which we explain in the order they were presented to the participants.

Prior knowledge. Prior knowledge was measured with seven open-ended questions on NNs and DL techniques (e.g., Name two activation functions used in deep learning.). The questions were developed by a domain expert. All answers were rated by a domain expert, following a predefined solution to ensure objectivity. A total of one point could be scored for each question, with partial points of .5. The maximum score for the prior knowledge test was seven.

Motivation (MSLQ). To assess motivation, the MSLQ [64] subscale for motivation was used. The MSLQ is a self-report questionnaire designed for an academic setting. Motivation was measured with twelve items (e.g., I’m confident I can do an excellent job on the tests in this study.). Learners were instructed to respond as accurately as possible, reflecting their attitudes and behaviors toward the learning module. Responses were given on a 7-point Likert scale ranging from 1 strongly disagree to 7 strongly agree. Cronbach’s Alpha was computed for the internal consistency of the measures [65], and the reliability was \(\alpha = .95\).

Learning material. The learning material was presented either as a text with illustrating figures, formulas, and graphs (see our supplementary material) or through exploRNN(see website). For both conditions, the information was the same. The only difference was the presentation medium and the lack of interactivity in text-based learning.

Learning outcome. To assess learning outcome, a domain expert developed a posttest with 11 open questions on the content of the learning session. To better understand cognitive processes, the posttest was differentiated by the first three levels of Bloom’s taxonomy  [28]. Recall was measured with four questions (e.g., Name the backpropagation algorithm that is used for RNNs.). Comprehension was also measured with four questions (e.g., Explain the meaning of this formula: \(c_t = \hbox {filtered}\_\hbox {input} + \hbox {filtered}\_\hbox {state}\)). The main purpose of these questions was to test how well people could explain and discuss the learning content. Transfer was measured with three questions (e.g., Assuming you have a poem continuation network and training data with poems from the internet. If your network now makes a prediction, how do you determine if it is correct, to calculate the loss?). These questions were designed to test the ability of learners to draw inferences from the learning content and apply it to new contexts. Similarly to the prior knowledge test, each question was rated by a domain expert, following a predefined solution to ensure objectivity. A total of one point could be scored for each question, with partial points of .5. The maximum score for recall and comprehension was four each, and for transfer, it was three, so the total maximum score for the posttest was eleven. We did an ANOVA on the learning outcome to test for statistical significance.

Cognitive load. To measure cognitive load , the differentiated cognitive load questionnaire was used [66]. It contains two items for ICL, three items for ECL, and three items for GCL, all measured as self-reports on a 7-point Likert scale from 1 strongly disagree to 7 strongly agree. To measure internal consistency, the Cronbach Alpha was calculated [65]. Reliability was \(\alpha =.66\) for ICL, \(\alpha =.81\) for ECL, and \(\alpha =.77\) for GCL. As for learning outcome, we tested for significance with an ANOVA.

System usability To quantitatively measure the system usability, the System Usability Scale (SUS) was used [67]. This scale is a self-report measurement consisting of 10 items related to the usability of exploRNN(e.g., I found the system very cumbersome to use.). Responses to the items were given on a 7-point Likert scale ranging from 1 strongly disagree to 7 strongly agree. The internal consistency of this scale was \(\alpha =.74\).

Qualitative questions For an impression of the quality of the learning material, further questions were implemented. Three open-ended questions were related to likeability (What about the learning experience did you like especially, what did you not like?), missing functionality (Was there something you would have liked to do but could not?), and additional comments (Other remarks.) . For liking (I would like to use this learning material.) and recommendation (I would recommend this learning material to my friends.) of the material, two items could be rated on a 5-point Likert scale from very unlikely to very likely. Content (How was the quality of the content?) and design (How was the design of the learning experience?) could be rated with 0–5 stars.

9.3 Results

In the following, we present the results of the user study.

Descriptive data. The analysis of the descriptive statistics showed that subjects of the text group and the exploRNN group did not differ in most of the variables. T tests (variances were equal for all variables) with respect to age (\(p=.33\)), gender (exploRNN  group 21% females, text group 16.67% females) (\(p=.74\)), MSLQ (\(p=.11\)), self-efficacy (MSLQ) (\(p=.16\)) and duration (\(p=.79\)) revealed no significant differences. Motivation (MSLQ) showed a significant t test (\(p=.02\)), indicating that learners in the text group had a significantly higher score. Descriptive data for all variables per condition are given in Table 1.

To analyze whether prior knowledge and MSLQ should be used as covariates, we conducted a correlation analysis with learning outcomes and cognitive load. Significant correlations could be found for GCL with the MSLQ (\(r=.37\), \(p=.024\)) and for the recall of the posttest with the MSLQ (\(r=.44\), \(p=.006\)). Therefore, they were included as a covariate in the following calculations concerning GCL and recall. No other significant correlations for the potential covariates could be found.

Learning outcome. Against our hypotheses, we found a significant difference regarding recall (\(F(1, 34)=3.91\), \(p=.028\), \(\eta _p^2=.103\)) in favor of the text group but not for comprehension (\(F<1\), n.s.) or transfer (\(F<1\), n.s.).

Cognitive load. Contrary to our expectations, we found a significant difference between text and exploRNN group for ICL (\(F(1, 34)=3.85\), \(p=.029\), \(\eta _p^2=.099\)). ECL showed the hypothesized effect: The exploRNN group showed a significant lower score than the text group (\(F(1, 34)=4.33\), \(p=.023\), \(\eta _p^2=.113\)). Against our hypothesis, GCL was not significantly higher in the exploRNN group than in the text group (\(F<1\), n.s.).

System usability. The SUS questionnaire indicates an excellent usability (\(M=84.47, SD=9.45\))[68]. Participants also rated our approach as significantly more likable (\(F(1, 30)=10.52\), \(p=.003\), \(\eta _p^2=.260\)), more recommendable (\(F(1, 30)=11.75\), \(p=.002\), \(\eta _p^2=.281\)), and better designed (\(F(1, 30)=20.711\), \(p<.001\), \(\eta _p^2=.408\)) compared to the learning text.

Qualitative questions. We also got some qualitative feedback in our free-form fields. Participants liked our introduction, which apparently made it easy for them to get started with exploRNN the tutorial was nice and the platform was easy to use. They also mentioned that the graphical support of these textual explanations was helpful for them to form a mental image of the setting: the mental bridge the graphical presentation helped build was helpful in memorizing and understanding. Some participants said that they did not remember specific names, as it was not important during the usage of exploRNN: I later did not remember the name of the algorithm that was used, since it was not important during the usage of the tool. Some participants asked for something similar for other types of networks, e.g., I would like to have similar resources to cover other topics from the basics such as MLPs up to advanced topics and more complicated kinds of networks. As described in Sect. 8, we only support a limited set of interactions, which some participants commented on, e.g., [I missed] changing the activation function of the LSTM gates.

Table 1 Means and standard deviations for our study results separated by groups for all variables

9.4 Discussion

In the following, we will refer back to the hypotheses we had before conducting the study and discuss the study outcome.

Learning outcome We looked at both recall and understanding regarding the learning outcome. In contrast to , we found that learners in the text group showed significantly better results for recall. While we found no significant differences between the groups regarding comprehension and transfer the results are interesting nonetheless. Although not significant, the descriptive statistics indicate that the score for transfer is about 10% higher for exploRNN compared to text . This could be a first indication that learning environments such as exploRNN can help learners build a deeper understanding of the subject compared to learning with classic text. However, significant results and further research are needed to support this statement. Compared to recall, these results may indicate that while learners are better at learning terms by heart (surface learning) when they learn with text than with exploRNN.

A possible explanation for the better recall performance in the text group could be that learners have more experience with text-reading strategies [69]. This might help with the complex terms explained here, as learners might find it easier to find information that was previously presented [70]. Thus, designing ways to easily retrieve previously presented information could be an interesting direction of future research for such interactive explorables. Another possibility is that learning with an interactive environment, which is affirmative and provides information step by step, might infuse the illusion of knowing [71]. Learners may think that after a few experiments in exploRNN,  they have acquired enough knowledge, while there is still much more to explore and learn. In the text group, it is immediately clear to the learner when the text is finished. On the contrary, exploRNNcan require user initiative for information acquisition. As the learning experience was self-controlled, participants could decide for themselves when to go from the learning content to the posttest. Referring to the illusion of knowing, learners might have felt too competent as they experienced this more guided experience. However, even though learners may be able to transfer what they have learned to other application areas, they may be missing important basic terminology that was presented in the learning material to reflect their knowledge gain in a classical learning test.

Cognitive load Against , the exploRNN group showed significantly lower ICL than the text group. The perceived difficulty of the learning material is 12.71% lower for exploRNN even though the text content was identical in both conditions. This suggests that exploRNN makes the learning material appear easier . The reason for this could be that the content is presented step by step in the tutorial of exploRNN, instead of all at once as in the text condition [56].

The results regarding are consistent with our assumptions. With a large effect size, the exploRNN group showed lower ECL than the text group . Therefore, exploRNNreduced the extraneous cognitive load compared to the text content, although the content was the same in both conditions and there were no unnecessary figures or information in the text. Combined with lower ICL, more cognitive capacity remains for GCL, which is important for learning.

For GCL, we did not find the significant difference we hypothesized in . Although ICL and ECL indicate that more cognitive capacity should be free in the exploRNN condition, participants did not invest that cognitive capacity into GCL. This could be because there might already be high investment in GCL in both groups. Another explanation could be that since participants in the text group perceived the learning content as more difficult, they might have invested more GCL to compensate for said difficulty.

System usability The results of the SUS questionnaire indicate that our system is easy to use. This supports our proposed visualization and interaction design and shows that our design choice of creating an interactive environment as a learning experience on RNNs matches our target audience well. Additionally, as participants rated our approach as significantly more likeable, recommendable, and better designed, users are likely to experience more joy, and be more motivated when learning . In combination with the reduced cognitive load exploRNN inflicts on users, we hope that this could result in a larger number of users willing to spend their time learning and more time spent learning per user. In turn, we think that this might outweigh the advantage in some areas of learning outcome with the more familiar text-based learning environment. Further longitudinal studies on NN learning systems need to be done to investigate this in more detail.

Qualitataive feedback. The open feedback forms also provided interesting insights. In general, participants seemed to like exploRNN as a learning experience and even asked for similar interfaces in other contexts. Furthermore, the amount of information and our onboarding process seemed to make exploRNN easily usable. Most of the criticism was related to limitations regarding the freedom of interaction, which we deliberately implemented to provide an overview rather than in-depth technical details. Future work might reveal how both an overview and full depth could be combined in NN learning environments. We also learned why recall might be better in the text condition. As participants mentioned, they did not feel they needed to memorize specific terms to be able to use RNNs. This seems natural, as when programming or using RNNs in the wild, remembering specific terms is also not essential as they can be searched for. On the other hand, transfer tends to be much more important when tasked with solving real problems.

For learning material where transfer is important , as in recurrent networks, our descriptive results suggest that interactive visualizations such as exploRNN might be helpful. Additionally, the lower cognitive load and higher perceived likeability of our interactive environment might result in more learners spending more time with exploRNN. Although we extensively evaluated exploRNN in this study, it remains to be seen whether our insights are transferable to other learning environments. If so, the development of future explorables could be much better informed, indicating what is important, what could be discarded, and what needs to be improved on. While this study provides first insights into the effectiveness of such educational NN exploration environments, we hope that similar evaluations of other applications can broaden our insights.

10 Conclusions and future work

This paper presents the first interactive learning environment specifically designed for RNNs. We propose a new visualization approach for inspecting RNNs where different levels of granularity are employed. To inform our visualization design, we first introduced educational objectives for this setting. Based on these objectives, we identified design challenges, which we tackle in the proposed learning environment. We hope that this process can be helpful for the development of future interactive learning environments.

Subsequently, we tested the learning outcome, cognitive load, and usability of this learning environment in an empirical study. Our study is the first quantitative evaluation of an interactive NN learning interface and, as such, provides helpful insights and directions for future work. The results of the user study indicate that while the raw learning outcome was not improved compared to conventional methods, exploRNN makes learning easier and more fun since cognitive load was significantly reduced by exploRNN, while subjective likeability was significantly improved. Based on these findings, we propose to specifically design interactive NN learning environments so that cognitive load is reduced. For broadly accessible education, exploRNN can be used in any modern browser at https://mi-pages.informatik.uni-ulm.de/explornn/.

As mentioned, more user studies for similar educational explorables could further advance the field and better inform future visualization designs. One such possible research direction is the suspected advantage of the text condition for going back to previously presented information. Here, eye-tracking studies and novel interaction designs might provide new insights. Additionally, it would be interesting to investigate whether such systems indeed lead to more voluntary learning and how that affects learning outcome.