exploRNN: teaching recurrent neural networks through visual exploration

Due to the success and growing job market of deep learning (DL), students and researchers from many areas are interested in learning about DL technologies. Visualization has been used as a modern medium during this learning process. However, despite the fact that sequential data tasks, such as text and function analysis, are at the forefront of DL research, there does not yet exist an educational visualization that covers recurrent neural networks (RNNs). Additionally, the benefits and trade-offs between using visualization environments and conventional learning material for DL have not yet been evaluated. To address these gaps, we propose exploRNN, the first interactively explorable educational visualization for RNNs. exploRNN is accessible online and provides an overview of the training process of RNNs at a coarse level, as well as detailed tools for the inspection of data flow within LSTM cells. In an empirical between-subjects study with 37 participants, we investigate the learning outcomes and cognitive load of exploRNN compared to a classic text-based learning environment. While learners in the text group are ahead in superficial knowledge acquisition, exploRNN is particularly helpful for deeper understanding. Additionally, learning with exploRNN is perceived as significantly easier and causes less extraneous load. In conclusion, for difficult learning material, such as neural networks that require deep understanding, interactive visualizations such as exploRNN can be helpful.


INTRODUCTION
With its recent advances, DL has gained immense traction in research, industry, and education. As job opportunities related to machine learning are unprecedented, many want to learn about and understand DL technologies.
While initial progress in DL was mainly possible due to the rise of convolutional neural networks (CNNs), large training data sets, and GPU training in the context of image recognition [1][2][3], other network architectures, such as RNNs, which are able to process sequential data, are becoming increasingly important. At the same time, these more advanced learning architectures are more difficult to comprehend, as they employ concepts that are fundamentally different from classical computer science. Thus, by making the process behind RNNs transparent and easy to understand, research in sequential learning tasks can be accelerated as the field opens up to additional users and contributors.
Along this line, the visualization community has shown how interactive visual explorables can be effective for learning about DL concepts [4][5][6][7]. Since different architectures come with their unique challenges, existing educational applications usually focus on one type of architecture. Unfortunately, the set of existing applications still does not cover RNNs. This is despite the fact that RNNs are widely adopted in tasks such as speech processing [8,9], handwriting recognition [10], and machine translation [11], among many others. While RNNs are capable of solving such sequential tasks, they also bring their unique architectures and concepts to capture temporal information. As these concepts differ from other network types, RNN education could be of great benefit. To facilitate RNN education, we propose exploRNN, an interactive explorable visualization for RNNs that runs directly in any modern web browser.
The focus of exploRNN is to make learning about these abstract and complex network types easier, more motivating, and more applicable to real problems. By presenting learning material in a way that is conducive to learning, learners should need fewer unnecessary cognitive resources CR [12]. These freed-up resources are then available to be used for a deeper understanding DU of the learning material. We also expect that this would result in a more motivating and joyful MJ learning experience compared to traditional learning methods, such as text. In turn, learners might be willing to spend more time learning, and more learners could be attracted in general, effectively increasing overall knowledge gain. To assess these hypotheses, we compare exploRNN with text-based learning in a between-subjects study with 37 participants. Our evaluation provides insights into when, and under which conditions, visual interactive learning environments can outperform conventional learning material.
Along this line, we make the following contributions:
• Educational Objectives and Design Challenges for educational RNN visualizations, informing our visualization design.
• An interactive visualization approach for RNN education, enabling investigation at different levels of granularity.
• A quantitative, comparative evaluation, investigating the effectiveness of our approach and providing hints for other interactive educational visualizations.
exploRNN can be accessed online at: https://mi-pages.informatik.uni-ulm.de/explornn, contributing to a fast-growing corpus of visualization work in the field of DL. To our knowledge, exploRNN is the first educational visualization interface that is targeted at RNNs, an important and growing class of neural networks (NN). Additionally, our study is the first to compare conventional learning material to a visual, interactive learning environment for DL education.
Fig. 2: LSTM cell with all operations visualized. The input is added to the output of the previous time step and then used by the three gates for the gate activation.

BACKGROUND: RNNS
We would like to invite readers who want to refresh their knowledge on RNNs to use exploRNN at https://mi-pages.informatik.uni-ulm.de/explornn as an interactive learning experiment. This chapter contains a brief summary of the knowledge that is communicated in exploRNN.
CNNs and multi-layer perceptrons (MLPs), which are used for most classical DL tasks, process data in a feedforward manner. On the contrary, RNNs provide a cyclical architecture in which the output of the previous timestep is used in combination with new inputs to inform the activation of a cell. The main difference in training RNNs is backpropagation through time (BPTT), where the prediction error is propagated not only back through the layers but also within the recurrent connections of the layers.
We visualize the LSTM architecture (cf. Figure 2). Although this is not the simplest recurrent architecture that exists, it is superior in capturing long-range dependencies, as it mitigates the vanishing gradient problem [13][14][15], and is thus widely used. The main features of an LSTM cell are the gating mechanisms and the cell state. The three gates within an LSTM cell are computed based on the input at time step $t$, $x_t$, and the activation of the cell at time step $t-1$, $a_{t-1}$, as follows:
Input Gate: $i_t = \mathrm{sigmoid}(W_{ix} x_t + W_{ia} a_{t-1} + b_i)$, what new information to use to update the cell state $c_t$.
Forget Gate: $f_t = \mathrm{sigmoid}(W_{fx} x_t + W_{fa} a_{t-1} + b_f)$, what information in the cell state $c_t$ can be forgotten.
Output Gate: $o_t = \mathrm{sigmoid}(W_{ox} x_t + W_{oa} a_{t-1} + b_o)$, what part of the cell state $c_t$ is used to compute the activation.
The cell state at timestep $t$ is then computed as $c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{cx} x_t + W_{ca} a_{t-1} + b_c)$.
While there are other architectures that also use the concept of cell state and modular updating, such as gated recurrent units (GRUs) [16], their underlying idea does not greatly differ. However, since LSTMs were the first to introduce the explained concepts, are more general in their application [17], and often outperform GRUs [18], we focus on conveying the LSTM architecture.
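To make the gate computations concrete, the following is a minimal sketch of a single LSTM time step in plain Python. It is not the exploRNN implementation. It assumes the standard LSTM formulation, in which the cell state is updated as $c_t = f_t \odot c_{t-1} + i_t \odot \tanh(\cdot)$ and the activation is $a_t = o_t \odot \tanh(c_t)$. The names `lstm_step`, `init_params`, `run_sequence`, and the parameter keys (`W_ix`, `W_ia`, `b_i`, ...) are chosen for illustration only.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W_x, x, W_a, a, b):
    """Compute W_x*x + W_a*a + b for weights and vectors stored as plain lists."""
    return [
        sum(w * v for w, v in zip(row_x, x))
        + sum(w * v for w, v in zip(row_a, a))
        + b_k
        for row_x, row_a, b_k in zip(W_x, W_a, b)
    ]

def lstm_step(x_t, a_prev, c_prev, p):
    """One LSTM time step: three sigmoid gates, then cell state and activation."""
    i_t = [sigmoid(z) for z in affine(p["W_ix"], x_t, p["W_ia"], a_prev, p["b_i"])]
    f_t = [sigmoid(z) for z in affine(p["W_fx"], x_t, p["W_fa"], a_prev, p["b_f"])]
    o_t = [sigmoid(z) for z in affine(p["W_ox"], x_t, p["W_oa"], a_prev, p["b_o"])]
    # Candidate values (assumed standard tanh candidate) that the input gate lets through.
    g_t = [math.tanh(z) for z in affine(p["W_cx"], x_t, p["W_ca"], a_prev, p["b_c"])]
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    a_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return a_t, c_t

def init_params(n_in, n_hidden, seed=0):
    """Hypothetical small random initialization, for demonstration only."""
    rng = random.Random(seed)
    p = {}
    for gate in "ifoc":
        p[f"W_{gate}x"] = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        p[f"W_{gate}a"] = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_hidden)]
        p[f"b_{gate}"] = [0.0] * n_hidden
    return p

def run_sequence(xs, p, n_hidden):
    """Feed a sequence through the cell, carrying activation and cell state forward."""
    a = [0.0] * n_hidden
    c = [0.0] * n_hidden
    for x_t in xs:
        a, c = lstm_step(x_t, a, c, p)
    return a, c
```

Calling `run_sequence([[0.0], [0.5], [1.0]], init_params(1, 3), 3)` carries `a` and `c` across the three inputs, which is the recurrence that distinguishes this cell from a feed-forward layer.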

RELATED WORK
In this section, we first give a brief overview of the explorable explanation literature before elaborating on the corpus of related work in the area of educational visualizations for DL non-experts and RNN visualizations.

Explorable Explanations
Explorable learning environments were invented long before DL raised awareness in the broader public. Their effectiveness was investigated in the line of work by Hundhausen et al. [19,20]. Schweitzer and Brown then described design characteristics and an evaluation of active learning settings in classrooms by using visualization [21]. There also exists a line of work on the use of visualization for programming education [22][23][24]. These approaches show how visualization can communicate algorithmic thinking effectively. We combine these ideas with more recent concepts, which have been proposed under the term exploranation in the area of science education [25], where explorable explanations provide benefits for learning.
There are numerous helpful visualizations conveying properties of NN architectures, their functionality, or application scenarios [5,6,26]. However, we will focus on educational, explorable visualization approaches that have been proposed for different network types. One of the most prominent interactive educational visualization approaches has been proposed by Smilkov et al. [7]. In their explainable Tensorflow Playground, one can select the properties of a NN to be trained. They also allow the customization of certain training parameters and deployed their approach as a web-based application. Similarly, in Revacnn, users can explore the activations of a CNN by modifying the network structure and training the network in the browser [27]. While these approaches help teach the most basic concepts of MLPs and CNNs, respectively, more advanced architectures need further, specialized visualizations. Another approach that is closely related to ours, but works on a different type of NNs, namely, GANs, is GanLab [4]. They focus on how the generator and discriminator are used adversarially to yield synthetic data that resembles the data distribution it was trained on. However, GANs bring their own visualization challenges, which are fundamentally different from those we found for RNNs. Additionally, none of these systems was systematically and quantitatively evaluated.
As none of these visualization approaches is designed to help non-experts understand how RNNs function, with their unique concepts of memory and temporal dependence, our aim is to fill this gap in the literature with exploRNN. Additionally, we shed light on the usefulness of such interactive learning environments through our quantitative user study. We specifically examine the difference in learning outcomes across different learning hierarchies [28], the complexity of the learning experience by means of cognitive load [12], and qualitative assessments such as motivation, perceived quality of content, and joy throughout the learning process. We hope that our findings in this area can be an indication for similar learning environments and motivate others to conduct similar experiments.

RNN Visualizations
Apart from educational visualizations, there is another line of visualization work targeted towards investigating RNNs. These approaches are designed mainly for researchers who want to understand and debug their models. An early approach towards visualizing RNNs was proposed by Karpathy et al., who visualize the activation of RNN cells for expert analysis [29]. Strobelt et al. published LSTMVis, in which the hidden state dynamics of RNNs are investigated [30]. They specifically demonstrate how text understanding can be analyzed through investigating the structure and change of the cell state. They also presented Seq2Seq-Vis, in which sequence-to-sequence models can be probed to reveal errors and learned patterns [31]. Along the same line, Ming et al. introduced RNNVis [32]. They analyze the functionality of individual hidden state units by observing their reaction to specific text segments. With RNNbow, Cashman et al. published a visualization in which the gradients of RNNs can be analyzed [33]. They attribute the gradient to individual letters in a textual input sequence. This way, researchers can inspect how their models learn. In another approach, Shen et al. proposed visualizations for RNNs [34] operating on multidimensional sequence data. Here, developers can inspect hidden unit responses to get insight into different networks. Similarly, Garcia and Weiskopf proposed a visualization for the inspection of hidden states of RNNs [35]. However, all approaches described here are expert tools that help during development. Contrary to this, we aim to convey the general idea of RNNs to novices in this area of DL.
Insights on the effects of using exploration and visualization for learning in general, as well as present educational visualizations for NNs, show how interactive exploration can give a broader audience access to learning experiences. Therefore, we propose exploRNN, which provides insight into the function of RNNs for users who know the basics of DL, but are laymen in the area of sequential learning. Our evaluation also provides the first comparative analysis of interactive learning environments and classical learning approaches for NN education.

EDUCATIONAL OBJECTIVES
To inform the visualization design of a learning experience, educational objectives are needed, which we defined based on Bloom's taxonomy [28]. Our target users already understand the fundamental concepts of DL and know about feed-forward NNs. Without this background knowledge, the theory behind those techniques would first need to be explained, which would extend the scope of exploRNN. As our target audience aims to learn the yet unknown concepts of RNNs, we focused on recall O1&2, comprehension O2&3, and transfer O3&4 of the learned information. Later, this learning can be applied in the wild to access levels four to six (analyze, evaluate, create) of Bloom's taxonomy. Formulated on this basis, our educational objectives are:
O1 Justification. Users should know that RNNs, in contrast to other network types, can be used for sequential data. This also includes BPTT, through which RNNs can learn temporal dependencies, which classical feed-forward networks cannot.
O2 Cell Structure. Users should then understand how LSTM cells are built and what functionality their individual components have. Here, the cell gates, as well as the memory element, are of special importance, as they enable the processing of sequential data.
O3 Training Setup. To understand the training process of such networks, users should know important parameters for the setup of RNNs. This includes the network structure, training parameters, and how data is fed to the network.
O4 Task. Finally, to transfer this theoretical knowledge about RNNs to real applications, users should learn about different application areas and data types that can be used with RNNs. In the end, they should be able to describe how RNNs could be used for their own application scenarios.
Similarly to a lecture at a university or a textbook, our learning environment is designed to provide an introduction to RNNs from which interested users can start experimenting with the techniques.Accordingly, our educational objectives not only motivate the importance of RNNs but are also aimed at providing insights about the input data and related tasks, as well as how the training process and LSTM cells work.

DESIGN CHALLENGES
Since RNN cells are a special form of NN layers, they open up unique challenges for visualization-based education. We observed both visualization design challenges and technical challenges, which we describe in this section.

Visualization Design Challenges
We first discuss the following visualization design challenges that we identified in the context of an interactive learning environment aimed at RNNs and illustrate how they relate to our educational objectives:
V1 Complexity. As mentioned in Section 1, one of our central goals is to simplify learning by reducing cognitive load [12]. However, RNNs are typically trained on a large amount of complex data that can be difficult to grasp O1 O4 [36,37]. The same holds for network architectures, which are also often too complex to fully comprehend in their entirety O3 [38,39]. Consequently, all visualizations must be interpretable and intuitive, but realistic enough to form a compelling use case [40,41].
V2 Dynamics. An educational system to teach RNN concepts should clarify the dynamics of the sequential data on which these networks operate O1, as well as the dynamics of the training process O3. These dynamic processes must be visually communicated, including data type and data processing, both forward (inference) and backward (backpropagation), within the network [42].
V3 Multiscale. RNN structures need to be communicated at different granularities, i.e., network, cell, and cell components O2. These multiple scales need to be fluidly inspectable, while at the same time, the granularity at which the user currently operates must be communicated [41,43].
V4 Supervision. In classical learning settings, teacher supervision or other opportunities to seek further information are provided. Contrary to this, exploRNN is designed as a standalone learning environment that does not require external guidance O1-4. Thus, supervision has to be substituted by visual guidance [44,45].

Technical Challenges
Whereas the visualization design challenges are based directly on our educational objectives, the following technical challenges relate to the development of such an interactive, explorable learning environment:
T1 Training Time. Typically, training processes can take up to several days to converge [46,47]. However, for an interactive learning experience, waiting days for convergence is not feasible. To provide direct feedback to the user, our networks thus have to converge in minutes instead of hours or days.
T2 Training Steps. Normally, computation is done as fast as possible to minimize the time it takes for the network to converge. However, we want the user to be able to follow the training process and observe individual training steps [44]. Thus, training steps should be separated temporally from the visualization.
T3 Deployment. Modern-day learning is often conducted via online courses, blog posts, or explainable web pages [48]. Although this makes such learning environments accessible to a broad audience, it also limits the technical freedom of such applications [49]. Therefore, educational environments should be deployed to a broad audience, while also providing diverse functionality.

VISUALIZATION DESIGN
In the following, we discuss the visualization design of exploRNN. We explain how we tackle the aforementioned visualization challenges Vx and learning psychology goals CR/DU/MJ while targeting the educational objectives Ox defined in Section 4. We first describe the overall visualization concepts we implemented for exploRNN. Then, we elaborate on the different views of our environment in the upcoming subsections.
Scales. To both show an overview of the training process O3 and give detailed insight into the computation that is performed within one recurrent cell O2 DU, we employ an overview first, zoom and filter, then details on demand visualization design, following Shneiderman's mantra V3 [43]. Therefore, exploRNN consists of two main views, the network overview (Figure 1), which displays the training progress on the network scale, and the LSTM cell view (Figure 6), which allows for a detailed inspection of an LSTM cell. This is in line with our goal of reducing complexity CR by focusing on individual steps of the learning process rather than presenting everything at once.
Animation. Animation has been shown to be effective in visualizing data relationships and algorithms O1-3 [42,50,51]. Furthermore, animation has been shown to be associated with fun and excitement [52], which is in line with our goal of making learning more enjoyable MJ. Thus, to visually communicate how the network operates on sequential data, we use animation throughout our visualizations V2.
Onboarding. Novel visualizations and interactive systems can be hard to understand [53]. We designed exploRNN in a way that allows exploration without running the risk of making irreversible errors or needing teacher supervision V4. However, instructional aids may be important to understand such complex content DU [54]. Therefore, we use an onboarding process for our educational environment [55] (cf. Figure 3). With this process, we aim to further reduce the cognitive load during learning compared to classical learning environments CR [56]. For example, the sequential nature of RNNs O1 V2 and the data and tasks that RNNs can be used for O4 are communicated in exploRNN.
Fig. 3: Onboarding dialogs guide the user through our visualizations, so that no manual introduction is needed, and the user can explore exploRNN on their own. Textual descriptions with highlights provide detailed explanations for individual components. Positioning and arrows reveal associations between dialogs and components.
Textual Explanations. In contrast to other learning environments, which show static textual explanations below the main visualization [4,7], we instead provide such additional information as details on demand V3 [43]. This way, users can access more information for exactly the components they want to learn more about DU, while not having to read a lot of text CR. Our interactively explorable dialog boxes, as shown in Figure 4, provide information about all important elements of the learning environment. Such dialogs exist for all headings, where they are anchored through an icon, and for all components of an LSTM cell, which is referred to in our onboarding process.

Network Overview
In the network overview, following the natural reading direction of western cultures, as well as related work on NN architecture visualization [57][58][59], we arrange the network from left to right. On the left, one can see the input type that is currently used to train the network A. Centered, we present an abstracted visualization of the network, where users can see how many layers the network contains B. On the right, a visualization of the prediction along with
A Input. To experiment with the network, users can select the input data that is used to train the network from a set of explanatory input types. Data for an interactive and explainable visualization of NNs needs to both explain the network functionality O1 and be easy to understand V1. Therefore, current educational visualizations use an abstract, two-dimensional distribution of points to train their networks on [4,7]. With exploRNN, we follow this approach of employing data that is as simple as possible CR. As RNNs are focused on sequential data, we decided to use periodical mathematical functions and simple text snippets, which map nicely to the sequential nature of RNNs V2. The functions that can be used as training data in exploRNN are a sinusoidal function, a sawtooth function, an oscillating function, and a composite sinusoidal function, which vary in their periodicity. To demonstrate the sequential and dynamic nature of these input functions, we animated those that are in use so that they seem to flow while being input to the network MJ. In addition to abstract function continuation, we also provide text-based data to train the network on DU. To allow for interactive training, we employ rather simplistic text samples. These include a recurring character sequence (ababab...) and the well-known text lorem ipsum. Here, we employ a similar design language as with function data, to show that most ideas can be transferred across tasks. By incorporating this text learning scenario, users of exploRNN not only get to learn and inspect abstract problems, but can also experience more realistic scenarios O4 MJ.
B Network. In the network visualization, we want to communicate the recurrent nature of our network O1, but at the same time, show all layers. Thus, instead of the more frequently used unrolling of RNN layers [60], we add a loop to the layer glyph to symbolize this recurrence. This symbolizes the feedback loop of information output at t back to the input of a cell at t + 1 V2, which enables BPTT. Our network visualization is animated as data flows through its layers O3 MJ. For the prediction step, dashed lines flow in the forward direction to symbolize forward data processing. For the backpropagation step, they flow backwards to resemble the backpropagation of the error V2. Dashed lines move from input to output during the prediction phase, and from output to the first network layer during training, because backpropagation is not applied to the input domain. Users can also investigate how the training progresses differently depending on the number of recurrent layers in the network O3. Therefore, layers can be added or removed from the network to be trained, as shown in Figure 5 DU. As with most explorable components, we explain the implications of this in our introduction, and users can click the icon next to the network heading.
C Predictions. Commencing the top row of visualizations is the data plot, where we visualize an input sample and the prediction of the network along with its ground truth O3. Here, multiple data points that are processed by the network one after the other are used to inform a prediction, which is visualized by sliding a gray box over the input data that is currently processed V2. Additionally, the prediction values slowly build up with animations to clarify that this prediction is building up sequentially MJ. We then use vertical lines in the function plots, which slowly emerge between the prediction and the target value. This vertical line encoding is in analogy with the way we calculate errors, namely, by looking at the prediction values and calculating the difference to the ground truth DU. The error calculation is embedded temporally between the inference (forward network animation
Hierarchical aggregation can help simplify visualization designs CR [61]. Thus, after getting an overview of the network, the user can inspect another hierarchy level in detail, namely individual LSTM cells O2 [43]. When selecting one of the layers in the network overview, a zooming transition onto one of the network layers gradually reveals the structure of an LSTM cell to support the user's mental image of looking into one of the layers V3 CR. With this multiscale approach, where users can navigate between views, orientation is important CR. Therefore, a color coding indicates the current level of detail. This highlight color is blue for the network overview, whereas orange is used for the LSTM cell view. Orange and blue are complementary colors, which makes them easily distinguishable, and they can be differentiated by vision-impaired users [62].
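The function and text inputs described for the network overview, together with the sliding window and the error lines in the prediction plot, can be sketched in a few lines of Python. This is an illustration under assumptions, not the data pipeline of exploRNN: the sampling step, the window length, and the choice of a single next value as the target are hypothetical, as are the helper names `sample_function`, `sliding_windows`, and `pointwise_errors`.

```python
import math

def sine(t):
    return math.sin(t)

def sawtooth(t, period=2 * math.pi):
    """Ramp from -1 up to (but excluding) 1 once per period."""
    return 2.0 * ((t / period) % 1.0) - 1.0

def sample_function(fn, n_points, step=0.2):
    """Sample a periodic function at equidistant positions (step size is an assumption)."""
    return [fn(k * step) for k in range(n_points)]

def sliding_windows(series, window):
    """Pair each run of `window` consecutive items with the item that follows it.

    Works for lists of numbers and for strings alike, so the same idea
    covers function continuation and character prediction.
    """
    return [
        (series[k:k + window], series[k + window])
        for k in range(len(series) - window)
    ]

def pointwise_errors(predictions, targets):
    """Signed difference between prediction and ground truth, one value per point."""
    return [p - t for p, t in zip(predictions, targets)]

# Function continuation: 10 inputs inform the prediction of the next value.
function_pairs = sliding_windows(sample_function(sine, 50), 10)

# Text continuation on the recurring character sequence.
text_pairs = sliding_windows("ababab", 2)
```

Each entry of `function_pairs` corresponds to one position of the gray box sliding over the input, and each value returned by `pointwise_errors` corresponds to the length of one vertical line between prediction and target.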

LSTM Cell View
In the LSTM cell view, we show a detailed visualization of the selected cell on the left, embedded in small pictograms of neighboring cells G. On the right of this cell visualization, one can see the input, target, and prediction values of the network, where new points are added as they flow through the cell H. Below these visualizations, we show information about the training process, controls for the training process, and means to change training parameters, similar to the network overview I-K.
G Cell Architecture. To convey the functionality of one recurrent unit O2, we show all computational elements within a cell DU. Wherever information is combined, we show an icon. Icons for the input, forget, and output gates visualize the gating functionality of an LSTM cell. While all gates that transform the data are depicted with circular icons, the cell state, which represents the saved state of the cell, is represented by a squared icon, illustrating the semantic difference between these components. Each of these cell components can be selected to get a detailed explanation of its functionality, as shown in Figure 4, marking another level of detail in this visualization V3 CR. Here, elements that process data in the currently visualized computation step are highlighted. As in the network overview, connections moving data are symbolized with dashed lines. Those lines flow forward during inference and backward during backpropagation. This way, we communicate how the hidden state and output of these cells are computed and visualize how the data flows from one operation or gate to the next V2 DU. The reverse data flow of BPTT occurs not only once within a cell to backpropagate to the previous layer, but multiple times, for all input time steps O1. The connections within the cell also clarify that there are two recurrent cycles, one from the output of the cell back to the input, and one within the cell to update the cell state based on its state in the previous iteration DU. As a result, while other visualizations require unrolling, where time steps are visualized by displaying multiple cells in concatenation [60], we communicate recurrence through step-by-step animation. This removes the ambiguity of stacked layers vs. unrolled cells, which was shown to hinder learners in our first experiments V1 CR.
H Data Plot. Right of the cell visualization, we show the input data, network prediction, and ground truth all in one graph. In contrast to the network overview, where the network is directly connected to this output graph, the cell is disentangled from this visualization. As the depicted cell typically receives data from previous cells and outputs data to subsequent cells, this visualization, where animation steps are synchronized but not visually connected on both the input and output side, better reflects the network architecture of RNNs V3 DU. Users can inspect this view during interactively controlled training to see how the network processes input data to make predictions sequentially and how it calculates the training loss in relation to the processing steps within a cell O1&2 V2.
I Training Process. The three steps of inference, validation, and backpropagation are just as relevant in the LSTM cell view as they are in the network overview O3. As the training speed is lower in the LSTM cell view, users can skip part of the data processing and go directly to the processing step of interest V1&2. For the forward pass, we add additional explanations for the different processing steps of receiving the layer input, calculating the gate activations, updating the cell state, and outputting the activation value DU. During the forward pass, these explanations are highlighted in synchronization with the data flow in the cell visualization above MJ, allowing users to draw links between the processing steps and the explanations they are interested in V2&4 CR.
J Controls. In the LSTM cell view, processing is done by means of compute steps, showing a much more detailed processing pipeline than in the network overview V3. As in the network overview, the control area can be used to experiment with the training process O3. The more fine-grained advancement of the visualization is also reflected in the degree to which the animation advances with the forward button, since it only executes the next compute step within a cell DU. In addition to what can be done in the network overview, the speed of the animations for data processing within this cell can be adjusted, so that users can explore the processing steps at their own pace V2. We want to emphasize the buildup of state within a cell based on multiple input time steps. Thus, we show how the network processes these inputs in great detail, whereas we made the animation of the backpropagation take less time than forward processing. As exploRNN is not designed to represent accurate timings anyway, this is our way of visualizing cell processes in detail, while also preserving the ability to observe multiple epochs.
K Training Parameters. Training parameters can be adjusted in the LSTM cell view just as in the network overview, giving the user even more control over the training process and room for experimentation O3.
To get back to the network overview, one can click anywhere outside of the LSTM cell in Figure 6 G V3 .
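The four forward-pass processing steps named for the LSTM cell view (receiving the layer input, calculating the gate activations, updating the cell state, and outputting the activation value) lend themselves to a generator that hands out one compute step at a time, similar in spirit to a forward button. The sketch below is only an illustration for a scalar cell under the standard LSTM update. The function `cell_compute_steps`, the step names, and the weight layout `w[name]["x" | "a" | "b"]` are hypothetical and do not describe how exploRNN is implemented.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cell_compute_steps(x_t, a_prev, c_prev, w):
    """Yield the forward pass of a scalar LSTM cell as named compute steps."""
    # Step 1: the cell receives the layer input and its own previous activation.
    yield "receive_input", {"x_t": x_t, "a_prev": a_prev}

    # Step 2: all three gates are sigmoid units over the same two inputs.
    gates = {
        name: sigmoid(w[name]["x"] * x_t + w[name]["a"] * a_prev + w[name]["b"])
        for name in ("input", "forget", "output")
    }
    yield "gate_activations", dict(gates)

    # Step 3: forget part of the old state, then add the gated candidate value.
    candidate = math.tanh(
        w["cell"]["x"] * x_t + w["cell"]["a"] * a_prev + w["cell"]["b"]
    )
    c_t = gates["forget"] * c_prev + gates["input"] * candidate
    yield "update_cell_state", {"c_t": c_t}

    # Step 4: the output gate decides how much of the state becomes the activation.
    a_t = gates["output"] * math.tanh(c_t)
    yield "output_activation", {"a_t": a_t}
```

A caller would create `steps = cell_compute_steps(...)` once per time step and call `next(steps)` whenever the viewer asks for the next compute step, so the pace of the explanation is set by the viewer rather than by the computation.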

TECHNICAL REALIZATION
While the visualization design described above has been carefully crafted to meet the educational objectives described in Section 4 and the visualization design challenges outlined in subsection 5.1, its technical realization needs to take into account the technical challenges identified in subsection 5.2.In this section, we detail how we tackled these technical challenges.T1 Training Time.While an RNN for a complex application cannot be trained live in the browser, we simplify the problem in multiple ways.By employing simplistic data sets, the model can converge after relatively few epochs.Additionally, we limit the number of data points that are fed to the network per epoch.Therefore, epochs are processed sufficiently fast for our interactive visualization approach.We also limit the network size to at most seven layers, so that memory consumption and processing time are reduced.In turn, users can see the training progress and get visible prediction improvements after only a few epochs, while one such epoch takes seconds to compute.

T2 Training Steps.
A key aspect of our approach is the decoupling of computation and visualization. Through this decoupling, we are able to show the training steps in an observable manner and enable exploration at the user's own pace. This helps users understand how the model processes input data and predicts new data points.

T3 Deployment. To make exploRNN publicly available to a large audience, we implemented it as an interactive browser application using HTML and JavaScript. To train the RNN, we use TensorflowJS [49]; for animated visualizations of the trained network, we use P5.js [63]. This way, we are able to provide an interactive web application that visualizes the training dynamics of RNNs through animation, which is accessible at: https://mi-pages.informatik.uni-ulm.de/explornn/.
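One way to picture the decoupling of computation and visualization is a generator that computes an epoch once and then hands out one record per training phase, so that the consumer (here standing in for the animation) decides when to advance. This is a sketch under assumptions: `training_steps` and the placeholder `train_epoch` callback are hypothetical names, and exploRNN itself is implemented in JavaScript rather than Python.

```python
from typing import Callable, Dict, Iterator


def training_steps(epochs: int, train_epoch: Callable[[int], float]) -> Iterator[Dict]:
    """Yield one record per training phase so a viewer can step at its own pace."""
    for epoch in range(epochs):
        # The actual computation for the epoch runs here, independent of
        # how long the viewer takes to look at each phase afterwards.
        loss = train_epoch(epoch)
        for phase in ("forward", "validation", "backward"):
            yield {"epoch": epoch, "phase": phase, "loss": loss}


# Placeholder for real training: a loss that shrinks with every epoch.
steps = training_steps(2, lambda epoch: 1.0 / (epoch + 1))

first = next(steps)   # the viewer asks for the next step when it is ready
rest = list(steps)    # or plays through all remaining steps automatically
```

Because the generator is lazy, nothing is computed until the consumer requests the next step, which mirrors single-step and automatic playback controls.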

LIMITATIONS
While exploRNN provides a novel environment for learning about RNNs, there is still room for more advanced visualization designs that could be explored in the future. Some of these limitations are explained in the following.

Explanations. exploRNN offers a lot of experimentation that is complemented by textual explanations. However, the number of textual explanations that fit into the context of such an educational system, which is designed to provide an overview of this complex topic, is insufficient to fully explain RNNs. For specific questions that are not addressed by our interactive system, we refer to developer documentation and scientific papers.

Drill-Down. exploRNN explains RNNs on both a network and a cell level. Apart from seeing the data flow on these granularities and textually describing the components of a cell, visualizing the workings of these components could further benefit the learning experience. However, these components are just mathematical functions for which neither the input nor the output has a directly discernible meaning. If we were to, e.g., visualize the internals of a memory component, users could only see matrices of seemingly meaningless numbers flowing through these cells. This would not add any benefit and might even result in confusion about such a visualization. To explain these internal components, novel interpretability techniques might help. Inventing and implementing those is beyond the scope of this work.

Component Change. To see the influence of individual components in a cell, changing or removing them could be an interesting addition to the workflow. We did not implement this capability for two reasons. First, adding such functionality goes deep into the workings of individual cells, which would exceed the learning objective of getting an overview of RNNs and LSTM cells. In turn, we assume that changing single components in individual cells is unlikely to have a measurable and interpretable effect on the overall learning outcome. Second, we would have needed to implement our own DL library for this to be possible, as TensorflowJS has predefined LSTM layer implementations.

Degrees of Freedom. While users can change some hyperparameters and network settings in our environment, we deliberately do not expose all possible settings to our users. The goal of this limited exploration setting is that users can get an overview of important manipulations to be made, while at the same time not overwhelming our target audience. As for limited explanations, we refer to developer documentation and scientific papers for users that want to explore these details.

Layer Types. In our implementation, we focused on conveying LSTM cells. However, there are numerous other cell architectures for RNNs. Although we do not think this limited focus hinders learners in understanding RNNs on a high level, it would, nonetheless, be helpful for users specifically interested in certain cell types to include these in exploRNN.

USER STUDY
To evaluate the effectiveness of our approach, we conducted a user study with 37 participants (30 male, 7 female) aged between 21 and 32. Participants were recruited from a DL course at our local university. Our study was conducted as a lecture at the end of the course, after students had already learned about feed-forward NNs. Participants were randomly assigned to one of two groups. The exploRNN group received the interactive application, and the text group was presented with a text-based learning environment.
To look at learning outcome in detail, our evaluation was divided into the first three distinct, hierarchical cognitive learning goals according to Bloom's taxonomy [28], namely recall, comprehension, and transfer. We expect higher learning outcomes for the exploRNN group compared to the text group at all three levels. For a closer look at the cognitive processes involved in learning CR, we also collected data for the three types of cognitive load [12]. Intrinsic cognitive load (ICL) results from the natural complexity that underlies the learning content. Since the difficulty does not differ, there should be no difference between the two groups. Extraneous cognitive load (ECL) is caused by inadequate instruction or presentation of information. Due to the step-by-step presentation of information and the direct connection of textual information and explanatory figures in the exploRNN group, we expect lower ECL for the exploRNN group compared to the text group. Lastly, germane cognitive load (GCL) represents the invested learning-related load. GCL is connected to the processes that are needed to construct and automate mental representations [12]. Following the reduced ECL in the exploRNN group, learners should have more free cognitive capacity in working memory to invest in learning-related GCL.

Hypotheses
Based on the described theory, we hypothesize the following. We expect a higher learning outcome, differentiated by recall H1, comprehension H2, and transfer H3, in the exploRNN group than in the text group. Furthermore, we expect no differences between the groups for ICL H4. We expect a lower ECL in the exploRNN group than in the text group H5. For the GCL, we expect it to be higher in the exploRNN group compared to the text group H6.

Method
Our study was split into different steps, which we explain in the order they were presented to the participants.

Prior knowledge. Prior knowledge was measured with seven open-ended questions on NNs and DL techniques (e.g., Name two activation functions used in deep learning.). The questions were developed by a domain expert. All answers were rated by a domain expert, following a predefined solution to ensure objectivity. A total of one point could be scored for each question, with partial points of .5. The maximum score for the prior knowledge test was seven.

Motivation (MSLQ).
To assess motivation, the MSLQ [64] subscale for motivation was used. The MSLQ is a self-report questionnaire designed for an academic setting. Motivation was measured with twelve items (e.g., I'm confident I can do an excellent job on the tests in this study.). Learners were instructed to respond as accurately as possible, reflecting their attitudes and behaviors towards the learning module. Responses were given on a 7-point Likert scale ranging from 1 (strongly disagree) to 7 (strongly agree). Cronbach's alpha was computed for the internal consistency of the measures [65]; the reliability was α = .95.

Learning material. The learning material was presented either as a text with illustrating figures, formulas, and graphs (see our supplementary material) or through exploRNN (see website). For both conditions, the information was the same. The only difference was the presentation medium and the lack of interactivity in text-based learning.

Learning outcome. To assess learning outcome, a domain expert developed a post-test with 11 open questions on the content of the learning session. To better understand cognitive processes, the post-test was differentiated by the first three levels of Bloom's taxonomy DU [28]. Recall was measured with four questions (e.g., Name the backpropagation algorithm that is used for RNNs.). Comprehension was also measured with four questions (e.g., Explain the meaning of this formula: c_t = filtered input + filtered state.). The main purpose of these questions was to test how well people could explain and discuss the learning content. Transfer was measured with three questions (e.g., Assuming you have a poem continuation network and training data with poems from the internet. If your network now makes a prediction, how do you determine if it is correct, to calculate the loss?). These questions were designed to test the ability of learners to draw inferences from the learning content and apply it to new contexts. Similarly to the prior knowledge test, each question was rated by a domain expert, following a predefined solution to ensure objectivity. A total of one point could be scored for each question, with partial points of .5. The maximum score for recall and comprehension was four each, and for transfer, it was three, so the total maximum score for the post-test was eleven. We conducted an ANOVA on the learning outcome to test for statistical significance.

Cognitive load. To measure cognitive load CR, the differentiated cognitive load questionnaire was used [66]. It contains two items for ICL, three items for ECL, and three items for GCL, all measured as self-reports on a 7-point Likert scale from 1 (strongly disagree) to 7 (strongly agree). To measure internal consistency, Cronbach's alpha was calculated [65]. Reliability was α = .66 for ICL, α = .81 for ECL, and α = .77 for GCL. As for learning outcome, we tested for significance with an ANOVA.

System usability. To quantitatively measure the system usability, the System Usability Scale (SUS) was used [67]. This scale is a self-report measurement consisting of 10 items related to the usability of exploRNN (e.g., I found the system very cumbersome to use.). Responses to the items were given on a 7-point Likert scale ranging from 1 (strongly disagree) to 7 (strongly agree). The internal consistency of this scale was α = .74.

Qualitative questions. For an impression of the quality of the learning material, further questions were implemented. Three open-ended questions were related to likeability (What about the learning experience did you like especially, what did you not like?), missing functionality (Was there something you would have liked to do but could not?), and additional comments (Other remarks.) MJ. For liking (I would like to use this learning material.) and recommendation (I would recommend this learning material to my friends.) of the material, two items could be rated on a 5-point Likert scale from very unlikely to very likely. Content (How was the quality of the content?) and design (How was the design of the learning experience?) could be rated with 0 to 5 stars.
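The internal-consistency values reported in this section are Cronbach's alpha coefficients. As a minimal sketch of how such a coefficient is computed from item responses, the following uses the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total scores). The function name `cronbach_alpha` and the example responses are made up for illustration and are unrelated to the study data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of responses
    given by the same respondents in the same order."""
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Total score per respondent across all items.
    totals = [sum(responses) for responses in zip(*items)]
    sum_item_variances = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


# Made-up responses of four respondents to three 7-point Likert items.
example_items = [
    [5, 6, 3, 7],
    [4, 6, 2, 7],
    [5, 7, 3, 6],
]
alpha = cronbach_alpha(example_items)
```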

Results
In the following, we present the results of the user study.

Descriptive data. The analysis of the descriptive statistics showed that subjects of the text group and the exploRNN group did not differ in most of the variables. T-tests (variances were equal for all variables) with respect to age (p = .33), gender (exploRNN group 21% females, text group 16.67% females) (p = .74), MSLQ (p = .11), self-efficacy (MSLQ) (p = .16), and duration (p = .79) revealed no significant differences. Motivation (MSLQ) showed a significant t-test (p = .02), indicating that learners in the text group had a significantly higher score. Descriptive data for all variables per condition can be seen in Table 1. To analyze whether prior knowledge and MSLQ should be used as covariates, we conducted a correlation analysis with learning outcomes and cognitive load. Significant correlations could be found for GCL with the MSLQ (r = .37, p = .024) and for the recall of the post-test with the MSLQ (r = .44, p = .006). Therefore, the MSLQ was included as a covariate in the following calculations concerning GCL and recall. No other significant correlations for the potential covariates could be found.

Learning outcome. Against our hypotheses, we found a significant difference regarding recall (F(1, 34) = 3.91, p = .028, $\eta_p^2$ = .103) in favor of the text group but not for comprehension (F < 1, n.s.) or transfer (F < 1, n.s.).

Cognitive load. Contrary to our expectations, we found a significant difference between the text and exploRNN groups for ICL (F(1, 34) = 3.85, p = .029, $\eta_p^2$ = .099). ECL showed the hypothesized effect: The exploRNN group showed a significantly lower score than the text group (F(1, 34) = 4.33, p = .023, $\eta_p^2$ = .113). Against our hypothesis, GCL was not significantly higher in the exploRNN group than in the text group (F < 1, n.s.).

System usability. The SUS questionnaire indicates an excellent usability (M = 84.47, SD = 9.45) [68]. Participants also rated our approach as significantly more likable (F(1, 30) = 10.52, p = .003, $\eta_p^2$ = .260), more recommendable (F(1, 30) = 11.75, p = .002, $\eta_p^2$ = .281), and better designed (F(1, 30) = 20.711, p < .001, $\eta_p^2$ = .408) compared to the learning text.

Qualitative questions. We also got some qualitative feedback in our free-form fields. Participants liked our introduction, which apparently made it easy for them to get started with exploRNN: "the tutorial was nice and the platform was easy to use." They also mentioned that the graphical support of these textual explanations was helpful for them to form a mental image of the setting: "the mental bridge the graphical presentation helped build was helpful in memorizing and understanding." Some participants said that they did not remember specific names, as it was not important during the usage of exploRNN: "I later did not remember the name of the algorithm that was used, since it was not important during the usage of the tool." Some participants asked for something similar for other types of networks, e.g., "I would like to have similar resources to cover other topics from the basics such as MLPs."
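For readers who want to relate the reported F statistics to the partial eta squared effect sizes, the standard identity is partial η² = (F · df_effect) / (F · df_effect + df_error). The helper below is a sketch with a hypothetical function name; it is not the analysis script used in the study. Applied to the recall result, F(1, 34) = 3.91, it gives approximately .103.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Recall: F(1, 34) = 3.91
recall_effect = round(partial_eta_squared(3.91, 1, 34), 3)  # 0.103
```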

Discussion
In the following, we will refer back to the hypotheses we had before conducting the study and discuss the study outcome.
Learning outcome. We looked at both recall and understanding regarding the learning outcome. In contrast to H1, we found that learners in the text group showed significantly better results for recall. While we found no significant differences between the groups regarding comprehension H2 and transfer H3, the results are interesting nonetheless.
Although not significant, the descriptive statistics indicate that the score for transfer is about 10% higher for exploRNN compared to text DU. This could be a first indication that learning environments such as exploRNN can help learners build a deeper understanding of the subject compared to learning with classic text. However, significant results and further research are needed to support this statement. Compared to recall, these results may indicate that learners are better at learning terms by heart (surface learning) when they learn with text than with exploRNN. A possible explanation for the better recall performance in the text group could be that learners have more experience with text-reading strategies [69]. This might help with the complex terms explained here, as learners might find it easier to find information that was previously presented [70]. Thus, designing ways to easily retrieve previously presented information could be an interesting direction of future research for such interactive explorables. Another possibility is that learning with an interactive environment, which is affirmative and provides information step by step, might induce an illusion of knowing [71]. Learners may think that after a few experiments in exploRNN they have acquired enough knowledge, while there is still much more to explore and learn. In the text group, it is immediately clear to the learner when the text is finished. In contrast, exploRNN can require user initiative for information acquisition. As the learning experience was self-controlled, participants could decide for themselves when to go from the learning content to the post-test. Referring to the illusion of knowing, learners might have felt too competent after this more guided experience. However, even though learners may be able to transfer what they have learned to other application areas, they may be missing important basic terminology that was presented in the learning material to reflect their knowledge gain in a classical learning test.
Cognitive load. Against H4, the exploRNN group showed significantly lower ICL than the text group. The perceived difficulty of the learning material is 12.71% lower for exploRNN even though the text content was identical in both conditions. This suggests that exploRNN makes the learning material appear easier CR. The reason for this could be that the content is presented step by step in the tutorial of exploRNN, instead of all at once as in the text condition [56].
The results regarding H5 are consistent with our assumptions. With a large effect size, the exploRNN group showed lower ECL than the text group CR. Therefore, exploRNN reduced the extraneous cognitive load compared to the text content, although the content was the same in both conditions and there were no unnecessary figures or information in the text. Combined with lower ICL, more cognitive capacity remains for GCL, which is important for learning.
For GCL, we did not find the significant difference we hypothesized in H6. Although ICL and ECL indicate that more cognitive capacity should be free in the exploRNN condition, participants did not invest that cognitive capacity into GCL. This could be because there might already be high investment in GCL in both groups. Another explanation could be that since participants in the text group perceived the learning content as more difficult, they might have invested more GCL to compensate for said difficulty.

System usability. The results of the SUS questionnaire indicate that our system is easy to use. This supports our proposed visualization and interaction design and shows that our design choice of creating an interactive environment as a learning experience on RNNs matches our target audience well. Additionally, as participants rated our approach as significantly more likeable, recommendable, and better designed, users are likely to experience more joy and be more motivated when learning MJ. In combination with the reduced cognitive load exploRNN imposes on users, we hope that this could result in a larger number of users willing to spend their time learning and more time spent learning per user. In turn, we think that this might outweigh the advantage in some areas of learning outcome with the more familiar text-based learning environment. Further longitudinal studies on NN learning systems need to be done to investigate this in more detail.

Qualitative feedback.
The open feedback forms also provided interesting insights. In general, participants seemed to like exploRNN as a learning experience and even asked for similar interfaces in other contexts. Furthermore, the amount of information and our onboarding process seemed to make exploRNN easily usable. Most of the criticism was related to limitations regarding the freedom of interaction, which we deliberately implemented to provide an overview rather than in-depth technical details. Future work might reveal how both an overview and full depth could be combined in NN learning environments. We also learned why recall might be better in the text condition. As participants mentioned, they did not feel they needed to memorize specific terms to be able to use RNNs. This seems natural, as when programming or using RNNs in the wild, remembering specific terms is also not essential as they can be searched for. On the other hand, transfer tends to be much more important when tasked with solving real problems.
For learning material where transfer is important DU, as in recurrent networks, our descriptive results suggest that interactive visualizations such as exploRNN might be helpful. Additionally, the lower cognitive load and higher perceived likeability of our interactive environment might result in more learners spending more time with exploRNN. Although we extensively evaluated exploRNN in this study, it remains to be seen whether our insights are transferable to other learning environments. If so, the development of future explorables could be much better informed, indicating what is important, what could be discarded, and what needs to be improved on. While this study provides first insights into the effectiveness of such educational NN exploration environments, we hope that similar evaluations of other applications can broaden our insights.

CONCLUSIONS AND FUTURE WORK
This paper presents the first interactive learning environment specifically designed for RNNs. We propose a new visualization approach for inspecting RNNs where different levels of granularity are employed. To inform our visualization design, we first introduced educational objectives for this setting. Based on these objectives, we identified design challenges, which we tackle in the proposed learning environment. We hope that this process can be helpful for the development of future interactive learning environments.
Subsequently, we tested the learning outcome, cognitive load, and usability of this learning environment in an empirical study. Our study is the first quantitative evaluation of an interactive NN learning interface and, as such, provides helpful insights and directions for future work. The results of the user study indicate that, while the raw learning outcome was not improved compared to conventional methods, exploRNN makes learning easier and more fun, since cognitive load was significantly reduced by exploRNN while subjective likeability was significantly improved. Based on these findings, we propose to specifically design interactive NN learning environments so that cognitive load is reduced. For broadly accessible education, exploRNN can be used in any modern browser at https://mi-pages.informatik.uni-ulm.de/explornn/.
As mentioned, more user studies for similar educational explorables could further advance the field and better inform future visualization designs. One such possible research direction is the suspected advantage of the text condition for going back to previously presented information. Here, eye-tracking studies and novel interaction designs might provide new insights. Additionally, it would be interesting to investigate whether such systems indeed lead to more voluntary learning and how that affects learning outcome.

STATEMENTS AND DECLARATIONS
This work was funded by the Carl-Zeiss Scholarship for Ph.D. students. The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.

Fig. 4: Users can access more detailed explanations for many elements of our visualizations such as training steps, hyperparameters, and operations in a cell.

Fig. 5: left: Adding a layer between two existing layers. right: Removing a layer from the network.
) and backpropagation (backward network animation) phases of the training process V2. Altogether, through this animated component, while not being interactive itself, users can inspect the results of modifications that have been made in other places O3.

D Process. According to the typical NN training setup, we divide the training process into three distinct steps: inference (forward), validation (error calculation), and backpropagation (backward). The explanation pane in the lower left of the network overview (see Figure 1) displays which step is currently executed and provides an explanation of what happens in each of these steps DU. Through this, the user can learn more about the training dynamics of the network O3. As described previously, animations in other components complement this dynamic nature of the training process V2.

E Controls. In the network overview, the network is trained by means of epochs to first provide an overview [43] of the training process V3 CR. To experiment O3, users can interact with the control area in the bottom center of our environment MJ. In addition to automatically advancing epochs, which can be controlled with the and buttons, users can also trigger network training for a single epoch, by pressing the button DU. The training process can always be reset using the button. A back button to go to a previous epoch is not included in exploRNN, as this would require saving multiple previous states of the network parameters, which would require significant browser memory. Moreover, as individual epochs normally do not change the network behavior completely, going back one epoch is not a common operation during neural network training, so we think users will not miss such functionality.

F Training Parameters. To communicate the training setup of an RNN O3, a trade-off between completeness and simplicity must be made V1. Thus, we let the user freely choose some training parameters, but employ restrictions for others CR. As mentioned, users can add or remove individual network layers and use different preset training inputs. In addition, they can change the learning rate, batch size, and noise DU. The learning rate and batch size allow for exploration of different training settings O3. Noise can be added to make the training data more realistic, resembling real-world scenarios of imperfect measurements O4. Parameter changes can be made through sliders, which are positioned on the bottom right. To provide an intuition about the influence of these parameters, we include pretrained models that are loaded during the onboarding steps which explain each individual parameter V4. Other parameters, such as units per layer or optimization strategies, cannot be changed in our implementation. This trade-off between freedom of exploration and simplicity proved to be effective in educating users about the influence of different training parameters and keeping their cognitive load low V1.
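To make the role of the noise parameter concrete, the sketch below perturbs clean training values with random noise whose strength is set by a single slider-like value. The text does not specify how exploRNN generates noise, so the Gaussian distribution, the function name `add_noise`, and the parameter `noise_level` are assumptions for illustration only.

```python
import math
import random


def add_noise(values, noise_level, seed=None):
    """Return a copy of values with zero-mean Gaussian noise added.

    noise_level is the standard deviation of the noise; 0.0 leaves the
    data unchanged. Gaussian noise is an assumption of this sketch.
    """
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, noise_level) for v in values]


# Clean samples of a sine curve, and a noisy version that imitates
# imperfect measurements.
clean = [math.sin(0.1 * t) for t in range(50)]
noisy = add_noise(clean, noise_level=0.1, seed=42)
```

Passing a seed makes the noisy data reproducible, which helps when comparing training runs that differ only in learning rate or batch size.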

Fig. 6: LSTM cell view. G Visualization of data flow through the cell. H Input to the network and its prediction. Visualization of the training error computation. A grey sliding window indicates which data points are needed to initialize the cell state. I Explanations with more detailed steps for the forward direction of data flow. J In addition to interactively training the network, users can change the speed at which the visualization for cell steps advances. K Just as in the network overview, users can modify training parameters.

TABLE 1: Means and standard deviations for our study results separated by groups for all variables. Numbers annotated with * indicate a significant difference between the two conditions.