At least in our experience, the pace of technological and methodological change in spatial data science is such that the gap between what instructors know and what students know is substantial, and in many cases might even be growing. But the differences within individual student cohorts may be greater still, and instructors need to be able to quickly assemble and update a course that meets students where they are rather than where the instructor or institution might wish them to be. At the same time, there is also a strong need to support more open-ended exploration during ‘practicals’ (Unwin 1980), particularly by more advanced students who may ‘tune out’ if progression is overly structured and rigid or just too slow to keep their attention.
The guiding insight behind geopyter is that instruction in programming outside of computer science proceeds from fundamental units of learning typically built around computing concepts (variables, lists/arrays, dictionaries/hashes, functions/subroutines, etc.) to fundamental units of learning built around analytic concepts (cluster analysis, point patterns, spatial autocorrelation, etc.). These units must then be assembled in a way that speaks to the student cohort and its background; we mean this in two ways: first, the examples used must be domain-specific in order to speak to budding geographers, political scientists, historians, etc.; and second, it should be possible to develop courses for different types of cohorts without having to start over from scratch. In other words, how can we enable teachers to reuse many of the building blocks employed in an advanced course for masters students in an introductory course for undergraduates?
Components
From these constraints, it was clear that our system needed to support a compositional, ‘bottom-up’ approach to instructional design. So although the development of a course or class should obviously start out with a clear set of learning aims and outcomes, at a certain point the instructor will be searching for examples and code with which to teach a particular concept: What is a list or dictionary? What is k-means clustering? We settled on the term ‘Atoms’ to refer to these basic instructional units and, much like entries in the periodic table, we felt that they could be grouped together into sets of related concepts: the fundamentals of programming, point pattern analysis, machine learning, etc. (Table 1).
Table 1 Overview of system components

Each Atom would employ domain-specific illustrative examples and code so as to anchor learning in problems and applications relevant to the learner. An Atom could start by showing how a list can be used to hold data about cities (e.g. name, country, population), and a subsequent set of ‘cells’ (the basic ‘unit’ of a Jupyter notebook) could build on this with an illustration of how a list-of-lists allows us to add location as a latitude and longitude coordinate pair. We will return to some of the issues that this approach raises in Engagement, but it points to the importance of ensuring a degree of consistency in how the Atoms for a set of closely related topics fit together.
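The kind of progression such an Atom might follow can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the cities, the rounded population and coordinate values, and the helper name `southernmost` are invented for the example and are not taken from any geopyter Atom.

```python
# Approximate, rounded values used purely for illustration.
city = ["London", "United Kingdom", 8_900_000]  # name, country, population

# A list-of-lists lets each city also carry a [latitude, longitude] pair.
cities = [
    ["London", "United Kingdom", 8_900_000, [51.51, -0.13]],
    ["Nairobi", "Kenya", 4_400_000, [-1.29, 36.82]],
    ["Lima", "Peru", 9_700_000, [-12.05, -77.04]],
]


def southernmost(records):
    """Return the name of the record with the smallest latitude (hypothetical helper)."""
    return min(records, key=lambda rec: rec[3][0])[0]


print(southernmost(cities))
```

A later cell in the same Atom could then reuse `cities` unchanged, which is one reason consistency between closely related Atoms matters.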
The purpose of the bottom-up approach is that these units can then be flexibly assembled into Sessions: from the same ingredients (i.e. Atoms), the instructor could create quite different modules by organising and presenting the elements in different ways. Quite simply, we do not want to have to rewrite material for each format (e.g. lecture/practical, ‘flipped’ classroom, or distance learning), but we also need to deal with the fact that different types of scaffolding are required and that the amount of content suitable to a ‘class’ in each of these formats might differ substantially.
Furthermore, Sessions designed for experts (e.g. those pursuing continuing online education) might be able to ‘move’ students through many more Atoms of instruction in a single Session than a similarly laid-out one designed for first-time programmers in an undergraduate programme. So Sessions need to be able to incorporate Atoms in a way that minimises the level of effort involved in finding the ‘best’ way to, for example, explain the concept of recursion while maximising the ability of the instructor to relate this concept to the students’ practical experience (e.g. by providing a ‘context’ that is anchored in a locally relevant ‘story’ or data).
Naturally, Sessions can then be grouped into learning Modules that offer a coherent instructional programme over a period of weeks or months. Modules represent the highest level of abstraction in the proposed system, but they are also obviously the starting point from which instructors can organise their (remixed) atomic and sessional material into something incorporating a set of learning outcomes and a package of assessment appropriate to their students. However, our design reflects the expectation that contributions to geopyter at each level of instruction might be made by different people: a domain expert in Spatial Bayes might be the right person to develop an Atom on the concept and its application, but not the right person to develop a module tackling advanced spatial analytic concepts where this is just one approach amongst many. Similarly, given the global diversity of delivery formats, a 10-week term in Britain enables students to cover a very different ‘volume’ of content from a 15-week American semester. geopyter recognises and seeks to respond to that diversity.
Tools
So we are trying to design a system in which Sessions and Modules are composed out of Atoms that can be, optionally, surrounded by the instructor’s own ‘narrative’. We therefore want to produce a set of teaching materials that are highly portable, easily reused or edited, and that enable the instructor to select only the elements from which they wish to compose their materials. As intimated above, geopyter operationalises this through the Jupyter project and its ability to provide in-browser access to the Python interpreter, and it also takes advantage of the dominance of the Git version control tool (and the GitHub service/website) as a means of tracking authorship across edits.
In theory, thanks to the combination of Jupyter and GitHub, it is not even necessary for the novice user to have Python installed on their own computer: since all interaction with Python is via the browser, the environment could be hosted on a server halfway round the world. In practice, however, there are few such services and most users simply download and install a free distribution of Python (e.g. Anaconda) that will run on their system. In our field, many people are already using this approach: notebooks can be found covering everything from introductory concepts (Millington and Reades 2017) to advanced spatial analysis methods (Arribas-Bel 2016), and combined into complete courses or workshops (Rey 2016).
Jupyter notebooks are written as a mix of executable code cells and non-executable text formatted with the widely used ‘markdown’ syntax. Notebook structure is provided through headers in markdown cells: a ‘#’ pre-pended to a line of text is generally taken to be the title of the notebook; ‘##’ at the start of a line provides a second level of structure (i.e. Level 2 headers); ‘###’ indicates Level 3 headers; etc. For our purposes, what is relevant is that these headers naturally yield a semantic hierarchy that corresponds closely to the h1 ... h6 model used by the HTML markup language that lies at the heart of the World Wide Web. This hierarchy allows us to ‘abstract out’ the problem of inferring the meaning of cells in different sections of the notebook since the instructor does it for us through their use of headers.
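A minimal sketch of how header levels could be read from the text of a markdown cell is given below. It assumes the common ‘#’-prefixed header style only, and the names `HEADER` and `header_of` are our own for this illustration rather than part of geopyter.

```python
import re

# One to six '#' characters, at least one space, then the title text.
HEADER = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def header_of(cell_source):
    """Return (level, title) for the first header line in the text, else None."""
    for line in cell_source.splitlines():
        match = HEADER.match(line)
        if match:
            return len(match.group(1)), match.group(2)
    return None


print(header_of("# Lists"))
print(header_of("Some framing text\n### Lists Examples"))
```

The number of ‘#’ characters maps directly onto the h1 ... h6 levels, so no further interpretation of the cell's content is needed to place it in the hierarchy.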
Approach
In order to assemble Atoms into Sessions and Modules, geopyter necessarily requires a compositional syntax. We have noted the conceptual mapping between markdown and html formatting above, but how do we select some mix of code and markdown material in one notebook to be incorporated into another? And how do we do this in a way that is both simple to express and able to resolve ambiguity? Fortunately, such a model already exists and was hinted at in Fig. 1: cascading style sheets (css) uses well-understood ‘selectors’ to specify one or more elements on a web page to which a set of presentational styles should be applied.
In css, an ‘h1’ in a style sheet indicates that all html Level 1 Headers (e.g. <h1>A Title</h1>) should observe the styling rules declared immediately afterwards, while ‘h1.important’ specifies that only a Level 1 Header of the class ‘important’ should be selected (e.g. <h1 class="important">A Title</h1>) and all other Level 1 headers ignored (e.g. <h1 class="unimportant">A Title</h1>). In fact, css also allows for nested selectors in which ‘child’ element(s) of a ‘parent’ can be selected in turn. This is normally used to do things like specify mouseover behaviours for a menu: that all anchors (i.e. links) that are within a division of class menu should act in this way when the mouse passes over them (e.g. div.menu a:hover). What is particularly elegant about css is that it provides a means for selecting multiple pieces of content in the document in one declaration (where this is desirable) and a means for disambiguating content with the same name, but in different locations within a document hierarchy (where it is not).
Conceptually, geopyter adapts this syntax to allow us to select some or all of a Jupyter notebook using the structure imparted by the instructor: all cells coming after a Level 1 Header are considered to be part of that element’s semantic field until another Level 1 Header or the end of the document is encountered, whichever comes first. And a Level 2 Header coming ‘after’ (a ‘child’, if you prefer) a Level 1 Header is considered part of that ‘parent’ element’s semantic field, but we can select it uniquely within the notebook using the standard css form of h1.content h2.subcontent. This is illustrated in schematic form in Fig. 1, but note that the > is there simply to make clear the hierarchical relationship. With this, we have essentially repurposed css as a means of selecting and importing content from one notebook into another!
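The idea of a header's semantic field can be sketched as follows. This is a simplified illustration rather than geopyter's implementation: cells are plain dictionaries with a string `source`, a header is recognised only on the first line of a markdown cell, and we assume that any header at the same or a higher level closes the field. The function names are hypothetical.

```python
import re

HEADER = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def header_of(cell):
    """Return (level, title) if a markdown cell starts with a header, else None."""
    if cell["cell_type"] != "markdown":
        return None
    first_line = cell["source"].split("\n", 1)[0]
    match = HEADER.match(first_line)
    return (len(match.group(1)), match.group(2)) if match else None


def select_section(cells, level, title):
    """Collect the matching header cell and every cell in its semantic field."""
    selected, inside = [], False
    for cell in cells:
        head = header_of(cell)
        if inside and head and head[0] <= level:
            break  # a header at the same or a higher level closes the field
        if not inside and head == (level, title):
            inside = True
        if inside:
            selected.append(cell)
    return selected


def md(text):
    return {"cell_type": "markdown", "source": text}


def code(text):
    return {"cell_type": "code", "source": text}


cells = [
    md("# Lists"), code("cities = []"),
    md("## Examples"), code("cities.append('Lima')"),
    md("# List Operators"),
    md("## Examples"), code("cities + ['Nairobi']"),
]

lists_section = select_section(cells, 1, "Lists")
# Nesting the calls mimics a selector of the form 'h1.List Operators h2.Examples'.
nested = select_section(select_section(cells, 1, "List Operators"), 2, "Examples")
```

Both Level 1 sections contain a Level 2 header called ‘Examples’, and it is the nesting that allows the second one to be picked out unambiguously.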
Unfortunately, the nature of Jupyter notebooks does not allow this to happen dynamically at run-time, but it does allow something similar to happen when an instructor is ‘compiling’ new Sessional and Module content. In short, the instructor writes whatever content they wish but, using syntax similar to the examples below, wherever they want to incorporate contributed content from geopyter (or elsewhere) they have only to ‘include’ it by specifying both a source and a selection. And this approach works recursively: a notebook can include content from a notebook that itself includes content from another notebook.
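The recursive behaviour can be illustrated with a deliberately reduced model. Here notebooks live in an in-memory dictionary, cells are plain strings, an include is a placeholder dictionary, and whole notebooks are included without any selection. All of these are simplifying assumptions of ours, as is the function name `compile_notebook` and the check for circular includes.

```python
def compile_notebook(name, library, _stack=()):
    """Expand include placeholders depth-first, returning a flat list of cells."""
    if name in _stack:
        chain = " -> ".join(_stack + (name,))
        raise ValueError(f"circular include: {chain}")
    compiled = []
    for cell in library[name]:
        if isinstance(cell, dict) and "include" in cell:
            # Replace the placeholder with the fully expanded source cells.
            compiled.extend(
                compile_notebook(cell["include"], library, _stack + (name,))
            )
        else:
            compiled.append(cell)
    return compiled


library = {
    "atom": ["# Lists", "cities = ['London', 'Lima']"],
    "session": ["Welcome to week 2", {"include": "atom"}, "Now try it yourself"],
    "module": ["Module overview", {"include": "session"}],
}

compiled = compile_notebook("module", library)
```

Compiling the module pulls in the session, which in turn pulls in the atom, while the instructor's own framing cells stay in place around the included material.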
Syntax
To recap, we typically envision an Atom as a short notebook focussing on a core concept or method (e.g. lists, object-oriented design, or spatial autocorrelation); some or all of each Atom can then be selected and imported into a Session, which is itself a notebook; and the sessions can then be selected and managed through a Module, which can also be a notebook or a set of notebooks. This process is initiated by the instructor creating a blank text cell in a Jupyter notebook and writing an ‘include’ statement. The statement should be the only content in the cell since geopyter will be replacing the cell with an unknown number of whole text and code cells from the referenced notebook.
Crucially, include statements can be freely intermingled with the instructor’s own content (as shown in Fig. 2), allowing the instructor to ‘frame’ the concepts in a way that suits their teaching style but which saves them having to reinvent the wheel for each class. A Session tackling standardisation could include elements of the relevant Atom from geopyter while still allowing the instructor to interject comments, observations, questions, and additional tasks to ground the learning experience in the local context (individual, institutional, etc.). To illustrate this more clearly, an Atom on Python’s approach to dealing with lists could be incorporated into a longer Session as follows:
Here, nb is a path (local or remote) to a valid Jupyter notebook from which the instructor wants to import content. The select parameter specifies a selector for which the geopyter tool will search within the source notebook. All content from that point onwards up to the next selector at the same level will then be copied into the compiled notebook. In the above example, if there were a following h1 covering, for example, ‘List Operators’ then this would not be included because, from a structural standpoint, it is at the same level in the hierarchy as ‘Lists’ but has not been selected. Furthermore, any h2 or h3 subsections within the ‘Lists’ section would be included since they are presumed to be providing pedagogical and logical structure to the Lists section and so should be carried over.
Clearly, an instructor might want to import only part of a section, or to suppress a subsection falling in the middle of a larger resource. In anticipation of this need, more complex ‘include’ statements with no equivalent in css are also possible:
$$ \begin{aligned} & @{\mathtt{include}}\,\,\{ \\ & \quad \quad \quad {\mathtt{'resource'}}\,\, = \,\,{\mathtt{'http://geopyter.org/atoms/fundamentals/lists.ipynb'}}. \\ & \quad \quad \quad {\mathtt{'select'}}\,\, = \,\,{\mathtt{'h1.Lists}}\,\,\text{-}{\mathtt{h3.Lists}}\,\,{\mathtt{Examples;}}\,\,\,{\mathtt{h1.List}}\,\,{\mathtt{Operators}}\,\,\text{-}{\mathtt{ h2.Concatenation'}} \\ & \} \\ \end{aligned} $$
In this second example, two Level 1 sections are imported at the same time and a subsection within each (a Level 3 and a Level 2 header respectively) is suppressed using the ‘-’ syntax to indicate that it should be removed. We diverge from the css standard in that selectors are separated with semicolons rather than commas: we wanted to allow commas to be part of a section heading and felt that semicolons are rather rarer in that context. An additional point of difference from true css is that we allow spaces in the ‘selector’ because we felt that asking teachers to translate between a natural language header (‘List Operators’) and what css would consider a safe header (‘List_Operators’) would detract from ease of use.
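One way in which a select string of this kind could be broken apart is sketched below. It is an illustration under our own assumptions, not geopyter's parser: it handles only a single header reference per selector followed by optional ‘-’ suppressions, it does not attempt nested selectors, and `parse_select` is a hypothetical name.

```python
import re

TOKEN = re.compile(r"h([1-6])\.(.+)")


def parse_select(select):
    """Split a select string into (included, [suppressed, ...]) header references."""
    parsed = []
    for group in select.split(";"):
        # A '-' directly before 'hN.' marks a subsection to suppress.
        tokens = re.split(r"\s+-\s*(?=h[1-6]\.)", group.strip())
        refs = []
        for token in tokens:
            match = TOKEN.fullmatch(token.strip())
            if match is None:
                raise ValueError(f"unrecognised selector: {token!r}")
            refs.append((int(match.group(1)), match.group(2)))
        parsed.append((refs[0], refs[1:]))
    return parsed


selection = parse_select(
    "h1.Lists -h3.Lists Examples; h1.List Operators -h2.Concatenation"
)
```

Because the string is split on semicolons alone, a heading such as ‘Lists, Tuples and Sets’ would survive intact, and spaces inside a title need no escaping.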
Putting it all together
Jupyter notebooks use a format called JavaScript Object Notation (json) that is not particularly easy for most humans to read, but as it is nonetheless highly structured we can interact with it programmatically. The extensible nature of the json format also allows us to read and write both data and metadata not only to each notebook, but also to each and every cell in a notebook. Since metadata that is not understood by Jupyter is simply ignored, we can add our own fields to provide useful information related to instruction such as who should be given credit for contributing and any dependencies or requirements for installed libraries.
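As a rough illustration of what this looks like in practice, the sketch below builds a tiny notebook in the version 4 json layout and attaches custom metadata at both the notebook and the cell level, using only Python's standard json module. The ‘geopyter’ key, its ‘requires’ and ‘contributors’ fields, and the placeholder contributor name are assumptions made for this example, not a documented schema.

```python
import json

# A minimal notebook in the version 4 layout, written inline for illustration.
raw = json.dumps({
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Lists"]},
        {"cell_type": "code", "metadata": {}, "source": ["cities = []"],
         "execution_count": None, "outputs": []},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
})

notebook = json.loads(raw)

# Hypothetical field names: notebook-level requirements, cell-level credit.
notebook["metadata"]["geopyter"] = {"requires": ["numpy"]}
for cell in notebook["cells"]:
    cell["metadata"]["geopyter"] = {"contributors": ["A. Contributor"]}

updated = json.dumps(notebook, indent=1)
```

Attaching credit to each cell rather than only to the notebook means that attribution can travel with the cell when it is copied into a compiled Session.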
Taken together, this provides the foundation for remixing/mashing up content while still enabling instructors to add an institutional or course-specific gloss wherever necessary. Each notebook might start with the instructor’s contact information or by providing instructions for setting up the computing environment, but then make use of material developed by others for actual instruction. This process may seem quite abstract, and probably quite convoluted as well, but an illustration (Fig. 3) may help to clarify why it is so useful.