This phase describes approaches to abstracting over (i) the complexity of configuring and deploying an environmental model (abstraction 3 above) and (ii) the complexity of deploying the model in the cloud (abstraction 2 above). We created a tool for configuring experiments and a cloud deployment of the Weather Research and Forecasting (WRF) model. Here, we raise the level of abstraction by reducing the complexity of configuration and deployment, leveraging the on-demand scalability of cloud architecture.
4.1 WRF
The Weather Research and Forecasting (WRF) model [10] is a large community-based endeavour (around 40,000 users), supported by the National Center for Atmospheric Research (NCAR). The model is primarily used for atmospheric research and forecasting across a wide range of scales (from thousands of kilometres down to metres). The extensively validated science WRF can simulate includes regional climate, air quality, urban heat islands, hurricanes, forest fires, and, through coupling with hydrological models, flooding.
WRF was chosen as a case study here for the following reasons: (i) WRF installation is viewed as a barrier to use; (ii) cloud resources will enable WRF users to conduct simulations beyond current capability [7]; (iii) WRF is open source and portable; and (iv) the benefits will impact WRF's large community user base.
4.2 Configuration and Collaboration
Model Driven Engineering (MDE) is an approach to managing complexity in software systems and to capturing domain knowledge effectively [8]. Domain knowledge is captured in a software model of the system, and this model is configured using Domain Specific Languages (DSLs) that relate to the application domain and potentially to the underlying platform features. Code is then generated from the model using transformation approaches to configure and deploy the system being modelled. This approach has been applied successfully in a variety of areas [8], including industry settings [5].
In this work we investigated using this approach to develop DSLs that allow scientists to describe an experiment, with the underlying software model managing the generation of code to configure the environmental model appropriately, deploying the model to suitable flexible cloud infrastructure, and returning results. In exploring the tools available, we found that they are not yet mature enough to support such a vision. Configuring models like WRF for their many different uses demands so much complexity and flexibility that hard-coding a set of rules into a DSL was unlikely to succeed.
Instead, we decided to use a general-purpose language to manage the configuration and deployment, and proposed embedding a future ‘learning’ approach that would match experimental configurations to infrastructures and recommend an appropriate architecture for a given experimental configuration. In our qualitative phase we found that our participants were familiar with languages such as R and Python, which they often used to process data and produce visualisations.
In taking this MDE-lite approach, we produced a Python object-based model of a WRF experiment configuration that allows configuration using standard Python constructs. The Python package f90nml allows the object to generate the Fortran 90 namelist files used to configure WRF; these are automatically uploaded to the WRF instance for deployment. This provides a layer of abstraction over the skills required to log in to an instance, navigate the file system, and edit the Fortran namelist file without introducing errors. No configuration dependencies are managed in this first iteration, but this can be added to the Python model in future.
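As an illustration, a minimal sketch of such a configuration object is shown below. The WRFExperiment class and its attributes are a hypothetical example of our approach rather than the exact tool, but the Namelist construction and write call are f90nml's actual interface, and run_days, time_step, e_we and e_sn are genuine WRF namelist variables.

```python
import f90nml

class WRFExperiment:
    """Hypothetical wrapper holding a WRF experiment configuration."""

    def __init__(self, run_days=1, time_step=60, e_we=100, e_sn=100):
        self.run_days = run_days    # simulation length in days
        self.time_step = time_step  # model time step (seconds)
        self.e_we = e_we            # grid points, west-east
        self.e_sn = e_sn            # grid points, south-north

    def to_namelist(self, path="namelist.input"):
        # Map the Python attributes onto WRF's Fortran namelist groups.
        nml = f90nml.Namelist({
            "time_control": {"run_days": self.run_days},
            "domains": {"time_step": self.time_step,
                        "e_we": self.e_we, "e_sn": self.e_sn},
        })
        nml.write(path, force=True)  # overwrite any existing file

# Configure an experiment with standard Python constructs,
# then generate the Fortran 90 namelist that WRF expects.
exp = WRFExperiment(run_days=2, time_step=30)
exp.to_namelist()
```

Because the configuration object is plain Python, experiments can also be parameterised with loops, functions, or any other standard construct.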
Code notebooks such as Jupyter are now widely used in the data science community to collaborate on and annotate data analysis. They are online environments, often hosted in the cloud (but not necessarily publicly available), that allow code to be edited and run by multiple collaborators. They allow mixed-mode documentation, with inline code and output visualisation for many modern programming languages, including Python and R. They can pull in data from outside sources and write out data to attached cloud-native stores or via APIs.
For our purposes, a Jupyter code notebook provides an ideal space for environmental scientists to control modelling experiments. Whilst it would not be possible to run a complex model such as WRF in the notebook itself, it is possible to configure and run experiments via system APIs. The model can be configured using our Python object-based configuration tool, and the experiment run outside of the notebook (either on the same machine or via an external ‘cloud burst’), with output data returned to the notebook for collaborative analysis. The notebook allows the experiment to be documented (supporting scientific reproducibility) and shared, extending the reach of the experiment. A sketch of this workflow is given below.
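In the sketch, the wrf_config module and the submit and fetch_output helpers are hypothetical stand-ins for whatever mechanism launches the run outside the notebook; reading the results back with xarray reflects the fact that WRF writes its output as NetCDF files.

```python
import xarray as xr

from wrf_config import WRFExperiment           # hypothetical module, as above
from cloud_runner import submit, fetch_output  # hypothetical run/return helpers

# Cell 1: configure the experiment with the Python object model.
exp = WRFExperiment(run_days=2, time_step=30)
exp.to_namelist("namelist.input")

# Cell 2: run outside the notebook (same machine or a cloud burst)
# and pull the output data back for analysis.
job = submit("namelist.input")
fetch_output(job, dest="results/")

# Cell 3: analyse collaboratively. WRF output is NetCDF, so xarray can
# open it directly; 'T2' is the 2-metre temperature field in wrfout files.
ds = xr.open_dataset("results/wrfout_d01_2019-01-01_00:00:00")
ds["T2"].isel(Time=0).plot()
```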
4.3 Cloud Deployment of WRF
We built a scripted cloud deployment of WRF and evaluated usability, performance, and cost metrics. This installation script dealt with the dependencies, configuration, and compilation required to run WRF on the Microsoft Azure cloud platform. The script is automated as far as possible; however, at a number of points the WRF installer requests unavoidable user input.
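Where such prompts cannot be eliminated, one option is to script the responses. The sketch below, using pexpect, illustrates the idea against WRF's interactive configure step; the exact prompt strings and selection numbers vary by WRF version and platform, so treat these values as assumptions rather than a recipe.

```python
import pexpect

# Drive WRF's interactive ./configure step. The selection values
# (a GNU dmpar compiler option and basic nesting) are illustrative
# and depend on the WRF version and the platform's compiler list.
child = pexpect.spawn("./configure", cwd="/opt/WRF", timeout=300)
child.expect("Enter selection")
child.sendline("34")   # compiler/parallelism choice (example value)
child.expect("Compile for nesting")
child.sendline("1")    # basic nesting
child.expect(pexpect.EOF)
```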
The architecture of the WRF model itself is complex and is described in [10]. Our standard cloud configuration consisted of a Message Passing Interface (MPI) supported cluster of 9 standard compute nodes from the Microsoft Azure Dsv3-series, each having a 3.2 GHz Intel Xeon E5-2673 v4 (Broadwell) processor. One node is a master node, taking care of all the compilation and providing a means of sharing storage and computation with all the nodes.
We used a predefined image of Ubuntu Server 16.04 LTS for each of the cluster machines; each node has 8 processors, 32 GiB of RAM, and 64 GiB of temporary storage that serves as secondary storage for the compute node. We used the GNU Fortran and GCC compilers. The cluster provides primary storage of 100 GiB, shared amongst the nodes via the Network File System (NFS). This shared location contains all the simulation-related input/output data and the files required for WRF configuration and compilation. All the cluster nodes and storage are deployed in the West Europe region under one secure virtual network and have frictionless access to one another, enabling data sharing and the execution of MPI jobs.
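To illustrate how a simulation is launched across this cluster, the sketch below writes an MPI hostfile to the NFS share and starts WRF over all nine nodes (9 nodes × 8 processors = 72 ranks). The hostnames and paths are placeholders, and the hostfile syntax shown assumes Open MPI.

```python
import subprocess

# Placeholder hostnames for the master and 8 worker nodes.
nodes = ["master"] + [f"worker{i}" for i in range(1, 9)]

# Each Dsv3 node has 8 processors, so allow 8 MPI slots per host.
with open("/shared/hostfile", "w") as f:
    for host in nodes:
        f.write(f"{host} slots=8\n")

# Launch WRF across the whole cluster: 9 nodes x 8 slots = 72 ranks.
# wrf.exe and the input data live on the NFS share visible to all nodes.
subprocess.run(
    ["mpirun", "-np", "72", "--hostfile", "/shared/hostfile", "./wrf.exe"],
    cwd="/shared/WRF/run", check=True,
)
```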
An expert user group of 6 regular WRF users agreed that our automated WRF deployment successfully abstracted over the major hurdle of initially installing the model. They helped to develop the use cases that guided the work: (A) removing barriers to entry for new users, allowing them to run experiments immediately; (B) users wanting to run the model in a standard way to feed results into other models; and (C) power users wishing to deploy massively in parallel without waiting in institutional HPC queues [9].
4.4 Mechanisms for Cloud Computing Configuration
To build on this scripted installation, we investigated key mechanisms for abstracting over the complexities of configuring compute architecture by exploring different modes of cloud deployment. We wanted to preserve as much flexibility in the deployment as possible whilst retaining usability for our use cases.
Portal Configuration. The scripted installation described above requires the cluster to be configured manually from the Azure web portal, or defined in code using Azure's Infrastructure As Code (IAC) offering; both are general-purpose interfaces and require a deep understanding of the desired infrastructure. Because of the knowledge required, this is perhaps not a suitable interface for our use cases, except perhaps for (C), the power user, who may understand the relative merits of different cluster configurations and be able to fine-tune them to their experiment manually. Since the WRF installation is not fully automated, it also still requires user interaction.
Containerisation is a mechanism whereby the software environment for a particular application, including its dependencies, is defined in code. These containers are infrastructure agnostic, so they can be deployed to any suitable provider. However, the MPI architecture of the WRF model makes it unsuitable for containerisation: MPI is the mechanism by which messages are transferred between nodes, and we found little support for this in existing container technologies. This situation is changing, with a number of providers beginning to offer support for MPI.
LibCloud is a Python library for interacting with many cloud providers, abstracting over their specific IAC offerings. This allows a provider-agnostic configuration of a cloud infrastructure from a notebook; however, it is again general purpose and does not encompass every offering from every provider. It is designed for the computer scientist: it does not really abstract over the complexity of deploying a cloud computing system, only over the differences between individual cloud providers' standard machines. It also does not avoid the user interaction required to install WRF, so it is perhaps only suitable for use case (C).
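A minimal sketch of LibCloud's provider-agnostic style is shown below, assuming Azure Resource Manager credentials; the size name Standard_D8s_v3 is our assumption for the 8-processor, 32 GiB Dsv3 node described above, and the credential values are placeholders.

```python
from libcloud.compute.types import Provider
from libcloud.compute.providers import get_driver

# Obtain the Azure Resource Manager driver; swapping the Provider
# constant and credentials targets a different cloud provider.
AzureDriver = get_driver(Provider.AZURE_ARM)
driver = AzureDriver(tenant_id="...", subscription_id="...",
                     key="...", secret="...")

# Provider-agnostic calls: enumerate machine sizes and images,
# then pick the size matching our standard WRF node.
sizes = driver.list_sizes()
images = driver.list_images()
size = next(s for s in sizes if s.name == "Standard_D8s_v3")
```

Note that these calls only standardise access to each provider's machine catalogue; the user must still know which size, image, and network arrangement their experiment needs.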
Infrastructure As A Service (IAAS) allows the replication of predefined, configured machine images, allowing a standard node with WRF installed to be instantiated on demand. In Azure this image can be placed in the machine image library for anyone to use, and other providers have similar facilities.
In summary, the IAAS approach was selected as the most suitable for our WRF cloud deployment: deployment is effectively instantaneous, with no user interaction required to get a system running WRF, making it a good fit for use cases (A) and (B). However, this reduces flexibility for (C) in terms of virtual machine specification, as it is tied to a specific standard machine type offered by our cloud provider. Without re-engineering the WRF installer, or WRF itself, it was not possible at this time to create a cloud-native, provider-agnostic system that requires no user input during installation.
4.5 Experimental System
The IAAS approach, whilst reducing ultimate flexibility in infrastructure, allowed us to explore how WRF might be configured and deployed from a notebook environment. We built a demonstrator using the Azure SDK for Python to configure and deploy a WRF cluster from within a Jupyter notebook running on an Azure Data Science Virtual Machine instance; a number of test experiments have been run so far.
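A condensed sketch of the demonstrator's core step, deploying a WRF node from the predefined machine image with the Azure SDK for Python, is shown below. The resource names, image and network IDs are placeholders, the credential and networking setup is simplified, and the calls shown follow the current SDK rather than necessarily the exact code of the demonstrator.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

# Authenticate and create a compute management client; the
# subscription ID is a placeholder.
credential = DefaultAzureCredential()
compute = ComputeManagementClient(credential, subscription_id="...")

# Deploy a node from the captured image with WRF already installed.
poller = compute.virtual_machines.begin_create_or_update(
    "wrf-resource-group", "wrf-node-1",
    {
        "location": "westeurope",
        "hardware_profile": {"vm_size": "Standard_D8s_v3"},
        "storage_profile": {
            # Reference the predefined machine image (placeholder ID).
            "image_reference": {"id": "/subscriptions/.../images/wrf-image"}
        },
        "os_profile": {"computer_name": "wrf-node-1",
                       "admin_username": "wrf", "admin_password": "..."},
        "network_profile": {"network_interfaces": [
            # A pre-created network interface (placeholder ID).
            {"id": "/subscriptions/.../networkInterfaces/wrf-nic-1"}
        ]},
    },
)
vm = poller.result()  # block until the node is ready
```

Running such a cell from the notebook gives a WRF-ready node without any installation prompts, which is precisely what makes the IAAS route suitable for use cases (A) and (B).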
Ongoing Work: The next stage in developing this system is to integrate the WRF namelist configuration tool into the same notebook and to return results via a data store attached to the machine. The same Jupyter notebook can then be used to sort output data and prepare data visualisations. In this way the whole experiment can be orchestrated through a single collaborative notebook interface, which is version controlled and can be archived easily. Figure 1 shows this visually.