
1 Introduction

On March 11, 2011, a major earthquake struck Eastern Japan. The east coast in particular was severely damaged by the resulting tsunami. In the Shikoku area of Western Japan, where our universities are located, the Nankai earthquake is predicted to occur in the near future. According to the recurrence interval theory, such an earthquake occurs every 100 to 150 years on the Pacific side of Western Japan. The probability that the Nankai earthquake will occur within the next 30 years is estimated at between 70 % and 80 %. We have to prepare for this major earthquake.

In addition, record rainfall within a short period has caused serious damage repeatedly in the last few years, and huge floods caused by short, intense rainfall are no longer uncommon. In particular, record heavy rainfall and severe floods struck Western Japan in August 2014 and caused heavy damage. Suffering from disasters frequently is no longer exceptional for us. We think that preparing for disasters, including these heavy floods, is very important for protecting our lives, and it is equally important in the field of information systems.

Meanwhile, the informatization of the educational environment at universities in Japan is progressing rapidly with the evolution of information technology. The current educational environment cannot be realized without educational assistance systems such as an LMS (Learning Management System), learning ePortfolios, and teaching ePortfolios. The learning histories of students are stored by these educational assistance systems. Awareness of the importance of learning data, such as learning histories and teaching histories, is growing. The assistance systems are as important as the learning data. Today's educational environment at universities depends on educational assistance systems built on information technology infrastructure. If an educational assistance system holding students' learning histories is lost in a natural disaster, we think this is equivalent to losing the sustainability of educational activity.

In addition, an inter-cloud integrated authentication framework is used to share course materials. For example, a Shibboleth federation such as GakuNin [1] is used to authenticate users of other organizations so that course materials can be shared within a consortium of universities. Today's educational activities at universities cannot continue smoothly without these learning data and assistance systems.

Private clouds have been applied to build information system infrastructure in the academic field, as in the study by Yokoyama [2]. The target of such studies is to provide massively parallel computing environments such as Apache Hadoop [3]. Their aims are effective use and centralized control of computer hardware resources. These purposes differ from disaster recovery and the reduction of damage from large-scale disasters.

In this research, we have built a framework for disaster recovery of e-Learning environments from large-scale disasters such as earthquakes, tsunamis, and huge floods. We build private cloud computing fabrics and an inter-cloud environment connecting them; our target is a private cloud collaboration framework. The private cloud environment and the collaboration framework are constructed from private cloud fabrics with a distributed storage system spread across several organizations such as universities. A Learning Management System such as Moodle [4] is built on the private cloud fabrics. Each VM (Virtual Machine) holds an LMS and its related data in a SQL database. A general IaaS (Infrastructure as a Service) platform such as Linux KVM (Kernel-based Virtual Machine) [5] provides a live-migration function with network shared storage and Virtual Machine Manager [6]. Network shared storage is generally built with iSCSI, NFS, or an ordinary network attached storage system. Such network shared storage is bound to the physical storage at each organization. Therefore, live migration of VMs between organizations is difficult.

Our prototype platform is built with a distributed storage system and a KVM-based IaaS architecture on a number of ordinary server machines with network interfaces. It can handle many VMs, including the LMS and its data, with sufficient redundancy. The prototype platform is designed to operate across organizations, so it can operate the private cloud fabrics of the individual organizations in an integrated manner. If one organization's e-Learning environment on its private fabric is lost in a disaster, the same environment can keep running on another organization's fabric. In addition, the prototype platform can receive emergency earthquake alerts through a smartphone via a cell-phone carrier in Japan. Japanese cell-phone carriers, in cooperation with the Japan Meteorological Agency, send an emergency alert message when a major earthquake occurs. Our prototype platform starts live migration when it detects an earthquake alert.

In this paper, we propose a private cloud collaboration framework between the private cloud fabrics of several organizations, and we show the configuration of the prototype system. We then present and examine the results of experimental use. Finally, we describe our conclusions and future work.

2 Assisting the Disaster Recovery for e-Learning Environment

In this section, we describe the private cloud collaboration framework for the e-Learning environment. The purpose of this framework is disaster recovery for an LMS such as Moodle, that is, to keep the LMS and its related data available.

Figure 1 shows the framework of disaster recovery assistance for the e-Learning environment. Each organization, such as a university, has its own private cloud fabric. Each private cloud fabric consists of several servers, at least four machines to give the fabric sufficient redundancy, and the network connections between them. The servers in a private cloud fabric do not operate independently of one another.

Fig. 1. Framework of disaster recovery assistance

The servers provide computing resources and data store resources via VMs, and these resources are changed adaptively at the request of administrators. Each VM on the private cloud fabric is created from the resources of that fabric and can run functions such as authentication and the LMS. In addition, each VM can migrate to another private cloud fabric and continue processing.

Live migration requires a shared file system. We apply the product of the Sheepdog Project [7] to our framework. Sheepdog is a distributed storage application optimized for the QEMU and KVM hypervisor. Our proposed framework is built on the KVM hypervisor, and the Sheepdog distributed storage system provides highly available block-level storage volumes. These volumes can be attached to QEMU-based VMs and used as boot disk images for the VMs. Unlike Storage Area Network (SAN) based storage systems or other distributed storage systems, a Sheepdog cluster has no controller or meta-data servers.

Figure 2 shows the architecture of hybrid distributed storage. This architecture has meta-data servers, which manage the meta-data for the chunks of split data stored on the data-store nodes. The meta-data servers also export a mount point for virtual machine images through a POSIX-compatible interface such as NFS. When chunked data is stored on the data-store nodes, the meta-data held by the meta-data server is updated accordingly. However, the meta-data server becomes a single point of failure: if the meta-data is lost, no virtual machine can access its VM image file or the users' data. We consider this a serious problem.

Fig. 2. Hybrid distributed storage architecture

Figure 3 shows the architecture of pure distributed storage. A pure distributed storage system does not keep meta-data on dedicated nodes. When a VM requests data from the distributed storage system, consistent hashing is used to locate the target data among the storage nodes. A distributed storage system based on Sheepdog has no single point of failure, because Sheepdog has a fully symmetric architecture without a central node such as a meta-data server. Even if some of the servers composing the Sheepdog cluster fail, the risk of losing VM image files and history data is small.

Fig. 3. Pure distributed storage architecture
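The consistent hashing lookup used by the pure distributed storage architecture can be illustrated with a minimal sketch. It is not Sheepdog's actual implementation: the class name `HashRing`, the use of SHA-1, the number of virtual nodes, and the object identifier `vdi-moodle-0001` are all assumptions chosen for illustration. The sketch shows how any node can compute, without consulting a meta-data server, which nodes hold the replicas of an object.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (a 64-bit integer)."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring (illustrative, not Sheepdog's data structure)."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node is placed on the ring many times (virtual nodes)
        # so that objects spread evenly across the nodes.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]
        self._node_count = len(set(nodes))

    def locate(self, object_id: str, copies: int = 3):
        """Return the distinct nodes that should hold the replicas."""
        copies = min(copies, self._node_count)
        i = bisect.bisect(self._points, _hash(object_id))
        found = []
        # Walk clockwise from the object's position, skipping repeated nodes.
        while len(found) < copies:
            node = self._ring[i % len(self._ring)][1]
            if node not in found:
                found.append(node)
            i += 1
        return found


nodes = [f"node{i}" for i in range(1, 9)]
ring = HashRing(nodes)
replicas = ring.locate("vdi-moodle-0001")

# Rebuild the ring without the first replica holder, as if that node failed.
smaller = HashRing([n for n in nodes if n != replicas[0]])
survivors = smaller.locate("vdi-moodle-0001")
print(replicas, "->", survivors)
```

In this sketch, removing one node leaves the other two replica holders unchanged and only adds one new holder, which is the property that lets a symmetric cluster tolerate node loss without a central index.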

In addition, each VM image can be accessed from another organization's private cloud fabric, because the Sheepdog-based distributed storage system is built as a single system across the private cloud fabrics of several organizations. VMs can therefore be rebooted on another organization's private fabric in a disaster situation. Where possible, VMs running at the affected organizations are moved to another private cloud fabric at lower risk and keep running there.

The private cloud fabric of each organization has a private cloud collaboration controller, which is built from a customized smartphone and the Libvirt Virtualization Toolkit [8]. Today's smartphones can generally receive disaster alert notifications, which are delivered over the mobile phone network as ETWS (Earthquake and Tsunami Warning System) messages [9]. Our customized smartphone passes the alert notification to the private cloud collaboration controller when it receives an ETWS message. On receiving the notification, the controller issues live-migration commands for the VMs it controls.
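The path from alert notification to live-migration command can be sketched as follows. The JSON payload format, the function names, the VM names, and the destination host `node5.org-b.example` are hypothetical; the paper does not specify how the smartphone hands the notification to the controller. The only external interface assumed is the standard `virsh migrate --live` command of libvirt, and the sketch only builds the command lines rather than executing them.

```python
import json

# Hypothetical alert kinds that should trigger evacuation of the VMs.
TRIGGER_KINDS = {"earthquake", "tsunami"}


def parse_alert(payload: str) -> dict:
    """Decode the notification forwarded by the smartphone (assumed JSON format)."""
    alert = json.loads(payload)
    return {"kind": alert["kind"], "received_at": alert.get("received_at")}


def build_migration_commands(alert: dict, vm_names, dest_host: str):
    """Return one `virsh migrate --live` argument list per VM, or [] if no action."""
    if alert["kind"] not in TRIGGER_KINDS:
        return []
    dest_uri = f"qemu+ssh://{dest_host}/system"
    return [["virsh", "migrate", "--live", name, dest_uri] for name in vm_names]


payload = '{"kind": "earthquake", "received_at": "2014-08-01T09:00:00+09:00"}'
commands = build_migration_commands(
    parse_alert(payload), ["moodle-a", "auth-a"], "node5.org-b.example"
)
for argv in commands:
    # A real controller could pass argv to subprocess.run instead of printing.
    print(" ".join(argv))
```

Keeping command construction separate from execution makes it possible to check what the controller would do for a given alert without touching running VMs.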

However, when running VMs migrate between private cloud fabrics, the users of each organization cannot necessarily keep using the services. The hostname used to access a service must continue to resolve under the original organization's FQDN (Fully Qualified Domain Name). In general, users of organization A access their own LMS through the FQDN of organization A. When a VM of organization A is running under the control of the private cloud fabric of organization B, that VM must still be reachable through the hostname of organization A. This function must operate at the same time as live migration.

We applied reverse network address translation (reverse NAT) to maintain user connectivity. The VMs providing LMS services migrate between private cloud fabrics. These fabrics are deployed behind the reverse NAT and are placed in the same Layer 2 segment by means of L2VPN technology. When a VM migrates from one private cloud fabric to another, the reverse NAT receives the migration status and can then rebuild the DNS host entry.
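The bookkeeping behind this step can be sketched as a small mapping table. This is only an assumption about how the host entry might be tracked: the class `ReverseNatTable`, the idea that each fabric exposes one gateway address, and the documentation-range addresses are all hypothetical and are not taken from the prototype. The point illustrated is that the FQDN seen by users stays fixed while its target follows the VM.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    fabric: str       # private cloud fabric currently hosting the VM
    gateway_ip: str   # address the public FQDN should resolve to (assumed)


class ReverseNatTable:
    """Maps a public FQDN to the fabric that currently hosts its VM."""

    def __init__(self, gateways):
        self._gateways = dict(gateways)   # fabric name -> gateway address
        self._entries = {}                # FQDN -> Backend

    def register(self, fqdn: str, fabric: str) -> None:
        self._entries[fqdn] = Backend(fabric, self._gateways[fabric])

    def on_migration(self, fqdn: str, new_fabric: str) -> None:
        """Handle a migration status report: the FQDN stays, the target changes."""
        self.register(fqdn, new_fabric)

    def resolve(self, fqdn: str) -> str:
        return self._entries[fqdn].gateway_ip


table = ReverseNatTable({"org-a": "192.0.2.10", "org-b": "198.51.100.10"})
table.register("lms.org-a.example", "org-a")
before = table.resolve("lms.org-a.example")

# The VM of organization A has been live-migrated to the fabric of organization B.
table.on_migration("lms.org-a.example", "org-b")
after = table.resolve("lms.org-a.example")
print(before, "->", after)
```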

As a result, we think this inter-cloud framework can help protect the e-Learning environment against disasters.

3 System Configuration

Figure 4 shows the configuration of the prototype system, which implements the proposed framework.

Fig. 4. Prototype system configuration

This system has four components and two internal networks. The first component is the node cluster, which is the core of our prototype system. It consists of eight nodes, node1 to node8. The cluster formed by node1 to node4 is placed in one private cloud fabric, and the cluster formed by node5 to node8 is placed in another. The two private cloud fabrics are located at physically different organizations and are connected by an L2VPN based on EtherIP. IPsec is used to secure the L2VPN tunnel. As a result, both private cloud fabrics are logically organized as a single cluster.

Each node in the private cloud fabric is based on the Intel architecture and has three network interfaces. Each node provides the KVM hypervisor, the virtualization API, and the Sheepdog distributed storage API. Each node therefore serves both as VM execution infrastructure and as a member of the Sheepdog distributed storage system. As a result, the same hardware is shared between VM execution and storage, which gives the storage both reliability and scalability.

The second component is a Software Defined Network (SDN) controller based on the OpenFlow [10] architecture. The servers composing the VM execution infrastructure act as OpenFlow switches based on Open vSwitch [11]. This function is used to set up optimal paths dynamically and to integrate the distributed storage of the individual fabrics.

The third component is the Virtual Machine Manager. virt-manager provides an administration interface with which the VM administrator manages the VMs on this prototype system. virt-manager uses the Libvirt Virtualization Toolkit to implement its VM management functions. Libvirt supports various hypervisors such as KVM/QEMU, Xen, and VMware ESX. Because Libvirt abstracts the hypervisor functions, a VM management application can control the status of VMs regardless of the hypervisor.

The fourth component is the private cloud collaboration controller. This controller has two functions: receiving earthquake alert notifications via the smartphone, and issuing live-migration commands to the target nodes. The controller also holds the status of each VM on the private cloud fabrics, obtained from the Libvirt Virtualization Toolkit and Virtual Machine Manager. When the controller issues live-migration commands for the target VMs, the migration is planned adaptively on the basis of the status of the managed VMs. As a result, an earthquake alert can trigger live migration of the VMs and preservation of the learning history through the Libvirt interface on this prototype system.

Our prototype system also has two internal networks. The first is a closed segment used for keep-alive communication and for transferring storage data between the Sheepdog distributed storage clusters. This network forms a single Layer 2 segment that connects the segments of the organizations by L2VPN over IPsec. The second internal network provides reachability to the Internet and connects users to the LMS services. In addition, this segment carries the connections for VM control in a secure environment with optimized packet filtering.

4 Experimental Use and Results

We tested this prototype system to confirm its effectiveness. We created virtual disk images and virtual machine configurations on the prototype system and installed an LMS, Moodle, on several VMs. In this experiment, each virtual disk image is 20 GB and each VM is allocated 4 GB of system memory. Table 1 presents the hardware specification of the nodes in the private cloud fabrics, and Table 2 presents the specifications of the OpenFlow controller and the private cloud collaboration controller.

Table 1. Specification of the private cloud nodes
Table 2. Specification of OpenFlow controller and private cloud collaboration controller

The prototype private cloud fabrics consist of eight nodes, each with a 250 Gbyte HDD, so the total physical HDD capacity is about 2.0 Tbytes. Each clustered node uses about 4 Gbytes for the operating system and the hypervisor function, which we consider negligible. However, the distributed storage system was configured with triple redundancy for this test. As a result, about 700 Gbytes of storage capacity is available with sufficient redundancy. The total capacity of the distributed storage system can be extended by adding nodes, by replacing the HDDs with larger ones, or by both. This distributed storage system therefore provides sufficient scalability and redundancy.
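The capacity figures can be checked with a short calculation. This is a rough estimate under stated assumptions: decimal Gbytes, every object stored in exactly three copies, and no file system or Sheepdog overhead. The function name `usable_capacity_gb` is introduced here for illustration only.

```python
def usable_capacity_gb(nodes, disk_gb, system_gb, copies):
    """Return (raw capacity, usable capacity) in Gbytes for a replicated cluster."""
    raw = nodes * disk_gb
    available = raw - nodes * system_gb   # space left after OS and hypervisor
    return raw, available / copies        # every object is stored `copies` times


raw, usable = usable_capacity_gb(nodes=8, disk_gb=250, system_gb=4, copies=3)
print(f"raw: {raw} Gbytes, usable with 3 copies: {usable:.0f} Gbytes")
```

Under these assumptions the estimate is about 656 Gbytes, the same order as the approximately 700 Gbytes stated above; the exact value depends on unit conventions and rounding that the text does not specify.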

We performed live migration on our prototype system in two cases. The first case is live migration within the same private cloud fabric, which corresponds to migration inside one organization. The second case is live migration between private cloud fabrics, which corresponds to migration between organizations.

Table 3 shows the live-migration times measured in the experiment. We operated live migration of the VM through the interface of Virtual Machine Manager. Live migration within the same private cloud fabric took 23.8 s, and live migration between private cloud fabrics took 24.6 s. We think both times are short enough to reduce the impact of a disaster on the provided VMs. Moreover, every migration completed successfully while the VM remained active.
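The difference between the two measured times can be expressed as a relative overhead. The short calculation below uses only the two values reported above; the variable names are introduced for illustration.

```python
intra_s = 23.8   # live migration within one private cloud fabric
inter_s = 24.6   # live migration between private cloud fabrics

extra_s = inter_s - intra_s
overhead_pct = extra_s / intra_s * 100
print(f"extra time: {extra_s:.1f} s ({overhead_pct:.1f} %)")
```

For these measurements, crossing the L2VPN between organizations adds less than one second to the migration time.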

Table 3. Time of live migration

In addition, live migration in this experiment was also operated by the private cloud collaboration controller, triggered by the customized Android-based smartphone. We consider the result of this experiment satisfactory, because the time required to migrate the VMs was short. However, these results were obtained under initial conditions. A VM whose LMS is heavily used has a large virtual disk image, so live migration will take longer than under the initial conditions. We therefore need to run the experiment under actual operating conditions.

In a real deployment, we expect to use emergency disaster notifications from mobile communication carriers such as NTT DoCoMo, KDDI, and Softbank, received through their smartphones. The custom application program would be installed on smartphones running platforms such as Android and iPhone. If the emergency notification can be passed from the smartphone over a short-range connection such as USB or Bluetooth, we will be able to trigger live migration of the VMs more precisely.

5 Conclusion

In this paper, we proposed a framework of disaster recovery for the e-Learning environment. We described how the proposed framework assists disaster recovery, and we showed the importance of protecting the e-Learning environment against earthquake and tsunami disasters. We built a prototype system based on the proposed framework and described its configuration. We also showed and examined the results of experimental use.

In future work, we plan to implement the function of receiving earthquake notifications from other smartphones, such as iOS-based smartphones. We will also test cloud computing orchestration frameworks such as OpenStack and CloudStack. Finally, we will run experiments to confirm the effectiveness of the framework in an inter-organization environment with multiple sites.