
1 Introduction

In large scientific projects with detectors for data acquisition, access to the detectors is essential for management and control. In such projects the detectors can be distributed over a few thousand square kilometers, e.g., in the Auger Observatory [11]. In such distributed systems, access to the sensors must be managed via uniform interfaces from a central campus. Even inside the campus, the security requirements for accessing the detectors are high when they are distributed over such a large area. Any unauthorized access must be blocked to ensure that the measured data is authentic.

The security requirements are even higher if access to the detectors is allowed from outside the central campus, i.e., from anywhere in the world. This remote functionality is necessary for developers and scientists to obtain status information about running components of the system and is also intended to give the best possible support to operators at the experiment site in case of severe problems with the detectors. Access to the data acquisition (DAQ) and Slow Control systems for the “outside world” has to be handled very carefully, and unauthorized access has to be strictly prohibited to ensure that the measured data is authentic. The Slow Control system is responsible for controlling the runtime environment and for checking whether the conditions are suitable for data acquisition.

However, software with remote functionality has to meet more requirements than just the security aspects. For the Auger Observatory in particular, these are as follows:

  • During deployment of the observatory, the part of the system already deployed is being used to measure data with the detectors.

  • The existing software for some components of the Auger Observatory was not designed to be remotely accessible from all over the world.

  • Access from abroad to a few important components of the existing systems can sometimes be very slow. One of the main objectives of the AugerAccess project is to provide a new, fast, and reliable link to the Auger Observatory.

These are the reasons why the existing system of computers and networks cannot be used for the development of new software components: it is not acceptable to use a running production system for development and risk losing data during that time. Hence, another solution must be found for developing the new software. Two different approaches are possible:

  • All necessary hardware components (like computers and switches) can be bought and deployed in the same way as in the existing system in Argentina.

  • All computers and the corresponding networks can be virtualized as virtual machines (VMs) on a single dedicated server. A VM is an abstraction of the resources of a computer; it hides the physical characteristics of these resources from the software running inside the VM [19]. Working with a VM is like working with a normal computer, but in reality a hypervisor mediates access to the physical hardware. In this way the resources of one physical computer can be shared efficiently by several VMs.

2 Fundamentals

Extending the software of the fluorescence detectors (FD) of the Auger Observatory (see Section 2.1) with remote functionality requires major changes in the existing software. Therefore, a testbed is needed during development to test the integration of the new software with all subsystems of the Auger Observatory.

In the following sections the aims of the Auger Observatory and the AugerAccess project, as well as the requirements for a testbed, are discussed.

2.1 Auger Observatory

The Auger Observatory is intended to study the universe's highest-energy particles, with energies of over 10²⁰ electron volts (eV). Those particles shower down on earth in the form of cosmic rays. Cosmic rays with low to moderate energies are well understood, but not much is known about those with extremely high energies. The Auger Observatory detects and studies those rare particles in order to reveal the enigmas of their origin and energy distribution [11].

Cosmic rays are charged particles that constantly hit the earth. When those particles reach the atmosphere, they collide with other particles and produce cascades of secondary particles called “extensive air showers.”

Cosmic rays with energies above 10¹⁹ eV arrive on earth at a rate of only 1 particle per km² per year. The most interesting ones, having energies above 10²⁰ eV, have an estimated arrival rate of just 1 particle per km² per century. In order to collect a sufficient number of events, a very large detector area and a long observation time are required.

The Auger Observatory is a “hybrid detector” measuring cosmic ray showers with two independent methods. The water-Cherenkov detectors, also referred to as surface detectors (SD), form the ground-based technique and detect particles in the shower when their electromagnetic shock waves produce Cherenkov light [19] during their interaction with the water in the SD. If such a cosmic ray shower strikes an SD, Cherenkov light is produced, which is detected by photomultiplier tubes mounted inside the water tanks. The Auger Observatory consists of 1,600 SDs arranged in a triangular grid with 1.5 km spacing. In total the corresponding detector area is about 3,000 km², roughly 30 times the area of Paris [11].

The FDs are the second method of observation and measure the longitudinal development of cosmic ray showers via the fluorescence light emitted by excited molecules in the air along the shower tracks. For this purpose the Auger Observatory comprises four telescope buildings (FD Sites), each with six telescopes, where every telescope covers a field of view of 30° × 30°. Data readout from the telescopes is done by so-called front-end crates, specialized hardware devices developed for the Auger Observatory that handle the complete real-time behavior of the measurement. To cover the complete atmosphere above the SD area, the FD Sites are arranged at the borders of the SD area (see Fig. 31.1).

Fig. 31.1

The whole detector area for measuring cosmic ray showers, as currently deployed in Argentina. The dots mark the surface detector positions and the lines show the fields of view of the 24 fluorescence detectors [11]

The Auger Observatory consists of several more systems than just FD and SD. For example, a LIDAR network is deployed for the measurement and online monitoring of atmospheric optical parameters [8], and a ballooning station for atmospheric monitoring exists.

2.2 AugerAccess

As already mentioned, the Auger Observatory is located in a remote region far from the participating research institutions. The presence of scientists and technicians from the collaborating institutions on site is currently necessary during the phases of installation and commissioning of the detectors. However, in the long term it will be very difficult to maintain such a level of involvement on site once scientists are busy with data analysis at their home institutions all over the world.

The main goals of AugerAccess are to improve the communication link between the Auger Observatory and the international networks and to add remote monitoring and remote control functionality to the software, providing access to the running experiment for the community worldwide [10].

At the moment, access to the central campus in Argentina from the “outside world” is blocked by a firewall, allowing only a very few developers to connect from outside. To open access to the whole Auger collaboration, methods have to be provided that ensure only authorized users can gain access. The Grid Security Infrastructure (GSI) [14] of the Globus Toolkit 4 (GT 4) [16] is the technique chosen to meet the security requirements. The main features of the GSI [14] are as follows:

  • Security across organizational boundaries, without the need for a centrally managed security system.

  • Support of “single sign-on” for users of the Grid, including delegation of credentials for computations that involve multiple resources.

To provide the GSI, a GT 4 Grid Services Container has to be installed at the central campus and the firewall must be opened for communication with the Grid Services Container. Every user operating with the remote system needs a valid X.509 certificate from a certificate authority (CA), containing information to identify and authenticate the user; the GSI compares the user's X.509 certificate against the known CA keys. In addition, the user must first be authorized by an administrator to operate with the GT 4 installation. If a user or a CA is unknown, the request is rejected and access is denied. Official CAs exist in countries worldwide, signing personal certificates only after checking the identity of the prospective users [19].
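
As an illustration of this certificate check, the following minimal sketch verifies a user certificate against a locally known CA certificate using the Python cryptography package. The file names are hypothetical placeholders, and the real GSI performs additional steps (proxy delegation, revocation lists, and policy checks) that are omitted here.

    # Minimal sketch: check that a user's X.509 certificate was signed by a known CA.
    # File names are placeholders; the real GSI also handles proxies, CRLs, and policies.
    from datetime import datetime
    from cryptography import x509
    from cryptography.hazmat.primitives.asymmetric import padding

    with open("user_cert.pem", "rb") as f:        # hypothetical user certificate
        user_cert = x509.load_pem_x509_certificate(f.read())
    with open("trusted_ca.pem", "rb") as f:       # hypothetical trusted CA certificate
        ca_cert = x509.load_pem_x509_certificate(f.read())

    # 1. The certificate must currently be valid.
    now = datetime.utcnow()
    assert user_cert.not_valid_before <= now <= user_cert.not_valid_after

    # 2. The signature on the user certificate must verify against the CA public key
    #    (raises an exception if the signature is invalid); assumes an RSA-signed cert.
    ca_cert.public_key().verify(
        user_cert.signature,
        user_cert.tbs_certificate_bytes,
        padding.PKCS1v15(),
        user_cert.signature_hash_algorithm,
    )
    print("certificate accepted for", user_cert.subject.rfc4514_string())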

2.3 Testbed

During development of the new software components with remote functionality a testbed is required, as the original system in Argentina is already in production while it is still being deployed. The requirements for the testbed are as follows:

  • The testbed must have a similar structure to the original system in order to simulate the interaction of the different subsystems of the observatory.

  • The components used must be similar in the number of computers and in the network hardware. The computers have to be connected to the network as in the original system to reproduce the interaction between them.

  • The software for DAQ and Slow Control must be installed and usable on the testbed. The services installed in the Auger Observatory must also run; e.g., a large number of the computers in Argentina are diskless clients that boot over the network.

  • The testbed must be easy to configure and administer. The installation must also have high availability, like the original system.

Before deciding on a concrete solution for the testbed, the existing system was analyzed in order to choose the best available option. The systems in the Auger Observatory consist of several private networks with different subnet masks (see Fig. 31.3). The FD LAN is the network in the central campus and is responsible for the communication with DAQ, Slow Control, and the “outside world.” Every FD Site has its own Eye LAN, containing several computers (see Fig. 31.2). The main ones are the Eye PC, the Slow Control PC, and the Calibration PC. On the Eye PC the software for controlling the telescopes and retrieving data from them is installed. The Slow Control PC runs the Slow Control system, and the Calibration PC is used for the calibration of the telescopes at the beginning and end of measurements. Each Eye LAN connects to a Mirror LAN containing six Mirror PCs. The Mirror PCs are responsible for the communication with the telescopes and for retrieving the measured data from them. In total, the three types of LAN comprise nine switches with nine different subnets and 39 computers, which is only about half of the computers in the observatory.

Fig. 31.2

The structure of one FD Site of the Auger Observatory with Eye and Mirror LAN. Behind the Mirror PCs are the front-end crates, which are responsible for collecting the data from the telescopes that are visible behind the crates

The configuration of the software is of similar complexity to that of the hardware. It starts with a fixed version of the operating system (OS), involves several network services such as the Domain Name System (DNS) [6, 7] and the Dynamic Host Configuration Protocol (DHCP) [2], and ends with the network boot of several computers.

Given the complexity of the system with all its networks and computers, a testbed built from physical hardware is too expensive [13], especially considering the necessary administration effort and the space needed to deploy the hardware and network components. Therefore, virtualization of the system was chosen as the solution, which has several advantages. The major ones are as follows:

  • If one server does not have enough computational power, two or more can be integrated into a cluster for better scalability of the VMs. A VM deployed on one server of the cluster can communicate with a VM on another server as if they were on the same server.

  • The installation effort for VMs is much lower than for physical hardware. As already mentioned, the OS on the computers is a fixed version, and the same holds for the software. A VM needs to be installed only once and can then be cloned as often as needed. In a cloned VM only the hostname and network configuration must be changed, which is a very fast procedure.

  • The installation and administration of the VMs can be done from a single client, which simplifies their administration.

  • The costs of the complete testbed, including space and cooling, could be reduced to the costs of a single virtualized system [13].

Such a testbed is configured once and may then be used anywhere in the world. If other members of the Auger collaboration need the whole system, or parts of it, they can use the existing installation or simply a copy. We aim to explore its usability for software testing of the DAQ. To use the system, administrators and users must learn how to use and configure the VMs and how to handle the problems that arise with this technology. To get an idea of the complexity of the FD system, Fig. 31.3 shows all networks and the connections of every computer needed for measuring data with the FD in Argentina.

Fig. 31.3

The scheme of the network needed for the FD. Every computer and its connection to the network via switches and network cards is shown

3 Architecture

For the virtualization of the Auger Observatory, a powerful server and software to run the VMs are necessary. An IBM blade server with two dual-core Xeon CPUs and 16 GB of memory is used. VMware ESX Server [18] is installed on the server as the hypervisor that abstracts the physical hardware and provides the virtualization. ESX Server was chosen because it handles a large number of VMs without problems, as the product was designed for this kind of application.

The architecture of the virtualized network is nearly the same as in the original system in Argentina, with one small difference concerning the networks. In Argentina every FD Site is connected to the central campus over a radio link, which cannot be simulated with the ESX Server. The chosen solution is to deploy virtual switches instead. A virtual switch is the virtualization of a hardware switch and is responsible for connecting virtual network cards to virtual networks. The fact that the radio link cannot be virtualized is negligible, as it makes no difference to the software how the network is physically connected; only the connectivity and the communication protocol are important.

Some additional behavior of the original system also cannot be virtualized, as the ESX Server does not support the corresponding functionality. For example, in the original system some computers reboot intermittently because of frequent power outages in Argentina. Another example is network problems, which can result in slow and unreliable connections where some of the sent packets are lost during transport. Simulating such network failures is possible, but only with considerable configuration effort for the VMs. Since this behavior is not needed, it was not configured.

3.1 Virtual Machines

As a first step of the virtualization, every existing computer of the FD system must be deployed as a VM on the ESX Server. For this purpose, one VM was installed as a template for all VMs. Creating a VM on an ESX Server is very simple with the client provided by the server installation: only the name and the hardware to virtualize (such as network cards or hard disks) must be specified. After creation, the VM boots like a normal computer, also from a CD-ROM, so that installing an OS is straightforward.

Currently, three computers of the FD LAN, visible in Fig. 31.3, are deployed on the ESX Server: Ipecdas, Helge, and Gina. Ipecdas is the gateway to the Internet for all computers in the private network and contains the firewall responsible for denying the “outside world” access to the Auger networks. Gina is responsible for the interaction with the DAQ and Helge for the interaction with the Slow Control. Two of the four FD Sites, Los Leones and Los Morados, are deployed at the moment. Each of them consists of an Eye PC, a Calibration PC, a Slow Control PC, and six Mirror PCs. The FD Sites Loma Amarilla and Coihueco are not deployed on the testbed, because two FD Sites are sufficient to fulfill the current requirements and already provide the structural complexity needed for the development of the new software components. If the missing FD Sites are needed, deploying them is straightforward using the created VM templates.

After creation, the required OS was installed in the template VM. The FD systems use three different OSs: the Calibration PCs run Red Hat Linux 9 [12], the Slow Control computers and Helge use Windows 2000 [4], and the rest of the VMs run SuSE Linux 9.1 [9]. Since three different OS versions are needed, three different templates had to be created, from which the corresponding VMs can be cloned. In the cloned VMs only the network configuration and hostname must be changed. The names of the different VMs and their locations in the network can be seen in Fig. 31.3.

3.2 Virtual Networks

To simulate the communication of the VMs as in the original system, they must be connected to virtual networks. This is done by virtual switches. To connect a VM to a virtual switch, the VM needs a virtual network card, nearly the same procedure as with physical hardware.

Each VM has at least one virtual network card, and some VMs have more than one in order to virtualize the whole network correctly. For example, the Eye PCs have two network cards. This is necessary because the Mirror LAN has its own subnet: for the Mirror PCs to communicate with the DAQ (running on the Eye PCs), each Eye PC needs two network cards and routes between both subnets. The second network card is also required because the Mirror PCs are diskless clients that boot their file system over the network from their corresponding Eye PC.

There is one difference between the network shown in Fig. 31.3 and the deployed virtual network: the default gateway of every computer in the Eye LAN has the IP (Internet Protocol) address 192.168.*.70, but a default gateway cannot be set up for an ESX Server virtual switch, as this option is not available in the switch configuration. For the correct behavior of the testbed it is important that all VMs in the FD LAN can communicate with the worldwide networks. The only way to provide this is to add virtual network cards with the IP address 192.168.*.70 to the Ipecdas computer (see Fig. 31.3). The default gateway in the network settings of each VM in the FD LAN is 192.168.*.70, so communication with the worldwide networks is possible if the Ipecdas computer acts as a router. To act as a router, the Ipecdas computer must provide NAT (Network Address Translation), which is easy to realize on a Linux OS with IP tables (the iptables program under Linux) [17]. NAT means rewriting the source and/or destination addresses of IP packets as they pass through a router or firewall [3, 19].
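
A minimal sketch of such a NAT setup on a Linux gateway like Ipecdas is shown below, wrapped in a small Python script. The interface name eth0 for the uplink is an assumption, and the actual firewall rules on Ipecdas are more extensive than these three commands.

    # Minimal sketch: enable IP forwarding and masquerading NAT with iptables.
    # The uplink interface name is an assumption; real rules on Ipecdas are stricter.
    import subprocess

    UPLINK = "eth0"   # assumed interface towards the outside world

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # 1. Allow the kernel to forward packets between the network cards.
    run(["sysctl", "-w", "net.ipv4.ip_forward=1"])

    # 2. Rewrite the source address of outgoing packets to the uplink address
    #    (masquerading), so replies find their way back through the gateway.
    run(["iptables", "-t", "nat", "-A", "POSTROUTING", "-o", UPLINK, "-j", "MASQUERADE"])

    # 3. Forward outgoing traffic and only established/related traffic coming back in.
    run(["iptables", "-A", "FORWARD", "-o", UPLINK, "-j", "ACCEPT"])
    run(["iptables", "-A", "FORWARD", "-i", UPLINK,
         "-m", "state", "--state", "ESTABLISHED,RELATED", "-j", "ACCEPT"])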

3.3 Services on the network

For the correct behavior of the virtualized network, it is important that all services running in the original system in Argentina are also running in the virtualized network. Otherwise, the testbed cannot be used for development purposes.

For the communication between the VMs, the software, and the users, using raw IP addresses while operating the system is not convenient. A better way is to address a computer by name. Therefore, a DNS installation is provided in the virtual network.

To have a working lookup for the complex virtual network, several DNS servers must be installed on different computers. The DNS server for the FD and Eye LANs runs on Gina (see Fig. 31.3), allowing name resolution for all computers in both LAN segments. The Mirror LANs cannot communicate with Gina, as each Mirror LAN has its own subnet. Therefore, a DNS server runs on every Eye PC, providing name resolution for the Mirror PCs in the respective Mirror LAN.
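
A quick way to verify this split DNS setup from inside a VM is a simple resolution check. In the sketch below, only "gina" corresponds to a host named in this chapter; the other hostnames are illustrative placeholders, since the actual naming scheme of the Eye and Mirror PCs is not reproduced here.

    # Minimal check of name resolution inside the testbed.
    # Hostnames other than "gina" are illustrative placeholders.
    import socket

    for host in ["gina", "eyepc-leones", "mirror1-leones"]:
        try:
            print(f"{host:18s} -> {socket.gethostbyname(host)}")
        except socket.gaierror as err:
            print(f"{host:18s} -> lookup failed ({err})")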

Most VMs have static IP addresses in the network configuration of the corresponding OS. However, this is not possible for the Mirror PCs, because they are diskless clients that mount their root file system over the network. Therefore, the IP addresses of the Mirror PCs are obtained via DHCP. DHCP allows network clients to request and obtain an IP address from a server that maintains a list of addresses available for assignment [2, 19]. During boot-up of a Mirror PC, the IP address is obtained from the DHCP server running on the corresponding Eye PC.

After its IP address has been obtained, the network boot of a Mirror PC can be executed. This is provided by the Preboot Execution Environment (PXE). PXE is a specification for booting an OS from a network server using a network boot ROM (read-only memory) that conforms to the Intel PXE specification [1]. The scenario for booting over PXE is as follows (a configuration sketch is given after this list):

  1. An IP address is requested from a DHCP server.

  2. A Trivial File Transfer Protocol (TFTP) server runs on the computer with the DHCP server. “TFTP is a very simple file transfer protocol with the functionality of a very basic form of FTP” [19]; it delivers the configuration (which kernel image to boot) to the booting diskless client.

  3. The location of the kernel image for booting the diskless client, as well as the address of the TFTP server, is declared in the DHCP configuration.

  4. The file system(s) to be mounted by the booted kernel must be exported via the Network File System (NFS). NFS allows access to files over the network as easily as if they were on a local disk [15, 19].

Finally, the file systems needed by the Mirror PCs are exported via NFS from the Eye PCs. These are the root file system, the home directories of the Mirror PC users, the directories for the data measured with the FD, and the directory for the log files produced by the software.
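
To make steps 1 to 4 concrete, the following sketch prints the kind of configuration fragments an Eye PC would need for one diskless Mirror PC (ISC dhcpd host entry and NFS exports). All host names, MAC and IP addresses, paths, and export options are hypothetical placeholders, not the actual configuration used in Argentina.

    # Sketch of the configuration pieces an Eye PC needs for one diskless Mirror PC.
    # All names, addresses, paths, and options below are hypothetical placeholders.
    MIRROR = {"name": "mirror1", "mac": "00:16:3e:00:00:01", "ip": "192.168.100.11"}
    EYE_IP = "192.168.100.1"          # Eye PC acting as DHCP, TFTP, and NFS server

    dhcpd_conf = f"""
    host {MIRROR['name']} {{
        hardware ethernet {MIRROR['mac']};
        fixed-address {MIRROR['ip']};
        next-server {EYE_IP};          # TFTP server delivering the boot files
        filename "pxelinux.0";         # PXE boot loader that loads the kernel image
    }}
    """

    exports = f"""
    /srv/mirror/root   {MIRROR['ip']}(rw,no_root_squash,sync)   # root file system
    /srv/mirror/home   {MIRROR['ip']}(rw,sync)                  # user home directories
    /srv/mirror/data   {MIRROR['ip']}(rw,sync)                  # FD measurement data
    /srv/mirror/logs   {MIRROR['ip']}(rw,sync)                  # software log files
    """

    print(dhcpd_conf)   # fragment for the DHCP configuration (dhcpd.conf)
    print(exports)      # fragment for the NFS exports file (/etc/exports)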

4 Status and Results

All the VMs and virtual network devices required for the testbed are now deployed and configured. The virtual network environment is set up, so all virtual network cards are connected to their corresponding virtual switches. Furthermore, the essential network services, as described in Section 3, have been configured and are up and running. It can therefore be stated that the deployed testbed offers very similar capabilities to the original system in Argentina, apart from the following minor differences:

  • The variable radio link between the central campus and every FD Site is not reproduced;

  • The default gateway in every subnet is integrated via a separate virtual network card;

  • The NAT of Ipecdas (see Fig. 31.3) is handled by a VM with IP tables instead of a dedicated router.

Nevertheless, these differences are not relevant for the deployed software, since the physical transport has no impact in this case; the correct transmission of the sent packets is the only requirement.

The software needed for measuring data with the FD is already installed, and the DAQ system runs without errors. However, the DAQ produces only “dummy data,” which is based on real data but with different time stamps. This is necessary because no telescope or front-end crate is available in the testbed.

Regarding the Slow Control system, the two PCs for the Los Leones and Los Morados FD Sites are installed and can be accessed in the same way as in the original system in Argentina, although any invoked operation has no further effect because the physical hardware is missing. For example, the “open shutter” procedure causes an error because the corresponding shutter hardware does not respond.

Thus, the testbed is usable for the development of the remote software. Another advantage is that it is available to other partners of the Auger collaboration if needed; they can use the existing installation or simply a copy.

The virtualization also has some limitations. In particular, hardware failure scenarios cannot be simulated, e.g., a sudden shutdown of a device because of a power outage. If needed, such behavior has to be simulated by manually switching the corresponding VMs off and on. The telescopes of the Auger Observatory also cannot be simulated, since they consist of special hardware for which no virtualized component is available. Nevertheless, real telescopes can be connected by configuring the ESX Server to add physical hardware to the virtual infrastructure.

4.1 Faced Problems

Several problems occurred during deployment of the virtual system, most of which have been solved. One problem was the time synchronization of the VMs. Initially we used the NTP (Network Time Protocol) [5] client of the VMs to synchronize the time with a publicly available time host. However, after an uptime of a few hours the VMs' time differed strongly from the real time. The reason is that the ESX Server puts VMs into a sleep mode when they produce no system load. After a VM resumes, NTP does not recognize the gap between the current system time and the actual time, which leads to the observed time difference. The solution is to use the time synchronization offered by the VMware Tools, which requires additional kernel parameters during start-up of the VM.
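
A simple way to detect such drift from inside a VM is to query an NTP server and inspect the reported offset. The sketch below uses the third-party ntplib package, a public pool server, and an arbitrary one-second threshold as examples; it is not the monitoring actually used in the testbed.

    # Sketch: measure the offset between the VM clock and an NTP server.
    # Server name and threshold are example values, not the testbed setup.
    import ntplib

    client = ntplib.NTPClient()
    response = client.request("pool.ntp.org", version=3)

    # response.offset is the estimated difference (in seconds) between the
    # local clock and the NTP server; large values indicate the drift problem.
    if abs(response.offset) > 1.0:
        print(f"clock off by {response.offset:.1f} s - check VMware Tools sync")
    else:
        print(f"clock within {response.offset:.3f} s of NTP time")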

Another problem arises from the fact that all VMs are stored in a storage area network (SAN). A SAN provides remote storage devices, e.g., disk array controllers, to server systems in such a way that the operating system recognizes the SAN storage device as a local storage device [19]. If the response time of the SAN exceeds a certain timeout value due to heavy system load, the VM changes the state of the mounted file system to “read only.” In this mode the VM is not able to work properly and therefore has to be rebooted. This is a problem known to VMware, and a patch is already provided. However, applying this patch is not possible in the current configuration of the operating systems used, since the patch is only available for newer Linux kernel versions, and the operating systems deployed in Argentina use older kernels. We consider this problem negligible, as it occurs only about once every 2 or 3 months, which by far exceeds typical development and testing intervals.

4.2 Remote Client

Since the testbed is fully deployed with all required VMs, the communication services in the testbed are configured, and the existing Auger software is able to “measure” data, the testbed can be used for the development of the remote client that controls the DAQ and Slow Control systems of the observatory.

The remote client is service-based software and uses the GSI of GT 4 for communication with the respective software inside the observatory (see Fig. 31.4). The security features are only needed for access to the system from outside the Auger campus; no additional security features are needed inside the campus. A remote user can only connect to the system after authorization and authentication via his or her X.509 certificate has succeeded. This is only possible if the X.509 certificate is known to the GT 4 Grid Services Container the user is communicating with. To control the system, the GT 4 Grid Services Container is extended with several Grid services, which are responsible for performing operations on the DAQ. All operations are performed via the remote client, which is also responsible for the authorization and authentication of the user.

Fig. 31.4

Screenshot of the remote client with the dialog for adding or removing telescopes opened. The current status of the telescope building at Los Leones is visible

After a user's access has been permitted, the accessed Grid service performs the corresponding operation on the Eye PC of the respective telescope building. The operation to perform, which is declared in a special protocol, is sent in a SOAP message to the data acquisition software on the Eye PC, executed there, and the remote client is notified about the status of the performed action. In order to receive, evaluate, and execute the SOAP messages, the existing data acquisition software has to be adapted, because at the moment it can only be controlled via a graphical user interface inside the central Auger campus; this graphical user interface currently implements the FD communication protocol.
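
As a rough illustration of such a message, the sketch below builds a minimal SOAP 1.1 envelope with Python's standard library. The operation name, namespace, and parameter are invented placeholders; the actual FD command protocol carried in the SOAP body is not reproduced here.

    # Sketch: build a minimal SOAP 1.1 envelope for a DAQ command.
    # Operation name, namespace, and parameter are hypothetical placeholders.
    import xml.etree.ElementTree as ET

    SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
    DAQ_NS = "urn:example:auger:daq"                    # placeholder namespace

    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, f"{{{DAQ_NS}}}StartRun")   # hypothetical operation
    ET.SubElement(op, f"{{{DAQ_NS}}}telescope").text = "LosLeones-1"

    print(ET.tostring(envelope, encoding="unicode"))    # message sent to the Eye PC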

The Slow Control system is already controlled via SOAP messages, which would in principle allow the development of a uniform access method to the DAQ and Slow Control systems. The problem here is that the Slow Control system polls its status every few seconds, and the remote software would have to poll the status again, as the server responsible for the status is not directly accessible. This would double the communication and the network load and is therefore not advisable. The decision is thus to use a GSI-SSH server with port forwarding to the Slow Control server, enabling worldwide access to the Slow Control system. The SSH connection can be opened and closed from within the remote client, so that interaction with the Slow Control system is possible with a single operation.
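
A minimal sketch of such a tunnel is shown below, started from the remote client via a subprocess. It assumes that the GSI-enabled SSH client (gsissh) accepts the standard OpenSSH -L and -N options; the host names and port numbers are hypothetical placeholders.

    # Sketch: open a local port forward to the Slow Control server via GSI-SSH.
    # Host names and ports are hypothetical; gsissh is assumed to accept the
    # standard OpenSSH -L and -N options.
    import subprocess

    LOCAL_PORT = 8080                      # port the remote client connects to
    SLOWCONTROL = "slowcontrol-leones"     # placeholder Slow Control PC name
    GATEWAY = "gateway.auger.example"      # placeholder GSI-SSH entry point

    tunnel = subprocess.Popen([
        "gsissh", "-N",                               # no remote command, tunnel only
        "-L", f"{LOCAL_PORT}:{SLOWCONTROL}:80",       # forward local port to Slow Control
        GATEWAY,
    ])

    # ... the remote client now talks to the Slow Control system via localhost:8080 ...
    tunnel.terminate()                     # close the tunnel when the operation is done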

5 Discussion and Future

The virtualized FD network as a testbed for the development of remote control and monitoring software is a very helpful solution and much easier to use than an implementation with physical hardware. The virtualization is very flexible and easy to extend if necessary: simply clone a new VM, or even an entire FD Site, and connect it to the virtual network. The virtualized solution also has some restrictions, e.g., radio links cannot be simulated. However, these restrictions are not relevant for the development of the new software components.

The virtualized system fulfills all our requirements, and the current DAQ system is already installed and running without any problems. The development of new software components has already begun, so the testbed is in productive use for its intended purpose.

After the new software is completely developed and tested, the version currently running in Argentina can be replaced with the new one. The remote functionality will only be available after an upgrade of the communication link to the Auger Observatory.

Another possibility we are exploring is to run the whole FD system in Argentina with VMs. For this, the hardware must be virtualized so that VMs can be executed; software like ESX Server or Xen [20] can be used. Such a solution would have some advantages: if a new software version has to be installed, the whole VM can be configured and tested on our testbed. If all tests pass successfully, the currently running VM in Argentina could be halted and the “new version” started. This would offer a very fast way of changing the software and also a very easy fallback solution in case of any error: the old VM simply has to be booted again, and the old system would be up and running within a few minutes.

6 Conclusions

Virtualization is a very powerful and flexible solution for deploying a testbed as complex as the computer and network infrastructure of the Auger Observatory. Deploying a virtual testbed is easier than deploying a second infrastructure using real hardware. The advantages of virtualization are as follows:

  • The virtualized infrastructure is very easy to extend with new components, e.g., by cloning existing virtual machines and integrating them into the system.

  • The administration of the whole system can be done with a single client installed anywhere. Changing the system configuration or rebooting a VM is therefore very easy.

  • The host server can be shared with other projects if it has enough resources, or several servers can be combined into a cluster to provide the needed resources.

  • After the lifetime of the project the server can be reused for other projects, and backing up VMs is easier than decommissioning physical hardware.

Virtualization does not only have advantages: it is a new technology, and users have to learn how to use the provided features. One example is the time synchronization via the VMware Tools instead of NTP. However, the technology is not difficult to learn, and the advantages outweigh the disadvantages.