An AIPS-based, distributed processing method for large radio interferometric datasets
The data output rates of modern radio interferometric telescopes make the traditional data reduction process impractical in many cases. We report on the implementation of a lightweight infrastructure, named AIPSLite, that enables the deployment of AIPS interferometric processing routines on distributed systems in an autonomous and fault-tolerant manner. We discuss how this approach was used to search for sources of 6.7 GHz methanol maser emission in the Cep A region with the European VLBI Network (EVN). The field was searched out to a radius of 1.25 arcmin at milli-arcsecond spatial resolution, using 1024 frequency channels with 0.088 km s⁻¹ velocity resolution. The imaged data was on the order of 30 TB. Processing was performed on 128 processors of the Irish Centre for High End Computing (ICHEC) Linux cluster with a run time of 42 h, for a total of 212 CPU days.
Keywords: Interferometry data processing, Distributed processing, AIPS, ParselTongue, AIPSLite
The landscape of radio interferometric telescopes is currently in a state of flux. Existing telescopes are being upgraded and entirely new telescopes are being built, deploying modern digital data transport and processing equipment. These include instruments which have been substantially upgraded, such as eMerlin and the Jansky Very Large Array (JVLA), instruments which have had a series of incremental upgrades, such as the European VLBI Network (EVN) and Very Long Baseline Array (VLBA), as well as a slew of new instruments which are currently being built and commissioned, such as Lofar, ASKAP, and MeerKat. Common to all these instruments is their capacity to produce data volumes orders of magnitude larger than their predecessors.
Two technologies are common to many, if not all, of the upcoming generation of interferometers and result in large data output rates: wide-band receivers and high-throughput correlators. Motivated by the desire for increased sensitivity as well as coverage of previously unobserved frequency ranges, wide-band receivers are becoming the norm. For continuum sources with a relatively flat spectrum over the observed band, the sensitivity of the instrument scales with the square root of the observing bandwidth. While the previous generation of interferometers often had a bandwidth of less than 100 MHz, current technology allows receivers to cover on the order of an octave, providing 1–8 GHz of bandwidth in centimeter bands. It is desirable to subdivide the receiving bandwidth into narrow sub-channels for a number of reasons. Firstly, this allows the mitigation of radio frequency interference (RFI), unwanted man-made signals which are often narrow-band. Secondly, for reasons described below, imaging wide fields places restrictions on the maximum channel width that is acceptable if high-fidelity images are to be produced. Thirdly, offline calibration techniques allow for accurate determination of the antenna receiver bandpass gain parameters, thus allowing correction and the application of weighting parameters to avoid sensitivity losses, though modest frequency resolution is adequate for this purpose.
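For orientation, the sensitivity scaling quoted above follows from the standard radiometer relation; in the simplified form below (which ignores efficiency factors and assumes N identical antennas of system equivalent flux density SEFD), the image noise ΔS decreases with the square root of both the bandwidth Δν and the integration time τ:

    \Delta S \;\approx\; \frac{\mathrm{SEFD}}{\sqrt{N(N-1)\,\Delta\nu\,\tau}}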
The second major advance common to the upcoming instruments is the use of high-throughput correlators. A correlator is the computing system responsible for the production of visibilities, the data product of interferometers. Correlators are conceptually simple devices, producing complex numbers related to the Fourier-domain spatial frequency components of the sky brightness distribution. Their implementation details and high data throughput rates, however, often require them to use highly specialized hardware (see ). The output data rate is controlled primarily by two parameters: the number of frequency points generated, and the time averaging interval. The science result may lie in these regimes themselves (i.e., spectral line or temporal variability studies), but more commonly the reason for a high data rate is wide-field imaging and, at low frequencies, RFI mitigation.
The manner in which interferometric data is processed depends on the scientific goal of the observation. In many cases, however, large parts of the processing are readily parallelizable. Data calibration is to a large extent decomposable in the time domain: the data can be divided in time and calibrated in parallel. If the desired end product is a data cube, a set of maps each at a different frequency, the imaging process is also easily parallelizable by decomposing along the frequency axis. Even in the case where a continuum map of the entire frequency band is desired, the imaging can be decomposed and recombined later in the process with modest additional overhead. While parallel software is commonly used at observatories (e.g. [7, 9, 14]), its prevalence in the common user software packages is low. Of the packages AIPS, Casa, Miriad, and Difmap, only Casa aims to make use of parallel hardware. Given large volumes of data, it is desirable to use High Performance Computing (HPC) methods, such as parallel or distributed processing, to calibrate and image the data in a timely fashion.
Compute clusters, the most common type of distributed system, often consist of one or more head nodes which users can access, and a large number of compute nodes. Batch jobs are submitted via a batch system on the head nodes, which is responsible for scheduling the job, allocating cluster nodes, and running the job. Disk storage is often two-tier, with shared network space accessible from all compute nodes and local storage accessible only by each individual compute node. Access to such systems is commonplace; however, utilizing them for interferometric data processing is non-trivial, as the existing software is largely designed to be run on a workstation. Software which is available system-wide is limited, and local storage areas are often erased on completion of a job.
2 AIPSLite, a facility for distributed AIPS processing
The Astronomical Image Processing System (AIPS) is a well-established system for end-to-end processing of radio interferometer observations. It is written in Fortran and has evolved over more than three decades. While it is a versatile and extremely complete system, its design does not immediately adapt well to distributed clusters. The distribution is on the order of 1 GB. AIPS does not support deployment on a cluster for the purpose of distributed processing; rather, its network deployment mode is intended for central administration of data areas, tape drives, and configuration areas.
ParselTongue  is a set of Python modules that provides a Python interface to AIPS. It has two main functions, to allow the execution of AIPS tasks, and to provide access to the underlying data, including visibility data, images, and table data. Python’s flexibility, expressiveness, and extensive library combined with the data access provided by ParselTongue allow for a high level of sophistication in automating AIPS (e.g., [2, 8]).
We have developed a set of Python modules which we named AIPSLite that extends ParselTongue and allows for machines without an AIPS distribution to bootstrap themselves with a minimal AIPS environment. Data areas may be created and destroyed at will, and AIPS tasks can be downloaded and executed on the fly. All this is performed dynamically at run-time. Pre-existing AIPS files, both AIPS binaries (tasks) and data products, may be utilized. Multiple processes running on a single node can be entirely isolated from each other.
2.1 Architectural overview
When a worker process starts, AIPSLite performs the following bootstrap sequence (a Python sketch follows the list):
- Determine architecture and required libraries
- Define environment variables
- Determine AIPS version
- Construct list of required files: libraries, binaries, and metadata
- Establish rsync connection to AIPS server and transfer files
- Create AIPS run-time resources: template, memory, and data areas
  - Run initialization code to populate run-time areas
  - Create private configuration area, populated from the template
  - Create data area(s)
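A minimal sketch of how a worker process might drive this bootstrap from Python is shown below. The AIPSLite function names used here are illustrative placeholders rather than the actual API of the module distributed with ParselTongue.

    # Illustrative sketch of the per-process bootstrap sequence described above.
    # The AIPSLite calls are hypothetical placeholders; consult the ParselTongue
    # 2.0 documentation for the real interface.
    import os
    import tempfile
    import AIPSLite                                   # shipped with ParselTongue >= 2.0

    # Private work area, e.g. on the node-local scratch partition provided
    # by the batch system, removed when the job completes.
    workdir = tempfile.mkdtemp(dir=os.environ.get('TMPDIR', '/tmp'))

    # Fetch the minimal AIPS distribution (libraries, task binaries, metadata)
    # over rsync and create a private DA00 area initialized from TEMPLATE.
    AIPSLite.setup(basedir=workdir)                   # hypothetical call
    AIPSLite.make_disk(os.path.join(workdir, 'DATA')) # hypothetical call

    # Only now are the ParselTongue modules imported, so that they pick up
    # the environment variables pointing at this process's private areas.
    from AIPS import AIPS
    from AIPSTask import AIPSTask
    AIPS.userno = 1000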
In the AIPSLite system, the worker nodes set up their own configuration area (DA00) and AIPS data areas. They process data in an isolated environment and save the results in a shared area. This is contrary to AIPS' normal mode of operation, in which all AIPS instances running on a single host share configuration and data areas. AIPSLite provides a per-process, as opposed to a per-host, private AIPS distribution, which is set up at run time with minimal overhead. This allows an unlimited number of AIPS environments to be established and driven independently, and does away with the requirement to configure AIPS for each host it will run on.
Process isolation is facilitated largely in two ways. Firstly, prior to loading the ParselTongue modules, each process uses AIPSLite to create a private DA00 area, which is used for the lifetime of that process and is initialized from the TEMPLATE directory that forms part of the minimal AIPS distribution. Secondly, each process creates its own AIPS disk for use during its lifetime. This architecture was developed to overcome a failure mode observed during early testing, in which, under high load, ParselTongue would fail to successfully allocate AIPS POPS numbers. If the cluster architecture supports it, both the DA00 area and the AIPS disks are created in a temporary partition that is allocated by the job scheduler running on the cluster. The contents of this partition may be deleted on completion of the job by the batch system.
In this section an observational campaign is described in which the EVN Mark4 correlator was utilized at 100 % capacity, with sufficiently low time and spectral averaging to allow full primary beam imaging. The goal of the observation (EL032) was the study of massive star formation regions via maser kinematics. The scientific objectives were two-fold: first, to image known sites of methanol maser emission at high spatial and frequency resolution and thereby constrain the kinematics of the systems, allowing improvement of models of massive star formation; second, to search the surrounding area for additional maser activity. Due to the high spatial and frequency resolution and wide field of view, this represents a significant data processing challenge. The software described in the previous section was applied to the problem, and the specific approach is discussed in this section. For astronomical results from this campaign refer to .
Table: Parameters of the target sources
3.1 Data calibration
The calibration of the data did not present computational challenges and was performed by traditional means. For the purpose of determining calibration solutions the data could be averaged by a factor of eight in time and thirty-two in frequency; the resulting solutions were then applied to the full, non-averaged data. Calibration was performed in traditional AIPS. A priori amplitude calibration was carried out using the system temperatures and antenna gain curves with the task APCAL. Phase calibration was performed with the AIPS task FRING. A two-stage approach to fringe fitting was used: the fringe delay was first solved for over two minutes of data on the field's principal bright calibrator; these solutions were then applied to the data, and full fringe delay, rate, and phase solutions were determined on the entire dataset. Self-calibration was implemented using several maser sources in the field of view. Some of the maser sources were significantly brighter than the calibrator sources; in these cases the maser was used to calibrate the data, with the calibrator source being used solely for astrometric reference. The AIPS task CVEL was used to correct the frequency spectrum for the motion of the instrument relative to the local standard of rest.
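For illustration, the calibration sequence above maps onto ParselTongue roughly as follows. The catalogue entry and adverb values are placeholders rather than the settings actually used.

    # Rough ParselTongue sketch of the calibration steps (APCAL, FRING, CVEL).
    # Catalogue names and adverb values are placeholders only.
    from AIPS import AIPS
    from AIPSTask import AIPSTask
    from AIPSData import AIPSUVData

    AIPS.userno = 1000
    uvdata = AIPSUVData('EL032', 'UVDATA', 1, 1)      # placeholder catalogue entry

    apcal = AIPSTask('APCAL')          # a priori amplitude calibration (Tsys, gain curves)
    apcal.indata = uvdata
    apcal.go()

    fring = AIPSTask('FRING')          # fringe fitting on the principal calibrator
    fring.indata = uvdata
    fring.calsour[1:] = ['CALSRC']     # placeholder calibrator name
    fring.solint = 2                   # solution interval in minutes
    fring.go()

    cvel = AIPSTask('CVEL')            # shift spectra to the local standard of rest
    cvel.indata = uvdata
    cvel.go()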
3.2 Wide field search
The search for sources can be performed either in the visibility domain or in the image domain. Due to the low number of stations and the short observation time, the visibility data discussed in this section are reasonably modest (≈20 GB per source). The high resolution and wide field of view, however, result in extremely large image domain maps (≈30 TB per source). Operating in the visibility domain would therefore be advantageous, as the volume of data is significantly lower. The visibility-domain fringe-rate mapping technique , implemented in AIPS as FRMAP, does exist, but it requires sources to be detectable on many individual baselines within a short period of time in order to locate them, and therefore incurs far worse detection thresholds. Computational requirements aside, searching in the image domain is a simpler problem. The imaging routines in AIPS offer a high degree of flexibility, and once the data is in the image domain, simple algorithms can be used for source detection. As we had access to HPC resources, the image domain was determined to be the more suitable domain to work in, providing a simpler problem and saving implementation time. The software described in Section 2 was used to accomplish the wide-field search.
3.3 Imageable field of view
To determine the scale of the problem, the imageable field of view must be calculated. The field of view is primarily limited by three factors: (1) the physical optics of the antennas involved, (2) the time averaging and (3) the frequency averaging dictated during correlation. An additional issue, the w-term effect, originates from the assumption that the array is two-dimensional and can result in distortions that increase with distance from the phase center. Solutions to this last problem now exist in imaging algorithms [6, 11, 13], with the polyhedral imaging technique being the one provided in AIPS.
3.3.1 Time & frequency averaging limitation
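As a rule of thumb (these are the approximations commonly quoted in the literature, not necessarily the exact criteria applied in this work), the bandwidth and time smearing at a distance r from the phase centre scale approximately as

    \Delta\theta_{\mathrm{bw}} \;\approx\; \frac{\Delta\nu}{\nu_0}\, r ,
    \qquad
    \Delta\theta_{\mathrm{time}} \;\approx\; \omega_E\, \Delta t\, r ,

where Δν is the channel width, ν₀ the observing frequency, Δt the averaging interval and ω_E the Earth's angular rotation rate; both terms must remain small compared with the synthesized beam if sources far from the phase centre are not to be attenuated and distorted.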
3.3.2 The w-term limitation
The (u, v, w) vector associated with a visibility is calculated for a specific direction. When we image wide fields this vector will vary considerably over the field and, if not corrected for, will cause a distortion of sources which increases in severity with distance from the field center. Algorithms such as w-projection  have been developed to correct for this. This algorithm is not available in AIPS, however, which instead handles the problem via polyhedral imaging: the field is divided into many sub-fields, each with recalculated visibility phases and (u, v, w) vectors.
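For reference, the w-term arises from the full-sky form of the measurement equation (with A the primary beam, I the sky brightness and (l, m) the direction cosines relative to the phase centre):

    V(u,v,w) \;=\; \iint \frac{A(l,m)\, I(l,m)}{\sqrt{1-l^{2}-m^{2}}}\;
        e^{-2\pi i\left[\,ul + vm + w\left(\sqrt{1-l^{2}-m^{2}}-1\right)\right]}\, dl\, dm

When the w(√(1−l²−m²)−1) term is negligible this reduces to a two-dimensional Fourier transform; faceting keeps each sub-field small enough that the approximation holds.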
3.3.3 Primary beam limitations
The baselines with the largest dishes (and therefore the narrowest primary beam envelope) in this experiment are those consisting of the Effelsberg 100 m dish coupled with a 32 m class dish (those at Cambridge, Medicina, Noto, and Torun). Taking the calculations from  and scaling to 6.7 GHz, we get a half-power beamwidth (HPBW) of approximately 2.5 arcmin, which effectively limits our high-sensitivity field of view. Disregarding Effelsberg data extends the HPBW to at least 5.8 arcmin, albeit at lower sensitivity. It should be noted that these are the half-power limits, i.e. the 50 % amplitude field of view, while the above limits are to 90 % amplitude levels, meaning that the primary beam limitation is much more severe than the time and frequency averaging effects.
If we limit the area we image to the 100 m–32 m baseline HPBW, we cover an area of approximately half that radius, and one can average up to at least 0.5 s without adverse time-averaging effects.
The most computationally intense portion of the data analysis is the imaging of the data. The transformation of the calibrated interferometric data into sky images is accomplished by first interpolating the data onto a regular grid, usually by means of a convolution. The data is then transformed to the image domain by means of an FFT.
Based on the inherent resolution of the data and allowing for Nyquist sampling, a cell size of 1.5 mas was used for imaging. As we can decompose the field into sub-fields or facets, we need not image a rectangular area encompassing the imageable portion; rather, we approximate the circular field with small rectangular facets. Using the most limiting half-power beamwidth, that involving Effelsberg and a 32 m antenna, of 2.5 arcmin, we image an area of π × 75² ≈ 17,500 arcsec². With the cell size of 1.5 mas and ~1000 spectral channels this yields a data cube of order 7 × 10¹² cells. As these cells are stored internally as 32-bit floating point numbers, the storage required to accommodate this cube is of the order of 30 TB. The gridded UV data produced as an intermediate step in the imaging process requires a similar amount of capacity. It is the processing of these quantities of data that necessitated the use of a distributed computing solution. The AIPSLite system described in Section 2 was used for the imaging and detection process.
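The cube-size estimate can be reproduced with a few lines of arithmetic (values as quoted above):

    import math

    radius_arcsec = 1.25 * 60                      # 1.25 arcmin HPBW radius in arcsec
    area_arcsec2 = math.pi * radius_arcsec ** 2    # ~17,500 arcsec^2
    cell_arcsec = 1.5e-3                           # 1.5 mas cells
    spatial_cells = area_arcsec2 / cell_arcsec ** 2    # ~7.9e9 cells per channel
    total_cells = spatial_cells * 1000                 # ~1000 spectral channels
    total_bytes = total_cells * 4                      # 32-bit floats
    print('%.1e cells, %.0f TB' % (total_cells, total_bytes / 1e12))
    # prints roughly 7.9e12 cells, 31 TB, consistent with the ~30 TB quoted above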
3.4 Processing the wide field
The wide-field search proceeds as follows (a Python sketch of the per-node portion follows the list):
- Determine execution environment
- Decompose data based on computing resources present
- Distribute datasets to nodes. On each compute node:
  - Dynamically configure node for AIPS usage
  - Decompose field into optimized facets
  - Sequentially image facets
  - Run detection routines and collect statistics
- Aggregate statistics and store centrally
- Identify sources of emission and remove from dataset
- Re-analyze portions of the dataset affected by the previous step
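A schematic of the per-node portion of this workflow is given below. Only the frequency decomposition is spelled out in code; the AIPS/ParselTongue steps are indicated as comments, and the parameter names are illustrative.

    # Schematic per-node driver for the wide-field search. Only the channel
    # decomposition is implemented; the AIPS steps are indicated as comments.
    import math

    N_CHANNELS = 1024        # spectral channels in the dataset
    N_NODES = 100            # compute nodes allocated
    CHANS_PER_RUN = 4        # channels per IMAGR execution (see Section 3.5)

    def channel_ranges(node_id):
        """Inclusive channel ranges assigned to one node, in IMAGR-sized chunks."""
        per_node = int(math.ceil(N_CHANNELS / float(N_NODES)))
        lo = node_id * per_node + 1
        hi = min(lo + per_node - 1, N_CHANNELS)
        return [(c, min(c + CHANS_PER_RUN - 1, hi))
                for c in range(lo, hi + 1, CHANS_PER_RUN)]

    def process_node(node_id):
        # 1. Bootstrap a private AIPS environment with AIPSLite (Section 2).
        # 2. Load this node's channel range of the calibrated UV data.
        for bchan, echan in channel_ranges(node_id):
            # 3. For each facet produced by the SETFC-like layout code,
            #    run IMAGR with BCHAN/ECHAN set to (bchan, echan).
            # 4. Run the detection routines on the images and record the
            #    statistics for central aggregation.
            pass

    if __name__ == '__main__':
        print(channel_ranges(0))    # e.g. [(1, 4), (5, 8), (9, 11)] for node 0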
3.5 Data decomposition
As the target sources in these observations are masers, and therefore narrow in frequency (although subject to Doppler broadening), the frequency channels are to first order independent with respect to the imaging process. Furthermore, on the cluster used for this project the number of compute nodes allocated was 100, so the number of spectral channels exceeds the number of computing resources by a factor of roughly ten. These factors allow for a convenient data decomposition along the frequency axis, with each compute node dealing with a subset of frequency channels.
It is not possible to image the entire field in one pass. Apart from the fact that the AIPS IMAGR task has a maximum field size of 8192 × 8192 cells, the computational requirements would be prohibitive. Furthermore, the tangent-shifting approach to handling the w-term problem necessitates the use of sub-facet imaging, in which the field is divided into sub-fields or 'facets' that are imaged separately. The observations described here are not sensitive to structures large with respect to the facet size; the shortest baseline present is approximately one tenth of the longest. Highly resolved sources are not expected in these observations, but were they to exist, they would still be imaged by this faceted imaging approach. Experiments were performed to determine whether there was an optimal choice for the dimensions of the facets, in terms of cells and frequency channels, which would minimize the computational time required. A field of 8192 cells squared with 16 frequency channels was imaged with various decompositions. From this it was determined that the optimal facet size is 2048 cells squared. The number of channels imaged in each run is less significant: channel counts of 2, 4 and 8 yielded comparable performance, with 4 being marginally better.
AIPS provides the SETFC program to automate the generation of facet parameters. It was found to lack some of the flexibility required for this project, such as a configurable geometric layout. Functionality similar to that provided by SETFC, but with extra configuration options, was therefore developed in Python as part of the analysis.
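A simplified version of such a facet-layout calculation is sketched below: it tiles the circular search area with square facets and returns their centre offsets, the kind of information SETFC produces. The geometry is illustrative rather than the layout actually used.

    # Sketch of a SETFC-like facet layout: tile a circular field of radius
    # fov_radius_arcsec with square facets of facet_cells cells at cell_arcsec
    # per cell, returning (RA, Dec) offsets of the facet centres in arcseconds.
    # Illustrative only; not the layout code used for the analysis.
    def facet_centres(fov_radius_arcsec=75.0, facet_cells=2048, cell_arcsec=0.0015):
        facet_size = facet_cells * cell_arcsec             # facet width (~3.1 arcsec)
        n = int(2 * fov_radius_arcsec / facet_size) + 1    # facets per axis
        centres = []
        for i in range(n):
            for j in range(n):
                x = (i - (n - 1) / 2.0) * facet_size       # RA offset of centre
                y = (j - (n - 1) / 2.0) * facet_size       # Dec offset of centre
                # keep facets whose centre lies within the circular search area
                if x * x + y * y <= (fov_radius_arcsec + facet_size / 2.0) ** 2:
                    centres.append((x, y))
        return centres

    print(len(facet_centres()))   # roughly 1900 facets of 2048 x 2048 cells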
3.6 Multiple CPU usage
Processing nodes in modern high performance clusters typically contain one or more processing cores. With multi-CPU machines it can be quite a challenge to keep the CPUs busy, as they have to compete for I/O. AIPS is in general quite I/O expensive; most tasks do not allow for in-place editing of data, instead creating a new output dataset. If a task's I/O dominates its performance then this additional computational capability is of little use. The process of interferometric imaging, however, is computationally intense. Construction of a regular grid suitable for Fourier inversion from interferometer data is accomplished via convolutional gridding. This step, in which each visibility is convolved with a kernel and evaluated at the grid points, dominates the imaging process . The gridded data is then inverted, yielding the image data.
An experiment was run on the Joint Institute for VLBI in Europe's ALBUS cluster to determine whether multi-process AIPS imaging can indeed improve performance or whether the system is I/O bound. The ALBUS cluster consists of four nodes, each with four CPUs; the multi-core tests were run on this system to determine how performance scales up to four CPUs. It should be noted that the multi-CPU AIPS usage discussed here is achieved via separate executions of AIPS tasks through ParselTongue. The tasks themselves are not parallelized; rather, separate instances of them are run in parallel acting on different data.
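The kind of test performed can be sketched with Python's multiprocessing module: several independent ParselTongue-driven IMAGR executions, each on its own channel range, are run in parallel. The catalogue name and adverb values are placeholders.

    # Sketch of the multi-CPU test: independent IMAGR executions on separate
    # channel ranges run as separate processes. The tasks themselves are not
    # parallelized; catalogue names and adverb values are placeholders.
    from multiprocessing import Pool

    def image_channels(chan_range):
        bchan, echan = chan_range
        # Import inside the worker so each process carries its own AIPS state.
        from AIPS import AIPS
        from AIPSTask import AIPSTask
        from AIPSData import AIPSUVData
        AIPS.userno = 1000
        imagr = AIPSTask('IMAGR')
        imagr.indata = AIPSUVData('EL032', 'UVDATA', 1, 1)   # placeholder entry
        imagr.bchan, imagr.echan = bchan, echan              # this worker's channels
        imagr.cellsize[1:] = [0.0015, 0.0015]                # 1.5 mas cells
        imagr.imsize[1:] = [2048, 2048]                      # 2048-cell facets
        imagr.go()
        return chan_range

    if __name__ == '__main__':
        ranges = [(1, 4), (5, 8), (9, 12), (13, 16)]         # one range per CPU
        print(Pool(processes=4).map(image_channels, ranges))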
3.7 Detection mechanism
The AIPS task IMAGR implements a Cotton–Schwab deconvolution algorithm . This provides superior image fidelity to other algorithms such as the Clark method , but is more processor intensive. A Clark CLEAN was also tested for performance: in this mode, IMAGR was used to produce non-deconvolved ('dirty') images which were then deconvolved by the Clark method with APCLN. This deconvolution method allows only one quarter of the image to be deconvolved, meaning the image size has to be doubled in both dimensions. The extra imaging load more than offsets any speedup gained, and for this reason IMAGR's Cotton–Schwab CLEAN was used.
A candidate was accepted as a detection only if it satisfied the following criteria:
- The source should have a signal-to-noise ratio greater than five, to avoid an abundance of false positives given the large search area.
- Sources separated by a distance comparable to the resolution of the instrument are taken to be the same source.
- A source should be present in at least two neighboring channels.
The software then generates a map of potential detections for the user's inspection. The software can subsequently remove verified sources from the UV dataset, and the affected channels can be re-analyzed. This is desirable due to the relatively flat dirty beam (PSF) of these observations: if a source is not subtracted from the data, other weaker sources may be hidden by the sidelobes of the strong source.
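A simplified version of these selection rules might look as follows, operating on a list of per-channel candidates produced by the imaging stage; the input format, beam size and grouping logic are illustrative.

    # Simplified sketch of the detection criteria: SNR > 5, merge candidates
    # closer than roughly a beam, and require emission in at least two
    # neighbouring channels. Input format and beam size are illustrative.
    BEAM_MAS = 6.0     # assumed synthesized beam scale in mas

    def filter_candidates(candidates):
        """candidates: list of dicts with keys 'ra_mas', 'dec_mas', 'chan', 'snr'."""
        strong = [c for c in candidates if c['snr'] > 5.0]      # criterion 1
        groups = []
        for c in strong:                                        # criterion 2: merge by position
            for g in groups:
                if (abs(c['ra_mas'] - g[0]['ra_mas']) < BEAM_MAS and
                        abs(c['dec_mas'] - g[0]['dec_mas']) < BEAM_MAS):
                    g.append(c)
                    break
            else:
                groups.append([c])
        detections = []
        for g in groups:                                        # criterion 3: adjacent channels
            chans = sorted(set(c['chan'] for c in g))
            if any(b - a == 1 for a, b in zip(chans, chans[1:])):
                detections.append(g)
        return detections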
For each of the maser sites Cep A, W3(OH) and AFGL5142, a circular field of radius 1.25 arcmin was processed. For each of these objects a job was prepared and submitted to the Irish Centre for High End Computing (ICHEC) cluster via the PBS system. A resource allocation of 64 dual-CPU nodes for 45 h was requested for each job, totaling 5760 CPU hours per job. The RAM requirements were relatively low, with each process requiring approximately 450 MB. Two processes were run per node (as each node has two CPUs), for a total of 900 MB, well below the 4 GB available.
The output of these jobs is a set of large data files containing information about regions of flux, including right ascension, declination, frequency, and noise statistics. The AIPS logs are also recorded and saved. For a typical run, approximately 700 MB of statistics about emission in the field are produced, along with 1.5 GB of AIPS logs. In general the AIPS logs are not of further use and can be deleted, although they can be useful for problem diagnosis and testing.
The emission statistics are then further processed, as outlined above, to identify masers in the field. The output of this processing is a set of graphical representations of maser candidates, which are inspected and confirmed manually. When sources are confirmed they can be fed back into the system and a follow-up analysis is performed, whereby the confirmed emission is subtracted from the data with the AIPS task UVSUB and the affected channels are re-analyzed. Due to the relatively flat dirty beam, strong sources of emission will leak emission over a wide area and can mask weaker sources; the subtraction and re-analysis stage circumvents this effect.
Running ParselTongue in the normal manner, with standard POPS allocation and common configuration and data areas, proved unreliable. The process isolation features discussed above were a requirement for stability. Once these features were implemented, a typical failure rate of less than one process per job was attained; a job would typically spawn on the order of 10,000 processes. Processes that failed were automatically re-run, and a recurrence of the same failure was never observed.
Table: Sample of masers detected in the Cep A field
In the analyzed maser sites, no outlying sources of emission were found. While this is disappointing, the result in itself provides useful information on the locality of the star formation regions. The scientific implications of these results are presented in .
We have developed a lightweight infrastructure, AIPSLite, that allows the deployment of AIPS routines on distributed systems. Using this infrastructure we developed a pipeline in Python, with the ParselTongue interface, which implements a truly distributed AIPS-based analysis of wide-field VLBI data. The resulting software has been shown to be highly robust, is easily deployed on a heterogeneous multi-processor cluster environment (running, in this case, PBS), and breaks the processing bottlenecks which have limited the use of AIPS for this and many other large-scale datasets. This pipeline will be used to search the remaining maser sites in this observational campaign for unknown sources. Many masers in the field would be readily detectable without Effelsberg's contribution; in the future, when computational performance permits, an analysis of a wider field, up to 5.8 arcmin in diameter, may be desirable. AIPSLite has been incorporated into ParselTongue as of the 2.0 release.
AIPSLite provides methods to set up AIPS on compute nodes with minimal effort, providing infrastructure on which to run large AIPS-based jobs. The task of data decomposition is left to the programmer, as it is highly specific to the task at hand. This approach is useful when a job is highly batch-parallelizable, i.e. the data can easily be split into smaller chunks which can be handled independently. Interactive use, via either the AIPS TV or user input, is not easily facilitated in this mode of operation. The resources required by the worker nodes are the same as those for a traditional AIPS approach, the main requirement being that enough disk space and RAM be available to hold the intermediate data products generated by AIPS for the portion of data being processed.
AIPS tasks are accompanied by an HLP file, which contains the information on the data structures used by the task that AIPS or ParselTongue requires, as well as documentation on the task's functionality.
AIPS CC tables contain per-pixel flux levels extracted from the image by the deconvolution process, and act as source models within AIPS.
George Heald is thanked for his useful comments on the manuscript. Salvador Curiel is thanked for his continuum image of Cep A. S.B. acknowledges support by Enterprise Ireland, Science Foundation Ireland, and the Higher Education Authority. K.T. acknowledges support by the EU Framework 6 Marie Curie Early Stage Training programme under contract number MEST-CT-2005-19669 “ESTRELA”. This effort is supported by the European Community Framework Programme 7, Advanced Radio Astronomy in Europe, grant agreement no.: 227290. ParselTongue was developed in the context of the ALBUS project, which has benefited from research funding from the European Community’s sixth Framework Programme under RadioNet R113CT 2003 5058187. The authors wish to acknowledge the SFI/HEA Irish Centre for High-End Computing (ICHEC) for the provision of computational facilities and support.