Public “Cloud” Provisioning for Venus Express VMC Image Processing

  • J. L. Vázquez-Poletti (email author)
  • M. P. Velasco
  • S. Jiménez
  • D. Usero
  • I. M. Llorente
  • L. Vázquez
  • O. Korablev
  • D. Belyaev
  • M. V. Patsaeva
  • I. V. Khatuntsev
Original Paper


Abstract

In this paper, we consider the implementation of a “cloud” computing strategy to study data sets associated with the atmospheric exploration of the planet Venus. More concretely, the Venus Monitoring Camera (VMC) onboard the Venus Express orbiter provided the largest and longest-running set so far of ultraviolet (UV), visible and near-IR images for investigating the atmospheric circulation. To the best of our knowledge, this is the first time that the analysis of data from missions to Venus has been integrated into the context of “cloud” computing. The path followed and the protocols used can be extended to more general cases of space data analysis, and to the general framework of big data analysis.


Keywords

Retrieval, Data integration, Cloud computing, Big data

Mathematics Subject Classification

65D18 97R60 97R30 

1 Introduction

Our work builds on previous processing of Venus images from Venus Monitoring Camera (VMC) observations, in which the tracks of the planet's clouds were analyzed. We translate this analysis to a “cloud” computing framework using the same algorithms, described in Sect. 2 [3, 5]. Hereafter, “clouds” in the virtual sense are given in quotes to distinguish them from physical clouds. The algorithms involved in the image processing deal implicitly with nonlocal/extended data. For this reason, we are considering possible generalizations of fractional calculus concepts [6, 7, 8, 9]. This new approach is suitable for modeling phenomena in which the properties at one point depend on the behavior in a large neighborhood of that point.

Computational processing of this problem involves a great number of resources. For this reason, a hybrid “cloud” architecture has been proposed along with a very simple task classification method in Sect. 3. To our knowledge, this is the first time that “cloud” computing technology is used for Venus image processing.

The core of the present work lies in the public “cloud” infrastructure used in the proposed architecture, as its complexity demands a way to identify the best setup depending on the computational needs of each task. To accomplish this, an execution model is provided by means of different metrics such as throughput, cost and a mix of both.

2 Description of the Algorithms for Cloud Tracking

Contrast details in images of the Venus clouds can be used to estimate the direction and speed of their displacement by analyzing two scenes taken at different times. Such motion can be measured using digital methods for tracking cloud features in pairs of VMC images. This technique is based on a correlation analysis between two image matrices.

In the first step, changes in observation conditions (e.g., a shift of the sub-satellite point and a change of the spatial resolution) between the two selected images are taken into account. Both scenes are projected onto a regular grid with a constant latitude/longitude step \(\delta\) by triangular interpolation [2, p. 49]. It should be noted that the grid nodes may not coincide with the original pixel positions. For every node, we select the three closest image points \(k, l, m\) that form a triangle with the node inside. If they lie within the one-step rectangle formed by the neighboring nodes \((i-1,j-1)\), \((i+1,j-1)\), \((i-1,j+1)\), and \((i+1,j+1)\), the brightness \(I_{i,j}\) at the node can be derived as
$$\begin{aligned} I_{i,j}=\, & {} \frac{I_k\det (r_{i,j},r_l,r_m)+I_l\det (r_k,r_{i,j},r_m) +I_m\det (r_k,r_l,r_{i,j})}{\det (r_k,r_l,r_m)}, \end{aligned}$$
$$\begin{aligned} \det (r_k,r_l,r_m)=\,& {} \left| \begin{array}{ccc} 1 &{} \lambda _k &{} \varphi _k \\ 1 &{} \lambda _l &{} \varphi _l \\ 1 &{} \lambda _m &{} \varphi _m \end{array}\right| = \lambda _k(\varphi _l-\varphi _m)+\lambda _l(\varphi _m-\varphi _k)+\lambda _m(\varphi _k-\varphi _l), \end{aligned}$$
where \(r_k(\lambda ,\varphi )\), \(r_l(\lambda ,\varphi )\), \(r_m(\lambda ,\varphi )\) are the coordinates (\(\lambda\): longitude, \(\varphi\): latitude) of the points \(k, l, m\), \(r_{i,j}\) denotes the coordinates of the node, and \(I_k, I_l, I_m\) are the image brightness values at these points. If no such three points can be found within the rectangle above, we use the closest point within half a step of the node to determine the brightness \(I_{i,j}\). If there is no such point either, the node is marked as empty.
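The interpolation formula above can be sketched in a few lines of Python. This is only an illustration of the two equations, not the code used in this work: the function names `det3` and `interpolate_node` are hypothetical, points are assumed to be given as (longitude, latitude) tuples, and the selection of the three points and the handling of empty nodes are left out.

```python
def det3(rk, rl, rm):
    """Determinant of the 3x3 matrix with rows (1, lon, lat) for three points."""
    (lk, pk), (ll, pl), (lm, pm) = rk, rl, rm
    return lk * (pl - pm) + ll * (pm - pk) + lm * (pk - pl)


def interpolate_node(node, pts, vals):
    """Brightness at `node` from the three image points k, l, m around it.

    node: (lon, lat) of the grid node
    pts:  three (lon, lat) tuples for the points k, l, m
    vals: brightness values I_k, I_l, I_m at those points
    """
    rk, rl, rm = pts
    ik, il, im = vals
    d = det3(rk, rl, rm)
    if d == 0:
        raise ValueError("points k, l, m are collinear")
    return (ik * det3(node, rl, rm)
            + il * det3(rk, node, rm)
            + im * det3(rk, rl, node)) / d


# Brightness varies linearly over the triangle, so a vertex returns its own value.
pts = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
vals = (1.0, 2.0, 3.0)
value = interpolate_node((0.25, 0.25), pts, vals)
```

The three weights are the barycentric coordinates of the node with respect to the triangle, which is why the result is exact for any brightness field that is linear in longitude and latitude.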

To achieve efficient and accurate interpolation, the value of \(\delta\) must be chosen equal to or slightly coarser than the spatial resolution of the original images. Too large a grid step would reduce the accuracy, while too small a \(\delta\) would produce many empty grid nodes near the limb. The optimal value is found empirically by minimizing the number of empty nodes at low latitudes for different image groups.

As a result of this interpolation procedure, two projected images A and B are obtained from a selected pair of original scenes captured by the VMC. To obtain the correlation function for non-empty nodes, image A is divided into fragments of equal size (see Fig. 1). The fragments should: (a) be slightly bigger than the typical cloud feature; (b) not be too small, to avoid false positives; (c) contain a statistically meaningful number of nodes for the correlation to be reliable. To reduce the computation time, the desired field in image B is defined on the basis of the expected displacement of cloud features. Each fragment of image A is then compared with all fragments of the same size within the desired field in image B by calculating the correlation function \(C_{\text {cor}}({\text {d}}x, {\text {d}}y)\) as follows:
$$\begin{aligned} C_{\text {cor}}({\text {d}}x,{\text {d}}y)=\displaystyle \frac{\displaystyle \sum _{i=1}^n\sum _{j=1}^m\left( \left( A[i,j]-\bar{A}\right) *\left( B[i+{\text {d}}x,j+{\text {d}}y]-\bar{B}\right) \right) }{\sqrt{\displaystyle \sum _{i=1}^n\sum _{j=1}^m\left( \left( A[i,j]-\bar{A}\right) \right) ^2} *\sqrt{\displaystyle\sum _{i=1}^n\sum _{j=1}^m\left( B[i+{\text {d}}x,j+{\text {d}}y]-\bar{B}\right) ^2}}, \end{aligned}$$
where \(A, B\) are two-dimensional brightness arrays representing the considered fragments of images A and B; \(\bar{A}, \bar{B}\) are the mean brightness values in these fragments; \({\text {d}}x, {\text {d}}y\) are the latitudinal and longitudinal displacements (multiples of the regular grid step \(\delta\)) of the image B fragment with respect to the one in image A; and \(n, m\) are the numbers of latitudinal and longitudinal nodes.
Fig. 1

Pair of UV images in Cartesian projection obtained on 27.07.2007 with a time interval of one hour between them. Images were corrected for viewing geometry using the Minnaert law and processed by a 2D-wavelet filter for contrast enhancement. Black points are empty nodes. The red rectangle marks moving cloud details. The rectangle size is 15\(^{\circ }\) longitude by 10\(^{\circ }\) latitude. The coordinates of the rectangle center are 62.79 E, − 14.79 S on the left image and 59.69 E, − 14.54 S on the right image. The correlation coefficient is 0.99

The correlation function \(C_{\text {cor}}({\text {d}}x, {\text {d}}y)\) defines the degree of similarity between the brightness patterns from A and B when the second one is displaced by \([{\text {d}}x, {\text {d}}y]\). The maximum of \(C_{\text {cor}}\) identifies the fragment of image B whose cloud pattern is in best agreement with the selected fragment of image A. The next stage is fine tuning in the vicinity of the maximum. It is achieved by subsequent computations of \(C_{\text {cor}}\) on a new grid that has the same step \(\delta\) but is shifted with respect to the previous one by a small fraction of the step. Such a procedure allows a more accurate determination of the new coordinates of a cloud pattern in image B. It gives an optimal trade-off between accuracy and computing time.
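A minimal sketch of the correlation search is given below, assuming NumPy arrays without empty nodes. The names `c_cor` and `best_displacement` are hypothetical, and the displacement is expressed here as a non-negative offset of the fragment inside the desired field of image B rather than relative to the fragment position in image A. The fine-tuning step on shifted grids is not included.

```python
import numpy as np


def c_cor(a, b_field, dx, dy):
    """Correlation between fragment `a` and the equally sized fragment of
    `b_field` whose upper-left corner is at offset (dx, dy)."""
    n, m = a.shape
    b = b_field[dx:dx + n, dy:dy + m]
    da = a - a.mean()          # A[i, j] minus mean of A
    db = b - b.mean()          # B[i + dx, j + dy] minus mean of that fragment
    denom = np.sqrt((da ** 2).sum()) * np.sqrt((db ** 2).sum())
    if denom == 0:             # flat fragment: correlation undefined, treat as 0
        return 0.0
    return float((da * db).sum() / denom)


def best_displacement(a, b_field):
    """Exhaustive search for the offset that maximizes C_cor."""
    n, m = a.shape
    best = (-2.0, 0, 0)        # (correlation, dx, dy); C_cor is never below -1
    for dx in range(b_field.shape[0] - n + 1):
        for dy in range(b_field.shape[1] - m + 1):
            c = c_cor(a, b_field, dx, dy)
            if c > best[0]:
                best = (c, dx, dy)
    return best


# Synthetic check: cut a fragment out of a random field and find it again.
rng = np.random.default_rng(0)
field = rng.random((12, 14))
fragment = field[3:8, 2:8].copy()
corr, dx, dy = best_displacement(fragment, field)
```

Restricting the loops to the desired field is what keeps the cost manageable: the number of correlation evaluations grows with the area of the field, not with the full image size.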

3 Computational Work

The image processing framework produces several tasks, one for each comparison of two images, each with a file size of 1 MB. The execution time depends mainly on the area to be processed in each image, namely the area corresponding to the Venus atmosphere. On the other hand, as the tasks are independent, the execution time of each one is affected neither by the total size of the files nor by their number.

Before any image processing task is executed, the images involved need to be classified. As the scope of this contribution is the execution of image processing tasks in the public “cloud” infrastructure, a very simple classification mechanism has been considered, based on the image histogram, as shown in Fig. 2.
Fig. 2

Color histograms for examples of Venus cloud UV images captured by the VMC from distances of 63 104 km (left) and 34 363 km (right). Both images were corrected for viewing geometry using the Minnaert law and processed by a 2D-wavelet filter for contrast enhancement

Small area images, those which would require shorter execution time, represent about 55%–60% of each orbit dataset. Further developments may take more complex mechanisms into consideration, such as machine learning techniques based on previous execution times.
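The text does not detail how the histogram is turned into a decision, so the following is only one possible sketch. It assumes 8-bit grayscale images stored as NumPy arrays, estimates the area covered by the planet as the fraction of pixels brighter than a background level, and labels a task as long when at least one of its two images exceeds an area threshold. The names `disk_fraction` and `classify_task` and both threshold values are hypothetical.

```python
import numpy as np


def disk_fraction(image, background_level=10):
    """Fraction of pixels at or above the assumed background level,
    computed from the 256-bin brightness histogram."""
    hist, edges = np.histogram(image, bins=256, range=(0, 256))
    lit = hist[edges[:-1] >= background_level].sum()
    return lit / image.size


def classify_task(image_a, image_b, area_threshold=0.5):
    """Return 'long' if at least one image has a big lit area, else 'short'."""
    largest = max(disk_fraction(image_a), disk_fraction(image_b))
    return "long" if largest >= area_threshold else "short"


# Synthetic images: a small and a large bright region on a dark background.
small = np.zeros((100, 100), dtype=np.uint8)
small[:20, :20] = 200
big = np.zeros((100, 100), dtype=np.uint8)
big[:80, :80] = 200
labels = (classify_task(small, small), classify_task(small, big))
```

In practice the thresholds would have to be calibrated against measured execution times for each image group.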

3.1 Proposed Architecture

Figure 3 shows the proposed architecture, which relies on the hybrid “cloud” model [1]. Tasks are generated from the images obtained from the VMC database. These tasks are then classified by means of the image color histogram, as explained previously.
Fig. 3

Proposed architecture

Long tasks, those processing at least one big image area, are submitted to the local computing cluster. This cluster may be supported by bare-metal machines or by a private “cloud” managed by a virtual infrastructure manager such as OpenNebula [4]. Short tasks, on the other hand, are submitted to a cluster on a public “cloud” infrastructure such as that provided by Amazon Web Services. This cluster is managed through a solution such as StarCluster.

The availability of virtually unlimited computing resources from public “cloud” infrastructures comes with the need for proper provision planning. Optimization must be performed in terms of throughput (tasks per unit of time), cost and/or a metric that evaluates both. In other words, public “cloud” offerings increase the complexity of planning, with the risk of a dangerous budget increase [10].

As stated before, the aim of this contribution is to provide a valid model that helps decide on the best public “cloud” setup, as it will process a high number of tasks. This setup is expressed by a number of machines and the most common machine types offered by a major provider.

3.2 Experiment Setup

A representative example of the short tasks group has been used for the experiments in the public “cloud” infrastructure. This particular task analyzes two images from a particular orbit (see the example in Fig. 1).

The Amazon Elastic Compute Cloud (EC2) was chosen as the public “cloud” infrastructure. Table 1 shows the 6 different instance types that were used during the experiments. These instances belong to three different families: general (m4), memory optimized (r4) and compute optimized (c4).
Table 1

Instances from Amazon EC2 used in this work

Instance type   CPU
m4.2xlarge      Intel Xeon E5-2676v3
m4.4xlarge      Intel Xeon E5-2676v3
r4.2xlarge      Intel Broadwell E5-2686v4
r4.4xlarge      Intel Broadwell E5-2686v4
c4.4xlarge      Intel Xeon E5-2666v3
c4.8xlarge      Intel Xeon E5-2666v3

Two execution sets have been performed on each instance type: one consisting of a single task and another with n tasks, where n is the number of vCPUs.

3.3 Public “Cloud” Execution Model

To build an execution model for the Venus image processing on the chosen public “cloud” infrastructure, linear regression has been applied to the times from the experiments described in the previous section. In this way, the execution time (t) can be expressed for each AWS instance type as a function of the number of allocated tasks (T) by the following equations:
$$\begin{aligned} \text {m4.2xlarge: } t= \,& {} 1.023\,2T + 10.369, \end{aligned}$$
$$\begin{aligned} \text {m4.4xlarge: } t=\, & {} 0.577\,2T + 11.075,\end{aligned}$$
$$\begin{aligned} \text {r4.2xlarge: } t=\, & {} 1.008\,4T + 10.624,\end{aligned}$$
$$\begin{aligned} \text {r4.4xlarge: } t=\, & {} 0.663\,3T + 10.617,\end{aligned}$$
$$\begin{aligned} \text {c4.4xlarge: } t=\, & {} 0.595\,1T + 9.029\,8,\end{aligned}$$
$$\begin{aligned} \text {c4.8xlarge: } t=\, & {} 0.283\,9T + 8.845\,5. \end{aligned}$$
Resource contention became evident during the experiments. Even when there were enough processors for each execution, computing resources such as memory or internal buses were shared between tasks, preventing optimal parallelism.
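For convenience, the six fitted lines can be collected in a small lookup table. The sketch below copies the coefficients from the equations of this section; the names `EXECUTION_MODEL` and `execution_time` are hypothetical, and the result is in the same (unstated) time unit as the fitted equations.

```python
# (slope, intercept) of t = slope * T + intercept for each instance type,
# copied from the regression equations of Sect. 3.3.
EXECUTION_MODEL = {
    "m4.2xlarge": (1.0232, 10.369),
    "m4.4xlarge": (0.5772, 11.075),
    "r4.2xlarge": (1.0084, 10.624),
    "r4.4xlarge": (0.6633, 10.617),
    "c4.4xlarge": (0.5951, 9.0298),
    "c4.8xlarge": (0.2839, 8.8455),
}


def execution_time(instance, tasks):
    """Predicted execution time t for `tasks` allocated tasks T."""
    slope, intercept = EXECUTION_MODEL[instance]
    return slope * tasks + intercept


# Predicted time for 10 tasks on the largest compute optimized instance.
t_c4 = execution_time("c4.8xlarge", 10)
```

The slope can be read as the marginal time added by each extra task on an instance, which is where the contention described above shows up.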

3.4 Public “Cloud” Throughput Study

Figure 4 shows a comparison of the throughput (tasks/h) for each instance type and number of tasks, using the equations from the previous section.
Fig. 4

Throughput for each instance type and number of parallel tasks

Compute optimized instances obtain the highest throughput. Smaller instances pertaining to the other families (general and memory optimized) get the lowest throughput. In fact, values for m4.2xlarge and r4.2xlarge are almost identical. On the other hand, values for m4.4xlarge and r4.4xlarge are similar but diverge with a higher number of tasks.
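The throughput values follow directly from the execution model, as the sketch below illustrates. The text does not state the time unit of the fitted equations, so the conversion factor `units_per_hour` (for example 60 if t were in minutes, 3600 if in seconds) is an assumption left to the caller. The function name `throughput` is hypothetical.

```python
def throughput(tasks, slope, intercept, units_per_hour):
    """Tasks per hour for `tasks` parallel tasks on one instance.

    slope, intercept: coefficients of t = slope * T + intercept (Sect. 3.3)
    units_per_hour:   how many time units of t make one hour (assumed)
    """
    t = slope * tasks + intercept
    return tasks * units_per_hour / t


# Same number of tasks on a small general instance and a large compute one,
# using the Sect. 3.3 coefficients and an assumed unit of minutes.
tp_m4 = throughput(8, 1.0232, 10.369, units_per_hour=60)
tp_c4 = throughput(8, 0.2839, 8.8455, units_per_hour=60)
```

Whatever the unit, the ranking between instance types is unchanged, since the factor only rescales all values.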

3.5 Public “Cloud” Cost Study

Executing these tasks on a public “cloud” infrastructure incurs a cost per resource usage. For AWS, Table 2 shows the hourly usage prices for each instance type used in the present work, while Fig. 5 shows a cost comparison for each instance type and number of tasks.

Steps appear in the figure because Amazon EC2 rounds up the execution hours. The c4.8xlarge instance is again on top of the graph followed by r4.4xlarge. In fact, their values tend to be similar in executions of more than 11 tasks.
Table 2

Hourly cost for each instance from Amazon EC2 used in this work

Fig. 5

Cost for each instance type and number of parallel tasks
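The stepped shape of the cost curves can be reproduced with a short sketch that rounds the predicted execution time up to whole hours before applying the hourly price. The function name `execution_cost` is hypothetical, `units_per_hour` is an assumed conversion factor because the time unit of the model is not stated, and the price used in the example is an arbitrary placeholder rather than an actual AWS price.

```python
import math


def execution_cost(tasks, slope, intercept, units_per_hour, hourly_price):
    """Cost of running `tasks` tasks on one instance, billed per started hour."""
    hours = (slope * tasks + intercept) / units_per_hour
    return math.ceil(hours) * hourly_price


# Placeholder price of 0.5 per hour; coefficients of m4.2xlarge from Sect. 3.3.
cost_short = execution_cost(8, 1.0232, 10.369, 60, hourly_price=0.5)
cost_long = execution_cost(100, 1.0232, 10.369, 60, hourly_price=0.5)
```

Because of the rounding, the cost stays flat until the predicted time crosses an hour boundary and then jumps by one hourly price.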

3.6 Public “Cloud” Cost/Throughput Study

Pursuing the best setup (number and instance types) on a public “cloud” infrastructure results in choosing between throughput and cost, as explained previously. In many cases, a compromise between these two metrics is needed as a setup satisfying a high throughput demand does not match that satisfying a low-cost demand, and vice versa.

The cost/throughput (C/T) metric allows choosing among different setups by combining both. The lower its value, the better the setup for those specific conditions [11].

As can be seen in Fig. 6, compute optimized instances (c4.8xlarge and c4.4xlarge) are the ones with the best C/T values for most situations. Additionally, general instances, and m4.2xlarge in particular, are recommended for executions of fewer than 8 tasks.
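As an illustration, the metric can be computed as the ratio of cost to throughput and minimized over candidate setups. The sketch assumes per-started-hour billing and an unstated time unit handled through `units_per_hour`; the names `cost_throughput` and `best_setup` are hypothetical, and the prices in the example are arbitrary placeholders, not AWS prices.

```python
import math


def cost_throughput(tasks, slope, intercept, units_per_hour, hourly_price):
    """C/T: cost of the execution divided by its throughput (tasks per hour)."""
    hours = (slope * tasks + intercept) / units_per_hour
    cost = math.ceil(hours) * hourly_price
    tasks_per_hour = tasks / hours
    return cost / tasks_per_hour


def best_setup(tasks, setups, units_per_hour):
    """Name of the setup with the lowest C/T.

    setups maps a name to (slope, intercept, hourly_price).
    """
    def score(name):
        slope, intercept, price = setups[name]
        return cost_throughput(tasks, slope, intercept, units_per_hour, price)
    return min(setups, key=score)


# Regression coefficients from Sect. 3.3, with placeholder prices 1.0 and 4.0.
setups = {
    "m4.2xlarge": (1.0232, 10.369, 1.0),
    "c4.8xlarge": (0.2839, 8.8455, 4.0),
}
choice = best_setup(4, setups, units_per_hour=60)
```

With real prices and the actual time unit, the same comparison can be repeated for each number of tasks to see where the preferred instance type changes.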

Another interesting aspect is that memory optimized instances do not represent an improvement for this type of application.
Fig. 6

Cost/throughput for each instance type and number of parallel tasks

4 Conclusions

Starting from the previous standard analysis of the data from the Venus Monitoring Camera, we integrated the algorithms and data into a “cloud” computing framework that can be extended to more general cases. In this context, we can obtain a more efficient and cheaper analysis, as well as great flexibility in the use of different algorithms and different kinds of software to deal with the data. This approach can be extended to other physical contexts as well as to general big data analysis issues.

The computational core of this contribution has focused on the public “cloud” infrastructure module, which is offered on a pay-as-you-go basis. An execution model has been provided for defining an optimal infrastructure in terms of throughput, cost and a metric relating both.

So far, this work has been conducted with the VMC’s near-IR images; it can be reproduced with the UV image set and with other planetary databases to which the image processing algorithm can be applied.



References

  1. Armbrust, M., Fox, A., Griffith, R., Joseph, A.D., Katz, R., Konwinski, A., Lee, G., Patterson, D., Rabkin, A., Stoica, I., Zaharia, M.: A view of cloud computing. Commun. ACM 53(4), 50–58 (2010)
  2. Kalitkin, N.N.: Chislennye Metody (Numerical Methods). Nauka, Moscow (1978). (in Russian)
  3. Khatuntsev, I., Patsaeva, M., Titov, D., Ignatiev, N., Turin, A., Limaye, S., Markiewicz, W., Almeida, M., Roatsch, T., Moissl, R.: Cloud level winds from the Venus Express Monitoring Camera imaging. Icarus 226, 140–158 (2013)
  4. Montero, R.S., Moreno-Vozmediano, R., Llorente, I.M.: IaaS cloud architecture: from virtualized datacenters to federated cloud infrastructures. Computer 45, 65–72 (2012)
  5. Patsaeva, M., Khatuntsev, I., Patsaev, D., Titov, D., Ignatiev, N., Markiewicz, W., Rodin, A.: The relationship between mesoscale circulation and cloud morphology at the upper cloud level of Venus from VMC/Venus Express. Planet. Sp. Sci. 113–114, 100–108 (2015)
  6. Velasco, M.P., Usero, D., Jiménez, S., Aguirre, C., Vázquez, L.: Mathematics and Mars exploration. Pure Appl. Geophys. 172, 33–47 (2015)
  7. Vázquez, L., Jafari, H.: Fractional calculus: theory and numerical methods. Cent. Eur. J. Phys. 11, 1163 (2013)
  8. Vázquez, L., Valero, F., Romero, P., Martín, M.L., Velasco, M.P., Jiménez, S., Aguirre, C., Caro-Carretero, R., Barderas, G., Usero, D., Martínez, G., Llorente, I.M., Vázquez-Poletti, J.L., Pascual, P., Vicente-Retortillo, A., Ramírez-Nicolás, M.: Some elements of the present Martian research environment at Universidad Complutense de Madrid. Bol. Electrón. Soc. Esp. Mat. 14, 3–15 (2017)
  9. Vázquez, L., Velasco, M.P., Vázquez-Poletti, J.L., Llorente, I.M., Usero, D., Jiménez, S.: Modeling and simulation of the atmospheric dust dynamic: fractional calculus and cloud computing. Int. J. Numer. Anal. Model. 15, 74–85 (2018)
  10. Vázquez-Poletti, J.L., Santos-Muñoz, D., Llorente, I.M., Valero, F.: A cloud for clouds: weather research and forecasting on a public cloud infrastructure. In: Helfert, M., Desprez, F., Ferguson, D., Leymann, F., Méndez Munoz, V. (eds.) Cloud Computing and Services Sciences. CLOSER 2015. Communications in Computer and Information Science, vol. 512. Springer, Cham (2015)
  11. Vázquez-Poletti, J.L., Perhac, J., Ryan, J., Elster, A.C.: Thor: a transparent heterogeneous open resource framework. In: 2010 IEEE International Conference On Cluster Computing Workshops and Posters (Cluster Workshops), pp. 1–6 (2010)

Copyright information

© Shanghai University 2019

Authors and Affiliations

  • J. L. Vázquez-Poletti (1, email author)
  • M. P. Velasco (2)
  • S. Jiménez (2)
  • D. Usero (1)
  • I. M. Llorente (1)
  • L. Vázquez (1)
  • O. Korablev (3)
  • D. Belyaev (3)
  • M. V. Patsaeva (3)
  • I. V. Khatuntsev (3)

  1. Instituto de Matemática Interdisciplinar (IMI), Universidad Complutense de Madrid, Madrid, Spain
  2. Universidad Politécnica de Madrid, Madrid, Spain
  3. Space Research Institute of Russian Academy of Sciences (IKI), Moscow, Russia
