1 Introduction

High fragmentation of land ownership is generally regarded as a threat to the profitability of farms and forest holdings [1, 2]. Actual fragmentation can be the result of quite different situations: a high number of landowners, a high number of holdings (land users), lack of overlap between landowners and land users, a high number of land parcels per landowner, or high average distances between the parcels of the same landowner [3, 4]. More often, real cases result from a combination of these situations.

Public and private initiatives aimed at reducing land fragmentation exist in most countries. Very often, these focus on reducing the average number of parcels per landowner and/or on relocating the parcels of each landowner as close as possible to each other [5,6,7] and, as such, may include the exchange of parcels between landowners. The potential benefits of parcel exchange increase with the total number of landowners and the total number of parcels involved in the exchange, as the number of possible solutions increases very quickly. A higher number of possible solutions also makes the search for a (sub)optimal solution harder, and heuristics become suitable for these cases.

Compared to exchanging parcels one-by-one between two landowners, multiple parcel exchange has greater potential benefits, which increase with the number of owners and parcels involved in the exchange process. The problem can then be treated as a combinatorial problem and, as such, the number of possible exchange combinations grows very quickly as the numbers of landowners and parcels involved increase [8]. Combinatorial or trial-and-error approaches, even when performed by computers, need amounts of time that increase exponentially with the number of owners and parcels. Due to the size of the problem, these approaches are generally not viable and other options must be sought. A common option for complex problems with large numbers of possible solutions is the use of heuristic algorithms. These algorithms trade the optimality of the solution (they are not guaranteed to find the best solution) for performance (the speed of the algorithm), reducing the time needed to achieve good solutions, and as such, they may be a viable technique in this case.

From a general perspective, heuristic algorithms search for the solution by attempting to improve a candidate solution iteratively, according to a measure of quality. They are used to find solutions to complex problems, though not necessarily the best one, in much shorter times. This speed allows them to solve complex problems such as those commonly found in land management or land administration. Among the several subtypes of heuristic algorithms, genetic algorithms (GA) operate with the basic principles of natural evolution: the best individuals in a population produce next generations that are better suited to the environment. This translates to an algorithm that maintains a group of candidate solutions (population) from which the best individuals according to a fitness function (FF) are selected to create new individuals. Every new generation of the population is formed with potentially better solutions, and this procedure is repeated until the stopping criteria are met.

Previous publications have proven that evolutionary algorithms are well suited to support multiobjective spatial decision-making [9, 10]. Several applications can be found in literature, from traditional land consolidation [11] to space planning [12, 13], automatic delimitation of population settlements [14] and land-use allocation [15,16,17,18]. While there are several cases of genetic algorithms proposed to support decisions concerning land reallotment in traditional land consolidation processes [19,20,21,22], a different approach is raised in [23], where the results of the proposed GA are analyzed from an agroforestry standpoint. The effectiveness of this approach in real cases was verified with satisfactory results. The performance of GAs can be improved with several optimizations and techniques already known and tested in the literature [15].

Recently, big data technologies, such as Hadoop, Spark or Flink, have emerged as efficient solutions for large applications on distributed-memory systems. These technologies provide easier access to large amounts of computational resources, abstracting the underlying hardware structure. Some of them are flexible enough to be used outside of the problem scales usually related to Big Data, enabling their use on traditional problems. Other uses of these big data technologies for genetic algorithms can be found in the literature [24,25,26], some using clusters [27], others public clouds [26].

In this paper, we analyze the use of a big data framework, specifically Spark, not only to increase the performance of the algorithm but also to improve the quality of the results, enabling more complex configurations and supporting larger use cases. Spark has proven its suitability for this type of algorithm in the literature [27].

The rest of the paper is structured as follows: Sect. 2 describes the application of genetic algorithms for parcel exchange. Section 3 describes some details of the proposed system using a multithread approach, and Sect. 4 presents the proposal to perform parcel exchange using Apache Spark. Section 5 presents the results obtained, and Sect. 6 summarizes the conclusions and outlines future work.

2 Genetic algorithm for parcel exchange: GeAPaE

Genetic algorithms are among the search heuristics usually applied to optimization. They are inspired by natural processes such as natural selection and, therefore, most of their terminology and mechanisms come from that field.

The base element of the algorithm is the individual, which is itself a possible solution to the optimization problem. An individual encodes all the data needed to represent a possible solution: the assignment of an owner to each of the N parcels involved. These assignments can also be referred to as ownership patterns and can be abstracted as an ordered sequence of owner identifiers. For each parcel involved, an individual contains one owner identifier, chosen among the L owners of the particular use case; each of those identifiers is called a gene in common genetic algorithm terminology. The algorithm tries to find the individual with the lowest value according to a specific set of rules, usually represented as a mathematical function known as the fitness function (FF), that factors the owner of each parcel into the computation of the value.

The basic algorithm maintains Q different groups of individuals called populations, denoted as \(P^{1},..., P^{j},..., P^{Q}\), and it creates new individuals using a combination of crossover and mutation processes. Each population can have a different number of individuals, denoted as \(M_{j}\). Each individual can be identified by its population and element index, denoted as \(P^{j}_{i}\) (the individual i of population j). \(P^{j}_{i,k}\) denotes the owner of the k-th parcel according to the i-th individual of the j-th population, one for each of the N parcels involved. These populations can also be referred to as generations when one group is created from the members of another, to introduce the sense of parenthood or precedence: individuals of a population generate a new group of individuals. These terms are used interchangeably in this work.

The different fitness functions used are explained in detail in [23]. Some of them rely on a reference point of the owner, a location indicated by the owner. This reference point is usually located at the owner's farmyard and indicates the place around which the owner prefers their parcels to be located. A short description of each fitness function is as follows:

  • Parcel Distances Total (PDT): adds all the distances between each pair of parcels assigned to the same landowner, for all landowners.

  • Parcel Distances Average (PDA): similar to the previous one but uses the average for each landowner instead of just adding them.

  • Reference point Distance Total (RDT): adds all the distances between each parcel and the reference point of the owner.

  • Reference point Distance Average (RDA): similar to the previous one but the value for each landowner is the average instead of the sum.

  • Total Distances Combined (TDC): combines the PDT and RDT fitness functions, with potentially different weights.

  • Average Distances Combined (ADC): combines PDA and RDA, again with independent weights.

  • 4 Distances Combined (4DC): combines the first four fitness functions, that is PDT, PDA, RDT and RDA, giving different weights to each one.

  • Number Of ParceLs (NPL): number of parcels assigned to the owner.

All of these fitness functions except the last one, NPL, calculate a value for each landowner and average these values to give the final fitness value of the individual. The NPL fitness function instead adds the values of every owner, so the final value for an individual is the total number of parcels.

Table 1 summarizes the main equations for each fitness function. \(N_{o}\) is the number of parcels assigned to the owner o in the individual being evaluated. \(PD_{i,j,o}\) is the distance between parcel i and j of the owner o. \(RD_{i,o}\) is the distance from parcel i to the reference point of the owner o. Finally, \(W_{PDT}\), \(W_{PDA}\), \(W_{RDT}\) and \(W_{RDA}\) are the weights given to the first, second, third and fourth evaluation methods, respectively.
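The verbal definitions above can be illustrated with a minimal Python sketch. It is not the implementation from [23] and it does not reproduce the equations of Table 1. It assumes that each parcel is represented by a centroid, that distances are Euclidean, that an owner with fewer than two parcels contributes a pairwise distance of zero, and that ADC is a weighted sum of the per-owner PDA and RDA terms averaged over the owners that hold at least one parcel. All function and parameter names are hypothetical.

```python
from itertools import combinations
from math import dist


def parcels_by_owner(individual):
    """Group parcel indexes by the owner identifier stored in each gene."""
    groups = {}
    for parcel, owner in enumerate(individual):
        groups.setdefault(owner, []).append(parcel)
    return groups


def pdt(parcels, centroids):
    """PDT term of one owner: sum of the distances between each pair of parcels."""
    return sum(dist(centroids[i], centroids[j]) for i, j in combinations(parcels, 2))


def pda(parcels, centroids):
    """PDA term of one owner: average pairwise distance (assumed 0 below 2 parcels)."""
    pairs = len(parcels) * (len(parcels) - 1) // 2
    return pdt(parcels, centroids) / pairs if pairs else 0.0


def rdt(parcels, centroids, reference):
    """RDT term of one owner: sum of the distances to the owner's reference point."""
    return sum(dist(centroids[i], reference) for i in parcels)


def rda(parcels, centroids, reference):
    """RDA term of one owner: average distance to the owner's reference point."""
    return rdt(parcels, centroids, reference) / len(parcels)


def adc(individual, centroids, references, w_pda=0.5, w_rda=0.5):
    """ADC of an individual: weighted PDA and RDA per owner, averaged over owners."""
    per_owner = [
        w_pda * pda(parcels, centroids) + w_rda * rda(parcels, centroids, references[owner])
        for owner, parcels in parcels_by_owner(individual).items()
    ]
    return sum(per_owner) / len(per_owner)
```

In this sketch an individual is simply a list such as `[0, 0, 1, 1]`, where position k holds the owner of the k-th parcel. NPL is left out because it requires the geometric union of adjacent parcels.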

The fitness function to use can be configured in each execution according to the needs of the users. In some cases, the goal is to have the parcels as close as possible to the main farmyard (RDA or RDT would be the best ones to use). Sometimes, owners do not need the parcels to be so close to their farmyard but prefer them grouped together (PDA or PDT in that case). Other times they want the lowest number of separate pieces of land, regardless of their shape (NPL performs the parcel union and only considers the number of parcels that remain after fusing the ones assigned to each landowner wherever possible).

Table 1 Equations for fitness functions used in [23]

3 Description of GeAPae implementation using multithreading

This section describes some internal peculiarities of previous implementations of GeAPae [23]. GeAPae is implemented using the Java programming language and parallelized using threads. It maintains all the modes of execution (command-line, integrated web application or GeoServer WPS operation). In this paper, we use command-line mode exclusively as it is the most adequate for the Spark environment used (HPC cluster with job scheduling). Input, output and configuration of the execution remain unchanged, as well as the distance calculation and caching done at the start.

In this section, we detail how the genetic algorithm works, focusing on the parallelization techniques that exploit the multithreading capabilities of modern systems. The algorithm can use multiple processing threads in two different ways: parallelizing the creation of new individuals to create the next generation of a given population more quickly, or maintaining several populations that evolve independently and share individuals periodically. These two approaches have different effects on the overall algorithm. Increasing the number of threads working on the same population increases the number of generations completed in the same time, which can be understood as increasing the depth of the search (search depth), since mutations are stacked on the individuals. Increasing the number of populations, on the other hand, does not increase the number of generations achieved, but since each population evolves independently, the populations can test different mutation paths in parallel, increasing the width of the search (search width) when all of the populations are taken into account.

3.1 Overall structure

For a single population, a more detailed description of the algorithm is presented in [23]. Algorithm 1 shows the pseudocode of the sequential algorithm, to explain the evolution process.

The algorithm starts with an initial phase of population creation (lines 1-6). In this phase, \(M_k\) individuals are created to fill the first generation, checking that each ownership pattern is valid, that is, that each landowner keeps a total property value within a predefined margin of their property value in the original distribution. When the population is complete, the algorithm enters the second and main phase, the evolution loop.

The evolution is performed for a predefined amount of time (line 7). For each individual in the population, another random individual is selected, and the crossover operation is performed (lines 15-19), generating two children. Those new individuals are mutated one or several times, and if the resulting individual is valid, it is added to the next generation’s population (lines 20-37). The best of the children and the original elements is selected as the individual to be included into the next generation (line 38). If the best fitness in the population does not change during several generations, the algorithm is considered stagnated and the loop stops before the maximum time is reached. This is done to avoid wasting execution time when the optimum is reached, either a local or global optimum, and, as such, this stagnation detection can be seen as another stopping criterion.
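The generation step described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the pseudocode of Algorithm 1: the source does not specify the crossover or mutation operators, so a one-point crossover and a single-gene reassignment are assumed here, each child is mutated only once, and the validity check compares the total value of each owner with the original total using a relative margin. All names are hypothetical, and individuals are lists with at least two genes.

```python
import random


def is_valid(individual, values, original_totals, margin):
    """True if every owner's total value stays within `margin` of the original total."""
    totals = {owner: 0.0 for owner in original_totals}
    for parcel, owner in enumerate(individual):
        totals[owner] += values[parcel]
    return all(
        abs(totals[owner] - original_totals[owner]) <= margin * original_totals[owner]
        for owner in original_totals
    )


def crossover(a, b, rng):
    """Assumed one-point crossover: swap the tails of two ownership patterns."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def mutate(individual, owners, rng):
    """Assumed mutation: reassign one randomly chosen parcel to a random owner."""
    child = list(individual)
    child[rng.randrange(len(child))] = rng.choice(owners)
    return child


def next_generation(population, fitness, owners, valid, rng):
    """Build the next generation: slot i keeps the best of parent i and its valid children."""
    new_population = []
    for parent in population:
        partner = rng.choice(population)
        candidates = [parent]
        for child in crossover(parent, partner, rng):
            child = mutate(child, owners, rng)
            if valid(child):          # invalid ownership patterns are discarded
                candidates.append(child)
        # Lower fitness is better, so the minimum is selected.
        new_population.append(min(candidates, key=fitness))
    return new_population
```

Because the parent always remains a candidate, the best fitness of the population cannot get worse from one generation to the next in this sketch. The time limit and the stagnation check would wrap repeated calls to `next_generation`.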

Algorithm 1 Pseudocode of the sequential algorithm

Two different approaches to increase performance and reach better fitness values faster are proposed. One is based on increasing the search depth by parallelizing the creation of new individuals. The other approach is based on increasing the search width, creating several populations that evolve independently but share some individuals periodically.

The first parallelization strategy uses the available threads to accelerate the creation of new generations. This is achieved by parallelizing the for loop in lines 14-39. Each iteration of the loop is considered an independent task: the i-th iteration creates the individual that will replace the i-th individual in the next generation, and the available threads complete those tasks. The tasks in a generation are independent of each other, while there is a dependency between generations. There is no static scheduling of those tasks, since the computational cost of each one is not fixed. Instead, each thread proceeds with the next task when it completes the previous one, balancing the load of the threads and making efficient use of the resources. Increasing the speed at which new generations are completed increases the number of mutations that are applied to the same individual, since the mutations only occur between generations. The context of the algorithm increases the relevance of the mutation process, since it is the step that introduces randomness to the individuals, which is necessary to explore new parcel-landowner associations. Therefore, the depth of the genealogy of an individual, also seen as the search depth, has a notable role in the performance of the algorithm.
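The dynamic scheduling described above can be sketched with a thread pool, where idle threads pick up the next pending task. The snippet below is only an illustration in Python using `concurrent.futures`; the actual implementation is written in Java, and CPython threads would not speed up CPU-bound work in the same way. The function `make_replacement` is a hypothetical callback that creates the replacement of the i-th individual.

```python
from concurrent.futures import ThreadPoolExecutor


def evolve_generation_parallel(population, make_replacement, threads):
    """Create the replacement of every individual using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Each index is an independent task taken by whichever thread is free;
        # map() returns results in input order, so slot i gets the replacement of i.
        return list(pool.map(lambda i: make_replacement(i, population),
                             range(len(population))))
```

The generation as a whole acts as a barrier: the next generation cannot start until every task of the current one has finished, which matches the dependency between generations.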

With regard to the width-focused approach, a single process handles each population in a different thread, so that shared memory can be used for the cooperation instead of communication between different processes. The cooperation between populations is performed in the export/import step of the algorithm (lines 8-13), using lists of individuals as import queues in a producer-consumer pattern. At set intervals, each population exports its best individuals, adding them to the import queues of every other population (lines 8-10). Each population is independent of the others, and the only interaction is the asynchronous communication through the import queues, which are managed at a higher level of abstraction.

When a population imports individuals of other populations, all incoming individuals are evaluated with the configuration of the accepting population, in case they originate from a population that uses a different fitness function, and they are appended to the current population, temporarily increasing the population size. In order to avoid uncontrolled population size increases, the inflated population is sorted by fitness value from best to worst, and the list is truncated to the correct population size, removing the worst individuals from it (lines 11-13). If the imported individuals are better than some of the ones already in the population, the old ones will be removed. Otherwise, the imported individuals are removed when the list is truncated, and the population will keep its original individuals.
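The import step can be written as a short Python sketch. The names are hypothetical, and lower fitness values are taken as better, as in the rest of the paper. The imported individuals are evaluated with the fitness function of the accepting population simply because that function is the sort key.

```python
def import_individuals(population, imported, fitness, size):
    """Merge imported individuals, sort from best to worst and truncate to `size`."""
    merged = population + imported
    # Python's sort is stable, so on ties the existing individuals stay ahead.
    merged.sort(key=fitness)
    return merged[:size]
```

For instance, with the identity as fitness function, importing `[5.0]` into `[1.0, 2.0, 3.0, 9.0]` with `size=4` replaces the worst individual, while importing it into `[1.0, 2.0, 3.0, 4.0]` leaves the population unchanged.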

Figure 1 shows a representation of this process when executed in a single node using the multithread versions of the two approaches detailed. In the case shown, there are three populations with 4, 3 and 4 individuals, respectively. For reasons of clarity, only population 2 shares individuals with the other two populations: \(P^2\) performs the export, sending its best individual, \(P^2_1\), to the other two populations. Those populations import that individual, adding it to their lists of individuals, sort the lists by increasing fitness value and truncate them to the population size, four in both cases. In \(P^1\), the imported individual is better than the worst of the existing individuals, so the truncation removes \(P^1_4\) and the imported individual becomes part of \(P^1\). In \(P^3\), the imported individual is worse than all of the existing ones, so the truncation removes the imported individual and the population remains unchanged.

Fig. 1 Scheme of the evolution of multiple populations exploiting both parallelization strategies, search width and depth

The increase in search width by itself does not provide big improvements. Search depth tends to be more important than search width in general, and more so in this case because the mutation step is the main source of fitness improvement. However, since the whole evolution is performed in isolation in each population, a different configuration can be used in each one. This heterogeneous configuration opens new possibilities: an advancement in one population that is focused on one fitness function can have a positive impact on another population that is searching with another fitness function. This has a large impact when the computational costs of the fitness functions are very different. Populations with fast fitness functions complete new generations faster, and individuals from those deeper generations are exported to slower populations, jumpstarting them or introducing big leaps in their evolution.

4 Efficient Spark-based GeAPaeSp implementation

With the approaches explained in Sect. 3, a trade-off must be sought between search depth and search width when choosing how to dedicate the computational resources of the system that runs the algorithm. The possibility of not having to compromise one of the two levels of parallelism, by increasing the amount of computational resources available, is appealing. There is a large number of tools and techniques to use distributed-memory systems effectively, one prominent example being MPI. The rise of Big Data has led to the creation of new, commercially oriented and more flexible software tools to use those systems in different applications, in contrast to the more research or academic focused uses or traditional approaches. An alternative approach to the multiple-population parallelization is proposed, using the Apache Spark framework to allow simple access to distributed-memory systems like clusters or cloud infrastructure, and obtaining other features like fault tolerance without extra effort.

There are other alternatives to Spark, such as Flink or Hadoop. The main reasons for choosing Spark are its ease of use and its good support for iterative algorithms. Hadoop is not adequate for iterative algorithms like a genetic algorithm, as shown by the comparison with Spark in previous works [27]. One of the main problems with Hadoop is the requirement to use HDFS for input, output and intermediate results, which introduces a forced overhead of writing intermediate results to disk. This overhead can be too high depending on the configuration, as explored in previous works [27]. Flink, on the other hand, is not adequate for our proposal due to its design focus on stream processing.

Our Spark approach allows the utilization of multiple populations to increase the search width while using all available resources in each Spark worker node to maximize the search depth in each population. Each population can be distributed to a different worker node and use all of its available threads for the parallelization to maximize the search depth.

Apache Spark uses the concept of Resilient Distributed Dataset (RDD) as the main tool for efficient fault tolerance and distributed memory computation. An RDD is a collection of data distributed across the underlying infrastructure that can be operated in parallel and cached in memory when possible. In our case, we create an RDD from data already in memory, integrating Spark with the existing algorithm. Spark uses a master/worker architecture, with the Driver program running on the master node and Executors running on the worker nodes using resources allocated through a cluster manager.

4.1 Spark distributed structure

Our proposal takes advantage of the resources available through Spark by distributing the populations to different worker nodes and using all the resources on each worker node to increase the search depth of the population assigned to it. The basic abstraction is the execution of the application on the worker nodes: each worker node performs the evolution of one population, with no interaction with the others. After the evolution is completed, some individuals are copied to other populations to allow cooperation, and the evolution is performed again. This is repeated until the elapsed-time or stagnation criterion is met.

Figure 2 shows a simple scheme of how the computation is distributed in a traditional Spark environment. In contrast to other parallelization techniques seen in the literature, this work is focused on the quality improvement that is possible with this approach, while maintaining the benefits of reduced execution time. With our proposal, we can use the distributed-memory paradigm to increase search width, running several populations in parallel on different nodes, and the shared-memory paradigm to maximize the search depth of each population, using all of the available threads in the node where it evolves.

Fig. 2 Scheme of the evolution of multiple populations using our Spark approach

The driver program creates an RDD of key-value pairs, where the key is the ID (or index) of a population and the value is the list of individuals that constitutes that population. A flatmap operation executed on the RDD performs the evolution of each population, and a reduce operation then groups the output of the flatmap by the key of each element. The process is repeated with the re-formed populations until the maximum allowed time is reached. Each group of flatmap-reduce operations is called a stage for future reference. We describe each of these operations in detail below.

The flatmap operation receives each key-value pair, runs the evolution on the population and creates new key-value pairs with the population index as key and each individual in its own list. The execution of the flatmap operation on each key-value pair is independent of the others in the same stage, while there is a dependency between consecutive stages. For one population of \(M_k\) individuals, \(M_k\) pairs are created, each one with the index of the population as key and a list containing the i-th individual as value. Additionally, new pairs for the b best individuals of the population are also generated (the precise amount can be configured). In an execution with Q populations, each of the b best elements of the population creates \(Q-1\) new pairs, each one with the index of another population as key and a list with that individual as value. At the end, there are \(M_k\) pairs with single-individual lists that together contain the full population, and \(b*(Q-1)\) pairs with copies of the best individuals under the indexes of the other populations.

The reduce operation joins the lists that share the same key. This operation applies an associative and commutative binary operator to all the key-value pairs sharing the same key. All the pairs with the same key are sent to one worker node, where the reduce operation is executed over those pairs. Spark should do this in a way that reduces data transfers, keeping each population in the same node and moving only the copied individuals. The binary operator receives two pairs and creates a single new pair with a list that contains the individuals of the original pairs. After this operation is completed, the RDD returns to its original size, with one pair per population that has the population ID as key and the list of individuals that constitute the population as value, ready to repeat the process.

The process of dividing the populations into lists of one individual at the end of the flatmap operation, and regrouping them with the reduce operation, allows the cooperation between different populations. Spark does not allow direct communication between worker nodes, so the best way of moving data from one worker to another is to complete the task which the worker is currently doing, rearrange the data in the RDD and launch another computation task. The movement of data between worker nodes is normally discouraged as it carries a relevant overhead, but in this case, the amount of data that needs to move through the network is not large, only some individuals of the populations.

After the first stage, the size of the lists (the size of each population) increases, because the best individuals of the other populations are added on top of the individuals of the population itself. To avoid increasing the size of the populations continually, only the \(M_k\) best individuals of each population create new pairs at the end of the flatmap operation, with \(M_k\) being the original population size. The evolution may run with more individuals, but only the best \(M_k\) make it to the next stage, keeping the population size under control.
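The data flow of one stage can be emulated in plain Python to make the pair counts concrete. This sketch does not use the Spark API and omits the evolution itself; it only mimics the role of the flatmap operation (emitting single-individual pairs plus copies of the b best) and of the reduce operation (concatenating lists that share a key). All names are hypothetical, and populations are indexed from 0.

```python
from collections import defaultdict


def flatmap_population(pop_id, individuals, fitness, size, n_populations, b):
    """Emit (key, [individual]) pairs for one already-evolved population."""
    best_first = sorted(individuals, key=fitness)[:size]     # keep only the M_k best
    pairs = [(pop_id, [ind]) for ind in best_first]           # the population itself
    for ind in best_first[:b]:                                # copies of the b best
        pairs += [(other, [ind]) for other in range(n_populations) if other != pop_id]
    return pairs


def reduce_by_key(pairs):
    """Join the lists that share a key; the binary operator is list concatenation."""
    grouped = defaultdict(list)
    for key, individuals in pairs:
        grouped[key] = grouped[key] + individuals
    return dict(grouped)


def run_stage(populations, sizes, fitness, b):
    """One flatmap-reduce stage over a dict {population id: list of individuals}."""
    pairs = []
    for pop_id, individuals in populations.items():
        pairs += flatmap_population(pop_id, individuals, fitness,
                                    sizes[pop_id], len(populations), b)
    return reduce_by_key(pairs)
```

With three populations of 3, 2 and 4 individuals and b = 1, the flatmap step emits 9 + 3 * 1 * 2 = 15 pairs, and after the reduce step the populations temporarily hold 5, 4 and 6 individuals until the next flatmap truncates them again.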

Algorithm 2 shows a representation of the RDD through several stages, using three populations of 3, 2 and 4 individuals and exporting one individual to other populations.

An important performance parameter is the duration of each stage, that is, the time each population evolves in isolation, as it controls the level of cooperation. On the one hand, shorter and more numerous stages are desired to increase the cooperation between populations. On the other hand, each stage has an associated overhead due to data movement, so fewer stages are also desired. These two conflicting factors need to be balanced, so that the progress achieved in the evolution is worth the overhead introduced by communications. The duration of each stage can be adjusted in the configuration file to balance these two factors in each execution.

Algorithm 2 Representation of the RDD through several stages

Lastly, to maintain the stagnation stopping criterion, the driver program has to keep track of the changes in the evolution at each stage, since it has no access to the data that the worker nodes keep during the evolution. Our Spark approach detects stagnation when the fitness value does not change in any population after several stages.
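A driver-side check of this kind can be sketched as follows. The class name, the `patience` parameter (the number of consecutive unchanged stages that counts as stagnation) and the idea of comparing the best fitness value of each population are assumptions made for illustration; the source only states that stagnation is detected when no population changes its fitness value after several stages.

```python
class StagnationTracker:
    """Tracks the best fitness of every population across stages on the driver."""

    def __init__(self, patience):
        self.patience = patience      # unchanged stages tolerated before stopping
        self.last_best = None
        self.unchanged = 0

    def update(self, best_per_population):
        """Record the values after a stage; return True once stagnation is detected."""
        current = tuple(best_per_population)
        if current == self.last_best:
            self.unchanged += 1
        else:
            # At least one population changed its best value: reset the counter.
            self.unchanged = 0
            self.last_best = current
        return self.unchanged >= self.patience
```

The driver would call `update` once per stage with the values collected from the RDD and leave the stage loop when it returns `True` or when the time limit is reached.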

5 Experimental results

An analysis of our proposal is presented. This analysis is split into two studies: an analysis for the multithread approach and a performance study for the GeAPaeSp approach.

During the testing procedure, two parcel holdings are used. One of them is synthetic, with uniform, adjacent parcels and random landowner assignments, to provide a controlled environment. The other parcel holding is a real test case. Table 2 contains a brief description of the parcel holdings and Fig. 3 shows a graphical representation of the initial distribution, where each parcel is color coded by its assigned landowner.

The configurations throughout this section (see Figs. 5-8) are named following the format (QT), where Q is the number of populations evolving in parallel and T is the total number of threads in use, divided equally among the populations. The best results are marked with bold numbers in the tables in the following sections.

Table 2 Test cases information

Fig. 3 Testing parcel holdings. Each color represents an owner. a Synthetic parcel holding. b Real case holding, Ribadeo, Spain

5.1 Multithreading parallelism

For this analysis, we use a cluster where each node is equipped with two Intel Xeon E5-2660 Sandy Bridge-EP, each one with 8 cores at 3.0 GHz for a total of 16 cores, 64 GB of memory and one 1TB local hard disk drive.

Each configuration was executed ten times, and the average fitness value of the ten executions is displayed in the graphs, with progress measured at 15-second intervals. The population size M is not the same in all cases; Table 3 shows the value used in each configuration. The fitness function used is Average Distances Combined (ADC), with equal importance given to parcel distances and reference point distances (\(W_{PDA} = W_{RDA} = 0.5\)). This fitness function is chosen because it can produce good results in under an hour for the parcel holding used, even in the sequential configuration.

Figure 4 shows the effect on the algorithm results of increasing the number of threads processing a single population. In this graph, the time axis uses a logarithmic scale to improve visibility, which amplifies the representation of the 15 seconds between data points at the start of the graph. We use a step-before graph instead of straight lines between data points so that the fitness improvement is never overrepresented. A reference value called 90% improvement is shown. The difference between the maximum starting fitness value and the minimum final fitness value is calculated, and it represents the maximum fitness improvement possible. The 90% improvement value is then calculated as the maximum starting fitness value minus 90% of that improvement. This value represents the point where fitness value reduction starts to provide diminishing returns. The user could stop the algorithm once this value is reached, since most of the improvement will have already been made. The conclusions about speed or time are based on the time needed to reach this value.
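The reference value and the time needed to reach it can be computed as in the following Python sketch. The function names and the sample format (chronologically ordered pairs of time in seconds and average fitness) are assumptions made for illustration, and the numbers in the usage note are invented, not measured results.

```python
def improvement_threshold(start_values, final_values, fraction=0.9):
    """Fitness value at which `fraction` of the maximum possible improvement is reached."""
    highest_start = max(start_values)
    max_improvement = highest_start - min(final_values)
    return highest_start - fraction * max_improvement


def time_to_reach(samples, threshold):
    """Time of the first sample whose fitness is at or below `threshold`, else None."""
    for time_s, fitness in samples:
        if fitness <= threshold:
            return time_s
    return None


def speedup(reference_time, time_s):
    """Speedup of a configuration relative to the reference configuration."""
    return reference_time / time_s
```

For example, with starting values of 100 and 90 and final values of 20 and 30, the maximum improvement is 80 and the reference value is 100 - 0.9 * 80 = 28. A run sampled every 15 seconds is then credited with the time of the first sample at or below 28.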

Fig. 4 Multiple threads on one population and a single node

As can be seen, there is a clear reduction in the time needed to achieve good fitness values. The efficiency of the parallelization (seen as the ratio of time reduction to the increase in resources used) goes down as the number of threads increases, because the computational load is low for the number of threads available. Our GeAPae proposal is up to 13.73x faster than the sequential version, and a slight improvement in the fitness value is achieved.

Figure 5 shows the results when increasing the number of populations, each using one thread. In this case, as expected, only a small benefit is obtained in execution time, because the same amount of work has to be done for each population, while the resources used to process each population are the same as for the sequential algorithm. A slight reduction in execution time is still achieved due to the cooperation between populations: if one population finds a good element, it is shared with the other populations, contributing to a faster achievement of good fitness values. Overall, this approach is up to 1.71x faster than using a single population.

Fig. 5 Multiple populations with one thread

Fig. 6 Different configurations increasing populations using 16 threads

Table 3 Time to achieve 90% of the potential whole improvement and speedups

Figure 6 shows different configurations that vary the number of populations while using the maximum number of available threads, 16. To ensure a fair comparison, all configurations have the same computational load. In configurations with fewer populations, the size of each population is increased to keep the total number of individuals at 512, which maintains the overall search width and removes one variable from the comparison. As the graph shows, the first configuration to cross the 90% improvement line, and therefore the fastest, is the one with one population of 512 individuals and 16 threads, but it stagnates at slightly higher fitness values. The configuration with four populations of 128 individuals, each processed by four threads, shows average behavior in terms of execution time until stagnation but achieves the best fitness value. This configuration is therefore a good compromise between speedup and the fitness values achieved.
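The equal-load constraint can be made concrete by enumerating configurations in which the number of populations times the threads per population is 16 and the number of populations times the population size is 512. The sketch below assumes the number of populations is doubled at each step; the text only confirms the (1, 16, 512) and (4, 4, 128) configurations, so the remaining rows are an assumption, as is the function name.

```python
def equal_load_configurations(total_threads=16, total_individuals=512):
    """List (populations, threads_per_population, individuals_per_population).

    The number of populations is doubled at each step (an assumption),
    keeping the total thread count and total individual count constant.
    """
    configurations = []
    populations = 1
    while populations <= total_threads:
        configurations.append(
            (
                populations,
                total_threads // populations,
                total_individuals // populations,
            )
        )
        populations *= 2
    return configurations


for populations, threads, individuals in equal_load_configurations():
    print(populations, threads, individuals)
```

Every row keeps the total at 16 threads and 512 individuals, so differences between configurations come only from how that fixed budget is split between depth and width.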

Table 3 shows numeric data for all configurations. Q indicates the number of populations, T is the number of threads processing each population and M is the number of individuals in each population. For each configuration, the starting and final fitness values, the time needed to achieve 90% of the whole potential improvement and the speedup are displayed. The speedup is calculated with respect to the \(Q=1\) and \(T=1\) configuration. Compared to that configuration, fitness value reductions are up to 4% better and speedups of up to 13.36x can be achieved.

5.2 Spark distributed parallelism

The following results were obtained in a working Spark environment with multiple computation nodes. The testing platform is the same cluster with 17 nodes, in this case running Spark 3.0.1. Each node consists of two Intel Xeon E5-2660 Sandy Bridge-EP processors with 8 cores each at 3.0 GHz, for a total of 16 cores, 64 GB of memory and one 1 TB local hard disk drive.

For this analysis, we use a synthetic parcel holding to have a more controlled environment, with the initial parcel distribution shown in Fig. 3a. The configuration uses a population size of 128 individuals and the average parcel distances (PDA) fitness function, evaluated after performing parcel union. Combined with the synthetic parcel holding, this makes it possible to reach a situation where each owner has one contiguous set of parcels that, after the geometry union, becomes one parcel per landowner, giving a fitness value of 0. Using that target to measure time is unreliable, because some executions never reach it after becoming stuck in a local minimum, so we use the 90% improvement value as in the previous section. In this case, that value is 40: all populations start at values slightly below 400, so the speedup is calculated from the time needed to reach a value of 40. The fitness value of the original (completely random) parcel distribution is 1918, but the initial individual generation that fills the population is capable of creating better elements, with fitness values around 350, so the evolution itself starts at that point rather than at the 1918 of the initial configuration. This generation process is described in more depth in [23]. To increase the clarity of the graph, the fitness value axis is restricted to the [0, 400] range, where most of the data are located.

Figure 7 shows the fitness curves during the evolution for the synthetic parcel holding (if multiple populations are used, the value of the first population is shown). The time axis stops at 2000 seconds to improve readability. Only two series have not reached the 90% value by that time; Table 4 gives the time they need to reach it.

Fig. 7 GeAPaeSp proposal performance using a synthetic parcel holding

We test one population with one thread as a reference, which corresponds to the sequential behavior of the algorithm.

We include configurations (1, 16) and (16, 16) to show the two extremes of the depth-width balance that the Spark implementation resolves. Both configurations are limited to one node, so the user has to choose between increasing depth and increasing width with the resources of that node. With the Spark implementation, each node dedicates its resources to maximizing search depth, and additional nodes process different populations to increase search width while maintaining search depth. The configuration (1, 16) uses all available resources for search depth, so it can evolve faster but is prone to becoming stuck in a local minimum, which results in less progress in the final part of the evolution. The configuration (16, 16) uses only one thread per population: all resources are dedicated to width, so the evolution is about as fast as the sequential one, but it is less prone to stagnation in local minima and reaches the target value of 0 more reliably.

The rest of the configurations show the first advantage of the Spark implementation: we can increase the number of populations (search width) and benefit from the stagnation resistance without losing search depth or the speed at which lower values are reached. The total amount of resources available grows with each population added, since each one is executed on a different node of the cluster. The Spark implementation offers coarser granularity in the time measurements, because the Driver program only learns the progress of the evolution after a stage is completed. In this case, the duration of each stage is 300 seconds, so there are only around six stages, and the end of each one corresponds to a large drop in fitness for the series with more than 16 threads in total.
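To make the resource usage of each configuration explicit, the sketch below reads each pair as (number of populations, total number of threads), which is an interpretation of the text rather than a stated definition, and derives the threads per population and the number of 16-core nodes involved. The function name is hypothetical.

```python
import math

CORES_PER_NODE = 16  # from the cluster description in the text


def describe_configuration(populations, total_threads,
                           cores_per_node=CORES_PER_NODE):
    """Derive threads per population and node count for one configuration.

    The pair (populations, total_threads) is an assumed reading of the
    configuration labels used in the text.
    """
    return {
        "populations": populations,
        "threads_per_population": total_threads // populations,
        "nodes": math.ceil(total_threads / cores_per_node),
    }


for config in [(1, 16), (16, 16), (2, 32), (4, 64), (8, 128), (16, 256)]:
    print(config, describe_configuration(*config))
```

Under this reading, (1, 16) and (16, 16) both fit in one node, while every distributed configuration keeps 16 threads per population and adds one node per population.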

Table 4 shows the execution time needed for each configuration together with the speedup achieved with respect to the sequential execution. For the configurations that use Spark, the time reported is the time at which the stage that reached 40 or below finishes. The driver does not know exactly when the 90% threshold was actually crossed, but this is the time at which the results become available to check the stopping criteria. Increasing the number of populations makes it possible to reach lower values more reliably: with 4, 8 and 16 populations, all executions reach a value of 0. With fewer populations, that value is not always reached, and the average is higher than 0. This is evidence of the resilience to local minima that wider configurations provide. The sequential case reached 0 in 3 out of 10 executions, (1, 16) in 5 out of 10, (16, 16) in 6 out of 10 and (2, 32) in 7 out of 10. The configurations (4, 64), (8, 128) and (16, 256) reached 0 in all executions. Our GeAPaeSp is up to 14.26x faster than the sequential version, and the fitness reductions achieved are up to 1.69% better than those of the configuration (1, 16).
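The stage-granular time measurement can be sketched as follows: the driver only sees the best fitness at the end of each stage, so the reported time is the end of the first stage whose result is at or below the threshold. The function name and the per-stage fitness values are hypothetical; only the 300-second stage duration and the threshold of 40 come from the text.

```python
def observed_time_to_threshold(stage_end_fitness, threshold, stage_seconds=300):
    """Time at which the driver first observes fitness <= threshold.

    `stage_end_fitness` holds the best fitness known at the end of each
    stage, in order. Returns None if the threshold is never observed.
    """
    for stage_number, fitness in enumerate(stage_end_fitness, start=1):
        if fitness <= threshold:
            return stage_number * stage_seconds
    return None


# Hypothetical fitness values seen by the driver after each stage.
stages = [210.0, 95.0, 38.0, 12.0, 0.0]
print(observed_time_to_threshold(stages, threshold=40.0))
```

The reported time is therefore always a multiple of the stage duration, and it can overestimate the true crossing time by up to one stage.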

Table 4 Time in seconds to achieve a fitness value of 40 and speedup in the synthetic parcel holding

Figure 8 shows the fitness evolution of each configuration using the real parcel holding. As in the previous section, we use the 90% improvement value as the target for time measurement in the speedup calculations. Table 5 shows the time needed for each configuration together with the speedup achieved with respect to the sequential execution. The configuration used in this test is similar to the one used in Sect. 5.1, but not identical: the fitness function is different, so although the fitness values are very similar, they cannot be directly compared.

Table 5 Time in seconds to achieve 90% improvement in fitness value and speedup in the real test case parcel holding

This use case requires more execution time, so the stage duration of 300 seconds is better suited to the total execution time. The fitness curves for the Spark configurations are easier to tell apart than in the synthetic parcel holding. In this case, the best configuration is (8, 128), which is 13.91 times faster than the sequential version. As with the synthetic parcel holding results, increasing the number of populations helps to reach lower fitness values. This is the effect of a greater search width and, with this implementation, the search depth is not compromised. The best fitness value reduction was achieved by the configuration (16, 256), which outperforms the (1, 16) configuration by 4.43%. The speedup comes mostly from dedicating 16 threads to each population, while the search width helps to reach lower fitness values, combining the advantages of the two parallelization techniques while mitigating the disadvantages of each one.

Fig. 8 GeAPaeSp proposal performance using a real case

To close this section, Fig. 9 displays the result of one execution using the Spark distributed approach with 16 populations on each parcel holding. The synthetic parcel holding (see Fig. 9a) behaves in a particular way when the PDA fitness function is used together with parcel union. Since all of the parcels are adjacent to each other, any contiguous group of parcels (considering the eight neighbors of each parcel, so that sharing a corner counts as adjacency) is joined into one parcel, resulting in a fitness value of 0 for the landowner regardless of its shape. On real parcel holdings (see Fig. 9b), where roads and other parcels separate the parcels involved, this situation does not occur. Nevertheless, it is easy to see that the new parcel holdings represent an improvement with regard to ownership fragmentation. The synthetic parcel holding execution managed to form one contiguous block of land (touching corners at worst) for each landowner, even with the aforementioned peculiarity. The real parcel holding also shows a notable improvement in this regard: in general, all the parcels of each landowner are confined to smaller areas (see owners #3, #7, #9 and #11 for notable improvements).
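The effect of eight-neighbor adjacency on parcel union can be illustrated by counting connected groups of cells. This sketch assumes that the synthetic holding can be modeled as a regular grid in which each parcel is one (row, column) cell; that model and the function name are assumptions, not the authors' geometry union.

```python
from collections import deque


def count_merged_parcels(cells):
    """Count 8-connected groups among the grid cells of one landowner.

    `cells` is an iterable of (row, column) pairs. Cells that share an
    edge or a corner are treated as adjacent and merged into one group.
    """
    remaining = set(cells)
    groups = 0
    while remaining:
        groups += 1
        queue = deque([remaining.pop()])
        while queue:
            row, column = queue.popleft()
            for d_row in (-1, 0, 1):
                for d_column in (-1, 0, 1):
                    neighbor = (row + d_row, column + d_column)
                    if neighbor in remaining:
                        remaining.remove(neighbor)
                        queue.append(neighbor)
    return groups


# A diagonal chain only touches at corners but still forms one group.
print(count_merged_parcels({(0, 0), (1, 1), (2, 2)}))
# Two cells separated by a gap stay as two groups.
print(count_merged_parcels({(0, 0), (0, 2)}))
```

When a landowner ends up with a single group, there is only one parcel left after the union, which is the situation the text associates with a fitness value of 0 irrespective of the shape of that group.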

Fig. 9 Result parcel holdings. Each color represents an owner. a Synthetic parcel holding. b Real case, Ribadeo, Spain

6 Conclusions

This paper presents GeAPaeSp, a parallel genetic algorithm for parcel exchange using Apache Spark. The proposal is especially well suited to increasing the quality of the final results while minimizing the execution time. It improves the width of the search by increasing the number of populations, and the depth of the search by increasing the number of generations that can be processed in a reasonable time. GeAPaeSp outperforms both the sequential and the multithreaded proposals, combining speedups of up to 14.26 with respect to the sequential version with fitness improvements that are up to 10.74% better (4.43% better than the depth-focused multithreaded proposal), and it reaches those better fitness values more reliably.

Distributing the computation across multiple computation nodes makes it possible to use broadly different configurations that cater to the different needs of the landowners involved, without hampering any of the other configurations.

There is still work to be done on this front. Some known improvements to genetic algorithms have not yet been implemented, such as local searches on selected individuals. Another possibility is the dynamic configuration of each population independently, adapting to the progress of the evolution process by using fast methods during the initial stages and switching to more intensive options when those fast methods stagnate.