Bi-criteria parallel batch machine scheduling to minimize total weighted tardiness and electricity cost

A bi-criteria scheduling problem for parallel identical batch processing machines in semiconductor wafer fabrication facilities is studied. Only jobs belonging to the same family can be batched together. The performance measures are the total weighted tardiness and the electricity cost, where a time-of-use (TOU) tariff is assumed. Unequal ready times of the jobs and non-identical job sizes are considered. A mixed integer linear program (MILP) is formulated. We analyze the special case where all jobs have the same size, all due dates are zero, and the jobs are available at time zero. Properties of Pareto-optimal schedules for this special case are stated. They lead to a more tractable MILP. We design three heuristics based on grouping genetic algorithms that are embedded into a non-dominated sorting genetic algorithm II framework. Three solution representations are studied that allow for choosing start times of the batches to take the energy consumption into account. We discuss a heuristic that improves a given near-optimal Pareto front. Computational experiments are conducted based on randomly generated problem instances. The ε-constraint method is used for both MILP formulations to determine the true Pareto front. For large-sized problem instances, we apply the genetic algorithms (GAs). Some of the GAs provide high-quality solutions.


Introduction
Semiconductor manufacturing deals with producing integrated circuits (ICs) on wafers, thin discs made of silicon or gallium arsenide. The manufacturing process takes place in semiconductor wafer fabrication facilities (wafer fabs), where ICs are built layer by layer on top of a raw wafer. Up to several thousand ICs can be manufactured on a single wafer. The moving entities in a wafer fab are called lots, sets of wafers with ICs that belong to the same product. In this paper, we call lots jobs to align with the deterministic scheduling literature. Wafer fabs can be modeled as a job shop with a couple of unusual features, such as a large number of machine groups with machines that offer the same functionality, reentrant process flows (i.e., some of the machine groups are visited by the same job many times), and a mix of single wafer, lot, and batch processing (Mönch et al. 2013). A p-batch is a group of jobs that are processed at the same time, i.e., in parallel, on the same machine (Potts and Kovalyov 2000). Up to one-third of all operations in a wafer fab are performed on batch processing machines. The processing times on these machines are generally very long compared to the times on other machines. Up to 24 h are possible for the longest processes. Since batch processing machines process several jobs at the same time, they tend to release multiple lots at once to downstream machines that are able to process only single wafers or jobs. This leads to long queues in front of these serial machines. Therefore, scheduling batch processing machines in an appropriate manner is crucial for the overall performance of a wafer fab.
The semiconductor industry is extremely energy-intensive due to the required clean room conditions and the extremely complicated machinery. It consumes more electricity than other industries such as steel or petrochemicals (Yu et al. 2017). Diffusion furnaces, typical batch processing machines in a wafer fab, are the machines with the largest energy consumption in wafer fabs (cf. Scholl 2017; Singapore Government 2019) since the diffusion process is a high-temperature process that disperses material on the wafer surface. Therefore, in this paper, we study a model problem for scheduling jobs on parallel batch processing machines where we take two important, but conflicting performance measures into account, namely the total weighted tardiness (TWT) of the jobs and the electricity cost (EC) of the batch processing machines. We assume a TOU tariff, i.e., the energy price depends on the time when the energy is consumed. Although the effects of TOU tariffs are discussed controversially in the literature, there is some evidence that TOU tariffs can positively affect sustainability (Che et al. 2017). For instance, Finn et al. (2011) and Pina et al. (2012) demonstrate via use cases in Ireland and the Azores islands that demand response programs, among them TOU tariffs, lead to a higher penetration of renewable electricity. Stoll et al. (2014) demonstrate that the environmental impact of shifting consumption from on-peak to off-peak hours can be both positive and negative; a positive effect of the shifting is observed for Ontario and the UK, but not for all of the regions studied. The scheduling problem is based on the following assumptions:
6. Each job j has a ready time r_j ≥ 0.
7. Each job j has a due date d_j.
8. There are m identical parallel machines, labeled by k = 1, …, m. All machines are available at time t = 0.
9. All machines have the same maximum batch size B, measured in number of wafers.
10. Preemption of the batch machines is not allowed, i.e., once a batch is started, it cannot be interrupted.
We assume that a finite scheduling horizon is divided into periods of equal size. The periods are labeled by t = 1, … , T . The EC is modeled as a piecewise constant function over the scheduling horizon.
We show an example for three furnace machines and jobs, each consisting of five wafers, belonging to three incompatible families in Fig. 1. The different families are indicated by different colors of the wafers, and the maximum batch size is B = 10 wafers, i.e., two jobs. Note that the batch on the second machine only has a single job of family A. Forming such batches is reasonable in certain situations due to the ready times of the jobs and tight due dates. Seven additional jobs wait for processing in front of the three furnaces. They belong to the families A, B, and C. The time axis indicates that period-dependent energy costs are assumed.
The TWT measure of a schedule S is defined as follows:

TWT(S) = \sum_{j=1}^{n} w_j T_j, (1)

where T_j := (C_j − d_j)^+ is the tardiness, C_j the completion time of job j in S, and the abbreviation x^+ := max(x, 0) is used for an arbitrary real number x. If we have d_j ≡ 0, j = 1, …, n, then we obtain TWT = \sum_{j=1}^{n} w_j C_j = TWC, i.e., we get the TWC measure. It is a surrogate measure for the weighted cycle time, an important key performance indicator in wafer fabs (Mönch et al. 2013). The second performance measure of interest is the EC. We assume that the EC values are constant in each period, i.e., we have a mapping e : [0, T] → IR^+ that is piecewise constant in each period, to model the TOU tariff. If e(t) is the corresponding EC value in period t, then the EC value of schedule S can be expressed as follows:

EC(S) = \sum_{k=1}^{m} \sum_{t=1}^{T} e(t) z_{kt}, (2)

where z_{kt} is 1 if a batch is processed in period t on machine k in S and zero otherwise. Note that the two objectives are in conflict. Minimizing TWT typically leads to non-delay schedules, while minimizing EC requires determining periods for processing within the scheduling horizon where the e(t) values are low. The EC measure is a surrogate measure for sustainability efforts since low e(t) values offered by TOU tariffs are often considered a prerequisite to increase the penetration of renewable energy (Finn et al. 2011; Pina et al. 2012; IRENA 2019). Note that minimizing the EC measure in the case of a constant EC for the entire scheduling horizon is equivalent to minimizing the number of batches, which clearly leads to a reduction of the energy consumption. But even in the case of a general TOU tariff, a small EC value often leads to a small number of formed batches.
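The two measures defined above are straightforward to evaluate for a given schedule. A minimal sketch (illustrative data shapes, not the authors' code): jobs carry a weight `w` and due date `d`, and a batch is a tuple of machine, start period, processing time, and job list.

```python
# Sketch: evaluating TWT and EC for a toy schedule. Periods are labeled
# 1..T; a batch starting at boundary s with processing time p occupies
# periods s+1, ..., s+p. All names are illustrative assumptions.

def twt(jobs, completion):
    """Total weighted tardiness: sum_j w_j * max(C_j - d_j, 0)."""
    return sum(j["w"] * max(completion[j["id"]] - j["d"], 0) for j in jobs)

def ec(batches, e, m, T):
    """Electricity cost: sum of e(t) over all busy machine-periods."""
    busy = {(k, t): 0 for k in range(m) for t in range(1, T + 1)}
    for k, start, p, _ in batches:
        for t in range(start + 1, start + p + 1):  # periods occupied by batch
            busy[(k, t)] = 1
    return sum(e[t] * busy[(k, t)] for k in range(m) for t in range(1, T + 1))
```

A batch placed in cheap periods lowers EC without changing TWT as long as no job becomes tardy, which is exactly the tension between the two objectives.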
Using the three-field notation from deterministic scheduling theory (Graham et al. 1979), the scheduling problem at hand can be represented as follows:

P | p-batch, incompatible, r_j, s_j | ND(TWT, EC), (3)

where P indicates identical parallel machines, p-batch, incompatible refers to batch processing machines with incompatible families, r_j to unequal ready times, and s_j to non-identical job sizes. The notation ND(TWT, EC) refers to the set of all Pareto-optimal solutions, i.e., a schedule S is called non-dominated when no other feasible schedule S′ exists with TWT(S′) ≤ TWT(S) and EC(S′) ≤ EC(S), where at least one of the two inequalities is strict. The entire set of all non-dominated solutions for a problem instance is called the Pareto frontier.
Note that scheduling problem (3) is NP-hard since the scheduling problem P|p-batch, incompatible|TWC, a special case of problem (3), is NP-hard due to Uzsoy (1995). Hence, we have to look for efficient heuristics if we want to tackle large-sized problem instances in a reasonable amount of computing time.

MILP formulation
A MILP formulation for scheduling problem (3) is presented in this section. A time-indexed formulation for the scheduling horizon t = 1, …, T is used. Besides the binary assignment and timing variables, the following continuous decision variables belong to the model: C_j, the completion time of job j; T_j, the tardiness of job j; and Cb_i, the completion time of batch i. We are interested in determining all Pareto-optimal solutions with respect to the TWT and the EC measures. This is modeled by the objective (4). The constraint set (5) ensures that each job is assigned to exactly one batch. The maximum batch size is modeled by constraint set (6). The constraints (7) model that only jobs belonging to the same family can be used to form a batch. The family of a batch is determined by its jobs. If at least one job is assigned to a batch, then this batch has to start in some period on some machine. This is modeled by constraint set (8). Constraint set (9) makes sure that the completion time of a batch is not larger than the end of the scheduling horizon. The constraints (10) model the fact that each batch starts at most once before the end of the scheduling horizon and that no overlapping occurs in the processing of batches on a machine. The constraints (11) enforce that the ready times of the jobs are respected, i.e., a batch can only start if all jobs that belong to the batch are ready. The completion time of a batch is calculated by Eq. (12). The constraint sets (13) and (14) ensure that the completion time of a batch and the completion times of the jobs that belong to this batch are the same. The tardiness of a job is linearized by the constraints (15). The domains of the decision variables are respected by constraints (16).
Determining the Pareto front for instances of problem (3) can be based on the (equidistant) ε-constraint method (cf. Ehrgott 2010) if the quantities p_j, w_j, d_j, and the e(t) values are integers. A single objective is optimized within each iteration, while the remaining objectives are transformed into constraints. The application of the ε-constraint method to the MILP (4)–(16) is shown in the Appendix.
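The iteration scheme can be sketched as follows. The single-objective subproblem is solved here by brute force over an explicit list of candidate (TWT, EC) values; in the paper, that role is played by the MILP (4)–(16). All names are illustrative.

```python
# Sketch of the (equidistant) epsilon-constraint method for two minimized
# objectives: repeatedly minimize TWT subject to EC <= eps, then tighten
# eps below the EC value just obtained. Integer objective values guarantee
# that a step of 1 does not skip any Pareto-optimal point.

def epsilon_constraint(candidates, step=1):
    """candidates: list of (twt, ec) values of feasible schedules.
    Returns the Pareto front, ordered by increasing TWT."""
    front = []
    eps = max(ec for _, ec in candidates)      # start with EC unconstrained
    while True:
        feasible = [c for c in candidates if c[1] <= eps]
        if not feasible:
            break
        best = min(feasible)                   # min TWT, ties broken by min EC
        front.append(best)
        eps = best[1] - step                   # tighten the EC bound
    return front
```

Breaking ties by the smaller EC value ensures that only non-dominated points are collected.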

Analysis of a special case
For the sake of completeness, we recall important results for the special case r_j ≡ 0, d_j ≡ 0 from Rocholl et al. (2018). In addition, all jobs have the same size, i.e., without loss of generality we assume s_j ≡ 1 and B ∈ IN. This leads to the scheduling problem

P | p-batch, incompatible | ND(TWC, EC). (17)

Due to Uzsoy (1995), problem (17) is still NP-hard. The following property holds that generalizes a result from Uzsoy (1995) for single regular performance measures.
Property 1 There is a Pareto-optimal schedule for each point of the Pareto front of an instance of problem (17) where all batches except maybe the last scheduled batch of a family contain B jobs.
We refer to Rocholl et al. (2018) for a proof of this property. Note that Property 1 is not necessarily true for problem (3). It follows from Property 1 that a Pareto-optimal schedule exists for each point of the Pareto front where the number of batches in family f is ⌈n_f/B⌉. A second property holds where the structure of batches in Pareto-optimal schedules for instances of problem (17) is considered.

Property 2 For each point of the Pareto front of instances of problem (17) there is a Pareto-optimal schedule where, for each pair of batches 1 and 2 of the same family with completion times Cb_1 ≤ Cb_2, the weight of each job belonging to batch 1 is not smaller than any weight of a job from batch 2.

A proof of this property can again be found in Rocholl et al. (2018). The jobs in each family are sorted with respect to non-increasing values of the job weights. These job sequences are then used to form batches for each family according to Property 1. The total number of batches is known and denoted by b. The following compact MILP formulation is possible for problem (17). Only notation additional to the MILP formulation (4)–(16) is introduced: wb_i, the sum of the weights of the jobs that form batch i.
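Under Properties 1 and 2, batch formation for problem (17) reduces to sorting each family by non-increasing weight and chunking the sorted sequence into groups of B jobs. A minimal sketch (illustrative code, not from the paper):

```python
def form_batches(families, B):
    """families: dict mapping family name -> list of job weights.
    Per Properties 1 and 2: sort each family by non-increasing weight
    and cut it into batches of B jobs, so only the last batch of a
    family may contain fewer than B jobs. Returns a list of
    (family, weights_in_batch, wb_i) tuples, where wb_i is the
    weight sum of batch i used in the compact MILP."""
    batches = []
    for f, weights in families.items():
        w = sorted(weights, reverse=True)
        for i in range(0, len(w), B):
            chunk = w[i:i + B]
            batches.append((f, chunk, sum(chunk)))
    return batches
```

Since the batch contents are fixed in advance, the compact MILP only has to place these batches on machines and in time, which is what removes most of the binary variables.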
The MILP model comprises the bi-criteria objective (18) together with the constraint sets (19)–(21).

Constraint set (19) makes sure that each batch is used only once, whereas constraint set (20) ensures that at most one batch is processed in each period on a single machine. The formulation (18)–(21) is easier to solve than (4)–(16) since the batches are already formed in the former formulation. This leads to fewer binary decision variables. The (equidistant) ε-constraint method can be applied to the MILP (18)–(21) in the same way as shown in the Appendix for the MILP (4)–(16). The ε-constraint method is able to determine all Pareto-optimal solutions for medium-sized problem instances with up to 90 jobs in a short amount of computing time. Therefore, we can assess the quality of the proposed heuristics based on these instances.

Discussion of related work
Next, we discuss the literature with respect to multi-criteria batch scheduling in semiconductor manufacturing and energy-aware scheduling for parallel batching and serial machines, especially for TOU tariffs. There are many papers that deal with single-criterion parallel batch scheduling in semiconductor manufacturing. For related surveys we refer to Mathirajan and Sivakumar (2006) and Mönch et al. (2011).
It turns out that multi-criteria scheduling approaches with parallel batching machines for semiconductor manufacturing are rarely discussed in the literature. We are only aware of Reichelt and Mönch (2006), Mason et al. (2007), and Li et al. (2009). In the first paper, the scheduling problem P|r_j, p-batch, incompatible|ND(C_max, TWT) is solved using an NSGA-II approach that is hybridized with a list scheduling approach, while the second paper proposes an NSGA-II scheme for a similar parallel batch machine scheduling problem for TWT, cycle time variation, and time constraint violation. Here, C_max is the makespan.
The third paper uses ant colony optimization to deal with a similar problem which requires additional consideration of qualification runs and sequence-dependent setup times. Problem (3) is different from these settings since, due to the TOU tariffs, it might be reasonable to consider schedules that are not necessarily non-delay.
Energy-aware scheduling has recently attracted a number of researchers from academia and industry. Recent surveys of this topic are provided by Giret et al. (2015), Merkert et al. (2015), Gahm et al. (2016), Akbar and Irohara (2018), and Gao et al. (2019). They conclude that scheduling with energy-related objectives and constraints is an important research direction.
We start by discussing related work for single and parallel batch processing machines with EC-related objectives. The problem 1|p-batch, p_j ≡ p|ND(C_max, EC) is discussed by Cheng et al. (2014). All jobs have the same processing time, i.e., there is only a single job family. A TOU tariff is assumed. The ε-constraint method is used. Improved MILP formulations for this problem are discussed by Cheng et al. (2016b). Cheng et al. (2017) propose a heuristic variant of the ε-constraint method for the same scheduling problem, but on/off switching of the machines is allowed. A similar single-machine batch scheduling problem is considered by Wang et al. (2016). The energy consumption depends on the selected temperature of the machine. The ε-constraint method is applied for small-sized instances, whereas constructive heuristics are designed to solve large-sized instances in a reasonable amount of computing time. Cheng (2017) and Cheng et al. (2016a) study the 1|p-batch|ND(C_max, EC) problem, where p-batch means that the processing time of a batch is determined by the longest processing time of the jobs that belong to the batch. The ε-constraint method is used to solve this problem. The resulting single-objective problems are solved by considering a series of successive knapsack problems or multiple knapsack problems. Fuzzy logic is used to recommend a preferred solution to decision-makers. The problem 1|p-batch|ND(L_max, NB) is studied by Cabo et al. (2018). Here, L_max and NB are the maximum lateness and the number of formed batches, respectively. Note that a small number of batches leads to a small energy consumption too. The ε-constraint method is used to solve small-sized instances, while a biased random-key genetic algorithm is used to tackle large-sized instances. Since problem (3) contains parallel machines, the scheduling techniques from these single-machine papers cannot be directly applied to the present problem.
The problem P|r_j, p-batch|ND(TWT, CO_2) is studied by Liu (2014). The CO_2 performance measure is considered. Its value is obtained from multiplying the total EC by a constant factor. An NSGA-II approach is used to solve large-sized instances. The problem P|r_j, s_j, p-batch|ND(C_max, EC) is studied by Jia et al. (2017) using a bi-criteria ant colony optimization (ACO) approach. Jia et al. (2019) consider the same problem as in Jia et al. (2017), but the energy consumption of the machines in parallel can be different. Moreover, the ACO approach and the local search (LS) scheme are improved compared to the previous work. Cheng (2017) considers the scheduling problem Q|s_j, p-batch|ND(EC, NEM), where Q refers to uniform parallel machines and NEM is the number of enabled machines. A two-stage heuristic scheme is proposed. Batches are formed in the first stage, whereas the batches are assigned and sequenced on the parallel machines in the second stage. None of these papers except the paper by Liu (2014) considers the TWT performance measure. The scheduling problem discussed by Liu (2014), however, does not have incompatible families and a TOU tariff. Hence, the techniques proposed in these papers cannot be directly applied to problem (3). Problem (17), a special case of problem (3), is studied by Rocholl et al. (2018). A MILP formulation is presented. The ε-constraint method is used in preliminary computational experiments for medium-sized problem instances. No heuristic approaches are considered. In the present paper, we will use the ε-constraint method for the problem studied by Rocholl et al. (2018) to assess the proposed GA-type approaches. The main contributions of this paper are twofold:

1. We consider a bi-criteria energy-aware scheduling problem for parallel batching machines and incompatible job families with a due date-related objective function, namely the TWT performance measure.
It is well known that this class of scheduling problems is strongly NP-hard even for a single machine (Brucker et al. 1998). Although it has applications in wafer fabs, to the best of our knowledge, this problem has not been considered in the literature so far.

2. From a methodological point of view, we design three representations for GA-type algorithms that can be used in more general situations where non-delay schedules are not of interest. It is well known that determining starting times given the sequence of activities is a nontrivial task (cf. Vidal et al. 2015). However, we are only aware of the papers by Goncalves et al. (2008) and Moon et al. (2013) where delay times are systematically incorporated in GAs for scheduling problems.

Basic design ideas
Various metaheuristics have been designed for multi-objective optimization problems. Among them, the NSGA-II approach proposed by Deb et al. (2002) is widely used (Landa Silva and Burke 2002). That is why we use an NSGA-II scheme in this research. More details of the major NSGA-II principles are discussed in Sect. 4.2. Scheduling problem (3) requires the formation of groups of jobs, i.e., a single batch is represented by a group. Grouping problems can be solved efficiently by GGAs (cf. Falkenauer 1998). Successful applications of GGAs to bin packing problems (Falkenauer 1996), single-machine batch scheduling (Sobeyko and Mönch 2011), and single-machine multiple orders per job (MOJ) scheduling problems (Sobeyko and Mönch 2015) are known (cf. Sect. 4.2 for a more detailed discussion of GGAs). Therefore, we use a GGA-based representation within the NSGA-II scheme. We have to design more problem-specific representations to refine the standard GGA representation. Therefore, three representations will be proposed in Sect. 4.3. LS is crucial to improve the solutions determined by NSGA-II-type algorithms (Deb and Goel 2001). Moreover, it is an important ingredient in metaheuristics for batch scheduling problems. This includes memetic algorithms, GAs where each chromosome is improved by LS (Chiang et al. 2010). LS procedures proposed for batch scheduling problems by Sobeyko and Mönch (2011) and Jia et al. (2017) are adapted to problem (3) in Sect. 4.4. Since schedules are not restricted to be non-delay, it is likely that additional non-dominated solutions can be found in the proximity of an already known Pareto-optimal solution. In the case of large Pareto frontiers, due to the resulting huge search space, it may be inefficient or even impossible to fully explore the neighborhoods of all already determined solutions using a metaheuristic approach like NSGA-II. Therefore, an improvement heuristic is proposed in Sect. 4.5 to enlarge the set of non-dominated solutions with adjacent solutions within the same front.

NSGA-II
The NSGA-II approach is designed to ensure diversity by exploiting information of solutions from the entire population. The set of solutions that corresponds to a population is sorted into distinct fronts of different domination levels in each iteration.
The first front contains all solutions which are not dominated by any other solution.
The second front contains those which are only dominated by solutions from the first front, and so on. The fitness value of an individual is determined by the front its solution belongs to. For solutions belonging to the same front, a crowded-comparison operator is used that assigns a higher fitness value to solutions in less crowded regions of the solution space. Binary tournament selection is used, i.e., two individuals of a population are randomly chosen and the one with the higher fitness is selected for crossover (Goldberg 1989; Michalewicz 1996). Offspring are generated by recombination until the population size is doubled. An elitist strategy (cf. Michalewicz 1996) is applied. Individuals are inserted into the new population by non-increasing fitness values, i.e., solutions are accepted starting from the first front until the original population size is reached. From the last front considered, solutions from the least crowded regions of the solution space are preferred. In the present paper, we apply an NSGA-II procedure to deal with the two criteria of problem (3). We will describe the encoding and decoding schemes in the next subsections.
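The sorting into fronts described above can be sketched as follows (illustrative O(n²) code for the bi-objective minimization case, not the authors' implementation):

```python
def non_dominated_sort(points):
    """Sort bi-objective points (both objectives minimized) into fronts
    as in NSGA-II: front 0 contains the non-dominated points, front 1
    the points dominated only by front 0, and so on."""
    def dominates(a, b):
        # a dominates b: no worse in both objectives, strictly better somewhere
        return a[0] <= b[0] and a[1] <= b[1] and a != b

    fronts, remaining = [], list(points)
    while remaining:
        front = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining)]
        fronts.append(front)
        remaining = [p for p in remaining if p not in front]
    return fronts
```

The front index then serves as the primary fitness value, with the crowded-comparison operator breaking ties within a front.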

GGA
GGAs are introduced by Falkenauer (1996, 1998) since conventional encoding schemes for grouping problems often do not work well. This is caused by the fact that a direct encoding of grouping decisions often leads to highly redundant representations. Furthermore, recombination operations usually disrupt the formed groups and require sophisticated repair actions. A GGA is a GA with a representation in which the genetic operations are not applied to the jobs but rather to the formed groups. One gene encodes one or more jobs of the same family and represents a group, i.e., a batch, formed by these jobs. The number of formed groups may vary for different solutions. As the batch formation is a major decision in solving problem (3), we choose a grouping representation as the basis for the encoding and decoding schemes described in the next subsection. Each genome consists of a given set of groups (batches). An example of the grouping representation is provided in Fig. 2. Four batches are formed that belong to three different families.
Recombination of two individuals can be carried out by a two-point crossover (cf. Goldberg 1989; Michalewicz 1996). First, the selected genes from both parents are copied to the offspring, which is a single chromosome due to the NSGA-II principles (see Sect. 4.2.1). Possible duplicate jobs are then deleted from the genes of the second parent. To ensure feasibility, jobs missing in the offspring must be reinserted. It is well known (cf. Brown and Sumichrast 2003) that the chosen reinsertion strategy highly influences the performance of a GGA for a given problem. The reinsertion operation is designed in such a way that a missing job j is inserted into the first batch i of family s(j) for which the r_j value is less than or equal to the ready time of at least one job that already belongs to i and ∑_{k∈i} s_k + s_j ≤ B holds. In contrast to Falkenauer (1998), we do not use any mutation for the GGA-type part of the encoding scheme.
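The reinsertion rule can be sketched as follows; the dictionary-based data shapes are illustrative assumptions, not the paper's data structures:

```python
def reinsert(job, batches, B):
    """Reinsertion after crossover (sketch of the strategy described
    above): insert the missing job into the first batch of its family
    whose residual capacity suffices and that already contains at least
    one job with a ready time >= the job's ready time, so the batch
    start is not delayed further. Returns True on success."""
    for batch in batches:
        if (batch["family"] == job["family"]
                and sum(j["size"] for j in batch["jobs"]) + job["size"] <= B
                and any(job["r"] <= j["r"] for j in batch["jobs"])):
            batch["jobs"].append(job)
            return True
    return False  # no suitable batch found: a new batch must be opened
```

If no existing batch qualifies, a new batch would be created for the job, which keeps the offspring feasible at the cost of an extra group.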

First representation
The LIST representation is based on the idea of scheduling each batch in such a way that its impact on only one of the two objectives is small. Therefore, in addition to the groups, we use a random key λ ∈ [0, 1] in each individual to encode which batches are scheduled with respect to TWT and which ones with respect to EC. If we have b groups in a chromosome (see Fig. 2), then the batches labeled by 1, …, ⌊λb⌋ are scheduled in such a way that the corresponding partial schedule has a small TWT value, while the batches ⌊λb⌋ + 1, …, b are scheduled such that their EC value is small. During crossover, the λ value is inherited from the first parent chromosome. If mutation is applied to an individual, its λ value is randomly reinitialized.
Next, we describe how we decode the batches in the two subsets. The decoder is given by two list scheduling algorithms that assign batches to machines and to time slots. The first algorithm, applied to the batches 1, …, ⌊λb⌋ and abbreviated by LIST-ASAP, can be summarized as follows. Here, LIST refers to list scheduling and ASAP indicates that batches are started as soon as possible.

LIST-ASAP procedure
1. Whenever a machine becomes available, the first batch from the list is assigned to it and started as soon as possible, respecting the ready times of the jobs that belong to the batch.
2. Update the availability time of the machine.
3. Remove the batch from the list of unscheduled batches.
4. Go to Step 1 if the remaining list is non-empty.
The LIST-ASAP procedure produces non-delay schedules with rather small completion times, while the EC value is completely neglected.
The second list scheduling algorithm, applied to the batches ⌊λb⌋ + 1, …, b and abbreviated by LIST-MEC, determines a partial schedule with a small EC value. In the abbreviation, MEC indicates minimum EC values. The procedure can be summarized as follows:

LIST-MEC procedure
1. Consider the first batch from the list.
2. Calculate the EC value of the batch for all feasible starting times, respecting machine availability, the ready times of the jobs that belong to the batch, and its processing time.
3. Schedule the batch on a machine at a starting time that leads to the smallest EC value. Ties are broken by choosing the smallest starting time.
4. Update the machine availability of the chosen machine.
5. Remove the batch from the list of unscheduled batches.
6. Go to Step 1 if the remaining list is non-empty.
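The LIST-MEC steps above can be sketched as follows (illustrative code under simplified assumptions: one price dictionary `e`, batches given as ready time and processing time in whole periods):

```python
def list_mec(batches, m, e, T):
    """Sketch of LIST-MEC: assign each batch, in list order, to the
    machine and start period giving the smallest EC over its occupied
    periods, ties broken by the earliest start. A batch starting at
    boundary s with processing time p occupies periods s+1..s+p.
    Assumes a feasible start exists for every batch."""
    avail = [0] * m                        # machine availability times
    plan = []
    for ready, p in batches:
        best = None                        # (cost, start, machine)
        for k in range(m):
            earliest = max(avail[k], ready)
            for s in range(earliest, T - p + 1):
                cost = sum(e[t] for t in range(s + 1, s + p + 1))
                cand = (cost, s, k)
                if best is None or cand < best:
                    best = cand
        _, s, k = best
        avail[k] = s + p
        plan.append((k, s))
    return plan
```

Tuple comparison on `(cost, start, machine)` implements both the EC criterion and the smallest-start tie-break in one line.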
When electricity tariffs are used which are not increasing over the entire scheduling horizon, the EC measure is not regular, i.e., the objective value does not always increase with larger completion times. As a consequence, the search for Pareto-optimal solutions cannot be limited to the class of non-delay schedules. Although especially LIST-MEC may lead to schedules with idle times between batches, the completion times of batches will be limited to only a subset of all periods. The first batch to be assigned, for instance, will either be started immediately after its ready time or in the least expensive periods with respect to the EC value. Therefore, it is likely that certain parts of the search space are not explored by the LIST representation.

Second representation
This limitation of the LIST representation motivates a second encoding scheme that allows for inserting idle times between the processed batches. This representation is abbreviated by batch delayed (BD) to indicate that the processing of batches can be delayed. Given a fixed scheduling horizon, determining intervals of possible delays for the processing of a batch is non-trivial due to the strong interdependency of the involved scheduling decisions. To obtain a feasible schedule, the inserted idle times have to be smaller than or equal to the total idle time of a machine, which depends on the formed batches and their sequence on the machine. Moreover, the interval of possible delay for a single batch depends not only on the total idle time but also on the delays inserted before all other batches on the machine.
The assignment of batches to machines is encoded by a random value μ_i ∼ DU[1, m] for each batch i. Here, DU[a, b] is a discrete uniform distribution over the set of integers {a, …, b}. The presence and length of idle times before the processing of a batch i is encoded by a random key α_i ∈ [0, 1). Additional random keys α̃_k ∈ [0, 1), k = 1, …, m, are used to represent the amount of idle time that is inserted after the last batch on machine k. Let B_k be the sequence of batches assigned to machine k, and C_max(k) the completion time of the last batch on machine k if the first batch of B_k can start at t = 0. The amount of idle time inserted before a batch i assigned to machine k is then calculated from these random keys and the total available idle time T − C_max(k). On the one hand, as the amount of idle time is relatively encoded, this information is context-sensitive to a large extent. Thus, we expect that crossover and LS operations will have a disruptive effect on the timing information. Therefore, this information cannot be passed along well to later generations. On the other hand, virtually all solutions from the solution space can be encoded, and recombination always yields feasible schedules, provided that the encoded batch formation and assignment allow for one.
We expect that a random choice of the α_i values will only rarely lead to non-delay schedules. But since non-delay schedules are favorable with respect to small completion times, we generate a certain amount of individuals in each generation with α_i = 0, i ∈ {1, …, b} and α̃_k > 0, k = 1, …, m. The crossover operator passes on the genes together with their μ_i and α_i values since each μ_i and α_i value belongs to a group. The set of values α̃_k, k = 1, …, m, is inherited from the first parent chromosome. The mutation operator randomly reinitializes the α_i values for i = 1, …, b.
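One possible decoding of the BD keys for a single machine can be sketched as follows. The proportional distribution of the slack among the keys is an assumption for illustration, not the paper's exact formula; the function and variable names are likewise invented:

```python
def decode_bd(seq, keys, tail_key, T, proc):
    """Hypothetical BD decoding for one machine: distribute the slack
    T - sum of processing times among the batches in sequence `seq`
    proportionally to their random keys (and `tail_key` for the idle
    time after the last batch). Returns the start period of each batch.
    The proportional rule is an illustrative assumption."""
    slack = T - sum(proc[i] for i in seq)
    total = sum(keys[i] for i in seq) + tail_key
    t, starts = 0, {}
    for i in seq:
        t += int(slack * keys[i] / total) if total > 0 else 0  # idle before i
        starts[i] = t
        t += proc[i]
    return starts
```

Because each delay is a share of the common slack, any key vector yields a schedule that fits into the horizon, which matches the feasibility property claimed for the BD encoding.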

Third representation
A third representation, abbreviated by hybrid (HYB), is designed as a hybrid of LIST and BD. A chromosome of HYB contains all the features of the two encoding schemes. Moreover, a random key β ∈ [0, 1] determines how the chromosome is decoded. The decoding scheme of LIST is applied if β < 0.5 holds, otherwise the BD decoding scheme is used. Only the components relevant for that particular encoding scheme are interpreted during decoding. Still, the other components are passed on to offspring because the scheme to be applied might be changed by recombination or mutation. Given the β_1, β_2 values from two chromosomes chosen for recombination, β := (β_1 + β_2)/2 is chosen for the single child chromosome by crossover. The mutation operator randomly reinitializes the β value.

Local search schemes
We propose a two-phase LS approach. In the first phase, the workload of the machines is balanced. The batch sequenced last on the machine with the largest batch completion time is reassigned to the machine with the smallest completion time if the objective values do not increase. This operation is repeated until no further improvement can be achieved. The described workload balancing procedure is also used to repair chromosomes with the BD representation if certain batch assignments lead to infeasible schedules. The LS scheme applied in the second phase consists of the following four moves:

1. Job insert: All jobs in a schedule are considered for being removed from one batch and inserted into another batch of the same family that has enough available capacity and a start time not smaller than the ready time of the job to be inserted. A job is reassigned to another batch if that leads to a reduced total completion time (TC) value. The TC measure is considered here because even if reassigning a job does not reduce the TWT value, it may allow other delayed jobs to be processed earlier. A single pass through all jobs in order of non-decreasing completion times in the original schedule is performed.
2. Job swap: The batch formation can be altered by exchanging two jobs of the same family between two different batches. All jobs are considered for a swap operation, provided that the batches do not start before the ready times of the jobs. A swap is performed if it reduces the TWT value. All jobs are considered in a single pass through the list of jobs ordered by non-decreasing completion times in the original schedule.
3. Batch postpone: If a batch is completed before the smallest due date of the jobs that form it, the batch can be postponed without affecting the TWT value. Postponing a batch might even reduce the EC value if it is moved to a less expensive time slot. Each batch is considered in a single pass from right to left, starting with the last batch on each machine. A batch is postponed as far as possible as long as neither objective function value increases.
4. Batch pull: It might be possible to shift batches to the left to reduce the TWT value without increasing the EC value. Therefore, a single pass from left to right through the list of batches processed on each machine is made. Each batch is shifted to the left as far as possible as long as the EC value does not increase.
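A single batch-postpone move can be sketched as follows. The evaluation callbacks `ec_at` and `twt_at` are assumed helpers that evaluate a candidate start time; they are not part of the paper.

```python
def postpone_batch(start, p, limit, ec_at, twt_at):
    """Shift a batch of processing time p as far to the right as possible
    (its end may not pass `limit`, e.g. the successor's start or the
    horizon) as long as neither the EC nor the TWT value increases
    relative to the current position."""
    for s in range(limit - p, start, -1):   # try the latest start first
        if ec_at(s) <= ec_at(start) and twt_at(s) <= twt_at(start):
            return s
    return start
```

The batch-pull move is symmetric: it scans candidate starts from the earliest feasible position upward and accepts the first one that does not increase the EC value.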
A second LS procedure, applied within the NSGA-II scheme, is composed of batch postpone and pull operations. For a given schedule S, an integrated objective function F(S) := w_TWT · TWT(S) + w_EC · EC(S) is used, where the weights w_f := 1/(f_max − f_min) normalize each objective by its range. Here, f_max and f_min are the maximum and minimum values of objective f over the entire population. The batch postpone and pull moves are applied with respect to the integrated objective function F instead of using the TWT and EC measures separately. We abbreviate this procedure by LS-NSGA-II.
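The integrated objective can be sketched as below; the range-normalizing form of the weights is our reconstruction from the surrounding definitions of f_max and f_min, not a verbatim formula from the paper.

```python
def integrated_objective(twt, ec, pop_twt, pop_ec):
    """F(S) = w_TWT * TWT(S) + w_EC * EC(S) with range-normalizing weights
    w_f = 1 / (f_max - f_min) taken over the current population."""
    def w(values):
        span = max(values) - min(values)
        return 1.0 / span if span > 0 else 0.0  # degenerate population
    return w(pop_twt) * twt + w(pop_ec) * ec
```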

Improvement heuristic
The main principle of this algorithm is to iteratively compute modified schedules by postponing, i.e., right-shifting, batches. Batches are only postponed when an EC reduction is achieved with minimum impact on the TWT value. This is important to obtain new schedules that are likely to be Pareto-optimal. Because of these features, we abbreviate the algorithm by MIN-POSP, where MIN indicates the minimum impact on the TWT measure and POSP refers to the postponement of batches. The batch formation, batch assignment, and sequencing decisions remain unchanged.
Let k[n] be the batch in the n-th position on machine k and C_k[n] the completion time of this batch. In addition, let ℓ_k be the position of the last scheduled batch on machine k. The mapping t(k, n) provides the number of periods batch k[n] can be postponed without interfering with batch k[n + 1], or with the end of the scheduling horizon T if n = ℓ_k. We obtain t(k, n) := (C_k[n+1] − p_k[n+1]) − C_k[n] if n < ℓ_k, where p_k[n+1] is the processing time of batch k[n + 1], and t(k, n) := T − C_k[n] otherwise. The algorithm starts from a given schedule S. The notation Δf := f(S*) − f(S) is introduced for a schedule S* that is obtained from S by modifying it. A modified schedule S* is returned if an EC value reduction is achieved by postponing at least one batch.
Steps 6–8 ensure that only those EC reductions are considered that lead to the smallest TWT increase. The batches of the moves to be implemented in the input schedule S are stored in Step 9. The computational effort of the MIN-POSP procedure is O(b²T).
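The mapping t(k, n) follows directly from its definition; a sketch with per-machine lists (index n is zero-based here, names are illustrative):

```python
def postpone_slack(completion, proc, n, T):
    """t(k, n): number of periods batch k[n] can be postponed without
    touching the start of batch k[n+1], or up to the horizon T for the
    last batch on the machine. `completion[n]` is C_{k[n]} and `proc[n]`
    the processing time of k[n]."""
    if n + 1 < len(completion):
        next_start = completion[n + 1] - proc[n + 1]
        return next_start - completion[n]
    return T - completion[n]
```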
Given an initial non-dominated set, MIN-POSP is applied to each element of this set. The procedure is then applied in an iterative manner to the schedules obtained in the previous iteration. Each schedule obtained in this way is inserted into the enlarged set of all solutions. Dominated solutions must eventually be removed from this set after the application of the MIN-POSP procedure is completed. Therefore, improving a frontier by applying MIN-POSP is time-consuming. However, we expect that it can significantly increase the quality of the Pareto-optimal set obtained by the NSGA-II approach.
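Removing the dominated solutions from the enlarged set can be done with a straightforward filter. This is a quadratic sketch, which is acceptable for the set sizes considered here:

```python
def pareto_filter(points):
    """Keep only non-dominated (TWT, EC) pairs; both objectives are
    minimized, and a point is removed if some other point is at least
    as good in both objectives and different from it."""
    return [p for p in points
            if not any(q != p and q[0] <= p[0] and q[1] <= p[1]
                       for q in points)]
```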

Overall approach
Three approaches based on the NSGA-II metaheuristic are considered. Each approach is based on the GGA encoding scheme described in Sect. 4.2.2. The basic GGA encoding scheme is extended by the encoding of additional information for the decoding process, namely the LIST, BD, and HYB representations. The different encoding and decoding schemes are summarized in Table 1.
The three approaches are called GGA-LIST, GGA-BD, and GGA-HYB, respectively, where the suffix after GGA is inherited from the corresponding representation. They are depicted in Fig. 3.
The LS approach from Sect. 4.4 is applied within the three GAs each time a chromosome is changed by a crossover or mutation operation. Moreover, it is applied to each chromosome of the initial population after the chromosome is randomly generated. The LS-NSGA-II scheme is applied to each individual of a generation of the NSGA-II approach as soon as a population is completed. The MIN-POSP procedure is optionally used to improve the Pareto front obtained by the NSGA-II approach.

Design of experiments
We expect that the quality of the proposed heuristics depends on the number of jobs, the number of families, the maximum batch size, the job sizes, and the ready time and due date settings. Since problem (3) is NP-hard, the ε-constraint method can only be applied to small-sized problem instances. Therefore, in a first experiment, a set of small-sized problem instances is generated according to the design of experiments (DOE) shown in Table 2.
The ready time and due date settings are similar to Mönch et al. (2005). We use the average batch size B̄ = B/s̄ = 2, where s̄ is the average job size. The winter electricity tariff e_W(t) and the summer tariff e_S(t) are defined as piecewise constant functions of the time of day.

Fig. 3 Proposed GGA-type representations
These winter and summer tariffs mimic to some extent the TOU rate plans offered by the Pacific Gas and Electric Company (TOU Rates-Pacific Gas and Electric Company 2019) with respect to the time windows over the course of one day and the relative difference between price levels. Semiconductor manufacturers consider information on their manufacturing processes as highly confidential. Hence, actual values of the energy consumption caused by processing batches in a furnace are not available. Instead of striving for decision support for a particular wafer fab, we provide general guidelines for managers to uncover savings potential. The e_W(t) and e_S(t) functions can easily be adapted for a real wafer fab.
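A piecewise-constant TOU tariff of this kind can be represented as below. The time windows and price levels are purely illustrative placeholders, not the e_W/e_S values of the paper.

```python
def tou_price(hour, windows):
    """Piecewise-constant TOU tariff: `windows` lists
    (start_hour, end_hour, price_per_kwh) tuples covering 0-24."""
    h = hour % 24
    for lo, hi, price in windows:
        if lo <= h < hi:
            return price
    raise ValueError("hour not covered by the tariff definition")

# hypothetical winter-like plan: cheap nights, an evening peak
WINTER_LIKE = [(0, 8, 0.14), (8, 17, 0.18), (17, 21, 0.30), (21, 24, 0.14)]
```

The EC value of a batch then follows by summing the tariff over the periods the batch occupies.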
Overall, we have 80 small-sized instances. These instances can be used to determine the Pareto frontier exactly. This allows us to check whether the three heuristics are correctly implemented. Note that for the small-sized instances all problem data are integer. Hence, the TWT and EC values are also integer, and the ε-constraint method with a step size of 1 provides the entire Pareto front.
In addition, we consider large-sized instances. The corresponding design of experiments is shown in Table 3. Note that the largest instances contain 240 jobs. Such instances can be found in medium-sized wafer fabs (Mönch et al. 2013). The factors are the same as used for the small-sized instances.
We know from Rocholl et al. (2018) that medium-sized instances of problem (17) can be solved to optimality using the ε-constraint method. Therefore, it is possible to compare the Pareto frontiers determined by the three heuristics with the true Pareto frontier. We consider problem instances with F = 2, n_f = 20, B ∈ {2, 4, 8} and F = 3, n_f = 30, B ∈ {4, 8}. The number of machines, the processing times, and the remaining parameters are chosen as in the design for problem (17). The large-sized problem instances from Table 3 are solved in another experiment. A fractional design is considered to assess the solution quality and the required computing time of the MIN-POSP procedure in an additional experiment. Therefore, we consider the design from Table 3 for n_f = 20. This leads to 192 problem instances with up to 120 jobs.
To examine the influence of the LS and LS-NSGA-II schemes described in Sect. 4.4, we conduct additional experiments with 20 instances randomly chosen from the set obtained by the design of experiments summarized in Table 3. Variants of each of the three heuristics are implemented that include none, one, or both of the LS and LS-NSGA-II schemes. This leads to four variants of each heuristic to be compared.
The proposed bi-criteria heuristics determine an approximation of the Pareto frontier. Instead of comparing the heuristics based on objective function values, quality measures have to be derived from the approximation sets (Van Veldhuizen 1999; Zitzler et al. 2003; Coello Coello and Lamont 2004). Let Y_H be the set of solutions obtained by heuristic H and Y_true the corresponding set of Pareto-optimal solutions. The overall non-dominated vector generation (ONVG) measure is given by ONVG(H) := |Y_H|, i.e., it is the number of non-dominated solutions found by H. The overall non-dominated vector generation ratio (ONVGR) measure is calculated as ONVGR(H) := |Y_H| / |Y_true|. Moreover, a distance measure is used that computes the mean distance of the solutions provided by H to the nearest solution of the true Pareto front. Therefore, we consider the normalized distance d(y, ŷ) := max_k |f_k(y) − f_k(ŷ)| / (f_k^max − f_k^min) for y ∈ Y_H and ŷ ∈ Y_true, where f_k^max and f_k^min are the maximum and minimum of the k-th objective function component found among the solutions from Y_H and Y_true. The average distance of a solution from Y_H to the closest solution in Y_true is then given by dist(H) := (1/|Y_H|) Σ_{y ∈ Y_H} min_{ŷ ∈ Y_true} d(y, ŷ). The set coverage (CS) indicator determines the percentage of individuals in one set dominated by the individuals of the other set, i.e., we have CS(Y_A, Y_B) := |{y ∈ Y_B | ∃ x ∈ Y_A : x dominates y}| / |Y_B|. Moreover, the hypervolume (HV) indicator proposed by Zitzler and Thiele (1998) is used. It is a measure for the volume of the objective space dominated by a solution set. The HV indicator value for a solution set Y is given by

HV(Y) := (TWT_max − TWT(y^(1)))(EC_max − EC(y^(1))) + Σ_{i=1}^{|Y|−1} (TWT(y^(i)) − TWT(y^(i+1)))(EC_max − EC(y^(i+1))).

Here, y^(i) refers to the i-th solution in Y in descending order of TWT, and TWT_max and EC_max denote the maximum values found for the TWT and EC values, respectively; the point (TWT_max, EC_max) is used as reference point. The relative HV index, abbreviated by HVR and defined by HVR(Y_H, Y_true) := HV(Y_H) / HV(Y_true), is applied.
We are also interested in looking at the EC savings potential in relation to the TWT impairment. Therefore, for the Pareto front of an instance I we define the two measures ΔTWT(I) := (TWT_max(I) − TWT_min(I)) / TWT_min(I) and ΔEC(I) := (EC_max(I) − EC_min(I)) / EC_max(I), where the min and max values are taken over the front. The pair (ΔTWT(I), ΔEC(I)) indicates the increase of the TWT value relative to the smallest possible TWT value and the EC reduction relative to the largest occurring EC value.
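For the bi-objective minimization case, the HV indicator reduces to a sum of rectangles and can be computed with the following sketch; the tuple representation (twt, ec) is an assumption:

```python
def hypervolume_2d(front, twt_max, ec_max):
    """HV of a non-dominated (TWT, EC) set w.r.t. the reference point
    (twt_max, ec_max); solutions are processed in descending TWT order,
    and each slab between neighbors is covered by the lower-EC point."""
    pts = sorted(front, key=lambda y: -y[0])
    hv = (twt_max - pts[0][0]) * (ec_max - pts[0][1])
    for (t_i, _), (t_next, e_next) in zip(pts, pts[1:]):
        hv += (t_i - t_next) * (ec_max - e_next)
    return hv
```

Dividing by the HV of the true front gives the HVR index.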
When the ε-constraint method can be applied, Y_true is the set of Pareto-optimal solutions. For problem instances too large to be solved exactly in a reasonable amount of computing time, the set Y_true is formed by all known solutions for the instance from which dominated solutions have been removed. Since the NSGA-II approach has stochastic elements, all heuristics are run five times with different seeds for each instance. The solution set is then formed from these five replications, with dominated solutions removed.
Instead of using a prescribed number of generations for the different GAs, we allow a maximum computing time of 180 s for the instances of Table 3, while the remaining problem instances are limited to 60 s.

Parameter settings and implementation issues
Preliminary computational experiments with a limited set of problem instances are conducted to find appropriate parameter values for the different heuristics. We randomly choose 20 problem instances from the set presented in Table 3. The population size is taken from {150, 300, 450}, while mutation probabilities p_M ∈ {0.01, 0.05, 0.1} are considered. For the sake of simplicity, the overall average of the HVR values of the solution sets found by the three heuristics is used to determine the parameter settings. As a result, we use a population size of 300 and p_M = 0.01 in the experiments. We do not use a specific crossover probability in our NSGA-II approach. The share of non-delay individuals generated in each generation of GGA-BD is set to 10% of the individuals of a generation.
The C++ programming language is used to code all algorithms. The CPLEX 12.8 libraries are used to implement the ε-constraint method. The NSGA-II approach is implemented using the MOMHLib++ framework by Jaszkiewicz (2019). All computational experiments are performed on an Intel Core i7-2600 CPU 3.40 GHz PC with 16 GB RAM.

Overview
We start by discussing computational results for small-sized instances of problem (3). This allows us to compare the heuristics with the true Pareto frontier obtained by the ε-constraint method. In a next step, results for the experiments with the larger instances of problem (17) are presented. Again, the true Pareto frontier is obtained by the ε-constraint method. Since the ε-constraint method does not work for the large-sized instances of problem (3), we then present results for large-sized instances where the set Y_true is formed by all known solutions as described above. We provide results that allow us to study the impact of the improvement heuristic and the two LS schemes. Finally, we discuss some managerial implications of the computational results.

Results for small-sized problem instances
Instead of comparing all instances individually, the instances are grouped according to the factor level values in all tables of this subsection. Results for the winter tariff, n_f = 20, and F = 3 imply, for instance, that all other factors have been varied, while the tariff, the number of jobs per family, and the number of families have been held constant at winter, 20, and 3, respectively. Table 4 shows the ONVGR and dist values for small-sized instances. The corresponding CS and HVR measure values are stated in Table 5. Best results are always marked in bold in the rest of the paper. Figure 4 exemplifies sets of non-dominated solutions obtained by the NSGA-II approach with the different encoding schemes, namely GGA-LIST, GGA-BD, and GGA-HYB, and by the ε-constraint method, abbreviated by OPT, for a small-sized instance of problem (3).
The computational results from Tables 4 and 5 demonstrate that the NSGA-II approach with the GGA-BD representation is able to find near-to-optimal solutions for small-sized instances. The GGA-BD representation outperforms the GGA-LIST representation for all performance measures under almost all experimental conditions. The largest difference can be observed for the ONVGR measure. This indicates that the GGA-BD representation is able to encode a much larger portion of the solution space than the remaining representations. The average CS measure value reveals that, on average, around 68% of the solutions found with the GGA-BD representation are Pareto-optimal. The GGA-BD representation again leads to the best average HVR value. However, for two sets of factor combinations, namely both tariffs with F = 3 and the ready time and due date factors both set to 0.25, the GGA-LIST representation shows an almost equal or even better HVR value despite providing a much smaller set of non-dominated solutions. Apparently, in some situations it can be reasonable to search for a limited set of high-quality solutions. The GGA-HYB representation shows average values close to those of GGA-LIST for both the dist and the CS measures. Compared to the GGA-BD representation, fewer solutions are found, but the average HVR value is still close to the one found by GGA-BD. The non-dominated sets exemplified in Fig. 4 support these observations.

Results for problem (17)
Results of the computational experiments for problem (17) are presented in Tables 6 and 7. They largely confirm, for problem (17), the findings obtained for the small-sized instances of problem (3). The GGA-BD representation outperforms the remaining representations with respect to the ONVGR and dist measures. Applying the GGA-LIST representation can provide a large fraction of Pareto-optimal solutions under some experimental conditions, as can be seen from the average CS value.

Results for large-sized problem instances
Computational results for the large-sized instances from Table 3 are shown in  Tables 8 and 9. The interested reader is referred to the electronic supplement for detailed computational results for all factor combinations.
No clear superiority of one representation over the others can be found; the advantage of the GGA-BD representation is not confirmed here. In particular, a decline of the ONVGR values with a larger total number of jobs n = F·n_f can be observed. This indicates that the criteria space is too large to be searched efficiently by the GGA-BD approach. Considering the dist and HVR measures, the GGA-HYB representation outperforms the remaining representations. We observe that the GGA-LIST representation leads to higher-quality schedules for all performance measures when applied to instances with six job families compared to instances with only three families. In contrast, the GGA-BD representation can produce better solutions if the jobs are distributed over three families only. The number of families does not influence the performance of the GGA-HYB representation, which is to be expected as it combines features of the two remaining representations.

Impact of the improvement heuristic and the local search schemes
The impact of applying the MIN-POSP procedure from Sect. 4.4 on the number of solutions in the corresponding Pareto frontier is shown in Fig. 5 as the average ONVG value over all problem instances described in Sect. 5.1. The MIN-POSP procedure is called improvement heuristic in this figure.
On the one hand, the procedure is able to find a large number of additional non-dominated schedules for all assessed representations. On the other hand, the computing time can become very long. It depends on the number of solutions in the input set as well as on the time horizon and the number of batches of the corresponding schedules. The additional computational burden per instance is between less than one second and one hour, with an average value of 87 s. From the experiments conducted, it appears that the additional time requirement can be limited to around 2 min if the MIN-POSP procedure is applied to instances with not more than 60 batches. The number of non-dominated solutions increases when either one of the proposed LS schemes is applied. The largest number of non-dominated solutions is found when both LS schemes are used. The ONVGR measure improves most for the GGA-BD representation, whereas the GGA-HYB representation does not profit as much with respect to this measure. However, Fig. 7 shows a positive impact of LS on the distance measure for all three heuristics. Obviously, the application of both LS schemes can move solutions closer to the true Pareto frontier. Table 10 shows the EC savings in relation to the observed TWT increase. Note that we use only Pareto frontiers determined by GGA-HYB since the differences between the heuristics are fairly small.

Managerial insights from the computational experiments
The following managerial insights can be derived from the conducted experiments and Table 10:

1. Schedules with a small EC value tend to place batches in periods with small e(t) values, i.e., in off-peak periods. Such schedules contribute to a better balance of electricity demand and supply.
2. In a highly loaded wafer fab, the room for EC reduction is smaller than in a wafer fab with moderate load. Small ready time and due date factor values mimic a high load. The highest TWT impairment can be observed for tight due dates, represented by a due date factor of 0.25, and widely spread ready times, given by a ready time factor of 0.50. In this situation, forming full batches, i.e., reaching small EC values, is often only possible if some jobs are delayed for a long time. This leads to large TWT values.
3. The smallest EC improvement potential can be observed for a small number of jobs within each family. In this situation, it is not very likely that most of the batches are fully loaded. Hence, the number of batches is high and the EC values are large. This insight might be taken into account when product mix decisions are made by management.
4. A larger processing flexibility, represented by a larger number of parallel machines, increases the potential for EC reductions.
5. The EC improvement potential is slightly smaller for the winter tariff, and this improvement is reached at the expense of a larger TWT impairment in comparison to the summer tariff. Under the summer tariff, the EC improvement is smaller, but the TWT impairment is smaller as well. This is caused by the larger number of scheduling options offered by the summer tariff, which is also reflected by a larger average number of non-dominated solutions in the summer tariff case. Therefore, TOU tariffs with more segments seem to be beneficial.

Conclusions and future research
In this paper, we discussed a bi-criteria scheduling problem for identical parallel batch processing machines. The TWT and EC performance measures were used. A MILP formulation was proposed. In addition, we provided a second, much simpler MILP for a special case for which structural properties of Pareto-optimal solutions were derived. Both MILP models were solved using the ε-constraint method. To tackle large-sized problem instances within a reasonable amount of computing time, we proposed three GGAs that are embedded into an NSGA-II-type framework. They differ in their representations and decoding schemes. The performance of the GA variants was assessed using randomly generated problem instances. Moreover, the ε-constraint method was used to verify the correct implementation of the heuristic approaches and to check the performance of the GA-type heuristics for medium-sized instances of the special case with the TWC performance measure instead of the TWT measure. Overall, the computational experiments demonstrate that some of the proposed heuristics perform very well. Senior operators in wafer fabs can apply the proposed algorithms to choose furnace schedules that strike a compromise between tardiness and sustainability goals. However, large EC reductions in highly loaded wafer fabs are only possible at the expense of large TWT values. A larger number of incompatible job families and of parallel machines is preferred from an EC reduction point of view. This paper contributes to sustainable manufacturing in two ways. First, it allows for computing schedules with small EC values, which may result in smaller CO2 emissions. Second, schedules with small EC values tend to have many batches in off-peak periods, i.e., they balance the demand and supply of electricity. As a result, less backup infrastructure (which often generates electricity in a dirtier way) is required. Again, reduced CO2 emissions are likely to result.
There are several directions for future research. First, although the ε-constraint method used in this paper is able to compute the exact Pareto front for the integer-valued problem instances considered, it is interesting to look at recent efficient implementations of this method, for example the one provided by Mavrotas and Florios (2013). Second, the problem setting can be generalized by considering unrelated parallel machines instead of the identical ones of the present paper. Moreover, the energy consumption can be made family-dependent. We also believe it is desirable to model standby times and sequence-dependent warm-up and cool-down processes since they lead to different amounts of energy consumption. In addition to TOU tariffs, it is interesting to explore other types of real-time energy pricing in the context of the scheduling problem at hand. As a third direction, we are interested in designing sampling algorithms for the present problem to obtain robust schedules since the ready times are typically uncertain in a real-world wafer fab. Integrating the proposed scheduling technique into a global scheduling approach for an entire wafer fab, as proposed for instance by Mönch et al. (2007), is also desirable.
The first MILP solve uses the settings E_TWT = 1, E_EC = 0 and the bound ε_EC = M, where M is determined using the piecewise constant EC function. The solution is a schedule S with objective function value TWT(S) in which the EC value is restricted to M. The MILP is then solved a second time with the settings E_TWT = 0, E_EC = 1, and ε_TWT = TWT(S). The result is a Pareto-optimal schedule. The next iteration starts from E_TWT = 0, E_EC = 1, ε_EC = 0, and ε_TWT = TWT(S) − 1. The setting ε_TWT = TWT(S) − 1 is reasonable since the TWT and EC values are integers due to the integer parameters (see Sect. 5.1). This procedure is repeated until the MILP becomes infeasible for the bounds ε_EC and ε_TWT.
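The iteration can be sketched in its textbook form below. The function `solve` stands in for the MILP solver (here replaced by a toy oracle over an explicit candidate list), and the exact bound updates of the appendix may differ in detail from this standard variant.

```python
def epsilon_constraint_front(solve, eps_ec_start, step=1):
    """Epsilon-constraint sweep for integer-valued (TWT, EC) minimization:
    minimize TWT under an EC bound, make the point Pareto-optimal by a
    lexicographic EC solve, then tighten the EC bound by `step`.
    `solve(objective, eps_twt, eps_ec)` is an assumed MILP oracle that
    returns a (twt, ec) pair or None when the model is infeasible."""
    front, eps_ec = [], eps_ec_start
    while True:
        s = solve("TWT", None, eps_ec)        # min TWT s.t. EC <= eps_ec
        if s is None:
            return front
        t, e = solve("EC", s[0], eps_ec)      # min EC  s.t. TWT <= TWT(s)
        front.append((t, e))
        eps_ec = e - step                     # integrality allows steps of 1

def make_toy_oracle(candidates):
    """Stand-in for the MILP: optimize over an explicit candidate list."""
    def solve(objective, eps_twt, eps_ec):
        feas = [c for c in candidates
                if (eps_twt is None or c[0] <= eps_twt) and c[1] <= eps_ec]
        if not feas:
            return None
        return min(feas, key=(lambda c: c[0]) if objective == "TWT"
                   else (lambda c: c[1]))
    return solve
```

Because both objectives are integer, tightening the bound by one unit per iteration is guaranteed to enumerate the entire Pareto front.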