A new mixed-integer programming formulation for the maximally diverse grouping problem with attribute values

The paper presents a new mixed-integer programming formulation for the maximally diverse grouping problem (MDGP) with attribute values. The MDGP is the problem of assigning items to groups such that all groups are as heterogeneous as possible. In the version with attribute values, the heterogeneity of groups is measured by the sum of pairwise absolute differences of the attribute values of the assigned items, i.e. by the Manhattan metric. The advantage of the version with attribute values is that the objective function can be reformulated such that it is linear instead of quadratic like in the standard MDGP formulation. We evaluate the new model formulation for the MDGP with attribute values in comparison with two different MDGP formulations from the literature. Our model formulation leads to substantially improved computation times and solves instances of realistic sizes (for example the assignment of students to seminars) with up to 70 items and three attributes, 50 items and five attributes, and 30 items and ten attributes to (near) optimality within half an hour.


Introduction
The maximally diverse grouping problem (MDGP) is the problem of assigning a set of items, $i, j \in I$, to groups, $g \in G$, such that each group is assigned the same or a similar number of items and the sum of pairwise distances $d_{ij}$ between all items assigned to the same group is maximized. Thus, for equal-sized groups the heterogeneity inside groups is maximized, which also leads to similar groups.
Although the MDGP is in its general version formulated with arbitrary distances $d_{ij}$, common applications like the assignment of students to seminars use attribute values $av^a_i$ for all students and attributes. Examples are grades or other performance scores or zero-one attributes (e.g. for international or non-international). Given these attribute values, the distances are computed as $d^a_{ij} = |av^a_i - av^a_j|$, i.e. by the Manhattan metric, for all attributes $a \in A$ and items/students $i, j \in I$.

Arne Schulz, arne.schulz@uni-hamburg.de, Institute of Operations Management, Universität Hamburg, Moorweidenstraße 18, 20148 Hamburg, Germany
In this paper, we consider the MDGP where $d^a_{ij} = |av^a_i - av^a_j|$ for attribute values $av^a_i \in \mathbb{Q}_{\ge 0}$ and use results of Schulz (2021b) to introduce a new mixed-integer programming formulation for the problem setting. We prove that our formulation leads to the same optimal objective values as the standard formulation. Our computational results show that this formulation clearly outperforms the standard formulation and the model formulation by Papenberg and Klau (2021) for the MDGP. Moreover, we outline how the approach by Schulz (2021b) for the balanced MDGP, i.e. the problem of finding a best balanced solution among all optimal MDGP solutions, can be adapted if there is no MDGP solution fulfilling Assumption 1 (compare Sect. 4) for every attribute or if the groups are not equal-sized.
The paper is organised as follows: In Sect. 2, the relevant literature is reviewed. Afterwards, we give a formal problem description in Sect. 3. Our mixed-integer programming (MIP) formulation is introduced in Sect. 4. The MIP is compared with the standard formulation and the model by Papenberg and Klau (2021) in a computational study in Sect. 5. The paper closes with a conclusion (Sect. 6).

Literature review
The MDGP has been investigated in different settings in the literature. Weitz and Lakshminarayanan (1997) showed that it is mathematically equivalent to VLSI design (group highly connected modules onto the same circuit) and exam scheduling (assigning exam blocks to days). In a further paper, Weitz and Lakshminarayanan (1998) mentioned the assignment of students to project groups as a possible application.
Assignments of students are a common application of grouping problems. In an early work, Beheshtian-Ardekani and Mahmood (1986) assigned students to project groups. In more recent works, students were assigned to study groups (Krass and Ovchinnikov, 2010), work groups (Caserta and Voß, 2013), and multiple teaching groups (on the basis of preferences; Heitmann and Brüggemann, 2014). Johnes (2015) published a review on operations research in education and considered, amongst others, the assignment of students to courses. Dias and Borges (2017) applied the MDGP to assign students to teams. Students are mostly assigned to groups according to their academic performance (overall average grade or grades in specific courses). Furthermore, students can be distributed to groups according to their gender to reach an equal distribution of male, female, and third gender students. Moreover, international students can be distributed equally over all groups. All these measures can be implemented as attribute values (e.g. attribute value 0 for male, 1 for female, and 2 for third gender, or 0 for non-international and 1 for international). Mingers and O'Brien (1995), Krass and Ovchinnikov (2006), Krass and Ovchinnikov (2010), and Caserta and Voß (2013) assign students to groups according to binary attributes. Baker and Benn (2001) investigated in a case study how pupils in a school should be assigned to tutor groups such that the groups are as similar as possible. Criteria are the gender, the ability level, ethnic minority groups, feeder schools, and special educational needs of pupils. All of them can also be represented by attribute values. Rubin and Bai (2015) considered the problem of assigning individuals to teams to make the teams as similar as possible. Schulz (2021b) extended the MDGP (in the version with attribute values) by such a balancing component.
Homogeneity over different days and therefore heterogeneity inside days is also helpful to distribute workload evenly over days. This is, for example, important in surgery scheduling to avoid overtime (surgery durations can be represented by attribute values). Overtime minimization is a frequently investigated objective in surgery scheduling (compare the review by Cardoen et al. (2010)). Schulz (2021c) assigned surgeries to days such that the days are balanced according to the surgery durations. Schulz and Fliedner (2021) analyzed the intra-day assignment of surgeries to starting times and rooms according to several balancing criteria. In psychology, Papenberg and Klau (2021) used the fact that the MDGP aims at homogeneity over groups and heterogeneity inside groups to partition data sets into equivalent parts.
The MDGP is NP-hard (Feo and Khellaf, 1990), which may be the reason why only a few exact solution approaches for the MDGP have been investigated. Gallego et al. (2013) presented a computational study with the standard formulations of both problem variants presented in Sect. 3.1 ((1)-(4) and (1)-(2), (4)-(5), respectively), where only instances with up to 12 items could be solved to optimality within 1800s (general $d_{ij}$). Papenberg and Klau (2021) introduced an exact MDGP formulation for the setting with equal-sized groups (compare Sect. 3.2) based on a work by Grötschel and Wakabayashi (1989). They could solve instances with 28 items in 950s and instances with 30 items in nearly 10000s to optimality. Schulz (2021b) investigated the MDGP with attribute values. The author proved that, for at most two attributes, the set of optimal solutions for the MDGP with attribute values equals the set of feasible solutions of a special system of equations (for more than two attributes this does not hold in general). He searched for the best balanced solution amongst all optimal solutions of the MDGP with attribute values for the case that there is a solution fulfilling Assumption 1 (compare Sect. 4) for every attribute. We explain the ideas of this paper further in Sect. 4, where we work with them. Schulz (2021a) generalised this research for the case in which no MDGP solution fulfilling Assumption 1 for every attribute exists. That paper also considered only equal-sized groups. Schulz (2021a) solved instances with up to 15 items and 5 attributes to optimality within 600s (version with equal-sized groups; (1)-(4)).
In contrast to exact approaches, the MDGP has been solved by a variety of heuristic solution approaches. Fan et al. (2011) applied a hybrid genetic algorithm to it. An artificial bee colony algorithm has been investigated for the MDGP by Rodriguez et al. (2013). Gallego et al. (2013) developed a tabu search algorithm with strategic oscillation. Tabu search in an iterated version was considered by Palubeckis et al. (2015). Moreover, Brimberg et al. (2015) applied a skewed general variable neighbourhood search algorithm to solve the MDGP, Lai and Hao (2016) iterated maxima search, and Singh and Sundar (2019) a hybrid genetic algorithm. Lai et al. (2020) implemented a neighbourhood decomposition based variable neighbourhood search and a tabu search algorithm to solve the MDGP. A recent review on metaheuristics applied to solve grouping problems can be found in Ramos-Figueroa et al. (2020). Brimberg et al. (2017) solved the clique partitioning problem as an MDGP. A similar class of problems are dispersion problems. In contrast to the MDGP, only a single group of a given size is selected. Dispersion problems are considered for example in Fernández et al. (2013), Aringhieri et al. (2015), and Amirgaliyeva et al. (2017).

Problem description
In this paper, we consider a set of items $i, j \in I$, a set of groups $g \in G$, and a set of attributes $a \in A$ as given. Each item has an attribute value $av^a_i \in \mathbb{Q}_{\ge 0}$ for each attribute $a \in A$. Given these values, we compute the distances between each pair of items for each attribute according to the Manhattan metric as $d^a_{ij} = |av^a_i - av^a_j|$. Note that it is no restriction to consider non-negative attribute values. As $d^a_{ij}$ measures only differences between them, we can add a constant $c^a = \max_{i \in I: 0 > av^a_i} \{|av^a_i|\}$ to all attribute values of attribute $a$ without changing the $d^a_{ij}$ values since

$$|(av^a_i + c^a) - (av^a_j + c^a)| = |av^a_i - av^a_j| = d^a_{ij}.$$

In the following subsections, we present the standard formulation of the MDGP (Sect. 3.1) and the formulation by Papenberg and Klau (2021) (Sect. 3.2).
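The distance computation and the shift argument can be illustrated with a minimal sketch. The function names and the sample values are hypothetical and serve only as an illustration; they are not taken from the paper.

```python
def manhattan_distances(values):
    """values[a][i] is the attribute value of item i for attribute a.

    Returns d with d[a][i][j] = |values[a][i] - values[a][j]|.
    """
    return [[[abs(vi - vj) for vj in row] for vi in row] for row in values]


def shift_nonnegative(values):
    """Add, per attribute, the largest absolute negative value to every entry."""
    shifted = []
    for row in values:
        c = max((abs(v) for v in row if v < 0), default=0)
        shifted.append([v + c for v in row])
    return shifted


# Hypothetical data: two attributes, three items, one attribute with negative values.
values = [[-2, 0, 3], [1, 4, 4]]
shifted = shift_nonnegative(values)

print(shifted)
print(manhattan_distances(values) == manhattan_distances(shifted))
```

Integer inputs are used so that the comparison of the two distance tables is exact.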

Standard formulation
The standard integer programming formulation for the MDGP with equal-sized groups is (compare e.g. Gallego et al. (2013), Singh and Sundar (2019)):

$$\max \sum_{a \in A} \sum_{g \in G} \sum_{i \in I} \sum_{j \in I: i < j} d^a_{ij}\, x_{ig}\, x_{jg} \tag{1}$$
$$\sum_{g \in G} x_{ig} = 1 \quad \forall i \in I \tag{2}$$
$$\sum_{i \in I} x_{ig} = \frac{|I|}{|G|} \quad \forall g \in G \tag{3}$$
$$x_{ig} \in \{0, 1\} \quad \forall i \in I, g \in G \tag{4}$$

$x_{ig}$ is a binary variable which equals one if item $i$ is assigned to group $g$ and zero otherwise. Objective function (1) maximizes the pairwise differences between each pair of items assigned to the same group according to all attributes. Constraints (2) ensure that each item is assigned to exactly one group while Constraints (3) ensure that all groups are equal-sized. Constraints (4) are the binary constraints for the $x$ variables. Several authors (Fan et al., 2011; Gallego et al., 2013; Singh and Sundar, 2019; Lai et al., 2020) relax Constraints (3) to

$$l_g \le \sum_{i \in I} x_{ig} \le u_g \quad \forall g \in G \tag{5}$$

where $l_g$ is a lower bound and $u_g$ an upper bound for the number of items in group $g$. In this paper, we first consider the more restricted case with (3) and relax it afterwards to (5).

Objective function (1) is quadratic. Thus, we have to linearize it to use an off-the-shelf solver for mixed-integer programming. Therefore, we introduce a new set of variables $z_{ijg}$, $i, j \in I$ with $i < j$ and $g \in G$, such that the variable $z_{ijg}$ is one if items $i$ and $j$ are assigned to group $g$ and zero otherwise. Then, we replace (1) by

$$\max \sum_{a \in A} \sum_{g \in G} \sum_{i \in I} \sum_{j \in I: i < j} d^a_{ij}\, z_{ijg} \tag{6}$$

and add the constraints

$$z_{ijg} \le x_{ig} \quad \forall i, j \in I: i < j, g \in G \tag{7}$$
$$z_{ijg} \le x_{jg} \quad \forall i, j \in I: i < j, g \in G \tag{8}$$
$$z_{ijg} \ge x_{ig} + x_{jg} - 1 \quad \forall i, j \in I: i < j, g \in G \tag{9}$$
$$0 \le z_{ijg} \le 1 \quad \forall i, j \in I: i < j, g \in G \tag{10}$$

Objective function (6) replaces the product of variables in (1) by the new variable $z_{ijg}$. This variable has to be one if items $i$ and $j$ are assigned to group $g$ and zero otherwise. Since $0 \le z_{ijg} \le 1$ (10), Constraints (7) and (8) set $z_{ijg} = 0$ if item $i$ or item $j$ is not assigned to group $g$. Constraints (9) set $z_{ijg} = 1$ if items $i$ and $j$ are both assigned to group $g$.

Model (1)-(4) contains symmetric solutions. Given a solution, swapping the group assignments of all items assigned to two different groups $g_1$ and $g_2$ with $\sum_{i \in I} x_{ig_1} = \sum_{i \in I} x_{ig_2}$ leads to a different solution which is structurally identical because the two groups have the same size. In the case with equal-sized groups, this symmetry results in $|G|!$ structurally identical solutions. The following set of inequalities avoids symmetric solutions by sorting homogeneous groups in increasing order of the smallest index of their assigned items (Salem and Kieffer (2020)): $x_{11} = 1$ and

[Inequalities (11) are missing in the source.]

In objective function (1) as well as in (6), $d^a_{ij}$ is only counted if both items $i$ and $j$ are assigned to the same group. This means that at the moment when an item $i$ is assigned to a group (for example in a branch-and-bound procedure), we are not (fully) aware of the consequences. If $i$ is the first item assigned to the group, this does not even have an immediate influence on the objective value. If $i$ is neither the first nor the last item assigned to the group, we know that certain $d^a_{ij}$ values are realized, but it is still possible that we have to add a large $d^a_{ij}$ value later if the corresponding item $j$ is added to the same group. This might lead to unprofitable decisions at early stages in the branch-and-bound tree. Thus, although objective function (6) is linearized, its quadratic character still influences the search process. We reduce this drawback by a reformulation of the model in Sect. 4.
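The effect of the linearization can be checked by brute force. The following sketch assumes the usual linearization inequalities $z \le x_i$, $z \le x_j$, $z \ge x_i + x_j - 1$, and $0 \le z \le 1$ for a single pair of items and a single group. It then lists, on a coarse grid, the values of $z$ that remain feasible for each binary choice of $x_i$ and $x_j$. The helper names are hypothetical.

```python
from itertools import product


def feasible_z(xi, xj, z):
    """Check the assumed linearization inequalities for one pair and one group."""
    return z <= xi and z <= xj and z >= xi + xj - 1 and 0 <= z <= 1


def z_range(xi, xj, steps=10):
    """All grid values of z in [0, 1] that satisfy the inequalities."""
    candidates = [s / steps for s in range(steps + 1)]
    return [z for z in candidates if feasible_z(xi, xj, z)]


for xi, xj in product((0, 1), repeat=2):
    # For binary x, the only feasible z is the product xi * xj.
    print(xi, xj, z_range(xi, xj))
```

Under these assumptions a continuous $z$ is forced to the product of the two binary variables, which is why $z$ does not need to be declared binary.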

Formulation by Papenberg and Klau (2021)
The model formulation by Papenberg and Klau (2021) is based on a work by Grötschel and Wakabayashi (1989). We adapt it here for the multi-attribute case. It uses binary variables $x_{ij}$, $i, j \in I$, $i < j$, which are one if items $i$ and $j$ are assigned to the same group and zero otherwise. The model formulation is:

$$\max \sum_{a \in A} \sum_{i \in I} \sum_{j \in I: i < j} d^a_{ij}\, x_{ij} \tag{12}$$
$$x_{ij} + x_{ik} - x_{jk} \le 1 \quad \forall i, j, k \in I: i < j < k \tag{13}$$
$$x_{ij} - x_{ik} + x_{jk} \le 1 \quad \forall i, j, k \in I: i < j < k \tag{14}$$
$$-x_{ij} + x_{ik} + x_{jk} \le 1 \quad \forall i, j, k \in I: i < j < k \tag{15}$$
$$\sum_{j \in I: i < j} x_{ij} + \sum_{j \in I: j < i} x_{ji} = \frac{|I|}{|G|} - 1 \quad \forall i \in I \tag{16}$$
$$x_{ij} \in \{0, 1\} \quad \forall i, j \in I: i < j \tag{17}$$

Constraints (13)-(15) are transitivity constraints which ensure that two items are in the same group if both of them are in the same group with a third item. Constraints (16) form equal-sized groups with $|I|/|G|$ items each. Given an item $i$, it has to be in the same group with $|I|/|G| - 1$ further items to ensure that the group size is $|I|/|G|$. Finally, Constraints (17) are the binary constraints. Note that this model formulation does not contain symmetric solutions, as variables do not have a group index. Note further that $|I| \cdot |G| < \frac{|I|(|I|-1)}{2}$ is equivalent to $2|G| < |I| - 1$. This means that the model formulation by Papenberg and Klau (2021) has more binary variables than the standard formulation if at least three items are assigned to the same group. Moreover, the model is restricted to equal-sized groups. Varying group sizes could be implemented by bounding the left-hand side of (16) from below and above, but then all groups have the same lower and upper bound. If lower and upper bounds for group sizes vary, we would need a set of variables which indicate the assignment of items to groups.
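The comparison of the numbers of binary variables can be made concrete with a short sketch. The function name is hypothetical and the instance sizes are example values only.

```python
def binary_variable_counts(n_items, n_groups):
    """Return (#x_ig variables of the standard model, #x_ij variables of the pairwise model)."""
    standard = n_items * n_groups             # one variable per item and group
    pairwise = n_items * (n_items - 1) // 2   # one variable per unordered pair of items
    return standard, pairwise


for n_items, n_groups in [(10, 5), (30, 5), (30, 10)]:
    standard, pairwise = binary_variable_counts(n_items, n_groups)
    # The pairwise model is larger exactly when 2|G| < |I| - 1.
    print(n_items, n_groups, standard, pairwise, 2 * n_groups < n_items - 1)
```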

Equal-sized groups
As mentioned above, a disadvantage of the problem formulation (1)-(4) is that the share of an item $i$ in the objective function, for example expressed by $\frac{1}{2} \sum_{a \in A} \sum_{g \in G} \sum_{j \in I \setminus \{i\}} d^a_{ij}\, x_{ig}\, x_{jg}$, depends on the decision which of the other items are assigned to the same group.
To overcome this drawback, Schulz (2021b) introduced an assignment of the items to blocks $k \in K$ with $|K| = |I|/|G|$ ($|K|$ is the number of items per group) for each attribute according to Assumption 1.

Assumption 1 Let $|G|$ be the number of groups. Then, the $|G|$ items with the largest attribute values according to the considered attribute are assigned to the first block, the $|G|$ items with the next largest attribute values according to the considered attribute are assigned to the second block, and so on.
Moreover, a binary parameter $b^a_{ki}$ is introduced which is 1 if item $i$ is assigned to block $k$ according to attribute $a$ and 0 otherwise. Schulz (2021b) (Theorem 1 in that paper) proved that the set of optimal solutions for (1)-(4) equals the set of feasible solutions for (2), (4), and

$$\sum_{i \in I} b^a_{ki}\, x_{ig} = 1 \quad \forall a \in A, g \in G, k \in K \tag{18}$$

if blocks are determined according to Assumption 1 and one of the following two criteria is fulfilled:
1. $|A| = 1$,
2. $|A| > 1$ and the assignment according to Assumption 1 is unique (no two items with identical attribute values are assigned to different blocks regarding the corresponding attribute).
Schulz (2021b) also proved that the set of feasible solutions for (2), (4), and (18) might be empty if $|A| > 2$ (Theorem 3 in that paper).
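With the help of the block notation (parameter $b^a_{ki}$), the optimal objective value for (1)-(4) can be calculated, if the set of feasible solutions for (2), (4), and (18) is not empty (a proof can be found in Schulz (2021b)), as

$$\sum_{a \in A} \sum_{g \in G} \sum_{i \in I} \sum_{j \in I: i < j} d^a_{ij}\, x_{ig}\, x_{jg} = \sum_{a \in A} \sum_{i \in I} \sum_{k \in K} b^a_{ki}\, c_k\, av^a_i \tag{19}$$

with

$$c_k = \begin{cases} \tilde{c}_k & \text{if } k \le |K|/2 \\ -\tilde{c}_k & \text{if } k > |K|/2 \end{cases} \tag{20}$$

where

$$\tilde{c}_k = \begin{cases} |K| - 2k + 1 & \text{if } k \le |K|/2 \\ 2k - |K| - 1 & \text{if } k > |K|/2 \end{cases} \tag{21}$$

if $|K|$ is even and

$$\tilde{c}_k = \begin{cases} |K| - 2k + 1 & \text{if } k \le (|K|+1)/2 \\ 2k - |K| - 1 & \text{if } k > (|K|+1)/2 \end{cases} \tag{22}$$

if $|K|$ is odd. Figure 1 illustrates $\tilde{c}_k$ (left side of the figure) and $c_k$ (right side of the figure) values for small numbers of blocks. The first row states the number of blocks. The numbers below are the $\tilde{c}_k$ and $c_k$ values, respectively, in increasing order of the block number. If $|K| = 2$, (21) leads to $\tilde{c}_1 = \tilde{c}_2 = 1$. If $|K| = 3$, (22) leads to $\tilde{c}_1 = 2$, $\tilde{c}_2 = 0$, and $\tilde{c}_3 = 2$. It can easily be seen that the $\tilde{c}_k$ values are symmetric. The same is true for the $c_k$ values but with a different algebraic sign.

[Equation (23) is missing in the source.]

In (23), it is optimal to multiply the largest $c_k$ value, i.e. $c_1$, with the largest $d^a_{ij}$ value, i.e. the difference of the largest and the smallest attribute value assigned to the group. This means that we assign the item with the largest attribute value to the first block and the item with the smallest attribute value to the last block. If we repeat this with all remaining items until all items are assigned, we get the assignment according to Assumption 1.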
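The identity behind the block coefficients can be verified numerically for a single group and a single attribute. The sketch below assumes the closed form $c_k = |K| + 1 - 2k$, which reproduces the values quoted in the text ($1, -1$ for two blocks and $2, 0, -2$ for three blocks). The closed form, the function names, and the sample values are assumptions made for illustration.

```python
from itertools import combinations


def pairwise_sum(values):
    """Sum of pairwise absolute differences within one group."""
    return sum(abs(u - v) for u, v in combinations(values, 2))


def block_coefficients(n_blocks):
    """Assumed closed form c_k = |K| + 1 - 2k for k = 1, ..., |K|."""
    return [n_blocks + 1 - 2 * k for k in range(1, n_blocks + 1)]


def block_sum(values):
    """Weight the values, sorted in decreasing order, with the block coefficients."""
    ordered = sorted(values, reverse=True)
    coefficients = block_coefficients(len(ordered))
    return sum(c * v for c, v in zip(coefficients, ordered))


group = [7, 1, 4, 4, 9]  # hypothetical attribute values of the items in one group
print(pairwise_sum(group), block_sum(group))
```

Both functions return the same number for any list of values, which is what makes each item's contribution depend only on its own block.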
Note that the right side of (19) is independent of $x_{ig}$. Because of (2), item $i$ is assigned to exactly one group such that

$$\sum_{a \in A} \sum_{i \in I} \sum_{g \in G} \sum_{k \in K} b^a_{ki}\, c_k\, av^a_i\, x_{ig} = \sum_{a \in A} \sum_{i \in I} \sum_{k \in K} b^a_{ki}\, c_k\, av^a_i.$$

In (19), the objective value is independent of the assignment of items to groups, which underlines that the set of optimal solutions for (1)-(4) equals the set of feasible solutions for (2), (4), and (18). However, we are only sure that (19) is valid if $|A| = 1$ or $|A| = 2$. For $|A| > 2$, in contrast, we are not sure that (19) is valid (Schulz (2021b), Theorems 2 and 3; there might be no feasible solution for (2), (4), and (18)). Compared with the left side, the right side of (19) has the advantage that the share of item $i$ in the objective value, i.e. $\sum_{a \in A} \sum_{k \in K} b^a_{ki}\, c_k\, av^a_i$, is independent of the assignment of all other items. The idea of this paper is to replace $b^a_{ki}$ by a variable, i.e. to make the assignment of items to blocks an endogenous decision. By this, we overcome the drawback that we do not know whether there is a solution fulfilling (19) if $|A| > 2$.
So, we replace $b^a_{ki}$ (in combination with $x_{ig}$) by a new variable $y^a_{kig}$ which is 1 if item $i$ is assigned to group $g$ and, according to attribute $a$, to block $k$, and 0 otherwise. This makes the assignment of items to blocks an endogenous decision, which is only allowed if the assignment is feasible with regard to (2) and (4). This does not only ensure feasibility; the resulting model formulation is, moreover, equivalent to the standard MDGP formulation ((1)-(4); compare Theorem 1). The model is

$$\max \sum_{a \in A} \sum_{i \in I} \sum_{g \in G} \sum_{k \in K} c_k\, av^a_i\, y^a_{kig} \tag{24}$$

with the constraints (2), (4), and

$$\sum_{i \in I} y^a_{kig} = 1 \quad \forall a \in A, g \in G, k \in K \tag{25}$$
$$\sum_{k \in K} y^a_{kig} = x_{ig} \quad \forall a \in A, i \in I, g \in G \tag{26}$$
$$y^a_{kig} \ge 0 \quad \forall a \in A, k \in K, i \in I, g \in G \tag{27}$$

Objective function (24) adds up the shares of all items for all attributes. Although the items are evaluated independently from each other (no $d^a_{ij}$ values), they are independent only at first glance because the assignment of items to blocks depends on the other items assigned to the same group. Constraints (25) ensure that, for each attribute, each block of each group is assigned exactly one item, according to which the share in the objective value is determined. Moreover, each item is assigned to exactly one block for each attribute (Constraints (26)). Therefore, $|K| = |I|/|G|$ items are assigned to each group. Constraints (26) in combination with Constraints (27) also set the range of the $y^a_{kig}$ variables. Their value can be between zero and one if item $i$ is assigned to group $g$. Otherwise they must be zero. The proof of Theorem 1 shows that there is an optimal solution in which all $y$ variables are zero or one.

Theorem 1 Let $|K| = |I|/|G|$. Then, the model formulation (2), (4), and (24)-(27) is equivalent to the formulation (1)-(4) in the sense that both model formulations lead to the same objective value for any feasible assignment of the $x_{ig}$ variables.
Proof Let any feasible solution of (1)-(4) be given. Due to (3), exactly $|K| = |I|/|G|$ items are assigned to each group. In the following, we decompose the instance into one instance per combination of attributes and groups, i.e. given $a \in A$ and $g \in G$. This reduced instance consists of all items assigned to the fixed group $g$, i.e. the items with $x_{ig} = 1$ in the solution of (1)-(4). Since we consider only a single attribute, we know from Schulz (2021b) that (19) holds with $b^a_{ki}$ according to Assumption 1. Set $y^a_{kig} = b^a_{ki}$ with the fixed $a \in A$ and $g \in G$.
As $b^a_{ki}$ is binary, $0 \le y^a_{kig}$ for all $i \in I$ with $x_{ig} = 1$ and $k \in K$ (compare (27)). Since the number of blocks equals the number of items in our reduced instance, exactly one item is assigned to each block. Thus, $\sum_{i \in I: x_{ig} = 1} y^a_{kig} = 1$ for all $k \in K$ (compare (25)) and $\sum_{k \in K} y^a_{kig} = x_{ig} = 1$ for all $i \in I$ with $x_{ig} = 1$ holds (compare (26)). We can repeat the procedure for every pair $(a, g)$ with $a \in A$ and $g \in G$. Together,

$$\sum_{a \in A} \sum_{g \in G} \sum_{i \in I} \sum_{j \in I: i < j} d^a_{ij}\, x_{ig}\, x_{jg} = \sum_{a \in A} \sum_{i \in I} \sum_{g \in G} \sum_{k \in K} c_k\, av^a_i\, y^a_{kig}.$$

Conversely, summing (25) over all $k \in K$ and (26) over all $i \in I$ gives

$$|K| = \sum_{k \in K} \sum_{i \in I} y^a_{kig} = \sum_{i \in I} x_{ig} \;\Rightarrow\; \sum_{i \in I} x_{ig} = |K| = |I|/|G|$$

for all $g \in G$ in every feasible solution of (2), (4), and (24)-(27). Thus, every assignment of $x_{ig}$ variables which is feasible for (2), (4), and (24)-(27) is also feasible for (1)-(4). Together the theorem follows.
The idea of the proof is that it is still optimal to assign the items assigned to a group in decreasing order of their attribute values to the blocks (compare (23) and the following explanations). This means that, given fixed $x_{ig}$ variables, (1)-(4) and (2), (4), and (24)-(27) are solved optimally if we assign for each attribute and each group $g$ the items assigned to that group, i.e. with $x_{ig} = 1$, in decreasing order of their attribute values to the blocks (set $y^a_{kig}$ accordingly). If (2), (4), and (18) has a feasible solution, $y^a_{kig} = b^a_{ki} \cdot x_{ig}$ holds in an optimal solution for all $i \in I$, $a \in A$, $k \in K$, and $g \in G$ ($b^a_{ki}$ according to Assumption 1). If not, a pair of items $i$ and $j$ with $b^a_{ki} = b^a_{kj} = 1$ for an $a \in A$ and a $k \in K$ exists which are assigned to the same group ($x_{ig} = x_{jg} = 1$ for a $g \in G$). Thus, (18) is not fulfilled for all $a \in A$, $g \in G$, and $k \in K$. In other words, our model (2), (4), and (24)-(27) decides in the assignment of the $y$ variables which block constraints (18) should be violated, if necessary, such that (1) is maximized. Thereby, we avoid the drawback of the block constraints that there might be no feasible solution but still benefit from the formulation on the right side of (19).
Let us consider an example. Let $|A| = 3$, $|I| = 4$, and $|G| = 2$. Let the attribute values be as in Table 1. If we assign the items according to Assumption 1 to blocks, items 1 and 2 are assigned to the first block according to the first attribute, i.e. $b^1_{11} = b^1_{12} = 1$ and $b^1_{23} = b^1_{24} = 1$. For the second attribute we get $b^2_{11} = b^2_{13} = 1$ and $b^2_{22} = b^2_{24} = 1$. For the third attribute items 1 and 4 are in the first block, i.e. $b^3_{11} = b^3_{14} = 1$ and $b^3_{22} = b^3_{23} = 1$. This means that item 1 shares a block with item 2 for the first attribute, with item 3 for the second attribute, and with item 4 for the third attribute. Thus, there is no feasible solution fulfilling (2), (4), and (18).
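There are three possibilities to assign four items to two groups with two items each: items 1 and 2 together (and items 3 and 4 together), items 1 and 3 together (and items 2 and 4 together), and items 1 and 4 together (and items 2 and 3 together). Their objective values are given in (28), where the right side equals the sum of pairwise differences over all attributes of the two items assigned to the group ($\sum_{a=1}^{3} d^a_{ij}$). The third solution has the largest objective value such that the model sets $x_{11} = x_{41} = x_{22} = x_{32} = 1$ (up to symmetry), which fulfills (2) and (4).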
As we assign two items to each group, $c_1 = 1$ and $c_2 = -1$ (compare (20) and (21)). Thus, (24) is maximized if we set the $y^a_{kig}$ variables such that the item with the larger attribute value of the group is in block one and the item with the smaller attribute value of the group is in block two. Hence, the model sets $y^1_{111} = y^2_{111} = y^3_{111} = y^1_{122} = y^2_{222} = y^3_{122} = y^1_{232} = y^2_{132} = y^3_{232} = y^1_{241} = y^2_{241} = y^3_{241} = 1$ and all remaining $y^a_{kig}$ variables to zero. Then, (25)-(27) are fulfilled and (24) equals the sum in (28). If we compare $b^a_{ki}$ and $y^a_{kig}$, $y^a_{kig} \ne b^a_{ki} \cdot x_{ig}$ for items 2 and 4 and attribute 3. While item 4 is assigned to block 1 according to Assumption 1, the model assigns item 2 to block 1 (this could also be item 3). Thus, the model selects the block constraints (18) which should be violated if necessary (here both block constraints of attribute 3).
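A small enumeration makes the example tangible. Since Table 1 is not reproduced here, the sketch below uses hypothetical attribute values chosen so that the block structure matches the description (item 1 shares the first block with item 2, 3, and 4 for attributes 1, 2, and 3, respectively). It evaluates the quadratic objective and the block-based objective, with items sorted in decreasing order within each group, for all three partitions. The coefficient form $c_k = |K| + 1 - 2k$ is an assumption consistent with $c_1 = 1$, $c_2 = -1$.

```python
from itertools import combinations

# Hypothetical attribute values (NOT the paper's Table 1): av[item] = (attr 1, attr 2, attr 3)
av = {1: (5, 5, 5), 2: (4, 1, 2), 3: (2, 4, 1), 4: (1, 2, 3)}
n_attr = 3


def quadratic_objective(groups):
    """Sum of pairwise Manhattan distances inside each group."""
    return sum(abs(av[i][a] - av[j][a])
               for group in groups
               for i, j in combinations(group, 2)
               for a in range(n_attr))


def block_objective(groups):
    """Per group and attribute: sort values decreasingly, weight block k with size + 1 - 2k."""
    total = 0
    for group in groups:
        size = len(group)
        for a in range(n_attr):
            ordered = sorted((av[i][a] for i in group), reverse=True)
            total += sum((size + 1 - 2 * k) * v for k, v in enumerate(ordered, start=1))
    return total


partitions = [((1, 2), (3, 4)), ((1, 3), (2, 4)), ((1, 4), (2, 3))]
results = {p: (quadratic_objective(p), block_objective(p)) for p in partitions}
best = max(partitions, key=quadratic_objective)

for p, (quad, block) in results.items():
    print(p, quad, block)
print("best:", best)
```

With these values the two objectives agree for every partition, even though no partition satisfies the block constraints for all three attributes.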

Varying group sizes
In this subsection, we adapt the approach of the previous subsection to investigate the relaxation of (3) to (5) and obtain our reformulation of (1)-(2) and (4)-(5).
We replace $c_k$ and $\tilde{c}_k$ in the following by $c_{k\bar{k}}$ and $\tilde{c}_{k\bar{k}}$, respectively, because individual groups may contain a different number of items. This means that $\tilde{c}_{k\bar{k}} = \tilde{c}_k$ for $|K| = \bar{k}$ in (21) and (22), respectively, and all $k \in K$. Correspondingly, $c_{k\bar{k}} = c_k$ for $|K| = \bar{k}$ in (20) and all $k \in K$. Note that $\bar{k} \le \max_g u_g$.
Moreover, we introduce the binary variable $w_{g\bar{k}}$ which is one if group $g$ has $\bar{k}$ blocks, i.e. $g$ has $\bar{k}$ assigned items. Variable $y^a_{kig}$ is replaced by variable $\bar{y}^a_{kig\bar{k}}$ which is defined continuously between zero and one but is one if item $i$ is assigned to block $k$ in group $g$ and group $g$ has $\bar{k}$ assigned items. Both variables, $w$ and $\bar{y}$, are for all $g \in G$ only defined for $l_g \le \bar{k} \le u_g$. Moreover, we fix $\bar{y}^a_{kig\bar{k}} = 0$ for $k > \bar{k}$. By this, we get the following model formulation:

$$\max \sum_{a \in A} \sum_{i \in I} \sum_{g \in G} \sum_{l_g \le \bar{k} \le u_g} \sum_{k=1}^{\bar{k}} c_{k\bar{k}}\, av^a_i\, \bar{y}^a_{kig\bar{k}} \tag{29}$$

subject to (2), (4),

$$\sum_{i \in I} \sum_{l_g \le \bar{k} \le u_g} \bar{y}^a_{kig\bar{k}} \le 1 \quad \forall a \in A, g \in G, k \in K: k \le u_g \tag{30}$$

[Constraints (31)-(35) are missing in the source.]

The model sets the $x_{ig}$ variables to assign items to groups. Thereby, it ensures that at least $l_g$ and at most $u_g$ items are assigned to group $g \in G$ (33). Because of (33)-(35), $w_{g\bar{k}}$ indicates the number of items in group $g$. Given $w_{g\bar{k}}$, (32) fixes $\bar{y}^a_{kig\bar{k}}$ for all but one $\bar{k}$ (dependent on $g$) to zero. So, the sum $\sum_{l_g \le \bar{k} \le u_g} (\cdot)$ includes only one non-zero addend in (29)-(31) such that they set $\bar{y}^a_{kig\bar{k}}$ in line with (24)-(26) and the argumentation in the proof of Theorem 1. (Note that we fix $\bar{y}^a_{kig\bar{k}} = 0$ for $k > \bar{k}$. Thus, (30) is an equality for $k \le \bar{k}$ and the left side of (30) is zero for $k > \bar{k}$.) Hence, the largest attribute values within each group are multiplied with the largest $c_{k\bar{k}}$ values in (29). By this, we are able to prove the following theorem, which is an analogue of Theorem 1.

Theorem 2 The model formulation (2), (4), and (29)-(35) is equivalent to the formulation (1)-(2) and (4)-(5) in the sense that both model formulations lead to the same objective value for any feasible assignment of the $x_{ig}$ variables.
Proof Let any feasible solution of (1)-(2) and (4)-(5) be given. In the following, we decompose the instance into one instance per combination of attributes and groups, i.e. given $a \in A$ and $g \in G$. This reduced instance consists of all items assigned to the fixed group $g$, i.e. the items with $x_{ig} = 1$ in the solution of (1)-(2) and (4)-(5). Since the $x_{ig}$ variables are fixed, we can set $w_{g\bar{k}} = 1$ for $\bar{k} = \sum_{i \in I} x_{ig}$ and zero for all other values of $\bar{k}$. Thus, (33)-(35) are fulfilled. Furthermore, (32) fixes $\bar{y}^a_{kig\bar{k}}$ to zero for all but one $\bar{k}$ and all $a \in A$, $i \in I$, $g \in G$, and $k \in K$. Thus, $\bar{k}$ is fixed in (29)-(31). Given the fixed $\bar{k}$, it follows by the same argumentation as in the proof of Theorem 1 that both model formulations, (1)-(2) and (4)-(5) as well as (2), (4), and (29)-(35), have the same objective value for the given solution.
The model formulation (2), (4), and (29)-(35) is clearly a generalization of the setting with equal-sized groups ((2), (4), and (24)-(27)). However, we need a further set of binary variables $w_{g\bar{k}}$ for the number of items within each group. There is, however, a further generalization of (2), (4), and (24)-(27) which is a special case of (2), (4), and (29)-(35) and in which we do not need the $w_{g\bar{k}}$ variables. If we set $l_g = n_g = u_g$, i.e. fix the number of items assigned to each group (not necessarily with $n_g = n_{g'}$ for all $g, g' \in G$), the sum $\sum_{l_g \le \bar{k} \le u_g} (\cdot)$ contains only one addend such that $w_{g,n_g} = 1$ due to (34). So $w_{g\bar{k}}$ is already determined such that the model can be reformulated to omit the $w_{g\bar{k}}$ variables. Together with $w_{g\bar{k}}$, $\bar{k}$ is known for each $g$. Thus, $\bar{k}$ can be omitted in the definition of $\bar{y}^a_{kig\bar{k}}$ (compare (32)).
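The role of the size-dependent coefficients can be sketched for a single attribute and groups of different sizes. The sketch below assumes the closed form $c_{k\bar{k}} = \bar{k} + 1 - 2k$ (consistent with the values for two and three blocks given earlier) and uses hypothetical data. The group size takes the place of the one $\bar{k}$ with $w_{g\bar{k}} = 1$.

```python
from itertools import combinations


def coefficient_table(lower, upper):
    """Assumed coefficients c[(k, kbar)] = kbar + 1 - 2k for kbar = lower..upper, k = 1..kbar."""
    return {(k, kbar): kbar + 1 - 2 * k
            for kbar in range(lower, upper + 1)
            for k in range(1, kbar + 1)}


def group_value(values, table):
    """Block-based value of one group; kbar is the number of items in the group."""
    kbar = len(values)
    ordered = sorted(values, reverse=True)
    return sum(table[(k, kbar)] * v for k, v in enumerate(ordered, start=1))


table = coefficient_table(2, 4)
groups = [[3, 8], [1, 6, 2, 9]]  # hypothetical single-attribute values, group sizes 2 and 4

block_total = sum(group_value(g, table) for g in groups)
pair_total = sum(abs(u - v) for g in groups for u, v in combinations(g, 2))
print(block_total, pair_total)
```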

Computational study
This section describes our computational study, which is divided into two parts. In the first part, we consider equal-sized groups and compare the model formulations (1)-(4) (standard), (2), (4), and (24)-(27) (blocks), and the formulation by Papenberg and Klau (2021) (Sect. 3.2). In the second part, we consider the case with varying group sizes and the most general formulations of standard ((1)-(2) and (4)-(5)) and blocks ((2), (4), and (29)-(35)). The models were implemented in GAMS (version 32.0) and solved by CPLEX (version 12.10).
The standard model was implemented in the linearized version, i.e. we used the formulations (2)-(4) and (6)-(10) and (2) and (4)-(10), respectively. The symmetry breaking constraints (11) were added to the standard model as well as to the blocks model when equal-sized groups are considered. The computational study was executed on a single AMD EPYC 7302 core with 2.99GHz. Section 5.1 describes the composition, Sect. 5.2 the results for the case with equal-sized groups, and Sect. 5.3 the results for the case with varying group sizes.

Composition
We tested the models with instances with 10, 20, 30, 40, 50, 60, 70, and 80 items. All of them were distributed into 2 and 5 groups. Starting with 20 items, they were also distributed into 10 groups. 60, 70, and 80 items were further distributed into 15 groups and 80 items into 20 groups. For the case with equal-sized groups we used only those settings where the number of items divided by the number of groups is an integer. All settings were tested in 30 runs each for 3, 5, and 10 attributes. Attribute values were drawn from a uniform distribution on [0,1]. Considering [0,1] values is no restriction, as we are only interested in their absolute differences and any other interval can be normalized to [0,1] without changing the relation of two attribute values.

Before we determined the range for the group sizes ($l_g$ and $u_g$) for the second part of the computational study, we distributed the items randomly to the groups according to the following procedure to ensure that there is a feasible assignment: First, we drew $|G| - 1$ uniform integers between 1 and $|I| - |G|$. They were sorted in increasing order and we computed the differences between 0 and the first, the first and the second, and so on and added 1 to all of them. Moreover, we computed the difference between $|I|$ and the sum of these numbers. Thereby, we ensured that each difference is positive. In total, this leads to $|G|$ numbers which sum up to $|I|$. Finally, we set $n_1$ equal to the first of them, $n_2$ equal to the second, and so on. Afterwards, we determined $l_g$ and $u_g$. We set $l_g = \max(1, n_g - U[0, n_g \cdot 0.05])$ and $u_g = n_g + U[0, n_g \cdot 0.05]$, i.e. the group size interval is determined uniformly and its width is bounded by about 10% of $n_g$. We interrupted the search after 30 minutes (1800 seconds) if no solution had been proven optimal by then, as was done by Gallego et al. (2013).
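The group-size procedure can be sketched as follows. The function names are hypothetical, and the rounding of the 5% spread to an integer (here: rounding up) is an assumption, since the text does not state how the uniform draw is made integral.

```python
import math
import random


def random_group_sizes(n_items, n_groups, rng):
    """Draw |G| positive group sizes that sum to |I|, following the described procedure."""
    cuts = sorted(rng.randint(1, n_items - n_groups) for _ in range(n_groups - 1))
    # Differences between consecutive cut points (starting from 0), each increased by 1.
    sizes = [b - a + 1 for a, b in zip([0] + cuts[:-1], cuts)]
    # The last group receives the remaining items, which is always at least one.
    sizes.append(n_items - sum(sizes))
    return sizes


def size_bounds(n_g, rng):
    """Lower and upper bound around n_g; the integer rounding is an assumption."""
    spread = math.ceil(0.05 * n_g)
    lower = max(1, n_g - rng.randint(0, spread))
    upper = n_g + rng.randint(0, spread)
    return lower, upper


rng = random.Random(0)
sizes = random_group_sizes(50, 5, rng)
bounds = [size_bounds(n_g, rng) for n_g in sizes]
print(sizes, bounds)
```

Because every bound interval contains the drawn size $n_g$, the drawn sizes themselves form a feasible assignment of group sizes.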

Results (equal-sized groups)
Tables 2-4 present the results with equal-sized groups split according to the number of attributes. The tables are structured as follows: The first two columns indicate the parameter setting. The next six columns show the results for the blocks model (average objective value, average computation time in seconds, average gap in percent, average objective value of the relaxed model, number of feasible solutions found, and number of optimal solutions found). The next six columns present the same key figures for the standard model, the last six for the model by Papenberg and Klau. Note that average objective values contain only instances where the corresponding model found a feasible solution. Average gaps contain only instances where the corresponding model found a feasible but no proven optimal solution.

Table 2 presents the results for three attributes. The model formulations blocks and standard found a feasible solution for all instances. The formulation by Papenberg and Klau also found a feasible solution for all settings except the one with 70 items and 5 groups, for which no feasible solution was found. At a closer look, finding a feasible solution is easy in this problem setting because any assignment with exactly $|I|/|G|$ items in each group is feasible. The reason why no feasible solution was found for the model formulation by Papenberg and Klau seems to be that CPLEX used the whole 1800s in the root node such that no solution was detected (also for five and ten attributes).
The blocks model terminated with a proven optimal solution within 1800s for almost all instances with up to 60 items, and for larger instances it also often terminated with a proven optimal solution before the time limit was reached. The two other formulations, however, found optimal solutions only for small instance sizes of up to 20 (standard) and 30 items (Papenberg and Klau), respectively. Both models reached comparable results regarding the number of optimally solved instances.
Considering the solution gap for the instances for which we found feasible solutions but could not prove optimality, the model formulation by Papenberg and Klau clearly outperformed the standard formulation. For the blocks model the gap was almost zero for all instances which could not be solved to proven optimality. The required computation times confirm the results for the three model formulations.
In total, we can conclude that for instances with three attributes the blocks model leads to the best results in comparison with the standard model and the formulation by Papenberg and Klau. If we solve the blocks model but relax (4), i.e. with $0 \le x_{ig} \le 1$ for all $i \in I$ and $g \in G$ instead of (4), we find a reason for this. Column 6 shows the average objective values for the relaxed model of blocks, column 12 the corresponding average objective values for the standard model, and column 18 those for the formulation by Papenberg and Klau (relaxing (17)). Relaxing blocks leads to substantially better upper bounds for the objective value than relaxing standard. Relaxing the formulation by Papenberg and Klau leads to slightly worse upper bounds than the blocks formulation. However, the difference between the average optimal objective value and the average optimal objective value of the relaxed model is still large. Thus, we can assume that we have to fix a large number of $x_{ig}$ variables to zero or one before we get tight upper bounds by relaxing the remaining binary constraints in a branch-and-bound procedure. This may also explain why larger instances could not be solved reliably to optimality.
Tables 3 and 4 show the results for five and ten attributes, respectively. For the standard model the results are similar for all three numbers of attributes. The performance of the blocks model decreases: the larger the number of attributes, the lower the number of instances for which we proved optimality within 1800s. However, the gap is still comparatively small at under 5% on average. In contrast, the formulation by Papenberg and Klau leads to better results regarding the number of proven optimal solutions found and the gap as the number of attributes grows. Interestingly, the model by Papenberg and Klau requires less computation time if the number of attributes increases. For both of the other models computation times increase if the number of attributes increases. Because of this, the model by Papenberg and Klau is faster than blocks for some settings with up to 30 items, especially if the number of attributes increases.

Results (varying group sizes)
Tables 5-7 present the results regarding the setting with varying group sizes. Analogous to Sect. 5.2, the tables are split according to the number of attributes and are structured in the same way. As the formulation by Papenberg and Klau works only for equal-sized groups, we consider only the blocks and the standard formulation in this subsection.
Table 5 presents the results for three attributes. Both model formulations found a feasible solution for almost all instances, but the blocks model found clearly more proven optimal solutions. For up to 50 items all instances could even be solved to optimality, while standard could only solve instances with two groups or 10 items to optimality. This also results in smaller computation times for the blocks model in comparison with the standard model. Moreover, the blocks formulation reached small gaps after 1800s for up to 70 items and gaps of up to 8.1% on average for instances with up to 80 items.
Table 6 examines the performance of the models for five attributes. For five attributes blocks still managed to find feasible solutions for almost all instances but had trouble proving optimality for larger numbers of groups (at least 5) in combination with larger numbers of items (at least 40). From 60 items on, the model further struggled with instances with two groups. Nevertheless, there is again a transition where the model found fewer optimal solutions but reached small gaps (40 or 50 items and 5 or 10 groups, and 60, 70 or 80 items and 2 or 5 groups).
The trend is confirmed for ten attributes (Table 7). Although blocks still found a feasible solution in at least 75% of the instances in each setting (23 out of 30 for 80 items and 2 groups), it always found a proven optimal solution only for small instances with up to 20 items. Accordingly, the gaps increased for larger instances. However, there is again a transition where the model did not manage to find proven optimal solutions for all instances but reached small gaps (25 items and 10 groups, 30 or 40 items and 5 groups, and 50 items and 2 groups). The development of the solution quality for standard is similar to that in the setting with equal-sized groups, although the model is able to prove some solutions to be optimal for larger instances with two groups. A reason could be that $l_g$ and $u_g$, and therefore the group sizes, are determined randomly such that some instances might be particularly easy to solve if one of the groups has only a small number of items assigned while the other one has a large number of assigned items. In comparison with the setting with equal-sized groups, the blocks model had more difficulties with the setting with varying group sizes, which results in fewer instances that could be solved to proven optimality within 1800s and larger gaps for those that could not.
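One way to make the remark about unbalanced two-group instances concrete is to count the assignments for fixed group sizes. The sketch below is an illustration only. The helper name `count_assignments` is hypothetical, the group sizes are treated as fixed rather than ranging over $[l_g, u_g]$, and a smaller number of assignments does not by itself prove that a solver will be faster.

```python
from math import comb


def count_assignments(sizes):
    """Number of ways to assign sum(sizes) distinct items to labelled groups
    with exactly the given sizes (a multinomial coefficient)."""
    remaining = sum(sizes)
    total = 1
    for size in sizes:
        # Choose which of the remaining items go into the current group.
        total *= comb(remaining, size)
        remaining -= size
    return total


if __name__ == "__main__":
    # 40 items in two groups: balanced split versus a very unbalanced split.
    print("balanced   (20, 20):", count_assignments([20, 20]))
    print("unbalanced ( 2, 38):", count_assignments([2, 38]))
```

For 40 items the unbalanced split (2, 38) admits only 780 assignments, many orders of magnitude fewer than the balanced split, which is consistent with the conjecture that such instances can be easier to solve.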

Conclusion
We introduced a new mixed-integer programming formulation for the MDGP with attribute values. As common applications like the assignment of students to groups (grades, gender, international or not) or surgery scheduling (surgery duration) use attribute values, this is an interesting special case of the MDGP. Nevertheless, using the Manhattan metric is a limitation of our study. Our MIP clearly outperforms comparable approaches for instances with attribute values. Moreover, it is able to solve instances of realistic size, for example in the assignment of students to seminars (e.g. 40 students assigned to two seminar groups, or 20 students within a seminar assigned to working groups with two or four students each), in reasonable time to (near) optimality. As our model formulation is able to solve larger instances to near optimality, the model can help to evaluate heuristic approaches for the general MDGP by testing them on the version with attribute values, even for a larger number of attributes.
This paper shows that it is worthwhile to take a closer look at the data structure of the MDGP. Thus, a direction for future research may also be to investigate how instances with binary or integer attributes or other special data structures and their combinations over several attributes, which occur in practical applications, can be tackled by made-to-measure solution approaches.

Funding
Open Access funding enabled and organized by Projekt DEAL. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Availability of data and material
The test instances were generated by the author in GAMS.

Declaration
Conflicts of interest/Competing interests There are no interests to declare.

Code availability
The GAMS implementation is saved on a server of the University of Hamburg.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.