Homogeneous grouping of non-prime steel products for online auctions: a case study

Not all products meet customers’ quality expectations after the steelmaking process. Some of them, labelled as ‘non-prime’ products, are sold in a periodic online auction. These products need to be grouped into the smallest feasible number of bundles as homogeneous as possible, as this increases the attractiveness of the bundles and hence their selling prices. This results in a highly complex optimisation problem, also conditioned by other requirements, with large economic implications. It may be interpreted as a variant of the well-known bin packing problem. In this article, we formalise it mathematically by studying the real problem faced by a multinational in the steel industry. We also propose a structured, three-stage solution procedure: (i) initial division of the products according to their characteristics; (ii) cluster analysis; and (iii) allocation of products to bundles via optimisation methods. In the last stage, we implement three heuristic algorithms: FIFO, greedy, and distance-based. Building on previous works, we develop 80 test instances, which we use to compare the heuristics. We observe that the greedy algorithm generally outperforms its competitors; however, the distance-based one proves to be more appropriate for large sets of products. Last, we apply the proposed solution procedure to real-world datasets and discuss the benefits obtained by the organisation.

needs to consider several product characteristics and operational restrictions. Well-designed bundles are more attractive in the eyes of bidders (potential customers); thus, the problem has considerable effects on the selling prices. In large, multinational steel companies, the auction generally takes place on a periodic basis with extensive amounts of non-prime products from many origins. This makes the optimisation problem computationally very demanding. All in all, we highlight that the grouping decision is a complex optimisation problem that is central to the auction process, and providing an effective solution strategy would arguably result in substantial benefits.
From an algorithmic perspective, this optimisation problem may be interpreted as a variant of the bin packing problem (BPP) (e.g. Coffman et al., 2013; Taylor et al., 2017; Yao, 1980), where a set of products of different volumes has to be allocated into the smallest possible number of bins of a given capacity. In terms of computational complexity, this is an NP-hard problem (Berlińska, 2020; Garey & Johnson, 1979). Therefore, finding optimal solutions is frequently impractical, and approximate algorithms are commonly used instead; these can often provide high-quality solutions in reasonably low computational times.
In this article, we address the grouping problem of non-prime products by focusing on a specific case study in the steel sector. We examine the relevant characteristics of the problem under study, highlighting the distinctive features in comparison with the general BPP. We argue that it represents a new BPP model that has not been studied in prior literature. Then, we discuss different methodological approaches to solve this problem, from which we suggest a three-stage solution procedure. Our structured approach integrates a variety of methods, through which we deal with the complexity of the problem. The third stage, which optimises the allocation of products to bundles, incorporates three heuristic algorithms. We compare their performance, in terms of solution quality and computation time, through different experiments.
The remainder of this article is structured as follows. Section 2 describes in detail the specific problem under consideration, emphasising its economic relevance. We highlight the constraints that must be satisfied and the additional requirements that good solutions should meet. Sections 3 and 4 review the BPP and clustering streams of literature, respectively, which provide the theoretical background for our study. Section 5 formalises the optimisation problem mathematically. Section 6 develops our solution strategy, providing in-depth information on its structure and the different algorithms. Section 7 describes the 80 test instances developed for our study, which are built on existing datasets. Section 8 shows and discusses the results obtained by our solution procedure in these instances. Section 9 analyses the application of our solution strategy to the real-world problem under study. Finally, Sect. 10 concludes and suggests interesting avenues for future research.

Problem description
The steel industry is a pillar of the global economy. Steel is a fundamental component of many other prominent sectors, including construction, automotive, transportation, energy, and food. This is thanks to its excellent mechanical and structural properties along with a relatively low production cost. In addition, this material is environmentally friendly, as it can be reused almost infinitely and is 100% recyclable (Johnson et al., 2008). Furthermore, it is highly available because iron is a very abundant metal; it makes up more than 5% of the Earth's crust and most of its core, with iron ore mines all over the world. Steels can be classified according to their chemical composition and/or physical properties, with over 2000 steel grades existing nowadays (Thelning, 2013). Broadly speaking, steel products can be categorised into flat and long products. In the production of high-quality steel products, generally flat, some outputs do not satisfy customer requirements for various reasons. These products, which are referred to as non-prime, cannot be sold to their original customer and are therefore unassigned from the prime order.
In this work, we consider the problem faced by a large steel company in their operations in Europe, which could be extrapolated to other enterprises in this industry. This company first offers non-prime products to affiliated companies and regular customers, often in the local regions of their plants. Unsold non-prime products go to a weekly online auction, where around 20 plants from all over Europe participate together with about 300 regular bidders. Approximately 500,000 tons of flat steel products are sold yearly through this mechanism. This corresponds to roughly 10,000 tons per week. This means that several thousands of non-prime products with very different weights are auctioned on a weekly basis. Products are only removed from the auction list to be reused in the manufacturing processes if they have received no offers after several weeks.
Due to the high volume of products, the auction is a fundamental part of the sales process. Indeed, it has a considerable impact on the financial performance of the organisation. Non-prime products are not auctioned individually, but they are grouped into bundles, which need to be carefully designed prior to the auction. All bundles are auctioned at the same time, with the auction being held in the form of a first-price sealed-bid auction with a reserve price. The bundles, with different capacities (which depend on the plant that offers the bundle), have to be as homogeneous as possible. This makes them more attractive to bidders, which increases the selling price, and hence the profits of the company. This emphasises the importance of the grouping problem. Also, large sets of non-prime products are often more attractive for bidders and small sets are generally expensive to deliver, so the number of bundles should be minimised for the sake of profitability. For this reason, a minimum weight requirement is defined for each bundle.
Each non-prime product is defined by a set of parameters that refer to its characteristics and properties. Due to reasons of different nature, all the products in the same bundle need to have some parameters in common, which will be referred to as 'global parameters'. These are: manufacturing plant, location, category, family, and coating sides. For the other relevant parameters, which are named 'local parameters', there may be some discrepancies within the same bundle, but these should be minimised to keep the bundles as homogeneous as possible. The local parameters are: subfamily, steel grade, oiling, weight, width, thickness, and coating thickness (on one or both sides). Not all customers prioritise the same parameters, which must be considered in the solution procedure.
A key aspect to mention is the limited time available to design and prepare the bundles for the auction. This involves all the necessary tasks from updating the list of non-prime products to uploading the bundles' information to the online platform. In between, our solution strategy will need to be applied to the grouping problem, and the execution time should be reasonable. Therefore, the algorithms employed need to be not only accurate but also time-efficient.
Finally, we summarise the main features and criteria of the grouping problem under consideration:
• Non-prime products have to be grouped into bundles, considering a set of global parameters that all the products in each bundle must have in common and a set of local parameters that determine the degree of homogeneity of the bundle, which needs to be maximised.
• The degree of homogeneity needs to be defined in a flexible manner as a function of the local parameters, given that the parameters that should be prioritised may vary across bundles.
• The number of bundles and the number of unassigned non-prime products should be minimised simultaneously. In doing so, the weight of each bundle needs to be as close as possible to its capacity, which depends on the plant where the non-prime products were manufactured. Also, the weight of each bundle must be compliant with a minimum requirement.
• Despite the computational complexity of the optimisation problem (up to around 5000 products per auction), the algorithm must be able to provide a good solution in a reasonable amount of time.

Theoretical background (I): the bin packing problem
The grouping problem that we formulated in the previous section may be encapsulated within the theoretical framework defined by the BPP. There are many variants of this well-known, industrially relevant problem in the operations management discipline. Coffman et al. (2013) categorised them into four main classes: (i) dual versions; (ii) variations on bin sizes; (iii) variations on item packing; and (iv) additional conditions. First, dual versions address different objectives, such as the maximisation of the total number of items that are packed in a fixed number of bins (Azar et al., 2002) and the maximisation of the number of bins that are used for packing all the items with each bin weighing at least a predefined value (Coffman et al., 1987). Second, variations on bin sizes cover different models of variable-sized bins (e.g. Kang & Park, 2003; Liu et al., 2021) and problems with resource augmentation or bin stretching (e.g. Csirik & Woeginger, 1998; Dhahbi et al., 2021). Third, variations on item packing comprise a wide range of variants, such as the dynamic BPP (Coffman et al., 1983), selfish BPP (Bilò, 2006), BPP with rejection (Dósa & He, 2006), and BPP with fragile objects (Chan et al., 2007). Fourth, additional conditions include item-size restrictions (Gutin et al., 2006), item types (Adler et al., 2002), constrained cardinality (Kellerer & Pferschy, 1999), constrained distances (Beaumont et al., 2010), partial orders (Wee & Magazine, 1982), conflicts (Epstein & Levin, 2008), and compatible categories (Santos et al., 2019), among others.
Finally, we note that some authors developed generalisations of the BPP. For instance, we refer to the multi-dimensional BPP, in which items need to be packed according to more than one dimension, such as width and height (e.g. Dahmani et al., 2015; Polyakovskiy & M'Hallah, 2021), and the generalised BPP, which is characterised by multiple item and bin attributes (e.g. Baldi & Bruglieri, 2017; Baldi et al., 2019).
None of these variants exactly matches the requirements or captures all the complexities of our case study. Rather, our grouping problem combines characteristics of different BPP variants. We highlight three:
• Conflicts. Products that have differences in the global parameters cannot be grouped together.
• Constrained distances. Homogeneity in the items of each bin (a bundle of non-prime steel products) is rewarded. The distance between items is measured through the differences in the local parameters.
• Minimum weight requirements. Each bin needs to be compliant with an established minimum weight.
Our solution strategy, which we present in Sect. 6, deals with conflicts by splitting the input data in accordance with the global parameters. Thereby, P sub-problems emerge. These are solved via algorithms that need to consider both the constrained distances and minimum weight requirements. Next, we briefly review the literature on algorithms for BPPs.

Algorithms for bin packing problems
Algorithms for BPPs can be broadly classified into online and offline (Coffman et al., 2013). The former assign items to bins as items appear. Hence, the bin for each item is selected without knowledge of the following items. Traditional online algorithms include: next-fit (Johnson, 1973), first-fit (Johnson, 1973), worst-fit (Johnson, 1974a), refined first-fit (Yao, 1980), best-fit (Falkenauer & Delchambre, 1992), and bounded-space (Csirik & Johnson, 2001). Nonetheless, this area is in continuous development, and many other online algorithms have been proposed to solve different variants of the BPP, such as the recent articles by Verma et al. (2020), Balogh et al. (2020), and Epstein and Mualem (2021).
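The difference between two of the online rules named above can be illustrated with a minimal Python sketch. It assumes integer item sizes and a single bin capacity; the function names `next_fit` and `first_fit` are chosen here for illustration and do not come from the cited works. Next-fit only ever considers the most recently opened bin, whereas first-fit scans all open bins in order.

```python
from typing import List, Sequence


def next_fit(sizes: Sequence[int], capacity: int) -> List[List[int]]:
    """Online next-fit: keep one open bin; open a new one when an item does not fit."""
    bins: List[List[int]] = []
    load = 0
    for size in sizes:
        if not bins or load + size > capacity:
            bins.append([])  # close the current bin and open a new one
            load = 0
        bins[-1].append(size)
        load += size
    return bins


def first_fit(sizes: Sequence[int], capacity: int) -> List[List[int]]:
    """Online first-fit: place each item in the earliest bin with enough room."""
    bins: List[List[int]] = []
    loads: List[int] = []
    for size in sizes:
        for b in range(len(bins)):
            if loads[b] + size <= capacity:
                bins[b].append(size)
                loads[b] += size
                break
        else:
            bins.append([size])  # no open bin fits: open a new one
            loads.append(size)
    return bins


# Items arrive in this order; neither rule sees the later items in advance.
arrivals = [6, 5, 4, 5]
nf = next_fit(arrivals, capacity=10)   # [[6], [5, 4], [5]]
ff = first_fit(arrivals, capacity=10)  # [[6, 4], [5, 5]]
```

On this small instance, first-fit saves one bin because it can return to the first bin when the item of size 4 arrives.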
In contrast, offline algorithms have full information about the list of items and use it to make the overall allocation. The most popular offline algorithms are those with presorting (i.e. reordering algorithms), such as the next-fit decreasing (Baker & Coffman, 1981), first-fit decreasing (FFD) (Baker, 1985), refined first-fit decreasing (Yao, 1980), and best two-fit (Friesen & Langston, 1991). Their time complexity is at least $O(n \log n)$. Other algorithms propose solutions faster, without presorting, such as the Group-X-Fit Grouped (Johnson, 1974b) and $H_7$ (Békési et al., 2000). More advanced methods have been recently applied to solve BPPs. For instance, Abdel-Basset et al. (2018) proposed a whale optimisation algorithm; Abdul-Minaam et al. (2020) developed an adaptive fitness-dependent optimiser with swarm intelligence; and Munien et al. (2020) implemented two hybrid metaheuristics (hybrid cuckoo search genetic algorithm and mutated firefly algorithms). These prior works focused on one-dimensional BPPs, while others developed metaheuristics for two-dimensional packing problems, such as Grandcolas and Pain-Barre (2021).
Between both extremes, semi-online algorithms do not have information about the complete list of items (unlike offline ones) but have more information than pure online ones. In this sense, repacking items is sometimes allowed. This leads to different algorithms, such as the buffered next-fit (Galambos, 1985), which uses two open bins, and $REP_3$ (Galambos & Woeginger, 1993), with three open bins. In other cases, the algorithm is allowed to look ahead to later items, such as in the revised warehouse (Grove, 1995).
In the following subsection, we focus on algorithms that have been used for solving BPPs with conflicts, in which items in conflict cannot be assigned to the same bin, due to the similarities with our problem noted above.

Methodological approaches for bin packing problems with conflicts
The BPP with conflicts is an NP-hard optimisation problem that can be interpreted as a combination of the BPP and the vertex colouring problem (e.g. Diaby, 2010). Different methods have been used in the literature to approach this problem, which are briefly discussed below.
• Asymptotic approximation scheme. This refers to a family of algorithms where, for all $\varepsilon > 0$, there is an algorithm of performance ratio at most $1 + \varepsilon$, where the running time is polynomial in $n$. Given a set of items $V = \{1, \dots, n\}$ and a conflict graph $G = (V, E)$, Jansen (1999) proposed this scheme for a BPP with conflicts restricted to d-inductive graphs with constant d.
• Modified FFD algorithm. Gendreau et al. (2004) adapted the conventional FFD algorithm to consider the conflicts. Specifically, in this algorithm (H1), the items are assigned to the first bin in which there is enough capacity without incurring conflicts with the already assigned items.
• Graph colouring algorithms. Gendreau et al. (2004) developed three heuristics (H2, H3, H4) that make use of a conflict graph $G$. They colour the vertices of $V$ through the DSatur heuristic (Brélaz, 1979). H2 creates sets of mutually non-conflicting items and applies the FFD algorithm to each set. H3 uses the same rationale after removing the less conflictive items, which are later assigned to bins with the modified FFD algorithm. H4 is based on the iterated use of the FFD algorithm.
• Clique-based algorithms. Gendreau et al. (2004) proposed two heuristics (H5, H6) based on cliques determined through Johnson's (1974a) greedy heuristic. H5 uses a clique of non-conflicting items, which are then assigned to bins. H6 uses a clique of conflicting items, which are considered for computing cliques of non-conflicting items. This inspired Maiza and Radjef (2011) in their MSS-based heuristic, which converts the BPP with conflicts to a set of sub-problems without conflicts.
• Greedy-approximation algorithms. Beaumont et al. (2010) showed that the BPP with constrained distances could be transformed into a BPP with conflicts. Their solution is based on the algorithm developed by Epstein and Levin (2008), using an FFD algorithm to fill the bins.
• Adapted minimum bin slack (MBS) heuristic. Maiza and Radjef (2011) extended the MBS algorithm developed by Gupta and Ho (1999) for BPPs to the case with conflicts. This heuristic executes the MBS procedure with a compatibility test of the current item with those already considered.
• Branch-and-price algorithm. Elhedhli et al. (2011) solved the BPP with conflicts by means of this algorithm. First, they employ a branching rule to match the conflicting constraints. Later, they create maximal clique valid inequalities according to these constraints. Similar approaches were used by other authors, such as Sadykov and Vanderbeck (2013).
• Iterated local search (ILS) metaheuristic. To solve the BPP with conflicts, Capua et al. (2018) developed an ILS, with several classes of local and large neighbourhoods for solution improvement.
• Sequential maximum degree packing heuristic. Ekici (2021) proposed this algorithm for BPPs with conflicts and item fragmentation. The key idea of this approach is to start the allocation with the items with the highest number of conflicts, as these are the most problematic ones.
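To make the idea behind the modified FFD rule more concrete, the following Python sketch places items in non-increasing order of weight into the first bin that has enough residual capacity and holds no conflicting item. It is only an illustration of that rule, not the implementation of Gendreau et al. (2004): the function name, the data layout (a dictionary of weights and a set of conflicting pairs), and the tie-breaking by item label are assumptions, and every item is assumed to fit in an empty bin.

```python
from typing import Dict, FrozenSet, List, Set


def ffd_with_conflicts(
    weights: Dict[str, float],
    capacity: float,
    conflicts: Set[FrozenSet[str]],
) -> List[List[str]]:
    """First-fit decreasing that skips bins holding a conflicting item."""
    bins: List[List[str]] = []
    loads: List[float] = []
    # Presort by non-increasing weight (ties broken by label for determinism).
    for item in sorted(weights, key=lambda k: (-weights[k], k)):
        for b, content in enumerate(bins):
            fits = loads[b] + weights[item] <= capacity
            clash = any(frozenset((item, other)) in conflicts for other in content)
            if fits and not clash:
                content.append(item)
                loads[b] += weights[item]
                break
        else:
            # No open bin is feasible for this item: open a new bin.
            bins.append([item])
            loads.append(weights[item])
    return bins


# Hypothetical instance: items "a" and "c" may not share a bin.
weights = {"a": 6, "b": 5, "c": 4, "d": 3}
conflicts = {frozenset(("a", "c"))}
packing = ffd_with_conflicts(weights, capacity=10, conflicts=conflicts)
# packing == [["a", "d"], ["b", "c"]]
```

Without the conflict, item "c" would fill the first bin exactly; with it, "c" moves to the second bin and "d" takes the free space next to "a".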
All these algorithms used for BPPs, and particularly BPPs with conflicts, are heuristics or metaheuristics that look for near-optimal solutions due to the complexity of these problems, as discussed by prior works (e.g. Asta et al., 2016; Fernandez et al., 2013; López-Camacho et al., 2013). In our case, the complexity may be even higher because of the interactions of conflicts with the other characteristics of the grouping problem of non-prime steel products (constrained distances, minimum weight) and the size of the problem (number of products, set of parameters, different weights, etc.). Also, the solution needs to be provided in a short period of time.

Theoretical background (II): clustering
In the light of previous discussions, cluster analysis (Duran & Odell, 1974; Farnè & Vouldis, 2021) emerges as an interesting approach for grouping similar non-prime products and separating those very different in an attempt to reduce computational requirements. This section briefly reviews the main set of techniques that are used for clustering, based on the categorisation by Xu and Wunsch (2005).
• Hierarchical clustering algorithms structure data in a hierarchical manner according to a proximity matrix (Kou & Lou, 2012). They may be divided into agglomerative algorithms, such as BIRCH and CURE, and divisive algorithms, such as MONA and DIANA (Kaufman & Rousseeuw, 2005).
• Squared error-based clustering assigns the objects into non-hierarchical clusters by using the sum of squared error criterion function. The most popular algorithm is the traditional k-means algorithm (Morissette & Chartier, 2013), but many other examples can be found; see e.g. Yu et al. (2018).
• Combinatorial clustering defines the problem via an objective function that aims to allocate the objects into clusters by optimising a criterion function (Kim et al., 2017). As this is computationally demanding, metaheuristics are generally used to achieve high-quality solutions (Levin, 2015).
• Graph-based clustering uses graph-theoretical concepts and properties to create hierarchical or non-hierarchical clusters. There are different clustering approaches based on graphs, including spectral clustering (Hendrickson & Leland, 1995), dynamic modelling (Karypis et al., 1999), and density peaks (Xu et al., 2021). We also highlight the work by Kawaji et al. (2004), who developed an algorithm for the clustering of a large set of proteins that finds distantly-related proteins. Vertices of the graph denote proteins, and edges denote their similarity. The graph is partitioned repetitively by removing edges with small weights (dissimilar proteins), achieving promising results.
• Fuzzy clustering algorithms may assign an object to several clusters simultaneously, with different degrees of membership (Sakawa, 2013), unlike the previous approaches. The most popular fuzzy clustering algorithm is the Fuzzy c-means algorithm; see Pantula et al. (2020).
• Other clustering approaches include mixture densities (Chacón, 2019), neural networks (Du, 2010), and kernel-based clustering (Piciarelli et al., 2013).
Also, several interesting approaches have been recently proposed for clustering, such as self-organising features maps (Li et al., 2020a), adaptive hyper-spheres (Li et al., 2020b), and dual iterative local search (González-Almagro et al., 2020).
We conclude our review by highlighting that graph-based clustering facilitates the representation of the problem and allows for a rapid generation of clusters with similar characteristics. Therefore, it provides an interesting approach from which to address the grouping problem under consideration. In Sect. 6, we describe how clustering fits within our solution strategy, together with the other elements.

Mathematical formalisation of the problem
In line with the description of the problem in Sect. 2 and the discussion of relevant background in Sects. 3 and 4, we now formalise the grouping problem of non-prime steel products mathematically.
The restriction on the global parameters (i.e. they need to be equal for all products in a bundle) will be dealt with by dividing the initial set of products into $P$ subsets, thus resulting in $P$ sub-problems. Each of them can be interpreted as a BPP with minimum weight requirements and constrained distances, the latter emerging because homogeneity in the local parameters is rewarded. In this sense, it is important to note that the goal of the problem is not only to minimise the number of bundles used and the number of non-prime steel products left unassigned, but also to make the bundles as homogeneous as possible. We call it the 'homogeneous bin packing problem with minimum weight requirements' (HBPPMWR).

Notation
In the mathematical formulation of the problem, we use the following indices:
• $i$ refers to items (non-prime steel products), $i = 1, 2, \dots, n$, where $n$ is the number of items,
• $h$ refers to bins (bundles), $h = 1, 2, \dots, m$, where $m$ is the number of bins;
the following decision variables:
• $y_h$ is a binary variable, with $y_h = 1$ indicating that bin $h$ is used in the proposed assignment, and $y_h = 0$ otherwise (note: $y_h$ results in a row vector of $m$ binary variables),
• $x_{ih}$ is a binary variable, with $x_{ih} = 1$ indicating that item $i$ is assigned to bin $h$, and $x_{ih} = 0$ otherwise (note: $x_{ih}$ results in a matrix of $n \times m$ binary variables);
and the following parameters:
• $w_i$ is the weight of item $i$ (note: $w_i$ results in a row vector of $n$ values),
• $C$ is the capacity of the bins,
• $w_{\min}$ is the minimum allowed weight of the bins,
• $d_{\max}$ is the maximum allowed distance between any pair of items in a bin.
Finally, the variable $u$ refers to the number of items that have not been allocated to any bin in a proposed assignment, and $d_h$ refers to the maximum distance between items within bin $h$ for the proposed assignment. Thus, $u$ and $d_h$ depend on the decision variables $y_h$ and $x_{ih}$.

Optimisation problem
Mathematically, the optimisation problem for a specific set of products with the same global parameters can be expressed as the minimisation of an objective function, Eq. (1), subject to constraints (2) to (7). Equation (1) defines the objective function. This is a cost function that attempts to minimise the number of bins used, the heterogeneity of these bins, and the number of unassigned items. In this equation, $z_y$, $z_d$, and $z_u$ (such that $z_y + z_d + z_u = 1$) are the weights given to each criterion, depending on the prioritisation strategy. Constraint (2) ensures that each item is assigned to at most one bin. Constraint (3) makes sure that the total weight of each bin does not exceed its capacity. Constraint (4) ensures that the minimum weight is achieved by the sum of items in the bin. Constraint (5) ensures that the degree of homogeneity of all bundles fulfils the requirements. Last, constraints (6) and (7) ensure that the variables $y_h$ and $x_{ih}$ have binary values.
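A formulation consistent with the description above can be sketched as follows. The unscaled form of the three objective terms, the coupling of constraints (3) and (4) to $y_h$, and the expressions given for $u$ and $d_h$ are assumptions made for illustration; $d_{ij}$ denotes the distance between items $i$ and $j$, which is defined in Sect. 6.

```latex
\begin{align}
\min \quad & z_y \sum_{h=1}^{m} y_h + z_d \sum_{h=1}^{m} d_h + z_u\, u \tag{1}\\
\text{s.t.} \quad & \sum_{h=1}^{m} x_{ih} \le 1, && i = 1, \dots, n \tag{2}\\
& \sum_{i=1}^{n} w_i\, x_{ih} \le C\, y_h, && h = 1, \dots, m \tag{3}\\
& \sum_{i=1}^{n} w_i\, x_{ih} \ge w_{\min}\, y_h, && h = 1, \dots, m \tag{4}\\
& d_h \le d_{\max}, && h = 1, \dots, m \tag{5}\\
& y_h \in \{0, 1\}, && h = 1, \dots, m \tag{6}\\
& x_{ih} \in \{0, 1\}, && i = 1, \dots, n,\; h = 1, \dots, m \tag{7}
\end{align}
% Auxiliary quantities (assumed form):
% u = n - \sum_{i=1}^{n} \sum_{h=1}^{m} x_{ih}, \qquad
% d_h = \max_{i,j} \, d_{ij}\, x_{ih}\, x_{jh}
```

Under this reading, an unused bin ($y_h = 0$) is forced to be empty by constraint (3), and constraint (4) only binds for bins that are actually used.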

Solution procedure
Given the complexity of the problem under study, our solution strategy for the HBPPMWR aims to achieve near-optimal solutions in a time-efficient manner. Figure 1 provides an overview of the procedure, which is composed of three main stages:
(1) Initial division of the products according to the global parameters.
(2) Cluster analysis of the products based on similarity in the local parameters.
(3) Optimisation-based allocation of non-prime steel products to bundles.
This three-stage solution procedure fits well with the nature of the industrial problem under consideration. It allows the users to easily fine-tune the model for each specific sub-problem ($w_i$, $C$, $w_{\min}$, $d_{\max}$, $z_y$, $z_d$, and $z_u$ differ in the various subsets due to the different resources and requirements of each plant and family of products, among others). This procedure also adds transparency to the allocation problem, helping managers understand how the final bundles of non-prime steel products are generated. Each of the stages that characterise the proposed solution strategy is developed in depth below.

Initial division
After receiving the data with the non-prime products of the different plants, the overall dataset is divided into P subsets, such that all the products in each one have the same global parameters. This stage ensures that this fundamental restriction will be satisfied in all the bundles of the final solution. We thus deal with conflicts by dividing the overall problem into smaller ones without conflicts, which is in line with some of the approaches described in Sect. 3. This stage also reduces considerably the computational complexity faced by the clustering and optimisation algorithms in the following phases of the solution procedure.
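The initial division amounts to grouping products by the tuple of their global parameters. A minimal Python sketch is given below; the dictionary keys used for the five global parameters and the sample records are hypothetical, as the actual data schema is not described here.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

# The five global parameters named in the problem description.
# The key names themselves are assumptions made for this sketch.
GLOBAL_KEYS = ("plant", "location", "category", "family", "coating_sides")


def initial_division(products: List[dict]) -> Dict[Tuple[Hashable, ...], List[dict]]:
    """Split the product list into subsets that share all global parameters."""
    subsets: Dict[Tuple[Hashable, ...], List[dict]] = defaultdict(list)
    for product in products:
        key = tuple(product[k] for k in GLOBAL_KEYS)
        subsets[key].append(product)
    return dict(subsets)


# Hypothetical records: products 1 and 2 agree on every global parameter.
products = [
    {"id": 1, "plant": "P1", "location": "L1", "category": "C1", "family": "OC", "coating_sides": 2},
    {"id": 2, "plant": "P1", "location": "L1", "category": "C1", "family": "OC", "coating_sides": 2},
    {"id": 3, "plant": "P2", "location": "L4", "category": "C1", "family": "OC", "coating_sides": 1},
]
subsets = initial_division(products)
```

Each resulting subset can then be clustered and packed independently, which is what makes the sub-problems conflict-free.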

Cluster analysis
In each of the $P$ subsets with the same global parameters, clustering is applied to guide the optimisation algorithm towards more homogeneous and time-efficient solutions. Clustering is performed according to the local parameters, whose distance should be minimised in each bundle. Following the discussion in Sect. 4, we adopt a graph-based approach. Each item is represented by a node, which is linked to the other items in the same subset by means of undirected edges characterised by their distances $d_{ij}$ (where $i$ and $j$ represent the nodes under consideration). Next, we describe how such distances, which measure the similarity between products, are computed.

Definition of distance
We consider an eight-dimensional space, where the dimensions refer to the eight local parameters of our homogeneous grouping problem (i.e. subfamily, steel grade, oiling, weight, width, thickness, and two coating thicknesses).
Then, we use a weighted Euclidean distance to measure the difference between two items. On the one hand, we selected the Euclidean distance, instead of other alternatives, as it is a common and recommended practice for measuring the distance between items when parameters of different nature are involved, as in our case (Kou et al., 2014; Xu & Wunsch, 2005). On the other hand, weighting the differences in the various parameters allows us to define the degree of homogeneity in a flexible manner, which is a requirement of our real-world problem, as discussed before. To define the importance of homogeneity in each parameter, we use weights, $\gamma_t$, with $t \in \{1, 2, \dots, 8\}$. These weights can be configured between 0.1 and 10; therefore, they cover differences of importance of up to two orders of magnitude. These weights have been rescaled through their geometric mean, thus resulting in the normalised weights $\bar{\gamma}_t = \gamma_t / \sqrt[8]{\prod_{q=1}^{8} \gamma_q}$. Finally, we note that we have normalised the five numerical parameters, namely, weight, width, thickness, and the two coating thicknesses. Also, we have transformed the three categorical parameters, that is, subfamily, steel grade, and oiling, into a numerical (and normalised) format. We explain how we have normalised and transformed the different parameters in the following subsection.
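Taking all the above into consideration, the distance between items $i$ and $j$, $d_{ij}$, can be obtained via the weighted Euclidean distance $d_{ij} = \sqrt{\sum_{t=1}^{8} \bar{\gamma}_t \left(\bar{\phi}_{t,i} - \bar{\phi}_{t,j}\right)^2}$, where $\bar{\gamma}_t$ is the normalised weight of parameter $t$, and $\bar{\phi}_{t,i}$ and $\bar{\phi}_{t,j}$ refer to the normalised values of parameter $t$ for items $i$ and $j$, respectively.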
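The weight rescaling and the distance computation can be sketched in Python as follows. The sketch assumes that each normalised weight multiplies the squared difference of the corresponding parameter, which is one common form of the weighted Euclidean distance; the function names and the example values are hypothetical.

```python
import math
from typing import List, Sequence


def normalise_weights(gammas: Sequence[float]) -> List[float]:
    """Divide each weight by the geometric mean of all weights."""
    geo_mean = math.exp(sum(math.log(g) for g in gammas) / len(gammas))
    return [g / geo_mean for g in gammas]


def weighted_distance(diffs: Sequence[float], gammas: Sequence[float]) -> float:
    """Weighted Euclidean distance from per-parameter differences.

    diffs[t] is the difference between two items in parameter t: a
    difference of normalised values for numerical parameters, or an
    expert-assigned value for categorical ones.
    Assumption: each normalised weight multiplies the squared difference.
    """
    g_bar = normalise_weights(gammas)
    return math.sqrt(sum(g * d ** 2 for g, d in zip(g_bar, diffs)))


# Hypothetical pair of products that differ in two of the eight parameters.
diffs = [0.3, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
d_equal = weighted_distance(diffs, [1.0] * 8)  # plain Euclidean distance
# Emphasise the first parameter (10) and de-emphasise the second (0.1).
d_skewed = weighted_distance(diffs, [10.0, 0.1, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
```

Because the weights are divided by their geometric mean, multiplying all of them by the same factor leaves the distances unchanged; only the relative importance of the parameters matters.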

Normalisation and transformation
Normalising the numerical parameters is essential to ensure the robustness of the distance measurements. In this sense, we have rescaled the values of the parameters into the range [0, 1]. Specifically, we have used the min-max rescaling $\bar{\phi}_{t,i} = (\phi_{t,i} - \phi_{t,\min}) / (\phi_{t,\max} - \phi_{t,\min})$, where $\phi_{t,i}$ denotes the actual value of parameter $t$ for item $i$, and $\phi_{t,\max}$ and $\phi_{t,\min}$ denote the maximum and minimum values, respectively, of parameter $t$ in the subset under consideration. For the categorical parameters, it was necessary to transform their distances into a numerical format, as discussed before. To this end, we have assigned the distance values, $\bar{\phi}_{t,i} - \bar{\phi}_{t,j}$, ranging between 0 and 1, based on the evaluations of experts in the different processes. They considered the degree of similarity between each pair of categorical levels for each parameter.
By way of example, we focus on the parameter 'subfamily'. In this case, the experts agreed to define three levels of distances: 0 (no distance), 0.1 (low), and 1 (high). For the organic coating (OC) product family, there are eight subfamilies, characterised by the following labels: OCR, OCH, OAS, OAZ, OZ, OZA, OZE, and OZO. The ideal is to bundle together products of the same subfamily ($\bar{\phi}_{t,i} - \bar{\phi}_{t,j} = 0$). Nonetheless, products of subfamilies OCR and OAH can also be included in the same bundle at a low cost ($\bar{\phi}_{t,i} - \bar{\phi}_{t,j} = 0.1$). The same happens with products of subfamilies OAS, OAZ, OZ, OZA, OZE, and OZO ($\bar{\phi}_{t,i} - \bar{\phi}_{t,j} = 0.1$). However, combining products of both groups in the same bundle (e.g. OCR and OAS, or OAH and OZ) is not desirable, as this would reduce the attractiveness of bundles to potential customers ($\bar{\phi}_{t,i} - \bar{\phi}_{t,j} = 1$). This information is summarised in Table 1.
Although we do not include all tables here for the sake of brevity, we clarify that the same methodological approach has been followed for the rest of the product families, as well as for the other two categorical parameters. In the case of the steel grade, we have also used three levels of distances with the aim of promoting the bundling of products with the same ($\bar{\phi}_{t,i} - \bar{\phi}_{t,j} = 0$) or similar steel grades ($\bar{\phi}_{t,i} - \bar{\phi}_{t,j} = 0.1$) rather than those with highly different steel grades ($\bar{\phi}_{t,i} - \bar{\phi}_{t,j} = 1$). In contrast, in the case of oiling, we only use two values: $\bar{\phi}_{t,i} - \bar{\phi}_{t,j} = 1$ if one product went through oiling but the other did not; and $\bar{\phi}_{t,i} - \bar{\phi}_{t,j} = 0$ if both or neither of them went through this oxidation prevention process.
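The two transformations described in this subsection can be sketched in Python as follows. The sketch is illustrative only: the function names are hypothetical, the subfamily groups use the labels as they are written in the grouping description above, and returning 0 for a parameter that is constant within a subset is an assumption (the source does not discuss this case).

```python
from typing import List, Sequence

# Two groups of mutually 'close' subfamilies for the OC family,
# using the labels as written in the grouping description above.
SUBFAMILY_GROUPS = (
    {"OCR", "OAH"},
    {"OAS", "OAZ", "OZ", "OZA", "OZE", "OZO"},
)


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale numerical values of one parameter into [0, 1] within a subset."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant parameter contributes no distance.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def subfamily_distance(a: str, b: str) -> float:
    """Expert-defined distance: 0 (same), 0.1 (same group), 1 (different groups)."""
    if a == b:
        return 0.0
    same_group = any(a in group and b in group for group in SUBFAMILY_GROUPS)
    return 0.1 if same_group else 1.0


def oiling_distance(a_oiled: bool, b_oiled: bool) -> float:
    """0 if both or neither product is oiled, 1 otherwise."""
    return 0.0 if a_oiled == b_oiled else 1.0


# Hypothetical widths (in mm) of three products in the same subset.
widths = min_max([1000, 1250, 1500])  # [0.0, 0.5, 1.0]
```

A lookup of this kind keeps the expert judgement in one place, so the distance levels can be revised without touching the clustering code.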

Clustering algorithm
In line with the discussion in Sect. 4, we follow a graph-based approach to generate clusters (groups) of non-prime steel products with low distances among them. To this end, we gathered ideas from, among others, Hendrickson and Leland (1995), who use spectral graphs, Zhou et al. (2016), who implement a weighted summation, and Kawaji et al. (2004), who repeatedly partition the graph by removing edges with low similarities. The clustering algorithm, which operates in each of the P subsets by means of a specific function (SPLIT), follows the sequence of events described in Fig. 2. Its operation is based on four steps that are executed as follows:
1. An initial undirected graph is generated, in which products are represented as nodes. These are linked by edges, characterised by their distance $d_{ij}$ (calculated as explained before). The graph only contains edges whose distance does not exceed a predefined threshold, $\vartheta$, i.e. $d_{ij} \le \vartheta$.
2. The initial graph is divided into several subgraphs by considering its connected components. In this way, we generate initial clusters that are formed by relatively similar non-prime steel products.
3. The size of each initial cluster is checked against the limits:
a. If the number of items in the initial cluster is within the limits, we create a cluster.
b. If it is lower than desired (i.e. < MinSize), the initial cluster is dissolved and its products are defined as 'singles'.
c. If it is higher than desired (i.e. > MaxSize), the threshold is readjusted as follows: $\vartheta^{*} = \tau \cdot \vartheta$, where $\tau < 1$ is the step parameter. Then, the sequence starts again for the products in this large cluster. As the new threshold is more restrictive ($\tau < 1 \Rightarrow \vartheta^{*} < \vartheta$), this cluster will tend to generate several initial clusters of a smaller size. If necessary, this occurs recursively until the threshold becomes lower than $10^{-6}$ (when this occurs, the cluster is formed).
4. Singles are regrouped into the clusters created, when this is possible. To this end, a new threshold is defined as the geometric mean of the last two thresholds, i.e. $\vartheta^{**} = \sqrt{\vartheta \cdot \vartheta^{*}}$. Then, we evaluate whether $\vartheta^{**}$ ($\vartheta^{*} < \vartheta^{**} < \vartheta$) allows any of the singles to be incorporated into the new clusters created.
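The four steps above can be sketched in Python as follows. This is an illustrative reading of the procedure, not the implementation behind SPLIT: the names `components` and `split` are hypothetical, and the regrouping rule (a single joins the first cluster with spare room that has at least one member within the intermediate threshold) is an assumption, since the text does not fix these details.

```python
import math
from typing import Callable, List, Tuple

Dist = Callable[[int, int], float]


def components(items: List[int], dist: Dist, theta: float) -> List[List[int]]:
    """Connected components of the graph linking i and j when dist(i, j) <= theta."""
    unseen = set(items)
    comps = []
    while unseen:
        seed = unseen.pop()
        comp, stack = [seed], [seed]
        while stack:
            u = stack.pop()
            linked = [v for v in unseen if dist(u, v) <= theta]
            for v in linked:
                unseen.remove(v)
            comp.extend(linked)
            stack.extend(linked)
        comps.append(sorted(comp))
    return comps


def split(items: List[int], dist: Dist, theta: float, min_size: int,
          max_size: int, tau: float = 0.8, eps: float = 1e-6
          ) -> Tuple[List[List[int]], List[int]]:
    """Return (clusters, singles) for one subset of products."""
    clusters: List[List[int]] = []
    singles: List[int] = []
    for comp in components(items, dist, theta):
        if len(comp) < min_size:
            singles.extend(comp)                  # step 3b: dissolve small clusters
        elif len(comp) <= max_size or tau * theta < eps:
            clusters.append(comp)                 # step 3a, or threshold exhausted
        else:
            new_theta = tau * theta               # step 3c: tighter threshold
            sub_clusters, sub_singles = split(
                comp, dist, new_theta, min_size, max_size, tau, eps)
            # Step 4 (assumed rule): retry singles with the geometric-mean threshold.
            mid = math.sqrt(theta * new_theta)
            for s in sub_singles:
                home = next((c for c in sub_clusters if len(c) < max_size
                             and any(dist(s, m) <= mid for m in c)), None)
                if home is None:
                    singles.append(s)
                else:
                    home.append(s)
            clusters.extend(sub_clusters)
    return clusters, singles
```

With $\tau = 0.8$ and an initial threshold of 1, the recursion stops after roughly 60 levels at most, so the default Python recursion limit is not a concern in this sketch.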

Allocation of products to bundles
For each cluster, the products need to be allocated to a set of bundles, taking into consideration both their minimum allowed weight and their capacity, which are introduced as inputs, and making sure that they are as homogeneous as possible. In order to provide the best possible solution for the online auction, the algorithm creates many assignments using different strategies, which are described in the following subsection, and finally selects the best one according to a user-defined fitness function (based on Eq. 1).

Algorithm structure
The algorithm, which has to be executed several times before each auction, needs to provide a high-quality allocation of the non-prime steel products in a short amount of time. In line with the description in Sect. 5, it will aim to minimise the number of bundles employed, the differences in the items of each bundle, and the number of unassigned items, according to the weights assigned by the user.
The structure of the algorithm, which has been implemented in a specific function (CREATE_BUNDLES), is summarised in Fig. 3. There are three main phases:
1. Item sorting. The products in each cluster first need to be sorted according to a predefined criterion. We consider three methods: random sorting (used $L - 2$ times per call to the function, where $L$ is a decision parameter that sets the number of replications of each heuristic method), largest-first sorting (once per call), and smallest-first sorting (once per call).
2. Item allocation. The products are then allocated to bundles. To do this, we implement three heuristics: FIFO, greedy, and distance-based. The traditional models have been adapted to accommodate the homogeneity requirements, as we will discuss in the next subsection.
3. Fitness evaluation. Once the $L$ solutions of each algorithm have been generated, all the allocations ($3L$) are analysed, and the one that provides the lowest cost, according to Eq. (1), is selected. This allocation is proposed for use in the online auction.
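A minimal sketch of this three-phase structure is given below. It is not the CREATE_BUNDLES function itself: the names `sort_orders` and `create_bundles`, the `seed` argument, and the convention that each heuristic and the cost function are passed in as callables are all assumptions made so that the example stays self-contained. In particular, `cost` stands in for Eq. (1), whose exact form is not reproduced here.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence

Allocation = List[List[str]]


def sort_orders(weights: Dict[str, float], L: int,
                rng: random.Random) -> List[List[str]]:
    """Phase 1: one largest-first, one smallest-first and L - 2 random orders."""
    ids = list(weights)
    orders = [
        sorted(ids, key=lambda i: weights[i], reverse=True),  # largest first
        sorted(ids, key=lambda i: weights[i]),                # smallest first
    ]
    for _ in range(L - 2):
        shuffled = ids[:]
        rng.shuffle(shuffled)
        orders.append(shuffled)
    return orders


def create_bundles(weights: Dict[str, float],
                   heuristics: Sequence[Callable[[List[str]], Allocation]],
                   cost: Callable[[Allocation], float],
                   L: int = 200, seed: int = 0) -> Optional[Allocation]:
    """Phases 2 and 3: run every heuristic on every order, keep the cheapest."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for order in sort_orders(weights, L, rng):
        for heuristic in heuristics:
            allocation = heuristic(order)
            value = cost(allocation)
            if value < best_cost:
                best, best_cost = allocation, value
    return best
```

With three heuristics in `heuristics`, the loop evaluates $3L$ allocations, matching the count given in the text.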

Heuristic techniques
The first heuristic algorithm adopts a FIFO (first-in-first-out) approach. It assigns items to bundles by going over the list of items in sequential order. If the next item in the list satisfies the maximum distance requirement and there is enough capacity in the bundle, it is assigned to the open bundle. If there is enough capacity but the item does not satisfy the homogeneity requirement, then the next item is checked. As soon as an item cannot be introduced in a bundle due to capacity restrictions, the bundle is closed and a new one is opened. Therefore, the time complexity of this algorithm is O(n).
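The FIFO rule can be sketched in Python as follows. Several details are assumptions rather than statements of the original implementation: the item that triggers the closure opens the next bundle, items skipped for homogeneity reasons are left unassigned in this single pass, and a closed bundle lighter than the hypothetical `w_min` argument is dissolved.

```python
from typing import Callable, List, Sequence, Tuple


def _close(current: List[int], load: float, w_min: float,
           bundles: List[List[int]], unassigned: List[int]) -> None:
    """Keep the bundle if it reaches the minimum weight, otherwise dissolve it."""
    if not current:
        return
    if load >= w_min:
        bundles.append(current)
    else:
        unassigned.extend(current)


def fifo_allocate(order: Sequence[int], weight: Sequence[float],
                  dist: Callable[[int, int], float], capacity: float,
                  d_max: float, w_min: float = 0.0
                  ) -> Tuple[List[List[int]], List[int]]:
    bundles: List[List[int]] = []
    unassigned: List[int] = []
    current: List[int] = []
    load = 0.0
    for item in order:
        if weight[item] > capacity:          # can never fit in any bundle
            unassigned.append(item)
        elif load + weight[item] > capacity:
            # Capacity failure: close the open bundle and start a new one.
            _close(current, load, w_min, bundles, unassigned)
            current, load = [item], weight[item]
        elif all(dist(item, other) <= d_max for other in current):
            current.append(item)
            load += weight[item]
        else:
            unassigned.append(item)          # homogeneity failure: move on
    _close(current, load, w_min, bundles, unassigned)
    return bundles, unassigned
```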
The second heuristic is a greedy algorithm. In this case, if a product cannot be assigned to a bundle because of capacity limitations, the bundle is not closed. Rather, the following products may be assigned to this bundle (if this is possible given the maximum distance and capacity constraints). Logically, this requires more time. Thus, we have implemented additional checks to avoid consuming unnecessary time. For example, if the smallest-first sorting method is used, and an item cannot be included in a bundle for capacity reasons, the rest of the items will not be considered for that bundle. Similarly, for the largest-first sorting method, when an item cannot be introduced into a bundle due to capacity limitations, the algorithm evaluates whether the last (i.e. the smallest) item could be introduced. If not, there is no need to consider the remaining items for the bundle. The time complexity of this algorithm is $O(n^2)$.
Last, the distance-based algorithm makes decisions based on the similarity between the items in the open bundles and the rest of the items. This heuristic works as follows. The first item goes to the first bundle. Then, the closest item to the first one (lowest $d_{ij}$) is also introduced in this bundle. Subsequently, the item that is closest to those two items is added (specifically, we consider the maximum $d_{ij}$ to the items in the bundle). The process is repeated as long as there is enough capacity. If, at some point, the closest item cannot be introduced for capacity reasons, the second closest item is evaluated, and so on. Finally, once no more items can be accepted (due to capacity and/or maximum distance restrictions), the bundle is closed. Then, the process starts again for the next bundle. The time complexity of the distance-based algorithm is $O(n^3)$.
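A basic version of the greedy rule might look as follows in Python. The sketch omits the early-exit checks for the smallest-first and largest-first orders described in the text, and the handling of the hypothetical `w_min` argument (bundles below the minimum weight are dissolved) is an assumption.

```python
from typing import Callable, List, Sequence, Tuple


def greedy_allocate(order: Sequence[int], weight: Sequence[float],
                    dist: Callable[[int, int], float], capacity: float,
                    d_max: float, w_min: float = 0.0
                    ) -> Tuple[List[List[int]], List[int]]:
    # Items heavier than the capacity can never be bundled.
    remaining = [i for i in order if weight[i] <= capacity]
    unassigned = [i for i in order if weight[i] > capacity]
    bundles: List[List[int]] = []
    while remaining:
        bundle: List[int] = []
        skipped: List[int] = []
        load = 0.0
        # Scan every remaining item; a capacity failure does not close the bundle.
        for item in remaining:
            fits = load + weight[item] <= capacity
            close_enough = all(dist(item, other) <= d_max for other in bundle)
            if fits and close_enough:
                bundle.append(item)
                load += weight[item]
            else:
                skipped.append(item)
        if load >= w_min:
            bundles.append(bundle)
        else:
            unassigned.extend(bundle)      # assumed treatment of light bundles
        remaining = skipped                # the first item always enters, so this shrinks
    return bundles, unassigned
```

Each pass over `remaining` is linear and there are at most n passes, which is consistent with the quadratic behaviour stated above (ignoring the cost of the distance checks inside a bundle).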
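The distance-based rule can be sketched along the same lines. Again, this is an illustrative reading: ties between equally close candidates are broken by list order, and the hypothetical `w_min` argument dissolves bundles that do not reach the minimum weight; neither choice is specified in the text.

```python
from typing import Callable, List, Sequence, Tuple


def distance_based_allocate(order: Sequence[int], weight: Sequence[float],
                            dist: Callable[[int, int], float], capacity: float,
                            d_max: float, w_min: float = 0.0
                            ) -> Tuple[List[List[int]], List[int]]:
    remaining = [i for i in order if weight[i] <= capacity]
    unassigned = [i for i in order if weight[i] > capacity]
    bundles: List[List[int]] = []
    while remaining:
        bundle = [remaining.pop(0)]        # the first item opens the bundle
        load = weight[bundle[0]]
        while True:
            # Feasible candidates: they fit and stay within d_max of every member.
            candidates = []
            for item in remaining:
                gap = max(dist(item, other) for other in bundle)
                if gap <= d_max and load + weight[item] <= capacity:
                    candidates.append((gap, item))
            if not candidates:
                break                       # nothing else can be accepted
            _, chosen = min(candidates, key=lambda c: c[0])
            bundle.append(chosen)
            load += weight[chosen]
            remaining.remove(chosen)
        if load >= w_min:
            bundles.append(bundle)
        else:
            unassigned.extend(bundle)       # assumed treatment of light bundles
    return bundles, unassigned
```

Filtering out candidates that do not fit before taking the minimum is equivalent to trying the closest item first and falling back to the second closest when capacity is the obstacle.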

Design of experiments
In this paper, we first evaluate our solution strategy for the HBPPMWR through a new set of test instances. In this section, we describe the procedure followed to generate these instances based on existing datasets.

Existing sets of test instances
ESICUP, the EURO Working Group on Cutting and Packing, collects on its website several test instances from different works (e.g. Falkenauer, 1996; Scholl et al., 1997; Schwerin & Wäscher, 1997) that have been widely used in the BPP literature. As discussed by Bai et al. (2012), the test instances generated by Falkenauer (1996) are probably the most commonly employed dataset in the BPP field.
Authors dealing with BPPs with conflicts have also used these instances, adapting them to the new context. Gendreau et al. (2004) selected the first 10 of Falkenauer's (1996) uniform instances for different numbers of products, $n \in \{100, 250, 500, 1000\}$. These instances consider items with discrete weights that are uniformly distributed within the range [20, 100], and the bin capacity is 150. The authors also used the first 10 triplet instances developed by Falkenauer (1996) for $n \in \{60, 120, 249, 501\}$, and multiplied the weights by 10 to obtain integer numbers. In this case, the bin capacity was set equal to 1000. They added 10 graphs of random conflicts, characterised by density values between 0 and 0.9, which resulted in 800 test instances. Muritiba et al. (2010) followed the same procedure as in Gendreau et al. (2004) and generated 800 new test instances, which were also used by Yuan et al. (2014) and Maiza et al. (2016), among others. Sadykov and Vanderbeck (2013) also followed Gendreau et al.'s (2004) procedure to generate a new set of instances for the BPP with conflicts.
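To make the structure of these instances concrete, the following Python sketch generates a uniform-style instance with a random conflict graph. It is only a stand-in: the function name `uniform_instance` is hypothetical, and each pair of items is declared in conflict independently with probability `density`, which is a simplification and not necessarily the graph-generation scheme used in the cited works.

```python
import random
from typing import List, Set, Tuple


def uniform_instance(n: int, density: float, seed: int = 0
                     ) -> Tuple[List[int], int, Set[Tuple[int, int]]]:
    """Return (weights, capacity, conflicts) for a uniform-style instance."""
    rng = random.Random(seed)
    # Discrete weights drawn uniformly from [20, 100]; bin capacity 150.
    weights = [rng.randint(20, 100) for _ in range(n)]
    capacity = 150
    # Assumed conflict model: each pair conflicts independently with probability `density`.
    conflicts = {(i, j) for i in range(n) for j in range(i + 1, n)
                 if rng.random() < density}
    return weights, capacity, conflicts
```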

Generating test instances for our problem
To create the test instances for the HBPPMWR, we also start from Falkenauer's (1996) dataset. Specifically, we consider their 80 uniform instances, that is, 20 instances for each $n$, with $n \in \{100, 250, 500, 1000\}$. In our problem, we also need to provide the other local parameters (in addition to the weight, which is used in the original dataset) with values for the different instances. To this end, we have proceeded as follows. We note that the test instances have been developed for the hot-dip galvanised steels (HD) family, which is very representative of the problem under consideration. Also, we highlight that the probabilities and data provided below in brackets for the various local parameters are based on actual information provided by the steel company studied.
The set of 80 instances is available upon request. In the following numerical study, we will assume that all the instances have the same values for the global parameters.

Numerical results
We now apply the solution strategy proposed in Sect. 6 to the 80 test instances generated as described in Sect. 7. In the tests reported here, we use the following configuration of the parameters of the system:
• Prioritisation strategy. We give the same importance to the three criteria, $z_y = z_d = z_u = 1/3$.
• Capacity. We employ $C = 150$, as in Falkenauer's (1996) original dataset.
• Maximum distance allowed. We employ $d_{\max} = 1$.
• Weight of local parameters. We use $\gamma_t = 1 \;\forall t$.
In relation to the configuration of the clustering algorithm, the interval of products allowed for the creation of clusters is defined by MinSize = 5 and MaxSize = 100. In addition, the initial threshold is defined by $\vartheta = d_{\max} = 1$, and the threshold adjustment (step parameter) is set to $\tau = 0.8$. Nonetheless, we note that we do not focus on the cluster analysis in this section; rather, we compare the performance of the algorithms.
Regarding the configuration of the algorithm for allocating non-prime products to bundles, we use $L = 200$ replications. This is based on a preliminary analysis, in which this value has proven to provide a good trade-off between the quality of the allocation and the computation time required.
We also clarify that, while the function CREATE_BUNDLES selects the most appropriate assignment (that with the lowest $J$) in its final phase, it stores the best solution and the computation time required by each heuristic. This facilitates the comparison of the effectiveness and the efficiency of the different algorithms. Moreover, due to the randomness in the performance of our solution strategy, we have carried out 10 runs with each of the 80 test instances. The results we report are the average of these 10 runs.
In line with our objective function, we analyse the quality of the solutions by looking at the number of bundles used, the unassigned non-prime steel products, and the heterogeneity of the bundles (i.e. measured as the maximum distance between the items of a bundle). We also evaluate the mean computation time.

Overall performance vs computation time
First, we consider the cost function $J$. Table 2 shows the heuristic algorithm that provided (on average) the lowest value of $J$ in the 80 test instances (20 instances for each $n$, with $n \in \{100, 250, 500, 1000\}$). Note that the FIFO algorithm did not provide the best solution for any of the 80 instances. The greedy algorithm achieved the best result in 50 instances (62.5%), while the distance-based algorithm provided the best result in the remaining 30 instances (37.5%). From this perspective, we may conclude that the greedy algorithm generally outperforms its competitors. Nonetheless, it can be highlighted that the distance-based algorithm offers better performance for large datasets (namely, for $n = 1000$).
Table 3 reports the mean time (in seconds) spent by the optimisation algorithm in the different tests. As expected, we can observe that the FIFO algorithm is the fastest one. Interestingly, the greedy heuristic requires less time than the distance-based algorithm. This order applies to the four scenarios defined by different numbers of products. Note that the time difference between the algorithms grows as $n$ increases. By combining the insights from Tables 2 and 3, we can conclude that the additional time that the distance-based algorithm requires over the greedy algorithm may only be justified for high numbers of products.

Number of bundles and unassigned items
To better understand the different performances of the three algorithms, we now look at two components of the cost function that are highly interrelated, i.e. the number of bundles used in the allocation and the number of unassigned items. Figures 4 and 5 show the mean value of these metrics and the sum of both for the 20 test instances with $n = 250$ and $n = 1000$, respectively. Detailed inspection of these graphs reveals that the distance-based algorithm is the heuristic that generally uses the lowest number of bundles (see top-left bar diagrams). However, in most cases, the greedy algorithm is able to provide the lowest number of unassigned items (see top-right diagrams). When these two perspectives are considered simultaneously, both algorithms provide a relatively similar performance (see bottom line charts). Nonetheless, the sum of bundles used and unassigned items is more often lower for the greedy heuristic than for the distance-based heuristic. Looking at the FIFO algorithm, we observe that it is clearly outperformed by its competitors. Figures 6 and 7, in the Appendix, represent the same information for $n = 100$ and $n = 500$, respectively. Their analysis leads to the same general conclusion: the distance-based algorithm generates allocations with fewer bundles, but the greedy algorithm is able to assign a higher number of products to the bundles. Hence, the most suitable algorithm for a specific company depends on its prioritisation strategy: the distance-based algorithm is more appropriate when minimising the number of bundles is more important, whereas the greedy algorithm is preferable when minimising the unassigned items is the priority.
All in all, in 71 out of the 80 instances (88.75%), the distance-based algorithm generated the lowest number of bundles. However, this occurs at the expense of leaving a higher number of products unassigned, given that in 69 instances (86.25%) it was the greedy algorithm that allocated the most items into the bundles. Considering the sum of both, the greedy algorithm provided the best results in 48 instances (60%).

Homogeneity in the bundles
Finally, we complete the picture by analysing the third component of the cost function. This refers to the homogeneity in the bundles, which is measured by the distance in the local parameters of the non-prime steel products that are included in the same bundle (low distances result in high homogeneity).
Tables 7, 8, 9, and 10, in the Appendix, provide information about the algorithm that generates the most homogeneous solution in the 80 test instances. In 59 instances (73.75%), the greedy algorithm outperformed its competitors, while in the remaining 21 instances (26.25%), the most homogeneous result was offered by the distance-based algorithm. Nevertheless, it is interesting to note that the number of instances in which the greedy algorithm outperforms the distance-based algorithm decreases as $n$ grows. Specifically, for $n \in \{100, 250, 500, 1000\}$, the greedy algorithm was the one that provided the most homogeneous solution in 20, 19, 15, and 5 instances, respectively. That is, the distance-based algorithm tends to produce more homogeneous allocations than the greedy algorithm for high values of $n$.
In this sense, we conclude that the fact that the distance-based algorithm is the most appropriate option for large datasets can be explained from the perspective of the homogeneity of the solutions that it generates (rather than by the sum of the number of bundles and unassigned items). This can also be observed from the analysis of Table 10, which reports the best algorithm from the viewpoint of the different criteria.

Real-world application
We now apply the solution strategy developed to the specific problem under study. From this perspective, we address its usefulness in real-world environments, providing a complementary lens to the previous analysis. We use a dataset of non-prime products provided by the organisation for one of their online auctions. This contains all the necessary information about the global and local parameters for 2771 steel products. It should also be noted that in this case the capacity of the different bundles is $C = 25$ tons, and the minimum weight accepted per bundle is $w_{\min} = 3C/5 = 15$ tons.
To preserve confidentiality, we use the same values for the weights of the local parameters as in the previous experiments (i.e. $\gamma_t = 1 \;\forall t$) as well as the same weights for the different criteria ($z_y = z_d = z_u = 1/3$), but we note that in the real-world use of the system these parameters need to be dynamically adjusted by the experts in agreement with the management team. Regarding the maximum distance allowed, we use two different levels, $d_{\max} = 1$ and $d_{\max} = 1.5$, to better understand the impact of this controllable parameter.
Moreover, we configure the clustering and allocation algorithms in the same manner as in the previous study of the instances. That is, the parameters and conditions remain unchanged; specifically: MinSize = 5, MaxSize = 100, $\vartheta = d_{\max}$, $\tau = 0.8$, $L = 200$, and number of runs per algorithm = 10.
Following our solution strategy, the first step requires splitting the steel products according to the global parameters. Then, we have run the clustering algorithm, which provides the results that are shown in Table 4. The algorithm generated 132 clusters with at least 5 items, which together contain 2256 products (out of the 2771 products). The average number of non-prime products per cluster is therefore 17.1. Also, note that, despite the clusters being allowed to have up to 100 items, the largest cluster has 48 items. Indeed, there are only three clusters with more than 40 items, while 38 clusters have fewer than 10 items.
The allocation algorithm is then applied to assign the products to the final bundles. The (mean) results provided by each heuristic technique in the three main criteria (i.e. number of bundles, unassigned items, and sum of distances in bundles) are displayed in Table 5 for both levels of the maximum distance.
First, we compare the results of the algorithms with $d_{\max} = 1$ and $d_{\max} = 1.5$. Table 5 provides evidence that when the homogeneity requirements are more demanding (i.e. $d_{\max}$ is reduced), the solution procedure provides more uniform bundles (i.e. the mean distance decreases) and the number of bundles decreases. However, these improvements come at the expense of a considerable increase in the number of unassigned items. Note that this holds for the three algorithms. Specifically, when $d_{\max}$ decreases from 1.5 to 1, the mean distance decreases by more than 0.2 in the three algorithms and the number of bundles used falls by more than 170 in all cases, but the number of unassigned products increases by at least 700 items.
Second, we analyse the performance of the three optimisation algorithms. To this end, we focus on the case with $d_{\max} = 1.5$. Table 5 shows that the distance-based algorithm is able to generate an allocation that simultaneously uses fewer bundles (438 versus 463 and 471), leaves fewer unassigned items (916 versus 1055 and 1063), and produces more homogeneous bundles (0.76 versus 0.84 and 0.87). The same general findings hold for $d_{\max} = 1$, with an interesting exception: in this case, the FIFO algorithm proposes an allocation with fewer bundles than its competitors (251 versus 265 and 266), although this is partially because its number of unassigned items is higher (1816 versus 1652 and 1766). All in all, we conclude that the superiority of the distance-based algorithm is in line with the findings of the previous section, where we observed that, while the greedy algorithm generally provided better results, the distance-based algorithm often emerges as the most appropriate alternative for a high number of products.
At this point, we note that the average total execution time of our solution procedure, including the clustering and the allocation algorithms, has been 244 s. This is a reasonable amount of time, which fits the requirements of the problem under study, and would allow the users to test different configurations of the parameters for each auction. In this sense, the procedure is not only effective but also efficient.
To provide further insights on the behaviour of the algorithms, we finally consider three scenarios, in addition to the base one in which the same weight is given to the three criteria (scenario 0, $z_y = z_d = z_u = 1/3$). Scenario I assumes that the minimisation of the number of bundles is prioritised ($z_y = 0.6$, $z_d = z_u = 0.2$). Scenario II considers that minimising the unassigned items is the priority of the company ($z_u = 0.6$, $z_d = z_y = 0.2$). Scenario III models the case in which the homogeneity in the bundles is the most important criterion ($z_d = 0.6$, $z_y = z_u = 0.2$). Table 6 provides information on the cost function $J$ in the four scenarios (0, I, II, and III) relative to the minimum value of $J$ for each value of $d_{\max}$. We can see that the distance-based heuristic offers the best allocation in the four scenarios. This can be easily understood given that, as we discussed before, when the number of items is high, this algorithm tends to generate allocations that are not only more homogeneous but also use fewer bundles and leave fewer items unassigned. In addition, we observe that the greedy algorithm outperforms the FIFO heuristic in seven out of the eight cases, although both entail a significantly higher cost than the distance-based heuristic. In this regard, it is also interesting to note that the relative difference between this heuristic and its competitors increases as $d_{\max}$ grows.

Concluding remarks
This work has studied the grouping of non-prime products into homogeneous bundles that are later auctioned, a problem that significantly affects the economic performance of many steel producers. The objective is to simultaneously minimise the number of bundles used, the number of unassigned items, and the differences (in a set of parameters) of the products that are included in each bundle. In addition, the allocation problem needs to be solved with a moderate computational effort due to time limitations. We have modelled the problem mathematically as a variant within the family of BPPs that is characterised by the interaction of conflicts, constrained distances, and minimum weight requirements.
To solve the grouping problem of non-prime steel products, we have developed a three-stage solution procedure that employs clustering techniques and optimisation algorithms. Specifically, we have implemented three heuristics; namely, a FIFO, a greedy, and a distance-based algorithm. The value of our solution strategy has been demonstrated both with a sample of test instances and with data from the real-world problem under consideration. It is capable of providing an effective allocation of products to bundles in a reasonable amount of time. We have also observed that in general terms the greedy algorithm outperforms its competitors; however, when the number of items is very high, the distance-based algorithm generally provides better performance. In such cases, this heuristic is able to generate fewer and more homogeneous bundles with fewer unassigned items.
Interesting avenues for research emerge from this work. Studying the effects of the weight of the local parameters and/or the criteria in the objective function would help managers to configure their decision support systems more precisely. We may also include other optimisation algorithms to increase the effectiveness of our solution tool. Nonetheless, this may increase the computation time considerably. Therefore, due to the limited time available, this would motivate us to look for ways to improve the efficiency of our solution procedure. This can be done by delving into the interplays between the sorting methods and the allocation algorithms. In this sense, we also plan to get inspiration from recent developments in different streams of the operational research literature, yet adjacent to the BPP literature, including cluster analysis (e.g. Xu et al., 2021), conflict management (e.g. Ficker et al., 2021), multiobjective optimisation (e.g. Denstad et al., 2021), and multi-criteria decision making (e.g. Kou et al., 2020).