1 Introduction

Warehouses regulate a company’s inbound and outbound flow of goods. A warehouse involves handling and storage equipment, products, and both human and capital resources. We can identify seven main processes carried out within it: (i) receiving, (ii) transfer and storing, (iii) accumulation, (iv) sortation, (v) order picking, (vi) cross-docking, and (vii) shipping. The first involves the arrival of loading units at the warehouse. Subsequently, the items are transferred to the appropriate area and stored, accumulated, sorted, and finally picked and shipped, or routed through a cross-docking area [1].

Warehouse operations management can be very complex and plays a critical role in avoiding inefficiencies that can propagate throughout the production process. Sufficient space is needed to optimize stock handling, cart movement, and loading and unloading procedures. Order picking is generally the operation that requires the largest workforce commitment and therefore has the highest cost [2]. For all these reasons, warehouse management is a fundamental part of the internal organization of a company’s supply chain. In recent years, it has become necessary to find solutions that improve the performance of the entire production chain. In this regard, several works have analyzed the key performance indicators (KPIs) used to evaluate warehouse operations and improve their performance.

Sustainable warehouses provide a storage management service that satisfies customers while producing as little environmental impact as possible and operating in a socially responsible manner. Climate change is one of the greatest threats of the twenty-first century and, as such, concerns both companies and the public. For this reason, companies are looking for solutions that limit greenhouse gas (GHG) emissions in all their processes (e.g., by promoting green logistics). In recent years, the concept of sustainability has evolved profoundly to encompass the availability and quality of natural resources (environmental sustainability), the quality of life and safety of citizens (social sustainability), and the economic efficiency of businesses (economic sustainability). As energy consumption and the resulting emissions have continually increased, transportation and storage have come to be perceived as major drivers of environmental pollution in global supply chains: the logistics and transport sector is estimated to cause about 2.8 billion tonnes of GHG emissions, roughly 5.5% of total GHG emissions [3]. In this context, a fundamental open question in research is how to build models that estimate the impact of each Industry 4.0 enabler on supply chain sustainability [4]. Environmentally friendly warehouse management practices have gained significant importance in recent years, with increasing research attention on the concept of “Green Warehousing.” This approach emphasizes reducing energy consumption and emissions in the handling and storage of inventory, thereby minimizing the environmental impact of warehouse operations [5, 6]. However, it is also essential to consider the economic and social aspects of warehouse management.
The concept of “Lean Warehousing” focuses on maximizing efficiency and productivity while minimizing waste and non-value-added activities [7]. A more sustainable approach to warehouse management requires a comprehensive performance measurement framework that considers not only environmental factors, such as pollutant emissions and energy savings, but also working conditions and task ergonomics. Striking a balance between environmental, economic, and social sustainability is crucial to achieve optimal outcomes for all stakeholders involved; however, to date, there are no standardized methods for assessing social sustainability [8].

The purpose of this paper is twofold. First, it presents a quantitative evaluation method and a ranking procedure to assess the most widely used KPIs in the context of warehouses. These approaches will help warehouse managers and operators identify and prioritize the KPIs most relevant to their business objectives and performance goals. Second, it provides a comprehensive review of the current state of the art in the scientific and industrial literature on warehouse KPIs. By analyzing and synthesizing existing research and industry reports, this paper seeks to provide a quantitative overview of the most important KPIs for warehouses, their definitions, and how they can be measured and used to drive continuous improvement in warehouse performance.

The concept of employing quantitative metrics to evaluate research articles within a systematic literature review represents an emerging and, as of now, not widely adopted practice. Traditionally, systematic literature reviews have relied on qualitative assessments and expert judgments to gauge the relevance and quality of research articles. By leveraging metrics, the scientific community can establish a more transparent and data-driven approach, enhancing the rigor and reproducibility of results.

We achieve this aim in three stages. First, we extracted a set of main KPIs from the literature. Second, we conducted a second query on the articles to assess the occurrences of the KPIs and validate their relevance. Finally, we created several metrics to rank the identified KPIs according to different criteria, such as their popularity, impact, and ease of measurement. This method allowed us to identify the most important KPIs in the context of warehouses and to provide guidance on how to use them to improve warehouse performance. By following this structured approach, warehouse managers and operators can identify and prioritize the KPIs that are most relevant to their business objectives.

The present work is structured as follows. Section 2 reviews similar work in other areas and then focuses on warehouse systems; it also describes some literature gaps and presents the primary purpose of this paper. Section 3 describes the procedure applied for KPI discovery, dataset creation, and metric selection. Section 4 presents the indicators found, with their definitions, and ranks them according to the metrics presented. Finally, Section 5 presents the conclusions, focusing on future perspectives and the limitations of this paper.

2 State of the art and research gaps

According to Roberts’s Measurement Theory, when a phenomenon is studied, it is possible to define an empirical relational system \(U=\langle A, R, O\rangle\), in which \(A=\{{a}_{i}\}\) is the set of all possible instances of the phenomenon, and \(R=\{{r}_{i}\}\) and \(O=\{{o}_{i}\}\) are, respectively, the sets of possible empirical relations and operations on \(A\). A relational system \(B=\langle N, S, P\rangle\), where \(N\) is a set of numbers (\({\mathbb{N}}\), \({\mathbb{R}}\), \({\mathbb{Q}}\), etc.), \(S\) is a set of mathematical relations (\(<\), \(>\), \(=\), etc.), and \(P\) is a set of mathematical operations (\(+\), \(\div\), \(\times\), etc.), is called a numerical relational system [9]. To measure means to map an empirical relational system into a numerical relational system while preserving all relations and operations. A measure is a homomorphism of \(A\) into \(N\) and an isomorphism of \(R\) into \(S\) and of \(O\) into \(P\) [9]. It is also possible to define a homomorphic mapping of an empirical relational system without the isomorphic mapping of relations and operations; such a mapping can be called an indicator [10]. A performance indicator is a numeric value that represents a complex empirical phenomenon in order to evaluate performance and provide insight to management. The elementary data gathered by sensors or operators must be aggregated into useful indicators of system performance [11]. The use of indicators is closely linked to three different aims: (i) to assess the current status of a process in order to compare it with a benchmark, (ii) to continuously monitor the progress of a particular process over a specific time frame, and (iii) to evaluate the impact of a particular strategy or change by measuring the KPI at two different points in time. A graphical representation of these three purposes is displayed in Fig. 1.

Fig. 1

The three main aims of key performance measurements

In the past few years, some scholars have focused their research on performance measurement in order to provide extensive sets of sustainability KPIs that can be used as valuable tools in a particular field of knowledge. One of the most widely used approaches to this aim is the systematic analysis of the literature. For instance, the authors of [12] identified 55 sustainability indicators through a literature analysis, arranged them into five clusters, and ranked them based on their usefulness and practicality. Similarly, in [13], the authors proposed several indices to study the environmental, economic, and social sustainability of additive manufacturing product and process development. Another example is [14], in which the authors extracted 787 indicators from the scientific literature to analyze eco-design processes. Similar works can be found that aim to develop a framework supporting the definition of a complete set of indicators for warehouse performance analysis. Johnson and McGinnis proposed a data envelopment analysis (DEA) approach to assess warehouses against an efficient frontier. In their work, a warehouse is represented as a system that transforms inputs such as labor, space, equipment, and inventory into outputs such as piece lines, pallet lines, case lines, revenue, and value-added services [15]. Staudt et al. performed a literature review to find the most widely used performance indices. They defined four categories of directly measurable KPIs (time, cost, quality, and productivity) and a general cluster of indirectly measurable KPIs, which contains indicators such as flexibility, customer perception, and value added [16]. A recent scientific trend is to study warehouses from a sustainability perspective. Torabizadeh et al. proposed 33 sustainability indicators and clustered them into six groups: (i) warehouse operation performance, (ii) economic performance measurement, (iii) resources, (iv) emission waste and environmental commitment, (v) labor practice and decent work, and (vi) product responsibility and society [7]. In contrast, Faveto et al. focus on the triple bottom line and the three aspects of sustainability: social, environmental, and economic [17]. Indicator definitions can also be found in articles that study particular warehouse logics or strategies. Gu et al. reviewed a wide selection of scientific papers that calculate travel time and other indicators analytically for different warehouse technologies [18].

Despite the number of cited studies, a significant scientific gap is still evident. Across geographical or sectoral contexts, indicators are not homogeneous, even when they are intended to represent the same phenomena. Sometimes an indicator with the same name is calculated and used for entirely different purposes; other times, the same indicator is named differently depending on the context. For instance, in [19], storage costs and holding costs are used as synonyms, while in [20], storage costs are those incurred in introducing the load unit into the warehouse, and in [21], holding costs are those sustained in keeping the load unit stored in the warehouse. The ISO 22400 standard aims to define a common set of KPIs for the manufacturing field; however, it does not include a section on the logistics management of warehouses. The primary purpose of the present work is to start a scientific debate on the realization of a broadly accepted performance measurement system for warehouses, using the sustainability triple bottom line as a general reference.

3 Method of the framework

The main phases of the framework are (i) the KPI discovery phase, which defines the list of KPIs under analysis and is explained in the first subsection of this chapter; (ii) the evaluation of the KPIs according to the scientific literature; (iii) the corresponding evaluation based on the know-how of a sample of manufacturing companies, with phases (ii) and (iii) discussed in the second and last subsections of this chapter, respectively; and, finally, (iv) the creation of a single ranking of all the selected KPIs according to both evaluations. Figure 2 outlines the proposed framework for obtaining a quantitative evaluation, in this case a ranking, of the KPIs relevant to the target sector.

Fig. 2

Method flowchart: the green phase represents the KPI Discovery (3.1), the blue phase the KPI Relevance Validation (3.2), and finally, the red phase the KPI Evaluation through Metrics Definition (3.3)

The first phase, KPI discovery, consists of performing a query on an available database of textual documents related to the desired industrial sector. In this case, the Scopus database is used to extract scientific papers on warehouse systems, and these documents are sampled by applying different criteria so that the resulting sample can be processed with the available analytical techniques. In this work, the sample is processed by having a human read each manuscript; for this reason, the best sample is the one that preserves the distributions of the population from which it is drawn while being as small as possible. Accordingly, two different criteria are applied: the first selects papers that meet quality requirements defined by the authors, and the second is a random sampling intended to mitigate any bias introduced by the human choice of the quality requirements in the first criterion. All the KPIs present in the resulting sample are then extracted to obtain the list of KPIs to be evaluated quantitatively through different metrics used to rank them: in other words, a system of KPIs is created to evaluate the KPIs selected in the discovery phase.

The other two parallel steps consist of creating adequate metrics to evaluate how much the discovered KPIs are used in the scientific community and in the industrial sector. The evaluation of KPIs in the scientific literature consists of performing a query on a scientific database (here, Scopus, which is practically the only choice) and sampling the results of this query to obtain a sample that can be processed with the available analytical techniques; in this case, a single quality criterion is used. Note that in this phase a scientific database is required to measure the relevant metrics, whereas in the previous KPI discovery phase it is not, and different sources can be used to select the KPIs to be evaluated; for example, where available, a reasonable method is to consider the KPIs listed by ISO or in other national and international standards. Once the documents used to assess the KPIs have been identified, the defined metrics are measured for each KPI simply by reading the manuscripts and checking for the presence or absence of the KPIs in question; other methods are not used in this work and are discussed in the section on future work. Finally, the evaluation of KPIs in Italian enterprises consists of measuring metrics based on the results of surveys submitted to employees of logistics companies, of manufacturing plants that use warehouse systems in their production lines, and of manufacturers of such systems.

3.1 KPI discovery

The identification of the most important KPIs for automatic warehouses starts with a systematic literature review of the Scopus database. The literature suggests that Web of Science and Scopus stand out as the most valuable databases, with a focus on the life sciences, physical sciences, and technology. Web of Science is known for its selectivity, whereas Scopus offers more comprehensive coverage: nearly all the journals in Web of Science are also indexed in Scopus, and Scopus includes 66.07% more unique journals than Web of Science [22]. Nevertheless, Scopus primarily includes journals considered to be of high value. Our decision to opt for Scopus therefore reflects a balance between selectivity and the preservation of valuable contributions. To extract as many relevant documents as possible without including off-topic articles, we defined the following query on Scopus (https://www.scopus.com/): “TITLE (autom* AND warehouse)”. This query retrieves papers focused on automated and automatic warehouses. Searching the same keywords in the whole abstract would return too many documents without a clear focus; therefore, we limited the query to the title. In June 2020, the query retrieved 499 different articles.

To obtain and analyze the KPIs used in the extracted articles, we applied a sampling method based on two criteria: (a) selection of quality papers and (b) random selection of papers. To be considered of high quality, i.e., to be classified in category (a), an article must satisfy at least one of the following three conditions: (i) it is published in a journal classified as Q1 or Q2, (ii) it is published in a journal with a SCImago Journal Rank (SJR) greater than 0.5, or (iii) it has at least 14 citations. The documents that satisfy this condition are 113 out of 499, i.e., 22.6% of the total. The documents in the second group (b) were selected by random sampling from the remaining 386 papers; this random group consists of 124 documents, about 25% of the initial 499.
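The two-criteria selection can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' tooling: the `Paper` record, its field names, and the fixed random seed are hypothetical, while the thresholds mirror conditions (i) to (iii) above (Q1/Q2, SJR strictly greater than 0.5, at least 14 citations).

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Paper:
    """Hypothetical bibliographic record; field names are illustrative only."""
    title: str
    quartile: Optional[str]  # journal quartile, e.g. "Q1"; None if unranked
    sjr: float               # SCImago Journal Rank of the venue
    citations: int


def is_high_quality(p: Paper, quartiles: Sequence[str] = ("Q1", "Q2"),
                    min_sjr: float = 0.5, min_citations: int = 14) -> bool:
    """True if the paper meets at least one of the three quality conditions."""
    return (p.quartile in quartiles
            or p.sjr > min_sjr                # SJR "greater than 0.5"
            or p.citations >= min_citations)  # "at least 14 citations"


def two_criteria_sample(papers: List[Paper], n_random: int,
                        seed: int = 0) -> Tuple[List[Paper], List[Paper]]:
    """Group (a): every quality paper; group (b): random draw from the rest."""
    quality = [p for p in papers if is_high_quality(p)]
    rest = [p for p in papers if not is_high_quality(p)]
    rng = random.Random(seed)  # assumed fixed seed, for a reproducible draw
    random_group = rng.sample(rest, min(n_random, len(rest)))
    return quality, random_group


# Toy records (not from the study).
papers = [
    Paper("A", "Q1", 0.3, 0),   # passes via quartile
    Paper("B", None, 0.7, 2),   # passes via SJR
    Paper("C", "Q3", 0.2, 20),  # passes via citations
    Paper("D", "Q4", 0.1, 3),
    Paper("E", None, 0.0, 0),
    Paper("F", "Q3", 0.4, 13),
]
quality, rnd = two_criteria_sample(papers, n_random=2, seed=42)
print(len(quality), len(rnd))  # 3 2
```

Calling `is_high_quality` with `quartiles=("Q1",)` and `min_citations=50` would express the stricter single filter applied later to the broader warehouse query.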

After selection, 236 papers remained to be analyzed, but some of them could not be obtained or read (e.g., no English version was available), while others were considered out of scope because they did not refer to an automatic or automated warehouse but to topics such as data warehouses. From the high-quality group (a), we excluded 14 off-topic papers and 9 whose full text was not available. From the randomly sampled group (b), we excluded 10 off-topic papers and 1 whose full text was not available.

This analysis yielded a set of 70 KPIs. Starting from this set, we searched for these indices in a broader sample to validate them and to integrate any missing indicators.

3.2 KPI relevance validation

The second objective of this work is to build a sufficiently large database on the use of warehouse KPIs in order to create an evaluation and ranking system. To extract a comprehensive sample of articles focused on warehouse systems, we defined the following query on Scopus (https://www.scopus.com/): TITLE (warehouse) AND NOT TITLE-ABS-KEY (“data warehouse”). This query retrieves papers focused on warehouses. Searching the same keyword (warehouse) in the whole abstract would return too many documents without a clear focus; therefore, we limited the query to the title. We also excluded all articles containing the keyword “data warehouse” in their title, abstract, or keywords, since our preliminary analyses showed that most of the out-of-scope articles belonged to the computer science domain. In November 2020, this query retrieved 4402 different articles. We then applied a sampling method based on quality criteria: to pass the quality filter, an article must satisfy at least one of the following three conditions: (i) it is published in a journal classified as Q1, (ii) it is published in a journal with an SJR greater than 0.5, or (iii) it has at least 50 citations. The filter extracted 890 articles, about 20% of the total sample. These articles were scanned and analyzed for two purposes: (a) to calculate the frequency of the 70 primary indicators collected previously on a larger sample and (b) to find new indicators to be included in this analysis.

The 890 articles were assigned to two parallel groups of engineering students to be analyzed manually (each student was responsible for 35–50 documents). After carefully reading each paper, the student recorded the presence or absence of each of the 70 given indicators in a shared Excel document, which simplified the review process. A second aim of the student analysis was to find and report new KPIs not included in the base set. Students were also asked to identify and report any off-topic research (e.g., articles about warehouse insects). Of the 890 documents, 232 were classified as duplicates or not found. Duplicates were typically older versions of papers already analyzed, errata, editorials, or notes, while the documents not found were articles whose full text was not available. Another 73 were identified as off-topic or out of scope, i.e., unrelated to the topics studied. As a result, 585 articles were analyzed, about 13% of the total extraction. Once all articles had been analyzed, we moved on to a review process. Because the papers were examined in parallel by the two groups, the first step was to compare and merge the two sets of results. In 17 instances, the KPIs identified by one student did not match those found in the analysis of the duplicate copy of the same paper; in these cases, we re-examined the articles. Next, a sample review of the 585 papers was conducted: 363 randomly selected articles and their indicators were reviewed, and any errors found were corrected.
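The merge-and-review step described above can be sketched as follows. This is an illustrative Python sketch, not the procedure actually used (which relied on a shared Excel sheet): the representation of each group's findings as a mapping from a hypothetical paper id to the set of KPIs marked present, and the helper names `merge_groups` and `review_sample`, are assumptions.

```python
import random
from typing import Dict, List, Set, Tuple

# Assumed representation: paper id -> set of KPI names marked as present.
Findings = Dict[str, Set[str]]


def merge_groups(group_a: Findings, group_b: Findings) -> Tuple[Findings, List[str]]:
    """Merge two independent readings of the same papers.

    Papers on which both readings agree are accepted; a paper read by only one
    group keeps that single reading; disagreements are returned for re-examination.
    """
    merged: Findings = {}
    to_recheck: List[str] = []
    for pid in sorted(set(group_a) | set(group_b)):
        a, b = group_a.get(pid), group_b.get(pid)
        if a is None or b is None:
            merged[pid] = set(a if a is not None else b)
        elif a == b:
            merged[pid] = set(a)
        else:
            to_recheck.append(pid)  # readings differ: re-read the article
    return merged, to_recheck


def review_sample(merged: Findings, k: int, seed: int = 0) -> List[str]:
    """Randomly pick k merged papers for a second check (seed is an assumption)."""
    rng = random.Random(seed)
    return rng.sample(sorted(merged), min(k, len(merged)))


# Toy example (not from the study).
group_a = {"p1": {"Throughput"}, "p2": {"Travel Time", "Cycle Time"}, "p3": set()}
group_b = {"p1": {"Throughput"}, "p2": {"Travel Time"}, "p3": set()}
merged, to_recheck = merge_groups(group_a, group_b)
print(to_recheck)  # ['p2']
print(review_sample(merged, k=1))
```

Comparing whole KPI sets per paper, rather than single cells, flags a paper for re-reading as soon as any indicator differs, which matches the idea of re-examining every mismatching article.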

3.3 KPI evaluation through metrics definition

To evaluate the effectiveness and comprehensiveness of the various indicators, we focus on two sets of metrics. The goal of these metrics is to assess the usefulness of the indicators from different points of view and to extract a set that can be considered complete. The two sets are (i) objective and (ii) subjective metrics: objective metrics are based on the frequency of the indicators in scientific manuscripts, whereas subjective metrics are based on a qualitative assessment of ease of use and on a survey conducted in different Italian companies.

3.3.1 Scientific KPI evaluation

The first metric is the relative frequency \({f}_{\mathrm{\vartheta }}^{r}\), calculated by dividing the absolute frequency \({f}_{\mathrm{\vartheta }}^{a}\) of a generic indicator ϑ by the total number K of analyzed papers (585).

$${f}_{\mathrm{\vartheta }}^{r}=\frac{{f}_{\mathrm{\vartheta }}^{a}}{K} ,$$
(1)

The second metric is a citation-weighted frequency based on the number of citations of the articles in which the indicator appears. \({B}_{k}^{\mathrm{\vartheta }}\) is a Boolean value equal to one if indicator ϑ is present in the kth paper and zero otherwise, while \({C}_{k}\) is the number of citations of the kth article.

$${f}_{\mathrm{\vartheta }}^{wc}=\frac{\sum_{k=1}^{K}{C}_{k}{B}_{k}^{\mathrm{\vartheta }}}{\sum_{k=1}^{K}{C}_{k}} ,$$
(2)

The third metric considers the singularity of the indicators used in a research article, where \({M}_{k}\) is the number of different indices present in the kth paper. The logic behind this metric is that if a KPI is always used alone, it has a precise purpose and can provide knowledge without other indicators. A value of \({f}_{\mathrm{\vartheta }}^{wm}\) equal to 1 means that the indicator is the only one used in every article in which it appears.

$${f}_{\mathrm{\vartheta }}^{wm}=\frac{\sum_{k=1}^{K}{B}_{k}^{\mathrm{\vartheta }}}{\sum_{k=1}^{K}{M}_{k}{B}_{k}^{\mathrm{\vartheta }}} ,$$
(3)
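The three frequency metrics above can be computed from a paper-by-KPI presence matrix. The following Python sketch is illustrative only: the function name, the list-of-lists layout of \(B\), and the toy data are assumptions, and \(M_k\) is taken to be the row sum, i.e., the number of tracked KPIs found in paper k.

```python
from typing import Dict, Sequence


def scientific_metrics(presence: Sequence[Sequence[int]],
                       citations: Sequence[int],
                       kpis: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Relative (Eq. 1), citation-weighted (Eq. 2) and singularity (Eq. 3) frequencies.

    presence[k][j] plays the role of B_k for KPI j in paper k (1 or 0);
    citations[k] is C_k. M_k is assumed to be the row sum of presence[k].
    """
    K = len(presence)
    M = [sum(row) for row in presence]
    total_citations = sum(citations)
    result: Dict[str, Dict[str, float]] = {}
    for j, name in enumerate(kpis):
        B = [row[j] for row in presence]
        f_abs = sum(B)  # absolute frequency f^a
        weighted_cites = sum(c * b for c, b in zip(citations, B))
        m_weighted = sum(m * b for m, b in zip(M, B))
        result[name] = {
            "f_r": f_abs / K,
            "f_wc": weighted_cites / total_citations if total_citations else 0.0,
            "f_wm": f_abs / m_weighted if m_weighted else 0.0,
        }
    return result


# Toy data (not from the study): 4 papers x 3 KPIs.
kpis = ["Throughput", "Travel Time", "Energy Consumption"]
presence = [
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
]
citations = [10, 0, 5, 5]
metrics = scientific_metrics(presence, citations, kpis)
print(metrics["Energy Consumption"]["f_wm"])  # 1.0: always the only KPI in its paper
```

In this toy case, "Travel Time" appears in three papers (f_r = 0.75) but shares two of them with another indicator, so its singularity value drops to 3/5.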

The last metric is based on the year of publication. Each article is assigned a decimal number between 0 and 1, denoted by \({A}_{k}\), that reflects the age of the kth article. \({A}_{k}\) is calculated as:

$${A}_{k} =\frac{{y}_{k}- {y}_{\mathrm{min}}}{{y}_{\mathrm{max}}- {y}_{\mathrm{min}}} ,$$
(4)

where \({y}_{\mathrm{max}}\) is the year of the most recent publication (2021 in our case) and \({y}_{\mathrm{min}}\) is the year of the oldest publication minus one (1946 in our case). \({A}_{k}\) is then used as a discount factor to calculate a new frequency that assigns a larger value to a more recent occurrence than to an older one.

$${f}_{\mathrm{\vartheta }}^{wa}=\frac{\sum_{k=1}^{K}{A}_{k}{B}_{k}^{\mathrm{\vartheta }}}{K} ,$$
(5)

In this way, older occurrences are discounted: an occurrence in an article published in 2021 is worth 1, and this value decreases linearly to a minimum of 1/75 ≈ 0.0133 for occurrences in articles published in 1947.
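A short Python sketch of the age weighting and the resulting age-weighted frequency follows. It implements the behaviour described in the text (weight 1 for 2021, 1/75 for 1947), i.e., \(A_k = (y_k - y_{\min})/(y_{\max} - y_{\min})\); the function names are hypothetical.

```python
from typing import Sequence


def age_weight(year: int, y_min: int, y_max: int) -> float:
    """A_k = (y_k - y_min) / (y_max - y_min): 1 for the newest year, linear decrease.

    y_min is one year before the oldest publication, so the oldest paper still
    receives a small positive weight.
    """
    return (year - y_min) / (y_max - y_min)


def age_weighted_frequency(presence: Sequence[int], years: Sequence[int],
                           y_min: int, y_max: int) -> float:
    """f^wa (Eq. 5): age-weighted occurrences of one KPI divided by K papers."""
    K = len(years)
    return sum(age_weight(y, y_min, y_max) * b for y, b in zip(years, presence)) / K


Y_MIN, Y_MAX = 1946, 2021  # values reported in the text
print(age_weight(2021, Y_MIN, Y_MAX))            # 1.0
print(round(age_weight(1947, Y_MIN, Y_MAX), 4))  # 0.0133, i.e. 1/75
```

Because the discount is linear, an occurrence halfway between 1946 and 2021 counts roughly half as much as a 2021 occurrence.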

3.3.2 Industrial KPI evaluation

All the metrics presented so far have an objective value; however, they lack contact with the industrial world. For this reason, we created a survey to be submitted to experts working in the logistics sector. Over about 6 months, we collected the answers of 15 people representing both SMEs and large corporations, whose warehouses range from a minimum of 48 m² to a maximum of 36,000 m². Each respondent first completed a section describing the company, mainly its size, warehouse size, number of SKUs managed, and the primary function of the warehouse. In a second section, respondents assigned an importance value from 0 to 5 to each of the 70 indicators in the basic set. The perceived importance metric \({s}_{\mathrm{\vartheta }}^{q}\) of indicator ϑ is calculated as the simple arithmetic average of the answers \({Q}_{n}^{\mathrm{\vartheta }}\) given by the \(N\) respondents.

$${s}_{\mathrm{\vartheta }}^{q}=\frac{\sum_{\mathrm{n}=1}^{N}{Q}_{n}^{\mathrm{\vartheta }}}{N} ,$$
(6)

4 KPI selection

The 70 selected indicators are categorized into three clusters following the triple bottom line (TBL) structure. The TBL is a framework that evaluates a process from three distinct points of view of sustainability: social, environmental, and economic. According to this theory, an organization should be able to sustain its activities over time while respecting the environment and society and generating profit [11]. Within the three clusters, further subcategories were defined according to the nature of the KPIs. As expected, the economic cluster is the largest, with 52 indicators (about 74% of the total). The environmental and social clusters have similar sizes: 10 indicators are classified as environment-related and 8 as social-related.

The following paragraphs describe each KPI cluster. For each KPI, the unit of measure, the questionnaire-based perceived importance (Q), the relative frequency (R), the citation-weighted frequency (C), the singularity indicator (S), and the yearly weighted frequency (Y) are reported.

4.1 Economic KPIs

The indicators in this cluster refer to the economic value created by the organization; in particular, they describe the aspects of warehouse performance that directly influence the company’s costs and profit. Within this group, we subcategorized the indicators into four subclusters: (i) generic performances (Table 1), (ii) time-related performances (Table 2), (iii) cost-related performances (Table 3), and (iv) ICT performances (Table 4).

Table 1 Generic performances
Table 2 Time-related performances
Table 3 Cost-related performances
Table 4 ICT performances

4.2 Environmental KPIs

The environmental indicators are divided into two groups. The first describes the warehouse system as an environment per se, i.e., the atmosphere in which human resources operate and items are stored. This cluster, called Warehouse Environmental Measures, is displayed in Table 5.

Table 5 Warehouse environmental measures

The second group contains measures of the direct impact of the system on the environment, such as energy consumption and pollutant emissions (Table 6).

Table 6 Emission, waste, and environmental commitment indicators

4.3 Social KPIs

Based on activities concerning social sustainability and ISO 26000 [35], companies are responsible for considering their impact on their human resources and on the society in which they operate. Few indices in this category have been found in the literature; those we found mainly measure operator safety and the extent to which the warehouse system relies on human work rather than automation. Labor practice, decent work, and social responsibility indicators are reported in Table 7.

Table 7 Labor practice, decent work, and social responsibility indicators

It is interesting to note that the interviewed experts considered indicators analyzing human resources, such as Human Utilization, Human Activity Time, and Human Error, to be relevant. According to the experts, the most important indicator is Inventory Turnover. This indicator is more financial than operational, and it quickly provides insight into the rotation of goods and the level of inventory. The questionnaire data also revealed the importance of workplace safety (machine and work safety). Finally, energy consumption is the only environmental sustainability indicator among those the experts rated as most important. The three frequencies R, C, and Y yield broadly similar results, and together they reflect the focus of research on warehouse analysis. The most studied indicators are those related to Travel Time and Travel Distance. This is not surprising, as research often concentrates on algorithms that find the best route to reach the unit load in the warehouse. Picking, Cycle, and Lead Time are also important. A second particularly prolific area of research is devoted to warehouse cost minimization, especially of Holding and Inventory Costs through stock management. Throughput is a well-known KPI and is often used to compare the performance of different warehouses. Space Occupation also emerges as an essential indicator of environmental sustainability, since more efficient and compact warehouses should be preferred over larger ones requiring more space. Lastly, the result obtained from the uniqueness metric is particularly interesting: the indicators that rank highest on this metric are very specific and linked to environmental sustainability. This may indicate that studies analyzing warehouse environmental performance rarely combine environmental KPIs with economic ones, and vice versa.
In addition, articles specializing in environmental sustainability performance seem to focus on a smaller set of indicators than other analyses.

4.4 Indicator ranking

This section presents the ranking of the indicators according to the five analyzed metrics: Q, R, C, U, and Y. Table 8 displays the nine top-ranked warehouse indicators according to the five metrics.

Table 8 Top 9 indicator ranking

Finally, the three KPIs with the highest aggregated score were identified for each of the seven clusters. To compute an aggregated score for the KPIs across the five metrics, we normalize each metric by dividing each of its values by the maximum value of that metric:

$${\overline{M} }_{ik} =\frac{{M}_{ik}}{\mathrm{max}({M}_{i})}$$
(7)

where \({\overline{M} }_{ik}\) is the value of the \(i\)th normalized metric for the \(k\)th warehouse indicator. According to Yu et al. [37], this approach stands out as the most effective means of normalizing positive values because it preserves which elements are the minimum and maximum, as well as the relative differences between elements in the series. Consequently, all metrics are scaled to the range 0 to 1, where 1 corresponds to the highest value observed for that metric (the best-scoring indicator) and values close to 0 to the lowest-scoring ones. Finally, we compute the arithmetic average of the normalized metrics for each KPI. The resulting ranking is shown in Fig. 3.
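The max-normalization of Eq. (7), the averaging step, and the per-cluster top-three selection can be sketched in Python as follows. This is a minimal illustration under assumptions: the nested-dictionary input, the function names, treating a KPI missing from a metric as 0, and the toy values and KPI names are all hypothetical, not data from the study.

```python
from typing import Dict, List, Tuple

Scores = List[Tuple[str, float]]


def aggregate_ranking(metrics: Dict[str, Dict[str, float]]) -> Scores:
    """metrics[metric][kpi] -> raw value; returns (kpi, score) pairs, best first.

    Each metric is divided by its own maximum (Eq. 7), then the normalized
    metrics are averaged. A KPI missing from a metric counts as 0 (assumption).
    """
    kpis = sorted({kpi for values in metrics.values() for kpi in values})
    normalized = {}
    for name, values in metrics.items():
        top = max(values.values())
        normalized[name] = {k: (v / top if top > 0 else 0.0) for k, v in values.items()}
    scores = [
        (k, sum(normalized[m].get(k, 0.0) for m in metrics) / len(metrics))
        for k in kpis
    ]
    return sorted(scores, key=lambda ks: ks[1], reverse=True)


def top_per_cluster(ranking: Scores, cluster_of: Dict[str, str], n: int = 3) -> Dict[str, Scores]:
    """Keep the n best KPIs of each cluster from an already sorted ranking."""
    best: Dict[str, Scores] = {}
    for kpi, score in ranking:
        bucket = best.setdefault(cluster_of[kpi], [])
        if len(bucket) < n:
            bucket.append((kpi, score))
    return best


# Toy values for two metrics and three hypothetical KPIs.
toy = {
    "Q": {"kpi_a": 4.0, "kpi_b": 2.0, "kpi_c": 1.0},
    "R": {"kpi_a": 0.1, "kpi_b": 0.4, "kpi_c": 0.2},
}
ranking = aggregate_ranking(toy)
print(ranking)  # [('kpi_b', 0.75), ('kpi_a', 0.625), ('kpi_c', 0.375)]
print(top_per_cluster(ranking, {"kpi_a": "cost", "kpi_b": "time", "kpi_c": "cost"}, n=1))
```

Dividing by each metric's own maximum puts survey scores (0 to 5) and frequencies (fractions of papers) on a common scale before they are averaged, so no single metric dominates the aggregate merely because of its units.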

Fig. 3 Ranking of the three most valuable KPIs for each category

It is interesting to note that at the top positions, with very high scores, classic performance metrics related to time, cost, or warehouse efficiency predominate, whereas environmental and social performance indicators are not only fewer in number but also less used, both in scientific research and in industrial settings.

5 Conclusions and future works

In conclusion, this paper has presented a framework, based mainly on a quantitative evaluation and a ranking procedure, for analyzing the most used KPIs in warehouses. Through a comprehensive analysis of the scientific and industrial literature, we have identified the most important KPIs and their definitions and provided guidance on how to measure and use them to improve warehouse performance.

Overall, this paper makes a valuable contribution to the field of warehouse management by offering a structured approach to evaluating KPIs and by highlighting the importance of continuous performance improvement. By applying the proposed evaluation method and ranking procedure, warehouse managers and operators can effectively monitor and improve warehouse performance to achieve their business objectives.

During the second step of our warehouse KPI analysis, we identified several KPIs that were not included in the ranking procedure presented in this paper. Nevertheless, we believe these KPIs are important for an overall view of a warehouse measurement system and should be included in future studies. Incorporating them into the evaluation method and ranking procedure would provide a more comprehensive understanding of warehouse performance and help warehouse managers and operators identify areas for improvement. Examples of such KPIs include shortage costs, replenishment costs, and ergonomics.

The first indicator, shortage costs (sometimes referred to as penalty costs [38]), evaluates the effect of stockouts. It represents the costs incurred due to backlogging or to delivery penalties for incorrect timing. Only contractual penalties represent an actual expense for the company; more often, the shortage cost is an opportunity cost whose estimation is particularly complex. According to [39], most researchers assume that, during a stockout, shortages are either completely backlogged or completely lost. In practice, however, some loyal customers are willing to wait until the shortage is resolved, while others are more impatient and look for the goods elsewhere.
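To make the difference between full backlogging, lost sales, and the intermediate case concrete, the sketch below computes the shortage cost of a single stockout under partial backlogging, where only a fraction of the unmet demand waits. It is a simplified illustration, not the model of [39]; the function, its parameters, and the cost values are hypothetical. Setting the backlogged fraction to 1 or 0 recovers the two extreme assumptions mentioned above.

```python
def shortage_cost(short_units, backlog_fraction, backlog_cost, lost_sale_cost):
    """Illustrative shortage cost of one stockout under partial backlogging.

    short_units: demand that could not be served from stock
    backlog_fraction: share of customers who wait (1 = fully backlogged, 0 = all lost)
    backlog_cost: cost per backlogged unit (e.g. a contractual delay penalty)
    lost_sale_cost: opportunity cost per unit of demand lost to competitors
    """
    if not 0.0 <= backlog_fraction <= 1.0:
        raise ValueError("backlog_fraction must lie in [0, 1]")
    backlogged = short_units * backlog_fraction
    lost = short_units - backlogged
    return backlogged * backlog_cost + lost * lost_sale_cost


# Hypothetical stockout of 100 units: penalty 2 and lost-sale cost 5 per unit
for fraction in (1.0, 0.5, 0.0):
    print(fraction, shortage_cost(100, fraction, 2.0, 5.0))
```

In this toy setting, the more customers who are lost rather than backlogged, the higher the shortage cost, which is why the backlogging assumption matters when estimating this KPI.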

Replenishment costs are the costs incurred to procure new items. Replenishment costs, holding costs, and shortage costs together constitute inventory costs (presented in Table 3). These three costs are different sides of the same coin. Placing larger, less frequent orders lowers the replenishment cost, but this strategy raises holding costs (especially in the case of perishable products). A proper balance between order size and frequency reduces the likelihood of stockouts and, thus, shortage costs. A heuristic application of an efficient replenishment strategy can be found in [40].
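The trade-off between replenishment and holding costs can be illustrated with the classical economic order quantity (EOQ) model, sketched below. The EOQ model is a textbook example chosen here only for illustration, not the heuristic of [40]; it assumes constant, known demand and ignores shortages and perishability, and the parameter values are hypothetical.

```python
import math


def total_inventory_cost(order_size, annual_demand, cost_per_order, holding_cost):
    """Annual replenishment cost plus annual holding cost for a fixed order size."""
    replenishment = (annual_demand / order_size) * cost_per_order  # orders per year x cost per order
    holding = (order_size / 2) * holding_cost  # average stock x holding cost per unit and year
    return replenishment + holding


def economic_order_quantity(annual_demand, cost_per_order, holding_cost):
    """Order size that minimizes total_inventory_cost: sqrt(2 * D * S / H)."""
    return math.sqrt(2 * annual_demand * cost_per_order / holding_cost)


# Hypothetical data: 1200 units per year, 50 per order, 3 per unit held for a year
demand, order_cost, holding = 1200, 50, 3
q_star = economic_order_quantity(demand, order_cost, holding)
print(q_star)
for q in (100, q_star, 400):
    print(q, total_inventory_cost(q, demand, order_cost, holding))
```

In this example, halving or doubling the order size relative to the EOQ raises the total cost, reflecting the balance between order size and frequency described above.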

Another interesting KPI is ergonomics. Warehouse operations, particularly picking, involve a significant human workload, so monitoring workers’ physical condition is critical. In [41], an optimization model for picking is proposed to solve a storage location problem that minimizes both the cycle time and the work discomfort. The authors estimate work discomfort from location factors (e.g., picking level, section number, and bin type) and product factors (e.g., quantity, mass, and volume of the product to pick). In [42], by contrast, the authors evaluate workers’ physical fatigue through a spinal load analysis performed in a laboratory environment.

A potential improvement to the current article would be to define all 70 proposed key performance indicators (KPIs) using the approach outlined in ISO 22400. This approach aims to optimize management performance by describing each KPI in detail, including its scope (e.g., work unit, work center), the mathematical formula used to calculate it, its unit of measurement, its range of values, and a trend used to assess whether the results are satisfactory. Incorporating this approach would enhance the clarity and accuracy of the proposed KPIs and make it easier for managers to understand and apply them in their own organizations.
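As a sketch of what such a structured definition might look like in practice, the record below stores the attributes listed above for ISO 22400 and uses the trend to judge whether a new value is an improvement. The class, its field names, and the two trend labels are our own assumptions rather than a format prescribed by the standard, and the Inventory Turnover entry (cost of goods sold divided by average inventory) is illustrative only.

```python
import math
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class KpiDefinition:
    """Hypothetical KPI record with the attributes listed for ISO 22400."""
    name: str
    description: str
    scope: str                       # e.g. "work unit", "work center"
    formula: Callable[..., float]    # computes the KPI from its input quantities
    unit: str
    value_range: Tuple[float, float]
    trend: str                       # "higher is better" or "lower is better"

    def is_improvement(self, previous: float, current: float) -> bool:
        """Use the trend attribute to decide whether a new value is better."""
        if self.trend == "higher is better":
            return current > previous
        if self.trend == "lower is better":
            return current < previous
        raise ValueError(f"unknown trend: {self.trend!r}")


# Illustrative entry, not taken from the standard or from the paper's tables
inventory_turnover = KpiDefinition(
    name="Inventory Turnover",
    description="Times the average inventory is sold and replaced in a period",
    scope="warehouse",
    formula=lambda cost_of_goods_sold, average_inventory: cost_of_goods_sold / average_inventory,
    unit="times per period",
    value_range=(0.0, math.inf),
    trend="higher is better",
)

value = inventory_turnover.formula(500_000, 100_000)
print(value, inventory_turnover.is_improvement(4.2, value))
```

Keeping the trend inside the definition lets a monitoring tool evaluate every KPI in the same way, regardless of whether higher or lower values are desirable.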

Two main further improvements to the framework could be made. First, alternative sources, e.g., industrial sources, could be used in the KPI discovery phase to increase the relevance and applicability of the proposed KPIs. Second, the validation and ranking of the KPIs could be strengthened by involving a larger sample of industrial experts and conducting more extensive and detailed interviews to gather their opinions and insights. These improvements would help ensure that the identified KPIs are comprehensive, relevant, and reliable, and would enhance the overall quality of the study.

Another type of improvement to the framework is the application of text mining and natural language processing (NLP) techniques to detect KPI references in a generic document, such as a scientific manuscript. For instance, the KPIs in a document can be weighted with term frequency-inverse document frequency (TF-IDF) [43], combined with the NLP techniques already discussed in other scientific works [44]. An output of the analysis described in this paper that has not yet been discussed is a collection of scientific manuscripts in PDF format in which the KPIs identified through manual analysis are highlighted. This collection enables the use of supervised machine learning methods to train a model that detects KPIs in documents. Therefore, a comprehensive future improvement would be to combine supervised and unsupervised methods to automatically identify KPIs in text-based documents.
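As a rough illustration of the TF-IDF step, the sketch below scores a list of KPI names in each document of a small corpus. It uses the plain formulation tf = (occurrences of the term) / (number of n-grams in the document) and idf = log(N / df), one of several common variants; the helper names, the restriction to lower-case KPI names of at most two words, and the toy corpus are assumptions made for illustration, not the pipeline of [43] or [44].

```python
import math
import re
from collections import Counter


def ngrams(text, max_n=2):
    """Lower-case word n-grams (1..max_n), so names such as 'travel time' are counted."""
    words = re.findall(r"[a-z]+", text.lower())
    return [" ".join(words[i:i + n]) for n in range(1, max_n + 1) for i in range(len(words) - n + 1)]


def tf_idf(documents, terms, max_n=2):
    """Return one {term: score} dict per document; terms must be lower-case."""
    grams = [Counter(ngrams(doc, max_n)) for doc in documents]
    n_docs = len(documents)
    # Document frequency: number of documents containing each term at least once
    df = {term: sum(1 for counts in grams if counts[term] > 0) for term in terms}
    results = []
    for counts in grams:
        total = sum(counts.values())
        doc_scores = {}
        for term in terms:
            if df[term] == 0 or total == 0:
                doc_scores[term] = 0.0
            else:
                tf = counts[term] / total
                doc_scores[term] = tf * math.log(n_docs / df[term])
        results.append(doc_scores)
    return results


# Toy corpus and KPI list (hypothetical)
documents = [
    "travel time and travel distance are reduced by the routing algorithm",
    "inventory turnover and holding costs are compared across warehouses",
    "energy consumption of the warehouse lighting system",
]
kpi_terms = ["travel time", "inventory turnover", "energy consumption"]
results = tf_idf(documents, kpi_terms)
for doc_scores in results:
    print({term: round(score, 3) for term, score in doc_scores.items()})
```

A KPI that appears in every document receives a score of zero, so TF-IDF favors KPIs that characterize a particular manuscript over those mentioned everywhere.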

Finally, an interesting qualitative direction for future work is the introduction of Industry 5.0 concepts into the KPI classification. This work already includes a classification based on different sustainability criteria, which is a fundamental step toward evaluating and classifying KPIs in terms of sustainability, human-centricity, and resilience of the system in which the KPI set is applied.