1 Introduction and background

In supply chains (SCs), large data sets are available through multiple sources such as enterprise resource planning (ERP) systems, logistics service providers, sales, supplier collaboration platforms, digital manufacturing, Blockchain, sensors, and customer buying patterns (Li et al., 2020b; Rai et al., 2021; Li et al., 2022a). Such data can be structured, semi-structured, and unstructured. Big data analytics (BDA) can be used to create knowledge from data to improve SC performance and decision-making capabilities. While BDA offers substantial opportunities for value creation, it also presents significant challenges for organisations (Chen et al., 2014; Choi et al., 2018).

Compared to BDA that deal with collecting, storing, and analysing data, data science (DS) focuses on more complex data analytics. In particular, predictive analytics such as machine learning and deep learning algorithms are considered. From the methodological perspective, DS &BDA contribute to decision-making at strategic, tactical, and operational levels of SC management. Organisations can use DS &BDA capabilities to achieve competitive advantage in the markets (Kamley et al., 2016). DS &BDA techniques also help organisations improve their SC design and management by reducing costs, increasing sustainability, mitigating risk and improving resilience (Baryannis et al., 2019b), understanding customer demands, and predicting market trends (Potočnik et al., 2019).

Along with methodological advancements, a progress in the DS &BDA tools can be observed. SC analytics software help researchers and practitioners alike to develop better forecasting, optimization, and simulation models (Analytics, 2020). These tools can also extract data and produce advanced visualizations. Along with the large corporations such as SAP®, IBM, and Oracle, there are also specific SC software such as anyLogistix™ and LLamasoft™, that allow to integrate simulation and network design with SC operations data to build digital SC twins (Ivanov, 2021b; Burgos & Ivanov, 2021). The advanced methodical and software developments result in growing opportunities for SC researchers and practitioners. However, the existing insights are scattered over different literature sources and there is a lack of a structured review on the DS &BDA application areas in SC and logistics (SC &L) areas, comprehensively covering efficiency, resilience and sustainability paradigms which encouraged us to conduct this systematic and comprehensive literature review. In the next section, we elaborate in detail on our motivation for this study.

1.1 Motivation of the study

Google trends for “Data Science” and “Big Data” have exhibited continuously increasing interest over the last 19 years in the DS &BDA in SC &L field, while the trend for SCs has steadily exhibited high interest (see Fig. 1). However, interest in BDA started increasing earlier than that for DS. We can also observe the recent convergence in the trends for “Big Data” and “Data Science”.

Fig. 2
figure 1

Trends of interest in the topics of this research (2004–2022)

From an academic point-of-view, various literature review studies have recently indicated the benefits of using DS &BDA in SC &L management (Pournader et al., 2021; Riahi et al., 2021; Novais et al., 2019; Neilson et al., 2019; Ameri Sianaki et al., 2019; Baryannis et al., 2019b; Choi et al., 2018; Govindan et al., 2018; Mishra et al., 2018; Arunachalam et al., 2018; Tiwari et al., 2018). Table 3 demonstrates the latest literature review publications in line with DS &BDA and affirms that although several review papers can be found around this topic, the reviews only explore SC &L from the specific viewpoint of BDA. Kotu and Deshpande (2018) concede that although the concept of big data is worthy of being explored separately, a holistic view on all aspects of data science with consideration of big data is of utmost importance and still needs to be researched in several areas such as SCs. Our investigation also shows that studies including BDA mostly discuss architecture and tools for BDA, but lack a contextualisation in the general data science methodologies. Waller and Fawcett (2013) affirm that along with the importance of data analysis in SCs, other issues related to data science are important in the SC, such as “data generation”, “data acquisition”, “data storage methods”, “fundamental technologies”, and “data-driven applications”, which are not necessarily connected to BDA.

The growing number of studies in DS &BDA and SC &L substantiates the need to adopt systematic approaches to aggregating and assessing research results to provide an objective and unbiased summary of research evidence. A systematic literature review is a procedural aggregation of precise outcomes of research. We explored several survey studies around our topic, shown in Table 3, to understand how researchers employ a systematic approach for their review process. Our general observation from analysis of the literature is that the existing surveys mostly focus on the BDA while missing a detailed analysis of DS and intersections of BDA and DS - a distinct and substantial contribution made by our study (Grover & Kar, 2017; Brinch, 2018; Nguyen et al., 2018; Kamble & Gunasekaran, 2020; Neilson et al., 2019; Talwar et al., 2021; Maheshwari et al., 2021). For instance, Maheshwari et al. (2021) conduct a systematic review for finding the role of BDA in SCM, but only select the keywords “Big data analytics” with “Supply chain management” or “Logistics management” or “Inventory management” which definitely miss many relevant studies using DS applications with big data.

1.2 Basic terminologies

Since several terms are used in the area of DS &BDA, we introduce here some of the main terminologies in the domain of our research.

Data science is a knowledge-based field of study that provides not only predictive and statistical tools for decision-makers, but also an effective solution that can help manage organisations from a data-driven perspective. DS requires integration of different skills such as statistics, machine learning, predictive analyses, data-driven techniques, and computer sciences (Kotu & Deshpande, 2018; Waller & Fawcett, 2013).

Big data includes the mass of structured or unstructured data and has been commonly characterised in the literature by 6Vs, i.e., “volume” (high-volume data), “variety” (a great variety of formats and sources), “velocity” (rapid growth in generation), “veracity” (quality, trust, and importance of data), “variability” (statistical variation in the data parameters) and “value” (huge economic benefits from low-data density) (Mishra et al., 2018; Chen et al., 2014).

Predictive analytics project the future of a SC by investigating its data and employing mathematical and forecasting models (Kotu & Deshpande, 2018).

Prescriptive analytics employs optimisation, simulation, and decision-making mechanisms to enhance business performance (Kotu & Deshpande, 2018; Chen et al., 2022).

Diagnostic analytics is a financial-analytical approach that aims to discover events causes and behaviours (Xu & Li, 2016; Windt & Hütt, 2011).

Descriptive analytics aims to analyse problems and provide historical analytics regarding the organisation’s processes by applying some techniques such as data mining, data aggregation, online analytical processing (OLAP), or business intelligence (BI) (Kotu & Deshpande, 2018).

The remainder of this paper is organised as follows: Sect. 2 describes our systematic research methodology to introduce the research questions, objectives, and conceptual framework, and to identify potential related studies. Section 4 presents and describes our content analysis results of the selected studies. Section 5 identifies gaps in the literature of DS &BDA within the context of SC &L. Finally, Sect. 6 concludes our study by summarising the significant features of our detailed framework and by providing several future research avenues.

2 Research methodology

2.1 Research questions and objectives

To develop a conceptual framework for our research, the following research questions (RQs) have been framed:

  1. 1.

    What strategies are required (in line with the systematic review protocol) to identify studies related to our research topic? (RQ1)

  2. 2.

    What can be inferred about the research process and guidelines from the previous survey studies related to DS &BDA in SC &L? (RQ2)

  3. 3.

    What research topics and methodologies have been investigated in DS &BDA in the context of SC &L? (RQ3)

  4. 4.

    What are the existing gaps in the literature for using DS &BDA techniques in SC &L? (RQ4)

Consequently, the research objectives are defined as follows:

  • Developing a comprehensive and unbiased systematic process to identify a methodological taxonomy of DS &BDA in SC &L.

  • Proposing a conceptual framework to categorize application areas of DS &BDA in SC &L.

  • Identifying the gaps and future research areas in development and application of DS &BDA techniques in SC &L.

2.2 Research process

Figure 2 depicts the nine main steps of our research process derived from Kitchenham (2004). The process includes three major phases: “research planning”, “conducting review”, and “reporting results”. We initially prepared the research plan by clarifying the research questions, defining the research objectives, and developing a review protocol for our study. A review protocol is an essential element in undertaking a systematic review and determines how primary studies are chosen and analysed. It also involves choosing beneficial resources/databases, study selection procedures or criteria (inclusion and exclusion criteria), and the proposed data synthesis method.

According to our defined review protocol, the second phase of the proposed research process (conducting the review) involves:

  • Conducting the analysis of recent review studies.

  • Material collection and identification of the available studies concerning the domain.

  • Developing a conceptual framework for reviewing and coding the collected studies.

Finally, we analyse the results of the content analysis and coding/classifying the selected studies. This phase also involves exploring the potential gaps and concluding with significant insights.

Fig. 3
figure 2

Outline of the research process

2.3 Review protocol

Our review protocol is a systematic process of searching, demarcating, appraising, and selecting of articles. A similar protocol has been adopted by a number of highly cited review papers in the literature (Nguyen et al., 2018; Wang et al., 2018b; Brinch, 2018). Material was collected from standard academic databases such as Web of Knowledge, Science Direct, Scopus, and Google Scholar, and only included “articles”, “research papers” or “reviews”. Results were limited to articles written in English language only between the years 2005 and 2021. The rationale behind this year range is the following: it will allow us to overview the latest studies to identify the research gaps in the area of DS &BDA in SC &L. Additionally, it will enable us to develop a coding strategy to formulate a conceptual framework for classifying the literature.

Initially a broad set of keywords were chosen to select potentially relevant studies. These keywords were “supply chain” OR “logistics”, along with at least one of the following keywords: “data science”, “data driven”, “data mining”, “text mining”, “data analytics”, “big data”, “predictive analytics”, and “machine learning”. However, additional search terms were identified later on from the relevant identified articles, to formulate more sophisticated search strings. We limited our search to articles that include the search keywords in their “title”, “abstract”, or “keywords”. The entire contents of the articles were not studied at this stage. If any database returned a huge number of articles during the search, we then followed a strategy to exclude or make selections from that database.

Table 1 shows the number of extracted papers from each database. This is further subdivided as per the keywords in Table 2. Since we followed a comprehensive approach and selected a broad range of keywords, it resulted in a large number of studies, in comparison with the related review papers. Investigating the search results from Google Scholar demonstrated that most of the articles were irrelevant. Therefore, we identified Google Scholar database’s result as unreliable and did not consider the associated articles for the selection process. Moreover, after a thorough content analysis of the review articles and an examination of their search keywords with our proposed keywords set (listed in Table 2), we recognised that the “SC analytics” and “big data analytics” set of words had been commonly used in most of these articles; thus, to provide a more comprehensive search process, we also added these two keywords to our previous set of search keywords. According to the above-mentioned selection process, the number of preliminary papers extracted from the three search databases was reduced to 6064. The last search process was applied in January 2023.

Table 1 Preliminary search results in the selected databases
Table 2 Preliminary search results in terms of each keywords set

In the next stages, duplicates from the databases were removed and, in order to ensure quality, only papers that were published in A* and A-ranked journals were selected,Footnote 1 or journal papers published in Q1-ranked journals in the SJR Report.Footnote 2. This process was repeated twice: once for identifying a database of only literature review studies, and once for all other studies (see Figs. 3 and 4). Regarding the selection of review studies, we also looked at papers with the most citations in the Google Scholar database. We selected these studies by sorting the search list collected by each keyword set (see the last column of Table 2). This stage could not be applied to the process of selecting all studies, as the most cited non-review studies, listed in the Google Scholars search, were books, chapters, and other non-relevant articles. At the last stage, a content analysis was done to exclude those studies that were not closely associated with our field of interest.

2.4 Analysis of recent relevant review studies

As noted in the research process, we initially aimed to deliberate recent relevant review studies. The purpose of this approach is twofold. First, we are able to overview the latest review approaches and identify the interest and research gaps in the area of techniques involving DS &BDA in SC &L. Second, it helps us summarise coding strategies to develop a conceptual framework for classifying the literature.

Following our review protocol, the word “review” was added to the previous keywords, and the search process was repeated. This was done to extract only literature review studies from the shortlisted databases. No thorough analysis regarding the content of these studies was done at this time. This reduced the number of potential studies to 459. Amongst these, we found 317 duplicates. Furthermore, the focus on A* and A-ranked journals reduced the number of papers to 18. Then, we investigated the relevance of the remaining papers to our field of interest. After precisely reading the abstract and introduction sections, the papers not strongly associated with the subject or to the field of this research were also removed. Finally, 16 potential review studies were selected in this stage for a full text analysis. Figure 3 illustrates our meticulous selection procedure for selecting these articles. To ensure the comprehensiveness of the analysis, we also selected the most cited review studies listed in the Google Scholar database. The keywords set, as well as the search and filtering process, were applied as noted in the review protocol. We found three more relevant review papers in this stage (see the second stage of the process in Fig. 3). It is worth mentioning that some survey studies that only focus on BDA and SCs without a relevance to DS and logistics (e.g. Xu et al. (2021a)), have been removed from our list. Finally, 23 papers were selected from our two stages after full text filtering and content analysis.

Fig. 4
figure 3

Research selection process regarding the literature review papers

3 Content analysis and framework development

3.1 Lessons from the review studies

To answer the second research question (RQ2) of our study, outlined in Sect. 2.1, we analysed the content of the 23 selected review articles. Table 3 summarises these latest review articles. We categorise the lessons gained from this content analysis step in the following subsections.

3.1.1 Review methodologies

The investigation of venues of the selected survey studies introduced the top journals, listed in Table 3. These journals are mostly among A*/A/Q1 -ranked journals. This confirmed that our approach regarding the inclusion of highly ranked journals was a correct strategy to limit the selected documents. Moreover, by looking over the search engines used by the survey studies (see the column Search Engines in Table 3), we confirmed our main databases for the material selection process in which we selected all relevant studies.

In Table 3, the column Type of Review refers to the research methodology employed for reviewing the selected studies. From a total of 23 review papers, eight of them did not utilise any type of “systematic” (SR) or “bibliometric” (BIB) methods, and by investigating their research methodology in more detail, we categorised these articles as “others” (ORS), which means that they did not utilise an organised research methodology for their review process. Three articles reviewed the literature bibliographically (Pournader et al., 2021; Mishra et al., 2018; Iftikhar et al., 2022b), and one article (Arunachalam et al., 2018) chose both methods (SR and BIB). The systematic approach also claimed to be implemented on some BIB based methods (see Pournader et al. (2021))

3.1.2 Gaps identification in research topics

Although the authors asserted a holistic view in their research process, they mostly investigated the SC from an operations viewpoint, i.e., production, logistics, inventory management, transportation, and demand planning (see the coding and classification in these studies: Nguyen et al. (2018); Tiwari et al. (2018); Choi et al. (2018); Maheshwari et al. (2021)). Moreover, it seems that some aspects of SC operations were overlooked by the researchers (see the column Perspective and Special Features of Neilson et al. (2019) and Novais et al. (2019)). For instance, some of the production and transportation aspects, such as the “network design”, “facilities capacity”, and “vehicle routing” were not reviewed by Nguyen et al. (2018) or Maheshwari et al. (2021), although they follow a comprehensive approach.

Table 3 Recent review studies about the techniques of DS &BDA in SC &L

Any decision around a SC can be classified at three planning levels, i.e., “strategic”, “tactical”, and “operational” (Stadtler & Kilger, 2002; Ivanov et al., 2021b). DS &BDA can provide useful solutions at each of these planning levels (Nguyen et al., 2018). Wang et al. (2016a) focused on the value of SC analytics and the applications of BDA on strategic and operational levels. They acknowledge the importance of BDA for the SC strategies, which in turn, affects the “SC network”, “product design and development”, and “strategic sourcing”. They also note that at the operational level, BDA plays a critical role in the effective performance of analysing and measuring “demand”, “production”, “inventory”, “transportation” and “logistics”. The authors do not utilise the basic definitions and categorisations of the decisions levels that existed in the literature (Stadtler & Kilger, 2002).

3.1.3 Gaps identification in DS &BDA techniques

Grover and Kar (2017) classify BDA into “predictive”, “prescriptive”, “diagnostic”, and “descriptive” categories. Kotu and Deshpande (2018) note that these classifications can be considered for any research using DS &BDA tools. However, some review studies around BDA only refer to three of these classifications (predictive, prescriptive, and descriptive categories) (Wang et al., 2016a; Arunachalam et al., 2018; Nguyen et al., 2018). Nguyen et al. (2018) classify the applications of BDA in SC &L based on the main three categories and conclude that prescriptive analytics is more controversial than the other two, since the results of this type of analytics are strongly influenced by the descriptive and predictive types.

Considering a broader exploration of logistics for any company, DS &BDA has significant importance in transportation systems for enhancing safety and sustainability. Neilson et al. (2019) review the applications of BDA from only the logistics perspective. They concede that the applications of BDA in the transportation system can be categorised as sharing traffic information (avoiding traffic congestion), urban planning (developing transportation infrastructures), and analysing accidents (improving traffic safety). Since the authors focus only on transportation systems, they explore the data collection process from urban facilities only such as smartphones, traffic lights, roadside sensors, global positioning systems (GPSs), and vehicles. The authors focus on special data types and formats that are mostly used in urban applications. They also classify the application of BDA in transportation into several categories, including predictive, real-time, historical, visual, video, and image analytics.

The special characteristics of big data as noted in our research terminologies (see Sect. 1.2), have been researched in certain survey studies. For instance, Addo-Tenkorang and Helo (2016) propose a framework based on the Internet of things (IoT), referred to as “IoT-value adding”, and extend five traits for big data: variety, velocity, volume, veracity, and value-adding. IoT is defined as the connectivity and sharing of data between physical things or technical equipments via the Internet (Addo-Tenkorang & Helo, 2016). Studies around BDA also list recent technologies and tools employed for dealing with large data sets. These technologies include but are not limited to cloud computing, IoT, and master database management systems (MDMS), which are associated with the veracity characteristic of big data; additionally, the tools include Apache Hadoop, Apache Spark, and Map-Reduce. In the case of the SC, Chen et al. (2014) concede that big data can be acquired from elements of the SC network, such as suppliers, manufacturers, warehouses, retailers, and customers, which are related to the variety characteristic of big data. Some of the researchers such as Brinch (2018) and Addo-Tenkorang and Helo (2016) consider the value of big data in SC &L. Brinch (2018) introduces a conceptual model for discovering, creating, and capturing value in SC management. Arunachalam et al. (2018) note that assessing the current state of an organisation on BDA will help its managers enhance the company’s capabilities. The authors suggest five BDA capabilities dimensions: “data generation”, “data integration and management”, “data advanced analytics”, “data visualisation”, and “data-driven culture”. The first two capabilities represent the level of data resources, whereas the second two demonstrate the level of analytical resources. The last is the foundation capability, compared to the other capabilities, which needs to be institutionalised in any organisation. Kamble and Gunasekaran (2020) also affirm that the performance measures used in a data-driven SC must be different from a traditional SC. For this purpose, the authors identify two categories of measures for data-driven SC performance monitoring: BDA capability and evaluating processes. BDA tools and platforms are also categorised into five groups according to the type of provided service: “Hadoop”, “Grid Gain”, “Map Reduce”, “High-performance computing cluster (HPCC) systems”, and “Storm” (Grover & Kar, 2017; Addo-Tenkorang & Helo, 2016). Each of these tools has different applications in SC &L.

Regarding the statistical techniques indexed in the review studies and their categories, we found the following classifications:

  1. 1.

    Techniques to measure data correlation (such as statistical regression (Zhang et al., 2019) and multivariate statistical analysis (Wesonga & Nabugoomu, 2016)).

  2. 2.

    Simulation techniques (Wojtusiak et al., 2012a; Antomarioni et al., 2021)).

  3. 3.

    Optimisation techniques (including heuristic algorithms such as the genetic algorithm (Chi et al., 2007) and particle filters (Wang et al., 2018c)).

  4. 4.

    Machine learning methods (e.g. neural networks (Tsai & Huang, 2017), support vector machines (Weiss et al., 2016)).

  5. 5.

    Data mining methods (e.g., classification (Merchán & Winkenbach, 2019), clustering (Windt & Hütt, 2011), regression (Benzidia et al., 2021)).

These studies note that every technique has its strengths and weaknesses. For instance, statistical methods are fast but not adaptable enough to all problems. These methods cannot be applied to an unstructured and heterogeneous data set, while machine learning techniques are flexible, adaptable, yet time-consuming (Wang et al., 2016a; Choi et al., 2018; Pournader et al., 2021). Some studies such as Ameri Sianaki et al. (2019) investigate the applications of DS &BDA in a specific industry such as healthcare or smart cities. The authors find applications of DS &BDA in healthcare SCs and classify them as “patients monitoring”, “diagnosing disorders”, and “remote surgery”. Each of the mentioned techniques can also be applied in different types/levels of analytics. For example, optimisation is a prescriptive analytic and cannot be predictive. However, simulation techniques can be used in predictive, diagnostic, and prescriptive analytics (Baryannis et al., 2019a). Therefore, one perspective that can help us define the conceptual analysis of articles is the categorisation of DS &BDA techniques based on different analytical types/levels. This categorisation proposes guidelines for practitioners as well.

These review studies also investigate their selected articles in certain specific domains in SC &L such as SC risk management, in which decision-making is required to be fast, and the data is acquired from multi-dimensional sources. Baryannis et al. (2019a) explore risk and uncertainty in the SC by reviewing the applications of AI in BDA. The authors categorise the methods proposed for SC risk management in two main classes: mathematical programming and network-based models, and find that mathematical programming approaches have received more attention in the literature. Data-driven optimisation (DDO) approaches are other recent and effective approaches used in this area of research (Jiao et al., 2018; Gao et al., 2019; Zhao & You, 2019; Ning & You, 2018). The related methods are recognised as a combination of machine learning and mathematical programming methods for making decisions under uncertainty. DDO approaches can be further subdivided into four categories: “stochastic programming”, “robust optimisation”, “chance-constrained programming”, and “scenario-based optimisation” (Ning & You, 2019; Nguyen et al., 2021). In the DDO approaches, uncertainty is not predetermined, and decisions are made based on real data. Therefore, these are the main differences between data-driven approaches and traditional mathematical approaches. The results of the DDO methods are also less conservative, and consequently, closer to reality. The selection of techniques and tools is very critical because they strongly influence the outputs of analytics.

One of the most widely used tools for managing and integrating data in SC &L is cloud computing (Mourtzis et al., 2021; Sokolov et al., 2020). With this tool, the data is stored in cyberspace and serviced according to user needs. This technology can play an important role in SC &L. Novais et al. (2019) explore the role of cloud computing on the chain’s integrity. Some studies (Jiang, 2019; Zhu, 2018; Zhong et al., 2016) show that the impact of cloud computing on the integration of the SC (financially or commercially) is positive. This technology helps improve the integration of information, financial and physical flows in the SC via information sharing between the SC members, optimising payment and cash processes among partners, and controlling inventory levels and costs. In the case of information sharing, we also found the fuzzy model developed by Ming et al. (2021) as a valuable method considering BDA concerns.

3.2 Material collection

By looking at the keywords, we found that two combinations of them, i.e., “data mining” AND “logistics” and “machine learning” AND “logistics”, had yielded the most search results. Precisely reviewing some of the articles, instead of merely the “logistics” word, the “logistics regression” phrase was detected, which is a common methodology in data mining, and not in the transportation field. Therefore, the keyword “logistics regression” was excluded from the list via “AND NOT”. The selection process resulted in 6681 potential papers. In the next step, we excluded irrelevant papers by overviewing the abstracts and keywords. Articles related to “conceptual analysis”, “resource dependency theory”, “the importance of BDA”, “management capabilities”, and “the role or application of the BDA in the SC” were identified as unrelated articles. We also removed the review papers, as we explored them in the previous step, separately. These filtering criteria reduced the number of papers to 2583 (see Table 4). Figure 4 illustrates the selection procedure in detail. After removing duplicates and filtering for highly ranked journals, 1167 studies remained. In the next step, we scanned the papers’ abstracts (and in some cases, the full text) to examine the relevance of the paper to our domain. Since we aimed to limit our selection to research employing any DS &BDA models/techniques, we removed several papers that were using only conceptual models. A total of 364 articles were finally selected to go through the full text analysis and coding step.

Fig. 5
figure 4

Research selection process regarding all relevant papers excluding review papers

Table 4 Results of material collection after excluding irrelevant and review studies

3.3 Conceptual framework of DS &BDA in SC &L

Considering all the insights gained from the previous review studies, we propose a conceptual framework of our study encompassing two perspectives: (1) SC &L research problems/topics and (2) DS &BDA main approaches. This structure can help practitioners apply DS &BDA approaches for creating a competitive advantage. According to our research process outlined in Fig. 2, we revised the list of each category with the help of a recursive process and gathered feedback from the full content analysis of the selected studies.

Figure 5 illustrates our proposed categorisation from the SC &L research problems/topics viewpoint in a framework. We classify SC operational processes (i.e., procurement, production, distribution, logistics, and sales) into three hierarchical levels of decision-making used in SC &L companies (Stadtler & Kilger, 2002). In the first operational process, we highlight decisions about procurement planning, which includes concerns about raw materials and suppliers (Cui et al., 2022). Production planning organises the products’ design and development (Ialongo et al., 2022). These issues coordinate suppliers’ and customers’ requirements. Distribution planning influences production and transportation decisions. Logistics or transportation planning deals with methodologies related to delivering products to end-customers or retailers. Sales planning is related to trades in business markets. We also consider SC design as a strategic decision and classify the studies in resilient, sustainable, and closed loop and reverse logistics categories.

Fig. 6
figure 5

Conceptual framework of reviewing the selected studies from the SC &L research problems/topics viewpoint

Figure 6 demonstrates our conceptual framework proposed for the classifications of the DS &BDA main approaches. DS &BDA algorithms/techniques for SC &L are categorised based on this proposed framework.

Fig. 7
figure 6

Conceptual framework of reviewing the selected studies from the viewpoint of the employed DS &BDA main approaches

All DS &BDA approaches, shown in Fig. 6, are applied to each topic listed in Fig. 5. In the next section, we explore our 364 selected articles in more detail with respect to each of these categories.

4 Context analysis of results

Responding to the third research question (RQ3), we initially visualise the research sources in the scope of a yearly distribution, publication venues, and analytics types. In the coding process, we review the context of the selected papers precisely and classify them based on the proposed framework. In this step, we explore the main contributions of the selected papers. It is worth mentioning that with the recursive process, we complete the proposed framework so as to cover all topics (the final version of the framework is delineated in Figs. 5 and 6). Consequently, we evaluate and synthesise the selected studies at the end of this section.

4.1 Data visualisation and descriptive analysis

4.1.1 Distribution of papers per year and publications

To identify the journals with the highest number contributions, and to provide an overview of the research trends, we classify all selected papers based on the publication per venue and year (see Fig. 7). Figure 7a depicts the distribution of published papers between 2005 to August 2022. It can be observed that before 2005, the domain of DS &BDA was not investigated, and there is an insignificant contribution until 2012. In fact, before 2012, the concept of DS &BDA was considered as data mining or BI (Arunachalam et al., 2018). The Google trend of interest regarding DS &BDA topics, depicted in Fig. 1, also confirms this trend and the consideration of DS &BDA after the year 2012. The publication trend also shows that the applications of DS &BDA in SC &L have attracted the attention of many researchers in the past four years. As the chart shows, the number of papers published in the last five years is approximately doubled to the summation of those in the previous years. Apparently, the number of studies has been declined since 2020 which is expected due to the specifics of the COVID-19 period.

Fig. 8
figure 7

Distribution of the selected papers per year and for the top ten journal venues

The number of publications in the top ten journals is illustrated in Fig. 7b. Overall, we found 157 various journal venues for all of our 364 selected studies in the domain of DS &BDA, with most of them in the “information system management” and “trasportation” 2020 SJR subject classification.Footnote 3 It is noticeable that a significant proportion of the studies (over 45%) have been published by high-impact journals, such as CIE, ESA, IJPR, JCP, and IJPE. Also, it is worth mentioning that the ESA journal recently has got the most publications in the field of DS &BDA applications in SC &L. The ESA journal is an open access journal whose focus is on intelligent systems applied in industry. The CIE and IJPR are both in the second rank, which mostly concentrates on SC &L, compared to “information systems”. Other journals in Fig. 7b are among the most popular journals published in the field of SC &L.

4.1.2 Types of analytics approaches

The analytics type for each selected study needs to be further investigated. According to the classification introduced by Grover and Kar (2017) and Arunachalam et al. (2018), four types of analytics can be defined: descriptive, diagnostic, predictive, and prescriptive. Due to an extremely limited number of studies classified on diagnostic analytics (7 out of 364 publications in our data set), this area was excluded from our classification, similar to the survey study of Nguyen et al. (2018). A classification in each field of analytics is conducted based on the applied models and common techniques of analysis, as outlined in Table 5 (see also Wang et al. (2016a); Grover and Kar (2017); Nguyen et al. (2018) for a description of these analytics types). The simulation approach is listed in both the predictive and prescriptive analytics (Viet et al., 2020; Wojtusiak et al., 2012b; Wang et al., 2018c).

Figure 8 shows the annual distribution of analytics types over time. Predictive analytics methodologies have become more popular in 2019–2022. 45% of the articles have followed a predictive approach in their proposed solution, which is the highest proportion compared to the other types of analytics. This is justified by the development of analytical tools and the ability to access dynamic data in addition to historical data (Arunachalam et al., 2018).

Fig. 9
figure 8

Annual distribution of the selected studies with respect to analytics type

Table 5 Classification of the employed DS &BDA models and techniques in SC &L
Fig. 10
figure 9

Distribution of the articles by DS &BDA approaches

Additionally, we analyse the distribution of approaches used in the articles (see Table 5). Figure 9a–c show the distribution of the main approaches employed in the selected studies regarding each type of analytics. Among all predictive approaches, we found that neural network is the most favourable technique, employed in 19% of the selected papers in the various main approaches of DS &BDA, such as forecasting, classification, and clustering. Moreover, among the main approaches and algorithms, the graph visualisation techniques are the most employed methods in the field of this survey (29% of the selected papers used this technique).

Ensemble learning is the process by which several algorithms/techniques (including forecasting or classification techniques) are strategically combined to solve a particular DS &BDA problem. Regarding the selection of appropriate techniques, ensemble learning can be employed to help reduce the probability of an unlucky selection of a poor technique and can improve performance of the whole model (Zhu et al., 2019b; Hosseini & Al Khaled, 2019). Deep learning is an evolution of machine learning that uses a programmable neural network technique and can be employed for forecasting, classification, or any predictive model (Bao et al., 2019; Pournader et al., 2021; Rolf et al., 2022).

4.1.3 Methodological perspectives

Descriptive analysis is adopted in approximately 33% the examined literature. These articles have commonly used clustering, association, visualisation, and descriptive approaches in DS &BDA (see Fig. 9a). The trend of using these approaches in the articles has almost been ascending, especially the visualisation ones that have received much attention in the last four years. Data visualisation is a beneficial tool for SC &L in different areas. The graphs and OLAP techniques have been the methods mainly used in data visualisation. This is because visualisation approaches are able to depict a portion of the research problem and are applicable to all areas of SC &L. In the clustering approach, there are a variety of techniques and algorithms. K-means clustering is the most discussed technique, which is used in analysing energy logistics vehicles (Mao et al., 2020), traffic accidents (Kuvvetli & Firuzan, 2019), traffic flows (Bhattacharya et al., 2014), pricing (Hogenboom et al., 2015), and routing (Ehmke et al., 2012). The third most commonly used approach in descriptive analytics is the association approach, which means the measure of association between two variables. The Apriori algorithm is the most popular association algorithm, which has been used in various issues, including transportation risk (Yang, 2020), demand forecasting (Kilimci et al., 2019), quality management (Wang & Yue, 2017), vehicle routing (Ting et al., 2014), research and development (R &D) (Liao et al., 2008b), and customer feedback (Singh et al., 2018a).

In the predictive analytics type, the classification approach is very popular (see Fig. 9b). The most common algorithms used in the classification approach are SVM (20%), decision trees (19%), logistic regression (19%), and neural networks (11%). This approach is usually applied in decisions corresponding to demand forecasting (Nikolopoulos et al., 2021; Yu et al., 2019b; Gružauskas et al., 2019; Zhu et al., 2019a), quality management (Bahaghighat et al., 2019), customer churn (Coussement et al., 2017), delivery planning (Proto et al., 2020; Wang et al., 2020; Praet & Martens, 2020), and routing (Spoel et al., 2017). Next, regression techniques have received high attention. Both linear regression models (37%) and SVR (24%) are the most commonly used techniques in regression models. These regression models are mainly applicable to logistics decisions such as traffic accidents (Farid et al., 2019; Wang et al., 2016b), vehicle delays (Eltoukhy et al., 2019), delivery planning (Ghasri et al., 2016; Merchán & Winkenbach, 2019), and sales decisions such as demand forecasting (Nikolopoulos et al., 2021, 2016) and sales forecasting (Lau et al., 2018).

The neural network is an important and common technique for forecasting and can be applied in a wide variety of problems such as supplier selection (Pramanik et al., 2020) and demand or sales forecasting (Verstraete et al., 2019). Time series modelling is the fourth predictive approach. ARIMA (34%), exponential smoothing (17%), and moving averages (18%) are the most popular techniques for DS &BDA time series modelling. These techniques are usually applied for demand forecasting (Kilimci et al., 2019; Huber et al., 2017). In this survey, we find that ARENA and AnyLogic simulation software are used more than others for shop floor control simulations (Yang et al., 2013), machine scheduling (Heger et al., 2016), and routing (Ehmke et al., 2016). Text mining is a useful approach for understanding the feelings and opinions of customers or people. In the examined papers, this method has been used in only 8 articles in the fields of customer feedback (Hao et al., 2021), sales forecasting (Cui et al., 2018), SC mapping (Wichmann et al., 2020).

Prescriptive analytics has the lowest number of contributions, compared to the other types of analytics. The optimisation models, simulations, and multi-criteria decision-making are the main approaches of the prescriptive analytics type. Among them, optimisation has the most contributions (78% out of the prescriptive analytics studies). The optimisation techniques are most often used to optimise the facility location (Doolun et al., 2018), location of distribution centres (Wang et al., 2018a), type of technology (shen How & Lam, 2018), capacity planning (Ning & You, 2018), number of facilities (Tayal & Singh, 2018), inventory management (Çimen & Kirkbride, 2017), and vehicle routing (Mokhtarinejad et al., 2015). In addition to optimisation, MCDM approaches are also used in decision-making. This approach is classified into two main technique categories (MADM and MODM) and applied to customer credit risk (Lyu & Zhao, 2019), supplier selection (Maghsoodi et al., 2018), inventory management (Kartal et al., 2016), and SC resilience (Belhadi et al., 2022).

4.1.4 Technique verification strategies

In order to solve an SC &L problem, a suitable algorithm/technique must be selected and then evaluated through a proper set of data. Figure 10 shows the percentages of the applied verification strategies. In the examined literature, researchers mainly employ case study strategy with real data to verify their selected approaches and models (Antomarioni et al., 2021; Nuss et al., 2019). However, a few others have used a generating data strategy (i.e., synthetic data) that is mainly seen in simulation techniques (Kang et al., 2019). Hence, almost all of algorithms/techniques require real data to be verified (Choi et al., 2018).

Fig. 11
figure 10

Distribution of research verification strategies

4.1.5 Comparison with previous survey studies

We compare our results with the recent survey studies listed in Table 3. The comparison of top journals demonstrates that our unbiased approach in finding studies includes more relevant journals focusing on information systems (e.g., ESA and IEEEA journals). For instance, in the survey by Nguyen et al. (2021), all top journals are listed among SC &L-focused journals (IJPE, TRC, IJPR, and ICE). This survey has employed only “data-driven” or “data-based” keywords that do not cover all aspects of data science or data analytics applications (e.g. machine learning, deep learning, big data, etc). The authors only use previous survey studies to find all keywords related to SC &L.

Comparison of “Search Engines” in Table 3 demonstrates that most of the previous surveys use a sole database (mostly Scopus) for their search process and do not double check or confirm the process by other databases. Our systematic process concluded many duplicates (see Tables 1 and 2 ) but by handling these duplicates we reached a more clean and accurate data set.

4.2 Classification of studies based on the conceptual framework

As depicted in Fig. 5, SC &L is comprised of five internal processes: procurement, production, distribution, logistics, and sales. In each process, a hierarchical triple planning structure is required: (1) long-term planning or strategic decision-making over a multi-year scheduling horizon, (2) mid-term planning or tactical decision-making over a seasonal or maximum one-year scheduling horizon, and (3) short-term planning or operational decision-making, which has a planning period between a few days up to one season (Sugrue & Adriaens, 2021).

An overview of the processes shows that the logistics process has received more interest, especially during the last two years (128 papers, 31% of the corpus). Sales is another frequently studied field in applying DS &BDA (83 papers, 20% of the examined literature). Figure 11 illustrates the distribution of studies by each decision level. In the procurement process, supplier selection (27 papers) and order allocation (17 papers) are the most discussed. The results of our investigations indicate that long-term decisions such as the plant location (Doolun et al., 2018), type of technologies (Vondra et al., 2019), and R &D (Liao et al., 2009) have made considerable contributions to improve production decisions. For example, an inappropriate network design incurs high costs (Song & Kusiak, 2009). The two key aspects at the mid-term production decision level in DS &BDA are master production scheduling (determining production quantities at each period) and quality management (Masna et al., 2019). Shop floor control has been of interest to researchers in short-term production planning (Yang et al., 2013). The results further show that in distribution tactical decisions, most of the papers discuss inventory management (Sachs, 2015), while in this process at the long- and short-term decision levels, the issues of distribution centre location (Wang et al., 2018c), warehouse replenishment (Priore et al., 2019), and order picking decisions (Mirzaei et al., 2021) have been attractive to researchers.

Fig. 12
figure 11

Distribution of the selected studies with respect to SC &L processes and planning levels

Logistics decisions have been mainly studied at the short-term level, i.e., vehicle routing (Tsolakis et al., 2021) and delivery planning (Vieira et al., 2019a). Unlike other processes, most articles discuss the operational decisions compared to the other levels (59% of logistics planning studies). Subsequently, mid-term transportation planning decisions, including material flow rate issues (Wu et al., 2019), have been investigated more frequently. In the sales process, decisions and issues are mainly planned at the mid-term level. Hence, decisions regarding customer demand forecasting (Yu et al., 2019b), pricing (Liu, 2019), and sales forecasting (Villegas & Pedregal, 2019) are the three most commonly studied issues in this process.

Overall, at the long-term planning level, most articles contribute to production (35 papers) and procurement decisions (30 papers). At the mid-term decision level, due to attractive issues such as demand forecasting, sales process has been the most investigated area (70 papers). After that, logistics process is in second place (40 papers) in the form of contributions involving transportation planning issues. At the short-term level, logistics process is at the forefront (76 papers). Vehicle routing (Yao et al., 2019), delivery planning (Vieira et al., 2019a), financing risk (Ying et al., 2021), and transportation risk (Zhao et al., 2017) have been addressed more frequently. Subsequently, distribution process, with a large difference in contributions (20 papers), ranks second. Material ordering (Vieira et al., 2019b) and customer feedback (Singh et al., 2018a) have the lowest contribution in terms of applying DS &BDA among other processes at the short-term decision level.

4.2.1 Long-term decisions in SC &L

Long-term procurement decisions deal with supplier selection (Hosseini & Al Khaled, 2019) and supplier performance (Chen et al., 2012). During the production and distribution processes, these decisions are made considering the network design of factories and the distribution centres such as the location, number, types of facilities, and centre capacity (Mishra & Singh, 2020; Flores & Villalobos, 2020; Mohseni & Pishvaee, 2020), whereas, strategic decisions in the logistics process comprise planning with respect to the transportation system infrastructure, carrier selection, and capacity design (Lamba & Singh, 2019; Lamba et al., 2019). The decisions include customer service level determination, strategic sales planning, and customer targeting through sales category.

We also consider the SC design decisions in this category, including resilient SC (Brintrup et al., 2020; Belhadi et al., 2022; Mungo et al., 2023; Mishra & Singh, 2022; Hägele et al., 2023), sustainable SC (Bag et al., 2022b), closed-loop (Govindan & Gholizadeh, 2021) and reverse logistics (Shokouhyar et al., 2022). A more complete categorisation of the related articles is summarised in Table 6.

4.2.2 Efficiency, sustainability, and resilience paradigms

The COVID-19 pandemic has clearly shown the importance of resilient SC designs (Rozhkov et al., 2022). SC resilience refers to having the capability to absorb or even avoid disruptions (Ivanov, 2021a; Kosasih & Brintrup, 2021; Yang & Peng, 2023). Belhadi et al. (2022) concede that AI techniques provide capable solutions for designing and upgrading more resilient SCs. Zhao and You (2019) develop a resilient SC design by employing a data-driven robust optimisation approach and demonstrate how the DS &BDA concepts should be considered in SC models.

SC sustainability refers to consideration of environmental, societal, and human-centric aspects in SC decisions (Li et al., 2021b; Homayouni et al., 2021; Sun et al., 2020; Li et al., 2020a). Mishra and Singh (2020) develop a sustainable reverse logistics model by considering realistic parameters. They affirm that all three aspects of sustainability can be covered by BDA. Tsolakis et al. (2021) conduct a comprehensive literature review for AI-driven sustainability and conclude that the most essential techniques in modelling SCs are AI and optimisation techniques.

A closed-loop SC employs reverse logistics to supply re-manufactured products back into the forward logistics process. Jiao et al. (2018) develop a data-driven optimisation model to integrate sustainability features in a closed-loop SC. Shokouhyar et al. (2022) employ social media data for modelling a customer-centric reverse logistics with an emphasis on the BDA approaches for designing reverse logistics SCs.

Table 6 Strategic decisions made in SC &L processes and analytics types selected for the solution
Table 7 Tactical decisions made in SC &L processes and analytics types selected for the solution

4.2.3 Mid-term decisions in SC &L

The selected paper categorisation at the tactical decision level is outlined in Table 7. Decisions regarding the allocation of orders to suppliers such as the order quantity planning and lot sizing (Lamba & Singh, 2019), supply risk management (Baryannis et al., 2019a), raw materials quality management (Bouzembrak & Marvin, 2019), material requirement planning (Zhao & You, 2019), material cost management (Ou et al., 2016), and demand forecasting (Stip & Van Houtum, 2020) are all considered as mid-term procurement decisions.

The main tasks in mid-term production planning are master production scheduling (Flores & Villalobos, 2020), capacity planning (Sugrue & Adriaens, 2021), quality management (Ou et al., 2016), and demand forecasting (Dombi et al., 2018), while inventory management decisions (Ning & You, 2018), capacity planning (Oh & Jeong, 2019), in-stock product quality management issues (Ou et al., 2016), and warehouse demand forecasting (Zhou et al., 2022b) are among the tactical distribution decisions.

Some of the main mid-term logistics decisions are transportation planning (Wu et al., 2020; Gao et al., 2019), service quality management (Gürbüz et al., 2019; Molka-Danielsen et al., 2018), transportation modes (Jula & Leachman, 2011), and demand forecasting (Potočnik et al., 2019; Boutselis & McNaught, 2019). Demand forecasting (Lee et al., 2011; Shukla et al., 2012), demand shaping (e.g., marketing) (Aguilar-Palacios et al., 2019; Liao et al., 2009), sales forecasting (Wong & Guo, 2010), pricing (Hogenboom et al., 2015), consumer behaviour (e.g., purchasing pattern) (Bodendorf et al., 2022b; Garcia et al., 2019), and customer churn (Coussement et al., 2017) are planned in the tactical sale decisions.

4.2.4 Short-term decisions in SC &L

Short-term procurement planning includes ordering materials (Vieira et al., 2019b). Production operational decisions include machine scheduling (Yue et al., 2021), shop floor control (such as preventive maintenance scheduling (Celik & Son, 2012) and material flows (Zhong et al., 2015)), and decisions regarding the size of the production batch (Sadic et al., 2018). In the area of distribution, planning associated with packaging (Kim, 2018), warehouse replenishment (Taube & Minner, 2018), order picking (Mirzaei et al., 2021), and inventory turnover (Zhang et al., 2019) could be made in short-term decisions. A variety of operational decisions can be made at the logistics stage, including delivery planning (Praet & Martens, 2020), vehicle delay management (Kim et al., 2017), routing planning (Liu et al., 2019), and transportation risk management (Wu et al., 2017).

At this level of decision-making, due to the wide variety of decisions, we consider more categories than other levels. For example, we consider vehicle delivery planning (Vieira et al., 2019b) and vehicle routing (Yao et al., 2019) as two separate categories. Also, in order to reduce the number of categories, we aggregate crash risk (Bao et al., 2019), traffic safety (Arbabzadeh & Jafari, 2017), and fraud detection decisions (Triepels et al., 2018) in the transport risk management category. Table 8 shows the results of reviewing the short-term decisions.

Table 8 Operational decisions made in SC &L processes and analytics types selected for the solution

5 Identification of research gaps

To answer the fourth research question (RQ4), we evaluate the selected studies in details to find any existing gaps in the literature for using DS &BDA approaches in SC &L. We categorise our findings in the following sub-sections.

5.1 Data-driven optimisation

DDO has received a considerable attention. In our study, we aimed to identify related techniques by adding the word “data-driven” to our keyword set (see the preliminary search results for DDO in Table 2). DDO is a mathematical programming method that combines uncertainty approaches for optimisation with machine learning algorithms. The objective functions are often cost-related (Alhameli et al., 2021; Baryannis et al., 2019a). Ning and You (2019) divided DDO into four modeling methods: stochastic programming, chance-constrained programming, robust optimisation, and scenario-based optimisation. In the SC &L area, some of the problem parameters may be considered as uncertain such as customer demand (Medina-González et al., 2018; Taube & Minner, 2018), production capacity (Jiao et al., 2018), and delivery time (Lee & Chien, 2014). In comparison with the traditional optimisation models under uncertainty, which consider perfect information for the parameters, DDO approaches employ information of random variables direct inputs to the proposed programming problems.

In our examined material, 21 papers studied optimisation under uncertainty. The stochastic programming methods (e.g., MILP and MINLP) were the most applied methods (e.g., Flores and Villalobos (2020); Taube and Minner (2018)). Chance-constrained programming is an optimisation method in which the constraints in the probability distribution must be satisfied. This method has practical applicability in SC &L (Jiao et al., 2018). In robust optimisation, the uncertainty sets (the set of uncertain parameters) must be specified the in case of data sets. Therefore, in order address uncertainty in the SC &L area, this method seems to be more efficient. As in the SC &L, we are mostly facing uncertain data (Gao et al., 2019). In scenario-based optimisation, uncertainty scenarios are used to find an optimal solution. In our selected studies, there was no study using this method. It seems that this method has research potential, as long as the scenarios are created as a set of data, and the scenario-based DDO methods are applied especially in risk management (Baryannis et al., 2019a).

Considering that BDA applications to SC &L are still in the process of development, employing BDA techniques (e.g., cloud computing or parallel computing) or tools (e.g., Hadoop, Spark, or Map-Reduce solutions) can be sonsidered as important future directions for using DDO methods in decision-making (Ning & You, 2019). Big data-driven optimisation (BDDO) methods, which are a combination of methods dealing with big data and techniques employing DDO, could be of interest in terms of solving several problems in SC &L.

5.2 SC &L processes and decision levels

The framework used in our study revealed the contributions of DS &BDA in SC &L processes. The material evaluation from the SC &L process point-of-view shows that the two processes of distribution and procurement are discussed less often in all three hierarchical levels of decision-making in the SC &L. While the SC is a set of hierarchical processes, and the decisions at each level are influenced by the ones from other levels and processes (Stadtler & Kilger, 2002), more attention can be given to distribution and procurement decisions, especially at the strategic level.

In the process of procurement, most of the studies focused on mid-term decisions (such as order allocation decisions (Kaur & Singh, 2018), supplier risk management (Brintrup et al., 2018), MRP (Zhao & You, 2019), and so forth), while short-term decisions in this process (e.g., ordering materials (Vieira et al., 2019b)) have received the least amount of attention.

Short-term decisions in the production process, such as lot sizing (Gao et al., 2019) and machine scheduling (Simkoff & Baldea, 2019) decisions, have received less attention compared to strategic decisions. In the process of distribution, warehouse capacity planning (Oh & Jeong, 2019) and inventory turnover (Zhang et al., 2019) decisions have been partially ignored representing a visible research gap. For example, capacity design (Gao et al., 2019) requires more attention in the domain of logistics processes. Shipment size planning (Piendl et al., 2019) has been identified as one of the most important decisions. In sales processes, customer feedback (Hao et al., 2021) is crucial in determining organisation strategies; however, this field has not received enough attention so far.

5.3 DS &BDA approaches, techniques, and tools

Our results demonstrate that a wide range of models and techniques can be used in the SC &L area. Nevertheless, some techniques are employed less. For instance, OLAP is a powerful technique behind many BI software solutions, but it is rarely employed in the models. OLAP is applied for the processing of multidimensional data or data collected from different databases, which are routine issues in the SC &L area. As another example, in data clustering, some other clustering techniques such as fuzzy k-modes, k-medoid and fuzzy c-means are used less in the reviewed articles. For instance, Kuvvetli and Firuzan (2019) apply k-means clustering to classify the number of traffic accidents in urban public transportation. However, the model is not examined by other clustering techniques such as the k-medoid or fuzzy c-means to ensure that the selected clustering technique is more accurate or efficient than the others.

Our study on the types of analytics indicates that the predictive analytics approach has attracted more attention. Nevertheless, using this approach has its own challenges. Executing predictive analytics techniques is time-consuming and requires iterative stages of testing, adopting, and resulting (Arunachalam et al., 2018). The majority of the studies have not discussed these challenges. Machine learning is one of the efficient methods of AI for analysing and learning data. Some articles have used machine learning methods, but only in the context of “AI”, which can be used with a wider range of techniques (Li et al., 2021a).

Among the machine learning techniques used in the examined literature, deep learning (Punia et al., 2020; Kilimci et al., 2019) and ensemble learning (Zhu et al., 2019b) techniques have received very limited attention, while these techniques increase the ideal prediction accuracy (Baryannis et al., 2019a). Moreover, “transfer learning” and “reinforcement learning” have not been employed in the examined literature. These methods enhance neural networks and deep learning techniques.

5.4 Big data analytics (BDA)

Although some scholars have argued in favor of BDA approaches, they have not fully addressed BDA challenges such as generation, integration, and BDA techniques (Arunachalam et al., 2018; Novais et al., 2019). Among the 227 examined articles, 107 articles used the buzzword “Big Data” in their publications, but a few of them (we found 13 articles) focused on big data characteristics, techniques, and architectures. Therefore, we suppose the the rest probably used large data sets, but not necessarily big data. Considering the special characteristics of big data, it is required that the studies on BDA unequivocally and practically refer to big data techniques (Chen et al., 2014; Grover & Kar, 2017; Arunachalam et al., 2018; Brinch, 2018).

Since big data in SC &L can be generated from various SC processes and from different data collection resources (such as GPS, sensors, and multimedia tools), extracting knowledge from various types of the data is another concern in BDA. The diversity of the data is anticipated to increase in the future Baryannis et al. (2019b); thus, integration in data analysis is an important debate in BDA. It is expected that researchers will considerably focus on data integration in the future.

BDA implementation, like other analytical tools and types of process monitoring, is time-consuming and requires management commitment. Executive BDA challenges, such as strategic management, business process management, knowledge management and performance measurement, need to be reviewed and analysed (Brinch, 2018; Choi et al., 2018; Kamble & Gunasekaran, 2020). Moreover, instead of focusing on some limited performance metrics, the key performance indicators of an SC &L company, such as financial or profitability indicators, must be monitored for proper BDA implementation. In the future, with the development of BDA techniques, such as the proposed BDDO techniques, some prescriptive analytics approaches will become more preferred (Arunachalam et al., 2018).

5.5 Data collection and generation

Unstructured data, such as the data extracted from social media and websites, are great sources of data acquisition that seem to be ignored in the SC &L literature. This type of data should be considered more in the future. Besides,in order to extract more value from DS &BDA approaches, real-time data is much more reliable than historical data because it can better describe SC behaviour (Nguyen et al., 2018). Therefore, SC &L companies should rapidly employ analytics with real-time processing tools. IOT, RFID, and sensor devices are technologies that facilitate real-time recognition (Zhu, 2018; Zhong et al., 2016), and it is suggested that these tools be used in any of the real-time processes in SC &L. A special role in this area will be played by digital twins and associated technologies for real-time data collection such as 5 G (Ivanov et al., 2021a; Choi et al., 2022; Ivanov & Dolgui, 2021a; Dolgui & Ivanov, 2022; Ivanov et al., 2022).

5.6 SC design

Analysis of DS &BDA models indicated a few papers considering not only efficient but also sustainable and resilient network designs. Table 6 illustrates that there is a large gap in literature for considering DS &BDA concepts in resilient SCs. Belhadi et al. (2022) confirm that the COVID-19 pandemic made the SCs focus on resilient principles. The authors affirm that DS &BDA techniques highly support SC resilient strategies.

Our observation for employing the DS &BDA techniques in sustainable SCs reveals another future direction for SC &L research. We realised that only 5% of the studies consider sustainability concepts in their models. Although considering the environmental and human impacts on SC design is a contemporary subject for SCs, Tsolakis et al. (2021) acknowledge that Industry 4.0 and the Internet of Things necessitate the applications of DS &BDA techniques but with deliberating social and environmental aspects in line with SCs’ progress. The authors confirm that the recent extant literature has not adequately covered the sustainability implications of DS &BDA innovations.

The closed-loop SC and reverse logistics are also among the rare design configurations for DS &BDA models. Govindan and Gholizadeh (2021) concede that the analysis of the processes in a closed-loop SC requires big data and once the sustainability and resilient features are combined to the model, a BDA model is capable of addressing the proposed problems in such types of SCs. This means that the volume, velocity, and variety of the input data should be considered in the models.

5.7 COVID-19 and pandemic

Since 2020, COVID-19 pandemic has posed significant challenges for SCs. Different SC echelons have collaborated under deep uncertainty. Academic research introduced some new models and frameworks (Ivanov & Dolgui, 2021b; Ivanov, 2021c; Ardolino et al., 2022). We identified several studies within this research stream in our selected data set. For instance, Barnes et al. (2021) study consumer behaviour in pandemic and named it as “panic buying”. Using big data of social media, the authors apply text mining with compensatory control theory to demonstrate early warning of potential demand problems. Nikolopoulos et al. (2021) study forecasting and planning during a pandemic using nearest neighbours clustering method. They use Google trends data to predict COVID-19 growth rates and model excess demand of products.

One of the central questions regarding the pandemic is how to design a pandemic-resilient SC (Ivanov & Dolgui, 2020; Nikolopoulos et al., 2021; Ivanov & Dolgui, 2021a; Ivanov, 2021a; Choi et al., 2022) and how to adapt to “new normal” (Bag et al., 2022a; Ivanov, 2021b). By emphasising the role of BDA in SC &L, Belhadi et al. (2022) examine the effect of COVID-19 outbreak on manufacturing and service SC resilience. Kar et al. (2022) investigate fake news on consumer buying behavior during pandemic and focus on the effect of resultant fear on hoarding of necessary products. SC performance in COVID-19 era is also investigated by researchers through BDA (Li et al., 2022b; Rozhkov et al., 2022). Although several studies contributed in the area of using DS &BDA approaches, the literature needs a dedicated survey study similar to (Ardolino et al., 2021; Queiroz et al., 2022). Novel contributions in this area can be done with BDA and DS applications in the context of SC viability and Industry 5.0 (Ivanov, 2023; Ivanov & Keskin, 2023).

6 Conclusion and research directions

In this study, we proposed a literature review methodology and a holistic conceptual framework for classifying the applications of DS &BDA in SC &L. An investigation of the relevant review studies illustrated several gaps in former studies, which motivated us to focus on a conceptual framework for our reviewing process. Our broad keyword search initially found a large variety of papers published from 2005 to 2022. Employing a detailed review protocol and process, we selected 364 publications from highly ranked journals. We also focused on studies using DS &BDA modelling methods for solving SC &L problems. We revealed the contributions of DS &BDA in SC &L processes and highlighted the potential for future studies in each SC &L process. We also indicated the effective and bold role of DS &BDA applications/techniques in triple hierarchical decision levels. Three main types of analytics were used to categorise DS &BDA techniques and tools. The overall results indicated that the predictive approach was the most popular one. However, with the development of BDA techniques and the DDO approaches in the future, the prescriptive approach is likely to become more attractive. We also emphasised the deployment of effective deep learning, ensemble learning, and machine learning techniques in SCs. In the area of SC design, we proposed a structured and unbiased review on the DS &BDA application areas in the SC &L comprehensively covering efficiency, resilience and sustainability paradigms.

Limitations exist as with any study. Although we conducted a systematic literature review, the selected papers were restricted due to our proposed inclusion and exclusion criteria. We tried to include all relevant papers and selected highly ranked journals to increase the quality of the research. Nevertheless, a larger data set using computer science/engineering conferences and journals may result in a better exploration of the literature. This will reduce the echo chamber effect of citations in which a specific subset of journals keep citing each other and find each other worthy. The proposed conceptual framework may need to be extended, especially in the case of prescriptive analytics approaches. Also, we may be prejudiced in our interpretation of the literature. The material collection process showed that studies on the topic of DS &BDA in SC &L are substantially growing. Therefore, annual survey studies on this topic (with a broad range of keywords) are suggested for future research. Furthermore, any of the main approaches in DS &BDA applications (such as clustering, classification, simulation, text mining, or time series analysis) can be investigated separately in SC &L.