1 Introduction

Life Cycle Assessment (LCA) is a set of procedures for collecting and assessing the material and energy inputs and outputs of a system or product, together with the resulting environmental impacts, throughout that entity’s life cycle (ISO 14040.2 Draft). LCA provides a framework comprising the definition of the goal and scope of the assessment, the analysis of the inventory (LCI, life cycle inventory), the assessment of the impact (LCIA, life cycle impact assessment), and finally, the interpretation of the results of these procedures (Guinee 2002). The purpose, the entities (systems, products) and the degree of sophistication are defined in the goal and scope definition step. The LCI is the step in which the system boundaries are defined; its key outcome is the inventory, which collates the inputs from and outputs to the environment. The LCIA relates the inventory to the impact categories according to its relevance, and quantifies the impact through weighting and normalization. The interpretation is the final step, in which the results from the LCIA are evaluated and used to make recommendations (Guinee 2002). LCA is a vital instrument to help reduce the overall environmental burden and provide insights into upstream and downstream trade-offs associated with environmental pressures, health and wellbeing, and the consumption of natural resources. As such, LCA can inform policy-making by providing valuable information on environmental performance, thus contributing to performance targets within the Environmental Technology Action Plan (ETAP), for Energy-using Products within the EuP Directive, in green public procurement (GPP), and in Environmental Product Declarations (EPDs).

In addition, the recent special report on the impacts of global warming of \(1.5^{\circ }\)C was yet another call to implement measures to mitigate GHG emissions and to devise new adaptation scenarios (IPCC 2021; Sala et al. 2021). In this context, LCA helps quantify the environmental pressures, the trade-offs, and the areas for improvement across the entire life cycle of built assets, from design to recycling. However, current approaches to LCA do not consistently factor in (in both the foreground and background inventory systems) life cycle variations in (a) building usage, (b) energy supply (including from renewable sources), and (c) building and environmental regulations, as well as other changes over the building/district lifetime (Anand and Amor 2017; Bueno et al. 2016; Skaar and Jørgensen 2013). These other changes include (a) a change in the energy mix of a building/district, or the upgrading/retrofitting of the energy system(s) in place; and (b) an increase in energy demand over the lifetime of a building for a wide range of reasons, including changes in occupancy patterns.

LCA is therefore an important instrument to help reduce the overall environmental burden of buildings and provide insights into the upstream and downstream trade-offs that are associated with environmental pressures, health and wellbeing, and the consumption of natural resources. It can also inform policymaking by providing valuable information on the environmental performance of built assets. However, current LCA methods and tools face several limitations and challenges, including: (a) site-specific considerations (Bueno et al. 2016): several local impacts, such as the microclimate, need to be considered in building assessments; (b) model complexity (Anand and Amor 2017): buildings involve a wide range of materials and products that interact as part of a complex assembly or system; (c) scenario uncertainty (Anand and Amor 2017; Bueno et al. 2016): the long use phase of buildings, including the potential for future renovation, poses uncertainty problems in LCA that are not currently addressed; (d) health and wellbeing (Bueno et al. 2016; Skaar and Jørgensen 2013): traditional LCA methodologies do not address indoor and outdoor environmental impacts on health and wellbeing; (e) recycled material data (Anand and Amor 2017; Negishi et al. 2018): data on the use of waste and recycled materials as new building materials are lacking; and (f) lack of consideration for social and economic aspects (Anand and Amor 2017; Negishi et al. 2018).

The sheer number of input parameters that contribute to the full life cycle, together with their uncertainties, makes a broader application of ML complex and difficult to achieve. This motivates a Cartesian, i.e., “divide and conquer”, or systems engineering approach, whereby the strategy to reduce and mitigate the environmental impact of a complex artefact, in our case a built asset, is divided into an ensemble of discrete and manageable scenarios, such as optimizing the energy mix of an energy system. By addressing these discrete scenarios in isolation using ML, a broader reduction of environmental impacts via LCA becomes feasible.

ML methods are a subset of Artificial Intelligence (AI) methods that learn from data and improve their accuracy without needing to be reprogrammed. An ML algorithm builds a model that finds patterns by studying a set of training data, without human involvement (Mitchell 1997). ML algorithms are typically categorized into four groups: supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning, as shown in Fig. 1.

Fig. 1 Various types of ML techniques

Areas of data science, including ML, are presently used to fill gaps in data for LCA. They have also been used to develop accessory tools for LCA that can model and predict a product’s environmental impact based on information from the design phase. ML can be integrated as a real-time algorithm that assesses production or changes in processes and responds with potential alternatives for less environmentally impactful production. ML approaches have been applied to different areas of LCA, including the prediction of missing data, the direct and indirect forecasting of impact parameters, and optimization. ML methods have also been used to overcome incompleteness or uncertainty in data to deliver actionable recommendations for the LCA (Algren et al. 2021). One potential advantage of ML in LCA is that it can reduce the cost of data collection: the most informative attributes can be identified, and data collection can focus on them while ignoring other attributes that may not contribute significantly to the model’s accuracy. Where data are missing in LCA, ML can help predict the missing values, improving the data available for the assessment. Depending on the type of ML, various statistics and visualizations can be used to evaluate the predicted data, including accuracy, the confusion matrix, the receiver operating characteristic curve, cluster distortion, and mean squared error.
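As an illustration of predicting missing inventory values and scoring the prediction with mean squared error, the following is a minimal sketch only. It assumes a k-nearest-neighbours estimator, which is one of many possible choices and is not taken from the reviewed studies; the function names, the column layout and all numbers are hypothetical.

```python
from math import sqrt

def knn_impute(rows, target_idx, k=2):
    """Fill None entries in column target_idx with the mean of the
    k nearest complete rows (distance ignores the target column)."""
    complete = [r for r in rows if r[target_idx] is not None]
    filled = []
    for r in rows:
        if r[target_idx] is not None:
            filled.append(list(r))
            continue

        def dist(other, r=r):
            # Euclidean distance over every column except the missing one.
            return sqrt(sum((a - b) ** 2
                            for i, (a, b) in enumerate(zip(r, other))
                            if i != target_idx))

        nearest = sorted(complete, key=dist)[:k]
        estimate = sum(n[target_idx] for n in nearest) / len(nearest)
        new_row = list(r)
        new_row[target_idx] = estimate
        filled.append(new_row)
    return filled

def mean_squared_error(y_true, y_pred):
    """Average squared difference between reference and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical inventory rows: [mass_kg, energy_MJ, gwp_kgCO2e]
inventory = [
    [1.0, 10.0, 2.0],
    [2.0, 20.0, 4.0],
    [3.0, 30.0, 6.0],
    [2.1, 21.0, None],  # GWP value is missing
]
completed = knn_impute(inventory, target_idx=2, k=2)
imputed_gwp = completed[3][2]

# If a reference value later becomes available (assumed here to be 4.2),
# the imputation can be scored with mean squared error.
error = mean_squared_error([4.2], [imputed_gwp])
```

In practice the estimator would be evaluated on held-out records with known values before being used to fill real gaps.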

ML thrives in applications where there is a requirement to solve mathematical models accurately and efficiently. Consequently, it can be adapted to provide ideas or methods for an optimization process. It can be implemented as part of a real-time decision-making process in which potential improvements in the performance of a system throughout its life cycle are identified, after which optimization methods can be applied to the process. This makes it particularly useful in the design process rather than across the entire LCA. This study shows that ML can be coupled with standard optimization methods to increase their capability of quickly exploring promising regions. Figure 2 provides the standard ML and LCA deployment processes which should be considered in the investigation of ML methods in LCA.

Fig. 2 ML and LCA deployment processes
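The coupling of ML with an optimization method can be sketched as a surrogate-assisted search, in which a cheap learned model screens many candidate designs and only the most promising one per round is passed to the expensive model. This is a hedged illustration rather than a method from the reviewed literature: `expensive_impact` is a hypothetical stand-in for an LCA simulation, and the nearest-neighbour surrogate and random candidate sampling are assumptions chosen for brevity.

```python
import random

def expensive_impact(x):
    """Hypothetical stand-in for a slow LCA simulation of one design variable."""
    return (x - 0.3) ** 2 + 0.05

def surrogate_predict(samples, x, k=3):
    """Nearest-neighbour surrogate: average impact of the k closest evaluated designs."""
    nearest = sorted(samples, key=lambda s: abs(s[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def surrogate_assisted_search(n_initial=8, n_rounds=5, n_candidates=200, seed=0):
    rng = random.Random(seed)
    # Initial designs are evaluated with the expensive model.
    samples = []
    for _ in range(n_initial):
        x = rng.random()
        samples.append((x, expensive_impact(x)))
    for _ in range(n_rounds):
        candidates = [rng.random() for _ in range(n_candidates)]
        # The surrogate screens all candidates cheaply ...
        promising = min(candidates, key=lambda c: surrogate_predict(samples, c))
        # ... and only the most promising one is evaluated expensively.
        samples.append((promising, expensive_impact(promising)))
    best = min(samples, key=lambda s: s[1])
    return best, len(samples)

best, n_evaluations = surrogate_assisted_search()
```

With these settings the expensive model is called 13 times, although 1000 candidate designs are screened by the surrogate.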

The paper reviews the application of ML to LCA with a focus on Buildings, Districts and Cities, while also including several useful related applications described under a “Miscellaneous” heading. A plethora of studies has explored the application of LCA in buildings with most studies focusing on energy use and GHG emissions (Asif 2019; Elkhayat et al. 2020; Lyu and Chow 2020). However, the literature thus far lacks a comprehensive review of the different applications of ML in LCA, the trends in current practices as well as some of the gaps in research. This review aims to address this by proposing and answering research questions which will establish the current practices and future works which are required in this field.

1.1 Goal and scope

This paper aims to investigate the role of ML methods in LCA across three levels:

  • Buildings

  • Districts and cities

  • Miscellaneous

As such, our review focuses on built assets considered (a) in isolation or (b) within a district or wider city level. Built assets can be of any type, including residential, public, or industrial. At the districts and cities level, the role of ML in human-made structures such as roads, pavements, bridges, parks and railways is investigated, informed by the literature. ML in other related fields, such as chemicals, agriculture, and products, is considered and reported at the “Miscellaneous” level. As to the investigated LCA requirements, this review adopts an exploratory approach in that it reports use cases involving the application of ML in buildings and wider districts. As such, a bottom-up approach, driven by applications of ML in LCA, has helped identify the most common requirements addressed in the literature. It is worth noting that environmental certification schemes, such as BREEAM, are not considered in this paper. When ML is used at the districts level, the building type can be included in the model as a categorical attribute, which helps capture the differences between building types within the district. This attribute has to be tested to determine how much information (accuracy, for example) it brings to the model.
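The test of how much information a categorical building-type attribute brings can be sketched as follows. This is a minimal illustration under stated assumptions: the data are invented, the attribute is one-hot encoded, and the model with the attribute is a least-squares fit on the one-hot indicators alone, which reduces to the per-type mean. A real study would use additional attributes and a larger held-out set.

```python
from collections import defaultdict

def one_hot(value, categories):
    """Encode a categorical value as a 0/1 indicator vector."""
    return [1 if value == c else 0 for c in categories]

def fit_group_means(rows):
    """A least-squares fit on one-hot indicators alone yields per-category means."""
    sums, counts = defaultdict(float), defaultdict(int)
    for building_type, impact in rows:
        sums[building_type] += impact
        counts[building_type] += 1
    return {t: sums[t] / counts[t] for t in sums}

def mse(pairs):
    """Mean squared error over (observed, predicted) pairs."""
    return sum((y - p) ** 2 for y, p in pairs) / len(pairs)

# Hypothetical (building_type, annual impact) records for one district.
train = [("residential", 40.0), ("residential", 44.0),
         ("public", 70.0), ("public", 74.0),
         ("industrial", 120.0), ("industrial", 124.0)]
test = [("residential", 42.0), ("public", 72.0), ("industrial", 122.0)]

categories = sorted({t for t, _ in train})
encoded_public = one_hot("public", categories)

# Baseline without the attribute: predict the overall training mean.
global_mean = sum(y for _, y in train) / len(train)
# Model with the attribute: predict the mean of the matching building type.
type_means = fit_group_means(train)

mse_without = mse([(y, global_mean) for _, y in test])
mse_with = mse([(y, type_means[t]) for t, y in test])
```

A lower held-out error with the attribute than without it indicates that the building type carries useful information for the district-level model.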

In this study, fundamental limitations and challenges faced by current ML methods in LCA, applications, motivations, constraints and their role in predictions and optimizations are considered (Fig. 3).

Fig. 3 The applications, motivations, constraints and ML methods in LCA that are considered in this review

The main contribution of this paper is a literature survey that determines how ML techniques are used for LCA by answering the following research questions:

  1. How has ML been used in LCA?

  2. What is the role and efficacy of ML methods in optimization in LCA?

  3. Can ML methods integrate and contextualize existing inventory databases to provide a sound basis to streamline the LCA?

  4. What are the gaps in research in order to guide future research for ML in LCA?

To answer these questions, LCA is explored and the current state of the art reported in the literature is identified. ML techniques tailored to LCA, and specific AI techniques that can advance the establishment of LCA and its delivery through smart technology, are investigated. Gaps in research are then identified in order to guide future research for ML in LCA.

The contents of this paper are organized as follows: Sect. 2 lays out the methodology for identifying and including studies for the review. Section 3 discusses research and provides an overview of ML methods in LCA. Section 4 addresses ML and optimization in LCA. The results and discussion are presented in Sect. 5. Finally, the findings are evaluated and concluded in Sect. 6.

2 Methodology

A literature review on the application of ML in LCA was performed, and 81 relevant studies were analysed according to the research questions. The review presented here aims to identify, evaluate and interpret all available research relevant to LCA using ML models. This section outlines the process for selecting the included papers. The methodology comprised five phases.

Planning phase

In this phase, the scope, the research questions and the databases were determined. Google Scholar, Scopus and Web of Science were chosen as the search databases. Citavi (Swiss Academic Software GmbH 2021) was used for managing the collected references because of its broad functionality. The publication years of the studies were restricted to the period between 2000 and 2021.

Search phase

In this step, the search process was developed to select appropriate studies. After defining the research questions in the planning phase, the main terms were defined. Similar terms or interchangeable terms were identified and connected using Boolean OR and AND operators. Table 1 shows the search terms used.

Table 1 Search terms
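The way interchangeable terms are joined with OR and the resulting groups with AND can be sketched as below. The helper name and the term groups are hypothetical placeholders, not the actual search terms used in the review, and database-specific query syntax is not covered.

```python
def build_query(term_groups):
    """Join synonyms with OR inside each group, then join the groups with AND."""
    grouped = ["(" + " OR ".join(f'"{term}"' for term in group) + ")"
               for group in term_groups]
    return " AND ".join(grouped)

# Hypothetical term groups, used only to illustrate the structure.
term_groups = [
    ["machine learning", "neural network"],
    ["life cycle assessment", "LCA"],
]
query = build_query(term_groups)
print(query)
# prints ("machine learning" OR "neural network") AND ("life cycle assessment" OR "LCA")
```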

Filtering phase

First, the papers were screened by title and abstract, and the following inclusion criteria were applied:

  1. Language: English

  2. Document types: only full-text conference papers, journal papers, or books

  3. Time interval: the publication years of the selected primary studies are between 2000 and 2021, to narrow the results to those most relevant to current practices in the field of LCA.

After removing the duplicates, the papers selected through abstract screening were reviewed in full, and those that did not consider ML techniques in LCA or provide primary research findings on this topic were excluded. In the next step, the relevance of each paper was determined from its introduction and its conclusion/discussion. In total, this yielded 81 primary research papers. These references were imported into our reference manager Citavi.
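The mechanical part of the filtering phase (duplicate removal and the three inclusion criteria) can be sketched as a simple screening function. This is an assumption-laden illustration: the `Record` fields, the title-based duplicate key and the example records are hypothetical, and the later relevance judgement on full texts is a manual step that the sketch does not attempt to automate.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    title: str
    year: int
    language: str
    doc_type: str

ALLOWED_TYPES = {"journal", "conference", "book"}

def screen(records):
    """Drop duplicates (by normalized title), then apply the inclusion criteria."""
    seen, kept = set(), []
    for record in records:
        key = record.title.strip().lower()
        if key in seen:
            continue  # duplicate of an earlier record
        seen.add(key)
        if (record.language == "English"
                and record.doc_type in ALLOWED_TYPES
                and 2000 <= record.year <= 2021):
            kept.append(record)
    return kept

# Hypothetical search results.
records = [
    Record("ANN for building LCA", 2019, "English", "journal"),
    Record("ann for building lca ", 2019, "English", "journal"),   # duplicate
    Record("Early expert systems", 1995, "English", "journal"),    # outside interval
    Record("SVM in der Oekobilanz", 2015, "German", "conference"), # not English
    Record("ML in LCA", 2018, "English", "thesis"),                # excluded type
]
selected = screen(records)
```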

Evaluation phase

In this phase, the articles were assessed for their quality and impact. Three main points were considered for this phase:

  1. Is the methodology clear?

  2. Are results provided in full?

  3. Is the paper relevant to the research questions of this review?

Finally, a decision was made regarding the inclusion of each paper in the full review. Some papers may have been included for context or interest despite a lack of methodology.

Extraction phase

The collected references were managed using Citavi. For each selected paper, relevant information was collected and a justification for each inclusion was noted. Each paper was then analysed and the following information was extracted and recorded: the model used, the optimal model found by the authors, the type of application that ML was targeting in the paper, and finally, the scale at which LCA was applied in this paper.
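The information extracted per paper can be represented as a small record type, sketched below. The field names and the "prediction" application labels are assumptions made for illustration; the three example entries only restate details given in this review's own text (the optimal model is left empty where the text does not state one).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class ExtractedStudy:
    citation: str
    models_used: Tuple[str, ...]
    optimal_model: Optional[str]  # None when no optimal model is reported
    application: str
    scale: str                    # Buildings, Districts and cities, or Miscellaneous

studies = [
    ExtractedStudy("Duprez et al. 2019", ("MLR", "SVR", "ANN"), "ANN",
                   "prediction", "Buildings"),
    ExtractedStudy("Perrotta 2017", ("SVM", "RF", "ANN"), None,
                   "prediction", "Districts and cities"),
    ExtractedStudy("Park and Seo 2003", ("BPNN", "multiple regression"), "BPNN",
                   "prediction", "Miscellaneous"),
]

# Simple tallies that support a trend analysis of the reviewed literature.
studies_per_scale = Counter(s.scale for s in studies)
model_usage = Counter(m for s in studies for m in s.models_used)
```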

3 ML methods in LCA

In this section, related works on ML methods and their motivation are presented. For each ML method, the studied levels are set in bold.

Luque et al. presented a conceptual framework for the integration of AI and LCA. Their study highlighted the relevance of sensing when pursuing an objective of intelligent sustainability in engineering projects (Luque et al. 2020). Adedeji et al. presented a roadmap for using AI techniques in LCI. The data chain for efficient resident data availability for LCA studies was considered with a focus on AI integration, and a framework for using AI in LCI was developed (Adedeji et al. 2020).

At the buildings level, the combined use of ML and LCA may make it possible to significantly reduce environmental impacts (Barros and Ruschel 2021). D’Amico et al. employed ML methods in civil and structural engineering in order to reduce building impacts (D’Amico et al. 2019b). Barros and Ruschel performed a systematic literature review of the scientific research conducted for the architecture, engineering and construction industries in the context of LCA and ML (Barros and Ruschel 2021). They showed that the most investigated environmental indicators were energy consumption and Global Warming Potential (GWP). Significantly, they found that ML was predominantly used for prediction. Frömelt et al. assessed environmental profiles for individual households using a regionalized bottom-up model created with ML techniques (Frömelt et al. 2020). At the districts and cities level, Manfren et al. presented a review of modelling tools for identifying optimal solutions for district-wide energy systems. They introduced a framework for the key concepts of a local energy management system in an urban area. This framework has a multicriteria perspective and uses ML to find optimal solutions for providing energy services through distributed generation (Manfren et al. 2011). Furthermore, DeRousseau et al. examined the various problem formulations that are commonly seen in the field of concrete mixture design optimization and that can necessitate models based on linear combination, statistics, ML, and physics (DeRousseau et al. 2018). At the miscellaneous level of production, ML algorithms can help reduce GHG emissions through geographically differentiated and contextualized design measures; however, they are still underutilized for such applications (Milojevic-Dupont and Creutzig 2021). Kurdi et al. reviewed methods for simulation in tribology to model tribo-contact scenarios and investigated LCA with simulation combined with ML (Kurdi et al. 2020). Wu and Wang reviewed ML methods applied to toxicity prediction and discussed the input parameters of ML algorithms that enhance prediction accuracy (Wu and Wang 2018). Gust et al. demonstrated that, in toxicological and regulatory assessment for novel materials where fewer characterization data are available, probabilistic quantitative adverse outcome pathways can be leveraged using supervised ML models (Gust et al. 2015). In later sections, we discuss the most commonly used ML techniques in LCA.

3.1 Neural networks

Artificial Neural Networks (ANNs), also known as Neural Networks (NNs) or simulated neural networks (SNNs), are a subset of ML and are at the heart of deep-learning algorithms. Their name and structure are inspired by the human brain, mimicking how biological neurons signal to one another (Livingstone 2008). ANNs are favourable as they overcome some limitations commonly seen with traditional software, such as the collection of environmental and energy data, the physical problem and software language, long computational time, and the need to calibrate a model. Consequently, ANN models can provide a more reliable decision support tool for engineers and architects, reducing uncertainties in the LCA field. Furthermore, the implementation of ANN in software can accommodate the development of an appropriate decision support tool. Thus, ML algorithms and techniques may be capable of increasing accuracy in LCA and reducing the simulation time (Sharif and Hammad 2019; Barros and Ruschel 2021; D’Amico et al. 2019a). However, the validity of the NN solution depends directly and strongly on the reliability of the database, which tends to be the most difficult part to implement. Ziyadi and Al-Qadi implemented quantitative uncertainty analysis methods to characterize and quantify uncertainties in a Life Cycle Inventory Analysis (LCIA) model. An ANN model was trained and tested to propagate input variability through a system using interval analysis. Monte Carlo sampling was then used to propagate input uncertainty directly and was compared to an indirect nonlinear optimization method that tries to maximize the output range (Ziyadi and Al-Qadi 2019; Barros and Ruschel 2021). At the buildings level, ANN was mainly used for optimizing building performance and for predicting energy consumption and GWP. It was suggested that advances in LCA and ML could help calculate and analyze building environmental indicators and develop and improve LCA methods. 
Shi and Xu presented a systematic LCA method to analyze the environmental performance of construction materials. Furthermore, a back-propagation neural network (BPNN) and the hybrid genetic algorithm and back-propagation (GA-BP) algorithm were introduced to evaluate building materials. Compared with BPNN, the hybrid GA-BP algorithm was shown to be better suited to the environmental selection of construction materials and to have greater precision (Shi and Xu 2009). D’Amico et al. used ANN to simultaneously solve the energy and environmental balance along the building life cycle. The authors developed a decision support tool that quickly and reliably determines buildings’ performance with minimum effort. The combination of reliable data and ML contributes significantly to increasing the speed and accuracy of LCA (Barros and Ruschel 2021; D’Amico et al. 2019a). The results showed that ANN helps predict energy demand and building LCA (D’Amico et al. 2019a). Considering the importance of the design phase to carbon emissions during a building’s life cycle, Xikai et al. presented a regression model of carbon emissions based on design factors. To determine the design factors for a predictive model, a Multilayer Perceptron (MLP) was used to develop regression models (Xikai et al. 2019). Sharif and Hammad proposed an ANN model to obtain complex data generated from a simulation-based multiobjective optimization model. This model aimed to predict energy consumption in order to improve buildings’ energy performance, a critical element of building energy conservation. The outcome of this study showed that the proposed ANN models could efficiently predict the LCA for whole-building renovation scenarios considering the building envelope, HVAC, and lighting systems (Sharif and Hammad 2019). Also, Sharif proposed a simulation-based multiobjective optimization model for optimizing the selection of renovation scenarios for existing buildings by minimizing total energy consumption (TEC) considering LCA. 
He developed a surrogate ANN for selecting near-optimal building energy renovation methods, and developed deep ML models to generate renovation scenarios considering TEC (Arani 2020). In building construction, the embodied energy of the materials in the main categories, such as wood, cement and plastic, together with the materials that release less energy, is provided as input data to the NN (Mukherjeea et al. 2019). Płoszaj-Mazurek et al. showed the relationships between the parameters of buildings and the possibility of introducing Carbon Footprint estimation and implementing building optimization at the initial design stage. They used Convolutional Neural Networks (CNN) to analyze an image of the urban layout and consider its influence on the building’s Total Carbon Footprint (Ploszaj-Mazurek et al. 2020). Azari et al. investigated the ideal building envelope design using a multiobjective optimization algorithm, based on an office building’s energy use and life cycle environmental impacts. The input variables for design were insulation material, window type, window frame material, wall thermal resistance and south and north window-to-wall ratios. The optimal combination of these variables was sought to design a building with the smallest possible operational energy use and environmental impact. The eQuest 3.65 simulation tool was used to calculate operational energy use, and Athena IE was used to estimate the LCA. In addition, an ANN and genetic algorithm (GA) approach was implemented to generate further combinations and find the ideal design iteration. The environmental impact categories included global warming, acidification, eutrophication, formation of air pollution, and ozone depletion (Azari et al. 2016; Barros and Ruschel 2021). Xia and Liu 
introduced a green building assessment index, developed using the life cycle theory and a back-propagation neural network (BPNN), through a Chinese and international building classification system. The assessment index was intended for scientific assessment as the basis for choosing the best plan for green building systems (Xia and Liu 2013; Barros and Ruschel 2021). Oduyemi et al. produced an ANN model for estimating the operation and maintenance costs of buildings (Oduyemi et al. 2015). Life cycle cost analysis (LCCA) compares different design elements, specifications, and materials based on the installation, operation, maintenance and residual costs to evaluate the total life cost of construction. Alqahtani and Whyte used ANNs to develop a framework for the LCCA of construction projects. The framework estimates the entire cost of construction and uses cost-significant items to find the main cost contributions affecting the accuracy of estimation (Alqahtani and Whyte 2013). Wang and Shen created a stochastic Markov model to increase the accuracy of life cycle energy consumption forecasting by including longitudinal uncertainties in building conditions, degree days, and useful life. The Markov building deterioration model was developed using historical data from similar situations and was used to predict the building’s useful life and expected condition at any given time. Building deterioration and temperature changes were used to simulate the yearly variation in energy consumption, and annual energy consumption was estimated from the available data set using an NN. The proposed stochastic model yields a narrower distribution that is similar to measured data. This implies that the longitudinal uncertainty in the thermal condition of the building and in the temperature can account for some of the uncertainty in the variation of the energy performance (Wang and Shen 2013; Barros and Ruschel 2021). Duprez et al. 
developed a technique using ML for predicting GWP of building design alternatives with a high coefficient of determination. The original model was compared to three metamodels, Multiple Linear Regression (MLR), Support Vector Regression (SVR) and ANN, to compare their ability to estimate GWP accurately. The authors concluded that ANN offered better results than MLR and SVR (Duprez et al. 2019).
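The coefficient of determination used to compare metamodels is \(R^{2} = 1 - SS_{res}/SS_{tot}\), where \(SS_{res}\) is the sum of squared residuals and \(SS_{tot}\) is the total sum of squares around the mean of the reference values. The sketch below shows how such a comparison could be computed; the reference GWP values and the metamodel predictions are invented for illustration and do not reproduce the results of Duprez et al.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical GWP values from the original (reference) model ...
reference = [10.0, 20.0, 30.0, 40.0]
# ... and hypothetical predictions from two metamodels.
predictions = {
    "MLR": [14.0, 18.0, 34.0, 38.0],
    "ANN": [11.0, 19.0, 31.0, 39.0],
}

scores = {name: r_squared(reference, pred) for name, pred in predictions.items()}
best_metamodel = max(scores, key=scores.get)
```

With these invented numbers the scores are 0.92 for MLR and 0.992 for ANN, so the ANN entry is selected; real comparisons should be made on held-out design alternatives.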

At the city level, Perrotta et al. applied the Boruta Algorithm (BA) and an NN to estimate the fuel consumption of a fleet of trucks, and thereby the emissions associated with road pavements. The authors showed that NN is appropriate for analyzing data from fleet and road asset management databases. The resulting NN model was used to estimate the impact of rolling resistance parameters (pavement roughness and macrotexture) on fuel consumption (Perrotta et al. 2018). Furthermore, Perrotta et al. used truck telematics, road geometry and condition data to investigate the prediction of fuel consumption for fleets of trucks. Three ML techniques, Support Vector Machine (SVM), Random Forest (RF) and ANN, were developed and their performance compared (Perrotta 2017).

At the miscellaneous level, Wisthoff et al. studied the relationship between product design decisions and eventual LCA. Their study developed a search tree of sustainable design knowledge for the early design phase. To assist in quantifying the impact of these design decisions, the study used an MLP method to relate the LCA of 37 case study products to product attributes, helping the designer to redesign the product to reduce its impact (Wisthoff et al. 2016). Smetana et al. analyzed the evolutionary similarities and differences between two complex modular systems, NN and blockchain technologies, and evaluated their potential for application to material flow analysis (MFA) and LCA. The authors concluded that the combination of NN and blockchain could form a more efficient system for MFA and LCA (Smetana et al. 2018). Chiang et al. introduced a design for environment methodology to evaluate derivative consumer electronic product development using a BPNN model and a technique for order preference by similarity to ideal solution (TOPSIS) method (Chiang et al. 2011). Zhu et al. presented a research framework for greening the continuous sitagliptin manufacturing process with the aid of LCA and NNs. Deep learning NN models were developed to predict LCA according to the chemicals in a database with known LCA values and corresponding molecular descriptors (Luque et al. 2020). Li et al. developed an ANN approach to estimate unknown eco-indicators for missing environmental impact information for several vital materials used in electronic products and to integrate recycling scenarios in LCA (Li et al. 2008). The results showed that the ANN-based approach was accurate enough in forecasting the missing information for these materials. Kaab et al. employed two ANNs and an adaptive neuro-fuzzy inference system (ANFIS) model for predicting LCA and output energy of sugar cane production (Kaab et al. 2019). Romeiko et al. 
presented an ANN-based model for spatially estimating the LCA of corn production at the county scale (Romeiko et al. 2020a). For the cost estimation of a product’s life cycle in the product design process, Leszczyński and Jasiński used ANNs and compared them with a parametric estimation (Leszczynski and Jasinski 2020). Marvuglia et al. developed an automatic selection strategy using combinations of a General Regression Neural Network (GRNN) and a set of linear models, based on partial least squares (PLS) regression, for USEtox factors. The authors found that the linear models have lower predictive power (for the prediction of toxicity factors) than the nonlinear GRNN model (Marvuglia et al. 2015; Barros and Ruschel 2021). Song et al. developed ANN models to estimate the LCA of chemicals in the market. Using molecular structure information, they trained multilayer ANNs for the life cycle impacts of chemicals using six impact categories. The application domain (AD) of the model, within which the model exhibits higher reliability, was estimated for each impact category (Song et al. 2017). Song also continued the attempt to harness the power of ML techniques to address the data deficiencies in LCA, developing ANN and Random Forest predictive models to estimate approximate life cycle impacts of chemicals (Song 2019). Li et al. used nine molecular fingerprints to describe pesticides and constructed binary and ternary classification models to predict the aquatic toxicity of pesticides via six machine learning methods: ANN, Naïve Bayes (NB), K-Nearest Neighbours (KNN), Classification Tree (CT), RF and SVM (Li et al. 2017). Amini Toosi et al. explored the possibility of an ANN-based LCA model for the conceptual design phase by classifying products according to their environmental and product characteristics. The product classification ultimately identified was used to create classification schemes with the C4.5 decision tree algorithm. 
An ANN-based approach with product attributes as inputs and environmental impact drivers as outputs was developed to predict the approximate LCA of grouping members. The predicted results seemed to be satisfactory (Seo et al. 2005). Cornago et al. introduced a model resembling a deep neural network (DNN) to forecast the hourly day-ahead electricity consumption in an LCA-aware scheduling system. This information allows production to be scheduled so as to minimize the LCA impacts related to electricity consumption (Cornago et al. 2020). Understanding and developing the LCA of activated carbon produced from diverse biomass feedstocks is critical for biomass screening and process optimization for sustainability, but it is time-consuming. Liao et al. addressed this problem by developing a high-accuracy ANN model and a kinetic-based process simulation to estimate primary energy consumption and GHG emissions across various woody biomass feedstocks (Liao et al. 2020). Nabavi-Pelesaraei et al. used historical data to predict future agricultural energy, and they showed that agricultural energy output and its LCA could be readily predicted by ANN (Nabavi-Pelesaraei et al. 2018). Sousa et al. proposed an ANN model using product attributes, which are characteristics of product concepts, and environmental inventory data from pre-existing LCAs. The product design team can then use the new high-level attributes to quickly obtain the LCA of a new product concept (Sousa et al. 2000). Also, Sousa and Wallace developed an ANN-based learning surrogate for the approximate LCA of product design concepts (Sousa and Wallace 2006). Kleinekorte et al. proposed a predictive LCA framework for chemicals using ANNs. The results show that the proposed ANN was able to predict whether a technology change has the potential to reduce climate change impacts (Kleinekorte et al. 2019b). 
Park and Seo proposed a BPNN model for an approximate LCA for the conceptual design phase by classifying products according to their environmental and product characteristics. For approximate LCA, the product attributes and environmental impact drivers (EID) were identified to predict the environmental impacts of products. The results showed BPNN is more accurate than multiple regression analysis in the prediction of the results of LCA (Park and Seo 2003). Milczarski et al. applied ANN to validate the production process’s quality and parameters in the food processing industry (Milczarski et al. 2020).

3.2 Support vector machines

Support vector machines (SVMs) have rarely been used in LCA. SVM is an ML algorithm based on statistical learning theory, as proposed by Vapnik. It has proven to have unique advantages when working with small samples and with nonlinear, high-dimensional pattern recognition, and it can also be applied to other ML problems such as function fitting. SVM solves an optimization problem to find the optimal classification hyperplane in a high-dimensional feature space, which allows it to classify complicated data (Cortes and Vapnik 1995).
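
As a brief formal sketch of the optimization problem mentioned above, the soft-margin formulation can be written as follows. The notation is assumed here for illustration and is not taken from the reviewed papers: training pairs \((x_i, y_i)\) with labels \(y_i \in \{-1, +1\}\), a feature map \(\phi\) into the high-dimensional space, weight vector \(w\), bias \(b\), slack variables \(\xi_i\), and a regularization constant \(C > 0\).

```latex
\begin{aligned}
\min_{w,\, b,\, \xi} \quad & \tfrac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i \\
\text{subject to} \quad & y_i \left( w^{\top} \phi(x_i) + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, n .
\end{aligned}
```

The hyperplane \(w^{\top}\phi(x) + b = 0\) that solves this problem maximizes the margin between the classes, while the slack terms penalize points that fall on the wrong side of it.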

At the buildings level, Shan et al. explored ML-based electroencephalogram (EEG) methods from the human-computer interaction domain for a potentially more accurate and objective human-building interaction. Linear discriminant analysis (LDA) and SVM classifiers were demonstrated. Together with EEG indices, these two ML-based EEG methods can serve as the primary feedback mechanism on wellbeing and performance for the building life cycle platform (Shan et al. 2017). Liu and Bakshi proposed a methodology that couples multiobjective optimization with SVM and decision tree classifiers to extract design heuristics (comfort temperatures, etc.). The methodology has been demonstrated on sustainable residential system design via the Techno-Ecological Synergy in LCA (TES-LCA) methodology (Liu and Bakshi 2018).

At the districts and cities level, Perrotta et al. presented the application of SVM to fuel consumption modelling of articulated trucks for a large dataset. Again, SVM demonstrated a good level of accuracy (Perrotta 2017).

For miscellaneous applications, Hou et al. compared the performance of SVM and other ML models with that of the Ecological Structure-Activity Relationships (ECOSAR) model, which is reported to be the best among several existing aquatic ecotoxicity QSAR tools, and with that of linear regression models for estimating HC50 values of chemicals based on their physical-chemical properties and their mode-of-action classification (Hou et al. 2020). In the field of production, Pradeep Kumar et al. developed an SVM model to delineate vanadium-derived strengthening effects in HSLA steels. In addition, they created an ML model to predict the yield strength of V-HSLA steels. Materials savings are translated into embodied energy and carbon savings using LCA databases in a life cycle inventory process, subtracting the costs incurred in the production of vanadium feedstock (Pradeep Kumar et al. 2021). Li et al. used SVM to predict the aquatic toxicity of pesticides and to develop a tool for the early evaluation of aquatic pesticide toxicity in environmental risk assessment. They found that SVM exhibited high accuracy (Li et al. 2017). Romeiko et al. compared the SVM and Gradient Boosting Regressor (GBR) models for estimating spatially explicit life cycle global warming and eutrophication impacts of corn production. The results indicated that the GBR model built with monthly weather features yielded higher predictive accuracy for the life cycle global warming impact and the life cycle eutrophication (EU) impact (Romeiko et al. 2019). Milczarski et al. applied SVM, ANN, RF, KNN and C4.5 to validate the production process’s quality and its parameters in the food processing industry. The results showed that the RF algorithm gave the best process classification results (Milczarski et al. 2020).

3.3 Random forest

Random forest (RF) is a supervised learning algorithm. It is a collection of decision trees, each trained with the “bagging” method. The principle of bagging is that combining learning models can improve the overall outcome (Breiman 2001). RF has been relatively widely used in LCA due to its high predictive accuracy and its built-in variable importance measures (Hou et al. 2020; Hou 2019).
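
The bagging principle described above can be illustrated with a minimal sketch. It makes several simplifying assumptions that are not drawn from the reviewed papers: the base learners are one-split decision stumps on a single scalar feature rather than full trees, the random feature subsetting of a true random forest is omitted, and the toy data and the labels "low" and "high" are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows):
    """Fit a one-split decision stump on (x, label) pairs with a scalar feature."""
    best = None
    for threshold, _ in rows:
        left = [y for x, y in rows if x <= threshold]
        right = [y for x, y in rows if x > threshold]
        if not left or not right:
            continue
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:  # the sample contained a single distinct x value
        majority = Counter(y for _, y in rows).most_common(1)[0][0]
        return (float("inf"), majority, majority)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label

def bagged_stumps(rows, n_models=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement) of the data."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_models)]

def predict_bagged(models, x):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(predict_stump(m, x) for m in models)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: a scalar design feature and an impact class.
data = [(0.5, "low"), (1.0, "low"), (1.5, "low"), (2.0, "low"),
        (3.0, "high"), (3.5, "high"), (4.0, "high"), (4.5, "high")]
models = bagged_stumps(data)
```

Calling `predict_bagged(models, x)` returns the majority vote of the 25 stumps for a new feature value `x`. Each stump sees a slightly different resample of the data, so the vote is less sensitive to any single training point than one stump would be.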

At the buildings level, Xikai et al. applied RF alongside three other regression techniques to develop regression models that predict carbon emissions over the building’s life cycle from design factors (Xikai et al. 2019). Frömelt used RF, KNN and LASSO regression to estimate missing water supply, electricity, and heating information. The predicted data were then converted to quantities using price data. Analysis of the household budget survey revealed household archetypes with similar socio-economic characteristics and consumption patterns. The fact that these archetypes diverge from general macro-trends suggests that the proposed approach may be beneficial in improving the understanding of consumption and in informing policymakers’ future decisions on impactful environmental measures targeting specific consumer groups (Frömelt et al. 2018; Frömelt 2018). In the field of concrete mixture design optimization, DeRousseau et al. found that, judged by model performance metrics, RF was the best of several ML methods, including regression models, for predicting the compressive strength of field concrete mixtures (DeRousseau 2020).

At the districts and cities level, Perrotta et al. developed an RF model beside other ML algorithms to investigate the fuel consumption prediction of large fleets of trucks based on truck telematics and road geometry and condition data. The study also shows that although all three methods make it possible to develop models with good precision, the RF slightly outperforms SVM and ANN (Perrotta 2017).

For miscellaneous applications, Cheng et al. assessed the impacts of different combinations of feedstocks and pyrolysis conditions on climate change, energy, and economic performance. First, they built an RF model to predict the yields and characteristics of biochar for selected feedstocks at varied pyrolysis conditions. Then, they applied LCA and financial analysis to the RF model outputs to determine the GWP, energy return on investment (EROI), and minimum product selling price (MPSP) of biochar (Cheng et al. 2020a). Also, Cheng et al. evaluated the energy, climate change, and economic performance of slow pyrolysis of multiple feedstocks under various processing conditions via the integration of RF, LCA, and financial analysis. The results showed that this integration is helpful for efficiently evaluating many possible pyrolysis systems producing biochar to sequester atmospheric CO2 (Cheng et al. 2020b). Cheng et al. also evaluated the feasibility of hydrothermal treatment (HTT) with carbon capture and storage (CCS) as an energy-producing negative emissions technology (NET) and compared this system with a traditional bioenergy with carbon capture and sequestration (BECCS) system. An RF model was developed to predict product yields and characteristics from HTT of various feedstocks. The model results were then integrated into an LCA model to compute two metrics, EROI and GWP. Results showed that the RF models had better prediction accuracy than regression tree and multiple linear regression models for HTT of feedstocks and predicted the mass yields of various products and the energy and carbon contents of biocrude and hydrochar (Cheng et al. 2020a). Rojek and Dostatni used RF alongside other ML methods as modelling tools to support material selection in ecodesign (Rojek and Dostatni 2020). 
Gu developed an LCA model to reduce the life cycle environmental impacts of metal-organic frameworks; he combined a conventional LCA with RF and yielded some preliminary heuristics for sustainable design of metal-organic frameworks with some life cycle impact (Gu 2018). Beyond LCA, Hou developed an RF model in chemical risk management to predict the ecotoxicity of new chemicals or as a screening process to identify chemicals with high predicted ecotoxicity potential to further test in priority (Hou 2019). Milczarski et al. applied RF to validate the production process’s quality and its parameters in the food processing industry. The results showed that using the RF algorithm had the best results of processes classification (Milczarski et al. 2020).

3.4 Hybrid and ensemble ML techniques

The use of ML methods, including single, ensemble, and hybrid methods, has been increasing dramatically. Hybrid methods combine at least two ML and soft computing methods to achieve superior outcomes. Ensemble methods use a series of ML models, such as classification trees, as opposed to one, which can significantly increase the accuracy of the model. Ensemble methods are categorized as supervised learning algorithms. They also allow different training algorithms to be used, making training more flexible. Kishk and Al-Hajj proposed an integrated life cycle costing (LCC) framework that utilizes statistics, fuzzy set theory, and ANNs to deal with incomplete information, human judgment, and uncertainty. The authors claim that these models should also provide estimates at different levels of data and information availability (Kishk and Al-Hajj 1999).

At the buildings level, Feng et al. developed a quantitative method using fuzzy C-means clustering and an extreme learning machine (FCM-ELM) for assessing buildings’ environmental performance in early decision stages, considering the uncertainty associated with complex design decisions. The results show that the model is at least as reliable and accurate as the Monte Carlo methodology (Feng et al. 2019). Also, Feng developed an LCA method that integrated discrete-event simulation and process-based LCA using the Bayesian regularization back-propagation neural network (BRBNN), RT, ensemble learning (EL) and ELM algorithms to extract knowledge about the relationships between construction planning and project performance (Feng 2020). Azari et al. proposed a hybrid ANN and GA approach as the optimization technique to explore optimum building envelope design with respect to energy use and LCA in a low-rise office building. The impact categories within the LCA were global warming, acidification, eutrophication, smog formation, and ozone depletion (Azari et al. 2016). In the context of building material properties, Shi and Xu considered a systematic method derived from LCA theory to analyze the green performance of construction materials. The authors proposed a BPNN and a GA-BP hybrid algorithm to evaluate green building materials. They showed that, compared with BPNN, the GA-BP hybrid algorithm is favourable for selecting green building materials and achieves higher accuracy (Shi and Xu 2009). Wang and Shen introduced a Markov chain based stochastic approach and an ANN model to project the periodic energy consumption distribution for each joint energy state of building condition and temperature. A comparison with the traditional deterministic model showed that the proposed model improved the results (Wang and Shen 2013). Duprez et al. applied a combination of Sobol sensitivity analysis (SA) and an ANN to building LCA. The Sobol method displayed satisfactory results in the computation of quantitative indices. SA was used in the ANN training, and the subsequent model predicted the GWP of new design alternatives. It was able to do this in a time-efficient manner and with a coefficient of determination higher than 0.9 (Duprez et al. 2019).
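
For reference, the coefficient of determination reported in such studies can be computed as in the sketch below. This uses the standard definition, one minus the ratio of the residual to the total sum of squares; the function name and the sample values are illustrative assumptions and are not taken from Duprez et al.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical GWP values: a perfect prediction and a constant (mean) prediction.
perfect = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

A value of 1 corresponds to a perfect prediction, while a model that always predicts the mean of the observations scores 0, so a value above 0.9 indicates that most of the variance in the observed impacts is explained.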

For miscellaneous uses, Kleinekorte et al. proposed a fully automated framework that includes the selection of suitable subsets of descriptors (feature selection) and the optimization of the network architecture. They used a GA to determine the optimal network architecture and an ANN to predict the environmental impact of a given chemical. The results show that the environmental impact is predicted correctly, and the framework can serve as an initial screening tool for identifying environmentally beneficial process alternatives (Kleinekorte et al. 2019a). Lysenko et al. proposed a method in which a gradient-boosted classifier tree ensemble model (GBM) was chosen to cope with the small number of positive (toxic) drugs in a training dataset with missing values. The ML model leverages the identity of drug targets and off-targets, a functional impact score computed from Gene Ontology annotations, and biological network data to predict drug toxicity (Lysenko et al. 2018). Li et al. introduced a modular Scorecard-based LCA architecture with a Bayesian Network (BN). The energy consumption is assessed by an overall modular Scorecard-based LCA architecture embedded with a BN energy prediction model. The results show that the hybrid model improves the prediction accuracy of the BN model, and that the BN is suitable for small data sets (Li et al. 2017). Seo and Kim proposed a hybrid GA and NN model for an approximate LCA. The GA was employed as an optimization method for relevant feature selection and for determining the number of hidden layers and processing elements. For the approximate LCA, the product attributes and environmental impact drivers (EID) were identified to predict the environmental impacts of products (Seo and Kim 2007). Zhou et al. proposed the integration of ANN with GA to optimize the multiobjective function of material selection in product design considering LCA (Zhou et al. 2009).

3.5 Other types of ML

Slapnik et al. presented a framework that used Weka 3.6.10, a Java package of machine learning algorithms, to predict the missing characterization factors (CFs) of environmental interventions in order to reduce the deviation from the European Union normalization factors (EU NFs) and nominated regional NFs when calculating LCA (Slapnik et al. 2014).

At the buildings level, Duprez et al. proposed a method for predicting the GWP of building design alternatives with a high coefficient of determination. The authors used MLR, SVR and ANN. MLR and SVR performed poorly when predicting new values: MLR could not cope with the complexity, and the SVR models were slow (Duprez et al. 2019). Xikai et al. presented a study on regression models of carbon emissions in residential buildings using design factors. Four regression techniques, principal component regression (PCR), RF, MLP and SVR, were used to develop the regression models, and the results show that SVR had the optimal predictive power (Xikai et al. 2019). Shan et al. used LDA and SVM classifiers to establish EEG-based methods to improve human-building interaction in the indoor environment and to use them in a building LCA platform (Shan et al. 2017). Płoszaj-Mazurek et al. introduced a study of regenerative design guidelines for parametric modelling of building designs with a calculated total carbon footprint. They used the GBR model to predict optimal building features and a CNN to predict the total carbon footprint of a building design based on fundamental building features and the urban layout. The results of multicriteria analyses showed relationships between the parameters of buildings and the possibility of introducing carbon footprint estimation and implementing building optimization at the initial design stage (Ploszaj-Mazurek et al. 2020). Østergaard et al. used an MLR model to estimate lifespans more accurately, which can help to reduce the uncertainty of sustainability assessments of buildings in LCA. The regression model estimated the lifespan with lower errors than the general approach, which relies on a single fixed value for all building locations, uses and building materials (Østergaard et al. 2018). Feng developed an LCA method that integrated discrete-event simulation and process-based LCA. The BRBNN, RT, EL and ELM algorithms were used to extract knowledge about the relationships between construction planning and project performance (Feng 2020).

At the districts and cities level, Alam used multiple linear regression, polynomial regression, decision tree regression and support vector regression models, with calculated CO2 emission as the response variable, in an LCA model for different phases of the pavement life cycle. The models determined the significant contributors and quantified the CO2 emission in pavement material production, initial construction, maintenance and the use phase; SVM and ANN were found to perform better than the other methods (Alam 2020). Renard et al. developed a reinforcement learning (RL) decision support tool that minimizes the global warming impacts of a pavement system over its life cycle. Their approach to LCA modelling implements an RL algorithm called Q-learning, which helps decision-makers account for several sources of uncertainty in pavement infrastructure (Renard et al. 2021a).
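
To make the Q-learning idea concrete, the following is a minimal tabular sketch on a toy maintenance problem. It is not the model of Renard et al.: the three pavement condition states, the deterministic deterioration, the emission values, and all function and variable names are hypothetical assumptions chosen only to illustrate the update rule.

```python
import random

# Hypothetical per-period use-phase emissions for good / fair / poor pavement.
USE_EMISSIONS = [1.0, 2.0, 8.0]
# Hypothetical emissions caused by one maintenance action.
MAINTAIN_EMISSIONS = 4.0
DO_NOTHING, MAINTAIN = 0, 1

def step(state, action):
    """Return (reward, next_state); the reward is the negative emissions of the period."""
    if action == MAINTAIN:
        # Maintenance restores the pavement to good condition.
        return -(MAINTAIN_EMISSIONS + USE_EMISSIONS[0]), 0
    # Without maintenance the pavement deteriorates by one level (capped at poor).
    return -USE_EMISSIONS[state], min(state + 1, 2)

def q_learning(n_steps=20000, alpha=0.1, gamma=0.9, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in USE_EMISSIONS]
    state = 0
    for _ in range(n_steps):
        # Epsilon-greedy exploration.
        if rng.random() < epsilon:
            action = rng.choice([DO_NOTHING, MAINTAIN])
        else:
            action = max((DO_NOTHING, MAINTAIN), key=lambda a: q[state][a])
        reward, next_state = step(state, action)
        # Q-learning update: move the estimate towards reward + discounted best next value.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q

q_table = q_learning()
policy = [max((DO_NOTHING, MAINTAIN), key=lambda a: q_table[s][a]) for s in range(3)]
```

Under these assumed numbers the learned greedy policy leaves good and fair pavement alone and maintains only poor pavement. A realistic application would add stochastic deterioration and uncertain emission factors, which is where a learning-based approach becomes useful.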

In the context of miscellaneous applications, Romeiko et al. used the boosted regression tree (BRT) model to identify the leading contributors among soil, weather, and farming practice parameters affecting the life cycle impacts of soybean production. The authors used a combination of the Environmental Policy Integrated Climate model and process-based LCA models to quantify life cycle GWP, EU and acidification (AD) impacts. BRT has been used to discover the driving factors behind spatial and temporal trends in transportation, public health, and other disciplines (Romeiko et al. 2020b). Also, Romeiko et al. compared the predictive accuracies of SVR, linear regression (LR), ANN, gradient boosted regression tree (GBRT), and extreme gradient boosting (XGBoost) for estimating spatially explicit LCA at the county scale, with corn production as a case study. The results indicated that the GBRT model yielded the highest predictive accuracy, with cross-validation (CV) values of 0.8 for the life cycle GW impacts (Romeiko et al. 2020a). Bui and Perera proposed a decision support framework comprising life cycle cost analysis and advanced data analytics based on Gaussian Mixture Models (GMM) with the expectation-maximization (EM) algorithm for data clustering. GMM is an unsupervised, probabilistic-model technique that assigns data to different clusters described by Gaussian distributions. This framework provides an intelligent decision support tool for ship owners to achieve optimized vessel performance and comply with stringent environmental regulations (Bui and Perera 2020). Hamrol et al. presented an integrated eco-design of products and technological processes, ensuring the appropriate selection of materials and connections from the point of view of recyclability. The method was implemented in an expert system using decision tree induction as the classification method. 
The expert system offers a practical solution that makes it possible to change a material or connection without consulting the product designer. Moreover, it is consistent with concurrent engineering design (Dostatni et al. 2018). Hou et al. used KNN, SVM, ANN, RF, adaptive boosting (AdaBoost) and gradient boosting machine (GBM) for estimating HC50 values of chemicals based on their physical-chemical properties and their mode-of-action classification. Among the machine learning models, RF had the best predictive performance (Hou et al. 2020). Cheng et al. used MLR, regression tree (RT), and RF to predict product yields and characteristics from HTT of various feedstocks. The model results were then integrated into an LCA model to compute EROI and net GWP. Results showed that random forest models had better prediction accuracy than regression tree and multiple linear regression models for HTT of feedstocks (Cheng et al. 2020a). Rojek and Dostatni compared the effectiveness of RBF networks, Kohonen networks, and RF as modelling tools to support material selection in ecodesign, and showed that ML methods were highly useful and effective in selecting materials for designed products (Rojek and Dostatni 2020). Gu used a built-in decision tree model (ID3) package coupled with conventional LCA to speed up the understanding of metal-organic frameworks by connecting the LCA results with an ML technique (Gu 2018). Lee et al. developed a rapid predictive model based on BRT to quantify the life cycle GW and eutrophication (EU) impacts of corn production and to estimate its future life cycle environmental impacts (Lee et al. 2020). Nabavi-Pelesaraei et al. predicted the energy output and environmental impact of paddy production using ANN and an adaptive neuro-fuzzy inference system (ANFIS). 
According to the results, multi-level ANFIS was chosen as a better model than the ANN models due to its higher computation speed and higher accuracy (Nabavi-Pelesaraei et al. 2018). Ma and Kim (Ma and Kim 2015) presented an algorithm called predictive usage mining for life-cycle assessment (PUMLCA), which displayed a higher forecasting accuracy when the data were complex. By modelling usage patterns, trend, seasonality and level, predictive LCA was performed for agricultural machinery in real time, giving an accurate estimate of the environmental impact (Barros and Ruschel 2021). Saab presented an LCA calculator for efficient LCA computation, using Spark MLlib, a library built on Apache Spark, as well as MPI and OpenMP for the LCA algorithms. The results showed that the combination of MPI/OpenMP provided much better performance for computing algorithms than Spark MLlib in LCA (Saab 2019). Abdella et al. presented a framework integrating the economic input-output LCA with logic regression and k-means clustering to deal with multiple decision-making units in food consumption categories and sustainability indicators (Abdella et al. 2020). Olafasakin et al. developed a Kriging-based reduced order model (ROM) to predict pyrolysis yields of feedstock samples based on the output of a detailed chemical kinetic pyrolysis mechanism for assessing the costs and emissions of a pyrolysis biorefinery (Olafasakin et al. 2021).

KNN classification is one of the most fundamental and straightforward classification models in traditional supervised learning. Consequently, it is often one of the first choices for a classification study when there is little or no prior knowledge about the data distribution (Peterson 2009; Frömelt et al. 2018; Hou et al. 2020). Hou et al. proposed three data-driven frameworks to estimate missing data in LCA. The results show that KNN models have better prediction performance than ECOSAR and linear regression models for estimating some parameters for chemicals in USEtox (Hou et al. 2020). Serajiantehrani used KNN, MLR, decision tree regression, and gradient boosting regression to evaluate and analyse, based on actual data, the complete construction and environmental costs of three trenchless methods: cementitious spray-applied pipe linings, cured-in-place pipe with polyester resin, and sliplining with high-density polyethylene pipe. The results show that multi-linear regression had the best predictive performance (Serajiantehrani 2020). Milczarski et al. applied KNN and C4.5 to validate the production process’s quality and parameters in the food processing industry (Milczarski et al. 2020). Romeiko et al. compared the SVM and GBR models for estimating spatially explicit life cycle global warming and eutrophication impacts of corn production. The results indicated that the GBR model built with monthly weather features yielded higher predictive accuracy for the life cycle global warming impact and the life cycle EU impact (Romeiko et al. 2019).
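
The simplicity of KNN classification noted above can be seen in a minimal sketch: a query point receives the majority label of its k nearest training points. The Euclidean distance, the two-dimensional toy features and the class names below are assumptions for illustration only.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    train: list of (feature_vector, label) pairs; query: feature_vector.
    """
    neighbours = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

# Hypothetical toy data: two descriptors per item and a class label.
train = [((1.0, 1.0), "class_a"), ((1.2, 0.8), "class_a"), ((0.9, 1.1), "class_a"),
         ((4.0, 4.2), "class_b"), ((4.1, 3.9), "class_b"), ((3.8, 4.0), "class_b")]
label_near_a = knn_predict(train, (1.1, 1.0))
label_near_b = knn_predict(train, (4.0, 4.0))
```

No model is fitted in advance, which is why KNN needs no assumption about the data distribution. Every prediction does, however, scan the whole training set, so the cost grows with the size of the data.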

In this section, the application of ML in LCA is explored, and the current state of the art reported in the literature is identified to answer the above questions. ML techniques tailored to LCA and specific AI techniques that can advance LCA’s establishment and delivery of the smart technology are investigated. Table 2 shows details of the literature survey on ML methods in LCA. The papers are divided into three types: prediction of impact (PI), decision making (DM) and literature review (LR). The applied method in each paper is identified, and the scale at which the model was applied is shown. The majority of studies identified in this review addressed impact prediction, but many had multiple objectives and often incorporated decision making. Many niche applications were also found in this review, and the discussed studies show the adaptability of ML techniques for LCA.

4 ML and optimization in LCA

In its most basic form, LCA does not always include a systematic way of optimizing alternatives for environmental impacts mitigation. Combining LCA with ML methods may be a useful way of generating optimized process alternatives as part of an LCA (Wallace et al. 2014). ML models can perform faster and with lower storage requirements when estimating model outputs than other traditional process-based models. They are also more flexible when being integrated into other processes and simulation platforms. These allow ML models to attempt more runs of a simulation and achieve better outcomes for a range of computationally demanding tasks. These include optimization, prediction, and validation. Also, ML models can be fine-tuned by altering trainable parameters through an optimization procedure. LCA can be used to assess technological solutions from an environmental perspective. In conjunction, ML can be used as an optimizer alone or combined with other optimization algorithms to find the best solution according to constraints in LCA.
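
The surrogate-based workflow described above can be sketched as follows. Everything in the example is a hypothetical assumption: the "expensive" impact model is a toy function of insulation thickness standing in for a slow process-based simulation, and a three-point polynomial interpolant stands in for a trained ML surrogate such as an ANN.

```python
def expensive_impact_model(thickness_m):
    """Hypothetical stand-in for a slow LCA/energy simulation (kg CO2-eq per m2)."""
    embodied = 40.0 * thickness_m      # assumed: embodied impact grows with thickness
    operational = 10.0 / thickness_m   # assumed: heating impact falls with thickness
    return embodied + operational

def fit_polynomial_surrogate(xs, ys):
    """Return the Lagrange interpolating polynomial through the sampled points."""
    def surrogate(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            weight = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    weight *= (x - xj) / (xi - xj)
            total += yi * weight
        return total
    return surrogate

# Step 1: run the expensive model at a few design points only.
samples = [0.25, 0.5, 1.0]
responses = [expensive_impact_model(x) for x in samples]

# Step 2: fit the cheap surrogate to those runs.
surrogate = fit_polynomial_surrogate(samples, responses)

# Step 3: search a dense grid (0.25 m to 1.00 m) using the surrogate alone.
grid = [0.25 + 0.005 * i for i in range(151)]
best_thickness = min(grid, key=surrogate)
```

In this toy case the surrogate places the optimum at about 0.625 m, whereas the underlying function is minimized at 0.5 m. The gap illustrates why candidates found with a surrogate are usually re-evaluated with the original model, and why more training samples or a more flexible surrogate may be needed.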

Luque et al. developed a conceptual framework for the integration of AI and LCA. The study focused on the sensorization of industrial plants and the treatment of data through ML algorithms in the field of sustainability optimization (Luque et al. 2020). Ziyadi et al. developed an ML surrogate model to perform direct Monte Carlo sampling as well as indirect nonlinear optimization to provide grounds for objective model uncertainty analysis for LCA applications (Ziyadi and Al-Qadi 2019).

At the buildings level, Sharif and Hammad developed an ANN model to analyze renovation scenarios to minimize total energy consumption in LCC and LCA. They developed a data set representing renovation scenarios from results obtained by Simulation-Based Multi-Objective Optimization (SBMO). ANNs were developed as surrogates for the computationally complex building models. The computational time saved with the use of the proposed surrogate models was found to be significant (Cornago et al. 2020). Also, Sharif proposed a simulation-based multi-objective optimization model for optimizing the selection of renovation scenarios for existing buildings by minimizing total energy consumption (TEC) considering LCA. Furthermore, he developed a surrogate ANN for selecting near-optimal building energy renovation methods, and developed deep ML models (MLMs) to generate renovation scenarios considering TEC and LCC (Arani 2020). Azari et al. used a multi-objective optimization algorithm to explore ideal building envelope design by analyzing the energy use and LCA of office buildings. Their approach combined an ANN and a GA to find the optimal design (Azari et al. 2016). Feng developed an LCA method that integrated discrete event simulation and process-based LCA. The optimization method achieved real-time environmental optimization by introducing ML methods into simulation-based optimization. The BRBNN, RT, EL and ELM algorithms were used to extract knowledge about the relationships between construction planning and project performance (Feng 2020). Płoszaj-Mazurek et al. applied the CNN method to optimize the carbon footprint of buildings in regenerative architectural design. The results show that ML methods could be a research tool for exploring vast design spaces in the field of sustainable architectural design (Płoszaj-Mazurek et al. 2020; Płoszaj-Mazurek 2020). Renard et al. implemented Q-learning to optimize a pavement construction and maintenance plan so as to minimize the expected global warming impact of a pavement facility (Renard et al. 2021a). Liu and Bakshi proposed a methodology that couples multiobjective optimization and ML to extract design heuristics (comfort temperatures and other related parameters). The methodology has been demonstrated on sustainable residential system design via the TES-LCA methodology (Liu and Bakshi 2018).

At the districts and cities level, Chen et al. used multi-agent deep reinforcement learning to optimize dissolved oxygen and chemical dosage in water treatment plants. The objective was designed from an LCA perspective to achieve sustainable optimization. They showed that the LCA-based optimization achieved lower environmental impacts than the baseline scenario (Chen et al. 2021). Abokersh et al. developed a multiobjective optimization framework using an ANN model combined with a Bayesian optimization approach and assisted by sensitivity analysis. The ANN was used to incorporate sustainability principles into the design of solar assisted district heating for urban communities of different sizes (Abokersh et al. 2020). DeRousseau et al. examined the various problem formulations commonly seen in concrete mixture design optimization, which can necessitate models based on linear combination, statistics, ML, and physics. They used ML methods for predicting the compressive field strength of concrete (DeRousseau et al. 2018; DeRousseau 2020).

For miscellaneous uses, Zhou et al. proposed the integration of ANN with GA to optimize the multiobjective function of material selection in product design considering LCA (Zhou et al. 2009). In the context of decision-making support for LCA, Marvuglia et al. presented an evaluation of two different grouping techniques for categorizing materials based on their environmental performance. The agglomerative clustering technique and the self-organizing map helped distinguish variables that could be used to establish classes of materials based on their environmental performance (Marvuglia et al. 2015). The authors implemented GRNN and a set of linear models based on PLS regression, with the aim of developing an automatic selection strategy for the critical variables according to the modelled output (USEtox factor) (Marvuglia et al. 2015; Barros and Ruschel 2021). Cornago et al. proposed an LCA-aware scheduling framework, in which a production schedule is optimized for lower environmental impact using the hourly day-ahead electricity consumption predicted by a DNN model (Cornago et al. 2020). Romeiko et al. presented a model for estimating LCA spatially at the county scale in corn production. This was developed by applying ML methods that could be used for corn supply chain optimization, corn-based biorefinery siting, and feedstock landscape optimization (Romeiko et al. 2020a).

Figure 4 shows the general relationship between ML methods and optimization in LCA in the reviewed studies.

Fig. 4 ML and optimization methodology in LCA

5 Results and discussion

This paper conducted a literature survey to determine the use of ML techniques for LCA by answering the research questions. Gaps in research on ML in LCA were identified to guide future research. In the following sections, the limitations of using ML methods in LCA and the highlights of the reviewed papers are discussed.

5.1 Limitations of ML methods in LCA

Based on the reviewed papers, the limitations of ML methods in LCA are:

  1. Both LCA and the training of powerful analytical ML models are expensive and depend on large amounts of hand-crafted, structured training data. Computational cost and training time are further important parameters that relate to the accuracy of outputs. Researchers should try to reduce the computational cost, for example by reducing the dimensionality of data sets, while preserving the accuracy and validity of the results within a reasonable time.

  2. Some ML models, such as DNN, RF and SVMs, are known as black-box models: they are exceedingly complex, which makes it very difficult to predict how they will perform in a specific context. Similarly, users of intelligent systems may not be able to review and understand the recommendations these models give.

  3. Early design stages often lack the detailed information that thorough assessments typically require, yet they call for quick decisions on numerous, varying and loosely defined concepts. This makes the early use of detailed LCA impractical. For predictive modelling and experimental studies to be compatible, conditions, experiments and reporting need to be standardized so that results are consistent and reproducible.

  4. The data-based approach is a method for filling data gaps in LCA studies. It depends on the available data and on how those data are used statistically to recognize a pattern and make a prediction (Song 2019). A large amount of data is therefore required for an LCA, while a key limitation on the ML side is the lack of high-quality data sets collected in the real world.
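As an illustration of the data-based gap filling described in the fourth limitation, the following is a minimal sketch and not a method taken from any of the reviewed papers. It assumes a k-nearest-neighbour estimate, in which a missing impact value is predicted as the mean of the most similar known records. The records, the feature names and the function `knn_fill` are hypothetical, and all numbers are invented for illustration.

```python
import math

# Hypothetical inventory records: (mass in kg, energy in MJ) -> known impact
# in kg CO2-eq. All values are invented for illustration only.
KNOWN = [
    ((1.0, 10.0), 2.0),
    ((2.0, 20.0), 4.0),
    ((3.0, 30.0), 6.0),
    ((10.0, 100.0), 20.0),
]


def knn_fill(features, known=KNOWN, k=2):
    """Estimate a missing impact as the mean of the k nearest known records."""
    # Rank the known records by Euclidean distance to the query features.
    ranked = sorted(known, key=lambda rec: math.dist(rec[0], features))
    nearest = ranked[:k]
    return sum(value for _, value in nearest) / len(nearest)


# Fill the gap for a process with no recorded impact.
estimate = knn_fill((2.5, 25.0))
print(estimate)
```

The sketch also makes the limitation visible: the estimate can only be as good as the known records it averages, and in practice the features would need to be scaled so that no single unit dominates the distance.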

5.2 Highlights of reviewed papers

The main contribution of this paper is a literature survey that determines how ML techniques are used for LCA by answering the following research questions:

  • How has ML been used in LCA?

  • What is the role and efficacy of ML methods in optimization in LCA?

  • Can ML methods integrate and contextualize existing inventory databases to provide a sound basis to streamline the LCA?

  • What are the gaps in research in order to guide future research for ML in LCA?

The application of ML in LCA is explored, and the current state of the art reported in the literature is identified, to answer the above questions. ML techniques tailored to LCA, and specific AI techniques that can advance the establishment of LCA and the delivery of smart technology, are investigated. The results are presented in this section through figures and tables.

How has ML been used in LCA? Table 2 shows details of the literature survey on ML methods in LCA. The papers are divided into three types: prediction of impact (PI), decision making (DM) and literature review (LR). The method applied in each paper is identified, and the scale at which the model was applied is shown. The papers included in this review answer the above research question: across the included literature, ML was shown to support accurate and efficient LCA in many different applications at different scales.

Table 2 Summary of literature survey on ML in LCA for included papers

The associated heatmap, Fig. 5, shows that ANN is the most commonly applied method at all three levels of categorisation in this paper, particularly at the buildings level and then at the miscellaneous level, which includes a variety of niche applications. Hybrid techniques were the next most used ML method at the buildings level and at the districts and cities level.

Fig. 5 Heatmap of hit-points for each ML method

Figure 6 is a radar graph showing that the most common application of ML methods has been prediction. Individually, NN was the most commonly used method, followed by hybrid methods. In response to the first research question, Figs. 5 and 6 summarize the included studies that support the use of ML-based methods to predict LCA results accurately.

Fig. 6 Radar graph showing the applications of ML methods in prediction and decision making

What is the role and efficacy of ML methods in optimization in LCA? The results of this study show that ML methods are capable of matching detailed LCA results and of predicting missing data or trends in variables while staying within the accuracy of typical LCA. Furthermore, ML extends beyond LCA itself to processes such as data cleaning, prediction of system output or performance, ecosystem informatics, and optimization. ML algorithms could also be applied to screen or clean data for LCI, to estimate flow data for unit processes, to improve the quality and quantity of data used to determine CFs, and to generate optimized scenarios. They are especially suitable for supporting real-time decisions in the environmental optimization of construction. This study also shows that ML can be coupled with standard optimization methods to increase their ability to explore promising regions quickly.
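The coupling of ML with an optimization method can be sketched as surrogate-assisted search. The example below is a minimal illustration under stated assumptions, not an implementation from the reviewed studies: `detailed_lca` is a hypothetical stand-in for an expensive LCA run with a single design variable, the surrogate is a least-squares quadratic rather than a neural network, and the optimizer is a plain grid search. All values are invented.

```python
def detailed_lca(x):
    """Hypothetical stand-in for an expensive LCA run of design variable x."""
    return (x - 3.0) ** 2 + 5.0


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c via the normal equations."""
    rows = [[x * x, x, 1.0] for x in xs]
    # Augmented matrix [A^T A | A^T y].
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(3)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        tail = sum(m[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (m[i][3] - tail) / m[i][i]
    return coef


# 1. Run the expensive model at only a few sampled designs.
samples = [0.0, 1.0, 2.0, 4.0, 6.0]
a, b, c = fit_quadratic(samples, [detailed_lca(x) for x in samples])


def surrogate(x):
    """Cheap approximation of detailed_lca learned from the samples."""
    return a * x * x + b * x + c


# 2. Search many candidate designs on the cheap surrogate.
candidates = [i / 10 for i in range(61)]
best = min(candidates, key=surrogate)

# 3. Confirm the most promising candidate with one detailed run.
confirmed = detailed_lca(best)
print(best, confirmed)
```

Here the detailed model is evaluated six times (five samples plus one confirmation) instead of once per candidate, which is the sense in which a surrogate lets an optimizer explore promising regions quickly.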

Can ML methods integrate and contextualize existing inventory databases to provide a sound basis to streamline the LCA? Many of the studies included in this review used pre-established databases to perform LCA. ML methods are capable of integrating these existing databases, although, as with all LCA, the quality of the data and the nature of the database may affect the quality of the LCA. The ML methods identified in this paper can also be used to fill in gaps where pre-existing databases are only partially complete.

Figure 7 shows the predictors and outcomes of ML methods that have been used in LCA applications. Characteristics were the most commonly used inputs, and impact categories were the most frequently assessed outcomes of these applications.

Fig. 7 Sankey diagram to show the relationship between the inputs and outcomes of ML methods in LCA

What are the gaps in research in order to guide future research for ML in LCA? Table 3 shows details of the literature survey on ML methods in LCA for different levels of the built environment. In this paper, the levels are categorized as buildings versus districts and cities. For each level of the built environment, different categories of LCA are identified. The applications in which ML methods have been used are marked with an asterisk. The table serves as a roadmap for LCA researchers who want to apply ML techniques and need to identify gaps in research. For buildings, this paper has identified a significant gap in research in the ‘End of Life’ phase and in ‘Benefits beyond the system’. These include demolition, disposal and transport, as well as recycling. At the districts and cities level, the most significant opportunities for ML in LCA research lie in ‘Networks’ and ‘Open Spaces’.

Table 3 Use cases at different levels of the built environment

The research questions posed in this paper were answered through this literature survey. In the included papers, the authors claimed and demonstrated that ML can be applied to different aspects of LCA and can be a useful tool. ML methods were shown to work efficiently in optimization scenarios in LCA. Finally, ML methods were integrated with existing inventory databases to streamline LCA across many use cases. However, ML-based techniques have been employed less often for real-time monitoring and control in real-world LCA.

Future research should focus on using ML technologies in real-time applications to monitor, optimize, and control the built-environment systems. ML models may be more comprehensible than other black-box approaches due to their transparency. Furthermore, hybrid ML applications may expand on the benefits of ML models and overcome limitations to case-specific scenarios for optimizing LCA through their interpolation and extrapolation capabilities. Advanced stochastic metaheuristics should be used in refining ML model training parameters to maximize their accuracy and reliability.

6 Conclusion

LCA, when done successfully, provides a systematic and quantitative systems view of products and can thus act as a decision support tool. It can then guide design and give insights into areas for improvement and innovation. However, performing detailed LCA is expensive, time-consuming, and requires a large amount of data.

ML methods in LCA have received considerable attention as countries increasingly recognize the importance of protecting the environment. Climate regulations have encouraged industries to apply LCA using various intelligent technologies. The rapid development of modern technologies, including sensors, information technology, wireless transmission, network communication, cloud computing, and smart devices, has led to the accumulation of an enormous amount of data. LCA research has therefore taken up the opportunities created by the development of computational techniques and ML methods to improve predictive models. ML methods belong to the category of data-based predictive models and thus aim to use computational methods to allow an algorithm to find meaningful patterns in an extensive data set.

The contributions of this paper are as follows:

  • This study presented a review of ML models used for LCA. It provided a thorough review and critical discussion of various ML technologies for solving function approximation, optimization, monitoring, and control problems in LCA research. Moreover, the advantages and disadvantages of using ML technologies in LCA were highlighted to direct future policymakers’ efforts in this domain.

  • The review shows that, if the computational levels in LCA are divided into three categories (inventory, modelling and optimization), ML is most used at the inventory level, for prediction and for finding missing data, and for optimization during model simulation. The fundamental limitations and challenges in applying ML methods in LCA are model complexity and scenario uncertainty.

  • The review identifies that developing ML techniques, including predictive model control and optimization algorithms, can help policymakers deliver actionable knowledge to inform various control strategies and corrective measures that reduce the gap between predicted and actual environmental impact. This review finds that ML methods can match LCA results within the accuracy of typical LCA studies and correctly predict the trends.

This review has identified research gaps and given an overview of the progression in this field to aid researchers’ understanding of key concepts for applying ML in LCA. Future research should focus on using ML technologies in real-time applications to monitor, optimize, and control the built-environment systems. ML models may be more comprehensible than other black-box approaches due to their transparency. Furthermore, hybrid ML applications may expand on the benefits of ML models and overcome limitations to case-specific scenarios for optimizing LCA through their interpolation and extrapolation capabilities. Optimization can be particularly impactful when comparing life cycle alternatives in which the environmental impact of a process, product or system can be reduced. ML can be integrated as a real-time algorithm that assesses production or changes in processes and responds with potential alternatives. These alternatives can be less environmentally impactful and can help decision-makers choose the best available options for the design, construction/production, facilities management, and demolition processes. Advanced stochastic metaheuristics should be used in refining ML model training parameters to maximize their accuracy and reliability. Nevertheless, ML may not be appropriate for every application, and its use should be weighed against the cost, time and delays that some ML techniques incur. In the future, the integration of ML models within LCA may become commonplace following further research into applications such as using access to dynamic data and providing detailed and accurate environmental impacts.