Introduction

Cost estimation in the early stages of construction projects involves a great deal of uncertainty, so there is a strong demand for an effective method to reduce this uncertainty. An effective cost estimation technique facilitates time and cost control in construction projects. A conventional way to produce a rough cost estimate and conduct a feasibility study against a predefined budget is to rely on expert judgment. However, continuous access to such experts is not always possible, which motivates the development of alternative methods to estimate the cost of construction projects, especially in their early stages. Such methods can be based on data generated from previous, similar projects.

As artificial intelligence became popular in the 1980s, a new approach to construction cost estimation emerged, and several studies employed different methods to estimate costs in a wide range of industrial applications. Later, in the 1990s, neural networks (NNs), a branch of artificial intelligence, were employed as an alternative means of estimating construction costs. This approach does not require a cost estimating function that mathematically relates the cost to the variables with the greatest effect on it. Zhang et al. (1996) developed a feature-based neural network (NN) for cost estimation of packaging products. Shtub and Versano (1999) compared the performance of NNs and regression analysis in estimating the construction cost of a steel-pipe-bending process. Gwang and Sung-Hoon (2004) investigated the accuracy of several cost estimation methods, namely multiple regression analysis (MRA), NNs, and case-based reasoning (CBR), using 530 historical construction costs of residential buildings built by general contractors between 1997 and 2000 in Seoul, Korea. CBR is a methodology that has received increasing attention for cost estimation during the early phases of a project; it exploits existing knowledge to produce better estimates than would be obtained without it. De Soto and Adey (2015) investigated the CBR retrieval process to estimate resources in construction projects. Cavalieri et al. (2004) compared parametric and NN models for the estimation of production costs and concluded that the NN performs better and is more reliable. Kim et al. (2004) employed a back-propagation neural network (BPNN) combined with genetic algorithms (GAs) to estimate the construction costs of residential buildings; the GAs were used to determine the BPNN's parameters and to improve the accuracy of the estimates. Murat and Ceylan (2006) implemented an artificial neural network (ANN) to estimate the cost of energy transportation. Verlinden et al. (2008) developed MRA- and ANN-based models to estimate the cost of sheet metal production. Wang et al. (2013) developed an NN-based cost estimator whose learning procedure was carried out by means of a particle-swarm optimization algorithm. Zima (2015) presented a CBR approach to estimate the unit price of construction elements; the CBR system provides a knowledge base that supports cost estimation at the early stage of a construction project.

In addition to the applications of artificial intelligence in cost estimation, a large number of works have been devoted to evaluating the effectiveness of machine-learning methods for prediction and forecasting. Geem and Roper (2009) used an ANN model to forecast the energy demand of South Korea. Geem (2011) developed machine-learning-based models to forecast South Korea's transport energy demand. Assareh et al. (2010) used particle-swarm optimization (PSO) and a GA, with socio-economic indicators as inputs, to predict the demand for oil in Iran. Gudarzi Farahani et al. (2012) applied a Bayesian vector auto-regressive methodology to forecast Iran's energy consumption and discussed its potential implications. In addition, Rodger (2014) applied a fuzzy regression nearest-neighbor ANN model with variables such as price, operating expenditures, drilling cost, the cost of turning the gas on, the price of oil, and royalties to predict gas demand.

The above-mentioned literature contains estimation models developed for various fields, each tailored to a specific case according to the type of data, the input values, and the required accuracy. However, no study has addressed cost estimation for pressure vessel construction. The novelty of the current work is therefore to introduce a simple, highly accurate data-based model for cost estimation in construction projects of pressure vessel tanks. In other words, this paper deals with the problem of estimating the cost of constructing a spherical storage tank during its early stages, a case that has been overlooked in the past. Two main steps are taken to tackle the problem: (1) identifying the input variables and (2) evaluating the performance of the proposed cost estimators using real construction data. Four models, i.e., NNs with Levenberg–Marquardt and Bayesian-regularized training algorithms, a linear regression model, and an exponential regression model, are applied in this paper to estimate the cost. In addition, a genetic algorithm is employed to find better estimates of the parameters of the linear and the exponential regression models.

The remainder of this paper is organized as follows. A brief background on spherical storage tank construction is presented in the "Construction cost of spherical storage tank" section. The proposed modeling techniques are described in the "Artificial neural network" section. Comparative analyses evaluating the performance of the proposed models are given in the "Results" section. Finally, conclusions are drawn in the "Conclusion" section.

Construction cost of spherical storage tank

A typical scheme of a spherical storage tank is illustrated in Fig. 1. The main parts of a spherical storage tank are the pressure parts (shell plates) and the supporting structure, which consists of individual cylindrical pillars and braces. The American Society of Mechanical Engineers (ASME) Code, Section VIII (2015), provides the most common governing rules for designing, constructing, and inspecting spherical storage tanks. The construction activities comprise two types of manufacturing: work performed in a shop and work that finalizes the construction at the site.

Fig. 1 General view of a spherical storage tank

Figure 2 shows the main shop activities, including marking, cutting, and forming the shell plates, and forming and constructing the upper column and its junction to the petals (pressure parts). The activities completed at the site are shown in Fig. 3. They start with assembling the columns, the bracing system, and the shell plates, followed by welding. After the welding of the shell plates is completed, further operations such as post-weld heat treatment (PWHT), final non-destructive tests, and the hydrostatic test are performed as per the design requirements.

Fig. 2 Shop activity sequence

Fig. 3 Stages of erecting spherical storage tank at the site

Artificial neural network

An artificial neural network (ANN) is an intelligent numerical procedure organized in three main layers: the input layer, the middle or hidden layer, and the output layer. The input layer provides the input variables to the network in the form of a vector whose dimension equals the number of input neurons. The hidden layer is the main part of the network and contains its network of neurons, which are the main computational elements of an ANN. Every neuron receives input signals and, based on them, generates the corresponding output value using an assigned activation function, as shown in

$$y = f\left( \sum_{i} w_{i} x_{i} - \theta \right),$$
(1)

where \(w_i\), \(x_i\), \(\theta\), and \(y\) are the weighting factor, the input of each node, the bias, and the output, respectively. While various activation functions f can be applied, the most conventional forms are shown in Eqs. (2)–(4):

$$f(x) = \text{Sigmoid}(x) = \frac{1}{1 + e^{-x}}$$
(2)
$$f(x) = \text{Signum}(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -1 & \text{if } x < 0 \end{cases}$$
(3)
$$f(x) = \text{Step}(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{otherwise} \end{cases}$$
(4)

The weights are the parameters by which the input values of the neuron are multiplied. The weighting values and the bias factors are associated with the structure of the ANN. Figure 4 illustrates the structure of a neuron (Gonzalez 2008).

Fig. 4 Structure of a neuron (Gonzalez 2008)
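
To make Eqs. (1)–(4) concrete, the following minimal Python sketch implements a single neuron of this kind; the weights, bias, and input values are purely illustrative and are not taken from the models developed in this paper.

```python
import numpy as np

def sigmoid(x):
    # Eq. (2): logistic activation
    return 1.0 / (1.0 + np.exp(-x))

def signum(x):
    # Eq. (3): returns 1, 0, or -1 depending on the sign of x
    return np.sign(x)

def step(x):
    # Eq. (4): 1 if x > 0, otherwise 0
    return 1.0 if x > 0 else 0.0

def neuron_output(inputs, weights, bias, activation=sigmoid):
    # Eq. (1): y = f(sum_i w_i * x_i - theta)
    net = np.dot(weights, inputs) - bias
    return activation(net)

# Example: a neuron with three inputs (e.g., thickness, diameter, and weld
# length scaled to [0, 1]) and illustrative weights and bias.
y = neuron_output(np.array([0.4, 0.7, 0.2]), np.array([0.5, -0.3, 0.8]), 0.1)
```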

The ANN used in this paper is the multilayer perceptron (MPNN), a well-known and widely applied type of NN. To design an MPNN, the data set is divided into three groups, training, validation, and testing, which are used to update the weights and the bias factors. Two learning algorithms are used to develop the NNs in this paper: Levenberg–Marquardt and Bayesian regularization. Moreover, 70% of the data is assigned to training, 15% to testing, and 15% to validation (Wang et al. 2002).
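
As an illustration of this split (not the authors' MATLAB implementation), a simple random 70/15/15 partition could be written as follows; the function name and the random seed are assumptions made for the example.

```python
import numpy as np

def split_data(X, y, seed=0):
    """Randomly split the samples into 70% training, 15% validation, 15% testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))            # shuffle sample indices
    n_train = int(0.70 * len(X))
    n_val = int(0.15 * len(X))
    train, val, test = np.split(idx, [n_train, n_train + n_val])
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])
```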

Learning is based on tracking the error, defined as the difference between the model output and the actual data. Monitoring this error, together with a predetermined maximum number of iterations, provides the stopping criteria of the learning phase. The mean squared error (MSE) and the correlation between the outputs and the targets (\(R^2\)) are used in this research to evaluate the performance of the network. MSE is the average of the squared differences between outputs and targets, where lower values are better. \(R^2\) describes how well the model fits the real observations, i.e., it expresses the correlation between the outputs and the targets, where a value close to 1 indicates a close relationship and a value close to 0 a random one. MSE and \(R^2\) are defined as

$$\text{MSE} = \frac{1}{n}\sum_{i = 1}^{n} (\hat{Y}_{i} - Y_{i})^{2}$$
(5)
$$R^{2} = \frac{n\sum_{i = 1}^{n} \hat{Y}_{i} Y_{i} - \sum_{i = 1}^{n} \hat{Y}_{i} \sum_{i = 1}^{n} Y_{i}}{\sqrt{n\sum_{i = 1}^{n} Y_{i}^{2} - \left( \sum_{i = 1}^{n} Y_{i} \right)^{2}}\,\sqrt{n\sum_{i = 1}^{n} \hat{Y}_{i}^{2} - \left( \sum_{i = 1}^{n} \hat{Y}_{i} \right)^{2}}},$$
(6)

where n is the number of samples, \(\hat{Y}_{i}\) is the output of the network, and \(Y_{i}\) is the corresponding real data value.
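
For illustration, Eqs. (5) and (6) can be computed directly from the network outputs and the target values; the sketch below simply restates the two formulas in Python and is not part of the authors' MATLAB implementation.

```python
import numpy as np

def mse(y_hat, y):
    # Eq. (5): mean squared error between network outputs and real data
    return np.mean((y_hat - y) ** 2)

def correlation(y_hat, y):
    # Eq. (6): correlation between the outputs (y_hat) and the targets (y)
    n = len(y)
    num = n * np.sum(y_hat * y) - np.sum(y_hat) * np.sum(y)
    den = (np.sqrt(n * np.sum(y ** 2) - np.sum(y) ** 2)
           * np.sqrt(n * np.sum(y_hat ** 2) - np.sum(y_hat) ** 2))
    return num / den
```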

Genetic algorithm

GA is a population-based meta-heuristic algorithm used to find near-optimum solutions to complex optimization problems. It is inspired by evolutionary biology and its mechanisms of inheritance, mutation, selection, and crossover (i.e., recombination). The evolution usually begins with a population formed randomly within the predefined ranges of the variables. In each generation, a fitness function is evaluated to rank the members, and the new population is generated from the best members of the previous one. The stopping condition of the algorithm is defined by a maximum number of generations or a satisfactory fitness level. The name "GA" emphasizes that the algorithm improves individuals by manipulating their genotype. Candidates are changed by the crossover and mutation operations: in crossover, two members of the population are selected as parents and new individuals are generated by swapping parts of them, whereas in mutation, a single member of the previous population is replaced with a new one. These operations are shown schematically in Fig. 5. Applying these operators generates enough variety in the population to search for the optimal solution. After defining the number of variables (the number of cells in the solution strings, or chromosomes) and the upper and lower limits of each variable, the population size is defined. Then, the crossover and mutation probabilities are set to specify how often these operators are applied.

Fig. 5 Crossover and mutation operators in GA
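
As an illustration of the two operators in Fig. 5, the following sketch applies single-point crossover and single-gene mutation to real-coded chromosomes; the operator details (one crossover point, one mutated gene) are assumptions made for the example rather than a description of a specific GA implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def crossover(parent_a, parent_b):
    """Single-point crossover: swap the tails of two parent chromosomes."""
    point = rng.integers(1, len(parent_a))
    child_a = np.concatenate([parent_a[:point], parent_b[point:]])
    child_b = np.concatenate([parent_b[:point], parent_a[point:]])
    return child_a, child_b

def mutate(parent, low, high):
    """Mutation: replace one randomly chosen gene with a new random value."""
    child = parent.copy()
    child[rng.integers(len(child))] = rng.uniform(low, high)
    return child
```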

Integration of a GA

In this section, the application of the GA in the proposed regression cost estimation models is discussed. The aim of using the GA is to find the combination of coefficients of a linear and an exponential regression model that minimizes the MSE. A similar use of a GA was demonstrated by Hasheminia and Niaki (2006), who proposed a GA to find the best regression model among several candidates.

The coefficients of the regression models are treated as the decision variables. The initial population of the GA is generated randomly between upper and lower limits of the coefficients, coded as 1 and −1, respectively, so that the coefficients cover the complete range. With the fitness function defined as the mean squared error, the initial population is evaluated and its best members are selected to generate the next population. The fitness function f(x) takes the following form:

$$\text{Min}\, f(x) = \text{Min}\left( \text{MSE} = \frac{1}{n}\sum_{i = 1}^{n} (o_{i} - p_{i})^{2} \right),$$
(7)

where \(o_i\) and \(p_i\) are the measured and predicted values of sample i, respectively, and n is the number of measurements.

The linear and the exponential regression models to estimate the construction costs are defined as

$$y_{\text{Linear}} = \beta_{0} + \beta_{1} x_{1} + \beta_{2} x_{2} + \beta_{3} x_{3}$$
(8)
$$y_{\text{Exponential}} = \beta_{0} + \beta_{1} x_{1}^{\beta_{2}} + \beta_{3} x_{2}^{\beta_{4}} + \beta_{5} x_{3}^{\beta_{6}},$$
(9)

where y denotes the dependent variable (response), \(x_i\) represents an independent variable, and \(\beta_i\) represents a coefficient. A GA is applied to find near-optimal values of the coefficients of these models using 60 samples. The objective function of the GA is to minimize the MSE between the actual and the estimated costs. Based on a pilot study, the parameters of the GA are set as listed below; an illustrative sketch of the fitting procedure follows the list.

  • Population size (n): 20.

  • Iterations (number of generations): 20,000.

  • Mutation percentage: 70%.

  • Crossover percentage: 30%.
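
The following minimal sketch shows how such a GA could fit the coefficients of Eqs. (8) and (9) under the fitness function of Eq. (7) with the parameter values above. It is an illustration only: the selection scheme (keeping the best half of the population), the single-point crossover, the single-gene mutation, and the names X_train and y_train (the 60 normalized samples) are assumptions, not the authors' MATLAB implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def linear_model(beta, X):
    # Eq. (8): y = b0 + b1*x1 + b2*x2 + b3*x3
    return beta[0] + X @ beta[1:4]

def exponential_model(beta, X):
    # Eq. (9): y = b0 + b1*x1^b2 + b3*x2^b4 + b5*x3^b6 (inputs assumed positive)
    return (beta[0] + beta[1] * X[:, 0] ** beta[2]
            + beta[3] * X[:, 1] ** beta[4]
            + beta[5] * X[:, 2] ** beta[6])

def fitness(beta, model, X, y):
    # Eq. (7): MSE between measured and predicted costs
    return np.mean((y - model(beta, X)) ** 2)

def ga_fit(model, X, y, n_coeff, pop_size=20, generations=20000,
           p_mut=0.7, p_cross=0.3, low=-1.0, high=1.0):
    """Minimal real-coded GA: truncation selection, crossover, and mutation."""
    pop = rng.uniform(low, high, size=(pop_size, n_coeff))
    for _ in range(generations):
        scores = np.array([fitness(ind, model, X, y) for ind in pop])
        elite = pop[np.argsort(scores)[:pop_size // 2]]   # keep the best half
        children = []
        while len(children) < pop_size - len(elite):
            a, b = elite[rng.integers(len(elite), size=2)]
            child = a.copy()
            if rng.random() < p_cross:                     # single-point crossover
                point = rng.integers(1, n_coeff)
                child[point:] = b[point:]
            if rng.random() < p_mut:                       # single-gene mutation
                child[rng.integers(n_coeff)] = rng.uniform(low, high)
            children.append(child)
        pop = np.vstack([elite] + children)
    scores = np.array([fitness(ind, model, X, y) for ind in pop])
    return pop[np.argmin(scores)]

# Usage (X_train: 60 x 3 matrix of thickness, diameter, weld length; y_train: costs):
# beta_lin = ga_fit(linear_model, X_train, y_train, n_coeff=4)
# beta_exp = ga_fit(exponential_model, X_train, y_train, n_coeff=7)
```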

Results

In this study, the effects of the thickness, the tank diameter, and the length of the weld lines on the construction cost of spherical storage tanks are investigated using NNs with Levenberg–Marquardt and Bayesian regularized learning algorithms, as well as linear and exponential regression models, both hybridized with a genetic algorithm. The GA was coded in MATLAB and executed to estimate the construction cost of the spherical storage tank for 11 random samples. MATLAB has been used successfully in several studies, such as Valipour and Montazar (2012) and Valipour (2012, 2013, 2014), to analyze data and develop the required models. In the presented modeling, the training phase terminates if the stopping criterion, defined in terms of the MSE, is met; otherwise, the weights are updated until the desired MSE is achieved. The main purpose of NN training is to achieve good memorization and generalization capability, which depends mainly on the learning algorithm. The appropriate network configuration is selected by testing different numbers of hidden layers for both the Levenberg–Marquardt neural network (LMNN) and the Bayesian regularized neural network (BRNN), as shown in Table 1. The results in this table indicate that eight hidden layers for the LMNN and ten hidden layers for the BRNN give the best MSE values of \(2.53 \times 10^{-4}\) and \(5.07 \times 10^{-4}\), respectively; these values are highlighted in Table 1. The number of neurons in each hidden layer is ten for all the cases reported in Tables 1 and 2. The stopping criteria for the learning phase are a maximum of 1000 epochs and 6 validation failures. The number of neurons in the input layer equals the number of input variables, i.e., thickness, tank diameter, and length of the weld. The output of all models is the cost of the project; therefore, the output layer has a single neuron.
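
As a rough illustration of this configuration search (not the authors' MATLAB procedure), the loop below evaluates networks with an increasing number of ten-neuron hidden layers and keeps the configuration with the lowest validation MSE; scikit-learn's MLPRegressor with the lbfgs solver is used only as a stand-in, since Levenberg–Marquardt and Bayesian regularized training are not available in that library.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def select_configuration(X_train, y_train, X_val, y_val, max_layers=10):
    """Try 1..max_layers hidden layers of ten neurons each and keep the lowest MSE."""
    best = None
    for n_layers in range(1, max_layers + 1):
        net = MLPRegressor(hidden_layer_sizes=(10,) * n_layers,
                           solver='lbfgs', max_iter=1000, random_state=0)
        net.fit(X_train, y_train)
        val_mse = np.mean((net.predict(X_val) - y_val) ** 2)
        if best is None or val_mse < best[1]:
            best = (n_layers, val_mse, net)
    return best  # (number of hidden layers, validation MSE, fitted network)
```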

Table 1 Training stage MSE for different numbers of hidden layers
Table 2 Testing performance of the LMNN and BRNN

The mean squared error is used to evaluate the performance of the NNs on the 11 randomly selected testing samples. The results given in Table 2 show the better performance of the LMNN. The costs predicted by the LMNN range from 0.285 to 0.760, those predicted by the BRNN from 0.235 to 0.714, and the actual costs from 0.267 to 0.822.

The comparisons between the actual data and the neural-network-based estimates are shown in Figs. 6 and 7. These figures clearly show that the Levenberg–Marquardt training approach performs better, as its estimated costs are closer to the corresponding actual costs. Moreover, the R values shown in Fig. 8 for the LMNN and BRNN measure the correlation between the estimated values (output) and the actual data (target), where an R value of 1 means a high correlation and 0 indicates a random relationship. This correlation is higher for the LMNN (0.99516) than for the BRNN (0.99234).

Fig. 6 Comparison between the actual cost and LMNN predicted cost

Fig. 7 Comparison between actual cost and BRNN predicted cost

Fig. 8 Actual and predicted costs obtained by LMNN (left) and BRNN (right)

For the regression models, the GA was coded in MATLAB and executed to estimate the construction cost of the spherical storage tank for the 11 random samples previously used with the NN models. Comparisons between the actual cost and the cost predicted by the linear and the exponential models are shown in Figs. 9 and 10, respectively, from which it can be concluded that the exponential cost model performs better than its linear counterpart. In addition, the MSE of these models for the 11 cases under investigation is given in Table 3, where the exponential model, with an MSE of 0.4, is preferred to the linear model, with a larger MSE of 0.5. Note that the LMNN, with an MSE of 0.291, performed better than the exponential regression model hybridized with the GA. Although both the linear and the exponential models succeeded in estimating the trend of the cost, the exponential model has a lower error level. In other words, the error of the linear model in estimating the peak cost is larger, as it cannot adapt itself to the different cases.

Fig. 9 Comparison between the actual cost and the predicted cost obtained by the linear cost model

Fig. 10 Comparison between the actual cost and the predicted cost obtained by the exponential cost model

Table 3 MSE of the linear and the exponential regression cost models

Conclusion

This paper presented data-driven models, consisting of ANNs and regression models, to estimate the construction cost of spherical storage tank projects. The variables considered in these models were the thickness, the tank diameter, and the length of the weld. The developed multilayer perceptron NNs were trained with the Levenberg–Marquardt and Bayesian regularization algorithms. Moreover, a GA was employed to find near-optimum values of the parameters of a linear as well as an exponential regression model. The NNs showed better cost estimation capabilities than the regression models, and the LMNN performed better than the BRNN in terms of MSE. In general, we showed that ANN models can play a very important role in efficiently estimating construction project costs in their early stages.

The level of uncertainty can be reduced by increasing the number of samples in the training data. Careful data gathering is therefore important to form a reliable and effective data set, which requires valid and updated resources. In addition, based on the results obtained in the current study, choosing an appropriate model to describe the trend of the data is an important task in this regard.

Future work may focus on comparing the performance of the proposed ANN method with that of an ANN hybridized with a meta-heuristic such as a GA, the Bees Algorithm, Artificial Bee Colony, or Ant Colony Optimization.