Estimation of Linear Model Parameters Using Least Squares

Applied Data Analysis and Modeling for Energy Engineers and Scientists

Abstract

This chapter deals with methods to estimate parameters of linear parametric models using ordinary least squares (OLS). The univariate case is first reviewed along with equations for the uncertainty in the model estimates as well as in the model predictions. Several goodness-of-fit indices to evaluate the model fit are also discussed, and the assumptions inherent in OLS are highlighted. Next, multiple linear models are treated, and several notions specific to correlated regressors are presented. The insights which residual analysis provides are discussed, and different types of remedial actions to improper model residuals are addressed. Other types of linear models such as splines and models with indicator variables are discussed. Finally, a real-world case study analysis which was meant to verify whether actual field tests supported the claim that a refrigerant additive improved chiller thermal performance is discussed.


Notes

  1. From Walpole et al. (1998) by permission of Pearson Education.

  2. Parsimony in the context of regression model building is a term denoting the most succinct model, i.e., one without any statistically superfluous regressors.

  3. From Draper and Smith (1981) by permission of John Wiley and Sons.

  4. Actually the outdoor humidity impacts energy use only when the dew point temperature exceeds a certain threshold which many studies have identified to be about 55°F (this is related to how the HVAC is controlled in response to human comfort). This type of conditional variable is indicated as a + superscript.

  5. Actually, there is no “best” model since random variables are involved. A better term would be “most plausible” and should include mechanistic considerations, if appropriate.

  6. From ASHRAE (2005) © American Society of Heating, Refrigerating and Air-Conditioning Engineers, Inc., www.ashrae.org.

  7. From Stoecker (1989) by permission of McGraw-Hill.

  8. Data for this problem is given in Appendix B.

  9. Data for this problem is given in Appendix B.

References

  • ASHRAE, 1978. Standard 93-77: Methods of Testing to Determine the Thermal Performance of Solar Collectors, American Society of Heating, Refrigerating and Air-Conditioning Engineers, Atlanta, GA.

  • ASHRAE, 2005. Guideline 2-2005: Engineering Analysis of Experimental Data, American Society of Heating, Refrigerating and Air-Conditioning Engineers, Atlanta, GA.

  • ASHRAE, 2009. Fundamentals Handbook, American Society of Heating, Refrigerating and Air-Conditioning Engineers, Atlanta, GA.

  • Belsley, D.A., E. Kuh, and R.E. Welsch, 1980. Regression Diagnostics, John Wiley & Sons, New York.

  • Chatfield, C., 1995. Problem Solving: A Statistician’s Guide, 2nd Ed., Chapman and Hall, London, U.K.

  • Chatterjee, S. and B. Price, 1991. Regression Analysis by Example, 2nd Ed., John Wiley & Sons, New York.

  • Cook, R.D. and S. Weisberg, 1982. Residuals and Influence in Regression, Chapman and Hall, New York.

  • DOE-2, 1993. Building energy simulation software developed by Lawrence Berkeley National Laboratory with funding from U.S. Department of Energy, http://simulationresearch.lbl.gov/

  • Draper, N.R. and H. Smith, 1981. Applied Regression Analysis, 2nd Ed., John Wiley and Sons, New York.

  • Gordon, J.M. and K.C. Ng, 2000. Cool Thermodynamics, Cambridge International Science Publishing, Cambridge, UK.

  • Hair, J.F., R.E. Anderson, R.L. Tatham, and W.C. Black, 1998. Multivariate Data Analysis, 5th Ed., Prentice Hall, Upper Saddle River, NJ.

  • Katipamula, S., T.A. Reddy and D.E. Claridge, 1998. “Multivariate regression modeling”, ASME Journal of Solar Energy Engineering, vol. 120, p. 177, August.

  • Pindyck, R.S. and D.L. Rubinfeld, 1981. Econometric Models and Economic Forecasts, 2nd Ed., McGraw-Hill, New York, NY.

  • Reddy, T.A., 1987. The Design and Sizing of Active Solar Thermal Systems, Oxford University Press, Clarendon Press, U.K., September.

  • Reddy, T.A., N.F. Saman, D.E. Claridge, J.S. Haberl, W.D. Turner, and A.T. Chalifoux, 1997. Baselining methodology for facility-level monthly energy use, part 1: Theoretical aspects, ASHRAE Transactions, v.103 (2), American Society of Heating, Refrigerating and Air-Conditioning Engineers, Atlanta, GA.

  • Schenck, H., 1969. Theories of Engineering Experimentation, 2nd Ed., McGraw-Hill, New York.

  • Shannon, R.E., 1975. System Simulation: The Art and Science, Prentice-Hall, Englewood Cliffs, NJ.

  • Stoecker, W.F., 1989. Design of Thermal Systems, 3rd Ed., McGraw-Hill, New York.

  • Walpole, R.E., R.H. Myers, and S.L. Myers, 1998. Probability and Statistics for Engineers and Scientists, 6th Ed., Prentice Hall, Upper Saddle River, NJ.

Correspondence to T. Agami Reddy.

Problems

Pr. 5.1

Table 5.13 lists various properties of saturated water in the temperature range 0–100°C.

Table 5.13 Data table for Problem 5.1
  1. (a)

    Investigate first-order and second-order polynomials that fit saturated vapor enthalpy to temperature in °C. Identify the better model by looking at R², RMSE and CV values for both models. Predict the value of saturated vapor enthalpy at 30°C along with 95% confidence intervals and 95% prediction intervals.

  2. (b)

    Repeat the above analysis for specific volume but investigate third-order polynomial fits as well. Predict the value of specific volume at 30°C along with 95% confidence intervals and 95% prediction intervals.
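The comparison of polynomial orders asked for above can be sketched as follows. This is only an illustrative outline: the function name `fit_polynomial` is hypothetical, CV is taken here as RMSE divided by the mean of the response, and the demonstration data are placeholders generated from an exact quadratic, not the values of Table 5.13.

```python
import numpy as np

def fit_polynomial(x, y, order):
    """OLS fit of a polynomial of the given order; returns coefficients and fit indices."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with columns 1, x, x^2, ...
    X = np.vander(x, order + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, p = X.shape
    sse = float(resid @ resid)                   # residual sum of squares
    sst = float(np.sum((y - y.mean()) ** 2))     # total sum of squares
    rmse = float(np.sqrt(sse / (n - p)))         # internal RMSE with n - p degrees of freedom
    return {"coef": coef, "R2": 1.0 - sse / sst, "RMSE": rmse, "CV": rmse / float(y.mean())}

# Placeholder data (assumption, NOT Table 5.13): an exact quadratic in temperature
t = np.linspace(0.0, 100.0, 11)
h = 2500.0 + 1.8 * t - 0.001 * t**2
fit1 = fit_polynomial(t, h, 1)
fit2 = fit_polynomial(t, h, 2)
```

With real data, the model with the higher R² and the lower RMSE and CV would be preferred, provided the added term is statistically significant.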

Pr. 5.2

Tensile tests on a steel specimen yielded the results shown in Table 5.14.

Table 5.14 Data table for Problem 5.2
  1. (a)

    Assuming the regression of y on x to be linear, estimate the parameters of the regression line and determine the 95% confidence limits for x = 4.5.

  2. (b)

    Now regress x on y, and estimate the parameters of the regression line. For the same value of y predicted in (a) above, determine the value of x. Compare this value with the value of 4.5 assumed in (a). If different, discuss why.

  3. (c)

    Compare the R² and CV values of both models.

  4. (d)

    Plot the residuals of both models.

  5. (e)

    Of the two models, which is preferable for OLS estimation?

Pr. 5.3

The yield of a chemical process was measured at three temperatures (in °C), each with two concentrations of a particular reactant, as recorded in Table 5.15.

Table 5.15 Data table for Problem 5.3
  1. (a)

    Use OLS to find the best values of the coefficients a, b, and c in the equation: y = a + bt + cx.

  2. (b)

    Calculate the R², RMSE, and CV of the overall model as well as the SE of the parameters.

  3. (c)

    Using the β coefficient concept described in Sect. 5.4.5, determine the relative importance of the two independent variables on the yield.

Pr. 5.4

Cost of electric power generation versus load factor and cost of coal

The cost to an electric utility of producing power (CEle) in mills per kilowatt-hour (10⁻³ $/kWh) is a function of the load factor (LF) in % and the cost of coal (CCoal) in cents per million Btu. Relevant data are assembled in Table 5.16.

Table 5.16 Data table for Problem 5.4
  1. (a)

    Investigate different models (first order and second order with and without interaction terms) and identify the best model for predicting CEle vs LF and CCoal. Use stepwise regression if appropriate. (Hint: plot the data and look for trends first).

  2. (b)

    Perform residual analysis

  3. (c)

    Calculate the R², RMSE, and CV of the overall model as well as the SE of the parameters.

Pr. 5.5

Modeling of cooling tower performance

Manufacturers of cooling towers often present catalog data showing outlet-water temperature Tco as a function of ambient air wet-bulb temperature (Twb) and range (which is the difference between inlet and outlet water temperatures). Table 5.17 assembles data for a specific cooling tower. Identify an appropriate model (investigate first-order and second-order polynomial models for Tco) by looking at R², RMSE and CV values, the individual t-values of the parameters, and the behavior of the overall model residuals.

Table 5.17 Data table for Problem 5.5

Pr. 5.6

Steady-state performance testing of solar thermal flat plate collector

Solar thermal collectors are devices which convert the radiant energy from the sun into useful thermal energy that goes to heating, say, water for domestic or for industrial applications. Because of low collector time constants, heat capacity effects are usually small compared to the hourly time step used to drive the model. The steady-state useful energy qC delivered by a solar flat-plate collector of surface area AC is given by the Hottel-Whillier-Bliss equation (Reddy 1987):

$$ {q_C} = {A_C}{F_R}{\big[{I_T}{\eta _n} - {U_L}({T_{Ci}} - {T_a})\big]^ + } $$
(5.64)

where FR is called the heat removal factor and is a measure of the solar collector performance as a heat exchanger (it can be interpreted as the ratio of actual heat transfer to the maximum possible heat transfer); ηn is the optical efficiency, i.e., the product of the transmittance and absorptance of the cover and absorber of the collector at normal solar incidence; UL is the overall heat loss coefficient of the collector, which depends on collector design only; IT is the radiation intensity on the plane of the collector; TCi is the temperature of the fluid entering the collector; and Ta is the ambient temperature. The + sign denotes that only positive values are to be used, which physically implies that the collector should not be operated if qC is negative, i.e., when the collector loses more heat than it can collect (which can happen under low radiation and high TCi conditions).
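The effect of the + operator in Eq. 5.64 can be illustrated with a short sketch. The function name and argument names are hypothetical, and any numerical values passed to it are the reader's own; units are assumed consistent (for example m², W/m², W/m²·K and °C).

```python
def collector_useful_energy(area, f_r, eta_n, u_l, i_t, t_ci, t_a):
    """Eq. 5.64: q_C = A_C * F_R * [I_T * eta_n - U_L * (T_Ci - T_a)]^+ ."""
    # Net gain per unit area: absorbed radiation minus thermal losses
    net_gain = i_t * eta_n - u_l * (t_ci - t_a)
    # The + superscript: a negative net gain is replaced by zero
    return area * f_r * max(net_gain, 0.0)
```

Under low radiation and a high inlet temperature the bracketed term becomes negative and the function returns zero, which corresponds to the collector not being operated.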

Steady-state collector testing is the best manner for a manufacturer to rate his product. From an overall heat balance on the collector fluid and from Eq. 5.64, the expressions for the instantaneous collector efficiency ηC under normal solar incidence are:

$$ \begin{array}{l} {\eta _C} \equiv \displaystyle\frac{{{q_C}}}{{{A_C}{I_T}}} = \displaystyle\frac{{{{(m{c_p})}_C}({T_{Co}} - {T_{Ci}})}}{{{A_C}{I_T}}} \\\quad \,\, = \left[{F_R}{\eta _n} - {F_R}{U_L}\left(\displaystyle\frac{{{T_{Ci}} - {T_a}}}{{{I_T}}}\right)\right] \\\end{array} $$
(5.65)

where mC is the total fluid flow rate through the collector, cpC is the specific heat of the fluid flowing through the collector, and TCi and TCo are the inlet and exit temperatures of the fluid. Thus, measurements of IT, TCi and TCo are made (following the standard protocol, ASHRAE 1978) at a pre-specified and controlled value of fluid flow rate. The test data are plotted as ηC against reduced temperature [(TCi − Ta)/IT] as shown in Fig. 5.32. A linear fit is made to these data points by regression, from which the values of FR ηn and FR UL are easily deduced: the intercept gives FR ηn and the negative of the slope gives FR UL.

Fig. 5.32 Test data points of thermal efficiency of a double-glazed flat-plate liquid collector plotted against reduced temperature. The regression line of the model given by Eq. 5.65 is also shown. (From ASHRAE (1978) © American Society of Heating, Refrigerating and Air-Conditioning Engineers, Inc., www.ashrae.org)

If the same collector is tested on different days, slightly different numerical values are obtained for the two parameters FR ηn and FR UL, which are often, but not always, within the uncertainty bands of the estimates. Model misspecification (i.e., the model is not perfect; for example, it is known that the collector heat losses are not strictly linear) is partly the cause of such variability. This is somewhat disconcerting to a manufacturer since it introduces ambiguity as to which values of the parameters to present in the product specification sheet.
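The straight-line fit of Eq. 5.65 and the standard errors that define the uncertainty bands mentioned above can be sketched as follows. The function name `fit_collector_line` and the demonstration points are assumptions made for illustration; the points are noise-free placeholders, not the test data of Fig. 5.32.

```python
import numpy as np

def fit_collector_line(reduced_temp, efficiency):
    """OLS fit of eta_C = FR*eta_n - FR*UL * x, with x = (T_Ci - T_a)/I_T (Eq. 5.65)."""
    x = np.asarray(reduced_temp, dtype=float)
    eta = np.asarray(efficiency, dtype=float)
    X = np.column_stack([np.ones_like(x), x])    # intercept and slope columns
    coef, *_ = np.linalg.lstsq(X, eta, rcond=None)
    resid = eta - X @ coef
    n, p = X.shape
    s2 = float(resid @ resid) / (n - p)          # residual variance estimate
    cov = s2 * np.linalg.inv(X.T @ X)            # covariance matrix of the two estimates
    se = np.sqrt(np.diag(cov))                   # standard errors (intercept, slope)
    return {"FR_eta_n": coef[0], "FR_UL": -coef[1], "se": se, "residuals": resid}

# Hypothetical noise-free points used only to exercise the function
x_demo = np.array([0.00, 0.02, 0.04, 0.06, 0.08])
eta_demo = 0.70 - 4.0 * x_demo
result = fit_collector_line(x_demo, eta_demo)
```

Repeating the fit on data from different test days and comparing the estimates against these standard errors is one way to judge whether day-to-day differences are within the expected uncertainty.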

The data points of Fig. 5.32 are assembled in Table 5.18. Assume that water is the working fluid.

Table 5.18 Data table for Problem 5.6
  1. (a)

    Perform OLS regression using Eq. 5.65 and identify the two parameters FR ηn and FR UL along with their standard errors. Plot the model residuals, and study their behavior.

  2. (b)

    Draw a straight line visually through the data points and determine the x-axis and y-axis intercepts. Estimate the FR ηn and FR UL parameters and compare them with those determined from (a).

  3. (c)

    Calculate the R², RMSE and CV values of the model.

  4. (d)

    Calculate the F-statistic to test for the overall significance of the model.

  5. (e)

    Perform t-tests on the individual model parameters.

  6. (f)

    Use the model to predict collector efficiency when IT = 800 W/m², TCi = 35°C and Ta = 10°C.

  7. (g)

    Determine the 95% confidence intervals for the mean response and the 95% prediction intervals for the individual response for (f) above.

  8. (h)

    The steady-state model of the solar thermal collector assumes the heat loss term given by [UA(TCi − Ta)] to be linear with the temperature difference between collector inlet temperature and the ambient temperature. One wishes to investigate whether the model improves if the loss term is to include an additional second-order term:

    1. (i)

      Derive the resulting expression for collector efficiency analogous to Eq. 5.65.

      (Hint: start with the fundamental heat balance equation, Eq. 5.64)

    2. (ii)

      Does the data justify the use of such a model?

Pr. 5.7

Dimensionless model for fans or pumps

The performance of a fan or pump is characterized in terms of the head or the pressure rise across the device and the flow rate for a given shaft power. The use of dimensionless variables simplifies and generalizes the model. Dimensional analysis (consistent with fan affinity laws for changes in speed, diameter and air density) suggests that the performance of a centrifugal fan can be expressed as a function of two dimensionless groups representing the pressure head and the flow coefficient, respectively:

$$ \Psi= \frac{{SP}}{{{D^2}{\omega ^2}\rho }}\quad {\rm{and}}\quad \Phi {\rm{ = }}\frac{Q}{{{D^3}\omega }} $$
(5.66)

where SP is the static pressure, Pa; D the diameter of wheel, m; ω the rotative speed, rad/s; ρ the density, kg/m³; and Q the volume flow rate of air, m³/s.
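The two groups of Eq. 5.66 can be computed with a small helper such as the one below. The function names are hypothetical. The conversion from rev/min is included only on the assumption that fan speed might be tabulated in that unit; Eq. 5.66 itself requires rad/s.

```python
import math

def rpm_to_rad_per_s(rpm):
    """Convert rotative speed from rev/min to rad/s."""
    return rpm * 2.0 * math.pi / 60.0

def fan_dimensionless_groups(static_pressure, flow_rate, diameter, omega, density):
    """Eq. 5.66: returns (Psi, Phi) from SP [Pa], Q [m3/s], D [m], omega [rad/s], rho [kg/m3]."""
    psi = static_pressure / (diameter**2 * omega**2 * density)   # pressure head group
    phi = flow_rate / (diameter**3 * omega)                      # flow coefficient group
    return psi, phi
```

Applying the function row by row to measurements taken at different speeds should, for constant density, collapse the points onto a single Ψ versus Φ curve.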

For a fan operating at constant density, it should be possible to plot one curve \({\Psi}\;{\rm{ vs }}\;\Phi\) that represents the performance at all speeds. The performance of a certain 0.3 m diameter fan is shown in Table 5.19.

Table 5.19 Data table for Problem 5.7
  1. (a)

    First, plot the data and formulate two or three promising functions.

  2. (b)

    Identify the best function by looking at the R², RMSE and CV values and also at the residuals.

Assume density of air at STP conditions to be 1.204 kg/m³.

Pr. 5.8

Consider the data used in Example 5.6.3 meant to illustrate the use of weighted regression for replicate measurements with non-constant variance. For the same data set, identify a model using the logarithmic transform approach similar to that shown in Example 5.6.2.

Pr. 5.9

Spline models for solar radiation

This problem involves using splines for functions with abrupt hinge points. Several studies have proposed correlations to predict different components of solar radiation from more routinely measured components. One such correlation relates the ratio of the hourly diffuse solar radiation on a horizontal surface (Id) to the global radiation on a horizontal surface (I) to a quantity known as the hourly atmospheric clearness index (kT = I/I0), where I0 is the extraterrestrial hourly radiation on a horizontal surface at the same latitude and time and day of the year (Reddy 1987). The latter is an astronomical quantity and can be predicted almost exactly. Data have been gathered (Table 5.20) from which a correlation (Id/I) = f(kT) needs to be identified.

Table 5.20 Data table for Problem 5.9
  1. (a)

    Plot the data and visually determine likely locations of hinge points. (Hint: there should be two points, one at either extreme).

  2. (b)

    Previous studies have suggested the following three functional forms: a constant model for the lower range, a second-order polynomial for the middle range, and a constant model for the higher range. Evaluate with the data provided whether this functional form still holds, and report pertinent models and relevant goodness-of-fit indices.

Pr. 5.10

Modeling variable base degree-days with balance point temperature at a specific location

Degree-day methods provide a simple means of determining annual energy use in envelope-dominated buildings operated constantly and with simple HVAC systems which can be characterized by a constant efficiency. Such simple single-measure methods capture the severity of the climate in a particular location. The variable base degree day (VBDD) is conceptually similar to the simple degree-day method but is an improvement since it is based on the actual balance point of the house instead of the outdated default value of 65°F or 18.3°C (ASHRAE 2009). Table 5.21 assembles the VBDD values for New York City, NY from actual climatic data over several years at this location.

Table 5.21 Data table for Problem 5.10

Identify a suitable regression curve for VBDD versus balance point temperature for this location and report all pertinent statistics.

Pr. 5.11

Change point models of utility bills in variable occupancy buildings

Example 5.7.1 illustrated the use of linear spline models to model monthly energy use in a commercial building versus outdoor dry-bulb temperature. Such models are useful for several purposes, one of which is for energy conservation. For example, the energy manager may wish to track the extent to which energy use has been increasing over the years, or the effect of a recently implemented energy conservation measure (such as a new chiller). For such purposes, one would like to correct, or normalize, for any changes in weather since an abnormally hot summer could obscure the beneficial effects of a more efficient chiller. Hence, factors which change over the months or the years need to be considered explicitly in the model. Two common normalization factors include changes to the conditioned floor area (for example, an extension to an existing wing), or changes in the number of students in a school. A model regressing monthly utility energy use against outdoor temperature is appropriate for buildings with constant occupancy (such as residences) or even offices. However, buildings such as schools are practically closed during summer, and hence, the occupancy rate needs to be included as the second regressor. The functional form of the model, in such cases, is a multi-variate change point model given by:

$$ \begin{aligned}y &= {\beta _{0,un}} + {\beta _0}{f_{oc}} + {\beta _{1,un}}x +{\beta _1}{f_{oc}}x \\& \quad+ {\beta _{2,un}}(x - {x_c})I + {\beta_2}{f_{oc}}(x - {x_c})I\end{aligned} $$
(5.67)

where x and y are the monthly mean outdoor temperature (To) and the electricity use per square foot of the school (E), respectively, and foc = Noc/Ntotal is the ratio of the number of days in the month when the school is in session (Noc) to the total number of days in that month (Ntotal). The factor foc can be determined from the school calendar. Clearly, the unoccupied fraction is fun = 1 − foc.

The term I represents an indicator variable whose numerical value is given by Eq. 5.54b. Note that the change point temperatures for occupied and unoccupied periods are assumed to be identical since the monthly data does not allow this separation to be identified.
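A possible way to set up Eq. 5.67 for estimation is sketched below. Since Eq. 5.54b is not reproduced here, the indicator is assumed to be I = 1 when x exceeds the change point xc and I = 0 otherwise. The function names are hypothetical, and scanning a list of candidate change points is one simple way of handling the parameter that makes the problem non-linear.

```python
import numpy as np

def change_point_design(x, f_oc, x_c):
    """Regressor columns of Eq. 5.67 for a trial change point x_c.

    Assumption: I = 1 when x > x_c and 0 otherwise (stand-in for Eq. 5.54b).
    Column order: beta_0un, beta_0, beta_1un, beta_1, beta_2un, beta_2.
    """
    x = np.asarray(x, dtype=float)
    f_oc = np.asarray(f_oc, dtype=float)
    hinge = (x - x_c) * (x > x_c)                # (x - x_c) * I
    return np.column_stack([np.ones_like(x), f_oc, x, f_oc * x, hinge, f_oc * hinge])

def scan_change_point(x, f_oc, y, candidates):
    """Run one OLS fit per candidate x_c; return (best x_c, coefficients, RMSE)."""
    y = np.asarray(y, dtype=float)
    best = None
    for x_c in candidates:
        X = change_point_design(x, f_oc, x_c)
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        dof = len(y) - X.shape[1]                # n minus number of linear coefficients
        rmse = float(np.sqrt(resid @ resid / dof))
        if best is None or rmse < best[2]:
            best = (x_c, coef, rmse)
    return best
```

With only twelve monthly observations and six coefficients the fit has few degrees of freedom, which is one reason to drop statistically insignificant terms before settling on a final model.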

Consider the monthly data assembled (shown in Table 5.22).

  1. (a)

    Plot the data and look for change points in the data. Note that the model given by Eq. 5.67 has 7 parameters of which xc (the change point temperature) is the one which makes the estimation non-linear. By inspection of the scatter plot, you will assume a reasonable value for this variable, and proceed to perform a linear regression as illustrated in Example 5.7.1. The search for the best value of xc (one with minimum RMSE) would require several OLS regressions assuming different values of the change point temperature.

  2. (b)

    Identify the parsimonious model, and estimate the appropriate parameters of the model. Note that of the six regression coefficients (β) appearing in Eq. 5.67, some may be statistically insignificant, and appropriate care should be exercised in this regard. Report appropriate model and parameter statistics.

  3. (c)

    Perform a residual analysis and discuss results.

Table 5.22 Data table for Problem 5.11

Pr. 5.12

Determining energy savings from monitoring and verification (M&V) projects

A crucial element in any energy conservation program is the ability to verify savings from measured energy use data; this is referred to as monitoring and verification (M&V). Energy service companies (ESCOs) are required, in most cases, to perform this as part of their services. Figure 5.33 depicts how energy savings are estimated. A common M&V protocol involves measuring the monthly total energy use at the facility for a whole year before the retrofit (the baseline or pre-retrofit period) and a whole year after the retrofit (the post-retrofit period). The time taken for implementing the energy saving measures (called the “construction period”) is neglected in this simple example. One first identifies a baseline regression model of energy use against ambient dry-bulb temperature To during the pre-retrofit period, Epre = f(To). This model is then used to predict energy use during each month of the post-retrofit period by using the corresponding ambient temperature values. The difference between model-predicted and measured monthly energy use is the energy savings during that month:

$$ \begin{aligned}\text{Energy savings} = {} & \text{Model-predicted pre-retrofit use} \\ & - \text{measured post-retrofit use}\end{aligned} $$
(5.68)

Finally, the annual savings resulting from the energy retrofit and their uncertainty are determined. It is very important that the uncertainty associated with the savings estimates be determined as well, so that meaningful conclusions can be reached regarding the impact of the retrofit on energy use.
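The bookkeeping behind Eq. 5.68 can be sketched as follows. A straight-line baseline E = b0 + b1·To is assumed here purely for brevity; a change point or indicator variable model could be substituted without changing the rest of the calculation. The function name is hypothetical, and the uncertainty of the savings is not computed in this sketch.

```python
import numpy as np

def monthly_savings(t_pre, e_pre, t_post, e_post):
    """Eq. 5.68 with an assumed straight-line baseline fitted to pre-retrofit data."""
    t_pre = np.asarray(t_pre, dtype=float)
    X_pre = np.column_stack([np.ones_like(t_pre), t_pre])
    coef, *_ = np.linalg.lstsq(X_pre, np.asarray(e_pre, dtype=float), rcond=None)
    # Baseline use predicted at the post-retrofit temperatures
    baseline = coef[0] + coef[1] * np.asarray(t_post, dtype=float)
    # Savings = model-predicted pre-retrofit use minus measured post-retrofit use
    savings = baseline - np.asarray(e_post, dtype=float)
    return savings, float(savings.sum())
```

The first returned value holds the month-by-month savings and the second their sum over the post-retrofit period.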

You are given monthly data of outdoor dry-bulb temperature (To) and area-normalized whole building electricity use (WBe) for two years (Table 5.23). The first year is the pre-retrofit period before a new energy management and control system (EMCS) for the building is installed, and the second is the post-retrofit period. The construction period, i.e., the period it takes to implement the conservation measures, is taken to be negligible.

Table 5.23 Data table for Problem 5.12
  1. (a)

    Plot time series and x–y plots and see whether you can visually distinguish the change in energy use as a result of installing the EMCS (similar to Fig. 5.33);

    Fig. 5.33 Schematic representation of energy use prior to and after installing energy conservation measures (ECM) and of the resulting energy savings

  2. (b)

    Evaluate at least two different models (with one of them being a model with indicator variables) for the pre-retrofit period, and select the better model;

  3. (c)

    Use this baseline model to determine month-by-month energy use during the post-retrofit period, representative of energy use had the conservation measure not been implemented (this is the “model-predicted pre-retrofit energy use” of Eq. 5.68);

  4. (d)

    Determine the month-by-month as well as the annual energy savings using Eq. 5.68;

  5. (e)

    The ESCO which suggested and implemented the ECM claims a savings of 15%. You have been retained by the building owner as an independent M&V consultant to verify this claim. Prepare a short report describing your analysis methodology, results and conclusions. (Note: you should also calculate the 90% uncertainty in the savings estimated assuming zero measurement uncertainty. Only the cumulative annual savings and their uncertainty are required, not month-by-month values).

Pr. 5.13

Gray-box and black-box models of centrifugal chiller using field data

You are asked to evaluate two types of models: physical or gray-box models versus polynomial or black-box models. A brief overview of these is provided below.

(a) Gray-Box Models

The Universal Thermodynamic Model proposed by Gordon and Ng (2000) is to be used. The GN model is a simple, analytical, universal model for chiller performance based on first principles of thermodynamics and linearized heat losses. The model predicts the dependent chiller COP (defined as the ratio of the chiller (or evaporator) thermal cooling capacity Qch to the electrical power Pcomp consumed by the chiller (or compressor)) with specially chosen independent (and easily measurable) parameters such as the condenser fluid (water or air) inlet temperature Tcdi, the fluid temperature leaving the evaporator (or the chilled water return temperature from the building) Tcho, and the thermal cooling capacity of the evaporator (similar to the figure for Example 5.4.3). The GN model is a three-parameter model which, for parameter identification, takes the following form:

$$ \begin{aligned}&\left( {\frac{1}{{COP}} + 1}\right)\frac{{{T_{cho}}}}{{{T_{cdi}}}} - 1 ={a_1}\frac{{{T_{cho}}}}{{{Q_{ch}}}} \\& + {a_2}\frac{{({T_{cdi}} -{T_{cho}})}}{{{T_{cdi}}{Q_{ch}}}} + {a_3}\frac{{(1/COP +1){Q_{ch}}}}{{{T_{cdi}}}} \end{aligned}$$
(5.69)

where the temperatures are in absolute units, and the parameters of the model have physical meaning in terms of irreversibilities:

  • a1 = ∆s, the total internal entropy production rate in the chiller due to internal irreversibilities,

  • a2 = Qleak, the rate of heat losses (or gains) from (or into) the chiller,

  • \({a_3} = R =\displaystyle\frac{1}{{{{(mCE)}_{cond}}}} + \frac{{1 - {E_{evap}}}}{{{{(mCE)}_{evap}}}}\), i.e., the total heat exchanger thermal resistance, which represents the irreversibility due to finite-rate heat exchange; m is the mass flow rate, C the specific heat of water, and E the heat exchanger effectiveness.

The model applies both to unitary and large chillers operating under steady state conditions. Evaluations by several researchers have shown this model to be very accurate for a large number of chiller types and sizes. If one introduces:

$$\begin{aligned}{x_1}= \frac{{{T_{cho}}}}{{{Q_{ch}}}},{x_2}& = \frac{{({T_{cdi}} -{T_{cho}})}}{{{T_{cdi}}{Q_{ch}}}},{x_3} = \frac{{(1/COP +1){Q_{ch}}}}{{{T_{cdi}}}}\\ &\text{and}\quad y = \left({\frac{1}{{COP}} + 1} \right)\frac{{{T_{cho}}}}{{{T_{cdi}}}} -1\end{aligned}$$
(5.70)

Eq. 5.69 assumes the following linear form:

$$ y = {a_1}{x_1} + {a_2}{x_2} + {a_3}{x_3} $$
(5.71)
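The transformation of Eq. 5.70 and the no-intercept fit of Eq. 5.71 can be sketched as below. The function names are hypothetical; temperatures are assumed to be in kelvin and Qch in kW, so that the parameters come out in the corresponding units.

```python
import numpy as np

def gn_regressors(cop, t_cho, t_cdi, q_ch):
    """Eq. 5.70: build x1, x2, x3 and y from measured COP, T_cho, T_cdi (K) and Q_ch."""
    cop, t_cho, t_cdi, q_ch = (np.asarray(v, dtype=float) for v in (cop, t_cho, t_cdi, q_ch))
    u = 1.0 / cop + 1.0                          # recurring factor (1/COP + 1)
    x1 = t_cho / q_ch
    x2 = (t_cdi - t_cho) / (t_cdi * q_ch)
    x3 = u * q_ch / t_cdi
    y = u * t_cho / t_cdi - 1.0
    return np.column_stack([x1, x2, x3]), y

def fit_gn(cop, t_cho, t_cdi, q_ch):
    """OLS estimate of (a1, a2, a3) in Eq. 5.71; the model has no intercept term."""
    X, y = gn_regressors(cop, t_cho, t_cdi, q_ch)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef
```

Because the three regressors differ by several orders of magnitude and x3 contains the measured COP, it is worth inspecting their correlations and the residuals before interpreting the estimates physically.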

Although most commercial chillers are designed and installed to operate at constant coolant flow rates, variable condenser water flow operation (as well as evaporator flow rate) is being increasingly used to improve overall cooling plant efficiency especially at low loads. In order to accurately correlate chiller model performance under variable condenser flow, an analytic model as follows was developed:

$$\begin{aligned}\frac{{{T_{cho}}(1 + 1/COP)}}{{{T_{cdi}}}} - 1 - \frac{1}{{{{(V\rho C)}_{cond}}}}\frac{{(1/COP + 1){Q_{ch}}}}{{{T_{cdi}}}}\\ = {c_1}\frac{{{T_{cho}}}}{{{Q_{ch}}}} + {c_2}\left( {\frac{{{T_{cdi}} - {T_{cho}}}}{{{Q_{ch}}{T_{cdi}}}}} \right) + {c_3}\frac{{{Q_{ch}}(1 + 1/COP)}}{{{T_{cdi}}}}\end{aligned} $$
(5.72)

If one introduces

$$ {x_1} = \frac{{{T_{cho}}}}{{{Q_{ch}}}},\quad {x_2} = \frac{{{T_{cdi}} - {T_{cho}}}}{{{Q_{ch}}{T_{cdi}}}},\quad {x_3} = \frac{{(1/COP + 1){Q_{ch}}}}{{{T_{cdi}}}} $$

and

$$ \begin{aligned}y =\;&\frac{{{T_{cho}}(1/COP + 1)}}{{{T_{cdi}}}} - 1\\& - \frac{1}{{{{(V\rho C)}_{cond}}}}\frac{{(1/COP + 1){Q_{ch}}}}{{{T_{cdi}}}} \end{aligned}$$
(5.73)

where V, ρ and C are the volumetric flow rate, the density and the specific heat of the condenser water,

Eq. 5.72 for variable condenser flow rate becomes

$$ y = {c_1}{x_1} + {c_2}{x_2} + {c_3}{x_3} $$
(5.74)
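Only the response variable changes between the constant-flow and variable-flow formulations; the regressors x1, x2 and x3 are the same. A sketch of the modified response of Eq. 5.73 follows. The function name is hypothetical, and the default density and specific heat are approximate values for water assumed for illustration (V in L/s, ρ in kg/L and C in kJ/kg·K give VρC in kW/K).

```python
import numpy as np

def gn_variable_flow_response(cop, t_cho, t_cdi, q_ch, v_cond, rho=1.0, c_p=4.186):
    """y of Eq. 5.73; temperatures in K, q_ch in kW, v_cond in L/s.

    rho [kg/L] and c_p [kJ/(kg K)] default to approximate water properties (assumption).
    """
    cop, t_cho, t_cdi, q_ch, v_cond = (
        np.asarray(v, dtype=float) for v in (cop, t_cho, t_cdi, q_ch, v_cond)
    )
    u = 1.0 / cop + 1.0
    # Constant-flow response minus the condenser flow correction term
    return u * t_cho / t_cdi - 1.0 - u * q_ch / (t_cdi * v_cond * rho * c_p)
```

As the condenser flow rate becomes very large the correction term vanishes and the response reduces to the y of Eq. 5.70.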

(b) Black-Box Models

Whereas the structure of a gray box model, like the GN model, is determined from the underlying physics, the black box model is characterized as having no (or sparse) information about the physical problem incorporated in the model structure. The model is regarded as a black box and describes an empirical relationship between input and output variables. The commercially available DOE-2 building energy simulation model (DOE-2 1993) relies on the same parameters as those for the physical model, but uses a second order linear polynomial model instead. This “standard” empirical model (also called a multivariate polynomial linear model or MLR) has 10 coefficients which need to be identified from monitored data:

$$ \begin{aligned}COP &= {b_0} + {b_1}{T_{cdi}} + {b_2}{T_{cho}} \\ & \quad +{b_3}{Q_{ch}} +{b_4}{T_{cdi}}^2 + {b_5}T_{cho}^2 + {b_6}Q_{ch}^2\\ & \quad+ {b_7}{T_{cdi}}{T_{cho}} + {b_8}{T_{cdi}}{Q_{ch}}+{b_9}{T_{cho}}{Q_{ch}}\end{aligned} $$
(5.75)

These coefficients, unlike the three coefficients appearing in the GN model, have no physical meaning and their magnitude cannot be interpreted in physical terms. Collinearity in regressors and ill-behaved residual behavior are also problematic issues. Usually one needs to retain in the model only those parameters which are statistically significant, and this is best done by step-wise regression.
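The ten regressor columns of Eq. 5.75 and a simple screen for collinearity among them can be sketched as follows. The function names are hypothetical, and the pairwise correlation matrix is only a first check, not a substitute for the stepwise regression mentioned above.

```python
import numpy as np

def doe2_design_matrix(t_cdi, t_cho, q_ch):
    """Columns of Eq. 5.75 in the order b0, b1, ..., b9."""
    t_cdi, t_cho, q_ch = (np.asarray(v, dtype=float) for v in (t_cdi, t_cho, q_ch))
    return np.column_stack([
        np.ones_like(t_cdi),                         # b0
        t_cdi, t_cho, q_ch,                          # b1, b2, b3 (linear terms)
        t_cdi**2, t_cho**2, q_ch**2,                 # b4, b5, b6 (quadratic terms)
        t_cdi * t_cho, t_cdi * q_ch, t_cho * q_ch,   # b7, b8, b9 (cross-product terms)
    ])

def regressor_correlations(X):
    """Pairwise correlation coefficients among the nine non-constant columns."""
    return np.corrcoef(X[:, 1:], rowvar=False)
```

Correlation coefficients close to ±1 between, say, a temperature and its square indicate regressors that carry nearly the same information, which is typical of polynomial models fitted over a narrow operating range.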

Table B.3 in Appendix B assembles data consisting of 52 sets of observations from a 387-ton centrifugal chiller operated with variable condenser flow. A hold-out sample cross-validation scheme will be used to guard against over-fitting. Though this is a severe type of split, use the first 36 data points as training data and the rest (shown in italics) as testing data.

  1. (a)

    You will use the three models described above (Eqs. 5.71, 5.74 and 5.75) to identify suitable regression models. Study residual behavior as well as collinearity issues between regressors. Identify the best forms of the GN and the MLR model formulations.

  2. (b)

    Evaluate which of these models is superior in terms of external prediction accuracy. The GN and MLR models have different y-values, so you cannot use the statistics provided by the regression package directly. You need to perform subsequent calculations in a spreadsheet using the power as the basis for comparing model accuracy and reporting internal and external prediction accuracies. For the MLR model, this is easily deduced from the model-predicted COP values. For the GN model with constant flow, rearranging terms of Eq. 5.71 yields the following expression for the chiller electric power Pcomp:

$$\begin{aligned} &{P_{comp}} =\\ &\frac{{{Q_{ch}}({T_{cdi}} - {T_{cho}}) +{a_1}{T_{cdi}}{T_{cho}} + {a_2}({T_{cdi}} - {T_{cho}}) +{a_3}Q_{ch}^2}}{{{T_{cho}} - {a_3}{Q_{ch}}}}\end{aligned}$$
(5.76)
  3. (c)

    Report all pertinent steps performed in your analysis and present your results succinctly.

Helpful tips:

  1. (i)

    Convert temperatures into degrees Celsius, Qch into kW and volumetric flow rate V into L/s for unit consistency (work in SI units)

  2. (ii)

    For the GN model, all temperatures should be in absolute units

  3. (iii)

    Degrees of freedom (d.f.) have to be estimated correctly in order to compute RMSE and CV. For internal prediction, d.f. = n − p where n is the number of training data points and p the number of model parameters. For external prediction accuracy, d.f. = m where m is the number of testing data points.
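The power-based comparison described in this problem can be sketched with two small helpers: one evaluating Eq. 5.76 for the constant-flow GN model, and one computing RMSE and CV for a stated number of degrees of freedom. The function names are hypothetical, and CV is taken here as RMSE divided by the mean of the measured values.

```python
import numpy as np

def gn_compressor_power(q_ch, t_cho, t_cdi, a1, a2, a3):
    """Eq. 5.76: compressor power from the constant-flow GN parameters (temperatures in K)."""
    q_ch, t_cho, t_cdi = (np.asarray(v, dtype=float) for v in (q_ch, t_cho, t_cdi))
    numerator = (q_ch * (t_cdi - t_cho) + a1 * t_cdi * t_cho
                 + a2 * (t_cdi - t_cho) + a3 * q_ch**2)
    return numerator / (t_cho - a3 * q_ch)

def rmse_and_cv(measured, predicted, dof):
    """RMSE and CV for given degrees of freedom.

    Use dof = n - p for internal (training) accuracy and dof = m for external
    (testing) accuracy, as stated in tip (iii).
    """
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rmse = float(np.sqrt(np.sum((measured - predicted) ** 2) / dof))
    return rmse, rmse / float(measured.mean())
```

For the MLR model the predicted power follows from Qch divided by the predicted COP, after which the same `rmse_and_cv` call puts both models on a common basis.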

Pr. 5.14

Effect of tube cleaning in reducing chiller fouling

A widespread problem with liquid-cooled chillers is condenser fouling which increases heat transfer resistance in the condenser and results in reduced chiller COP. A common remedy is to periodically (every year or so) brush-clean the insides of the condenser tubes. Some practitioners question the efficacy of this process though this is widely adopted in the chiller service industry. In an effort to clarify this ambiguity, an actual large chiller (with refrigerant R11) was monitored during normal operation for 3 days before (9/11-9/13-2000) and 3 days after (1/17-1/19-2001) tube cleaning was done. Table B.4 (in Appendix B) assembles the entire data set of 72 observations for each period. This chiller is similar to the figure for Example 5.4.3.

Analyze, using the GN model described in Pr. 5.13, the two data sets, and determine the extent to which the COP of the chiller has improved as a result of this action. Prepare a report describing your analysis methodology, your analysis results, the uncertainty in your results, your conclusions, and any suggestions for future analysis work.

Copyright information

© 2011 Springer Science+Business Media, LLC

Cite this chapter

Agami Reddy, T. (2011). Estimation of Linear Model Parameters Using Least Squares. In: Applied Data Analysis and Modeling for Energy Engineers and Scientists. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-9613-8_5

  • DOI: https://doi.org/10.1007/978-1-4419-9613-8_5

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-9612-1

  • Online ISBN: 978-1-4419-9613-8

  • eBook Packages: Engineering, Engineering (R0)