1 Introduction

The consumer price index (CPI) is an economic measure of major interest in economic policy. From a microeconomic point of view inflation increases cost of living for consumers. One immediate macroeconomic implication is the pressure to increase wages, which in turn has a direct impact on competitiveness. The CPI is also the basis for measuring growth and productivity in real terms, notwithstanding its influence in the evolution of interest rates and other financial variables governing, for instance, investment decisions at the micro and macro levels, which also influence growth rates. In this context, the correct measurement of consumer price changes is a fundamental issue (Boskin et al. 1998).

The CPI is usually measured as a weighted average of the prices of a fixed basket of goods representing consumer expenditure. However, the report of the Boskin Commission has established that one of the major drawbacks of this methodology is the inability to cope with the quality change and new product biases (Boskin 1996), therefore, overstating the increase in the cost of living (Boskin et al. 1998). The recommendations of the Boskin report have influenced statistical agencies to take the steps toward making the CPI a better approximation of a true cost of living index. They have also served to renew the interest on hedonic regressions and hedonic price indexes as a potential way of controlling for those biases.

Hedonic price indexes are constructed based on a hedonic regression where the price of the good is explained by its characteristics. The coefficients of this regression are similar to price characteristics that may be used to construct an index of quality change. The price change of the good is then adjusted by this quality change in order to build a price index free of quality or new product biases. Most of the literature considers hedonic regressions with observed product characteristics, while the impact of omitted unobserved product characteristics has received very little attention. Most authors rely on brand or make dummies, hoping that this will be enough to control for product unobservables. However, Benkard and Bajari (2005) and Requena-Silvente and Walker (2006) show, using different methodologies, that not including specific controls for unobserved effects can induce a significant bias in hedonic price indexes.

Benkard and Bajari (2005) propose a method based on factor analysis to correct for these biases and they apply it to the US personal computer market finding that not taking into account unobserved effects induces an upward bias in the hedonic index of about 1.4 % per year. The alternative approach proposed by Requena-Silvente and Walker (2006) controls for product unobservables by introducing model dummies in the hedonic regressions. In their application to the UK car market they find that the contribution of car-model effects to the value of cars has fallen since the 1970s, suggesting a downward bias in the hedonic index.

In this paper, following Requena-Silvente and Walker (2006) I construct a hedonic price index for cars in Spain in the 1990s controlling for time-invariant product unobservables. I extend their approach to control also for time-variant unobserved factors by including controls for age. Age effects have already been used in the literature (see for example the application to the Dutch car market of Dalen and Bode 2004 and the references therein), but the simultaneous inclusion of age and car-model effects has not been tried before.Footnote 1 As usual in this literature I find that price indexes are larger than quality-adjusted prices, but also that car-model effects play an important role that will be misleading unless we control for age effects. In particular, it is shown that in the absence of age effects just controlling for time-invariant unobservables tends to overstate the hedonic price index by a large amount. As a consequence, the year-on-year Spanish CPI would have been overestimated by around \(0.1~\%\) on average during the sample period. Excluding both the controls for age effects and time-invariant unobservables would have increased this bias to \(0.2~\%\).

The recent literature has mainly focused on durable goods where quality upgrading is frequent and product replacement is high. These types of goods usually have a large weight in the CPI (particularly in the case of cars) and therefore any adjustment in price indexes for those categories may have a relevant impact on the CPI. Examples include computers (Pakes 2003; Brown 2000), domestic appliances (Ioannidis and Silver 2003; Silver and Heravi 2004), electronic devices (Chwelos et al. 2008) and particularly cars.

The automobile sector has been widely studied in the hedonic price index literature, probably as a consequence of the important weight that automobiles have in consumer price indexes. Among the papers that have computed hedonic prices indexes for cars for different countries and periods of time are the following: (i) For the US: Court (1939), Griliches (1961) for the period 1954–1960, Triplett (1969) for the period 1960–1965 and Ohta (1987) for used cars for the period 1970–1983. (ii) For the UK: Cowling and Cubbin (1972) for the period 1956–1968, Murray and Sarantis (1999) for the period 1977–1991 and Requena-Silvente and Walker (2006) for the period 1971–1998. (iii) For the Netherlands: Kroonenberg and Cramer (1974) for the period 1964–1971 and Dalen and Bode (2004) for the period 1990–1999. (iv) For Portugal Reis and Silva (2006) for the period 1997–2001. (v) For Italy Tomat (2002) for the period 1988–1998.

Regarding the Spanish market Izquierdo et al. (2001) have computed hedonic prices for new cars using monthly data for the period 1997–2000. Their hedonic regressions explain prices as a function of quality indexes, constructed from a comprehensive set of 35 observed characteristics, in order to avoid the collinearity problems common to this methodology (Pakes 2003). The main finding is that quality corrected prices are 3.1 % lower per year as compared to the price index computed by the Spanish National Statistics Office. However, they do not attempt to control for the impact of unobserved quality factors and their sample runs only for 4 years, while this paper takes into account the impact of unobserved attributes and covers 11 years, offering a wider view of the trends in the industry. Matas and Raymond (2009) offer estimates for the period 1981–2005, but using yearly, instead of monthly, data. They perform standard hedonic regressions, proposing two different smoothing techniques to deal with the parameter instability caused by collinearity. They do not address directly the problem of product unobservables, assuming that they may be captured by brand dummies. Their results show that not controlling for quality improvements overestimates nominal price increases at an average rate of 8.8 % per year for this period of 25 years. For the period 1997–2001 they estimate a gap of around 2.85 % per year, in line with the results of Izquierdo et al. (2001), although negligibly smaller. For the period 1991–2000 their estimated rate of change in the hedonic index is similar to the one reported in this paper for the brand dummies specification (designed to be comparable to Matas and Raymond model). This shows that the results offered in the present study are robust and in line with the existing literature. However, by not taking into consideration age effects, Matas and Raymond are only able to capture with their model just a half of the total quality-adjusted price decrease in the car industry during that decade, as is shown in this paper. As a consequence, the correspondent correction for quality in the Consumer Price Index would be heavily underestimated.

The rest of the paper is organized as follows: Sect. 2 describes the data and quality improvement process of cars in Spain. Section 3 explains the methodology followed in the hedonic regressions. Section 4 presents the price indexes to be computed. Section 5 shows and discusses the results. Finally, Sect. 6 concludes.

2 Quality improvement patterns for cars in Spain during the 1990s

2.1 Data description

I use a unique data set of monthly registrations of new cars in Spain for period January 1990 to December 2000. These data were initially collected by Moral and Jaumandreu (2007),Footnote 2 who also provide a thorough description of the database. It includes information on listed nominal and real prices and characteristics such as car size (length, width, luggage capacity), power, maximum speed, fuel consumption, and equipment (dummies for air conditioner, anti-lock braking system, power steering, central door locking and electric windows). It also has information on model age and on the geographical origin of the brand producing the model.

Table 1 describes the set of characteristics. The average car model costs almost 20 thousand euros and sells 681 unit per month. Its size, measured as its external surface, is 7.37 squared meters, with fuel consumption of 6.23 l every 100 km. Over 80 % of the observations of the sample have central door locking and electric windows, but only a half have an air conditioner, and just 39 % come with an Anti-lock Braking System.

Table 1 Characteristics and descriptive statistics

Most of the models observed during these 11 years were introduced for the first time during the sample period. However, in the first period of the sample, January 1990, there were already 77 models that had entered in the past, with ages ranging from one to fifteen years (13–181 months) at that moment.

The unit of observation is the car model. Car models often have several variants or subvariants. In the data, a given model denomination is associated with the characteristics of its most popular variant in the month of observation. Therefore, the variation in characteristics over time is due to the variation of the characteristics of the representative variant (and not due to a change in the variant chosen). The number of registrations for a model are, however, the sum of registrations of all variants.

Some filters were introduced to exclude super luxury models, e.g., Ferrari or Rolls Royce. Models with fewer than an average of 10 registrations per month are also excluded.Footnote 3 The model-month observations with less than 10 registrations represent only 2 % of the sample and those with less than 16 units sold per month are just 5 % of the sample. Table 1 shows the quantiles from the left tail of the empirical distribution for the number of registrations per model-month across the different model-months observations in the data. Overall, the data set accounts for more than 99.9 % of car registrations in Spain during the sample period.

Models are classified in segments following industry sources.Footnote 4 In particular, I consider the following classification in eight segments: Small-Mini, Small, Compact, Intermediate, High Intermediate, Luxury, Sport, and Minivan. The segments from Small-Mini to Luxury correspond to vertical product differentiation, while Sport and Minivan can be identified with horizontal product differentiation. These two segments include cars of different levels of quality, all of them having in common that they are designed to serve a more specific purpose. Table 2 shows the relative importance of each segment in terms of market share. The Small and Compact segments are clearly the most popular, each having one third of the market, followed by the High-Intermediate. While the Compact segment has kept its weight over all the sample period, it seems that the Small segment shows a slightly decreasing trend that has benefited the sales of the other segments, specially the High-Intermediate. The three most popular segments consistently account for more than 80 % of sales every year.

Table 2 Segment market shares

2.2 The evolution of automobile characteristics and prices

One of the most salient features of the Spanish car market in the 1990s is the intense process of product entry and replacement. The number of products increases steadily over the period due to the entry of new firms (mainly from Asia) and the expansion of the product range of incumbents. We can, therefore, say that the market is characterized by a scenario of increased competition, particularly from Asian manufacturers (Jaumandreu and Moral 2008). The evolution of prices and car characteristics suggests that non-price competition is the strategy followed by the majority of firms. The average price of cars in real terms increases all over the period, except for Asian models (Fig. 1), which may be due to the fact that at the beginning of the sample period Asian producers were concentrating mainly on models of the upper-class segments. As they expanded their range of products to cover segments of lower quality it is natural that the global average price decreases. The initial decline in prices (Fig. 2) can be attributed to the context of economic crisis at the beginning of the decade (1990s) in Spain. The quality of cars, measured by the amount of each characteristic, clearly increases all over the period, perhaps more in the case of non-Asian models.Footnote 5\(^{,}\)Footnote 6 The general trend in the period goes toward larger, faster and more powerful cars, but with smaller luggage capacity and higher consumption rates. This is particularly marked in the case of European models, probably as a response to the increased competition from Asian models in the second half of the decade. The average Asian car seems to follow the opposite pattern, but as mentioned before, this is mainly a consequence of the fact that the earlier Asian cars were concentrated in the upper-quality segments. After 1995 the Spanish market witnesses an intense wave of entries by Asian makers, mainly in the medium and lower quality segments. Nevertheless, Spanish, European and American manufacturers contribute also very actively to the enlargement of the number of models offered in Spain.

Fig. 1
figure 1

Average real price by origin (deflated from the Spanish Consumer Price Index)

Fig. 2
figure 2

Average real price by segment (deflated from the Spanish Consumer Price Index)

The evolution of car amenities such as air conditioning, power steering, etc., follows a similar pattern, although in this case it is very clear that improvements in these characteristics are always introduced in the upper-class segments and they eventually spread over the rest. It is important to clarify that the Minivan and Small-Mini segments became popular during this period so that the variety and number of models increased significantly. As a consequence the average characteristics varied a lot due to the intense entry process, particularly at the beginning of the sample period.

In summary, it seems clear that the automobile sector in Spain experienced a remarkable improvement in quality which a simple price index would ignore, thus overstating the increase in the cost of living attributed to car purchases. Therefore, the application of hedonic regression techniques to the computation of price indexes seems to be clearly justified.

3 Hedonic regressions

The hedonic regression methodology is aimed at explaining price variations by the change in product characteristics. Its practical implementation requires choosing and justifying assumptions regarding model specification, functional form of the hedonic function, parameter constancy or weighting. The next subsections address each of these issues.

3.1 Model specification

The most basic hedonic specification relates price to a number of characteristics (C):

$$\begin{aligned} p_{it}=\beta _{0}+\sum _{j=1}^{C}\beta _{j}x_{ijt}+\varepsilon _{it} \end{aligned}$$
(1)

where \(\beta _{0}\) is an intercept, \(x_{ijt}\) denotes characteristic \(j\) of product \(i\) at time \(t\). The age of the car model can be one of these characteristics. \(\beta _{j}\) denotes the implicit price of characteristic \(j\) and \(\varepsilon _{i}\) is the error term. In some cases the researcher may have access to a thorough set of product characteristics, comprehensive enough to justify the assumption that there remains no unobserved characteristic and the model is well specified. However, in most cases the set of characteristics is much more limited and observability becomes an issue. And even if we had such an exhaustive set of characteristics we could always think of factors like reliability, consumer perception of quality or reputation that have an effect on prices, but are not specifically captured by any combination of technical characteristics. If these factors are also correlated with the observed characteristics then their omission would make the estimation of \(\beta \) inconsistent. One approach that has become common in order to address this problem consists of adding brand or make dummies, in the hope that reputation or reliability will be adequately captured:

$$\begin{aligned} p_{it}=\beta _{0}+\sum _{j=1}^{C}\beta _{j}x_{ijt}+\sum _{b=1}^{B-1}\gamma _{b}Brand_{i}^{b}+\varepsilon _{it} \end{aligned}$$
(2)

where \(Brand_{i}^{b}=1\) if product \(i\) belongs to firm \(b\) and zero otherwise. As usual, one of the \(B\) brands must be excluded in order to avoid collinearity problems. \(\gamma _{b}\) captures the brand effect. This set of dummies may be augmented in some cases, such as the automobile sector, with the addition of segment dummies:

$$\begin{aligned} p_{it}=\beta _{0}+\sum _{j=1}^{C}\beta _{j}x_{ijt}+\sum _{b=1}^{B-1}\gamma _{b}Brand_{i}^{b}+\sum _{l=1}^{L-1}\delta _{l}Seg_{i}^{l}+\varepsilon _{it} \end{aligned}$$
(3)

where \(Seg_{i}^{l}=1\) if product \(i\) belongs to segment \(l\) and zero otherwise. \(\delta _{l}\) captures the corresponding segment effect. Unfortunately, even with brand or segment dummies there may remain unobserved factors specific to model \(i\). One possible solution would be to introduce \(M-1\) product specific dummies in order to capture such effects:

$$\begin{aligned} p_{it}=\beta _{0}+\sum _{j=1}^{C}\beta _{j}x_{ijt}+\sum _{m=1}^{M-1}\eta _{m}Model_{i}^{m}+\varepsilon _{it} \end{aligned}$$
(4)

where \(Model_{i}^{m}=1\) if \(m=i\) and zero otherwise, and \(\eta _{m}\) is the associated model effect. This effect would capture all the unobserved factors related to the particular car model as well as the effect associated to belonging to a given firm and segment.

However, if the number of products is large this approach could be problematic due to the lack of degrees of freedom. The number of parameters for characteristics and product dummies would be large relative to the number of observations and this could result in identification problems and inconsistent ordinary least squares (OLS) estimates. A more parsimonious approach would specify a model with a time-invariant product fixed effect (\(\zeta _{i}\)):

$$\begin{aligned} p_{it}=\beta _{0}+\sum _{j=1}^{C}\beta _{j}x_{ijt}+\zeta _{i}+\varepsilon _{it}= \beta _{0}+\sum _{j=1}^{C}\beta _{j}x_{ijt}+\nu _{it} \end{aligned}$$
(5)

In this specification the time-invariant car-model effect is an omitted variable affecting prices through the compounded error term \(\nu _{it}\). However, assuming that this effect is time-invariant and that observed characteristics are strictly exogenous with respect to the transitory error term \(\varepsilon \) permits the consistent estimation of \(\beta \) using panel data fixed effects estimators. For instance, using first differencing or a within transformation would remove \(\zeta _{i}\), but also all other time-invariant regressors, so their coefficients would not be separately identified from \(\zeta _{i}\).Footnote 7 The estimated prices for characteristics, \(\beta \), would be the same in approaches 4 and 5.Footnote 8 Therefore, for the purpose of obtaining hedonic coefficients, equations (4) and (5) are the same and differ only in the way time-invariant unobserved effects are estimated. Approaches 13 are common in the literature and approach 4 has been proposed by Requena-Silvente and Walker (2006). In this paper I will compare the results from all of them.

Regarding the choice of characteristics in \(X\) the common approach has been using as many characteristics as there are available. However, the potential collinearity between many of these characteristics can induce some problems in the estimation of \(\beta \), notably the appearance of “wrong” signs or parameter instability. Nevertheless, following Pakes (2003), these problems have not been a particular source of worry in the literature. Therefore, in my empirical specification I will be using all characteristics listed in Table 1.Footnote 9 Among them, the role of model age deserves particular attention.

The variable age, measures the age of the product, i.e., the period of time elapsed since it was first introduced in the market. The sign of age effects in a hedonic regression is not necessarily restricted to be either positive or negative. On one hand, product age could be informative about the degree of obsolescence of the product. In this case, consumers would tend to penalize older products and age would have a negative effect, as they would have a preference for newer, up-to-date goods. On the other hand, it could also be the case that older products would be seen as more successful. For instance, Dalen and Bode (2004) find mostly evidence of positive age effects, which they interpret as a gradual improvement of the quality of car models “after introduction without immediately adjusting the basic technical description of the model” (p. 1177). They also report some negative age effects. This constitutes evidence that the sign of the hedonic price for age can change through time.

Moreover, according to Oliner (1993), hedonic equations should include the age as an explicit argument, because if omitted characteristics are correlated with model age, this would be an adequate proxy for them. Therefore, the age can be informative about product specific (quality) characteristics that cannot be inferred from the observed technical specifications and that can also be time-variant. In consequence, introducing age as an explanatory variable (i.e., as an additional \(x_{ijt}\)) can help in controlling for unobserved effects in combination with \(\zeta _{i}\) in expression 5. The latter would control for time-invariant product specific unobserved effects, while the former would capture the time-variant ones that are common to the products of the same age. After these controls are introduced it does not seem implausible to assume that most of all relevant sources of (time-variant or -invariant) unobserved product heterogeneity are being accounted for.

If product unobservables are really an issue then the estimation of expressions 13 should yield biased estimates of \(\beta \). The introduction or not of age would just affect the size of the bias. In specification 5 however, the omission of age could introduce some bias and its inclusion should remove it. If the time-variant unobserved effects are correlated with \(X\) then omitting them would induce correlation between \(X\) and the error term. The size and direction of these biases are empirical questions that depend on the correspondent estimates of the effects.

3.2 Functional form of the hedonic function

In the previous subsections linear expressions of the hedonic regression have been used for simplicity in the exposition. However, the relation between prices and characteristics could follow any general functional form: \(p_{it}=f\left( X_{it}\right) \). Nevertheless, the literature has focused on linear relations, but allowing for the possibility of transforming the data to have a more flexible specification. The usual approach has consisted of applying a Box-Cox transformation to the dependent and/or the right-hand side variables and estimating the transformation parameters consistent with the data. It turns out that in most cases the Box-Cox parameters are close enough to 0 or 1 to safely assume semi-log, log-log or simple linear specifications. Therefore, most of the work of functional form selection reduces to determining the best suitable transformation of the data.

Regarding the automobile industry, previous studies have found that a semi-log specification (taking logs on the price and leaving the right-hand side variables unchanged) is the one that best fits the data, as is the case of Dalen and Bode (2004) or Requena-Silvente and Walker (2006). For the Spanish market, Matas and Raymond (2009) also find the semi-log the most adequate choice. That will also be the one I will use in this paper. This choice is sustained by the fact that the maximum likelihood estimates for the parameters of the Box-Cox transform on a regression of price on characteristics for the whole sample yield a parameter of 0.013 (p-value 0.116) for price and 1.14 for the right-hand side variables (the dummies are excluded from the transformation). Therefore, the assumption of linearity for right-hand side variables and logs for price does not seem unreasonable.Footnote 10

3.3 Parameter constancy and the use of weights in hedonic regressions

The hedonic price index methods can be applied following different estimation strategies that are basically differentiated by the sample size they use:

  1. 1.

    The time dummy variable (TDV) method fits the hedonic regression to the whole sample, adding to the model specification a set of time dummies. The idea is that the coefficient of the dummy of say, period \(t\), will represent the growth in the price index from the initial period to time \(t\) net of quality changes, which are controlled for through the variation of characteristics. The main drawback of this method is that it restricts the coefficients (prices) of characteristics to be constant over the whole sample period. Even if one could consider that this assumption would be reasonable for short sample periods or in contexts where consumer perception and valuation of quality remain constant over time, the truth is that in the literature parameter constancy is most often rejected by Chow tests of structural break. The case of the automobile industry is not an exception.

  2. 2.

    The adjacent period (AP) method can be seen as a refinement of the TDV where parameter constancy is assumed to hold only for two consecutive periods and a dummy is added to capture the quality-adjusted price increase of the second period with respect to the first one. A whole index series can then be constructed by chaining the time dummy coefficients.

  3. 3.

    The single period equation (SP) method allows the prices of characteristics to vary from period to period. Its parameter estimates can then be used to construct indexes of quality change which serve to correct the quality bias of the non-adjusted price index. One potential drawback of this approach is, as mentioned before, the parameter instability on the estimated prices of characteristics. However, the quality-adjusted price indexes constructed from them seem to be quite robust in general (Pakes 2003).

In this study I will follow the single period equation approach, that has gained in popularity precisely because it avoids the assumption of parameter constancy, which is not recommended unless it is sustained by the data (Triplett 2004 p. 61). This is generally not a problem in the AP method, however this paper is aimed at assessing the impact of unobservables on price indexes, rather than comparing the results of SP and AP methodsFootnote 11 (see again for examples Triplett 2004 pp. 61–63).

Another point of debate in the hedonic literature is whether the hedonic regressions should be weighted or not. In this respect, and following the recommendations in Triplett (2004) I will make use of weights to avoid an excessive impact of prices of products whose market share is low, because they are viewed as less satisfactory by consumers. The price variation of these type of goods should have less importance than other more successful models.Footnote 12

3.4 Estimation issues

Taking into account all considerations of the previous subsections, the final hedonic specifications to be applied to the data are:

$$\begin{aligned} \ln p_{it}&= \beta _{0}+\beta _{1}C90_{it}+\beta _{2}CarSize_{it}+\beta _{3}LGC_{it}+\beta _{4}HP_{it} \nonumber \\&+\,\beta _{5}maxSp_{it}+\beta _{6}Age_{it}+\beta _{7}AC_{it}+\beta _{8}ABS_{it}+\beta _{9}PST_{it}\nonumber \\&+\,\beta _{10}CDL_{it}+\beta _{11}EW_{it}+\varepsilon _{it} \end{aligned}$$
(6)

where \(\beta _{6}\) captures age effects.

$$\begin{aligned} \ln p_{it}&= \beta _{0}+\beta _{1}C90_{it}+\beta _{2}CarSize_{it}+\beta _{3}LGC_{it}+\beta _{4}HP_{it} \nonumber \\&+\,\beta _{5}maxSp_{it}+\beta _{6}Age_{it}+\beta _{7}AC_{it}+\beta _{8}ABS_{it}+\beta _{9}PST_{it}\nonumber \\&+\,\beta _{10}CDL_{it}+\beta _{11}EW_{it}+\sum _{b=1}^{B-1}\gamma _{b}Brand_{i}^{b}+\varepsilon _{it} \end{aligned}$$
(7)

where \(\gamma \) captures brand effects.

$$\begin{aligned} \ln p_{it}&= \beta _{0}+\beta _{1}C90_{it}+\beta _{2}CarSize_{it}+\beta _{3}LGC_{it}+\beta _{4}HP_{it} \nonumber \\&+\,\beta _{5}maxSp_{it}+\beta _{6}Age_{it}+\beta _{7}AC_{it}+\beta _{8}ABS_{it}+\beta _{9}PST_{it} \nonumber \\&+\,\beta _{10}CDL_{it}+\beta _{11}EW_{it}+\sum _{b=1}^{B-1}\gamma _{b}Brand_{i}^{b}+\sum _{l=1}^{L-1}\delta _{l}SEG_{i}^{l}+\varepsilon _{it} \end{aligned}$$
(8)

where \(\delta \) captures segment effects.

$$\begin{aligned} \ln p_{it}&= \beta _{0}+\beta _{1}C90_{it}+\beta _{2}CarSize_{it}+\beta _{3}LGC_{it}+\beta _{4}HP_{it} \nonumber \\&+\beta _{5}maxSp_{it}+\beta _{6}Age_{it}+\beta _{7}AC_{it}+\beta _{8}ABS_{it}+\beta _{9}PST_{it}\nonumber \\&+\beta _{10}CDL_{it}+\beta _{11}EW_{it}+\sum _{m=1}^{M-1}\eta _{m}Model_{i}^{m}+\varepsilon _{it} \end{aligned}$$
(9)

where \(\eta \) captures time-invariant car-model effects.

$$\begin{aligned} \ln p_{it}&= \beta _{0}+\beta _{1}C90_{it}+\beta _{2}CarSize_{it}+\beta _{3}LGC_{it}+\beta _{4}HP_{it} \nonumber \\&+\beta _{5}maxSp_{it}+\beta _{6}Age_{it}+\beta _{7}AC_{it}+\beta _{8}ABS_{it}+\beta _{9}PST_{it}\nonumber \\&+\beta _{10}CDL_{it}+\beta _{11}EW_{it}+\zeta _{i} +\varepsilon _{it} \end{aligned}$$
(10)

where \(\zeta _{i}\) denotes time-invariant car-model effects. As discussed before, (9) and (10) produce the same hedonic coefficients. However, I will show the results from both as a robustness check. Assuming that observed characteristics are exogenous, expressions (6)–(9) could be estimated period to period by ordinary least squares. However, expression (10) requires at least two periods to control for the unobserved component \(\zeta \). Therefore, in the estimation of all specifications I follow the approach proposed in Matas and Raymond (2009) of taking moving samples of order \(h\). They suggest this procedure as a way to smooth the estimated coefficients of single period hedonic regressions, that tend to be erratic from period to period. This method has the added advantage of providing enough time observations for the application of the within estimatorFootnote 13 in equation 10. It should be clarified that this approach allows different prices for characteristics every period (different \(\beta _{t}\)’s), except for the first \(h-1\) periods. What is assumed is that the coefficients of period \(t\) can be satisfactorily estimated by pooling all periods from \(t-h+1\) to \(t\). In this subsample the prices for characteristics from \(t-h+1\) to \(t-1\) are held equal to those in period \(t\). Next, for the estimation of period \(t+1\) the coefficients of \(t+1\) will be assumed to hold for the previous \(t-h+2\) to \(t\) periods, and so on. Hence, contrary to what happens in the AP or TDV approaches, holding the coefficients constant is a manner of improving the estimation of the per-period coefficients. It is assumed that for the estimation of \(\beta _{t}\) the previous \(t-h+1\) periods contain useful information, and that we can take advantage of it even if we “temporarily” impose the coefficients of period \(t\) over the previous \(t-h+1\) periods. In the AP or TDV methods the sample remains constant and the coefficients within the sample remain constant too. The order of the moving sample, \(h\), should be of a size which is just enough to control for the fixed effect, and not too large to avoid an excessive smoothing. For the empirical application I have chosen \(h=12\), because for shorter samples many of the characteristics are usually constant and, therefore, their coefficients could not be separately identified from the fixed effect. Recall that we are using here monthly data and that even if there is a significant rate of product change and improvement, many models do not experience changes in their technical specifications from one month to the next.Footnote 14 By fixing \(h=12\) we are assuming a one-year moving sample,Footnote 15 which may be considered as ad hoc, but which is also consistent with all studies using yearly data to estimate hedonic prices for characteristics.Footnote 16 In summary, for the sake of comparability, I estimate specifications (6)–(10) using the moving sample of order \(12\).Footnote 17

4 Quality-adjusted price indexes

I use a unit sales weighted Laspeyres geometric index,Footnote 18 as proposed in Feenstra (1995), to measure the price increase of automobiles. The quality-adjusted index for model \(i\) is, therefore, defined as:

$$\begin{aligned} \ln I_{it}=\ln p_{it}-\ln p_{it-1}-\left( X_{it}-X_{it-1}\right) \beta _{t} \end{aligned}$$
(11)

where \(\left( X_{it}-X_{it-1}\right) \beta _{t}\) represents the quality correction and can be interpreted as a characteristics quantity index, \(Q_{it}\) (Triplett 2004, p. 60). Notice that all time-invariant characteristics, the brand and segment dummies or the car-model effects, would cancel out if they were introduced in \(Q_{it}\). Thus, the differences between the alternative quality-adjusted price indexes proposed come from the different estimates of \(\beta \) in each specification. In order to mitigate the potential impact of coefficient instability in the single period hedonic regressions over the indexes I smooth the \(\beta \) using a weighted moving average of coefficients or order \(k=3\) as proposed by Matas and Raymond (2009).Footnote 19 Therefore, the smoothed coefficient for characteristic \(j\) in period \(t\) is:

$$\begin{aligned} \hat{\beta }_{jt}^{smooth}=\lambda _{t-1}\hat{\beta }_{jt-1}+\lambda _{t}\hat{ \beta }_{jt}+\lambda _{t+1}\hat{\beta }_{jt+1} \end{aligned}$$

where \(\hat{\beta }_{jt}\) denotes the estimate of \(\beta _{jt}\) and \(\lambda _{t}=\frac{\left[ var\left( \hat{\beta }_{jt}\right) \right] ^{-1}}{ \sum \nolimits _{s=t-1}^{t+1}\left[ var\left( \hat{\beta }_{js}\right) \right] ^{-1}}\).

The aggregated index is the weighted sum:

$$\begin{aligned} I_{t}=\exp \left( \sum \limits _{i=1}^{n_{t}}s_{it-1}\times \ln I_{it}\right) \end{aligned}$$
(12)

where \(n_{t}\) is the number of models in period \(t\) and \(s_{it}\) is the market share of model \(i\) in period \(t\), such that \(\sum \nolimits _{i=1}^{n_{t}}s_{it}=1\). This is the kind of approach followed by the Spanish National Statistics Office (see for example Izquierdo et al. 2001). The indexes (12) are then chained to construct a whole series.

I use as reference the non-quality-adjusted index, comparing it to adjusted indexes from each of the four specifications (610).

5 Results

The hedonic specifications 610 were estimated for each period.Footnote 20 Tables 3, 4, 5, 6, 7 show abridged versions of the hedonic parameter estimates, reporting the results for the month of May of each year.Footnote 21\(^{,}\)Footnote 22 The estimated coefficients have in general the right sign, although a few variables show reversed signs for certain periods, a fact frequently reported in the literature (Pakes 2003).

Table 3 Hedonic coefficients for characteristics: Brand dummies specification (abridged table)
Table 4 Hedonic coefficients for characteristics: Brand and segment dummies specification (abridged table)
Table 5 Hedonic coefficients for characteristics: Fixed effects specification (abridged table)
Table 6 Hedonic coefficients for characteristics: Model dummies specification (abridged table)
Table 7 Hedonic coefficients for characteristics: Basic specification (no dummies nor fixed effects) (abridged table)

In the Fixed Effects (FE) specification, in certain months, some characteristics did not show enough variation to have all coefficients identified (see footnote a in Table 5). Apart from these very specific cases, the variation in observed characteristics allows the estimation of all hedonic specifications.

The size of coefficients (in absolute value) also varies over time, but changes tend to be smooth from period to period. This may be a consequence of having monthly regressions, because even if we can expect parameter instability, as mentioned above, it should be smaller between two consecutive months rather than between two consecutive years. It is not unreasonable to expect that consumer valuation of characteristics do not change much within a year, and that would be consistent with obtaining very similar results for price indexes using different orders of smoothing of the hedonic parameters.

The quality-adjusted hedonic price indexes resulting from the estimation of specifications 610 are presented in Fig. 3, the non-adjusted index is also included for comparison.Footnote 23 It can be seen that correcting for quality using dummies for brand or segment has a strong impact, in line with the results generally reported in the literature and similar to those reported by Izquierdo et al. (2001) or Matas and Raymond (2009) for the Spanish market. One important finding is, however, that not taking into account car-model effects largely overestimates the dummy-corrected indexes that stem from expressions 7 and 8. Indeed, the index coming from specification 10 shows that quality-adjusted prices remained, basically, constant for the whole period. By the end of 2000 the difference between the car-model-effects- and the brand-dummies-corrected indexes is about \(18~\%\) and the difference with respect to the non-adjusted index is around \(35~\%\). This represents an average year-on-year difference of \(1.8\) and \(3.5~\%\), respectively. Taking into account that the purchase price of new cars had at that time a weight of \(5.27~\%\) in the Spanish Consumer Price IndexFootnote 24 (IPC) we can say that omitting observed and unobserved quality improvements in automobiles would have led to an overestimation of the IPC of almost \(0.2~\%\) per year during the 1990s.

5.1 The interpretation of age effects

The variable age is meant to capture any time-varying product effects that may not be captured by other observed characteristics. Interestingly, the omission of age effects has a striking influence over the hedonic index when adjusting for car-model effects, leading to opposite conclusions. Omitting age effects seems to be an important source of bias, which would be more relevant in the case of the car-model effects specification (Fixed Effects or Model Dummies) than in the Brand-dummies (BD) and Brand-Segment-dummies (BSD) specifications. As discussed in subsection 3.1 we can expect that not introducing age in specifications 68 would just affect the size of the bias of hedonic coefficients. The bias would already exist due to the omission of part or all of the time invariant car-model effects. However, in specification 10 there should not be any significant bias due to the omission of unobserved effects, hence, by omitting age we could be inducing some bias that would have not existed previously. The results without age effects show a similar pattern than those reported for the UK by Requena-Silvente and Walker (2006). They find that car-model effects push up the hedonic index with respect to the brand dummies case and they attribute the result to the fact that the value of unobserved components is decreasing over time. In our case it seems that age effects are in general positive,Footnote 25 suggesting that the quality of cars perceived by the consumers is improving even if this is not reflected in the characteristics observed by the econometrician. Therefore, the omission of age effects would tend to underestimate quality improvement, thus, overestimating quality-adjusted prices. The estimated age coefficients in the fixed effects specification are much largerFootnote 26 than the corresponding coefficients in the dummy specifications, hence, their omission also induces a much larger upward bias in the quality-adjusted index in the fixed effects case than in the brand and brand-segment dummies cases. In particular, the hedonic coefficients of age under fixed effects take their largest values for period January 1993 to April 1996. Their omission determines the strikingly different pattern of the hedonic price index between Figs. 3 and 4 in that period.

Fig. 4
figure 4

Hedonic price indexes (not controlling for age effects)

One could think that age could play the role of a time trend in a hedonic regression, because it increases one step per period, like time trends. In which case, it could be argued that the coefficient of age could be capturing part of the quality-adjusted price increase, because the setup would be very similar to the Time Dummy Variables (TDV) one. The TDV approach finds the amount of quality-adjusted price changes from the estimated coefficients of time dummies. There is, however, an important difference between the TDV approach and the interpretation of age: while time dummies are common for all products in any given period of time, the variable age shows variation across products conditional on the particular time period. It is this specific variation at the product level, that is absent from the simple time dummies, which allows the identification of the impact of unobserved factors associated with the presence and survival of the product in the market (better design, reliability, reputation, etc.). For a given period of time, one can always take a specific product as reference and the variable age will indicate the premium that the market pays to products that are older/younger. Therefore, although both age and a time trend evolve over time in the same manner, the differences in levels across products of the former determines that the pieces of information identified by each of them is different.

The fact that age and a time trend evolve in a similar way could also raise concerns in the context of the estimation of a fixed effects model using a within transformation. The within transformation removes time invariant fixed effects by subtracting from each variable its time mean. Thus, it somehow assigns more importance to the pieces of information in data that are time varying. Given that time variation pattern of age and time trend are similar (one-by-one linear increase) one could think that the within transformation would lessen the identification power of age against a simple time trend described above. However, this is actually not the case. The within estimator is equal to the least-squares dummy variable estimator,Footnote 27 which controls for fixed effects by using dummy variables. In this type of specification the variable age is not transformed and as a consequence it retains the identification power of reputation and unobserved quality effects given by its variation at the product level. Therefore, the coefficient for age from a within approach is giving the “price” of those unobserved characteristics proxied by age.

As a robustness check, all figures show the quality adjusted price index using the Model Dummies (MD) and the FE approach. Clearly, both series have the same pattern.Footnote 28 This is, however, expected from the previous discussion. It is more interesting to note the difference between the BD and BSD specifications with the MD approach. All three estimation strategies are identical, the unique difference being the type of dummy introduced. If age was capturing the raw increase of prices (as do the time dummies in the time-dummy-variable approach to hedonic regression) that should also be happening in BD and BSD. As the unique difference between these specifications is the dummy being used, the differences in the indexes are also provoked by those dummies, and nothing else. Hence, it is not a matter of the use of age with a specific type of estimator.

Fig. 5
figure 5

Hedonic price indexes by origin of car model: Spain

Fig. 6
figure 6

Hedonic price indexes by origin of car model: Europe

Fig. 7
figure 7

Hedonic price indexes by origin of car model: Asia

Fig. 8
figure 8

Hedonic price indexes by origin of car model: America

Fig. 9
figure 9

Hedonic price indexes for the Small segment

Fig. 10
figure 10

Hedonic price indexes for the Compact segment

Fig. 11
figure 11

Hedonic price indexes for the High-Intermediate segment

The next question is then, why are the adjusted indexes so different? This is important, particularly if we take into account that when we omit age, the MD adjusted series jumps up dramatically, as Fig. 4 shows. Again, both the Fixed effect and MD follow the same pattern as expected. However, now MD is much higher (or “less adjusted”) in contrast to the case of having age effects. BD and BSD indexes do not show that much change. As argued at the beginning of this subsection, the explanation must be related to the omitted regressor problem that induces endogeneity in the MD specification when age is left out. In this case time-varying unobserved factors are not considered, resulting in biased estimates of the prices for characteristics. The combination of age and model effects through dummies in the MD specification is able to control for the unobserved effects that could bias estimation. The fact that BS and BSD show smaller variations in the adjusted index specifications only tells us that the amount of additional endogeneity induced by the absence of age controls does not add much to the endogeneity already present in these specifications. This endogeneity would be due to incorrectly considering that individual unobserved product effects can be captured and controlled for by using just brand dummies.

There is one additional remark concerning product definition in the data set and the size and interpretation of age effects. As explained in Sect. 2.1, products are defined according to the commercial denomination given by the manufacturer and, hence, they are the reflection of firms’ marketing criteria. However, as has been demonstrated, product specifications can (dramatically) change over time. These continuous changes may end up yielding a different product, while the commercial name remains unchanged. On the contrary, firms can modify commercial names with no substantial modification of the products. What impact could these somehow “artificial” renaming strategies subjacent in the data have on the estimated age effects? One has to keep in mind that model entry and exit are decisions made (optimally) by the firm. Thus, if the name remains despite the amount of changes made to the configuration of the product, it may be precisely because the name has some value in itself. This value is captured through the fixed effect/model dummies and the age, because that value may have accumulated over time. If producers found it optimal to change the commercial name every time they make a major change in characteristics, they would be discarding any “accumulated” reputation (obsolescence). In this case, the size of age effects captured in the hedonic regressions and the corresponding correction of the price index would be smaller. However, we could also expect a smaller price increase from the initial set of characteristics to the second set, because the lost reputation could no longer be used by the firm to increase its markups. Therefore, a smaller age correction would be matched with an overall smaller price increase, such that one should not expect a change in the quality-adjusted price index. If age effects were capturing obsolescence instead of good reputation, the same argument would apply with just the opposite interpretation of age effects. The crucial factor is that the omission of age effects would bias the hedonic estimations and the quality-adjusted price indexes independently of how frequently the name is changing, because in any case there would be some “accumulation” of reputation. It is only the amount of the estimated age effect that would vary.

These results may suggest that previous studies that have relied on the time dummy variable approach have overestimated quality-adjusted price increases. As these studies generally skip the characteristic of product age it may be the case that (for example) the reputational effects of a product (which evolve over time and can be a characteristic valued by consumers) would be subsumed in the raw time dummies that are meant to capture pure price increases. The time dummy variable approach has other known problems, as discussed in Sect. 3.3, the problem uncovered here would just add to that list.

5.2 Hedonic price indexes by segment and origin

I have used the coefficient estimates of the hedonic specifications to construct price indexes by car model geographic origin and segment.Footnote 29\(^{,}\)Footnote 30 The objective is to determine whether the behavior of prices is determined by the evolution of any particular type of model. Figures 5, 6, 7, 8 report the results for car models from Spain, Europe (excluding Spain), Asia and America. Asian cars are clearly leaders in (car-model and age effects) quality-adjusted price reductions with a \(20~\%\) drop in the whole sample period. American cars also reduced their quality-adjusted price while the Spanish and European cars increased between \(3\) and \(8~\%\). However, for making comparisons we must take into account that the average price at the initial period was different across geographic origins. Asian cars where on average most expensive at the beginning of the sample (Fig. 1) and one could expect that as more models were introduced in lower-class segments the average price would fall even if the quality improvement was not too high. In order to make a balanced comparison, one can look, for example, at the beginning of 1995, where the average real price of Asian and European cars was roughly the same, around \(12,000\) euros. The real price of Asian cars remained roughly constant afterwards, while European cars increased by around \(10~\%\). In the same period, car-model and age effects quality-adjusted prices for Asian cars fell by a \(7~\%\) while European prices fell by less than \(4~\%\). Therefore, even starting from similar price ranges, Asian cars seem to have improved in quality faster than the rest. The increasing competitive pressure from these models may have served to discipline the European makers towards better products in order to retain their market shares. Figure 6 shows that between mid-1995 and the end of 1997 the fixed effects quality-adjusted price of European cars fell sharply after three years of increase (the brand and segment dummies approaches also show that pattern, although it spreads over two more years, until the end of 1999), probably as a response to the wave of entry of models from Asian manufacturers that took place after 1995. Spanish and American models kept their car-model and age effects quality-adjusted prices more or less constant (Figs. 5 and 8). These indexes show, however, some jumps that can be attributed to the smaller number of models in these categories, specially for Spanish cars, that make the indexes more sensitive to product entry or exit.

The indexes for each segment shown in Figs. 9, 10, 11 have been computed following the same approach than for the by-origin figures. To make the exposition more concise, I present only the results for the most popular segments, the Small, Compact and High-Intermediate, that account for around \(70~\%\) of the market. However, their examination does not suggest that there is a specific segment leading price reductions. Quality-adjusted prices tend to increase in the first half of the sample and then start to decrease around 1995. The intensity of price increases and decreases is obviously different across segments, but the general impression is that both trends are roughly uniformly distributed across segments.

6 Concluding remarks

The use of hedonic regressions to compute quality-adjusted price indexes is nowadays a common practice in the economics literature. Nevertheless, the impact of unobserved product characteristics in that analysis has been largely neglected. Building on Requena-Silvente and Walker (2006), I propose an approach that serves to compute such indexes, taking into account the existence of both time-variant and time-invariant unobserved effects and I apply it to the Spanish car market. The former are controlled by introducing model age as an additional characteristic, the latter are accounted for by using fixed effects panel data estimators. Although these two approaches have been separately proposed in the literature they have not been used simultaneously. The results show that the estimated indexes for the Spanish automobile market can be even lower than what had already been reported, suggesting that the consumer price index could have been overestimated by \(0.2~\%\) per year, which constitutes an important bias considering its importance in measuring productivity changes or in determining wage increases. Controlling for age effects turns out to be crucial for obtaining these results, implying that time-variant unobserved effects are as important as the time-invariant ones, at least in what concerns the Spanish car market.

Extending the analysis to the segment and geographical origin level leads to two additional conclusions: First, that the patterns of price evolution are similar across segments, such that price increases or decreases seem to be distributed uniformly across segments, i.e., price increases or decreases do not seem to be concentrated on a particular segment. Second, the price decrease of both the non-adjusted and the adjusted price indexes in the second half of the decade seems to be motivated by the strength of competition after an intense wave of new entries of Asian models. Asian cars show by far the largest quality-adjusted price reductions, particularly, due to their strong quality improvements. It seems plausible to interpret the price reductions of Spanish and European incumbents as a response to that pressure, although a more formal analysis would be needed to clearly determine to what extent this is the case.