1 Introduction

The art market has attracted significant interest for a variety of reasons, reflecting its unique blend of cultural, economic, and social factors (Schönfeld and Reinstaller 2007; David et al 2013). Artwork transactions take place through various channels, such as art galleries, online art marketplaces, and auctions (Ashenfelter and Graddy 2003). Auctions play a significant role in the art market and are important for analyzing how artwork prices are determined: the bidding process is open, and participants can observe competing bids, which helps establish a fair market value for artworks (Candela and Scorcu 1997; Higgs and Forster 2014). Therefore, many researchers aim to understand the factors influencing art prices, predict price trends, and gain insights into art market dynamics (Peluso et al 2017; Kraeussl and Logher 2010). Research in these areas contributes to a deeper understanding of the art market, provides insights and recommendations for collectors, investors, and policymakers, and informs strategies for auction houses, artists, and dealers (Peluso et al 2017). It is an evolving field that continues to adapt both to changes in the art market and to advances in data analysis techniques.

The repeat sales method and the hedonic pricing model are representative techniques for artwork pricing. The repeat sales method (Galbraith and Hodgson 2018) analyzes the resale prices of artworks that have been sold multiple times over a period. However, this method requires sufficient repeat-sales data, which may be limited for certain artworks, especially those that are traded infrequently. In this paper, we build on the framework of the hedonic model (Ekeland et al 2004), a specific application of linear regression that estimates the price or value of a product or asset from its attributes. In the context of artwork pricing, a hedonic model uses artwork attributes (e.g., artist reputation, size, medium) as independent variables and the artwork's log price as the dependent variable (Shin et al 2014; Sproule and Valsan 2006; Garay et al 2022). Artwork prices are often driven by a combination of two key factors: artist attributes and artwork attributes (Forster and Higgs 2018; Beckert and Rössel 2013; Rengers and Velthuis 2002). These factors work in tandem to shape the perceived value and market price of an artwork. Artist attributes include gallery reputation, the number of solo exhibitions, media awareness, awards, and whether the artist is alive. Examples of artwork attributes that contribute significantly to artwork prices are size, medium, and materials.

The popularity effect, also known as the rich-get-richer phenomenon, refers to the tendency of popular trends, ideas, and products to become even more popular, while less-known options struggle to gain attention and traction. The popularity effect can be observed in many human-made systems, including social networks (Jung and Phoa 2021; Jung 2023; Song and Park 2022), economics (Durham et al 1998), and online platforms (Bratanova et al 2016; Jung et al 2020). It also occurs in art markets, since artworks by well-known and highly regarded artists, or artworks that have gained significant attention and acclaim, tend to command higher prices and greater demand. This effect is therefore a crucial aspect of how artworks are priced and sold. Numerous studies have examined how an artist's reputation influences the pricing of and demand for their artworks (Ursprung and Wiermann 2011; Beckert and Rössel 2013). However, these studies simply include reputation attributes measured at a specific time point, such as survey results and the number of solo exhibitions, as independent variables of the model. Such static variables cannot capture how an artist's reputation changes over time, so the reputation at the time of sale may be measured inaccurately. Moreover, reputation variables may be highly correlated with one another, which hurts the explainability of the model results.

Researchers have attempted to explain the innate ability of artists that cannot be explained by controlled variables. Bocart et al (2018) conducted Bayesian modeling to explain gender differences in ability. The Gaussian assumption of ability is similar to the approach of this paper, but they assumed static ability and focused on gender differences. Marchenko and Sonnabend (2022) also investigated the gender effect on German visual artists through the notion of artistic ability. Hosoya (2020) conducted an extensive study on artistic ability from the perspective of a random effect model, assuming invariant artistic ability. Kackovic et al (2022) investigated students’ artistic ability via visual arts programs. The term ability is sometimes used interchangeably with terms like talent, and it is often measured using artwork prices (Angelini et al 2023; Etro and Stepanova 2018).

An artist’s ability can change over time due to various factors, including experience, practice, exposure to new techniques and styles, and personal growth (Markusen et al 2006). Pablo Picasso, one of the most influential artists of the 20th century, showed remarkable changes in ability over his lifetime owing to his adaptability and willingness to explore new artistic horizons. His early realistic style gave way to the emotional depth of his Blue Period, showcasing his evolving ability to convey complex emotions. Influenced by African and Iberian art, he embraced abstraction during his Cubist phase, deconstructing forms and perspectives. Therefore, a model should allow an artist’s ability to change over time, especially when dealing with data spanning a long period, in order to properly capture the dynamic nature of artwork markets. Changes in artist ability have been studied from various perspectives, including career effects and aging effects (Galenson 2000; Galenson and Weinberg 2000; Galenson 2004).

In this study, we develop a novel pricing model to analyze the magnitude of the popularity effect in the artwork market and to estimate artists’ abilities. We also propose a time-dependent artist popularity measure that can be inferred from artwork sales history data without using the artist’s reputation attributes. The proposed measure therefore reflects an artist’s popularity at the time of each sale. We employ this popularity measure to determine whether the popularity effect occurs in the system. In addition, the model infers each artist’s intrinsic ability net of the popularity effect, which can support artist recommendation systems and artwork price prediction. As in the hedonic pricing model, the influence of each variable on price can also be analyzed. We present an inference algorithm that estimates the model parameters and the artists’ ability values, and a simulation study demonstrates the validity of the algorithm. The real data analysis detects the rich-get-richer nature of the artwork pricing mechanism and yields inferences about the artists’ abilities.

The remainder of the paper is structured as follows. Section 2 proposes the popularity measure and our pricing model. Section 3 presents the inference algorithm for the proposed model together with a discussion of its validity. Section 4 analyzes the auction artwork sales data with interpretations. Section 5 concludes the paper with final remarks.

2 The proposed popularity measure and the model

2.1 Background

We consider an artwork price dataset with variables including artwork name, artist, date of work, date of sale, and price. The dataset may also contain other variables that can affect artwork prices, such as artwork size and medium. Let \(i=1,2, \cdots , I\) index the artists in the dataset. We use the notation \(t=1,2,\ldots ,T\) for the date of sale and \(t'=1,2,\ldots , T'\) for the date of work. Suppose that \(K_{i,t}\) artworks by artist i are sold at the date of sale t, with prices \(p_{i,t,k}\), features \(x_{i,t,k,m}, m=1,2,\cdots ,M\), and date of work \(t'_{i,t,k}\) for each artwork \(k=1,2,\ldots ,K_{i,t}\), where M is the number of independent variables. Similarly, we may index the artworks by the date of work, which is another temporal variable. Suppose that artist i created \(K'_{i,t'}\) artworks at the date of work \(t'\), with prices \(p_{i,t',k'}\), features \(x_{i,t',k',m}, m=1,2,\cdots ,M\), and date of sale \(t_{i,t',k'}\) for each artwork \(k'=1,2,\ldots , K'_{i,t'}\). We apply the logarithmic transformation \(y_{i,t,k} = \ln (p_{i,t,k})\), \(y_{i,t',k'} = \ln (p_{i,t',k'})\) to the prices, which is common practice in pricing models since prices usually follow a positive, right-skewed distribution.

Let \(u_{i,t}\) be the popularity of artist i at the date of sale t. We index popularity by the date of sale because buyers consider how popular the artist is at the time of purchasing an artwork. The popularity measure should be chosen carefully in light of the characteristics of the data, as it strongly influences the evaluation of the rich-get-richer phenomenon. In this paper, we measure the popularity of artist i at time t by the sum of the log prices of the artist’s artworks sold before time t, since popular artists tend to have many works sold at high prices. We additionally divide this sum by 10,000 to bring its scale in line with the other variables:

$$\begin{aligned} u_{i,t} = \frac{1}{10000} \sum _{s=1}^{t-1} \sum _{k=1}^{K_{i,s}} y_{i,s,k}, \quad i=1,2,\cdots ,I,~ t=1,2,\cdots ,T. \end{aligned}$$
(1)
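Because Eq. (1) sums only over sales strictly before t, the measure is an exclusive cumulative sum over sale periods. A minimal sketch for a single artist, assuming NumPy and 1-based integer sale periods (the function name `popularity` and its arguments are illustrative, not taken from the paper):

```python
import numpy as np


def popularity(sale_t, log_price, n_periods, scale=10_000.0):
    """Popularity u_{i,t} of one artist following Eq. (1).

    sale_t: 1-based date-of-sale index t of each sold artwork.
    log_price: log price y of each sold artwork.
    Returns an array u of length n_periods with
    u[t-1] = (sum of y over sales at s = 1, ..., t-1) / scale.
    """
    sale_t = np.asarray(sale_t)
    log_price = np.asarray(log_price, dtype=float)
    # Total log price sold in each period s = 1..T.
    per_period = np.bincount(sale_t - 1, weights=log_price, minlength=n_periods)
    # Exclusive cumulative sum: period t only sees sales from periods before t.
    cum_before = np.concatenate(([0.0], np.cumsum(per_period)[:-1]))
    return cum_before / scale
```

For example, `popularity([1, 1, 2, 4], [10.0, 12.0, 11.0, 9.0], n_periods=4)` gives zero popularity at t=1, and the sale at t=4 does not yet count toward the popularity at t=4.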

Let \(a_{i,t'}\) be the ability of artist i at the date of work \(t'\). Note that we use the date of work as the time index for the artist’s ability, since the quality of an artwork is determined by the artist’s ability at the date of work rather than at the date of sale. We assume that artist i creates the first and the last artwork at \(t'=t'_{0,i}\) and \(t'=t'_{1,i}\), respectively. The initial ability of artist i is assumed to follow a normal distribution with mean 0 and variance \(\tau ^2\). In most cases, the variance of the artists’ initial ability values is unknown. Therefore, we treat \(\tau ^2\) as a random variable with an exponential prior distribution with mean \(\lambda\), so that a value of \(\tau ^2\) appropriate for the data is inferred through its posterior distribution. Then, the distribution of the initial ability is given by

$$\begin{aligned} a_{i,t'_{0,i}} \sim N(0, \tau ^2), \quad i=1,2,\cdots ,I. \end{aligned}$$
(2)

We allow the ability values to change over time in a random walk for an artist i, \(i=1,2,\cdots ,I\), as follows:

$$\begin{aligned} a_{i,t'} \sim N(a_{i,t'-1}, \sigma _a^2), \quad t'=t'_{0,i}+1, \cdots , t'_{1,i}. \end{aligned}$$
(3)

The variance \(\sigma _a^2\) of the random walk process determines the magnitude of the random jumps between adjacent time steps. In this paper, we set the two hyperparameters to \(\lambda = 4.0\) and \(\sigma _a^2=0.1^2\), so that the prior expected variance of the initial abilities across artists is \(E[\tau ^2] = 4.0\) (a standard deviation of about 2.0) and the ability changes with a standard deviation of about 0.1 per time step.
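To see what these hyperparameters imply, the priors in Eqs. (2) and (3) can be simulated directly. The sketch below is illustrative only (the function `sample_ability_paths` is hypothetical): it draws one shared \(\tau ^2\) from the exponential prior with mean \(\lambda\), then each artist's initial ability and Gaussian random-walk increments.

```python
import numpy as np


def sample_ability_paths(n_work_periods, lam=4.0, sigma_a=0.1, rng=None):
    """Draw ability paths from the priors in Eqs. (2)-(3).

    n_work_periods[i] = t'_{1,i} - t'_{0,i} + 1, the number of date-of-work
    steps of artist i. Returns (tau2, list of per-artist ability arrays).
    """
    rng = np.random.default_rng(rng)
    # tau^2 ~ Exponential with mean lam, shared by all artists.
    tau2 = rng.exponential(scale=lam)
    paths = []
    for n in n_work_periods:
        # Initial ability a_{i,t'_0} ~ N(0, tau^2), Eq. (2).
        a0 = rng.normal(0.0, np.sqrt(tau2))
        # Random-walk increments with standard deviation sigma_a, Eq. (3).
        steps = rng.normal(0.0, sigma_a, size=n - 1)
        paths.append(a0 + np.concatenate(([0.0], np.cumsum(steps))))
    return tau2, paths
```

With the stated values, artists differ mainly through their starting levels, while each path drifts by only about 0.1 per step.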

2.2 Formulation of model

We propose a pricing model considering the popularity and ability of artists, given by

$$\begin{aligned} y_{i,t,k} = \sum _{m=1}^M \beta _m x_{i,t,k,m} + \gamma u_{i,t} + a_{i, t'_{i,t,k}} + \epsilon _{i,t,k} \end{aligned}$$
(4)

for \(i=1,2,\ldots ,I,~ t=1,2,\ldots ,T,~ k=1,2,\ldots ,K_{i,t}\), where the error term follows a normal distribution with zero mean and variance \(\sigma _\epsilon ^2\), that is, \(\epsilon _{i,t,k} \sim N(0, \sigma _\epsilon ^2)\). Re-indexing the observations by the date of work yields the following equivalent representation of Eq. (4):

$$\begin{aligned} y_{i,t',k'} = \sum _{m=1}^M \beta _m x_{i,t',k',m} + \gamma u_{i,t_{i,t',k'}} + a_{i, t'} + \epsilon _{i,t',k'} \end{aligned}$$
(5)

for \(i=1,2,\ldots ,I,~ t'=t'_{0,i}, t'_{0,i}+1, \cdots , t'_{1,i},~ k'=1,2,\ldots ,K'_{i,t'}\), where \(\epsilon _{i,t',k'} \sim N(0, \sigma _\epsilon ^2)\).

The regression coefficient \(\beta _m\) represents the relationship between the m-th independent variable and the log price. The model can include an intercept term by setting one of the independent variables to 1 for all observations. The parameter \(\gamma\) describes the magnitude of the rich-get-richer phenomenon in the artwork pricing system. A positive \(\gamma\) indicates a positive popularity effect, in which popular artists achieve higher artwork prices over time; a stronger rich-get-richer effect results in a larger \(\gamma\). If the popularity effect works in the opposite direction, i.e., popular artists are disadvantaged in artwork prices, a negative \(\gamma\) would be observed, although such a negative popularity effect is expected to be rare in real-world artwork pricing dynamics. However, care should be taken when comparing \(\gamma\) across datasets, as its estimate depends on the scale of the popularity \(u_{i,t}\) and on the characteristics of each dataset.

Similar to the random effect model (Kim et al 2021; Francke and Van de Minne 2021), the effect of artist i’s ability on the price is captured directly by the ability term \(a_{i, t'_{i,t,k}}\) in the proposed pricing model. Note that the ability is indexed by the date of work \(t'_{i,t,k}\) of the artwork sold at the date of sale t, whereas the popularity \(u_{i,t}\) is indexed by the date of sale.
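The two time indices in Eq. (4) are easy to mix up in code: popularity is looked up at the date of sale, ability at the date of work. The following hypothetical simulator assumes a dense popularity table `u[i, t-1]` and per-artist ability arrays offset by each artist's entry period `t0[i]`; none of these names come from the paper.

```python
import numpy as np


def simulate_log_prices(X, beta, gamma, u, paths, t0, artist, sale_t, work_t,
                        sigma_eps, rng=None):
    """Draw log prices from Eq. (4).

    X: (N, M) covariates; beta: (M,) coefficients.
    u: (I, T) popularity table with u[i, t-1] = u_{i,t}.
    paths[i][j]: ability of artist i at date of work t0[i] + j.
    artist, sale_t, work_t: per-observation artist index, date of sale, date of work.
    """
    rng = np.random.default_rng(rng)
    artist = np.asarray(artist)
    # Popularity is indexed by the date of sale ...
    u_obs = u[artist, np.asarray(sale_t) - 1]
    # ... while ability is indexed by the date of work.
    a_obs = np.array([paths[i][w - t0[i]] for i, w in zip(artist, work_t)])
    mean = X @ beta + gamma * u_obs + a_obs
    return mean + rng.normal(0.0, sigma_eps, size=mean.shape)
```

Setting `sigma_eps=0` returns the noise-free mean, which is convenient for checking the index bookkeeping.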

Suppose the dataset has \(N = \sum _{i=1}^I \sum _{t=1}^T K_{i,t} = \sum _{i=1}^I \sum _{t'=t'_{0,i}}^{t'_{1,i}} K'_{i,t'}\) observations. For ease of notation, let \(x_{obs,m} = (x_{i,t,k,m})_{i=1,2,\cdots ,I, t=1,2,\cdots ,T, k=1,2,\ldots ,K_{i,t}}\), \(m=1,2,\cdots ,M\), \(u_{obs} = (u_{i,t})_{i=1,2,\cdots ,I, t=1,2,\cdots ,T, k=1,2,\ldots ,K_{i,t}}\), \(a_{obs} = (a_{i,t'_{i,t,k}})_{i=1,2,\cdots ,I, t=1,2,\cdots ,T, k=1,2,\ldots ,K_{i,t}}\), and \(y_{obs} = (y_{i,t,k})_{i=1,2,\cdots ,I, t=1,2,\cdots ,T, k=1,2,\ldots ,K_{i,t}}\) denote the observation vectors of each independent variable, popularity, ability, and the dependent variable, respectively. We denote by

$$\begin{aligned} X_{obs} = \left[ x_{obs,1}, \cdots , x_{obs,M}, u_{obs}\right] \end{aligned}$$

the \(N \times (M+1)\) matrix whose columns \(x_{obs,1}, \cdots , x_{obs,M}\) and \(u_{obs}\) are the observation vectors of the independent variables and popularity.

Table 1 summarizes the notations in this paper.

Table 1 Notations and their descriptions

3 Inference algorithm

3.1 Probability distributions and their properties

The artist ability values \(a=\{a_{i,t'} \mid i=1,2,\cdots ,I,~ t'=t'_{0,i}, t'_{0,i}+1, \cdots , t'_{1,i}\}\) and the initial ability variance \(\tau ^2\) are the two latent variables of the model. The model parameters are the regression coefficients \(\beta _m\), \(m=1,2,\cdots ,M\), for the independent variables, the popularity coefficient \(\gamma\), and the error variance \(\sigma _\epsilon ^2\). We denote the vector of regression coefficients by \(\delta = \left( \beta _1, \cdots , \beta _M, \gamma \right)\) and the vector of all model parameters by \(\theta = \left( \beta _1, \cdots , \beta _M, \gamma , \sigma _\epsilon ^2 \right) = (\delta , \sigma _\epsilon ^2)\).

We use the EM algorithm to update the latent variables and the model parameters alternately. The E-step updates the latent variables through Gibbs sampling, which requires their conditional distributions. The M-step updates the parameters and requires the complete-data likelihood function. We first present the basic probability distributions of the model needed to derive these quantities.

The prior distribution of \(\tau ^2\) is given by

$$\begin{aligned} p(\tau ^2) = \frac{1}{\lambda } e^{-\tau ^2/\lambda }, \quad \tau ^2 > 0. \end{aligned}$$
(6)

The prior distribution of initial ability values \(a_{i,t'_{0,i}}\), \(i=1,2,\cdots ,I\) given \(\tau ^2\), shown in Eq. (2), can be written as

$$\begin{aligned} p(a_{i,t'_{0,i}} | \tau ^2) = \frac{1}{\sqrt{2\pi }\tau } e^{-\frac{1}{2 \tau ^2} a_{i,t'_{0,i}}^2}, \quad -\infty< a_{i,t'_{0,i}} < \infty . \end{aligned}$$
(7)

The random walk process of the ability change in Eq. (3) can be described by the probability density function

$$\begin{aligned} p(a_{i,t'} | a_{i,t'-1}) = \frac{1}{\sqrt{2\pi }\sigma _a} e^{-\frac{1}{2 \sigma _a^2} \left( a_{i,t'} - a_{i,t'-1}\right) ^2}, \quad -\infty< a_{i,t'} < \infty \end{aligned}$$

for \(i=1,2,\cdots ,I\), \(t'=t'_{0,i}+1, \cdots , t'_{1,i}\). Finally, we have the probability density function of the response variable \(y_{i,t,k}\), given by

$$\begin{aligned} p(y_{i,t,k} | a, \theta ) = \frac{1}{\sqrt{2\pi }\sigma _\epsilon } e^{-\frac{1}{2 \sigma _\epsilon ^2} \left( y_{i,t,k} - \left( \sum _{m=1}^M \beta _m x_{i,t,k,m} + \gamma u_{i,t} + a_{i, t'_{i,t,k}}\right) \right) ^2}, \quad -\infty< y_{i,t,k} < \infty \end{aligned}$$

for \(i=1,2,\cdots ,I\), \(t=1,2,\cdots ,T\), \(k=1,2,\ldots ,K_{i,t}\) according to Eq. (4). The equivalent representation of the model in Eq. (5) yields the probability density function of \(y_{i,t',k'}\), given by

$$\begin{aligned} p(y_{i,t',k'} | a_{i,t'}, \theta ) = \frac{1}{\sqrt{2\pi }\sigma _\epsilon } e^{-\frac{1}{2 \sigma _\epsilon ^2} \left( y_{i,t',k'} - \left( \sum _{m=1}^M \beta _m x_{i,t',k',m} + \gamma u_{i,t_{i,t',k'}} + a_{i, t'}\right) \right) ^2}, \quad -\infty< y_{i,t',k'} < \infty \end{aligned}$$

for \(i=1,2,\cdots ,I\), \(t'=t'_{0,i}, t'_{0,i}+1, \cdots , t'_{1,i}\), \(k'=1,2,\ldots ,K'_{i,t'}\). Also, we denote

$$\begin{aligned} p(y_{i,t'} | a_{i,t'}, \theta ) = \prod _{k'=1}^{K'_{i,t'}} p(y_{i,t',k'} | a_{i,t'}, \theta ), \end{aligned}$$

where \(y_{i,t'} = \{y_{i,t',k'} \mid k'=1,2,\ldots ,K'_{i,t'} \}\).

We are now ready to derive the required conditional distributions of latent variables for Gibbs sampling. The conditional distributions for the artist’s abilities are summarized in Proposition 1.

Proposition 1

Let us denote by

$$\begin{aligned} b_{i,t',k'} = \sum _{m=1}^M \beta _m x_{i,t',k',m} + \gamma u_{i,t_{i,t',k'}} \end{aligned}$$

the linear combination term of the independent variables and popularity in Eq. (5), for \(i=1,2,\cdots ,I,~ t'=t'_{0,i}, t'_{0,i}+1, \cdots , t'_{1,i},~ k'=1,2,\ldots ,K'_{i,t'}\). Then we have the following conditional distributions for the abilities of artist i, \(i=1,2,\cdots ,I\).

  • The conditional distribution of the initial ability is given by

    $$\begin{aligned} p(a_{i,t'_{0,i}} | \tau ^2, a_{i,t'_{0,i}+1}, y_{i,t'_{0,i}}, \theta ) \sim N\left( \mu _{i,t'_{0,i}}, \sigma _{i,t'_{0,i}}^2\right) , \end{aligned}$$
    (8)

    where the mean and variance of the normal distribution are

    $$\begin{aligned} \sigma _{i,t'_{0,i}}^2 = \left( \frac{1}{\tau ^2} + \frac{1}{\sigma _a^2} + \frac{K'_{i,t'_{0,i}}}{\sigma _\epsilon ^2}\right) ^{-1}, \quad \mu _{i,t'_{0,i}} = \sigma _{i,t'_{0,i}}^2 \left( \frac{a_{i,t'_{0,i}+1}}{\sigma _a^2} + \frac{1}{\sigma _\epsilon ^2} \left( \sum _{k'=1}^{K'_{i,t'_{0,i}}}\left( y_{i,t'_{0,i},k'} - b_{i,t'_{0,i},k'}\right) \right) \right) . \end{aligned}$$
  • The conditional distributions of the intermediate ability values are given by

    $$\begin{aligned} p(a_{i,t'} | \tau ^2, a_{i,t'-1}, a_{i,t'+1}, y_{i,t'}, \theta ) \sim N\left( \mu _{i,t'}, \sigma _{i,t'}^2\right) , \end{aligned}$$
    (9)

    where the mean and variance of the normal distribution are

    $$\begin{aligned} \sigma _{i,t'}^2 = \left( \frac{2}{\sigma _a^2} + \frac{K'_{i,t'}}{\sigma _\epsilon ^2}\right) ^{-1}, \quad \mu _{i,t'} = \sigma _{i,t'}^2 \left( \frac{a_{i,t'-1} + a_{i,t'+1}}{\sigma _a^2} + \frac{1}{\sigma _\epsilon ^2} \left( \sum _{k'=1}^{K'_{i,t'}}\left( y_{i,t',k'} - b_{i,t',k'}\right) \right) \right) , \end{aligned}$$

    for \(t'=t'_{0,i}+1, t'_{0,i}+2, \cdots , t'_{1,i}-1\).

  • The conditional distribution of the last ability is given by

    $$\begin{aligned} p(a_{i,t'_{1,i}} | \tau ^2, a_{i,t'_{1,i}-1}, y_{i,t'_{1,i}}, \theta ) \sim N\left( \mu _{i,t'_{1,i}}, \sigma _{i,t'_{1,i}}^2\right) , \end{aligned}$$
    (10)

    where the mean and variance of the normal distribution are

    $$\begin{aligned} \sigma _{i,t'_{1,i}}^2 = \left( \frac{1}{\sigma _a^2} + \frac{K'_{i,t'_{1,i}}}{\sigma _\epsilon ^2}\right) ^{-1}, \quad \mu _{i,t'_{1,i}} = \sigma _{i,t'_{1,i}}^2 \left( \frac{a_{i,t'_{1,i}-1}}{\sigma _a^2} + \frac{1}{\sigma _\epsilon ^2} \left( \sum _{k'=1}^{K'_{i,t'_{1,i}}}\left( y_{i,t'_{1,i},k'} - b_{i,t'_{1,i},k'}\right) \right) \right) . \end{aligned}$$

Proof

Using Bayes’ rule, we obtain the relationship for the initial ability of an artist i given by

$$\begin{aligned} p(a_{i,t'_{0,i}} | \tau ^2, a_{i,t'_{0,i}+1}, y_{i,t'_{0,i}}, \theta ) \propto p(a_{i,t'_{0,i}} | \tau ^2) p(a_{i,t'_{0,i}+1} | a_{i,t'_{0,i}}) p(y_{i,t'_{0,i}} | a_{i,t'_{0,i}}, \theta ). \end{aligned}$$

For the intermediate ability values of artist i, we have

$$\begin{aligned} p(a_{i,t'} | \tau ^2, a_{i,t'-1}, a_{i,t'+1}, y_{i,t'}, \theta ) \propto p(a_{i,t'} | a_{i,t'-1}) p(a_{i,t'+1} | a_{i,t'}) p(y_{i,t'} | a_{i,t'}, \theta ) \end{aligned}$$

for \(t'=t'_{0,i}+1, t'_{0,i}+2, \cdots , t'_{1,i}-1\). Finally, for the last ability of artist i, we can write

$$\begin{aligned} p(a_{i,t'_{1,i}} | \tau ^2, a_{i,t'_{1,i}-1}, y_{i,t'_{1,i}}, \theta ) \propto p(a_{i,t'_{1,i}} | a_{i,t'_{1,i}-1}) p(y_{i,t'_{1,i}} | a_{i,t'_{1,i}}, \theta ). \end{aligned}$$

Completing the square in each product of normal densities yields the normal distributions in Eqs. (8), (9), and (10).

\(\square\)
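Proposition 1 translates into a single-site Gibbs sweep over one artist's ability path. The sketch below assumes NumPy, a path of at least two periods, and precomputed per-period residual sums \(\sum _{k'}(y_{i,t',k'} - b_{i,t',k'})\) and counts \(K'_{i,t'}\); all names are hypothetical. It accumulates the precision and the precision-weighted mean term by term, so the three cases of Eqs. (8) to (10) share one loop.

```python
import numpy as np


def gibbs_sweep_ability(a, resid_sum, counts, tau2, sigma_a2, sigma_eps2, rng=None):
    """One systematic-scan Gibbs sweep over a single artist's ability path.

    a: float array (a_{t'_0}, ..., a_{t'_1}) of length J >= 2, updated in place.
    resid_sum[j]: sum of (y - b) over the artworks created in period j.
    counts[j]: number of such artworks, K'_{i,t'}.
    """
    rng = np.random.default_rng(rng)
    J = len(a)
    for j in range(J):
        # Likelihood contribution of the K'_{i,t'} artworks in this period.
        prec = counts[j] / sigma_eps2
        num = resid_sum[j] / sigma_eps2
        if j == 0:
            # Prior N(0, tau^2) on the initial ability, Eq. (8).
            prec += 1.0 / tau2
        if j > 0:
            # Random-walk link to the previous period.
            prec += 1.0 / sigma_a2
            num += a[j - 1] / sigma_a2
        if j < J - 1:
            # Random-walk link to the next period.
            prec += 1.0 / sigma_a2
            num += a[j + 1] / sigma_a2
        var = 1.0 / prec
        a[j] = rng.normal(var * num, np.sqrt(var))
    return a
```

Accumulating precisions keeps the boundary cases explicit: the prior term \(1/\tau ^2\) appears only at the first period, and each existing neighbor contributes one \(1/\sigma _a^2\).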

Next, the conditional distribution of \(\tau ^2\) is given by

$$\begin{aligned} p(\tau ^2 | a_{i,t'_{0,i}}, i=1,\cdots ,I) \propto p(\tau ^2) \prod _{i=1}^I p(a_{i,t'_{0,i}} | \tau ^2). \end{aligned}$$
(11)

We use the transformation \(\eta = \ln (\tau ^2)\) to sample \(\tau ^2\) efficiently through adaptive rejection sampling (ARS), which requires the target distribution to be log-concave (Gilks and Wild 1992). Proposition 2 shows that this requirement holds.

Proposition 2

The probability density function \(p(\eta | a_{i,t'_{0,i}}, i=1,\cdots ,I)\) is log-concave in \(\eta\).

Proof

By using Eq. (11) and \(\left| \frac{d \tau ^2}{d \eta }\right| = e^{\eta }\), we obtain

$$\begin{aligned} p(\eta | a_{i,t'_{0,i}}, i=1,\cdots ,I) \propto p(\tau ^2 = e^\eta ) \prod _{i=1}^I p(a_{i,t'_{0,i}} | \tau ^2 = e^\eta ) \cdot e^{\eta }. \end{aligned}$$
(12)

By taking the logarithm and plugging in the probability density functions in Eqs. (6) and (7), we can write

$$\begin{aligned} \ln p(\eta | a_{i,t'_{0,i}}, i=1,\cdots ,I) = - \frac{1}{\lambda } e^{\eta } - \left( \frac{I}{2} - 1 \right) \eta - \frac{1}{2} e^{-\eta } \sum _{i=1}^I a_{i,t'_{0,i}}^2 + C, \end{aligned}$$

where C is a constant with respect to \(\eta\). Then the second-order derivative

$$\begin{aligned} \frac{\partial ^2}{\partial \eta ^2} \ln p(\eta | a_{i,t'_{0,i}}, i=1,\cdots ,I) = - \frac{1}{\lambda } e^{\eta } - \frac{1}{2} e^{-\eta } \sum _{i=1}^I a_{i,t'_{0,i}}^2 \end{aligned}$$

is negative for any \(\eta\), showing the logarithm of the target distribution is concave in \(\eta\).

\(\square\)

We can sample \(\eta\) via ARS and take the inverse transformation \(\tau ^2 = e^{\eta }\) to obtain samples of \(\tau ^2\) for Gibbs sampling.
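The log density in Proposition 2 is cheap to evaluate, so any exact univariate sampler for this target could serve in the E-step. The sketch below does not implement ARS. It swaps in a standard univariate slice sampler with stepping out and shrinkage, which also samples exactly from this density. The functions and the step width are illustrative assumptions.

```python
import numpy as np


def log_post_eta(eta, a0, lam=4.0):
    """Unnormalized log density of eta = ln(tau^2) given initial abilities a0 (Prop. 2)."""
    I, S = len(a0), float(np.sum(np.square(a0)))
    return -np.exp(eta) / lam - (I / 2 - 1) * eta - 0.5 * S * np.exp(-eta)


def slice_sample_eta(eta0, a0, lam=4.0, width=1.0, n=1000, rng=None):
    """Univariate slice sampler (stepping out + shrinkage), used here instead of ARS."""
    rng = np.random.default_rng(rng)
    eta, out = float(eta0), np.empty(n)
    for g in range(n):
        # Slice height: log f(eta) + log U with U ~ Uniform(0, 1).
        log_y = log_post_eta(eta, a0, lam) - rng.exponential()
        # Randomly placed initial interval of the given width around eta.
        left = eta - width * rng.uniform()
        right = left + width
        # Step out until both ends lie outside the slice.
        while log_post_eta(left, a0, lam) > log_y:
            left -= width
        while log_post_eta(right, a0, lam) > log_y:
            right += width
        # Shrink the interval toward eta until a proposal lands in the slice.
        while True:
            prop = rng.uniform(left, right)
            if log_post_eta(prop, a0, lam) > log_y:
                eta = prop
                break
            if prop < eta:
                left = prop
            else:
                right = prop
        out[g] = eta
    return out
```

In the E-step, one would draw \(\eta\) given the current initial abilities and set \(\tau ^2 = e^{\eta }\), as described above.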

Next, we turn to the M-step. The complete-data likelihood function, from which we construct the objective function, is given by

$$\begin{aligned} \begin{aligned} L(\theta | a, \tau ^2, y_{obs})&= p(y_{obs} | \theta , a, \tau ^2) \\&= \prod _{i=1}^I \prod _{t=1}^T \prod _{k=1}^{K_{i,t}} p(y_{i,t,k} | a, \theta ). \end{aligned} \end{aligned}$$

Then, the complete-data log-likelihood function of the model can be expressed by

$$\begin{aligned} \begin{aligned} l(\theta | a, \tau ^2, y_{obs})&= \ln L(\theta | a, \tau ^2, y_{obs}) \\&= \sum _{i=1}^I \sum _{t=1}^T \sum _{k=1}^{K_{i,t}} \ln p(y_{i,t,k} | a, \theta ). \end{aligned} \end{aligned}$$

Let \(Q(\theta | \theta _{prev})\) be the expectation of the log-likelihood function given by

$$\begin{aligned} Q(\theta | \theta _{prev}) = E \left[ l(\theta | a, \tau ^2, y_{obs}) | y_{obs}, \theta _{prev}\right] , \end{aligned}$$

where the expectation is taken over the latent variables a and \(\tau ^2\). The EM algorithm updates the parameter \(\theta\) from the previous value \(\theta _{prev}\) by maximizing \(Q(\theta | \theta _{prev})\) with respect to \(\theta\). In practice, computing \(Q(\theta | \theta _{prev})\) exactly is not feasible. Therefore, we approximate the objective function

$$\begin{aligned} \hat{Q}(\theta | \theta _{prev}) = l(\theta | a_{est}, \tau ^2_{est}, y_{obs}) \end{aligned}$$
(13)

by plugging in the estimates \(a_{est}\) and \(\tau ^2_{est}\) of the latent variables, which are obtained from the observations \(y_{obs}\) and the previous parameter estimate \(\theta _{prev}\). Since \(a_{est}\) is held fixed, maximizing \(\hat{Q}(\theta | \theta _{prev})\) with respect to \(\delta\) amounts to an ordinary least squares regression of \(y_{obs} - a_{est}\) on \(X_{obs}\), and \(\sigma _\epsilon ^2\) is then estimated from the residuals in the M-step.
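With the abilities fixed at their current estimates, Eq. (13) is a Gaussian linear-regression log-likelihood, so the M-step reduces to least squares on \(y_{obs} - a_{est}\). A minimal NumPy sketch (the function name is hypothetical); the variance uses the maximum-likelihood divisor N, which is an assumption since the text does not state the divisor.

```python
import numpy as np


def m_step(X_obs, y_obs, a_obs):
    """Maximize Eq. (13) for fixed abilities: OLS of (y - a) on X_obs.

    X_obs: (N, M+1) design matrix whose last column is popularity.
    Returns delta = (beta_1, ..., beta_M, gamma) and sigma_eps^2.
    """
    target = y_obs - a_obs
    delta, *_ = np.linalg.lstsq(X_obs, target, rcond=None)
    resid = target - X_obs @ delta
    # Maximum-likelihood error variance (divisor N).
    sigma_eps2 = resid @ resid / len(y_obs)
    return delta, sigma_eps2
```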

3.2 Algorithm and inference

We present the EM algorithm to obtain estimates of the model parameters \(\theta\) and the latent variables a and \(\tau ^2\). In each E-step, we draw G Gibbs samples of each latent variable and discard the first \(G_0\) samples as burn-in to remove potential bias due to the initial values. Furthermore, each E-step initializes the Gibbs sampler at the last Gibbs sample of the previous iteration. In this paper, we set \(G=110\) and \(G_0=10\).

Algorithm 1 EM algorithm

We set the ability estimate at the s-th iteration of the EM algorithm, \(\hat{a}^{(s)} = \left( \hat{a}_{i,t'}^{(s)}\right) _{i=1,2,\cdots ,I, t'=t'_{0,i}, t'_{0,i}+1, \cdots , t'_{1,i}}\), to the average of the corresponding Gibbs samples after burn-in:

$$\begin{aligned} \hat{a}_{i,t'}^{(s)} = \frac{1}{G-G_0} \sum _{g=G_0+1}^G a_{i,t',(g)}^{(s)}. \end{aligned}$$

In the M-step of Algorithm 1, we employ the current estimate \(\hat{a}_{obs}^{(s)}\) given by

$$\begin{aligned} \hat{a}_{obs}^{(s)} = \left( \hat{a}_{i, t'_{i,t,k}}^{(s)} \right) _{i=1,2,\cdots ,I, t=1,2,\cdots ,T, k=1,2,\cdots ,K_{i,t}} \end{aligned}$$

for \(a_{est}\) in Eq. (13). Note that \(\tau _{est}^2\) in Eq. (13) is not used directly in the ordinary least squares optimization.

We repeat the E-step and the M-step until \(\hat{\theta }^{(s)}\), \(s=1,2,\cdots\), converges. Algorithm 1 then provides the final estimates \(\hat{\theta } = \hat{\theta }^{(S)} = (\hat{\delta }^{(S)}, \hat{\sigma }_{\epsilon }^{2,(S)})\) of the model parameters. The standard errors of the estimate \(\hat{\delta }\) are the square roots of the diagonal elements of the \((M+1) \times (M+1)\) matrix \(\hat{\sigma }_{\epsilon }^{2} \left( X_{obs}^T X_{obs}\right) ^{-1}\).
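The standard-error formula above can be computed directly. A sketch assuming NumPy and a full-column-rank \(X_{obs}\) (`standard_errors` is a hypothetical name):

```python
import numpy as np


def standard_errors(X_obs, sigma_eps2):
    """Square roots of the diagonal of sigma_eps^2 (X_obs^T X_obs)^{-1}."""
    cov = sigma_eps2 * np.linalg.inv(X_obs.T @ X_obs)
    return np.sqrt(np.diag(cov))
```

These standard errors treat the ability estimates as fixed, consistent with the plug-in objective in Eq. (13).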

We also obtain Gibbs samples \(a_{(g)} = a_{(g)}^{(S-1)}\), \(\tau _{(g)}^2 = \tau _{(g)}^{2,(S-1)}\), \(g=1,2,\cdots ,G\) for the latent variables from Algorithm 1. We define the point estimates for latent variables as the average values of the samples:

$$\begin{aligned} \hat{a} = \frac{1}{G-G_0} \sum _{g=G_0+1}^G a_{(g)}, \quad \hat{\tau }^2 = \frac{1}{G-G_0} \sum _{g=G_0+1}^G \tau _{(g)}^2. \end{aligned}$$
(14)

Since an ability value is estimated at each time point, we can use its average over time as a single-value estimate for artist i:

$$\begin{aligned} \bar{\hat{a}}_i = \frac{1}{t'_{1,i}-t'_{0,i}+1} \sum _{t'=t'_{0,i}}^{t'_{1,i}} \hat{a}_{i,t'}. \end{aligned}$$
(15)
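Eqs. (14) and (15), together with the \(\pm 2\) standard deviation bands used for the ability plots below, are column-wise summaries of the post-burn-in Gibbs draws. A sketch for one artist, assuming the draws are stored as a \(G \times J\) array (a hypothetical layout):

```python
import numpy as np


def ability_estimates(samples, burn_in=10):
    """Posterior summaries of one artist's ability path.

    samples: array of shape (G, J) with Gibbs draws of (a_{t'_0}, ..., a_{t'_1}).
    Returns the per-period mean (Eq. 14), the per-period standard deviation
    of the draws, and the average over the J periods (Eq. 15).
    """
    kept = np.asarray(samples, dtype=float)[burn_in:]  # drop the first G_0 draws
    a_hat = kept.mean(axis=0)
    sd = kept.std(axis=0, ddof=1)
    return a_hat, sd, a_hat.mean()
```

A band of `a_hat - 2 * sd` to `a_hat + 2 * sd` then corresponds to the intervals described for Fig. 2.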

4 Applications to artwork dataset

4.1 Dataset

We use the Artnet auction data; Artnet is an online platform that operates a global marketplace and information resource for the art market. In this paper, we focus on contemporary art by selecting artists from the Top 500 artists by fine art and NFT auction turnover in the annual reports “The Art Market Trends” from 2018 to 2022 provided by Artprice (www.artprice.com). We further select artists born after 1920 and categorized as “Post-War Art” or “Contemporary Art.” Up to 500 artworks are randomly sampled for each selected artist so that less famous artists are adequately represented, preventing potential popularity bias. Next, we remove artworks without images or sale price records.

The resulting dataset contains 64,176 artworks by 260 artists sold at auctions. We consider data with dates of work from 1951 to 2020 and dates of sale from 1993 to 2022. As in the simulation study, we use time windows of 5 years and 1 year for the date of work and the date of sale, respectively, so that the date of work has time points \(t'=1, 2, \cdots , 14\), corresponding to 1951-1955, 1956-1960, \(\ldots\), 2016-2020, and the date of sale has time points \(t=1, 2, \cdots , 30\), corresponding to 1993, 1994, \(\ldots\), 2022. We choose artists with three or more artworks in each of five or more consecutive dates of work \(t'= t'_0, t'_0+1, \cdots , t'_1\) starting from the time of entry \(t'_0\), where \(t'_0\) is the first date of work at which three or more of the artist’s artworks appear and \(t'_1 - t'_0 \ge 5\). We remove artworks that were not created at \(t'= t'_0, t'_0+1, \cdots , t'_1\). Consequently, our final Artnet dataset contains 42,599 artworks by 152 artists. We consider the following independent variables:
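The time windows and the artist-selection rule described above can be sketched as follows. This is an illustrative reading of the text, not the authors' preprocessing code: the helper names are hypothetical, and `min_gap` encodes the inequality \(t'_1 - t'_0 \ge 5\) as written.

```python
import numpy as np


def work_period(year):
    """Date-of-work index t': 1951-1955 -> 1, ..., 2016-2020 -> 14."""
    return (np.asarray(year) - 1951) // 5 + 1


def sale_period(year):
    """Date-of-sale index t: 1993 -> 1, ..., 2022 -> 30."""
    return np.asarray(year) - 1992


def entry_and_exit(work_t, min_count=3, min_gap=5):
    """Return (t'_0, t'_1) for one artist, or None if the artist is dropped.

    t'_0 is the first period with >= min_count artworks; t'_1 is the last
    period of the uninterrupted run of such periods starting at t'_0.
    """
    counts = np.bincount(np.asarray(work_t), minlength=16)
    ok = counts >= min_count
    if not ok.any():
        return None
    t0 = int(np.argmax(ok))  # index of the first qualifying period
    t1 = t0
    while t1 + 1 < len(ok) and ok[t1 + 1]:
        t1 += 1
    return (t0, t1) if t1 - t0 >= min_gap else None
```

Artworks whose period falls outside the returned \([t'_0, t'_1]\) would then be removed, as described above.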

  • Log Area (\(m=1\)) is the logarithm of the physical surface area of the artwork.

  • Signed (\(m=2\)) indicates whether the artwork is signed by the artist.

  • Inscribed (\(m=3\)) indicates whether the artwork bears additional writing, markings, or inscriptions.

  • Medium1 is the material used to create the artwork, classified into seven categories: oil (\(m=4\)), acrylic (\(m=5\)), water (\(m=6\)), pastel (\(m=7\)), pencil (\(m=8\)), mixed media (\(m=9\)), and others (reference category).

  • Medium2 is the material on which the artwork is created, classified into six categories: canvas (\(m=10\)), cardboard (\(m=11\)), board (\(m=12\)), paper (\(m=13\)), leather (\(m=14\)), and others (reference category).

  • Year of sale: \(t=1\) (1993, \(m=15\)), 2 (1994, \(m=16\)), \(\cdots\), 30 (2022, \(m=44\))

The artist’s popularity values are calculated using Eq. (1). The dependent variable is the log price (dollar) of artworks.
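The covariates listed above form the first M = 44 columns of \(X_{obs}\), with popularity appended as the last column. A sketch assuming NumPy and hypothetical string labels for the medium categories; any label not listed falls into the all-zero reference category.

```python
import numpy as np

# Hypothetical labels; "others" is the reference category in both lists.
MEDIUM1 = ["oil", "acrylic", "water", "pastel", "pencil", "mixed media"]  # m = 4..9
MEDIUM2 = ["canvas", "cardboard", "board", "paper", "leather"]            # m = 10..14


def one_hot(values, levels):
    """Indicator columns for `levels`; any other value gives an all-zero row."""
    values = np.asarray(values)
    return np.column_stack([(values == lv).astype(float) for lv in levels])


def design_matrix(area, signed, inscribed, medium1, medium2, sale_t, u_obs, T=30):
    """Columns in the order m = 1..44, followed by the popularity column."""
    cols = [np.log(area),                        # m = 1: log area
            np.asarray(signed, dtype=float),     # m = 2
            np.asarray(inscribed, dtype=float),  # m = 3
            one_hot(medium1, MEDIUM1),
            one_hot(medium2, MEDIUM2),
            one_hot(sale_t, range(1, T + 1)),    # m = 15..44: year-of-sale dummies
            np.asarray(u_obs, dtype=float)]      # popularity, last column of X_obs
    return np.column_stack(cols)
```

No separate intercept column is included here: since every sale falls in exactly one year, the 30 year-of-sale dummies already span the constant.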

4.2 Result

Table 2 Estimated regression coefficients and corresponding standard errors for the Artnet dataset

Fig. 1 Trace plots of the regression coefficients \(\beta _1\), \(\beta _2\), \(\beta _4\), \(\beta _{30}\), and \(\gamma\), shown in order from top to bottom. The horizontal axis is the iteration index s of Algorithm 1

Table 2 shows the parameter estimates for the Artnet dataset. Figure 1 shows the trace plots of the regression coefficients \(\beta _1\), \(\beta _2\), \(\beta _4\), \(\beta _{30}\), and \(\gamma\), which indicate that the regression coefficients converge. Although not shown in Fig. 1, the other parameters also converge stably.

Most parameters are statistically significant. The positive estimates \(\hat{\beta }_1=0.65\), \(\hat{\beta }_2=0.04\), and \(\hat{\beta }_3=0.09\) indicate that larger, signed, and inscribed artworks tend to be more expensive. The estimates \(\hat{\beta }_{15}\) to \(\hat{\beta }_{44}\) for the date of sale imply that recently sold artworks tend to have higher prices. The order of Medium1 from most to least expensive is oil (\(\hat{\beta }_4=0.65\)), acrylic (\(\hat{\beta }_5=0.46\)), pastel (\(\hat{\beta }_7=0.32\)), water (\(\hat{\beta }_6=0.17\)), pencil (\(\hat{\beta }_8=0.06\)), mixed media (\(\hat{\beta }_9=0.04\)), and others (reference category). The order of Medium2 from most to least expensive is canvas (\(\hat{\beta }_{10}=0.73\)), board (\(\hat{\beta }_{12}=0.36\)), others (reference category), cardboard (\(\hat{\beta }_{11}=-0.15\)), leather (\(\hat{\beta }_{14}=-0.26\)), and paper (\(\hat{\beta }_{13}=-0.33\)). Most importantly, the statistically significant and positive popularity estimate \(\hat{\gamma }=1.38\) suggests that artworks by popular artists tend to command higher prices, that is, the artwork market exhibits a rich-get-richer effect.
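Because the response is the log price, a coefficient difference \(\Delta\) corresponds to a multiplicative price change of \(e^{\Delta }\), holding the other covariates fixed. The sketch below applies this standard log-linear reading to the point estimates quoted above and ignores estimation uncertainty. The dictionary labels are illustrative, and the popularity step of 0.01 corresponds to an extra 100 in summed log prices under the 1/10,000 scaling of Eq. (1).

```python
import math


def price_ratio(delta_log_price):
    """Relative price difference exp(delta) - 1 implied by a log-price difference."""
    return math.exp(delta_log_price) - 1.0


# Point estimates quoted in the text above.
beta_area, beta_signed, gamma = 0.65, 0.04, 1.38
beta_canvas, beta_paper = 0.73, -0.33

effects = {
    "signed vs. unsigned": price_ratio(beta_signed),
    "canvas vs. paper support": price_ratio(beta_canvas - beta_paper),
    # Doubling the area raises log area by ln 2.
    "doubling the surface area": price_ratio(beta_area * math.log(2.0)),
    "popularity u up by 0.01": price_ratio(gamma * 0.01),
}
for label, change in effects.items():
    print(f"{label}: {change:+.1%}")
```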

Fig. 2

Estimated ability values of five artists, A. R. Penck, Adrian Ghenie, Albert Oehlen, Alex Katz, and Alighiero Boetti, with confidence intervals of \(\pm 2\) standard deviations of the Gibbs samples \(a_{i,t',(g)}\), \(g=G_0+1, G_0+2, \cdots , G\), centered on the estimate \(\hat{a}_{i,t'}\). The horizontal axis is the time index \(t'\) of the date of work.

Through the proposed model, we can infer the artists’ ability values while using popularity and the independent variables as control variables. Artworks by famous artists tend to be expensive, and we try to determine how much of this is due to the artist’s reputation and how much to the artist’s ability. In other words, we estimate the artist’s ability with the effect of the artist’s popularity removed. We can also infer how the artist’s ability changes over time. Figure 2 shows the ability inference results for five artists in the dataset. Note that the average ability should be 0 according to Eqs. (2) and (3), so a positive estimate indicates above-average ability. Adrian Ghenie has positive ability estimates at all time points, indicating that his intrinsic ability is above average, and his ability gradually increases over time. In contrast, A. R. Penck’s ability varies little over time, and his ability estimates are negative for all \(t'=1,2,\cdots ,14\). Alex Katz’s estimates stay close to zero, indicating approximately average ability, with a gradually decreasing trend.
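The estimates and intervals in Fig. 2 can be sketched as follows. This assumes that \(\hat{a}_{i,t'}\) is the mean of the retained Gibbs draws \(a_{i,t',(g)}\), \(g=G_0+1,\cdots ,G\), and that the draws are stored in an array of shape (G, number of artists, number of time points). The array layout, the function name, and the toy data are hypothetical.

```python
import numpy as np


def ability_summary(samples: np.ndarray, burn_in: int):
    """Summarize Gibbs draws of artist ability (hypothetical helper).

    samples: shape (G, n_artists, T), holding a_{i,t',(g)} for g = 1..G.
    burn_in: G0, the number of initial draws to discard.
    Returns (estimate, lower, upper), each of shape (n_artists, T).
    """
    kept = samples[burn_in:]              # draws g = G0+1, ..., G
    estimate = kept.mean(axis=0)          # assumed form of a_hat_{i,t'}
    sd = kept.std(axis=0, ddof=1)         # spread of the retained draws
    return estimate, estimate - 2 * sd, estimate + 2 * sd


# Toy draws: 2 artists, 14 time points, centered at +0.5 and -0.5.
rng = np.random.default_rng(0)
true_ability = np.array([[0.5] * 14, [-0.5] * 14])
draws = true_ability + 0.1 * rng.standard_normal((3000, 2, 14))
est, lower, upper = ability_summary(draws, burn_in=1000)
```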

Table 3 Top five artists in the Artnet dataset based on the ability estimate at each time point and on the time-averaged estimate

Table 3 shows the top five artists ranked by the estimated ability at each time point and by the estimate averaged over time, computed by Eqs. (14) and (15), respectively. Jasper Johns has the greatest overall ability (3.74). We can also identify the most capable artists at specific time points; for example, Wayne Thiebaud has the highest estimated ability from 2001 to 2020. Note that the proposed model estimates the ability values after accounting for the rich-get-richer effect caused by the artists’ reputations.
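A sketch of the two rankings in Table 3, assuming the time-averaged estimate is the unweighted mean of \(\hat{a}_{i,t'}\) over the time points. Eqs. (14) and (15) are not reproduced in this section, so the exact weighting may differ. The function name and the toy artists are hypothetical.

```python
import numpy as np


def top_k(ability: np.ndarray, names: list[str], k: int = 5):
    """Rank artists by ability estimates (hypothetical helper).

    ability: shape (n_artists, T), estimates a_hat_{i,t'}.
    Returns (per_time, overall): the top-k names for each time index t',
    and the top-k (name, mean) pairs by the unweighted mean over time.
    """
    per_time = [[names[i] for i in np.argsort(-ability[:, t])[:k]]
                for t in range(ability.shape[1])]
    avg = ability.mean(axis=1)            # assumed form of the averaged estimate
    overall = [(names[i], float(avg[i])) for i in np.argsort(-avg)[:k]]
    return per_time, overall


# Toy example with three hypothetical artists and two time points.
toy = np.array([[1.0, 0.0], [0.5, 2.0], [-1.0, -0.5]])
per_time, overall = top_k(toy, ["A", "B", "C"], k=2)
```

Ranking each column separately lets a different artist lead at each time point, which is how a period-specific leader can differ from the artist with the highest overall average.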

5 Concluding remarks

In this paper, we propose an artwork pricing model that measures the size of the popularity effect and infers the intrinsic ability of artists in the presence of that effect. Like the hedonic model, the proposed model can incorporate independent variables that affect artwork prices. We present an estimation algorithm that infers the model parameters and artist ability values via the EM algorithm and Gibbs sampling. Monte Carlo simulations show that the proposed algorithm works well: the parameters converge, and the inferred artist abilities are suitable. The algorithm also performs reasonably in the real data analysis.

We present a popularity measure in Eq. (1) that can be computed from the dataset alone, without explicit measures of the artists’ reputation. This popularity measure can also be customized to the characteristics of the dataset. For example, if reputation measures such as the number of gallery openings and media awareness are available, the popularity measure can be modified to incorporate these explicit values. In addition, popularity is usually measured at the time the artwork is sold, but it may instead be measured at the date of work if appropriate. The popularity measure presented in this paper is discrete and can jump substantially over time; hence, a smoothed popularity measure could be a valuable alternative.
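As one illustration of a smoothed alternative, the sketch below applies simple exponential smoothing to a step-like popularity series. This is a swapped-in technique, not the measure of Eq. (1). The smoothing weight `alpha` and the toy series are hypothetical.

```python
def exponential_smooth(series: list[float], alpha: float = 0.3) -> list[float]:
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}.

    Hypothetical helper; s_1 is initialized to the first observation.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Toy step-like popularity path (hypothetical values, not Artnet data).
popularity = [0.0, 0.0, 1.0, 1.0, 1.0, 3.0, 3.0]
smoothed_popularity = exponential_smooth(popularity)
```

Because each smoothed value uses only current and past values, the smoothed popularity at the time of sale does not depend on later sales. A smaller `alpha` gives a smoother path at the cost of reacting more slowly to genuine changes in reputation.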

We inferred artist ability on the Artnet dataset while accounting for the popularity effect in the system. The artist’s ability is allowed to change over time, reflecting the artist’s experience, self-growth, and inspiration from technological advances, as well as declines in ability due to personal issues such as aging and health. The magnitude of change in the artist’s ability is set as a hyperparameter, which lets us control the expected amount of change. Other ability dynamics can also be considered by modifying the ability assumptions in Eqs. (2) and (3) to suit the characteristics of a dataset. For example, ability could be restricted to discrete levels. Furthermore, because the time index of the date of work is user-defined, the proposed model can accommodate arbitrary and even irregular time intervals. For example, it may be more reasonable to define the periods according to the artist’s creative phases, taking into account the characteristics of each period in art history.
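Defining the time index of the date of work over irregular intervals amounts to mapping each year of work to a period. The sketch below does this with hypothetical, unevenly spaced period boundaries. In practice these could follow an artist’s creative phases or art-historical periods; the boundary years and the function name are assumptions for illustration.

```python
import bisect
from collections.abc import Sequence

# Hypothetical start years of periods 2, 3, ...; spacing need not be even.
PERIOD_STARTS = (1960, 1975, 1990, 2000, 2010)


def work_time_index(year_of_work: int,
                    starts: Sequence[int] = PERIOD_STARTS) -> int:
    """Map a date of work to a 1-based time index t' (hypothetical helper).

    Works dated before starts[0] get t' = 1, works in [starts[0], starts[1])
    get t' = 2, and so on; `starts` must be sorted in increasing order.
    """
    return bisect.bisect_right(starts, year_of_work) + 1
```

Passing a different `starts` sequence for each artist would allow artist-specific periods, at the cost of making a given \(t'\) cover different calendar spans for different artists.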

The inferred ability values of artists can be applied to artist recommender systems, for example on artwork sales platforms, while accounting for the rich-get-richer effect of such systems. In future work, one may develop an artwork recommender system based on the idea of the proposed model to eliminate popularity bias. Another direction for future work is to extend the proposed model to capture the interaction between the popularity and the ability of artists. In the proposed model, the artist’s ability is a single latent variable that affects the prices of their artworks. However, a single latent variable may oversimplify the multifaceted nature of ability, which includes creativity, innovation, and technical skill; a model incorporating multidimensional latent variables is another direction for future research.