1 Introduction and literature review

Price is what a consumer must pay in order to receive a product. It is also known as a charge or fee when referring to services. Although price is often related to the maximization of a company’s profits, it can also be used to compete for or change market share. In any case, knowing the price of a product (or as close to this as possible) confers an advantageous position on companies that trade with it in one way or another.

Price prediction is a complicated task when it relates to products in markets where there are pressure groups with intricate interrelated interests. Indeed, price is the output of a complex process which combines both factors and conflicting interests amongst related agents. While factors influencing price are known (production costs, profit-margins, market demand, consumer purchasing power, competitors and regulation amongst others), the interests (interrelationships) between lobbies are not. And even if these interests were known, it would not be easy to quantify them to know exactly what weight (i.e., how much influence) each lobby has in this process.

This paper presents a mathematical/artificial intelligence (AI) model that calculates price outcomes in markets with lobbies by capturing prices using a time-varying Markov random field over an underlying graph which represents the market. Our main contribution is to use this method to solve the complex problem of what kinds of market prices emerge when there are multiple agents pushing for their prices. Our contribution is not only to solve the problem but also to do it in a clear and direct way through a set of straightforward formulas, based on the price levels expected by lobbies which, in turn, can be calculated by using the probability that each lobby gives to market prices. More specifically, if the minimum price is understood as being the price level at least reached, specific formulas for computing the likelihood of both the aggregate and the minimum market price of a product are provided.

An additional contribution is to give an expression to compute the weight of each lobby. A quantitative study of this type on the importance of lobbies in the process of pricing outcomes sheds light on a problem for which there were only qualitative references up until now. Actually, to the best of our knowledge, this is the first quantitative study of the lobby’s importance in processes of pricing outcomes. This would complete the information necessary to develop adequate pricing strategies that allow price negotiation processes to be conducted from a vantage point. Such help in decision-making would allow practitioners to anticipate and seize market opportunities.

In the final section, this paper also explores the connections between the key problem (what kind of price outcome appears in a market with multiple agents that push for its price) and thermoeconomics, an emerging discipline (involving concepts such as ensemble average of machine learning, see [10]) which provides a novel approach to the problem being studied.

The structural model is based on Markov random fields (MRFs) considered in their dynamic version (Time-Dynamic Markov Random Fields, TD-MRFs) to adjust to reality. Our approach has several advantages over previous ones: a model of the interrelationships and objectives of the agents (such as pricing factors, stakeholders’ interests and interactions amongst agents) is provided, allowing the complex process to be visualized using a suitable dynamic graphical model (DGM). Moreover, a joint probability distribution (associated with a TD-MRF) is given that gathers the perspectives of all the related agents and translates them into expressions with simple and clear formulations. In this way, the use of TD-MRFs clearly shows how the variables involved interact by displaying the relationships amongst them. This is in contrast with other approaches that are classified as opaque because their internal structures remain hidden [18], [24]. As a result, all the processes can be fine-tuned by varying the variables of the model using suitable sensitivity tests to better suit the changing dynamics of the sector. Thanks to the notable computational properties of DGMs, our approach can also be easily translated into code, thereby providing extra support. Furthermore, the use of TD-MRFs for building the structural model has two additional advantages. First, TD-MRFs require significantly less information to produce an output (only the information from cliques is required) compared with existing approaches such as Bayesian or neural networks, which need large amounts of data (one of their major disadvantages). This also means a higher computing speed and lower storage capacity requirements. Second, the use of TD-MRFs means that there is no need to use a given distribution function, as required when using neural networks and related statistical methods, which strongly rely on a given distribution.

Generally speaking, interest groups which compete for influence are called lobbies whether they are fully organized or not. They are pressure groups which try to exercise their influence on policymaking processes, government decisions or the economic course of action, particularly on pricing processes. We illustrate our analysis of price outcomes under the influence of lobbying with the olive oil sector, where lobby pressure, along with a lack of transparency in certain links of the value chain, makes price prediction difficult. It is important to note that, in this sector, determining the weight of each lobby (i.e., how much influence they come to exert) is challenging since each group exerts pressure in a different way. Moreover, some of the lobbies have extra support that adds to their specific weight. We are referring to agricultural producers, at the lowest link in the chain, who apply pressure with public displays of discontent, in contrast with the more private actions of other parts of the olive oil value chain. Furthermore, the demonstrations held by agricultural producers enjoy public support. The consequence of this extra weight is enormous influence, as evidenced by the ability of the producers to impact on the process of reforming the agricultural policy of the European Union (EU) (agriculture is the EU’s largest policy area in budgetary terms, see [2] and [23]). Therefore, our approach clarifies this panorama by providing explicit computation of the weights of the lobbies in price outcomes. Importantly, both the model designed and its findings can be applied, with minor adjustments, to other lobbying contexts. As mentioned, our analysis of price outcomes in the presence of lobbying uses Markov random fields, which are stochastic models that generalize the well-known Markov chains.
While both stochastic models and Markov decision processes have been previously used for studies on price outcomes, see [4] for instance, to the best of our knowledge, our approach is the first study of price outcomes under the influence of lobbying that explicitly aggregates the interests of pressure groups in a single final output.

In the literature review, we shall address two points: price prediction and those papers which specifically deal with the olive oil sector. There are several studies which address price outcomes. Instead of trying to mention them all, we shall make a brief summary regarding the different techniques used. Let us start with probabilistic forecasting: autoregression (AR), moving average (MA), autoregressive moving average (ARMA), autoregressive integrated moving average (ARIMA) and seasonal autoregressive integrated moving-average (SARIMA), which are all known as time series analysis. While time series techniques can help to clean data by filtering out the noise, one of their major drawbacks in price prediction is that they are strongly dependent on past history: they are unable to forecast anything that has not previously happened (see [25], which contains a systematic literature review on financial time series forecasting, covering the period 2005-2019). For these reasons, these are valuable tools when combined with other methodologies, see the paper [20] which examines performance of traditional time series models, ROBUST models and ARIMA models together with generalized regression neural networks (GRNNs) and multi-layer feedforward networks (MLFNs) in forecasting prices.

According to [21], probabilistic forecasting has been one of the areas that has made significant advances in the last decade against all odds. Our proposal can also be considered to be one of the probabilistic forecasting methodologies inasmuch as it employs joint probability distributions. It has a high degree of novelty as it relies on MRFs for prediction purposes, see [11], [12], while MRFs are known in AI as instruments that have traditionally been used in image processing or pattern recognition and, more recently, in activity detection (see [8] for an application of MRFs to the detection of spammers).

Artificial neural networks (ANNs) are widely used in price prediction, see [6] and [17] to mention more recent papers. The latter includes an ANN model which tests the model effectiveness for the prediction of perishable vegetable price series. Support Vector Machines (SVMs) are supervised learning techniques used in solving nonlinear regression estimation problems. They have also been successfully used to predict price, see [28] for instance, where the authors combine an SVM with the firefly algorithm (FA) to enhance the global convergence speed. Another example is the more recent paper [13], where time series forecasting of agricultural product prices is used together with a Wavelet-Support Vector Machine (W-SVM) model in forecasting the palm oil price. In this type of model, the wavelet transform is used to decompose data series into two parts which are then used as the input for the SVM model to forecast the palm oil price, thereby enhancing the forecasting performance (compared with the traditional ANN). A mixture of ANNs or SVMs with other methodologies is also used, as in [26], which combines an ensemble empirical mode decomposition (EEMD) with a wavelet neural network with random time effective (WNNRT) to design a hybrid neural network prediction model for energy prices. A further example is the paper [27], where a novel random deep bidirectional gated recurrent unit neural network is constructed in order to obtain accurate forecasts of crude oil prices, which also integrates historical data into the training process of the model.

While effective in predicting prices, these techniques have some limitations regarding issues related to price outcomes due to the complex dimensionality of this problem, not only because of the large number of factors that affect the final output but also, above all, because of the intricate relationships between them. Another limitation of these techniques is that they are data-intensive, requiring huge amounts of data. Nonetheless, in the opinion of some authors, [18] or [24], their main drawback is that they implicitly involve a high level of opacity in their reasoning (they are called black boxes). This makes them difficult to fine-tune, hindering the incorporation of the characteristics of each scenario into the models.

In contrast to qualitative analyses, there are very few papers that study the olive oil sector from a quantitative perspective. Amongst the most recent ones is the paper [16] on adaptive learning forecasting, where forecast revisions are made from previous forecasting errors. The paper also applies this to agricultural prices. In [22] the interrelationship between olive oil price dynamics and pollen emission as a price determinant is studied by using a pollen monitoring methodology to predict olive yields. Moreover, this paper explores the difficulties of defining marketing strategies for oil prices due to the complex nature of the olive oil market. Furthermore, the paper [5] explores the price determinants of extra-virgin olive oil using a mathematical model based on three price levels (production, intermediation and sales) and states the relationships amongst price factors such as the purchase cost of olives and the production cost of olive oil. Finally, the paper [1] considers olive oil price formation econometrically, aiming to identify traders’ behaviour in the olive oil market. In this paper, for prediction purposes, the artificial neural network (ANN) approach is used.

The remainder of the paper is organized as follows. Section 2 contains the fundamentals on TD-MRFs. In Section 3 the structural model of price outcomes in the presence of lobbying is designed. Section 4 is devoted to developing the procedure for price prediction. In Section 5, an analysis of energy functions is performed. The application of the model to the olive oil market is fully detailed in Section 6, including both a model validation with real data and a complementary study from the perspective of thermodynamics. Finally, conclusions are stated in Section 7.

2 Dynamic graphical models. Time-Dynamic Markov random fields

Any group of random variables \(X=\{X_{s_{i}}| s_{i}\in S, i\in \mathbb {N}\}\) can be viewed as a spatial stochastic process inasmuch as they may take concrete values \(X_{s_{i}}=x_{s_{i}}\) for each site \(s_{i}\) of a given space \(S\). The terms node, site and vertex are used interchangeably throughout the literature. These processes are also known as graphical models and their main feature is that they express the probabilistic conditional dependence between random variables. Two notions deserve attention here: neighbourhood and clique. On one hand, two vertices \(s_{i}\) and \(s_{j}\) are said to be adjacent, denoted by \(s_{i}\sim s_{j}\), if there is an edge connecting them. Defining the edges is equivalent to defining the neighbourhood \(N(s_{i})\) of each node \(s_{i}\), in the following sense: for two nodes \(s_{i},s_{j}\) there is an edge connecting them, \(s_{i} \sim s_{j}\), if and only if \(s_{j}\in N(s_{i})\). On the other hand, cliques are maximally connected subgraphs of the underlying graph.

Graphical models are commonly categorized into Bayesian networks and Markov random fields (MRFs), where MRFs are particular graphical models that use an undirected graph to represent a distribution, see Kindermann (1980) [15]. These have mostly been used in image-processing scenarios (denoising and vision), where only a static model is needed. The local Markov property which characterizes MRFs is defined as follows: if \(\mathbb {P}[X]= \mathbb {P}[\{X_{s_{i}}| s_{i}\in S, i\in \mathbb {N}\}]= \{\mathbb {P}[X_{s_{i}}=x_{s_{i}}] | s_{i}\in S, i\in \mathbb {N}\}\) denotes the joint distribution of random variables \(X=\{X_{s_{i}}| s_{i}\in S, i\in \mathbb {N}\}\), then

$$ \mathbb{P}[X_{s_{i}}=x_{s_{i}} | X_{S-\{s_{i}\}}=x ] = \mathbb{P}[X_{s_{i}}=x_{s_{i}} | X_{N(s_{i})}=x],$$

where \(\mathbb {P}\) denotes probability in order to avoid confusion with the \(P\) reserved for the variable “price”. The Hammersley-Clifford theorem establishes that an MRF of non-negative random variables (“positivity condition”) has an associated joint distribution function which can be described in terms of some functions \(f\) on the cliques \(C\in \mathcal {C}\subset S\) (i.e., in terms of some functions of the given random variable \(X\) taking values only on the cliques). These functions \(f\) are called energy functions, or clique potentials when considered as product factors. Thus, the joint distribution function is of the form \( \mathbb {P}[X] = \frac {1}{Z} \exp \left [- {\sum }_{C \in \mathcal {C}} f(X_{C}) \right ]\), where \(Z\) is a normalizing constant which ensures that the distribution sums to 1. This is known as a Gibbs distribution.
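As an illustration of the Gibbs form and of the local Markov property, the following minimal sketch builds a toy chain \(X_{1} - X_{2} - X_{3}\) of binary variables with cliques \(\{X_{1},X_{2}\}\) and \(\{X_{2},X_{3}\}\). The pairwise energy `pair_energy` and its `coupling` parameter are hypothetical choices made only for this example; they are not taken from the model developed in this paper.

```python
import itertools
import math

# Hypothetical pairwise energy: zero when neighbouring values agree,
# `coupling` otherwise (an assumption made for illustration only).
def pair_energy(a, b, coupling=1.0):
    return 0.0 if a == b else coupling

# All configurations (x1, x2, x3) of the chain X1 - X2 - X3.
states = list(itertools.product([0, 1], repeat=3))

def total_energy(x):
    # Sum of the energies of the two cliques {X1, X2} and {X2, X3}.
    return pair_energy(x[0], x[1]) + pair_energy(x[1], x[2])

# Normalizing constant Z and the resulting Gibbs distribution.
Z = sum(math.exp(-total_energy(x)) for x in states)
gibbs = {x: math.exp(-total_energy(x)) / Z for x in states}

def conditional_x1(x1, x2, x3=None):
    """P[X1 = x1 | X2 = x2], or P[X1 = x1 | X2 = x2, X3 = x3] if x3 is given."""
    def matches(x):
        return x[1] == x2 and (x3 is None or x[2] == x3)
    num = sum(p for x, p in gibbs.items() if x[0] == x1 and matches(x))
    den = sum(p for x, p in gibbs.items() if matches(x))
    return num / den

total_probability = sum(gibbs.values())
```

In this toy chain the only neighbour of \(X_{1}\) is \(X_{2}\), so conditioning additionally on \(X_{3}\) should leave the conditional distribution of \(X_{1}\) unchanged, which is what `conditional_x1` allows one to check numerically.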

Time-dynamic (or time-varying) Markov Random Fields (TD-MRFs) are the plain extension of MRFs to dynamic scenarios (see [19]). These are a special sort of dynamic graphical model (DGM) which explicitly models the correlations in space and in time as dependencies amongst the random variables, such that the corresponding joint distribution function can be written by means of functions of the corresponding time-varying random variable over the cliques \(C\in \mathcal {C}\):

$$ \mathbb{P}[X] = \frac{1}{Z} \exp \left[ - \sum\limits_{C \in \mathcal{C}} f({X^{t}_{C}}) \right], \tag{1} $$

where \(t\) represents the time step.

As mentioned before, TD-MRFs have been chosen as the theoretical structure for our model in order to both reflect the dynamic nature of the reality of price outcomes and to reduce the amount of information needed.

3 The structural model of price outcomes using TD-MRFs (the TD-MRF model)

This section is devoted to developing the fundamentals of price outcomes in the presence of lobbying. Hereafter “price” will be denoted by \(P\) while \(\mathbb {P}\) will stand for “probability”. There are many agents who are pushing to obtain their price: agents will be denoted by \(a_{i}\) while \(P_{a_{i}}\) is the price they are pushing for (which we will refer to as the target price). Let \(A\) be the set \(A=\{a_{i}, i\in \mathbb {N} \}\) which gathers all the agents pushing for a certain price, no matter whether they agree on the price they want to reach or not. Agents are interconnected by common relationships (commercial, agreement on the target price, etc.). All this information can be visualized jointly with a graph \((A,E)\) whose nodes are the agents \(a_{i}\) and whose edges represent such links amongst agents.

Price is a variable that can take non-negative values up to an upper bound \(\max P\): i.e., the range of \(P\) is \([0,\max P]\). Since \(P_{a_{i}}\) is the target price that each agent \(a_{i}\) pushes for, each \(P_{a_{i}}\) can be viewed as a random variable as it can take different numerical values depending on each agent \(a_{i}\in A\). Thus, the previous graph \((A,E)\) shall be the underlying graph of a graphical model whose nodes are the random variables \(P=\{P_{a_{i}}, a_{i}\in A\}\), whose values vary depending on each agent \(a_{i}\in A\). Once the edges between the nodes have been specified (which we will do shortly), we shall have a joint graphical representation (and therefore a graphical model) of statistical relationships between non-negative random variables \(P=\{P_{a_{i}}, a_{i}\in A\}\).

Importantly, price outcomes are the result of a time-varying process. For this reason, we make the transition from a static scenario (graphical models) to dynamic graphical models (DGMs) by incorporating the time step as superscript \(t\). Hence, nodes in a DGM are the random variables \(P^{t}=\{P_{a_{i}}^{t}, a_{i}\in A\}\) whose values vary depending on both each agent \(a_{i}\in A\) and time \(t\). Additionally, in order to capture the huge variety of factors that influence price outcomes, we shall describe the random variable “\(a_{i}\)’s target price”, \(P_{a_{i}}^{t}\), as an array (or column) of features \(p_{a_{ki}}^{t}, k=1, \ldots , n\) that gathers all the factors associated with price: \(P_{a_{i}}^{t} \simeq (p_{a_{1i}}^{t}, p_{a_{2i}}^{t},\ldots , p_{a_{ni}}^{t})^{\top }\) (here, superscript \(\top \) means transpose). For the sake of simplicity we will identify each agent \(a_{i}\) with their target price \(P_{a_{i}}^{t}\), i.e., \(a_{i}\simeq P_{a_{i}}^{t} \simeq (p_{a_{1i}}^{t}, p_{a_{2i}}^{t},\ldots , p_{a_{ni}}^{t})\).

Once the nodes have been fully described, the edges amongst the nodes must be defined. To this end, it is important to note that there is an equivalence between the edges and the neighbourhoods of a graphical model, i.e., there is an edge connecting two nodes \(P_{a_{i}}^{t}, P_{a_{j}}^{t}\), denoted by \(P_{a_{i}}^{t} \sim P_{a_{j}}^{t}\), if and only if \(P_{a_{i}}^{t}\in N(P_{a_{j}}^{t})\). Thus, we define the neighbourhood of a node as follows:

$$ N(P_{a_{i}}^{t}) = \{ P_{a_{j}}^{t} \text{ such that the random variables } P_{a_{i}}^{t}, P_{a_{j}}^{t} \text{ are equivalent} \}, $$

in the usual sense: random variables \(P_{a_{i}}^{t}, P_{a_{j}}^{t}\) are equivalent if and only if their distribution functions are the same: \(\mathbb {P}[P_{a_{i}}^{t}\leq p]= \mathbb {P}[P_{a_{j}}^{t}\leq p]\) for every numerical price \(p\). From this definition it should be noted that, in particular, agents in the same neighbourhood have the same marginal distribution. Thus, the neighbourhood of a node may be easily identified as follows:

Lemma 1

All agents in the same neighbourhood have an equal conditional distribution.

Proof

In order to prove this result, let \(P_{a_{i}}^{t}, P_{a_{j}}^{t}\) be two nodes in the same neighbourhood. The definition taken implies that their marginal distributions are equal, \(\mathbb {P}[P_{a_{i}}^{t}]= \mathbb {P}[P_{a_{j}}^{t}]\). The result follows from applying Bayes’s theorem: \(\mathbb {P}[P_{a_{i}}^{t}|P_{a_{j}}^{t}]= \frac {\mathbb {P}[P_{a_{j}}^{t}|P_{a_{i}}^{t}] \cdot \mathbb {P}[P_{a_{i}}^{t}]}{\mathbb {P}[P_{a_{j}}^{t}]}= \mathbb {P}[P_{a_{j}}^{t}|P_{a_{i}}^{t}]\). □

Proposition 1

The neighbourhood of a node consists of all the agents who have the same pricing interests. That is, the neighbourhoods are the lobbies.

Proof

On one hand, from the definition of the neighbourhood of a node \(a_{i}\), agents \(a_{j}\) who belong to this set have the same marginal distribution function, that is, their corresponding achievement prices shall take the same values. On the other hand, the previous lemma states that their corresponding conditional distributions are equal, which means that their mutual interests in their target price are the same. This implies that agents in a neighbourhood shall jointly push for the same price. This means that the neighbourhoods are the lobbies. □
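The identification of lobbies with neighbourhoods can be sketched in code as a grouping of agents by equal distribution. The example below is only a sketch under stated assumptions: the agent names, the three-point price grid and the probability values are hypothetical, and equality of distributions is tested up to a rounding tolerance (`decimals`), which is a practical choice and not part of the definition above.

```python
from collections import defaultdict

# Hypothetical agents, each described by a probability mass function
# over a shared three-point price grid (illustrative values only).
agent_pmf = {
    "a1": (0.2, 0.5, 0.3),
    "a2": (0.2, 0.5, 0.3),
    "a3": (0.6, 0.3, 0.1),
}

def group_into_lobbies(pmfs, decimals=9):
    """Group agents whose distributions coincide (up to rounding)."""
    lobbies = defaultdict(list)
    for agent, pmf in pmfs.items():
        key = tuple(round(q, decimals) for q in pmf)
        lobbies[key].append(agent)
    return sorted(lobbies.values())

lobbies = group_into_lobbies(agent_pmf)
```

Here `a1` and `a2` share the same distribution and therefore fall into one lobby, while `a3` forms a lobby on its own.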

Once the nodes and edges have been defined (and considering the former identification \(a_{i}\simeq P_{a_{i}}^{t}\)), the previous structures enable us to switch from the graph \((A,E)\) to the dynamic graphical model \((P^{t},N^{t})\). In other words, price outcomes are the result of a process like the one described in Fig. 1 that uses a DGM. In this way, this complex process of relationships between agents is displayed as a solid graphical representation which clearly shows both the target price of each agent and the affinity-in-price (or lack of it) between them.

Fig. 1 Price outcomes as a TD-DGM

It is important to remember that, for any random variable, the transition from the random variable to a particular value is known as the realization of the random variable. This procedure can be carried out using several approaches, known as rating/scoring processes. As mentioned before, price outcomes are the result of a dynamical process, which is time-varying and depends on agents \(a_{i}\), and which has been fully covered by its representation as the former DGM \((P^{t},N^{t})\). From this general representation, two kinds of realizations may be carried out. First, if price outcomes for a concrete time instant \(t_{0}\) are required, the corresponding DGM, formed by

  • nodes \(P(t_{0})=\{P_{a_{i}}(t_{0}), a_{i}\in A\}\) and

  • edges \(P_{a_{i}}(t_{0}) \sim P_{a_{j}}(t_{0})\) if agents \(a_{i},a_{j}\) belong to the same lobby,

should display the process as if time were stopped at \(t_{0}\). A second realization of price outcomes may be performed in \((P^{t},N^{t})\) when the random variables involved take the same value simultaneously. Once a random variable decomposed into a collection of features \(P_{a_{i}}^{t} \simeq (p_{a_{1i}}^{t}, p_{a_{2i}}^{t},\ldots , p_{a_{ni}}^{t})\) has taken a concrete numerical value, it is renamed as a feature vector, i.e., a vector formed by \(n\) numerical scores which represents an object according to the definition commonly used in the AI literature.

Let us now pay attention to the former DGM. Before addressing the next theorem, there are some points that need to be clarified. In Gibbs distribution contexts, the statement “energy functions \(f\) are functions of the random variable taking values only on the cliques” can have more than one interpretation: either the functions \(f\) are equal for all cliques but have different domains (input values of the price) depending on each clique, or the functions \(f\) differ from clique to clique while sharing a common domain. Since both interpretations are equivalent from the perspective of mathematical modelling, we adopt the second one; thus \(f_{C_{i}}\) will stand for the energy function corresponding to the \(i\)th clique. The following theorem is the most important result in the paper, as it proves that price outcomes are a TD-MRF with an explicit joint probability distribution expressed in terms of lobbies. It should be noted that the positivity condition on the set of variables “price” is always satisfied.

Theorem 1 (Price outcomes are a TD-MRF)

Price outcomes viewed as a DGM are a TD-MRF. Its corresponding joint distribution \(\mathbb {P}[P^{t}]\) is equal to

$$ \mathbb{P}[P^{t}]=\frac{1}{Z} e^{-{\sum}_{i\in \mathbb{N}} f_{L_{i}}(P^{t})}= \frac{1}{Z} \prod\limits_{i\in \mathbb{N}} e^{-f_{L_{i}}(P^{t})}, \tag{2} $$

where \(L_{i}\) stands for the \(i\)th lobby.

Proof

To prove the theorem is equivalent to proving that the local dynamic Markov property holds. To do so, consider that the target price of a given agent \(a_{i}\) takes a concrete numerical value \(P^{t}_{a_{i}}= p_{a_{i}}\) while the target price of the rest of the agents \((A -\{a_{i}\})\) takes a value \(p\). Since the set of agents that are different from \(a_{i}\) may be expressed as \(A -\{a_{i}\} = (N(a_{i}) -\{a_{i}\})\cup \text {Not}N(a_{i})\), where \(\text {Not}N(a_{i})\) represents the set of agents who do not belong to \(N(a_{i})\), by the definition of the neighbourhood \(N(a_{i})\) it follows that

$$ \begin{array}{@{}rcl@{}} \mathbb{P}[P^{t}_{a_{i}}= p_{a_{i}} | P^{t}_{A-\{a_{i}\}}=p] & = & \frac{\mathbb{P}\left[ (P^{t}_{a_{i}}= p_{a_{i}}) \cap (P^{t}_{A-\{a_{i}\}}=p) \right]} {\mathbb{P}[P^{t}_{A-\{a_{i}\}}=p]}\\ & & \\ & = & \frac{\mathbb{P}\left[ (P^{t}_{a_{i}}= p_{a_{i}}) \cap (P^{t}_{N(a_{i})} =p) \right]} {\mathbb{P}[P^{t}_{N(a_{i})} =p]}\\ & & \\ & = & \mathbb{P}[P^{t}_{a_{i}}=p_{a_{i}} | P^{t}_{N(a_{i})}=p]. \end{array} $$

Hence, price outcomes are a TD-MRF at any time instant. In consequence, the price outcomes \(P^{t}=\{P_{a_{i}}^{t}, i\in \mathbb {N}\}\) obey a Gibbs distribution (see (1)) with a joint probability function depending on the price P over the cliques.

By and large, cliques are fully connected subsets of the neighbourhood. However, from our definition of a neighbouring relationship, cliques are equal to the neighbourhood itself. Hence, the price outcome joint distribution can be expressed in terms of clique energy functions, different for each clique but taking the same input prices (i.e., having the same domain). That is to say, the price outcome joint distribution is of the form \(\mathbb {P}[P^{t}]=\frac {1}{Z} e^{-{\sum }_{i\in \mathbb {N}} f_{L_{i}}(P^{t})}\) with energy functions \(f_{L_{i}}\), where \(L_{i}\) is the \(i\)th lobby, according to Proposition 1. □

4 Deriving the model of price prediction

The aim of this section is to develop a price prediction procedure from the previous structural model. Explicit formulae for computing the likelihood of both the aggregate and the minimum market price of a product are provided to this end. Such information provides, as future trends, an output that aggregates the opinions of lobbies on future prices.

Therefore, we will focus on the neighbourhoods of the TD-MRF model. It is important to remember that, from Proposition 1, the neighbourhoods are the lobbies. Taking into account the fact that the positivity condition on the variable “price” is always satisfied, the following result is achieved:

Theorem 2

Any lobby in the market, considered as a DGM, is a TD-MRF.

Proof

It is important to remember that, according to the definition of neighbourhood, all nodes which belong to this set have the same distribution function. This implies that the local Markov property holds and hence the neighbourhood of each node is a TD-MRF. By Proposition 1 the result holds. □

As mentioned before (see the proof of Theorem 1), from the definition of the neighbouring relationship, cliques equal the neighbourhood. Hence the terms “clique”, “neighbourhood” and (by Proposition 1) “lobby” shall be used interchangeably. \(\mathbb {P}_{L_{i}}[P^{t}]\) denotes the distribution which expresses the probability of reaching a (feasible) market price, according to the opinion of the lobby \(L_{i}\). Thus, the set of all the functions \(\mathbb {P}_{L_{i}}[P^{t}]\), \(i = 1,\ldots ,n\) (where \(n\) is the number of lobbies), condenses the opinions of lobbies on future prices.

The next result provides an explicit expression of the likelihood distribution for the i-th lobby.

Theorem 3 (Joint probability distribution of each lobby)

The joint (i.e., for all agents belonging to the lobby) likelihood distribution for the \(i\)th lobby is given by the formula \(\frac {1}{Z} e^{-f_{L_{i}}(P^{t})}\), where \(Z\) is a normalizing constant and \(f_{L_{i}}\) is the energy function of the \(i\)th lobby. We refer to this as

$$\mathbb{P}_{L_{i}}[P^{t}]=\frac{1}{Z} e^{-f_{L_{i}}(P^{t})}.$$

Proof

On one hand, recall that cliques in a graphical model are connected subgraphs where no node may be added and still be connected (i.e., they are maximally connected). Moreover, there is only one clique, which coincides with the whole lobby. On the other hand, a specific joint probability function of the neighbourhood, \(\mathbb {P}_{L_{i}}[P^{t}]\), may be derived by applying the Hammersley-Clifford theorem, and it can be expressed in terms of energy functions \(f\) which are clique-dependent. Hence, the result follows. □

A clarification should be made on the notation of the normalizing constant Z.

Remark 1

For simplicity, in both Theorems 1 and 3 the normalizing constant \(Z\) has been denoted with the same letter (and continues in this way). However, it is not the same \(Z\). Let us examine this in greater detail. In both Theorems 1 and 3, \(Z\) can be isolated from the fact that the corresponding probability distribution sums to 1:

$$ \begin{array}{lll} 1 & = & \sum\limits_{\text{prices } p_{j}} \mathbb{P}[P^{t}=p_{j}] = \sum\limits_{\text{prices } p_{j}} \frac{1}{Z} e^{-{\sum}_{i\in \mathbb{N}} f_{L_{i}}(p_{j})} \\ & = & \frac{1}{Z} \sum\limits_{\text{prices } p_{j}} e^{-{\sum}_{i\in \mathbb{N}} f_{L_{i}}(p_{j})} \\ & \Rightarrow & Z= \sum\limits_{\text{prices } p_{j}} e^{-{\sum}_{i\in \mathbb{N}} f_{L_{i}}(p_{j})}. \end{array} $$

$$ \begin{array}{lll} 1 & = & \sum\limits_{\text{prices } p_{j}} \mathbb{P}_{L_{i}}[P^{t}=p_{j}] = \sum\limits_{\text{prices } p_{j}} \frac{1}{Z} e^{-f_{L_{i}}(p_{j})} \\ & = & \frac{1}{Z} \sum\limits_{\text{prices } p_{j}} e^{-f_{L_{i}}(p_{j})} \\ & \Rightarrow & Z= \sum\limits_{\text{prices } p_{j}} e^{-f_{L_{i}}(p_{j})}. \end{array} $$

It should be noticed that in both cases the sub-indices set is the same. This is the result of considering that the energy functions f can be different depending on cliques but they have a common domain.

As a consequence of Theorems 1 and 3, the following corollaries provide explicit formulae to compute (i) the likelihood of an aggregate market price and (ii) the likelihood of a minimum market price, understanding that the minimum price is that price level at least reached. The corresponding probabilities from the perspective of the lobbies are also given.

Corollary 1 (Likelihood of reaching a market price)

Let us consider a market with lobbies Li and a given market price p at a time instant t0. Thus,

  • the likelihood of reaching an aggregate market price p is \( \mathbb {P} [P^{t_{0}}=p]=\frac {1}{Z} e^{-{\sum }_{i\in \mathbb {N}} f_{L_{i}}(p)} = \frac {1}{Z} {\prod }_{i\in \mathbb {N}} e^{- f_{L_{i}}(p)}. \)

  • From the perspective of the ith-lobby, the likelihood of reaching p is given by \( \mathbb {P}_{L_{i}}[P^{t_{0}}=p] = \frac {1}{Z} e^{-f_{L_{i}}(p)}. \)
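The two formulas of Corollary 1 can be sketched numerically on a discrete price grid. The following is a minimal sketch under stated assumptions: the grid `prices`, the lobby names, the target prices in `targets` and the quadratic form of `energy` are all hypothetical choices made for illustration, and none of them is prescribed by the corollary.

```python
import math

# Hypothetical discrete price grid; not taken from the paper.
prices = [2.0, 2.5, 3.0, 3.5, 4.0]

# Hypothetical target price of each lobby (illustrative values only).
targets = {"producers": 3.5, "distributors": 2.5}

def energy(lobby, p, scale=0.5):
    # Assumed quadratic energy f_{L_i}(p): low near the lobby's target price.
    return ((p - targets[lobby]) / scale) ** 2 / 2.0

def lobby_likelihood(lobby):
    """P_{L_i}[P = p] = exp(-f_{L_i}(p)) / Z over the grid, with the lobby's own Z."""
    unnorm = [math.exp(-energy(lobby, p)) for p in prices]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def aggregate_likelihood():
    """P[P = p] = exp(-sum_i f_{L_i}(p)) / Z over the grid, with the joint Z."""
    unnorm = [math.exp(-sum(energy(l, p) for l in targets)) for p in prices]
    z = sum(unnorm)
    return [u / z for u in unnorm]

aggregate = aggregate_likelihood()
most_likely_price = prices[aggregate.index(max(aggregate))]
```

With these symmetric illustrative targets, each lobby's own likelihood peaks at its target price, while the aggregate likelihood peaks at the grid price lying between the two targets.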

Finally, by using the cumulative probability, a second result is derived:

Corollary 2 (Likelihood of reaching a minimum market price)

For a market price p at time instant \(t_{0}\),

  • The likelihood of reaching at least p is given by \(\mathbb {P}[P^{t_{0}}\leq p]={\sum }_{p_{i}\leq p}\mathbb {P}[P^{t_{0}}=p_{i}]\).

  • From the perspective of the ith-lobby, the likelihood of reaching at least p is given by

    $$\mathbb{P}_{L_{i}}[P^{t_{0}}\leq p] = \sum\limits_{p_{i}\leq p} \mathbb{P}_{L_{i}}[P^{t_{0}}=p_{i}].$$
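Corollaries 1 and 2 translate directly into a few lines of code. The sketch below is only an illustration under stated assumptions: prices live on a finite grid, the two lobby energies are hypothetical Gaussian-type functions with made-up parameters, and the helper names `price_distribution` and `cumulative` are ours.

```python
import math

def price_distribution(energies, prices):
    """Map each grid price p to (1/Z) * prod_i exp(-f_i(p)) (Corollary 1)."""
    factors = {p: math.exp(-sum(f(p) for f in energies)) for p in prices}
    z = sum(factors.values())                      # normalizing constant
    return {p: value / z for p, value in factors.items()}

def cumulative(distribution, p):
    """P[P <= p]: sum of point probabilities of grid prices up to p (Corollary 2)."""
    return sum(prob for price, prob in distribution.items() if price <= p)

# Hypothetical lobby energies (parameters invented for illustration).
energies = [lambda p: (p - 3.9) ** 2 / 0.2,
            lambda p: (p - 4.1) ** 2 / 0.3]
prices = [k / 10 for k in range(30, 51)]           # 3.0, 3.1, ..., 5.0

market = price_distribution(energies, prices)      # aggregate market view
lobby_1 = price_distribution(energies[:1], prices) # view of the first lobby

print(cumulative(market, 4.0), cumulative(lobby_1, 4.0))
```

Passing a single energy function yields the distribution from the perspective of that lobby, while passing all of them yields the aggregate distribution.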

The fact that price outcomes are TD-MRFs provides extra information: it is possible to weight each lobby. Specifically, an instantaneous weight \(w^{t}_{L_{i}}\) may be attached to each lobby \(L_{i}\) by computing the similarity index between the joint distribution of the lobby and the distribution of the whole process: \(w^{t}_{L_{i}}= \frac {\mathbb {P}_{L_{i}}[P^{t}]}{\mathbb {P}[P^{t}]}\).

Let us assume that there are n lobbies in the market. The following theorem gives an explicit formula for computing the weight of the i-th lobby \(L_{i}\):

Theorem 4

\( w^{t}_{L_{i}} = K\cdot e^{f_{L_{1}}(P^{t})} {\ldots } e^{f_{L_{i-1}}(P^{t})}\) \(e^{f_{L_{i+1}}(P^{t})} {\ldots } e^{f_{L_{n}}(P^{t})}\) for a constant K.

Proof

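By substituting the explicit form of \(\mathbb {P}[P^{t}]\) as a product of clique potentials in the definition of weight and applying theorem 3, the weight of the i-th lobby is equal to

$$ w^{t}_{L_{i}} = \frac{\mathbb{P}_{L_{i}}[P^{t}]}{\mathbb{P}[P^{t}]} = \frac{\frac{1}{Z_{L_{i}}} e^{-f_{L_{i}}(P^{t})}}{\frac{1}{Z} {\prod}_{j=1}^{n} e^{-f_{L_{j}}(P^{t})}} = \frac{Z}{Z_{L_{i}}} {\prod}_{j\neq i} e^{f_{L_{j}}(P^{t})}, $$

where \(Z_{L_{i}}\) and Z denote here the normalizing constants of the lobby and of the whole market respectively and, according to remark 1, the quotient of the two normalizing constants is a constant (K). □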

In real life, it is necessary to compute the value of this weight as accurately as possible. To this end, the constant K must be completely determined, which will be done by following remark 1. For this, price should be regarded as a discrete variable, as follows:

  • First, a finite range of values for price should be considered, \([p_{min}, p_{max}]\).

  • Second, prices within the range should be rounded to a fixed precision (this step reflects the normal real-life practice of rounding prices up or down, thus avoiding infinite decimal expansions).

Once price is viewed as a discrete variable within a range of values, the sums over the corresponding sub-indices in the following expression are well defined:

Proposition 2

In the expression \(w^{t}_{L_{i}} = K\cdot e^{f_{L_{1}}(P^{t})} \ldots \) \(e^{f_{L_{i-1}}(P^{t})}e^{f_{L_{i+1}}(P^{t})} {\ldots } e^{f_{L_{n}}(P^{t})}\), the constant K corresponding to the i th-lobby weight with energy functions \(f_{L_{i}}\), is

$$ K = \frac{{\sum}_{p_{j}=p_{min}}^{p_{max}} e^{-{\sum}_{i\in \mathbb{N}} f_{L_{i}}(p_{j})}}{{\sum}_{p_{j}=p_{min}}^{p_{max}} e^{-f_{L_{i}}(p_{j})}}= {\sum}_{p_{j}=p_{min}}^{p_{max}} \frac{ e^{-{\sum}_{i\in \mathbb{N}}f_{L_{i}}(p_{j})} }{{\sum}_{p_{j}=p_{min}}^{p_{max}} e^{-f_{L_{i}}(p_{j})}}. $$

Proof

The constant K is a quotient of the corresponding normalizing constants Z, as shown in the proof of theorem 4. The exact values for Z’s are provided by remark 1. □
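As a rough numerical check of theorem 4 and proposition 2, the weight can be computed both from its definition and from the closed form with K. The sketch assumes a discrete price grid and three hypothetical Gaussian-type energies; all numeric values and function names are invented for illustration.

```python
import math

def normalizers(energies, prices, i):
    """Return (Z of the whole market, Z of lobby i) as sums over the price grid."""
    z_market = sum(math.exp(-sum(f(q) for f in energies)) for q in prices)
    z_lobby = sum(math.exp(-energies[i](q)) for q in prices)
    return z_market, z_lobby

def weight_from_definition(energies, prices, i, p):
    """w = P_{L_i}[p] / P[p], straight from the definition of the weight."""
    z_market, z_lobby = normalizers(energies, prices, i)
    p_lobby = math.exp(-energies[i](p)) / z_lobby
    p_market = math.exp(-sum(f(p) for f in energies)) / z_market
    return p_lobby / p_market

def weight_closed_form(energies, prices, i, p):
    """w = K * prod_{j != i} exp(f_j(p)), with K the quotient of the two Z's."""
    z_market, z_lobby = normalizers(energies, prices, i)
    k = z_market / z_lobby
    others = math.prod(math.exp(f(p)) for j, f in enumerate(energies) if j != i)
    return k * others

# Hypothetical lobbies and price grid (values made up for illustration).
energies = [lambda p: (p - 3.8) ** 2 / 0.20,
            lambda p: (p - 4.0) ** 2 / 0.30,
            lambda p: (p - 4.3) ** 2 / 0.25]
prices = [k / 10 for k in range(20, 66)]   # p_min = 2.0, p_max = 6.5

for i in range(len(energies)):
    print(i, weight_from_definition(energies, prices, i, 4.0),
          weight_closed_form(energies, prices, i, 4.0))
```

The two routes agree up to floating-point error, since the closed form is an algebraic rearrangement of the definition.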

5 Analysis on energy functions/clique potentials

The significance of theorem 1 as well as that of previous results is that they delimit a complex problem within the bounds of some probability computations as long as the energy functions f are specified. Since energy functions/clique potentials have a crucial bearing on our model, we shall examine their significance in depth.

Theorems 1 and 3 provide explicit formulae for computing the likelihood of reaching an aggregate price for the whole market and the corresponding estimate from the perspective of the lobbies. In both cases, the joint probability distributions are written in terms of the energy functions \(f_{L_{i}}(P^{t})\), which play a central role in the computations (see Table 1); when expressed as factors \(e^{-f_{L_{i}}(P^{t})}\), they are known as clique potentials. More importantly, they strongly determine the behaviour of the whole probability distributions.

Table 1 Energy functions and clique potentials

We will focus here on the construction of the energy functions. To the best of our knowledge, the literature offers no guidance on energy functions: neither on how to define them nor on why some have been chosen and others have not. Typically, the specific clique functionals are given without any further explanation. It is therefore worth paying attention here to how they are established and which appear most frequently in the literature.

As to how they can be defined, the functionals f can be chosen arbitrarily as long as they meet the “energy constraint”, which requires the energy function to decrease as the features of the clique match those of a given template. That is, if the features in a clique match the features in a given template, the energy function should decrease; otherwise it should increase. Bearing this in mind, and assuming the clique template is 1, a routine for constructing energy functions f on a clique C could be \(f(c) = 1 - d(c),\ c\in C\), where d is a measure of the similarity between the features of the clique and those of the template. SSD (Sum of Squared Differences) is a further example of a function that measures the distance from a given template. This pattern for defining energy functions is broad enough to allow many options. The energy functions that appear most frequently in the literature are the Gaussian and log-linear distributions (Table 2):

Table 2 Most commonly used energy functions

where \(\mu_{i}\) and \({\sigma _{i}^{2}}\) are the mean and the variance of the i-th lobby respectively. It is well known that the log-linear functional takes its name from the fact that

$$ \begin{array}{@{}rcl@{}} f_{L_{i}}(P^{t}) &=& \exp \Big(w_{0} +\sum\limits_{k} w_{ik}P_{ik}^{t}\Big) \Leftrightarrow \log (f_{L_{i}}(P^{t})) \\ &=& w_{0} +\sum\limits_{k} w_{ik}P_{ik}^{t} \quad \text{(i.e., its log is linear)}. \end{array} $$
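The two families of energy functions can be sketched as follows. This is an illustration rather than a definitive implementation: the Gaussian energy is assumed to take the form \((p-\mu_{i})^{2}/\sigma_{i}^{2}\), the log-linear energy follows the expression above, and the function names and sample arguments are hypothetical.

```python
import math

def gaussian_energy(p, mu, sigma2):
    """Gaussian-type energy: squared distance from the template mu, scaled by sigma2."""
    return (p - mu) ** 2 / sigma2

def log_linear_energy(features, w0, weights):
    """Log-linear energy: exp(w0 + sum_k w_k * x_k), so its log is linear in the inputs."""
    return math.exp(w0 + sum(w * x for w, x in zip(weights, features)))

def clique_potential(energy):
    """Clique potential exp(-f) associated with an energy value f."""
    return math.exp(-energy)

# The Gaussian energy is lowest (zero) when the price matches the template mu,
# so the corresponding clique potential is highest there.
print(gaussian_energy(4.0, mu=4.0, sigma2=0.5), clique_potential(0.0))
print(gaussian_energy(4.5, mu=4.0, sigma2=0.5))

# Taking the log of the log-linear energy recovers the linear predictor.
print(math.log(log_linear_energy([1.0, 2.0], w0=0.1, weights=[0.2, 0.3])))
```

The Gaussian energy satisfies the energy constraint in the sense that it decreases as the price approaches the template.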

6 Model application: The olive oil sector in Andalusia

The olive, the core product in the olive oil industry, is classified under Agribusiness Classification Terms as a non-perishable and long-life agricultural product. Olive oil can likewise be stored for long periods without its main qualities suffering (there are advanced techniques to preserve it from deterioration during long periods of storage). This quality, shared by both the olive and olive oil, makes it more difficult to foresee the price of olive oil accurately. As a result of a widespread practice in the sector, part of the oil consignments are stored waiting for the optimal moment for sale, and they go to market depending on unexpected decisions that private individuals (producers, the olive oil industry and even distributors) make based solely on their own particular interests. This is just one example of the complexity of the olive oil sector, see [9], [22].

The aim of this section is to apply previous results to the olive oil market. This is an extremely complex sector and, therefore, it is especially important to develop price prediction techniques which provide inside information that can help to achieve an advantageous position in the price negotiation processes which take place in the olive oil value chain (e.g., between producers and the oil industry, between the oil industry and distributors and even directly between producers and distributors). Our proposal presents an encouraging solution for such a challenging problem. Specifically, our proposal succeeds in developing effective price prediction techniques which reduce the problem to straightforward probability computations.

Section 6.1 is devoted to the translation of the theoretical TD-MRF model to the context of olive oil. In Section 6.2, energy functions for the olive oil sector are analyzed. This is a necessary intermediate step considering that (as mentioned before) there is no detailed information in the literature about how to define/select the energy functions. Section 6.3 contains an example of how to use the TD-MRF model designed to produce price outcome predictions in real-life (model validation with real data) and finally, a complementary study from the perspective of thermodynamics is provided in Section 6.4.

6.1 Applying the TD-MRF model to the olive oil market

The term “olive oil lobby” includes all the agents in the agricultural value chain, from producers to distributors. Figure 2 displays the olive oil value chain as a linear structure consisting of three categories in which agents of a different nature participate with a high degree of specialization, from the first level (producers) to the third (distributors). It also shows that the main categories are divided into sub-categories (e.g., the olive oil industry is formed by oil mills and bottling plants). In closer detail, Fig. 2 shows the pressure groups that appear in this sector: let \(L_{Pr}\), \(L_{OOI}\) and \(L_{D}\) denote the lobbies corresponding to producers, the olive oil industry and distributors respectively, with weights \(w_{Pr}\), \(w_{OOI}\) and \(w_{D}\) that reflect their importance within the value chain.

Fig. 2 Olive oil value chain: lobbies and their corresponding weights

Another factor of complexity in this sector that prevents the accurate calibration of the weight of each of the pressure groups is that each lobby uses different weapons to further their interests. For instance, agricultural lobbies, at the bottom of the value chain, apply pressure with public displays of discontent whereas other olive oil lobbies apply pressure more privately. For reasons of public well-being, the protests staged by these types of lobbies (agricultural producers) enjoy public support in contrast to what happens with other types of lobbying. Additionally, the power of intervention held by both oil producers and the oil industry, is evidenced in every process present in the reform of the EU’s agricultural policy (see [2]).

Now, we are going to apply the TD-MRF model and the results obtained in previous sections, to the olive oil market. From previous results, the olive oil price outcomes can be viewed as a dynamic graphical model (see Fig. 1) where the neighbourhoods are the lobbies:

Proposition 3

The neighbourhood of any agent \(a_{i}\) is its lobby: \(N(a_{i}) = L_{Pr}\) or \(L_{OOI}\) or \(L_{D}\).

Moreover, our model provides the joint probability distribution of each lobby, thereby expressing the expectation of each lobby with regard to the price of olive oil.

Theorem 5

Each of the three lobbies \(L_{Pr}, L_{OOI}, L_{D}\), considered as a DGM, is a TD-MRF. Thus, the corresponding joint distribution function, which gives the likelihood of reaching a price from the perspective of each lobby, is

$$ \begin{array}{@{}rcl@{}} &&\mathbb{P}_{L_{Pr}}[P^{t}] = \frac{1}{Z} e^{-f_{L_{Pr}}(P^{t})}, \quad \mathbb{P}_{L_{OOI}}[P^{t}] \\ &&= \frac{1}{Z} e^{-f_{L_{OOI}}(P^{t})} \quad \text{and}\quad \mathbb{P}_{L_{D}}[P^{t}] = \frac{1}{Z} e^{-f_{L_{D}}(P^{t})}, \end{array} $$

where Z denotes a normalizing constant and the f’s are the corresponding energy functions of each lobby.

Concrete energy functions for the lobbies shall be studied shortly (see next subsection). Additionally, the likelihood of reaching an aggregate olive oil price (corresponding to the price outcomes) is stated by the following theorem:

Theorem 6 (The olive oil price outcomes are a TD-MRF)

The olive oil price outcomes, viewed as a DGM, are a TD-MRF. Its corresponding joint distribution \(\mathbb {P}[P^{t}]\) is equal to

$$ \begin{array}{@{}rcl@{}} \mathbb{P}[P^{t}] &=& \frac{1}{Z} e^{-[f_{L_{Pr}}(P^{t}) + f_{L_{OOI}}(P^{t})+ f_{L_{D}}(P^{t})]} \\ &=& \frac{1}{Z} e^{-f_{L_{Pr}}(P^{t})} e^{-f_{L_{OOI}}(P^{t})} e^{-f_{L_{D}}(P^{t})} \end{array} $$

as the product of the expectations of lobbies with regard to the olive oil price up to a constant.

The importance of each stakeholder involved in the olive oil price outcomes shall be detailed here from a quantitative perspective using the specific weight w. Although in general terms the olive oil industry has more weight than the rest of the agents (it has a greater ability to influence the institutional course of action), the producers enjoy implicit public support which further increases their specific weight. As an application of the theoretical developments of Sections 3 and 4, a precise way of calculating these specific weights w is provided by theorem 4 (together with proposition 2) which, in short, states that a weight may be defined for each lobby \(L_{Pr}, L_{OOI}, L_{D}\) by comparing its joint distribution to the distribution of the price outcomes:

$$ w^{t}_{L_{Pr}}= \frac{\mathbb{P}_{L_{Pr}}[P^{t}]}{\mathbb{P}[P^{t}]},\quad w^{t}_{L_{OOI}}= \frac{\mathbb{P}_{L_{OOI}}[P^{t}]}{\mathbb{P}[P^{t}]}, \quad w^{t}_{L_{D}}= \frac{\mathbb{P}_{L_{D}}[P^{t}]}{\mathbb{P}[P^{t}]}. $$
(3)

These properties, derived from the DGM, have been expressed in terms of a Gibbs distribution, but they can be expressed in terms of a slightly more general function: the Boltzmann distribution, which is a key tool in thermodynamics. Hence, in order to reach more conclusions about price outcomes in the presence of lobbying, we will study them from the perspective of thermodynamics, whose notions and results shall be applied to our scenario with a specific economic meaning. This analysis will be carried out in Section 6.4, after the appropriate energy functions for the influence groups of the oil sector are studied in depth in Section 6.2.

6.2 Energy functions in the study of the olive oil

The Gaussian distribution describes data sets in which most values cluster around the centre of the range. In terms of the previously described pattern that energy functions should follow, Gaussians measure the distance between price and average price (the template). They are suitable for products whose price takes values within a small range, that is, in an interval \([p_{min}, p_{max}]\) of short length \(p_{max} - p_{min}\), so that the price takes approximately the same values. Further, a Gaussian distribution fits well with products whose price undergoes small, smooth variations. Meanwhile, the log-linear distribution is used in contexts where the simultaneous effects of several inputs on an output must be analyzed. It is useful for describing the associations and interrelationships amongst numerous inputs and, therefore, for the price of products which depends on many interrelated factors.

The olive oil varieties most commonly used for human consumption are (in decreasing order of quality) extra virgin olive oil (EVOO), virgin olive oil (VOO), lampante olive oil (LOO) and refined olive oil (ROO). They are categorized according to their acidity levels and other characteristics which are regulated by the corresponding competent body:

In Southern Spain (Andalusia), one of the fundamental pillars of the economy is the agri-food sector (olive oil and its derivatives, wine products, meat, fruits and vegetables and seafood), which accounts for 10% of the Andalusian Gross Domestic Product (GDP) and 10% of the total labour force. It is not surprising that olive oil prices are periodically displayed on official platforms of the Government of Spain. Under normal economic and environmental conditions, olive oil in all its varieties is an example of a product whose price has very few fluctuations, as shown in Fig. 3.

Another argument that supports the choice of Gaussians as energy functions and not the log linear ones, is that, despite the fact that several interconnected factors affect the price of all the varieties of olive oil, in a simplification of the model made by experts in olive oil production, it is only the weather conditions that are considered to be really determining factors: the water index and to a lesser degree, temperature. Hence, Gaussian distribution shall be used (as energy functions) for predicting the prices from the perspective of the lobbies and, consequently, for forecasting the aggregate price.

These Gaussian functions will be different for each lobby, that is, each has its own mean and variance. In such distributions, the standard deviation represents the degree of dispersion around the mean (the higher the standard deviation, the more spread out the values are). In our context it represents how much the olive oil price deviates from the expected mean in the opinion of the lobby. These opinions about the price are often a combination of economic reality (the price the lobbies actually expect to reach under the economic circumstances) and political aspirations (the target price for which they push).

As mentioned, there are various platforms that reflect the variations in the price of olive oil varieties. In Andalusia (Southern Spain), one of the most popular amongst olive producers and oil producers is PoolRed (see http://www.poolred.com/) which is used to predict the right moment (that is, when prices are rising) to sell the harvests to the oil companies. Both olive and oil producers use this platform to try to answer the following two questions, which are key when conducting price negotiation processes:

figure d

Thus, Table 3 provides an overview for computing the price outcome prediction by using the TD-MRF model.

Table 3 Aggregate and minimum market and lobby’s price in the Gaussian case

6.3 An example of price outcome prediction by using the TD-MRF model

The aim of this section is to develop an example that illustrates how our price prediction model works. For this, data have been extracted from https://www.olimerca.com/precios/tipoInforme/3 for the period from 2017 to 2020, consisting of the monthly average price of a certain variety of olive oil (extra virgin, EVOO) for 7 major specialist retailers/hypermarkets. Thus, we assume that there are 7 lobbies \(L_{1}, L_{2}, L_{3}, L_{4}, L_{5}, L_{6}, L_{7}\) corresponding to these 7 big stores, with data displayed in Table 4:

On the one hand, the probability distribution for each lobby \(L_{i}\) is given by

$$\mathbb{P}_{L_{i}}[P^{t_{0}}=p] = \frac{1}{Z} \bigg(e^{\frac{(p -\mu_{i} )^{2}} {{\sigma_{i}^{2}} } }\bigg)^{-1}$$

with p representing a certain price at a time instant \(t_{0}\). The probability distribution \(\mathbb {P}_{L_{i}}\) provides the price prediction from the perspective of the lobbies, and this prediction is based on the mean \(\mu_{i}\) and the variance \({\sigma _{i}^{2}}\), which are specific to each lobby. These parameters can be derived from Table 4.

Table 4 Monthly average price of EVOO for specialist major retailers

On the other hand, the price of each variety of olive oil oscillates within a specific range (see Fig. 3). From the perspective of each lobby, there are minimum and maximum thresholds for prices as part of the price wars between the supermarkets. These are \(p_{i}^{min}\) and \(p_{i}^{Max}\). In order to reach a common domain, let \(\underline {p}= inf \{ p_{i}^{min}| i=1, \ldots , 7\}\) and \(\overline {p}= sup \{ p_{i}^{Max}| i=1, \ldots , 7\}\) be the infimum and supremum respectively so that \(\underline {p}\leq p \leq \overline {p}\). For each lobby Li, the probability of EVOO price reaching values p which exceed the upper threshold, \(p\in [\overline {p}- p_{i}^{Max}, \overline {p}]\) must be close to zero, as actually happens when computed by using Gaussian energy functions since the further the deviation, the greater the exponent and the lower the quotient (which is the probability). The mean and variance of each lobby as well as other related parameters are displayed in Table 5:

Fig. 3 Olive oil price in Spanish market. Source: Government of Spain, at https://www.mapa.gob.es/es/agricultura/temas/producciones-agricolas/aceite-oliva-y-aceituna-mesa/Evolucion_precios_AO_vegetales.aspx

Table 5 Mean and variance for each lobby
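The derivation of the lobby parameters and of the common price domain can be sketched as follows. The price series below are placeholders, not the data of Table 4, and the use of the population variance is an assumption, since the text does not state which estimator is used; the names are hypothetical.

```python
import statistics

# Placeholder monthly average prices per lobby (NOT the values of Table 4).
monthly_prices = {
    "L1": [3.9, 4.1, 4.0, 3.8],
    "L2": [4.2, 4.4, 4.0, 4.2],
}

def lobby_parameters(prices):
    """Return (mean, variance) of a price series; population variance is assumed."""
    return statistics.mean(prices), statistics.pvariance(prices)

parameters = {name: lobby_parameters(series)
              for name, series in monthly_prices.items()}

# Common domain: infimum of the lobby minima and supremum of the lobby maxima.
p_low = min(min(series) for series in monthly_prices.values())
p_high = max(max(series) for series in monthly_prices.values())

print(parameters, p_low, p_high)
```

Each lobby keeps its own mean and variance, while the bounds of the common domain are shared by all lobbies.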

In order to carry out the necessary computations, let us apply some properties of Gaussian functions to the joint probability whose output is the minimum price p at time \(t_{0}\):

$$ \begin{array}{@{}rcl@{}} \mathbb{P}[P^{t_{0}}\leq p] &=& \frac{1}{Z} \underset{p_{j}\leq p}\sum \quad \underset{i\in \mathbb{N}}\prod \bigg(e^{\frac{(p_{j} -\mu_{i} )^{2}}{{\sigma_{i}^{2}}}}\bigg)^{-1}\\ &=& \frac{1}{Z} \underset{p_{j}\leq p}\sum \!\!\quad e^{- \underset{i\in \mathbb{N}}\sum \frac{(p_{j} - \mu_{i} )^{2}}{{\sigma_{i}^{2}}}} = \frac{1}{Z} \underset{p_{j}\leq p}\sum \quad\!\! e^{- \frac{(p_{j} - \mu)^{2}}{\sigma^{2} }}, \end{array} $$

for \(\mu ={\sum }_{i=1}^{n} \mu _{i} , \sigma ^{2} ={\sum }_{i=1}^{n} {\sigma _{i}^{2}}\). This property can also be applied to the partition function Z, whose explicit expression was given at remark 1 as \(Z={\sum }_{\text {prices} p_j} e^{-{\sum }_{i\in \mathbb {N}} f_{L_{i}}(P^{t})} \). According to the selected energy functions \(f_{L_{i}}(P^{t})\), Z is

$$ \begin{array}{@{}rcl@{}} &&Z={\sum}_{\text{prices} p_{j}} e^{- \underset{i\in \mathbb{N}}\sum \frac{(p_{j} -\mu_{i} )^{2}}{{\sigma_{i}^{2}}}}= {\sum}_{\text{prices} p_{j}} e^{- \frac{(p_{j} -\mu)^{2}}{\sigma^{2} }}, \qquad \\ &&\mu={\sum}_{i=1}^{n} \mu_{i} ,\sigma^{2} ={\sum}_{i=1}^{n} {\sigma_{i}^{2}} \end{array} $$

The calculation of Z entails certain difficulties in practice. In addition to showing how the TD-MRF model is used for prediction, this example is intended to illustrate how to proceed in the calculation of the partition function.

Under certain circumstances, Z can be considered equal to 1. For this, let us recall the reproductive property (see [14]):

Proposition 4

If two independent random variables which follow a certain distribution are added together, the resulting random variable has a distribution of the same type as that of the summands.

By applying this property to Z itself, we can conclude that Z has an exponential distribution, \(Z = e^{\lambda}\) for some parameter λ. Therefore, Z can be considered equal to 1 when λ = 0.

In this example however, we will consider that this is not the case and a detailed calculation of Z shall be performed. The necessary information for computing \(\mathbb {P}[P^{t_{0}}\leq p] = \frac {1}{Z} \underset {p_{j}\leq p}\sum \quad e^{- \frac {(p_{j} -\mu )^{2}}{\sigma ^{2} }}\) is given in Table 6:

Table 6 Mean and variance for each lobby

According to the values \(\underline {p}= 2.98\) and \(\overline {p}=5.59\), let us consider the sample S = {2.9,3,3.1,3.2,3.3,3.4,3.5, 3.6,3.7,3.8,3.9,4,4.1,4.2,4.3,4.4,4.5,4.6,4.7,4.8,4.9, 5,5.1,5.2,5.3,5.4,5.5,5.6,5.7}. Thus, the probability of reaching a minimum market price p = 5.7 is given by the formula

$$ \begin{array}{lcl} \mathbb{P}[P^{t_{0}}\leq p] = \frac{1}{Z}\underset{p_{j}\leq p}\sum \quad e^{- \frac{(p_{j} -\mu)^{2}}{\sigma^{2} }}= \frac{1}{Z} \underset{p_{j}\in S}\sum \quad e^{- \frac{(p_{j} -27.718)^{2}}{1.302}}=\\ \\ = \frac{1}{Z} \Big(e^{- \frac{(2.9 -27.718)^{2}}{1.302}} + e^{- \frac{(3 -27.718)^{2}}{1.302}} + e^{- \frac{(3.1 -27.718)^{2}}{1.302}} + e^{- \frac{(3.2 -27.718)^{2}}{1.302}} + e^{- \frac{(3.3 -27.718)^{2}}{1.302}} +\\ e^{- \frac{(3.4 -27.718)^{2}}{1.302}} + e^{- \frac{(3.5 -27.718)^{2}}{1.302}} + e^{- \frac{(3.6 -27.718)^{2}}{1.302}} + e^{- \frac{(3.7 -27.718)^{2}}{1.302}} + e^{- \frac{(3.8 -27.718)^{2}}{1.302}} +\\ e^{- \frac{(3.9 -27.718)^{2}}{1.302}} + e^{- \frac{(4 -27.718)^{2}}{1.302}} + e^{- \frac{(4.1 -27.718)^{2}}{1.302}} + e^{- \frac{(4.2 -27.718)^{2}}{1.302}} + e^{- \frac{(4.3 -27.718)^{2}}{1.302}} +\\ e^{- \frac{(4.4 -27.718)^{2}}{1.302}} + e^{- \frac{(4.5 -27.718)^{2}}{1.302}} + e^{- \frac{(4.6 -27.718)^{2}}{1.302}} + e^{- \frac{(4.7 -27.718)^{2}}{1.302}} + e^{- \frac{(4.8 -27.718)^{2}}{1.302}} +\\ e^{- \frac{(4.9 -27.718)^{2}}{1.302}} + e^{- \frac{(5 -27.718)^{2}}{1.302}} + e^{- \frac{(5.1 -27.718)^{2}}{1.302}} + e^{- \frac{(5.2 -27.718)^{2}}{1.302}} + e^{- \frac{(5.3 -27.718)^{2}}{1.302}} + \\ e^{- \frac{(5.4 -27.718)^{2}}{1.302}} + e^{- \frac{(5.5 -27.718)^{2}}{1.302}} + e^{- \frac{(5.6 -27.718)^{2}}{1.302}} + e^{- \frac{(5.7 -27.718)^{2}}{1.302}} \Big) = \frac{1}{Z} 2.0314E-162 , \end{array} $$

supported by the information given in Table 7:

Table 7 Calculation of the numerator of \(\mathbb {P}[P^{t_{0}}\leq p]\) for p = 5.7

Similar computations must be accomplished in order to determine the “partition function” Z, for which the main difference is the range of prices pj considered. In general terms, all possible values for the EVOO price should be considered (pj ≥ 0). To be more realistic, by using a record of market EVOO price data, the range could be further scoped: in this way, we shall consider that 2 ≤ pj ≤ 6.5. Thus, Table 8 gathers all the information required for the computation of \(Z=\displaystyle {\sum }_{\text {prices} p_j} e^{- \frac {(p_{j} -\mu )^{2}}{\sigma ^{2} }} = \underset {2 \leq p_{j}\leq 6.5}\sum \quad e^{- \frac {(p_{j} -27.718)^{2}}{1.302}}\). Hence, \(\mathbb {P}[P^{t_{0}}\leq p] = \frac {1}{Z} 2.0314E-162= \frac {2.0314E-162}{7.034E-151}=2.88797E-12\).

Table 8 Computing Z
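The arithmetic of this example can be reproduced with a short script. The sketch takes the aggregated values μ = 27.718 and σ² = 1.302, the sample S and the range 2 ≤ pj ≤ 6.5 from the text; the step of 0.1 for the partition grid is an assumption (it matches the spacing of S), and the variable names are ours.

```python
import math

MU = 27.718      # aggregated mean quoted in the example
SIGMA2 = 1.302   # aggregated variance quoted in the example

def gibbs_factor(p):
    """exp(-(p - mu)^2 / sigma^2), the unnormalized factor of a grid price p."""
    return math.exp(-((p - MU) ** 2) / SIGMA2)

# Sample S = {2.9, 3.0, ..., 5.7}; built from integers to avoid drift in the grid.
sample = [k / 10 for k in range(29, 58)]
# Partition grid 2.0, 2.1, ..., 6.5 (the step of 0.1 is an assumption).
partition_grid = [k / 10 for k in range(20, 66)]

numerator = sum(gibbs_factor(p) for p in sample)
z = sum(gibbs_factor(p) for p in partition_grid)
probability = numerator / z

print(f"numerator = {numerator:.4e}")
print(f"Z         = {z:.4e}")
print(f"P[P <= 5.7] = {probability:.5e}")
```

Under these inputs the script returns a numerator of about 2.03E-162, Z of about 7.03E-151 and a quotient of about 2.89E-12, in line with the values reported above. If the per-lobby parameters of Table 5 are at hand, the exponent \({\sum}_{i} (p_{j}-\mu_{i})^{2}/\sigma_{i}^{2}\) could equally be evaluated term by term, as in Corollary 1, without passing through a single aggregated pair (μ, σ²).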

6.4 Analysis from thermoeconomics

Econophysics is a novel discipline (introduced by the physicist E. Stanley in 1996) which interrelates economics and physics by applying statistical and mathematical techniques, particularly from thermodynamics (with methodologies around the Boltzmann-Gibbs distributions amongst others, see Dimitrijevic 2015). This mixture of principles and methods is called thermoeconomics, where physical quantities like energy and temperature find an economic analogy. From this perspective, thermodynamic and economic processes run in parallel. This section is divided into two main parts. The first is devoted to key notions and results from thermodynamics. The second is an application of thermodynamic principles to our proposal of price outcomes, from which significant market properties shall be derived.

6.4.1 Key notions from thermodynamics

The concepts and results in this paragraph may be found in [3] for instance.

  • The partition function Z. First, we pay attention to the normalizing constant Z, whose basic role has so far been to ensure that the probability distribution sums to 1. In thermodynamics, Z is called the “partition function”. Far from simply being a constant (we shall see shortly how a constant can also be a function), the analysis of Z shall allow us to reach conclusions about some properties of our economic process.

    Before giving the definition of Z, we must refer to the Boltzmann distribution, see [3]. While the most common form of joint probability distribution in MRF contexts is the Gibbs distribution (which has been used in this paper up until now), the one used in thermodynamics is called the Boltzmann distribution. It is simply a Gibbs distribution depending on one parameter β with \(\beta =\frac {1}{k_{\beta } T}\), where \(k_{\beta}\) is the Boltzmann constant and T is the temperature. In most contexts, β is considered to be a Lagrange multiplier. Hereinafter, the Boltzmann constant \(k_{\beta}\) will be taken as being equal to 1 and β shall be referred to as the inverse temperature.

    According to this, the Boltzmann distribution of olive oil price outcomes is

    $$ \mathbb{P}[P^{t}]= \frac{1}{Z} e^{-\beta [f_{L_{Pr}}(P^{t}) + f_{L_{OOI}}(P^{t})+ f_{L_{D}}(P^{t})]} \quad \text{for } \beta \text{ the inverse of temperature, } \beta=\frac{1}{T}, $$

    and that of lobbies is

    $$ \mathbb{P}_{L_{Pr}}[P^{t}]= \frac{1}{Z} e^{-\beta f_{L_{Pr}}(P^{t})}, \quad \mathbb{P}_{L_{OOI}}[P^{t}]= \frac{1}{Z} e^{-\beta f_{L_{OOI}}(P^{t})}, \quad \mathbb{P}_{L_{D}}[P^{t}]= \frac{1}{Z} e^{-\beta f_{L_{D}}(P^{t})}. $$

    In general, the partition function is defined as \(Z(\beta )= \sum e^{-\beta H(x_{1}, x_{2}, \ldots )}\) where \(H(x_{1}, x_{2}, \ldots )\) is the Hamiltonian operator associated with the random variables \(x_{i}\) (i.e., it is the Legendre transformation of the corresponding Lagrangian operator). In the context of our work, the normalizing constant Z was explicitly isolated in remark 1. In this way, the partition function associated with a Boltzmann distribution is

    $$ Z= e^{-\beta f_{L_{Pr}}(P^{t})} + e^{-\beta f_{L_{OOI}}(P^{t})} + e^{-\beta f_{L_{D}}(P^{t})}, $$

    or equivalently, in its log-form, as

    $$ log Z= log \bigg(e^{-\beta f_{L_{Pr}}(P^{t})} + e^{-\beta f_{L_{OOI}}(P^{t})} + e^{-\beta f_{L_{D}}(P^{t})} \bigg). $$

    As can be seen, Z may be considered as a function of β, Z(β), and as a function of the temperature, Z(T).

  • Expected value and ensemble average. Let A be any magnitude (mainly energy in thermodynamics) and let \(A_{i}\) be some A-microstates. Both the expected value and the ensemble average (these are actually the same concept) are tools for studying global properties from local (micro)states. The expected value (also termed the expectation value) of a magnitude A with microstates \(A_{i}\) is defined as usual:

    $$ <A>= \sum\limits_{j} \mathbb{P}[A=A_{j}] \cdot A_{j}, \quad \text{where } \mathbb{P}[A=A_{j}] \text{ is the likelihood of } A \text{ taking the microstate } A_{j}. $$

    For its part, the ensemble average < A > is defined as \( \begin {array}{lcl} <A>& = & \frac { {\sum }_{i} e^{-\beta A_{i}} \cdot A_{i} }{{\sum }_{i} e^{ -\beta A_{i}}} = \frac {{\sum }_{i} e^{-\beta A_{i}} \cdot A_{i} }{Z} \end {array} \) where \(Z={\sum }_{i} e^{-\beta A_{i}}\) is the partition function as stated before.

  • Relationship with partition function. The relationship between expected energy and the partition function Z may be obtained by taking the first-order partial derivative of log Z with respect to the inverse temperature β: \( <A>=-\frac {\partial Z}{Z \partial \beta }= -\frac {\partial log Z}{ \partial \beta } \)
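The relationship between the ensemble average and the partition function can be checked numerically. The sketch below is illustrative only: the microstate values and the inverse temperature are made up, the derivative of log Z is approximated by a central finite difference, and the function names are hypothetical.

```python
import math

def partition(beta, levels):
    """Z(beta) = sum_i exp(-beta * A_i)."""
    return sum(math.exp(-beta * a) for a in levels)

def ensemble_average(beta, levels):
    """<A> = sum_i exp(-beta * A_i) * A_i / Z(beta)."""
    weighted = sum(math.exp(-beta * a) * a for a in levels)
    return weighted / partition(beta, levels)

def minus_dlogz_dbeta(beta, levels, h=1e-6):
    """Central finite-difference approximation of -d(log Z)/d(beta)."""
    upper = math.log(partition(beta + h, levels))
    lower = math.log(partition(beta - h, levels))
    return -(upper - lower) / (2 * h)

levels = [0.4, 1.1, 2.5]   # hypothetical microstate values A_i
beta = 0.5                 # hypothetical inverse temperature

average = ensemble_average(beta, levels)
derivative = minus_dlogz_dbeta(beta, levels)
print(average, derivative)
```

The two printed quantities coincide up to the discretization error of the finite difference.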

6.4.2 Our contribution

The insights from thermodynamics could offer a new perspective on price outcomes in the presence of lobbying inasmuch as notions and results (see theorem 3 for instance) have certain parallels with them. Therefore, we proceed by analyzing the price outcomes using suitable microstates related to the lobbies. For later interpretations it is important to remember that the energy E, no matter which scenario it is referring to, symbolizes a flow rate that changes the magnitudes. According to some authors ([29]), the ensemble average of some daily repeated stochastic process consists of the daily variations in the process. Therefore here, regarding price outcomes, the energy of price shall represent the volatility of price as a rate of change.

Specifically, for our analysis we consider the magnitude “price variations at time instant t”, which shall be termed the energy of price and denoted \(E(P^{t})\) in order to follow thermodynamical notation as closely as possible. Moreover, while price outcomes refer to the whole market price (which is the global process), their study shall be based on the price estimates made by lobbies, which shall be taken as the micro levels. The price variations at the micro-level states are given by the energy functions of each lobby \(f_{L_{i}}\). In particular, in the olive oil market the energy functions of the lobbies are \(f_{L_{Pr}}, f_{L_{OOI}}, f_{L_{D}}\). Consequently, the total energy of price at a time instant t is

$$ \begin{array}{lcl} E(P^{t}) & = & f_{L_{Pr}}(P^{t})+ f_{L_{OOI}}(P^{t}) + f_{L_{D}}(P^{t}). \end{array} $$

Next, both the expected value and the ensemble average (same concept, different perspectives) of the energy of price are examined. On the one hand, the expected value is

$$ \begin{array}{lcl} <E(P^{t})> & = & {\sum}_{i} \mathbb{P}[E(P^{t})=e_{i}] \cdot \overbrace{f_{L_{i}}(P^{t})}^{e_{i}} = \\&&\\ & = & \mathbb{P}_{L_{Pr}}[E(P^{t})= f_{L_{Pr}}(P^{t})] \cdot f_{L_{Pr}}(P^{t})+\\ && \mathbb{P}_{L_{OOI}}[E(P^{t})= f_{L_{OOI}}(P^{t})] \cdot f_{L_{OOI}}(P^{t}) + \\ && \mathbb{P}_{L_{D}}[E(P^{t})= f_{L_{D}}(P^{t})]\cdot f_{L_{D}}(P^{t}). \end{array} $$

On the other hand, the concept known in thermodynamics as “ensemble average” is more easily identified by transforming the former expression into the following one:

$$ \begin{array}{lcl} <E(P^{t})> & = & {\sum}_{i} \mathbb{P}[E(P^{t})=e_{i}] \cdot \overbrace{f_{L_{i}}(P^{t})}^{e_{i}} \\ &&\\ & = & \frac{e^{-\beta f_{L_{Pr}}(P^{t})}}{Z} f_{L_{Pr}}(P^{t}) + \frac{e^{-\beta f_{L_{OOI}}(P^{t})}}{Z} f_{L_{OOI}}(P^{t}) + \frac{e^{-\beta f_{L_{D}}(P^{t})}}{Z} f_{L_{D}}(P^{t}) \\ &&\\ & = & \frac{e^{-\beta f_{L_{Pr}}(P^{t})} f_{L_{Pr}}(P^{t})+ e^{-\beta f_{L_{OOI}}(P^{t})} f_{L_{OOI}}(P^{t}) + e^{-\beta f_{L_{D}}(P^{t})} f_{L_{D}}(P^{t}) }{Z} \\ &&\\ & = & \frac{e^{-\beta f_{L_{Pr}}(P^{t})} f_{L_{Pr}}(P^{t})+ e^{-\beta f_{L_{OOI}}(P^{t})} f_{L_{OOI}}(P^{t}) + e^{-\beta f_{L_{D}}(P^{t})} f_{L_{D}}(P^{t}) }{ e^{-\beta f_{L_{Pr}}(P^{t})} + e^{-\beta f_{L_{OOI}}(P^{t})} + e^{-\beta f_{L_{D}}(P^{t})} }. \end{array} $$
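As a minimal numerical sketch of the ensemble average above, the following Python snippet computes the Boltzmann weights \(e^{-\beta f_{L_{i}}}/Z\) and the resulting expected energy for three lobbies. The energy values and the value of β are hypothetical placeholders rather than figures from the olive oil data, and the function names are introduced here only for illustration.

```python
import math

def boltzmann_weights(energies, beta):
    """Return the weights e^{-beta * f_i} / Z for a list of lobby energies f_i."""
    # Shift by the minimum energy for numerical stability; the shift cancels in the ratio.
    f_min = min(energies)
    unnormalized = [math.exp(-beta * (f - f_min)) for f in energies]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]

def expected_energy(energies, beta):
    """Ensemble average <E> = sum_i (e^{-beta f_i} / Z) * f_i."""
    weights = boltzmann_weights(energies, beta)
    return sum(w * f for w, f in zip(weights, energies))

# Hypothetical values of f_{L_Pr}(P^t), f_{L_OOI}(P^t), f_{L_D}(P^t).
energies = [0.8, 1.2, 2.0]
beta = 1.0  # assumed inverse temperature

print(expected_energy(energies, beta))
```

Subtracting the minimum energy before exponentiating leaves the weights unchanged and only guards against overflow or underflow for large values of β.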

Now, we shall prove some useful properties of the expected energy of price.

Proposition 5

  1. The expected energy of price is the sum of the expected energies of the lobbies:

    $$ \begin{array}{lcl} <E(P^{t})> \!& = &\! <E_{Pr}(P^{t})> + <E_{OOI}(P^{t})> + <E_{D}(P^{t})> \end{array} $$
  2. \(<E(P^{t})>\) always takes positive values regardless of the value of the temperature.

  3. The expected energy of price is an increasing function of the temperature.

Proof

The first property follows from the properties of expectation values. Moreover, it has already been shown in the preceding development, since

$$ \begin{array}{@{}rcl@{}} <E(P^{t})> & = & <E_{Pr}(P^{t})> + <E_{OOI}(P^{t})> + <E_{D}(P^{t})> \\ &&\\ & = & \underbrace{\mathbb{P}_{L_{Pr}}[P^{t}] f_{L_{Pr}}(P^{t})}_{<E_{Pr}(P^{t})>} + \underbrace{\mathbb{P}_{L_{OOI}}[P^{t}] f_{L_{OOI}}(P^{t})}_{<E_{OOI}(P^{t})>} \\ &&+ \underbrace{\mathbb{P}_{L_{D}}[P^{t}]f_{L_{D}}(P^{t})}_{<E_{D}(P^{t})>}. \end{array} $$

For the second property, the function Z is a decreasing function of β, and so is \(\log Z\). Hence, the first partial derivative \(\frac {\partial \log Z}{\partial \beta }\) is always negative. In consequence, \(<E(P^{t})>=-\frac {\partial \log Z}{\partial \beta }\) always takes positive values regardless of the value of β (or of the temperature).
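The identity relating the expected energy to \(\log Z\) can be checked directly in the three-lobby case. In the step below, \(f_{i}\) is a shorthand (introduced only here) for \(f_{L_{i}}(P^{t})\) with \(i \in \{Pr, OOI, D\}\), and \(Z={\sum}_{i} e^{-\beta f_{i}}\) as in the ensemble average:

$$ \begin{array}{lcl} -\frac{\partial \log Z}{\partial \beta} & = & -\frac{1}{Z} \frac{\partial Z}{\partial \beta} \; = \; -\frac{1}{Z} {\sum}_{i} \frac{\partial e^{-\beta f_{i}}}{\partial \beta} \; = \; {\sum}_{i} \frac{e^{-\beta f_{i}}}{Z} f_{i} \; = \; <E(P^{t})>. \end{array} $$

Note that the sign argument relies on the energy functions taking positive values, which is assumed in this sketch: every term \(\frac{e^{-\beta f_{i}}}{Z} f_{i}\) is then positive.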

For the third property, using the above property and the corresponding Boltzmann distribution from theorem 5, the first derivative of \(<E>\) with respect to β is

$$ \begin{array}{lcl} \frac{\partial <E>}{\partial \beta} & = & \frac{\partial <E_{Pr}>}{\partial \beta} + \frac{\partial <E_{OOI}>}{\partial \beta} + \frac{\partial <E_{D}>}{\partial \beta}\\ &&\\ & = & \frac{\partial \mathbb{P}_{L_{Pr}}[P^{t}]}{\partial \beta} f_{L_{Pr}}(P^{t}) + \frac{\partial \mathbb{P}_{L_{OOI}}[P^{t}]}{\partial \beta}f_{L_{OOI}}(P^{t}) + \frac{\partial \mathbb{P}_{L_{D}}[P^{t}]}{\partial \beta}f_{L_{D}}(P^{t})\\ &&\\ & = & \frac{ \partial \left( \frac{e^{-\beta f_{L_{Pr}}(P^{t})}}{Z}\right) }{\partial \beta} f_{L_{Pr}}(P^{t}) + \frac{\partial \left( \frac{e^{-\beta f_{L_{OOI}}(P^{t})}}{Z}\right) }{\partial \beta} f_{L_{OOI}}(P^{t}) \\ && + \frac{\partial \left( \frac{e^{-\beta f_{L_{D}}(P^{t})}}{Z}\right) }{\partial \beta} f_{L_{D}}(P^{t}) =\\ &&\\ & = & K \bigg(- f_{L_{Pr}}(P^{t})^{2} \mathbb{P}_{L_{Pr}}[P^{t}] - f_{L_{OOI}}(P^{t})^{2} \mathbb{P}_{L_{OOI}}[P^{t}] - f_{L_{D}}(P^{t})^{2}\mathbb{P}_{L_{D}}[P^{t}]\bigg) <0, \end{array} $$

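with K a positive constant (as it results from algebraic operations on the normalizing constant Z). This proves that the expected energy is a decreasing function of the inverse temperature β. Since β decreases as the temperature increases, the expected energy is an increasing function of the temperature, and the result follows. □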
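The second and third properties can also be checked numerically. The sketch below evaluates the expected energy on a grid of temperatures and tests that the values are positive and increasing. It assumes β = 1/T (any proportionality constant is set to 1), and the energies are hypothetical positive values chosen only for illustration.

```python
import math

def expected_energy(energies, beta):
    """Boltzmann-weighted average of the lobby energies at inverse temperature beta."""
    weights = [math.exp(-beta * f) for f in energies]
    z = sum(weights)  # partition function Z
    return sum(w * f for w, f in zip(weights, energies)) / z

# Hypothetical positive energies for the three lobbies.
energies = [0.8, 1.2, 2.0]

# Increasing grid of temperatures; beta = 1 / T is an assumption of this sketch.
temperatures = [0.25, 0.5, 1.0, 2.0, 4.0]
values = [expected_energy(energies, 1.0 / t) for t in temperatures]

# Property 2: the expected energy is positive at every temperature on the grid.
is_positive = all(v > 0 for v in values)
# Property 3: the expected energy grows as the temperature grows.
is_increasing = all(a < b for a, b in zip(values, values[1:]))

print(is_positive, is_increasing)
```

Such a check does not replace the proof, since it covers only one choice of energies and a finite grid, but it is a quick sanity test once concrete energy functions have been selected.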

We now address the economic interpretation of proposition 5. Regarding price volatility, the next theorem provides a key market property: the variations that the market price undergoes may be computed separately, as the algebraic sum of the variations in the target prices of the lobbies.

Theorem 7

The expected market price volatility is equal to the algebraic sum of those of the lobbies: \(<E(P^{t})> = <E_{Pr}(P^{t})> + <E_{OOI}(P^{t})> + <E_{D}(P^{t})>\). Moreover, each of these may be explicitly computed once the corresponding energy functions have been selected:

$$ \left\{ \begin{array}{lcl} <E_{Pr}(P^{t})> & = & \frac{e^{-\beta f_{L_{Pr}}(P^{t})}}{Z} f_{L_{Pr}}(P^{t})\\ <E_{OOI}(P^{t})> & = & \frac{e^{-\beta f_{L_{OOI}}(P^{t})}}{Z} f_{L_{OOI}}(P^{t}) \\ <E_{D}(P^{t})> & = & \frac{e^{-\beta f_{L_{D}}(P^{t})}}{Z} f_{L_{D}}(P^{t}) \end{array} \right. $$
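To illustrate how the three terms of theorem 7 could be evaluated once the energy functions are fixed, the following sketch computes each lobby's term \(\frac{e^{-\beta f_{L_{i}}}}{Z} f_{L_{i}}\) and its share of the total. The dictionary keys mirror the lobby subscripts used above; the numerical energies, the value of β and the function name `lobby_contributions` are hypothetical.

```python
import math

def lobby_contributions(energy_by_lobby, beta):
    """Map each lobby to its term e^{-beta f} * f / Z of the expected volatility."""
    z = sum(math.exp(-beta * f) for f in energy_by_lobby.values())
    return {name: math.exp(-beta * f) * f / z for name, f in energy_by_lobby.items()}

# Hypothetical energies f_{L_Pr}(P^t), f_{L_OOI}(P^t), f_{L_D}(P^t).
energy_by_lobby = {"Pr": 0.8, "OOI": 1.2, "D": 2.0}

contributions = lobby_contributions(energy_by_lobby, beta=1.0)
total = sum(contributions.values())  # expected market price volatility <E(P^t)>

for name, value in contributions.items():
    print(f"<E_{name}> = {value:.4f} ({value / total:.1%} of the total)")
```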

Before concluding the discussion of volatility, it should be noted that, according to the second property of proposition 5, mean volatility is always positive, i.e., market prices move upwards.

In econophysics, the temperature of an economic system is linked with money: the income of the agents involved, wages or the monetary value of goods and services (particularly with Gross Domestic Product (GDP) at a global scale, as a measure of the system efficiency), see [7]. Thus, we shall assume here that the temperature of an economic process is its financial solvency (soundness or economic health). Along these lines, higher temperatures are more desirable and larger differences in temperature should generate more profit. Thus, the third property of proposition 5 has the following economic meaning:

Corollary 3

The expected market price volatility is an increasing function of the solvency of the olive oil market. That is, the higher the market solvency, the higher the volatility of the price.

7 Conclusions

This paper presents a structural model for explaining what kind of market price emerges when there are multiple agents pushing for their prices (i.e., markets with the presence of lobbying). Apart from offering a mathematical/AI model that successfully unravels a complex real-world problem, the significance of our approach is that it delimits an intricate problem within the bounds of some probability computations, providing formulas that allow the calculation of the price estimates of a product. The model is also endowed with a quantitative study which determines the specific weight of each lobby in the process of pricing outcomes, thus solving a problem for which, up until now, there have only been qualitative references. This study has been complemented with an analysis from thermoeconomics (econophysics), a relatively young science that offers a novel perspective from which our model can offer additional results.

The originality of our model lies in the use of time dynamic Markov random fields (TD-MRF model), since this problem has so far been addressed from the perspective of Bayesian approaches, neural networks and related techniques. The use of TD-MRFs means that significantly less information is required (only from cliques) than for the aforementioned methodologies. The consequence is a higher computing speed and a lesser need for storage capacity. Using TD-MRFs also means breaking free from the constraint of a given distribution function.

Our TD-MRF model has been tested on real olive oil prices (the olive oil market in Andalusia, Spain) with encouraging results for a challenging sector in which opacity in the entry of oil shipments throughout the season, with olives/olive oil being stored waiting for the price to rise, makes it very difficult to forecast prices. In the model-validation process, both the base functions (energy functions) and the partition function have been chosen and computed in detail, whereas, to the best of our knowledge, the literature contains hardly any discussion of either how to select the energy functions (to best fit the context) or how to compute the partition function Z. Since we refer to the value of Z itself and not to an approximation, this could be considered a limitation of the model, because the final output depends on how good the estimate of Z is (as previously mentioned, the literature gives no indication of how to delimit the range considered when calculating Z in practice).

As mentioned in the Introduction section, our TD-MRF model can be applied to any lobbying context, although some fine-tuning of the structural model might be necessary if the lobbies considered have their own characteristics. Furthermore, understanding the “lobbying context” in a broad manner would allow the application of the model to all those scenarios in which the final goal is to obtain a single output as the result of unifying different perspectives (such as reaching a consensus). Thus, the possible extensions of the TD-MRF model should focus on replacing the numerical datasets with textual ones (or integrating both types of datasets).