1 Introduction

Broadly speaking, autocorrelation is the comparison of a time series (TS) with itself at a different time. Autocorrelation measures the linear relationship between lagged values of a TS and is central to numerous forecasting models that incorporate autoregression. In this study, this idea is borrowed and incorporated into a recommender system (RS) for forecasting purposes.

RSs apply knowledge discovery techniques to the problem of helping users find interesting items. Among the wide taxonomy of recommendation methods, Collaborative Filtering (CF) [1] is the most popular approach in RS design [2]. CF is based on an intuitive paradigm by which items are recommended to a user by looking at the preferences of the people this user trusts. User-based and item-based [3] recommendations are two common approaches for performing CF. The first evaluates the interest of a user in an item using the ratings given to this item by other users, called neighbours, who have similar rating patterns. Item-based CF, on the other hand, considers two items similar if several users of the system have rated them in a similar fashion [4]. In both cases, the first task when building a CF process is to represent the user-item interactions in the form of a rating matrix. The idea is that, given rating data by many users for many items, one can predict a user’s rating for an item not yet known to the user, or identify a set of N items that the user will like the most (the top-N recommendation problem). The latter is the approach followed in this study.

The goal of this study is to introduce a point forecasting methodology for univariate TS using an item-based CF framework and, in particular, to study the behaviour of this methodology on short-length and unevenly spaced TS. On one side, the distinct values of a TS are considered the space of users in the RS. On the other, items are represented by the distinct values of a lagged version of this TS. Ratings are obtained by studying the frequencies of co-occurrences of values from both TS. Basically, the forecast is produced by averaging the top-N set of recommended items (distinct values of the shifted TS) for a particular user (a given value in the original TS).

2 Related Work

TS forecasting has been incorporated into recommendation processes in several works to improve the users’ experience on e-commerce sites. However, to the best of the authors’ knowledge, the use of an RS framework as a tool for producing TS forecasts is new in the literature.

2.1 Item-Based CF Approach

This section describes the basics and notation of a standard item-based CF RS, with a focus on the approach followed in this study. Most CF techniques use a database as input data in the form of a user-item matrix \(\mathbf {R}\) of ratings (preferences). In a typical item-based scenario, there is a set of m users \(\mathcal {U}=\{ u_{1}, u_{2},...,u_{m} \}\), a set of n items \(\mathcal {I}=\{ i_{1}, i_{2},..., i_{n} \}\), and a user-item matrix \(\mathbf {R}\), with \(r_{jk}\) representing the rating of user \(u_{j}\) (\(1 \le j \le m\)) for item \(i_{k}\) (\(1 \le k \le n\)). In \(\mathbf {R}\), each row represents a user \(u_{j}\), and each column represents an item \(i_{k}\). Some filtering criteria may be applied by removing from \(\mathbf {R}\) those \(r_{jk}\) entries below a predefined threshold b. One of the novelties of this study is the creation of the \(\mathbf {R}\) matrix from a TS, which is explained in Sect. 3. Basically, \(\mathbf {R}\) is created from the TS under study and its lagged copy, considering them as the space of users and items, respectively. The rating values (\(r_{jk}\)) are obtained by cross-tabulating both series. Instead of studying their relation with an autocorrelation approach, the frequency of co-occurrence of values is considered. To clarify how a TS forecasting problem is adapted in this study to an RS, Table 1 identifies some assumed equivalences.
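As a toy illustration of the filtering step just described, the following sketch builds a small hypothetical rating matrix and removes entries below a threshold b (the matrix values and the value of b are illustrative only):

```python
import numpy as np

# Hypothetical user-item rating matrix R: rows are users u_j, columns items i_k.
R = np.array([
    [5, 0, 3, 1],
    [4, 2, 0, 1],
    [1, 5, 4, 0],
])

# Filtering criterion: drop (zero out) entries below a predefined threshold b.
b = 2
R_filtered = np.where(R >= b, R, 0)
```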

Table 1. Some equivalences used in this study to build the forecasting recommender system (RS) from a given time series (TS).

The model-building step begins with determining a similarity between pairs of items from \(\mathbf {R}\). Similarities are stored in a new matrix \(\mathbf {S}\), where \(s_{ij}\) represents the similarity between items i and j (\(1 \le i,j \le n\)). \(s_{ij}\) is obtained by computing a similarity measure on those users who have rated both items i and j. Sometimes a minimum number of customers who have selected the (i, j) pair is required before computing the similarity; this quantity will be referred to as the c threshold. Traditionally, the most commonly used similarity measures are the Pearson correlation, cosine, constrained Pearson correlation and mean squared differences [5]. This study uses the cosine and Pearson correlation measures, but also the Otsuka-Ochiai coefficient, which is borrowed from Geosciences [8] and used by leading online retailers like Amazon [9].

Some notation follows at this stage of the modelling. The vector of ratings provided for item i is denoted by \(\mathbf {r}_{i}\), and \(\bar{r}_{i}\) is the average value of these ratings. The set of users who have rated item i is denoted by \(\mathcal {U}_{i}\), item j by \(\mathcal {U}_{j}\), and the set of users who have rated both by \(\mathcal {U}_{ij}\).

Forecasting. The aim of the item-based algorithm is to create recommendations for a user, called the active user \(u_{a} \in \mathcal {U}\), by looking into the set of items this user has rated, \(I_{u_{a}}\subseteq \mathcal {I}\). For each item \(i \in I_{u_{a}}\), just the k most similar items are retained in a set \(\mathcal {S}(i)\). Then, considering the ratings that \(u_{a}\) has also made on items in \(\mathcal {S}(i)\), a weighted prediction measure can be applied. This approach returns a series of estimated ratings for items other than those in \(I_{u_{a}}\), which can be scored. Just the top-ranked items are included in the list of N items to be recommended to \(u_{a}\) (the top-N recommended list). The k and N values have to be decided by the experimenter. In this study, each active user (\(u_{a}\)) was randomly selected from \(\mathcal {U}\) following a cross-validation scheme, which is explained next. To every \(u_{a}\), a set of recommendable items is presented (the forecast). The space of items to recommend is represented by the distinct values of the shifted TS. Since the aim is to provide a point forecast, the numerical values included in the top-N recommended list are averaged.

Evaluation of the Recommendation. The basic structure for offline evaluation of an RS is based on the train-test setup common in machine learning [5, 6]. A usual approach is to split users into two groups, the training and test sets of users ( \(\mathcal {U}_{\,train} \cup \,\, \mathcal {U}_{\,test}=\mathcal {U}\) ). Each user in \(\mathcal {U}_{\,test}\) is considered to be an active user \(u_{a}\). Item ratings of users in the test set are split into two parts, the query set and the target set. Once the RS is built on \(\mathcal {U}_{\,train}\), it is provided with the query set as user history, and the recommendation produced is validated against the target set. It is assumed that if an RS performs well in predicting the withheld items (target set), it will also perform well in finding good recommendations for unknown items [7]. Typical metrics for evaluating accuracy in RSs are the root mean square error (RMSE), the mean absolute error (MAE) or other indirect functions estimating the novelty of the recommendation (e.g., serendipity). The MAE quality measure was used in this study.

3 Creation of the Rating Matrix

This section describes the steps to transform a given TS (\(\mathbf {TS}\)) into a matrix of user-item ratings (\(\mathbf {R}\)). This should be considered the main contribution of this study. The remaining steps do not differ greatly from a traditional item-based approach beyond the necessary adaptations to the TS forecasting context.

Since the aim is to emulate \(\mathbf {R}\), the first step is to generate a space of users and items from the TS. Thus, TS values are rounded to the nearest integer, yielding a series that will be called TS\(_{\mathbf {0}}\). The distinct (unique) values of TS\(_{\mathbf {0}}\) are assumed to be \(\mathcal {U}\).

The second step is setting up \(\mathcal {I}\). For that, TS\(_{\mathbf {0}}\) is shifted forward in time by h time stamps. The new TS is called TS\(_{\mathbf {1}}\). The value of h depends on the intended forecasting horizon. Now, the distinct values of TS\(_{\mathbf {1}}\) set \(\mathcal {I}\). Once \(\mathcal {U}\) and \(\mathcal {I}\) have been set, they are studied as a bivariate distribution of discrete random variables. The last step is to compute the joint (absolute) frequency distribution of \(\mathcal {U}\) and \(\mathcal {I}\). Thus, it is assumed that \(n_{jk}\equiv r_{jk}\), where \(n_{jk}\) represents the frequency (co-occurrence) of the \(u_{j}\) value of TS\(_{\mathbf {0}}\) and the \(i_{k}\) value of TS\(_{\mathbf {1}}\). These steps are summarized below.

figure a
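The steps above can be sketched in Python as follows; the series values and horizon h are illustrative, and `pandas.crosstab` stands in for the cross-tabulation step:

```python
import numpy as np
import pandas as pd

def build_rating_matrix(ts, h=1):
    """Build a user-item rating matrix R from a univariate TS.

    Users   = distinct rounded values of the original series (TS_0).
    Items   = distinct rounded values of the series shifted h steps (TS_1).
    Ratings = co-occurrence frequencies n_jk of each (user, item) pair.
    """
    ts0 = np.rint(np.asarray(ts)).astype(int)  # round to nearest integer
    users = ts0[:-h]                           # TS_0 values
    items = ts0[h:]                            # TS_1: TS_0 shifted h stamps
    # Joint absolute frequency distribution (cross-tabulation).
    return pd.crosstab(pd.Series(users, name="user"),
                       pd.Series(items, name="item"))

# Illustrative series:
ts = [10.2, 11.8, 10.1, 12.3, 10.4, 11.9, 10.0]
R = build_rating_matrix(ts, h=1)
```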

Some Considerations. The followed approach experiences the same processing problems as a conventional item-based approach, namely, sparsity in \(\mathbf {R}\) and cold-start situations (users and items with few or nonexistent ratings). In large RSs from e-commerce retailers, usually \(n\ll m\). However, under this approach \(n\simeq m\).

4 Experimental Evaluation

This section describes the sequential steps followed in this study for creating an item-based CF RS for the purpose of TS forecasting.

4.1 Data Sets

Two short-length TS were used to evaluate this methodology. These TS were obtained by scraping the best-selling products from Italy’s Amazon webpage between 5th April, 2018 and 14th December, 2018. The scraping process was sequential: it obtained the price from the first to the last item of each category, and from the first category to the last. The first TS (TS-1) was created by averaging the evolution of the best-selling products’ prices included in the Amazon devices category. The second TS (TS-2) represents the price change of the most dynamic best-selling product on this marketplace site. Italy’s Amazon webpage experienced changes during the eight-month crawling period due to commercial reasons. The main change relates to the number of best-selling products shown for each category (e.g., 20, 50 or 100). As a consequence, the periods of the scraping cycles are not uniform, and the TS derived from these data (TS-1 and TS-2) are not equally spaced in time. In this study, the data obtained in the scraping process are considered a sequence of time events. Therefore, it is worth noting that the values of TS-1 were obtained by averaging a different number of products (20, 50, or 100), according to the number of products shown by Amazon at different times. Their main characteristics are shown in Table 2.

Table 2. Characteristics of the TS used in the methodology testing (P: percentile; min: minimum value; max: maximum value; in €).

4.2 Creation of the Rating Matrices

A rating matrix \(\mathbf {R}\) was created for TS-1 and TS-2 according to the algorithm described in Sect. 3. As mentioned before, the different durations of the scraping cycles cause unevenly spaced observations in the TS. This also represents an additional difficulty when setting a constant forecasting horizon (h) as described in the algorithm. Therefore, in this study, it is assumed that the forecasting horizon coincides with the duration of the scraping cycle, independently of its length in time. In practice, this means that the TS representing the items (TS\(_{\mathbf {1}}\)) was obtained by lagging the original TS representing the users (TS\(_{\mathbf {0}}\)) one position forward in time. From empirical observation, h takes values of approximately 1 h, 2 h or 4 h depending on whether Amazon shows the 20, 50 or 100 best-selling products for each category, respectively. The lack of proportionality between the duration of the scraping cycles and the number of best-selling products shown is explained by technical reasons in the crawling process (the structure of Amazon’s webpage for each product affects the depth of the scraping). Those ratings with a value \(\le 3\) (the b threshold) were removed from the corresponding \(\mathbf {R}\).

4.3 Similarity Matrix Computation

Three symmetric functions were used in this study to calculate different \(\mathbf {S}\) matrices for TS-1 and TS-2. The Pearson correlation (1) and cosine (2) similarity functions are standard in the RS field. The third, the Otsuka-Ochiai coefficient (3), incorporates a geometric mean in the denominator:

$$\begin{aligned} s_{\mathbf {1}}(i,j)= & {} \frac{\sum _{u \in \mathcal {U}_{i,j}}(r_{u,i}-\bar{r}_{i})(r_{u,j}-\bar{r}_{j})}{\sqrt{\sum _{u \in \mathcal {U}_{i,j}}(r_{u,i}-\bar{r}_{i})^{2}} \,\, \sqrt{\sum _{u \in \mathcal {U}_{i,j}}(r_{u,j}-\bar{r}_{j})^{2}}} \end{aligned}$$
(1)
$$\begin{aligned} s_{\mathbf {2}}(i,j)= & {} \frac{\mathbf {r}_{i} \,\,\bullet \,\, \mathbf {r}_{j}}{\Vert \mathbf {r}_{i} \Vert _{2} \,\, \Vert \mathbf {r}_{j} \Vert _{2}}=\frac{\sum _{u \in \mathcal {U}_{i,j}} r_{u,i} \,\,\bullet \,\, r_{u,j}}{\sqrt{\sum _{u \in \mathcal {U}_{i,j}} r^{2}_{u,i}} \,\, \sqrt{\sum _{u \in \mathcal {U}_{i,j}} r^{2}_{u,j}}} \end{aligned}$$
(2)
$$\begin{aligned} s_{\mathbf {3}}(i,j)= & {} \frac{\mid \mathcal {U}_{ij} \mid }{\sqrt{\mid \mathcal {U}_{i} \mid \, \mid \mathcal {U}_{j}\mid }} \end{aligned}$$
(3)

where \(\bullet \), \(\Vert \cdot \Vert _{2}\) and \(\mid \cdot \mid \) denote the dot product, \(l_{2}\) norm, and set cardinality, respectively. These similarity measures were calculated only when the minimum number of users rating a given item pair was c \(\ge 3\), and the value of k was set to 3.
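A minimal sketch of the three similarity measures, computed directly from the columns of a rating matrix (the toy matrix below is illustrative; zero entries denote missing ratings):

```python
import numpy as np

def similarities(R, i, j):
    """Pearson (s1, Eq. 1), cosine (s2, Eq. 2) and Otsuka-Ochiai (s3, Eq. 3)
    similarities between item columns i and j of a rating matrix R,
    where a zero entry means "no rating"."""
    ri, rj = R[:, i].astype(float), R[:, j].astype(float)
    rated_i, rated_j = ri > 0, rj > 0
    both = rated_i & rated_j                   # U_ij: users who rated both
    x, y = ri[both], rj[both]
    # Item means r_bar_i, r_bar_j over the users who rated each item.
    xc, yc = x - ri[rated_i].mean(), y - rj[rated_j].mean()
    s1 = (xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc))            # Pearson
    s2 = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))     # cosine
    s3 = both.sum() / np.sqrt(rated_i.sum() * rated_j.sum())   # Otsuka-Ochiai
    return s1, s2, s3

# Illustrative rating matrix: 4 users, 3 items.
R = np.array([[3, 4, 0],
              [5, 2, 1],
              [0, 4, 2],
              [4, 0, 5]])
s1, s2, s3 = similarities(R, 0, 1)
```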

4.4 Generation of the Top N-Recommendation

This step begins by looking at the set of items the active user \(u_{a}\) has rated, \(I_{u_{a}}\). In particular, the interest is in predicting ratings for those items \(j \notin I_{u_{a}}\) not yet rated by user \(u_{a}\). The estimated rating (\(\hat{r}_{u_{a},j}\)) for a given item \(j \notin I_{u_{a}}\) was calculated according to (4), where \(\mathcal {S}(j)\) denotes the items rated by user \(u_{a}\) that are most similar to item j:

$$\begin{aligned} \hat{r}_{u_{a},j}= & {} \frac{1}{\sum _{i \in \mathcal {S}(j)} \, s(i,j) } \, {\sum _{i \in \mathcal {S}(j)} \, s(i,j) \,\, r_{u_{a},i}} \end{aligned}$$
(4)

The value of \(\hat{r}_{u_{a},j}\) can be considered a score that is calculated for each item not in \(I_{u_{a}}\). Finally, the highest-scored items are included in the top-N recommended list.
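The weighted prediction in (4) can be sketched as follows (the similarity matrix, rating vector and value of k below are illustrative):

```python
import numpy as np

def predict_rating(r_ua, S, j, k=3):
    """Estimate the active user's rating for an unrated item j (Eq. 4):
    a similarity-weighted average over S(j), the k items rated by u_a
    that are most similar to j.

    r_ua : 1-D array of u_a's ratings (0 = not rated)
    S    : item-item similarity matrix
    """
    rated = np.flatnonzero(r_ua > 0)                 # items in I_{u_a}
    top = rated[np.argsort(S[j, rated])[::-1][:k]]   # S(j): k most similar
    w = S[j, top]
    return (w @ r_ua[top]) / w.sum()

# Illustrative 4-item similarity matrix and rating vector:
S = np.array([[1.0, 0.9, 0.2, 0.4],
              [0.9, 1.0, 0.3, 0.1],
              [0.2, 0.3, 1.0, 0.5],
              [0.4, 0.1, 0.5, 1.0]])
r_ua = np.array([5.0, 0.0, 3.0, 4.0])   # item 1 is unrated
score = predict_rating(r_ua, S, j=1, k=2)
```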

4.5 Evaluation

The set of users (\(\mathcal {U}\)) was split in an 80:20 proportion to obtain the training and test sets of users (\(\mathcal {U}_{\,train}\) and \(\mathcal {U}_{\,test}\)), respectively. Every user in \(\mathcal {U}_{\,test}\) (20% of the distinct values in TS\(_{0}\)) was considered an active user (\(u_{a}\)) to whom a set of items is recommended. The history of each \(u_{a}\) was again split in an 80:20 proportion to obtain the query and target sets, respectively. The produced recommendation was validated against the target set of each \(u_{a}\).

Evaluation Metric. The MAE value was obtained according to (5):

$$\begin{aligned} MAE= & {} \frac{\sum _{i=1}^{n} \mid e_{i} \mid }{n} \end{aligned}$$
(5)

where n represents the number of active users to whom a recommendation has been suggested. The N value used to generate a top-N recommendation was set dynamically according to the number of items in the target set of each \(u_{a}\). Thus, the value of \(e_{i}\) is the difference between the average value of the top-N recommended items and the average value of the items in the target set.
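The MAE computation in (5), with each error taken as the difference between the average of the top-N recommended values and the average of the target values of an active user, can be sketched as follows (the user pairs below are illustrative):

```python
def point_forecast(values):
    # Point forecast: average of the numerical values in a list of items.
    return sum(values) / len(values)

def mae(per_user_pairs):
    """MAE (Eq. 5) over n active users. Each pair holds the top-N
    recommended item values and the target-set values for one u_a."""
    errors = [abs(point_forecast(rec) - point_forecast(tgt))
              for rec, tgt in per_user_pairs]
    return sum(errors) / len(errors)

# Two illustrative active users:
pairs = [([10.0, 12.0], [11.0]),       # e = |11 - 11| = 0
         ([8.0, 10.0], [7.0, 9.0])]    # e = |9 - 8|   = 1
err = mae(pairs)
```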

4.6 Performance Results

The MAE results assessing the quality of the forecasting proposal are shown in Table 3 for the two analysed TS and the three similarity measures studied.

Table 3. MAE performance on both TS and the different similarity measures (\(s_{\mathbf {1}}\), \(s_{\mathbf {2}}\) and \(s_{\mathbf {3}}\): Pearson correlation, cosine and Otsuka-Ochiai similarities, respectively), in €.

It can be seen that the Otsuka-Ochiai similarity (\(s_{\mathbf {3}}\)) yields better performance on both TS studied. It is necessary to recall the characteristics of the TS used for testing this methodology (TS-1 and TS-2): they are characterized by unevenly spaced observations, but also by the variable forecasting horizon imposed by the data acquisition environment. This represents a very complex setting for forecasting. When forecasting the average price for a given category, or the price for a given product, the results should be read as an estimate of the trend of such prices (uptrend or downtrend) rather than as an exact value. Other experiments have been carried out to test this methodology with ozone (O\(_{3}\)) TS (results not shown). These experiments with conventional TS (evenly spaced observations and a constant forecasting horizon) have yielded very good forecasting results. Therefore, TS-1 and TS-2 should be considered to represent an extreme environment in which to perform forecasting. One advantage of this approach is that it does not require high computational power, since it analyses the distinct rounded values of a given TS, whose number is largely independent of the TS length. Besides, the proposed approach is very intuitive, allows for further developments, and can be deployed using any programming language.

4.7 Computational Implementation

The computational implementation was accomplished using ad hoc designed Python functions, except for those specified in Table 4. The first two purposes correspond to the creation of the rating matrix (\(\mathbf {R}\)), and the remaining ones to the similarity matrix (\(\mathbf {S}\)) computation. In particular, the last two are used in the computation of the cosine similarity measure used in this work.

Table 4. Python functions for specific tasks.

5 Conclusions

This study aims to produce a point forecast for a TS by adopting an RS approach, in particular item-based CF. To that end, the basic elements of an RS (the users, items and ratings triple) were emulated using a TS as the only input data. An autocorrelation-based algorithm is introduced to create a rating matrix from a given TS. This methodology was tested using two TS obtained from Italy’s Amazon webpage. Performance results are promising even though the analysed TS represent a very difficult setting in which to conduct forecasting, since they are unevenly spaced and the forecasting horizon is not constant. Applying classical forecasting approaches (e.g., autoregression models) to this type of TS is not possible, mainly due to the irregular time stamps at which observations are obtained. Thus, the introduced algorithm should be considered a contribution to forecasting practice for both short-length and unevenly spaced TS. In addition, the computational time required to obtain forecasting estimates is short, because such estimates are obtained considering the distinct values of the TS rather than all the values forming it. Further developments include considering contextual information when building the RS and transforming the sequence of events into a TS with evenly spaced data.