Introduction

Online shopping has already influenced the purchasing behaviour of consumers. Today, buyers face an overload of information when selecting their preferred goods. Recommender systems (RSs) are developed to recommend appropriate products to consumers on the basis of their historical records. An effective RS service can boost sales by building and increasing customer loyalty (Aggarwal 2016). Reviews of RS technologies can be found in the literature (Aggarwal 2016; Haruna et al. 2017; Adomavicius and Kwon 2015; Kunaver and Požrl 2017; Kotkov et al. 2016; Zhang et al. 2017; Ma et al. 2018). RSs are typically categorised into three types: collaborative filtering, content-based, and hybrid (Aggarwal 2016). Since these types rely on user profiles, including historical ratings and purchase records (Lika et al. 2014), RSs have insufficient information to learn the interests of new users. This lack of information for newly joined users is known as the cold-start problem, which is a critical challenge for RSs (Kunaver and Požrl 2017; Lika et al. 2014; Volkovs et al. 2017; Viktoratos et al. 2018). A discussion and review of the cold-start problem can be found in Lika et al. (2014).

The cold-start problem has a significant influence on recommendations for high-end consumer electronics such as smartphones, laptops, game consoles, and audio–video equipment. Since their electronic components and technologies are frequently updated, recommendations based on historical purchasing records may not be applicable to new products. The motivation of this research is to propose an expert system for product recommendations that is based on the current individual user’s preferences and expert knowledge elicited using the cognitive comparison rating (CCR) method. The proposed model does not suffer from the cold-start problem, as historical information is not used for the recommendations.

The evaluation of expert judgments and user preferences for products is complicated as numerous products such as the aforementioned high-end consumer electronics consist of different attributes. Multi-criteria decision making (MCDM) methods, which can measure both user preferences and expert judgments for multiple product attributes, have been used in RSs (van Capelleveen et al. 2019; Song 2018; Zhang et al. 2018). The analytic hierarchy process (AHP), a classical MCDM, has been adopted to evaluate user preferences for different product attributes (Hinduja and Pandey 2018; Karthikeyan et al. 2017; Pamučar et al. 2018; Wang and Tseng 2013). CCR, an improved alternative to AHP, is introduced in this study for evaluating expert judgments and user preferences. As an approach to rectify the mathematical representation problem of the perception of the paired differences in AHP, CCR is an ideal method for weighing product attributes and defining numerical values of nominal scales based on user preferences (Yuen 2009, 2012, 2014a; b).

To provide product recommendation services, the hierarchical clustering (HC) method is used to group the products based on the evaluation results of CCR. Different clustering analysis methods have been applied to identify groups of products that have similar attributes with respect to consumer preferences (Nilashi 2017; Frémal and Lecron 2017; Katarya and Verma 2017; Selvi and Sivasankar 2019). HC (Murtagh 1983; Ward Jr 1963; Han et al. 2011) is a popular clustering method; for example, HC has been adopted in other RSs (Selvi and Sivasankar 2019; Gupta and Patil 2015; Zheng et al. 2013; de Aguiar Neto et al. 2020). A hierarchical decomposition of a dataset can be built by HC in the form of a tree graph (called a dendrogram). The major advantage of HC is that the dendrogram can be easily interpreted since the distances between the objects are directly presented. However, HC has two limitations when applied to product-recommendation cases. Firstly, the attributes of products are weighted equally, whereas different consumers can have different preferences for each attribute. Secondly, product attributes measured on nominal scales cannot be directly used in clustering processes. To address these limitations, CCR is used to weigh product attributes and define numerical values of nominal scales with respect to user preferences. A novel system, cognitive comparison-enhanced hierarchical clustering (CCEHC), is proposed to provide product recommendations with respect to the current individual user’s rating preferences. The new method provides a solution to the cold-start problem in RSs by using the expert knowledge elicited from CCR instead of the users’ historical data. In addition, non-specialized consumers can express their preferences to interact with the system.

This paper offers a significant extension of the previous initial work (Guan and Yuen 2015; Guan 2018), especially for the sections of methods, experiments, comparisons, and discussions. The remainder of this paper is organised as follows. Section 2 proposes the novel CCEHC system. Section 3 demonstrates the validity and feasibility of the proposed method using a laptop recommendation case, for which the dataset was collected in this study. Section 4 discusses the advantages and limitations of the proposed approach. Section 5 presents the application of CCEHC for workstation recommendations using an open dataset. Finally, Sect. 6 concludes the study.

Cognitive comparison enhanced hierarchical clustering

The procedures of the CCEHC model are presented in Fig. 1. In Steps 1 and 2, the attributes of the products are structured as an attribute tree. According to the attribute tree, a raw data table is collected from different sources. In Step 3, CCR is applied to measure the nominal attribute values and attribute weights with user preferences. The resulting table is normalised in Step 4. In Step 5, the values of the products are produced by aggregating the normalised table and attribute weights. In Step 6, a personalised top-N recommendation is produced by ranking the product values. In the final step, the products are clustered by HC, and similar products can be recommended to the different users.

Fig. 1 CCEHC procedures

Specifying attributes

Detailed product information can be obtained from different sources including manufacturer websites, product engineers, and retailers. A product is represented as a group of attributes, \(\left\{ {\delta_{i} } \right\} = \left( {\delta_{1} , \delta_{2} , \ldots ,\delta_{i} , \ldots ,\delta_{n} } \right)\), where \(\delta_{i}\) is the ith attribute of the product. Attributes can have sub-attributes. For example, an attribute \(\delta_{i}\) is represented by \(n_{i}\) sub-attributes, \(\left\{ {\delta_{i,j} } \right\} = \left( {\delta_{i,1} , \delta_{i,2} , \ldots ,\delta_{i,j} , \ldots ,\delta_{{i,n_{i} }} } \right),\) where \(\delta_{i,j}\) is the jth sub-attribute of \(\delta_{i}\); the attribute \(\delta_{i,j}\) is in turn represented by \(n_{i,j}\) sub-attributes, \(\left\{ {\delta_{i,j,k} } \right\} = \left( {\delta_{i,j,1} , \delta_{i,j,2} , \ldots ,\delta_{i,j,k} , \ldots ,\delta_{{i,j,n_{i,j} }} } \right)\), where \(\delta_{i,j,k}\) is the kth sub-attribute of \(\delta_{i,j}\). The attributes of the different levels are structured as an attributes tree. A sample of the laptop attribute tree is presented in Fig. 2 in Sect. 3.

Fig. 2 Attributes tree for laptops with weights for User A

Preprocessing data

The leaf attributes, denoted as L, are attributes without sub-attributes. The measurable values of leaf attributes are collected from different sources, as mentioned in Sect. 2.1. Product dataset D consisting of m products and l leaf attributes is denoted as \(D=\left\{{d}_{\alpha \beta }|\forall \alpha \in \left(1,\dots ,m\right),\forall \beta \in \left(1,\dots ,l\right),\right\}\). An example of a laptop data matrix is presented in Sect. 3.2. D cannot be directly clustered since it could contain nominal scales that do not have a natural ordering. In the proposed CCEHC system, the nominal scales are substituted by the numerical values measured using the CCR approach presented in the next step.

Evaluating user preferences by CCR

The user preferences for different attributes and nominal scales are measured using the CCR method. A sample of the CCR interface is displayed in Fig. 3.

Fig. 3 Cognitive comparison interface for evaluating laptop attributes

Table 1 is a typical measurement scale schema \(\left( {\aleph ,\overline{X}} \right)\) applied to CCR (Yuen 2009, 2014a). The space of the linguistic labels \(\aleph\) of the paired interval scales is {Equally, Slightly, …, Outstandingly, Absolutely}. The numerical representation of the paired interval scales \(\overline{X}\) is as follows:

$$\overline{X} = \left\{ {\overline{x}_{q} = \frac{q\kappa }{\tau }|\forall q \in \left\{ { - \tau , \ldots , - 1,0,1, \ldots ,\tau } \right\},\quad \kappa > 0} \right\}.$$
(1)
Table 1 Measurement scale schema for CCR

The subjective perception of the difference between pairs is represented as the normal utility, \(\kappa\). By default, \(\kappa\) is set to \({\text{max}}\left( {\overline{X}} \right)\). Denoting the number of linguistic labels as \(\tau\), the number of scales is \(2\tau + 1\).
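
As a minimal sketch of Eq. (1), the numerical scale can be generated as below. The function name `paired_interval_scale` and the setting \(\tau = 8\) are assumptions made for illustration only; they are not taken from Table 1.

```python
def paired_interval_scale(tau, kappa):
    """Return the 2*tau + 1 values x_q = q*kappa/tau for q = -tau, ..., tau (Eq. 1)."""
    if tau <= 0 or kappa <= 0:
        raise ValueError("tau and kappa must be positive")
    return [q * kappa / tau for q in range(-tau, tau + 1)]


# Hypothetical setting: tau = 8 steps on each side of zero, kappa = 8.
scale = paired_interval_scale(tau=8, kappa=8)
print(len(scale), min(scale), max(scale))
```

With this setting the largest scale value coincides with \(\kappa\), which is consistent with the default choice of \(\kappa\) described above.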

To measure the user preferences in paired interval scales, a pairwise opposite matrix (POM) is defined as follows.

$$B = \left[ {b_{ij} } \right] = \left[ {\begin{array}{*{20}c} 0 & {v_{1} - v_{2} } & \cdots & {v_{1} - v_{n} } \\ {v_{2} - v_{1} } & 0 & \cdots & {v_{2} - v_{n} } \\ \vdots & \vdots & \ddots & \vdots \\ {v_{n} - v_{1} } & {v_{n} - v_{2} } & \cdots & 0 \\ \end{array} } \right] \cong \left[ {\begin{array}{*{20}c} {\begin{array}{*{20}c} 0 & {b_{12} } \\ {b_{21} } & 0 \\ \end{array} } & {\begin{array}{*{20}c} \cdots & {b_{1n} } \\ \cdots & {b_{2n} } \\ \end{array} } \\ {\begin{array}{*{20}c} \vdots & \vdots \\ {b_{n1} } & {b_{n2} } \\ \end{array} } & {\begin{array}{*{20}c} \ddots & \vdots \\ \cdots & 0 \\ \end{array} } \\ \end{array} } \right] = \left[ {b_{ij} } \right] = B,$$
(2)

where B denotes a POM. \(v_{i}\) denotes the priority value, and \(b_{ij} \tilde{ = }\left[ {v_{i} - v_{j} } \right]\) denotes the approximate comparison value between objects i and j. The values of \(b_{ij}\) are obtained from a questionnaire. For example, \(b_{13} = 3\) means that the customer considers the first object to be fairly more important than the third.

To verify the validity of the POM, an Accordance Index (AI) is defined in Eq. (3). AI = 0 indicates that B is absolutely accordant. If 0 < AI ≤ 0.1, then B is acceptable. If AI > 0.1, B is unacceptable and the survey should be rechecked.

$${\text{AI}} = \frac{1}{{n^{2} }}\mathop \sum \limits_{i = 1}^{n} \mathop \sum \limits_{j = 1}^{n} \sqrt {\frac{1}{n}\mathop \sum \limits_{p = 1}^{n} \left( {\frac{{b_{ip} + b_{pj} - b_{ij} }}{\kappa }} \right)^{2} } .$$
(3)
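
A direct transcription of Eq. (3) might look like the following sketch. The two 3 × 3 matrices are hypothetical examples, not data from this study; the first is built from exact differences \(v_{i} - v_{j}\) and is therefore perfectly accordant.

```python
import math


def accordance_index(B, kappa):
    """Accordance Index of a pairwise opposite matrix B (list of lists), Eq. (3)."""
    n = len(B)
    total = 0.0
    for i in range(n):
        for j in range(n):
            # Mean squared inconsistency of b_ij against every path i -> p -> j.
            squares = sum(((B[i][p] + B[p][j] - B[i][j]) / kappa) ** 2
                          for p in range(n))
            total += math.sqrt(squares / n)
    return total / n ** 2


# Hypothetical POM derived from exact differences of priorities (10, 8, 6).
consistent = [[0, 2, 4],
              [-2, 0, 2],
              [-4, -2, 0]]
# Hypothetical POM with one slightly inconsistent judgment (b_13 = 5).
perturbed = [[0, 2, 5],
             [-2, 0, 2],
             [-5, -2, 0]]

print(accordance_index(consistent, kappa=8))
print(accordance_index(perturbed, kappa=8))
```

The first call returns zero, and the second returns a small positive value that would be compared against the 0.1 threshold.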

The priority values of objects are computed using the row average plus normal utility (RAU) as follows:

$${\text{RAU}}\left( {B,\kappa } \right) = \left\{ {v_{i} :v_{i} = {\frac{1}{n}\mathop \sum \limits_{j = 1}^{n} b_{ij} + \kappa , \forall i \in \left\{ {1, \ldots ,n} \right\}} } \right\}.$$
(4)

The RAU values are subsequently normalised as a vector W as follows:

$$W = \left\{ {w_{i} :w_{i} = \frac{{v_{i} }}{n\kappa },\forall i \in \left\{ {1, \ldots ,n} \right\}} \right\},{\text{where}}\mathop \sum \limits_{{i \in \{ 1, \ldots ,n\} }} v_{i} = n\kappa .$$
(5)

The vector W can represent a variety of items such as the priorities of options, item utilities, weights of features, and preferences for nominal values. In CCEHC, the weights of the product attributes and nominal scales in raw dataset D are substituted with their normalised RAU values.
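
Eqs. (4) and (5) can be sketched in a few lines. The POM below is a hypothetical example, and the function names are illustrative. Note that the RAU values sum to \(n\kappa\) only when the matrix is antisymmetric (\(b_{ij} = - b_{ji}\)), which is assumed here.

```python
def rau(B, kappa):
    """Row average plus normal utility, Eq. (4): v_i = mean(b_i.) + kappa."""
    n = len(B)
    return [sum(row) / n + kappa for row in B]


def normalise_priorities(v, kappa):
    """Normalised weights, Eq. (5): w_i = v_i / (n * kappa)."""
    n = len(v)
    return [v_i / (n * kappa) for v_i in v]


# Hypothetical antisymmetric POM for three objects, with kappa = 8.
B = [[0, 2, 4],
     [-2, 0, 2],
     [-4, -2, 0]]
v = rau(B, kappa=8)                      # priorities 10, 8 and 6
w = normalise_priorities(v, kappa=8)     # weights summing to 1
print(v, w)
```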

Normalising dataset

Two equations are introduced to normalise the raw dataset D. If a higher value indicates a higher preference for a leaf attribute, the dividing maximal function \(\Delta_{\max }\) defined in Eq. (6) is used to rescale the column of raw attribute values, i.e., \(D_{\beta }^{T} = \left\{ {d_{1,\beta } , \ldots ,d_{\alpha ,\beta } , \ldots ,d_{m,\beta } } \right\}\). If a lower value reveals a higher preference, the minimal dividing function \(\Delta_{\min }\) defined in Eq. (7) is applied. The normalised data matrix is denoted as \(D^{\prime} = \left\{ {x_{\alpha \beta } {|}\forall \alpha \in \left( {1, \ldots ,m} \right),\forall \beta \in \left( {1, \ldots ,l} \right),} \right\}\).

$$x_{\alpha \beta } = \Delta_{\max } \left( {d_{\alpha \beta } } \right) = \frac{{d_{\alpha \beta } }}{{\max \left( {D_{\beta }^{T} } \right)}} ,\forall \alpha \in \left( {1, \ldots ,m} \right),\forall \beta \in \left( {1, \ldots ,l} \right),$$
(6)
$$x_{\alpha \beta } = \Delta_{\min } \left( {d_{\alpha \beta } } \right) = \frac{{\min \left( {D_{\beta }^{T} } \right)}}{{d_{\alpha \beta } }} , \forall \alpha \in \left( {1, \ldots ,m} \right),\forall \beta \in \left( {1, \ldots ,l} \right).$$
(7)
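
The two rescaling functions of Eqs. (6) and (7) can be sketched as follows, assuming that every raw value in a column is strictly positive. The column values are hypothetical.

```python
def delta_max(column):
    """Eq. (6): divide each value by the column maximum (higher is better)."""
    top = max(column)
    return [d / top for d in column]


def delta_min(column):
    """Eq. (7): divide the column minimum by each value (lower is better)."""
    bottom = min(column)
    return [bottom / d for d in column]


# Hypothetical raw columns for three products.
scores = [2, 4, 8]        # e.g. a benchmark score, higher preferred
prices = [2, 4, 8]        # e.g. a price, lower preferred
print(delta_max(scores))  # best product gets 1.0
print(delta_min(prices))  # cheapest product gets 1.0
```

Both functions map the most preferred value in a column to 1, so the normalised columns are comparable before weighting.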

Fusing data

The product values \(\left\{ {\rho^{\left( \alpha \right)} {|}\forall \alpha \in \left( {1, \ldots ,m} \right)} \right\}\) are the weighted summation of the product attribute values, where \(\alpha\) is the index of the product. Attribute values are the weighted summation of their sub-attributes. The detailed calculations of the product and attribute values are presented in Eqs. (8)–(10), where \(r_{i}\), \(r_{i,j}\), and \(r_{i,j,k}\) are the weights of \(\delta_{i}\), \(\delta_{i,j}\), and \(\delta_{i,j,k}\), respectively. The leaf attribute values are obtained from the normalised data matrix \(D^{\prime}\).

$$\delta_{i,j}^{\left( \alpha \right)} = \mathop \sum \limits_{k = 1}^{{n_{i,j} }} r_{i,j,k} \cdot \delta_{i,j,k}^{\left( \alpha \right)} ,\forall i \in \left( {1, \ldots ,n} \right),\forall j \in \left( {1, \ldots ,n_{i} } \right),\forall \alpha \in \left( {1, \ldots ,m} \right),$$
(8)
$$\delta_{i}^{\left( \alpha \right)} = \mathop \sum \limits_{j = 1}^{{n_{i} }} r_{i,j} \cdot \delta_{i,j}^{\left( \alpha \right)} ,\forall i \in \left( {1, \ldots ,n} \right),\forall \alpha \in \left( {1, \ldots ,m} \right),$$
(9)
$$\rho^{\left( \alpha \right)} = \mathop \sum \limits_{i = 1}^{n} r_{i} \cdot \delta_{i}^{\left( \alpha \right)} ,\forall \alpha \in \left( {1, \ldots ,m} \right).$$
(10)
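
Because Eqs. (8)–(10) apply the same weighted sum at every level of the attribute tree, they can be sketched as one recursive function. The tree encoding (a leaf name, or a list of weight and child pairs) and the name `fuse` are assumptions made for this illustration. The numbers are those of the Storage branch in the worked example of Eqs. (16) and (17).

```python
def fuse(node, leaf_values):
    """Weighted sum over an attribute tree.

    node is either a leaf name (str) or a list of (weight, child) pairs;
    leaf_values maps each leaf name to its normalised value for one product.
    """
    if isinstance(node, str):
        return leaf_values[node]
    return sum(weight * fuse(child, leaf_values) for weight, child in node)


# Storage branch: RAM and Hard Drive, where Hard Drive has SSD and Size leaves.
hard_drive = [(0.313, "ssd"), (0.687, "hd_size")]
storage = [(0.500, "ram"), (0.500, hard_drive)]
laptop_1 = {"ram": 0.250, "ssd": 1.000, "hd_size": 0.169}

print(round(fuse(hard_drive, laptop_1), 3))
print(round(fuse(storage, laptop_1), 3))
```

A full product value would be obtained by calling the same function on the root of the tree with the first-level weights.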

Generating top-N list

According to the product values, a personalised top-N list consisting of the N highest value products in descending order is provided to the user; the calculation details are described in Algorithm 1. For different users, the top-N lists are different since the product values are calculated with respect to personal preferences.

Algorithm 1
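
The ranking step can be sketched as below. This is an illustrative sketch, not the listing of Algorithm 1 itself; the function name `top_n` and the product values are hypothetical.

```python
def top_n(product_values, n):
    """Return the n (product_id, value) pairs with the highest values, best first.

    product_values maps a product identifier to its fused product value.
    Products with equal values keep their original insertion order.
    """
    ranked = sorted(product_values.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Hypothetical product values computed for one user.
values = {"A": 0.45, "B": 0.61, "C": 0.52}
print(top_n(values, 2))
```

Since the product values depend on each user's weights, running the same function on another user's values generally yields a different list.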

Clustering products

HC is used to group products according to their product values. The aim of HC is to iteratively combine the two nearest clusters into a larger cluster until all the objects are in one cluster or a preset termination condition is reached (Han et al. 2011). Murtagh (1983) briefly described the steps of hierarchical clustering methods. The steps of HC applied to the CCEHC are described below.

Step 1: Each product is initially an atomic cluster, \({C}_{\sigma }=\left\{{\rho }^{(a)}\right\}\). The distances between each pair of clusters are computed in the following form:

$$d_{{a,a^{\prime}}} = \left| {\rho^{\left( a \right)} - \rho^{{\left( {a^{\prime}} \right)}} } \right|,\forall a,\forall a^{\prime} \in \left( {1, \ldots ,m} \right),$$
(11)

where \(d_{{a,a^{\prime}}}\) is the dissimilarity between the product values of any two different products \(\rho^{\left( a \right)}\) and \(\rho^{{\left( {a^{\prime}} \right)}}\).

Step 2: The two closest clusters, \({C}_{s}\) and \({C}_{t}\), where \(\left( {s,t} \right) = \arg \min \left\{ {d_{{a,a^{\prime}}} } \right\}\), are combined into a larger cluster, i.e., \({C}_{s}={C}_{s}\cup {C}_{t}\), which means that \({C}_{s}\) is updated by merging \({C}_{t}\) into it. The distances between the updated cluster \({C}_{s}\) and each other cluster \({C}_{\neg s}\) are computed as the average distance (Han et al. 2011) in the following form:

$$d_{{{\text{avg}}}} \left( {C_{s} ,C_{\neg s} } \right) = \frac{1}{{{ }\eta_{s} { }\eta_{\neg s} { }}}\mathop \sum \limits_{{\rho^{\left( a \right)} \in C_{s} ,\rho^{{\left( {a^{\prime}} \right)}} \in C_{\neg s} }}^{{}} d_{{\rho^{\left( a \right)} ,\rho^{{\left( {a^{\prime}} \right)}} }} ,$$
(12)

where \({\eta }_{s}\) is the number of objects in cluster \({C}_{s}\). \({d}_{\mathrm{avg}}({C}_{s},{C}_{\neg s})\) updates the distances between clusters \(C_{s}\) and \(C_{\neg s}\). \(d_{{\rho^{\left( a \right)} ,\rho^{{\left( {a^{\prime}} \right)}} }}\) is the distance between products \(\rho^{\left( a \right)}\) and \(\rho^{{\left( {a^{\prime}} \right)}}\), where \(\rho^{\left( a \right)} \in C_{s}\) and \(\rho^{{\left( {a^{\prime}} \right)}} \in C_{\neg s}\). Step 2 is repeated until all products are in one cluster.

Step 3: A dendrogram indicating the arrangement of the merged clusters is produced. Two examples of dendrograms for similar laptop clusters are displayed in Fig. 4. The products are grouped into different clusters by cutting the branches at an appropriate height, which represents the distance between the clusters. Clustering results can be used for similar product recommendations. When a user searches for product \({\rho }^{(a)}\) such that \({\rho }^{(a)}\in {C}_{\sigma }\), the other products in Cluster \({C}_{\sigma }\), i.e., R, are recommended to the user. R is defined as follows:

$$R={C}_{\sigma }\setminus \left\{{\rho }^{\left(a\right)}\right\},$$
(13)

where \(\setminus\) is the set-difference (relative complement) operator.

Fig. 4 Dendrogram for clusters of similar laptops produced for User A (left) and User B (right)
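
Steps 1 to 3 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study. Merging stops as soon as the closest pair of clusters is farther apart than `cut_height`, which for average linkage gives the same flat clusters as cutting the dendrogram at that height. The function name and the product values are hypothetical.

```python
from itertools import combinations


def average_linkage_clusters(values, cut_height):
    """Agglomerative clustering of scalar product values with average linkage.

    values maps product_id -> product value; returns a list of sets of ids.
    """
    clusters = [{pid} for pid in values]   # Step 1: one atomic cluster per product

    def avg_dist(c1, c2):
        # Eq. (12): mean of |rho_a - rho_a'| over all cross-cluster pairs.
        total = sum(abs(values[a] - values[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Step 2: find the two closest clusters (s < t by construction).
        s, t = min(combinations(range(len(clusters)), 2),
                   key=lambda st: avg_dist(clusters[st[0]], clusters[st[1]]))
        if avg_dist(clusters[s], clusters[t]) > cut_height:
            break                          # Step 3: cut at the chosen height
        clusters[s] = clusters[s] | clusters[t]
        del clusters[t]
    return clusters


# Hypothetical product values for five products.
values = {1: 0.40, 2: 0.41, 3: 0.43, 4: 0.60, 5: 0.62}
print(average_linkage_clusters(values, cut_height=0.05))
```

The exhaustive pair search keeps the sketch short but scales poorly; it is only adequate for small product sets such as the ones considered in this paper.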

Application of laptop recommendation

Laptops can be represented by a set of attributes. Consumers search for a set of preferred product attributes when searching for a laptop. To demonstrate the applicability and validity of the proposed CCEHC system, a laptop recommendation case for a consumer (denoted as User A) is illustrated. For the cases in Sects. 3–5, a dataset of 27 laptop configurations was manually collected from the websites of online retail shops and manufacturers in 2015.

Specifying attributes

A large number of laptop configurations can be found on websites for selling, introducing, and comparing electronic products. The majority of consumers are likely unfamiliar with specific technical properties such as the wireless type and video output details. Certain laptop components, such as USB ports, DVD/CD burners, and speakers, could be unimportant to many consumers. These attributes are not considered in this recommendation case. The selected attributes for choosing an ideal laptop are structured as a 3-level attribute tree, as indicated in Fig. 2.

The attributes in the first level of the tree are CPU (\({\delta }_{1}\)), Operating System (\({\delta }_{2}\)), Storage (\({\delta }_{3}\)), Brand (\({\delta }_{4}\)), Display (\({\delta }_{5}\)), Portable (\({\delta }_{6}\)), and Price (\({\delta }_{7}\)). Four of these have sub-attributes. For example, Storage includes the Hard Drive and Random-Access Memory (RAM). The sub-attributes of the first-level attributes are structured in the second level, including {RAM (\({\delta }_{3,1}\)), Hard Drive (\({\delta }_{\mathrm{3,2}}\))}, {USA (\({\delta }_{\mathrm{4,1}}\)), Asia (\({\delta }_{\mathrm{4,2}}\))}, {Screen (\({\delta }_{\mathrm{5,1}}\)), Graphics Card (\({\delta }_{\mathrm{5,2}}\))}, {Weight (\({\delta }_{\mathrm{6,1}}\)), Battery (\({\delta }_{\mathrm{6,2}}\))}. The sub-attributes of the second-level attributes are in the third level of the tree, which are {SSD (\({\delta }_{\mathrm{3,2},1}\)), Size (\({\delta }_{\mathrm{3,2},2}\))}, {Size (\({\delta }_{\mathrm{5,1},1}\)), Resolution (\({\delta }_{\mathrm{5,1},2}\))}.

Preprocessing data

From the attributes tree presented in Fig. 2, a laptop has 13 leaf attributes. A raw data matrix D is obtained from the laptop configurations as indicated in Table 15 of the Appendix. The quantification approaches used to preprocess leaf attributes are summarised in Table 2. For example, the attribute values of the CPU and Graphics Card are quantified by their performance scores (3DMARK 2015). The SSD attribute has three nominal labels: SSD, which indicates that the laptop has an SSD; No SSD, indicating that the laptop has no SSD; and Hybrid, indicating that the laptop has an SSD and another type of hard disk. The three labels are respectively replaced by “2”, “0”, and “1”. The screen resolution attribute is represented by the product of the width and height of the screen in pixels. The nominal scales of the attributes OS and Brand are measured by CCR in Sect. 3.3.

Table 2 Schema of laptop leaf attributes

Evaluating user preferences by CCR

The preferences of User A were gathered from a CCR questionnaire. An example of a questionnaire using CCR is presented in Fig. 3. The measurement scale schema defined in Table 1 is used in this case, and \(\kappa\) is set to 8. The POM for User A presented in Table 3 is obtained from the questionnaire results in Fig. 3 based on Eq. (2). The AI for the POM computed by Eq. (3) is less than 0.1, which means that the POM is acceptable. Table 3 lists the weights of the 1st level laptop attributes computed by Eqs. (4) and (5), together with the detailed calculation steps. The POMs, AIs, and weights of the remaining sub-attributes are provided in Table 4. All the attribute weights for User A are shown on the attribute tree in Fig. 2. The nominal attribute labels for User A of Operating System (\({L}_{2}\)), Asia Brand (\({L}_{6}\)) and USA Brand (\({L}_{7}\)) are also measured by CCR. The POMs and prioritisation results (called preference values) are displayed in Table 5. The nominal attribute values in raw dataset D can be substituted with their preference values.

Table 3 Comparison matrices for 1st level laptop attributes (User A)
Table 4 Comparison matrices for 2nd and 3rd levels laptop attributes (User A)
Table 5 Comparison matrices for nominal attribute of \({L}_{2}\), \({L}_{6}\) and \({L}_{7}\) (User A)

Normalising dataset

The suitable normalisation equations for the leaf attributes are listed in Table 2. For example, a CPU (\({L}_{1}\)) with a higher performance score is attractive. \({\Delta }_{\mathrm{max}}\) defined in Eq. (6) is therefore applied to normalise the CPU attribute values. Typically, consumers prefer a lower product price; therefore, \({\Delta }_{\mathrm{min}}\) defined in Eq. (7) is used to normalise Price (\({L}_{13}\)). The normalised data matrix \(D{^{\prime}}\) is provided in Table 16 in the Appendix. Two samples of the normalisation process for the attribute values of CPU and Price for laptop ID1 are given below.

$${x}_{\mathrm{1,1}}={\Delta }_{\mathrm{max}}\left({d}_{\mathrm{1,1}}\right)=\frac{{d}_{\mathrm{1,1}}}{\mathrm{max}\left({D}_{1}^{T}\right)}=\frac{3367}{7060}=0.447,$$
(14)
$${x}_{\mathrm{1,13}}={\Delta }_{\mathrm{min}}\left({d}_{\mathrm{1,13}}\right)=\frac{\mathrm{min}\left({D}_{13}^{T}\right)}{{d}_{\mathrm{1,13}}} =\frac{2}{7}=0.285.$$
(15)

Fusing data

For each laptop, the 2nd level attribute values are calculated using Eq. (8), the weights in Tables 3 and 4, and the normalised data matrix \(D{^{\prime}}\) in Table 16 in the Appendix. An example of the calculation process for \({{\delta }_{\mathrm{3,2}}}^{(1)}\) is presented below.

$${\delta }_{\mathrm{3,2}}^{\left(1\right)}=\sum_{k=1}^{2}{r}_{\mathrm{3,2},k}\bullet {\delta }_{\mathrm{3,2},k}^{\left(1\right)}=\left({r}_{\mathrm{3,2},1}\bullet {\delta }_{\mathrm{3,2},1}^{\left(1\right)}\right)+\left({r}_{\mathrm{3,2},2}\bullet {\delta }_{\mathrm{3,2},2}^{\left(1\right)}\right)=\left(0.313\bullet {x}_{\mathrm{1,4}}\right)+\left(0.687\bullet {x}_{\mathrm{1,5}}\right)=\left(0.313\bullet 1.000\right)+\left(0.687\bullet 0.169\right)=0.429.$$
(16)

The value of attribute \({{\delta }_{\mathrm{5,1}}}^{(1)}\) is computed as 0.556. The 1st level attribute values are computed using Eq. (9). The calculation process for \({{\delta }_{3}}^{(1)}\) is given in Eq. (17) as an example.

$${\delta }_{3}^{\left(1\right)}=\sum_{j=1}^{2}{r}_{3,j}\bullet {\delta }_{3,j}^{\left(1\right)}=\left({r}_{\mathrm{3,1}}\bullet {\delta }_{\mathrm{3,1}}^{\left(1\right)}\right)+\left({r}_{\mathrm{3,2}}\bullet {\delta }_{\mathrm{3,2}}^{\left(1\right)}\right)=\left({r}_{\mathrm{3,1}}\bullet {x}_{\mathrm{1,3}}\right)+\left({r}_{\mathrm{3,2}}\bullet {\delta }_{\mathrm{3,2}}^{\left(1\right)}\right)=\left(0.500\bullet 0.250\right)+\left(0.500\bullet 0.429\right)=0.340.$$
(17)

The values of attributes \({\delta }_{4}^{(1)}\), \({\delta }_{5}^{(1)}\) and \({\delta }_{6}^{(1)}\) are 0.563, 0.327 and 0.550, respectively. The laptop product values are computed using Eq. (10). For example, the product value of the first laptop is 0.448; the detailed steps are presented in Eq. (18). All 27 laptop product values for User A are listed in Table 6.

Table 6 Laptop product values for 2 users
$${\rho }^{\left(1\right)}=\sum_{i=1}^{7}{r}_{i}\bullet {\delta }_{i}^{\left(1\right)}=\left({r}_{1}\bullet {\delta }_{1}^{\left(1\right)}\right)+\left({r}_{2}\bullet {\delta }_{2}^{\left(1\right)}\right)+\left({r}_{3}\bullet {\delta }_{3}^{\left(1\right)}\right)+\left({r}_{4}\bullet {\delta }_{4}^{\left(1\right)}\right)+\left({r}_{5}\bullet {\delta }_{5}^{\left(1\right)}\right)+\left({r}_{6}\bullet {\delta }_{6}^{\left(1\right)}\right)+\left({r}_{7}\bullet {\delta }_{7}^{\left(1\right)}\right)=\left({r}_{1}\bullet {x}_{\mathrm{1,1}}\right)+\left({r}_{2}\bullet {x}_{\mathrm{1,2}}\right)+\left({r}_{3}\bullet {\delta }_{3}^{\left(1\right)}\right)+\left({r}_{4}\bullet {\delta }_{4}^{\left(1\right)}\right)+\left({r}_{5}\bullet {\delta }_{5}^{\left(1\right)}\right)+\left({r}_{6}\bullet {\delta }_{6}^{\left(1\right)}\right)+\left({r}_{7}\bullet {x}_{\mathrm{1,13}}\right)=0.448.$$
(18)

Generating top-N list

The top-N list for laptops is produced using Algorithm 1. According to User A’s preferences for laptop attributes, a top-10 list of laptops is provided in Table 7. The information and corresponding web links of the laptops in the top-10 list can be recommended to User A in descending order after the user has completed the CCR survey.

Table 7 The top-10 laptops for User A

Clustering products

The details of the HC method are described in Sect. 2.7. The HC method is used to cluster similar laptop products into different groups by measuring the dissimilarities between the product values calculated using Eq. (11). After merging the two closest clusters, the dissimilarities are updated using Eq. (12). The dendrogram produced by HC for User A is displayed in Fig. 4a. By cutting the dendrogram at a height of 0.05, six clusters are generated: {4, 18, 19}, {14, 13, 16, 20, 1, 5, 22, 3, 24, 6, 26}, {25, 27, 8, 11}, {7, 9, 10, 12, 21, 23}, {15} and {2, 17}. The clustering results are used for product recommendations. For example, if User A browses the webpage of Laptop 4, Laptops 18 and 19, which are in the same cluster as Laptop 4, are recommended to the user. Similarly, if User A browses Laptop 2, Laptop 17 is recommended.
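
The lookup in Eq. (13) amounts to a set difference on the cluster that contains the browsed product. The sketch below applies it to the six clusters listed above for User A; the function name `recommend` is illustrative.

```python
def recommend(clusters, product_id):
    """Eq. (13): the other members of the cluster containing product_id."""
    for cluster in clusters:
        if product_id in cluster:
            return sorted(cluster - {product_id})
    return []   # unknown product: nothing to recommend


# The six clusters obtained for User A at a cut height of 0.05.
clusters_user_a = [
    {4, 18, 19},
    {14, 13, 16, 20, 1, 5, 22, 3, 24, 6, 26},
    {25, 27, 8, 11},
    {7, 9, 10, 12, 21, 23},
    {15},
    {2, 17},
]

print(recommend(clusters_user_a, 4))    # Laptops 18 and 19
print(recommend(clusters_user_a, 2))    # Laptop 17
print(recommend(clusters_user_a, 15))   # singleton cluster: empty list
```

A product in a singleton cluster, such as Laptop 15, yields no similar-product recommendation at this cut height.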

Discussions

Comparisons and discussions are presented in this section to demonstrate the advantages of the proposed RS. To demonstrate the advantage of providing personalised recommendations, the recommendations for User B are presented in Sect. 4.1. To demonstrate the differences between CCR and AHP, the results produced by the AHP-enhanced method are presented in Sect. 4.2. The limitations of the proposed method are discussed in Sect. 4.3.

Personalization

User B completed the same questionnaire. The rating scores of User B are presented in Tables 8, 9, 10; the rating scores of User A are given in Tables 3, 4, 5. The product values for Users A and B are listed in Table 6. The system produces personalised top-10 laptop lists and laptop clusters with respect to the two users’ preferences. Table 7 presents the top-10 laptops recommended for User A; Table 11 lists the top-10 laptops recommended for User B. The two dendrograms in Fig. 4 indicate the laptop clustering results for Users A and B.

Table 8 Comparison matrices for 1st level laptop attributes (User B)
Table 9 Comparison matrices for 2nd and 3rd levels laptop attributes (User B)
Table 10 Comparison matrices for nominal attribute of \({L}_{2}\), \({L}_{6}\) and \({L}_{7}\) (User B)
Table 11 The top-10 laptops for User B

Comparing the preferences indicated in Tables 3 and 8, both users require a laptop with a high-speed CPU, large storage, and acceptable graphics. Three differences between the preferences of the two users can be summarised by comparing Tables 3, 4, 5 with Tables 8, 9, 10. Firstly, User A is not very price sensitive, whereas for User B, the price is considerably more important. Secondly, User A requires a portable laptop that is light and has a long battery life; User B hardly considers portability. Thirdly, User A has no strong preference for the brand, whereas User B strongly prefers laptops produced by US companies, especially Apple and Alienware.

From the laptop configurations presented in Table 15 and the two top-10 lists generated for Users A and B, respectively, in Tables 7 and 11, it can be concluded that the laptops in the two top-10 lists meet the common requirements (high speed CPU, large storage and acceptable graphics) of the two users. Three observations show how the two users’ preferences shape the top-10 lists. Firstly, the best laptop for Users A and B is the same: Laptop 27. This laptop has almost all the best configurations, yet is the most expensive. The product value of Laptop 27 for User B is less than for User A. The main reason is that User B is more price sensitive. Secondly, two portable laptops, Laptops 21 and 23, are recommended to User A, even though the other configurations of the two laptops are not attractive. The main reason is that User A prefers portable laptops. Thirdly, as User B is loyal to the brands Apple and Alienware, all the laptops of the two brands are recommended to User B in the top-10 list. The laptop recommendations provided for the two users match their requirements, and it can be concluded that the proposed CCEHC method can provide personalised recommendations with respect to user preferences.

Comparisons between CCR and AHP

CCR is based on the cognitive network process (CNP) (Yuen 2009, 2014a). The CNP is proposed as an ideal alternative to AHP to solve the rating scale problem in AHP. The numerical definition of the AHP’s paired ratio scale inappropriately represents the human intuitive judgment of paired difference; CNP uses a paired interval scale instead of a paired ratio scale. Detailed comparisons between CNP and AHP can be found in Yuen (2009, 2014a).

This study uses the original version of AHP proposed in Saaty (1980) for comparisons. To produce the AHP results, the CCR rating scales are transformed to AHP scales. The method employed is called AHP Enhanced Hierarchical Clustering (AHPEHC). The transformation method of the rating scale between AHP and CCR is given in Yuen (2009, 2014a). Table 12 presents the transformed rating matrix and weights of the ratings listed in Table 3. The product values and clustering results of the laptops computed using AHP are shown in Table 13 and Fig. 5, respectively.

Table 12 AHP comparison matrices for 1st level laptop attributes (User A)
Table 13 Laptop product values by AHP (User A)
Fig. 5 Dendrogram for clusters of similar laptops produced by AHPEHC

The 27 laptop product values are displayed in Fig. 6. A significant difference between the values computed by CCR and AHP is that the product values computed by CCR are considerably closer to one another than those computed by AHP. The CCR results indicate that the products are difficult to distinguish and the recommendation is therefore not easy to make, whereas the AHP results suggest that the problem is trivial. The reason for this difference is that the paired ratio scales applied in AHP typically exaggerate the human perception of the paired difference in ratio. It can be concluded that CCR outperforms AHP in reflecting the preferences of both experts and users.

Fig. 6 Comparison between product values computed by CCR and AHP
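The effect of the two scale definitions can be illustrated with a small sketch. It rests on several assumptions that are not stated in this section: the interval-scale weights are obtained by the row average plus a normal utility (here kappa = 8), an interval judgment b is mapped to the ratio b + 1 (or to its reciprocal for negative b), and the AHP priority vector is approximated by the normalised geometric means of the rows rather than by the eigenvector method. The exact definitions are given in Yuen (2009, 2014a) and Saaty (1980); the judgment matrix below is hypothetical.

```python
from math import prod

KAPPA = 8  # assumed normal utility, set to the top of the interval scale


def interval_weights(pom, kappa=KAPPA):
    """Weights from a pairwise opposite matrix (b_ij = -b_ji).

    Assumed rule: v_i = mean_j(b_ij) + kappa, w_i = v_i / (n * kappa).
    """
    n = len(pom)
    values = [sum(row) / n + kappa for row in pom]
    return [v / (n * kappa) for v in values]


def interval_to_ratio(b):
    """Assumed scale mapping: difference b -> ratio b + 1, or 1 / (1 - b) if b < 0."""
    return b + 1 if b >= 0 else 1 / (1 - b)


def ratio_weights(matrix):
    """Geometric-mean approximation of the AHP priority vector."""
    n = len(matrix)
    means = [prod(row) ** (1 / n) for row in matrix]
    total = sum(means)
    return [m / total for m in means]


def spread(weights):
    """Ratio of the largest to the smallest weight."""
    return max(weights) / min(weights)


# Hypothetical judgments for three attributes on the interval scale.
pom = [[0, 2, 4],
       [-2, 0, 2],
       [-4, -2, 0]]

ratio_matrix = [[interval_to_ratio(b) for b in row] for row in pom]

ccr_w = interval_weights(pom)
ahp_w = ratio_weights(ratio_matrix)

print("interval-scale weights:", [round(w, 3) for w in ccr_w])
print("ratio-scale weights:   ", [round(w, 3) for w in ahp_w])
print("spread (max/min):", round(spread(ccr_w), 2), "vs", round(spread(ahp_w), 2))
```

For these hypothetical judgments the interval-scale weights stay close together, while the ratio-scale weights of the same judgments are several times further apart, which is the pattern described for Fig. 6.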

Limitations

The proposed approach has several limitations. As it is designed for recommending recently launched products, the datasets contain only the latest products, on the assumption that a consumer is unlikely to buy an obsolete product. Because obsolete products are excluded, the dataset should not be excessively large. The proposed method is not designed for processing large-scale data, and its processing capability for large datasets is limited; however, this is not typically a problem, as a large number of new products is rare. The proposed RS is not intended to address the problems solved by content-based and collaborative filtering RSs; in turn, those RSs are not designed to address the research problem solved by the proposed approach. The clustering validity of the proposed method is not discussed, as no ground truth class labels are available to verify the results. Internal clustering criteria, such as the Davies-Bouldin index (Davies and Bouldin 1979) and the Silhouette index (Rousseeuw 1987), are normally used as references, although these do not necessarily reflect real validity.
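As an illustration of such an internal criterion, the following sketch computes the mean silhouette width for scalar product values, using the absolute difference as the distance. The product values and cluster labels are hypothetical, and singleton clusters are given a score of zero by convention; this is not an evaluation of the clusters reported in this paper.

```python
def silhouette_1d(values, labels):
    """Mean silhouette width for scalar values with |x - y| as the distance."""
    clusters = {}
    for value, label in zip(values, labels):
        clusters.setdefault(label, []).append(value)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for value, label in zip(values, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point itself contributes a distance of zero to the sum)
        a = sum(abs(value - u) for u in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(abs(value - u) for u in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical product values and two candidate groupings.
product_values = [0.50, 0.52, 0.51, 0.80, 0.82]
compact_labels = [0, 0, 0, 1, 1]   # groups follow the gap in the values
mixed_labels = [0, 1, 0, 1, 0]     # groups ignore the gap

compact_score = silhouette_1d(product_values, compact_labels)
mixed_score = silhouette_1d(product_values, mixed_labels)
print(round(compact_score, 3), round(mixed_score, 3))
```

A score near 1 indicates compact, well-separated clusters, whereas a low or negative score indicates overlapping ones. As noted above, a high score only describes the geometry of the product values and does not confirm that the grouping is meaningful to users.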

Workstation recommendation with open dataset

The proposed CCEHC method can be applied to different kinds of RSs. A workstation RS is developed to demonstrate the usability of CCEHC. A workstation is a special type of laptop designed for technical, scientific, and other professional purposes. In general, workstations are more expensive than ordinary laptops, so users typically spend more time selecting a suitable one. An RS built by CCEHC with expert opinions and user preferences could therefore be helpful for workstation recommendation.

An open dataset of the characteristics and prices of laptop models (Kaggle 2018) is used for the workstation RS. The dataset contains 29 workstation items. The characteristics and prices of the workstations obtained from the original dataset can be summarised as 13 attributes organised as a two-level attribute tree, as displayed in Fig. 7. The POMs are presented in Table 17 in the Appendix. The weights of the seven attributes (B0) in the first level and of three groups of second-level attributes, Screen (B2), Processor (B3), and Memory (B4), are evaluated by CCR. In addition, the attributes Company (L1), Screen Type (L3), Hard Disk Type (L8), and OS (L10) are nominal, and their values are evaluated by CCR.

Fig. 7 Workstation attribute trees with the weights for User C (left) and User D (right)

To demonstrate the usability of the workstation RS, two users, Users C and D, use the CCEHC system to determine which workstation would fit their purposes. The comparison matrices of Users C and D are presented in Table 17 in the Appendix. The resulting weights are indicated in the attribute trees presented in Fig. 7. To compare the results of CCR and AHP, both CCEHC and AHPEHC are used to build the workstation RSs. The recommendations produced by the CCEHC and AHPEHC RSs for both users are displayed as dendrograms with clusters in Fig. 8 and as the top-10 lists presented in Table 14.

Fig. 8 Dendrograms for workstations: (a) User C applying CCEHC, (b) User C applying AHPEHC, (c) User D applying CCEHC, (d) User D applying AHPEHC

Table 14 The top-10 workstations for each user using CCEHC and AHPEHC

From the preferences of the two users in Table 17 and Fig. 7, it can be determined that the preferences of Users C and D are not significantly different. For example, both users feel that the memory and OS are important, and that the price and weight are less important. Their preferences for the second-level attributes of Memory are also similar: a larger RAM and a better type of hard disk (such as a solid-state disk) are more important, whereas a large hard disk capacity is less essential. Their preferences for the company and processor differ. User C has certain preferences for the company of the workstation and feels that the CPU and GPU are equally important; User D is not overly concerned with the company and feels that the CPU is more important than the GPU. By comparing the four top-10 lists produced by the two RSs for the two users, it can be observed that the RS applying CCEHC produces similar recommendations for the two users, whereas the RS applying AHPEHC does not. The recommendations produced by the two RSs are also different for each user.

For the two users with similar preferences for the workstation, the RS applying CCEHC provides similar recommendations, whereas the RS applying AHPEHC provides considerably different results. The results demonstrate that CCEHC can better reflect the user preferences than AHPEHC. The reason for the different results of CCR and AHP is the different mathematical representation of human opinions. As mentioned in Sect. 4.2, the paired ratio scales applied in AHP typically exaggerate the human perception of paired differences by expressing them as ratios; hence, a marginal difference in user preferences can lead to considerably different results. The application of the workstation RS demonstrates that the CCEHC method can produce reasonable personalised recommendations for users.
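The similarity between two recommendation lists can be quantified in a simple way, for example by counting the shared items and computing the Jaccard index. The sketch below is an illustrative assumption rather than a measure used in this study, and the item identifiers are hypothetical, not the entries of Table 14.

```python
def list_overlap(list_a, list_b):
    """Return (number of shared items, Jaccard index) for two recommendation lists."""
    set_a, set_b = set(list_a), set(list_b)
    shared = set_a & set_b
    union = set_a | set_b
    jaccard = len(shared) / len(union) if union else 0.0
    return len(shared), jaccard


# Hypothetical top-5 lists for two users under two recommenders.
similar_pair = (["W03", "W11", "W07", "W20", "W15"],
                ["W03", "W07", "W11", "W15", "W22"])
dissimilar_pair = (["W03", "W11", "W07", "W20", "W15"],
                   ["W09", "W03", "W25", "W18", "W02"])

similar_shared, similar_jaccard = list_overlap(*similar_pair)
dissimilar_shared, dissimilar_jaccard = list_overlap(*dissimilar_pair)

print(similar_shared, round(similar_jaccard, 3))
print(dissimilar_shared, round(dissimilar_jaccard, 3))
```

The index ignores the order of the items within each list; a rank-aware measure would be needed if the positions in the top-10 lists also matter.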

Conclusions

RSs help consumers choose among different products. To address the limitations of current AHC methods applied to RSs, this paper proposes a CCEHC approach for providing personalised product recommendations. CCEHC consists of two major parts: CCR and hierarchical clustering. CCR is used for user preference elicitation. The user preferences elicited by CCR are used to weigh the multi-level product attributes and to quantify the nominal attribute values. The product values are calculated from the attribute weights and the normalised numerical attribute values. Hierarchical clustering is used to group similar products according to their product values. Recommendations are produced according to the product values and clustering results. Two applications, a laptop RS using a dataset collected in this research and a workstation RS using an open dataset, are demonstrated to confirm the validity and applicability of the proposed method. In RS applications, CCEHC can provide customers with a top-10 list of products and recommendations of similar products based on the preferences they provide.
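The overall flow summarised above can be sketched in a few lines. The sketch assumes that the product value is a weighted sum of normalised attribute values and that products are grouped by average-linkage agglomerative clustering on their product values; both choices, as well as the weights and products, are hypothetical and are not taken from the paper’s tables.

```python
def product_value(weights, attrs):
    """Assumed aggregation: weighted sum of normalised attribute values."""
    return sum(w * x for w, x in zip(weights, attrs))


def agglomerate(values, n_clusters):
    """Average-linkage agglomerative clustering of named scalar values."""
    clusters = [[name] for name in values]

    def linkage(c1, c2):
        # mean absolute difference over all cross-cluster pairs
        total = sum(abs(values[p] - values[q]) for p in c1 for q in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]                          # j > i, so index i stays valid
    return clusters


# Hypothetical attribute weights (summing to 1) and normalised attribute values.
weights = [0.5, 0.3, 0.2]
products = {
    "P1": [0.90, 0.80, 0.20],
    "P2": [0.85, 0.80, 0.30],
    "P3": [0.30, 0.40, 0.90],
    "P4": [0.35, 0.30, 1.00],
    "P5": [0.60, 0.60, 0.60],
}

values = {name: product_value(weights, attrs) for name, attrs in products.items()}
ranking = sorted(values, key=values.get, reverse=True)  # best product first
top_2 = ranking[:2]
clusters = agglomerate(values, n_clusters=3)
best = ranking[0]
similar_to_best = [p for c in clusters if best in c for p in c if p != best]

print("ranking:", ranking)
print("clusters:", clusters)
print("similar to", best, ":", similar_to_best)
```

The ranking yields the top-k list, and the cluster that contains a chosen product supplies the similar-product recommendations.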

The CCEHC method can be considered as an expert system that serves the recommendation function. As CCR can be used for expert judgments and user preferences, product data with human input can be processed by the clustering method and recommendations can be generated. The experimental results demonstrated that the proposed CCEHC method can provide personalised recommendations based on different user preferences. CCR outperformed AHP in reflecting the preferences of both expert(s) and users.

There are several possible paths for future work based on this research. Firstly, other clustering methods can be considered. Secondly, the interfaces for user input and recommendation output could be further improved for a better user experience. Thirdly, the approach to handling missing values in the user input data and the product data could be further investigated. Fourthly, the proposed method could be further improved to process large-scale data. Finally, to extend the application areas, the proposed CCEHC method could be applied to numerous other product recommendation domains, such as movies, music, books, cars, and smartphones.