1 Introduction

Serendipity has been widely experienced throughout human history. It is defined as “an unexpected experience prompted by an individual’s valuable interaction with ideas, information, objects, or phenomena” [1]. So far, studies relating to serendipity have mainly focused on two directions: theoretical studies in the area of information research, which aim to investigate the nature of serendipity [2,3,4], and empirical studies that aim to develop applications or algorithms providing users with serendipitous encountering, especially in digital environments [5,6,7].

One of the areas that tries to employ serendipity is the design of recommender systems. The information overload in cyberspace means that users are no longer satisfied with being recommended merely “accurate” information; instead, they want to be recommended information that is more serendipitous and interesting to them [8,9,10]. However, a concern identified in our review of relevant studies is that the discoveries from information research regarding the nature of serendipity have not received sufficient attention in recommender system design. This paper proposes a new algorithm to support serendipitous recommendation by applying recent research findings on serendipity from the area of information research.

2 Problem and Research Question

Recommender system researchers often characterise serendipity as “unexpected” and “useful” [11], and have designed recommendation algorithms based on either content-based filtering [12] or collaborative filtering [13]. However, most of these algorithms focus mainly on providing “unexpectedness” to users and treat “usefulness” only as a metric for measuring the effectiveness of their algorithms, rather than considering it as a design clue [14].

By comparison, serendipity in information research is often considered to have three main characteristics: unexpectedness, insight and value [4]. “Unexpectedness” means that the encountered information should be unexpected or a surprise to the information actor, while “value” specifies that the encountered information should be useful and beneficial to the information actor. These two understandings of “unexpectedness” and “value” are consistent with the current view of serendipity in recommender system design [11, 14]; however, the “insight” aspect tends to be neglected.

“Insight” is considered the ability to find a clue in the current environment, then “make connections” between the clue and one’s previous knowledge or experience, and finally shift attention to the newly discovered clue [15]. Researchers have found that this ability of “making connections” is in fact a key facet of experiencing serendipity [4]; it can differ considerably among individuals, resulting in a range of serendipity encounterers from super-encounterers to occasional encounterers [16]. Connections can be made between different pieces of information, people and ideas [3]; therefore, supporting or “triggering” connection-making in order to create more opportunities to experience serendipity has long been considered an important design clue by information researchers [17, 18].

Based on the issues discussed above, we raise our research question: is it possible to incorporate the theoretical findings on serendipity from information research, especially the neglected aspect of “insight” or “making connections”, into recommender system design?

Following our research question, we propose a collaborative-filtering based algorithm that takes the theoretical findings on serendipity from information research into account. Based on the finding from information research that serendipity is often encountered in a relaxed and leisurely personal state [1, 3], we applied the algorithm in a game-based application and conducted an empirical experiment.

3 Proposed Algorithm

There are two major concerns in providing serendipitous encountering in recommender system design. The first concern is how to balance “unexpectedness” and “usefulness”. As pointed out by [14], there should be “a most preferred distance” between the two values, since a high level of unexpectedness may cause the user’s dissatisfaction with the recommended information, while users may lose interest in information with low unexpectedness. The second concern is how to incorporate “insight” into system design to stimulate the process of “making connections”.

Both concerns are addressed from the perspective of “relevance”, through two hypotheses:

  • Hypothesis 1: Information that is highly relevant to a user’s personal profile is also of high potential value to that user;

  • Hypothesis 2: A user will find unexpected the information that is relevant to his/her profile but has not previously been known to him/her.

Consider a target user A, to whom the recommended information will be provided, a user B who is highly relevant to user A, and a user C who is highly relevant to user B but is not known by user A. User A may experience serendipity when provided with the information of user C, which is unexpected to him/her, and with the relationship between user B and user C, which may further appear interesting or useful to user A. The following part of this section illustrates a detailed implementation of the algorithm.
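To make the profile representation and the A–B–C chain concrete before the formal steps, the following minimal sketch (an assumed data model for illustration, not the authors’ implementation) encodes a user profile as weighted categories, each holding weighted attributes:

```python
from dataclasses import dataclass, field

# Illustrative (assumed) data model: a profile maps each category name to a pair
# (category weight, {attribute name -> attribute weight}).
@dataclass
class UserProfile:
    name: str
    categories: dict = field(default_factory=dict)

# Target user A; author_B is a high-weight attribute in A's profile (user B),
# and author_C is a high-weight attribute in B's profile (user C), unknown to A.
user_a = UserProfile("A", {"literature": (0.5, {"author_B": 0.6, "author_X": 0.4})})
user_b = UserProfile("author_B", {"literature": (0.4, {"author_C": 0.5, "author_Y": 0.5})})
# Recommending content from C's profile to A, together with the B-C relationship,
# is what the algorithm treats as a potential trigger of serendipity.
```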

1. Target user

Consider a target user profile \( U_{1} \) with a category set C = {\( C_{1} \), \( C_{2} \), \( C_{3} \), …, \( C_{i} \), …, \( C_{n} \)}, where \( C_{i} \) represents the i-th category of the user profile. All the categories are arranged by the value of their weights in the user profile. The weight can either be given by the dataset or calculated through clustering analysis [19]. To simplify the introduction of the proposed algorithm, we assume here that the weight of each \( C_{i} \) is given by the dataset from the very beginning. The categories are indexed in descending order of weight, so that the weight of \( C_{i} \) is no smaller than that of \( C_{j} \) for i < j:

$$ w_{C} = \left\{ {w_{C_{1}} ,w_{C_{2}} , \ldots ,w_{C_{i}} , \ldots ,w_{C_{j}} , \ldots ,w_{C_{n}} \left| {w_{C_{i}} \ge w_{C_{j}} ,\; i < j} \right.} \right\} $$
(1)

For each category \( C_{i} \), consider \( C_{i} \) = {\( a_{1} \), \( a_{2} \), \( a_{3} \), …, \( a_{i} \), …, \( a_{n} \)}, where \( a_{i} \) is the attribute corresponding to the vector \( C_{i} \). In particular, each \( a_{i} \) represents a dimension according to which a new user profile may be produced (e.g. authors of literature; musicians). The values of each \( a_{i} \) are also arranged by their weights within each vector \( C_{i} \) and can be calculated through semantic analysis such as the tf*idf (term frequency times inverse document frequency) weighting [20]:

$$ w(t,d) = \frac{ tf_{t,d} \, \log\left( \frac{N}{df_{t}} \right) }{ \sqrt{ \sum\limits_{i} \left( tf_{t_{i},d} \, \log\left( \frac{N}{df_{t_{i}}} \right) \right)^{2} } } $$
(2)

where w(t,d) represents the weight of a term t in a document d; it is a function of the frequency of t in the document (\( tf_{t,d} \)), the number of documents that contain the term (\( df_{t} \)) and the number of documents in the collection (N). As a result, the weight for a category set \( C_{i} \) is determined by the weights of the attributes in the set:

$$ w_{C_{i}} = \left\{ {w_{a_{1}} ,w_{a_{2}} , \ldots ,w_{a_{i}} , \ldots ,w_{a_{j}} , \ldots ,w_{a_{n}} \left| {w_{a_{i}} \ge w_{a_{j}} ,\; i < j} \right.} \right\} $$
(3)
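As a concrete illustration of the cosine-normalised tf*idf weighting of Eq. (2), the sketch below computes attribute weights for one user profile; the toy term counts and document collection are assumptions made purely for the example:

```python
import math

def tfidf_weights(doc_term_counts, docs):
    """Cosine-normalised tf*idf weights for one document, cf. Eq. (2).
    doc_term_counts: {term: raw frequency in this document};
    docs: list of {term: frequency} dicts for the whole collection."""
    n_docs = len(docs)
    df = {t: sum(1 for d in docs if t in d) for t in doc_term_counts}
    raw = {t: tf * math.log(n_docs / df[t]) for t, tf in doc_term_counts.items() if df[t]}
    norm = math.sqrt(sum(w * w for w in raw.values())) or 1.0
    return {t: w / norm for t, w in raw.items()}

# Toy example: term frequencies of author names across three users' libraries.
collection = [{"a1": 3, "a2": 1}, {"a2": 2, "a3": 1}, {"a1": 1, "a3": 2}]
print(tfidf_weights(collection[0], collection))
```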
2. Screen the weight

As pre-defined, \( C_{1} \) has the largest weight in the set C and \( a_{1} \) has the largest weight in each set \( C_{i} \). Set a threshold τ to eliminate the categories with low weights from the user profile \( U_{1} \):

$$ w_{C} = \left\{ {w_{C_{1}} ,w_{C_{2}} , \ldots ,w_{C_{i}} \left| {w_{C_{i}} \ge \tau } \right.} \right\} $$
(4)

Similarly, set a threshold θ to eliminate the attributes with low weights from each set \( C_{i} \):

$$ w_{C_{i}} = \left\{ {w_{C_{i},a_{1}} ,w_{C_{i},a_{2}} ,w_{C_{i},a_{3}} , \ldots ,w_{C_{i},a_{i}} \left| {w_{C_{i},a_{i}} \ge \theta } \right.} \right\} $$
(5)
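The two screening steps of Eqs. (4) and (5) amount to simple threshold filters over the category and attribute weights; a minimal sketch (thresholds and weights are illustrative assumptions) is given below:

```python
def screen_profile(category_weights, attribute_weights, tau, theta):
    """Drop categories below threshold tau (Eq. (4)) and, within the kept
    categories, drop attributes below threshold theta (Eq. (5))."""
    kept_categories = {c: w for c, w in category_weights.items() if w >= tau}
    kept_attributes = {
        c: {a: w for a, w in attribute_weights[c].items() if w >= theta}
        for c in kept_categories
    }
    return kept_categories, kept_attributes

cats = {"A": 0.5, "B": 0.3, "C": 0.2}
attrs = {"A": {"a1": 0.6, "a2": 0.3, "a3": 0.1},
         "B": {"b1": 0.7, "b2": 0.3},
         "C": {"c1": 1.0}}
print(screen_profile(cats, attrs, tau=0.25, theta=0.2))
```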
3. Generate a new user profile

A new user profile \( U_{i+1} \) is produced according to each \( a_{i} \) in the set \( C_{i} \). Here, the generation of user profiles proceeds from the attribute with the largest weight \( w_{C_{i} ,a_{1} } \) to the one with the smallest weight \( w_{C_{i} ,a_{i} } \).

4. Iteration and End condition

Based on the weight arrangement in a user profile, it is intuitive that an attribute \( a_{i} \) with a large weight is also more likely to be already known to the current user. In other words, the probability for a current user \( U_{i} \) to make a connection with the next user profile \( U_{i+1} \) is proportional to the weight of the attribute in the current user profile:

$$ P(U_{i + 1} \left| {U_{i} } \right.) = \lambda w_{C_{i}} * w_{C_{i} ,a_{i}} $$
(6)

where λ is the proportionality coefficient relating the probability to the corresponding weight.

The probability that the target user \( U_{1} \) makes a connection to the i-th user can then be obtained by chaining, provided that each generated user is new with respect to the previously generated ones:

$$ P(U_{i} \left| {U_{1} } \right.) = P(U_{2} \left| {U_{1} } \right.) * P(U_{3} \left| {U_{2} } \right.) * \ldots * P(U_{i} \left| {U_{i - 1} } \right.) $$
(7)

The iteration to find the next user continues until either of the following two end conditions is met:

  • the generated user is no longer new with respect to all the previously generated users;

  • \( P(U_{i} \left| {U_{1} } \right.) \) reaches a threshold δ, where δ represents an appropriate probability threshold.

The reason for setting the threshold δ is to ensure the effectiveness of the iteration process. If \( P(U_{i} \left| {U_{1} } \right.) \) becomes too large, the recommended information may fail to bring the target user a sense of unexpectedness, as the recommendation has probably already been known to the user; however, if the value of \( P(U_{i} \left| {U_{1} } \right.) \) is too small, the recommended information may be too irrelevant to the target user and he/she may lose interest in it. Hence the setting of the threshold δ is a very important step in the iteration process, and it needs to be further identified through empirical studies in the future. Once the recommendation list is generated within the threshold δ, items can be recommended to the target user by selecting those with the highest values of \( P(U_{i} \left| {U_{1} } \right.) \).
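Putting steps 3 and 4 together, the chaining loop can be sketched as follows. This is a minimal illustration under the assumptions that a profile is stored as {category: (weight, {attribute: weight})} and that a helper build_profile(attribute) returns the next user’s (screened) profile; λ, δ and the step cap are assumed parameters, not values prescribed by the paper:

```python
def chain_users(target_profile, build_profile, lam=1.0, delta=0.05, max_steps=10):
    """Follow the highest-weight attribute of each profile to the next user,
    multiplying the step probabilities of Eq. (6) as in Eq. (7); stop when a
    generated user repeats or the chained probability reaches delta."""
    visited = {target_profile["name"]}
    chain, prob = [], 1.0
    profile = target_profile
    for _ in range(max_steps):
        # largest-weight category, then its largest-weight attribute (the next user)
        cat, (w_c, attrs) = max(profile["categories"].items(), key=lambda kv: kv[1][0])
        attr, w_a = max(attrs.items(), key=lambda kv: kv[1])
        prob *= lam * w_c * w_a                # Eq. (6), accumulated as in Eq. (7)
        if attr in visited:                    # end condition 1: user already generated
            break
        visited.add(attr)
        chain.append((attr, prob))
        if prob <= delta:                      # end condition 2: probability threshold
            break
        profile = build_profile(attr)          # step 3: generate the next user profile
    return chain
```

With the profiles of the worked example below and δ = 0.06, such a loop would follow Ann → a1 → d1 and stop with a chained probability of 0.06.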

5. Recommendation

When the iteration finishes, the content of the largest-weight category in the current candidate’s profile is provided to the target user, together with the relevant information about the previously searched users that led to the current candidate.

6. An example of the proposed algorithm

An example of the proposed algorithm is provided in Fig. 1. Consider Ann as the target user (\( U_{1} \)) with the literature categories {A, B, C} in her personal library, whose weights are {0.5, 0.3, 0.2} (Fig. 1a). The author names of the literature are set as the attributes of each category, and according to the tf*idf weight calculation there are three attributes {a1, a2, a3} in category A with the weights \( W'_{A} \) = {0.6, 0.3, 0.1}. Setting λ = 1 for each probability of the current user finding the next user profile, the probability for Ann to find a1’s profile (\( U_{2} \)) can be calculated according to Eq. (6):

$$ P(U_{2} \left| {U_{1} } \right.) = w_{A} * w_{{A,a_{1} }} = 0.5 * 0.6 = 0.3 $$
(8)
Fig. 1. An example of the proposed algorithm: (a) target user Ann’s personal library; (b) user a1’s personal library generated by Ann; (c) user d1’s personal library generated by a1

Fig. 2. Different stages of the designed sketch application: (a) memorised picture; (b) participant’s sketching; (c) retrieving; (d) sketching result and game score; (e) provided picture information

The profile of a1 is then produced as shown in Fig. 1b. Likewise, among the four authors in category D, author d1 (\( U_{3} \)) has the largest weight, and d1’s profile is then produced (Fig. 1c):

$$ P(U_{3} \left| {U_{2} } \right.) = w_{D} * w_{{D,d_{1} }} = 0.4 * 0.5 = 0.2 $$
(9)

According to Eq. (7), the probability for Ann (\( U_{1} \)) to find d1’s profile (\( U_{3} \)) is:

$$ P(U_{3} \left| {U_{1} } \right.) = P(U_{2} \left| {U_{1} } \right.) * P(U_{3} \left| {U_{2} } \right.) = 0.3 * 0.2 = 0.06 $$
(10)

With the threshold δ set to 0.06, the iteration of the algorithm stops, and the literature of category F in d1’s profile is recommended to Ann, together with the relevant information about d1 and a1. For example, the recommended information can be: “these papers (category F) are the ones most stored by d1, who has previously published papers (d1, d2, d3, d4) with a1”.
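As a quick numerical check of Eqs. (8)–(10), the example’s values (taken from Fig. 1, with λ = 1) multiply out as expected:

```python
# Weights taken from the Fig. 1 example, lambda = 1
p_u2_given_u1 = 0.5 * 0.6                       # Eq. (8): Ann -> a1
p_u3_given_u2 = 0.4 * 0.5                       # Eq. (9): a1 -> d1
p_u3_given_u1 = p_u2_given_u1 * p_u3_given_u2   # Eq. (10): Ann -> d1
print(p_u2_given_u1, p_u3_given_u2, p_u3_given_u1)  # approx. 0.3, 0.2, 0.06
```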

7. Description of the Proposed Algorithm

The proposed algorithm is collaborative-filtering based; hence it is more appropriate for datasets whose content is generated by different users, from which the next user’s profile can more easily be produced for a current user.

The proposed algorithm relates to serendipity in the following three aspects:

  • Unexpectedness: by setting the value of the probability. Within the identified threshold δ, the unexpectedness of the information to a target user is inversely related to the magnitude of the probability: the smaller the probability for a target user to find another user, the more unexpected the provided information of the current candidate will be.

  • Insight: by providing the information of the searched clues, which demonstrates the relationship between the provided user (the recommendation source) and the target user. As mentioned above, the ability to connect a new clue with previous knowledge/experience is a key element in the occurrence of serendipity, and designers therefore need to provide design clues that can draw a user’s attention to connecting the provided information with his/her personal profile. In the example of Fig. 1, such insight is provided by showing the chain of relationships that links d1 back to the target user, namely that d1 has previously published papers with a1.

  • Value: by generating the next user’s profile according to the weight arrangement of the attributes; those with larger weights are treated as priorities. This is because the larger the weight of an attribute, the more likely it is to satisfy the target user’s needs or concerns, and therefore the more potential value it brings to the user.

4 Empirical Study

A problem that the developed algorithm confronts is how to evaluate it successfully in a real-life environment. According to information research, studying serendipity in a controlled experiment tends to influence participants negatively [21, 22]; in addition, serendipity is such a subjective phenomenon that it is tightly tied to a participant’s own experience or knowledge [4, 15]. A hint for addressing this problem may come from Shute’s [23] stealth assessment theory, in which assessments or inferences of conceptions or models that are elusive to humans are embedded into new computer-based technologies such as games. At the centre of Shute’s theory is Evidence-Centred Design (ECD), where a player’s abilities and understandings, especially those that cannot be directly observed by researchers (e.g. critical thinking, problem solving), are reflected through tasks or situations embedded in the design, such as the interaction processes of the game. Serendipity is exactly such a phenomenon that cannot be observed directly by researchers; however, during game-playing, participants naturally produce sequences of actions while performing the designed tasks, which provides us with possible evidence for assessing the encountering of serendipity. In addition, there is also evidence from information research that serendipity is often experienced by participants who are in a relaxed and leisurely state [1, 3], and playing games can bring participants to such a relaxed state compared with other activities. Based on the above discussion, we employed the algorithm in a game-based application and conducted an empirical experiment to investigate whether our proposed algorithm could provide serendipitous encountering to researchers. The study is described in detail below.

4.1 Participants

28 PhD students (14 males and 14 females) from different disciplines were invited to the study. They were asked to play a drawing game on a mobile application developed by the research group.

4.2 Game-Based Application

The developed game is an Android-based drawing game, which involves the following stages:

  • Memorising and sketching

Each participant was given a picture at the very beginning for observation. The participant was then asked to sketch the colour features of the picture from memory. A timer was set during this stage: the maximum observation time was 30 s and the maximum sketching time for each participant was 120 s.

  • Retrieving

When a participant finished sketching, a group of 30 images was displayed so that the participant could check whether the picture he/she had drawn was contained among them. If the picture was in the group, he/she could click on it to pick it out; otherwise, the participant only needed to click the “Next” button.

  • Sketching result

The participant’s final sketching result was provided after retrieving. Winning the game meant the participant had successfully retrieved the drawn picture; he/she was then given a game score based on the observation time and sketching time. Otherwise, the participant was notified that he/she had failed the sketching.

  • Providing picture information

The last stage of the application provided participants with information related to the picture, regardless of whether or not the participant had made a successful sketch.

4.3 Embedded Algorithm and Comparison

  • Embedding the proposed algorithm into the developed application

The next step is to embed the proposed algorithm into the application. As all the participants are PhD researchers, the algorithm is designed based on three assumptions:

  • Assumption 1: For each PhD student, their supervisor’s information is a large-weight attribute in their personal profile.

  • Assumption 2: For each PhD student’s supervisor, the co-authors of their publications are large-weight attributes in the supervisor’s profile.

  • Assumption 3: For each co-author’s personal profile, their working institution is a large-weight attribute.

Therefore, each PhD student is supposed to be provided with the information of their supervisor’s co-author’s institution. Figure 3 shows the design of the study, including how the proposed algorithm is embedded into the game-based application and the sketch game process. Based on each participant’s information, we started the study by providing them with pictures showing the institution badge (Fig. 2a). Each participant was then asked to draw the picture within 120 s (Fig. 2b). After retrieving (Fig. 2c) and showing the result of the sketch (Fig. 2d), the serendipitous information related to the picture was provided to the participant (Fig. 2e). The given information related to the picture includes two levels: (1) the introduction of the institution; (2) the publications of both the participant’s supervisor and the co-author, as shown in Fig. 5a.
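Read together, the three assumptions fix the user chain in advance (participant → supervisor → co-author → institution). The sketch below is purely illustrative of how a recommendation record might be assembled under these assumptions; the field names are hypothetical and not part of the implemented system:

```python
def build_recommendation(participant):
    """Assemble the picture and the serendipitous information for one participant
    by following the fixed chain implied by Assumptions 1-3 (illustrative only)."""
    supervisor = participant["supervisor"]        # Assumption 1
    coauthor = supervisor["top_coauthor"]         # Assumption 2
    institution = coauthor["institution"]         # Assumption 3
    return {
        "picture": institution["badge"],          # shown at the memorising stage
        "info": (f"{institution['name']}: {coauthor['name']} has published "
                 f"together with your supervisor {supervisor['name']}."),
    }
```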

Fig. 3. Process of the study and the embedded proposed algorithm

Fig. 4. A comparison of the proposed algorithm

Fig. 5. Provided information: (a) designed algorithm; (b) information from the Nature website

  • A comparison of the proposed algorithm

As a comparison, each participant was also given pictures without the serendipitous information from our proposed algorithm (Fig. 4). Two cover pictures from the “Nature” website (www.nature.com) were selected for each participant, together with the description of the picture from the website (Fig. 5b). We consider such provision the conventional way of introducing the relationship between a picture and its information (“pic-and-info”). As a result, each participant drew two pictures with our proposed algorithm and two with the conventional “pic-and-info” approach.

4.4 Evaluation

The traditional measurement of serendipity in the area of recommender systems is often based upon the conventional perception of serendipity, where it is considered to have the two main characteristics of “unexpected” and “useful” [11, 14, 24]. However, information research on serendipity has found that an important characteristic of serendipity is the element of time: what is considered serendipitous at a certain time may change over time [3, 4]. Therefore, [17] argued that, apart from “unexpected” and “useful”, “interesting” and “relevant” should also be taken into consideration as new measurements of serendipity, because their studies have shown that users may keep or follow up information that is “relevant” or “interesting” to them, which may lead to serendipity at a later time. They consider such serendipity as “pseudo-serendipity”, which refers to “encounters experienced by users that have the potential of being serendipity in that users intended to do something in the future with those encounters” [17, 25].

In this paper, we argue that both “pseudo-serendipity” and “serendipity” can happen in recommendation systems. This is because, in some cases, whether or not the recommended information is “useful” or “beneficial” to the participant needs to be further identified, and such identification may well start from “interesting” or “relevant” [17].

Therefore, the evaluation of serendipity in our empirical study is also based on the four dimensions of “unexpected”, “interesting”, “relevant” and “beneficial”. After a participant finished sketching all the pictures, he/she was given a questionnaire covering the four dimensions, each rated on a Likert scale from one (“not at all”) to five (“extremely”). Participants rated the questionnaire based on their experience of the whole sketching process along the four dimensions.

In addition, a 15-minute post-interview was carried out right after each participant finished their sketching. The interview explored the participant’s subjective experience and the reasons behind their ratings on the four dimensions.

4.5 Results

1. Questionnaire

In total, 20 effective questionnaires were obtained from the 28 participants, as the other eight participants were too focused on the gameplay and failed to read the related information of the pictures. These questionnaires provided feedback on 40 pictures presented in the conventional “pic-and-info” way and another 40 pictures based on our designed algorithm.

Only marks of four or five are considered effective values on the corresponding dimension, as shown in Fig. 6. Across the four identified dimensions of unexpected, interesting, relevant and beneficial, it is evident that, compared with the conventional “pic-and-info” approach, our designed algorithm is more likely to result in participants’ serendipitous encountering.
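For transparency, the counting rule behind Fig. 6 (“effective” = a rating of four or five on a dimension) can be expressed as a short script; the questionnaire field names below are assumptions for illustration:

```python
from collections import Counter

DIMENSIONS = ("unexpected", "interesting", "relevant", "beneficial")

def effective_counts(ratings, dimensions=DIMENSIONS):
    """Count Likert ratings of 4 or 5 per dimension; `ratings` holds one
    {dimension: score} dict per evaluated picture."""
    counts = Counter()
    for rating in ratings:
        for dim in dimensions:
            if rating.get(dim, 0) >= 4:
                counts[dim] += 1
    return counts

# Example with hypothetical data (not the study's actual results)
print(effective_counts([{"unexpected": 5, "interesting": 4, "relevant": 3, "beneficial": 2},
                        {"unexpected": 4, "interesting": 2, "relevant": 4, "beneficial": 4}]))
```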

Fig. 6. Questionnaire result

2. Interview

During the interviews, most participants reported a sense of serendipity relating to the pictures designed with the serendipitous algorithm, from the following two perspectives:

  • All the participants reported that they had experienced “unexpectedness” because of the relationship between the picture and the provided information:

I’ve never thought the picture is related to my supervisor! I’ve just taken it as a drawing game…… The information in the end really surprised me and I really think this is a very good design to provide me with the information in such a context! (Participant 3)

In addition, 12 out of 20 participants reported another level of unexpectedness in the content of the information, as the provided information was previously unknown to them:

I never know that my supervisor had published such a paper with him (the co-author) before…… I’m interested about it and will check the details of the paper later. (Participant 10)

  • The result of the sketching game:

More than eight participants expressed the wish for an external link to the presented information (e.g. the published paper of …). One participant even asked us to send him the detailed information after the study.

By contrast, most participants reported less interest in the conventional “pic-and-info” approach, which also reflects the important role that “relatedness” plays in the design of the algorithm. As a result, the feedback from the participants demonstrates that our proposed algorithm can effectively support design strategies for serendipity.

5 Conclusion and Future Work

In this paper, we have presented a new serendipitous recommendation algorithm that incorporates the theory of serendipity from information research. In particular, our proposal extends the design of serendipitous recommendation by including two further vital aspects of serendipity, namely “insight” and “value”.

We also performed an empirical experiment with target users by employing the proposed algorithm in a game-based application. The result demonstrates that, compared with the conventional “pic-and-info” design, our algorithm effectively encouraged our participants to experience serendipitous encountering. However, the study is limited by the small number of participants, so our future work will aim to explore the algorithm on more datasets and to investigate appropriate values for the thresholds (e.g. \( \tau \), \( \theta \), \( \delta \)) set in the current algorithm. We will also compare our proposed algorithm with other existing algorithms so as to better evaluate and optimize it in different situations.