Scholars’ Perceptions of Relevance in Bibliography-Based People Recommender System

Collaboration and social networking are increasingly important for academics, yet identifying relevant collaborators requires considerable effort. While various networking services are optimized to find similarities between users, the scholarly motive of producing new knowledge calls for assistance in identifying people with complementary qualities. However, there is little empirical understanding of how academics perceive the relevance, complementarity, and diversity of individuals in their profession and how these concepts can be optimally embedded in social matching systems. This paper aims to support the development of diversity-enhancing people recommender systems by exploring senior researchers' perceptions of other scholars recommended at different levels on a similar–different continuum. To conduct the study, we built a recommender system based on topic modeling of scholars' publications in the DBLP computer science bibliography. A study of 18 senior researchers comprised a controlled experiment and semi-structured interviews focusing on their subjective perceptions of the relevance, similarity, and familiarity of the given recommendations, as well as the participants' readiness to interact with the recommended people. The study implies that the homophily bias (the behavioral tendency to select similar others) is strong despite the recognized need for complementarity. While the experiment indicated consistent and significant differences in perceived relevance between the most similar level and the other levels, the interview results imply that evaluating the relevance of people recommendations is complex and multifaceted. Despite the inherent bias in selection, the participants could identify highly interesting collaboration opportunities at all levels of similarity.


Introduction
In scholarly work, collaboration has become a normative form of knowledge production. Researchers across the social sciences broadly concur that collaboration is the best path to solving complex problems and achieving exceptional results (Frydlinger et al. 2013). Collaboration is promoted as a means of cultivating quality, improving resource utilization, and achieving high impact (Hsiehchen et al. 2015). In science and patenting, a substantial shift toward collective work has been found across scientific disciplines and business domains (Wuchty et al. 2007; Börner et al. 2010). In academic research, collaboration takes place at the dyadic level between individuals, among research teams, and within international consortia. However, identifying new suitable candidates for academic collaboration requires considerable investment in social networking, and disciplinary structures can prevent unexpected combinations of individuals.
Following this trend, supporting social networking and encouraging new social encounters have become central design goals in the HCI and CSCW communities. Prior research on so-called social matching (Terveen and McDonald 2005) has particularly looked into people recommender systems (Tsai and Brusilovsky 2016; Guy and Pizzato 2016) and opportunistic matching applications (Mayer et al. 2015a; Mayer et al. 2016) that aim to help users identify new relevant connections, some of them employing playful approaches and gamification (Paasovaara et al. 2016). There are also prototypes of people recommender systems that specifically aim to match scholars: for instance, expert finding systems (Vassileva et al. 2003; Beham et al. 2010), or event-based mobile applications like 'Find & Connect', developed by Chin et al. (2014) and tested at the UbiComp 2011 conference. Considering the rich publication data available in online repositories, prior research has also looked into bibliography analysis methods for recommender systems, e.g., DBLP-based systems for researchers (Zaiane et al. 2007).
However, the majority of professional matching systems tend to utilize a similarity-maximizing approach, recommending like-minded others with similar interests. In this regard, Yuan and Gay (2006) argue that homogeneity produces both positive and negative effects on interpersonal communication, community formation, and knowledge work: "homophily not only unifies, it also divides a network". On the one hand, collaboration within a group of people with shared interests can contribute to a safe and trustworthy work environment, enabling cohesive team spirit and ease of communication. On the other hand, research and cooperation with diverse individuals have been found essential in tasks that aim to create new knowledge (Mollica et al. 2003). Prior work emphasizes that an insightful dialogue between diverse actors can build social capital (Burt 2017) by increasing awareness of external knowledge groups and bridging polarized intellectual communities, thereby fostering knowledge sharing and idea creation (Argote and Ophir 2002). While research on diversifying item recommendations (Adamopoulos and Tuzhilin 2015; Castells et al. 2015) is gaining interest, few attempts have been made to match people based on diversity (Rajagopal et al. 2017).
Additionally, the evaluation of people recommender systems and matching applications is geared toward assessing algorithm effectiveness, with little focus on user perceptions. Although there is well-established research on user-centered evaluation of content recommender systems (Knijnenburg et al. 2012; Pu et al. 2011), choosing potential collaborators differs significantly from choosing content items and therefore requires contextually operationalized evaluation metrics. Considering the diverse needs of scholars, it is essential to pay attention to subjective perceptions of the recommended people and to carefully conceptualize factors such as perceived relevance and willingness to follow up on the recommendations.
To enable the gathering of such data, we developed a simple DBLP-based people recommender system that provides the user with recommendations of other scholars from three different levels of similarity in their publication history: low, moderate, and high. With a user study that combines a controlled experiment and semi-structured interviewing, we address the following research questions: (RQ1) What level of measured similarity of publication history is preferred in recommendations of potential collaborators? (RQ2) What specific needs and expectations do scholars have regarding seeking professional collaboration?
The findings reveal an intriguing mismatch between scholars' intuitive behavior and deliberate intentions regarding potential academic collaboration. While the quantitative results demonstrate participants' general preference for the most similar recommended people, the interview data reveal a variety of scholars' needs for connecting with cross-disciplinary and diverse people. Thus, the nature of the collaboration task might influence the perceived relevance of potential candidates, for example, regarding the complementarity of professional roles, skills, and expertise.
The contribution of this work is two-fold: (i) providing empirical findings on subjective perceptions of people recommendation relevance in the context of potential academic partnering, and (ii) presenting the qualitative account of academics' needs in collaboration and factors that might affect their decision in choosing partners. Furthermore, as a methodological contribution, we operationalize measures for subjective opinions on people recommendations in the context of professional academic collaboration.

Related work
While various disciplines have studied scientific collaboration in different ways, there is general consensus that collaboration is imperative for effective knowledge production. Bozeman et al. (2013) note that research collaboration is often reduced to the notion of co-authorship, and they criticize the assumption that cooperation necessarily results in a knowledge product (e.g., a scientific paper). In this article, we approach scholarly partnering practices that extend beyond co-authorship and do not require an explicit, valuable outcome. Bozeman and Corley (2004) identify key motives for scientific collaboration and propose that the selection of collaborators can be driven by: (i) work ethic attribution and schedule compliance; (ii) shared nationality; (iii) the need to mentor junior researchers; (iv) administrative request or high reputation; (v) prior collaboration experience, its quality, and personality chemistry; and (vi) complementarity of skills. Considering this breadth, we adopt the broad definition by Bozeman et al. (2013): "collaboration is a social process whereby human beings pool their human capital for the objective of producing knowledge." To position our contribution and the research gap it addresses, we cover the following topics. First, we discuss research on supporting social interaction and collaboration in the general context of conferences and introduce work on bibliography-based recommender systems. Then, we discuss the concepts of similarity and diversity. Finally, we provide an overview of existing user-centered evaluation metrics for recommender systems.

Computational support for matching scholars
Supporting expert finding is a crucial CSCW design goal for facilitating collaborative knowledge creation and dissemination (Ackerman and McDonald 1996). Over the last two decades, systems for supporting the conference experience have expanded from increasing people's awareness of necessary information at the venue (e.g., schedule and contents) to facilitating social encounters, for example, by arranging meetings (Nishibe et al. 1998), and to generally enhancing attendees' interaction with the environment and other people (Dey et al. 1999). One fairly common approach relies on location- and proximity-based services for finding relevant connections (Kawakita et al. 2004; Cox et al. 2003). For instance, 'Find & Connect', a social networking mobile application developed by Chin et al. (2014) and tested during the UbiComp 2011 conference, aims to provide users with social recommendations based on physical proximity and similarity of interests. The results reveal that users preferred acting on recommendations of familiar people or friends-of-friends, as well as of those with similar research interests.
There are also generic, platform-like services designed to help attendees meet new people with shared interests (Zenk et al. 2014). An example of such a tool is 'Confer' (Zhang et al. 2016), which has been tested and deployed at several HCI conferences. 'Conference Navigator' is another example that has gradually acquired new features and functions (Farzan and Brusilovsky 2007; Wongchokprasitti et al. 2010; Parra et al. 2012). For instance, its most recent version (Brusilovsky et al. 2017), 'Conference Navigator 3', is a community-based recommender system that uses content-based and tag-based analysis methods to provide the user with personalized suggestions about the people and contents of a conference. The user can explore their research community through an interactive social network visualization and connect with similar experts before or during an event.
Another recent work proposes the so-called 'Adaptive Conference Companion' (Arens-Volland and Naudet 2016), a mobile application that aims to deliver personalized guidance for attendees of academic events. In addition to utilizing conference data and explicit user input, the authors enhance the profiling and matchmaking mechanism with data extracted from bibliographic databases (DBLP, Google Scholar) and social media channels (LinkedIn, ResearchGate, and MyScienceWork). They applied a term frequency-inverse document frequency (TF-IDF) algorithm (Beel et al. 2016) to recommend the most similar people and sessions within the scope of users' interests. The authors speculate that most participants in the experiment were well prepared for the conference and already had schedules that matched the system's predictions of relevant content. Unfortunately, the authors did not discuss the effectiveness of the system from the social networking perspective, i.e., how people reacted to and perceived the relevance of people recommendations.
Another vein of research concerns recommending academic collaborations by utilizing bibliography data and social network analysis, thus suggesting candidates with similar research interests. For instance, Kong et al. (2016) used a topic clustering model to retrieve academic domains, calculate authors' features, and analyze academic collaboration networks. Another group of researchers (Li et al. 2014) explored co-authorship networks to identify relevant collaborators with whom an academic tie already exists. They observed that scholars' relationships are more complicated in real-world settings and suggested that future work should go beyond existing co-authorship networks and consider matching people without established connections. Most recently, Hoang et al. (2017) proposed a new approach to calculating similarity with deep learning and experimented on the DBLP and WikiCFP databases. Sie et al. (2012) designed a system for recommending future co-authors that uses co-authorship network and topic similarity aspects in its matching mechanism. They tested researchers' preferences regarding existing co-authors vs. new potential candidates in a lightweight user study, asking participants to rate each recommendation on a scale from one to ten. The findings revealed that participants preferred existing connections with whom collaboration had already been established.
To summarize, prior work displays a diversity of research and development on people recommender services for scholars, with a particular focus on analyzing human-generated content and publication history. The literature on algorithmic approaches indicates that bibliographical datasets can serve as a valid data source for identifying and recommending social connections. These algorithmic choices encouraged us to approach the area with specific topic modeling methods (TF-IDF) and to analyze the cosine distance (Li and Han 2013) between authors. Although there is apparent interest in creating services for academic partnering, the primary contribution of preceding research lies in the design of new matching mechanisms and algorithms. The evaluation of such systems, therefore, focuses on the quality criterion of prediction accuracy, and little attention is paid to subjective perceptions of recommendation usefulness and users' intention to follow up on the recommendations. In this article, we experiment along content-based similarity-difference dimensions and specifically focus on the human-centered, subjective evaluation of perceived relevance and related variables.

Concepts of similarity and diversity
Our approach to diversity-enhancing people recommender systems is founded on the relatively strong consensus that fruitful collaboration and high innovation capability result from complementary viewpoints among a diverse group of actors (Mitchell and Nicholas 2006). Rodan and Galunic (2004) suggest that heterogeneous knowledge is highly important both for overall managerial performance and, in particular, for innovation performance. However, the actual value of diversity and exactly how it should manifest remain unclear. Despite the extensive literature, the role of both similarity and diversity (as opposite ends of a continuum), particularly in decisions about choosing academic collaborators, requires more research, as the following discussion shows.
The related work discussed in the previous subsection demonstrates a tendency to use similarity-maximizing approaches for recommending content and connections, thus amplifying the effects of homophily bias. The concept of homophily, the phenomenon of individuals' natural preference to interact with like-minded people who share socio-cultural traits, has attracted researchers' attention primarily in social psychology (Lazarsfeld and Merton 1954; Marsden 1987; Moody 2001). In CSCW and HCI research, homophily has been addressed, for instance, as a predictive and influential factor of online behavior in content preferences (Chang et al. 2014) and audience attraction on social media (Sharma and Cosley 2016). Another vein of research studies diversity in terms of human, relational, and intellectual capital within global organizations to design features that support online communities in collaborative tasks (Muller et al. 2012). Researchers and developers seem to have adopted the similarity-maximizing approach from item recommender systems, using similarity metrics as a proxy for relevance also when matching peers within organizations (Guy et al. 2010) and scholars in the context of academic collaboration (Heck 2013).
Although homophily might strengthen existing communities, it does not encourage the creation of new ties to more distant parts of the global social network. Some researchers propose that such mechanisms directly lead to the formation of echo chambers that are detrimental to information flow, innovation, and creativity (Jasny et al. 2015; Bessi 2016). Echo chambers have been criticized particularly with respect to social media services that divide the user community into camps of differing opinions and thus increase polarization in society (Li et al. 2013; Lee et al. 2014).
At the same time, organizational studies have found that diversity can also negatively influence collaborative activities such as information exchange and decision-making (Graves and Elsass 2005; Hobman et al. 2004). An extensive review (Mannix and Neale 2005) concludes that social (i.e., surface-level) differences, such as race and gender, tend to have adverse effects on the ability of groups to function effectively, whereas deeper cognitive dissimilarities, such as differences in expertise or personality, are more often positively related to team performance. In other words, diversity is strongly linked to the concept of identity, which can make introducing diversity challenging in established work cultures.
Following from the above, CSCW research has investigated whether the adverse effects of dissimilarities in teams can be overcome to foster creativity and productivity. For instance, Dong et al. (2016) found that commitment to a common cause, such as shared work goals, brings people together despite cultural differences. Similarly, but from a broader perspective, Ye and Robert Jr. (2017) revealed that collectivism (as opposed to individualism) makes people more tolerant of differences in personal values, working styles, skills, and general abilities, thus supporting individual creativity and work satisfaction. In addition, Rajagopal et al. (2017) investigated how to match peers with dissimilar opinions. Their findings demonstrated that matching people with different interpretations of shared interests is more effective in producing positive experiences of breakdown.
Overall, the literature on diversity and homophily contains interesting contradictions, which calls for further empirical research on various forms of similarity or diversity in different types of collaboration. To this end, we seek to uncover how the two concepts are interlinked particularly in the assessment of the relevance of potential scholarly collaborators.

User-centered evaluation criteria for recommender systems
Historically, research on recommender systems has primarily focused on the design of algorithms, under the assumption that better algorithms result in a better user experience. Pu et al. (2012) challenge this premise by providing conceptual observations and guidelines on evaluation criteria for recommender systems. They explicitly emphasize the importance of users' perceptions of system qualities. We summarize existing conceptualizations of recommendation quality as follows: (i) perceived accuracy (Pu et al. 2011): how well recommendations match users' interests determines trust in the system; (ii) familiarity (Sinha and Swearingen 2002): the presence of familiar items increases trust in the system; (iii) novelty (Castells et al. 2015): the unexpectedness of received recommendations can affect the perceived usefulness of the system; (iv) diversity (Nguyen et al. 2014): receiving diverse items mitigates the filter bubble effect, thus increasing users' satisfaction and, consequently, the perceived accuracy of the system. Knijnenburg et al. (2012) also provide a framework for user-centered evaluation of recommender systems that extends the system accuracy metric with other relevant measures. For instance, the authors observe correlations between concepts such as perceived recommendation quality (relevance), choice satisfaction, variety, diversity, effectiveness, and accuracy, along with personal characteristics of the user (e.g., trust in ICT).
To sum up, these types of evaluation criteria focus on subjective user perceptions in the evaluation of objective aspects of the system. In this article, we do not question the effectiveness of the designed system and its elements; instead, we focus on investigating scholars' attitudes towards the recommended people as potential collaborators. The evaluation criteria proposed in prior research have proven effective in assessing item recommender systems. In contrast, as objects of recommendation, human individuals have far more diverse features that influence the evaluation. When assisting people in choosing potential collaborators, the subjective perception of relevance might have various facets and be determined by the need or task for partnering; therefore, the metrics should be operationalized accordingly. In this article, we approach relevance criteria with a temporal aspect (i.e., from the perspective of past vs. current research interests), as well as from the perspective of potential collaborative activities with the recommended people.

System design
In this section, we first explicate the choice of the data source used for the design of the recommendation algorithm. Next, we outline the data cleaning process and analysis and, finally, describe the user interface developed for the experiment.

Data source, data cleaning, and analysis
We designed a content-based people recommender system using DBLP, an open bibliographic database of publication records from the majority of computer science conferences and journals. The DBLP dataset is a large plain ASCII XML file. The metadata for each record contains more details than necessary for the study and therefore requires multiple cleaning procedures.
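A minimal sketch of the parsing and title-cleaning steps, written with Python's standard `xml.sax` module, is shown here. It is not the authors' code: reading venues from `<booktitle>` or `<journal>`, the regular-expression tokenizer, and the small `STOP_WORDS` set (standing in for NLTK's English stop-word list) are our assumptions, and the sample record is invented. The real DBLP dump also declares character entities in an accompanying DTD, which this sketch does not resolve.

```python
import re
import xml.sax

# Record-level tags listed in the text.
RECORD_TAGS = {"article", "inproceedings", "proceedings", "book",
               "incollection", "phdthesis", "mastersthesis", "www"}
# Assumption: venues are read from <booktitle> (conferences) or <journal>.
FIELD_TAGS = {"author", "title", "year", "booktitle", "journal"}
# Illustrative stop list; the study used NLTK's English stop words instead.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


class DblpHandler(xml.sax.ContentHandler):
    """Keep only titles, authors, years and venues of record-level elements."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._record = None   # record currently being read
        self._field = None    # field tag currently being read
        self._buffer = []     # text chunks of the current field

    def startElement(self, name, attrs):
        if name in RECORD_TAGS:
            self._record = {"title": "", "authors": [], "year": None, "venue": None}
        elif self._record is not None and self._field is None and name in FIELD_TAGS:
            self._field, self._buffer = name, []

    def characters(self, content):
        # Text of nested markup (e.g. <i> inside <title>) is kept as well.
        if self._field is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if self._record is None:
            return
        if name == self._field:
            text = "".join(self._buffer).strip()
            if name == "author":
                self._record["authors"].append(text)
            elif name == "title":
                self._record["title"] = text
            elif name == "year":
                self._record["year"] = int(text)
            else:
                self._record["venue"] = text
            self._field = None
        elif name in RECORD_TAGS:
            self.records.append(self._record)
            self._record = None


def clean_title(title):
    """Lowercase, then drop stop words and purely numeric tokens."""
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    return [t for t in tokens if t not in STOP_WORDS and not t.isdigit()]


sample = b"""<dblp>
<inproceedings key="conf/demo/1">
<author>Alice Smith</author><author>Bob Jones</author>
<title>Social Matching for 2017 Conferences.</title>
<year>2017</year><booktitle>CSCW</booktitle>
</inproceedings>
</dblp>"""
handler = DblpHandler()
xml.sax.parseString(sample, handler)
record = handler.records[0]
print(record["authors"], record["venue"], clean_title(record["title"]))
# ['Alice Smith', 'Bob Jones'] CSCW ['social', 'matching', 'conferences']
```

Streaming with SAX keeps memory use flat, which matters for a file of this size; a DOM parser would have to load the whole tree first.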
In the first step, the XML file was parsed using the 'xml.sax' package, targeting the following tags: article, inproceedings, proceedings, book, incollection, phdthesis, mastersthesis, and www. Then, from the parsed XML file, we extracted 5,847,090 records consisting only of titles, co-authors, publication years, and venues. Next, we cleaned the publication titles in the resulting subset in three steps: (i) converting letters to lowercase, (ii) removing English stop words with the 'nltk.corpus.stopwords' function, and (iii) removing numeric strings. The people recommender system runs on a subset of the 5,847,090 parsed records and depends on the publication venues of a given user (a participant of the study). The detailed data analysis is shown in Figure 1. For each participant, the top venues of their publications are given as input (see step 1 in Figure 1). A subset of records restricted to those publication venues is then extracted from the parsed DBLP dataset (step 2). All titles in the subset records are cleaned as described above and aggregated to form a corpus profile for each author who has ever published in those venues (step 3). Authors with fewer than three publications in the most recent five years are filtered out to improve the quality of recommendations. Next, we tokenize the corpus and build the vocabulary using the 'CountVectorizer()' function from scikit-learn, excluding words that appear only once or occur in more than 95% of the author corpus profiles. After tokenization, TF-IDF is applied to the profile model to form a feature vector for each author (step 4). Next, we compute the cosine distances between the feature vector of the given participant and those of the other authors in the subset records (step 5). Because it is more intuitive for closer authors to be indicated by a smaller number, we use cosine distance (rather than cosine similarity) to represent the similarity between two authors. Accordingly, the closest, or most similar, author to a participant has the smallest cosine distance.
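The profile-building, TF-IDF, and cosine-distance steps can be approximated in plain Python as follows. This is an illustrative sketch under stated assumptions rather than the study's implementation, which used scikit-learn: the smoothed IDF ln((1 + n)/(1 + df)) + 1 with L2 normalization mirrors scikit-learn's `TfidfVectorizer` defaults, the vocabulary filter is read as `min_df=2` and `max_df=0.95`, records are assumed to already carry cleaned `tokens`, and `make_records` and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_profiles(records, venues, current_year, window=5, min_recent=3):
    """Concatenate cleaned title tokens per author for records in `venues`,
    dropping authors with fewer than `min_recent` papers in the last `window` years."""
    profiles, recent = defaultdict(list), Counter()
    for rec in records:
        if rec["venue"] not in venues:
            continue
        for author in rec["authors"]:
            profiles[author].extend(rec["tokens"])
            if rec["year"] > current_year - window:
                recent[author] += 1
    return {a: toks for a, toks in profiles.items() if recent[a] >= min_recent}


def tfidf_vectors(profiles, min_df=2, max_df=0.95):
    """L2-normalised TF-IDF vectors (as dicts) with a smoothed IDF.

    Terms found in fewer than `min_df` profiles, or in more than a `max_df`
    share of profiles, are left out of the vocabulary."""
    n = len(profiles)
    df = Counter()
    for toks in profiles.values():
        df.update(set(toks))
    vocab = {t for t, c in df.items() if c >= min_df and c / n <= max_df}
    vectors = {}
    for author, toks in profiles.items():
        tf = Counter(t for t in toks if t in vocab)
        weights = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors[author] = {t: w / norm for t, w in weights.items()} if norm else {}
    return vectors


def cosine_distance(u, v):
    """1 - cosine similarity of two L2-normalised sparse vectors."""
    if not u or not v:
        return 1.0
    return 1.0 - sum(w * v.get(t, 0.0) for t, w in u.items())


def make_records(author, tokens, year, k=3):
    # Hypothetical helper producing k identical single-author records.
    return [{"authors": [author], "tokens": tokens, "year": year, "venue": "CHI"}
            for _ in range(k)]


data = (make_records("P", ["gaze", "interaction"], 2018)
        + make_records("A", ["gaze", "interaction", "eye"], 2018)
        + make_records("B", ["wireless", "network", "interaction"], 2018)
        + make_records("C", ["wireless", "network", "eye"], 2018)
        + make_records("E", ["gaze"], 2005))  # no recent papers: filtered out
profiles = build_profiles(data, venues={"CHI"}, current_year=2019)
vectors = tfidf_vectors(profiles)
distances = {a: cosine_distance(vectors["P"], vectors[a]) for a in vectors if a != "P"}
print(sorted(distances, key=distances.get))  # ['A', 'B', 'C']
```

Because the vectors are L2-normalized, cosine similarity reduces to a dot product and the distance is one minus that value; an author who shares no vocabulary term with the participant, such as `C` here, sits at the maximum distance of 1.0.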
To examine the participants' preferences along the similarity-difference continuum during the user study, we delivered recommendations in three groups of controlled distances: low (high similarity), medium (moderate similarity), and high (low similarity). To separate recommendations into these groups automatically, the cosine distances between a participant and the other authors are first sorted, and then the Otsu filter (Otsu 1979) is employed to detect the boundaries between the groups (step 6). The Otsu filter calculates the optimal threshold separating two groups such that their weighted intra-class variance is minimal. As the distribution of the cosine distances follows a power law, we apply the Otsu filter twice. In the first round, we apply it to all sorted cosine distances to detect the boundary between the low distance group and the rest. In the second round, we apply it to the remaining distances, excluding the low distance group, to divide the medium and high distance groups. Finally, from each distance group we pick three authors who have not co-authored with the given participant (step 7) and who have published in the same venues, delivering nine recommendations in total (step 8).
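The two-pass grouping can be sketched as below. Otsu's original method thresholds a histogram; as a simplifying assumption, this sketch searches every split point of the sorted distances directly and minimizes the size-weighted within-class variance, which is the same criterion. The text does not say how the three authors are chosen within each group, so the hypothetical `pick` helper simply takes the closest eligible ones; the distances are invented.

```python
from statistics import pvariance


def otsu_split(values):
    """Index k splitting sorted `values` into values[:k] and values[k:] so that
    the size-weighted within-class variance is minimal (Otsu's criterion)."""
    xs = sorted(values)
    n = len(xs)
    best_k, best_score = 1, float("inf")
    for k in range(1, n):
        score = k * pvariance(xs[:k]) + (n - k) * pvariance(xs[k:])
        if score < best_score:
            best_k, best_score = k, score
    return best_k


def three_groups(distances):
    """Split authors into low/medium/high distance groups with two Otsu passes."""
    ranked = sorted(distances, key=distances.get)
    values = [distances[a] for a in ranked]
    if len(values) < 3:
        raise ValueError("need at least three candidate authors")
    k1 = otsu_split(values)            # pass 1: low group vs. the rest
    k2 = k1 + otsu_split(values[k1:])  # pass 2: medium vs. high within the rest
    return ranked[:k1], ranked[k1:k2], ranked[k2:]


def pick(group, coauthors, k=3):
    """Hypothetical rule: the k closest authors who never co-authored with the participant."""
    return [a for a in group if a not in coauthors][:k]


distances = {"l1": 0.10, "l2": 0.12, "l3": 0.15, "l4": 0.18,
             "m1": 0.55, "m2": 0.58, "m3": 0.60, "m4": 0.62,
             "h1": 0.90, "h2": 0.92, "h3": 0.95, "h4": 0.97}
low, medium, high = three_groups(distances)
picks = [pick(g, coauthors={"l2"}) for g in (low, medium, high)]
print(picks)
```

The exhaustive search is quadratic in the number of candidates; keeping running sums of values and squared values would make each pass linear if the candidate pool is large.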

User interface
A single-page web application was deployed on the Firebase development platform. Figure 2 illustrates the user interface (UI) view with personalized recommendations in the form of a carousel-based list. The UI displays all author information that can be extracted from the DBLP dataset: full name, research topics, the list of co-authors, and recent publications. Each section of the UI is expandable if additional content is available. The publication list includes only works from conferences in which both the recommended person and the participant have published. Accordingly, the list of co-authors is taken from those publications only. The topics were generated through bigram analysis of the corpus profile of each recommended person. We first generate bigram word pairs using the NLTK 'bigram' function on the whole corpus. Next, we use 'nltk.ConditionalFreqDist' to count how often each word follows a given word in the corpus. For example, among the bigram word pairs for the word 'social', the word 'media' may appear 20 times, while the word 'compute' may appear zero times. Then, to generate an author's topics, bigram word pairs are created from their corpus profile. After that, we check the conditional frequency of the second word given the first word in each pair. If the frequency is ten or higher, we pick this bigram word pair as one of the author's topics.
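The bigram topic extraction can be sketched with the standard library, where a `defaultdict` of `Counter` objects plays the role of an `nltk.ConditionalFreqDist` built from `nltk.bigrams`. Forming bigrams within each cleaned title (rather than across a concatenated profile) is our assumption, and the toy corpus is hypothetical; the threshold of ten follows the text.

```python
from collections import Counter, defaultdict


def pairs(titles):
    """Adjacent word pairs within each cleaned title (assumed not to cross titles)."""
    for tokens in titles:
        yield from zip(tokens, tokens[1:])


def conditional_freq(all_titles):
    """cfd[w1][w2] = how often w2 directly follows w1 in the whole corpus."""
    cfd = defaultdict(Counter)
    for w1, w2 in pairs(all_titles):
        cfd[w1][w2] += 1
    return cfd


def author_topics(author_titles, cfd, min_count=10):
    """Bigrams from an author's titles whose corpus-level count is >= min_count."""
    topics = []
    for w1, w2 in pairs(author_titles):
        topic = f"{w1} {w2}"
        if cfd[w1][w2] >= min_count and topic not in topics:
            topics.append(topic)
    return topics


author_titles = [["social", "media", "study"], ["social", "computing"]]
corpus = [["social", "media", "analysis"]] * 10 + author_titles
cfd = conditional_freq(corpus)
print(author_topics(author_titles, cfd))  # ['social media']
```

Only 'social media' passes: it occurs 11 times in the corpus, whereas 'media study' and 'social computing' each occur once.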

User study
We designed a user study combining a controlled experiment and a semi-structured interview. By providing participants with real recommendations, we aimed to help them form opinions on the experimental variables. Following the homophily bias, we hypothesized that the lower the cosine distance between the participant and the recommended person (i.e., the more similar their publishing histories), the more relevant and similar the recommendation would be perceived to be.

Experimental design
In the experiment, the computed cosine distance (content-based distance) is the independent variable, represented as three groups of fellow academics: those with low, medium, or high distances. Thus, recommendations of other researchers with high similarity (low distance), moderate similarity (medium distance), and low similarity (high distance) were presented to the participants, three from each group. The participants were not informed of the three groups to avoid biased evaluation, and the presentation order of all nine recommendations was randomized. The evaluation asked about the participants' perceptions of the following dependent variables: relevance, similarity, familiarity, and willingness to interact.

Recruitment and participants
For the experiment, we recruited 18 English-speaking senior researchers who work at two university campuses in Tampere, Finland. Assuming that senior researchers often have a greater need to find collaborators, we limited our scope to postdoctoral researchers, professors, and other senior academic positions. For recruitment, we used various e-mail lists to reach relevant faculties, departments, and research groups. In addition to offering participants a movie ticket, we also marketed the recommendations of potential collaborators as an incentive to take part in the study. Overall, we had 13 male and five female participants, all based at one of the two universities in the same city. Fourteen of them are Finnish, two Russian, one British, and one Romanian. Their ages ranged from 32 to 66 (Median: 42, Mean: 45). Seven of the respondents reported their current occupation as Senior Researcher, six as Postdoctoral Researcher, four as Full Professor, and one as Associate Professor. The most frequent research interests of the participants included human-computer interaction (10), gaze technologies and interactions (7), wireless technologies (6), interaction design and techniques (6), interfaces and information systems (5), usability/user experience and user-centered design (5), telecommunications and networking (5), virtual reality (3), and wellness/health technologies (3). Their academic experience ranged from 10 to 46 years (Median: 19, Mean: 20.3). Figure 3 illustrates the participants' backgrounds and attitudes concerning technology orientation, social openness, activity in networking, and breadth of research interests. Together with the other background information, the figure suggests that the respondents represent what we would consider typical computer science scholars: technically oriented and curious about research, while displaying variety in their networking practices and interests.

Procedure and data gathering
The data gathering comprised three parts: (i) screening of suitable participants based on their professional position and publishing history before the experiment session, the data from which were used to prepare personalized recommendations for each participant; (ii) the experiment session, in which the participant signed a consent form as part of an online survey that also included a background questionnaire and numerical evaluations of each recommendation; and (iii) a semi-structured interview following the experiment to gather qualitative data about the participants' choices and needs for collaboration. The whole study session lasted from 40 minutes to 1.5 hours, depending on how long the participant took to become familiar with and assess the recommendations and how opinionated and expressive they were in the interview. All sessions were audio recorded with the participants' permission.
Before starting the numerical evaluation, participants were given time to explore all the recommendations and get a general overview of the alternatives. The evaluation was constructed around four variables (see the questionnaire verbatim in Table 1): (i) perceived relevance from the perspective of current and past research interests (Q1 and Q2); (ii) expected willingness to interact with a recommended person in the context of a scientific conference, covering six traditional collaborative activities (Q3-Q8); (iii) levels of perceived familiarity: whether or not the user is familiar with the target person, their research, or their co-authors (Q9-Q11); and (iv) perceived similarity between the participant and a recommended person (Q12). The variables were operationalized based on the authors' personal experiences and qualitative research insights on academic collaboration and user experience evaluation. Over 20 candidate items were originally assessed iteratively within the project team and with collaborators, resulting in the 12 included items. After providing the ratings, participants were asked to verbally explain their scores and the reasoning behind them. The interview questions, also presented in Table 1, were designed to elicit participants' rationale for scoring the recommendations and to reveal the needs and factors that affect decision-making in academic networking practices.

Data analysis
Tableau was used for analysis and visualization of participants' background information, and RStudio for statistical analysis and for visualizing the scores in multiple box plots. The experiment has a within-subjects design in which each participant rated three recommendations in each of the three similarity-distance groups, yielding nine data points per participant. We used the non-parametric Friedman test (Sheldon et al. 1996) and post-hoc analysis with the 'agricolae' package in RStudio to test for statistically significant differences between the participants' ratings of the three recommendation groups in all questions. To avoid pseudo-replication, we calculated the median of the scores given to each group of recommendations. Thus, the input data for the Friedman test consisted of participant ID, similarity-distance group (Low, Medium, High) as the factor, and the median scores as values.
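The median-then-Friedman procedure described above can be sketched in Python. This is a minimal, standard-library-only sketch, not the authors' R code: the nested dictionary layout, the function names, and the closed-form p-value (valid only for three groups, where the chi-square reference distribution has two degrees of freedom) are assumptions made for illustration. Tied medians within a participant receive average ranks, and the statistic is computed as the ratio of between-group to within-participant rank variability, which stays well defined when ties occur. The post-hoc comparisons performed with agricolae are not reproduced here.

```python
from math import exp
from statistics import median

GROUPS = ("Low", "Medium", "High")  # similarity-distance groups (k = 3)


def average_ranks(values):
    """Rank values from 1 (lowest) upwards, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` over a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for idx in order[start:end + 1]:
            ranks[idx] = mean_rank
        start = end + 1
    return ranks


def friedman_on_medians(ratings):
    """Friedman test on per-group medians.

    ratings: {participant_id: {group: [scores for that group's recommendations]}}
    Returns (Q, p).
    """
    # One median per participant and group avoids pseudo-replication.
    rows = [[median(per_group[g]) for g in GROUPS] for per_group in ratings.values()]
    n, k = len(rows), len(GROUPS)
    ranks = [average_ranks(row) for row in rows]
    grand_mean = (k + 1) / 2
    group_means = [sum(r[j] for r in ranks) / n for j in range(k)]
    ss_between = n * sum((m - grand_mean) ** 2 for m in group_means)
    ss_within = sum((x - grand_mean) ** 2 for r in ranks for x in r) / (n * (k - 1))
    if ss_within == 0:  # every participant gave identical medians to all groups
        return 0.0, 1.0
    q = ss_between / ss_within
    # Chi-square with k - 1 = 2 degrees of freedom has survival function exp(-x / 2).
    return q, exp(-q / 2)
```

For example, if three participants each rate the low-distance group above the medium-distance group and the medium above the high, every participant's ranks are (3, 2, 1) in Low, Medium, High order, giving Q = 6 and p = exp(-3), about 0.0498.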
As for the qualitative data, the audio recordings from each session were transcribed, resulting in one text file per participant (min 211, max 1,373, median 690 words). The coding procedure consisted of two cycles involving elemental, axial, and focused coding methods (Saldaña 2015). In the first cycle, we applied structural coding to group the data under top-level categories from the interview questionnaire (see Table 1): overall impression, collaboration needs, comments about recommended people and their content, essential factors in social matching, and attitudes towards ICT-mediated professional social matching. Then, for the data in each category, we used line-by-line analysis to deconstruct the data into emerging categories, which were then reassembled into subcategories, linkages, and relationships. Finally, focused coding was applied to identify the most frequent codes and organize them into emerging themes.
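The frequency step of focused coding can be illustrated with a short tally. This is a minimal sketch under stated assumptions: each coded transcript segment is stored as a hypothetical (participant, code) pair, and a code counts at most once per participant, which is one plausible reading of counts such as "(5 cases)" in the results. The code labels are illustrative only.

```python
from collections import Counter


def code_frequencies(coded_segments):
    """Count how many distinct participants mention each code.

    coded_segments: iterable of (participant_id, code) pairs, one per coded
    transcript segment (hypothetical storage format). Duplicate pairs are
    collapsed so that a code is counted at most once per participant.
    """
    unique_pairs = set(coded_segments)
    return Counter(code for _, code in unique_pairs)


segments = [
    ("P1", "funding applications"),
    ("P1", "funding applications"),
    ("P2", "funding applications"),
    ("P2", "research mobility"),
    ("P3", "conflict-free reviewers"),
]
print(code_frequencies(segments).most_common(1))  # [('funding applications', 2)]
```

Counting per participant rather than per segment keeps a single talkative interviewee from inflating a code's apparent prevalence.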

Results
We first report the quantitative and qualitative results on perceived relevance, similarity, familiarity and willingness to interact. Then, we discuss the needs and important factors in research collaboration, which might have affected the participants' decision-making on potential social interactions.

Quantitative findings
The participants evaluated the perceived relevance of the given recommendations from two perspectives (see Figure 4): relevance to current (Q1) and past (Q2) research interests. In summary, the scores were consistent with the system-defined cosine distance. The highest scores were given to the group of recommendations with low distance, while those with high distance generally received the lowest scores. This indicates the prevalence of homophily bias (preferring the most similar researchers) in this sample of participants. Nevertheless, the data also reveal high ratings of perceived relevance for the group of recommendations with low similarity: these matches received scores of five or higher in 16 cases, which indicates some participants' interest in dissimilarity and openness to new opportunities.
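To make the distance measure concrete, here is a minimal sketch of cosine distance between two scholars. It assumes that each scholar is represented by a vector of topic weights produced by the topic model. The vector layout, the function names, and the sample values are illustrative assumptions, not the system's actual profiles. A distance of 0 means identical topic proportions, and for non-negative topic weights the distance is at most 1.

```python
from math import sqrt


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length topic-weight vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def rank_candidates(user_topics, candidates):
    """Sort candidate names by increasing cosine distance to the user's profile.

    candidates: {name: topic-weight vector} (hypothetical layout).
    """
    return sorted(candidates, key=lambda name: cosine_distance(user_topics, candidates[name]))
```

With a participant profile of [0.7, 0.2, 0.1], a candidate weighted [0.6, 0.3, 0.1] lies at a distance of roughly 0.02, while one weighted [0.1, 0.1, 0.8] lies at roughly 0.71, so the first candidate is ranked ahead of the second.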
A Friedman test and post-hoc analysis indicated statistically significant differences between the scores of the recommendation groups (see Figure 5). Ratings of relevance for past and current research interests were almost identical, meaning that participants were consistent in their scores regardless of the temporal perspective.
The quantitative results for perceived similarity (Q12) also show a significant difference in the given scores (see Figure 4): matches with low distance were rated as most similar, matches with medium distance as somewhat similar, and those with high distance as least similar. This supports the validity of cosine distances and the OTSU filter as a method for identifying thresholds between three degrees of similarity. The Friedman test results confirm a significant difference between the recommendation groups (see Figure 5).
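The three-level split can be illustrated with a small multi-level Otsu sketch. Otsu's criterion chooses the thresholds that maximize the between-class variance of the resulting groups. This sketch searches exhaustively over split points in a sorted list of distances rather than over a histogram, and it places each threshold midway between neighbouring values. Both choices, together with all function names, are assumptions and not necessarily how the system computed its thresholds. For a histogram-based variant, scikit-image offers `skimage.filters.threshold_multiotsu`.

```python
from itertools import combinations


def between_class_variance(classes, overall_mean, total):
    """Otsu's criterion: sum of class weight * (class mean - overall mean)^2."""
    score = 0.0
    for cls in classes:
        mean = sum(cls) / len(cls)
        score += (len(cls) / total) * (mean - overall_mean) ** 2
    return score


def otsu_three_levels(distances):
    """Return thresholds (t1, t2) splitting distances into three groups.

    Values <= t1 are Low, values <= t2 are Medium, the rest are High.
    """
    values = sorted(distances)
    n = len(values)
    if n < 3:
        raise ValueError("need at least three distances")
    overall_mean = sum(values) / n
    best = None
    # Each pair of cut positions 1 <= i < j <= n - 1 gives three non-empty classes.
    for i, j in combinations(range(1, n), 2):
        classes = (values[:i], values[i:j], values[j:])
        score = between_class_variance(classes, overall_mean, n)
        if best is None or score > best[0]:
            best = (score, i, j)
    _, i, j = best
    # Place each threshold midway between the neighbouring sorted values.
    return (values[i - 1] + values[i]) / 2, (values[j - 1] + values[j]) / 2


def assign_group(distance, t1, t2):
    """Map a cosine distance to its similarity-distance group."""
    if distance <= t1:
        return "Low"
    if distance <= t2:
        return "Medium"
    return "High"
```

Because total variance equals within-class plus between-class variance, maximizing the between-class term is equivalent to making each group as internally tight as possible. With clusters of distances around 0.1, 0.5, and 0.9, the thresholds fall at about 0.31 and 0.71.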

Qualitative findings
The verbal feedback about relevance and similarity is generally in line with the quantitative results. To provide an overview of participants' comments, we collected illustrative examples in Table 2, sorted according to the three groups of similarity distance. Although the participants were unaware of the three similarity distances, in their feedback they distinguished between degrees of perceived relevance using phrases like 'very/most relevant,' 'somewhat relevant/not exact match,' and 'irrelevant/totally irrelevant.' Feedback about the outliers (high-distance recommendations rated as relevant: 9 cases in Q1 and 7 cases in Q2) revealed that such recommendations either related to participants' current research interests with potential for future directions, or included surprising topics in their profiles. Such cases hint at interest in dissimilarity and openness to new opportunities, as quotes 1 and 2 illustrate.
2) "(R9) This person is from a different field, and the research has interesting aspects. The last paper in the list is the most interesting: it is about something similar we have been doing recently but not published yet! So, I have to check it and maybe contact the authors." (P2, 42 y.o., Finnish male, Senior Researcher)
Some participants also mentioned that their first impression of the recommended people changed when they started to examine the profiles in detail (see quote 3).
3) "That was a good idea that you gave me to check all recommendations first because initial reaction was different, but when I started thinking and realized that my first impression maybe was not correct. When you start thinking about recommendations' relevance further, there might be some changes. So some of them are not that irrelevant as I thought at first." (P16, 50 y.o., Finnish female, Senior Researcher)
Hence, decisions on perceived relevance were influenced by the participants' estimation of how the topics of recommended people matched their own: whether they were similar, very different, or complementary. In this regard, a few participants noted that evaluating the similarity between themselves and the recommended people is a challenging task. First, they mentioned employing different scales of comparison (5 cases): assessing a recommended person from the perspective of all research fields in the world or of specific focus areas. The second factor refers to the temporal aspect (5 cases): as interests and research directions tend to change, the evaluation of similarity depends on the chosen time frame. Participants also pointed out that the estimation of relevance and the optimal level of similarity for a specific collaboration are context dependent (4 cases): some tasks require cooperation with diverse people, while others benefit from similarity. Thus, when looking for candidates with a distinct set of skills, people should also define shared interests or goals to make a prospective collaboration fruitful. Participants also acknowledged possible adverse effects of receiving recommendations of people who are very similar (4 cases). For example, in a professional context, high similarity of interests might result in competition, and social interaction with such people would require choosing specific communication strategies.

Quantitative findings
The familiarity variable was evaluated from three perspectives: familiarity with the research topics (Q9), recognizing some of the co-authors' names (Q10), and knowing the match in person (Q11). Figure 6 shows that the scores of recommendations with low distance are widely distributed in all questions, while the medium- and high-distance groups received mostly negative ratings. According to the score distribution, participants were mainly familiar with the research of recommended people in the low-distance group and aware of many of their co-authors. Only a few recommended people were known to participants in person. The Friedman test yielded a statistically significant difference in the given scores (see Figure 7). The post-hoc analysis revealed significant differences only between the low- and medium-distance groups and between the low- and high-distance groups. In the interview, participants noted that people they already knew were less exciting recommendations (quotes 6 and 7). Even though we intentionally filtered out all co-authors, the bibliographic data cannot capture the actual social relationships between researchers. Thus, in some cases (9 recommendations out of 54), the system recommended people from very close social circles, for instance, peers from academic projects or colleagues with whom the participant had not co-authored publications but interacted daily (quote 8).
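The co-author exclusion mentioned above amounts to a simple set operation. This minimal sketch assumes that publications are available as lists of author names (for instance, parsed from DBLP records). The function names and sample names are hypothetical. As the results show, such a filter removes only ties that are visible in the bibliography: colleagues who interact daily without co-authoring remain among the candidates.

```python
def coauthors(person, publications):
    """Collect everyone who shares at least one publication with `person`.

    publications: iterable of author-name lists (hypothetical format).
    """
    result = set()
    for authors in publications:
        if person in authors:
            result.update(authors)
    result.discard(person)
    return result


def filter_candidates(person, ranked_candidates, publications):
    """Remove the person and all of their co-authors, keeping the ranking order."""
    excluded = coauthors(person, publications) | {person}
    return [c for c in ranked_candidates if c not in excluded]


pubs = [["Alice", "Bob"], ["Alice", "Carol", "Dan"], ["Eve", "Frank"]]
print(filter_candidates("Alice", ["Bob", "Eve", "Frank", "Alice", "Carol"], pubs))
# ['Eve', 'Frank']
```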

Qualitative findings
6) "(R3) It seems like I know him. His areas of research match with mine, I would say, by 90%. I know some of his co-authors even in person. This is the most interesting recommendation but not surprising, because I know him and even know his Ph.D. students." (P5, 42 y.o., Finnish male, Full Professor)
7) "The first and last person in the list I already know. They are obviously relevant recommendations. But the rest is more interesting because I do not know them." (P10, 40 y.o., Finnish male, Senior Researcher)
8) "(R2) One interesting observation is that one of the recommended people is my roommate. We have similar topics and plenty of other commonalities. We are cooperating mostly by talking. I think, we have never written papers together, but we have shared co-authors." (P4, 56 y.o., Russian male, Senior Researcher)

Quantitative findings
Evaluation of the willingness to interact with recommended people reflects six predefined scenarios of face-to-face interaction or follow-up collaboration in the context of a conference (see Figure 8): (Q3) asking advice, (Q4) giving advice, (Q5) sharing research ideas, (Q6) exploring joint research topics, (Q7) spending time together, and (Q8) organizing a research visit. The distribution of scores shows that participants had a very positive attitude towards engaging in low-threshold interactions such as sharing research ideas, exploring common research topics, and spending time together, particularly with the most similar people. Scores given to the medium- and high-distance groups show a variance of opinion, with a neutral attitude on average. Interestingly, all six interaction scenarios yielded very similar results, even the more long-term follow-up action of organizing a research visit. The Friedman test (see Figure 9) showed a statistically significant difference in the given scores only between the low- and high-distance groups and between the low- and medium-distance groups. Participants consider such social activities to happen naturally at conferences, without necessarily requiring high investments of time in collaboration. For some, it was hard to envision willingness to interact with unfamiliar people based only on paper titles and research topics. Thus, some participants (8 cases) admitted that they would like to learn more about a recommended person before making any decisions on social interaction. Some (5 cases) emphasized that even when actually attending a conference, it might be challenging to find relevant people in a crowd, and that contextualized use of such recommender systems might simplify the process and encourage social interaction with unfamiliar people (see quote 9).

Qualitative findings
9) "Such system should narrow the focus to serve specific purposes [...] I think about a scenario where I am going to the conference, so then I can define 'show me people who are relevant to this event' and it would give me a sense of community around it [...] After all these years you sometimes stand somewhere in the corner of a conference hall not knowing anybody. Of course, I can start communicating with random people, but it would be much more efficient if the system can suggest already somewhat relevant people and provide with tickets to talk." (P11, 53 y.o., Finnish female, Full professor)

Needs and important factors in research collaboration
When specifying crucial needs for collaboration, participants mentioned different activities. The most frequent reasons are unsurprising for senior academics: seeking academic and industrial partners for funding applications (12 answers) and knowledge sharing as a form of indirect cooperation (12 answers, e.g., exchanging data or finding relevant publications on a topic of interest). Some participants also pointed out a need for people without conflicts of interest (5 answers), such as pre-examiners, reviewers, editors, or opponents, who are in high demand and difficult to find. Another reason was research mobility (5 answers), which calls for cooperation particularly with international universities or companies. Interestingly, whereas similarity was generally considered a significant aspect, the above-mentioned collaborative relationships demand heterogeneity of methodological skills, research areas, or social networks. The participants also emphasized that needs for collaboration are occasional, and that it would be useful to contextualize the recommender system to specific scenarios, for instance, by making it location- and event-based or by designing it as an expert finder. In their opinion, this would give a clear reason for using the service and motivate users to follow up on recommendations (see quote 10).
10) "I would be interested in such a system to explore people who are visiting the same conference in advance and filter them based on similarity or relevance. [...] It will help me to revise recommendations faster. Let's say for the event-based mobile application it will be great to inform me when a person visits the event and recommend me to meet him there. If a notification to interact comes in the middle of a street, then I doubt it will work. However, if it will happen at the conference I, of course, will try to follow-up on recommendations." (P8, 33 y.o., Finnish male, Postdoctoral Researcher)
The participants also specified factors that matter to them when seeking professional collaboration. First, the majority (14 replies) emphasized the importance of the affiliation and current position of candidates. From their perspective, these reveal much about the seniority, availability, and potential interest of the people. Moreover, given the relatively high migration of researchers to non-academic positions, they can indicate whether potential collaborators are still pursuing an academic career. Furthermore, many factors can be inferred implicitly from publications. For instance, the quantity, citation rates, and quality of papers might reveal the maturity of a researcher, their topics of interest, and their community. For many (8 replies), these aspects play a significant role when approaching unfamiliar scientists (see quote 11).
11) "There are different influence groups with leading experts which are often competitors. So, based on co-authors of a match in his publication lists I can instantly interpret that he belongs to particular influence group. That can help in decision-making whether to collaborate or not and carefully choose the communication strategy." (P5, 42 y.o., Finnish male, Full Professor)
Discussion on seniority level brought out various opinions (7 replies).
In general, seniority plays a considerable role: for instance, in tasks that require the ability to make decisions directly (e.g., project planning), it is essential to be in contact with mature researchers, while some practical work could be carried out in cooperation with students (e.g., assisting with a course). Other scenarios might call for open-mindedness in this respect, as in the following example:
12) "The seniority level does not matter to me that much. Sometimes junior people are more creative and innovative. [...] So we should never think about seniority levels. More senior people might have much information, but at the same time too narrow in their vision and interests. Of course, it depends: for consultancy, I might prefer to contact senior people, while for generating new ideas and brainstorming I will be more interested in collaborating with young researchers." (P15, 50 y.o., Finnish female, Senior Researcher)
Additionally, participants mentioned the personal chemistry factor (7 replies), which might be crucial for the efficiency of interpersonal relationships and teamwork, particularly in direct collaboration (see quote 13):
13) "Chemistry plays a significant role: we need some basis for communication. It should be a person with whom it is nice to sit talk and drink coffee in addition to work practicalities." (P15, 65 y.o., British male, Senior Researcher)
However, participants pointed out that although this factor is highly desirable, it cannot feasibly be integrated into the system, because in their opinion personality compatibility can be assessed only after continued interaction.

Discussion
While prior research has aimed at creating meaningful professional connections with the help of people recommender systems, little attention has been paid to evaluating users' subjective perceptions of recommendation relevance. We emphasize that recommending is different from predicting new connections (McNee et al. 2006): to design services that can meaningfully enhance professional collaboration, algorithms should go beyond reproducing or strengthening typical human biases.
In the following, we first summarize our findings and reflect on their novelty and relevance. Next, we provide a discussion on limitations and future work.

Summary of the results
We presented the results of an experiment on computer science researchers' preferences regarding potential collaborators at different similarity levels, using a DBLP-based recommender system. With 18 senior scholars in CS-related areas, we tested how the dependent variables of perceived relevance, similarity, familiarity, and willingness to interact relate to the independent variable of objective similarity measured from publication history.
The findings reveal that (1) the homophily bias is evident also in scholars' intuitive assessments of relevance and willingness to interact, and (2) there is a mismatch between people's intuitive choices and the deliberate intentions in decision-making on potential collaborators.
Considering our first research question about which level of system-defined similarity is preferred in participants' evaluations, the findings show the highest ratings for the most similar people. Methodologically, the subjective evaluations of different similarity levels seem to support the system design, particularly the efficiency of the OTSU filter in identifying different levels on the similarity-difference continuum. In other words, even a relatively simple analytics procedure with scarce data seemed to work sufficiently well, and the publication data represented the participants' topics accurately enough. While the norm in such data analytics tends to stress the need for Big Data (e.g., Hoang et al. 2017), it appears that people recommender systems could suffice with rather simple datasets as long as the recommender engine logic is well designed. The findings imply that the participants were able to retrieve useful suggestions and that, for the majority, the evaluation process with the operationalized variables was straightforward.
Regarding the second research question about academics' needs and expectations in professional collaboration, the results demonstrate that the optimal area on the similarity-difference continuum highly depends on the type and context of collaboration. For instance, crucial factors in direct cooperation, such as personal compatibility and similarity of attitudes and beliefs, are not as emphasized in short-term and indirect professional interactions (e.g., consultancy type of cooperation) as in long-term collaboration. Furthermore, the nature of the collaboration task might influence the perceived relevance of potential candidates, for example, regarding the complementarity of professional roles, skills, and knowledge.
As a methodological contribution for studying user perceptions, we operationalized the concepts of perceived relevance, similarity, familiarity and willingness to interact as subjective evaluation measures for the context of academic collaboration. This helps to uncover some of the experiential aspects of these concepts and quantitatively assess how individuals consider people recommendations. To complement the reductionist measures, the qualitative findings reveal more complex and nuanced aspects that should be addressed in the design and evaluation of people recommendations for professional partnering.
Following the participants' rationale about important factors in collaboration, we propose that the diversity of recommendations in professional social matching could be enhanced through several dimensions or criteria of relevance:
Similarity in terms of background, attitudes, values, beliefs, goals, and intentions (e.g., research aims). Previous research has suggested that similarity of such qualities can increase cohesion, or so-called 'affinity' (Moreland and Zajonc 1982), in interpersonal relationships, and can even reduce the adverse effects of individual dissimilarities (Dong et al. 2016) in collaborative work.
Complementarity in terms of professional roles, skills, knowledge, and social capital. In this context, complementarity goes beyond pure diversity, as discussed by Mitchell and Nicholas (2006): it should enable relevant opportunities for collaboration by identifying beneficial intersections between individuals' qualities.
Compatibility for direct cooperation in terms of being mentally, socially, morally, or emotionally close to each other. This aspect was partially emphasized by Bozeman et al. (2013), who define collaboration as a process of knowledge production in which compatibility of such qualities can establish trustful, joyful, and personally valued cooperation.
Approachability/logistics: the availability of a person for direct or indirect interaction in terms of physical proximity as well as social and organizational distance. This dimension echoes the 'collaboration readiness' conceptualized by Olson and Olson (2000), who call for better technological solutions to enable smooth communication and interaction practices in distributed collaboration.

Limitations and future work
We selected a mixed-method approach to enable a broad understanding of our research questions. By combining a controlled experiment with qualitative face-to-face interviews, we intentionally limited the sample size and traded generalizability for deeper qualitative understanding. In the same vein, the participants represent the same cultural and geographical area, which limits the generalizability of the findings to the general population of scholars. Nevertheless, our method allowed us to observe the decision-making process on collaboration in the actual context of using a people recommender system. Additionally, it helped to engage participants in discussion on potentially relevant partners concerning similarity, complementarity, and other essential qualities or factors when seeking collaboration. We could not have elicited some of the interview findings without the task of evaluating the recommendations in situ. In fact, our prior research experience suggests that qualitative exploration of human needs, wishes, and expectations often benefits from providing a design artifact that helps participants form opinions on abstract concepts and speculated behavior.

Data set limitations
Limitations of the data set might explain some of the mismatches between participants' quantitative and qualitative feedback on recommendations. A central limitation of DBLP is that only publication titles (without abstracts) are available, which limits the content analysis and reduces the accuracy of the user profiles. For example, some participants noted that it was difficult to assess the relevance of junior researchers with few papers (a minimum of three in the reported study). Limiting the data sample to researchers with an extensive publication history would improve the accuracy of topic modeling and give more comprehensive pictures of the assessed individuals. However, we intentionally wanted to include both junior and senior researchers to reveal how they could be appreciated in different contexts of cooperation, and to evaluate the role of seniority as a possibly influential factor in collaboration processes.
Another limitation of the DBLP data set is that it provides limited insight into the social ties between researchers. Even though we excluded all of each participant's co-authors from the recommendations, some participants were recommended people with whom they occasionally interact (9 cases out of a total of 54 recommendations). The proportion of very familiar people was relatively small, so this arguably had a relatively small effect on participants' perceptions of the system's validity. By applying alternative data sources, where practically feasible, it would be possible to implement more advanced social network analysis and thus avoid recommending people the user already knows.

Homophily bias
Even though the participants were unaware of the similarity distance groups, the quantitative ratings of recommendations demonstrated a tendency of researchers to prefer the most similar people. Only a few participants assessed recommendations with high distances as exciting, surprising, and worth exploring for a potential follow-up. At the same time, the qualitative feedback provides evidence of aspects of research collaboration that require access to both similar and different others. In the following, we discuss two possible reasons behind the apparent homophily bias, which also relate to the methodological validity of the study.
(i) The evaluation was largely based on first impressions. Studies on the cognitive processes of choice (Kahneman 2003; Stanovich and West 2000) distinguish between two modes of human decision-making: so-called 'system 1' (effortless, intuition-based judgment) and 'system 2' (rational, reasoning-based judgment). In our experimental setup, most of the participants seem to have relied primarily on intuition rather than engaging in more rational or reflective reasoning in their evaluation. Therefore, their first impression of the recommendations was likely driven by the homophily bias, which would explain the numerical evaluations. At the same time, after rationalizing the matter during the interview and reflecting on specific practical collaboration scenarios, the participants started appreciating different types of diversity between themselves and the evaluated person. As a design consideration, this finding calls for user interfaces that support reflection and multi-dimensional analysis of the collaboration potential of a given recommendation. The current norm in recommender systems and leisure-oriented social matching favors haste: simplistic profiles and simple mechanisms for selecting or discarding recommendations.
We argue that such UI and interaction mechanisms do not fit with the goal of identifying optimal academic collaborators.
(ii) Lack of a timely need for collaboration. At the time of the experiment, all of the participants were involved in research projects whose consortia had already been formed, and they did not report any urgent need to find new collaborators. Therefore, their estimation of relevance was mostly based on their general picture of an ideal collaborator. Nevertheless, the majority emphasized an occasional need for such people recommender systems for research networking and collaboration. This raises two design considerations. First, user interfaces should support keeping track of different types of collaboration needs in the often scattered work of academics; most simply, the user could be reminded of their different professional activities upon receiving a recommendation. Second, context awareness (Mayer et al. 2015b) is needed in timing the recommendation. For example, rather than having separate services for professional matching, recommendations could be tied to services that academics typically use to seek suitable collaborators.

Conclusions
We evaluated scholars' perceptions of the relevance of potential collaborators representing different levels of similarity, using a bibliography-based people recommender system. We operationalized the concepts of perceived relevance, familiarity, similarity, and willingness to interact within the context of evaluating prospective collaboration. By showing how these variables match system-defined similarity in bibliography data, we revealed an asymmetry between scholars' intuition-based evaluations and their intentions. The quantitative results demonstrated the effects of homophily bias (preference for the most similar others) on perceived relevance, while the qualitative findings identify important factors for collaboration that naturally require connections with people of complementary expertise. Compared to the evaluation methods used for item recommenders, the findings demonstrate that people recommender systems require more advanced models and logic that go beyond predicting ties or optimizing for accuracy. From a cognitive psychology perspective, assessing potential partnering with recommended people is a complicated task that should not rely only on first impressions. Considering the long-term and reciprocal nature of professional collaboration, social matching of scholars calls for domain-specific solutions.

Open Access
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.