1 Introduction

Knowledge grows. The growth of knowledge has accelerated in recent years and is expected to continue to do so [1]. As a result, it becomes more and more difficult to extract meaningful data from the available knowledge. For this extraction, it is necessary to turn knowledge into information and the information into data [2]. The field of information science and information management has produced a copious amount of research on how to store data effectively as information (e.g. meta-data). Nevertheless, the amount of information and data is growing rapidly, which hinders users from acquiring the knowledge they seek from data and information.

Coping with this ever increasing amount of information and data is the central challenge of big data and knowledge discovery. The challenge shifts from organizing information to finding information that is both relevant [3] and helpful to the user in order to create knowledge. This search process becomes increasingly important in the organizational context.

Organizations are becoming ever more complex, and employee fluctuation makes knowledge management increasingly difficult [4]. To prevent knowledge loss, knowledge management systems are increasingly used. However, these mainly address explicit knowledge and fail to capture tacit knowledge. Social software solutions have tried to address this topic by capturing communication processes as they happen and storing this data for later evaluation.

Nonetheless, retrieving this knowledge from the stored data is still a great challenge. Whenever information is connected and complex, it becomes necessary not only to transfer the information from system to user, but also to shape the mental model [5] the user has of the information. It is furthermore necessary to take into account what the user's model of the technology behind knowledge management software is, due to its strong effect on acceptance [6]. The composition of the software must be clearly communicated. Here various forms of systems come into play that address different aspects of knowledge management and knowledge transfer. Still, even if the mental models of both system and content are clear, the success of such a system cannot be guaranteed. Often the creator of information and its beneficiary are different persons [7]; thus a sense of community or even locality [8] is an important criterion for success.

2 Related Work

In this paper we look at various forms of knowledge management and discovery systems and address their applicability in an organizational setting. Typical forms of knowledge management systems that deal with big data are the following [9]:

  1. Intranet and Groupware software [10] are designed to support organizational collaboration and integrate a network of clients. Typically summarized under the term CSCW, they are designed with work tasks in mind and are often based on well-known protocols like HTTP, SMTP and FTP. Most of the time they are file-centric.

  2. Data Warehousing & OLAP [11] are data-centered solutions. Transaction and process data [12] is integrated and stored in data cubes, which can later be analyzed for reporting purposes. Data warehouses implement a single-source-of-truth policy and keep track of data history. Tools are required for extracting, cleaning and loading data into the Online Analytical Processing (OLAP) system. Afterwards, the OLAP system analyzes the given data cubes to find patterns or possible trend candidates. Since queries are often multidimensional, meta-data management and query management are important and often tool-assisted.

  3. Content Management Systems (CMS) are content-centric and focus on publishing processes. Content can be created, updated, published and deleted. In addition, editing workflows are implemented to address publishing responsibilities. Generally, versioning and authorship meta-data are maintained. Recently, CMS have been used in enterprise content management, as internal documents often follow procedures similar to publishing.

  4. (Collaborative) Search engines [13, 14] or enterprise search engines merge the joint efforts of users in locating and tagging information. This allows the retrieval of more relevant information by learning from user input and the relations between users' interests.

  5. Recommender Systems [15] are used to actively suggest interesting content to the user. They are often used in Internet sales to suggest other products that may be of interest. Furthermore, they are used to recommend documents, books [16], scientific literature [17] or even teams [18]. Recommender systems often learn from users' previous choices [19] but may also rely on multi-criteria filtering [20]. How suggestions are generated should be clearly explained [21]. This is especially important in the case of hybrid systems [22] that integrate collaborative filtering and machine learning approaches.

  6. Decision Support Systems (DSS) [23] derive significant information and possible emerging patterns from raw data. By considering the extracted information, the system assists the decision making process. DSS often incorporate visualizations and can be fully automatic, fully human dependent or combine both efforts.

Still, all of these systems rely on various forms of information presentation to allow the user to acquire knowledge from the information or data presented. In all cases the type of visualization is critical to improve the transfer of deep structure from machine to user [24]. The concept “overview, zoom, detail on demand” [25] summarizes a core paradigm of visualizations that are based on full information display. The field of HCI-KDD [26] addresses the need for research on the interplay of Human-Computer Interaction and Knowledge Discovery in Databases.

For the case of organizational knowledge it is also important to understand the complexity of knowledge available in an organization that is shared between employees. Finding an employee or a document with critical knowledge or information is a challenge when organizational structures are not well understood. Users must learn the intricacies of overview, structure, and detail along the hierarchy of an organization.

2.1 Visual Recommender Systems

In order to ease the understanding of information, visual approaches can be used. In our case we focus on visual recommender systems. The recommender component serves the purpose of increasing the transparency of the underlying system. The only solution similar to our prototypes that we could find is proposed by O’Donovan et al. [27]. They propose a graph-based visual collaborative filtering tool called PeerChooser that uses multiple criteria to allow users to find movie suggestions. Montaner et al. [28] propose a taxonomy of recommender systems spanning seven criteria. These should be used in order to help in designing a recommender system. The instances are task dependent but are in most cases interchangeable.

  • Representation describes how data is represented (e.g. historical, feature vector, etc.)

  • Initial Information describes how data is preloaded into the algorithm before the user interacts with it (e.g. none, manual, training set)

  • Learning refers to how the algorithm improves on usage (e.g. TF-IDF, ID3, etc.)

  • Feedback describes how the user can give feedback to the algorithm (e.g. rating systems, choice).

  • Adaptation refers to how the algorithm adapts to the feedback (e.g. add new, natural selection, GFF).

  • Filtering indicates how data is filtered before being used in the algorithm (e.g. collaborative, hybrid, demographic)

  • Matching describes how items are matched with the user's requests or feedback (e.g. nearest neighbour, cosine similarity, etc.)
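To make the Learning and Matching criteria more concrete, the following minimal sketch combines TF-IDF weighting with cosine similarity to rank documents against a user profile. It is an illustration only and not the algorithm of any system discussed here; the function names `tf_idf_vectors`, `cosine_similarity` and `rank_documents`, the whitespace tokenization, and the treatment of the profile as one more document are assumptions made for this example.

```python
import math
from collections import Counter


def tf_idf_vectors(texts):
    """Return one sparse TF-IDF vector (dict: term -> weight) per text."""
    tokenized = [text.lower().split() for text in texts]
    n_texts = len(tokenized)
    # Document frequency: in how many texts does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = max(len(tokens), 1)  # guard against empty texts
        vectors.append({
            term: (count / length) * math.log(n_texts / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(profile, documents):
    """Rank documents by similarity to the profile (nearest neighbour first)."""
    # Assumption: the profile is weighted together with the documents.
    vectors = tf_idf_vectors(documents + [profile])
    profile_vector = vectors[-1]
    scored = [
        (index, cosine_similarity(profile_vector, vectors[index]))
        for index in range(len(documents))
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical documents and profile, for illustration only.
documents = [
    "visual recommender systems for publications",
    "data warehouse olap cubes",
    "collaborative filtering recommender evaluation",
]
ranking = rank_documents("recommender systems visual", documents)
```

With this hypothetical input, the first document shares the most distinctive terms with the profile and is ranked first, while the second shares none and receives a similarity of zero.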

3 Visualization Prototypes

Considering the criteria of Montaner et al., we investigate two different types of knowledge discovery systems in a research setting. We present a user evaluation of these systems and their particular visualizations. The first system is a visual recommender system that recommends documents that are relevant to the research interests of the user. The second system is a visual collaboration support system that allows finding collaborators in a research organization who can contribute to the user’s topic. Both systems are integrated in a social portal that is used within the research organization.

Graph-Based Document Recommender System. The graph-based publication recommender system TIGRS uses a mixed-node graph [29] connecting publications with their keywords, weighted by their relative relevance (see Fig. 1 and [30]). Users can filter for keywords and their relative relevance in order to find relevant documents. The system uses the user's previous keywords to suggest only documents that are relevant to the user. It allows users to browse through content while at the same time seeing connections between documents that share keywords.

Fig. 1. Publication and keyword centric visualization of collaboration [30]
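The idea of a mixed-node graph with relevance-based keyword filters can be sketched as follows. This is a simplified illustration and not the TIGRS implementation; the data layout, the function names `build_graph`, `filter_publications` and `shared_keywords`, and the rule of ranking publications by the number of satisfied filters are assumptions made for this example.

```python
from collections import defaultdict


def build_graph(keyword_relevance):
    """Build a mixed-node (publication/keyword) graph as two adjacency maps.

    keyword_relevance maps publication -> {keyword: relative relevance}.
    """
    pub_to_kw = {pub: dict(kws) for pub, kws in keyword_relevance.items()}
    kw_to_pub = defaultdict(dict)
    for pub, kws in keyword_relevance.items():
        for keyword, relevance in kws.items():
            kw_to_pub[keyword][pub] = relevance
    return pub_to_kw, dict(kw_to_pub)


def filter_publications(kw_to_pub, filters):
    """Return (publication, matched keywords) pairs for the active filters.

    filters maps keyword -> minimum relevance (the slider position).
    Publications satisfying more filters come first; this mirrors the
    assumption that such items sit closer to the center of the graph.
    """
    matches = defaultdict(set)
    for keyword, min_relevance in filters.items():
        for pub, relevance in kw_to_pub.get(keyword, {}).items():
            if relevance >= min_relevance:
                matches[pub].add(keyword)
    return sorted(matches.items(), key=lambda item: (-len(item[1]), item[0]))


def shared_keywords(pub_to_kw, pub_a, pub_b):
    """Keywords that connect two publications in the graph."""
    return set(pub_to_kw[pub_a]) & set(pub_to_kw[pub_b])


# Hypothetical publications and relevance values, for illustration only.
pub_to_kw, kw_to_pub = build_graph({
    "paper_a": {"visualization": 0.9, "recommender": 0.6},
    "paper_b": {"recommender": 0.8, "usability": 0.4},
    "paper_c": {"visualization": 0.2, "usability": 0.7},
})
result = filter_publications(kw_to_pub, {"visualization": 0.5, "recommender": 0.5})
```

In this hypothetical example `paper_a` satisfies both filters and is listed first, `paper_b` satisfies one, and `paper_c` is dropped because its relevance for "visualization" lies below the threshold.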

Collaborator Suggestion System. The collaborator suggestion system proposed by Yazdi et al. [31] is used to suggest fruitful collaborations in a research cluster by analyzing previous collaboration and mutual keywords. Using social network analysis, possible coauthors are visually suggested when the user hovers over a bubble-based graph. By using a bubble-bag layout (see Fig. 2), the system additionally conveys information about where a suggested collaborator works. This also conveys information about the organizational structure, allowing users to understand who is who in their organization and what they work on.

Fig. 2. Author and institute centric visualization of collaboration [31]
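A collaborator suggestion based on previous collaboration and mutual keywords can be sketched as follows. This is not the method of Yazdi et al. [31]; the Jaccard overlap of keyword sets as score, the exclusion of existing coauthors, and the names `build_profiles` and `suggest_collaborators` are assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_profiles(publications):
    """Derive the coauthor network and keyword sets from publications."""
    coauthors = defaultdict(set)
    keywords = defaultdict(set)
    for pub in publications:
        for author in pub["authors"]:
            keywords[author].update(pub["keywords"])
        for a, b in combinations(pub["authors"], 2):
            coauthors[a].add(b)
            coauthors[b].add(a)
    return dict(coauthors), dict(keywords)


def suggest_collaborators(user, publications, institutes):
    """Suggest authors with mutual keywords who are not yet coauthors.

    Returns (author, institute, score) tuples, best score first.
    The score is the Jaccard overlap of the two keyword sets (assumption).
    """
    coauthors, keywords = build_profiles(publications)
    user_keywords = keywords.get(user, set())
    known = coauthors.get(user, set())
    suggestions = []
    for other, other_keywords in keywords.items():
        if other == user or other in known:
            continue
        shared = user_keywords & other_keywords
        if not shared:
            continue
        score = len(shared) / len(user_keywords | other_keywords)
        suggestions.append((other, institutes.get(other, "unknown"), score))
    return sorted(suggestions, key=lambda s: (-s[2], s[0]))


# Hypothetical authors, institutes and publications, for illustration only.
publications = [
    {"authors": ["alice", "bob"], "keywords": ["visualization", "usability"]},
    {"authors": ["carol"], "keywords": ["visualization", "recommender"]},
    {"authors": ["dave"], "keywords": ["welding"]},
    {"authors": ["alice"], "keywords": ["recommender"]},
]
institutes = {"alice": "Institute A", "bob": "Institute A",
              "carol": "Institute B", "dave": "Institute C"}
suggestions = suggest_collaborators("alice", publications, institutes)
```

In this hypothetical example only `carol` is suggested to `alice`: `bob` is already a coauthor and `dave` shares no keywords. Returning the institute alongside each name corresponds to the organizational information that the bubble-bag layout conveys visually.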

4 Method

We tested both systems in a large research facility (i.e. over 180 researchers) with samples of 16 and 20 members from different fields, respectively. Both prototypes were evaluated in user studies (\(N=16\), \(N=20\)) determining both the overall usability (SUS [32]) and the likelihood of being recommended to a friend (NPS [33]). We investigated user factors (e.g. age, discipline, research expertise, track record) and evaluated how both prototypes complement each other in a scientific setting.
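For readers unfamiliar with the two measures, the following sketch shows how SUS and NPS values are conventionally computed from questionnaire answers. The function names and the example ratings are hypothetical and are not taken from the study data.

```python
def sus_score(responses):
    """System Usability Scale score (0-100) for one participant.

    responses: ten ratings on a 1-5 scale, in questionnaire order.
    Odd-numbered items contribute (rating - 1), even-numbered items
    contribute (5 - rating); the sum is scaled by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten item ratings")
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if not 1 <= rating <= 5:
            raise ValueError("SUS ratings must lie between 1 and 5")
        total += (rating - 1) if item_number % 2 == 1 else (5 - rating)
    return total * 2.5


def net_promoter_score(ratings):
    """Net Promoter Score (-100 to 100) from 0-10 recommendation ratings.

    Promoters rate 9-10, passives 7-8, detractors 0-6; the score is the
    percentage of promoters minus the percentage of detractors.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    promoters = sum(1 for rating in ratings if rating >= 9)
    detractors = sum(1 for rating in ratings if rating <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)


# Hypothetical answers, for illustration only.
best_case_sus = sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1])
neutral_sus = sus_score([3] * 10)
example_nps = net_promoter_score([10, 9, 9, 8, 6])
```

Note that passives count toward the number of respondents but neither raise nor lower the NPS.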

With regard to recommender systems, finding appropriate metrics for their evaluation is critical [34] in order to gain an understanding of what needs to be optimized. Here we only look at general usability and qualitative insights. For further detailed analysis please refer to the original works on the prototypes (cf. [30, 31]).

For both prototypes users were invited to take part in a user study in our laboratory. They were given time to get accustomed to the prototypes and their handling and were then given tasks to complete (i.e. find a suitable publication or collaborator). The whole process was recorded and then analyzed. Additionally, quantitative analyses were performed on the data from a questionnaire completed by the participants.

In this paper we want to focus on reporting qualitative findings from both prototypes and their implementation in a social portal. Both prototypes are to be integrated in the so-called “Scientific Cooperation Portal” (SCP) [35], a tool devised to tackle the high staff volatility in a large research cluster. The SCP is a social portal which centralizes communication, file exchange, and member profiles, and offers interdisciplinary collaboration support. Additionally, it tracks the research output of individual researchers by tracking their publications. This latter feature is also used to enable steering the cluster from a management point of view [29]. In the prototypes we use this data to construct visualizations that help facilitate collaboration and an understanding of the organization.

5 Results

Our analyses (univariate analysis of variance) show that both systems address different aspects of understanding how an organization works. The first allows understanding how different departments work on various topics, while overlapping in their content and methodology. The second system allows understanding the scientific content that researchers work on in depth.

5.1 Sample Description

For the first prototype we asked \(N=16\) researchers from an interdisciplinary research facility to take part in our study. Their average age was \(\bar{x}=33.6\) years (\(\sigma =6.14\), range\(=23-52\)) and 56 % of them were female. Ten had finished their undergraduate training (Masters) while five already had graduate training (PhD or Professor). Most researchers came from the fields of linguistics or communication science (see Table 1). Most researchers had published about 5-6 papers in their careers, with some outliers of over 150 (i.e. a professor) and some with none (new colleagues). The facility has a total of 25 researchers.

Table 1. Research fields in sample for the first prototype (multiple selections allowed)

The second prototype was tested with \(N=20\) researchers from an interdisciplinary research cluster (out of 180 employees). Forty participants were approached at seven different institutes, mostly from engineering sciences, but also including communication science and computer science.

5.2 Quantitative Results

The first prototype received very high ratings in usability. Overall SUS was high (\(\bar{x}=81.5\), \(\sigma =2.17\)), indicating good usability of the system. Nonetheless, the NPS was relatively low (-7). We recorded 4 detractors, 8 passives, and 4 promoters. This means that the system needs further development to align it with user requirements.

5.3 Qualitative Results

In addition to the quantitative evaluation, we screened the user study recordings for mentions of various categories. We also analyzed how users interacted with the prototypes and, in particular, how surprised they were. Furthermore, we asked users what they learnt about their organization and how much the visualization improved their knowledge of the organization.

Looking for Relevant Publications. An interesting observation during the usage of the first prototype was the variety of styles in which it was used. We identified three different approaches to using the filtering mechanism.

The first style was a drill-down approach. Users who applied this approach first looked at the full graph including all publications from their institute, resetting all filters before looking for a recommendation. They then used generic terms that were of interest to them and played with the relevance sliders to drill down further on interesting suggestions. More experienced users then looked for items they did not yet know while keeping their focus on the center of the graph, where the items connected to all relevant filter terms are located.

The second style was an incremental bag approach. These users started with a very specific term that was of current interest to them. Often dissatisfied with the few results, they gradually extended the bag of filters with further specific terms. Interestingly, these users reported finding very relevant suggestions, albeit often ones they already knew.

The third style was a traverse-related-work approach. Users applying this style looked for single items that they found interesting and sequentially added keywords that were relevant to that single item, repeating this process several times. This can be seen as a traversal along keywords, often leading to utterances like “we have someone writing about this, I was looking for something like that”, indicating very serendipitous finds.

Finding Fruitful Collaborators. When using the second prototype, most users were astonished by how much the visualization revealed about the organizational structure. Users were surprised to see that others in the organization (out of 180 researchers) were working on topics similar to their own. In particular, seeing who has worked with whom was interesting, as this information is hard to extract from publication lists.

Discovering New Knowledge About the Organization. Both prototypes were able to reveal new knowledge by visualizing publicly available information in a new fashion. Both prototypes led users to serendipitous finds, either publications or possible collaborators, from a pool that was theoretically already available to them. Nevertheless, the effort required to look for this information manually had been a barrier (both organizations had existed for more than 5 years).

In both cases the head of the organization was asked to use the prototypes and talk about the benefits of the visualization for management purposes. Both mentioned the benefits of getting overview knowledge about their organization. They were also surprised to see how many publications had been written since the funding of the organization and realized the scope of their organization. Interestingly, a need to communicate publications to teams arose during these experiments.

6 Conclusion

In this paper, we have examined two knowledge systems. The first system recommends relevant documents to the user in the form of a visual recommender system. It thereby provides the user with the means to directly influence the recommender algorithm and the recommender visualization. The second system, in contrast, supports the user in finding suitable future collaborators who can contribute to the user’s topic.

In conjunction, both systems provide insights into what colleagues do and how their work can be useful with respect to the user’s own work. This knowledge can be used either for collaboration or as a basis for one’s own work. The systems help in creating knowledge from data and information through their specifically adapted visualizations.

During use, ideas for various new applications arose. Users saw a need to extend the visualization to other types of documents. In particular, showing not only publications but also grant proposals for a whole university, as well as collaboration suggestions within the university, was mentioned as a possible application.

6.1 Limitations

Both tools require PDFs and full texts with meta information to work properly. The studies were conducted with relatively small user groups because of the intensive analysis required after the test. Therefore, the experiment is not able to reveal any effects between user factors and usage behavior. The observed differences were too small to be statistically significant, so no such effects can be claimed. In spite of that, the experimenter felt the need to report that less experienced researchers were using the tool differently than experienced researchers. Further analysis of the videos might reveal these differences at a later point in time.

6.2 Outlook

The user studies indicate that both prototypes can be used not only to assist scientific collaboration but also in organizational knowledge management. Here they can be used to interlink documents from an enterprise content management system in order to find relevant documents when working on another. The aim of our prototypes is to bring the various levels of collaboration support together. Documents can be served from the intranet or CMS and be connected with collaborative search and tagging from a social portal. The publication recommender system TIGRS brings these together by providing relevant documents to the user integrated into the social portal. The collaboration suggestion system even goes a step further as it actively recommends suitable collaborators for the users of the social portal. Nonetheless, only in conjunction can they help in understanding the organizational structure, the employees and the topics that are being worked on collaboratively.