1 Introduction

Scholars and professionals in various sectors of the economy, including public administrators, corporate compliance officers, and auditors, deal with an ever-increasing flow of information (new scientific publications, business documents and multimedia files, laws, etc.). They need sophisticated tools to evaluate all this information quickly and accurately and to visualize the analysis results. Specifically, this means that, on the one hand, they need tools that enable state-of-the-art search and semantic analysis of large volumes of digital content, by providing: (i) access to an extensive source inventory, (ii) advanced search and visualization methods, and (iii) functionalities for generating new knowledge from these digital assets. On the other hand, these tools need to be reasonably easy for their users to understand and should support them through: (i) a detailed and scientifically proven help system (tutorials, guidance), (ii) individually configurable training programmes (learning modules, videos), and (iii) a lively community of people who have similar interests or problems to solve. To face these challenges, the interdisciplinary trans-European project called MOVING (“TraininG towards a society of data-saVvy inforMation prOfessionals to enable open leadership INnovation”) (Vagliano et al. 2018) has built an innovative training platform that enables users from various societal sectors to fundamentally improve their information literacy by training in how to choose, use, and evaluate data mining methods in their daily research and business tasks, and to become data-savvy information professionals.

2 Digitized Science

Initiatives by the European Union (which has long been pursuing a digital agenda) to support research in the field of digitized science illustrate the need to investigate related change processes (European Commission 2016). Obviously, empirical and theoretical justification is needed to develop the practice of science. The approach dealt with here was developed in the MOVING project, which offers an innovative training platform to help scientists and other users from all areas of society fundamentally improve their information literacy in research-oriented contexts. The project is about training users to select, apply, and evaluate technologies and data mining methods, so that the relevant research staff can develop into ‘data-savvy’ information professionals in their daily research routines (Scherp et al. 2016; Köhler et al. 2016a, b).

In terms of content, the methodological changes in scientific practice cannot easily be explained as domain-specific activities. Explaining them requires analyses of both current technological developments and the changes in how scientists use these technologies (or methods). The eScience Saxony research network provides statements on both perspectives (see, e.g., Pscheida et al. 2013, 2014). The network has observed the following:

  • there is great potential for the use of new digital tools in research;

  • preferred topics for development are scientist collaboration and the visualization of (often large or new) databases;

  • transitions between the subject areas of research and teaching can also be observed in technology development;

  • almost all scientists do most of their work using computer-based technologies and have access to appropriate infrastructures;

  • scientists sometimes find it difficult to adopt new media technologies in research and teaching (e.g. social media), although there are also subject-specific differences;

  • there is still uncertainty regarding the requirements, possibilities, and assumed risks of open-access publishing;

  • research methodology has not been fully systematically discussed and is often inadequately implemented;

  • there are no clear standards for high-quality research technology and no recognizable institutionalization to support open-access trends in science, so these still need to be worked out together;

  • digital change in science is comparatively rapid from an individual (scientist) perspective, and the outcome is not known, especially regarding location-determining infrastructures.

This list largely matches the demands addressed by the MOVING project. MOVING nevertheless set two additional priorities. First, the project set out to address research activity not only in academia but also in public administration and industry. Second, when developing the approach, the project consortium decided to focus directly on the related skill development, i.e. to invest serious effort in innovation in the educational dimension (the Online Literacy Training and Learning) that needs to accompany any new technology in every sector.

3 Overview of the MOVING Platform

An overview of the MOVING platform architecture is illustrated in Fig. 1, which shows the most important components and their relationships. The main component blocks are (i) data acquisition, (ii) data processing, (iii) back-end data storage, user tracking, search and recommendation, and (iv) the MOVING web application that includes the front-end search. In this section, we briefly describe the overall platform.

Fig. 1
A block diagram depicts the framework of the moving web application. It consists of data acquisition, data processing, moving web applications with a front-end search, and adaptive training support.

MOVING platform architecture

The MOVING web application is the core of the platform and the interface to the user. The main entry points to the web application are the community section, the learning environment, and the search interface. The search interface offers different visual representations of search results. These visualizations allow the user to explore the search results in various ways. For this purpose, four visualizations have been added to the MOVING platform, namely: (i) the Concept Graph, which displays the search results as an interactive network, (ii) uRank, a dynamic document ranking view, (iii) Top Properties, a bar chart visualization that aggregates the results based on their properties, and (iv) a Tag Cloud, showing the most frequently occurring keywords. Moreover, the Adaptive Training Support (ATS) widget supports users learning how to search and provides material suited to their needs (Fessl et al. 2018) and the Recommender System (RS) widget (bridging the front and back ends of the platform) points users to potentially relevant documents by evaluating their last search queries. Thanks to its responsive design, all the views adapt to different screen sizes, automatically changing the layout according to the capabilities of the device.

Private user data and public documents are stored in three separate databases: The web application database holds the data for the communities, the learning environment, and the ATS. The index holds the public documents and generated metadata information such as topics, authors, and extracted entities. The user-interaction tracking captures user interactions with the web application and stores them securely in a third database. User tracking provides additional data for both the ATS and the RS, which form the basis for user support by these two widgets.

The index used by the search interface is populated by various data acquisition components (e.g. web crawlers and a Bibliographic Metadata Injection service), to increase the amount of data accessible through the MOVING platform. To date, it hosts over 22 million documents and metadata records. These records include books, scientific articles, laws and regulations, documents about funding opportunities, videos (e.g. of lectures and tutorials), and social media posts. Data processing components are applied to these records to improve the quality of the data and make it easier to search. These components, namely the Data Integration Service, Author Name Disambiguation, Deduplication, Named Entity Recognition and Linking, and Video Analysis, all refine and enrich the documents stored in the index.

Author name disambiguation addresses the problem that the same author name often belongs to several different real-world authors. To deal with this problem, a novel method (Backes 2018a, b) has been developed which applies, for a given author name, agglomerative clustering on features extracted from the documents containing the author mention in question, such as affiliation, co-authors, referenced authors, email addresses, keywords, and publication years. The disambiguation procedure calculates the probability that author mentions with the same name belong to the same person. Name mentions with a high probability of belonging to the same author are assigned a unique internal author ID. In this way, authors with the same name are distinguished if they refer to different real-world persons. As a result, users who click on the name of an author of a document in the result list of a search will only see documents from authors who have the same author ID as the selected author (rather than all documents authored by any person with that name). A modified version of this method has been applied for document deduplication.
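To make the clustering idea concrete, the following is a minimal sketch of agglomerative clustering over the mentions of a single author name. It is not the method of Backes (2018a, b): the Jaccard similarity, the merge threshold, the feature strings, and the names `jaccard` and `disambiguate` are all assumptions made for illustration, and a plain similarity score stands in for the probability computed by the actual procedure.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap between two feature sets, as a value in [0, 1]."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def disambiguate(mentions, threshold=0.2):
    """Cluster the mentions of ONE author name (illustrative only).

    mentions:  dict mapping a mention id to a set of feature strings
               (affiliation, co-authors, keywords, ...).
    threshold: assumed minimum similarity required to merge two clusters.
    Returns a dict mapping each mention id to a hypothetical author ID.
    """
    # Start with one cluster per mention: (member ids, pooled features).
    clusters = [({mid}, set(feats)) for mid, feats in mentions.items()]

    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        best_sim, best_pair = -1.0, None
        for i, j in combinations(range(len(clusters)), 2):
            sim = jaccard(clusters[i][1], clusters[j][1])
            if sim > best_sim:
                best_sim, best_pair = sim, (i, j)

        # Stop once even the closest pair is too dissimilar to merge.
        if best_sim < threshold:
            break

        i, j = best_pair
        merged = (clusters[i][0] | clusters[j][0],
                  clusters[i][1] | clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)

    return {mid: f"author-{n}"
            for n, (members, _) in enumerate(clusters)
            for mid in members}


# Three mentions of the same (made-up) name "J. Smith".
example = {
    "doc1": {"aff:Univ A", "co:R. Lee", "kw:search"},
    "doc2": {"aff:Univ A", "co:R. Lee", "kw:ranking"},
    "doc3": {"aff:Univ B", "co:M. Kim", "kw:biology"},
}
author_ids = disambiguate(example)
```

In this toy input, the first two mentions share an affiliation and a co-author and end up with the same ID, while the third mention keeps its own.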

In the following, we present the front end of the MOVING platform in detail, in order to provide a concise summary of what a user can do with it. For details on how individual data processing, data acquisition, and other back-end components work, the interested reader is referred to the relevant publications, such as (Nishioka and Scherp 2016; Galanopoulos and Mezaris 2019; Tzelepis et al. 2018), as well as the documentation available on the MOVING project web site.

4 The MOVING Web Application

4.1 Search

Search is a key functionality in the MOVING web application. At the back end, the MOVING search engine is based on Elasticsearch, configured with appropriate parameters and fine-tuned to efficiently index tens of millions of documents. At the front end, the user sees a search page (Fig. 2), with various search options and filters on the left, visualizations of the results in the centre of the window, and training functionalities such as ATS on the right. The search history of the current user can also be viewed, to support future searches.

Fig. 2
An interface of the Moving platform depicts the filter options on the left, search results in the form of a tree diagram in the middle, and training functionalities on the right side of the page.

MOVING search and results page

To enable platform users to view and replicate their previous searches, the search history view is connected with WevQuery (Apaolaza and Vigo 2017). WevQuery serves as an interface to the data generated by UCIVIT (Apaolaza et al. 2013), the tracking tool that logs user-interaction data. From WevQuery, we obtain the user's previous search queries, the time at which each query was performed, and the number of documents retrieved. This information is then used to build the search history view, an example of which is shown in Fig. 3.

Fig. 3
An interface of the Moving platform depicts the result page for the recent search. The result displays a table with columns titled id, name, query, document, and date last run.

Search history view

To present the results of a user query effectively, several visualizations have been implemented. Four characteristic ones are:

  • Concept Graph. For the discovery and exploration of relationships between documents and their properties.

  • uRank. A tool for the interest-driven exploration of search results.

  • Top Properties. A bar chart displaying aggregated information about the properties of the retrieved documents.

  • Tag Cloud. A visualization for the analysis of keyword frequency in the retrieved documents.

Concept Graph: an interactive network visualization. The Concept Graph (Fig. 4) visualizes direct and indirect connections between retrieved search results. For example, a single, disambiguated author of two different publications is visualized as a node in the graph connecting the corresponding publications. Further extracted and disambiguated entities are visualized so that users can quickly grasp structures such as research networks. The initial graph visualization starts with a few collapsed nodes. These nodes can be expanded to visualize initially hidden nodes and to incrementally add more information to the graph. Thus, users are not overwhelmed with too much information when they start their search.

Fig. 4
An interface depicts the result of the concept graph. It provides an option to filter by text, edges, node types, and year and displays the two different publications in the form of nodes.

Concept Graph with opened filter menu

uRank: interest-based result set exploration. Based on the search query the top 100 retrieved results are displayed as a ranked list. The keywords extracted from the results are presented in the Tag Cloud in the right sidebar of uRank (Fig. 5, point A). By selecting keywords of interest, the results in the list (Fig. 5, point C) are re-ranked in such a way that the results containing the selected keyword move to the top. The ranking view (Fig. 5, point D) provides visual feedback on the relevance of the result. It is possible to select multiple keywords and even fine-tune their importance by using the slider under the selected words (Fig. 5, point B). Clicking on a result opens a dialogue box, which presents additional information about the retrieved document. The user can export the current view of uRank, with the current search configuration, by clicking on the export button, which initiates the download of a zip file containing an image and a report text file.

Fig. 5
An interface depicts the search result in the form of a ranked list. On the left side, it displays document titles and on the right side of the page, it depicts search keywords.

uRank and its components—(A) tag cloud, (B) tag box, (C) result list, (D) ranking view
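The keyword-driven re-ranking in uRank can be illustrated with a short sketch. The text does not give uRank's scoring function, so the weighted sum below is an assumption, as are the function name `rerank` and the shape of the input data; the slider value of each selected keyword is modelled as a weight between 0 and 1.

```python
def rerank(results, keyword_weights):
    """Re-rank search results by the user's selected keywords (illustrative).

    results:         list of (title, {keyword: frequency}) pairs, in the
                     order returned by the search engine.
    keyword_weights: {keyword: weight in [0, 1]}, one entry per selected
                     keyword; the weight models the slider position.
    """
    def score(result):
        _, keyword_counts = result
        # Assumed scoring: weighted sum of the selected keywords' frequencies.
        return sum(weight * keyword_counts.get(keyword, 0)
                   for keyword, weight in keyword_weights.items())

    # sorted() is stable, so results with equal scores keep the
    # original search-engine order.
    return sorted(results, key=score, reverse=True)


results = [
    ("A", {"search": 1}),
    ("B", {"search": 3, "ranking": 1}),
    ("C", {"ranking": 2}),
    ("D", {}),
]
reranked = rerank(results, {"ranking": 1.0, "search": 0.5})
```

With these weights the titles come out in the order B, C, A, D. With no keywords selected, every score is zero and the original order is preserved.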

Top Properties: the Top Properties visualization uses 100 of the most relevant results from the current search query. It shows a bar chart visualization presenting one of the following properties of the available results: Authors, Keywords, Concepts, Sources, and Year of Publication. The results are ordered according to the most frequent values of the selected property, as can be seen in Fig. 6. When the publication year is selected, the sorting order changes so that the years are displayed in chronological order to make it easier to identify year-on-year changes. Clicking on one of the bars shows the results associated with this property in a small dialogue box. The results in this dialogue are sorted in the order provided originally by the search engine. The Top Properties visualization also supports an export functionality, which exports the current view of the visualization with its search configuration.

Fig. 6
An interface depicts the result for the occurrence of the selected property for the 100 most relevant results in the form of a horizontal bar chart.

The Top Properties visualization with the dialogue box showing the result list for a bar of interest
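The aggregation behind a Top Properties chart can be sketched in a few lines. The record layout, the property names, and the function `top_properties` are hypothetical; only the behaviour (counting values over the 100 most relevant results, ordering by frequency, and ordering years chronologically) follows the description above.

```python
from collections import Counter


def top_properties(results, prop, limit=10):
    """Aggregate one property over the 100 most relevant results.

    results: list of dicts in search-engine order; a property value may be
             a single value or a list of values (hypothetical layout).
    Returns (value, count) pairs: chronological for "year", otherwise
    most frequent first and cut off after `limit` entries.
    """
    counts = Counter()
    for doc in results[:100]:
        values = doc.get(prop, [])
        if not isinstance(values, list):
            values = [values]
        counts.update(values)

    if prop == "year":
        # Years are shown in chronological order rather than by frequency.
        return sorted(counts.items())
    return counts.most_common(limit)


results = [
    {"authors": ["Ada", "Bo"], "year": 2018},
    {"authors": ["Ada"], "year": 2016},
    {"authors": ["Cy"], "year": 2018},
]
by_author = top_properties(results, "authors")
by_year = top_properties(results, "year")
```

Each returned pair corresponds to one bar: its label and its length.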

Tag Cloud: the Tag Cloud visualization (Fig. 7) takes the 100 most relevant results of the search query and shows the most frequent keywords that occur in their titles and abstracts. The displayed keywords are initially sorted by frequency and can be filtered by occurrence, year, or text. Clicking on one of the keywords shows the results associated with that keyword. The results are sorted in the order originally provided by the search engine.

Fig. 7
An interface depicts the result for the tag cloud for the 100 most relevant results. The horizontal scroll bar at the bottom exhibits the year range from 1086 through 2018.

Tag Cloud visualization with a dialogue box showing the result list for a keyword
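As an illustration of how keyword frequencies for a tag cloud might be derived from titles and abstracts, consider the sketch below. The tokenization rule, the tiny stop-word list, and the name `tag_cloud` are assumptions; the platform's actual keyword extraction is not described here and is likely more sophisticated.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "with"}


def tag_cloud(results, top_n=20):
    """Count keyword frequency in the titles and abstracts of the top results.

    results: list of dicts with optional "title" and "abstract" strings
             (hypothetical layout), in search-engine order.
    Returns up to `top_n` (keyword, count) pairs, most frequent first.
    """
    counts = Counter()
    for doc in results[:100]:
        text = f"{doc.get('title', '')} {doc.get('abstract', '')}".lower()
        # Naive tokenization: lowercase words of at least two characters.
        tokens = re.findall(r"[a-z][a-z\-]+", text)
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)


results = [
    {"title": "Open science and data literacy",
     "abstract": "Data literacy for open research"},
    {"title": "Data mining", "abstract": ""},
]
tags = tag_cloud(results)
```

The counts can then be mapped to font sizes, and filtering by year or text would simply restrict `results` before counting.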

4.2 Recommender System

The RS widget, depicted in Fig. 8, is part of the search page. It gives users additional suggestions for resources of which they may not be aware. The RS interacts with the search engine, user-interaction tracking, and dashboard (WevQuery), hence bridging the back and front ends of the MOVING platform. To build user profiles, it obtains the search history from the user data previously logged through UCIVIT and then retrieves the documents to suggest from the index, depending on the user’s profile. The MOVING RS is based on HCF-IDF (Nishioka and Scherp 2016), a novel semantic profiling approach that can exploit a thesaurus or ontology to provide better recommendations. Further information on the MOVING RS is available elsewhere (Vagliano and Nazir 2019).

Fig. 8
An interface depicts the list of recommended documents. It displays the result for an audio file, a pdf file, and a javascript file.

The Recommender System widget suggesting three new items to the user: a video, an article, and a web page (Vagliano and Nazir 2019)
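The general idea of profiling a user with thesaurus concepts can be sketched as follows. This is not an implementation of HCF-IDF (Nishioka and Scherp 2016): the toy thesaurus, the decay factor, the omission of any inverse-document-frequency weighting, and the names `concept_profile`, `cosine`, and `recommend` are all simplifying assumptions. The sketch only shows how a concept hierarchy lets a document about a broader topic match a narrower search interest.

```python
import math
from collections import Counter

# Hypothetical toy thesaurus: concept -> broader concept.
BROADER = {
    "neural networks": "machine learning",
    "machine learning": "computer science",
    "open access": "publishing",
}


def concept_profile(concepts, decay=0.5):
    """Count concepts and pass a decayed share of each count up the hierarchy."""
    profile = Counter()
    for concept in concepts:
        weight = 1.0
        while concept is not None:
            profile[concept] += weight
            concept = BROADER.get(concept)  # None once the top is reached
            weight *= decay
    return profile


def cosine(p, q):
    """Cosine similarity between two sparse concept profiles."""
    dot = sum(p[c] * q[c] for c in p.keys() & q.keys())
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0


def recommend(history_concepts, documents, k=3):
    """Return up to k document ids whose profiles best match the user's.

    history_concepts: concepts found in the user's logged search queries.
    documents:        {doc_id: [concepts annotated on the document]}.
    """
    user = concept_profile(history_concepts)
    scored = [(cosine(user, concept_profile(concepts)), doc_id)
              for doc_id, concepts in documents.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True)[:k]
            if score > 0]


documents = {
    "d1": ["machine learning"],
    "d2": ["open access"],
    "d3": ["neural networks"],
}
suggestions = recommend(["neural networks"], documents)
```

Here a user who searched for neural networks is offered `d3` first and then `d1`, which matches only through the broader concept, while the unrelated `d2` is left out.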

4.3 Communities

Open collaboration and communication are the foundations of open innovation and open science. MOVING communities offer users a powerful tool to organize group collaboration and communities of practice on the MOVING platform (see Fig. 9). MOVING communities are part of the working environment of the platform and offer a range of social technologies for knowledge and information management, including wikis, forums, blog functions, and group news. They build on the project management tools and technologies of the eScience platform, on which the MOVING platform itself is based. The existing eScience modules, which enabled cooperation in closed teams of researchers, were adapted to the goals of the MOVING platform to provide an open innovation environment and foster open collaboration, communication, and knowledge exchange between its users.

Fig. 9
An interface depicts the result for the community page of the platform. It displays the options for the net community, curriculum implementation, digital auditing, and eye tracking in visual analytics recommendations.

MOVING communities

Registered users who want to create a new community are offered different options. First, users can create public communities that are visible to everyone in the MOVING platform and can be accessed and edited by anyone interested in the topic. Second, users who want to organize specific project teams or research groups can create private communities that users have to join before they can access and edit content. Private communities are not visible to other users but can be shared with collaborators via email.

The MOVING CK Editor enables the creation of formatted text and the integration of multimedia content in HTML pages that are created by users in the MOVING communities. Videos, pictures, GIFs, documents, and social media content from Twitter and YouTube can all be easily integrated. Features like the accordion and the option to include expandable items make it easy to structure content in the page. It is a WYSIWYG editor (What You See Is What You Get), so even users who are not familiar with HTML can use it easily to create and edit web-based content within MOVING communities.

The wiki module is useful for creating and collaboratively managing large knowledge repositories with a community. The forum module provides space for open communication and information exchange—a precondition for open innovation processes. The forum module contains a user rating functionality that allows the community to publicly rate the content of individual forum entries. Users can vote posts and replies up and down, based on the quality of the contribution. The highest-rated input is highlighted to help users find the best response in a thread, and the summarized score for all received votes is shown on each user profile. The ranking functionality helps communities self-organize and peer assess user-generated content. Community administrators can also choose to assign badges to reward users or motivate them to get actively engaged. Badges can be assigned automatically or manually.
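The rating logic described for the forum can be illustrated with a small sketch. The data layout and the function `summarize_votes` are hypothetical, and the score of a post is assumed to be its up-votes minus its down-votes, which the text does not specify.

```python
from collections import defaultdict


def summarize_votes(posts):
    """Summarize the votes in one forum thread (illustrative only).

    posts: list of dicts with "id", "author", "up", and "down" keys.
    Returns (id of the highest-rated post, {author: total score}).
    """
    # Assumed scoring rule: up-votes minus down-votes.
    scores = {post["id"]: post["up"] - post["down"] for post in posts}

    # The highest-rated post is the one that would be highlighted.
    best = max(scores, key=scores.get) if scores else None

    # Total score per user, as it would appear on the user profile.
    per_user = defaultdict(int)
    for post in posts:
        per_user[post["author"]] += scores[post["id"]]
    return best, dict(per_user)


thread = [
    {"id": 1, "author": "ana", "up": 2, "down": 1},
    {"id": 2, "author": "ben", "up": 5, "down": 0},
    {"id": 3, "author": "ana", "up": 1, "down": 3},
]
best_post, user_scores = summarize_votes(thread)
```

In this example the second post would be highlighted, and the two posts by the same author are combined into one profile score.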

The ease of user-generated content creation and integration, combined with the social features of MOVING communities, opens up a wide range of possible applications. Users can organize group work in small project teams, or create open communities around scientific or technical topics to discuss research or ask questions to an expert community. MOVING communities can be organized as an open innovation tool but also as a learning management system, as the following example shows.

One practical application of MOVING communities is the four-week MOVING MOOC (massive open online course) Science 2.0 and open research methods that was organized on the MOVING platform (see Fig. 10). The MOOC was organized on the platform as a private team community, so that participants had to register to gain access to the learning materials and the forums. For each week of the MOOC, we created a sub-community containing learning materials in different media formats as well as weekly assignments. The forums were used to organize group communication and allow users to share their assignment results. A wiki was created that contained additional information about the course, learning goals, and technical details about using the editor or the MOOC badges that users can earn on the course (Fig. 11). Badges are displayed on the user’s profile, My page, along with their personal and contact details (profile picture, science field, skills, hometown, institution, email, ORCID).

Fig. 10
An interface depicts the result for the Mooc community page. It displays the options to register or sign in in the top right corner. In the middle, it provides the option to enroll.

MOVING MOOC community

Fig. 11
An illustration depicts the badges of the moving platform for all the moving MOOC participants at the top. At the bottom, it displays a badge for open science aficionado.

MOVING MOOC badges

4.4 Learning Environment

MOVING offers a unique combination of working and training features in one platform. The heart of the training programme is the MOVING learning environment. Here, all the learning content is organized and directly accessible to the users. The landing page (Fig. 12) gives an overview of the learning materials, including the platform demo videos and video tutorials, the Learning Tracks for Information Literacy 2.0, and the MOVING MOOC Science 2.0 and open research methods, which was discussed in the previous subsection. The platform demos are videos hosted on videolectures.net and are embedded in the learning environment so that users can learn about the different platform features and technologies developed within the MOVING project. Users can improve their data and information literacy as well as digital competences through Learning Tracks for Information Literacy 2.0 (Fig. 13).

Fig. 12
An interface reads welcome to the moving learning platform. It displays three blocks for moving learning tracks, platform demos, and moving Mooc science and open result methods.

MOVING learning environment

Fig. 13
A welcome page of the moving platform depicts search, communities, learning, contacts, Mooc, my page, and the sign-out option on the top. It displays several other options on the left.

Start page of Learning Tracks for Information Literacy 2.0

4.5 Adaptive Training Support

The ATS (Fessl et al. 2018) comprises two widgets: one for learning how to search and one for curriculum reflection.

The Learning-how-to-search widget (Fig. 14) visualizes information about the use of features provided by the MOVING platform. It shows users, in a bar chart, how they have used the features of the platform, to motivate them to explore new features and reflect on their usage behaviour. More information about the widget and its evaluation can be found in Fessl et al. (2019).

Fig. 14
An interface depicts the bar chart for comparison of the search results for the input interface and result presentation. At the bottom, it displays the button to submit an answer for experience using the result list feature.

Learning-how-to-search widget: The tracked features are separated into features of the search input interface and search result presentation

The curriculum reflection widget (Fessl et al. 2019) consists of two parts: curriculum learning and reflection, and overall progress. The first part consists of two main areas. The upper area contains either a learning prompt (suggesting that the user learn more about the next topic in the current sub-module) together with a button which opens the respective learning unit in a new tab (Fig. 15 left), or a reflective question that motivates the user to think about the current topic of their learning (Fig. 15 right). The user’s progress in the current sub-module is displayed at the bottom of the widget.

Fig. 15
An interface depicts the widget to evaluate the information for learning on the left. On the right, the interface depicts the reasons to stop the progress of evaluating information. Both interfaces contain a progress % bar at the bottom.

Curriculum reflection widget: curriculum learning (left) and reflection (right)

The overall progress part of the widget shows the user’s learning progress through the curriculum using a sunburst visualization. Figure 16 shows that the curriculum is divided into three modules. Each module is represented as a section in the inner circle of the visualization and divided into three sub-modules in the outer circle. Every time a user completes a new learning unit, the percentage in the respective section in the sunburst diagram is updated. Progress in each sub-module is encoded by colour. If the user has not completed any learning units in a sub-module (0%), the respective section will be red. Making progress in a sub-module will turn the section yellow (50%) and completing it will turn the section green (100%).

Fig. 16
An interface depicts a doughnut chart for overall progress. the chart contains the information for content creation, information and data literacy, and communication and collaboration.

Overall progress widget: The first module was completed and the second module is in progress

This is also explained by the legend below the visualization. Moreover, the sections in the sunburst diagram are ordered to mirror the structure of the curriculum. Starting from the top, the sub-modules are completed clockwise, gradually turning the visualization green.
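The colour encoding of the sunburst sections can be expressed as a small function. The sketch assumes three discrete states (no progress, partial progress, complete); whether the widget uses discrete colours or a gradient between them is not stated here, and the function name `section_colour` is hypothetical.

```python
def section_colour(completed_units, total_units):
    """Map progress in one sub-module to a colour (assumed three states)."""
    if total_units <= 0:
        raise ValueError("a sub-module needs at least one learning unit")
    progress = completed_units / total_units
    if progress == 0:
        return "red"      # no learning unit completed yet
    if progress < 1:
        return "yellow"   # sub-module in progress
    return "green"        # sub-module completed


# One module with three sub-modules: done, half done, not started.
colours = [section_colour(4, 4), section_colour(2, 4), section_colour(0, 4)]
```

Calling the function after every completed learning unit would reproduce the gradual change to green described above.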

5 Conclusion

In this chapter, we presented the MOVING platform, focusing on the MOVING web application with its search interface and novel results visualizations, community features and learning environment, and components such as the Adaptive Training Support. These functionalities help users not only to search within and visualize a large multimedia collection using various advanced tools, but also to explore the platform more easily, e.g. by showing statistics about their platform use or providing learning guidance. Productive use of the prototype platform in real educational environments, such as the MOVING MOOC, showed how its integrated training and working environment contributes to making information professionals data-savvy and improving users’ information literacy skills.