1 Introduction

Fig. 1 Overview of the framework. (1) A graph of biblio-records is constructed for the arrived documents; documents are connected if they mention the same terms, are written by the same author, etc. (2) Next, we have a human-in-the-loop learning iteration for learning the latent space in the user’s mind. Then, the learner predicts the positions of newly arrived papers in the space

Every researcher must conduct literature reviews, and researchers working on different topics have personalized needs for managing their documents. They must organize publications according to their own criteria to find relevant research and understand the trends in their field.

However, personalized literature management faces two significant challenges. First, researchers must manage a large volume of research. Fire [6] found that more than seven million new scholarly studies have recently been published annually.

Therefore, traditional approaches, such as tree hierarchies of document folders and tag-based management, are no longer effective. There is a need for automated literature management techniques.

Second, accessing the content of a paper is challenging. Although bibliographic information is available to everyone, many papers can only be accessed through paid services. According to Nicolson et al. [15], 65% of the 100 most cited papers were paywalled. This is a major barrier to researchers accessing relevant literature.

Therefore, methods for automatic literature management that use the literature contents [24, 31, 40] have limited applicability.

With this background, this study develops an interactive tool for personal literature management that relies on bibliographic records and does not require access to the contents of papers. The tool asks the researcher to place icons corresponding to papers in a two-dimensional space on the screen using their own criteria, and then predicts where the user would place newly arrived papers. Figure 1 illustrates this process. First, since the relationships among bibliographic records are naturally modeled as a graph, the set of biblio-records is represented as a heterogeneous graph whose nodes correspond to papers, authors, conference names, years, etc. (Fig. 1(1)). The graph connects papers that share the same authors, the same years, and so on. Subsequently, the machine learner that implements our human-in-the-loop latent space learning method (Sect. 4) computes and visualizes positions in a two-dimensional space on the screen that corresponds to the space for papers that exists in the researcher’s mind (see Fig. 1(2)). Next, the researcher provides feedback on the suggested positions by moving papers that are incorrectly placed according to their criteria to the correct positions. In the feedback phase, the researcher is shown details about each paper, including the title, authors, publication venue, and year. Then, the learner receives the feedback and updates the criteria in the space so that it can correctly predict the positions of newly arrived papers.

The interactive nature not only captures the current latent space of papers in each researcher’s mind, but also allows the system to follow the researcher’s criteria that are evolving over time [1].

Thus, our problem can be considered as latent space learning with a graph convolutional encoder–decoder model [18]. Here, the encoder and decoder map the paper nodes in the graph to points in the latent space and vice versa, and the objective function is the cross-entropy loss for generating adjacency matrices for document clusters in the space. However, existing models do not support our human-in-the-loop approach; that is, they do not allow the user to provide interactive feedback to the latent space. Therefore, we developed a principled “human-in-the-loop latent space learning” method that estimates the management criteria of each researcher based on their feedback on the estimated positions of documents in a two-dimensional space on the screen. Our challenge is how to make the model capture the characteristics of the latent space for literature management.

1.1 Challenges and contributions

(1) We present a principled framework for interactive latent space learning in literature management. It is based on a common graph convolutional encoder–decoder model, in which the criteria for individual literature management are represented by the weights of a set of meta-paths (i.e., sequences of attributes in the schema of the bib-record data), which are a popular means of capturing the semantics of heterogeneous graphs [24, 32]. Our model is unique in that it is based on the following two assumptions. First, the user’s criteria in the latent space are consistent only locally. This assumption was inspired by results in psychology such as [35]. Thus, our first research question (RQ1) is whether each researcher has different criteria for different sub-spaces of the latent space.

Second, two papers are connected through paths on the graph if they are close to each other in the latent space. Therefore, unlike other popular graph convolutional encoder–decoder models, our decoder is based on the Euclidean distance between the latent vectors. Thus, our second research question (RQ2) is whether our decoder is effective.

(2) We show experimental results for ten academic researchers from the science, engineering, and humanities domains. The results answer the two research questions positively and show that the approach substantially outperforms a typical graph convolutional model. The resulting quality is practically useful in that the tool can place a new paper close to the correct position, although not necessarily at the exact one. This implies that our tool can help researchers manage relevant publications based on their own criteria.

(3) Based on the above experimental results, a natural question is whether we can devise an active learning method to improve the learning efficiency. We devised an active learning approach using uncertainty sampling. The challenge here is to define “uncertainty” in our problem setting, in which what the system learns is the importance of each meta-path for each cluster of each user. Thus, our third research question (RQ3) is whether we can develop an effective active learning method for this setting. We formally define our uncertainty in the setting and evaluate the framework experimentally. The results show that the uncertainty sampling strategy improves performance compared with random sampling, with statistical significance.

1.2 Limitations

This study does not intend to find the best feature set or the best performance in learning the latent space using bibliographic data that are potentially available to the public.

2 Related work

2.1 Literature management tools

Tools to assist researchers in organizing related papers are widely used, and studies have been conducted on such tools. Francese [7] conducted a survey at the University of Turin to determine how students and researchers manage their bibliographies. The results of the survey showed that EndNote was the most popular bibliography management software for researchers to manage their electronic literature online, used by 49% of the respondents, followed by BibTeX (11%) and Mendeley (9%). In general, such tools can automatically classify documents with objective criteria such as years and authors, but require explicit inputs from users (such as tags given to each paper) to manage them according to the users’ own criteria. By contrast, our system automatically estimates the user’s document management criteria and can map new documents onto the space so that the user can easily grasp how they are related to other papers.

2.2 Document classification, clustering, recommendation

Document classification, clustering and recommendations are of increasing interest because of the increasing number of academic papers that researchers must manage. Various methods have been proposed, such as hierarchical Bayesian clustering [14] and metric learning [25, 37]; however, almost all these approaches use natural language processing methods [12, 30]. Unlike our method, most existing methods classify, cluster, and recommend documents by analyzing the abstracts and content of papers assuming that the document contents can be accessed, which limits their applicability in the current digital library situation.

Studies have been conducted on personalized paper recommendation methods that do not require document contents [16, 21, 24, 36, 38]. Paper recommendation is orthogonal to the latent space learning problem in that the former does not identify any criteria for how researchers manage the papers, and our method does not address the problem of identifying papers to recommend. Combining these two approaches is an interesting topic for future research.

2.3 Active learning

Active learning is utilized in various machine learning techniques, including latent space learning [4, 26]. Typically, sampling strategies for active learning are designed to improve classification and regression in terms of their evaluation measures. In contrast, our feedback system on the placement of documents in the latent space serves as an oracle for latent space learning while allowing the criteria for organizing documents in each cluster to evolve through the interactive interface. In addition, some studies have used active learning for graph convolutional encoder–decoder models [2, 3]. In these studies, the information entropy and the graph structure are used to select the most informative nodes for the next iteration. Whereas the methods in these studies ask users about nodes with uncertain labels, our method determines the data to be asked about based on the uncertainty regarding the importance of the meta-paths.

2.4 Latent space learning

Latent space learning has been used to learn data features and to comprehend data patterns and/or structural similarities in various contexts. For example, PTE [33] is a semi-supervised latent space learning technique used for textual data. In addition, doc2vec [20] creates representations for each document using latent space learning.

Network-embedding techniques that consider latent semantics have attracted considerable attention for graphs [8, 11, 13, 28, 29]. Some techniques, such as Deepwalk [27] and Node2vec [9], rely on random walks to produce distributed representations of nodes; LINE [34] considers and embeds nodes that are indirectly connected by edges; and the GCN method of Kipf and Welling [18] learns the latent vectors of the nodes while considering the network structure. In addition, to fit autoencoders [17] to network data, the GCN was used in graph autoencoders (GAE) and variational graph autoencoders (VGAE) [19]. Both methods involve a two-layer graph convolutional network and reconstruct the adjacency matrix using an encoder–decoder algorithm. Our model is unique in that it addresses the local consistency of criteria in the latent space and adopts a distance-based decoder tailored for literature management.

3 Definitions and the problem

Table 1 Notations used in this paper

We discuss our problem using the notations listed in Table 1. First, we define the important concepts used in the discussion, and then define our problem.

3.1 Heterogeneous information network

Real-world systems, such as bibliographic information networks, are structured into HINs [5, 32]. A heterogeneous information network (HIN) is a special type of network structure that has multiple types of nodes and edges.

Definition 1

(Heterogeneous Information Network) An HIN is defined as a directed graph \(G(\mathcal {V}, \mathcal {E})\) with an object-type mapping function \(\tau : \mathcal {V} \rightarrow \mathcal {A}\) and a relation-type mapping function \(\phi : \mathcal {E} \rightarrow \mathcal {R}\), where \(\mathcal {V}\) and \(\mathcal {E}\) represent the sets of nodes and edges, and \(\mathcal {A}\) and \(\mathcal {R}\) are the sets of object types and relation types, respectively. In general, \(|\mathcal {A}| + |\mathcal {R}| > 2 \). For example, in a bibliographic information network, there are object types such as paper (P), author (A), term (T), year (Y), and venue (V), and relation types such as an author publishing a paper (A-P) or a paper being published in a venue (P-V). By constructing a schema of paths, called a meta-path, from these types of objects and relations, we can explain the rich semantics of an HIN.

3.2 Meta-path

Intuitively, a meta-path is a sequence of object types that can have an instance in the graph. For example, A-P and PAP are meta-paths. Meta-paths are commonly used to capture the rich semantics of HINs [24, 32].

Definition 2

(Meta-Path) A meta-path P is defined as \( \mathcal {A}_1 \xrightarrow {\mathcal {R}_1} \mathcal {A}_2 \xrightarrow {\mathcal {R}_2} \cdots \xrightarrow {\mathcal {R}_l} \mathcal {A}_{l+1} \) and defines a composite relation \( \mathcal {R} = \mathcal {R}_1 \circ \mathcal {R}_2 \circ \cdots \circ \mathcal {R}_l \) between types \(\mathcal {A}_1\) and \(\mathcal {A}_{l+1}\), where \(\circ \) denotes the composition operator on relations. As this study focuses on the relationships between papers, we consider only meta-paths whose starting and ending points are both papers (P). For example, the meta-path “Paper (P) − Author (A) − Paper (P)” indicates the relationship between papers written by the same author.
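As an illustration of Definition 2, the following minimal sketch derives a binary paper-to-paper adjacency matrix for a meta-path such as PAP from per-paper attribute sets. The function name `metapath_adjacency`, the toy author names, and the choice to leave the diagonal at zero (no self-loops) are assumptions made for this example and are not taken from the method itself.

```python
from itertools import combinations


def metapath_adjacency(paper_attrs):
    """Build a binary P-X-P adjacency matrix from per-paper attribute sets.

    paper_attrs[i] is the set of attribute values (e.g., author names) of
    paper i. Entry [i][j] is 1 when papers i and j share at least one value.
    Assumption: the diagonal stays 0 (a paper is not linked to itself).
    """
    n = len(paper_attrs)
    adj = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if paper_attrs[i] & paper_attrs[j]:
            adj[i][j] = adj[j][i] = 1
    return adj


# Hypothetical toy data: three papers and their author sets.
authors = [{"Sato", "Kim"}, {"Kim"}, {"Lopez"}]
A_pap = metapath_adjacency(authors)
```

The same function could be reused for other paper-to-paper meta-paths (for example, year or venue) by passing the corresponding attribute sets.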

3.3 Problem

We assume that a set of documents is represented as an HIN and that each document has features. In this study, we assume that the feature of a document is its index, so the features are represented as a matrix \( \textbf{X} \in \mathbb {R}^{|\mathcal {D}|\times |\mathcal {D}|} \). We construct adjacency matrices \( \{\textbf{A}^p\in \mathbb {R}^{|\mathcal {D}|\times |\mathcal {D}|} \}_{p\in \mathcal {P}} \), each of which represents the relationships between documents under a meta-path p. Additionally, user interactions are provided to estimate the user’s document management criteria. These interactions are denoted as a set of tuples \((\vec {z}_{d_i}, \hat{\vec {z}}_{d_i})\in \hat{\mathcal {Z}}\), where \(\vec {z}_{d_i}\) represents the initial position of the latent vector of \(d_i\) and \(\hat{\vec {z}}_{d_i}\) represents its position after the interaction. We formally define our problem as follows. Given a set of adjacency matrices \(\{\textbf{A}^p\}_{p\in \mathcal {P}}\), the document features \(\textbf{X}\), and a set of interactions \(\hat{\mathcal {Z}}\), we find \(\textbf{Z}_\mathcal {Q}\), the set of latent vectors of a set of unknown documents \(\mathcal {Q}\).

Fig. 2 One iteration of Observe-Feedback-Learn-Visualize. Given the visualized latent space of papers, the user gives feedback by moving papers from incorrect positions to the correct ones, and then the learning process updates the space based on the feedback

4 Proposed learning method

This section explains our proposed learning method, called ISLE (Interactive latent Space Learning). Algorithm 1 illustrates the structure of ISLE. The components of the algorithm are explained below.

To enable the model to identify the positions of documents in the latent space in the user’s mind, our method was designed based on two assumptions. First, the criteria for managing documents are local in the space in the user’s mind: when the researcher moves papers near some other papers, the criteria are consistent within that neighborhood, but consistency is not guaranteed elsewhere. Second, two papers are connected through paths on the graph if they are close to each other in the latent space. Therefore, unlike other popular graph convolutional network-based encoder–decoder models, our model’s decoder is based on the Euclidean distance between the latent vectors.

The learning phase of our proposed framework comprises three steps:

  1. Clustering the latent vectors;
  2. Estimating document management criteria in each cluster;
  3. Learning the latent vectors of documents based on graph autoencoders and obtaining the latent vector for the new document.

These steps are included in the iterations of our human-in-the-loop framework. Each time a user provides feedback, these steps update the clusters and fine-tune the models. Figure 2 illustrates the learning phase in one “move-learn-display” iteration of the framework shown in Fig. 1.

4.1 Clustering the latent vectors

The first step in our proposed method is to cluster the latent space in which the user provides feedback. (This corresponds to Lines 22–23 in Algorithm 1.) We use the k-means clustering method [10]. Clustering by k-means results in an adjacency matrix and a center of mass for each cluster. The k-means optimization problem is expressed as follows:

$$\begin{aligned} \{\textbf{r}_{d_i}\}_{d_i\in \mathcal {D}},\{\vec {\mu }_k\}_{k\in [n_c]} = \mathop {\mathrm{arg\,min}}\limits _{\{\textbf{r}_{d_i}\},\{\vec {\mu }_k\}} J, \end{aligned}$$
(1)

where \(\textbf{r}_{d_i} = (r_{d_i,1},\ldots ,r_{d_i,n_c})^\top \) represents the cluster assignment vector of the document \(d_i\). Each element \(r_{d_i,j}\) is one if document \(d_i\) belongs to cluster j and zero otherwise. \(\vec {\mu }_k \in \mathbb {R}^L\) is the centroid vector of cluster k. The objective function J is defined as follows:

$$\begin{aligned} J = \sum _{d_i \in \mathcal {D}}\sum _{k \in [n_c]} r_{d_i,k} \left\Vert \vec {z}_{d_i} - \vec {\mu }_k \right\Vert _2^2, \end{aligned}$$
(2)

where \(\vec {z}_{d_i}\) is the latent vector of a document \(d_i\). After solving the k-means clustering problem, we obtain the k-th cluster

$$\begin{aligned} \mathcal {C}_k = \{d_i \in \mathcal {D} \mid r_{d_i, k}=1\}. \end{aligned}$$
(3)
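Equations (1)–(3) can be illustrated with a small sketch of Lloyd's algorithm, the usual heuristic for the k-means objective. This is only an illustrative stand-in written in plain Python: the function names, the random initialization, the fixed iteration count, and the toy points are assumptions for the example and do not describe the implementation used in this study.

```python
import random


def kmeans(points, n_clusters, n_iter=50, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, n_clusters)  # assumed initialization
    assign = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: r_{d_i,k} = 1 for the nearest centroid.
        assign = [
            min(range(n_clusters),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for k in range(n_clusters):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return assign, centroids


def objective(points, assign, centroids):
    """Value of J in Eq. (2) for a given assignment and set of centroids."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centroids[k]))
        for p, k in zip(points, assign)
    )


# Hypothetical two-dimensional latent vectors of four documents.
Z = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
assign, mu = kmeans(Z, n_clusters=2)
# Clusters C_k of Eq. (3), as lists of document indices.
clusters = {k: [i for i, a in enumerate(assign) if a == k] for k in set(assign)}
```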

4.2 Estimation of document management criteria in a given cluster

The second step in our proposed method is to estimate the user’s document management criteria in each cluster (corresponding to Lines 24–25 of Algorithm 1). We use meta-paths as the management criteria for documents and weight the meta-paths based on the user’s space.

We assume that the management criteria are unique to each cluster created by a user. The fundamental idea for determining the weight of a meta-path is that when two documents in a cluster are connected by a meta-path, the user manages the cluster by considering that meta-path. Based on this insight, we calculate the weight of meta-path p for the k-th cluster from the adjacency matrix as follows:

$$\begin{aligned} w^p_k = \frac{n^p_k}{\sum _{ p' \in \mathcal {P} } n^{p'}_k}, \end{aligned}$$
(4)

where \(n_k^p\) is the number of documents in cluster k that are connected to another document in the same cluster through meta-path p, defined as follows:

$$\begin{aligned} n_k^p = \left| \left\{ d_i \in \mathcal {C}_k \mid \exists d_j \in \mathcal {C}_k : \textbf{A}^p_{d_i, d_j}=1\right\} \right| . \end{aligned}$$
(5)

Once the weights of the meta-paths within a cluster k are determined, the adjacency matrices are weighted accordingly. The weighted adjacency matrix is defined as follows:

$$\begin{aligned} \tilde{\textbf{A}} = \sum _{k \in [n_c]} \sum _{ p \in \mathcal {P} } w^p_k \, \textbf{A}_{\mathcal {C}_k}^{p}. \end{aligned}$$
(6)
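The computation in Eqs. (4)–(6) can be sketched as follows. The sketch reads \(\textbf{A}_{\mathcal {C}_k}^{p}\) as \(\textbf{A}^p\) restricted to pairs of documents inside \(\mathcal {C}_k\), which is an interpretation rather than something stated explicitly above. The function names, the zero-weight fallback for clusters without any meta-path link, and the toy matrices are likewise assumptions for illustration.

```python
def metapath_weights(adjs, cluster):
    """Eqs. (4)-(5): weight of each meta-path inside one cluster.

    adjs maps a meta-path name to its binary adjacency matrix;
    cluster is the list of document indices in C_k.
    """
    counts = {
        p: sum(1 for i in cluster if any(A[i][j] == 1 for j in cluster))
        for p, A in adjs.items()
    }
    total = sum(counts.values())
    # Assumption: fall back to zero weights when no meta-path links the cluster.
    return {p: (c / total if total else 0.0) for p, c in counts.items()}


def weighted_adjacency(adjs, clusters, n_docs):
    """Eq. (6), reading A^p_{C_k} as A^p restricted to pairs inside C_k."""
    out = [[0.0] * n_docs for _ in range(n_docs)]
    for cluster in clusters:
        w = metapath_weights(adjs, cluster)
        for p, A in adjs.items():
            for i in cluster:
                for j in cluster:
                    out[i][j] += w[p] * A[i][j]
    return out


# Hypothetical toy data: four documents, two meta-paths, two clusters.
adjs = {
    "PAP": [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
    "PYP": [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]],
}
clusters = [[0, 1], [2, 3]]
w0 = metapath_weights(adjs, clusters[0])  # both meta-paths link the cluster
w1 = metapath_weights(adjs, clusters[1])  # only PYP links the cluster
A_tilde = weighted_adjacency(adjs, clusters, n_docs=4)
```

In this toy case, the second cluster is held together only by the year meta-path, so its PYP weight becomes one and its PAP weight becomes zero.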

4.3 Learning the latent vector of documents based on graph autoencoders

The third step in our proposed method is to learn the latent vectors of documents based on graph autoencoders. (This corresponds to Lines 26–30 of Algorithm 1.) We use the weighted adjacency matrix \(\tilde{\textbf{A}}\) to obtain a latent vector in the latent space for each document. To this end, we construct a graph convolutional network (GCN)-based encoder–decoder model with supervision from the user’s interactions.

4.3.1 Encoder

Our encoder is a two-layer GCN [18]. Specifically, the latent vectors are calculated as follows:

$$\begin{aligned} \textbf{Z} = GCN_{\phi }(\textbf{X}, \tilde{\textbf{A}}), \end{aligned}$$
(7)

where \(\textbf{X}\) is the feature matrix. GCN is defined as

$$\begin{aligned} GCN_{\phi }(\textbf{X}, \tilde{\textbf{A}}) = \hat{\textbf{A}} ReLU(\hat{\textbf{A}} \textbf{X} \textbf{W}^{(0)})\textbf{W}^{(1)} \end{aligned}$$
(8)

with the GCN parameter set \(\phi =\left\{ \textbf{W}^{(0)}, \textbf{W}^{(1)}\right\} \), where \(\textbf{W}^{(0)}\in \mathbb {R}^{|\mathcal {D}|\times h_1} \) is the weight matrix of the first layer and \(\textbf{W}^{(1)} \in \mathbb {R}^{h_1\times L} \) is the weight matrix of the second layer. Following the GCN formulation [18], \(\textbf{D}\) denotes the diagonal degree matrix of \(\tilde{\textbf{A}}\), and \(\hat{\textbf{A}}\) is defined as

$$\begin{aligned} \hat{\textbf{A}} = \textbf{D}^{-\frac{1}{2}} \tilde{\textbf{A}} \textbf{D}^{-\frac{1}{2}}. \end{aligned}$$
(9)
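A forward pass through the encoder of Eqs. (7)–(9) can be sketched in plain Python as follows. The sketch assumes that \(\textbf{D}\) is the diagonal matrix of row sums of \(\tilde{\textbf{A}}\), that \(\textbf{X}\) is the identity matrix (one reading of "the feature is the document index"), and that isolated documents receive a zero scaling factor. The random weights stand in for learned parameters, and all names are hypothetical.

```python
import math
import random


def matmul(A, B):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def normalize(A):
    """Eq. (9): D^{-1/2} A D^{-1/2}, with D the diagonal matrix of row sums."""
    deg = [sum(row) for row in A]
    # Assumption: isolated nodes (degree 0) get a zero scaling factor.
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    n = len(A)
    return [[inv_sqrt[i] * A[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]


def gcn_encode(X, A_tilde, W0, W1):
    """Eq. (8): A_hat ReLU(A_hat X W0) W1."""
    A_hat = normalize(A_tilde)
    hidden = [[max(0.0, v) for v in row] for row in matmul(matmul(A_hat, X), W0)]
    return matmul(matmul(A_hat, hidden), W1)


# Hypothetical sizes: 4 documents, hidden size h1 = 4, latent size L = 2.
n, h1, L = 4, 4, 2
rng = random.Random(0)
X = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
A_tilde = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
W0 = [[rng.gauss(0, 1) for _ in range(h1)] for _ in range(n)]  # untrained
W1 = [[rng.gauss(0, 1) for _ in range(L)] for _ in range(h1)]  # untrained
Z = gcn_encode(X, A_tilde, W0, W1)  # one L-dimensional row per document
```

Under these assumptions, documents 0 and 2 have identical neighborhoods and therefore receive identical latent vectors, while the isolated document 3 is mapped to the origin.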

The decoder reconstructs the adjacency matrix \(\tilde{\textbf{A}}\) by computing the probability \(p_{\theta } (\tilde{\textbf{A}}|\textbf{Z})\) of edge generation based on the latent vector of each document, where

$$\begin{aligned} p_{\theta } (\tilde{\textbf{A}}|\textbf{Z}) = \prod _{d_i \in \mathcal {D}}\prod _{d_j \in \mathcal {D}} p_{\theta } (\tilde{\textbf{A}}_{{d_i,d_j}} | \vec {z}_{d_i},\vec {z}_{d_j}). \end{aligned}$$
(10)

The decoder in the generative model was configured using the Euclidean distance between the latent vectors. This is intended to increase the probability of generating edges between documents that are placed closer together because the user provides feedback to the system based on the distance between documents. The decoder is expressed as follows:

$$\begin{aligned} p_{\theta } (\tilde{\textbf{A}}_{{d_i,d_j}} | \vec {z}_{d_i},\vec {z}_{d_j}) = \sigma \left( \frac{a}{\Vert \vec {z}_{d_i}-\vec {z}_{d_j}\Vert ^2_2}+ b\right) , \end{aligned}$$
(11)

where \(\sigma (\cdot )\) denotes a sigmoid function and \(\theta =\{a, b\}\) denotes a set of parameters used in the decoder.
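The behavior of the distance-based decoder in Eq. (11) can be illustrated with the following sketch. The parameter values \(a=1\) and \(b=-1\) are arbitrary placeholders for the learned parameters, and the small constant `eps` is an assumption added here to avoid division by zero when two points coincide; it is not part of Eq. (11).

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def edge_probability(z_i, z_j, a=1.0, b=0.0, eps=1e-8):
    """Eq. (11): sigma(a / ||z_i - z_j||^2 + b).

    eps is an assumption added to avoid division by zero when the two
    points coincide; it does not appear in Eq. (11).
    """
    sq_dist = sum((u - v) ** 2 for u, v in zip(z_i, z_j))
    return sigmoid(a / (sq_dist + eps) + b)


# Hypothetical parameter values; in the method, a and b are learned.
near = edge_probability((0.0, 0.0), (0.1, 0.0), a=1.0, b=-1.0)
far = edge_probability((0.0, 0.0), (5.0, 5.0), a=1.0, b=-1.0)
```

With a positive \(a\), the probability decreases as the distance grows, and \(b\) sets the value that is approached for distant pairs.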

4.3.2 Objective function

The objective function consists of the cross-entropy term for generating the adjacency matrix and the supervision term from the user’s interactions. Parameters \(\varvec{\phi }=\{\phi \}\) and \(\varvec{\theta }=\{\theta \}\) are learned to maximize the weighted sum of these two terms.

The cross-entropy used to generate the adjacency matrix is defined as follows:

$$\begin{aligned} \mathcal {L}_{GAE}&= {\sum _{d_i \in \mathcal {D}}\sum _{d_j\in \mathcal {D}}} \tilde{\textbf{A}}_{{d_i,d_j}} \log p_{\theta } \left( \tilde{\textbf{A}}_{{d_i,d_j}} \mid \vec {z}_{d_i},\vec {z}_{d_j}\right) \end{aligned}$$
(12)
$$\begin{aligned}&= \mathbb {E}_{GCN_{\phi }(\textbf{X}, \tilde{\textbf{A}})}[\log p_{\theta } (\tilde{\textbf{A}}|\textbf{Z})] \end{aligned}$$
(13)

Moreover, we define a loss function that measures the disagreement between the user’s feedback and the learned latent vectors so that this disagreement can be minimized. We measure the disagreement using the conditional probability of the generated latent vectors given the user feedback. The objective function is defined as follows:

$$\begin{aligned} \mathcal {L}_{feedback}&= \log p\left( GCN_{\phi }(\textbf{X}, \tilde{\textbf{A}}) \mid \hat{\mathcal {Z}} \right) \end{aligned}$$
(14)
$$\begin{aligned}&= \sum _{(\vec {z}_{d_i},\hat{\vec {z}}_{d_i})\in \hat{\mathcal {Z}}} \log \mathcal {N}(\vec {z}_{d_i} \mid \hat{\vec {z}}_{d_i}, \sigma ^2 \textbf{I}) \end{aligned}$$
(15)
$$\begin{aligned}&= - \frac{1}{2\sigma ^2} \sum _{(\vec {z}_{d_i},\hat{\vec {z}}_{d_i})\in \hat{\mathcal {Z}}} \left\Vert \vec {z}_{d_i}-\hat{\vec {z}}_{d_i} \right\Vert ^2_2 + const., \end{aligned}$$
(16)

where \(\mathcal {N}(\vec {x} \mid \vec {\mu },\varvec{\Sigma })\) denotes the multivariate normal distribution and \(\vec {z}_{d_i}\) represents the latent vector generated by the encoder. The overall optimization problem is defined as follows:

$$\begin{aligned} \varvec{\phi },\varvec{\theta } = \mathop {\mathrm{arg\,max}}\limits _{\varvec{\phi },\varvec{\theta }} \mathcal {L}_{GAE} + \alpha \mathcal {L}_{feedback}, \end{aligned}$$
(17)

where \(\alpha \) is a hyper-parameter that weights the feedback term.
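To make the structure of Eq. (17) concrete, the following sketch evaluates both terms for fixed latent vectors. It does not perform any optimization. The feedback term is computed up to a positive scale factor and an additive constant, the decoder parameters \(a\) and \(b\) are placeholder values, `eps` is an added safeguard against division by zero, and the toy data are hypothetical; only \(\alpha =100\) mirrors the setting reported in the experiments.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def gae_term(Z, A_tilde, a, b, eps=1e-8):
    """Eq. (12): sum_ij A_ij * log p(A_ij | z_i, z_j) with the Eq. (11) decoder."""
    total = 0.0
    for i, z_i in enumerate(Z):
        for j, z_j in enumerate(Z):
            if A_tilde[i][j] == 0:
                continue  # zero-weight pairs contribute nothing to Eq. (12)
            sq = sum((u - v) ** 2 for u, v in zip(z_i, z_j))
            total += A_tilde[i][j] * log_sigmoid(a / (sq + eps) + b)
    return total


def feedback_term(Z, feedback):
    """Eq. (16) up to a positive scale and a constant.

    feedback maps a document index to the position chosen by the user.
    """
    return -sum(
        sum((u - v) ** 2 for u, v in zip(Z[i], target))
        for i, target in feedback.items()
    )


def objective(Z, A_tilde, feedback, a=1.0, b=-1.0, alpha=100.0):
    """Eq. (17): the quantity to be maximized (a, b are placeholder values)."""
    return gae_term(Z, A_tilde, a, b) + alpha * feedback_term(Z, feedback)


# Hypothetical toy data: the user placed document 2 at (5, 5).
A_tilde = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
feedback = {2: (5.0, 5.0)}
Z_good = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]  # agrees with the feedback
Z_bad = [(0.0, 0.0), (0.1, 0.0), (4.0, 5.0)]   # one unit away from it
```

In practice, the latent vectors come from the encoder and the parameters would be updated by a gradient-based optimizer; the sketch only shows how a layout that disagrees with the feedback is penalized.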

4.4 Sampling strategy for active learning

In this section, we introduce an active learning sampling strategy for ISLE. (This step corresponds to Lines 20–21 in Algorithm 1.) Our strategy is a type of uncertainty sampling [22], which selects the sample whose answer is expected to be among the most informative for improving the model. The challenge here is to formalize the notion of uncertainty in our setting. To obtain a user’s literature management criteria efficiently, we ask the user about the data that matter most for acquiring those criteria. Therefore, we define our uncertainty as the uncertainty of the meta-path weights in the clusters, where the weights are modeled by a Dirichlet distribution, and measure it as the information entropy of that distribution. The difference between the entropies of the prior and posterior distributions is defined as the information gained by asking the user. In our query strategy, the system asks the user about the most informative data, regardless of the cluster in which the requested data will be placed.

The prior distribution of the meta-path weight in the k-th cluster is a Dirichlet distribution, which is defined as follows:

$$\begin{aligned} p(\varvec{\pi } | \textbf{n}_k) = Dir(\varvec{\pi } | \textbf{n}_k) = C_D(\textbf{n}_k) \prod _{p\in \mathcal {P}} \pi ^{n^p_k - 1}_p, \end{aligned}$$
(18)

where \(C_D(\textbf{n}_k)=\frac{\Gamma (\sum _p n^p_k)}{\prod _p \Gamma (n^p_k)}\) is the normalizing constant, \(\textbf{n}_k=(n_k^1,n_k^2,\ldots ,n_k^{|\mathcal {P}|})\), and \(n_k^p\) is defined in Eq. (5). After the user provides feedback on the position of a new paper, the posterior distribution is calculated as follows:

$$\begin{aligned} p(\varvec{\pi } | \textbf{n}_k, \textbf{n}_{new})&= \frac{p(\varvec{\pi }, \textbf{n}_{new} | \textbf{n}_k)}{p(\textbf{n}_{new} | \textbf{n}_k)} \propto p(\varvec{\pi }, \textbf{n}_{new} | \textbf{n}_k) \end{aligned}$$
(19)
$$\begin{aligned}&= Multi(\textbf{n}_{new} | \varvec{\pi }) Dir(\varvec{\pi }| \textbf{n}_k) \end{aligned}$$
(20)
$$\begin{aligned}&\propto \prod _{p\in \mathcal {P}} \pi _p^{n^p_{new}} \prod _{p\in \mathcal {P}} \pi _p^{n^p_k-1} \end{aligned}$$
(21)
$$\begin{aligned}&= \prod _{p\in \mathcal {P}} \pi _p^{n^p_k+n^p_{new} - 1} \end{aligned}$$
(22)
$$\begin{aligned}&\propto Dir(\varvec{\pi } | \textbf{n}_k + \textbf{n}_{new}) \end{aligned}$$
(23)

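where \(\textbf{n}_{new}=(n^1_{new}, n^2_{new},\ldots , n^{|\mathcal {P}|}_{new})\), and \(n^p_{new}\) is the number of documents in the cluster that are connected to a new document \(d_{new}\) through meta-path p, that is, the number of new paths that arise when \(d_{new}\) is added to the cluster: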

$$\begin{aligned} n^p_{new} = \left| \left\{ d_i \in \mathcal {C}_k \mid \textbf{A}^p_{d_i, d_{new}}=1\right\} \right| \end{aligned}$$
(24)

Based on the prior and posterior distribution, the information gain in a cluster k is defined as follows:

$$\begin{aligned} \Delta H_k =\,&\mathbb {E}[-\log Dir(\varvec{\pi } | \textbf{n}_k)]\nonumber \\&- \mathbb {E}[-\log Dir(\varvec{\pi } | \textbf{n}_k + \textbf{n}_{new})] \end{aligned}$$
(25)

where the entropy of the Dirichlet distribution is calculated as follows:

$$\begin{aligned}&\mathbb {E}[-\log Dir(\varvec{\pi }|\varvec{\alpha })] \nonumber \\&= -\sum ^{K}_{k=1}(\alpha _k - 1)(\psi (\alpha _k)-\psi (\sum _{i=1}^K \alpha _i)) - \ln C_D(\varvec{\alpha }) \end{aligned}$$
(26)

where \(\psi \) is the digamma function. Based on the decrease in entropy in each cluster k, we define the overall information gain of a queried document as the sum over all clusters:

$$\begin{aligned} \phi _{gain} = \sum _{k\in [n_c]} \Delta H_k \end{aligned}$$
(27)

Based on this criterion, we ask the user about the most informative document and obtain feedback.
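The quantities in Eqs. (25)–(27) can be computed with the following sketch. It assumes that every count is strictly positive (for example, after adding a pseudo-count), because the Dirichlet distribution is undefined for zero parameters. The digamma function is approximated by a standard recurrence plus asymptotic series, since the Python standard library does not provide it. The function names and the toy counts are hypothetical.

```python
import math


def digamma(x):
    """Digamma for x > 0 via psi(x) = psi(x + 1) - 1/x and an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def dirichlet_entropy(alpha):
    """Eq. (26): differential entropy of Dir(alpha); requires all alpha_k > 0."""
    a0 = sum(alpha)
    log_c = math.lgamma(a0) - sum(math.lgamma(a) for a in alpha)  # ln C_D(alpha)
    return -sum((a - 1.0) * (digamma(a) - digamma(a0)) for a in alpha) - log_c


def information_gain(n_k, n_new):
    """Eq. (25): entropy of the prior minus entropy of the posterior."""
    posterior = [a + b for a, b in zip(n_k, n_new)]
    return dirichlet_entropy(n_k) - dirichlet_entropy(posterior)


def pick_query(cluster_counts, candidates):
    """Eq. (27): choose the candidate with the largest gain summed over clusters.

    candidates maps a document id to its per-cluster n_new vectors.
    """
    def total_gain(new_counts):
        return sum(information_gain(n_k, n_new)
                   for n_k, n_new in zip(cluster_counts, new_counts))
    return max(candidates, key=lambda d: total_gain(candidates[d]))


# Hypothetical counts for two clusters and two meta-paths.
cluster_counts = [[1.0, 1.0], [2.0, 1.0]]
candidates = {
    "d1": [[0.0, 0.0], [0.0, 0.0]],  # adds no new paths anywhere
    "d2": [[1.0, 1.0], [0.0, 0.0]],  # adds paths in the first cluster
}
best = pick_query(cluster_counts, candidates)
```

A candidate that would add no paths leaves every posterior equal to its prior, so its gain is zero; in the toy case, the other candidate is therefore selected.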

Algorithm 1 The active ISLE process for obtaining the latent vector of a new document q

5 Experiment

We conducted an experiment to answer our three research questions and determine the effectiveness of the method. For RQ1, we compared our method with a variation that assumes the consistency of the criteria across the latent space. For RQ2, we compared our framework with a baseline, a popular encoder–decoder model for graphs that uses an inner-product-based decoder. For RQ3, we compared the effectiveness of the proposed method using the active query strategy with that using randomly selected queries.

Fig. 3 Interface we developed to conduct the experiments. The interface enables users to place papers in a two-dimensional space. The red and blue dots represent the papers already placed and the paper currently being operated on, respectively. While the tool is being operated, the biblio-record of the current paper is displayed at the bottom of the interface, and when the mouse hovers over the icon of a paper, the biblio-record of that paper is displayed in the tool

5.1 Settings

5.1.1 Interface

We developed a prototype of a two-dimensional literature management tool to conduct our experiments. Figure 3 shows the actual interface used in the experiments. The interface displays a two-dimensional space in which the subjects place the icons of the papers.

5.1.2 Participants

We recruited ten researchers: one humanities researcher, two data engineering researchers, three human–computer interaction (HCI) researchers, and four machine learning (ML) researchers.

5.1.3 Data collection

First, we asked each participant to send us the BibTeX records of any 50 papers related to his or her research. Second, we asked them to use our tool, which shows the biblio-records in random order so that the user can place each one in the two-dimensional space. As a result, we obtained the history of how they behaved over the 50 iterations, that is, how they moved their papers to incrementally place all 50 papers in their spaces. The user sees the title, author, conference, and year of publication of each paper in this phase. Figure 4 shows the spaces created by three participants. The distributions of the paper icons and the placement schemes differ among the subjects, indicating that each subject has different document management criteria.

Fig. 4 Spaces created by each subject. The distribution of icons and the placement scheme differ for each subject, which clearly indicates that the management criteria differ from each other

5.1.4 Evaluation

First, we randomly selected 10 data points as the unknown documents \(\mathcal {Q}\). The remaining 40 data points were used as training data; as the user interacted with them one by one, we simulated whether \(\mathcal {Q}\) could be placed in the positions expected by the user. In the experiments, we set the hyper-parameters to \(L=2\), \(\alpha =100\), \(n_c=6\), and \(h_1=4\).

5.1.5 Metrics

We used Recall@k and nDCG@k [23] where k=6. Recall@k is expressed using the following equation:

$$\begin{aligned} Recall@k = \frac{|\mathcal {U}\cap \mathcal {P}_k|}{|\mathcal {U}|}, \end{aligned}$$
(28)

where \(\mathcal {U}\) denotes the set of the k documents closest (in Euclidean distance) in the latent space to the test data placed by the user, and \(\mathcal {P}_k\) is the set of the k latent vectors closest to the position of the test data predicted by the model. nDCG@k [23] is obtained by dividing the value of DCG@k by the ideal value of DCG@k, that is, the value obtained if all model predictions are correct. The inverse of the distance from the correct position was used as the relevance value.
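Equation (28) can be illustrated with the following sketch. It assumes that the neighbors of the user-placed position are searched among the document positions in the user's space and that the neighbors of the predicted position are searched among the positions in the model's space; the two lists may coincide, as in the toy example. The function names and coordinates are hypothetical, and nDCG@k is omitted because its exact gain and discount are not fully specified above.

```python
def k_nearest(point, positions, k):
    """Indices of the k documents whose positions are closest to point."""
    order = sorted(
        range(len(positions)),
        key=lambda i: sum((u - v) ** 2 for u, v in zip(point, positions[i])),
    )
    return set(order[:k])


def recall_at_k(true_pos, pred_pos, user_space, model_space, k):
    """Eq. (28): overlap between the user's and the model's k-nearest sets."""
    U = k_nearest(true_pos, user_space, k)
    P_k = k_nearest(pred_pos, model_space, k)
    return len(U & P_k) / len(U)


# Hypothetical positions of five training documents (shared by both spaces).
positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
true_pos = (0.3, 0.1)  # where the user placed the test document
pred_pos = (0.1, 0.6)  # where the model predicted it
partial = recall_at_k(true_pos, pred_pos, positions, positions, k=2)
perfect = recall_at_k(true_pos, true_pos, positions, positions, k=2)
```

Here the user's two nearest documents are 0 and 1, while the prediction's nearest are 0 and 2, so only one of the two neighbors is recovered.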

5.1.6 Active learning

Since our active learning strategy (Sect. 4.4) requires a seed to choose the next query, we randomly chose the first query in the experiment.

5.2 Baselines and variations

(1) VGAE. VGAE is a popular encoder–decoder model for graphs [19]. In our experiment, VGAE refers to a variant of ISLE in which the decoder is replaced by the one used in the ordinary VGAE. That is, the decoder expressed in Eq. (11) in Sect. 4.3 is replaced with the inner product of the latent representations, which is expressed as follows:

$$\begin{aligned} p_{\theta } (\textbf{A}_{d_i,d_j} | \vec {z_{d_i}},\vec {z_{d_j}}) = \sigma (\vec {z_{d_i}} \cdot \vec {z_{d_j}}), \end{aligned}$$
(29)

where \(\sigma \) denotes the sigmoid function. Note that this variant still uses the meta-paths to construct the adjacency matrix and still applies Step 1 (clustering) (Sect. 4.1).

Table 2 Meaning of each meta-path

(2) ISLE and VGAE without clustering. We used variants of VGAE and ISLE that omit Step 1 (clustering) (Sect. 4.1) to address RQ1.

(3) ISLE with different sets of meta-paths.

The meta-paths used are listed in Table 2. We compared the following five cases for ISLE, while we used all meta-paths for VGAE.

(a) ALL: The adjacency matrix is composed of the PAP, PTP, PYP, and PVP meta-paths.

(b) PAP Only: The adjacency matrix is composed of PAP only.

(c) PTP Only: The adjacency matrix is composed of PTP only.

(d) PYP Only: The adjacency matrix is composed of PYP only.

(e) PVP Only: The adjacency matrix is composed of PVP only.

(4) Active ISLE. This variant of ISLE implements the sampling strategy introduced in Sect. 4.4.
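As an illustration of variation (3), the sketch below builds one paper-to-paper adjacency matrix per meta-path from toy bibliographic records. The reading of the meta-path names (PAP: shared author, PTP: shared term, PYP: same year, PVP: same venue) is an assumption based on the graph description in Sect. 1; Table 2 gives the authoritative definitions. The record fields, the toy data, and the function names are hypothetical, and the way ISLE weights and combines the matrices is not reproduced here.

```python
from itertools import combinations

# Hypothetical toy records; the field names are not the paper's schema.
records = [
    {"authors": {"Sato", "Kim"}, "terms": {"graph", "embedding"}, "year": 2020, "venue": "CIKM"},
    {"authors": {"Kim"}, "terms": {"active", "learning"}, "year": 2021, "venue": "CIKM"},
    {"authors": {"Lopez"}, "terms": {"graph", "clustering"}, "year": 2020, "venue": "JCDL"},
]

# Assumed meaning of each meta-path: two papers are linked if they share
# an author (PAP), a term (PTP), a year (PYP), or a venue (PVP).
SHARES = {
    "PAP": lambda a, b: bool(a["authors"] & b["authors"]),
    "PTP": lambda a, b: bool(a["terms"] & b["terms"]),
    "PYP": lambda a, b: a["year"] == b["year"],
    "PVP": lambda a, b: a["venue"] == b["venue"],
}

def meta_path_adjacency(records, meta_path):
    """Symmetric 0/1 adjacency matrix for a single meta-path."""
    n = len(records)
    adj = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if SHARES[meta_path](records[i], records[j]):
            adj[i][j] = adj[j][i] = 1
    return adj

def build_input(records, meta_paths=("PAP", "PTP", "PYP", "PVP")):
    """One adjacency matrix per selected meta-path (all four for case ALL)."""
    return {mp: meta_path_adjacency(records, mp) for mp in meta_paths}

print(build_input(records)["PAP"])           # case (a) uses all four matrices
print(list(build_input(records, ("PTP",))))  # case (c) keeps PTP only
```

Restricting `meta_paths` to a single name corresponds to cases (b) to (e).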

5.3 Results: passive setting

Figures 5a–6b show the results. The solid line in each figure represents the mean, and the shaded area represents the 95% confidence interval. The red lines in each figure indicate the results of our proposed method when the adjacency matrix provided as the input consists of ALL, as described in Sect. 5.2. The blue lines in Fig. 5 indicate the results of VGAE when the adjacency matrix provided as the input consists of ALL, as described in Sect. 5.2. The yellow and olive lines in Fig. 5 indicate the results of estimating the document management criteria without the clustering step. The green, peach, purple, and gray lines in Fig. 6 show the results of the proposed method when only a single type of meta-path is given as input. The figures demonstrate that ISLE outperformed all the other methods and that the accuracy improved as the number of feedback iterations increased.
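For reference, a mean with a 95% confidence band of this kind can be computed at each feedback step as sketched below. The text does not state how the interval was obtained; the normal approximation (mean plus or minus 1.96 standard errors) is an assumption of this sketch, and the input values are hypothetical.

```python
import math
import statistics

def mean_ci95(values):
    """Mean and normal-approximation 95% confidence interval of a sample."""
    m = statistics.mean(values)
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return m, m - half_width, m + half_width

# Hypothetical Recall@k values of several participants at one feedback step.
recalls_at_step = [0.2, 0.4, 0.6]
print(mean_ci95(recalls_at_step))
```

With only ten participants, a Student t critical value would give a slightly wider band than the normal approximation used here.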

Fig. 5

The horizontal axis represents the number of times feedback is received from the user, and the vertical axis represents the values of recall and nDCG. ISLE outperformed the other methods

Fig. 6

The horizontal axis represents the number of times feedback is received from the user, and the vertical axis represents the values of recall and nDCG. The results of ISLE using multiple meta-paths were more accurate

Note that in our context, Recall@k indicates how close the predicted position is to the correct position, whereas nDCG@k indicates how well the prediction preserves the order of distances. Unlike in the ordinary information retrieval context, Recall@k is more critical for our problem because the order of distances can change dramatically even if the position moves only slightly.

Figure 6a compares the results for different sets of meta-paths. The results show that ISLE performs best when all four meta-paths are used. As noted in the discussion of limitations, finding the best feature set was not our research question. However, this result suggests that researchers are aware of multiple criteria when managing papers and that the proposed method can flexibly express these criteria by using multiple meta-paths.

5.4 Results: active setting

Figure 8 compares the results of ISLE and Active ISLE, where the blue and orange lines show the results for Active ISLE and ISLE, respectively. As Fig. 8 shows, Active ISLE achieves a higher recall value with fewer interactions. This implies that the sampling strategy works well for quickly identifying the criteria for each document cluster.

6 Discussion

6.1 The locality of criteria in the latent space (RQ1)

The results shown in Fig. 5a and b indicate that methods with a clustering phase are superior to those without clustering. This suggests that each cluster of a researcher has a different set of weights for the meta-paths, which means that researchers use different criteria in different sub-spaces of their latent space. Figure 7 shows the normalized distribution of the meta-path weights in each cluster for three of the ten subjects. Although PTP accounts for a large proportion of the distributions, the weights often differ considerably from cluster to cluster, even for the same researcher.

Fig. 7

Normalized distribution of meta-path weights in each cluster. These figures show the ratio of the meta-paths in each cluster in the two-dimensional space that each subject created. They indicate that the management criteria differ considerably among clusters, even for the same subject

Fig. 8

Comparison of the proposed method with and without active learning. The horizontal axis represents the number of times feedback is received from the user, and the vertical axis represents the values of recall@k. The results of the proposed method using active learning were more accurate

Fig. 9

Comparison of recall by each researcher

6.2 Effectiveness of the Euclidean distance-based decoder (RQ2)

The idea behind our second assumption is that the user provides feedback in the latent space based on the Euclidean distance rather than the angle between documents (which is the principle of the VGAE’s decoder). Therefore, the ISLE decoder, which calculates the generation probability of edges based on the Euclidean distance between documents, is more accurate. The results presented in Fig. 5a and b clearly support this assumption.
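To make the contrast concrete, the following sketch evaluates the inner-product decoder of Eq. (29) on toy two-dimensional vectors. A distance-based score is shown only for comparison; its form, \(\exp (-\Vert z_i - z_j\Vert )\), is an assumption made for illustration and is not the ISLE decoder of Eq. (11). The function names and coordinates are hypothetical.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def inner_product_decoder(z_i, z_j):
    """Edge probability sigma(z_i . z_j), as in Eq. (29)."""
    return sigmoid(sum(a * b for a, b in zip(z_i, z_j)))

def distance_score(z_i, z_j):
    """Illustrative stand-in (assumption): exp(-Euclidean distance), in (0, 1]."""
    return math.exp(-math.dist(z_i, z_j))

# Two icons at the same spot near the origin, and two icons far apart
# on the same ray from the origin (toy coordinates).
same_spot = ((0.1, 0.1), (0.1, 0.1))
far_apart = ((1.0, 1.0), (10.0, 10.0))

for name, (z_i, z_j) in (("same_spot", same_spot), ("far_apart", far_apart)):
    print(name, inner_product_decoder(z_i, z_j), distance_score(z_i, z_j))
```

The inner-product decoder assigns the co-located pair a probability close to 0.5 and the distant pair a probability close to 1, whereas the distance-based score ranks the two pairs the other way round, which matches how icons placed side by side on a screen are usually read.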

6.3 Effectiveness of the active learning (RQ3)

Figure 8 shows that the accuracy of the active learning method is generally higher than that of the random sampling method. The main objective of introducing active learning is to achieve high accuracy with a small number of interactions by asking the user about an informative node. Figure 8 shows that this objective was achieved. When comparing the recall values at the 10th interaction, a statistically significant difference was observed. This implies the potential effectiveness of query strategies that focus on the uncertainty of meta-path importance. In this experiment, the first data point was randomly selected. An effective way to select the first data point is a subject for future studies.

6.4 Individual difference

We collected data from ten researchers, and Fig. 9 shows how the accuracy changed with each researcher’s feedback. Figure 9a and b show that the accuracy generally improves as the number of feedback cycles increases. Our findings show that the management criteria of each researcher can be captured using meta-paths, although there are individual differences. In addition, the accuracy of the active learning method was higher than that of the random sampling method for almost all the researchers.

The degree of accuracy improvement through interaction varies from user to user. We assume that this is due to the manner in which users create their latent space. Figure 4 illustrates how each subject creates a latent space. If the user has cluster regions that are clearly divided in the latent space, we can estimate the criteria for managing the literature. However, if the regions are ambiguous and the documents cannot be clustered according to the user’s expected document management criteria, we believe that additional interactions do not dramatically improve the accuracy. The following are possible reasons for the decrease in accuracy during the experiment: (1) The criteria changed during the experiment and the clusters were reconstructed. (2) English papers were mixed with papers in another language.

7 Conclusion and future work

In this study, we proposed a method that estimates a user’s document management criteria based on human-in-the-loop latent space learning.

The experimental results showed that the proposed method placed unknown documents at the user’s desired positions more accurately than the baseline method. In addition, experiments with all meta-paths and with a limited set of meta-paths showed that the proposed method (ISLE) is more accurate when multiple meta-paths are used, indicating that ISLE is effective even when users manage documents according to various criteria. Based on these results, we added an active learning framework to estimate a user’s document management criteria with fewer interactions. The experimental results demonstrated the effectiveness of active learning.

In the future, we intend to (1) develop a document management system based on ISLE that can be used in the real world and (2) consider using longer meta-paths. The realization of these goals will not only provide a better method for human-in-the-loop latent space learning but will also support researchers in literature management.