1 Introduction

Machine learning techniques have achieved great success in many real-world applications (Butler et al., 2018; Jumper et al., 2021; LeCun et al., 2015). However, apart from experienced machine learning experts, most people can hardly produce well-performing models when starting from scratch, due to a lack of proficient skills or of abundant high-quality training data. In addition, although numerous powerful pre-trained models exist, data privacy concerns make it generally difficult for ordinary users to identify beneficial models and apply them to their own tasks.

To address these issues simultaneously, Zhou (2016) proposed to develop a learnware market, in which a learnware is a pre-trained model combined with a specification describing the specialty and utility of the model. As described in Zhou and Tan (2022), developers can spontaneously submit models trained on various tasks to the market. Once a submitted model is accepted, the market assigns it a specification. When a user wants to tackle her learning task, instead of starting from scratch, she can submit her requirement to the market, and the market will identify and recommend helpful learnwares whose specifications match the requirement. The user can then apply the recommended learnwares to her task directly or refine them with a small amount of labeled data.

In the learnware paradigm, the specification plays a pivotal role in identifying which models are helpful for the user's current task, which makes specification design a fundamental problem. Recently, Wu et al. (2021) proposed the Reduced Kernel Mean Embedding (RKME) as the specification, based on which several efforts have attempted to realize a homogeneous prototype learnware market. The RKME specification provides a good approximation of the distribution of the training data used by the model without revealing the raw data. It maps the data set to an element in the reproducing kernel Hilbert space (RKHS), which is also called the specification space, and the learnware market can then search this space for helpful learnwares upon the user's request. However, the requirement that all pre-trained models in the market come from the same feature space limits the scope of the learnware market for helping users identify and reuse models on their tasks.

In real-world scenarios, it is more common that the learnware market comprises models from different feature spaces. Consider the medical scenario illustrated in Fig. 1. The blue and orange models are provided by hospitals, using blood routine features and computed tomography (CT) features respectively for the diagnosis of common diseases. Carcinoembryonic antigen (CEA) and alpha-fetoprotein (AFP) features are used by the green models provided by laboratories for cancer-related tasks. For such a more practical learnware market, an appealing question arises: is it possible to identify and reuse helpful models among all models from different feature spaces, rather than only models sharing exactly the same feature space as the user's task? In the example, the market would ideally recommend helpful learnwares from among all heterogeneous learnwares (#1-#6), and the clinic could receive and reuse the recommended learnwares (#1, #5) from two feature spaces for its task.

Fig. 1

An example of the learnware market in a medical scenario. A learnware consists of a well-performed pre-trained model and a specification describing its ability. The market is naturally composed of learnwares from different feature spaces and different tasks. The market can help the user identify and reuse helpful learnwares upon the user's requirement

This paper makes the first attempt to handle learnwares from heterogeneous feature spaces, making the learnware paradigm viable in broader applications. The most notable difference from the homogeneous learnware setting studied before (Wu et al., 2021) is the mismatch of feature spaces between different learnwares and between learnwares and the user's task, which makes learnware search and reuse harder. In this paper, we consider a fundamental heterogeneous scenario in which the overall feature space can be divided into several disjoint parts. For example, a newly built clinic, as a user of the medical learnware market, may collect multiple groups of features (e.g., both blood routine features and CT features) of patients with different machines, as illustrated in Fig. 1. To realize heterogeneous learnware search and reuse, it is essential to design a more powerful specification that manages learnwares from different feature spaces. This paper provides a solution by generating the RKME specification on a subspace learned from heterogeneous feature spaces, so as to provide a unified specification space for identifying learnwares matching the user's requirement.

The main contributions of this work can be summarized as follows:

  • We give the first formulation for the heterogeneous learnware problem where the overall feature space can be divided into several disjoint parts.

  • We propose a more powerful specification that provides a unified specification space for learnwares from heterogeneous feature spaces, in which the market can effectively identify helpful learnwares matching the user's requirement.

  • We provide a detailed procedure for the construction and the usage of the heterogeneous prototype learnware market based on the new specification. Promising experimental results reported on both synthetic and real-world tasks validate the efficacy of the proposed specification and procedure.

The remainder of the paper proceeds as follows. We briefly review preliminary techniques in Sect. 2. The heterogeneous learnware problem is then formulated in Sect. 3, followed by a novel specification design and the corresponding procedure for learnware usage in Sect. 4. Next, Sect. 5 provides empirical studies on both synthetic and real-world tasks, and Sect. 6 discusses related work. Finally, we conclude in Sect. 7.

2 Preliminary

In this section, we first review a specification designed for homogeneous learnwares, which is based on the kernel mean embedding (Smola et al., 2007). Then we review a subspace learning method based on matrix factorization, which can be extended to find a unified subspace of heterogeneous feature spaces.

Kernel mean embedding (KME) KME describes a probability distribution concisely and without information loss, and supports convenient operations such as mean calculation. It maps a probability distribution \(\mathcal {P}\) defined over \(\mathcal {X}\) to an element in a reproducing kernel Hilbert space (RKHS) as \(\mu _{k}(\mathcal {P}):=\int _{\mathcal {X}} k(\varvec{x}, \cdot ) \mathrm {d} \mathcal {P}(\varvec{x})\), where \(k: \mathcal {X} \times \mathcal {X} \rightarrow \mathbb {R}\) is a symmetric and positive definite kernel function (Schölkopf & Smola, 2002) with associated RKHS \(\mathcal {H}\) and feature map \(\phi : \mathcal {X} \rightarrow \mathcal {H}\). The embedding \(\mu _{k}(\mathcal {P})\) exists and belongs to \(\mathcal {H}\) when \(\mathbb {E}_{\varvec{x} \sim \mathcal {P}}[\sqrt{k(\varvec{x}, \varvec{x})}]<\infty\). When equipped with characteristic kernels such as the Gaussian kernel, no information about the distribution \(\mathcal {P}\) is lost (Sriperumbudur et al., 2011). In practice, we only have access to a data set \(\left\{ \varvec{x}_{n}\right\} _{n=1}^{N}\) sampled from \(\mathcal {P}\); the empirical approximation of the KME \(\mu _{k}(\mathcal {P})\) is \(\hat{\mu }_{k}(\mathcal {P}):=\frac{1}{N} \sum _{n=1}^{N} k\left( \varvec{x}_{n}, \cdot \right)\), which converges at rate \(O(1/\sqrt{N})\) (Smola et al., 2007).
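To make the empirical embedding concrete, the following minimal NumPy sketch (not taken from the original paper; the Gaussian kernel bandwidth `gamma` and all function names are illustrative) computes inner products between empirical KMEs, which is all that is needed for the MMD-style comparisons used later.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=0.1):
    """Gaussian kernel matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2); rows are samples."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kme_inner(X, Y, gamma=0.1):
    """Inner product <mu_hat(X), mu_hat(Y)>_H of two empirical KMEs: the mean kernel value."""
    return gaussian_kernel(X, Y, gamma).mean()

def squared_mmd(X, Y, gamma=0.1):
    """Squared RKHS distance ||mu_hat(X) - mu_hat(Y)||_H^2 between empirical embeddings."""
    return kme_inner(X, X, gamma) - 2 * kme_inner(X, Y, gamma) + kme_inner(Y, Y, gamma)
```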

Reduced kernel mean embedding (RKME) The favorable properties of KME make it a potential specification; however, its dependence on the raw data violates the privacy protection that the learnware paradigm requires. To tackle this issue, Wu et al. (2021) proposed the RKME, which uses a reduced set \(\left\{ \left( \beta _{m}, \varvec{z}_{m}\right) \right\} _{m=1}^{M}\) to approximate the empirical KME of the original data set \(\left\{ \varvec{x}_{n}\right\} _{n=1}^{N}\) via the following minimization problem:

$$\begin{aligned} \min _{\varvec{\beta }, \mathbf {z}}\left\| \frac{1}{N} \sum _{n=1}^{N} k\left( \varvec{x}_{n}, \cdot \right) -\sum _{m=1}^{M} \beta _{m} k\left( \varvec{z}_{m}, \cdot \right) \right\| _{\mathcal {H}}^{2}, \end{aligned}$$
(1)

where \(\varvec{z}_{m}\) is an element of the reduced set and \(\beta _{m}\) is the corresponding coefficient. The RKME \(\tilde{\mu }_{k}(\mathcal {P})\) converges to the empirical KME of the original data set \(\hat{\mu }_{k}(\mathcal {P})\) at a linear rate \(O(e^{-M})\).
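As a hedged illustration of Eq. (1): once the reduced points \(\varvec{z}_m\) are fixed, the objective is quadratic in \(\varvec{\beta }\), so the optimal coefficients follow from a linear system. The sketch below exploits this by placing the reduced points at k-means centers and solving for \(\varvec{\beta }\) in closed form; Wu et al. (2021) instead alternate between updating \(\varvec{\beta }\) and refining \(\mathbf {z}\) by gradient steps, which is omitted here. It reuses the `gaussian_kernel` helper sketched above.

```python
from sklearn.cluster import KMeans

def fit_rkme(X, M=10, gamma=0.1, reg=1e-6):
    """Simplified reduced-set construction for Eq. (1): fix the reduced points z_m at
    k-means centers of X (rows are samples), then solve for beta in closed form from
    the first-order condition K_zz beta = (1/N) K_zx 1.  Wu et al. (2021) additionally
    refine z_m with gradient steps, which this sketch omits."""
    Z = KMeans(n_clusters=M, n_init=10).fit(X).cluster_centers_
    K_zz = gaussian_kernel(Z, Z, gamma) + reg * np.eye(M)   # small ridge for stability
    K_zx = gaussian_kernel(Z, X, gamma)
    beta = np.linalg.solve(K_zz, K_zx.mean(axis=1))
    return beta, Z
```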

Subspace learning Subspace learning aims to find a subspace that describes the inherent structure of the data better than the original feature space, and methods based on matrix factorization (Lee & Seung, 2001; Zhu et al., 2022) are commonly used. In this part, we review a widely used technique called concept factorization (CF) (Xu & Gong, 2004). The idea of CF is to represent each concept as a linear combination of instances and to reconstruct each instance as a linear combination of concepts. Given a data matrix \(\mathbf {X}=[\varvec{x}_1,\cdots ,\varvec{x}_N]\in \mathbb {R}^{d\times N}\) and the number of concepts k, CF finds a new representation in the subspace of \(\mathbf {X}\) as \(\mathbf {V}^{\top }\in \mathbb {R}^{k\times N}\) by minimizing the reconstruction error via

$$\begin{aligned} \min _{\mathbf {W},\mathbf {V}} \Vert \mathbf {X}-\mathbf {X}\mathbf {W}\mathbf {V}^{\top }\Vert _{\text {F}}^2, \quad \text { s.t. } \mathbf {W},\mathbf {V}\ge 0, \end{aligned}$$

where \(\mathbf {C}:=\mathbf {X}\mathbf {W}=[\varvec{c}_1,\cdots ,\varvec{c}_k]\in \mathbb {R}^{d\times k}\) is the concept matrix, \(\mathbf {V}\) is the reconstruction coefficient matrix, which can also be interpreted as the projection of the input matrix onto the subspace, and \(\mathbf {W},\mathbf {V}\ge 0\) means that all elements of \(\mathbf {W}\) and \(\mathbf {V}\) are non-negative. The performance of CF can be improved by maintaining the local structure (Cai et al., 2010; Wang et al., 2016).
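For reference, a compact NumPy sketch of plain CF with the multiplicative updates of Xu and Gong (2004) is given below, assuming nonnegative data so that \(\mathbf {K}=\mathbf {X}^{\top }\mathbf {X}\) is nonnegative; the iteration count, initialization, and function name are illustrative choices.

```python
def concept_factorization(X, k, n_iter=200, eps=1e-9, seed=0):
    """Plain CF, X (d x N, nonnegative) ~ X W V^T, via the multiplicative updates of
    Xu and Gong (2004); V^T is the k x N representation of the data in the subspace."""
    rng = np.random.default_rng(seed)
    N = X.shape[1]
    K = X.T @ X                      # N x N inner-product matrix
    W = rng.random((N, k))
    V = rng.random((N, k))
    for _ in range(n_iter):
        W *= (K @ V) / (K @ W @ (V.T @ V) + eps)
        V *= (K @ W) / (V @ (W.T @ K @ W) + eps)
    return W, V
```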

3 Problem formulation

The learnware paradigm generally consists of a submitting stage and a deploying stage. In the submitting stage, developers submit their models to the market, and the market assigns specifications to accepted high-quality learnwares and accommodates them. In the deploying stage, the market identifies and returns learnwares matching the user's requirement, and the user reuses them directly or refines them with her own data. For the heterogeneous learnware problem, since the relationship between various feature spaces needs to be mined first by exploiting some aligned data, we simplify the submitting stage into an establishing stage where the market uses several models it owns for initialization; thus, the market has access to the raw data of these models, which helps connect the different feature spaces. Note that the raw data of the models in the establishing stage is still invisible to users, matching the data privacy concern of the learnware paradigm. The two-stage formulation is illustrated in Fig. 2 and described as follows.

Fig. 2

An illustration of the two-stage heterogeneous learnware problem formulation. To initialize the learnware market, the market assigns specifications in the same space for models constructed from different feature spaces. In the establishing stage, auxiliary data across the entire feature space is necessary to reveal the relationship between different feature spaces. In the deploying stage, the user's task is defined over the Cartesian product of some feature spaces and the user can reuse helpful learnwares from the market

In the establishing stage, the market, which accommodates well-trained models generated from different feature spaces, assigns specifications in a shared space for these models. This shared specification space makes it possible and effective to match the user's requirements in the subsequent deploying stage. We first assume that the overall feature space \(\mathcal {X}\) can be split into k disjoint parts, i.e., \(\mathcal {X}_1,\cdots ,\mathcal {X}_k\). Suppose the market has R models \(\{f_i\}_{i=1}^R\) and the i-th model is trained on the local data set \(D_i:=\{(\varvec{x}_{i,n},y_{i,n})\}_{n=1}^{N_i}\) whose feature space is \(\mathcal {X}_{v_i},v_i\in [k]\). Each local data set alone only provides information about a single feature space, so no relationship between different feature spaces can be uncovered, which makes it impossible to find a shared specification space. To tackle this difficulty, a small amount of data across different feature spaces is necessary. In reality, auxiliary data crossing different feature spaces is often accessible. For example, there is abundant multi-modal data (Chen et al., 2015) on the web connecting images with texts. Another more concrete example is medical data from different organizations (hospital, clinic, laboratory), which serve as different feature spaces but are generated from the same patients (Johnson et al., 2016). After the market collects such unlabeled auxiliary data defined on the entire feature space \(\mathcal {X}\) from suitable existing sources, the heterogeneous learnware market is constructed as \(\{(f_i,\boldsymbol{s}_i)\}_{i=1}^R\), where \(\boldsymbol{s}_i\) is the specification of the model \(f_i\).

In the deploying stage, the user hopes to exploit the heterogeneous market \(\{(f_i,\boldsymbol{s}_i)\}_{i=1}^R\) to handle her own task. Specifically, we consider the scenario in which the user wants to make predictions on her unlabeled data set \(D_u:=\{\varvec{x}_{u,n}\}_{n=1}^{N_u}\) defined over a Cartesian product of several feature spaces \(\mathcal {X}_{v_u}:=\times _{i\in v_u\subseteq [k]}\mathcal {X}_i\). For example, the feature space of the user in Fig. 2, composed of the 1st and k-th feature spaces, is \(\mathcal {X}_{\{1,k\}}\).
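To fix ideas, the following hypothetical Python containers (ours, not part of the paper) spell out what the market holds after the establishing stage and what the user brings in the deploying stage: each model carries the index \(v_i\) of its feature space, the market additionally stores the auxiliary blocks \(\mathbf {X}_c^{(i)}\), and the user's data covers a subset \(v_u\subseteq [k]\) of the spaces.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List
import numpy as np

@dataclass
class LocalModel:
    f: Callable[[np.ndarray], np.ndarray]   # predictor trained on feature space X_{v_i}
    v: int                                  # feature-space index v_i in [k]
    X: np.ndarray                           # local training features, shape d_{v_i} x N_i

@dataclass
class HeterogeneousMarket:
    models: List[LocalModel]                # the R models owned in the establishing stage
    X_aux: Dict[int, np.ndarray]            # auxiliary blocks X_c^{(i)}, shape d_i x N_c
    specs: List[object] = field(default_factory=list)  # RKME specifications s_i, assigned later

@dataclass
class UserTask:
    X_u: Dict[int, np.ndarray]              # unlabeled user blocks over the spaces in v_u
```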

4 Our approach

We first sketch the overall procedure. The specification design is discussed in the establishing stage, and the deploying stage shows how to use the specification to meet the requirements of the user. In the establishing stage, the market assigns specifications in a unified space, based on subspace learning and RKME, for models from different feature spaces, thereby constructing the learnware market. In the deploying stage, the user generates her requirement in the subspace with the projection tool provided by the market, and the market identifies highly relevant learnwares. After that, the user reuses the learnwares via dynamic classifier selection. The main idea of our approach is to find a subspace that bridges the different feature spaces with the shared specification space (RKHS).

Table 1 Main notations and corresponding definitions of the subspace generation

4.1 Establishing stage

In this stage, in order to build a unified specification space, the market first constructs a unified subspace for local tasks from different feature spaces. When the i-th task data set is mapped to the subspace, we generate the RKME of the mapped data set as the i-th model's specification \(\mathbf {s}_i\). Meanwhile, we hope the procedure of subspace generation can provide applicable tools to map the user data in the subsequent deploying stage.

Subspace generation In this step, the market finds a common subspace for different feature spaces based on the local data sets, each defined on a single feature space, and extra auxiliary data across the entire feature space. We denote by \(\mathbf {X}_i \in \mathbb {R}^{d_i \times N_i}\) the feature matrix of the i-th local data set \(\varvec{D}_i(i \in [R])\), and further denote by \(\hat{\mathbf {X}}^{(i)}\in \mathbb {R}^{d_i\times N^{(i)}}\) the concatenation of all data sets in the i-th feature space, where \(d_i\) is the dimension of the i-th feature space \(\mathcal {X}_i\) and \(N^{(i)}\) is the total number of samples of all data sets in the i-th feature space. The extra auxiliary data is denoted as \(\mathbf {X}_c=[\mathbf {X}_c^{(1)};\cdots ;\mathbf {X}_c^{(k)}]\in \mathbb {R}^{(d_1+\cdots +d_k)\times N_c}\), where \(N_c\) is the size of the auxiliary data. Then, the overall data containing the local tasks and auxiliary data in the i-th feature space is \(\mathbf {X}^{(i)}=[\hat{\mathbf {X}}^{(i)},\mathbf {X}_c^{(i)}]\in \mathbb {R}^{d_i\times (N^{(i)}+N_c)}\). The problem of subspace generation using the local task data sets and extra auxiliary data is defined as

$$\begin{aligned} \begin{aligned} \min _{\mathbf {W}^{(i)},\mathbf {V}^{(i)},\mathbf {V}_c^*} O=&~\sum _{i=1}^k\left\{ \Big \Vert \mathbf {X}^{(i)}-\mathbf {X}^{(i)}\mathbf {W}^{(i)}\left( \mathbf {V}^{(i)}\right) ^{\top }\Big \Vert _{\text {F}}^2\right. \\&\left. +\alpha \text {Tr}\left( \left( \mathbf {V}^{(i)}\right) ^{\top }\mathbf {L}^{(i)}\mathbf {V}^{(i)}\right) +\beta \left\| \mathbf {V}_c^{(i)}-\mathbf {V}_c^*\right\| _{\text {F}}^2 \right\} \\ \text {s.t.}\ \ \mathbf {W}^{(i)}\ge 0, \hat{\mathbf {V}}^{(i)}\ge 0, \mathbf {V}_c^{(i)}\ge 0, \mathbf {V}_c^*\ge 0, \end{aligned} \end{aligned}$$
(2)

and the major notations are summarized in Table 1. The first term \(\Vert \mathbf {X}^{(i)}-\mathbf {X}^{(i)}\mathbf {W}^{(i)}(\mathbf {V}^{(i)})^{\top }\Vert _{\text {F}}^2\) is the reconstruction loss of concept factorization, where \((\mathbf {V}^{(i)})^{\top }\in \mathbb {R}^{d\times (N^{(i)}+N_c)}\) is the new representation of \(\mathbf {X}^{(i)}\) in the subspace whose dimension is d. The second term \(\text {Tr}((\mathbf {V}^{(i)})^{\top }\mathbf {L}^{(i)}\mathbf {V}^{(i)})\) is a manifold regularizer used to maintain the local structure during the mapping, enforcing similar outputs when inputs are close; here \(\mathbf {L}^{(i)}\in \mathbb {R}^{(N^{(i)}+N_c)\times (N^{(i)}+N_c)}\) is the Laplacian matrix induced by \(\mathbf {X}^{(i)}\). The last term \(\Vert \mathbf {V}_c^{(i)}-\mathbf {V}_c^*\Vert _{\text {F}}^2\) encodes the consensus between different feature spaces, penalizing inconsistency among the mapped auxiliary data, where \(\mathbf {V}_c^{(i)}\), a part of \((\mathbf {V}^{(i)})^{\top }=[(\hat{\mathbf {V}}^{(i)})^{\top },(\mathbf {V}_c^{(i)})^{\top }]\), is the mapped auxiliary data of the i-th feature space and \(\mathbf {V}_c^*\) is the final representation of the auxiliary data \(\mathbf {X}_c\) in the subspace.

Algorithm 1

The objective function Eq. (2) of subspace generation is not convex in all variables \(\mathbf {W}^{(i)},\mathbf {V}^{(i)},\mathbf {V}_c^*\), which makes it unrealistic to find its global minimum. Therefore, we propose an alternating optimization algorithm based on multiplicative update rules, similar to Févotte and Idier (2011) and Xu and Gong (2004), which reaches a local minimum. The sketched optimization is shown in Algorithm 1 and details are presented in Appendix A.1.
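Since the exact multiplicative rules are deferred to Appendix A.1, the sketch below is only an assumption-based stand-in for Algorithm 1: it minimizes Eq. (2) with simple projected-gradient steps on \(\mathbf {W}^{(i)}\) and \(\mathbf {V}^{(i)}\) and the closed-form consensus update \(\mathbf {V}_c^*=\frac{1}{k}\sum _i\mathbf {V}_c^{(i)}\), and it builds the Laplacian from a dense Gaussian similarity graph rather than the k-NN graph typically used. Step sizes, iteration counts, and function names are our own choices.

```python
def laplacian(X, gamma=0.1):
    """Graph Laplacian L = D - S from a dense Gaussian similarity graph over the
    columns of X (a simple stand-in for the k-NN graph typically used)."""
    S = gaussian_kernel(X.T, X.T, gamma)
    return np.diag(S.sum(axis=1)) - S

def subspace_generation(X_blocks, N_c, d_sub, alpha=1.0, beta=1.0,
                        lr=1e-3, n_iter=300, seed=0):
    """Projected-gradient sketch of Eq. (2).  X_blocks[i] is X^{(i)} of shape
    d_i x (N^{(i)} + N_c), whose last N_c columns are the auxiliary block X_c^{(i)}.
    Returns the per-space factors (W^{(i)}, V^{(i)}) and the consensus V_c^*."""
    rng = np.random.default_rng(seed)
    Ws, Vs, Ks, Ls = [], [], [], []
    for X in X_blocks:
        n = X.shape[1]
        Ws.append(rng.random((n, d_sub)))
        Vs.append(rng.random((n, d_sub)))
        Ks.append(X.T @ X)
        Ls.append(laplacian(X))
    V_star = np.mean([V[-N_c:] for V in Vs], axis=0)
    for _ in range(n_iter):
        for i, (K, L) in enumerate(zip(Ks, Ls)):
            W, V = Ws[i], Vs[i]
            grad_W = -2 * (K @ V - K @ W @ (V.T @ V))                    # reconstruction
            grad_V = (-2 * (K @ W - V @ (W.T @ K @ W))                   # reconstruction
                      + 2 * alpha * (L @ V))                             # manifold term
            grad_V[-N_c:] += 2 * beta * (V[-N_c:] - V_star)              # consensus term
            Ws[i] = np.maximum(W - lr * grad_W, 0.0)                     # keep W >= 0
            Vs[i] = np.maximum(V - lr * grad_V, 0.0)                     # keep V >= 0
        V_star = np.mean([V[-N_c:] for V in Vs], axis=0)                 # closed-form V_c^*
    return Ws, Vs, V_star
```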

Specification assignment Based on the subspace generation, we can eventually design more powerful specifications. With the help of the auxiliary data, we connect different feature spaces via a common subspace and then develop specifications on that space. After subspace generation, the local data sets \(\{\mathbf {X}_i\}_{i=1}^R\) used by models \(\{f_i\}_{i=1}^R\) are mapped to \(\{\mathbf {V}_i\}_{i=1}^R\), which are obtained by splitting \(\{\mathbf {V}^{(i)}\}_{i=1}^k\). We generate the RKME \(\mathbf {s}_i=\{(\gamma _{m},\varvec{w}_{m})\}_{m=1}^M\) for each \(\mathbf {V}_i:=\{\varvec{v}_n\}_{n=1}^{N_i}\) as the specification of the i-th model via

$$\begin{aligned} \min _{\varvec{\gamma }, \mathbf {w}}\left\| \frac{1}{N_i} \sum _{n=1}^{N_i} k\left( \varvec{v}_{n}, \cdot \right) -\sum _{m=1}^{M} \gamma _{m} k\left( \varvec{w}_{m}, \cdot \right) \right\| _{\mathcal {H}}^{2}, \end{aligned}$$
(3)

where M is the size of the reduced set. The minimization problem Eq. (3) can be solved with stochastic gradient descent (Wu et al., 2021).

The proposed specification provides models from various feature spaces with a shared specification space (RKHS), which makes learnware identification possible and effective for user requirements concerning several feature spaces. Furthermore, the specification protects the original data set effectively. More specifically, the size M of the specification can be much smaller than the size \(N_i\) of the original data set, and the specification \(\mathbf {s}_i=\{(\gamma _{m},\varvec{w}_{m})\}_{m=1}^M\) cannot be used to recover the mapped data set \(\mathbf {V}_i\) or the original data set \(\mathbf {X}_i\).
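Putting the two steps together, a small sketch of the specification assignment: after splitting the mapped matrices \(\{\mathbf {V}^{(i)}\}\) back into per-model blocks \(\{\mathbf {V}_i\}\), the market simply fits an RKME (Eq. (3)) on each block, reusing the hypothetical `fit_rkme` helper sketched earlier.

```python
def assign_specifications(V_blocks_per_model, M=10, gamma=0.1):
    """Fit the RKME of Eq. (3) on each model's mapped local data V_i (rows = samples
    in the d-dimensional subspace), reusing the fit_rkme helper sketched earlier."""
    return [fit_rkme(V_i, M=M, gamma=gamma) for V_i in V_blocks_per_model]
```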

4.2 Deploying stage

In this stage, the user tries to exploit the market to tackle her prediction task while preserving data privacy. The user maps her data via the projection tool provided by the market and generates a reduced set accordingly, which serves as the user's requirement. After receiving the requirement, the market recommends highly reusable learnwares to the user and the user reuses them on her task.

User data mapping In this step, the user produces a requirement that the market can use for learnware recommendation while protecting her privacy. The market first passes the projection tool to help the user map her data; the projection tool \(\{\mathbf {B}^{(i)}\}_{i=1}^k\) consists of the base matrices of all feature spaces, generated from the byproduct \(\mathbf {W}^{(i)}\) of subspace learning via \(\mathbf {B}^{(i)}=\mathbf {X}^{(i)}\mathbf {W}^{(i)}\).

After receiving the projection tool \(\{\mathbf {B}^{(i)}\}_{i=1}^k\), the user can map her data into the same subspace. Without loss of generality, we assume that the user has data over the first t feature spaces \(\mathcal {X}_u=\mathcal {X}_1\times \cdots \times \mathcal {X}_t\ (t\le k)\), and we denote by \(\mathbf {X}_u=[\mathbf {X}_u^{(1)};\cdots ;\mathbf {X}_u^{(t)}]\in \mathbb {R}^{(d_1+\cdots +d_t)\times N_u}\) the user data, where \(N_u\) is the size of the user data. The projection of the user data is formulated as

$$\begin{aligned} \begin{aligned} \min _{\mathbf {V}^{(i)},\mathbf {V}^*} O_u=&~\sum _{i=1}^t\left\{ \left\| \mathbf {X}_u^{(i)}-\mathbf {B}^{(i)}\left( \mathbf {V}^{(i)}\right) ^{\top }\right\| _{\text {F}}^2\right. \\&\left. +\,\alpha \text {Tr}\left( \left( \mathbf {V}^{(i)}\right) ^{\top }\mathbf {L}^{(i)}\mathbf {V}^{(i)}\right) +\beta \left\| \mathbf {V}^{(i)}-\mathbf {V}^*\right\| _{\text {F}}^2 \right\} \\ \text {s.t.}\ \ \mathbf {V}^{(i)}&\ge 0, \mathbf {V}^*\ge 0, \end{aligned} \end{aligned}$$
(4)

where the trade-off parameters \(\alpha ,\beta >0\) control the contributions of the manifold regularizer and of the consensus loss between different feature spaces, \(\mathbf {L}^{(i)}\) is the Laplacian matrix induced by \(\mathbf {X}_u^{(i)}\), and \(\mathbf {V}^*\) is the final representation of the user data in the subspace. This problem has a similar structure to Eq. (2). One major difference is that Eq. (4) uses the fixed base matrix \(\mathbf {B}^{(i)}\), whereas Eq. (2) learns the base matrix \(\mathbf {X}^{(i)}\mathbf {W}^{(i)}\). The optimization using multiplicative update rules is described in Appendix A.2.
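Analogously to the establishing stage, the following sketch solves Eq. (4) by projected gradient with the base matrices \(\mathbf {B}^{(i)}\) held fixed and the consensus \(\mathbf {V}^*\) updated in closed form as the average of the \(\mathbf {V}^{(i)}\); again this is an assumption-based stand-in for the multiplicative updates of Appendix A.2, reusing the `laplacian` helper above.

```python
def project_user_data(Xu_blocks, Bs, alpha=1.0, beta=1.0, lr=1e-3, n_iter=100, seed=0):
    """Projected-gradient sketch of Eq. (4): map the user blocks X_u^{(i)} (d_i x N_u)
    into the subspace with the fixed base matrices B^{(i)} = X^{(i)} W^{(i)} provided
    by the market; the consensus V^* is the average of the per-space V^{(i)}."""
    rng = np.random.default_rng(seed)
    N_u = Xu_blocks[0].shape[1]
    d_sub = Bs[0].shape[1]
    Vs = [rng.random((N_u, d_sub)) for _ in Xu_blocks]
    Ls = [laplacian(X) for X in Xu_blocks]
    V_star = np.mean(Vs, axis=0)
    for _ in range(n_iter):
        for i, (X, B, L) in enumerate(zip(Xu_blocks, Bs, Ls)):
            V = Vs[i]
            grad = (-2 * (X.T @ B - V @ (B.T @ B))    # reconstruction with fixed B
                    + 2 * alpha * (L @ V)             # manifold term
                    + 2 * beta * (V - V_star))        # consensus term
            Vs[i] = np.maximum(V - lr * grad, 0.0)    # keep V >= 0
        V_star = np.mean(Vs, axis=0)                  # closed-form consensus update
    return V_star
```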

To preserve data privacy, the user only passes to the market the reduced set \(\mathbf {r}_u:=\{\beta _{u,m},\mathbf {z}_{u,m}\}_{m=1}^{M_u}\) constructed from the mapped data set \(\mathbf {V}^*:=\{\mathbf {v}_n\}_{n=1}^{N_u}\) via Eq. (1) as her requirement, where \(M_u\) is the size of the user's reduced set and can be much smaller than the original data size \(N_u\). Meanwhile, such limited information can still guarantee the performance of learnware recommendation.

Learnware recommendation In this step, the market identifies useful learnwares for the user. After receiving the user's requirement \(\mathbf {r}_u=\{\beta _{u,m},\varvec{z}_{u,m}\}_{m=1}^{M_u}\), the learnware market estimates the reusability score \(\omega _i\) of each learnware via the corresponding specifications \(\{\mathbf {s}_i:=\{\beta _{i,m},\varvec{z}_{i,m}\}_{m=1}^{M_i}\}_{i=1}^R\). The reusability scores are estimated by the following problem:

$$\begin{aligned} \min _{\varvec{\omega }} \left\| \Phi _u(\cdot )-\sum _{i=1}^R\omega _i\Phi _i(\cdot )\right\| _{\mathcal {H}}^2, \text { s.t. } \omega _i\ge 0,\sum _{i=1}^R\omega _i=1, \end{aligned}$$

where \(\Phi _u(\cdot )=\sum _{m=1}^{M_u}\beta _{u,m}k(\varvec{z}_{u,m},\cdot )\) is the KME of the user's requirement and \(\Phi _i(\cdot )=\sum _{m=1}^{M_i}\beta _{i,m}k(\varvec{z}_{i,m},\cdot )\) represents the specification of the i-th model. This problem can be solved by quadratic programming (Smola et al., 2007). After the reusability scores are estimated, the market delivers to the user the highly reusable learnwares whose reusability score \(\omega _i\) is no less than a pre-defined threshold L. This step ensures the user only accesses highly relevant learnwares, greatly reducing the information exchanged between the market and the user when the market possesses plentiful learnwares.
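Expanding the RKHS norm turns this into a standard simplex-constrained quadratic program in \(\varvec{\omega }\), with \(Q_{ij}=\langle \Phi _i,\Phi _j\rangle _{\mathcal {H}}\) and \(c_i=\langle \Phi _u,\Phi _i\rangle _{\mathcal {H}}\) computed from the reduced sets alone. A sketch using `scipy.optimize.minimize` with SLSQP follows; the solver choice and helper names are ours, not the paper's.

```python
from scipy.optimize import minimize

def reusability_scores(user_req, specs, gamma=0.1):
    """Simplex-constrained QP for the weights omega:
    min ||Phi_u - sum_i omega_i Phi_i||_H^2  s.t.  omega_i >= 0, sum_i omega_i = 1,
    where Q[i, j] = <Phi_i, Phi_j>_H and c[i] = <Phi_u, Phi_i>_H are computed from
    the reduced sets alone.  user_req and each spec are (beta, Z) pairs."""
    beta_u, Z_u = user_req
    R = len(specs)
    Q, c = np.zeros((R, R)), np.zeros(R)
    for i, (b_i, Z_i) in enumerate(specs):
        c[i] = beta_u @ gaussian_kernel(Z_u, Z_i, gamma) @ b_i
        for j, (b_j, Z_j) in enumerate(specs):
            Q[i, j] = b_i @ gaussian_kernel(Z_i, Z_j, gamma) @ b_j
    objective = lambda w: w @ Q @ w - 2 * c @ w       # the constant term is dropped
    constraints = {"type": "eq", "fun": lambda w: w.sum() - 1.0}
    res = minimize(objective, np.full(R, 1.0 / R), method="SLSQP",
                   bounds=[(0.0, None)] * R, constraints=constraints)
    return res.x
```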

User data prediction After the learnware market returns the highly related learnwares \(\{(f_i,\mathbf {s}_i)\vert \omega _i\ge L\}\), the user can take full advantage of these well-performed models for her problem. The specification based on the subspace can help the user identify which model should be used for each instance. More specifically, the kernel herding technique (Chen et al., 2012) can be employed to sample a mimic data set \(\bar{\varvec{D}}_i=\{\mathbf {x}_n\}_{n=1}^{\bar{N}_i}\) from the specification \(\mathbf {s}_i\) on the subspace, and then a selector can be trained on these data to predict which learnware each unlabeled instance should use. For example, if the projection of an instance is assigned to learnware #1 by the selector, the user reuses that learnware, defined on the original feature space, to predict on the instance. In summary, the user predicts her unlabeled data via the returned learnwares and the trained selector.
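A rough sketch of this reuse step, assuming the hypothetical helpers above: mimic data is herded from each recommended specification with a simple pool-based variant of kernel herding (candidates are drawn around the reduced points, which is our simplification, not the paper's continuous-space herding), and a selector is then trained with the learnware index as the label.

```python
from sklearn.ensemble import RandomForestClassifier

def herd_from_rkme(beta, Z, n_samples=100, pool_size=2000, gamma=0.1, seed=0):
    """Pool-based kernel herding sketch: draw candidates around the reduced points,
    then greedily pick x maximizing sum_m beta_m k(w_m, x) - (1/(t+1)) sum_{s<=t} k(x_s, x)."""
    rng = np.random.default_rng(seed)
    p = np.maximum(beta, 0)                                # clip in case of negative weights
    idx = rng.choice(len(Z), size=pool_size, p=p / p.sum())
    pool = Z[idx] + 0.1 * rng.standard_normal((pool_size, Z.shape[1]))
    target = beta @ gaussian_kernel(Z, pool, gamma)        # <mu_spec, phi(x)> per candidate
    picked, running = [], np.zeros(pool_size)
    for t in range(n_samples):
        picked.append(int(np.argmax(target - running / (t + 1))))
        running += gaussian_kernel(pool[picked[-1]:picked[-1] + 1], pool, gamma)[0]
    return pool[picked]

def train_selector(recommended_specs, gamma=0.1):
    """Train the selector on mimic data herded from each recommended specification;
    the label of a mimic sample is the index of the learnware it was herded from."""
    Xs, ys = [], []
    for label, (beta, Z) in recommended_specs.items():
        S = herd_from_rkme(beta, Z, gamma=gamma)
        Xs.append(S)
        ys.append(np.full(len(S), label))
    return RandomForestClassifier(n_estimators=100).fit(np.vstack(Xs), np.concatenate(ys))
```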

Fig. 3

An illustration of the proposed heterogeneous learnware procedure. In the establishing stage, various task data sets from heterogeneous feature spaces are mapped to a common subspace and specifications are built accordingly. In the deploying stage, the user passes requirements with the help of the projection tool to the market and gets helpful learnwares

4.3 Overall procedure and discussion

This part first summarizes the establishing stage for constructing the heterogeneous learnware market and the deploying stage for learnware search and reuse. In the establishing stage, the market needs to assign specifications in an identical space for heterogeneous learnwares, which is implemented by using a subspace to bridge the heterogeneous feature spaces and the identical specification space. More specifically, specifications are generated via RKME on the mapped local task data obtained by the CF-based method. In the deploying stage, the user generates the subspace-based requirement, which is a reduced set built on the mapped user data generated by the projection tool (base matrices of the different feature spaces) provided by the market. The overall procedure is illustrated in Fig. 3.

Afterwards, we give a preliminary discussion on the performance of the two-stage procedure. When the user's task is covered by the market, the proposed method enables the user to be well assisted by the learnware market on her task. More specifically, considering learnwares whose feature space \(\mathcal {X}_{v_i}\) is a subset of the user's feature space \(\mathcal {X}_{v_u}\), if the distribution of the user's task \(\mathcal {P}_{u}\) is a mixture of the task distributions of these learnwares, i.e., \(\mathcal {P}_{u}=\sum _{i:\mathcal {X}_{v_i}\subseteq \mathcal {X}_{v_u}}\xi _i\mathcal {P}_{i}\), then our procedure enables the market to select the learnwares with large \(\xi _i\) and accurately tell the user, for each instance, which learnware should be used. Beyond this basic scenario, our method can be extended to more complicated scenarios. For example, the user's task may have a part that is not covered by the market. For the homogeneous case where all learnwares and the user's task share the same feature space, Zhang et al. (2021) used the mixture proportion estimation (MPE) technique (Zhang et al., 2020; Ramaswamy et al., 2016) to help identify such an unseen part, and this technique can be further adapted to the heterogeneous case.

5 Experiments

This section demonstrates the effectiveness of our methods. We illustrate our methods by a synthetic task and show the performance on real-world tasks compared with contenders.

Fig. 4

Basic data source: a and b present four tasks on two feature spaces, c and d present the sampled auxiliary data revealing the connection between the two feature spaces

Fig. 5

Establishing stage: a and b present four local task data sets generated from the aforementioned four tasks, which are used for building learnwares (#1-#4), c illustrates the subspace learned from the four task data sets and the auxiliary data, d illustrates the specifications in the subspace

Fig. 6

Deploying stage: a and b present the user data generated from tasks 2 and 3, c illustrates the mimic data generated from the two learnwares (#2 and #3) delivered by the market, d illustrates the model selector (the red line shows the classification boundary) trained on the mimic data, indicating which learnware each sample should use (#2 or #3)

5.1 Synthetic task

We first illustrate the two stages of the heterogeneous learnware market workflow through a synthetic task.

In the beginning, we design four tasks defined on the Cartesian product of two feature spaces, plotted in four colors (red, blue, green and orange) in Fig. 4a and b; each task is a binary classification problem generated from Gaussian distributions. These tasks are used to generate the four local task data sets (Fig. 5a and b), the auxiliary data (Fig. 4c and d) and the test user data (Fig. 6a and b) of our heterogeneous learnware problem.

The unlabeled auxiliary data is sampled from the four tasks with a size of 20, as shown in Fig. 4c and d. Each local task is sampled from one task and retains the data of a single feature space. Each local task has 200 samples; positive samples are plotted in yellow and negative samples in blue. All local tasks are shown in Fig. 5a and b. Figure 6a and b illustrate the test user data containing 100 samples, which is a mixture of task 2 and task 3 with sampling ratios of 0.3 and 0.7 respectively.

Figure 5 illustrates the establishing stage. The market accommodates SVMs trained on four heterogeneous data sets from two feature spaces (Fig. 5a and b); the accuracy of the SVMs is \(0.967\pm 0.015\). To assign a specification to each model, the market collects a small amount of unlabeled auxiliary data across the entire feature space, which builds the connection between the different feature spaces. With the help of the auxiliary data, the four local data sets from two feature spaces can be mapped to a common subspace as illustrated in Fig. 5(c). Figure 5(d) shows the specification assigned to each SVM via RKME on the mapped task data. The size of the specification is \(M=5=\lfloor \ln 200\rfloor\), which is much smaller than the size of the original data set. The specifications can effectively protect the privacy of each local task.

Figure 6 illustrates the deploying stage. After the learnware market is constructed, the user can exploit it for her own task (Fig. 6a and b). Although the user only possesses unlabeled data, she can still make valid and satisfactory predictions with the help of the learnware market under some mild assumptions. Using the projection tool provided by the market, the user obtains her mapped data (Fig. 6d) and generates a reduced set of size \(M_u=5\) to send to the market. The reduced set preserves the privacy of the user data while providing helpful information for the market to identify helpful learnwares. Based on the specifications, the market calculates the reusability scores of the four learnwares: [0.001, 0.399, 0.559, 0.042]. With the preset threshold \(L=0.1\), the market delivers learnwares #2 and #3 to the user. Upon receiving the helpful learnwares, the user uses the specifications of the two learnwares to generate some mimic data (Fig. 6(c)) and trains a model selector that tells, for each sample, which learnware should be used (Fig. 6(d)); more specifically, learnware #2 is chosen for the lower samples and learnware #3 for the upper samples. By predicting on the original feature spaces using learnwares #2 and #3, the accuracy on the user data is 0.987.

5.2 Real-world tasks

We further verify the superior performance of the proposed specification and the procedure designed for the heterogeneous learnware problem on practical tasks.

Data sets The evaluation is conducted on four real-world tasks: MFEAT (van Breukelen et al., 1998), AWA (Lampert et al., 2009), KDDCUP99 (Lippmann et al., 2000) and COVTYPE (Blackard & Dean, 1999). MFEAT is a digit data set containing 10 classes over six feature spaces: Fourier coefficients (fou.), profile correlations (fac.), Karhunen-Loève coefficients (kar.), pixel averages (pix.), Zernike moments (zer.) and morphological features (mor.). Each class contains 200 samples and the dimensions of the feature spaces are 76, 216, 64, 240, 47 and 6. AWA is a large-scale animal data set; we randomly select 10 classes with 2000 samples in total over six feature spaces: color histogram features (cq.), local self-similarity features (lss.), PyramidHOG features (phog.), SIFT features (sift.), colorSIFT features (rgsift.) and SURF features (surf.). The dimensions of the feature spaces are 2688, 2000, 252, 2000, 2000 and 2000. KDDCUP99 is an unbalanced network intrusion detection data set; we sample a balanced subset containing 8 classes, each with 1000 samples. COVTYPE is used for classifying the cover type (the dominant tree species) of patches of forest in the United States; we sample a balanced data set containing 6 classes, each with 1500 samples.

Data set configuration For MFEAT and AWA, which are defined over six feature spaces, we choose two feature spaces to simulate the learnware scenario and generate six heterogeneous learnware tasks in total: MFEAT_fac_kar, MFEAT_pix_zer, MFEAT_fou_mor, AWA_cq_lss, AWA_phog_rgsift and AWA_sift_surf. For KDDCUP99 and COVTYPE, we randomly divide the entire feature space into two parts to simulate the heterogeneous learnware problem. For each task, we randomly split the instances into three or four local tasks based on their labels; each local task has two or three classes. The mechanism for generating the local data sets, the auxiliary data and the test user data is similar to the synthetic task. The test user data is a mixture of several tasks, and the number of mixed tasks ranges from 2 to 4.

Contenders As the heterogeneous learnware problem is new, we first compare our method with two naive baselines. In these two methods, heterogeneous models are assigned blank specifications.

  • Random: The market randomly selects a learnware for the user and the user reuses it to make a prediction directly.

  • Ensemble: The market returns all learnwares to the user without filtering and the user reuses them via an ensemble, i.e., using all learnwares to predict each test instance and taking the most confident predicted class.

Observing that neither of these two methods considers the relevance between learnwares and the user's task, we further equip each heterogeneous model with an RKME specification on its raw features. Following the strategy of delivering the learnware with the minimum MMD distance in the homogeneous case (Wu et al., 2021), we propose two variants for the heterogeneous case. For both methods, the user also sends to the market a reduced set generated on her original feature space.

  • MMD: The market calculates the MMD distance of each learnware and returns the learnware with the minimum MMD to the user. The user reuses this single learnware to make predictions.

  • MMD+Ens: The market calculates the MMD distance of each learnware and returns the learnware with the minimum MMD for each feature space. The user reuses these learnwares via an ensemble.

Experiment setup For all RKME-based methods, we use the Gaussian kernel \(k(x,y)=\exp (-\gamma \Vert x-y\Vert _2^2)\) with \(\gamma \in \{0.1,0.01,0.001\}\) for different tasks. The size of the reduced set is 10 for both the specifications (\(M=10\)) and the user's requirement (\(M_u=10\)). We set the dimension of the subspace to 10 for the MFEAT-based tasks, KDDCUP99 and COVTYPE, and to 50 for the AWA-based tasks, which is much smaller than the dimensions of the original feature spaces. The auxiliary data consists of 160 samples, fewer than the size of the local tasks. The threshold is set to \(L=0.1\). We use linear SVMs for the MFEAT-based tasks, KDDCUP99 and COVTYPE, and random forests for the AWA-based tasks. All experiments are repeated 10 times.

Table 2 Accuracy (mean ± std.) on true labels of the user data.

Performance on user data Table 2 presents the prediction accuracy against the true labels on the user data. Our method outperforms the other contenders: it achieves the best accuracy in 22 of 23 cases and behaves significantly better than the others in most cases, especially for the MFEAT-based tasks and the KDDCUP99 task. Random performs poorly, with low mean accuracy and large variance, mainly because it selects models aimlessly. Ensemble performs better than the other three contenders because it gives the user access to all learnwares; however, this leaks information about irrelevant learnwares. Furthermore, when the market has abundant learnwares, it imposes a heavy burden on passing learnware information and greatly increases the complexity of reusing learnwares. Compared with Ensemble, our method lets the user access only the highly relevant learnwares. MMD and MMD+Ens perform better than Random in more than half of the cases: with the help of distribution matching via RKME, they identify more reliable learnwares for the user. However, since they do not consider the relationship between different feature spaces, they still struggle to identify truly helpful learnwares and perform much worse than our method in all but one case.

Convergence analysis Figure 7 presents the convergence curves for the major optimization steps of our procedure, i.e., subspace generation and user data projection. The objective loss is normalized to [0, 1] for different tasks. As shown in Fig. 7(a), the subspace generation of all tasks except KDDCUP99 converges within 300 iterations and the value of the objective function decreases remarkably in the first 100 iterations. For user data projection, all tasks except KDDCUP99 converge within 50 iterations and the objective drops rapidly in the first 20 iterations, as shown in Fig. 7(b). For KDDCUP99, the subspace generation converges at around 1000 iterations and the user data projection converges within 50 iterations.

Fig. 7

Convergence curves of subspace generation and user data mapping

6 Related work

The learnware paradigm (Zhou, 2016) aims to build a learnware market to help users solve their machine learning tasks more efficiently rather than starting from scratch. By helping users identify and reuse helpful well-performed models in the market for their tasks, this paradigm exploits the potential value of existing trained models and significantly reduces the needed resources for users, like computing resources, expert knowledge and labeled data.

As a novel branch of machine learning research, the learnware paradigm considers a general and realistic framework where a huge number of models in the market are submitted spontaneously by developers from various tasks, and neither the original training data of developers nor the original data of users can be accessed. This brings grand challenges for users to identify and reuse helpful models in the market, and the specification is the core component of the learnware paradigm for achieving this goal. Recently, there have been some efforts in this branch attempting to realize a simplified prototype framework. For instance, Wu et al. (2021) proposed the reduced kernel mean embedding (RKME) as the specification, which constructs the specification space by mapping the training data of models to an element of the reproducing kernel Hilbert space (RKHS). When the user's task involves certain unseen parts not covered by the learnware market, Zhang et al. (2021) used the mixture proportion estimation (MPE) technique (Ramaswamy et al., 2016; Zhang et al., 2020) on top of the RKME specification to identify samples from the unseen parts while assigning the rest to proper models returned from the market. This paper provides a solution for learnwares from heterogeneous feature spaces by generating the RKME specification on a unified subspace.

Note that the techniques in transfer learning (Pan & Yang, 2009) and domain adaptation (Ben-David et al., 2007; Wang et al., 2022), which aim to transfer knowledge from a source domain to a target domain, typically assume access to raw data (Dai et al., 2007; Fernando et al., 2013; Huang et al., 2006; Pan et al., 2010) and thus do not satisfy the privacy concerns of the learnware paradigm. Besides, hypothesis transfer learning (Kuzborskij & Orabona, 2013) and model reuse (Ding & Zhou, 2020; Zhao et al., 2020) only apply to specific scenarios where the model to be adapted is assumed helpful to the user task, and do not consider how to identify helpful models from a market without leaking raw data. There are limited studies on reusing models from different feature spaces without accessing raw data (Ye et al., 2018, 2020), but they also assume the models are helpful for the current task. In this paper, we focus on a more comprehensive process comprising how to accommodate heterogeneous models in the market with appropriate specifications and how to identify and reuse helpful learnwares for the user's current task.

Besides, since the learnwares in the market are submitted spontaneously by developers from various tasks and are identified for arbitrary user tasks, the learnware paradigm is studied in the open environment (Zhou, 2022), and techniques for open-environment machine learning (Zhao et al., 2021; Zhao & Zhou, 2021) may also bring some inspiration.

Recently, Zhou and Tan (2022) provided a brief overview of progress on learnware, which clarified the process of the learnware market and the design of the specification. It describes the prospects of the learnware paradigm and sheds light on future exploration.

7 Conclusion

In this paper, we have proposed the first practical approach to handling learnwares from heterogeneous feature spaces, which makes the learnware paradigm viable in broader applications. We give a basic formulation of the heterogeneous learnware problem and propose a novel specification design strategy by integrating subspace learning, along with a detailed procedure for establishing and reusing the heterogeneous learnware market. Empirical studies on both synthetic data and real-world tasks substantiate the effectiveness of our methods. Although our method is designed for the basic scenario in which each learnware comes from one of the disjoint feature spaces, it can be naturally extended to the more general scenario in which a learnware comes from the Cartesian product of several disjoint feature spaces. To summarize, for the basic heterogeneous learnware scenario where the overall feature space can be divided into disjoint parts and the feature spaces of the user's task and of the learnwares can be any combination of these parts, the learnware market can be well established and used. For future research, formalizing the heterogeneous learnware problem in a more general way and proposing effective solutions are interesting directions.