Moderately Supervised Learning: Definition, Framework and Generality

Learning with supervision has achieved remarkable success in numerous artificial intelligence (AI) applications. In the current literature, by referring to the properties of the labels prepared for the training dataset, learning with supervision is categorized into supervised learning (SL) and weakly supervised learning (WSL). SL concerns the situation where the training dataset is assigned ideal (complete, exact and accurate) labels, while WSL concerns the situation where the training dataset is assigned non-ideal (incomplete, inexact or inaccurate) labels. However, various solutions for SL tasks have shown that the given labels are not always easy to learn, and that the transformation from the given labels to easy-to-learn targets can significantly affect the performance of the final SL solutions. Without considering the properties of this transformation, the definition of SL conceals some details that can be critical to building appropriate solutions for specific SL tasks. Thus, for engineers in the AI application field, it is desirable to reveal these details systematically. This article attempts to achieve this goal by expanding the categorization of SL and investigating the sub-type moderately supervised learning (MSL), which concerns the situation where the given labels are ideal, but, due to the simplicity of their annotation, careful designs are required to transform them into easy-to-learn targets. From the perspectives of definition, framework and generality, we conceptualize MSL to present a complete fundamental basis for systematically analysing MSL tasks. Meanwhile, by revealing the relation between the conceptualization of MSL and the mathematicians' vision, this paper also establishes a tutorial for AI application engineers on viewing a problem to be solved from the mathematicians' vision.


Introduction
With the development of fundamental machine learning techniques, especially deep learning (LeCun et al., 2015), learning with supervision has achieved great success in various classification and regression tasks for artificial intelligence (AI) applications. Typically, a predictive machine learning model is learned from a training dataset that contains a number of training examples. For learning with supervision, the training examples usually consist of certain training events/entities and their corresponding labels. In classification, the labels indicate the classes corresponding to the associated training events/entities; in regression, the labels are real-valued responses corresponding to the associated training events/entities.
In the current literature on learning with supervision, there are two main streams: supervised learning (SL) and weakly supervised learning (WSL) (Zhou, 2018). SL focuses on the situation where the training events/entities are assigned ideal labels. The word 'ideal' here means that the labels assigned to the training events/entities are complete, exact and accurate. 'Complete' indicates that each training event/entity is assigned a label. 'Exact' indicates that the label of each training event/entity is individually assigned. 'Accurate' indicates that the assigned label can accurately describe the ground-truth of the corresponding event/entity. In contrast, WSL focuses on the situation where the training events/entities are assigned non-ideal labels. The word 'non-ideal' here means that the labels assigned to the training events/entities are incomplete, inexact or inaccurate. 'Incomplete' indicates that only a proportion of the training events/entities are assigned labels. 'Inexact' indicates that several training events/entities can be simultaneously assigned the same label. 'Inaccurate' indicates that the assigned label cannot accurately describe the ground-truth of the corresponding event/entity. More formal descriptions of SL and WSL, and their relations with this work, are provided in Section 2.
The clear boundary between the descriptions of SL and WSL is the properties (completeness, exactness and accuracy) of the labels prepared for the training events/entities. However, in many real-world SL tasks in the era of deep learning, we cannot directly learn a predictive model that effectively maps the training events/entities to their correspondingly assigned labels. The main reason lies in the fact that the assigned labels are sometimes not easy to learn, though they are ideal (complete, exact and accurate). This scenario arises from the need to reduce the labour and difficulty of producing annotations for large amounts of data in the era of deep learning. One must first transform the assigned labels into easy-to-learn targets before learning the predictive model of an SL solution. Existing solutions for various SL tasks have shown that the transformation from the given labels to the easy-to-learn targets can significantly affect the performance of the final SL solution (Law & Deng, 2020; Lin et al., 2017, 2020; Xie et al., 2018; Xue et al., 2019). By simply referring to the properties (completeness, exactness and accuracy) of the labels prepared for the training events/entities, and without considering the properties of the transformation from the given labels to the easy-to-learn targets, the definition of SL conceals some details that can be critical to building appropriate solutions for certain specific SL tasks. Thus, for practitioners in various application fields, it is desirable to reveal these details systematically. This article attempts to achieve this goal by expanding the categorization of SL and investigating the central sub-type of SL.
Defining the properties of the transformation from the given labels to learnable targets for an SL task as two types, 'carelessly designed' and 'carefully designed', we further categorize SL into three narrower sub-types: precisely supervised learning (PSL), moderately supervised learning (MSL), and precisely and moderately combined supervised learning (PMCSL). PSL concerns the situation where the given labels are precisely fine. In this situation, we can carelessly design a transformation to obtain the easy-to-learn targets from the given labels. In other words, the given labels can, to a large extent, be viewed as easy-to-learn targets. PSL is the most classic sub-type of SL, and typical tasks include simple ones like image classification (Krizhevsky et al., 2012) and complicated ones like image semantic segmentation (Ghosh et al., 2019). MSL concerns the situation where the given labels are ideal, but, due to the simplicity of their annotation, careful designs are required to transform them into easy-to-learn targets for the learning task. This situation differs from classic PSL, since the given labels must be carefully transformed into easy-to-learn targets, which would otherwise lead to poor performance. It also differs from WSL, since the given labels are complete, exact and accurate. Typical MSL tasks include cell detection (CD) (Xie et al., 2018) and line segment detection (LSD) (Xue et al., 2019). PMCSL concerns the situation where the given labels contain both precisely fine annotations and ideal but simple annotations. Usually, PMCSL consists of a few PSL and MSL sub-tasks. Typical PMCSL tasks include visual object detection (Zhao et al., 2019), facial expression recognition (Ranjan et al., 2019) and human pose identification (Cao et al., 2019). More detailed characteristics of PSL, MSL and PMCSL are provided in Section 3.
Among the three narrower sub-types, MSL accounts for the majority of SL, because PSL accounts for only a minor proportion of SL and MSL is an essential part of PMCSL, which accounts for the major proportion of SL. As a result, MSL plays the central role in the field of SL. Although solutions have been intermittently proposed for different MSL tasks, insufficient research has so far been devoted to systematically analysing MSL. In this article, we present a complete fundamental basis for systematically analysing MSL by conceptualizing it from the perspectives of definition, framework and generality. Primarily, presenting the definition of MSL, we illustrate how MSL exists by viewing SL from the abstract to the relatively concrete. Subsequently, presenting the framework of MSL based on its definition, we illustrate how MSL tasks should generally be addressed. Finally, presenting the generality of MSL based on its framework, we illustrate how generality exists among different MSL solutions by viewing them from the concrete to the relatively abstract; that is, solutions for a wide variety of MSL tasks can be more abstractly unified into the presented framework of MSL. In particular, the framework of MSL builds the bridge between the definition and the generality of MSL. More details of the conceptualization of MSL from the perspectives of definition, framework and generality are provided in Section 4.
For an application practitioner who invents deep learning-based technologies to promote the development of AI applications, in addition to chasing the state-of-the-art result for a problem to be solved, viewing the problem from the mathematicians' vision is equally or even more critical to discovering, evaluating and selecting appropriate solutions, especially as deep learning becomes increasingly standardized and reaches its limits in some specific AI applications. However, currently, most AI application practitioners primarily focus on chasing state-of-the-art results for a problem to be solved, and few pay attention to viewing the problem from the mathematicians' vision. So, the question is: what is the mathematicians' vision? A generalized answer to this question has been presented by the Chinese scientist Jingzhong Zhang (J. Zhang, 2016): "Mathematicians' vision is abstract. Those we think are different, they seem to be the same. Mathematicians' vision is precise. Those we think are the same, they seem to be very different. Mathematicians' vision is clear and sharp. They continue pursuing mathematical conclusions that we feel very satisfied with. Mathematicians' vision is dialectical. We think one is one and two is two, but they often focus on what is unchanging in the changing and what is changing in the unchanging."
What we see from this generalized answer is that we can gain intrinsic insight into the nature of a problem to be solved only when we look at the problem from the mathematicians' vision, which at least involves viewing both from the abstract to the concrete and from the concrete to the abstract. With this in mind, in this article, we show that the definition of MSL emerges when we view SL from the abstract to the relatively concrete, while the generality of MSL emerges among different solutions when we view them from the concrete to the relatively abstract. As a result, intrinsically, the conceptualization of MSL presented in this article is the product of viewing a problem to be solved from the mathematicians' vision. More details about the relation between the conceptualization of MSL and the mathematicians' vision can be found in Section 5.
In a previous article (Yang & Zheng, 2020), the existence of MSL was discussed for the first time. In this article, in addition to conceptualizing MSL more completely, we focus more on how MSL is conceptualized (i.e., illustrating the underlying methodology of conceptualizing MSL). Conceptualizing MSL from the perspectives of definition, framework and generality, this article provides a complete fundamental basis for systematically analysing the situation where the given labels are ideal, but, due to the simplicity of their annotation, careful designs are required to transform them into easy-to-learn targets. Meanwhile, by revealing the intrinsic relation between the conceptualization of MSL and the mathematicians' vision, this article also establishes a tutorial for AI application practitioners on viewing a problem to be solved from the mathematicians' vision. We hope this article will help readers realize that systematically analysing a problem to be solved is equally or even more critical than chasing the state-of-the-art result. In summary, the detailed contributions of this article are as follows:
• The conceptualization of MSL is presented more completely from the perspectives of definition, framework and generality.
• Viewing SL from the abstract to the relatively concrete, we illustrate how the definition of MSL was revealed.
• The framework of MSL, which builds the bridge between the definition and the generality of MSL, provides the foundation to systematically analyse how MSL tasks should generally be addressed.
• Viewing different MSL solutions from the concrete to the relatively abstract, we illustrate how the generality of MSL was revealed.
• The intrinsic relation between the conceptualization of MSL and the mathematicians' vision is revealed.
• The whole article establishes a tutorial on viewing a problem to be solved from the mathematicians' vision.
The rest of this article is structured as follows. In Section 2, we discuss SL and WSL, and their relations to MSL. In Section 3, we describe how SL is categorized into three narrower sub-types, discuss the relations of the three SL sub-types, and compare the SL sub-types with the WSL sub-types. In Section 4, we present the definition, framework and generality of MSL, and illustrate how the definition and generality of MSL were revealed. In Section 5, the relation between the conceptualization of MSL presented in this article and the mathematicians' vision is revealed to illustrate the underlying methodology of proposing the new concept of MSL. Finally, we discuss the whole article and point out some possible future research directions for MSL in Section 6.

Related Works
Formally, the task of learning with supervision is to learn a function f: X ⟼ Y* from a training dataset D. Usually, X denotes a set of events/entities, Y* represents the given labels corresponding to X, f is a function that can map X into the corresponding Y*, and the training dataset D consists of the events/entities X and the corresponding labels Y*. In the current literature, there are two main types: supervised learning (SL) and weakly supervised learning (WSL). Usually, these two types are distinguished according to the properties (completeness, exactness and accuracy) of the labels Y* prepared for the events/entities X in the training dataset D. Both SL and WSL are related to the concept of MSL in this article, since the proposal of MSL starts from the clear boundary between SL and WSL.

Supervised learning
In the paradigm of SL, the predictive models are usually produced by learning with complete, exact and accurate labels. Specifically, the training dataset D = {(x_1, y_1*), ⋯, (x_n, y_n*)}, where n is the number of events/entities and each x_i has a label y_i* that can ideally describe the ground-truth corresponding to x_i. Based on such a carefully prepared training dataset D, SL has been widely adopted to solve fundamental tasks such as image classification, visual object tracking, visual object detection and image semantic segmentation (Yang et al., 2018, 2019, 2023; Yang, Lv, et al., 2020, 2021), as well as various other tasks (Li et al., 2021, 2022).
In this article, taking into consideration the properties of the transformation from the given labels to learnable targets in the SL paradigm, we categorize SL into three narrower sub-types and particularly conceptualize the MSL sub-type. More details about the three sub-types of SL will be discussed in Section 3.

Weakly supervised learning
In the paradigm of WSL, the predictive models are usually produced via learning with incomplete, inexact or inaccurate labels (Zhou, 2018).
Learning with incomplete labels focuses on the situation where only a small amount of ideally labelled data is given while abundant unlabelled data are available to train a predictive model. In this situation, the ideally labelled data are commonly insufficient to learn a predictive model with good performance. Typical techniques for this situation include active learning (Settles, 2010) and semi-supervised learning (Zhu, 2008). Formally, the training dataset for this situation can be denoted as D = {(x_1, y_1*), ⋯, (x_l, y_l*), x_{l+1}, ⋯, x_m}, where the first l events/entities are assigned labels and the remaining m − l events/entities are unlabelled.

Sub-types of Supervised Learning
The categorization of learning with supervision into SL and WSL in Section 2 simply takes into consideration the properties (completeness, exactness and accuracy) of the labels prepared for the training dataset. However, in practice, we usually cannot directly learn a function f: X ⟼ Y* that can effectively map the events/entities X into the corresponding labels Y* for an SL task, especially in the era of deep learning. Due to the need to reduce the labour and difficulty of producing annotations for large amounts of data in the era of deep learning, the given labels Y* are sometimes not easy to learn. Usually, we must first build a transformation from the given labels Y* to easy-to-learn targets T*, and then learn a function f: X ⟼ T* that can effectively map the events/entities X into the transformed easy-to-learn targets T*. This scenario needs to be discussed more systematically.
In this section, by taking the properties of the target transformation from Y* to T* into consideration, we expand SL into three narrower sub-types. Usually, a transformation from given labels to easy-to-learn targets for an SL task is coupled with a re-transformation from the predicted targets of the learnt function f to the final predicted labels. Since a label re-transformation commonly consists of the reverse operations corresponding to its coupled target transformation, in this section, we assume that the properties of the label re-transformation remain the same as those of its coupled target transformation and give no additional discussion.

Properties of target transformation
We classify the transformations from given labels to easy-to-learn targets for SL tasks into two types: 'carelessly designed' and 'carefully designed'. Intrinsically, we define a target transformation as the 'carelessly designed' type if it is non-parameterized, and as the 'carefully designed' type if it is parameterized, because a non-parameterized target transformation simply requires careless designs while a parameterized target transformation must require careful designs. A non-parameterized target transformation simply requires careless designs because it can generate a type of easy-to-learn targets that can be considered optimal. In contrast, a parameterized target transformation must require careful designs because adjusting its parameters can generate various types of easy-to-learn targets, from which the optimal type needs to be found. To formally summarize the properties of a target transformation, we present Definition 1 as follows.
Definition 1. For the given labels Y*, the easy-to-learn targets generated by a 'carelessly designed' target transformation are T* ≈ Y*, where ≈ signifies the non-parameterized target transformation of t_i* from y_i* and the type of targets T* can be considered optimal; and the easy-to-learn targets generated by a 'carefully designed' target transformation are T* ⋘ Y*, where ⋘ signifies the parameterized target transformation of t_i* from y_i* and the optimal type of targets T* needs to be found by adjusting the parameters of the target transformation.
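To make the distinction in Definition 1 concrete, the following minimal Python sketch contrasts the two types. The function names, and the use of one-hot encoding and label smoothing as the respective examples, are our own illustrative choices rather than constructions from this article:

```python
import numpy as np

def careless_transform(label_index, num_classes):
    # Non-parameterized ('carelessly designed'): the one-hot target
    # is fully determined by the given label; nothing needs tuning,
    # and the resulting target type can be considered optimal.
    target = np.zeros(num_classes)
    target[label_index] = 1.0
    return target

def careful_transform(label_index, num_classes, epsilon):
    # Parameterized ('carefully designed'): label smoothing with
    # factor `epsilon`; each choice of epsilon yields a different
    # easy-to-learn target, so the optimal setting must be searched for.
    target = np.full(num_classes, epsilon / num_classes)
    target[label_index] += 1.0 - epsilon
    return target

print(careless_transform(1, 3))              # exactly one possible target
print(careful_transform(1, 3, epsilon=0.1))  # one of many candidate targets
```

The first function has exactly one possible output per label, while the second produces a family of targets indexed by its parameter, which is the essence of the ≈ versus ⋘ distinction.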

Narrower sub-types of SL
Taking into consideration the properties presented in Definition 1 for the target transformation, we further classify SL into three narrower sub-types: precisely supervised learning (PSL), moderately supervised learning (MSL), and precisely and moderately combined supervised learning (PMCSL).

Precisely supervised learning
PSL concerns the situation where the given labels Y* in the training dataset have precisely fine ground-truths. In this situation, we can simply construct a non-parameterized target transformation with careless designs to obtain the easy-to-learn targets T* from Y*. Image classification (Krizhevsky et al., 2012) and image semantic segmentation (Ghosh et al., 2019) are two typical PSL problems.
In a k-class image classification task, the given ground-truth label for the class of an image can usually be transformed into an easy-to-learn target using a k-bit vector. In this vector, the bit corresponding to the given ground-truth label is set to 1, and the remaining bits are set to 0. Similarly, for a k-class image semantic segmentation task, each pixel point in the given ground-truth label for the semantic objects in an image can be transformed into a value at the same pixel point in the easy-to-learn target. The transformed value can be a one-hot vector for classification or a real-valued response for regression, corresponding to its predefined class in the given ground-truth label. We can note that the target transformations for these two PSL tasks are non-parameterized and can simply be built with careless designs. In other words, to some extent, the given labels Y* can be directly viewed as easy-to-learn targets T* due to their precise fineness.
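Such a careless (non-parameterized) transformation can be written in a few lines of NumPy; the sketch below shows the per-pixel one-hot encoding for segmentation (the helper name is ours):

```python
import numpy as np

def mask_to_targets(mask, num_classes):
    # Each pixel's class id indexes a row of the identity matrix,
    # producing a one-hot vector along a new last axis; the mapping
    # is fixed, so no parameter tuning is involved.
    return np.eye(num_classes)[mask]

# A 2x2 ground-truth mask for a 2-class task (0 = background, 1 = object).
mask = np.array([[0, 1],
                 [1, 1]])
targets = mask_to_targets(mask, num_classes=2)
print(targets.shape)   # (2, 2, 2): height x width x classes
print(targets[0, 1])   # [0. 1.] -- pixel (0, 1) belongs to the object class
```

The image-level classification case is the same mapping applied once per image instead of once per pixel.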
In summary, a concise illustration of PSL is shown in Fig. 1. For the image classification task of PSL (top row in Fig. 1), the event/entity (x) is an image lattice, the label (y*) corresponding to the image lattice x is a predefined category, and the target (t*) is transformed from the label y* to be easy to learn. Identically, for the image semantic segmentation task of PSL (bottom row in Fig. 1), the event/entity (x) is an image lattice, the label (y*) corresponding to the image lattice x is an image lattice of the same size, in which a square represents a pixel and blue pixels indicate that they belong to an object, and the target (t*) is transformed from the label y* to be easy to learn. In Fig. 1, we illustrate the image classification and image semantic segmentation tasks of PSL with only two classes. From the top row in Fig. 1, we can note that, for the image classification task of PSL, the label y* can be transformed into the target t* easily, with careless designs. Since image semantic segmentation is image classification expanded to the pixel level, for the bottom row in Fig. 1, the label y* can also be transformed into the target t* easily, with careless designs. As a result, the primary feature distinguishing PSL tasks is that, to some extent, the given labels Y* can be directly viewed as easy-to-learn targets T* due to their precise fineness.

Moderately supervised learning
MSL focuses on the situation where the given labels Y* in the training dataset are ideal while possessing, to some extent, extreme simplicity. This situation differs from PSL, since the simplicity of Y* makes directly learning with the targets from a carelessly designed transformation probably impossible, or leads to very poor performance. Due to the simplicity of Y*, in this situation, the transformation from Y* to easy-to-learn targets T* is usually parameterized and must require careful designs. Cell detection (CD) (Xie et al., 2018) and line segment detection (LSD) (Xue et al., 2019) with point labels are two typical MSL tasks.
In the CD task, the given labels for cells in an image lattice are simply a set of 2D points indicating the cell centres. In the LSD task, the given labels for line segments in an image lattice are simply a set of tuples, each of which contains two 2D points. The connection between the two 2D points of a tuple indicates a line segment in an image lattice. As the given labels for these two tasks are extremely simple, directly transforming them into easy-to-learn targets, in which the pixel points corresponding to Y* are set as foreground objects and the rest are set as background, will make the learning task impossible or lead to very poor performance. A more appropriate transformation that is restricted by a number of parameters (a parameterized transformation) can be used to alleviate this situation. However, adjusting the parameters of this parameterized transformation can result in various easy-to-learn targets that can significantly affect the performance of the final solution for an MSL task. As a result, it is usually difficult to find the optimal easy-to-learn targets from the parameterized transformation for an MSL task. Thus, an appropriately parameterized target transformation for an MSL task must require careful designs to be constructed. In summary, a concise illustration of MSL is shown in Fig. 2. For the cell detection task of MSL (top row in Fig. 2), the event/entity (x) is an image lattice, the label (y*) corresponding to the image lattice x is an image lattice of the same size, in which a coordinate indicating the centre of the cell is given, and the target (t*) is transformed from the label y* to be easy to learn. Identically, for the line segment detection task of MSL (bottom row in Fig. 2), the event/entity (x) is an image lattice, the label (y*) corresponding to the image lattice x is an image lattice of the same size, in which two coordinates indicating the ends of a line segment are given, and the target (t*) is transformed from the label y* to be easy to learn. From Fig. 2, we can note that the labels Y* for the two typical tasks of MSL are extremely simple and cannot be easily transformed into the corresponding easy-to-learn targets T*, so the target transformations for MSL tasks require careful designs. As a result, the primary feature distinguishing MSL tasks is that the given labels Y* require careful designs to be transformed into the easy-to-learn targets T* due to their extreme simplicity.
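A common careful (parameterized) design in the point-supervised detection literature is to expand each point label into a Gaussian heatmap. The sketch below is our own simplified version, not the exact construction of the cited solutions; it shows how the spread parameter makes the transformation one of many candidates:

```python
import numpy as np

def points_to_heatmap(points, shape, sigma):
    # Expand point labels (e.g., cell centres) into a Gaussian heatmap.
    # `sigma` parameterizes the transformation: too small and the target
    # stays almost as sparse as the raw points; too large and nearby
    # cells merge. Hence the need for careful design.
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    heatmap = np.zeros(shape)
    for cy, cx in points:
        g = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))
        heatmap = np.maximum(heatmap, g)  # keep the strongest response
    return heatmap

hm = points_to_heatmap([(4, 4), (10, 12)], shape=(16, 16), sigma=2.0)
print(hm[4, 4])          # peaks at a labelled cell centre
print(hm[0, 15] < 0.01)  # almost zero far from any centre
```

Each choice of sigma yields a different set of easy-to-learn targets T* from the same point labels Y*, which is exactly why the optimal setting must be searched for rather than assumed.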

Precisely and moderately combined supervised learning
PMCSL concerns the situation where the given labels Y* contain both precise and moderate annotations. In this situation, the transformation is usually built to have a mixture of the properties of the transformations designed for PSL and MSL tasks. Typical PMCSL tasks include visual object detection (Zhao et al., 2019), facial expression recognition (Ranjan et al., 2019) and human pose identification (Cao et al., 2019). Each of these tasks usually consists of a few PSL and MSL problems.
In the visual object detection task, the given labels for the objects in an image lattice are usually a set of tuples, each containing a class name and a bounding box (two coordinates) to indicate the category of an object and its position. Currently, deep convolutional neural network-based (He et al., 2016; Hu et al., 2018; Huang et al., 2017; Krizhevsky et al., 2012; Simonyan & Zisserman, 2015; Szegedy et al., 2017) one-stage approaches (YOLO (Bochkovskiy et al., 2020; Redmon et al., 2016; Redmon & Farhadi, 2017a, 2017b), SSD (Liu et al., 2016) and RetinaNet (Lin et al., 2020)) and two-stage approaches (RCNN (Girshick et al., 2014), SPPNet (He et al., 2015), Fast RCNN (Girshick, 2015), Faster RCNN (Ren et al., 2017) and FPN (Lin et al., 2017)) are the state-of-the-art solutions for this task. The transformations of these solutions usually have a parameterized sub-transformation and a non-parameterized sub-transformation. The parameterized sub-transformation is responsible for pre-defining a set of reference boxes (a.k.a. anchor boxes) with different sizes and aspect ratios at different locations of an image lattice. The sizes and aspect ratios can be adjusted to generate various reference boxes. These reference boxes are used to indicate the probabilities of corresponding areas being objects in an image lattice. The non-parameterized sub-transformation is responsible for transforming the reference boxes obtained from the parameterized sub-transformation into their categories and locations according to the ground-truth class names and ground-truth bounding boxes labelled in an image lattice. Recently, researchers have also begun to propose anchor-free approaches (Duan et al., 2019; Law & Deng, 2020) to achieve object detection. In facial expression recognition (Ranjan et al., 2019) and human pose identification (Cao et al., 2019) tasks, the detection of the landmarks of a face or a human is the primary problem. The given labels for the landmarks of a face or a human in an image are usually a set of tuples, each of which contains a 2D vector and a number to indicate the position and category of the landmark. The transformation of a possible solution for the detection of landmarks also has a parameterized sub-transformation and a non-parameterized sub-transformation. Basically, the detection of landmarks consists of two sub-problems: locating the landmarks and classifying the located landmarks. The parameterized sub-transformation, which is similar to the target transformation for a pure MSL problem, aims to generate targets for locating the landmarks, and the non-parameterized sub-transformation is responsible for producing targets for classifying the located landmarks. These typical problems show that the target transformation for PMCSL enjoys a mixture of the properties of the target transformations for pure PSL and pure MSL. We can note that the labels Y* for PMCSL tasks consist of labels for both PSL and MSL tasks, which means that the targets T* transformed from the labels Y* also consist of targets for both PSL and MSL tasks. As a result, the primary feature distinguishing PMCSL tasks is that the transformation from the given labels Y* into the easy-to-learn targets T* consists of both careless and careful designs.
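The parameterized anchor sub-transformation described above can be sketched as follows. This is a simplified illustration in the spirit of anchor-based detectors such as Faster RCNN; the exact box parameterization varies across implementations and is an assumption here:

```python
import itertools

def generate_anchors(grid_size, stride, scales, aspect_ratios):
    # Lay reference (anchor) boxes on a regular grid. `scales` and
    # `aspect_ratios` are the adjustable parameters of this
    # sub-transformation; changing them yields different targets.
    anchors = []
    for i, j in itertools.product(range(grid_size[0]), range(grid_size[1])):
        cy, cx = (i + 0.5) * stride, (j + 0.5) * stride
        for s, r in itertools.product(scales, aspect_ratios):
            h, w = s * r ** 0.5, s / r ** 0.5  # box of area s^2 with h/w = r
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

anchors = generate_anchors((4, 4), stride=16,
                           scales=(32, 64), aspect_ratios=(0.5, 1.0, 2.0))
print(len(anchors))  # 4 x 4 positions x 2 scales x 3 ratios = 96
```

The coupled non-parameterized sub-transformation would then deterministically match these boxes against the ground-truth bounding boxes (e.g., by overlap) to assign categories and regression offsets, requiring no tuning of its own.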

Relations of PSL, MSL and PMCSL
According to Section 3.1 and Section 3.2, the relations of PSL, MSL and PMCSL can be summarized as in Fig. 4. Coloured ellipses indicate the labels assigned to the corresponding events/entities; coloured polygons signify the easy-to-learn targets transformed from the assigned labels. The sign ≈ or ⋘ denotes the 'carelessly designed' or 'carefully designed' transformation from the assigned labels to the easy-to-learn targets.
PSL and MSL are directly derived from SL, and PMCSL is indirectly derived from SL since it is the combination of PSL and MSL. The main differences among PSL, MSL and PMCSL lie in their target transformations from the assigned labels to the corresponding easy-to-learn targets. Usually, the target transformation of PSL is carelessly designed (≈), the target transformation of MSL is carefully designed (⋘), and the target transformation of PMCSL consists of both careless and careful designs (≈ and ⋘). In fact, PSL, MSL and PMCSL can be converted into each other by changing the modelling methods of the target transformations for their solutions. However, once the target transformation for a possible solution has been constructed, the sub-type of the corresponding SL task is clearly determined. In other words, the constructed target transformation of a possible solution for an SL task fundamentally determines the sub-type of this SL task, which is crucial to building the appropriate solution for the task.

SL sub-types compared with WSL sub-types
WSL sub-types are determined according to the properties of the labels prepared for the training data. In comparison, the proposed SL sub-types are determined according to the properties of the target transformation from the given labels to easy-to-learn targets. Additionally, taking into consideration the properties presented in Definition 1 for the target transformation, the original WSL sub-types can also be classified into more refined sub-types. However, here we focus on the SL sub-types, since SL is more fundamental than WSL in the field of learning with supervision and the SL sub-types can naturally extend to WSL. As a result, the proposed SL sub-types, compared with the WSL sub-types, can be summarized as in Fig. 5.

Conceptualization of Moderately Supervised Learning
MSL plays the central role in the field of SL, due to the fact that PSL accounts for only a minor proportion of SL, while MSL is an essential part of PMCSL, which accounts for the major proportion of SL. Although solutions have been intermittently proposed for different MSL tasks, insufficient research has so far been devoted to systematically analysing MSL. To fill this gap, we present a complete fundamental basis to systematically analyse MSL by conceptualizing MSL from the perspectives of its definition, framework and generality.
Primarily, in section 4.1, presenting the definition of MSL, we illustrate how MSL exists by viewing SL from the abstract to the relatively concrete. Subsequently, in section 4.2, presenting the framework of MSL based on the definition of MSL, we illustrate how MSL tasks should generally be addressed. Finally, in section 4.3, presenting the generality of MSL based on the framework of MSL, we illustrate how generality exists among different MSL solutions by viewing them from the concrete to the relatively abstract; that is, solutions for a wide variety of MSL tasks can be more abstractly unified into the presented framework of MSL. In particular, the framework of MSL builds the bridge between the definition and the generality of MSL.

Definition of SL
Let us consider the situation where the given labels of a number of training events/entities are ideal (complete, exact and accurate) but possess simplicity. Specifically, given a training dataset $S = \{(x_1, y_1^*), \cdots, (x_n, y_n^*)\}$ whose simple labels are denoted by $Y^* = \{y_1^*, \cdots, y_n^*\}$, the ultimate goal of the learning task here is to find the final predicted labels $\hat{Y}$ that minimize the error against $Y^*$. Regarding this situation as a classic SL problem, we can define the objective function as
$$\hat{Y} = \underset{Y \in \mathcal{Y}}{\arg\min}\ \ell(Y, Y^*), \tag{0-1}$$
where $\ell(\cdot,\cdot)$ refers to a loss function that estimates the error between two given elements. The smaller the value of this function is, the better the found $\hat{Y}$ is. $\mathcal{Y}$ denotes the space of the predicted labels $Y$.
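As a concrete toy instance of the classic SL objective above, the predicted labels can be produced by a parametric model fitted to the ideal labels by minimizing a loss. The linear model, data, learning rate and iteration count below are all hypothetical choices for illustration only:

```python
import random

random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(100)]   # training events/entities
ys = [3.0 * x + 0.5 for x in xs]                   # ideal labels Y*

# Fit f(x) = w*x + b by plain gradient descent on the squared loss,
# so the predictions f(x) play the role of the predicted labels Y_hat.
w, b = 0.0, 0.0
n = len(xs)
for _ in range(2000):
    gw = sum(2 * ((w * x + b) - y) * x for x, y in zip(xs, ys)) / n
    gb = sum(2 * ((w * x + b) - y) for x, y in zip(xs, ys)) / n
    w -= 0.1 * gw
    b -= 0.1 * gb

# l(Y_hat, Y*): small loss means the found Y_hat is good
loss = sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / n
```

Because the labels here are ideal and directly learnable, no target transformation is needed; this is exactly the detail the MSL definition below makes explicit when it is.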

Definition of MSL
Due to the simplicity of $Y^*$, we must carefully build a target transformation that transforms $Y^*$ into easy-to-learn targets $Z^*$. On the basis of the transformed easy-to-learn targets $Z^*$, we build a learning function that maps events/entities $X$ to the predicted targets $\hat{Z}$ that minimize the error against $Z^*$. Based on the predicted targets $\hat{Z}$, we then carefully build a label re-transformation that re-transforms $\hat{Z}$ into the final predicted labels $\hat{Y}$ that can minimize the error against the labels $Y^*$. We assume $Z^*$ can be constructed by 'decoding' $Y^*$, as the easy-to-learn targets $Z^*$ are more informative than the labels $Y^*$; the predicted targets $\hat{Z}$ can be obtained by 'inferring' $X$; and $\hat{Y}$ can be constructed by 'encoding' $\hat{Z}$, as the final predicted labels $\hat{Y}$ are less informative than the predicted targets $\hat{Z}$. Formally, we specify the following definition for MSL:
$$Z^* = D(Y^*), \qquad \hat{Z} = F(X) = \underset{Z \in \hat{\mathcal{Z}}}{\arg\min}\ \ell(Z, Z^*), \qquad \hat{Y} = E(\hat{Z}), \tag{0-2}$$
where $D$ denotes the target transformation, $F$ denotes the learning function, $E$ denotes the label re-transformation, and $\mathcal{Z}$ and $\hat{\mathcal{Z}}$ respectively denote the space of the easy-to-learn targets $Z^*$ and the space of the predicted targets $\hat{Z}$.
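The decode-infer-encode structure of the MSL definition can be sketched schematically. All three functions below are toy stand-ins (a one-hot expansion, a mock learner, and an argmax collapse), not the paper's implementations:

```python
# Schematic sketch of MSL: the simple labels Y* are 'decoded' into richer
# easy-to-learn targets Z*, a learner produces predicted targets Z_hat,
# and Z_hat is 'encoded' back into the final predicted labels Y_hat.

def decode(y_star):
    # Y* -> Z*: expand each scalar class label into a one-hot target,
    # a more informative representation for learning.
    return [[1.0 if k == y else 0.0 for k in range(3)] for y in y_star]

def infer(z_star):
    # Stand-in for the learned map X -> Z_hat; here it simply reproduces
    # Z*, mimicking the predictions of a well-trained learner.
    return [list(z) for z in z_star]

def encode(z_hat):
    # Z_hat -> Y_hat: collapse each predicted target back to a label (argmax),
    # since the final labels are less informative than the targets.
    return [max(range(len(z)), key=lambda k: z[k]) for z in z_hat]

y_star = [0, 2, 1, 2]
y_hat = encode(infer(decode(y_star)))   # decode -> infer -> encode pipeline
```

The composition makes explicit that the learner never sees the simple labels directly; it only ever fits the decoded targets, and the encoder is responsible for returning to label space.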

How the definition of MSL was revealed
Comparing the definition of SL (Eq. (0-1)) with the definition of MSL (Eq. (0-2)), we can note that the definition of MSL was revealed by taking into consideration the transformation from the given simple labels $Y^*$ to the easy-to-learn targets $Z^*$, which shows that some details are indeed concealed by the abstractness of the definition of SL. Intrinsically, the methodology underlying the revelation of the definition of MSL stems from viewing the SL problem from the abstract to the relatively concrete, which can be summarized as in Fig. 6.

Framework
On the basis of the revealed definition of MSL, in this section we present a generalized framework for solving MSL tasks. The outline of the presented framework is shown in Fig. 7. The presented MSL framework has three basic components, namely the decoder, inferrer and encoder, and three basic procedures, namely learning, looping and testing. The three basic components are the key points of constructing fundamental solutions for MSL tasks, and the learning and looping procedures are the key problems of developing better solutions for MSL tasks.

Basic components
Decoder The decoder component transforms the given simple labels $Y^*$ into easy-to-learn targets $Z^*$. Commonly, the decoder is built on the basis of prior knowledge and is parameterized by $\theta_D$. Formally, the function of the decoder can be expressed by
$$Z^* = D(Y^*; \theta_D). \tag{1}$$

Inferrer The inferrer component models the map between the events/entities $X$ and the corresponding easy-to-learn targets $Z^*$. Usually, the inferrer is built on the basis of machine learning techniques and is parameterized by $\theta_I$. Formally, the function of the inferrer can be expressed by
$$\hat{Z} = I(X; \theta_I). \tag{2}$$

Encoder The encoder component re-transforms the predicted targets $\hat{Z}$ of the inferrer into the final predicted labels $\hat{Y}$. Coupled with the decoder, the encoder is built on the basis of the decoder and is parameterized by $\theta_E$. Formally, the function of the encoder can be expressed by
$$\hat{Y} = E(\hat{Z}; \theta_E). \tag{3}$$

Basic procedures
Learning The learning procedure aims to optimize the parameters $\theta_I$ and $\theta_E$ for the inferrer and encoder, respectively, under the prerequisite of a decoder that is empirically initialized with $\bar{\theta}_D$. Specifically, we express the learning procedure as
$$(\hat{\theta}_I, \hat{\theta}_E) = \underset{\theta_I \in \Theta_I,\ \theta_E \in \Theta_E}{\arg\min}\ \sum_{i=1}^{n} \ell\Big(E\big(I(x_i; \theta_I); \theta_E\big),\ y_i^*\Big)\ \Big|\ \bar{\theta}_D, \tag{4}$$
where $\Theta_I$ and $\Theta_E$ specify the parameter spaces of $\theta_I$ and $\theta_E$, respectively, and $n$ is the number of training events/entities.

Looping As the optimization of the parameters $(\theta_I, \theta_E)$ for both the inferrer and encoder is conducted under the prerequisite of the decoder parameterized by $\theta_D$, a change in the decoder can significantly affect the optimization of $\theta_I$ and $\theta_E$, which will eventually be reflected in the final predicted labels $\hat{Y}$. In fact, prior knowledge can be enriched by analysing the predicted labels $\hat{Y}$ of the current solution. The enriched prior knowledge can help us to model and initialize a better decoder. Thus, in practice, we commonly loop several times to adjust the decoder and restart the training for a possibly better solution. Specifically, we express the looping procedure as
$$\tilde{\theta}_D = \underset{\theta_D \in \Theta_D}{\arg\min}\ \ell\big(\hat{Y}|\theta_D,\ Y^*\big), \tag{5}$$
where $\Theta_D$ signifies the parameter space of $\theta_D$ and $\hat{Y}|\theta_D$ denotes that the final predicted labels $\hat{Y}$ are obtained by optimizing the parameters $(\theta_I, \theta_E)$ of both the inferrer and encoder under the prerequisite of the decoder initialized with $\theta_D$.

Testing As shown in Fig. 2, testing starts from the input $X$, passes through the inferrer and encoder, and ends at $\hat{Y}$. Specifically, the testing procedure can be expressed as
$$\hat{Y} = E\big(I(X; \tilde{\theta}_I|\tilde{\theta}_D);\ \tilde{\theta}_E|\tilde{\theta}_D\big), \tag{6}$$
where $\tilde{\theta}_I|\tilde{\theta}_D$ and $\tilde{\theta}_E|\tilde{\theta}_D$ are the parameters of the inferrer and encoder optimized under the prerequisite of the decoder initialized with $\tilde{\theta}_D$ found by the looping procedure.
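The interplay of the learning and looping procedures can be sketched as an outer search over decoder settings with an inner fit of the inferrer under each setting. Every component below is a hypothetical stand-in chosen only to keep the sketch runnable, not one of the paper's models:

```python
# Toy sketch of learning and looping: for each candidate decoder parameter,
# "learn" the inferrer under that decoder, then keep the decoder whose final
# predicted labels best match the given labels Y*.

def decode(y_star, theta_d):
    # Decoder: quantize the simple labels to a grid of resolution theta_d
    # (a made-up prior-knowledge transformation; coarser grids lose detail).
    return [round(y / theta_d) * theta_d for y in y_star]

def learn(x, z_star):
    # Learning: fit a least-squares slope z = w * x to the decoded targets,
    # standing in for optimizing the inferrer's parameters.
    return sum(z * xi for z, xi in zip(z_star, x)) / sum(xi * xi for xi in x)

x = [1.0, 2.0, 3.0]
y_star = [2.0, 4.0, 6.0]

best = None
for theta_d in (0.5, 1.0, 3.0):              # looping over decoder settings
    z_star = decode(y_star, theta_d)
    w = learn(x, z_star)
    y_hat = [w * xi for xi in x]             # encoder is the identity here,
                                             # since targets stay on label scale
    err = sum((a - b) ** 2 for a, b in zip(y_hat, y_star))
    if best is None or err < best[1]:
        best = (theta_d, err)
```

In this sketch the coarse decoder (theta_d = 3.0) destroys label information and yields a worse fit, so the loop retains a finer decoder; this mirrors how adjusting the decoder and restarting training can improve the final solution.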

Analysis
From Eqs. (1) to (6), the generalized framework presented for solving MSL tasks can be formally summarized as in Fig. 8. We can note that Fig. 8 in fact reveals both the key points of constructing fundamental solutions for MSL tasks and the key problems of developing better solutions for MSL tasks. The key points of constructing fundamental solutions can be summarized as modelling the three basic components (decoder, inferrer and encoder), and the key problems of developing better solutions can be summarized as evolving the learning and looping procedures to optimize the three basic components. The decoder is responsible for transforming the given labels into easy-to-learn targets; usually, it is built and optimized on the basis of prior knowledge. The inferrer is responsible for mapping events/entities to the corresponding easy-to-learn targets; usually, it is built and optimized on the basis of machine learning techniques. The encoder is responsible for transforming the predicted targets of the inferrer into the final predicted labels; usually, it is built and optimized on the basis of the decoder.

Generality
Although solutions have been intermittently proposed for different MSL tasks, little work has explored the generality of different MSL tasks, due to the lack of a clear problem definition and systematic problem analysis. In this subsection, based on the specified definition and presented framework for MSL, we show that generality exists between cell detection (CD) (Xie et al., 2018) and line segment detection (LSD) (Xue et al., 2019), which are two typical MSL tasks according to the definition of MSL and have large differences in application scenarios. Following the presented framework for MSL, we review and rebuild the solutions proposed in (Xie et al., 2018; Xue et al., 2019) for these two largely different typical MSL problems to show that generality exists in their solutions.

Cell detection
Let $\mathcal{F}$ be a 2D image lattice (e.g., $800 \times 800$). The moderate supervision information for the CD task uses a point $p_i = (u_i, v_i)$, with offsets $u_i$ and $v_i$, respectively, to represent a cell centre in $\mathcal{F}$. In this situation, the ground-truth label in $\mathcal{F}$ is denoted by $P = \{p_i \mid i \in \{1, \cdots, m\}\}$. Some example images and corresponding labels are given in Fig. 9.
Decoder The decoder transforms $P$ labelled in an image lattice $\mathcal{F}$ into a structured easy-to-learn target. It first assigns each pixel point $q \in \mathcal{F}$ to the nearest cell-centre point in $P$, partitioning $\mathcal{F}$ into $m$ regions. Each region serves as a supportive area for a cell-centre point. Then, by projecting its supportive region into a 1D real-valued representation, it transforms each cell centre $p_i$ in $P$ into a structured representation.

Parameterization Using the transformation function above, we can transform $P$ labelled in $\mathcal{F}$ into a structured target without setting any specific parameters. However, the structured target transformed by this parameter-free model is redundant, since many pixel points far from a cell centre can also be assigned as being among its supportive points, which we believe is unnecessary. Note that $\varphi$ is the parameter for adjusting the selection of the necessary pixel points in $\mathcal{F}$; we accordingly parameterize and rewrite the supportive region for each cell-centre point.
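The first step of the CD decoder, partitioning the lattice into supportive regions by nearest-centre assignment, can be sketched directly. The grid size and centre coordinates below are made up for illustration:

```python
# Hypothetical sketch of the CD decoder's partition step: assign every pixel
# of the image lattice F to its nearest labelled cell-centre point, so each
# centre receives a supportive region.

def supportive_regions(height, width, centres):
    """Return a map {pixel: index of the nearest centre} over the lattice."""
    regions = {}
    for r in range(height):
        for c in range(width):
            # squared Euclidean distance to every labelled centre
            d2 = [(r - cr) ** 2 + (c - cc) ** 2 for cr, cc in centres]
            regions[(r, c)] = min(range(len(centres)), key=lambda k: d2[k])
    return regions

centres = [(2, 2), (7, 7), (2, 7)]          # three labelled cell centres
regions = supportive_regions(10, 10, centres)
```

A selection parameter like the paper's φ could then drop pixels whose nearest-centre distance exceeds a threshold, removing the redundant far-away supportive points; that filtering step is omitted here for brevity.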

SL.
2) Subsequently, we naturally present the framework of MSL for generally solving typical MSL tasks based on the revealed definition of MSL. In fact, the framework of MSL builds the bridge between the definition of MSL and the generality of MSL; it is an inevitable product of revealing the definition and a necessary product of revealing the generality of MSL. 3) In addition, viewing different MSL solutions from the concrete to the relatively abstract, we reveal the existence of the generality of MSL by showing that MSL solutions with large differences in detailed implementations can be unified into a similar methodological formation, with natural reference to the presented framework of MSL. Specifically, viewing different MSL solutions from the concrete to the relatively abstract is another type of mathematicians' vision, which lets us notice that solutions for a wide variety of MSL tasks can probably share something in common even though they have large differences in detailed implementations. As discussed in the introduction section, we can gain insight into the nature of a problem to be solved only when we look at the problem from the mathematicians' vision, which is at least both from the abstract to the concrete and from the concrete to the abstract. Specifically, without viewing the SL problem from the abstract to the relatively concrete, which is one type of mathematicians' vision, the existence of the definition of MSL cannot be revealed, not to mention presenting the framework of MSL and revealing the generality of MSL. Similarly, without viewing different MSL solutions from the concrete to the relatively abstract, which is another type of mathematicians' vision, the existence of the generality of MSL cannot be revealed. As a result, the intrinsic relation between the conceptualization of MSL presented in Section 4 and the mathematicians' vision is that the conceptualization of MSL is the product of viewing a problem to be solved from the mathematicians' vision, which can be summarized as in Fig. 14.

Discussion
In the current literature, by referring to the properties of the labels prepared for the training dataset, learning with supervision is categorized as supervised learning (SL), which concerns the situation where the training dataset is assigned ideal (complete, exact and accurate) labels, and weakly supervised learning (WSL), which concerns the situation where the training dataset is assigned non-ideal (incomplete, inexact or inaccurate) labels. In this article, noticing that the given labels are not always easy to learn and that the transformation from the given labels to easy-to-learn targets can significantly affect the performance of the final SL solutions, and taking into consideration the properties of this transformation, we categorize SL into three narrower sub-types: precisely supervised learning (PSL), which concerns the situation where the given labels are precisely fine; moderately supervised learning (MSL), which concerns the situation where the given labels are ideal, but due to the simplicity in annotation of the given labels, careful designs are required to transform the given labels into easy-to-learn targets for the learning task; and precisely and moderately combined supervised learning (PMCSL), which concerns the situation where the given labels contain both precise and moderate annotations.
Due to the fact that the MSL sub-type plays the central role in the field of SL, we comprehensively conceptualize MSL from the perspectives of its definition, framework and generality. Primarily, viewing the SL problem with the mathematicians' vision from the abstract to the relatively concrete, we reveal the existence of the definition of MSL by taking into consideration the transformation from the given simple labels to easy-to-learn targets. Subsequently, we naturally present the framework of MSL for generally solving typical tasks based on the revealed definition of MSL. In addition, viewing different MSL solutions with the mathematicians' vision from the concrete to the relatively abstract, we reveal the existence of the generality of MSL by showing that MSL solutions with large differences in detailed implementations can be unified into a similar methodological formation, with natural reference to the presented framework of MSL. The intrinsic relation between the conceptualization of MSL presented in this article and the mathematicians' vision is that the conceptualization of MSL is the product of viewing a problem to be solved from the mathematicians' vision.
As far as we know, this article is probably the first to formally and completely discuss the MSL situation in supervised learning in the era of deep learning. We hope this article will help readers realize that it is equally or even more critical to systematically analyse a problem to be solved, in addition to chasing state-of-the-art results. One significance of this article is that, by conceptualizing MSL from the perspectives of its definition, framework and generality, it provides a complete fundamental basis to systematically analyse the situation where the given labels are ideal but, due to the simplicity in annotation of the given labels, careful designs are required to transform the given labels into easy-to-learn targets. Meanwhile, the other significance of this article is that, by revealing the intrinsic relation between the conceptualization of MSL and the mathematicians' vision to illustrate the underlying methodology of proposing the new concept of MSL, it establishes a tutorial on viewing a problem to be solved from the mathematicians' vision, which can help AI application practitioners discover, evaluate and select appropriate solutions for the problem to be solved.
To end this article, we provide some possible future research directions for MSL. According to the framework of MSL, the key points of constructing fundamental MSL solutions can be summarized as modelling the three basic components, namely the decoder, inferrer and encoder, and the key problems of developing better MSL solutions can be summarized as evolving the learning and looping procedures to optimize the three basic components. While abundant modelling approaches (Badrinarayanan et al., 2017; Chen et al., 2018; Falk et al., 2019; Shelhamer et al., 2017) and optimization methods (Bottou, 2010; Duchi et al., 2011; Kingma & Ba, 2015) have been proposed for the inferrer, the modelling and optimization of the decoder and the encoder lack systematic and comprehensive studies, except for some sporadic solutions for specific MSL tasks (Lin et al., 2017, 2020; Xie et al., 2018; Xue et al., 2019). On the one hand, although successful decoders (Lin et al., 2017, 2020; Xie et al., 2018; Xue et al., 2019) have been proposed for different MSL tasks, the general methodology for modelling an appropriate decoder for an MSL task is still unclear. As the decoder determines how an MSL task is defined and is the prerequisite for the optimization of both the inferrer and encoder, it is valuable to investigate how to effectively model a decoder for an MSL task with prior knowledge. On the other hand, because it is coupled with the decoder, small changes in the encoder can also significantly affect the final performance (Bodla et al., 2017; Hosang et al., 2017; Xie et al., 2018; Xue et al., 2019). Thus, it would also be interesting to investigate how to find an appropriate encoder for an MSL task. Respectively serving as the pre-processing and post-processing for the inferrer, the decoder and the encoder are both critical to building appropriate solutions for MSL tasks, especially as the state-of-the-art inferrer (deep neural networks, from complex (He et al., 2016; Hu et al., 2018; Huang et al., 2017; Krizhevsky et al., 2012; Simonyan & Zisserman, 2015; Szegedy et al., 2017) to lightweight (Howard et al., 2019; Sun et al., 2019; X. Zhang et al., 2018)) has been becoming standardized and reaching its limits in some specific AI applications.

Fig. 1. Concise illustration for PSL. Top row: illustration for the typical image classification task of PSL; bottom row: illustration for the typical image semantic segmentation task of PSL.

Fig. 2. Concise illustration for MSL. Top row: illustration for the typical cell detection task of MSL; bottom row: illustration for the typical line segment detection task of MSL.

Fig. 4. Relations of PSL, MSL and PMCSL. Black rectangles denote the events/entities in the training dataset;

Fig. 6. Revelation of the definition of MSL from the abstract to the relatively concrete.
Fig. 10 illustrates how the decoder transforms the ground-truth label into an easy-to-learn target for the CD task.

Fig. 10. Illustration of the transforming process of the decoder for the CD task. (a) The given ground-truth label is a 10 × 10 image lattice in which the centres of three cells are labelled. (b) Supportive regions generated during the transforming process of the decoder. (c) Easy-to-learn target generated by transforming the supportive regions into a structured representation. The top rows and bottom rows of (a) and (c) are two transformations of the decoder obtained by adjusting its parameters.
Inferrer Modelling A deep convolutional neural network (DCNN) is employed to model the inferrer for mapping an input image lattice $X$ to an indirect target $\hat{Z}$. Define $\{f_l\}_{l=1}^{L}$ as the transformations of the $L$ layers of the DCNN architecture. The mapping function of the inferrer can be denoted by $f = f_L \circ f_{L-1} \circ \cdots \circ f_1$. Given one input image lattice $X$, the network computes the output $\hat{Z}$ as $\hat{Z} = f(X)$.

Parameterization We assume $\{f_l\}_{l=1}^{L}$ are parameterized by $\{\theta_l\}_{l=1}^{L}$. The corresponding $\theta_l$ has distinct forms for different types of $f_l$. The output computation of the network can be rewritten as $\hat{Z} = f(X; \Theta)$, $\Theta = \{\theta_1, \cdots, \theta_L\}$.
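The layer-wise composition $f = f_L \circ \cdots \circ f_1$ with per-layer parameters can be sketched directly. The affine and ReLU layers below are toy stand-ins for real convolutional layers, and the parameter values are made up:

```python
from functools import reduce

# Minimal sketch of the inferrer as a composition of L parameterized layer
# transformations; each layer's parameter theta_l has a form that depends
# on the layer type (an (w, b) pair for affine, nothing for ReLU).

def affine(theta):
    w, b = theta
    return lambda x: [w * xi + b for xi in x]

def relu(_theta):
    return lambda x: [max(0.0, xi) for xi in x]

thetas = [(2.0, 1.0), None, (0.5, 0.0)]          # per-layer parameters {theta_l}
layers = [affine(thetas[0]), relu(thetas[1]), affine(thetas[2])]

def inferrer(x, layers):
    # z = f_L(... f_1(x) ...): fold the input through every layer in order
    return reduce(lambda h, f: f(h), layers, x)

z = inferrer([-1.0, 0.0, 1.0], layers)           # -> [0.0, 0.5, 1.5]
```

Constructing each layer from its own parameter tuple mirrors the rewritten form $\hat{Z} = f(X; \Theta)$ with $\Theta = \{\theta_1, \cdots, \theta_L\}$: changing any $\theta_l$ changes the composed map without altering its structure.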