Evolutionary multiple instance boosting framework for weakly supervised learning

Multiple instance boosting (MILBoost) is a framework that combines multiple instance learning (MIL) with boosting to solve problems involving weakly labeled, inexact data. This paper proposes an enhanced multiple instance boosting framework, evolutionary MILBoost (EMILBoost), which utilizes differential evolution (DE) to optimize the combination of weak classifier (weak estimator) weights in the framework. A standard MIL dataset, MUSK, and a binary classification dataset, Hastie_10_2, are used to evaluate the results. Results are presented in terms of bag and instance classification error, along with the confusion matrix of the test data.


Introduction
Multiple Instance Learning (MIL) is a type of weak supervision. It falls under the inexact supervision category of weak supervision, where the data are given with labels that are not as exact as desired. This type of data is prevalent in the medical field, where class labels are often not available at the desired granularity [1]. Hence, MIL is particularly well suited for medical data analysis [2].
MILBoost was first proposed by Viola et al. [3], mainly for object detection in images and videos. Since then, MILBoost and many of its variants have been used for various tasks: human action recognition [4], MIL with gradient boosting for face recognition from videos [5], human detection from artificially generated 3D human models [6], multi-class MILBoost for human parts detection [7], logistic MILBoost for pedestrian detection [8], gentle MILBoost for human detection, which uses the Newton update to obtain an optimal weak classifier [9], confidence-rated MILBoost [10], online MILBoost [11], object tracking by incorporating instance significance estimation into online MILBoost [12], and online MILBoost for visual object tracking [13][14][15]. In medical applications, MILBoost has been used for early temporal prediction of Type 2 diabetes risk [16], liver cirrhosis classification using ultrasound images [17], and histopathology cancer image classification, segmentation and clustering [18][19][20].

The main concept behind boosting is to sequentially train several weak classifiers (or weak estimators) and combine them into a strong classifier through a weighted sum, with each weak classifier assigned a weight. The main task is to find the combination of optimized weights that generates the strongest classifier. MILBoost uses the AnyBoost framework [21], in which the boosting classifier is trained by maximizing the log-likelihood of all bags. There is scope for improving the MILBoost framework by enhancing the weight optimization process through a population-based evolutionary technique instead of a single-point gradient descent technique; this also opens up the possibility of parallelizing the optimization. Evolutionary algorithms such as the Genetic Algorithm (GA) [22] and Differential Evolution (DE) [23] have been used in MIL to formulate pooling functions [24,25].
The main objective of this work is to formulate a MILBoost framework based on differential evolution (DE) that makes the optimization process parallelizable.
The rest of the paper is divided into six sections. Section 2 elaborates on MILBoost. Section 3 gives a brief description of DE. Section 4 presents the methodology. Section 5 discusses the experiments done, and the subsequent results are discussed in Sect. 6. Finally, Sect. 7 concludes the paper.

Multiple instance boosting (MILBoost)
This section presents the formal representation of MILBoost. Suppose we have binary classification data of bags X_i with labels Y_i ∈ {0, 1}, where Y_i = 1 indicates that the positive bag X_i contains at least one positive instance x_ij (j = 1, 2, …, m), and Y_i = 0 means that there are no positive instances in the bag X_i. The task is to identify a real-valued function h(x_ij) to infer the instance label y_ij corresponding to an instance x_ij. This function is estimated through a weak classifier. Then, through boosting, weak classifiers are combined to form a strong classifier with low error:

H(x_ij) = Σ_{k=1}^{K} α_k h_k(x_ij),    (1)

where K is the number of weak classifiers and the α_k are the classifier (or estimator) weights, which signify the relative importance of each weak classifier. In each boosting phase, incorrectly classified instances receive larger weights.
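As an illustration of this weighted combination, the following minimal sketch (illustrative names only, not the paper's code) combines the outputs of K weak classifiers into a strong classifier:

```python
import numpy as np

# Illustrative sketch: combine K weak classifiers h_k(x) in {-1, +1}
# into a strong classifier via weights alpha_k.
def strong_classifier(weak_preds, alphas):
    """weak_preds: (K, n) array of weak-classifier outputs; alphas: (K,)."""
    return np.sign(alphas @ weak_preds)  # H(x) = sign(sum_k alpha_k h_k(x))

# Toy example with K = 3 weak classifiers on n = 4 instances.
weak_preds = np.array([[1, -1,  1, -1],
                       [1,  1, -1, -1],
                       [1, -1, -1,  1]])
alphas = np.array([0.5, 0.3, 0.4])
H = strong_classifier(weak_preds, alphas)  # -> [1, -1, -1, -1]
```

Finding a good weight vector `alphas` is exactly the optimization task that the proposed framework hands over to DE.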
In MILBoost, the probability of an instance x_ij being positive is

p_ij = 1 / (1 + exp(−H(x_ij))).    (2)

The probability that a bag is positive follows the noisy-OR model:

p_i = 1 − ∏_j (1 − p_ij).    (3)

The log-likelihood of all bags is

L(H) = Σ_i [Y_i log p_i + (1 − Y_i) log(1 − p_i)].    (4)

The main task is to train the classifier by maximizing this log-likelihood function.
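The probability model can be sketched as follows; this follows the standard noisy-OR formulation of MILBoost, and all function names are illustrative:

```python
import numpy as np

def instance_prob(scores):
    """p_ij = 1 / (1 + exp(-H(x_ij))) for instance scores H(x_ij)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(scores, dtype=float)))

def bag_prob(instance_probs):
    """Noisy-OR: p_i = 1 - prod_j (1 - p_ij)."""
    return 1.0 - np.prod(1.0 - instance_probs)

def log_likelihood(bags_scores, bag_labels):
    """L(H) = sum_i [Y_i log p_i + (1 - Y_i) log(1 - p_i)]."""
    total = 0.0
    for scores, y in zip(bags_scores, bag_labels):
        p = bag_prob(instance_prob(scores))
        total += y * np.log(p) + (1 - y) * np.log(1.0 - p)
    return total

# A confidently positive bag and a confidently negative bag give a
# log-likelihood close to zero (its maximum).
ll = log_likelihood([[5.0], [-5.0]], [1, 0])
```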

Differential evolution (DE)
Differential evolution (DE) is a population-based evolutionary metaheuristic technique used for solving complex structured optimization problems in many application areas. DE was initially proposed by Storn and Price [23] in 1996. For a more profound understanding, article [26] may be referred to. In general, the DE formulation is divided into two phases: initialization and evolution. The initialization phase comprises random population generation, and the evolution phase consists of mutation, crossover and selection, which generate the new population for the next generation. The flowchart for DE is presented in Fig. 1.

Initialization
In this step, a set of uniformly distributed random individuals is generated; these represent the initial solution points in the search space:

x_{j,i,0} = lb_j + r · (ub_j − lb_j),

where G is the number of generations, NP is the number of individuals in the population, D is the dimension of an individual, lb and ub are the lower and upper bounds respectively, r ∈ [0, 1] is a random number, i ∈ {1, 2, …, NP} and j ∈ {1, 2, …, D}.
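A minimal sketch of this initialization step (the names NP, D, lb and ub follow the text; the use of NumPy is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)

# DE initialization: NP individuals of dimension D, sampled uniformly
# at random within the bounds [lb, ub].
def initialize_population(NP, D, lb, ub, rng):
    r = rng.random((NP, D))        # r in [0, 1) per component
    return lb + r * (ub - lb)      # x_ji = lb_j + r * (ub_j - lb_j)

pop = initialize_population(NP=20, D=5, lb=-1.0, ub=1.0, rng=rng)
```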

Mutation
After population generation, mutation is performed to expand the search space. In the mutation strategy, a corresponding mutant vector is generated for each target vector. DE has various mutation strategies; in this paper, the DE/rand/1 strategy is used to generate the mutant vector:

V_i = X_{r1} + F · (X_{r2} − X_{r3}),

where V_i is the mutant vector, F ∈ (0, 1.2] is the scaling factor, X are individuals in the population, and r1, r2, r3 ∈ {1, 2, …, NP} are mutually distinct random indices, each also different from i.
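The DE/rand/1 step can be sketched as follows (illustrative NumPy code; the index-sampling details are assumptions consistent with the strategy's definition):

```python
import numpy as np

rng = np.random.default_rng(1)

# DE/rand/1 mutation: V_i = X_r1 + F * (X_r2 - X_r3), with r1, r2, r3
# distinct indices, all different from the target index i.
def mutate_rand_1(pop, i, F, rng):
    candidates = [k for k in range(len(pop)) if k != i]
    r1, r2, r3 = rng.choice(candidates, size=3, replace=False)
    return pop[r1] + F * (pop[r2] - pop[r3])

pop = rng.random((10, 4))          # toy population: NP = 10, D = 4
v = mutate_rand_1(pop, i=0, F=0.8, rng=rng)
```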

Crossover
Crossover is performed between the target vector and the mutant vector to increase the diversity of the population and to assimilate the best individual. After crossover, trial vectors are generated. For a trial vector U_i = (u_1i, u_2i, …, u_Di),

u_ji = v_ji if r ≤ CR or j = j_r, and u_ji = x_ji otherwise,

where CR ∈ [0, 1] is the crossover probability, r ∈ [0, 1] is a random number and j_r ∈ {1, 2, …, D} is a randomly chosen index that guarantees the trial vector inherits at least one component from the mutant vector.
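A sketch of binomial crossover under these definitions (illustrative code):

```python
import numpy as np

rng = np.random.default_rng(2)

# Binomial crossover: take the mutant component when r <= CR or j == j_r,
# otherwise keep the target component; j_r forces at least one mutant gene.
def crossover(target, mutant, CR, rng):
    D = len(target)
    mask = rng.random(D) <= CR
    mask[rng.integers(D)] = True   # index j_r always comes from the mutant
    return np.where(mask, mutant, target)

target = np.zeros(5)
mutant = np.ones(5)
trial = crossover(target, mutant, CR=0.9, rng=rng)
```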

Selection
Tournament selection is performed between the trial vector and the target vector, and the one having the better fitness value moves on to the next generation:

X_i^{G+1} = U_i^G if f(U_i^G) ≤ f(X_i^G), and X_i^{G+1} = X_i^G otherwise,

where f(·) is the objective function.
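The selection step can be sketched as follows, assuming a minimization objective (illustrative code):

```python
import numpy as np

# One-to-one selection: the trial vector replaces the target vector only
# if it has a better (here: lower) objective value f(.).
def select(target, trial, f):
    return trial if f(trial) <= f(target) else target

f = lambda x: np.sum(x ** 2)       # example objective: sphere function
target = np.array([1.0, 1.0])
trial = np.array([0.5, 0.5])
survivor = select(target, trial, f)  # trial wins: f = 0.5 < 2.0
```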

Methodology
DE is used in the MILBoost framework to optimize the log-likelihood of all bags as defined in Eq. (4), so the objective function for DE in this work is the log-likelihood function. A population of the classifier weights α_k, as defined in Eq. (1), is randomly initialized. The algorithm for the proposed Evolutionary MILBoost (EMILBoost) is presented below, and it is pictorially represented through the flowchart in Fig. 2. As mentioned earlier, DE paves the way for parallelizing the optimization process. Unlike single-point gradient descent techniques, DE, being a metaheuristic, approaches the optimum from several directions: it generates multiple initial solutions in the search space and then converges towards the optimal point. Because these candidate solutions are evaluated independently, the optimization naturally benefits from parallelization.
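To make the idea concrete, the following is a hedged end-to-end sketch, not the paper's implementation: DE searches over the weight vector α applied to fixed weak-classifier score columns, maximizing the noisy-OR bag log-likelihood (equivalently, minimizing its negative). All names, toy data and parameter settings are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neg_log_likelihood(alpha, bag_scores, bag_labels):
    """Negative of Eq. (4); bag_scores: list of (m_i, K) score arrays."""
    total = 0.0
    for S, y in zip(bag_scores, bag_labels):
        p_inst = sigmoid(S @ alpha)                 # Eq. (2) per instance
        p_bag = 1.0 - np.prod(1.0 - p_inst)         # Eq. (3), noisy-OR
        p_bag = np.clip(p_bag, 1e-12, 1 - 1e-12)    # numerical safety
        total += y * np.log(p_bag) + (1 - y) * np.log(1 - p_bag)
    return -total                                   # DE minimizes

def de_optimize(f, D, NP=20, F=0.8, CR=0.9, gens=100, lb=-2.0, ub=2.0, rng=rng):
    pop = lb + rng.random((NP, D)) * (ub - lb)      # initialization
    fit = np.array([f(x) for x in pop])
    for _ in range(gens):
        for i in range(NP):
            idx = [k for k in range(NP) if k != i]
            r1, r2, r3 = rng.choice(idx, size=3, replace=False)
            v = pop[r1] + F * (pop[r2] - pop[r3])   # DE/rand/1 mutation
            mask = rng.random(D) <= CR              # binomial crossover
            mask[rng.integers(D)] = True
            u = np.where(mask, v, pop[i])
            fu = f(u)
            if fu <= fit[i]:                        # one-to-one selection
                pop[i], fit[i] = u, fu
    return pop[np.argmin(fit)]

# Toy data: K = 2 weak-classifier score columns, one positive, one negative bag.
bag_scores = [np.array([[ 2.0,  0.5], [ 1.0, -0.5]]),   # positive bag
              np.array([[-2.0, -1.0], [-1.5, -0.5]])]   # negative bag
bag_labels = [1, 0]
alpha = de_optimize(lambda a: neg_log_likelihood(a, bag_scores, bag_labels), D=2)
```

The evaluations of the NP trial vectors within a generation are independent, which is where the parallelization the text describes would apply.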

Data
For this work, two classic MIL benchmark datasets are used, MUSK1 and MUSK2 [27], which are available in the UCI Machine Learning Repository [1]. These correspond to the problem of predicting drug activity. A molecule has the desired drug effect if and only if one or more of its conformations bind to the target binding site. Since molecules can adopt multiple shapes (conformations), a bag is made up of the shapes belonging to the same molecule. MUSK1 and MUSK2 contain 476 and 6598 instances respectively. MUSK2 is used as the training data as it contains the greater number of instances; MUSK1 is used as the testing data. Both datasets have a total of 168 attributes, of which 166 are features. The data attribute information is given in Table 1.
Apart from the aforementioned datasets, Hastie_10_2, a standard binary classification dataset used to test boosting frameworks [28], is also used in this work; it is available in the scikit-learn dataset library [29]. The Hastie_10_2 dataset has 10 attributes X_1, X_2, …, X_10, which are standard independent Gaussian variates. The class is defined as Y = 1 if Σ_{j=1}^{10} X_j² > 9.34 (the median of a chi-squared distribution with 10 degrees of freedom) and Y = −1 otherwise.
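The generating process behind Hastie_10_2 can be sketched directly; this mirrors scikit-learn's make_hastie_10_2, and the NumPy reimplementation here is for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hastie_10_2 generating process: 10 independent standard Gaussians,
# with y = 1 if sum_j X_j^2 > 9.34 (the chi-squared(10) median), else -1.
def make_hastie(n_samples, rng):
    X = rng.standard_normal((n_samples, 10))
    y = np.where((X ** 2).sum(axis=1) > 9.34, 1, -1)
    return X, y

X, y = make_hastie(2000, rng)      # roughly balanced classes by construction
```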

Experimental setup
Decision tree classifiers with a maximum depth of 1 (decision stumps) are used as the weak classifiers. Log-sum-exp pooling is used for bag pooling. For implementation smoothness, the negative of the log-likelihood is minimized rather than maximizing the log-likelihood itself.
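Log-sum-exp pooling can be sketched as a smooth approximation of the maximum over instance scores; the sharpness parameter r below is an illustrative choice, not a value from the paper:

```python
import numpy as np

# Log-sum-exp (LSE) bag pooling: LSE_r(s) = (1/r) * log(mean_j exp(r * s_j)).
# Larger r approaches the hard max over instance scores.
def log_sum_exp_pool(scores, r=10.0):
    scores = np.asarray(scores, dtype=float)
    m = scores.max()               # shift by the max for numerical stability
    return m + np.log(np.mean(np.exp(r * (scores - m)))) / r
```

Unlike a hard max, this pooling is differentiable everywhere, which is what the "implementation smoothness" above refers to.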

Evaluation metrics
As this is a classification problem, the standard training error, testing error and confusion matrix of the testing data are used as the evaluation metrics. In MIL, the model is evaluated on the basis of bag classification accuracy; therefore, the bag training error and bag testing error are used here.
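These metrics can be sketched as follows (illustrative code; the confusion-matrix layout matches the description in the results section, with TP upper left and TN lower right):

```python
import numpy as np

# Bag-level classification error: fraction of bags misclassified.
def bag_error(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true != y_pred))

# 2x2 confusion matrix laid out as [[TP, FN], [FP, TN]].
def confusion_matrix(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return np.array([[tp, fn], [fp, tn]])
```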

Results and discussions
The results of the proposed EMILBoost are compared with two other boosting frameworks, GentleBoost and LogitBoost [30]. Tables 2 and 3 record the testing and training errors, and the confusion matrix of the test data for the MUSK dataset is shown in Fig. 7. From Tables 2 and 3, it is clear that EMILBoost achieves the lowest errors and hence outperforms GentleBoost and LogitBoost. Figures 3 and 4 also establish the superiority of EMILBoost.
From Figs. 5 and 6, it can be inferred that increasing the number of weak classifiers improves the learning process, i.e. corresponds to lower error.
The upper-left block of the confusion matrix signifies the True Positives (TP) and the lower-right the True Negatives (TN), while the lower-left signifies the False Positives (FP) and the upper-right the False Negatives (FN). The main aim of a classifier is to obtain more TP + TN and fewer FP + FN. From Fig. 7, it can be inferred that for the EMILBoost framework, TP + TN > FP + FN. Hence, the framework performs as desired.

Conclusion
The main aim of this paper was to enhance the MILBoost framework through DE, a population-based evolutionary metaheuristic, by optimizing the weak classifier weights. DE also paves the way to parallelizing this optimization process. The results show that the proposed EMILBoost outperforms GentleBoost and LogitBoost. Increasing the number of weak classifiers improves the learning process, but it also increases the learning time; a trade-off between the two is needed through optimizing the number of weak classifiers, which is a multi-objective problem. This can be regarded as a future extension of this work.
Funding Not applicable.
Availability of data and material Dua and Graff [1].
Code availability Not applicable.

Conflict of interest
The authors declare that there is no conflict of interest or competing interests regarding the publication of this article.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.