# Multiple-Instance Learning of Real-Valued Geometric Patterns


## Abstract

There has recently been significant research on multiple-instance learning, yet most of this work considers the model only with Boolean labels. However, in many of the application areas that the multiple-instance model fits, real-valued labels are more appropriate than Boolean ones. We define and study a real-valued multiple-instance model in which each multiple-instance example (bag) receives a real-valued label in [0, 1] indicating the degree to which the bag satisfies the target concept. To provide additional structure to the learning problem, we associate a real-valued label with each point in the bag; these values are then combined using a real-valued aggregation operator to obtain the label for the bag. We present on-line agnostic algorithms for learning real-valued multiple-instance geometric concepts defined by axis-aligned boxes in constant-dimensional space, and we describe several possible applications of these algorithms. We obtain our learning algorithms by reducing the problem to one in which the exponentiated gradient or gradient descent algorithm can be applied. We also give a novel application of the virtual weights technique. In typical applications of this technique, all of the concepts in a group share the same weight and prediction, allowing a single "representative" concept from each group to be tracked. In our application, however, there are exponentially many different weights and possible predictions: the boxes in each group have different weights and predictions, making the computation of a group's contribution significantly more involved. Nevertheless, we are able both to keep the number of groups polynomial in the number of trials and to compute the overall prediction efficiently.
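The labeling model described above — a real-valued label per point, combined by an aggregation operator into a bag label in [0, 1] — can be sketched as follows. This is an illustrative toy, not the paper's construction: the distance-based point labeling (`exp` of the distance to the box) and the choice of `max` as the aggregation operator are assumptions made for the example.

```python
import numpy as np

def point_label(point, low, high, decay=1.0):
    """Real-valued label in [0, 1] for one point: 1 inside the
    axis-aligned box [low, high], decaying with distance outside.
    (Illustrative choice; the paper's exact labeling may differ.)"""
    # Per-axis distance from the point to the box (0 if inside).
    d = np.maximum(np.maximum(low - point, point - high), 0.0)
    return float(np.exp(-decay * np.linalg.norm(d)))

def bag_label(bag, low, high, aggregate=max):
    """Combine per-point labels with a real-valued aggregation
    operator (here: max) to get the bag's label in [0, 1]."""
    return aggregate(point_label(p, low, high) for p in bag)

# A 2-D target box [0,1] x [0,1] and a bag of three points.
low, high = np.array([0.0, 0.0]), np.array([1.0, 1.0])
bag = [np.array([0.5, 0.5]),   # inside the box
       np.array([2.0, 0.5]),   # one unit outside
       np.array([5.0, 5.0])]   # far outside
print(bag_label(bag, low, high))  # 1.0: one point lies inside the box
```

With `max` as the aggregation operator and a Boolean point labeling, this degenerates to the standard multiple-instance model (a bag is positive iff some point is in the box), which is why real-valued aggregation is the natural generalization.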

