Annals of Mathematics and Artificial Intelligence, Volume 39, Issue 3, pp 259–290

Multiple-Instance Learning of Real-Valued Geometric Patterns

  • Sally A. Goldman
  • Stephen D. Scott


Recently there has been significant research in multiple-instance learning, yet most of this work has considered the model only with Boolean labels. However, in many of the application areas for which the multiple-instance model fits, real-valued labels are more appropriate than Boolean labels. We define and study a real-valued multiple-instance model in which each multiple-instance example (bag) is given a real-valued label in [0, 1] that indicates the degree to which the bag satisfies the target concept. To provide additional structure to the learning problem, we associate a real-valued label with each point in the bag. These values are then combined using a real-valued aggregation operator to obtain the label for the bag. We then present on-line agnostic algorithms for learning real-valued multiple-instance geometric concepts defined by axis-aligned boxes in constant-dimensional space and describe several possible applications of these algorithms. We obtain our learning algorithms by reducing the problem to one in which the exponentiated gradient or gradient descent algorithm can be used. We also give a novel application of the virtual weights technique. In typical applications of the virtual weights technique, all of the concepts in a group have the same weight and prediction, allowing a single “representative” concept from each group to be tracked. In our application, however, there is an exponential number of different weights and possible predictions. Hence, boxes in each group have different weights and predictions, making the computation of the contribution of a group significantly more involved. Nevertheless, we are able to both keep the number of groups polynomial in the number of trials and efficiently compute the overall prediction.
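As a rough illustration of the exponentiated gradient step that the reduction relies on, the sketch below shows the standard normalized multiplicative update for a linear predictor under square loss. It is not the paper's algorithm: the feature vector, the learning rate, and the function names `eg_predict` and `eg_update` are assumptions made for this example, and the sketch tracks every weight explicitly rather than using virtual weights.

```python
import math

def eg_predict(weights, x):
    """Linear prediction: dot product of the weight vector and the input."""
    return sum(w * xi for w, xi in zip(weights, x))

def eg_update(weights, x, y, eta):
    """One exponentiated gradient step for square loss (y_hat - y)^2.

    Each weight is multiplied by exp(-2 * eta * (y_hat - y) * x_i), and the
    result is renormalized so the weights stay on the probability simplex.
    """
    y_hat = eg_predict(weights, x)
    factors = [math.exp(-2.0 * eta * (y_hat - y) * xi) for xi in x]
    unnormalized = [w * r for w, r in zip(weights, factors)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Hypothetical data: four features in [0, 1] and a real-valued label in [0, 1].
x = [1.0, 0.0, 0.5, 0.2]
y = 0.9
weights = [0.25] * 4            # uniform starting weights
initial_error = abs(eg_predict(weights, x) - y)

# Repeating the same trial is only for demonstration; on-line learning
# would see a new example at each trial.
for _ in range(50):
    weights = eg_update(weights, x, y, eta=0.5)

final_error = abs(eg_predict(weights, x) - y)
```

In this toy run the weight mass shifts toward the features with the largest values, moving the prediction toward the label. The multiplicative form is what makes grouping concepts attractive when the number of weights is too large to store one by one.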

exponentiated gradient, multiplicative weight updates, virtual weights, geometric patterns, multiple-instance learning, scene classification, content-based image retrieval, landmark matching





Copyright information

© Kluwer Academic Publishers 2003

Authors and Affiliations

  • Sally A. Goldman (1)
  • Stephen D. Scott (2)
  1. Department of Computer Science & Engr., Washington University, St. Louis, USA
  2. Department of Computer Science & Engr., University of Nebraska, Lincoln, USA
