Multi-Instance Learning Based Web Mining

Applied Intelligence

Abstract

In multi-instance learning, the training set consists of labeled bags, each composed of unlabeled instances, and the task is to predict the labels of unseen bags. In this paper, a web mining problem, namely web index recommendation, is investigated from a multi-instance perspective. Each web index page is regarded as a bag, and each page it links to is regarded as an instance. A user favoring an index page means that he or she is interested in at least one of the pages linked from that index. Based on the user's browsing history, recommendations can then be made for unseen index pages. To solve this problem, an algorithm named Fretcit-kNN is proposed; it employs the minimal Hausdorff distance between frequent term sets and uses both the references and the citers of an unseen bag to determine its label. Experiments show that, on average, Fretcit-kNN achieves 81.0% recommendation accuracy with 71.7% recall and 70.9% precision. This is significantly better than the best algorithm that does not take the specific characteristics of multi-instance learning into account, which achieves 76.3% accuracy with 63.4% recall and 66.1% precision.
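The bag-level decision rule summarized above can be illustrated with a small sketch in the style of Citation-kNN, which labels an unseen bag by voting over its R nearest training bags (its references) and the training bags that would count it among their C nearest neighbors (its citers). The distance between two bags is the minimal Hausdorff distance, i.e. the distance between the closest pair of instances drawn from the two bags. This is not the authors' Fretcit-kNN implementation: representing each linked page as a plain set of frequent terms, using the Jaccard distance between those sets as the instance-level distance, the defaults R = 2 and C = 4, the function names, and the tie-breaking rule are all assumptions made for illustration.

```python
from typing import FrozenSet, List, Sequence, Tuple

# Assumption: a linked page (instance) is represented as a set of frequent terms,
# and an index page (bag) is a non-empty sequence of such sets.
Instance = FrozenSet[str]
Bag = Sequence[Instance]


def jaccard_distance(a: Instance, b: Instance) -> float:
    """Instance-level distance (assumed here): 1 - |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def minimal_hausdorff(A: Bag, B: Bag) -> float:
    """Minimal Hausdorff distance: the distance between the closest pair of
    instances drawn from the two bags (both bags must be non-empty)."""
    return min(jaccard_distance(a, b) for a in A for b in B)


def predict(query: Bag, train: List[Tuple[Bag, bool]], R: int = 2, C: int = 4) -> bool:
    """Hypothetical helper: Citation-kNN-style vote over references and citers."""
    n = len(train)
    # Distance from the unseen bag to every training bag.
    d_query = [minimal_hausdorff(query, bag) for bag, _ in train]

    # References: the R training bags nearest to the unseen bag.
    refs = sorted(range(n), key=lambda i: d_query[i])[:R]

    # Citers: training bags that would rank the unseen bag among their C nearest
    # neighbors. The query's rank for bag i is the number of other training bags
    # strictly closer to bag i than the query is (ties favor the query).
    citers = []
    for i, (bag_i, _) in enumerate(train):
        rank = sum(
            1
            for j, (bag_j, _) in enumerate(train)
            if j != i and minimal_hausdorff(bag_i, bag_j) < d_query[i]
        )
        if rank < C:
            citers.append(i)

    # A bag that is both a reference and a citer contributes two votes.
    votes = [train[i][1] for i in refs] + [train[i][1] for i in citers]
    positives = sum(votes)
    negatives = len(votes) - positives
    # Assumption: ties are resolved as "not recommended".
    return positives > negatives


if __name__ == "__main__":
    # Toy browsing history: True means the user liked that index page.
    train = [
        ([frozenset({"ml", "svm"}), frozenset({"sports"})], True),
        ([frozenset({"ml", "kernel"})], True),
        ([frozenset({"cooking", "recipes"})], False),
        ([frozenset({"travel"}), frozenset({"hotel"})], False),
    ]
    liked = [frozenset({"ml", "svm", "kernel"}), frozenset({"weather"})]
    disliked = [frozenset({"cooking"}), frozenset({"hotel", "travel"})]
    print(predict(liked, train))     # True
    print(predict(disliked, train))  # False
```

Because every training bag is tested as a potential citer, this sketch computes a quadratic number of bag-to-bag distances per query; a practical version would precompute the distance matrix between training bags once and reuse it.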

Author information

Corresponding author

Correspondence to Zhi-Hua Zhou.

About this article

Cite this article

Zhou, ZH., Jiang, K. & Li, M. Multi-Instance Learning Based Web Mining. Appl Intell 22, 135–147 (2005). https://doi.org/10.1007/s10489-005-5602-z

  • DOI: https://doi.org/10.1007/s10489-005-5602-z
