Multi-Instance Learning Based Web Mining

Zhou, Zhi-Hua; Jiang, Kai; Li, Ming

doi:10.1007/s10489-005-5602-z

Published: March 2005

Applied Intelligence, Volume 22, pages 135–147 (2005)

473 Accesses
97 Citations

Abstract

In multi-instance learning, the training set consists of labeled bags, each composed of unlabeled instances, and the task is to predict the labels of unseen bags. In this paper, a web mining problem, namely web index recommendation, is investigated from a multi-instance perspective. Specifically, each web index page is regarded as a bag, and each page it links to is regarded as an instance. A user favoring an index page means that he or she is interested in at least one of the pages linked from that index. Based on the user's browsing history, recommendations can then be made for unseen index pages. To solve this problem, an algorithm named Fretcit-kNN is proposed; it employs the Minimal Hausdorff distance between frequent term sets and uses both the references and the citers of an unseen bag to determine its label. Experiments show that, on average, Fretcit-kNN achieves a recommendation accuracy of 81.0%, with 71.7% recall and 70.9% precision. This is significantly better than the best algorithm that does not consider the specific characteristics of multi-instance learning, which achieves 76.3% accuracy with 63.4% recall and 66.1% precision.
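The abstract combines two ideas: a bag-level distance (the Minimal Hausdorff distance, i.e. the smallest distance between any instance of one bag and any instance of the other) and a Citation-kNN-style vote that counts both the references of an unseen bag (its nearest training bags) and its citers (training bags that count the unseen bag among their own nearest neighbours). The following is only a minimal sketch of that combination under stated assumptions, not the authors' Fretcit-kNN implementation: it assumes each linked page has already been reduced to a set of frequent terms, and the Jaccard instance distance, the tie-breaking rule, and all function and parameter names (`instance_distance`, `minimal_hausdorff`, `citation_knn_predict`, `refs`, `citers`) are hypothetical choices made for illustration.

```python
from itertools import product

# A bag (index page) is a non-empty list of instances; each instance (linked page)
# is assumed to be represented by its set of frequent terms.
Bag = list[frozenset[str]]


def instance_distance(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard distance between two frequent-term sets (an assumed instance metric)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def minimal_hausdorff(A: Bag, B: Bag) -> float:
    """Minimal Hausdorff distance: smallest distance over all instance pairs."""
    return min(instance_distance(a, b) for a, b in product(A, B))


def citation_knn_predict(train_bags: list[Bag], train_labels: list[int],
                         query: Bag, refs: int = 2, citers: int = 4) -> int:
    """Predict 1 (recommend) or 0 by voting over references and citers."""
    dq = [minimal_hausdorff(query, b) for b in train_bags]
    # References: the `refs` training bags closest to the query.
    order = sorted(range(len(train_bags)), key=lambda i: dq[i])
    reference_ids = order[:refs]
    # Citers: training bags for which the query ranks among their `citers`
    # nearest neighbours (ties with the query are resolved in its favour).
    citer_ids = []
    for i, bag in enumerate(train_bags):
        others = [minimal_hausdorff(bag, b)
                  for j, b in enumerate(train_bags) if j != i]
        rank = 1 + sum(d < dq[i] for d in others)
        if rank <= citers:
            citer_ids.append(i)
    # A bag that is both a reference and a citer contributes two votes.
    votes = [train_labels[i] for i in reference_ids + citer_ids]
    positives = sum(votes)
    # Ties go to "not recommended" (0); this rule is an assumption.
    return 1 if positives > len(votes) - positives else 0


# Toy browsing history: index pages the user liked (1) or ignored (0).
train_bags = [
    [frozenset({"python", "ml"}), frozenset({"sports"})],
    [frozenset({"python", "data"})],
    [frozenset({"cooking", "recipes"})],
    [frozenset({"cooking", "travel"})],
]
train_labels = [1, 1, 0, 0]

print(citation_knn_predict(train_bags, train_labels,
                           [frozenset({"python", "ml", "data"})], refs=2, citers=2))  # 1
print(citation_knn_predict(train_bags, train_labels,
                           [frozenset({"cooking", "recipes", "travel"})], refs=2, citers=2))  # 0
```

Because the Minimal Hausdorff distance takes a minimum over instance pairs, a single closely matching linked page is enough to make two index pages "near", which mirrors the multi-instance assumption that a user likes an index page if at least one of its linked pages is of interest.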

Similar content being viewed by others

A comprehensive survey of data mining

Article 06 February 2020

A Hybrid Principal Label Space Transformation-Based Binary Relevance Support Vector Machine and Q-Learning Algorithm for Multi-label Classification

Article 20 April 2024

A survey of multi-label classification based on supervised and semi-supervised learning

Article 28 October 2022

Author information

Authors and Affiliations

National Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210093, China
Zhi-Hua Zhou, Kai Jiang & Ming Li

Corresponding author

Correspondence to Zhi-Hua Zhou.

About this article

Cite this article

Zhou, ZH., Jiang, K. & Li, M. Multi-Instance Learning Based Web Mining. Appl Intell 22, 135–147 (2005). https://doi.org/10.1007/s10489-005-5602-z

Issue Date: March 2005
DOI: https://doi.org/10.1007/s10489-005-5602-z
