Abstract
The abundance of information published on the Internet makes filtering hazardous Web pages a difficult yet important task. Supervised learning methods such as Support Vector Machines (SVMs) can be used to identify hazardous Web content. However, scalability is a major challenge, especially when multiple classifiers must be trained because policies differ on what kind of information counts as hazardous. We therefore propose a transfer learning approach called Hierarchical Training for Multiple SVMs (HTMSVM). HTMSVM identifies the data common to similar training sets and trains on this common data first to obtain initial solutions. These initial solutions then reduce the time needed to train on the individual training sets without affecting classification accuracy. In an experiment in which we trained five Web content filters on 80% common and 20% inconsistently labeled training examples, HTMSVM predicted hazardous Web pages with the same classification accuracy as LibSVM (more than 91%) while requiring only 26% to 41% of its training time.
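The abstract does not spell out the algorithm, so the following is only a minimal sketch of the idea under stated assumptions: a linear SVM without a bias term, trained by dual coordinate descent (a simpler stand-in for the SMO solver that LibSVM uses). It finds the examples that every training set labels identically, solves the SVM on them once, and reuses those dual variables as the starting point for each individual training set. The functions `train_dcd`, `common_examples`, and `hierarchical_train`, and the layout of one `{example_id: label}` dict per policy, are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_dcd(X: List[Vector], y: List[int], C: float = 1.0,
              alpha0: Optional[List[float]] = None, tol: float = 1e-3,
              max_epochs: int = 1000) -> Tuple[List[float], List[float], int]:
    """Dual coordinate descent for an L1-loss linear SVM without bias.

    Assumption: a stand-in solver, not LibSVM's SMO. Returns (alpha, w, epochs).
    alpha0, if given, warm-starts the solver; each entry must lie in [0, C].
    """
    n, d = len(X), len(X[0])
    alpha = list(alpha0) if alpha0 is not None else [0.0] * n
    # Keep w = sum_i alpha_i * y_i * x_i consistent with the starting alpha.
    w = [0.0] * d
    for i in range(n):
        for k in range(d):
            w[k] += alpha[i] * y[i] * X[i][k]
    qii = [dot(x, x) for x in X]
    for epoch in range(1, max_epochs + 1):
        max_violation = 0.0
        for i in range(n):
            if qii[i] == 0.0:
                continue
            # Gradient of the (minimization-form) dual objective w.r.t. alpha_i.
            g = y[i] * dot(w, X[i]) - 1.0
            # Projected gradient: zero when alpha_i is optimal within [0, C].
            if alpha[i] == 0.0:
                pg = min(g, 0.0)
            elif alpha[i] == C:
                pg = max(g, 0.0)
            else:
                pg = g
            max_violation = max(max_violation, abs(pg))
            if pg != 0.0:
                new_alpha = min(max(alpha[i] - g / qii[i], 0.0), C)
                step = (new_alpha - alpha[i]) * y[i]
                for k in range(d):
                    w[k] += step * X[i][k]
                alpha[i] = new_alpha
        if max_violation < tol:
            return alpha, w, epoch
    return alpha, w, max_epochs


def common_examples(label_sets: List[Dict[int, int]]) -> List[int]:
    """Ids of examples that every training set contains with the same label."""
    first, rest = label_sets[0], label_sets[1:]
    return sorted(i for i, lab in first.items()
                  if all(s.get(i) == lab for s in rest))


def hierarchical_train(X: List[Vector], label_sets: List[Dict[int, int]],
                       C: float = 1.0):
    """Train the shared examples once, then warm-start every per-policy SVM."""
    common = common_examples(label_sets)
    start: Dict[int, float] = {}
    if common:
        # Stage 1: one solve on the examples all policies agree on.
        alpha_c, _, _ = train_dcd([X[i] for i in common],
                                  [label_sets[0][i] for i in common], C)
        start = dict(zip(common, alpha_c))
    models = []
    for labels in label_sets:
        ids = sorted(labels)
        # Stage 2: shared examples reuse their stage-1 alphas; the rest start at 0.
        alpha0 = [start.get(i, 0.0) for i in ids]
        models.append(train_dcd([X[i] for i in ids],
                                [labels[i] for i in ids], C, alpha0))
    return models
```

Dropping the bias term is what makes the warm start trivial in this sketch: without the equality constraint $\sum_i \alpha_i y_i = 0$ that a biased SVM dual carries, any $\alpha$ with entries in $[0, C]$ is a feasible starting point. A solver with a bias term, such as LibSVM's, would have to adjust the reused values to satisfy that constraint.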
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Erdmann, M., Nguyen, D.D., Takeyoshi, T., Hattori, G., Matsumoto, K., Ono, C. (2012). Hierarchical Training of Multiple SVMs for Personalized Web Filtering. In: Anthony, P., Ishizuka, M., Lukose, D. (eds) PRICAI 2012: Trends in Artificial Intelligence. PRICAI 2012. Lecture Notes in Computer Science, vol 7458. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32695-0_5
DOI: https://doi.org/10.1007/978-3-642-32695-0_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32694-3
Online ISBN: 978-3-642-32695-0
eBook Packages: Computer Science, Computer Science (R0)