Web Data Mining, pp. 63-132

# Supervised Learning

Chapter

## Abstract

Supervised learning has been a great success in real-world applications. It is used in almost every domain, including text and Web domains. In machine learning, supervised learning is also called classification or inductive learning. This type of learning is analogous to the way humans learn from past experience to gain new knowledge and improve their ability to perform real-world tasks. However, since computers do not have “experiences”, machine learning learns from data, which are collected in the past and represent past experiences in some real-world application.

## Keywords

Support Vector Machine, Receiver Operating Characteristic Curve, Association Rule, Leaf Node, Association Rule Mining

## Copyright information

© Springer-Verlag Berlin Heidelberg 2011