Supervised Learning

Chapter
Part of the Data-Centric Systems and Applications book series (DCSA)

Abstract

Supervised learning has been a great success in real-world applications and is used in almost every domain, including the text and Web domains. In machine learning, supervised learning is also called classification or inductive learning. This type of learning is analogous to human learning from past experiences: we gain new knowledge in order to improve our ability to perform real-world tasks. However, since computers do not have “experiences”, a machine learning system instead learns from data, which are collected in the past and represent the past experiences of some real-world application.
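
To make this paradigm concrete, below is a minimal sketch of supervised learning in Python (an illustration, not code from the chapter): a classifier is induced from labeled past data and then applied to unseen instances. The use of scikit-learn, the choice of a decision-tree learner, and the toy feature vectors and labels are all assumptions made for illustration.

    # Minimal supervised-learning sketch (illustrative; the data are invented).
    # Each training example is a feature vector with a known class label; the
    # induced model is then used to predict labels for unseen examples.
    from sklearn.tree import DecisionTreeClassifier

    # "Past experiences": feature vectors (e.g., [age, income]) with labels.
    X_train = [[25, 30000], [40, 90000], [35, 60000], [50, 120000]]
    y_train = ["no", "yes", "no", "yes"]

    # Induce a model (here a decision tree) from the labeled data.
    clf = DecisionTreeClassifier()
    clf.fit(X_train, y_train)

    # Apply the learned model to new, unlabeled instances.
    print(clf.predict([[30, 40000], [45, 100000]]))  # e.g., ['no' 'yes']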

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

Department of Computer Science, University of Illinois at Chicago, Chicago, USA
