Machine Learning

Volume 38, Issue 1–2, pp 213–236

A Multistrategy Approach to Classifier Learning from Time Series

  • William H. Hsu
  • Sylvian R. Ray
  • David C. Wilkins

Abstract

We present an approach to inductive concept learning using multiple models for time series. Our objective is to improve the efficiency and accuracy of concept learning by decomposing learning tasks that admit multiple types of learning architectures and mixture estimation methods. The decomposition method adapts attribute subset selection and constructive induction (cluster definition) to define new subproblems. To these problem definitions, we can apply metric-based model selection to select from a database of learning components, thereby producing a specification for supervised learning using a mixture model. We report positive learning results using temporal artificial neural networks (ANNs) on a synthetic, multiattribute learning problem and on a real-world time series monitoring application.
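To make the pipeline concrete, the sketch below shows a generic attribute-partitioning, mixture-of-experts workflow of the kind the abstract describes. It is a minimal, hypothetical example and not the authors' system: logistic-regression experts and fixed mixture weights stand in for the paper's temporal ANNs, metric-based model selection, and learned mixture estimation, and all function names (partition_attributes, train_experts, mixture_predict) are illustrative.

    # Illustrative sketch only: a generic attribute-partitioning / mixture-of-experts
    # pipeline in the spirit of the abstract, not the authors' implementation.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def partition_attributes(n_attributes, n_subsets):
        """Split attribute indices into disjoint subsets (a simple round-robin
        stand-in for the attribute subset selection / partitioning step)."""
        return [list(range(i, n_attributes, n_subsets)) for i in range(n_subsets)]

    def train_experts(X, y, subsets):
        """Fit one 'expert' classifier per attribute subset; logistic regression
        is a placeholder for the temporal ANNs used in the paper."""
        return [LogisticRegression(max_iter=1000).fit(X[:, s], y) for s in subsets]

    def mixture_predict(experts, subsets, X, weights):
        """Combine expert class-probability estimates with fixed mixture weights
        (a stand-in for a learned gating network in a mixture model)."""
        probs = sum(w * e.predict_proba(X[:, s])
                    for e, s, w in zip(experts, subsets, weights))
        return probs.argmax(axis=1)

    # Toy usage on synthetic data
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 8))
    y = (X[:, 0] + X[:, 4] > 0).astype(int)
    subsets = partition_attributes(X.shape[1], n_subsets=2)
    experts = train_experts(X, y, subsets)
    y_hat = mixture_predict(experts, subsets, X, weights=[0.5, 0.5])
    print("training accuracy:", (y_hat == y).mean())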

Keywords: multistrategy learning, time series, attribute partitioning, constructive induction, metric-based model selection, mixture estimation


Copyright information

© Kluwer Academic Publishers 2000

Authors and Affiliations

  • William H. Hsu (1)
  • Sylvian R. Ray (2)
  • David C. Wilkins (3)

  1. National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Champaign, USA
  2. Department of Computer Science and Beckman Institute, University of Illinois at Urbana-Champaign, Urbana, USA
  3. Beckman Institute, University of Illinois at Urbana-Champaign, Urbana, USA