Abstract
We introduce a learning framework that combines elements of the well-known PAC and mistake-bound models. The KWIK ("knows what it knows") framework is designed for learning settings in which active exploration can affect the training examples the learner is exposed to, as in reinforcement-learning and active-learning problems. We catalog several KWIK-learnable classes as well as open problems, and demonstrate their application in experience-efficient reinforcement learning.
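The defining feature of the framework is that a KWIK learner must either make an accurate prediction or explicitly admit uncertainty, receiving a label only in the latter case, with a bound on how often it can plead ignorance. The sketch below illustrates this protocol with a simple memorization learner on a finite domain; the class and function names are illustrative assumptions, not code from the paper.

```python
class MemorizationKWIK:
    """Toy KWIK learner: memorizes a deterministic target on a finite domain.

    It makes at most |domain| "I don't know" responses, since each one
    reveals the label for a previously unseen input.
    """

    def __init__(self):
        self.memory = {}  # input -> observed label

    def predict(self, x):
        # Return the known label, or None to signal "I don't know".
        return self.memory.get(x)

    def observe(self, x, y):
        # Label is revealed only after the learner admitted uncertainty on x.
        self.memory[x] = y


def run_protocol(learner, target, inputs):
    """Run the KWIK protocol and count "I don't know" responses."""
    unknowns = 0
    for x in inputs:
        y_hat = learner.predict(x)
        if y_hat is None:
            unknowns += 1
            learner.observe(x, target(x))   # label seen only on uncertainty
        else:
            assert y_hat == target(x)       # predictions must be accurate
    return unknowns
```

On a stream that revisits inputs, the learner says "I don't know" exactly once per distinct input and predicts correctly thereafter, matching the KWIK requirement that accuracy holds on every confident prediction.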
Additional information
Editor: Roni Khardon.
Part of the work was done while L. Li, T. Walsh, and A. Strehl were at Rutgers University.
Li, L., Littman, M.L., Walsh, T.J. et al. Knows what it knows: a framework for self-aware learning. Mach Learn 82, 399–443 (2011). https://doi.org/10.1007/s10994-010-5225-4