The teaching size: computable teachers and learners for universal languages
The theoretical hardness of machine teaching has usually been analyzed for a range of concept languages under several variants of the teaching dimension: the minimum number of examples that a teacher needs to figure out so that the learner identifies the concept. However, for languages where concepts have structure (and hence size), such as Turing-complete languages, a low teaching dimension can be achieved at the cost of using very large examples, which are hard for the learner to process. In this paper we introduce the teaching size, a more intuitive way of assessing the theoretical feasibility of teaching concepts for structured languages. In the most general case of universal languages, we show that, by focusing on the total size of a witness set rather than its cardinality, we can teach all total functions that are computable within some fixed time bound. We complement the theoretical results with a range of experimental results on a simple Turing-complete language, showing how teaching dimension and teaching size differ in practice. Quite remarkably, we found that witness sets are usually smaller than the programs they identify, which is an illuminating justification of why machine teaching from examples makes sense at all.
Keywords: Machine teaching, Teaching dimension, Teaching size, Compression, Universal languages, P″ programming language, Levin’s search
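The contrast the abstract draws between the cardinality of a witness set and its total size can be illustrated with a minimal Python sketch. It assumes that examples are input/output string pairs and that size is a raw character count; the names `cardinality` and `total_size`, the sample witness sets, and this size measure are illustrative assumptions, not the definitions used in the paper.

```python
from typing import List, Tuple

# Assumption: an example is an (input, output) pair of strings.
Example = Tuple[str, str]


def cardinality(witness: List[Example]) -> int:
    """Number of examples in the witness set (the quantity the teaching dimension minimizes)."""
    return len(witness)


def total_size(witness: List[Example]) -> int:
    """Total length of all inputs and outputs (a stand-in for the quantity the teaching size minimizes)."""
    return sum(len(inp) + len(out) for inp, out in witness)


# Hypothetical witness sets: one huge example versus a few tiny ones.
one_large = [("a" * 1000, "b" * 1000)]
few_small = [("a", "b"), ("aa", "bb"), ("", "")]

for name, witness in [("one_large", one_large), ("few_small", few_small)]:
    print(name, "cardinality:", cardinality(witness), "total size:", total_size(witness))
```

Under these assumptions the single-example set wins on cardinality but is far larger in total size, which is the situation the abstract describes for structured languages.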
We would like to thank the anonymous referees for their helpful comments. This work was supported by the EU (FEDER) and the Spanish MINECO under grant RTI2018-094403-B-C32, and the Generalitat Valenciana PROMETEO/2019/098. This work was done while the first author visited Universitat Politècnica de València and also while the third author visited University of Bergen (covered by Generalitat Valenciana BEST/2018/027 and University of Bergen). J. Hernández-Orallo is also funded by an FLI grant RFP2-152.