# The teaching size: computable teachers and learners for universal languages

- 338 Downloads

**Part of the following topical collections:**

## Abstract

The theoretical hardness of machine teaching has usually been analyzed for a range of concept languages under several variants of the teaching dimension: the minimum number of examples that a teacher needs to figure out so that the learner identifies the concept. However, for languages where concepts have structure (and hence size), such as Turing-complete languages, a low teaching dimension can be achieved at the cost of using very large examples, which are hard to process by the learner. In this paper we introduce the *teaching size*, a more intuitive way of assessing the theoretical feasibility of teaching concepts for structured languages. In the most general case of universal languages, we show that focusing on the total size of a witness set rather than its cardinality, we can teach all total functions that are computable within some fixed time bound. We complement the theoretical results with a range of experimental results on a simple Turing-complete language, showing how teaching dimension and teaching size differ in practice. Quite remarkably, we found that witness sets are usually smaller than the programs they identify, which is an illuminating justification of why machine teaching from examples makes sense at all.

## Keywords

Machine teaching Teaching dimension Teaching size Compression Universal languages P” programming language Levin’s search## Notes

### Acknowledgements

We would like to thank the anonymous referees for their helpful comments. This work was supported by the EU (FEDER) and the Spanish MINECO under grant RTI2018-094403-B-C32, and the Generalitat Valenciana PROMETEO/2019/098. This work was done while the first author visited Universitat Politècnica de València and also while the third author visited University of Bergen (covered by Generalitat Valenciana BEST/2018/027 and University of Bergen). J. Hernández-Orallo is also funded by an FLI grant RFP2-152.

## References

- Angluin, D., & Kriķis, M. (2003). Learning from different teachers.
*Machine Learning*,*51*(2), 137–163.CrossRefzbMATHGoogle Scholar - Balbach, F. J. (2007). Models for algorithmic teaching. Ph.D. thesis, University of Lübeck.Google Scholar
- Balbach, F. J. (2008). Measuring teachability using variants of the teaching dimension.
*Theoretical Computer Science*,*397*(1–3), 94–113.MathSciNetCrossRefzbMATHGoogle Scholar - Balbach, F. J., & Zeugmann, T. (2009). Recent developments in algorithmic teaching. In
*Intl conf on language and automata theory and applications*(pp. 1–18). Springer.Google Scholar - Bengio, Y., Louradour, J., Collobert, R., & Weston, J. (2009). Curriculum learning. In
*Proceedings of the 26th annual international conference on machine learning*(pp. 41–48). ACM.Google Scholar - Biran, O., & Cotton, C. (2017). Explanation and justification in machine learning: A survey. In
*IJCAI-17 Workshop on explainable AI (XAI)*(p. 8).Google Scholar - Böhm, C. (1964). On a family of turing machines and the related programming language.
*ICC Bulletin*,*3*(3), 187–194.MathSciNetGoogle Scholar - Elias, P. (1975). Universal codeword sets and representations of the integers.
*IEEE Transactions on Information Theory*,*21*(2), 194–203.MathSciNetCrossRefzbMATHGoogle Scholar - Freivalds, R., Kinber, E. B., & Wiehagen, R. (1989). Inductive inference from good examples. In
*International workshop on analogical and inductive inference*(pp. 1–17). Springer.Google Scholar - Freivalds, R., Kinber, E. B., & Wiehagen, R. (1993). On the power of inductive inference from good examples.
*Theoretical Computer Science*,*110*(1), 131–144.MathSciNetCrossRefzbMATHGoogle Scholar - Gao, Z., Ries, C., Simon, H. U., & Zilles, S. (2016). Preference-based teaching. In
*Conf. on learning theory*(pp. 971–997).Google Scholar - Gold, E. M. (1967). Language identification in the limit.
*Information and Control*,*10*(5), 447–474.MathSciNetCrossRefzbMATHGoogle Scholar - Goldman, S. A., & Kearns, M. J. (1995). On the complexity of teaching.
*Journal of Computer and System Sciences*,*50*(1), 20–31.MathSciNetCrossRefzbMATHGoogle Scholar - Goldman, S. A., & Mathias, H. D. (1993). Teaching a smart learner. In
*Conf. on computational learning theory*(pp. 67–76).Google Scholar - Gulwani, S., Hernández-Orallo, J., Kitzelmann, E., Muggleton, S. H., Schmid, U., & Zorn, B. (2015). Inductive programming meets the real world.
*Communications of the ACM,**58*(11).Google Scholar - Hernandez-Orallo, J., & Telle, J. A. (2018). Finite biased teaching with infinite concept classes. arXiv preprint. arXiv:1804.07121.
- Jun, S. W. (2016). 50,000,000,000 instructions per second: Design and implementation of a 256-core brainfuck computer. Computer Science and AI Laboratory, MIT.Google Scholar
- Khan, F., Mutlu, B., & Zhu, X. (2011). How do humans teach: On curriculum learning and teaching dimension. In
*Advances in neural information processing systems*(pp. 1449–1457).Google Scholar - Lake, B., & Baroni, M. (2018). Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks. In
*ICML*(pp. 2879–2888).Google Scholar - Lake, B. M., Salakhutdinov, R., & Tenenbaum, J. B. (2015). Human-level concept learning through probabilistic program induction.
*Science*,*350*(6266), 1332–1338.MathSciNetCrossRefzbMATHGoogle Scholar - Lázaro-Gredilla, M., Lin, D., Guntupalli, J. S., & George, D. (2019). Beyond imitation: Zero-shot task transfer on robots by learning concepts as cognitive programs.
*Science Robotics**4*.Google Scholar - Levin, L. A. (1973). Universal Search Problems.
*Problems of Information Transmission*,*9*, 265–266.Google Scholar - Li, M., & Vitányi, P. (2008).
*An introduction to Kolmogorov complexity and its applications*(3rd ed.). New York, NY: Springer.CrossRefzbMATHGoogle Scholar - Lieberman, H. (2001).
*Your wish is my command: Programming by example*. San Francisco, CA: Morgan Kaufmann.Google Scholar - Shafto, P., Goodman, N. D., & Griffiths, T. L. (2014). A rational account of pedagogical reasoning: Teaching by, and learning from, examples.
*Cognitive Psychology*,*71*, 55–89.CrossRefGoogle Scholar - Shinohara, A., & Miyano, S. (1991). Teachability in computational learning.
*New Generation Computing*,*8*(4), 337–347.CrossRefzbMATHGoogle Scholar - Simard, P. Y., Amershi, S., Chickering, D. M., Pelton, A. E., Ghorashi, S., Meek, C., Ramos, G., Suh, J., Verwey, J., & Wang, M., et al. (2017). Machine teaching: A new paradigm for building machine learning systems. arXiv preprint arXiv:1707.06742.
- Solomonoff, R. J. (1964). A formal theory of inductive inference. Part I.
*Information and Control*,*7*(1), 1–22.MathSciNetCrossRefzbMATHGoogle Scholar - Valiant, L. G. (1984). A theory of the learnable.
*Communications of the ACM*,*27*(11), 1134–1142.CrossRefzbMATHGoogle Scholar - Vapnik, V. N., & Chervonenkis, A. Y. (1971). On the uniform convergence of relative frequencies of events to their probabilities.
*Theory of Probability and Its Applications*,*16*, 264–280.CrossRefzbMATHGoogle Scholar - Zhu, X. (2013). Machine teaching for Bayesian learners in the exponential family. In
*Neural information processing systems 26, Curran*(pp. 1905–1913).Google Scholar - Zhu, X. (2015). Machine teaching: An inverse problem to machine learning and an approach toward optimal education. In
*AAAI*(pp. 4083–4087).Google Scholar - Zhu, X., Singla, A., Zilles, S., & Rafferty, A. N. (2018). An overview of machine teaching. arXiv preprint arXiv:1801.05927.