Abstract
When Minsky and Chomsky were at Harvard in the 1950s, they began their careers by questioning a number of machine learning methods that have since regained popularity. Minsky's Perceptrons was a reaction to neural nets, and Chomsky's Syntactic Structures was a reaction to n-gram language models. Many of their objections are being ignored and forgotten (perhaps for good reasons, and perhaps not). While their arguments may sound negative, I believe there is a more constructive way to think about their efforts: they were both attempting to organize computational tasks into larger frameworks, such as what is now known as the Chomsky Hierarchy and algorithmic complexity. Section 5 will propose an organizing framework for deep nets. Deep nets are probably not the solution to all the world's problems. They do not do the impossible (solve the halting problem), and they are probably not well suited to tasks such as sorting large vectors and multiplying large matrices. In practice, deep nets have produced extremely exciting results in vision and speech, though other tasks may prove more challenging for them.
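The objection raised in Perceptrons can be made concrete with a short sketch (not from this paper): a single-layer perceptron can only learn linearly separable functions, so it masters AND but never XOR, the canonical counterexample Minsky and Papert analyzed. The training loop below assumes a hard-threshold unit and the classic perceptron update rule.

```python
# Sketch of Minsky and Papert's limitation: a single-layer perceptron
# learns AND (linearly separable) but cannot learn XOR (not separable).

def train_perceptron(samples, epochs=50, lr=0.1):
    """Train a single-layer perceptron with a hard threshold."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - pred               # perceptron update rule
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

def accuracy(samples, w, b):
    """Fraction of samples the trained unit classifies correctly."""
    return sum(
        (1 if w[0] * x1 + w[1] * x2 + b > 0 else 0) == t
        for (x1, x2), t in samples
    ) / len(samples)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w, b = train_perceptron(AND)
print("AND accuracy:", accuracy(AND, w, b))   # converges to 1.0
w, b = train_perceptron(XOR)
print("XOR accuracy:", accuracy(XOR, w, b))   # never reaches 1.0
```

Adding a hidden layer removes this limitation, which is part of why multi-layer nets (today's deep nets) escaped the Perceptrons critique.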
Notes
- 22. http://techtalks.tv/talks/closing-session/60532/ (at 6:07 min).
References
Church, K.: Emerging trends: artificial intelligence, China and my new job at Baidu. J. Nat. Lang. Eng. (to appear). Cambridge University Press, Cambridge
Minsky, M., Papert, S.: Perceptrons. MIT Press, Cambridge (1969)
Chomsky, N.: Syntactic Structures. Mouton & Co. (1957). https://archive.org/details/NoamChomskySyntcaticStructures
Shannon, C.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423, 623–656 (1948). http://math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf
Shannon, C.: Prediction and entropy of printed English. Bell Syst. Tech. J. 30(1), 50–64 (1951). https://www.princeton.edu/~wbialek/rome/refs/shannon51.pdf
Zipf, G.: Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley, Boston (1949)
Harris, Z.: Distributional structure. Word 10(2–3), 146–162 (1954)
Firth, J.: A synopsis of linguistic theory, 1930–1955. Stud. Linguist. Anal. Basil Blackwell (1957). http://annabellelukin.edublogs.org/files/2013/08/Firth-JR-1962-A-Synopsis-of-Linguistic-Theory-wfihi5.pdf
Church, K.: A pendulum swung too far. Linguist. Issues Lang. Technol. 6(6), 1–27 (2011)
Turing, A.: On computable numbers, with an application to the Entscheidungsproblem. In: Proceedings of the London Mathematical Society, vol. 2, no. 1, pp. 230–265. Wiley Online Library (1937). http://www.turingarchive.org/browse.php/b/12
Hillis, W.: The Connection Machine. MIT Press, Cambridge (1989)
Blelloch, G., Leiserson, C., Maggs, B., Plaxton, C., Smith, S., Zagha, M.: A comparison of sorting algorithms for the connection machine CM-2. In: Proceedings of the Third Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA, pp. 3–16 (1991). https://courses.cs.washington.edu/courses/cse548/06wi/files/benchmarks/radix.pdf
Church, K.: On memory limitations in natural language processing, unpublished Master’s thesis (1980). http://publications.csail.mit.edu/lcs/pubs/pdf/MIT-LCS-TR-245.pdf
Koskenniemi, K., Church, K.: Complexity, two-level morphology and Finnish. In: Coling (1988). https://aclanthology.info/pdf/C/C88/C88-1069.pdf
Graves, A., Wayne, G., Danihelka, I.: Neural Turing Machines. arXiv (2014). https://arxiv.org/abs/1410.5401
Sun, G., Giles, C., Chen, H., Lee, Y.: The Neural Network Pushdown Automaton: Model, Stack and Learning Simulations. arXiv (2017). https://arxiv.org/abs/1711.05738
Banko, M., Brill, E.: Scaling to very very large corpora for natural language disambiguation, pp. 26–33. ACL (2001). http://www.aclweb.org/anthology/P01-1005
Church, K., Mercer, R.: Introduction to the special issue on computational linguistics using large corpora. Comput. Linguist. 19(1), 1–24 (1993). http://www.aclweb.org/anthology/J93-1001
West, G.: Scale. Penguin Books, New York (2017)
Hestness, J., Narang, S., Ardalani, N., Diamos, G., Jun, H.: Deep Learning Scaling is Predictable, Empirically. arXiv (2017). https://arxiv.org/abs/1712.00409
Copyright information
© 2018 Springer Nature Switzerland AG
Cite this paper
Church, K.W. (2018). Minsky, Chomsky and Deep Nets. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2018. Lecture Notes in Computer Science, vol 11107. Springer, Cham. https://doi.org/10.1007/978-3-030-00794-2_1
Print ISBN: 978-3-030-00793-5
Online ISBN: 978-3-030-00794-2