Abstract
The components of most real-world patterns contain redundant information. However, most pattern classifiers (e.g., statistical classifiers and neural nets) work better if pattern components are nonredundant. I present various unsupervised nonlinear predictor-based "neural" learning algorithms that transform patterns and pattern sequences into less redundant patterns without loss of information. The first part of the paper shows how a neural predictor can be used to remove redundant information from input sequences. Experiments with artificial sequences demonstrate that certain supervised classification techniques can greatly benefit from this kind of unsupervised preprocessing. In the second part of the paper, a neural predictor is used to remove redundant information from natural text. On certain short newspaper articles, the neural method achieves better compression ratios than the widely used, asymptotically optimal Lempel-Ziv string compression algorithm. The third part of the paper shows how a system of co-evolving neural predictors and neural code-generating modules can build factorial (statistically nonredundant) codes of pattern ensembles. The method is successfully applied to images of letters presented randomly according to the letter probabilities of the English language.
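The first part's idea of using a predictor to remove redundant information from sequences can be illustrated with a minimal sketch of the principle of history compression (Schmidhuber, 1992b): transmit a symbol only when the predictor fails to predict it; a decompressor running an identical predictor can restore the rest. Here a simple successor-frequency counter stands in for the neural predictor, and the names `FrequencyPredictor`, `compress`, and `decompress` are illustrative, not from the paper:

```python
from collections import defaultdict

class FrequencyPredictor:
    """Toy stand-in for the neural predictor: predicts the most
    frequently seen successor of the previous symbol."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def predict(self, prev):
        succ = self.counts[prev]
        return max(succ, key=succ.get) if succ else None

    def update(self, prev, nxt):
        self.counts[prev][nxt] += 1

def compress(seq):
    """Keep only the (position, symbol) pairs the predictor got wrong;
    correctly predicted symbols are redundant and are dropped."""
    pred, kept, prev = FrequencyPredictor(), [], None
    for i, sym in enumerate(seq):
        if pred.predict(prev) != sym:
            kept.append((i, sym))
        pred.update(prev, sym)
        prev = sym
    return kept

def decompress(kept, length):
    """Replay the identical predictor to fill in the dropped symbols."""
    pred, out, prev = FrequencyPredictor(), [], None
    kept_dict = dict(kept)
    for i in range(length):
        sym = kept_dict.get(i, pred.predict(prev))
        pred.update(prev, sym)
        out.append(sym)
        prev = sym
    return ''.join(out)

s = "abababababab"
code = compress(s)           # only 3 unexpected symbols survive
assert decompress(code, len(s)) == s
```

Because both sides update the same deterministic predictor in the same order, reconstruction is lossless; the more predictable the sequence, the fewer symbols survive compression.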
References
Barlow, H.B., T.P. Kaushal, & G.J. Mitchison (1989). Finding minimum entropy codes. Neural Computation 1(3), 412–423.
Chaitin, G.J. (1969). On the length of programs for computing finite binary sequences: Statistical considerations. Journal of the Association for Computing Machinery 16, 145–159.
Chaitin, G.J. (1975). A theory of program size formally identical to information theory. Journal of the Association for Computing Machinery 22, 329–340.
Hartmanis, J. (1983). Generalized Kolmogorov complexity and the structure of feasible computations. Proceedings of the 24th IEEE Symposium on Foundations of Computer Science (pp. 439–445).
Held, G. (1991). Data compression. New York: Wiley.
Hochreiter, S., & J. Schmidhuber (1997a). Flat minima. Neural Computation 9(1), 1–42.
Hochreiter, S., & J. Schmidhuber (1997b). Long short-term memory. Neural Computation 9(8), 1735–1780.
Kolmogorov, A.N. (1965). Three approaches to the quantitative definition of information. Problems of Information Transmission 1(1), 1–11.
LeCun, Y. (1985). Une procédure d’apprentissage pour réseau à seuil asymétrique. Proceedings of Cognitiva 85 (pp. 599–604). Paris.
Levin, L.A. (1973). Universal sequential search problems. Problems of Information Transmission 9(3), 265–266.
Levin, L.A. (1974). Laws of information (nongrowth) and aspects of the foundation of probability theory. Problems of Information Transmission 10(3), 206–210.
Li, M., & P.M.B. Vitányi (1993). An introduction to Kolmogorov complexity and its applications. Berlin: Springer.
Lindstädt, S. (1993a). Comparison of two unsupervised neural network models for redundancy reduction. In M.C. Mozer, P. Smolensky, D. Touretzky, J.L. Elman, & A.S. Weigend (eds.), Proceedings of the 1993 Connectionist Models Summer School (pp. 308–315). Hillsdale, NJ: Lawrence Erlbaum Associates.
Lindstädt, S. (1993b). Comparison of unsupervised neural networks for redundancy reduction. Master's thesis, Department of Computer Science, University of Colorado at Boulder.
Mozer, M.C. (1989). A focused back-propagation algorithm for temporal sequence recognition. Complex Systems 3, 349–381.
Parker, D.B. (1985). Learning-logic. Technical Report TR-47, Center for Computer Research in Economics and Management Sciences. Cambridge, MA: MIT.
Pearlmutter, B.A. (1989). Learning state space trajectories in recurrent neural networks. Neural Computation 1(2), 263–269.
Robinson, A.J., & F. Fallside (1987). The utility driven dynamic error propagation network. Technical Report CUED/F-INFENG/TR. 1, Cambridge University Engineering Department.
Rumelhart, D.E., G.E. Hinton, & R.J. Williams (1986). Learning internal representations by error propagation. In D.E. Rumelhart & J.L. McClelland (eds.), Parallel distributed processing, Vol. 1 (pp. 318–362). Cambridge, MA: MIT Press.
Schmidhuber, J. (1992a). A fixed size storage O(n³) time complexity learning algorithm for fully recurrent continually running networks. Neural Computation 4(2), 243–248.
Schmidhuber, J. (1992b). Learning complex, extended sequences using the principle of history compression. Neural Computation 4(2), 234–242.
Schmidhuber, J. (1992c). Learning factorial codes by predictability minimization. Neural Computation 4(6), 863–879.
Schmidhuber, J. (1992d). Learning unambiguous reduced sequence descriptions. In J.E. Moody, S.J. Hanson, & R.P. Lippmann (eds.), Advances in neural information processing systems 4 (pp. 291–298). San Mateo, CA: Morgan Kaufmann.
Schmidhuber, J. (1993). Netzwerkarchitekturen, Zielfunktionen und Kettenregel [Network architectures, objective functions, and chain rule]. Habilitationsschrift, Institut für Informatik, Technische Universität München.
Schmidhuber, J. (1995). Discovering solutions with low Kolmogorov complexity and high generalization capability. In A. Prieditis & S. Russell (eds.), Machine learning: Proceedings of the Twelfth International Conference (pp. 488–496). San Francisco, CA: Morgan Kaufmann.
Schmidhuber, J. (1997). Discovering neural nets with low Kolmogorov complexity and high generalization capability. Neural Networks 10(5), 857–873.
Schmidhuber, J., M.C. Mozer, & D. Prelinger (1993a). Continuous history compression. In H. Hüning, S. Neuhauser, M. Raus, & W. Ritschel (eds.), Proceedings of the International Workshop on Neural Networks (pp. 87–95). RWTH Aachen. Aachen, Germany: Augustinus.
Schmidhuber, J., & D. Prelinger (1993b). Discovering predictable classifications. Neural Computation 5(4), 625–635.
Schmidhuber, J., & S. Heil (1995). Predictive coding with neural nets: Application to text compression. In G. Tesauro, D.S. Touretzky, & T.K. Leen (eds.), Advances in neural information processing systems 7 (pp. 1047–1054). Cambridge, MA: MIT Press.
Schmidhuber, J., & S. Heil (1996). Sequential neural text compression. IEEE Transactions on Neural Networks 7(1), 142–146.
Schmidhuber, J., J. Zhao, & N. Schraudolph (1997a). Reinforcement learning with self-modifying policies. In S. Thrun & L. Pratt (eds.), Learning to learn (pp. 293–309). Dordrecht, The Netherlands: Kluwer.
Schmidhuber, J., J. Zhao, & M. Wiering (1997b). Shifting inductive bias with success-story algorithm, adaptive Levin search, and incremental self-improvement. Machine Learning 28, 105–130.
Shannon, C.E. (1948). A mathematical theory of communication (parts I and II). Bell System Technical Journal 27, 379–423 and 623–656.
Solomonoff, R.J. (1964). A formal theory of inductive inference. Part I. Information and Control 7(1), 1–22.
Watanabe, O. (1992). Kolmogorov complexity and computational complexity. EATCS Monographs on Theoretical Computer Science. Berlin: Springer.
Werbos, P.J. (1974). Beyond regression: New tools for prediction and analysis in the behavioral sciences. PhD thesis, Harvard University.
Werbos, P.J. (1988). Generalization of backpropagation with application to a recurrent gas market model. Neural Networks 1.
Williams, R.J. (1989). Complexity of exact gradient computation algorithms for recurrent neural networks. Technical Report NU-CCS-89-27. Boston: Northeastern University, College of Computer Science.
Williams, R.J., & J. Peng (1990). An efficient gradient-based algorithm for online training of recurrent network trajectories. Neural Computation 2(4), 490–501.
Williams, R.J., & D. Zipser (1989). Experimental analysis of the real-time recurrent learning algorithm. Connection Science 1(1), 87–111.
Witten, I.H., R.M. Neal, & J.G. Cleary (1987). Arithmetic coding for data compression. Communications of the ACM 30(6), 520–540.
Wyner, A., & J. Ziv (1991). Fixed data base version of the Lempel-Ziv data compression algorithm. IEEE Transactions on Information Theory 37, 878–880.
Ziv, J., & A. Lempel (1977). A universal algorithm for sequential data compression. IEEE Transactions on Information Theory IT-23(5), 337–343.
Copyright information
© 2000 Springer Science+Business Media Dordrecht
Cite this chapter
Schmidhuber, J. (2000). Neural Predictors for Detecting and Removing Redundant Information. In: Cruse, H., Dean, J., Ritter, H. (eds) Prerational Intelligence: Adaptive Behavior and Intelligent Systems Without Symbols and Logic, Volume 1, Volume 2 Prerational Intelligence: Interdisciplinary Perspectives on the Behavior of Natural and Artificial Systems, Volume 3. Studies in Cognitive Systems, vol 26. Springer, Dordrecht. https://doi.org/10.1007/978-94-010-0870-9_73
Print ISBN: 978-94-010-3792-1
Online ISBN: 978-94-010-0870-9