Abstract
The components of most real-world patterns contain redundant information. However, most pattern classifiers (e.g., statistical classifiers and neural nets) work better if pattern components are nonredundant. I present various unsupervised nonlinear predictor-based "neural" learning algorithms that transform patterns and pattern sequences into less redundant patterns without loss of information. The first part of the paper shows how a neural predictor can be used to remove redundant information from input sequences. Experiments with artificial sequences demonstrate that certain supervised classification techniques can greatly benefit from this kind of unsupervised preprocessing. In the second part of the paper, a neural predictor is used to remove redundant information from natural text. On certain short newspaper articles, the neural method achieves better compression ratios than the widely used, asymptotically optimal Lempel-Ziv string compression algorithm. The third part of the paper shows how a system of co-evolving neural predictors and neural code-generating modules can build factorial (statistically nonredundant) codes of pattern ensembles. The method is successfully applied to images of letters presented randomly according to the letter probabilities of the English language.
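The first part's idea of using a predictor to remove redundant information from sequences can be illustrated with a minimal sketch of the principle of history compression (Schmidhuber, 1992b): transmit a symbol only when the predictor fails to predict it; a decompressor running an identical predictor can restore the rest. Here a simple successor-frequency counter stands in for the neural predictor, and the names `FrequencyPredictor`, `compress`, and `decompress` are illustrative, not from the paper:

```python
from collections import defaultdict

class FrequencyPredictor:
    """Toy stand-in for the neural predictor: predicts the most
    frequently seen successor of the previous symbol."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def predict(self, prev):
        succ = self.counts[prev]
        return max(succ, key=succ.get) if succ else None

    def update(self, prev, nxt):
        self.counts[prev][nxt] += 1

def compress(seq):
    """Keep only the (position, symbol) pairs the predictor got wrong;
    correctly predicted symbols are redundant and are dropped."""
    pred, kept, prev = FrequencyPredictor(), [], None
    for i, sym in enumerate(seq):
        if pred.predict(prev) != sym:
            kept.append((i, sym))
        pred.update(prev, sym)
        prev = sym
    return kept

def decompress(kept, length):
    """Replay the identical predictor to fill in the dropped symbols."""
    pred, out, prev = FrequencyPredictor(), [], None
    kept_dict = dict(kept)
    for i in range(length):
        sym = kept_dict.get(i, pred.predict(prev))
        pred.update(prev, sym)
        out.append(sym)
        prev = sym
    return ''.join(out)

s = "abababababab"
code = compress(s)           # only 3 unexpected symbols survive
assert decompress(code, len(s)) == s
```

Because both sides update the same deterministic predictor in the same order, reconstruction is lossless; the more predictable the sequence, the fewer symbols survive compression.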
References
Barlow, H.B., T.P. Kaushal, & G.J. Mitchison (1989). Finding minimum entropy codes. Neural Computation 1(3), 412–423.
Chaitin, G.J. (1969). On the length of programs for computing finite binary sequences: Statistical considerations. Journal of the Association for Computing Machinery 16, 145–159.
Chaitin, G.J. (1975). A theory of program size formally identical to information theory. Journal of the Association for Computing Machinery 22, 329–340.
Hartmanis, J. (1983). Generalized Kolmogorov complexity and the structure of feasible computations. Proceedings of the 24th IEEE Symposium on Foundations of Computer Science (pp. 439–445).
Held, G. (1991). Data compression. New York: Wiley.
Hochreiter, S., & J. Schmidhuber (1997a). Flat minima. Neural Computation 9(1), 1–42.
Hochreiter, S., & J. Schmidhuber (1997b). Long short-term memory. Neural Computation 9(8), 1735–1780.
Kolmogorov, A.N. (1965). Three approaches to the quantitative definition of information. Problems of Information Transmission 1(1), 1–11.
LeCun, Y. (1985). Une procédure d’apprentissage pour réseau à seuil asymétrique. Proceedings of Cognitiva 85 (pp. 599–604). Paris.
Levin, L.A. (1973). Universal sequential search problems. Problems of Information Transmission 9(3), 265–266.
Levin, L.A. (1974). Laws of information (nongrowth) and aspects of the foundation of probability theory. Problems of Information Transmission 10(3), 206–210.
Li, M., & P.M.B. Vitányi (1993). An introduction to Kolmogorov complexity and its applications. Berlin: Springer.
Lindstädt, S. (1993a). Comparison of two unsupervised neural network models for redundancy reduction. In M.C. Mozer, P. Smolensky, D. Touretzky, J.L. Elman, & A.S. Weigend (eds.), Proceedings of the 1993 Connectionist Models Summer School (pp. 308–315). Hillsdale, NJ: Lawrence Erlbaum Associates.
Lindstädt, S. (1993b). Comparison of unsupervised neural networks for redundancy reduction. Master's thesis, Department of Computer Science, University of Colorado at Boulder.
Mozer, M.C. (1989). A focused back-propagation algorithm for temporal sequence recognition. Complex Systems 3, 349–381.
Parker, D.B. (1985). Learning-logic. Technical Report TR-47, Center for Computer Research in Economics and Management Sciences. Cambridge, MA: MIT.
Pearlmutter, B.A. (1989). Learning state space trajectories in recurrent neural networks. Neural Computation 1(2), 263–269.
Robinson, A.J., & F. Fallside (1987). The utility driven dynamic error propagation network. Technical Report CUED/F-INFENG/TR. 1, Cambridge University Engineering Department.
Rumelhart, D.E., G.E. Hinton, & R.J. Williams (1986). Learning internal representations by error propagation. In D.E. Rumelhart & J.L. McClelland (eds.), Parallel distributed processing, Vol. 1 (pp. 318–362). Cambridge, MA: MIT Press.
Schmidhuber, J. (1992a). A fixed size storage O(n³) time complexity learning algorithm for fully recurrent continually running networks. Neural Computation 4(2), 243–248.
Schmidhuber, J. (1992b). Learning complex, extended sequences using the principle of history compression. Neural Computation 4(2), 234–242.
Schmidhuber, J. (1992c). Learning factorial codes by predictability minimization. Neural Computation 4(6), 863–879.
Schmidhuber, J. (1992d). Learning unambiguous reduced sequence descriptions. In J.E. Moody, S.J. Hanson, & R.P. Lippmann (eds.), Advances in neural information processing systems 4 (pp. 291–298). San Mateo, CA: Morgan Kaufmann.
Schmidhuber, J. (1993). Netzwerkarchitekturen, Zielfunktionen und Kettenregel [Network architectures, objective functions, and chain rule]. Habilitationsschrift, Institut für Informatik, Technische Universität München.
Schmidhuber, J. (1995). Discovering solutions with low Kolmogorov complexity and high generalization capability. In A. Prieditis & S. Russell (eds.), Machine learning: Proceedings of the Twelfth International Conference (pp. 488–496). San Francisco, CA: Morgan Kaufmann.
Schmidhuber, J. (1997). Discovering neural nets with low Kolmogorov complexity and high generalization capability. Neural Networks 10(5), 857–873.
Schmidhuber, J., M.C. Mozer, & D. Prelinger (1993a). Continuous history compression. In H. Hüning, S. Neuhauser, M. Raus, & W. Ritschel (eds.), Proceedings of the International Workshop on Neural Networks (pp. 87–95). RWTH Aachen. Aachen, Germany: Augustinus.
Schmidhuber, J., & D. Prelinger (1993b). Discovering predictable classifications. Neural Computation 5(4), 625–635.
Schmidhuber, J., & S. Heil (1995). Predictive coding with neural nets: Application to text compression. In G. Tesauro, D.S. Touretzky, & T.K. Leen (eds.), Advances in neural information processing systems 7 (pp. 1047–1054). Cambridge, MA: MIT Press.
Schmidhuber, J., & S. Heil (1996). Sequential neural text compression. IEEE Transactions on Neural Networks 7(1), 142–146.
Schmidhuber, J., J. Zhao, & N. Schraudolph (1997a). Reinforcement learning with self-modifying policies. In S. Thrun & L. Pratt (eds.), Learning to learn (pp. 293–309). Dordrecht, The Netherlands: Kluwer.
Schmidhuber, J., J. Zhao, & M. Wiering (1997b). Shifting inductive bias with success-story algorithm, adaptive Levin search, and incremental self-improvement. Machine Learning 28, 105–130.
Shannon, C.E. (1948). A mathematical theory of communication (parts I and II). Bell System Technical Journal 27, 379–423 and 623–656.
Solomonoff, R.J. (1964). A formal theory of inductive inference. Part I. Information and Control 7(1), 1–22.
Watanabe, O. (1992). Kolmogorov complexity and computational complexity. EATCS Monographs on Theoretical Computer Science. Berlin: Springer.
Werbos, P.J. (1974). Beyond regression: New tools for prediction and analysis in the behavioral sciences. PhD thesis, Harvard University.
Werbos, P.J. (1988). Generalization of backpropagation with application to a recurrent gas market model. Neural Networks 1.
Williams, R.J. (1989). Complexity of exact gradient computation algorithms for recurrent neural networks. Technical Report NU-CCS-89-27. Boston: Northeastern University, College of Computer Science.
Williams, R.J., & J. Peng (1990). An efficient gradient-based algorithm for online training of recurrent network trajectories. Neural Computation 2(4), 490–501.
Williams, R.J., & D. Zipser (1989). Experimental analysis of the real-time recurrent learning algorithm. Connection Science 1(1), 87–111.
Witten, I.H., R.M. Neal, & J.G. Cleary (1987). Arithmetic coding for data compression. Communications of the ACM 30(6), 520–540.
Wyner, A., & J. Ziv (1991). Fixed data base version of the Lempel-Ziv data compression algorithm. IEEE Transactions on Information Theory 37, 878–880.
Ziv, J., & A. Lempel (1977). A universal algorithm for sequential data compression. IEEE Transactions on Information Theory IT-23(5), 337–343.
Copyright information
© 2000 Springer Science+Business Media Dordrecht
Cite this chapter
Schmidhuber, J. (2000). Neural Predictors for Detecting and Removing Redundant Information. In: Cruse, H., Dean, J., Ritter, H. (eds) Prerational Intelligence: Adaptive Behavior and Intelligent Systems Without Symbols and Logic, Volume 1, Volume 2 Prerational Intelligence: Interdisciplinary Perspectives on the Behavior of Natural and Artificial Systems, Volume 3. Studies in Cognitive Systems, vol 26. Springer, Dordrecht. https://doi.org/10.1007/978-94-010-0870-9_73
Print ISBN: 978-94-010-3792-1
Online ISBN: 978-94-010-0870-9