
Part of the book series: Studies in Cognitive Systems (COGS, volume 26)


Abstract

The components of most real-world patterns contain redundant information. However, most pattern classifiers (e.g., statistical classifiers and neural nets) work better if pattern components are nonredundant. I present various unsupervised nonlinear predictor-based “neural” learning algorithms that transform patterns and pattern sequences into less redundant patterns without loss of information. The first part of the paper shows how a neural predictor can be used to remove redundant information from input sequences. Experiments with artificial sequences demonstrate that certain supervised classification techniques can greatly benefit from this kind of unsupervised preprocessing. In the second part of the paper, a neural predictor is used to remove redundant information from natural text. With certain short newspaper articles, the neural method can achieve better compression ratios than the widely used, asymptotically optimal Lempel-Ziv string compression algorithm. The third part of the paper shows how a system of co-evolving neural predictors and neural code-generating modules can build factorial (statistically nonredundant) codes of pattern ensembles. The method is successfully applied to images of letters randomly presented according to the letter probabilities of the English language.
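
The text-compression result in the second part rests on a standard information-theoretic fact: if a predictor assigns probability p to the symbol that actually occurs next, an arithmetic coder (Witten, Neal, & Cleary 1987) can encode that symbol in roughly −log2 p bits, so the predictor's average cross-entropy estimates the achievable bits per character. The following is a minimal sketch of that idea only, not the chapter's method: it swaps the neural predictor for a toy softmax-regression model over a fixed context window and merely estimates the code length instead of running an actual coder; the context size, learning rate, and toy text are arbitrary illustrative choices.

```python
# A minimal sketch of the predictive-coding idea, NOT the chapter's
# implementation: the predictor is a plain softmax layer over a fixed
# 4-character context window, trained by gradient descent on a toy string.
import numpy as np

text = "the cat sat on the mat. the cat sat on the mat. " * 40
chars = sorted(set(text))
idx = {c: i for i, c in enumerate(chars)}
V, CTX, LR = len(chars), 4, 0.5

# One-hot encode each length-CTX context and the character that follows it.
X = np.zeros((len(text) - CTX, CTX * V))
y = np.zeros(len(text) - CTX, dtype=int)
for t in range(CTX, len(text)):
    for j, c in enumerate(text[t - CTX:t]):
        X[t - CTX, j * V + idx[c]] = 1.0
    y[t - CTX] = idx[text[t]]

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

W = np.zeros((CTX * V, V))             # softmax-regression predictor
for epoch in range(50):
    p = softmax(X @ W)
    grad = p.copy()
    grad[np.arange(len(y)), y] -= 1.0  # d(cross-entropy)/d(logits)
    W -= LR * (X.T @ grad) / len(y)

# Bits per character an arithmetic coder would spend if driven by these
# predictions: -log2 of the probability assigned to each actual next char.
p = softmax(X @ W)
bits = -np.log2(p[np.arange(len(y)), y]).mean()
print(f"predictive code: {bits:.2f} bits/char vs {np.log2(V):.2f} uniform")
```

On this highly repetitive toy string the predictor quickly becomes near-deterministic, so the estimated code length falls far below the log2 |alphabet| ≈ 3.46 bits of a uniform code; the chapter's point is that a good neural predictor buys the analogous saving on real newspaper text.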



References

  • Barlow, H.B., T.P. Kaushal, & G.J. Mitchison (1989). Finding minimum entropy codes. Neural Computation 1(3), 412–423.

  • Chaitin, G.J. (1969). On the length of programs for computing finite binary sequences: Statistical considerations. Journal of the Association for Computing Machinery 16, 145–159.

  • Chaitin, G.J. (1975). A theory of program size formally identical to information theory. Journal of the Association for Computing Machinery 22, 329–340.

  • Hartmanis, J. (1983). Generalized Kolmogorov complexity and the structure of feasible computations. Proceedings of the 24th IEEE Symposium on Foundations of Computer Science (pp. 439–445).

  • Held, G. (1991). Data compression. New York: Wiley.

  • Hochreiter, S., & J. Schmidhuber (1997a). Flat minima. Neural Computation 9(1), 1–42.

  • Hochreiter, S., & J. Schmidhuber (1997b). Long short-term memory. Neural Computation 9(8), 1735–1780.

  • Kolmogorov, A.N. (1965). Three approaches to the quantitative definition of information. Problems of Information Transmission 1(1), 1–11.

  • LeCun, Y. (1985). Une procédure d’apprentissage pour réseau à seuil asymétrique. Proceedings of Cognitiva 85 (pp. 599–604). Paris.

  • Levin, L.A. (1973). Universal sequential search problems. Problems of Information Transmission 9(3), 265–266.

  • Levin, L.A. (1974). Laws of information (nongrowth) and aspects of the foundation of probability theory. Problems of Information Transmission 10(3), 206–210.

  • Li, M., & P.M.B. Vitányi (1993). An introduction to Kolmogorov complexity and its applications. Berlin: Springer.

  • Lindstädt, S. (1993a). Comparison of two unsupervised neural network models for redundancy reduction. In M.C. Mozer, P. Smolensky, D. Touretzky, J.L. Elman, & A.S. Weigend (eds.), Proceedings of the 1993 Connectionist Models Summer School (pp. 308–315). Hillsdale, NJ: Lawrence Erlbaum Associates.

  • Lindstädt, S. (1993b). Comparison of unsupervised neural networks for redundancy reduction. Master’s thesis, Department of Computer Science, University of Colorado at Boulder.

  • Mozer, M.C. (1989). A focused back-propagation algorithm for temporal sequence recognition. Complex Systems 3, 349–381.

  • Parker, D.B. (1985). Learning-logic. Technical Report TR-47, Center for Computational Research in Economics and Management Science. Cambridge, MA: MIT.

  • Pearlmutter, B.A. (1989). Learning state space trajectories in recurrent neural networks. Neural Computation 1(2), 263–269.

  • Robinson, A.J., & F. Fallside (1987). The utility driven dynamic error propagation network. Technical Report CUED/F-INFENG/TR.1, Cambridge University Engineering Department.

  • Rumelhart, D.E., G.E. Hinton, & R.J. Williams (1986). Learning internal representations by error propagation. In D.E. Rumelhart & J.L. McClelland (eds.), Parallel distributed processing, Vol. 1 (pp. 318–362). Cambridge, MA: MIT Press.

  • Schmidhuber, J. (1992a). A fixed size storage O(n³) time complexity learning algorithm for fully recurrent continually running networks. Neural Computation 4(2), 243–248.

  • Schmidhuber, J. (1992b). Learning complex, extended sequences using the principle of history compression. Neural Computation 4(2), 234–242.

  • Schmidhuber, J. (1992c). Learning factorial codes by predictability minimization. Neural Computation 4(6), 863–879.

  • Schmidhuber, J. (1992d). Learning unambiguous reduced sequence descriptions. In J.E. Moody, S.J. Hanson, & R.P. Lippmann (eds.), Advances in neural information processing systems 4 (pp. 291–298). San Mateo, CA: Morgan Kaufmann.

  • Schmidhuber, J. (1993). Netzwerkarchitekturen, Zielfunktionen und Kettenregel. Habilitationsschrift, Institut für Informatik, Technische Universität München.

  • Schmidhuber, J. (1995). Discovering solutions with low Kolmogorov complexity and high generalization capability. In A. Prieditis & S. Russell (eds.), Machine learning: Proceedings of the Twelfth International Conference (pp. 488–496). San Francisco, CA: Morgan Kaufmann.

  • Schmidhuber, J. (1997). Discovering neural nets with low Kolmogorov complexity and high generalization capability. Neural Networks 10(5), 857–873.

  • Schmidhuber, J., M.C. Mozer, & D. Prelinger (1993a). Continuous history compression. In H. Hüning, S. Neuhauser, M. Raus, & W. Ritschel (eds.), Proceedings of the International Workshop on Neural Networks, RWTH Aachen (pp. 87–95). Aachen, Germany: Augustinus.

  • Schmidhuber, J., & D. Prelinger (1993b). Discovering predictable classifications. Neural Computation 5(4), 625–635.

  • Schmidhuber, J., & S. Heil (1995). Predictive coding with neural nets: Application to text compression. In G. Tesauro, D.S. Touretzky, & T.K. Leen (eds.), Advances in neural information processing systems 7 (pp. 1047–1054). Cambridge, MA: MIT Press.

  • Schmidhuber, J., & S. Heil (1996). Sequential neural text compression. IEEE Transactions on Neural Networks 7(1), 142–146.

  • Schmidhuber, J., J. Zhao, & N. Schraudolph (1997a). Reinforcement learning with self-modifying policies. In S. Thrun & L. Pratt (eds.), Learning to learn (pp. 293–309). Dordrecht, The Netherlands: Kluwer.

  • Schmidhuber, J., J. Zhao, & M. Wiering (1997b). Shifting inductive bias with success-story algorithm, adaptive Levin search, and incremental self-improvement. Machine Learning 28, 105–130.

  • Shannon, C.E. (1948). A mathematical theory of communication (parts I and II). Bell System Technical Journal 27, 379–423 and 623–656.

  • Solomonoff, R.J. (1964). A formal theory of inductive inference. Part I. Information and Control 7(1), 1–22.

  • Watanabe, O. (1992). Kolmogorov complexity and computational complexity. EATCS Monographs on Theoretical Computer Science. Berlin: Springer.

  • Werbos, P.J. (1974). Beyond regression: New tools for prediction and analysis in the behavioral sciences. PhD thesis, Harvard University.

  • Werbos, P.J. (1988). Generalization of backpropagation with application to a recurrent gas market model. Neural Networks 1(4), 339–356.

  • Williams, R.J. (1989). Complexity of exact gradient computation algorithms for recurrent neural networks. Technical Report NU-CCS-89-27. Boston: Northeastern University, College of Computer Science.

  • Williams, R.J., & J. Peng (1990). An efficient gradient-based algorithm for on-line training of recurrent network trajectories. Neural Computation 2(4), 490–501.

  • Williams, R.J., & D. Zipser (1989). Experimental analysis of the real-time recurrent learning algorithm. Connection Science 1(1), 87–111.

  • Witten, I.H., R.M. Neal, & J.G. Cleary (1987). Arithmetic coding for data compression. Communications of the ACM 30(6), 520–540.

  • Wyner, A., & J. Ziv (1991). Fixed data base version of the Lempel-Ziv data compression algorithm. IEEE Transactions on Information Theory 37, 878–880.

  • Ziv, J., & A. Lempel (1977). A universal algorithm for sequential data compression. IEEE Transactions on Information Theory IT-23(5), 337–343.


Copyright information

© 2000 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Schmidhuber, J. (2000). Neural Predictors for Detecting and Removing Redundant Information. In: Cruse, H., Dean, J., Ritter, H. (eds) Prerational Intelligence: Adaptive Behavior and Intelligent Systems Without Symbols and Logic, Volume 1, Volume 2 Prerational Intelligence: Interdisciplinary Perspectives on the Behavior of Natural and Artificial Systems, Volume 3. Studies in Cognitive Systems, vol 26. Springer, Dordrecht. https://doi.org/10.1007/978-94-010-0870-9_73


  • DOI: https://doi.org/10.1007/978-94-010-0870-9_73

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-010-3792-1

  • Online ISBN: 978-94-010-0870-9

  • eBook Packages: Springer Book Archive
