
Normalized Entropy Aggregation for Inhomogeneous Large-Scale Data

  • Conference paper
In: Theory and Applications of Time Series Analysis (ITISE 2018)

Part of the book series: Contributions to Statistics

Abstract

The relationship between information theory, statistics and maximum entropy was established as early as the 1950s, following the works of Kullback, Leibler, Lindley and Jaynes. For decades, however, applications remained restricted to very specific domains, and only recently has the convergence of information processing, data analysis and inference given rise to a new scientific area, commonly referred to as Info-Metrics [1, 2]. As huge amounts of information and large-scale data have become available, the term “big data” has come to denote the many challenges posed by their analysis: many observations, many variables (or both), limited computational resources, different time regimes and multiple sources. In this work, we consider one particular aspect of big data analysis, namely the presence of inhomogeneities, which compromises the use of the classical framework in regression modelling. A new approach is proposed, based on introducing concepts from info-metrics into the analysis of inhomogeneous large-scale data. The framework of information-theoretic estimation methods is presented, along with some information measures. In particular, the normalized entropy is tested in aggregation procedures, and some simulation results are presented.
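The normalized entropy mentioned in the abstract is, in the info-metrics literature, the Shannon entropy of a probability vector scaled by its maximum \(\ln K\), so it lies in \([0, 1]\): 1 for the uniform distribution (maximum uncertainty) and 0 for a degenerate one. A minimal sketch of this standard measure (the function name and example vectors are illustrative, not from the paper):

```python
import numpy as np

def normalized_entropy(p):
    """Shannon entropy of probability vector p, scaled by ln(K) to lie in [0, 1]."""
    p = np.asarray(p, dtype=float)
    k = p.size
    nz = p[p > 0]  # by convention, 0 * ln(0) = 0
    return float(-np.sum(nz * np.log(nz)) / np.log(k))

# Uniform distribution: maximum uncertainty -> 1.0
print(normalized_entropy([0.25, 0.25, 0.25, 0.25]))
# Degenerate distribution: no uncertainty -> 0.0
print(normalized_entropy([1.0, 0.0, 0.0, 0.0]))
```

In generalized maximum entropy estimation, such a measure is computed from the recovered probability weights and can serve, as the abstract suggests, to weight or compare models in an aggregation procedure.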


Notes

  1. Ratio of the largest singular value of \(\varvec{X}\) to its smallest singular value.

  2. The case of a single learning set, as in [9], from which repeated bootstrap samples would need to be taken, is not considered here.

  3. The concept is not used here in a literal sense. A discussion of similar notions is available in Belsley et al. [12, pp. 85–98].
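The ratio defined in note 1 is the usual condition number of the design matrix, a standard diagnostic for collinearity (cf. Belsley et al. [12]). A minimal sketch, with an illustrative matrix not taken from the paper:

```python
import numpy as np

def condition_number(X):
    """Ratio of the largest to the smallest singular value of X (note 1)."""
    s = np.linalg.svd(np.asarray(X, dtype=float), compute_uv=False)
    return float(s.max() / s.min())

# A diagonal matrix makes the singular values explicit: here 1.0 and 0.5.
print(condition_number([[1.0, 0.0], [0.0, 0.5]]))  # ratio 2.0
```

NumPy's `np.linalg.cond(X)` computes the same quantity with its default 2-norm.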

References

  1. Golan, A.: On the state of the art of Info-Metrics. In: Huynh, V.N., Kreinovich, V., Sriboonchitta, S., Suriya, K. (eds.) Uncertainty Analysis in Econometrics with Applications, pp. 3–15. Springer, Berlin (2013)

  2. Golan, A.: Foundations of Info-Metrics: Modeling, Inference, and Imperfect Information. Oxford University Press, New York (2018)

  3. Golan, A.: On the foundations and philosophy of Info-Metrics. In: Cooper, S.B., Dawar, A., Löwe, B. (eds.) CiE 2012. LNCS, vol. 7318, pp. 238–245. Springer, Heidelberg (2012)

  4. Jaynes, E.T.: Information theory and statistical mechanics. Phys. Rev. 106, 620–630 (1957)

  5. Jaynes, E.T.: Information theory and statistical mechanics II. Phys. Rev. 108, 171–190 (1957)

  6. Golan, A., Judge, G., Miller, D.: Maximum Entropy Econometrics: Robust Estimation with Limited Data. Wiley, Chichester (1996)

  7. Mittelhammer, R., Cardell, N.S., Marsh, T.L.: The data-constrained generalized maximum entropy estimator of the GLM: asymptotic theory and inference. Entropy 15, 1756–1775 (2013)

  8. Bühlmann, P., Meinshausen, N.: Magging: maximin aggregation for inhomogeneous large-scale data. Proc. IEEE 104(1), 126–135 (2016)

  9. Breiman, L.: Bagging predictors. Mach. Learn. 24, 123–140 (1996)

  10. Wolpert, D.: Stacked generalization. Neural Netw. 5, 241–259 (1992)

  11. Breiman, L.: Stacked regressions. Mach. Learn. 24, 49–64 (1996)

  12. Belsley, D.A., Kuh, E., Welsch, R.E.: Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. Wiley, Hoboken (2004)


Acknowledgements

This research was supported by the Portuguese national funding agency for science, research and technology (FCT), within the Center for Research and Development in Mathematics and Applications (CIDMA), project UID/MAT/04106/2019.

Author information

Correspondence to Maria da Conceição Costa.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

da Conceição Costa, M., Macedo, P. (2019). Normalized Entropy Aggregation for Inhomogeneous Large-Scale Data. In: Valenzuela, O., Rojas, F., Pomares, H., Rojas, I. (eds) Theory and Applications of Time Series Analysis. ITISE 2018. Contributions to Statistics. Springer, Cham. https://doi.org/10.1007/978-3-030-26036-1_2
