Normalized Entropy Aggregation for Inhomogeneous Large-Scale Data

da Conceição Costa, Maria; Macedo, Pedro

doi:10.1007/978-3-030-26036-1_2

Part of the book series: Contributions to Statistics (CONTRIB.STAT.)

Included in the following conference series: International Conference on Time Series and Forecasting

Abstract

The relationship between information theory, statistics and maximum entropy was established as early as the 1950s, following the works of Kullback, Leibler, Lindley and Jaynes. However, applications were restricted to very specific domains, and only recently did the convergence between information processing, data analysis and inference demand the foundation of a new scientific area, commonly referred to as Info-Metrics [1, 2]. As huge amounts of information and large-scale data have become available, the term “big data” has come to refer to the many kinds of challenges that their analysis presents: many observations, many variables (or both), limited computational resources, different time regimes or multiple sources. In this work, we consider one particular aspect of big data analysis, the presence of inhomogeneities, which compromises the use of the classical framework in regression modelling. A new approach is proposed, based on introducing the concepts of info-metrics into the analysis of inhomogeneous large-scale data. The framework of information-theoretic estimation methods is presented, along with some information measures. In particular, the normalized entropy is tested in aggregation procedures, and some simulation results are presented.
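
The abstract names the normalized entropy without defining it in this preview. The following is a minimal sketch, assuming one common definition: the Shannon entropy of a discrete probability vector divided by its maximum value, ln K, for K support points. The function name `normalized_entropy` is hypothetical, and the chapter's own definition and the way it enters the aggregation procedure may differ.

```python
import math


def normalized_entropy(p):
    """Shannon entropy of p divided by ln(K), so the result lies in [0, 1].

    Assumption: p is a discrete probability vector with K >= 2 entries.
    This is an illustrative definition, not taken from the chapter.
    """
    p = [float(x) for x in p]
    if len(p) < 2:
        raise ValueError("p needs at least two support points")
    if any(x < 0 for x in p) or not math.isclose(sum(p), 1.0, abs_tol=1e-9):
        raise ValueError("p must be non-negative and sum to 1")
    # Convention: terms with p_k = 0 contribute nothing (0 * ln 0 = 0).
    entropy = -sum(x * math.log(x) for x in p if x > 0)
    return entropy / math.log(len(p))


# A uniform vector carries no information; a degenerate one is fully informative.
print(normalized_entropy([0.25, 0.25, 0.25, 0.25]))
print(normalized_entropy([0.7, 0.2, 0.1]))
print(normalized_entropy([1.0, 0.0, 0.0]))
```

Under this definition, values close to 1 correspond to a nearly uniform distribution and values close to 0 to a distribution concentrated on a single point.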

Notes

1. Ratio of the largest singular value of \(\boldsymbol{X}\) to its smallest singular value.
2. The case of a single learning set, as in [9], with the need to take repeated bootstrap samples from it, is not considered here.
3. The concept is not used here in a literal sense. A discussion of similar notions of this concept is available in Belsley et al. [12, pp. 85–98].
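
The ratio described in Note 1 can be illustrated with a short sketch. It assumes NumPy is available; the function name `condition_number` and the example matrix are hypothetical and do not come from the chapter.

```python
import numpy as np


def condition_number(X):
    """Ratio of the largest to the smallest singular value of X."""
    # With compute_uv=False, svd returns only the singular values,
    # sorted in descending order.
    s = np.linalg.svd(np.asarray(X, dtype=float), compute_uv=False)
    if s[-1] == 0.0:
        return float("inf")  # exactly rank-deficient matrix
    return float(s[0] / s[-1])


# Hypothetical design matrix with two nearly collinear columns.
X = np.array([[1.0, 1.0],
              [1.0, 1.0001],
              [1.0, 0.9999]])
print(condition_number(X))
```

A large ratio signals near-collinearity among the columns of the matrix. `np.linalg.cond(X)` with its default 2-norm computes the same quantity and can serve as a cross-check.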

References

1. Golan, A.: On the state of art of Info-Metrics. In: Huynh, V.N., Kreinovich, V., Sriboonchitta, S., Suriya, K. (Eds.) Uncertainty Analysis in Econometrics with Applications, pp. 3–15. Springer, Berlin (2013)
2. Golan, A.: Foundations of Info-Metrics: Modeling, Inference, and Imperfect Information. Oxford University Press, New York (2018)
3. Golan, A.: On the foundations and philosophy of Info-Metrics. In: Cooper, S.B., Dawar, A., Lowe, B.L. (Eds.) CiE2012. LNCS, vol. 7318, pp. 238–245. Springer, Heidelberg (2012)
4. Jaynes, E.T.: Information theory and statistical mechanics. Phys. Rev. 106, 620–630 (1957)
5. Jaynes, E.T.: Information theory and statistical mechanics II. Phys. Rev. 108, 171–190 (1957)
6. Golan, A., Judge, G., Miller, D.: Maximum Entropy Econometrics: Robust Estimation with Limited Data. Wiley, Chichester (1996)
7. Mittelhammer, R., Cardell, N.S., Marsh, T.L.: The Data-constrained generalized maximum entropy estimator of the GLM: asymptotic theory and inference. Entropy 15, 1756–1775 (2013)
8. Bühlmann, P., Meinshausen, N.: Magging: maximin aggregation for inhomogeneous large-scale data. In: Proceedings of the IEEE 104 (1): Big Data: Theoretical Aspects, pp. 126–135. IEEE Press, New York (2016)
9. Breiman, L.: Bagging predictors. Mach. Learn. 24, 123–140 (1996)
10. Wolpert, D.: Stacked generalization. Neural Netw. 5, 241–259 (1992)
11. Breiman, L.: Stacked regressions. Mach. Learn. 24, 49–64 (1996b)
12. Belsley, D.A., Kuh, E., Welsch, R.E.: Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. Wiley, Hoboken, New Jersey (2004)

Acknowledgements

This research was supported by the Portuguese national funding agency for science, research and technology (FCT), within the Center for Research and Development in Mathematics and Applications (CIDMA), project UID/MAT/04106/2019.

Author information

Authors and Affiliations

Department of Mathematics and CIDMA – Center for Research and Development in Mathematics and Applications, University of Aveiro, 3810-193, Aveiro, Portugal
Maria da Conceição Costa & Pedro Macedo

Corresponding author

Correspondence to Maria da Conceição Costa.

Editor information

Editors and Affiliations

Faculty of Sciences, University of Granada, Granada, Spain
Olga Valenzuela
ETSIIT, CITIC-UGR, University of Granada, Granada, Spain
Fernando Rojas
ETSIIT, CITIC-UGR, University of Granada, Granada, Spain
Héctor Pomares
ETSIIT, CITIC-UGR, University of Granada, Granada, Spain
Ignacio Rojas

About this paper

Cite this paper

da Conceição Costa, M., Macedo, P. (2019). Normalized Entropy Aggregation for Inhomogeneous Large-Scale Data. In: Valenzuela, O., Rojas, F., Pomares, H., Rojas, I. (eds) Theory and Applications of Time Series Analysis. ITISE 2018. Contributions to Statistics. Springer, Cham. https://doi.org/10.1007/978-3-030-26036-1_2

DOI: https://doi.org/10.1007/978-3-030-26036-1_2
Published: 19 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26035-4
Online ISBN: 978-3-030-26036-1
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
