Abstract
The relationship between information theory, statistics and maximum entropy was established as early as the 1950s, following the works of Kullback, Leibler, Lindley and Jaynes. However, applications were restricted to very specific domains, and only recently did the convergence of information processing, data analysis and inference call for the foundation of a new scientific area, commonly referred to as Info-Metrics [1, 2]. As huge amounts of information and large-scale data have become available, the term “big data” has come to refer to the many kinds of challenges presented by their analysis: many observations, many variables (or both), limited computational resources, different time regimes or multiple sources. In this work, we consider one particular aspect of big data analysis, namely the presence of inhomogeneities, which compromises the use of the classical framework in regression modelling. A new approach is proposed, based on introducing the concepts of info-metrics into the analysis of inhomogeneous large-scale data. The framework of information-theoretic estimation methods is presented, along with some information measures. In particular, the normalized entropy is tested in aggregation procedures and some simulation results are presented.
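As a rough illustration of the information measure named in the abstract, the sketch below computes a normalized entropy in its common textbook form: the Shannon entropy of a probability vector divided by its maximum value, ln K, so that the result lies in [0, 1]. The function name `normalized_entropy` and the example vectors are assumptions made for illustration; the abstract does not specify how the chapter applies the measure in its aggregation procedure, and that step is not reproduced here.

```python
import math


def normalized_entropy(p):
    """Shannon entropy of p divided by ln K, where K = len(p).

    Illustrative helper (hypothetical name): returns a value in [0, 1],
    equal to 1 for the uniform distribution and 0 for a degenerate one.
    """
    k = len(p)
    if k < 2:
        raise ValueError("need at least two support points")
    if any(pi < 0 for pi in p) or not math.isclose(sum(p), 1.0, abs_tol=1e-9):
        raise ValueError("p must be a probability vector")
    # Terms with p_i = 0 are skipped, following the convention 0 * ln 0 = 0.
    h = -sum(pi * math.log(pi) for pi in p if pi > 0)
    return h / math.log(k)


# Illustrative inputs only; these are not data from the chapter.
uniform = normalized_entropy([0.25, 0.25, 0.25, 0.25])
degenerate = normalized_entropy([1.0, 0.0, 0.0, 0.0])
skewed = normalized_entropy([0.7, 0.1, 0.1, 0.1])
```

Values near 1 indicate a distribution close to uniform (maximum uncertainty), while values near 0 indicate a distribution concentrated on a single support point.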
Notes
- 1. Ratio of the largest singular value of \(\boldsymbol{X}\) to its smallest singular value.
- 2. The case of a single learning set, as in [9], with the need to take repeated bootstrap samples from it, is not considered here.
- 3. The concept is not used here in a literal sense. A discussion about similar notions of this concept is available in Belsley et al. [12, pp. 85–98].
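The ratio described in note 1 is the usual 2-norm condition number of a matrix. A minimal sketch of that computation follows, assuming NumPy is available; the function name `condition_number` and the simulated matrices are illustrative assumptions, not material from the chapter.

```python
import numpy as np


def condition_number(X):
    """Ratio of the largest to the smallest singular value of X."""
    # compute_uv=False returns only the singular values, in descending order.
    s = np.linalg.svd(np.asarray(X, dtype=float), compute_uv=False)
    if s[-1] == 0.0:
        return np.inf  # rank-deficient matrix
    return s[0] / s[-1]


# Illustrative data only: two nearly collinear columns versus an identity matrix.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 1e-4 * rng.normal(size=200)
X_collinear = np.column_stack([x1, x2])
X_identity = np.eye(3)

cond_collinear = condition_number(X_collinear)
cond_identity = condition_number(X_identity)
```

For full-rank matrices this agrees with `np.linalg.cond(X)` under its default 2-norm; a large value signals near-collinearity among the columns.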
References
1. Golan, A.: On the state of art of Info-Metrics. In: Huynh, V.N., Kreinovich, V., Sriboonchitta, S., Suriya, K. (Eds.) Uncertainty Analysis in Econometrics with Applications, pp. 3–15. Springer, Berlin (2013)
2. Golan, A.: Foundations of Info-Metrics: Modeling, Inference, and Imperfect Information. Oxford University Press, New York (2018)
3. Golan, A.: On the foundations and philosophy of Info-Metrics. In: Cooper, S.B., Dawar, A., Lowe, B.L. (Eds.) CiE2012. LNCS, vol. 7318, pp. 238–245. Springer, Heidelberg (2012)
4. Jaynes, E.T.: Information theory and statistical mechanics. Phys. Rev. 106, 620–630 (1957)
5. Jaynes, E.T.: Information theory and statistical mechanics II. Phys. Rev. 108, 171–190 (1957)
6. Golan, A., Judge, G., Miller, D.: Maximum Entropy Econometrics: Robust Estimation with Limited Data. Wiley, Chichester (1996)
7. Mittelhammer, R., Cardell, N.S., Marsh, T.L.: The data-constrained generalized maximum entropy estimator of the GLM: asymptotic theory and inference. Entropy 15, 1756–1775 (2013)
8. Bühlmann, P., Meinshausen, N.: Magging: maximin aggregation for inhomogeneous large-scale data. In: Proceedings of the IEEE 104 (1): Big Data: Theoretical Aspects, pp. 126–135. IEEE Press, New York (2016)
9. Breiman, L.: Bagging predictors. Mach. Learn. 24, 123–140 (1996)
10. Wolpert, D.: Stacked generalization. Neural Netw. 5, 241–259 (1992)
11. Breiman, L.: Stacked regressions. Mach. Learn. 24, 49–64 (1996)
12. Belsley, D.A., Kuh, E., Welsch, R.E.: Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. Wiley, Hoboken, New Jersey (2004)
Acknowledgements
This research was supported by the Portuguese national funding agency for science, research and technology (FCT), within the Center for Research and Development in Mathematics and Applications (CIDMA), project UID/MAT/04106/2019.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
da Conceição Costa, M., Macedo, P. (2019). Normalized Entropy Aggregation for Inhomogeneous Large-Scale Data. In: Valenzuela, O., Rojas, F., Pomares, H., Rojas, I. (eds) Theory and Applications of Time Series Analysis. ITISE 2018. Contributions to Statistics. Springer, Cham. https://doi.org/10.1007/978-3-030-26036-1_2
DOI: https://doi.org/10.1007/978-3-030-26036-1_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26035-4
Online ISBN: 978-3-030-26036-1
eBook Packages: Mathematics and Statistics (R0)