Abstract
This chapter discusses the application of generalized linear models (GLMs) to big data using the divide and recombine (D&R) framework. It shows that the exponential family of distributions for binary, count, normal, and multinomial outcome variables, together with the corresponding sufficient statistics for the parameters, has great potential for analyzing big data when traditional statistical methods cannot be applied to the entire data set.
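As a rough illustration of the idea, the sketch below covers only the normal-outcome case with an identity link, where the sufficient statistics X'X and X'y can be computed on each subset and summed. It is a minimal sketch under stated assumptions, not the chapter's own algorithm: the function names `chunk_stats` and `recombine`, the simulated data, and the choice of eight subsets are all hypothetical.

```python
import numpy as np

def chunk_stats(X, y):
    """Sufficient statistics (X'X, X'y) for one subset of a normal linear model."""
    return X.T @ X, X.T @ y

def recombine(stats):
    """Sum the per-subset statistics and solve the normal equations."""
    xtx = sum(s[0] for s in stats)
    xty = sum(s[1] for s in stats)
    return np.linalg.solve(xtx, xty)

# Simulated data (assumption: illustrative only, not from the chapter).
rng = np.random.default_rng(0)
n, p = 10_000, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=n)

# Divide: split the rows into 8 subsets and summarize each one separately.
stats = [
    chunk_stats(Xk, yk)
    for Xk, yk in zip(np.array_split(X, 8), np.array_split(y, 8))
]

# Recombine: the summed statistics give the same estimate as the full data.
beta_dr = recombine(stats)
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
```

In this special case the recombined estimate agrees with the full-data least squares fit up to floating-point error, because the statistics are additive over subsets. Binary, count, and multinomial outcomes require iterative fitting and are not covered by this sketch.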
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Karim, M.R., Islam, M.A. (2019). Analysis of Big Data Using GLM. In: Reliability and Survival Analysis. Springer, Singapore. https://doi.org/10.1007/978-981-13-9776-9_12
DOI: https://doi.org/10.1007/978-981-13-9776-9_12
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9775-2
Online ISBN: 978-981-13-9776-9
eBook Packages: Mathematics and Statistics (R0)