
Analysis of Big Data Using GLM

Chapter in: Reliability and Survival Analysis

Abstract

This chapter discusses the application of generalized linear models (GLMs) to big data using the divide and recombine (D&R) framework. The exponential family of distributions for binary, count, normal, and multinomial outcome variables, together with the corresponding sufficient statistics for the parameters, is shown to have great potential for analyzing big data when traditional statistical methods cannot be applied to the entire data set.
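As an illustration of why sufficient statistics suit the D&R framework, the following minimal sketch treats the simplest case: a normal outcome with identity link and a single covariate. Each chunk is reduced to five sums, the sums are added across chunks, and the pooled sums give the same estimates as a fit on the full data. The names `SuffStats`, `summarize`, `fit`, and `fit_divide_recombine` are hypothetical and chosen for this example; this is not presented as the chapter's own procedure, and the exact one-pass recombination shown here is specific to the normal case.

```python
from dataclasses import dataclass


@dataclass
class SuffStats:
    """Sufficient statistics for the normal model y = a + b*x + error."""
    n: int = 0
    sx: float = 0.0   # sum of x
    sy: float = 0.0   # sum of y
    sxx: float = 0.0  # sum of x*x
    sxy: float = 0.0  # sum of x*y

    def __add__(self, other):
        # Recombination is plain addition of the per-chunk sums.
        return SuffStats(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.sxy + other.sxy,
        )


def summarize(chunk):
    """Divide step: reduce one chunk of (x, y) pairs to its sums."""
    s = SuffStats()
    for x, y in chunk:
        s.n += 1
        s.sx += x
        s.sy += y
        s.sxx += x * x
        s.sxy += x * y
    return s


def fit(stats):
    """Intercept and slope (least squares, i.e. the normal MLE) from pooled sums."""
    denom = stats.n * stats.sxx - stats.sx ** 2
    slope = (stats.n * stats.sxy - stats.sx * stats.sy) / denom
    intercept = (stats.sy - slope * stats.sx) / stats.n
    return intercept, slope


def fit_divide_recombine(chunks):
    """Recombine step: add the chunk summaries, then fit once."""
    total = SuffStats()
    for chunk in chunks:
        total = total + summarize(chunk)
    return fit(total)


# Usage: two chunks of noise-free data generated from y = 2 + 3x.
data = [(x, 2 + 3 * x) for x in range(10)]
chunks = [data[:4], data[4:]]
intercept, slope = fit_divide_recombine(chunks)
```

Only the five sums per chunk need to be stored or transmitted, so no chunk has to be held in memory alongside another. For binary or count outcomes with continuous covariates, the likelihood equations generally do not reduce to a fixed set of sums in this way, so a recombination step there would need a different (typically iterative or approximate) scheme.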


Notes

  1. https://www.gartner.com/it-glossary/big-data


Author information

Corresponding author

Correspondence to Md. Rezaul Karim.


Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Karim, M.R., Islam, M.A. (2019). Analysis of Big Data Using GLM. In: Reliability and Survival Analysis. Springer, Singapore. https://doi.org/10.1007/978-981-13-9776-9_12
