Abstract
This paper discusses the opportunities that big data offers decision makers, viewed from a statistical perspective. It calls for a multidisciplinary approach in which computer scientists, statisticians and domain experts work together to provide useful big data solutions. Big data requires us to think in new ways and to communicate effectively within such teams. We make a plea for linking data-driven and model-driven analytics, and stress the role of cause-effect models for knowledge enhancement in big data analytics. We recall Kant’s statement that theory without data is blind, but facts without theories are meaningless. A case is made for each discipline to define the contribution it offers to big data solutions, so that effective teams can be formed to improve inductions. Although new approaches are needed, much of the past learning related to small data is valuable in providing big data solutions. Here we have in mind the long-term academic training and field experience of statisticians concerning the reduction of dataset volumes, sampling in a more general setting, data depreciation and quality, model design and validation, visualisation, etc. We expect that combining these approaches will increase the chances of “real big solutions”.
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Sparks, R., Ickowicz, A., Lenz, H.J. (2016). An Insight on Big Data Analytics. In: Japkowicz, N., Stefanowski, J. (eds) Big Data Analysis: New Algorithms for a New Society. Studies in Big Data, vol 16. Springer, Cham. https://doi.org/10.1007/978-3-319-26989-4_2
DOI: https://doi.org/10.1007/978-3-319-26989-4_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26987-0
Online ISBN: 978-3-319-26989-4
eBook Packages: Engineering, Engineering (R0)