TEST

pp 1–4

Comments on: Data science, big data and statistics

  • Marco Riani
  • Anthony C. Atkinson
  • Andrea Cerioli
  • Aldo Corbellini
Discussion

We congratulate Pedro Galeano and Daniel Peña on a magisterial, thorough and stimulating survey of the rapidly expanding and highly important field of the analysis of big data. Their lucid exposition is complemented by the analysis of two examples, which show the results that can be obtained and illustrate the difficulties that can arise. The first example (their Fig. 4) provides an intriguing network plot.

In the analysis of any set of data, large or small, it is most helpful to be able to create plots that are informative about the structure of the data and any failings of models that are fitted. In their Section 2.5, the authors survey methods for visualising data in high dimensions, many of which are, like scatter plots, model-free. They use quantiles as one way of reducing dimensionality. In Section 3, “The Emergence of Data Science”, the authors focus more on such methods as neural networks and those collected under the umbrella of machine learning, where the focus is...

Mathematics Subject Classification

62-07 

References

  1. Atkinson AC, Riani M (2004) The forward search and data visualisation. Comput Stat 19:29–54
  2. Atkinson AC, Riani M, Cerioli A (2010) The forward search: theory and data analysis (with discussion). J Korean Stat Soc 39:117–134. https://doi.org/10.1016/j.jkss.2010.02.007
  3. Atkinson AC, Riani M, Cerioli A (2018) Cluster detection and clustering with random start forward searches. J Appl Stat 45:777–798. https://doi.org/10.1080/02664763.2017.1310806
  4. Cerioli A, Riani M, Atkinson AC, Corbellini A (2018) The power of monitoring: how to make the most of a contaminated multivariate sample (with discussion). Stat Methods Appl 27:559–587. https://doi.org/10.1007/s10260-017-0409-8
  5. Cerioli A, Farcomeni A, Riani M (2019) Wild adaptive trimming for robust estimation and cluster analysis. Scand J Stat 46:235–256. https://doi.org/10.1111/sjos.12349
  6. Cox DR (2015) Big data and precision. Biometrika 102:712–716
  7. Riani M, Atkinson AC (2000) Robust diagnostic data analysis: transformations in regression (with discussion). Technometrics 42:384–398
  8. Riani M, Cerioli A, Atkinson AC, Perrotta D (2014) Monitoring robust regression. Electron J Stat 8:642–673
  9. Riani M, Atkinson AC, Corbellini A (2019) Efficient robust methods via monitoring for multivariate data analysis including clustering. Pattern Recognit 88:246–260
  10. Torti F, Corbellini A, Atkinson AC (2019) Monitoring robust regression, especially the forward search, in SAS IML studio (in preparation)

Copyright information

© Sociedad de Estadística e Investigación Operativa 2019

Authors and Affiliations

  1. Department of Economics and Management, Interdepartmental Centre of Robust Statistics, University of Parma, Parma, Italy
  2. The London School of Economics, London, UK
