Visualization

An Introduction to Data Analysis in R

Part of the book series: Use R! (USE R)

Abstract

Presenting conclusions with the help of a graph can make your communication much clearer and more persuasive. R is a capable tool for data visualization, and in this chapter we explore some of its best-known plotting packages. First, the R base graphics can produce most of the fundamental graph styles with a high degree of customization. This package is commonly used to produce explanatory graphs and is a valuable aid for visualizing the properties of a dataset. Second, the widely used ggplot2 package produces highly aesthetic graphs with ease. This tool processes input data into a final plot that displays new conclusions in an understandable fashion. Finally, to extend the reader's command of data visualization, we introduce the packages plotly and leaflet, which specialize in building interactive plots and maps, respectively.

Notes

  1. A univariate dataset contains data on a single variable, whereas a multivariate dataset allows for many variables. More on this will be seen in Chap. 5.

  2. Throughout Chap. 4, whenever this happens, we omit repeated arguments and focus only on those specific to each function. Arguments that appear in several plotting functions should be understood to work in the same way in each.

  3. RGB stands for Red Green Blue and is a way of defining almost every color based on the proportion of each primary color.

  4. For example, plotting a matrix with u and v as columns yields the right-hand picture in Fig. 4.1.

  5. This can also be obtained with the function pairs() from the package graphics, applied to numerical matrices.

  6. The dataset iris is contained in the package datasets, included in the R core.

  7. When the argument col is set to the variable Species, which is a factor with three levels, the first three colors in the R palette are assigned to the observations of each level.

  8. The USPersonalExpenditure dataset is contained in the datasets package.

  9. Legend arguments are passed with args.legend and will be explored in detail in Sect. 4.1.3.

  10. This will be explained in detail in Sect. 5.1.1.

  11. The notches extend to a distance of ± 1.58 times the interquartile range (a dispersion measure of the data explained in Sect. 5.1.2) divided by the square root of the sample size. According to [3], this calculation gives a 95% confidence interval for comparing two medians: non-overlapping notches suggest that the difference between the medians is statistically significant.

  12. A continuous variable X is a function taking values on the real numbers. See Sect. 5.2.1.

  13. It is important to note that the number of breaks is only interpreted by R as a suggestion, so you might ask for breaks=5 and get a plot with 7 breaks, for example.

  14. Recall that seq(start, end, by) creates a sequence vector running from the starting point to the end point, with by as the gap between entries.

  15. Image provided by Holtz [9] via https://www.r-graph-gallery.com/74-margin-and-oma-cheatsheet/.

  16. The ggplot2 motto is Create Elegant Data Visualizations Using the Grammar of Graphics.

  17. For more examples and a full description of all functions, visit https://ggplot2.tidyverse.org/index.html.

  18. It allows more than the two required arguments, but their purpose can be achieved in a more natural way with other layers.

  19. The main specific arguments are listed in the table and exploring them is left to the reader since, by now, it should be straightforward.

  20. The name alpha is a standard way to refer to transparency, not only in programming but also in image or video editing.

  21. The line type, width, and other components that relate to the particular aspects of a line can be modified by using several secondary arguments that the reader can check in the documentation.

  22. Except for the main title and axes labels.

  23. The calculations for the slope and intercept will be studied in Sect. 5.3.2.

  24. By means of outlier.color, outlier.fill, outlier.shape (which hides the outliers if set to NA), outlier.size, outlier.stroke, and outlier.alpha.

  25. By default, a method is chosen based on the sample size. For fewer than 1000 observations, the method loess (locally estimated scatterplot smoothing, [5]) is used; otherwise, gam (generalized additive models, [8, 10]) is used. Both methods are beyond the scope of this book, which is why we stick to the linear model by setting method="lm".

  26. To have the plot by rows we will use facet_grid(cut ~ .).

  27. If an implemented function is used, it should be between quotation marks.

  28. The code above is equivalent to setting theta="x".

  29. plotly is also available in other languages such as Python and in standalone online versions.

  30. Leaflet is an open-source JavaScript library.

  31. https://agafonkin.com/.

  32. In geographic coordinates, latitude is the angular distance measured along a meridian, with value 0 at the equator and 90 at the north and south poles. Longitude is the angular distance from the Greenwich meridian along the equator, ranging from 0 to 180 East and West, respectively. South latitudes and West longitudes will be set in R as negative.

  33. The Staples Center is a multi-purpose arena in the city of Los Angeles that hosts international sports and arts events.

  34. https://cloud.google.com/maps-platform/.

  35. Using this platform requires registering, accepting Google's terms, and entering billing data, although no charge is made without explicit user approval.

  36. This is a fake key used as an example, which should be substituted by the reader’s personal one.

  37. In fact, it is a tibble: a data format used by R packages from the tidyverse universe like leaflet. In most situations it can be used just as a data frame.

  38. Try flights[!(country=="Russian Federation" | country=="United Kingdom")].
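
As a small illustration of Note 3, the sketch below builds colors from red, green, and blue proportions with the base R function rgb() and converts one back with col2rgb(). The variable names are illustrative only and do not come from the chapter.

```r
# Each channel is given as a proportion between 0 and 1 (the default scale).
pure_red <- rgb(1, 0, 0)

# The same function accepts a 0-255 scale through maxColorValue.
orange <- rgb(255, 165, 0, maxColorValue = 255)

# col2rgb() goes the other way: from a color string to its three channels.
channels <- col2rgb(pure_red)

print(pure_red)
print(orange)
print(channels)
```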
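
The color assignment described in Note 7 can be sketched as follows. This is a minimal example using the iris dataset mentioned in Note 6; the choice of the two plotted columns, the point symbol, and the legend position are assumptions made for illustration, not the chapter's own code.

```r
data(iris)

# A factor is stored as integer codes (1, 2, 3 for the three species),
# and those codes index the current color palette.
species_codes <- as.integer(iris$Species)
point_colors <- palette()[species_codes]

# Passing the factor directly to col has the same effect.
plot(iris$Sepal.Length, iris$Petal.Length,
     col = iris$Species, pch = 19,
     xlab = "Sepal length", ylab = "Petal length")
legend("topleft", legend = levels(iris$Species),
       col = palette()[1:3], pch = 19)
```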
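
The notch calculation in Note 11 can be checked by hand. The sketch below uses simulated data (an assumption, not data from the chapter) and compares the manual result with boxplot.stats(), which computes the spread from the hinges returned by fivenum(), a close relative of the interquartile range.

```r
set.seed(1)
x <- rnorm(50)            # simulated sample, for illustration only
n <- length(x)

# fivenum() returns minimum, lower hinge, median, upper hinge, maximum.
five <- fivenum(x)
spread <- five[4] - five[2]

# Notch limits: median -/+ 1.58 * spread / sqrt(n).
notch <- five[3] + c(-1.58, 1.58) * spread / sqrt(n)

# The same limits as computed internally for boxplot(..., notch = TRUE).
built_in <- boxplot.stats(x)$conf

print(notch)
print(built_in)
```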
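
Notes 13 and 14 can be combined in one short sketch: a single number passed to breaks is only a suggestion, whereas a vector of edges built with seq() is used as given. The data are simulated and the variable names are illustrative assumptions.

```r
set.seed(42)
x <- runif(200, min = 0, max = 10)   # simulated values between 0 and 10

# A single number is a suggestion; R picks "pretty" break points itself.
suggested <- hist(x, breaks = 5, plot = FALSE)
print(length(suggested$breaks))

# A vector of edges is taken literally: start at 0, end at 10, gap of 2.
edges <- seq(0, 10, by = 2)
exact <- hist(x, breaks = edges, plot = FALSE)
print(exact$breaks)
print(exact$counts)
```

Supplying explicit edges is the usual way to get a predictable number of bins, provided the edges cover the whole range of the data.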

References

  1. Agafonkin, V. Leaflet: an open-source JavaScript library for mobile-friendly interactive maps. https://leafletjs.com/, 2014. [Online, accessed 2020-02-29].

  2. Belsley, D.A., Kuh, E. and Welsch, R.E. Regression diagnostics: Identifying influential data and sources of collinearity. Wiley Series in Probability and Mathematical Statistics, 571(4), 1980.

  3. Chambers, J. M., Cleveland, W. S., Kleiner, B. and Tukey, P. A. Graphical Methods for Data Analysis. Wadsworth & Brooks/Cole Mathematics Series, Springer, Heidelberg, Germany, 1983.

  4. Cheng, J., Xie, Y., Wickham, H. and Agafonkin, V. leaflet: Create interactive web maps with the JavaScript ‘leaflet’ library. R package version, 1(0):423, 2017.

  5. Cleveland, W.S. LOWESS: A program for smoothing scatterplots by robust locally weighted regression. American Statistician, 35(1):54, 1981.

  6. Cleveland, W.S. The Elements of Graphing Data. Wadsworth Publ. Co., California, USA, 1985.

  7. Fisher, R.A. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2):179–188, 1936.

  8. Hastie, T., Tibshirani, R. and Friedman, J.H. The elements of statistical learning: data mining, inference, and prediction. Springer, Berlin, Germany, 2009.

  9. Holtz, Y. Margin and Oma cheatsheet. https://www.r-graph-gallery.com/74-margin-and-oma-cheatsheet/, 2016. [Online, accessed 2020-02-29].

  10. James, G., Witten, D., Hastie, T. and Tibshirani, R. An Introduction to Statistical Learning: With Applications in R. Springer Publishing Company, Incorporated, New York, USA, 2014.

  11. OpenStreetMap. Planet dump retrieved from https://planet.osm.org. https://www.openstreetmap.org, 2017. [Online, accessed 2020-02-29].

  12. Plotly Technologies Inc. Collaborative data science. https://plot.ly, 2015. [Online, accessed 2020-02-29].

  13. Sterling, A. Unpublished BS Thesis, 1977.

  14. Wickham, H. et al. Welcome to the tidyverse. Journal of Open Source Software, 4(43):1686, 2019.

  15. Wickham, H. ggplot2: Elegant graphics for data analysis. https://ggplot2.tidyverse.org, 2016. [Online, accessed 2020-02-29].

  16. Wickham, H. and Grolemund, G. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc., California, USA, 2017.

  17. Wilkinson, L. The grammar of graphics. Springer Science & Business Media, Berlin, Germany, 2006.

Copyright information

© 2020 Springer Nature Switzerland AG

Cite this chapter

Zamora Saiz, A., Quesada González, C., Hurtado Gil, L., Mondéjar Ruiz, D. (2020). Visualization. In: An Introduction to Data Analysis in R. Use R!. Springer, Cham. https://doi.org/10.1007/978-3-030-48997-7_4