Visualization

Zamora Saiz, Alfonso; Quesada González, Carlos; Hurtado Gil, Lluís; Mondéjar Ruiz, Diego

doi:10.1007/978-3-030-48997-7_4

Part of the book series: Use R! (USE R)

Abstract

Presenting conclusions with the help of a graph can greatly improve your ability to communicate and convince. R is a powerful tool for data visualization, and in this chapter we explore some of the best-known plotting packages. First, R base graphics can produce most of the fundamental graph types with a high degree of customization. This package is commonly used to produce explanatory graphs and is a valuable aid for visualizing the properties of a dataset. Second, the widely used ggplot2 package makes it easy to produce highly aesthetic graphs. This exceptional tool turns input data into a final plot that presents new conclusions in an understandable way. Finally, to take data visualization a step further, we introduce the packages plotly and leaflet, which specialize in building interactive plots and maps, respectively.

Notes

1.
A univariate dataset consists of data on a single variable, whereas a multivariate dataset contains several variables. More on this will be seen in Chap. 5.
2.
Throughout Chap. 4, whenever this happens, we omit repeated arguments and focus only on the particular ones. Arguments that appear in several plotting functions should be understood to work in the same way in each of them.
3.
RGB stands for Red, Green, Blue; it is a way of defining almost any color by the proportion of each of these three primary colors.
4.
For example, plotting a matrix with u and v as columns yields the right-hand picture in Fig. 4.1.
5.
This can also be obtained with the function pairs() from the package graphics, applied to numerical matrices.
6.
The dataset iris is contained in the package datasets included in the R core.
7.
When the argument col is set to the variable Species, a factor with three levels, the first three colors of the R palette are assigned to the observations of each level, respectively.
8.
The USPersonalExpenditure dataset is contained in the datasets package.
9.
Legend arguments are passed with args.legend and will be explored in detail in Sect. 4.1.3.
10.
This will be explained in detail in Sect. 5.1.1.
11.
The notches extend from the median to a distance of ±1.58 times the interquartile range (a dispersion measure explained in Sect. 5.1.2) divided by the square root of the sample size. According to [3], this gives roughly a 95% confidence interval for the difference between two medians: if the notches of two boxes do not overlap, this is strong evidence that the medians differ.
12.
A continuous variable X is a function taking values in the real numbers. See Sect. 5.2.1.
13.
It is important to note that the number of breaks is only interpreted by R as a suggestion, so you might ask for breaks=5 and get a plot with 7 breaks, for example.
14.
Recall that seq(start, end, by) creates a sequence vector running from the starting point to the end point, with the given gap between consecutive entries.
15.
Image provided by Holtz [9] via https://www.r-graph-gallery.com/74-margin-and-oma-cheatsheet/.
16.
The ggplot2 motto is "Create Elegant Data Visualizations Using the Grammar of Graphics".
17.
For more examples and full description of all functions, visit https://ggplot2.tidyverse.org/index.html.
18.
It allows more than the two required arguments, but their purpose can be achieved in a more natural way with other layers.
19.
The main specific arguments are listed in the table; exploring them is left to the reader since, by now, it should be straightforward.
20.
The name alpha is a standard way to refer to transparency, not only in programming but also in image and video editing.
21.
The line type, width, and other components that relate to the particular aspects of a line can be modified by using several secondary arguments that the reader can check in the documentation.
22.
Except for the main title and axes labels.
23.
The calculations for the slope and intercept will be studied in Sect. 5.3.2.
24.
By means of outlier.color, outlier.fill, outlier.shape (which hides the outliers if set to NA), outlier.size, outlier.stroke, and outlier.alpha.
25.
By default, a method is chosen based on the sample size. For fewer than 1000 observations, the method loess (locally estimated scatterplot smoothing, [5]) is used; otherwise, gam (generalized additive models, [8, 10]) is used. Both methods are well beyond the scope of this book, which is why we stick to the linear model by setting method = "lm".
26.
To have the plot by rows we will use facet_grid(cut ~ .).
27.
If an implemented function is used, its name should be given between quotation marks.
28.
The code above is equivalent to setting theta = "x".
29.
plotly is also available in other languages such as Python and in standalone online versions.
30.
Leaflet is an open-source JavaScript library.
31.
https://agafonkin.com/.
32.
In geographic coordinates, latitude is the angular distance measured along a meridian, with value 0° at the equator and 90° at the North and South poles. Longitude is the angular distance from the Greenwich meridian measured along the equator, ranging from 0° to 180° East or West. South latitudes and West longitudes are entered in R as negative values.
33.
The Staples Center is a multi-purpose arena in the city of Los Angeles, host of several international sports and arts events.
34.
https://cloud.google.com/maps-platform/.
35.
Using this platform requires registering, accepting Google's terms, and entering billing details, although no charge is made without the user's explicit approval.
36.
This is a fake key used as an example; it should be replaced with the reader's own key.
37.
In fact, it is a tibble: a data format from the tidyverse universe that R packages such as leaflet work with. In most situations it can be used just like a data frame.
38.
Try flights[!(flights$country == "Russian Federation" | flights$country == "United Kingdom"), ].
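The notch rule in note 11 can be written as a formula. Writing m for the sample median, IQR for the interquartile range and n for the sample size, the notches of a box plot span the interval below. The numbers in the worked line (IQR = 2, n = 100) are illustrative values of our own, not data from the chapter. As far as we know, base R's boxplot.stats() reports this interval in its conf component, taking the IQR as the distance between the lower and upper hinges.

```latex
\[
  m \pm 1.58\,\frac{\mathrm{IQR}}{\sqrt{n}},
  \qquad \text{e.g. } \mathrm{IQR} = 2,\ n = 100:\quad
  1.58 \cdot \frac{2}{\sqrt{100}} = 0.316 .
\]
```

With these values the notch runs from m - 0.316 to m + 0.316. Because the half-width shrinks like 1/√n, larger samples give narrower notches, so the medians of two large groups can be distinguished even when their boxes overlap.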

References

[1] Agafonkin, V. Leaflet: an open-source JavaScript library for mobile-friendly interactive maps. https://leafletjs.com/, 2014. [Online, accessed 2020-02-29].
[2] Belsley, D.A., Kuh, E. and Welsch, R.E. Regression diagnostics: Identifying influential data and sources of collinearity. Wiley Series in Probability and Mathematical Statistics, 571(4), 1980.
[3] Chambers, J.M., Cleveland, W.S., Kleiner, B. and Tukey, P.A. Graphical Methods for Data Analysis. Wadsworth & Brooks/Cole Mathematics Series, Springer, Heidelberg, Germany, 1983.
[4] Cheng, J., Xie, Y., Wickham, H. and Agafonkin, V. leaflet: Create interactive web maps with the JavaScript ‘leaflet’ library. R package version, 1(0):423, 2017.
[5] Cleveland, W.S. LOWESS: A program for smoothing scatterplots by robust locally weighted regression. American Statistician, 35(1):54, 1981.
[6] Cleveland, W.S. The Elements of Graphing Data. Wadsworth Publ. Co., California, USA, 1985.
[7] Fisher, R.A. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2):179–188, 1936.
[8] Hastie, T., Tibshirani, R. and Friedman, J.H. The elements of statistical learning: data mining, inference, and prediction. Springer, Berlin, Germany, 2009.
[9] Holtz, Y. Margin and Oma cheatsheet. https://www.r-graph-gallery.com/74-margin-and-oma-cheatsheet/, 2016. [Online, accessed 2020-02-29].
[10] James, G., Witten, D., Hastie, T. and Tibshirani, R. An Introduction to Statistical Learning: With Applications in R. Springer Publishing Company, Incorporated, New York, USA, 2014.
[11] OpenStreetMap. Planet dump retrieved from https://planet.osm.org. https://www.openstreetmap.org, 2017. [Online, accessed 2020-02-29].
[12] Plotly Technologies Inc. Collaborative data science. https://plot.ly, 2015. [Online, accessed 2020-02-29].
[13] Sterling, A. Unpublished BS Thesis, 1977.
[14] Wickham, H. et al. Welcome to the tidyverse. Journal of Open Source Software, 4(43):1686, 2019.
[15] Wickham, H. ggplot2: Elegant graphics for data analysis. https://ggplot2.tidyverse.org, 2016. [Online, accessed 2020-02-29].
[16] Wickham, H. and Grolemund, G. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc., California, USA, 2017.
[17] Wilkinson, L. The grammar of graphics. Springer Science & Business Media, Berlin, Germany, 2006.

Authors and Affiliations

Department of Mathematics Applied to ICT, Technical University of Madrid, Madrid, Spain
Alfonso Zamora Saiz
Department of Applied Mathematics and Statistics, Universidad San Pablo CEU, Madrid, Spain
Carlos Quesada González & Diego Mondéjar Ruiz
eDreams ODIGEO, Barcelona, Spain
Lluís Hurtado Gil

About this chapter

Cite this chapter

Zamora Saiz, A., Quesada González, C., Hurtado Gil, L., Mondéjar Ruiz, D. (2020). Visualization. In: An Introduction to Data Analysis in R. Use R!. Springer, Cham. https://doi.org/10.1007/978-3-030-48997-7_4

DOI: https://doi.org/10.1007/978-3-030-48997-7_4
Published: 28 July 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-48996-0
Online ISBN: 978-3-030-48997-7
eBook Packages: Mathematics and Statistics (R0)