
Introduction to R

An Introduction to Data Analysis in R

Part of the book series: Use R! (USE R)


Abstract

From Business Intelligence to advanced statistical applications, professionals are expected to access and manipulate large datasets, and R is the perfect tool for the job. In this introductory chapter, we explain the principles of programming and the position of R in data science today. Then, a beginner-level course on R introduces the main data types of this superior programming language. Examples and exercises are included to provide hands-on training and to help users gain control over and understanding of R's capabilities. Next, two main generic programming tools are introduced: control structures and functions. These allow us to manipulate our datasets and derive all sorts of values and conclusions. In addition, this chapter covers specific R operators that greatly simplify the use of R and enhance its capabilities.


Notes

  1. This algorithm is usually called the long division method in US schools and many other places.

  2. Assembly languages are often abbreviated asm.

  3. Here, by machine we refer both to the hardware (the architecture of the computer) and to the software (the operating system).

  4. FORTRAN is an acronym for FORmula TRANslation.

  5. At the time of writing, the latest FORTRAN version, known as FORTRAN 2018, was released on November 28, 2018; see https://wg5-fortran.org/f2018.html.

  6. Announcement by Peter Dalgaard of the first beta version release: https://stat.ethz.ch/pipermail/r-announce/2000/000127.html.

  7. GNU is a recursive acronym for “GNU’s Not Unix.”

  8. https://www.gnu.org.

  9. At the time this book was completed, the latest stable version was R 3.6.2, named Dark and Stormy Night, released on December 12, 2019.

  10. https://www.r-project.org.

  11. Visit https://stat.ethz.ch/pipermail/r-announce/1997/000001.html for Kurt Hornik's announcement of the opening of the CRAN site.

  12. https://cran.r-project.org/.

  13. To find out the exact number of packages available at a given moment, type nrow(available.packages()) in the R console.

  14. https://stat.ethz.ch/mailman/listinfo/r-help.

  15. https://stackoverflow.com/questions/tagged/r.

  16. https://www.r-project.org/conferences.html.

  17. https://journal.r-project.org.

  18. Computational costs are defined as the amount of time and memory needed to run an algorithm.

  19. At the time this book was completed, the CRAN package repository featured 15,368 available packages, covering many possible extensions of the R core library.

  20. https://www.tiobe.com/tiobe-index/.

  21. https://julialang.org.

  22. https://julialang.org/blog/2012/02/why-we-created-julia © 2020 JuliaLang.org contributors.

  23. https://github.com/ThinkR-open/companies-using-r.

  24. https://www.rstudio.com.

  25. Announcement of the RStudio release on February 28, 2011: https://blog.rstudio.com/2011/02/28/rstudio-new-open-source-ide-for-r/. At the time this book was completed, the latest stable version was RStudio 1.2.5033, released on December 3, 2019.

  26. For example, Oracle, ODBC, Spark, and many others.

  27. A very useful shortcut is Control+Enter on a PC, or Command+Enter on a Mac, which runs the current code line.

  28. Have a look at the Code menu in RStudio for different run options.

  29. The package ggplot2 will be used in Sect. 4.2 to create elaborate plots in R.

  30. When calling library(), quotation marks around the package name are not needed.

  31. https://rmarkdown.rstudio.com.

  32. We can access this list at https://cran.r-project.org/web/views/.

  33. https://www.r-project.org/mail.html.

  34. https://stackoverflow.com/.

  35. To learn how to ask good questions, read https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example and https://www.r-project.org/posting-guide.html.

  36. The acronym RAM stands for random-access memory.

  37. The same result is achieved by typing assign("x", 4).

  38. A Boolean is a data type whose possible values are either TRUE or FALSE. It is named after the mathematician George Boole.

  39. For simplicity, logical values can also be written as T or F. We will use the full word or the initial letter interchangeably throughout the book.

  40. In Sect. 5.1 we will see how to exclude these NAs, with the argument na.rm, when performing calculations on vectors that contain them.

  41. The conversion from degrees Celsius C to degrees Fahrenheit F is F = 1.8 × C + 32. To go from Celsius to Kelvin we just shift the zero of the scale by 273.15, that is, K = C + 273.15.

  42. See Sect. 5.1.1 for an explanation of the arithmetic mean and other statistical measures.

  43. summary() is one of the most versatile and powerful commands in R. Almost any kind of object can be passed to it as an argument, and it usually provides plenty of information.

  44. Everything can be ordered, alphabetically for example, but nominal scales have no meaningful order related to anything intrinsic to the nature of the variable.

  45. When the combine function c() receives data of different types, all of them are stored in the most general type that can hold every element appearing in the structure.

  46. Unlike matrices, which recycle shorter data to fill the gaps, data.frame() returns an error when the columns to be included have different lengths, unless the longest length is an exact multiple of the shorter ones (in which case the shorter columns are recycled).

  47. In Spain and other countries, two family names are used, preserving the last names of both the father and the mother.

  48. When applying as.data.frame() to a matrix without column names, unless otherwise specified, the default names of the variables are V1, V2, etc., meaning variable 1, variable 2, etc.

  49. Some R packages are specially designed for dealing with datasets, such as tibble and data.table; we will explore the latter in Chap. 3.

  50. Technically speaking, when using 1:10, R is internally performing a loop, so the previous code could be simplified to print(1:10); still, it is valid as a first, simple example.

  51. Note that, in the example, the expression 3 != 3 evaluates to FALSE, whereas asking whether it is a logical expression returns TRUE. Try the command is.logical("Hello") to see the difference.

  52. Observe that f(1) is undefined, because we are dividing by zero. Despite this, R outputs Inf, recovering the limit of f when x approaches 0.

  53. A richer function is already implemented in the R base library under the name mat.or.vec().

  54. The computational advantages and disadvantages of using return() or not are beyond the scope of this book.
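Note 30 states that library() does not need quotation marks. A minimal sketch using the base-R stats package (chosen only because it ships with every R installation). It also shows the documented character.only argument, which is needed when the package name is stored in a variable. The variable name pkg is illustrative.

```r
# Both calls attach the same package; the unquoted form is the usual style.
library(stats)
library("stats")

# When the name is held in a variable, character.only = TRUE tells library()
# to use the variable's value instead of the literal symbol 'pkg'.
pkg <- "stats"  # illustrative variable name
library(pkg, character.only = TRUE)

# The attached package appears on the search path.
print("package:stats" %in% search())
```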
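Note 37 says that assign("x", 4) is equivalent to the arrow assignment. The short comparison below illustrates that claim. The second variable name, y, is illustrative.

```r
# Standard assignment with the arrow operator.
x <- 4

# assign() takes the variable name as a character string.
assign("y", 4)

# Both variables hold the same value.
print(identical(x, y))
```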
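Notes 38 and 39 describe logical values and the T/F shorthand. The sketch below shows that T and TRUE are identical. It also shows one caveat worth knowing: TRUE and FALSE are reserved words, but T and F are ordinary variables that can be overwritten.

```r
a <- TRUE
b <- T                     # T is shorthand for TRUE
print(identical(a, b))     # both are the same logical value
print(class(F))            # F is also of class "logical"

# T can be reassigned (TRUE cannot), which silently changes its meaning.
T <- 0
print(identical(T, TRUE))  # no longer TRUE
rm(T)                      # remove the overwrite; T refers to TRUE again
print(identical(T, TRUE))
```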
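Note 40 points ahead to the na.rm argument. As a preview, the example below shows how a single NA propagates through mean() and sum() unless na.rm = TRUE is given. The temperature values are made up for illustration.

```r
temps <- c(21.5, NA, 19.0, 23.5)  # illustrative data with one missing value

mean_all   <- mean(temps)                # NA: the missing value propagates
mean_clean <- mean(temps, na.rm = TRUE)  # mean of the three observed values
total      <- sum(temps, na.rm = TRUE)   # 21.5 + 19.0 + 23.5

print(c(mean_all, mean_clean, total))
```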
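Note 41 gives the formulas F = 1.8 × C + 32 and K = C + 273.15. A minimal sketch of them as vectorized R functions follows. The function names are illustrative, not taken from the chapter.

```r
# Degrees Celsius to degrees Fahrenheit: F = 1.8 * C + 32
celsius_to_fahrenheit <- function(celsius) {
  1.8 * celsius + 32
}

# Degrees Celsius to Kelvin: shift the zero of the scale, K = C + 273.15
celsius_to_kelvin <- function(celsius) {
  celsius + 273.15
}

temps_c <- c(-40, 0, 100)
print(celsius_to_fahrenheit(temps_c))  # -40, 32 and 212
print(celsius_to_kelvin(temps_c))
```

Because R arithmetic is vectorized, each function converts a whole vector at once without an explicit loop.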
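Note 43 describes summary() as accepting almost any object. The sketch below applies it first to a numeric vector and then to a small illustrative data frame. For the data frame, each column is summarized according to its type.

```r
# Numeric vector: minimum, quartiles, median, mean and maximum.
v <- c(1, 2, 3, 4)
s <- summary(v)
print(s)

# Data frame: one summary per column (numeric statistics, factor counts).
df <- data.frame(height = c(1.62, 1.75, 1.80),
                 group  = factor(c("a", "b", "a")))
print(summary(df))
```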
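Note 44 distinguishes nominal scales from ordered ones. Assuming the variables are stored as R factors, the example below contrasts an unordered factor with an ordered one. An unordered factor only has alphabetical levels by default. An ordered factor supports meaningful comparisons. The data are illustrative.

```r
# Nominal variable: levels are sorted alphabetically, but carry no real order.
pets <- factor(c("dog", "cat", "dog", "fish"))
print(levels(pets))
print(is.ordered(pets))

# Ordinal variable: the level order is meaningful and can be compared.
sizes <- factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)
print(sizes[1] < sizes[2])  # "small" comes before "large"
```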
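Note 45 describes how c() stores mixed data in the most general type. In base R the coercion order for these common types runs logical, then integer, then double, then character. The sketch below shows each step.

```r
print(class(c(TRUE, 2L)))      # logical coerced to "integer"
print(class(c(1L, 2.5)))       # integer coerced to "numeric" (double)
print(class(c(1, "a", TRUE)))  # everything coerced to "character"
print(c(1, "a", TRUE))         # the values become the strings "1", "a", "TRUE"
```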
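Note 46 contrasts matrices and data frames when lengths do not match. The sketch below checks three cases:

* matrix() recycles short data to fill the gaps, with a warning.
* data.frame() stops with an error when lengths cannot be reconciled.
* data.frame() recycles when the longest length is an exact multiple.

The error is caught with tryCatch() so the script keeps running.

```r
# matrix() fills a 2 x 2 matrix from only 3 values by recycling
# (R warns that 3 is not a multiple of 2; the warning is suppressed here).
m <- suppressWarnings(matrix(1:3, nrow = 2, ncol = 2))
print(m)  # columns filled as 1, 2 and then 3, 1

# data.frame() refuses columns of lengths 3 and 2.
result <- tryCatch(
  data.frame(a = 1:3, b = 1:2),
  error = function(e) conditionMessage(e)
)
print(result)  # the error message, returned as a string

# Lengths 4 and 2 are compatible, so the shorter column is recycled.
ok <- data.frame(a = 1:4, b = 1:2)
print(ok$b)
```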
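Note 48 mentions the default variable names V1, V2, and so on. The example converts a matrix without column names and then sets more descriptive names. The replacement names x and y are illustrative.

```r
m <- matrix(1:6, nrow = 3)  # a 3 x 2 matrix with no column names
df <- as.data.frame(m)

default_names <- names(df)
print(default_names)        # "V1" "V2"

# Replace the defaults with meaningful names.
names(df) <- c("x", "y")
print(df)
```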
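Note 50 remarks that a loop over 1:10 could be replaced by print(1:10). Assuming the loop in the text simply prints the numbers 1 to 10, the sketch below shows both forms. It then applies the same contrast to a computation, where the vectorized expression gives the same result as the loop.

```r
# Explicit loop: print each number on its own line.
for (i in 1:10) {
  print(i)
}

# Vectorized equivalent: print the whole sequence at once.
print(1:10)

# The same contrast for a computation: squares of 1 to 10.
squares_loop <- numeric(10)
for (i in 1:10) {
  squares_loop[i] <- i^2
}
squares_vec <- (1:10)^2
print(identical(squares_loop, squares_vec))
```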
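Note 51 separates the value of a comparison from its type. The three lines below make the difference explicit.

```r
print(3 != 3)               # FALSE: the comparison itself is false
print(is.logical(3 != 3))   # TRUE: but its result is a logical value
print(is.logical("Hello"))  # FALSE: a character string is not logical
```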
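Note 52 observes that R returns Inf when dividing by zero. The chapter's own f is not reproduced here. Its definition is not given in these notes, so the function f below is a hypothetical stand-in with a zero denominator at x = 1. The example also shows that 0/0 gives NaN, since no limit can be recovered.

```r
print(1 / 0)    # Inf
print(-1 / 0)   # -Inf
print(0 / 0)    # NaN: not a number

# Hypothetical function, undefined at x = 1 (not the chapter's f).
f <- function(x) 1 / (x - 1)
print(f(1))     # Inf

print(is.infinite(f(1)))
print(is.nan(0 / 0))
```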
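Note 53 refers to the base function mat.or.vec(nr, nc). As far as the base documentation describes it, mat.or.vec() returns a zero matrix with nr rows and nc columns. When nc is 1 it returns a plain numeric vector of length nr instead. The result names below are illustrative.

```r
z_mat <- mat.or.vec(3, 2)  # 3 x 2 matrix of zeros
z_vec <- mat.or.vec(3, 1)  # numeric vector of three zeros (not a matrix)

print(z_mat)
print(z_vec)
print(is.matrix(z_vec))
```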
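Note 54 leaves aside the question of whether to use return(). For reference, an R function returns the value of its last evaluated expression, so return() is optional there. It remains handy for leaving a function early. All function names below are illustrative.

```r
# Explicit return().
square_explicit <- function(x) {
  return(x^2)
}

# Implicit return: the value of the last expression is returned.
square_implicit <- function(x) {
  x^2
}

# return() used to exit early for invalid input.
safe_sqrt <- function(x) {
  if (x < 0) {
    return(NA)
  }
  sqrt(x)
}

print(c(square_explicit(3), square_implicit(3), safe_sqrt(-1), safe_sqrt(16)))
```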


Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Zamora Saiz, A., Quesada González, C., Hurtado Gil, L., Mondéjar Ruiz, D. (2020). Introduction to R. In: An Introduction to Data Analysis in R. Use R!. Springer, Cham. https://doi.org/10.1007/978-3-030-48997-7_2
