Overview of R and RStudio

Hair, Joseph F.; Hult, G. Tomas M.; Ringle, Christian M.; Sarstedt, Marko; Danks, Nicholas P.; Ray, Soumya

doi:10.1007/978-3-030-80519-7_2

Part of the book series: Classroom Companion: Business (CCB)

Abstract

Computational statistics, which combines a vast array of algorithms, statistical methods, and the power of functional coding, is an increasingly popular method of analysis for researchers. The R programming language, in particular, has benefitted from this development alongside traditional graphical user interface (GUI) software. Today, it has become a language of choice for many empirical researchers. In this chapter, we introduce the R programming language as well as its popular development environment, RStudio. We walk the reader through downloading both the R language and the RStudio integrated development environment (IDE). Then, we discuss the software layout and demonstrate how to interact with the software. Finally, we address creating and managing R projects and scripts and gaining access to documentation and help via various sources. This chapter is not intended as a tutorial on writing code in the R programming language. We do, however, provide useful open-source resources for learning R, which can be accessed from the R console in the RStudio environment.

Learning Objectives

After reading this chapter, you will understand:

1. The syntax and formatting for code used throughout this textbook.
2. Why computational statistical languages are so powerful and useful.
3. The layout and interface of R and RStudio.
4. How to manage R scripts and projects.
5. How to install and use packages.
6. How to seek help when bugs or errors are encountered.

2.1 Introduction

This chapter introduces the two software packages that will be used throughout this textbook. Software packages are a series of software functions and features with a similar purpose bundled into a single set. First, we introduce the R statistical computing language (R Core Team, 2021), which is the software language we will use to import and clean data as well as create and analyze PLS path models. We will then introduce the RStudio (RStudio Team, 2021) application, which is an integrated development environment that enables you to easily and productively conduct computational analyses using the R language. We will explain how to download and install the software required, how to interact with the software, and how to store your data and code.

We then offer a basic introduction to writing analytic scripts in R. This textbook will not serve as a comprehensive resource for learning R, so we will share further resources for learning this programming language and helpful documentation on the Internet. Additionally, we will provide examples of R code throughout this textbook, so we start by looking at the syntax and formatting that we will use to distinguish code from regular text.

2.2 Explaining Our Syntax

Throughout this textbook, it will be necessary to discuss various elements of the code when explaining how to perform analytic operations using R. For the sake of clarity, we will use distinguishable formatting and syntax for code, whether it appears in code blocks or embedded in the text. Code in the R language will be formatted as follows: vector <- c(1, 2, 3, 4, 5). To distinguish code embedded in the text from regular text, the code will be bolded; and to distinguish arguments from regular code, the arguments will be italicized (weights =, data =). Furthermore, construct and variable names in the text will be italicized to distinguish them (e.g., QUAL and qual_1). We will use a similar format when code is used in a larger block. ◘ Table 2.1 provides a summary of the syntax and format of code that we use in this textbook. You may want to refer back to it when we show larger blocks of code.

Table 2.1 Syntax conventions used in this textbook (source: authors’ own table)

The code block below also includes comments that describe the purpose of the following line of code. Comments are not run by the programming language and only serve as communication to other users of the code about the purpose. Comments in the R language begin with a pound symbol (“#”), and we will display them in gray (again, see the code block below).

# Create a vector of integers
vector <- c(1, 2, 3, 4, 5)
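As a further illustration of these conventions, the short sketch below combines comments, an assignment, and a function call that uses a named argument. The object names vector and average are chosen only for this illustration; mean() and its x = argument are part of base R.

```r
# Create a vector of integers
vector <- c(1, 2, 3, 4, 5)

# Calculate the mean, passing the vector to the named argument x =
average <- mean(x = vector)

# Print the result to the console
print(average)
```

Lines beginning with # are skipped when the code runs, so only the three statements are executed.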

2.3 Computational Statistics Using Programming

Data analytics using computationally intensive methods is becoming an increasingly important, strategic capability for companies to transform the data collected during business activities into information that can assist effective decision-making and policy creation. Similarly, academic research is rapidly adopting computational methods, involving the implementation of analytic techniques for inferential analysis and machine learning into computer programs (Hair & Sarstedt, 2021). Thus, researchers who learn and adopt computational methods will have the advantage of being able to apply and adapt the latest techniques to their research, while also being competent and conversant with industry trends.

We expect that many quantitative researchers are already familiar with certain types of software to analyze data: spreadsheet software, such as Apache OpenOffice Calc (► https://www.openoffice.org/product/calc.html) or Microsoft Excel (► https://www.microsoft.com/microsoft-365/excel), and more graphical, menu-driven software like IBM SPSS (► https://www.ibm.com/products/spss-statistics) and Statistica (► https://www.statsoft.de/de/software/statistica). Spreadsheet software has long been of value to business researchers, since a familiar ledger or balance book metaphor is adopted that predates computers. Spreadsheets are advantageous for smaller datasets, since they make it easy for users to manipulate data in tabular form and obtain quick results in the same interface as their data. Graphical, menu-driven software has also become popular during recent decades, since it is easier to learn and process for many users and provides rich visualizations. Both of these advantages allow software like SPSS and Tableau (► https://www.tableau.com) to facilitate communication between stakeholders and conduct data exploration interactively in meetings.

Computational analytics using programming syntax has been present since the earliest days of computing, but has recently gained new popularity from the advent of big data analytics, artificial intelligence, and the larger data science movement. By programming in a syntactic language, such as R or Python (Van Rossum & Drake, 1995), analysts can apply complex methods that are not easy to parameterize with spreadsheet or graphical, menu-driven software. Computation offers analysts the ability to run simulations that test particular scenarios and create novel solutions and custom visualizations, which were not considered by others, or are rather specific to one’s own use case. Moreover, the code that analysts generate serves as a manifest – or recipe – of their workflow that can be shared with other analysts or even deployed into online products and platforms. Finally, having code allows others to test, repeat, or replicate analyses in perfect detail – steps that are vital to modern applications of the scientific process (Rigdon, Sarstedt, & Becker, 2020). It is not surprising, therefore, that computational methods have become an integral component of the data science revolution in both industry and academia.
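To make the point about simulation and replication concrete, the following minimal sketch simulates a sampling distribution in base R. The sample size, population mean, standard deviation, and seed value are arbitrary choices made for this illustration, not values taken from this textbook.

```r
# Fix the random seed so that the simulation can be replicated exactly
set.seed(123)

# Draw 1,000 samples of 30 observations each and store each sample mean
sample_means <- replicate(n = 1000, expr = mean(rnorm(n = 30, mean = 50, sd = 10)))

# Summarize the simulated sampling distribution
mean(sample_means)
sd(sample_means)
```

Because the seed is set in the code, another analyst who runs the same script should obtain the same simulated values, which is what makes a shared script useful as a recipe.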

2.4 Introducing R and RStudio

R is a free, open-source software, which enables users to write and execute code that analyzes data. Readers should note that the name “R” can refer to both the programming language and the primary software that runs code written in this language. However, unless otherwise specified, in this book, R refers to the language. Further, open source refers to the kind of software whose underlying code is made freely available and is generally open to suggested improvements or new features built by others. The open-source nature of the R software makes code written in the R language highly reproducible, shareable, testable, scalable, and deployable to larger automated applications. An ever-expanding community of R users supports, tests, documents, and provides add-on resources for each other.

R (R Core Team, 2021) is an alternative implementation of the earlier S programming language. R was first developed by Ross Ihaka and Robert Gentleman in 1991 (Hornik & Leisch, 2002). The language was developed for several years, became free and open source in 1995, and started to gain attention with the first stable release on the Comprehensive R Archive Network (CRAN; ► http://www.r-project.org) in February 2000. CRAN serves as a vetted repository where reliable add-on packages of R code libraries can be freely contributed to or downloaded by R users around the world (packages are discussed in more detail in ► Sect. 2.6). The SEMinR package for PLS-SEM (Ray, Danks, & Valdez, 2021) we use in this book is also available on CRAN.

The R language was designed with computational statistics in mind. In its simplest form, it can be run from your operating system’s command line or from the R console (◘ Fig. 2.1). However, we recommend using R from the convenience of an integrated development environment (IDE), such as RStudio. An IDE is a programming environment that offers tools such as project management, tabs for easily managing multiple script files, and additional developer tools. We discuss the layout of the RStudio IDE in more detail in the next section. Throughout this book, we will demonstrate the use of R from within the RStudio IDE.

A window of the R console program output screen depicts the version, copyright, platform, natural language support, and workspace restored location. — **Fig. 2.1**

2.4.1 Installing R and RStudio

Before installing RStudio, the R software for executing code in the programming language must be installed on your operating system. The latest version of the R software for your operating system is available from the CRAN archive at the ► http://www.r-project.org website. Once you visit that website, click on the Download R link, select the mirror website closest to your location, and then choose the download file made for your operating system. Execute the download file and follow the instructions; the R software will then install on your computer.

Next, you will need to install RStudio from this website (► http://www.rstudio.com/). To do so, hover your mouse pointer over the Products menu and select RStudio from the dropdown menu. On the next page, click on the Download RStudio Desktop button, and once again click Download RStudio Desktop. The website will offer you the relevant version for your operating system. Execute the download file and follow the instructions. The RStudio IDE will then install on your computer. With both the R software and RStudio software installed on your computer, you can proceed to become familiar with the RStudio layout and interface.
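Once both programs are installed, one simple way to confirm that R is working is to ask it for its version from the console. The sketch below uses only base R; the minimum version 4.0.0 is an arbitrary example and not a requirement stated in this chapter.

```r
# Report the version of R that is currently running
print(R.version.string)

# Check the running version against a minimum version (4.0.0 is only an example)
r_is_recent <- getRversion() >= "4.0.0"
print(r_is_recent)
```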

2.4.2 Layout of RStudio

The RStudio desktop in its standard form comes with a layout of four primary windows: (1) In the upper-left corner is the source window; (2) in the upper right are the environment, history, connections, build, and git windows; (3) in the bottom left are the console and terminal windows; and (4) in the bottom right are the files, plots, packages, help, and viewer windows (◘ Fig. 2.2). Note that the source window only shows when data have been loaded. We will discuss this step in ► Chap. 3. Some of these windows are only available when settings have been enabled. For example, the git tab is only available when version control has been enabled and the build tab is only available when a package is being built. ◘ Table 2.2 describes the various windows and their uses in more detail.

Two windows of the untitled and global environment. It depicts the console details and file options window more in detail. — **Fig. 2.2**

Table 2.2 Table of the RStudio IDE desktop tabs, layout, and purpose

Table 2.3 Excerpt of contents of the R documentation for read.csv()

2.5 Organizing Your Projects

Organizing projects is much like organizing your documents in regular folders on your computer. The only major difference is you will need to remember where files are stored when loading files into or saving files out of the R environment. That is, you need to know the address of the file relative to the file you are editing. Often, users of R will create a catchall project (named workspace), in which they store their R script files, data files, and output files for multiple analyses or projects. This approach can quickly lead to chaos – the mixing of projects and the overwriting of crucial code and data files. Instead, we recommend you create separate projects and organize them carefully, so that you keep the contents of each project separately and provide some order to your workflow.

To begin a new project in RStudio, click on the File dropdown menu, and select New Project… (◘ Fig. 2.3). The Create Project window will then open and guide you through creating a new project. When you create your first project, you need to set up a New Directory which stores all your project files. Next, click on New Project. In the dialog box that follows, specify a project name under Directory name and choose a folder in which the project files should be stored. Finally, click on Create Project.

An RStudio window depicts the 12 lines of the program. The file menu is open, and the option labeled, new project is selected. — **Fig. 2.3**

An important feature of an R project is that the working directory, in which the project will be conducted, is specified. If at any time you wish to change the working directory, this can be done by clicking on the Session dropdown menu, selecting Set Working Directory, and then specifying the correct location (◘ Fig. 2.4).

An RStudio window depicts the 12 lines of the program. The session window is open, and the file pane location is selected from the set working directory. — **Fig. 2.4**
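The working directory can also be inspected and changed from the console. The following is a minimal sketch using base R functions; the folder name data and the file name survey.csv are hypothetical and serve only to show how a relative path is built.

```r
# Show the current working directory
original_dir <- getwd()
print(original_dir)

# Build a path relative to the working directory
# ("data" and "survey.csv" are hypothetical names)
data_path <- file.path("data", "survey.csv")
print(data_path)

# Temporarily switch to another folder (here, R's temporary directory) ...
setwd(tempdir())
list.files()

# ... and switch back to the original working directory
setwd(original_dir)
```

Paths built with file.path() are interpreted relative to the working directory, which is why it matters where that directory points when files are loaded or saved.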

R project details are stored in an .Rproj file in the project directory. In addition, the environment containing any objects saved to memory is stored in the .RData file, and the history of keystrokes and commands run in the console is stored in an .Rhistory file. Thus, a snapshot is kept of your activity in the project, which is reloaded every time you reopen the project. Note, however, that the packages required to run your code need to be reloaded every time you reopen a project and are not stored in the snapshot.
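The idea of saving objects from the environment to a file and restoring them later can be sketched with the base R functions save() and load(). The object name scores and the use of a temporary file are assumptions made for this illustration; RStudio manages the project's own snapshot file for you.

```r
# Create an object in the environment
scores <- c(4, 5, 3, 5)

# Save it to a file (a temporary file here, purely for illustration)
snapshot_file <- tempfile(fileext = ".RData")
save(scores, file = snapshot_file)

# Remove the object, then restore it from the file
rm(scores)
load(snapshot_file)
print(scores)
```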

2.6 Packages

R includes many preinstalled packages containing the standard functions and algorithms you will use in your statistical computations. Examples of such standard functions are mean() and sd() for calculating the mean and standard deviation of a vector, respectively, or lm() for generating linear regression models. While you should be able to fulfill many of your computational needs with the standard packages bundled in R, you might need to install further software libraries containing newer or more complicated algorithms. Such software libraries are bundled as packages that, when installed, add a new range of functions and operations. Examples of popular packages are dplyr, ggplot2, and, of course, the package used in this book, seminr.
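The standard functions named above can be tried without installing anything. The sketch below applies them to mtcars, an example dataset that ships with R; the choice of dataset and variables is ours and is not taken from this textbook.

```r
# mtcars is an example dataset that ships with R
fuel <- mtcars$mpg

# Mean and standard deviation of a vector
mean(fuel)
sd(fuel)

# Linear regression of fuel efficiency (mpg) on car weight (wt)
model <- lm(formula = mpg ~ wt, data = mtcars)
coef(model)
```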

A majority of packages are hosted on CRAN. The packages hosted on CRAN have met certain criteria to qualify for being published in the CRAN archive, such as having documentation, being tested, and being kept up to date with the latest versions of R. These packages can be installed from the command line or from the packages tab (◘ Fig. 2.1 and ◘ Table 2.2). Note that you will need internet access to install packages from CRAN. To install new packages, select the Packages tab in the lower right window of the RStudio IDE, click the Install button, set Install from to Repository (CRAN), and enter the package name in the Packages field: “seminr” (◘ Fig. 2.5). Next, click on Install.

Two windows depict a 12-line program and a global environment blank window. An install packages dialog box is open in the middle, asking to select install from, packages, and install to the library. — **Fig. 2.5**

Packages can also be installed from the command line using the install.packages() function. In this case, we wish to install the swirl package, which teaches you R programming (see ► Sect. 2.7 for more details on the swirl package). We therefore set the pkgs parameter equal to “swirl”.

# Install the swirl package
install.packages(pkgs = "swirl")

Note that packages are installed to the local software library on your computer but are not automatically loaded into the R session. Once a package is installed, it has to be loaded using the library() function prior to use. Packages must be loaded in each session: after closing and reopening R, a package's functions will not be available until you load the package again with library().

# Load the swirl package into the environment
library(swirl)
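The difference between an installed package and a loaded package can be checked from the console. In the sketch below, is_installed() is a hypothetical helper written for this illustration; requireNamespace() and search() are base R functions. The stats package is used in the checks because it ships with R.

```r
# Helper (hypothetical name): TRUE if a package is installed on this computer
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so it is installed ...
is_installed("stats")

# ... and it is attached by default, so it appears on the search path
"package:stats" %in% search()

# Suggest installing swirl only when it is missing (installing needs internet access)
if (!is_installed("swirl")) {
  message("swirl is not installed; run install.packages(pkgs = \"swirl\")")
}
```

A check of this kind avoids reinstalling a package every time a script is run.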

2.7 Writing R Scripts

Computational analyses are conducted by writing a series of instructions to the computer on how to import data, modify data, run algorithms for analyzing the data, and then report the results of those analyses. These instructions take the form of R scripts that are typically entered into a file, which contains all the scripts related to a single analysis or computation. These R script files have the suffix .R and are stored in your project directory.
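A very small script following these four steps might look like the sketch below. The file, its column names (qual_1 and qual_2), and its values are made up for this illustration; the example writes its own temporary CSV file so that it can be run as is.

```r
# 1. Import: write a small example file, then read it back in
#    (the file and its columns are made up for this illustration)
csv_file <- tempfile(fileext = ".csv")
writeLines(c("id,qual_1,qual_2", "1,4,5", "2,3,4", "3,5,5"), csv_file)
survey <- read.csv(file = csv_file, header = TRUE)

# 2. Modify: add a column with each respondent's average score
survey$qual_mean <- rowMeans(survey[, c("qual_1", "qual_2")])

# 3. Analyze: correlate the two indicators
indicator_cor <- cor(survey$qual_1, survey$qual_2)

# 4. Report: print the data and the result
print(survey)
print(indicator_cor)
```

In a real project, the first step would instead read a data file stored in the project directory.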

To successfully conduct such analyses, you need to learn the form and function of the scripts that R can process. As indicated above, a key reason for using a free, open-source software, like R, is the community support and resources typically found for such software. A simple Internet search with keywords “R coding lesson” should provide hundreds of high-quality resources. We recommend swirl (► https://swirlstats.com/), which teaches you R programming by offering simple and useful lessons. This package helps the user become experienced at working with R’s command-based interface and can be downloaded and used from the R console command line.

# Begin learning with swirl
swirl()

In addition to online tutorials and code lessons, there are many free e-books describing both introductory and advanced usage of R and RStudio. A good archive for textbooks is available at the CRAN website (► https://www.r-project.org/other-docs.html). We highly recommend the book R for Data Science (Wickham & Grolemund, 2016). As we continue with this chapter and the textbook, we assume that you have studied the basics of using R and are comfortable with the language. We now turn our attention to overcoming the various challenges you might encounter, while writing R scripts and when using the SEMinR package.

2.8 How to Find Help in RStudio

Due to the complexity of a programming language – and the almost endless number of software libraries that can be installed adding to the functions and resources available to you – it can become difficult to keep track of how functions are called, what arguments they take, and what output they provide. Packages have a range of files that are designed to document and demonstrate the use of the functions they provide. These files take the form of R documentation, vignettes, and demonstration files. In this section, we discuss how to access information on using a function by inspecting these documents.

All packages submitted to CRAN are required to have sufficient documentation to describe the functions they add to your software library. For each function, there should be a matching R documentation file that can be accessed. R documentation describes the purpose, input, implementation, and output of a function and provides examples applying the syntax. The contents of an R document are described in ◘ Table 2.3. This documentation can be accessed in the help tab in the lower right window of the RStudio IDE. Help topics and functions can be searched for in the search field of the help window or from the command line in the console window using the ? operator. For example, we can search for help on the read.csv() function by typing the following into the console window in RStudio:

# Searching for help using the ? operator
?read.csv

In ◘ Fig. 2.6, we can see an excerpt of the contents of the R documentation for read.csv(). This information will be displayed in the help tab in the lower right window of the RStudio IDE and provides us with details on the purpose, arguments, and usage of the function and a demonstration example. When encountering a new function or an error, the R documentation is the first place to look. For a full list of available R documentation topics for a package, click on the Packages tab in the bottom right window and select the package name (highlighted in blue). The Help tab will then open with the full list of documentation available for that package.

A window depicts the data input details from the Help menu. It explains the description and the usage. — **Fig. 2.6**
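Besides the ? operator, base R offers a few related functions for consulting documentation from the console. The sketch below shows three of them; which ones are most convenient is a matter of preference.

```r
# Equivalent to ?read.csv: open the documentation for a function
help(topic = "read.csv")

# Show only the arguments of a function and their default values
args(read.csv)

# Run the examples given in a function's documentation
example(topic = "mean")
```

Calling args() is a quick way to recall argument names without opening the full help page.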

Another very important document to consult for help using a package or function is the vignette. Vignettes are designed as an all-purpose user’s guide for the package – they describe the problem that the package seeks to solve and how it is used. This document usually describes the functioning of the package in detail and provides examples and demonstrations of the problems and solutions. You can access a list of installed vignettes by calling the vignette() function. This will output a list of available vignettes to an R vignette tab in the top left window of RStudio. You can then run vignette("SEMinR") to access a particular vignette – in this case, the vignette for the package SEMinR (◘ Fig. 2.7).

# Check all vignettes available in R
vignette()
# Load the SEMinR vignette
vignette("SEMinR")

The SEMinR vignette (dated Dec 31, 2020), with sections on the introduction, setup, data, measurement model description, structural model description, model estimation, reporting the model estimation results, and references. — **Fig. 2.7**

Another source of help is the demonstration code that comes bundled with most R packages. These demonstration files typically include an example dataset and model to demonstrate the purpose of the package’s functions. To check all available demonstration files, use the demo() function. For the specific demonstration of the European Customer Satisfaction Index (ECSI) model (Eklöf & Westlund, 2002) in the SEMinR package, as originally presented by Tenenhaus, Esposito Vinzi, Chatelin, and Lauro (2005), use demo("seminr-pls-ecsi").

# Check all demos available in R
demo()
# Load the SEMinR ECSI demo
demo("seminr-pls-ecsi")
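Both vignette() and demo() accept a package = argument that restricts the listing to a single package. The sketch below uses the grid and graphics packages only because they ship with R; assuming seminr is installed, the same calls with package = "seminr" should list its vignettes and demos.

```r
# List the vignettes of a single package ("grid" ships with R)
grid_vignettes <- vignette(package = "grid")
grid_vignettes$results[, "Item"]

# List the demos of a single package ("graphics" ships with R)
graphics_demos <- demo(package = "graphics")
graphics_demos$results[, "Item"]
```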

A final invaluable source of help can be found by accessing the greater R community on platforms such as Stack Overflow (► https://stackoverflow.com/). These ask-and-answer forums put you in touch with seasoned veterans who can provide useful tips and other options for executing all your favorite R packages and functions. Members of these communities typically respond quickly and can provide excellent advice and solutions (◘ Fig. 2.8).
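When posting a question on such a forum, it generally helps to include a small piece of data that reproduces the problem together with details of your R session. The following sketch shows one possible way to prepare both with base R functions; the use of the mtcars example dataset is our assumption for illustration.

```r
# A small, self-contained piece of data that reproduces the problem
example_data <- head(mtcars[, c("mpg", "wt")], n = 3)

# dput() prints R code that recreates the object, ready to paste into a post
dput(example_data)

# sessionInfo() reports the R version, operating system, and loaded packages
session_details <- sessionInfo()
print(session_details)
```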

Summary

In this chapter, we introduced the R statistical programming language and its popular development environment, RStudio. You should now be familiar with the layout and functionality of RStudio and with creating and managing projects and R script files. If you encounter a bug or a function that is unfamiliar, you should now have the requisite tools (and knowledge) for seeking out appropriate help. We strongly recommend carefully working through an introductory course on the R language. We also recommend the swirl package for learning R and provide some ideas to assist you in finding supplementary resources and gaining access to useful material.

Exercise

In this chapter, we recommend the use of the swirl package to learn basic coding concepts and become familiar with the popular functions in R (see ► Sects. 2.6 and 2.7 on installing and loading swirl). Please complete the following lessons in the swirl package:

1. Basic building blocks
2. Workspaces and files
3. Sequences of numbers
4. Vectors
5. Missing values
6. Subsetting vectors
7. Matrices and data frames
8. Logic
9. Functions
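As a brief preview of several of these topics, the sketch below shows one or two base R statements for each. It is our own illustration rather than material from the swirl lessons, and all object names are arbitrary.

```r
# Sequences of numbers
one_to_ten <- 1:10
evens <- seq(from = 2, to = 10, by = 2)

# Vectors and missing values
ratings <- c(4, NA, 5, 3)
mean(ratings, na.rm = TRUE)   # ignore the missing value

# Subsetting vectors
ratings[!is.na(ratings)]

# Matrices and data frames
values <- matrix(1:6, nrow = 2, ncol = 3)
respondents <- data.frame(id = 1:3, score = c(4, 5, 3))

# Logic
respondents$score >= 4

# Functions
double_it <- function(x) {
  x * 2
}
double_it(evens)
```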

References

Eklöf, J. A., & Westlund, A. H. (2002). The pan-European customer satisfaction index program: Current work and the way ahead. Total Quality Management, 13(8), 1099–1106.
Hair, J. F., & Sarstedt, M. (2021). Data, measurement, and causal inferences in machine learning: Opportunities and challenges for marketing. Journal of Marketing Theory & Practice, 29(1), 65–77.
Hornik, K., & Leisch, F. (2002). Vienna and R: Love, marriage and the future. In R. Dutter (Ed.), Festschrift 50 Jahre Österreichische Statistische Gesellschaft (pp. 61–70). Vienna: Österreichische Statistische Gesellschaft.
R Core Team. (2021). R: A language and environment for statistical computing [computer software]. Vienna: R Foundation for Statistical Computing. Retrieved from https://www.R-project.org/
Ray, S., Danks, N. P., & Valdez, A. C. (2021). seminr: Building and estimating structural equation models [computer software]. R package version 2.1.0. Retrieved from https://cran.r-project.org/web/packages/seminr/index.html
Rigdon, E. E., Sarstedt, M., & Becker, J.-M. (2020). Quantify uncertainty in behavioral research. Nature Human Behaviour, 4, 329–331.
RStudio Team. (2021). RStudio: Integrated development for R [computer software]. Boston, MA: RStudio, PBC. Retrieved from http://www.rstudio.com/
Tenenhaus, M., Esposito Vinzi, V., Chatelin, Y.-M., & Lauro, C. (2005). PLS path modeling. Computational Statistics & Data Analysis, 48(1), 159–205.
Van Rossum, G., & Drake, F. L. (1995). Python reference manual. Amsterdam: Centrum voor Wiskunde en Informatica.
Wickham, H., & Grolemund, G. (2016). R for data science. Sebastopol, CA: O’Reilly Media. Retrieved from https://r4ds.had.co.nz/

Author information

Authors and Affiliations

Mitchell College of Business, University of South Alabama, Mobile, AL, USA
Joseph F. Hair Jr.
Broad College of Business, Michigan State University, East Lansing, MI, USA
G. Tomas M. Hult
Department of Management Science and Technology, Hamburg University of Technology, Hamburg, Germany
Christian M. Ringle
Otto-von-Guericke University Magdeburg, Magdeburg, Germany
Marko Sarstedt
Babeș-Bolyai University, Faculty of Economics and Business Administration, Cluj, Romania
Marko Sarstedt
Trinity Business School, Trinity College, Dublin, Ireland
Nicholas P. Danks
National Tsing Hua University, Hsinchu, Taiwan
Soumya Ray

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

About this chapter

Cite this chapter

Hair, J.F., Hult, G.T.M., Ringle, C.M., Sarstedt, M., Danks, N.P., Ray, S. (2021). Overview of R and RStudio. In: Partial Least Squares Structural Equation Modeling (PLS-SEM) Using R. Classroom Companion: Business. Springer, Cham. https://doi.org/10.1007/978-3-030-80519-7_2

DOI: https://doi.org/10.1007/978-3-030-80519-7_2
Published: 04 November 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-80518-0
Online ISBN: 978-3-030-80519-7
eBook Packages: Business and Management, Business and Management (R0)
