Learning Objectives

After reading this chapter, you will understand:

  1. The syntax and formatting for code used throughout this textbook.
  2. Why computational statistical languages are so powerful and useful.
  3. The layout and interface of R and RStudio.
  4. How to manage R scripts and projects.
  5. How to install and use packages.
  6. How to seek help when bugs or errors are encountered.

2.1 Introduction

This chapter introduces the two software packages that will be used throughout this textbook. Software packages are a series of software functions and features with a similar purpose bundled into a single set. First, we introduce the R statistical computing language (R Core Team, 2021), which is the software language we will use to import and clean data as well as create and analyze PLS path models. We will then introduce the RStudio (RStudio Team, 2021) application, which is an integrated development environment that enables you to easily and productively conduct computational analyses using the R language. We will explain how to download and install the software required, how to interact with the software, and how to store your data and code.

We then offer a basic introduction to writing analytic scripts in R. This textbook will not serve as a comprehensive resource for learning R, so we will share further resources for learning this programming language and helpful documentation on the Internet. Additionally, we will provide examples of R code throughout this textbook, so we start by looking at the syntax and formatting that we will use to distinguish code from regular text.

2.2 Explaining Our Syntax

Throughout this textbook, we will need to discuss various elements of code when explaining how to perform analytic operations using R. For the sake of clarity, we will use distinct formatting for code, whether it appears in code blocks or is embedded in the text. Code in the R language will be formatted as follows: vector <- c(1, 2, 3, 4, 5). To distinguish code embedded in the text from regular text, the code will be bolded; to distinguish arguments from regular code, the arguments will be italicized (weights =, data =). Furthermore, construct and variable names in the text will be italicized to distinguish them (e.g., QUAL and qual_1). We will use a similar format when code is used in a larger block. ◘ Table 2.1 provides a summary of the syntax and format of code that we use in this textbook. You may want to refer back to it when we show larger blocks of code.

Table 2.1 Syntax conventions used in this textbook (source: authors’ own table)

The code block below also includes comments that describe the purpose of the following line of code. Comments are not run by the programming language and only serve as communication to other users of the code about the purpose. Comments in the R language begin with a pound symbol (“#”), and we will display them in gray (again, see the code block below).

# Create a vector of integers
vector <- c(1, 2, 3, 4, 5)
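As a further illustration of these conventions, the following sketch calls a function with named arguments. The object name scores and the choice of mean() with its na.rm = argument are our own illustrative assumptions and are not taken from Table 2.1.

```r
# Create a numeric vector that contains one missing value
scores <- c(1, 2, 3, 4, NA)

# Call a function with named arguments:
# x = supplies the data, na.rm = TRUE drops missing values before averaging
average <- mean(x = scores, na.rm = TRUE)

# Print the result to the console
print(average)
```

Naming arguments explicitly (x =, na.rm =) makes the code longer but easier to read, which is why arguments are highlighted separately in the text.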

2.3 Computational Statistics Using Programming

Data analytics using computationally intensive methods is becoming an increasingly important, strategic capability for companies to transform the data collected during business activities into information that can assist effective decision-making and policy creation. Similarly, academic research is rapidly adopting computational methods, involving the implementation of analytic techniques for inferential analysis and machine learning into computer programs (Hair & Sarstedt, 2021). Thus, researchers who learn and adopt computational methods will have the advantage of being able to apply and adapt the latest techniques to their research, while also being competent and conversant with industry trends.

We expect that many quantitative researchers are already familiar with certain types of software to analyze data: spreadsheet software, such as Apache OpenOffice Calc (► https://www.openoffice.org/product/calc.html) or Microsoft Excel (► https://www.microsoft.com/microsoft-365/excel), and more graphical, menu-driven software like IBM SPSS (► https://www.ibm.com/products/spss-statistics) and Statistica (► https://www.statsoft.de/de/software/statistica). Spreadsheet software has long been of value to business researchers, since a familiar ledger or balance book metaphor is adopted that predates computers. Spreadsheets are advantageous for smaller datasets, since they make it easy for users to manipulate data in tabular form and obtain quick results in the same interface as their data. Graphical, menu-driven software has also become popular during recent decades, since it is easier to learn and process for many users and provides rich visualizations. Both of these advantages allow software like SPSS and Tableau (► https://www.tableau.com) to facilitate communication between stakeholders and conduct data exploration interactively in meetings.

Computational analytics using programming syntax has been present since the earliest days of computing, but has recently gained new popularity from the advent of big data analytics, artificial intelligence, and the larger data science movement. By programming in a syntactic language, such as R or Python (Van Rossum & Drake, 1995), analysts can apply complex methods that are not easy to parameterize with spreadsheet or graphical, menu-driven software. Computation offers analysts the ability to run simulations that test particular scenarios and create novel solutions and custom visualizations, which were not considered by others, or are rather specific to one’s own use case. Moreover, the code that analysts generate serves as a manifest – or recipe – of their workflow that can be shared with other analysts or even deployed into online products and platforms. Finally, having code allows others to test, repeat, or replicate analyses in perfect detail – steps that are vital to modern applications of the scientific process (Rigdon, Sarstedt, & Becker, 2020). It is not surprising, therefore, that computational methods have become an integral component of the data science revolution in both industry and academia.
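To give a flavor of what such a simulation might look like, the following is a minimal sketch that uses only functions bundled with R. The seed value, the sample size of 30, and the number of replications are arbitrary choices made for illustration.

```r
# Fix the random seed so that others can reproduce exactly the same results
set.seed(123)

# Draw 1,000 samples of size 30 from a standard normal distribution
# and store the mean of each sample
sample_means <- replicate(n = 1000, expr = mean(rnorm(n = 30)))

# Summarize the simulated sampling distribution of the mean
mean(sample_means)
sd(sample_means)
```

Because the seed and every step are written down in code, another analyst who runs the same script can repeat the analysis in full detail.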

2.4 Introducing R and RStudio

R is a free, open-source software, which enables users to write and execute code that analyzes data. Readers should note that the name “R” can refer to both the programming language and the primary software that runs code written in this language. However, unless otherwise specified, in this book, R refers to the language. Further, open source refers to the kind of software whose underlying code is made freely available and is generally open to suggested improvements or new features built by others. The open-source nature of the R software makes code written in the R language highly reproducible, shareable, testable, scalable, and deployable to larger automated applications. An ever-expanding community of R users supports, tests, documents, and provides add-on resources for each other.

R (R Core Team, 2021) is an alternative implementation of the earlier S programming language. R was first developed by Ross Ihaka and Robert Gentleman in 1991 (Hornik & Leisch, 2002). The language was developed over several years, became free and open source in 1995, and started to gain attention with the first stable release on the Comprehensive R Archive Network (CRAN; ► http://www.r-project.org) in February 2000. CRAN serves as a vetted repository where reliable add-on packages of R code libraries can be freely contributed to or downloaded by R users around the world (packages are discussed in more detail in ► Sect. 2.6). The SEMinR package for PLS-SEM (Ray, Danks, & Valdez, 2021) we use in this book is also available on CRAN.

The R language was designed with computational statistics in mind. In its simplest form, it can be run from your operating system’s command line or from the R console (◘ Fig. 2.1). However, we recommend using R from the convenience of an integrated development environment (IDE), such as RStudio. An IDE is a programming environment that offers tools such as project management, tabs for easily managing multiple script files, and additional developer tools. We discuss the layout of the RStudio IDE in more detail in the next section. Throughout this book, we will demonstrate the use of R from within the RStudio IDE.

Fig. 2.1 The R console. (Source: authors’ screenshot from R)

2.4.1 Installing R and RStudio

Before installing RStudio, the R software for executing code in the programming language must be installed on your operating system. The latest version of the R software for your operating system is available from the CRAN archive at the ► http://www.r-project.org website. Once you visit that website, click on the Download R link, select the mirror website closest to your location, and then choose the download file made for your operating system. Execute the download file and follow the instructions; the R software will then install on your computer.

Next, you will need to install RStudio from this website (► http://www.rstudio.com/). To do so, hover your mouse pointer over the Products menu and select RStudio from the dropdown menu. On the next page, click on the Download RStudio Desktop button, and once again click Download RStudio Desktop. The website will offer you the relevant version for your operating system. Execute the download file and follow the instructions. The RStudio IDE will then install on your computer. With both the R software and RStudio software installed on your computer, you can proceed to become familiar with the RStudio layout and interface.
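Once both programs are installed, one simple way to confirm that RStudio has found your R installation is to type a few commands into the console. The sketch below uses only functions bundled with R; the minimum version "3.5.0" is a hypothetical threshold chosen for illustration and is not a requirement stated in this book.

```r
# Show the version of R that is currently running
print(R.version.string)

# Compare the running version against a minimum version
# (the threshold "3.5.0" is an illustrative assumption)
meets_minimum <- getRversion() >= "3.5.0"
print(meets_minimum)
```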

2.4.2 Layout of RStudio

The RStudio desktop in its standard form comes with a layout of four primary windows: (1) In the upper-left corner is the source window; (2) in the upper right are the environment, history, connections, build, and Git windows; (3) in the bottom left are the console and terminal windows; and (4) in the bottom right are the files, plots, packages, help, and viewer windows (◘ Fig. 2.2). Note that the source window only shows when a script file is open or data have been loaded. We will discuss this step in ► Chap. 3. Some of these windows are only available when settings have been enabled. For example, the Git tab is only available when version control has been enabled, and the build tab is only available when a package is being built. ◘ Table 2.2 describes the various windows and their uses in more detail.

Fig. 2.2 The RStudio IDE desktop layout. (Source: authors’ screenshot from RStudio)

Table 2.2 Table of the RStudio IDE desktop tabs, layout, and purpose
Table 2.3 Excerpt of contents of the R documentation for read.csv()

2.5 Organizing Your Projects

Organizing projects is much like organizing your documents in regular folders on your computer. The only major difference is that you will need to remember where files are stored when loading files into or saving files out of the R environment. That is, you need to know the address of the file relative to the file you are editing. Often, users of R will create a catchall project (named workspace), in which they store their R script files, data files, and output files for multiple analyses or projects. This approach can quickly lead to chaos: projects become mixed, and crucial code and data files are overwritten. Instead, we recommend you create separate projects and organize them carefully, so that you keep the contents of each project separate and provide some order to your workflow.

To begin a new project in RStudio, click on the File dropdown menu, and select New Project… (◘ Fig. 2.3). The Create Project window will then open and guide you through creating a new project. When you create your first project, you need to set up a New Directory which stores all your project files. Next, click on New Project. In the dialog box that follows, specify a project name under Directory name and choose a folder in which the project files should be stored. Finally, click on Create Project.

Fig. 2.3 Creating a new project in RStudio. (Source: authors’ screenshot from RStudio)

An important feature of an R project is that the working directory, in which the project will be conducted, is specified. If at any time you wish to change the working directory, this can be done by clicking on the Session dropdown menu, selecting Set Working Directory, and then specifying the correct location (◘ Fig. 2.4).
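The working directory can also be inspected and changed from the console. The following is a minimal sketch using the base R functions getwd(), setwd(), and file.path(); the temporary folder and the file location data/survey.csv are hypothetical stand-ins for your own project folders.

```r
# Remember the current working directory so that we can return to it
original_dir <- getwd()

# Switch to another folder (a temporary folder is used here because it always exists)
setwd(tempdir())

# Confirm the change
print(getwd())

# Build a file path relative to the working directory
# (the folder and file names are hypothetical)
data_path <- file.path("data", "survey.csv")
print(data_path)

# Return to the original working directory
setwd(original_dir)
```

Paths built with file.path() are interpreted relative to the working directory, which is why it matters that the working directory points to your project folder.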

Fig. 2.4 Changing the working directory. (Source: authors’ screenshot from RStudio)

R project details are stored in an .Rproj file in the project directory. In addition, the environment containing any objects saved to memory is stored in the .RData file, and the history of commands run in the console is stored in an .Rhistory file. Thus, a snapshot is kept of your activity in the project, which is reloaded every time you reopen the project. Note, however, that the packages required to run your code need to be reloaded every time you reopen a project and are not stored in the snapshot.
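The idea of storing objects in a snapshot file can be mimicked by hand with the base R functions save() and load(). The sketch below is only an analogy to what RStudio does automatically; the object name respondents and the file name snapshot.RData are our own illustrative assumptions, and a temporary folder stands in for the project directory.

```r
# Create an object in the environment
respondents <- c(120, 95, 143)

# Save the object to a snapshot file (a temporary folder stands in for the project folder)
snapshot_file <- file.path(tempdir(), "snapshot.RData")
save(respondents, file = snapshot_file)

# Remove the object from memory, then restore it from the snapshot
rm(respondents)
load(snapshot_file)
print(respondents)
```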

2.6 Packages

R includes many preinstalled packages containing the standard functions and algorithms you will use in your statistical computations. Examples of such standard functions are mean() and sd() for calculating the mean and standard deviation of a vector, respectively, or lm() for generating linear regression models. While you should be able to fulfill many of your computational needs with the standard packages bundled in R, you might need to install further software libraries containing newer or more complicated algorithms. Such software libraries are bundled as packages that, when installed, add a new range of functions and operations. Examples of popular packages are dplyr, ggplot2, and, of course, the package used in this book, seminr.
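The standard functions mentioned above can be tried without installing anything. The following sketch applies them to mtcars, an example dataset bundled with R; the choice of dataset and variables is ours and serves only as an illustration.

```r
# Use the mtcars example dataset that ships with R
data(mtcars)

# Mean and standard deviation of fuel efficiency (miles per gallon)
print(mean(mtcars$mpg))
print(sd(mtcars$mpg))

# Linear regression of fuel efficiency on vehicle weight
fit <- lm(formula = mpg ~ wt, data = mtcars)

# Show the estimated intercept and slope
print(coef(fit))
```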

A majority of packages are hosted on CRAN. The packages hosted on CRAN have met certain criteria to qualify for being published in the CRAN archive, such as having documentation, being tested, and kept up to date with the latest versions of R. These packages can be installed from the command line or from the packages tab (◘ Fig. 2.1 and ◘ Table 2.2). Note that you will need internet access to install packages from CRAN. To install new packages, select the Packages tab in the lower right window of the RStudio IDE, click the Install button, set Install from to Repository (CRAN), and enter the package name in the Packages field: “seminr” (◘ Fig. 2.5). Next, click on Install.

Fig. 2.5 Installing packages from the RStudio IDE. (Source: authors’ screenshot from RStudio)

Packages can also be installed from the command line using the install.packages() function. In this case, we wish to install the swirl package, which teaches you R programming (see ► Sect. 2.7 for more details on the swirl package). We therefore set the pkgs argument equal to "swirl".

# Install the swirl package
install.packages(pkgs = "swirl")

Note that packages are installed to the local software library on your computer but are not automatically loaded into your R session. Once a package is installed, it is available for computation in R but has to be loaded using the library() function prior to use. Packages must be loaded in each new session (i.e., after closing and reopening R); until you do so, the package’s functions will not be available.

# Load the swirl package into the environment
library(swirl)
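Because installing and loading are separate steps, a common pattern is to check whether a package is already installed before attempting to install it. The sketch below wraps the base R function requireNamespace() in a small helper; the helper name is_installed is our own and is not part of R or of any package.

```r
# Helper (our own, hypothetical name) that reports whether a package is installed
is_installed <- function(pkg) {
  isTRUE(requireNamespace(pkg, quietly = TRUE))
}

# Install swirl only if it is missing, then load it
# (left commented out here because installation requires Internet access)
# if (!is_installed("swirl")) install.packages(pkgs = "swirl")
# library(swirl)

# The stats package is bundled with R, so it is always installed
print(is_installed("stats"))
```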

2.7 Writing R Scripts

Computational analyses are conducted by writing a series of instructions to the computer on how to import data, modify data, run algorithms for analyzing the data, and then report the results of those analyses. These instructions take the form of R scripts that are typically entered into a file, which contains all the scripts related to a single analysis or computation. These R script files have the suffix .R and are stored in your project directory.
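To make these four steps concrete, the following is a minimal sketch of what such a script file might contain. The file name analysis.R, the example data, and all object names are hypothetical; the script first writes a small data file to a temporary folder so that it can run on any computer.

```r
# analysis.R: a minimal, self-contained analysis script (hypothetical example)

# Preparation: create a small example data file so that the script runs anywhere
data_file <- file.path(tempdir(), "example.csv")
write.csv(data.frame(group = c("a", "a", "b", "b"),
                     score = c(4, 6, 7, 9)),
          file = data_file, row.names = FALSE)

# Step 1: import the data
survey <- read.csv(file = data_file)

# Step 2: modify the data by adding a mean-centered score
survey$score_centered <- survey$score - mean(survey$score)

# Step 3: analyze the data by computing the mean score per group
group_means <- aggregate(score ~ group, data = survey, FUN = mean)

# Step 4: report the results
print(group_means)
```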

To successfully conduct such analyses, you need to learn the form and function of the scripts that R can process. As indicated above, a key reason for using a free, open-source software, like R, is the community support and resources typically found for such software. A simple Internet search with keywords “R coding lesson” should provide hundreds of high-quality resources. We recommend swirl (► https://swirlstats.com/), which teaches you R programming by offering simple and useful lessons. This package helps the user become experienced at working with R’s command-based interface and can be downloaded and used from the R console command line.

# Begin learning with swirl
swirl()

In addition to online tutorials and code lessons, there are many free e-books describing both introductory and advanced usage of R and RStudio. A good archive for textbooks is available at the CRAN website (► https://www.r-project.org/other-docs.html). We highly recommend the book R for Data Science (Wickham & Grolemund, 2016). As we continue with this chapter and the textbook, we assume that you have studied the basics of using R and are comfortable with the language. We now turn our attention to overcoming the various challenges you might encounter when writing R scripts and using the SEMinR package.

2.8 How to Find Help in RStudio

Due to the complexity of a programming language – and the almost endless number of software libraries that can be installed adding to the functions and resources available to you – it can become difficult to keep track of how functions are called, what arguments they take, and what output they provide. Packages have a range of files that are designed to document and demonstrate the use of the functions they provide. These files take the form of R documentation, vignettes, and demonstration files. In this section, we discuss how to access information on using a function by inspecting these documents.

All packages submitted to CRAN are required to have sufficient documentation to describe the functions they add to your software library. For each function, there should be a matching R documentation file that can be accessed. R documentation describes the purpose, input, implementation, and output of a function and provides examples applying the syntax. The contents of an R document are described in ◘ Table 2.3. This documentation can be accessed in the help tab in the lower right window of the RStudio IDE. Help topics and functions can be searched for in the search field of the help window or from the command line in the console window using the ? operator. For example, we can search for help on the read.csv() function by typing the following into the console window in RStudio:

# Searching for help using the ? operator
?read.csv
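Besides opening the full documentation page, the arguments of a function can be inspected directly in the console. The sketch below uses the base R functions args() and formals(); the object name argument_names is our own.

```r
# help("read.csv") is equivalent to ?read.csv and opens the same documentation page

# Display the arguments of a function, with their defaults, in the console
print(args(read.csv))

# Extract the argument names programmatically
argument_names <- names(formals(read.csv))
print(argument_names)
```

This quick check is often enough to recall an argument name, while the documentation page explains what each argument does.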

In ◘ Fig. 2.6, we can see an excerpt of the contents of the R documentation for read.csv(). This information will be displayed in the help tab in the lower right window of the RStudio IDE and provides us with details on the purpose, arguments, and usage of the function and a demonstration example. When encountering a new function or an error, the R documentation is the first place to look. For a full list of available R documentation topics for a package, click on the Packages tab in the bottom right window and select the package name (highlighted in blue). The Help tab will then open with the full list of documentation available for that package.

Fig. 2.6 R documentation for the read.csv() function. (Source: authors’ screenshot from RStudio)

Another very important document to consult for help using a package or function is the vignette. Vignettes are designed as an all-purpose user’s guide for the package: they describe the problem that the package seeks to solve and how it is used. This document usually describes the functioning of the package in detail and provides examples and demonstrations of the problems and solutions. You can access a list of installed vignettes by calling the vignette() function. This will output a list of available vignettes to an R vignette tab in the top left window of RStudio. You can then run vignette("SEMinR") to access a particular vignette, in this case the vignette for the SEMinR package (◘ Fig. 2.7).

# Check all vignettes available in R
vignette()

# Load the SEMinR vignette
vignette("SEMinR")
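The list of vignettes can also be restricted to a single package by using the package argument of vignette(). The sketch below uses the grid package only because it is bundled with R; once SEMinR is installed, the same call should work with "seminr" instead. The object name grid_vignettes is our own.

```r
# List the vignettes of one package instead of all installed packages
# (grid is bundled with R; replace it with "seminr" once that package is installed)
grid_vignettes <- vignette(package = "grid")

# The result holds one row per vignette, including its name (Item) and title
print(grid_vignettes$results[, c("Item", "Title")])
```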

Fig. 2.7 The SEMinR vignette. (Source: authors’ screenshot from RStudio)

Another source of help is the demonstration code that comes bundled with most R packages. These demonstration files typically include an example dataset and model to demonstrate the purpose of the package’s functions. To check all available demonstration files, use the demo() function. For the specific demonstration of the European Customer Satisfaction Index (ECSI) model (Eklöf & Westlund, 2002) in the SEMinR package, as originally presented by Tenenhaus, Esposito Vinzi, Chatelin, and Lauro (2005), use demo("seminr-pls-ecsi").

# Check all demos available in R
demo()

# Load the SEMinR ECSI demo
demo("seminr-pls-ecsi")

A final invaluable source of help can be found by accessing the greater R community on platforms such as Stack Overflow (► https://stackoverflow.com/). These ask-and-answer forums put you in touch with seasoned veterans who can provide useful tips and other options for executing all your favorite R packages and functions. Members of these communities typically respond quickly and can provide excellent advice and solutions (◘ Fig. 2.8).

Fig. 2.8 Finding help online using Stack Overflow. (Source: authors’ screenshot from ► https://stackoverflow.com/)

Summary

In this chapter, we introduced the R statistical programming language and its popular development environment, RStudio. You should now be familiar with the layout and functionality of RStudio and with creating and managing projects and R script files. If you encounter a bug or an unfamiliar function, you should now have the requisite tools and knowledge for seeking out appropriate help. We strongly recommend the careful study of an introductory course on the R language. We also recommend the swirl package for learning R and have pointed to supplementary resources and useful material.

Exercise

In this chapter, we recommend the use of the swirl package to learn basic coding concepts and become familiar with the popular functions in R (see ► Sects. 2.6 and 2.7 on installing and loading swirl). Please complete the following lessons in the swirl package:

  1. Basic building blocks
  2. Workspaces and files
  3. Sequences of numbers
  4. Vectors
  5. Missing values
  6. Subsetting vectors
  7. Matrices and data frames
  8. Logic
  9. Functions