  • Book
  • © 2016

Data Wrangling with R

  • Presents techniques that allow users to spend less time obtaining, cleaning, manipulating, and preprocessing data and more time visualizing, analyzing, and presenting data via a step-by-step tutorial approach

  • Includes a wide range of programming activities, from understanding basic data objects in R to writing functions, applying loops, and webscraping

  • Beneficial to all levels of R programmers: Beginner R programmers will gain a basic understanding of the functionality of R along with learning how to work with data using R, while intermediate and advanced R programmers will find the early chapters reiterating established knowledge and will learn newer and more efficient data wrangling techniques in the mid and later chapters

  • Covers the most recent data wrangling packages: dplyr, tidyr, httr, stringr, lubridate, readr, rvest, magrittr, xlsx, readxl, and others

  • Provides code examples and chapter exercises

  • Includes supplementary material: sn.pub/extras

Part of the book series: Use R! (USE R)

Buying options

eBook USD 64.99
Price excludes VAT (USA)
  • ISBN: 978-3-319-45599-0
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
Softcover Book USD 84.99
Price excludes VAT (USA)

Table of contents (22 chapters)

  1. Front Matter

    Pages i-xii
  2. Introduction

    1. Front Matter

      Pages 1-2
    2. The Role of Data Wrangling

      • Bradley C. Boehmke
      Pages 3-5
    3. Introduction to R

      • Bradley C. Boehmke
      Pages 7-9
    4. The Basics

      • Bradley C. Boehmke
      Pages 11-27
  3. Working with Different Types of Data in R

    1. Front Matter

      Pages 29-29
    2. Dealing with Numbers

      • Bradley C. Boehmke
      Pages 31-40
    3. Dealing with Character Strings

      • Bradley C. Boehmke
      Pages 41-54
    4. Dealing with Regular Expressions

      • Bradley C. Boehmke
      Pages 55-66
    5. Dealing with Factors

      • Bradley C. Boehmke
      Pages 67-69
    6. Dealing with Dates

      • Bradley C. Boehmke
      Pages 71-78
  4. Managing Data Structures in R

    1. Front Matter

      Pages 79-79
    2. Data Structure Basics

      • Bradley C. Boehmke
      Pages 81-83
    3. Managing Vectors

      • Bradley C. Boehmke
      Pages 85-90
    4. Managing Lists

      • Bradley C. Boehmke
      Pages 91-97
    5. Managing Matrices

      • Bradley C. Boehmke
      Pages 99-104
    6. Managing Data Frames

      • Bradley C. Boehmke
      Pages 105-112
    7. Dealing with Missing Values

      • Bradley C. Boehmke
      Pages 113-116
  5. Importing, Scraping, and Exporting Data with R

    1. Front Matter

      Pages 117-117
    2. Importing Data

      • Bradley C. Boehmke
      Pages 119-128

About this book

This guide for practicing statisticians, data scientists, and R users and programmers teaches the essentials of data preprocessing: using the R programming language to turn noisy data into usable information quickly and easily. Data wrangling, also commonly referred to as data munging, transformation, manipulation, or janitor work, can be a painstakingly laborious process. Roughly 80% of the time in a data analysis is spent cleaning and preparing data. Because this work is a prerequisite to the rest of the workflow (visualization, analysis, reporting), it is essential to become fluent and efficient in data wrangling techniques.

This book guides the user through the data wrangling process via a step-by-step tutorial approach and provides a solid foundation for working with data in R. The author's goal is to teach the user how to wrangle data easily in order to spend more time understanding its content. By the end of the book, the user will have learned:

  • How to work with different types of data such as numerics, characters, regular expressions, factors, and dates
  • The differences among data structures and how to create, add components to, and subset each one
  • How to acquire and parse data from sources that were previously inaccessible
  • How to develop functions and use loop control structures to reduce code redundancy
  • How to use pipe operators to simplify code and make it more readable
  • How to reshape the layout of data and manipulate, summarize, and join data sets

In essence, the user will have the data wrangling toolbox required for modern-day data analysis.

Brad Boehmke, Ph.D., is an Operations Research Analyst at Headquarters Air Force Materiel Command, Studies and Analyses Division. He is also an Assistant Professor in the Operational Sciences Department at the Air Force Institute of Technology. Dr. Boehmke's research interests include cost analysis, economic modeling, decision analysis, and the development of applied modeling applications in the R statistical language.

Keywords

  • R
  • data wrangling
  • data structures
  • dplyr
  • tidyr
  • importing
  • scraping
  • exporting
  • coding
  • data frames
  • data matrix
  • data analysis
  • programming
  • lubridate
  • stringr
  • PCRE
  • fuzzy string
  • curl/rvest
  • xml2
  • plyr

Authors and Affiliations

  • Air Force Institute of Technology, Dayton, USA

    Bradley C. Boehmke, Ph.D.
