Data Wrangling with R

  • Bradley C. Boehmke, Ph.D.

Part of the Use R! book series (USE R)

Table of contents

  1. Front Matter
  2. Introduction
  3. Working with Different Types of Data in R
  4. Managing Data Structures in R
  5. Importing, Scraping, and Exporting Data with R
  6. Creating Efficient and Readable Code in R
  7. Shaping and Transforming Your Data with R
  8. Back Matter

About this book

Introduction

This guide for practicing statisticians, data scientists, and R users and programmers teaches the essentials of data preprocessing: using the R programming language to quickly and easily turn noisy data into usable information. Data wrangling, also commonly called data munging, transformation, manipulation, or janitor work, can be a painstakingly laborious process. Roughly 80% of the time in a data analysis is spent cleaning and preparing data. Because this work is a prerequisite to the rest of the data analysis workflow (visualization, analysis, reporting), it is essential to become fluent and efficient in data wrangling techniques.

This book guides the user through the data wrangling process with a step-by-step tutorial approach and provides a solid foundation for working with data in R. The author's goal is to teach the user how to wrangle data easily so that more time can be spent understanding its content. By the end of the book, the user will have learned:

  • How to work with different types of data such as numerics, characters, regular expressions, factors, and dates
  • The differences among data structures and how to create, add components to, and subset each one
  • How to acquire and parse data from previously inaccessible locations
  • How to develop functions and use loop control structures to reduce code redundancy
  • How to use pipe operators to simplify code and make it more readable
  • How to reshape the layout of data and manipulate, summarize, and join data sets

In essence, the user will have the data wrangling toolbox required for modern-day data analysis.

Brad Boehmke, Ph.D., is an Operations Research Analyst at Headquarters Air Force Materiel Command, Studies and Analyses Division. He is also Assistant Professor in the Operational Sciences Department at the Air Force Institute of Technology. Dr. Boehmke's research interests are in the areas of cost analysis, economic modeling, decision analysis, and developing applied modeling applications through the R statistical language.

Keywords

R, data wrangling, data structures, dplyr, tidyr, importing, scraping, exporting, coding, data frames, data matrix, data analysis, programming, lubridate, stringr, PCRE, fuzzy string, curl/rvest, xml2, plyr

Authors and affiliations

  • Bradley C. Boehmke, Ph.D. (1)
  1. Air Force Institute of Technology, Dayton, USA

Bibliographic information

  • DOI https://doi.org/10.1007/978-3-319-45599-0
  • Copyright Information Springer International Publishing Switzerland 2016
  • Publisher Name Springer, Cham
  • eBook Packages Mathematics and Statistics
  • Print ISBN 978-3-319-45598-3
  • Online ISBN 978-3-319-45599-0
  • Series Print ISSN 2197-5736
  • Series Online ISSN 2197-5744