Exploratory Data Analysis

  • Chapter
Data Science for Public Policy

Abstract

Smart phones are a modern wonder that allow society to stay connected and to enhance interactive experiences with the surrounding world. Each interaction between a user and a phone depends on a sophisticated array of sensors.

Notes

  1. There are many graphs presented in this chapter. To follow along, download the DIYs repository from GitHub (https://github.com/DataScienceForPublicPolicy/diys). The R Markdown file for this section is diy-ch06-visuals.Rmd.

  2. The designation of a region depends on the classification system. The U.S. Census Bureau, for example, has regional divisions and sub-divisions.

  3. With ggplot2, it is also possible to supply the unaggregated data using stat_count.

  4. The NNBS dataset is a General Household Survey produced by the Nigerian National Bureau of Statistics in collaboration with the World Bank to measure (Nigeria National Bureau of Statistics 2019). The example is drawn from survey questions relating to banking access in file “sect4a1_plantingw4”. The EU LFS is the European Union’s Labour Force Survey, which is conducted by national statistical agencies of EU member countries and maintained by Eurostat (Eurostat 2020). The ACS dataset is the American Community Survey, administered by the U.S. Census Bureau (U.S. Census Bureau 2018a). The NYC 311 SR dataset contains complaints and service requests made to the City of New York (NYC Department of Information Technology and Telecommunication 2020).

  5. Inf values are not truly missing values, but they can prove problematic. We include these values for awareness.

  6. Note that this applies not only to the logical values from missing value functions but also to any vector with NA values.

  7. Many functions have the ability to ignore NA values. When in doubt, check the Help section for documentation.

  8. Consider experimenting with the pct_miss parameter to understand the trade-offs.

  9. Some series follow a multiplicative formulation. For simplicity, we focus on the additive case.

  10. There are challenges with seasonal adjustment, however. The process of decomposing a time series can be subjective and requires analyst judgment. There is no universal definition of what truly constitutes trend or seasonality. Ultimately, whether a series is “well-adjusted” depends on trust in the process.

  11. All variables should be numeric values.

  12. Economic numbers are published as vintages, meaning that a given quarter’s data will be revised each time a new release is made available. The data is “real-time” because the Philadelphia Fed archives the data based on each vintage so that the history of an estimate can be traced.

  13. We will revisit hierarchical clustering in Chapter 11.
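
Note 9 contrasts additive and multiplicative formulations of a time series. As a sketch only, the two forms can be written as below. The symbols are assumptions for illustration and not the chapter's own notation: $y_t$ is the observed series, $T_t$ the trend, $S_t$ the seasonal component, and $R_t$ the remainder. When all components are strictly positive, taking logarithms turns the multiplicative form into an additive one, which is one reason the additive case is a convenient default.

```latex
\begin{align}
  \text{Additive:} \quad & y_t = T_t + S_t + R_t \\
  \text{Multiplicative:} \quad & y_t = T_t \times S_t \times R_t \\
  \text{Log transform:} \quad & \log y_t = \log T_t + \log S_t + \log R_t
\end{align}
```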

Author information

Corresponding author

Correspondence to Jeffrey C. Chen.

Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Chen, J.C., Rubin, E.A., Cornwall, G.J. (2021). Exploratory Data Analysis. In: Data Science for Public Policy. Springer Series in the Data Sciences. Springer, Cham. https://doi.org/10.1007/978-3-030-71352-2_6
