Exploratory Data Analysis

Part of the Springer Series in the Data Sciences book series (SSDS)

Abstract

Smartphones are a modern wonder that allows society to stay connected and enhances interactive experiences with the surrounding world. Each interaction between a user and a phone depends on a sophisticated array of sensors.

  • DOI: 10.1007/978-3-030-71352-2_6
  • Chapter length: 30 pages
  • ISBN: 978-3-030-71352-2

Notes

  1. There are many graphs presented in this chapter. To follow along, download the DIYs repository from GitHub (https://github.com/DataScienceForPublicPolicy/diys). The R Markdown file for this section is diy-ch06-visuals.Rmd.

  2. The designation of a region depends on the classification system. The U.S. Census Bureau, for example, has regional divisions and sub-divisions.

  3. With ggplot2, it is also possible to supply the unaggregated data using stat_count.

  4. The NNBS dataset is a General Household Survey produced by the Nigerian National Bureau of Statistics in collaboration with the World Bank to measure (Nigeria National Bureau of Statistics 2019). The example is drawn from survey questions relating to banking access in file “sect4a1_plantingw4”. The EU LFS is the European Union’s Labour Force Survey, which is conducted by national statistical agencies of EU member countries and maintained by Eurostat (Eurostat 2020). The ACS dataset is the American Community Survey, administered by the U.S. Census Bureau (U.S. Census Bureau 2018a). The NYC 311 SR dataset contains complaints and service requests made to the City of New York (NYC Department of Information Technology and Telecommunication 2020).

  5. Inf values are not truly missing values, but they can be problematic. We include these values for awareness.

  6. Note that this applies not only to the logical values from missing value functions but also to any vector with NA values.

  7. Many functions have the ability to ignore NA values. When in doubt, check the Help section for documentation.

  8. Consider experimenting with the pct_miss parameter to understand the trade-offs.

  9. Some series follow a multiplicative formulation. For simplicity, we focus on the additive case.

  10. There are challenges with seasonal adjustment, however. The process of decomposing a time series can be subjective and requires analyst judgment. There is no universal definition of what truly constitutes trend or seasonality. Ultimately, whether a series is “well-adjusted” depends on trust in the process.

  11. All variables should be numeric values.

  12. Economic numbers are published as vintages, meaning that a given quarter’s data will be revised each time a new release is made available. The data is “real-time” because the Philadelphia Fed archives the data by vintage, so the history of an estimate can be traced.

  13. We will revisit hierarchical clustering in Chapter 11.

Author information

Corresponding author

Correspondence to Jeffrey C. Chen.

Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Chen, J.C., Rubin, E.A., Cornwall, G.J. (2021). Exploratory Data Analysis. In: Data Science for Public Policy. Springer Series in the Data Sciences. Springer, Cham. https://doi.org/10.1007/978-3-030-71352-2_6