Abstract
Smartphones are a modern wonder that allow society to stay connected and enrich people's interactions with the world around them. Each interaction between a user and a phone depends on a sophisticated array of sensors.
Notes
1. There are many graphs presented in this chapter. To follow along, download the DIYs repository from GitHub (https://github.com/DataScienceForPublicPolicy/diys). The R Markdown file for this section is diy-ch06-visuals.Rmd.
2. The designation of a region depends on the classification system. The U.S. Census Bureau, for example, has regional divisions and sub-divisions.
3. With ggplot2, it is also possible to supply the unaggregated data using stat_count.
4. The NNBS dataset is a General Household Survey produced by the Nigerian National Bureau of Statistics in collaboration with the World Bank to measure (Nigeria National Bureau of Statistics 2019). The example is drawn from survey questions relating to banking access in the file “sect4a1_plantingw4”. The EU LFS is the European Union’s Labour Force Survey, which is conducted by the national statistical agencies of EU member countries and maintained by Eurostat (Eurostat 2020). The ACS dataset is the American Community Survey, administered by the U.S. Census Bureau (U.S. Census Bureau 2018a). The NYC 311 SR dataset contains complaints and service requests made to the City of New York (NYC Department of Information Technology and Telecommunication 2020).
5. Inf values are not truly missing values, but they can prove problematic. We include these values for awareness.
6. Note that this applies not only to the logical values from missing value functions but also to any vector with NA values.
7. Many functions have the ability to ignore NA values. When in doubt, check the Help section for documentation.
8. Consider experimenting with the pct_miss parameter to understand the trade-offs.
9. Some series follow a multiplicative formulation. For simplicity, we focus on the additive case.
10. There are challenges with seasonal adjustment, however. The process of decomposing a time series can be subjective and requires analyst judgment. There is no universal definition of what truly constitutes trend or seasonality. Ultimately, whether a series is “well-adjusted” depends on trust in the process.
11. All variables should be numeric values.
12. Economic numbers are published as vintages, meaning that a given quarter’s data will be revised each time a new release is made available. The data is “real-time” because the Philadelphia Fed archives the data by vintage so that the history of an estimate can be traced.
13. We will revisit hierarchical clustering in Chapter 11.
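Note 9 contrasts additive and multiplicative formulations of a time series. As a sketch in standard decomposition notation (the symbols $y_t$ for the observed series and $T_t$, $S_t$, $R_t$ for the trend, seasonal, and remainder components are assumed here for illustration and are not taken from the chapter), the two forms can be written as:

```latex
% Additive formulation: components sum to the observed series
y_t = T_t + S_t + R_t

% Multiplicative formulation: components scale one another
y_t = T_t \times S_t \times R_t

% For a strictly positive series, taking logs turns the
% multiplicative form into an additive one
\log y_t = \log T_t + \log S_t + \log R_t
```

The last line suggests why focusing on the additive case loses little generality: a strictly positive multiplicative series can usually be log-transformed and then handled additively.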
Copyright information
© 2021 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Chen, J.C., Rubin, E.A., Cornwall, G.J. (2021). Exploratory Data Analysis. In: Data Science for Public Policy. Springer Series in the Data Sciences. Springer, Cham. https://doi.org/10.1007/978-3-030-71352-2_6
DOI: https://doi.org/10.1007/978-3-030-71352-2_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71351-5
Online ISBN: 978-3-030-71352-2
eBook Packages: Mathematics and Statistics (R0)