Cluster Analysis

Chen, Jeffrey C.; Rubin, Edward A.; Cornwall, Gary J.

doi:10.1007/978-3-030-71352-2_11

Part of the book series: Springer Series in the Data Sciences (SSDS)

Abstract

Precision anything is the domain of data science, since precision relies on identifying patterns or similarities within data. Precision medicine, for example, matches patients to custom-fit medical interventions based on each patient’s realized affliction or risk profile. Precision marketing matches individuals to information that will change behavior, like voting for a specific candidate or buying a particular brand of shoes. Presenting an advertisement for bathing suits does not make much sense for a consumer in Antarctica. Similarly, different people along the US political spectrum subscribe to different positions on gun ownership and reproductive rights, among other social issues. There is value in targeting, and more efficient targeting is better.

Notes

1. We illustrate the silhouette method in the following DIY section.
2. This was later revised to only Arlington, VA, due to local politics in New York.
3. For simplicity, we define online tech industries using NAICS codes 5182, 5112, 5179, 5415, 5417, and 454111, although we recognize this may exclude sub-industries that are rapidly growing in importance in tech.
4. Hierarchical clustering technically comprises divisive and agglomerative clustering. The former is a top-down approach, splitting a sample into smaller clusters until each observation is a singleton, reminiscent of decision tree learning. Agglomerative clustering is a bottom-up approach, grouping observations together. Both algorithms are greedy, meaning they make the locally optimal splitting or grouping decision in each iteration.
5. The BLS does not consider QCEW to be a time series, but it contains useful information if treated as a time series.
6. For ease of analysis, the authors have pre-processed the data. First, monthly records were aggregated into quarterly averages. Second, the data were seasonally adjusted (SA), meaning that regular cycles that recur each year have been removed from the data, leaving only trend and noise.
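
The greedy, bottom-up grouping described in note 4 can be illustrated with a small sketch. This is not the authors' implementation: it assumes one-dimensional points and single linkage (distance between clusters is the smallest gap between any two members), and the names `single_linkage_distance`, `agglomerate`, and `toy_points` are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def single_linkage_distance(a, b):
    """Smallest absolute gap between any point in cluster a and any point in b."""
    return min(abs(x - y) for x in a for y in b)


def agglomerate(points, n_clusters):
    """Greedy bottom-up clustering: start from singletons and repeatedly
    merge the two closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # Locally optimal step: choose the pair with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: single_linkage_distance(
                clusters[pair[0]], clusters[pair[1]]
            ),
        )
        merged = clusters[i] + clusters[j]
        # Drop the two merged clusters and append their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(c) for c in clusters]


# Hypothetical toy data, not drawn from the chapter.
toy_points = [1.0, 1.2, 5.0, 5.1, 9.0]
result = agglomerate(toy_points, 3)
```

Each pass commits to the closest pair without revisiting earlier merges, which is what makes the procedure greedy. A divisive variant would run in the opposite direction, starting from one cluster and splitting.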

Author information

Authors and Affiliations

Bennett Institute for Public Policy, University of Cambridge, Cambridge, UK
Jeffrey C. Chen
Department of Economics, University of Oregon, Eugene, OR, USA
Edward A. Rubin
Department of Commerce, Bureau of Economic Analysis, Suitland, MD, USA
Gary J. Cornwall

Corresponding author

Correspondence to Jeffrey C. Chen.

About this chapter

Cite this chapter

Chen, J.C., Rubin, E.A., Cornwall, G.J. (2021). Cluster Analysis. In: Data Science for Public Policy. Springer Series in the Data Sciences. Springer, Cham. https://doi.org/10.1007/978-3-030-71352-2_11

DOI: https://doi.org/10.1007/978-3-030-71352-2_11
Published: 01 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71351-5
Online ISBN: 978-3-030-71352-2
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)
