Cluster Analysis

Data Science for Public Policy

Abstract

Precision anything is the domain of data science, since it relies on identifying patterns or similarities within data. Precision medicine, for example, matches patients to custom-fit medical interventions based on each patient's realized affliction or risk profile. Precision marketing matches individuals to information that will change behavior, like voting for a specific candidate or buying a particular brand of shoes. Presenting an advertisement for bathing suits does not make much sense for a consumer in Antarctica. Similarly, people along the US political spectrum subscribe to different positions on gun ownership and reproductive rights, among other social issues. There is value in targeting, and more efficient targeting is better.

Notes

  1. We illustrate the silhouette method in the following DIY section.

  2. This was later revised to Arlington, VA alone, owing to local politics in New York.

  3. For simplicity, we define online tech industries using NAICS codes 5182, 5112, 5179, 5415, 5417, and 454111, although we recognize this may exclude sub-industries that are rapidly growing in importance in tech.

  4. Hierarchical clustering technically comprises divisive and agglomerative clustering. The former is a top-down approach, splitting a sample into smaller clusters until each observation is a singleton, reminiscent of decision tree learning. Agglomerative clustering is a bottom-up approach, grouping observations together. Both algorithms are greedy, meaning they make the locally optimal splitting or grouping decision in each iteration.

  5. The BLS does not consider QCEW to be a time series, but it contains useful information if treated as a time series.

  6. For ease of analysis, the authors have pre-processed the data. First, monthly records were aggregated into quarterly averages. Second, the data were seasonally adjusted (SA), meaning that regular cycles that recur from year to year have been removed from the data, leaving only trend and noise.
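
The greedy, bottom-up behavior of agglomerative clustering described in the note on hierarchical clustering can be sketched in a few lines of Python. This is a minimal illustration and not the chapter's own implementation: the function names `single_linkage` and `agglomerate`, the choice of single linkage as the merge criterion, the one-dimensional toy values, and the stopping rule (a target number of clusters `k`) are all assumptions made for the example.

```python
from itertools import combinations


def single_linkage(a, b, points):
    """Distance between two clusters: the closest pair of members (assumed linkage)."""
    return min(abs(points[i] - points[j]) for i in a for j in b)


def agglomerate(points, k):
    """Merge the two closest clusters repeatedly until only k clusters remain."""
    # Bottom-up start: every observation is its own singleton cluster.
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        # Greedy step: choose the pair that is closest right now,
        # without considering how this affects later merges.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]], points),
        )
        # combinations() guarantees a < b, so deleting b leaves index a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical one-dimensional observations, for illustration only.
toy_points = [1.0, 1.2, 5.0, 5.1, 9.0]
toy_clusters = agglomerate(toy_points, 3)
print(toy_clusters)
```

Each returned cluster is a list of positions in the input. A divisive algorithm would run in the opposite direction, starting from one cluster containing every observation and splitting it.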

Author information

Correspondence to Jeffrey C. Chen.

Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Chen, J.C., Rubin, E.A., Cornwall, G.J. (2021). Cluster Analysis. In: Data Science for Public Policy. Springer Series in the Data Sciences. Springer, Cham. https://doi.org/10.1007/978-3-030-71352-2_11
