Cluster Analysis

Part of the Springer Series in the Data Sciences book series (SSDS)


Precision anything is the domain of data science, since it relies on identifying patterns or similarities within data. Precision medicine, for example, matches patients to custom-fit medical interventions based on each patient’s realized affliction or risk profile. Precision marketing matches individuals to information that will change their behavior, such as voting for a specific candidate or buying a particular brand of shoes. Presenting an advertisement for bathing suits does not make much sense for a consumer in Antarctica. Similarly, different people along the US political spectrum subscribe to different positions on gun ownership, reproductive rights, and other social issues. There is value in targeting, and more efficient targeting is better.

  • DOI: 10.1007/978-3-030-71352-2_11
  • Chapter length: 19 pages
  • ISBN: 978-3-030-71352-2

  1. We illustrate the silhouette method in the following DIY section.

  2. This was later revised to Arlington, VA, alone because of local politics in New York.

  3. For simplicity, we define online tech industries using NAICS codes 5182, 5112, 5179, 5415, 5417, and 454111, although we recognize this may exclude sub-industries that are rapidly growing in importance in tech.

  4. Hierarchical clustering technically comprises divisive and agglomerative clustering. The former is a top-down approach, splitting a sample into smaller clusters until each observation is a singleton, reminiscent of decision tree learning. Agglomerative clustering is a bottom-up approach, repeatedly merging observations into larger clusters. Both algorithms are greedy, meaning they make the locally optimal splitting or grouping decision in each iteration.

  5. The BLS does not consider QCEW to be a time series, but it contains useful information if treated as a time series.

  6. For ease of analysis, the authors have pre-processed the data. First, monthly records were aggregated into quarterly averages. Second, the data were seasonally adjusted (SA), meaning that regular cycles that recur each year have been removed, leaving only trend and noise.
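
The bottom-up, greedy procedure described in note 4 can be sketched in a few lines. This is only an illustration, not the chapter's implementation: it assumes Euclidean distance and single linkage (the note specifies neither), and the function names `single_linkage` and `agglomerate`, as well as the toy points, are hypothetical.

```python
from itertools import combinations
from math import dist


def single_linkage(a, b):
    """Distance between two clusters: the closest pair of points across them."""
    return min(dist(p, q) for p in a for q in b)


def agglomerate(points, n_clusters):
    """Greedy bottom-up clustering (assumed: Euclidean distance, single linkage)."""
    # Every observation starts as its own singleton cluster.
    clusters = [[tuple(p)] for p in points]
    merges = []  # history of (cluster_a, cluster_b, distance), one entry per merge
    while len(clusters) > n_clusters:
        # Greedy step: choose the pair of clusters that is closest right now,
        # without looking ahead to later merges.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        d = single_linkage(clusters[i], clusters[j])
        merges.append((list(clusters[i]), list(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i always holds, so index i is unaffected
    return clusters, merges


# Toy data (hypothetical): two tight pairs and one distant outlier.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]
clusters, merges = agglomerate(points, n_clusters=2)
```

Calling `agglomerate(points, n_clusters=1)` instead records the full merge history, which is the information a dendrogram displays. The divisive variant would run in the opposite direction, starting from one cluster and splitting.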

Author information

Corresponding author

Correspondence to Jeffrey C. Chen.

Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Chen, J.C., Rubin, E.A., Cornwall, G.J. (2021). Cluster Analysis. In: Data Science for Public Policy. Springer Series in the Data Sciences. Springer, Cham.