Local Search for K-medians and Facility Location

Keywords and Synonyms

k-Medians; k-Means; k-Medoids; Facility location; Point location; Warehouse location; Clustering

Problem Definition

Clustering is a form of unsupervised learning, where the goal is to “learn” useful patterns in a data set \( { \mathcal{D} } \) of size n. It can also be thought of as a data compression scheme where a large data set is represented using a smaller collection of “representatives”. Such a scheme is characterized by specifying the following:

  1. A distance metric \( { \mathbf{d} } \) between items in the data set. This metric should satisfy the triangle inequality: \( { \mathbf{d}(i,j) \le \mathbf{d}(j,k) + \mathbf{d}(k,i) } \) for any three items \( { i,j,k \in \mathcal{D} } \). In addition, \( { \mathbf{d}(i,j) = \mathbf{d}(j,i) } \) for all \( { i,j \in \mathcal{D} } \) and \( { \mathbf{d}(i,i) = 0 } \). Intuitively, if the distance between two items is small