# An Experimental Study of the *k*-MXT Algorithm with Applications to Clustering Geo-Tagged Data

## Abstract

We consider a *graph fragmentation process* which can be described as follows. Each vertex *v* selects the *k* adjacent vertices with which it has the largest number of common neighbours. For each selected neighbour *u*, we retain the edge (*v*, *u*) to form a subgraph *S* of the input graph. The objects of interest are the components of *S*, the *k*-Max-Triangle-Neighbour (*k*-MXT) subgraph, and the vertex clusters they produce in the original graph.
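The process described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `k_mxt_components`, the adjacency-dictionary input, the tie-breaking by vertex label, and the choice to treat retained edges as undirected when extracting components are all assumptions that the abstract does not specify.

```python
from collections import defaultdict


def k_mxt_components(adj, k):
    """Return the vertex sets of the components of a k-MXT-style subgraph S.

    adj: dict mapping each vertex to the set of its neighbours (undirected graph).
    k:   number of neighbours each vertex selects.
    """
    kept = defaultdict(set)  # adjacency of the retained subgraph S
    for v, nbrs in adj.items():
        # Rank neighbours by number of common neighbours with v, largest first.
        # Ties are broken by vertex label (an assumption, for determinism).
        ranked = sorted(nbrs, key=lambda u: (-len(adj[v] & adj[u]), u))
        for u in ranked[:k]:
            # Retain edge (v, u); stored symmetrically (assumption: undirected S).
            kept[v].add(u)
            kept[u].add(v)

    # Depth-first search to collect the connected components of S.
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        component, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for y in kept[x]:
                if y not in seen:
                    seen.add(y)
                    component.add(y)
                    stack.append(y)
        components.append(component)
    return components


# Two triangles joined by a single bridge edge (2, 3).
example = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
clusters = k_mxt_components(example, k=2)
```

In this toy input the bridge endpoints share no common neighbours, so neither endpoint selects the bridge when *k* = 2, and the two triangles separate into their own components.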

We study the application of this process to clustering in the planted partition model, and on the geometric disk graph formed from geo-tagged photographic data downloaded from Flickr.

In the planted partition model, there are \(\ell\) partitions, or subgraphs, which are densely connected internally but more sparsely connected to one another. The objective is to recover these hidden partitions. We study the case of the planted partition model based on the random graph \(G_{n,p}\) with additional edge probability *q* within the partitions. Theoretical and experimental results show that the 2-MXT algorithm can recover the partitions for any constant \(q/p>0\), provided the density of triangles is high enough.
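To make the model concrete, the following is a minimal sketch of a generator for such a graph. It rests on assumptions the abstract does not state: blocks are taken to be contiguous and of (near) equal size, and "additional edge probability *q*" is read as an independent second coin flip for pairs inside the same block, on top of the base \(G_{n,p}\) coin. The function name `planted_partition` and its parameters are hypothetical.

```python
import random


def planted_partition(n, ell, p, q, seed=0):
    """Sample a planted partition graph on n vertices with ell blocks.

    Every pair is joined with probability p (the base G_{n,p} graph).
    Pairs inside the same block get an extra, independent chance q
    (one reading of "additional edge probability"; an assumption).
    Returns (adj, block): adjacency sets and the block label per vertex.
    """
    rng = random.Random(seed)
    # Assumption: contiguous blocks of (near) equal size.
    block = [v * ell // n for v in range(n)]
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            present = rng.random() < p
            if block[u] == block[v] and rng.random() < q:
                present = True
            if present:
                adj[u].add(v)
                adj[v].add(u)
    return adj, block


# 60 vertices in 3 hidden blocks, sparse background, denser interiors.
adj, block = planted_partition(n=60, ell=3, p=0.05, q=0.5, seed=1)
```

The returned `block` list is the ground truth against which a recovered clustering can be compared.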

We apply the *k*-MXT algorithm experimentally to the problem of clustering geographical data, using London as an example. Given a dataset consisting of geographical coordinates extracted from photographs, we construct a disk graph by connecting two points if and only if their distance is at most *d*. Our experimental results show that the *k*-MXT algorithm is able to produce clusters which are comparable to those produced by popular clustering algorithms such as DBSCAN (see e.g. Fig. 5).
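The disk graph construction can be illustrated with a brute-force sketch. Several choices here are assumptions rather than details from the paper: points are given as (latitude, longitude) pairs in degrees, distance is measured with the haversine formula in metres, and all pairs are compared directly, which costs quadratic time and would be replaced by a spatial index for large datasets. The names `haversine_m` and `disk_graph` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to guard against rounding slightly above 1.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, h)))


def disk_graph(points, d):
    """Connect points i and j if and only if their distance is at most d metres."""
    adj = {i: set() for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if haversine_m(points[i], points[j]) <= d:
                adj[i].add(j)
                adj[j].add(i)
    return adj


# Illustrative coordinates in central London (not taken from the dataset).
points = [(51.5007, -0.1246), (51.5010, -0.1246), (51.5055, -0.0754)]
graph = disk_graph(points, d=100.0)
```

Here the first two points lie a few tens of metres apart and are joined, while the third is several kilometres away and stays isolated. The resulting adjacency sets are in the form a graph clustering routine would take as input.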
