Abstract
In this paper, we present a new clustering algorithm called A Stable Clustering by Statistically Finding Centers and Noises (Star-Scan). Star-Scan is a density-based clustering algorithm that can find clusters of arbitrary shape and is robust to noise in a dataset. It borrows from Rodriguez's Clustering by Fast Search and Find of Density Peaks (CFSFDP) the idea that cluster centers are characterized by a higher density than their neighbors and by a relatively large distance from points of higher density. Unlike CFSFDP, which relies on manual selection, Star-Scan uses a statistical method, the box plot, to select cluster centers automatically. Furthermore, to compensate for the inadequate selection of cluster centers in CFSFDP, we apply a merging post-process to the produced clusters to obtain stable and correct results. Finally, we also apply the box plot to each of the final clusters to filter out noise, which addresses the over-filtering problem in CFSFDP. We demonstrate the good performance of Star-Scan on several synthetic datasets.
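The center-selection idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which quantity the box plot is applied to, so the sketch assumes the product of local density and distance (called `gamma` here), a cutoff-count density with a hypothetical cutoff `dc`, and the conventional upper fence Q3 + 1.5 × IQR. The function names `density_peaks`, `select_centers`, and `blob` are illustrative only, and the merging and noise-filtering steps are omitted.

```python
import math
import statistics


def density_peaks(points, dc):
    """Return (rho, delta) per point, following the CFSFDP definitions."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho: number of other points closer than the cutoff distance dc
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc)
           for i in range(n)]
    # Visit points from densest to sparsest; ties keep input order.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    # The densest point has no denser point, so it gets its maximum distance.
    delta[order[0]] = max(dist[order[0]])
    for rank in range(1, n):
        i = order[rank]
        # delta: distance to the nearest point that ranks as denser
        delta[i] = min(dist[i][j] for j in order[:rank])
    return rho, delta


def select_centers(points, dc, whisker=1.5):
    """Assumed rule: a point is a center if gamma exceeds the upper fence."""
    rho, delta = density_peaks(points, dc)
    gamma = [r * d for r, d in zip(rho, delta)]
    q1, _, q3 = statistics.quantiles(gamma, n=4)
    fence = q3 + whisker * (q3 - q1)
    return [i for i, g in enumerate(gamma) if g > fence]


def blob(cx, cy):
    """Toy cluster: a core point, an inner ring, and an outer ring."""
    offsets = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1),
               (2, 0), (-2, 0), (0, 2), (0, -2)]
    return [(cx + dx, cy + dy) for dx, dy in offsets]


points = blob(0, 0) + blob(10, 0)
centers = select_centers(points, dc=1.1)
print([points[i] for i in centers])
```

On this toy data only the two core points have a `gamma` value far above the rest, so the fence separates them without any manual threshold. With real data the choice of `dc` and of the fence multiplier would still need care.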
References
Inuk, J., Jong, C.P., Sun, K.: piClust: a density based piRNA clustering algorithm. J. Comput. Biol. Chem. 50, 60–67 (2014)
Amineh, A., Ying, W.T., Hadi, S.: On density-based data streams clustering algorithms: a survey. J. Comput. Sci. Technol. 29(1), 116–141 (2014)
Xianchao, Z., Han, L., Xiaotong, Z., Xinyue, L.: Novel density-based clustering algorithms for uncertain data. In: The Twenty-Eighth AAAI Conference on Artificial Intelligence, pp. 2191–2197. AAAI Press (2014)
Levi, L., Jörg, S.: Semi-supervised density-based clustering. In: The Ninth IEEE International Conference on Data Mining, pp. 842–847. IEEE Computer Society (2009)
Son, T., Xiao, H., Nina, H., Claudia, P., Christian, B.: Active density-based clustering. In: IEEE 13th International Conference on Data Mining, pp. 508–517. IEEE Computer Society (2013)
Michael, B.E., Paul, T.S., Patrick, O.B., David, B.: Cluster analysis and display of genome-wide expression patterns. In: National Academy of Sciences of the United States of America (PNAS), pp. 14863–14868. HighWire Press (1998)
Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM Comput. Surv. CSUR 31, 264–323 (1999)
Jiawei, H., Micheline, K.: Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco (2000)
Hans-Peter, K., Peer, K., Jörg, S., Arthur, Z.: Density-based clustering. WIREs Data Min. Knowl. Discov. 1, 231–240 (2011)
Mihael, A., Markus, M.B., Hans-Peter, K., Jörg, S.: OPTICS: ordering points to identify the clustering structure. In: ACM SIGMOD International Conference on Management of Data, pp. 49–60. ACM Press, Philadelphia (1999)
Alexander, H., Daniel, K.: An efficient approach to clustering in large multimedia databases with noise. In: The Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98), pp. 58–65. AAAI Press, New York (1998)
Hinneburg, A., Gabriel, H.-H.: DENCLUE 2.0: fast clustering based on kernel density estimation. In: Berthold, M., Shawe-Taylor, J., Lavrač, N. (eds.) IDA 2007. LNCS, vol. 4723, pp. 70–80. Springer, Heidelberg (2007)
Martin, E., Hans-Peter, K., Jörg, S., Xiaowei, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: The Second International Conference on Knowledge Discovery and Data Mining (KDD-1996), pp. 226–231. AAAI Press, Portland, Oregon, USA (1996)
Rodriguez, A., Laio, A.: Clustering by fast search and find of density peaks. Science 344, 1492–1496 (2014)
Robert, D.M., Douglas, A.L., William, G.M.: Statistics: An Introduction. Duxbury Press, London (1994)
Junhao, G., Yufei, T.: DBSCAN revisited: mis-claim, un-fixability, and approximation. In: ACM SIGMOD International Conference on Management of Data (SIGMOD 2015), pp. 519–530. ACM Press, Melbourne, Victoria, Australia (2015)
Clustering Datasets. http://cs.joensuu.fi/sipu/datasets/
Chameleon Datasets. http://glaros.dtc.umn.edu/gkhome/cluto/cluto/download
Acknowledgments
This work was supported by the National Natural Science Foundation of China under Grants No. 60773216 and No. 60773217.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Yang, N., Liu, Q., Li, Y., Xiao, L., Liu, X. (2016). Star-Scan: A Stable Clustering by Statistically Finding Centers and Noises. In: Li, F., Shim, K., Zheng, K., Liu, G. (eds) Web Technologies and Applications. APWeb 2016. Lecture Notes in Computer Science, vol 9931. Springer, Cham. https://doi.org/10.1007/978-3-319-45814-4_37
DOI: https://doi.org/10.1007/978-3-319-45814-4_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45813-7
Online ISBN: 978-3-319-45814-4
eBook Packages: Computer Science, Computer Science (R0)