A Spectral Clustering-Based Dataset Structure Analysis and Outlier Detection Progress

  • Lin Hai
  • Zhu Qingsheng
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 154)


This paper proposes a process for dataset structure analysis and outlier detection. The process targets datasets whose records can be accessed but whose data space structure is not known. It is based on the spectral clustering algorithm, which requires the number of clusters in the dataset as input. If that number is not given, the process first applies a clustering algorithm that does not need it, clustering the dataset approximately to obtain an estimate of the number of clusters. The estimate is then used to derive a boundary on the number of clusters. In the third step, an index is assigned to each candidate value within the boundary to obtain the optimized clustering result and the number of clusters. Finally, the Local Outlier Factor (LOF) algorithm is applied to find the records most likely to be outliers.
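The four steps described above can be sketched in Python with scikit-learn. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the preliminary clustering algorithm or the index, so `MeanShift` and the silhouette score are stand-ins chosen here, and the function name `analyze_dataset` and the `margin` parameter that sets the boundary width are hypothetical. The sketch assumes at least three records.

```python
import numpy as np
from sklearn.cluster import MeanShift, SpectralClustering
from sklearn.metrics import silhouette_score
from sklearn.neighbors import LocalOutlierFactor


def analyze_dataset(X, n_clusters=None, margin=2, n_outliers=5, seed=0):
    """Cluster X with spectral clustering, then rank records by LOF."""
    X = np.asarray(X, dtype=float)

    if n_clusters is None:
        # Step 1: rough estimate from an algorithm that needs no cluster count.
        # MeanShift is an assumed stand-in; the abstract does not name one.
        k_approx = len(np.unique(MeanShift().fit(X).labels_))
        # Step 2: boundary around the estimate (margin is a hypothetical knob).
        low = max(2, k_approx - margin)
        high = min(len(X) - 1, k_approx + margin)
        candidates = list(range(low, high + 1))
    else:
        candidates = [n_clusters]

    # Step 3: score every candidate count and keep the best clustering.
    # The silhouette score is an assumed choice of index.
    best_k, best_labels, best_score = None, None, -np.inf
    for k in candidates:
        model = SpectralClustering(n_clusters=k, random_state=seed)
        labels = model.fit_predict(X)
        n_found = len(np.unique(labels))
        score = silhouette_score(X, labels) if 2 <= n_found < len(X) else -1.0
        if score > best_score:
            best_k, best_labels, best_score = k, labels, score

    # Step 4: LOF scores; larger values indicate more likely outliers.
    lof = LocalOutlierFactor(n_neighbors=min(20, len(X) - 1)).fit(X)
    lof_scores = -lof.negative_outlier_factor_
    outliers = np.argsort(lof_scores)[::-1][:n_outliers]

    return {
        "n_clusters": best_k,
        "labels": best_labels,
        "lof_scores": lof_scores,
        "outliers": outliers,
    }
```

Calling `analyze_dataset(X)` on an array of records returns the selected number of clusters, the cluster labels, and the indices of the records with the highest LOF scores. Passing `n_clusters` explicitly skips the first two steps, matching the case where the number of clusters is already known.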


Gaussian Mixture Model, Outlier Detection, Spectral Cluster, Local Outlier, Local Outlier Factor
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag London Limited 2012

Authors and Affiliations

  1. College of Computer Science, Chongqing University, Chongqing, China
