
A Multi-graph Spectral Framework for Mining Multi-source Anomalies

  • Jing Gao
  • Nan Du
  • Wei Fan
  • Deepak Turaga
  • Srinivasan Parthasarathy
  • Jiawei Han
Chapter

Abstract

Anomaly detection refers to the task of detecting objects whose characteristics deviate significantly from the majority of the data [5]. It is widely used in a variety of domains, such as intrusion detection, fraud detection, and health monitoring. Today’s information explosion poses significant challenges for anomaly detection when data reside in many large, distributed repositories spanning a variety of sources and formats.

Keywords

Anomaly Detection, Spectral Cluster, Cosine Similarity, Informative Gene, Cosine Distance
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Bach L (2012) The insulin-like growth factor system in kidney disease and hypertension. Curr Opin Nephrol Hypertens 21(1):86–91
  2. Bickel S, Scheffer T (2004) Multi-view clustering. In: Proceedings of the IEEE international conference on data mining (ICDM’04), pp 19–26
  3. Blum A, Mitchell T (1998) Combining labeled and unlabeled data with co-training. In: Proceedings of the annual conference on computational learning theory (COLT’98), pp 92–100
  4. Breunig MM, Kriegel H-P, Ng RT, Sander J (2000) LOF: identifying density-based local outliers. In: Proceedings of the ACM SIGMOD international conference on management of data (SIGMOD’00), pp 93–104
  5. Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: A survey. ACM Comput Surv 41(3):15:1–15:58
  6. Dong G, Li J (1999) Efficient mining of emerging patterns: Discovering trends and differences. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD’99), pp 43–52
  7. Edgar R, Domrachev M, Lash A (2002) Gene expression omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 30(1):207–210
  8. Eskin E (2000) Anomaly detection over noisy data using learned probability distributions. In: Proceedings of the international conference on machine learning (ICML’00), pp 255–262
  9. Fan W, Miller M, Stolfo S, Lee W, Chan P (2001) Using artificial anomalies to detect unknown and known network intrusions. In: Proceedings of the IEEE international conference on data mining (ICDM’01), pp 123–130
  10. Fern XZ, Brodley CE (2004) Solving cluster ensemble problems by bipartite graph partitioning. In: Proceedings of the international conference on machine learning (ICML’04), ACM, New York, NY, pp 281–288
  11. Gao J, Liang F, Fan W, Wang C, Sun Y, Han J (2010) On community outliers and their efficient detection in information networks. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD’10), pp 813–822
  12. Han J, Kamber M (2006) Data mining: Concepts and techniques, 2nd edn. Morgan Kaufmann, Los Altos
  13. Hart T, Gorry M, Hart P, Woodard A, Shihabi Z, Sandhu J, Shirts B, Xu L, Zhu H, Barmada M, Bleyer A (2002) Mutations of the UMOD gene are responsible for medullary cystic kidney disease 2 and familial juvenile hyperuricaemic nephropathy. J Med Genet 39(12):882–892
  14. Kang U, Meeder B, Faloutsos C (2011) Spectral analysis for billion-scale graphs: Discoveries and implementation. In: Proceedings of the Pacific-Asia conference on knowledge discovery and data mining (PAKDD’11), pp 13–25
  15. Khoa N, Chawla S (2010) Robust outlier detection using commute time and eigenspace embedding. In: Proceedings of the Pacific-Asia conference on knowledge discovery and data mining (PAKDD’10), pp 422–434
  16. Knorr EM, Ng RT, Tucakov V (2000) Distance-based outliers: Algorithms and applications. VLDB J 8(3–4):237–253
  17. Lehoucq R, Sorensen D, Yang C (1998) ARPACK users’ guide: Solution of large-scale eigenvalue problems with implicitly restarted Arnoldi methods. SIAM, Philadelphia, PA
  18. Liu F, Ting K, Zhou Z (2008) Isolation forest. In: Proceedings of the IEEE international conference on data mining (ICDM’08), pp 413–422
  19. Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416
  20. Macaluso M, Cinti C, Russo G, Russo A, Giordano A (2003) pRb2/p130-E2F4/5-HDAC1-SUV39H1-p300 and pRb2/p130-E2F4/5-HDAC1-SUV39H1-DNMT1 multimolecular complexes mediate the transcription of estrogen receptor-alpha in breast cancer. Oncogene 22(23):3511–3517
  21. Markou M, Singh S (2003) Novelty detection: A review–part 1: statistical approaches. Signal Process 83(12):2481–2497
  22. Mirza S, Sharma G, Parshad R, Srivastava A, Gupta S, Ralhan R (2010) Clinical significance of Stratifin, ERalpha and PR promoter methylation in tumor and serum DNA in Indian breast cancer patients. Clin Biochem 43(4–5):380–386
  23. Shekhar S, Lu C-T, Zhang P (2001) Detecting graph-based spatial outliers: Algorithms and applications (a summary of results). In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD’01), pp 371–376
  24. Song X, Wu M, Jermaine C, Ranka S (2007) Conditional anomaly detection. IEEE Trans Knowl Data Eng 19(5):631–645
  25. Strehl A, Ghosh J (2003) Cluster ensembles–a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617
  26. Sun J, Qu H, Chakrabarti D, Faloutsos C (2005) Neighborhood formation and anomaly detection in bipartite graphs. In: Proceedings of the IEEE international conference on data mining (ICDM’05), pp 418–425
  27. Takashi M, Zhu Y, Nakano Y, Miyake K, Kato K (1992) Elevated levels of serum aldolase A in patients with renal cell carcinoma. Urol Res 20(4):307–311
  28. Wang X, Davidson I (2009) Discovering contexts and contextual outliers using random walks in graphs. In: Proceedings of the IEEE international conference on data mining (ICDM’09), pp 1034–1039
  29. Yano M, Naito Z, Yokoyama M, Shiraki Y, Ishiwata T, Inokuchi M, Asano G (1999) Expression of hsp90 and cyclin D1 in human breast cancer. Cancer Lett 137(1):45–51
  30. Zhou D, Burges C (2007) Spectral clustering and transductive learning with multiple views. In: Proceedings of the international conference on machine learning (ICML’07), pp 1159–1166

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Jing Gao (1, email author)
  • Nan Du (1)
  • Wei Fan (2)
  • Deepak Turaga (3)
  • Srinivasan Parthasarathy (3)
  • Jiawei Han (4)

  1. Computer Science and Engineering Department, State University of New York at Buffalo, Buffalo, USA
  2. Huawei Noah’s Ark Lab, Shatin, Hong Kong
  3. IBM T.J. Watson Research Center, Yorktown Heights, USA
  4. Computer Science Department, University of Illinois, Urbana-Champaign, Urbana, USA
