Temporal Representation in Spike Detection of Sparse Personal Identity Streams

  • Clifton Phua
  • Vincent Lee
  • Ross Gayler
  • Kate Smith
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3917)

Abstract

Identity crime has increased enormously in recent years. Spike detection is important because it highlights sudden, sharp rises in intensity relative to the current identity attribute value, which can indicate abuse. This paper proposes a new spike analysis framework for monitoring sparse personal identity streams. For each identity example, it detects spikes in single attribute values and integrates multiple spikes from different attributes to produce a numeric suspicion score. Although only temporal representation is examined here, experimental results on synthetic and real credit applications reveal some conditions under which the framework performs well.
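
The abstract names two steps (detect spikes per attribute value, then integrate them into one score) without giving the algorithm. The following is a minimal sketch of that idea only, not the authors' method. It assumes per-time-step counts for each attribute value, uses an exponentially weighted moving average as the baseline, and combines spikes with a weighted sum. The function names, the `alpha` smoothing parameter, the `history` layout, and the attribute weights are all hypothetical.

```python
def ewma_spike_scores(counts, alpha=0.3):
    """For each time step, return how far the count exceeds the EWMA
    of the earlier steps (0.0 when it does not exceed it).

    Assumption: `counts` is a list of occurrence counts for one
    attribute value, oldest first. The first step has no baseline,
    so its score is 0.0.
    """
    scores = []
    ewma = None
    for c in counts:
        if ewma is None:
            scores.append(0.0)
            ewma = float(c)
        else:
            scores.append(max(0.0, c - ewma))
            # Update the baseline only after scoring the current step.
            ewma = alpha * c + (1 - alpha) * ewma
    return scores


def suspicion_score(example, history, weights, alpha=0.3):
    """Combine the latest spike of each attribute value into one number.

    Assumptions (illustrative only):
      example: dict mapping attribute name -> value for one application
      history: dict mapping attribute name -> {value: [counts per step]},
               with the current step as the last element
      weights: dict mapping attribute name -> weight (default 1.0)
    """
    total = 0.0
    for attr, value in example.items():
        counts = history.get(attr, {}).get(value, [])
        if not counts:
            continue  # value never seen: no spike evidence
        spike = ewma_spike_scores(counts, alpha)[-1]
        total += weights.get(attr, 1.0) * spike
    return total
```

In this sketch, an application whose phone number suddenly appears five times after a steady one per step would receive a positive score from that attribute, while an address with a flat history would contribute nothing.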

Keywords

Discrete Wavelet Transform; Synthetic Data; Exponentially Weighted Moving Average; Temporal Representation; Stream Mining
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Clifton Phua (1)
  • Vincent Lee (1)
  • Ross Gayler (2)
  • Kate Smith (1)
  1. Clayton School of Information Technology, Monash University, Melbourne, Australia
  2. Baycorp Advantage, Melbourne, Australia
