Reliable Aggregation on Network Traffic for Web Based Knowledge Discovery

  • Shui Yu
  • Simon James
  • Yonghong Tian
  • Wanchun Dou
Conference paper


The web is a rich resource for information discovery, and web mining has consequently become an active research topic. A reliable mining result, however, depends on the reliability of the underlying data set. Every second, the web generates a huge amount of data, such as web page requests and file transfers. These data reflect human behavior in cyberspace and are therefore valuable for analysis in various disciplines, e.g. social science and network security. How to store the data is a challenge. A usual strategy is to save a summary of the data, for example by using aggregation functions to preserve the features of the original data in much less space. A key problem, however, is that such information can be distorted by the presence of illegitimate traffic, e.g. botnet recruitment scanning and DDoS attack traffic. An important consideration in web-related knowledge discovery is therefore the robustness of the aggregation method, which in turn may be affected by the reliability of the network traffic data. In this chapter, we first present the methods of aggregation functions, and then we employ information distances to filter out anomalous data as a preparation for web data mining.


Aggregation Function, Information Distance, Ordered Weighted Average, Legitimate User, Hellinger Distance
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
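
The abstract does not say which aggregation function or which information distance the chapter uses. As a minimal sketch only, the Python below assumes the two named in the keywords: an ordered weighted average (OWA) and the Hellinger distance. The function names, the traffic counts, and the choice of weights are hypothetical and serve purely as illustration; they are not taken from the chapter.

```python
import math


def owa(values, weights):
    """Ordered weighted average: weights are applied to the values
    after sorting them in descending order (largest value first)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions
    defined over the same bins. The result lies in [0, 1]."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared) / math.sqrt(2)


# Hypothetical per-second request counts; the last value mimics a burst
# of illegitimate traffic.
counts = [12, 15, 11, 14, 400]

# Assumed weights: zero weight on the largest value, equal weight on the
# rest, which makes the aggregate insensitive to a single extreme burst.
robust_summary = owa(counts, [0.0, 0.25, 0.25, 0.25, 0.25])
plain_mean = sum(counts) / len(counts)
```

With these assumed weights the OWA ignores the single largest observation, so one burst does not dominate the stored summary the way it dominates the plain mean. In the same spirit, `hellinger` could be used to compare the request distribution of a time window against a baseline distribution and to discard windows whose distance exceeds a chosen threshold; that threshold would have to be tuned on real traffic.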





Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Shui Yu (1)
  • Simon James (1)
  • Yonghong Tian (2)
  • Wanchun Dou (3)

  1. School of Information Technology, Deakin University, Victoria, Australia
  2. School of Electronic Engineering and Computer Science, Peking University, Beijing, China
  3. Department of Computer Science and Technology, Nanjing University, Nanjing, China
