The Architecture of the Cornell Knowledge Broker

  • Alan Demers
  • Johannes Gehrke
  • Mirek Riedewald
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3073)

Abstract

Intelligence applications have to process massive amounts of data in order to extract relevant information. This includes archived historical data as well as continuously arriving new data. We propose a novel architecture that addresses this problem – the Cornell Knowledge Broker. It will not only support knowledge discovery, but also security, privacy, information exchange, and collaboration.

Keywords

Data Mining, Data Stream, Data Warehouse, Data Mining Application, Privacy Preserve Data Mining
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Alan Demers
    • 1
  • Johannes Gehrke
    • 1
  • Mirek Riedewald
    • 1
  1. Department of Computer Science, Cornell University
