The Architecture of the Cornell Knowledge Broker
Conference paper
Abstract
Intelligence applications have to process massive amounts of data in order to extract relevant information. This includes archived historical data as well as continuously arriving new data. We propose a novel architecture that addresses this problem – the Cornell Knowledge Broker. It will not only support knowledge discovery, but also security, privacy, information exchange, and collaboration.
Keywords
Data Mining; Data Stream; Data Warehouse; Data Mining Application; Privacy Preserve Data Mining
These keywords were added by machine and not by the authors.
Copyright information
© Springer-Verlag Berlin Heidelberg 2004