Abstract
Data mining is now one of the most important topics in scientific and business problem solving. A huge amount of data is available that can help solve many of these problems. However, this data is geographically distributed across various locations and belongs to several organizations. Furthermore, it is stored in different kinds of systems and represented in many formats. In this paper, we study different techniques for easing the data mining process in a distributed environment. Our approach proposes using a grid to improve the data mining process, owing to the features of this kind of system. In addition, we present a flexible architecture that allows data mining applications to be dynamically configured according to their needs. This architecture is made up of generic, data grid, and specific data mining grid services.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sánchez, A., Peña, J.M., Pérez, M.S., Robles, V., Herrero, P. (2004). Improving Distributed Data Mining Techniques by Means of a Grid Infrastructure. In: Meersman, R., Tari, Z., Corsaro, A. (eds) On the Move to Meaningful Internet Systems 2004: OTM 2004 Workshops. OTM 2004. Lecture Notes in Computer Science, vol 3292. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30470-8_29
DOI: https://doi.org/10.1007/978-3-540-30470-8_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23664-1
Online ISBN: 978-3-540-30470-8
eBook Packages: Springer Book Archive