Parallel Data Mining Experimentation Using Flexible Configurations

Peña, José M.; Javier Crespo, F.; Menasalvas, Ernestina; Robles, Victor

doi:10.1007/3-540-45813-1_58

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2475)

Included in the following conference series:

International Conference on Rough Sets and Current Trends in Computing

Abstract

When data mining first appeared, several disciplines related to data analysis, such as statistics and artificial intelligence, were combined toward a new goal: extracting significant patterns from data. The original data sources were small datasets and, therefore, traditional machine learning techniques were the most common tools for these tasks. As the volume of data grew, these traditional methods were reviewed and extended with knowledge from experts working in the fields of data management and databases. Today's problems are even bigger than before and, once again, a new discipline allows researchers to scale up to these data: distributed and parallel processing. To use parallel processing techniques, specific factors about the mining algorithms and the data should be considered. Nowadays there are several new parallel algorithms, most of which are extensions of a traditional centralized algorithm. Many of these algorithms share common core parts and differ only in their distribution schema, parallel coordination, or load/task balancing methods. We call these groups algorithm families. In this paper we introduce a methodology for implementing algorithm families. This methodology is founded on the MOIRAE distributed control architecture. We show how this architecture allows researchers to design parallel processing components that can dynamically change their behavior according to control policies.
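The idea of an algorithm family can be illustrated with a minimal sketch. It is not the MOIRAE architecture or its API, which the abstract does not detail. It assumes a toy mining task (counting in how many transactions each item occurs) whose core routine is shared, while the distribution schema is a pluggable policy that can be swapped at run time. All names here (`ItemCountFamily`, `block_partition`, `round_robin_partition`, `set_policy`) are hypothetical and chosen for illustration only.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Transaction = Sequence[str]
# A distribution policy maps (data, number of workers) to one chunk per worker.
Partitioner = Callable[[List[Transaction], int], List[List[Transaction]]]


def block_partition(data: List[Transaction], n_workers: int) -> List[List[Transaction]]:
    """Hypothetical policy: contiguous blocks, one per worker."""
    size = -(-len(data) // n_workers)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(n_workers)]


def round_robin_partition(data: List[Transaction], n_workers: int) -> List[List[Transaction]]:
    """Hypothetical policy: deal transactions out to workers in turn."""
    return [data[i::n_workers] for i in range(n_workers)]


def count_items(chunk: List[Transaction]) -> Counter:
    """Shared core of the family: per-partition item support counts."""
    counts: Counter = Counter()
    for transaction in chunk:
        counts.update(set(transaction))  # count each item once per transaction
    return counts


class ItemCountFamily:
    """One core algorithm; family members differ only in the distribution policy."""

    def __init__(self, partitioner: Partitioner, n_workers: int = 2) -> None:
        self.partitioner = partitioner
        self.n_workers = n_workers

    def set_policy(self, partitioner: Partitioner) -> None:
        # Assumption: a control component decides when to switch policies.
        self.partitioner = partitioner

    def run(self, data: List[Transaction]) -> Counter:
        chunks = self.partitioner(data, self.n_workers)
        with ThreadPoolExecutor(max_workers=self.n_workers) as pool:
            partials = list(pool.map(count_items, chunks))
        total: Counter = Counter()
        for partial in partials:
            total.update(partial)  # merge partial counts by addition
        return total
```

Calling `run` with `block_partition`, then calling `set_policy(round_robin_partition)` and running again, yields the same counts: only the way work is distributed changes, which is the property that lets different family members be compared experimentally.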

This research project is funded under the Universidad Politécnica de Madrid grant program.

Author information

Authors and Affiliations

Universidad Politécnica de Madrid, Madrid, Spain
José M. Peña, Ernestina Menasalvas & Victor Robles
Universidad Carlos III de Madrid, Madrid, Spain
F. Javier Crespo

Editor information

Editors and Affiliations

Penn State Great Valley School of Professional Graduate Studies, 30 East Swedesford Road, Malvern, PA, 19355, USA
James J. Alpigini
Department of Electrical and Computer Engineering, University of Manitoba, Winnipeg, MB, R3T 5V6, Canada
James F. Peters
Institute of Mathematics, Warsaw University, Banacha 2, 02-097, Warsaw, Poland
Andrzej Skowron
Department of Systems and Information Engineering, Maebashi Institute of Technology, 460-1 Kamisadori-Cho, Maebashi-City, 371-0816, Japan
Ning Zhong

About this paper

Cite this paper

Peña, J.M., Javier Crespo, F., Menasalvas, E., Robles, V. (2002). Parallel Data Mining Experimentation Using Flexible Configurations. In: Alpigini, J.J., Peters, J.F., Skowron, A., Zhong, N. (eds) Rough Sets and Current Trends in Computing. RSCTC 2002. Lecture Notes in Computer Science, vol 2475. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45813-1_58

DOI: https://doi.org/10.1007/3-540-45813-1_58
Published: 20 September 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44274-5
Online ISBN: 978-3-540-45813-5
eBook Packages: Springer Book Archive
