Abstract
While massive amounts of data are being collected and stored, not only in science but also in industry and commerce, efficiently mining and managing the useful information in this data is becoming both a challenge and a pressing economic need. This has led to the development of distributed data mining (DDM) techniques to deal with huge multi-dimensional datasets distributed among several sites.
Moreover, Grid platforms are well suited to storing large, geographically distributed, high-dimensional, multi-owner, and heterogeneous datasets, and they provide effective computational support for distributed data mining applications. Although Grid platforms allow resources distributed across large, heterogeneous environments to be shared, running these distributed data mining techniques on the Grid remains challenging because efficient distributed data mining systems are lacking.
In this chapter, we present a new DDM system, based on Grid/P2P middleware tools, that executes new distributed data mining techniques on very large, distributed, heterogeneous datasets.
Copyright information
© 2010 Springer-Verlag London
About this chapter
Cite this chapter
Le Khac, N.A., Aouad, L.M., Kechadi, MT. (2010). Toward Distributed Knowledge Discovery on Grid Systems. In: Badr, Y., Chbeir, R., Abraham, A., Hassanien, AE. (eds) Emergent Web Intelligence: Advanced Semantic Technologies. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/978-1-84996-077-9_9
DOI: https://doi.org/10.1007/978-1-84996-077-9_9
Publisher Name: Springer, London
Print ISBN: 978-1-84996-076-2
Online ISBN: 978-1-84996-077-9
eBook Packages: Computer Science (R0)