Abstract

Massive amounts of data are being collected and stored, not only in scientific fields but also in industry and commerce, and efficiently mining and managing the useful information in this data has become both a challenge and a pressing economic need. This has led to the development of distributed data mining techniques that deal with huge multi-dimensional datasets distributed among several sites.

Moreover, Grid platforms are well suited to storing large, geographically distributed, high-dimensional, multi-owner, and heterogeneous datasets, and they provide effective computational support for distributed data mining applications. Although Grid platforms allow resources to be shared across large, heterogeneous environments, running these distributed data mining techniques on the Grid remains challenging because efficient distributed data mining systems are lacking.

In this chapter, we present a new distributed data mining (DDM) system, based on Grid/P2P middleware tools, that executes new DDM techniques on very large, distributed, and heterogeneous datasets.

Author information

Correspondence to Nhien An Le Khac.

Copyright information

© 2010 Springer-Verlag London

About this chapter

Cite this chapter

Le Khac, N.A., Aouad, L.M., Kechadi, MT. (2010). Toward Distributed Knowledge Discovery on Grid Systems. In: Badr, Y., Chbeir, R., Abraham, A., Hassanien, AE. (eds) Emergent Web Intelligence: Advanced Semantic Technologies. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/978-1-84996-077-9_9

  • DOI: https://doi.org/10.1007/978-1-84996-077-9_9

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84996-076-2

  • Online ISBN: 978-1-84996-077-9

  • eBook Packages: Computer Science, Computer Science (R0)
