Shared-Table for Textual Data Clustering in Distributed Relational Databases

  • Wael M. S. Yafooz
  • Siti Z. Z. Abidin
  • Nasiroh Omar
  • Rosenah A. Halim
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 285)

Abstract

High-performance query processing is a significant requirement for database administrators, and it can be achieved by grouping data into contiguous hard-disk pages. Database partitioning techniques support this by splitting the physical structure of database tables into smaller partitions. A distributed database management system benefits many businesses because it enables high-performance processing. However, when massive amounts of data are distributed over network nodes, query processing suffers because data must be retrieved from different nodes. This study proposes a novel technique, based on a shared-table in a relational database under a distributed environment, that uses data mining techniques to achieve high-performance query processing. The shared-table acts as a guide that indicates where data should be saved; because related data are then saved at the same location, query processing becomes more efficient. The proposed method is suitable for news agencies and other domains that rely on massive amounts of textual data.
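The shared-table idea can be illustrated with a minimal sketch. It assumes that each news document is assigned to a cluster by its most frequent non-stopword term, and that the shared-table maps each cluster key to one storage node, placing a new cluster on the least-loaded node. The clustering rule, the placement policy, and the names `SharedTable`, `cluster_key`, `node_for`, and `store` are hypothetical illustrations; the abstract does not specify the authors' actual data mining step or table layout.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "for", "is", "at"}


def cluster_key(text: str) -> str:
    """Assumed clustering rule: the most frequent non-stopword term."""
    terms = [w.strip(".,!?:;").lower() for w in text.split()]
    terms = [t for t in terms if t and t not in STOPWORDS]
    if not terms:
        return "_misc"
    # Ties are broken by first occurrence, since Counter keeps insertion order.
    return Counter(terms).most_common(1)[0][0]


@dataclass
class SharedTable:
    """Hypothetical shared-table: maps a cluster key to the node that stores it."""
    nodes: list
    placement: dict = field(default_factory=dict)   # cluster key -> node
    load: Counter = field(default_factory=Counter)  # node -> documents stored

    def node_for(self, key: str) -> str:
        if key not in self.placement:
            # New cluster: place it on the currently least-loaded node
            # (min() returns the first node among equally loaded ones).
            self.placement[key] = min(self.nodes, key=lambda n: self.load[n])
        return self.placement[key]

    def store(self, text: str) -> str:
        # Look up (or create) the shared-table entry, then record the write.
        node = self.node_for(cluster_key(text))
        self.load[node] += 1
        return node


table = SharedTable(nodes=["node-A", "node-B"])
docs = [
    "Election results: the election count continues",
    "Football final: football fans celebrate",
    "Election turnout hits record in election year",
]
for d in docs:
    print(table.store(d), "<-", d)
```

Both election stories resolve to the same shared-table entry, so they are stored on `node-A`, while the football story goes to `node-B`. A query for election news therefore touches one node instead of two, which is the locality effect the abstract describes.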

Keywords

Database clustering; Relational database; Distributed environment

Acknowledgments

The authors wish to thank Universiti Teknologi MARA (UiTM) for its financial support. This work was supported in part by grant number 600-RMI-/DANA 5/3/RIF (498/2012).

Copyright information

© Springer Science+Business Media Singapore 2014

Authors and Affiliations

  • Wael M. S. Yafooz (1)
  • Siti Z. Z. Abidin (1)
  • Nasiroh Omar (1)
  • Rosenah A. Halim (1)
  1. Faculty of Computer and Mathematical Sciences, UiTM, Shah Alam, Malaysia