Abstract
Given the massive volumes of data processed today and the distributed nature of many organizations, the need for distributed database systems is beyond doubt. In such systems, the response time of a transaction or query is strongly affected by the distribution design of the database, particularly its methods for data fragmentation, replication, and allocation. According to the relevant literature, of the two approaches to fragmentation, namely horizontal and vertical, the latter requires heuristic methods because it is NP-hard. A number of methods for vertical fragmentation already exist, but they typically have relatively high computational complexity or do not yield optimal results, particularly for large-scale problems. In this paper, we apply swarm intelligence algorithms, chosen for their distributed and scalable nature, to present an algorithm for the vertical fragmentation problem that finds an optimal solution in most cases. The proposed algorithm fragments relations so as not only to localize transaction processing at each site as much as possible, but also to reduce the cost of operations. Moreover, we report experimental results comparing our algorithm with several similar algorithms, showing that ours outperforms them and generates better solutions in terms of both the optimality of results and computational complexity.
Cite this article
Goli, M., Rouhani Rankoohi, S.M.T. A new vertical fragmentation algorithm based on ant collective behavior in distributed database systems. Knowl Inf Syst 30, 435–455 (2012). https://doi.org/10.1007/s10115-011-0384-6
DOI: https://doi.org/10.1007/s10115-011-0384-6