Heart: a novel optimization algorithm for cluster analysis
Nature has always been a source of inspiration. Natural computing is a type of technology that develops computational systems with the help of ideas derived from the nature. This paper presents a novel optimization method that utilizes the action of the heart and circulatory system in human beings. This algorithm starts with a randomly initial population of candidate solutions and objective function which is computed for them. The best candidate solution is selected as heart and the others form blood molecules. Then the heart enforces other candidates to move toward/away from the heart and search for the optimal solution. The application of the proposed algorithm on data clustering using several benchmark datasets confirms its potential and greatness.
KeywordsNatural computing Heart optimization algorithm Data clustering
New and effective ideas are often inspired by natural processing. Human beings often observe the nature to create new technologies. The created technologies may be different from the inspired natural phenomena. Natural computing is the field of science that develops new software, hardware or wetware tools inspired by nature for problem-solving purposes. Natural computing can also be referred to as bio-inspired computing or biologically motivated computing or computing with biological metaphors [1, 2, 3, 4, 5].
The nature-inspired computational algorithms and systems are the most popular and oldest natural computing approaches. There are two main objectives in this field: the modeling of natural phenomena and its simulation in computers is the first objective. The main goal in this direction is to develop models which can be implemented into computer systems which are able to quantitatively and qualitatively reproduce the functions of natural mechanism. Solving complex problems through algorithms and developing computing systems inspired by natural phenomena is the second objective. The stimulus for this innovation is finding techniques to solve the problems that could not be addressed properly in other traditional methods. Among many approaches within computing inspired by nature, the most well-known ones are the evolutionary algorithms, artificial neural networks, swarm intelligence and artificial immune systems.
This paper introduces a novel optimization method and its application to cluster analysis that is discovered through simulation of a simplified action of the heart organ in human beings body.
Clustering is the act of dividing a set of objects into a finite number of groups in such a way similarity of objects inside a group is maximal, while similarity of objects in different groups is minimal [6, 7]. There are many clustering algorithms available, but no uniformly adequate for all applications, data types and conditions [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]. Clustering algorithms have been used in a large variety of fields and applications ranging from biology, marketing and customer analysis, bankruptcy prediction, e-learning systems, intrusion detection, document clustering, outlier detection, medicine, image processing, etc., [19, 20, 21, 22, 23, 24].
The remainder of this study is organized as follows. Cluster analysis is discussed in Sect. 2. A simple explanation of the heart and circulation system is given in Sect. 3. Section 4 presents the heart optimization algorithm and its application to cluster analysis. Section 5 discusses the simulation results using four benchmark datasets. Finally, concluding remarks are made in Sect. 6.
2 Data clustering
3 The heart and circulation system
The primary function of the human circulation system is to pump blood and traverse oxygen from the lungs to all the parts of the body. The circulation system constitutes three components namely heart, blood vessels and blood with specific functions. The heart acts as the pumping station that pumps the blood to the whole body. The blood vessels comprise of arteries, arterioles, capillaries, venules and veins they act as the channels for the transportation of blood. The main function of the blood is to transmit oxygen from the lungs to remaining organs of the body and carry carbon dioxide to be removed from the body. The circulation system also has two loops namely pulmonary loop and systemic loop. The pulmonary loops are the blood vessels that travel to and fro between the heart and lungs whereas the systemic loops are the blood vessels that connect the heart and body to and fro. The pulmonary loop gets rid of the carbon dioxide and oxidizes the blood. After the blood is oxygenated it is taken back to the heart to be pumped around the systemic loop whereby the oxygen is supplied to the whole body [26, 27, 28].
In summary, the atria receive blood returning to the heart from the body and ventricles eject blood from the heart to the body. Arteries carry blood from the heart and veins bring blood back to the heart. Pulmonary loop connects heart and lungs whereas the systemic loop connects to the body.
4 Heart optimization algorithm
Most of the natural computing approaches are based on related simplified mechanisms or processes found in natural phenomena. There are various reasons for this simplification. The first and foremost reason is to allow computations with large number of tractable entities. At the same time it is beneficial to emphasize the minimum features needed to allow some aspects of a system to be reproduced and monitor some evolving properties.
Based on the above description, our proposed approach is a simplified and abstract version of the heart and blood circulatory system. This algorithm starts with some initial candidate solutions over the search space. The best candidate solution is selected to be the heart and all the other candidates form the blood molecules.
The fitness value of each molecule is again calculated by the objective function. If there is a molecule which has a better fitness value than the heart, they exchange their positions and that molecule becomes the new heart and the old heart changes to a normal molecule.
These expansion and contraction actions of the heart are carried out repeatedly and cause blood molecules (candidate solutions) to move in the problem space toward the heart and vice versa until the termination criterion has been met. Here, a maximum number of iterations are used as the termination criterion. The movement of candidate solutions toward the heart will hopefully cause all the candidate solutions to converge to a (near) optimal solution.
5 Experimental results
We have compared the performance of the proposed heart algorithm on data clustering with four other approaches including k-means , particle swarm optimization (PSO) , gravitational search algorithm (GSA)  and big bang-big crunch algorithm (BB-BC) . Six popular benchmark datasets have been used to assess the performance of clustering algorithms. The datasets are Iris, Wine, Contraceptive Method Choice (CMC), Wisconsin breast cancer, Glass and vowel datasets taken from Machine Learning Laboratory which have the following characteristics :
Iris (\(n=150, d=4, k=3\)): The dataset consists of three categories that consign a type of iris plant and each category contains 50 objects. One of the categories is linearly separable form the other two which are not linearly separable. The iris dataset consists of 150 instances with four numeric features without any missing attribute value. The attributes in the iris dataset are sepal length in cm, sepal width in cm, petal length in cm and petal width in cm.
Wine (\(n=178, d=13, k=3\)): This dataset contains chemical analysis results of wines grown by tree different agriculturists from the same region in Italy. The analysis determines the quantities of 13 constituents form each type of the wine. The wine dataset consists of 178 instances and 13 continuous-numeric attributes without any missing attribute values. Each category has 59, 71 and 48 objects, respectively.
CMC (\(n = 1473, d = 9, k = 3\)) also referred as Contraceptive Method Choice: This is a subset of the Indonesian national contraceptive prevalence survey of the year 1987. The respondents of the survey were married pregnant women or those who did not know their pregnancy status at the time of interview. The survey was conducted to predict the choice of contraceptive methods among women based on their demographic and socioeconomic attributes. 334 objects use long-term methods 510 objects used short-term method whereas 629 objects used none of the contraceptive methods.
Wisconsin breast cancer (\(n = 683, d = 9, k = 2\)): This dataset consists of 683 objects with 9 distinct features such as thickness of clump, uniformity of cell size, uniform cell shape, minor adhesion, solitary epithelial cell size, bare nuclei, weak chromatin, normal nucleoli, and mitoses. The dataset constitutes two categories such as malignant with 444 objects and benign with 239 objects.
Glass (\(n=214, d=9, k=6\)): This dataset contains samples from six different types of glass: tableware, building windows float processed, vehicle windows float processed, building windows non-float processed, containers, and headlamps. There are 214 instances with 9 numeric attributes in glass dataset.
Vowel (\(n=871, d=3, k=6\)): This dataset consists of 871 Indian Telugu vowel sounds. It has six overlapping classes and three attributes for each instance corresponding to the first, second, and third vowel frequencies.
Experimental results of clustering algorithms
From Table 1, we can see that the Heart algorithm has achieved the best performance in terms of the average, best, and worst inter-cluster distances on all the test datasets. Only on the Iris dataset, the result of two algorithms including GSA and BB-BC is comparable with the proposed Heart algorithm. For the Wine dataset, the best, average and worst intra-cluster distances obtained by Heart are 16293.48995, 16294.74339 and 16296.31318, greatly better than other compared methods. On the Glass dataset, the Heart algorithm provides the optimum value of 210.52375, while the best solution found by other algorithms is for the K-means algorithm which is 215.67753. For the other three data sets, as seen from the results, the Heart algorithm performs far superior to other methods. The most interesting thing about the results is that in four of the test datasets, even the worst solution of the Heart algorithm is better than the best solutions found by the other algorithms.
In general, the simulation results indicate that the proposed Heart algorithm is a robust and precise approach for data clustering. This algorithm provides the optimum value and small standard deviation as opposed to other methods.
Best centroids obtained by the Heart algorithm on the Iris dataset
Best centroids obtained by the Heart algorithm on the Wine dataset
Best centroids obtained by the Heart algorithm on the Cancer dataset
Best centroids obtained by the Heart algorithm on the CMC dataset
Best centroids obtained by the Heart algorithm on the Glass dataset
Best centroids obtained by the Heart algorithm on the Vowel dataset
Modeling the behavior of natural phenomena for the purpose of search and problem solving is an active research area. In this paper, a new optimization algorithm is developed to solve clustering problems which is inspired by the action of the heart and circulatory system. To evaluate the performance of the proposed algorithm, it is compared with other popular algorithms using four benchmark datasets. The experimental results indicate the high potential and efficiency of the Heart algorithm. This research leads to various dimensions of research directions which could be combined with local search strategy or fused with other Meta heuristic algorithms. Application of this proposed algorithm to solve other optimization problems can be considered as possible future research.
- 1.de Castro, L.N.: Fundamentals of natural computing: basic concepts, algorithms, and applications. CRC Press, Boca Raton (2006)Google Scholar
- 2.de Castro, L.N., Von Zuben, F.J.: Recent developments in biologically inspired computing. Idea Group Publishing, Hershey, USA (2004)Google Scholar
- 3.Cooke, D.E., Computing with biological metaphors. In: Paton, R. (ed), pp. 97–98. Chapman & Hall, London, ISBN: 0142 544 709 (1994)Google Scholar
- 4.Cooke, D.E.: Engineering Applications of Artificial Intelligence. 9(1): 97–98 (1996)Google Scholar
- 7.Han, J., Kamber, M.: Data mining: concepts and techniques. Academic Press, London (2006)Google Scholar
- 12.Ching-Yi, C., Fun, Y.: Particle swarm optimization algorithm and its application to clustering analysis. In: Proceedings of the 2004 IEEE International Conference on Networking, sensing and control, pp. 789–794 (2004)Google Scholar
- 13.Hatamlou, A., Abdullah, S., Hatamlou, M.: Data clustering using big bang-big crunch algorithm. In: Pichappan, P., Ahmadi, H., Ariwa, E. (eds.) Communications in Computer and Information Science, vol. 241, pp. 383–388. Springer, Heidelberg (2011)Google Scholar
- 14.Hatamlou, A., Abdullah, S., Nezamabadi-pour, H.: Application of gravitational search algorithm on data clustering, rough sets and knowledge technology, pp. 337–346. Springer, Berlin (2011)Google Scholar
- 15.Hatamlou, A., Abdullah, S., Othman, Z.: Gravitational search algorithm with heuristic search for clustering problems. In: 2011 3rd Conference on Data Mining and Optimization (DMO) (2011)Google Scholar
- 16.Senthilnath, J., Omkar, S.N., Mani, V.: Clustering using firefly algorithm: performance study. Swarm Evol. Comput. 1(3), 164–171 (2011) Google Scholar
- 17.Hatamlou, A., Hatamlou, M.: PSOHS: an efficient two-stage approach for data clustering. Memet. Comput. 5(2), 155–161 (2013)Google Scholar
- 25.Han, J., Micheline, K.: Data mining: concepts and techniques. Academic Press, London (2001)Google Scholar
- 26.Gabriel Khan, M.: Anatomy of the Heart and Circulation. In: Encyclopedia of Heart Diseases, chap 4, pp. 13–21. Academic Press, Burlington (2006) ISBN 9780124060616Google Scholar
- 29.Blake, C.L., C.J.M., UCI repository of machine learning databases. Available from: http://www.ics.uci.edu/-mlearn/MLRepository.html