Snakes are cold-blooded reptiles that most people perceive as deadly to humans [15]. Since ancient times, snakes have been worshipped, feared and disliked by people across the world. Snakes remain a painful reality in the daily life of millions of affected people and are among the most misunderstood species [6, 7]. At the same time, they are more perilous than other wild animals because they live close to human habitation [2]. The World Health Organization reports around five million snake bites every year, resulting in millions of envenomations and hundreds of thousands of amputations and deaths. In humid cities such as Thiruvananthapuram in Kerala, where we started our study, approximately 25–30 snake sightings are reported daily. The majority of the sighted snakes were identified as carrying enough venom to kill a human within a few hours.

In tropical regions of the world, most snake bite cases are caused by four venomous snakes, often referred to as the “Big Four” [8]. They are the Spectacled Cobra (Naja naja), Common Krait (Bungarus caeruleus), Russell’s Viper (Daboia russelii) and Saw Scaled Viper (Echis carinatus) [7]. Two other commonly found snakes that cause a large number of snake bite cases are the King Cobra (Ophiophagus hannah) and the Hump Nosed Pit Viper (Hypnale hypnale). For this reason, we restrict our study in this paper to these six deadly snakes [9, 10].

Although anti-venom is produced in sufficient quantities by several public and private manufacturers, most snake bite victims do not have access to good quality care, and in populous countries like India both morbidity and mortality due to snake bite are high. Because of serious misreporting, the true burden of snake bite is not known. Doctors mostly inject polyvalent anti-venom into the snake bite victim without considering which snake has bitten the person, even when the patient can recall some observed features of the snake. The taxonomy of snakes is not well understood by the majority of medical practitioners, which makes correct identification of the snake from the remarks of victims or eyewitnesses difficult. The polyvalent anti-venom contains antibodies raised against two or more species of snake, so it can neutralize the venom injected by a single snake bite. However, since a bite injects only one type of venom, the remaining non-neutralized part of the polyvalent anti-venom creates further risk to the patient’s health. Proper identification of the snake is therefore very important for proper medical treatment and for saving the lives of snake bite victims [9–11].

To our knowledge, no research has yet been reported on a computer-based approach to automatically distinguishing snake classes. This may be largely due to the lack of a database for this purpose and limited awareness of snake taxonomy research. The lack of a database of venomous snakes in India makes this research very challenging, as collecting images often involves well-trained snake catchers, photographers and expert biologists. Through this paper we provide an early set of snake images collected with a view to identifying relevant features based on snake taxonomy. In addition, the images contain a wide range of features from different snakes that can help in gaining new understanding of snake taxonomy. Indian snake taxonomy has not been investigated with rigor, and there is a lack of expert taxonomists. This makes first-line snake identification, which is essential for recommending accurate treatment to snake bite victims, difficult in life-threatening situations.

Materials and methods

Snake database

The snake images for the experiment were collected from forests across different parts of Kerala, India, with the help of snake catchers from Pujappura Panchakarma Serpentarium, Trivandrum, India, through close, year-long interaction with the subjects under study. A total of 1299 images were used for this experiment, obtained from 10–15 wild snakes of each species and taken on different occasions and at different times.

Table 1 shows the taxonomically relevant features and their logical grouping based on the top, bottom, side or body view of the snake in the captured image, and Figure 1 gives a visual description of the taxonomy features for each snake class. The descriptions of the snakes are included as a supplementary file (Additional file 1). In total, 38 taxonomy-based features were identified for creating the feature database from the 1299 snake images collected. There are 490 images of spectacled cobra, 304 images of Russell’s viper, 193 images of king cobra, 88 images of common krait, 116 images of saw scaled viper and 108 images of hump nosed pit viper. To create the feature database, a taxonomist manually converted each of the 1299 snake images into a feature vector representing the 38 taxonomically relevant features. This database file is included as supplementary material to this article (Additional file 2).

Table 1 Grouping of the taxonomy features and their idealized feature values used to create the database for automatic classification
Figure 1

Scale diagrams for Spectacled Cobra, Common Krait, Saw Scaled Viper, King Cobra, Russell’s viper and Hump Nosed Pit Viper observed at different natural view angles.

Feature ranking and selection

Out of the 38 taxonomically relevant features, the top features that have the highest impact on classification are determined. To find the top features from the complete database, the following 12 attribute evaluators are used: ChiSquaredAttributeEval [12], CfsSubsetEval [13], ConsistencySubsetEval [14], FilteredAttributeEval [15], FilteredSubsetEval [16], GainRatioAttributeEval [17], InfoGainAttributeEval [18], OneRAttributeEval [19], PrincipalComponents [20], ReliefFAttributeEval [21], SVMAttributeEval [19] and SymmetricalUncertAttributeEval [19], in combination with search methods [21, 22] such as Genetic Search, Greedy Stepwise, Linear Forward Selection, Rank Search, Scatter Search, Subset Size Forward Selection and Ranker. The histogram of the feature counts from these attribute evaluators is then plotted to obtain the ranking of the taxonomically relevant features that are most useful for classification, as shown in Figure 2. The concept of ranking and histograms used in this method is useful for identifying the relevance of the features [23–25]. The rank table is built from this histogram based on the total number of repetitions of each feature in the experiment, where a repetition is counted each time a feature is selected by one of the feature ranking methods. Features that share the same number of repetitions are then ranked by their average classification score, computed independently for each feature; that is, among features with the same number of repetitions, the feature with the highest average classification score is ranked first. Table 2 shows the ranking of all 38 features using the attribute evaluators with search methods and the classification score. The rank list is used to prepare 38 feature subsets containing from 1 to 38 features, starting with the top feature of Table 2 and adding one feature at a time down to the last. The number of features in a feature subset is referred to as the feature size.
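
The ranking procedure described above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in the study: the function names `rank_features` and `nested_subsets`, the input layout, and the rule that each run contributes at most one vote per feature are all assumptions made for the example.

```python
from collections import Counter


def rank_features(selections, avg_score):
    """Rank features by how often they were selected (illustrative sketch).

    selections: list of lists; each inner list holds the feature names
                chosen by one evaluator/search-method run (assumed layout).
    avg_score:  dict mapping feature name -> average classification score
                obtained when that feature is used on its own.
    """
    counts = Counter()
    for chosen in selections:
        counts.update(set(chosen))  # assumption: one vote per run per feature

    # Include features that were never selected (their count is 0).
    features = set(avg_score) | set(counts)

    # Primary key: repetition count (descending).
    # Tie-break: average single-feature score (descending), then name.
    return sorted(
        features,
        key=lambda f: (-counts[f], -avg_score.get(f, 0.0), f),
    )


def nested_subsets(ranked):
    """Build subsets of feature size 1..N from the top of the rank list."""
    return [ranked[:k] for k in range(1, len(ranked) + 1)]
```

With 38 ranked features, `nested_subsets` yields the 38 feature subsets of increasing feature size that are evaluated in the experiments.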

Figure 2

Histogram of the results from the 12 attribute evaluators in combination with the search methods, showing the most relevant features for classification.

Table 2 Ranking of all 38 features based on the results from the 12 attribute evaluators with search methods and the average classification score taken individually for each feature

Classifier selection and training

To perform automated snake classification, the following 13 classifiers are used: Bayes Net [26], Naïve Bayes [27], Multilayer perceptron [26], Ada BoostM1 [28], Multi BoostAB [29], RBF network [30], IB1 [30], IBk [31], LWL [32], NB Tree [33], J48 [34], Random Sub Space [35], and Bagging [36]. In setting up the classification experiment, the database is split into a training set and a test set. The training set is used to fit the classifier parameters, while the test set is used to assess the performance of the classifier in terms of classification accuracy, F-score, the area under the receiver operating characteristic curve, precision and recall. Selecting only a small number of samples per snake class for the training set makes the problem challenging, and performance measures obtained in such conditions indicate a classifier’s applicability in practice. In our study, we use 5% of the samples from each snake class for the training set, while the remaining 95% form the test set. The classifier that performs best on these measures can be selected as a possible candidate for implementation.
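
A per-class 5%/95% split of this kind can be sketched as follows. This is an illustrative sketch only: the function name `stratified_split`, the use of a seeded `random.Random`, and the choice to round the per-class training count down (with a minimum of one sample) are assumptions, since the text does not state how fractional counts were handled.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.05, seed=0):
    """Split sample indices so each class contributes the same fraction.

    labels: list of class labels, one per sample.
    Returns (train_indices, test_indices), which never overlap.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        # Assumption: round down, but keep at least one training sample.
        n_train = max(1, int(train_fraction * len(indices)))
        train.extend(indices[:n_train])
        test.extend(indices[n_train:])
    return sorted(train), sorted(test)
```

Drawing the training samples class by class keeps every snake class represented in the small training set, which a plain random 5% draw over all samples would not guarantee for the smaller classes.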

The research and work submitted conform to the guidelines for the care and use of animals in scientific research. We followed the guidelines published by the Indian National Science Academy. The Ethics Committee of Enview R&D Labs approved the research work.

Results and discussion

The feature database of the snakes, as explained in Table 1 and Figure 1, is used for analysing the classification performance of this six-class classification problem. The feature database contains 38 features for each sample. Using Table 2, we perform further experiments on databases with different feature sizes. The samples in each database are randomly split into 5% for the training set and 95% for the test set, and performance is evaluated for the individual classifiers. The selection of features is performed on the training set. To ensure statistical reliability, the selection and testing are repeated 100 times, and the results are reported in Table 3. The test and training sets never overlap in samples. Table 3 compares the average performance measures for the 38 feature-size databases: percentage accuracy of correct classification, F-score, the area under the receiver operating characteristic curve, precision and recall. It shows how these measures vary as the feature size, i.e. the number of features in the feature subset, increases. As shown in Table 3, the classification accuracy increases considerably up to feature size 15, which contains the top 15 features of the rank list, and tends to drop from feature size 31 onwards. This suggests that the top 15 features alone are sufficient for automated snake identification, rather than all 38 taxonomically relevant features.
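
The repeated random-split protocol can be illustrated with the sketch below. It is a simplified stand-in under stated assumptions, not the experimental code: a 1-nearest-neighbour rule with Hamming distance is used in place of the 13 classifiers of the study, the split is drawn over all samples rather than per class to keep the example short, and the names `hamming`, `one_nn_accuracy` and `repeated_holdout` are hypothetical.

```python
import random
import statistics


def hamming(a, b):
    """Number of positions at which two feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def one_nn_accuracy(train, test):
    """Accuracy of a 1-nearest-neighbour rule (stand-in classifier).

    train, test: lists of (feature_tuple, label) pairs.
    """
    correct = 0
    for features, label in test:
        nearest = min(train, key=lambda item: hamming(item[0], features))
        correct += nearest[1] == label
    return correct / len(test)


def repeated_holdout(samples, train_fraction=0.05, n_repeats=100, seed=0):
    """Repeat a random train/test split and summarise the accuracy.

    Returns (mean accuracy, sample standard deviation) over the repeats.
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_repeats):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        n_train = max(1, int(train_fraction * len(shuffled)))
        # Slicing one shuffled list keeps train and test disjoint.
        train, test = shuffled[:n_train], shuffled[n_train:]
        accuracies.append(one_nn_accuracy(train, test))
    return statistics.mean(accuracies), statistics.stdev(accuracies)
```

Reporting the mean together with the standard deviation over the repeats gives values of the form "accuracy ± spread", which reduces the dependence of the result on any single random split.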

Table 3 Comparison of average classification results from the 13 classifiers on snake databases of different feature size, with 5% of the total samples used for training and 95% for testing

Tables 4 and 5 show the performance of automatic snake classification using the Bayes Net [37], Naive Bayes [27], Multilayer perceptron [26], Ada BoostM1 [28], Multi BoostAB [29], RBF network [30], IB1 [31], IBk [31], LWL [32], NB Tree [33], J48 [34], Random Sub Space [35], and Bagging [36] classification methods on the top 15 selected snake feature database and the 38 snake feature database, respectively. The performances indicated are percentage accuracy of correct classification, F-score, the area under the receiver operating characteristic curve, precision and recall. The RBF network, IBk and IB1 classifiers showed higher classification performance than the other classifiers. A classification accuracy above 85% indicates the robustness of the taxonomically relevant features in the automatic classification process. Multilayer perceptron [26], RBF Network [30], IB1 [31], IBk [31], and J48 [34] show good recognition performance among the tested classifiers at 5% training data. When the training dataset size is increased to 30%, the multilayer perceptron [26] classifier reaches 94.31 ± 1.00% classification accuracy. The results indicate the difficulty of automatic classification of snakes; nonetheless, they point to its practical use as a first-line prediction of the snake class. These early results open up two major directions of research: (1) identifying the taxonomy features of unknown snakes using automatic feature analysis and (2) developing accurate feature classification and recognition methods for automatic snake identification. For use in real-time applications such as diagnosis, an ambitious 100% accuracy is preferred, which remains a challenging problem posed by these results.
In addition, the results on 5% training data are likely to be more useful in real-time systems, because in real applications the size of the test data keeps growing at a higher rate than the training data, mainly because of the labor-intensive processes involved in preparing and validating the training data.
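
For reference, the accuracy, precision, recall and F-score reported in the tables can be computed from true and predicted labels as in the sketch below. This is a generic illustration: the function names are hypothetical, per-class values are shown without averaging because the text does not state how class-wise values were combined, and the area under the receiver operating characteristic curve is omitted because it needs classifier scores rather than hard labels.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f_score)} for every class seen."""
    pairs = list(zip(y_true, y_pred))
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Guard against division by zero for classes never predicted/seen.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f_score = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f_score)
    return metrics
```

In a six-class problem such as this one, per-class values make it visible when a smaller class (for example common krait, with 88 images) is recognised less reliably than the overall accuracy suggests.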

Table 4 Comparison of different classifiers when 5% of the class samples are used as gallery and the remaining 95% of the samples are used as test, on the top 15 selected snake feature database
Table 5 Comparison of different classifiers when 5% of the class samples are used as gallery and the remaining 95% of the samples are used as test, on the 38 snake feature database


In this paper, we presented the automatic snake identification problem and developed a taxonomy-based feature set targeted for use by computer scientists and herpetologists. The feature-subset analysis indicated that 15 features are sufficient for snake identification. In a real-life setting, the snake feature database reflects a situation in which the bite victim has seen the snake and the class of the snake must be identified from the observed features. In addition to the venom detection research required for treating bite victims, the proposed automatic snake recognition method could provide valuable information for administering correct medication and treatment in life-threatening situations. Surveying snakes in the wild is another major activity for ensuring the preservation of snake populations and diversity. This is, however, a very challenging task that requires prohibitive investments in manpower. Automatic classification using a snake image database can be extended to the analysis of snake images captured remotely with minimal human intervention. Progress in snake taxonomy research has been in decline for the last 60 years, which has resulted in a lack of expertise for environmental surveys and for the help medical practitioners require in emergency situations. With computerized analysis of snake images using the proposed database and classification approach, we hope that more studies will emerge and generate interest in this topic.

Additional files