Background

As the amount of data generated on a daily basis grows, new and state-of-the-art techniques are needed for mining these data to discover useful knowledge and patterns. This is becoming increasingly important for biological data, as the information collected automatically by biosensors accumulates daily and these data are often nonlinear, voluminous and sometimes unstructured (Hamadani and Khan 2015). Conventional systems and techniques are inadequate for the analysis of this kind of data. This is leading to the use of new data mining techniques in all spheres of life, and animal farms across the globe are no exception. Data mining is a process that automatically extracts hidden, formerly unknown, and valuable knowledge from such big data. However, data mining is a big challenge for animal sciences because it requires knowledge of advanced statistics, artificial intelligence (AI), database management, etc., which is generally not within the scope of conventional biology. Therefore, the need of the hour is to delve deeper into interdisciplinary sciences (Bhardwaj et al. 2022), both to gain insights that benefit animal sciences and to keep pace with current global trends. Conventional practices on animal farms have involved the analysis of data ever since scientific management and breeding were first performed. Data analysis has since been a regular practice for understanding various patterns and trends, especially in the data-intensive science of animal genetics and breeding (Hamadani et al. 2019), and research has been conducted on genetic trends (Hamadani et al. 2021c), genetic parameters of farm animals (Rather et al. 2020), performance evaluation (Hamadani et al. 2020a), genetic evaluation (Khan et al. 2022) and farm economics (Hamadani et al. 2020b).
Among other techniques, perhaps the most popular one in data mining today is the artificial neural network (ANN), because of its ability to handle nonlinear and noisy data, among other abilities. ANNs are being used for solving new and emerging problems in all areas of animal sciences, and the results of such networks are promising. These are crucial for bringing transformative changes into the sector and steering it towards a major technological revolution, thus keeping pace with global trends. An insight into the latest technologies and their various functionalities for data mining in animal sciences, which include livestock as well as poultry (Hamadani et al. 2020c, 2022a), is critical for developing an understanding of their basics as well as their applicability. This is crucial for the livestock sector, which is and has always been a major sector for the economy and livelihood, especially in developing countries. The primary objective of this paper is to provide an overview of this interdisciplinary machine learning (ML) technique and review major research that is being done in this area. Research in this area is cited within the manuscript at relevant places.

Main text

What is data mining?

The process of discovering patterns in huge datasets is called data mining (DM). This interdisciplinary field lies at the intersection of machine learning, statistics, and database management systems. Data mining aims to extract information intelligently from datasets and present it in a lucid manner for further use. It involves knowledge discovery in databases (KDD) and includes aspects of database and data management, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing, visualization, and online updating. The term data mining is frequently applied to all forms of large-scale information processing, including collection, extraction, warehousing, analysis, and statistics. It also covers major emerging fields such as artificial intelligence (e.g. machine learning) and business intelligence. In animal sciences, data mining has been used most in animal genetics and breeding, as the subject is data intensive. Data mining dates to as early as 1977. Artificial intelligence techniques include advanced techniques like genetic algorithms (GAs), artificial neural networks (ANNs), support vector machines (SVMs), fuzzy logic, etc. However, conventional data analysis is not the same as data mining. Data analysis is essentially done to test models and hypotheses on a dataset, and has often been practised in the estimation of morphometric parameters (Hamadani et al. 2016) or genetic parameters in animals (Hamadani et al. 2021a). Data mining, on the other hand, uses machine learning and statistical models on a large dataset to discover hidden patterns (Olson 2006). These patterns also include inferences that were hitherto unknown. Before the actual data mining or knowledge discovery is undertaken, the data are always pre-processed.
DM aims to discover previously unknown knowledge from the data associated with different processes and models, and this may be referred to as knowledge discovery from data (KDD). DM tasks are often divided into two categories: predictive tasks and descriptive tasks. The knowledge discovery in databases (KDD) process is commonly defined by the following stages: selection, pre-processing, transformation, data mining, and interpretation/evaluation. However, a more popular variation of the process is the cross-industry standard process for data mining (CRISP-DM). It defines six phases: business understanding, data understanding, data preparation, modelling, evaluation, and deployment. Pietersma et al. (2004) used this model on the heifer growth data of Quebec dairy herds to understand the factors associated with delayed first calving. Among other interesting patterns, they found that a major factor leading to delayed age at first calving was the decision of the dairy producer to breed heifers at a relatively heavy weight. Another categorization that captures the DM activity process is as follows: data dimensionality reduction (DDR), classification and clustering, and rule extraction. The common tasks of data mining include:

  • Anomaly detection identifies unusual yet interesting data records, and also helps to find data errors that require further investigation.

  • DDR also involves feature extraction or selection. New features are obtained from the original data to reduce data dimensionality. This is important for increasing computational efficiency as well as classification accuracy. DDR makes use of techniques such as genetic algorithms (GAs), sequential forward selection (SFS), principal components analysis (PCA), and sequential backward selection (SBS).

  • Association rule learning discovers relationships between variables. For example, it is association rule learning that supermarkets/companies use to determine products that are frequently bought together. This information is then used for marketing.

  • The clustering task is useful for discovering structures and groups in the data that have some similarity. Clustering is performed without using any known structures in the data.

  • Classification is the task of generalizing known patterns to extrapolate to new data. Using the information learned, the classification algorithm can classify data that was hitherto unknown to it.

  • The regression task attempts to find an appropriate function that can model the data with the minimum possible error. This is done by estimating the relationships among data or datasets.

  • Summarization is useful for providing a concise representation of the data set. This includes visualization and report generation.

  • Results validation is the final step of knowledge discovery from data. Results validation verifies that the patterns are produced by actual learning and that they can be generalized to a larger dataset.
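
As an illustration of the clustering task listed above, the following is a minimal sketch of k-means on one-dimensional data. It is not taken from any of the cited studies: the lactation lengths, the starting centroids, and the function names `assign` and `kmeans_1d` are all hypothetical and chosen only for demonstration.

```python
def assign(values, centroids):
    """Assignment step: attach each value to its nearest centroid."""
    groups = [[] for _ in centroids]
    for v in values:
        nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
        groups[nearest].append(v)
    return groups


def kmeans_1d(values, centroids, n_iter=20):
    """Plain k-means on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(n_iter):
        groups = assign(values, centroids)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its previous centroid).
        new_centroids = [sum(g) / len(g) if g else c
                         for g, c in zip(groups, centroids)]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, assign(values, centroids)


# Hypothetical lactation lengths in days (illustrative values only).
lactation_days = [250, 260, 255, 305, 310, 300, 298]
centroids, groups = kmeans_1d(lactation_days, [250, 310])
print(centroids)  # one centre per discovered group
print(groups)
```

No group labels are supplied to the algorithm; the two groups emerge only from the similarity of the values, which is what distinguishes clustering from classification.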

Sometimes a data mining algorithm may find patterns in the training set which may not be present in the general data set. This is known as overfitting. Therefore, validation is done using a test set of data to which the data mining algorithm is exposed only after training is complete. Several measures are used for the evaluation of the algorithm, such as precision, recall, ROC curves, and accuracy. Clustering, classification, and regression have been used extensively in animal sciences. In an interesting study, individual animals were clustered by species based only on their daily movements (Curry 2014). Clustering based on other factors such as age and lactation length has also been done in animals. Regression in sheep data was performed for breeding value prediction as well as body weight prediction (Hamadani et al. 2022b). As for classification, animal species classification has also been attempted (Alharbi et al. 2019).
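
The evaluation measures named above can be computed directly from the predictions made on a held-out test set. The sketch below assumes a binary classification problem; the labels, the predictions, and the function name `classification_metrics` are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Precision, recall and accuracy for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        # Of the records predicted positive, how many really were positive?
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of the truly positive records, how many were found?
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # Share of all records classified correctly.
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical test-set labels and model predictions (1 = positive class).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(classification_metrics(y_true, y_pred))
```

Computing these measures on data the model has never seen is what allows overfitting to be detected: a model that has merely memorized the training set will score well there and poorly here.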

Pre-processing

Before actual data mining algorithms can be run, the target data set must first be assembled. The target dataset must be large enough to contain all the patterns so that they can be discovered with precision. The commonest source for this kind of data is a data warehouse or data mart. Pre-processing is essential for analysing multivariate data sets before data mining. Data pre-processing may vary with the type of data and the techniques used. However, a few of the most important data pre-processing techniques include dealing with missing values and erroneous data, outlier removal, normalization, clustering, labelling, data reduction, etc. Hamadani et al. (2021b) used the popular technique of winsorization for the removal of outliers in a sheep dataset and found that it was effective in removing the skewness of the dataset while preserving the amount of data. If proper pre-processing is not done, the results obtained from the trained model are unreliable, hence the popular phrase, “garbage in, garbage out.” Mahalanobis distance was used on a dataset of Jersey cows (Madsen et al. 2012), with an increase in breeding value prediction accuracy and a reduction in bias reported for the edited records. Outlier bias was also studied extensively by Escalante (2005), who proposed data filtering before genetic evaluation.
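
Two of the pre-processing steps mentioned above, winsorization and normalization, can be sketched as follows. This is a simple count-based form of winsorization and is not necessarily the procedure used by Hamadani et al. (2021b); the body weights and the function names are hypothetical.

```python
def winsorize(values, fraction=0.1):
    """Clip the lowest and highest `fraction` of values to the nearest retained value."""
    ordered = sorted(values)
    k = int(len(ordered) * fraction)  # number of values clipped in each tail
    low, high = ordered[k], ordered[len(ordered) - 1 - k]
    return [min(max(v, low), high) for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical birth weights in kg; 9.5 is a suspicious record.
weights = [2.1, 3.0, 3.2, 3.4, 3.5, 3.6, 3.8, 4.0, 4.1, 9.5]
cleaned = winsorize(weights, fraction=0.1)
scaled = min_max_normalize(cleaned)
print(cleaned)
print(scaled)
```

Unlike deleting outliers, winsorization replaces extreme values with less extreme ones, so the number of records stays the same.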

Artificial neural networks (ANNs)

ANNs are computing systems that are loosely inspired by the neural networks that constitute animal brains (Chen et al. 2019). The biological neural systems (BNSs) found in animal brains can perform extraordinarily complex computations and are also capable of learning from new and diverse experiences. ANNs are nonlinear statistical data modelling tools in which the complex relationships between inputs and outputs are modelled for pattern recognition. These models form the basis of deep learning and are efficient in pattern recognition and machine learning. They are part of the broader field of artificial intelligence (AI). An ANN consists of a group of connected units, also called nodes or artificial neurons, which parallel the neurons in a biological brain. The connections between these neurons can be equated to synapses in the brain, which transmit signals to other neurons (Fig. 1). A node inside the ANN receives a signal, processes it, and signals the other neurons connected to it. The signal, unlike in the biological connection, is a real number, and the output of each node is computed by applying a nonlinear function to the weighted sum of its inputs. The connections are also called edges. Neurons and edges typically have a weight, the adjustment of which leads to learning. Neurons may also have a threshold such that a signal is sent only if the aggregate signal crosses that threshold. Typically, neurons are grouped into multiple layers, and each layer may perform a different transformation on the inputs it receives. Signals travel from the first layer, also called the input layer, to the last layer, also called the output layer, and in between, they may traverse the layers multiple times. Several types of neuronal models are being used in research, among which the McCulloch-Pitts (MP) model is one of the most popular.
Each artificial neuron (AN) receives signals either from the environment or from other ANs. It gathers these signals and, when fired, transmits a signal to all ANs connected to it (Fig. 2). The input signals can be either inhibitory or excitatory, depending on the negative or positive weights associated with each connection. An activation function controls the firing of the AN and the strength of its output signal. Each neuron collects all incoming signals and computes a net input signal as a function of the respective weights. The net input signal serves as the input to the activation function, which determines the output signal. There are two approaches to training an ANN: supervised and unsupervised. Supervised training provides the network with the required output, either by manually grading the network’s performance or by supplying the desired outputs along with the inputs. In unsupervised training, the network must produce useful results from the inputs without any outside help.
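
The behaviour of a single artificial neuron described above can be sketched in a few lines. The example shows a threshold unit in the spirit of the McCulloch-Pitts model and a unit with a sigmoid activation; the sigmoid is only one common choice of activation function, and the function names, weights, and inputs are hypothetical.

```python
import math


def net_input(inputs, weights, bias=0.0):
    """Net input signal: weighted sum of the incoming signals plus a bias."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias


def threshold_neuron(inputs, weights, threshold):
    """Threshold unit: fires (1) only if the net input reaches the threshold."""
    return 1 if net_input(inputs, weights) >= threshold else 0


def sigmoid_neuron(inputs, weights, bias=0.0):
    """Neuron with a sigmoid activation: output is a real number in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net_input(inputs, weights, bias)))


# With unit weights and a threshold of 2, the threshold unit acts as a logical AND.
print(threshold_neuron([1, 1], [1, 1], threshold=2))
print(threshold_neuron([1, 0], [1, 1], threshold=2))

# A positive weight is excitatory, a negative weight is inhibitory.
print(sigmoid_neuron([1.0], [2.0]))
print(sigmoid_neuron([1.0], [-2.0]))
```

The sign of a weight decides whether an incoming signal pushes the neuron towards firing or away from it, and the activation function turns the net input into the output signal passed on to connected neurons.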

Fig. 1 Artificial neural networks

Fig. 2 Artificial neuron

Supervised and unsupervised networks

In supervised training, both inputs and outputs are provided to the network. The network processes the inputs and compares its resulting outputs to the desired outputs. Errors are propagated back through the system, leading to the adjustment of weights. This process is repeated multiple times until minimum error or highest accuracy is achieved. The set of data that enables the network to train is called the training set. It is also possible that some networks may never learn due to a lack of specific information in the input data. Therefore, there should be sufficient data so that some part of the data can be held back and used later to test the network. This guards against mere memorization of data by the network. Several parameters, also called hyperparameters, need to be reviewed by the researcher so that optimal results are obtained from the network. These include the number of layers, the number of elements per layer, the connections between layers, the transfer, summation, and training functions, and the initial weights. When the system has been correctly trained, the weights can be frozen and reused for new data. This finalized network can be converted to hardware so that it can be reused.
In unsupervised training, the network is provided with only inputs and no desired outputs. For example, in a classification problem, the network is not provided with class labels. The system learns by itself and decides the features that will be used to group the input data. This is also called self-organization or adaption.
Due to their ability to handle complex computational problems, ANNs are increasingly being applied to solve practical problems. Neural networks are being used the world over for poultry preslaughter mortality prediction, crop yield prediction, disease classification (Golhani et al. 2018), classification of agricultural pests, identification of nutrient deficiencies in the soil or the plant, recognition of lactation patterns, and estimation of live body weights in animals (Amraei et al. 2017). Supervised learning is the more popular approach in animal sciences, with many researchers using it for making various predictions on animal farms (Sant’Anna et al. 2015). Hamadani et al. (2022c) used heuristic modelling as well as search algorithms for the prediction of breeding values and body weights. Hamadani and Ganai (2022) trained an ANN-based regression model and deployed it in a decision support system. Gandhi et al. (2012) used ANNs for breeding value estimation and milk yield prediction and found them to be accurate for making predictions. Ganesan et al. (2014) compared linear mixed models and ANN models for the growth data of sheep and suggested that ANN models were useful for the analysis of longitudinal animal growth data. The ability of supervised learning in ANNs to predict the genetic values of cattle has also been evaluated, and considerable potential for genetic evaluation was reported. Silva et al. (2014) also used neural networks to predict superior genotypes and found them superior to linear models in estimating breeding values. Sant’Anna et al. (2015) concluded that the use of ANNs is a promising technique for solving classification problems. Furthermore, in comparison to other methodologies, these networks have the advantage of not requiring presuppositions about the distribution of the data to be used. Shahinfar et al. (2012) showed that artificial neural networks reliably predicted the breeding values for milk and fat yield in dairy cows. ANNs have been used for 305-day milk yield prediction in cows and were shown to provide better predictions than conventional regression models. Vijay et al. (2019) performed a similar study in Murrah buffalo. Roush et al. (2006) compared Gompertz nonlinear regression with ANNs for predicting body weight in broilers and reported that the ANN model had a lower bias. Golhani et al. (2018) predicted the breeding value of body weight at 6 months of age using ANNs in the Kermani sheep breed with considerable accuracy. Atıl and Akıllı (2015) investigated dairy cattle traits using ANNs and cluster analysis and found the ANN method to be more efficient than K-means clustering. Neural networks have also been used to describe the weight gain of sheep from gene polymorphism, birth weight, and birth type. The results revealed that the ANN model is an appropriate tool for recognizing patterns in the data to predict lamb growth in terms of average daily gain (ADG) given specific gene polymorphisms, birth weight, and birth type. Artificial intelligence has also been used for the analysis of genetic diversity, the identification of superior genotypes, and locus minimization in breed prediction using an artificial neural network approach.

ANN training

ANN training algorithms are generally classified into two categories: (a) parameter learning and (b) structure learning. In parameter learning, the connection weights of an ANN are updated, while structure learning focuses on the network topology and interconnections. The backpropagation (BP) algorithm is used in a number of supervised learning tasks for MLPs as well as for RNNs (Du and Swamy 2006). A BP algorithm performs two phases of data flow. First, the input pattern is propagated in the forward direction to produce an actual output. The error, i.e. the difference between the expected and actual outputs, is then backpropagated from the output layer to the previous layers. This is used for updating the weights. The process is repeated until the error can no longer be reduced. The training algorithms used for unsupervised ANNs are different because there is no desired output; e.g. SOM training uses a competitive learning strategy in which similarity is measured using the Euclidean distance. The learning of the neuron is dependent on the changing weights from inactive connections to active connections. In this way, the neuron with the highest activation level in the output layer becomes active and produces an output signal, while the other neurons are suppressed. Neurons close to the active neuron are also updated accordingly.
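
The two phases of backpropagation described above can be sketched for a small multilayer perceptron with one hidden layer. This is a minimal illustration under stated assumptions rather than a reference implementation: the sigmoid activation, the squared-error loss, the XOR training data, the learning rate, the number of epochs, and all function names are choices made only for this example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Forward phase: propagate an input pattern to the output.
    The last weight in every row is a bias, fed by a constant input of 1."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x + [1.0])))
              for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden + [1.0])))
    return hidden, output


def mean_squared_error(data, w_hidden, w_out):
    return sum((t - forward(x, w_hidden, w_out)[1]) ** 2 for x, t in data) / len(data)


def train(data, n_hidden=3, lr=0.5, epochs=3000, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    initial_error = mean_squared_error(data, w_hidden, w_out)
    for _ in range(epochs):
        for x, target in data:
            hidden, output = forward(x, w_hidden, w_out)
            # Backward phase: error signal at the output layer ...
            delta_out = (output - target) * output * (1.0 - output)
            # ... propagated back to each hidden neuron through its outgoing weight.
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                            for j in range(n_hidden)]
            # Weight updates (gradient descent).
            for j, h in enumerate(hidden + [1.0]):
                w_out[j] -= lr * delta_out * h
            for j in range(n_hidden):
                for i, xi in enumerate(x + [1.0]):
                    w_hidden[j][i] -= lr * delta_hidden[j] * xi
    return w_hidden, w_out, initial_error, mean_squared_error(data, w_hidden, w_out)


# XOR: a small problem that a single neuron cannot solve.
xor_data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
w_hidden, w_out, initial_error, final_error = train(xor_data)
print(initial_error, final_error)
```

The number of hidden neurons, the learning rate, and the number of epochs are hyperparameters in the sense used earlier, and in practice they are tuned against held-out data rather than fixed in advance.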

Architectures

Many different types of ANN models have been used by researchers across the world, depending on the characteristics required for specific sets of conditions. These are analogous to the functional specificity linked with various brain regions. Architecturally, ANNs can be classified into feedforward neural networks (FNNs), recurrent neural networks (RNNs), and their combinations. Examples include fully connected FNNs, RNNs, self-organizing maps (SOMs), convolutional neural networks (CNNs), and cellular neural networks. In an FNN, the neurons within a layer are not connected to each other. Information flows in such a way that each AN takes inputs from all the neurons of the preceding layer and transmits its summation as a single output value to all the nodes in the next layer. The input layer receives the user input and the output layer gives out the result, which is used to draw useful inferences. Two popular FNNs are multilayer perceptrons (MLPs) and radial basis function (RBF) networks. MLPs can approximate practically any function to any required accuracy, provided there is an adequate number of hidden neurons in the network as well as enough data. The number of input neurons in an MLP is decided by the input vector dimension. The number of neurons in the hidden layer is determined heuristically. The output vector dimension is set by the number of classes to be classified. The data flow is in one direction only in this kind of ANN, i.e. from external inputs to the outputs. RBF networks are similar to MLPs but use radial basis transfer functions in the hidden layer. RBF networks classify data using hyperspheres rather than hyperplanes. The SOM is an unsupervised learning technique that also works on a feedforward network. It usually contains a two-dimensional single layer of neurons and an input layer of branched nodes. SOM neurons have conventional forward connections as well as lateral connections between nodes in the output layer.
These lateral connections create competition between neurons. Additionally, networks can have loops in which they transfer information back to the previous layer. This architecture is known as a recurrent network. In a recurrent neural network (RNN), the neurons are arranged in a grid-like manner, and neurons are connected to their neighbouring nodes. A cellular neural network contains regularly spaced neurons that communicate with the neurons next to them, and mutual interconnections occur between adjacent ANs. An AN may get excited by its own signals as well as those from the cells next to it (Du and Swamy 2006). Convolutional neural networks (CNNs) have been used in imaging technologies for the estimation of crude protein, fat, acid profile, and freshness (Khoshnoudi-Nia and Moosavi-Nasab 2019), as well as meat quality, palatability, tenderness, etc. (Zapotoczny et al. 2016). CNNs have even been used for the automatic sorting and weighing of cuts and viscera, which is normally performed manually, and extensively in animal identification and classification (Suryawanshi et al. 2020).
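
The competition between SOM neurons mentioned in this section can be sketched as a single competitive-learning update. For brevity the sketch assumes a one-dimensional chain of neurons rather than the two-dimensional layer that a SOM usually has; the learning rate, the neighbourhood radius, the weight vectors, and the function names are hypothetical.

```python
import math


def best_matching_unit(weights, x):
    """Index of the neuron whose weight vector is closest (Euclidean distance) to x."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))


def som_step(weights, x, lr=0.5, radius=1):
    """One competitive-learning update on a 1-D chain of neurons."""
    winner = best_matching_unit(weights, x)
    updated = []
    for i, w in enumerate(weights):
        if abs(i - winner) <= radius:
            # The winner and its neighbours on the chain move towards the input.
            w = [wi + lr * (xi - wi) for wi, xi in zip(w, x)]
        updated.append(list(w))  # neurons outside the neighbourhood are unchanged
    return winner, updated


# Four neurons with 2-D weight vectors and one input pattern.
weights = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0], [2.0, 2.0]]
winner, weights = som_step(weights, [1.0, 1.0])
print(winner)
print(weights)
```

Repeating this step over many input patterns, usually while shrinking the learning rate and the radius, lets neighbouring neurons come to represent similar inputs without any desired output being supplied.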

Advantages and challenges

Classical analysis techniques work very well for smaller datasets, but as the dataset increases in size, the computation becomes more challenging, and statistical techniques may fail to address such challenges. In such cases, intelligent systems (IS), also called soft computing, are useful for finding patterns in the data. IS tools greatly enhance the potential of data mining through significant advances in theoretical as well as applied research. IS tools are tolerant of partial truth, imprecision, uncertainty, and approximation. They are also helpful in achieving tractability and robustness, and offer low-cost solutions to complex data analysis problems. These techniques include multiple computing paradigms, like ANNs, approximate reasoning, fuzzy set theory, and derivative-free optimization methods like simulated annealing (SA) and GAs. DM has the potential to extract information from data that has existed for decades. A fundamental challenge for DM these days is the increasing amount of data. However, the simultaneous growth of computer processing and storage/database technologies offers considerable mitigation. ANNs, due to their biological roots, offer a myriad of advantages, and this makes them one of the key methodologies for modern DM. Some important features and challenges include:

  • ANNs require no prior knowledge about data, which is not true for traditional statistical methods.

  • These networks are highly adaptive and pick up the characteristics or patterns of the data. This makes ANNs ideal for real-world problems.

  • ANNs are robust and fault-tolerant which means they are capable of handling incomplete or noisy data. Because the networks are highly intricate, and there are many parallel ANs, information is stored in a distributed way. Therefore, there is no significant effect on the overall performance even if the information at one or a few nodes is lost, or a connection is damaged (Du and Swamy 2006). Also, the performance of ANNs can be improved by updating the connection weights.

  • ANNs are also very useful for nonlinear modelling, and each neuron within a network can have either linearity or nonlinearity depending on the activation function.

  • ANNs consist of many parallel processing units, each of which performs various mathematical operations. This offers higher calculation speeds and allows parallel software and hardware implementations such as very-large-scale integration (VLSI) technology. It also provides a means of capturing complex behaviours.

  • ANNs have a black-box nature. This means that even after successful training, no information suitable for verification or interpretation by humans is available. Thus, it is a challenge to gain an understanding of how the network handles unknown inputs.

  • As datasets increase in size, ANNs are gaining much popularity. However, it is important to select the inputs used for training an ANN carefully, since not all variables are equally informative; uninformative variables only contribute noise and reduce the accuracy of the model. They may also make complex models harder to understand and increase the computational complexity and memory requirements of the model.

Therefore, a better understanding of this realm of data mining can go a long way in gaining new and powerful insights into our agricultural systems, thereby contributing to their improvement.

Conclusions

Artificial neural networks offer a lot of promise in animal sciences. They have been used by various researchers in diverse areas of this field with success. Thus, ANNs hold the potential to solve the pressing issues associated with animal husbandry globally. ANNs promise to improve aspects of animal sciences by drawing hitherto unknown inferences that were not possible using conventional techniques. Using ANNs to predict the genetic merit of animals earlier and more efficiently would transform the sector.