Time series analysis of cow milk production at Andassa dairy farm, West Gojam Zone, Amhara Region, Ethiopia

Milk production is an integral part of Andassa agricultural farming system, even though the area has potential for milk and dairy products, there is always a great demand for milk and milk products among people. However there are no long-term researches done on the area for forecasting the volume of milk production. Therefore this study was attempted to investigate the trends of actual yield of cow milk production and forecast the volume of milk. A time series study was conducted on the volume of cow’s milk production in Andassa dairy farm, and here we used secondary data for this study. The study employed inferential statistics for the purpose of hypothesis testing, estimating the trend, to fit the appropriate model and forecasting. A total of 179 days of milk production were included in this study, and the results showed that the trend of milk yield of caws is decreasing and it is time dependent. The model that has been selected for forecasting volume of milk production is ARIMA (1, 2, 1) since the model was found to be a better model.


Background
Milk is used to hydrating dairy products such as butter, cheese and milk powered. There is always a great demand for milk and milk products among people. Most countries to fulfill the needs of the people expands their production system and techniques to increase the supply of milk production (Haenlein and Wendorff 2006). A number of technological innovations now permit automated daily monitoring of cow performance. So use of automated milk yield recording systems for early detection of diseases requires a statistical model to forecast expect performance and to compare forecast and actual performance. Previous researches on modeling milk production in cows had focused on fitting linear or nonlinear deterministic models to daily, weekly, or monthly milk measurements from lactations sing either partial or complete lactation data sets (Jacobs and Siegford 2012).
It is essential that researchers and dairy development agents have to understand the existing situations, to design relevant development strategies that to the area (Tassew and Seifu 2009). Hence, milk production is an integral part of Andassa agricultural farming system, and the area has favorable climate and potential for milk and dairy products, so the level of its milk production is low when compared to some dairy sectors of tropical countries. So to design relevant development strategies that appropriate to the area, it is essential that to have in-depth understanding about the past behavior of the sector so that this will helps to evaluate the strengths and weaknesses of strategies adopted in the past and also provides the necessary basis for setting future growth targets. This study was designed to; examine daily trend analysis of milk production, to fit the appropriate model, to forecast the milk production for the future based on the previous recorded data.

Data and methods
This study was conducted at Andassa dairy farm which is located around Bahir Dar in north western Ethiopia, Here in Andassa dairy farm, milk productions were recorded in litters per cow on daily basis and the data provided information on volume of daily milk recorded in litters per cow, so this study was employed only secondary data obtained from this dairy sector. Time series analysis method was employed for analyzing this study.

Time series method
Time series analysis comprises methods or processes that breakdown a series into components and explainable portions that allows trends to be identified, estimates and forecasts to be made (Kantz and Schreiber 2004). Basically time series analysis attempts to understand the underlying context of the data points through the use of a model to forecast future values based on known past values. Such time series models include MA, AR, ARIMA, GARCH, TARCH, EGARCH, FIGARCH, CGARCH and ARIMA among others but the main focus of this study is based on MA, AR, ARMA, and ARIMA models.

Test of randomness
This test is used to identify whether there is any systematic components in the time series data or the data are dependent in time t. The main concern is on the assumptions that the data are independent and identically distributed (Beck and Katz 2011

Trend analysis
A time series which extends to consistently throughout the entire period of time under considerations and it shows longterm tendency of data to grow or to decline is known as trend analysis (Golyandina and Zhigljavsky 2013). In this case trend analysis employed to the past trend of milk production and to forecast the future amount of milk product. The trend model that are usually considered as the linear trend, in which the mean of Y t expected to change linearly with time as E ( Y t ) = β 0 + β 1t or as quadratic function of time E ( Y t ) = β 0 +β 1 t 2 or even possibly as an exponential functions of time such as E ( Y t ) = β 0 e β 1 . But, here we consider on linear trend and estimate based on least square estimation method, double moving average and double exponential smoothing methods. This paper used the least square estimations methods. In the mathematical form expressed as follows: (Figs. 4,5,6,7).
Then, the values of β 0 and β 1 can be calculated from the following normal equations.
By, rearrange the equations simply to calculate the unknown parameters of β 0 and β 1 .
where, the values of ∑ Y t , ∑ t, ∑ tY t and ∑ t 2 are obtained from the given collected data.

Stationary time series
Stationary is a critical assumption in time series models. Stationary implies homogeneity in the sense that the series behaves in a similar way regardless of time, which means that its statistical properties do not change over time. More precisely, stationary means that the data fluctuate about a mean or a constant level. Many time series models assume equilibrium processes. However in practice, a much weaker definition of stationary, often referred to a weak stationary is used. The condition of weak stationary requires that the elements of the time series should have a common finite expected value and that the auto covariance of two elements should depend only on their temporal separation. Generally stationary time series is one whose statistical properties such as mean, variance and autocorrelation are all constant over time. Mathematically, these conditions are: Mean = μ and variance = б 2 are constant for all values of t (Miettinen et al. 2016).

Stationary through differencing
The first difference of a time series is the series of changes from one period to the next. If Y (t) denotes the value of the time series Y at period t, then the first difference of Y at period t is equal to

General linear processes
Let { Y t }denotes the observed time series and also { e t } represents an observed white noise series, that is sequence of identically distributed with zero mean and independent random variable. If the right side of this expression is truly an infinite series, then certain conditions must be placed on the Ψ weights for the right hand side to the meaningful mathematical expression. For our process, it suffices to assume that,

Partial autocorrelation function for average daily milk production
We should also note that since { e t } is an unobservable, there is no loss in the overview of the above equation if we assume that the coefficient on e t is 1; effectively Ψ 0 =1 (Box et al. 2015).

Moving average process
In the situation where only a finite number of Ψ weights are non-zero, we called it a moving average process. In this case, we change the notation slightly and we can write as this is moving average of order q and abbreviated as MA (q).The terminology of moving average arises from the fact that Y t is obtained by applying the weights 1, -1, -2 ,…−q to the variables,e t−1 ,e t−2, … e t−q and then moving the weights and applying to, e t+1 ,e t ,e t−1 , …., e t−q+1 to obtain Y t−1 and so on.
For the general MA (q) process Y t = e t -1 e t−1 − 2 e t−2 -……q e t−q , similar calculations shows that Otherwise 0 for k > q, where the numerator of q is just q. The autocorrelation function "cuts off" after lag q, and its shape can be almost anything for the earlier lags (Fan and Yao 2008).

Autoregressive process
Autoregressive models are based on the idea that the current value of the series,x t , can be explained as a function of past values,x t−1 ,x t−2 ...x t−p , where "p" determines the number of steps into the past needed to forecast the current value. An autoregressive of order "p" can be written as: where x t is stationary series,ϕ 1 , ϕ 2 , …, ϕ p are the parameters of the AR (ϕ p ≠ 0) (Lütkepohl et al. 2004).

Autocorrelation function (ACF)
It is known that the key properties of any random variable are the two moments, namely its mean and the variance, and as we assumed in the introduction that a time series is a realization of a stochastic process this also applies for any time series. But in the context of time series analysis, the relationships between observations in different time-periods play also a very important role. These relationships across time can be captured by the time series correlation, respectively (Brockwell and Davis 2016).

Partial autocorrelation function (PACF)
The partial autocorrelation function ( π k ), where k ≥ 2, is defined as the partial correlation between Y t and Y t−k under holding the random variables in between Y u , where t-k < u > t, constant. The partial autocorrelation plot or partial correlogram is also commonly used for model identification in Box and Jenkins models (Inoue 2008).

ARMA model
ARMA model is a mixed model in which the series is partly auto regressive and partly moving average, we obtain a quite general time series model in general the model is:- We say that { Y t } is a mixed auto regressive moving average process of order p and q, respectively, we abbreviate the name to ARMA (p, q) (Tsay and Tiao 1984).

Box Jenkins modelling method
Box Jenkins model is one of the classes of model to choose the systematic approach to identifying the correct model form (Hipel et al. 1977). There are statistical tests for validity of the model using this test it is possible to identify the best appropriate Box Jenkins model, to fitting the data on the milk production at Andassa dairy farm. To use Box Jenkins method for the time series, data must be stationary after one or differencing for the reason to assuming stationery if to identify and estimate a statistical model, which can be interpreted as having the generated for the sample data. After finishing the process of stationary we used autocorrelation and partial autocorrelation functions to identify the model which is the good fit to the collected data. From this fitted model parameters are estimated by minimizing the residuals sum of squares. Finally, using these parameters, it is easy to calculate the increments or decrements of milk production at Andassa dairy farm. Therefore, in this study we identified an appropriate Box-Jenkins process or model, fitting to the data and then using the fitted model for application based on the data that comes from Andassa dairy farm.

Descriptive summary
A common first step in data analysis is to summarize information about variables in the data using descriptive statistics such as the minimum, maximum, averages, variances, quartiles and percentiles, of variables set. We were collected secondarily data of the daily milk production of caw at Andassa dairy farm. There are 60 caws in the farm, but from this number of caws we selected 20 caws which have the same day of milking and we took the average daily milk production for 179 days of these caws.
As we look from the above average daily milk production table the minimum and maximum record of milk production at Andassa dairy farm is 715.0 and 2295.0, respectively, observed in the day 11/12/2017 and 11/2/2017 respectively, and the mean for the average daily milk production is 1416.9.

Test of randomness using different sign test
Test of hypothesis, H0: The data of daily recorded in the average milk production are random.
H1: The data of daily recorded in the average milk production are not random.

From thus we have the number of points increase
Therefore: Z cal = 81−89 3.87 =-2.0672.
We reject the null hypothesis since | Z cal |= 2.0672 > Z 2 =1.96.Thus, at five percent level of significance, the data of the average daily milk production is not random. This shows there is systematic pattern on the data and also it indicates, disseminations of the data are not balanced and it must be transformed in to random form using different methods.

Stationary time series
Stationary is a critical assumption in a time series models. Stationary implies homogeneity in the sense that the series behaves in a similar way regard less of time, which means that its statistical properties do not change over time. This can be judged by looking its time plot. And the time plot appears similar at different points along the time axis, and the time series plot is as shown below.
As we see from the above graph of average daily milk production of caw (Y), it is decreasing through time (t) and it is not stationary, so we must be transform in to stationary form of time series plot using differencing method.
Then after transforming the data by differencing in lag two it become stationary series as shown below.
As we see from the above time series plot the series fluctuates about the mean. This means that it does not rend away or drift away from the mean and the time plots appears similar at different points. Because of this reason the time series is stationary meaning, the series fluctuate of movement in the average daily milk production of caw is not far away from one to the other.

Trend and difference stationary
A random walk with or without a drift can be transformed to a stationary process by differencing (subtracting Y t−1 from Y t , taking the difference Y t − Y t−1 ) correspondingly to Y t -Y t−1 = e t or Y t −= Y t−1 α + e t and then the process becomes difference-stationary. The disadvantage of differencing is that the process loses one observation each time when difference is taken.
The trend analysis plot shows it is not stationary, and then it also needs differencing to be stationary. After differencing the data by lag two, the trend analysis becomes stationary as shown in the graph bellow.
As the fitted trend shows, the amount of average daily milk production is decreasing. The slope of the trend for 179 days data is − 16.066 which is the rate at which the amount average milk per day are going to decrease. This also indicates that, the amount of average daily milk decreases from time to time.

Moving average
After the data are transformed in to stationary form using differencing the observations, the moving average plot becomes as shown below.

Box-Jenkins model
Most statistical methodologies are developed by assuming that the observations are random but the data in which successive observations correlated in such a case methods designed to export ARIMA (p, d, q). ARIMA models are, in theory, the most general class of models for forecasting a time series which can be stationeries by transformations such as differencing and lagging. To identify the appropriate ARIMA model for a time series, we begin by identifying the order(s) of differencing needing to stationary the series.

Auto correlation function of daily milk production
It is known that the key properties of any random variable are the two moments, namely its mean and the variance. But in the context of time series analysis, as the definition of a time series indicates, the relationships between observations in different time-periods play a vital role. These relationships across time can be captured by the time series correlation. The plot of the ACF over time is used to identify the order and degree of non-stationary of an ARIMA model.
From the above autocorrelations graph there is lagged number of the series and here we must test the AR model to check the model adequacy. That is, the average milk production under the autocorrelations graph show there are one points outside the lower bound that is AR(1).

Partial autocorrelation function: for average daily milk production
The partial autocorrelation plot or partial correlogram is also commonly used for model identification in Box and Jenkins models.
Similarly, from the above partial autocorrelation graph there is one lagged number. So we must test the MA model to check the model adequacy. That is, the average daily milk production under the partial autocorrelations graph shows there is a point outside the lower bounds that is MA (1). Meaning, disseminations' of the average daily milk production is not balance or equal.

ARMA process
ARMA is the combination of the autoregressive and moving average processes, so we have AR(1) and MA (1), we obtain a quite general time series model ARMA(1,1). (Tables 1, 2).

ARIMA process
One implication of above is that AR (p) process with a unit root can always be written as in the form AR (p − 1) process in differences. Such processes are often called auto-regressive integrated moving average processes ARIMA (p,d,q) where the d refers to how many times the data is differenced before it is an ARMA(p,q). Therefore the ARIMA model becomes:-ARIMA (p,d,q) = ARIMA (1,2,1). Differencing: 2 regular differences. Number of observations: Original series 177, after differencing 175.

Testing of parameter
The final estimates are those that minimize the sum of squared errors to a point where no other estimates can be found that yield smaller sum of squared errors. A model should have significant parameter as shown in the MINITAB output. ARIMA (1, 2, 1) have a p_ value which is less than the level of significance (α = 0.05). This means that the parameters are significantly different from zero, and have minimum sum squared error. Then, it has statistically significant parameters. So, the model is adequate.

Forecasting
After identification of the model and estimation of the parameters, the next step is forecasting with the appropriate model for the average daily milk production of caw at Andassa dairy farm. The appropriate model for this data is ARIMA (1, 2, 1) to obtain the point of forecast. The final model must be rewritten in the original form is described as follows.
.9977 e t−1 +e t 95 Percent limits. We can check the accuracy of the forecasted value using the above 95% confidence interval described as follows. All the forecast values are found between the lower and the upper interval then we can say that forecasted value is accurate (Table 4, 5).

Conclusions
The study was conducted using the data from secondary source. Based on the analysis conducted in this research on the average daily milk production, the following ideas are summarized. Based on the time series plot, the milk production is time dependent. The amount milk production at Andassa dairy farm the mean amount of the milk production is decreasing. The 179 day shows that the high variability in the amount of daily milk production than other days, this means that the amount milk is highly different from day to day. The minimum and maximum record milk production at Andassa dairy farm is 715.0 and 2295 observed in the day 11/12/2017 and 11/2/2017 respectively. The amount of milk is decreasing, since the slope of the trends is − 16.066. Generally daily milk production under the autocorrelations and partial autocorrelations graph shows it does not include the whole observation in the upper and lower bounds. Meaning, disseminations' of the milk production is not balance at Andassa dairy farm. For ARIMA (1, 2, 1) is a parameters which has a p value which is less than the level of significance (α = 0.05). This means the parameter is significantly different from zero, and has minimum sum squared error. Then, the parameter is statistically significant.

Recommendations
After observing all results of the study, the following recommendations can be made.
From the result of this study the amount of daily milk is decreasing from day to day. Even if the study is limited at Andassa dairy farm; the concerned body should take the following remedial actions.
Concerned bodies are recommended to take the remedial action that decreases the milk production. Since this decreasing of milk production may come from different factors so this factors must be study for the future to gain high amount of milk.
Regarding the dairy sector managers, data should be well organized and reported to have role in further studies so that the organization will beneficial.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creat iveco mmons .org/licen ses/by/4.0/.