As part of the EXPOSED project, operational data were gathered at three different aquaculture locations. Each of the locations was exposed to weather and harsh environments at a level well above average in the industry.
Reports
The operators of the sites were tasked with recording, for each day, whether each of a set of operations could be performed or whether the weather or environment made it too challenging to execute safely.
The operations considered each day were:
- Go out to the site (the personnel typically do not live/sleep at the location);
- Perform a daily inspection: go out on the seacage structure and inspect the structure itself, as well as the fish, to ensure sufficient fish welfare;
- Use a crane on a boat in relation to the location structures. Typically, strong winds or waves make operating a crane from a boat very difficult because the length of the crane amplifies the movement generated by the waves on the boat;
- Use a winch from a boat to operate at the location (e.g. pull up different parts of the underwater structure);
- Operate a wellboat on location. Wellboats are used to collect fully grown fish for slaughter or deliver spawn to the fish farm cages;
- Perform delicing operations on location. Typically, delicing is performed with wellboats using either chemical, temperature or mechanical methods;
- Deliver fish food and freshwater via a feedboat without dynamic positioning (DP) equipment; and
- Deliver fish food and freshwater via a feedboat equipped with DP equipment.
In addition, if any operation was deemed too difficult because of the weather or environment, each of the reports had to specify whether the limitation was related to winds, waves, currents or a combination of these factors that hindered the operation. Table 1 shows example reports from four days over two weeks. In total, 708 reports were recorded from 05.12.2016 to 30.12.2018.
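To make the structure of a daily report concrete, the following is a minimal Python sketch of one possible record layout. The class name `DailyReport`, the field names and the operation identifiers are hypothetical and chosen only for illustration; the source does not specify how the reports were stored.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical identifiers for the eight operations listed above.
OPERATIONS = (
    "go_to_site", "daily_inspection", "crane", "winch",
    "wellboat", "delicing", "feedboat_no_dp", "feedboat_dp",
)
# The three limiting factors a report can name.
CAUSES = ("wind", "waves", "current")


@dataclass
class DailyReport:
    site: str
    day: date
    # Maps an operation name to the set of causes that prevented it.
    # A missing or empty entry means the operation was feasible.
    failures: dict = field(default_factory=dict)

    def failed(self, operation):
        """True if the operation could not be performed that day."""
        return bool(self.failures.get(operation))

    def any_failure(self):
        """True if at least one operation was reported as infeasible."""
        return any(self.failed(op) for op in OPERATIONS)

    def as_binary_row(self):
        """Binary features in OPERATIONS order, 1 meaning failure."""
        return [1 if self.failed(op) else 0 for op in OPERATIONS]


report = DailyReport("site_a", date(2017, 1, 10), {"crane": {"wind", "waves"}})
row = report.as_binary_row()
```

Keeping the causes as a set per operation allows a report to record a combination of wind, waves and currents, while `as_binary_row` yields the per-operation binary encoding described in the Table 1 caption.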
Table 1 Example of two cases from different aquaculture sites from two days over a period of two calendar weeks. The features of the cases are binary, where “1” (highlighted in bold in the table) indicates a failure for that operation at that time and location.
Weather reports
The Norwegian Meteorological Institute provides historical records of weather data through its API (footnote 2). This API provides recorded weather data from the closest weather station to a given point in Norway. Thus, we could collect wind speed and wind direction at the location and time for each report. However, weather stations and their sensors fail from time to time, so for some days, the closest operational weather station may be farther away from the location of the aquaculture operation than it is on other days. We therefore calculated the distance from the weather station to the location for each report and included it as a feature in the dataset.
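The distance feature described above can be sketched as follows. This is an illustrative example only: it assumes the great-circle (haversine) distance, which the source does not name, and the station records are plain dictionaries with hypothetical keys (`id`, `lat`, `lon`, `operational`) rather than actual responses from the weather API.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_operational_station(site_lat, site_lon, stations):
    """Return (station_id, distance_km) for the closest working station.

    `stations` is a list of dicts with the hypothetical keys
    'id', 'lat', 'lon' and 'operational'. Returns None if none is working.
    """
    working = [s for s in stations if s["operational"]]
    if not working:
        return None
    best = min(
        working,
        key=lambda s: haversine_km(site_lat, site_lon, s["lat"], s["lon"]),
    )
    return best["id"], haversine_km(site_lat, site_lon, best["lat"], best["lon"])


stations = [
    {"id": "near", "lat": 63.1, "lon": 8.0, "operational": False},
    {"id": "far", "lat": 64.0, "lon": 8.0, "operational": True},
]
station_id, distance_km = nearest_operational_station(63.0, 8.0, stations)
```

In this toy example the nearer station is out of service on the day in question, so the report falls back to the station roughly 111 km away, and that larger distance becomes the value of the distance feature.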
Exposure level and wind effect
The EXPOSED project has produced a dataset [15] that describes the degree of exposure for most of the aquaculture installations in Norway. The dataset provides an exposure level for each of the 360 degrees around the installation. One such installation and its exposure is illustrated in Fig. 1. The exposure level is quantified in the range from 0 to 1, where 0 represents that the installation is shielded by a landmass and 1 represents no land within 40 km. From this dataset, we can look up the exposure level in the direction of the wind at any point in time.
It is intuitive to incorporate the exposure level data into the cases so that a learned similarity function can compare levels of exposure between sites when computing the similarity between operational situations. Including all 360 data points per site in every report would be counterproductive for several reasons. First, the exposure data do not change over time for each site. Second, only a small portion of the exposure data in the direction of the wind on a particular day influence the operations for that day (being exposed in the direction of no wind has little effect). Our solution is to combine the exposure level with the wind direction at the location and the time of the report. In this way, the learned similarity function can consider the exposure level in the same direction as the wind direction at that time. This approach is implemented as a lookup function that returns the exposure level for a given wind direction. To make the function smoother in terms of the wind effect, we add a Gaussian filter. This addition will let neighboring exposure levels influence the calculated wind effect. The lookup function gf is defined as follows:
$$ gf(wd,el,wis) = G(wis+1) \bullet el(wd,wis), $$
(1)
where G(n) returns a Gaussian filter of size n as a vector and el(wd,wis) returns the exposure level in the wind direction wd together with the wis exposure levels adjacent to it, giving wis + 1 values in total. Thus, the model considers the level of exposure adjacent to the wind direction and not just the single degree in the direction of the wind. In our model, we set this window wis to 10, thus accounting for the exposure level within ± 5 degrees of the wind direction. This vector of wind exposure levels in and near the wind direction can then be combined with the wind speed to obtain the wind effect. This value is defined as we:
$$ we(w,wd,el,wis) = gf(wd,el,wis) \cdot w, $$
(2)
where w is the wind speed at the site at that time and all other function parameters are as defined in (1).
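Equations (1) and (2) can be sketched in Python as follows. Several details are assumptions made for illustration and are not stated in the source: the operator in (1) is read as a dot product that yields a single smoothed exposure value, the Gaussian weights are normalized to sum to 1, the standard deviation defaults to a quarter of the filter size, the exposure data are a list of 360 values indexed by integer degree, and the window wraps around at 0/360 degrees.

```python
import math


def gaussian_kernel(size, sigma=None):
    """Gaussian weights of length `size`, normalized to sum to 1."""
    if sigma is None:
        sigma = size / 4.0  # assumption: the source gives no sigma
    center = (size - 1) / 2.0
    weights = [math.exp(-0.5 * ((i - center) / sigma) ** 2) for i in range(size)]
    total = sum(weights)
    return [x / total for x in weights]


def exposure_window(exposure, wd, wis):
    """Exposure levels at wd and wis // 2 degrees to either side (wraps)."""
    half = wis // 2
    n = len(exposure)
    center = int(round(wd)) % n
    return [exposure[(center + k) % n] for k in range(-half, half + 1)]


def gf(wd, exposure, wis):
    """Eq. (1), read as a dot product of the filter and the window."""
    window = exposure_window(exposure, wd, wis)
    kernel = gaussian_kernel(len(window))  # wis + 1 values for even wis
    return sum(g * e for g, e in zip(kernel, window))


def wind_effect(w, wd, exposure, wis=10):
    """Eq. (2): smoothed exposure in the wind direction times wind speed."""
    return gf(wd, exposure, wis) * w


fully_exposed = [1.0] * 360
open_to_north_only = [1.0] + [0.0] * 359  # exposed only at 0 degrees

effect_open = wind_effect(12.0, 350, fully_exposed)
effect_near_north = wind_effect(12.0, 358, open_to_north_only)
effect_south = wind_effect(12.0, 180, open_to_north_only)
```

With normalized weights, a fully exposed site yields a wind effect equal to the wind speed, and a wind from 358 degrees still picks up part of the exposure at 0 degrees because the ± 5 degree window wraps around north.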
Case definition and case base population
For all eight different types of operations listed in Section 3.1, there can be four different outcomes: no failure or failure because of wind, waves or current. As a result, 4 × 8 = 32 classes exist, which are too many classes to learn to separate from 708 data points. However, from the perspective of a DSS user in the setting of aquaculture operation planning, a general prediction of operational failure is useful. Thus, grouping the failure types and causes reduces the resolution but retains most of the utility of AQCBR as a DSS. After grouping all the failures, we can evaluate the AQCBR’s ability to predict failures related to weather. Given that these operational failures seldom occur, the dataset is unbalanced, with 88% of all cases not reporting any failures. Given a failed operation, it is highly likely that higher winds from the same direction will also cause failures. Thus, it is simple to generate realistic failure cases from the existing failure cases and expand the training dataset. To generate a realistic case, we pick a random failed operation and add a small random value to the wind speed, while making sure that the selected data point is not noise, i.e., a failure associated with a low wind speed (see Fig. 2). Figure 2 shows a pair plot for a subset of the case features, with the cases colored according to class (failure/success). The pair plot shows that most failure cases occur due to high wind speeds but that some occur at low wind speeds. The failure cases of the latter type are not considered during the rebalancing of the dataset.
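The rebalancing step described above can be sketched as follows. This is only an illustration: the dictionary keys, the wind-speed threshold used to exclude low-wind failures, and the maximum random increase are hypothetical values, since the source does not report the ones actually used.

```python
import random


def augment_failures(cases, n_new, min_wind_speed, max_increase=2.0, seed=0):
    """Generate synthetic failure cases from existing high-wind failures.

    `cases` is a list of dicts with the hypothetical keys 'wind_speed'
    and 'failure' (1 = failure, 0 = success). Failures below
    `min_wind_speed` are treated as noise and never used as a starting point.
    """
    rng = random.Random(seed)
    seeds = [
        c for c in cases
        if c["failure"] == 1 and c["wind_speed"] >= min_wind_speed
    ]
    if not seeds:
        return []
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(seeds)
        new_case = dict(base)  # copy, so the original case is untouched
        # Stronger wind from the same direction is assumed to fail as well.
        new_case["wind_speed"] = base["wind_speed"] + rng.uniform(0.0, max_increase)
        synthetic.append(new_case)
    return synthetic


cases = [
    {"wind_speed": 2.0, "wind_direction": 90, "failure": 1},   # low-wind noise
    {"wind_speed": 15.0, "wind_direction": 270, "failure": 1},
    {"wind_speed": 20.0, "wind_direction": 10, "failure": 0},
]
new_cases = augment_failures(cases, n_new=5, min_wind_speed=10.0)
```

Any feature derived from the wind speed, such as the wind effect in (2), would need to be recomputed for the synthetic cases. To avoid leaking synthetic copies of test cases into training, a natural choice is to apply the augmentation to the training split only.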
We can now define the case base for AQCBR. Formally, the case base data are denoted as d = x_1, x_2, …, x_n, where x_i is one report containing success or failure (sf) information for an operation. Furthermore, let el = el_1, …, el_n be the dataset of exposure levels, where el_i corresponds to the exposure level at the location of report x_i. Let w = w_1, …, w_n be the dataset of weather reports collected for these sites, where w_i corresponds to the report x_i. These weather reports contain wind speed (ws), wind direction (wd) and distance to weather station (di) information. Thus, a case can be represented as:
$$ C_{i}(x_{i},el_{i},w_{i}) = w_{i}(ws,wd,di),\; we(w_{i}(ws),w_{i}(wd),el_{i},wis),\; x_{i}(sf), $$
(3)
where we(⋅) is defined by (2) and wis is the window size for the wind effect (how much the exposure levels to either side of the wind direction are taken into account). Case bases are then split into testing (querying) and training sets based on stratified cross-validation to evaluate the AQCBR method (see Section 5). Example cases following this definition are given in Table 2.
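The stratified split mentioned above can be illustrated with a small pure-Python sketch. The number of folds, the random seed and the function name are assumptions made for illustration; the source does not state how the cross-validation was implemented.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Partition case indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the indices of each class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# Toy labels mirroring the reported imbalance: 88% success (0), 12% failure (1).
labels = [0] * 88 + [1] * 12
folds = stratified_folds(labels, k=4)

# In each round, one fold holds the query cases and the others the case base.
test_indices = folds[0]
train_indices = [i for fold in folds[1:] for i in fold]
```

Because the failure class is small, stratification ensures that every test fold contains some failure cases, which an unstratified random split would not guarantee.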
Table 2 This table shows two example cases from the recorded data used for the training and testing performed in this work