1 Introduction

Visible light communication (VLC) is widely used in underwater applications because of its high reliability, large bandwidth, and high data rate. Channel models for VLC systems are derived mathematically according to the features of the water type (Shawky et al. 2020; Ehiremen et al. 2020; Tang et al. 2014; Yiming et al. 2018). For many years, underwater systems relied on radio frequency (RF) and acoustic waves. Acoustic systems are limited in spectrum and suffer from high complexity and high latency. RF signals are strongly attenuated even over short ranges, so they cannot achieve long-range localization (Mapunda et al. 2020).

Conventional localization techniques for VLC systems include the time of arrival (TOA) technique, which depends on the actual arrival time of the received optical signal (Wang et al. 2013); the angle of arrival (AOA) method, which uses the intersection of the direction lines of the received signals (Islam et al. 2012; Yang et al. 2014; Sahin et al. 2015); the time difference of arrival (TDOA) technique, which is based on the differences among the arrival times of signals from at least three transmitters (Jung et al. 2011); and the received signal strength (RSS) method, which measures the received optical power from more than two transmitters (Islam et al. 2012; Yang et al. 2014; Xin Su et al. 2020; Ullah et al. 2019; Sahin et al. 2015; Jung et al. 2011).

In (Vegni et al. 2021), the authors presented a foot-printing positioning method that depends on the optical signal in the VLC spectrum. This approach combines RF and VLC networks. The RF sensor holds a map of the channel gain in the environment, which is used to estimate the location. The technique was evaluated in turbid scenarios, where it reached centimeter-level accuracy. Underwater localization using VLC was presented in (Alzahraa Ghonim et al. 2021) based on neural networks (NNs) and RSS, and was carried out in two stages: data collection and NN training. First, data were collected using Zemax and Monte Carlo ray tracing software. The channel gains were used as the input data set of the NN, and the outputs of the NN were the coordinates of the detector obtained with the RSS intensity method. Second, an NN system was built and trained with the aid of the Orange data mining software.

Similarly, many studies used deep learning models (DLMs) in conjunction with VLC systems to improve system performance (Chaleshtori et al. 2019; Ma et al. 2019; Irshad et al. 2019; Alonso-González et al. 2018). Chaleshtori et al. investigated the effect of training algorithms on an ANN equalizer in VLC systems using an organic light source. Ma et al. investigated the design and implementation of machine learning (ML) based demodulation methods in the physical layer of VLC systems (Zhang et al. 2016). By analyzing the mobility patterns of water near the seashore, Zhang et al. proposed a localization approach for underwater acoustic wireless sensor networks based on mobility prediction and a particle swarm optimization algorithm in (Saeed et al. 2018). Saeed et al., on the other hand, proposed an RSS-based localization method for underwater optical wireless sensor networks (Teruyama et al. 2013).

A hyper-parameter (HP) is an ML parameter that must be set before the training process begins. Unlike ordinary parameters (e.g., weights), whose values are learned during training, HPs (e.g., learning rate, batch size, and number of hidden nodes) cannot be learned during training. HPs affect the quality of the model produced by the training process, as well as the time and memory requirements of the algorithm (Mai et al. 2019). HPs must therefore be tuned to produce the best possible results in any given situation. This tuning can be performed manually or automatically. Popular approaches to automatic HP tuning include grid search, random search, and Bayesian optimization. Unlike grid or random search, Bayesian optimization chooses the most promising HP values based on a probabilistic model of the objective function (Mai et al. 2019).

This paper proposes an underwater VLC system that improves localization accuracy using two channel models for sea water: WDGF and CEAPF. In terms of localization accuracy, the Kalman filter (KF) with averaging outperforms the RSS average method, and we use DLMs to improve the accuracy further. Our proposed framework determines the 2D position using different DLMs, namely SSD, RetinaNet, ResNet50V2, and InceptionResNetV2, to approximate the Cartesian coordinates. The framework uses the \((x,z)\) coordinates to process a grid of RSSs, and the input to the DLMs is the received signal power. This paper is concerned with underwater localization, specifically how to find a diver or any other object. The proposed system can determine the coordinates of an object in sea water while accounting for absorption and scattering. The system is feasible in hardware owing to its high precision, low cost, and low computational complexity. Furthermore, Bayesian optimization-based hyper-parameter (HP) tuning is used to improve our framework. The framework consists of two phases. First, data are collected using MATLAB software. Second, the DLMs (SSD, RetinaNet, ResNet50V2, and InceptionResNetV2) are trained and tested. The channel gain is the DLMs' input data set, while the DLMs' output is the coordinates of each detector obtained with the RSS intensity technique. The DLMs are developed and trained using Python software.

This paper is organized as follows. Section 2 discusses the optical channel modeling of an underwater VLC system, including the impulse response modeling. Section 3 presents the localization techniques: the proposed averaging technique, the KF algorithm, and localization based on DLMs. Simulation results are presented and discussed in Sect. 4. Finally, Sect. 5 concludes the work.

2 Underwater optical channel

According to (Tang et al. 2014; Yiming et al. 2018), scattering and absorption are the main effects governing light propagation in underwater systems, and both are wavelength dependent. The loss in intensity is caused by absorption, which depends on the water refractive index. The spectral absorption coefficient, \(a\left(\lambda \right)\) in \(\mathrm{m}^{-1}\), is the intrinsic optical property used to model water absorption. The scattering coefficient, \(b\left(\lambda \right)\) in \(\mathrm{m}^{-1}\), describes the deflection of light from its original path, caused by diffraction or by substances with different refractive indices. The extinction coefficient, \(c\left(\lambda \right)\) in \(\mathrm{m}^{-1}\), is defined as the sum of \(a\left(\lambda \right)\) and \(b\left(\lambda \right)\):

$$c(\lambda ) = a(\lambda ) + b(\lambda )$$
(1)

Water types differ in both the matter they contain and their optical quality. The work in (Yiming et al. 2018) lists the coefficient values for different types of water: pure, ocean, coastal, and harbor water.

2.1 Optical characterization of sea water

According to (Yiming et al. 2018), the absorption coefficient is the sum of four components:

$$a(\lambda ) = {a}_{w}(\lambda ) +{a}_{phy}(\lambda ) + {a}_{det}(\lambda ) + {a}_{CDOM}(\lambda )$$
(2)

Here, \({a}_{w}(\lambda )\) is the absorption coefficient of pure sea water, \({a}_{CDOM}(\lambda )\) and \({a}_{phy}\left(\lambda \right)\) are the absorption coefficients due to chromophoric dissolved organic matter (CDOM) and phytoplankton, respectively, and \({a}_{det}(\lambda )\) is the absorption coefficient due to detritus.

Our results are obtained for sea water, and the WDGF and CEAPF channel models are likewise parameterized for sea water. We do not examine whether the performance is the same for other water types; this is left for future work.

Scattering reduces the intensity of the received signal and introduces intersymbol interference when the bit rate is not lowered to accommodate the temporal spreading. Scattering is more pronounced at short wavelengths and is caused by the large number of different particles in the water (Yiming et al. 2018). The scattering coefficient is likewise the sum of four components:

$$b(\lambda ) = {b}_{w}(\lambda ) + {b}_{phy}(\lambda ) + {b}_{det}(\lambda ) +{b}_{CDOM}(\lambda )$$
(3)

Here, the scattering parameter related to CDOM is expressed as \({b}_{CDOM}(\lambda )\), while the scattering for phytoplankton is \({b}_{phy}(\lambda )\). \({b}_{w}(\lambda )\) and \({b}_{det}(\lambda )\) represent scattering due to pure sea water and detritus, respectively.
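As a minimal illustration of Eqs. (1)–(3), the following sketch sums per-constituent absorption and scattering coefficients into the extinction coefficient. The numerical values are placeholders chosen only for illustration; they are not measurements from this work or from the cited references, and the function and dictionary names are hypothetical.

```python
def total_coefficient(components):
    """Sum the per-constituent coefficients (in 1/m) at one wavelength."""
    return sum(components.values())


# Placeholder values in 1/m at a single wavelength; NOT measured data.
absorption = {"water": 0.05, "phytoplankton": 0.03, "detritus": 0.02, "cdom": 0.01}
scattering = {"water": 0.003, "phytoplankton": 0.10, "detritus": 0.05, "cdom": 0.0}

a = total_coefficient(absorption)  # Eq. (2)
b = total_coefficient(scattering)  # Eq. (3)
c = a + b                          # Eq. (1)

print(f"a = {a:.3f} 1/m, b = {b:.3f} 1/m, c = {c:.3f} 1/m")
```

In practice each dictionary entry would be a function of wavelength taken from tabulated data for the water type under study.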

The value of the extinction coefficient, \(c\left(\lambda \right)\), varies with water depth and water type. The optical properties of water can be classified into two main groups: inherent and apparent. The inherent properties depend only on the medium, more specifically on its composition and the particulate substances present. The apparent properties depend on both the transmission medium and the geometric structure of the illumination, so they are directional.

2.2 Impulse response in the channel modeling

The WDGF, originally used for the impulse response in clouds, is adopted in (Mapunda et al. 2020) to characterize the channel impulse response (CIR). Although this model performs well with its four degrees of freedom, the characteristics of clouds differ from those of water.

The CIR can be expressed as (Mapunda et al.2020)

$${h}_{1}\left(t\right)= {c}_{1}t{e}^{-{c}_{2}t}+{c}_{3}t{e}^{-{c}_{4}t}$$
(4)

where \({c}_{1}\), \({c}_{2}\), \({c}_{3}\) and \({c}_{4}\) are four factors to be obtained through the Monte-Carlo simulations.
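To make the shape of Eq. (4) concrete, the sketch below evaluates \(h_1(t)\) on a time grid and locates its peak. The coefficient values and the time grid are hypothetical placeholders, not the fitted values of Table 2, and the function name is illustrative only.

```python
import math


def wdgf_cir(t, c1, c2, c3, c4):
    """Eq. (4): h1(t) = c1*t*exp(-c2*t) + c3*t*exp(-c4*t), for t >= 0."""
    return c1 * t * math.exp(-c2 * t) + c3 * t * math.exp(-c4 * t)


# Hypothetical coefficients (arbitrary time unit); NOT fitted values.
coeffs = dict(c1=1.0, c2=2.0, c3=0.5, c4=0.5)

times = [0.1 * k for k in range(101)]  # grid from 0 to 10
h = [wdgf_cir(t, **coeffs) for t in times]
peak_time = times[h.index(max(h))]

print(f"h1 peaks near t = {peak_time:.1f} on this grid")
```

Each term of the form \(c\,t\,e^{-kt}\) rises from zero, peaks at \(t = 1/k\), and then decays, so the two terms can capture a sharp early peak and a slower tail of the received pulse.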

The CEAPF-based model, \({h}_{2}(t)\), was recently proposed in (Wang et al. 2020). With \(a\) denoting the absorption coefficient of Eq. (2), it is expressed as

$${h}_{2}\left(t\right)= {c}_{1}\frac{{t}^{\alpha }}{{\left(t+{c}_{2}\right)}^{\beta }}{e}^{-avt}$$
(5)

where \({c}_{1} > 0\), \({c}_{2} > 0\), \(\alpha > -1\), and \(\beta > 0\) are the four parameters to be found and \(v\) is the speed of light in water. These parameters can be calculated from Monte-Carlo simulation results using the nonlinear least-squares criterion. Note that none of these CIR models is valid for water types in which scattering is not as dominant as in turbid environments. In addition, these models do not consider the channel path loss. Tables 1 and 2 list the main CIR parameters of sea water for different values of the field of view (FOV), for CEAPF and WDGF, respectively, where \(L\) refers to the height of the water.

Table 1 Parameters of CEAPF in different UOWC channels (Tang et al. 2014)
Table 2 Parameters of WDGF in different UOWC channels (Tang et al. 2014)

3 Methodology

3.1 Proposed localization methodology using averaging RSS technique

The conventional trilateration localization method is applied as modeled in (Shawky et al. 2020) to obtain the receiver location, using the RSS technique with the three LEDs that have the highest received levels (Teruyama et al. 2013). The proposal is to average the receiver locations estimated from a number of RSS measurements in order to reduce the localization error. Figure 1 shows the block diagram of this approach.

Fig. 1 Localization based on RSS averaging

Following (Shawky et al. 2020) and the RSS technique, the line-of-sight (LoS) power received from transmitter \(i \in \{1, 2, 3, 4, \ldots\}\) can be expressed as (Mai et al. 2019)

$${P}_{R,i}=\left(\frac{m+1}{2\pi {d}_{i}^{2}}\cos^{m+1}({\varphi }_{i}){A}_{R}\right){P}_{T,i}$$
(6)

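where \({P}_{T,i}\) is the power transmitted from the \(i\)th LED, \(m\) is the Lambertian order, \({d}_{i}\) is the distance between the \(i\)th LED and the receiver, and \({A}_{R}\) is the receiver area.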

Here, we assume that \({\varphi }_{i} = {\psi }_{i}\), where \({\varphi }_{i}\) is the irradiance angle and \({\psi }_{i}\) is the incidence angle. From the geometry in Fig. 1, this angle is given by (Teruyama et al. 2013)

$$\mathrm{cos}\left({\varphi }_{i}\right)=\frac{V}{{d}_{i}}$$
(7)

where \(V\) is the vertical separation between the receiver and the transmitter, assumed constant. Substituting Eq. (7) into Eq. (6) and solving for the distance \({d}_{i}\) between the receiver and the \(i\)th transmitter gives

$${d}_{i}=\sqrt[m+3]{\frac{(m+1){V}^{m+1}{A}_{R}{P}_{T,i}}{2\pi {P}_{R,i}}}$$
(8)
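The chain from received power to position can be sketched as follows. The functions for Eqs. (6) and (8) follow the text directly. The trilateration step is an assumption: it uses the common linearized form obtained by subtracting circle equations, which may differ in detail from the formulation in (Shawky et al. 2020). All names, LED positions, and parameter values are hypothetical.

```python
import math


def received_power(p_t, d, v, m, a_r):
    """Eq. (6) with cos(phi_i) = V / d_i from Eq. (7)."""
    return (m + 1) / (2 * math.pi * d ** 2) * (v / d) ** (m + 1) * a_r * p_t


def distance_from_rss(p_r, p_t, v, m, a_r):
    """Eq. (8): transmitter-receiver distance from the received power."""
    return ((m + 1) * v ** (m + 1) * a_r * p_t / (2 * math.pi * p_r)) ** (1 / (m + 3))


def trilaterate(leds, dists, v):
    """Linearized 2D trilateration from three LEDs (assumed textbook form)."""
    # Horizontal ranges follow from the slant distances and the fixed height V.
    r1, r2, r3 = (math.sqrt(max(d ** 2 - v ** 2, 0.0)) for d in dists)
    (x1, z1), (x2, z2), (x3, z3) = leds
    # Subtracting circle 1 from circles 2 and 3 gives two linear equations.
    a11, a12 = 2 * (x2 - x1), 2 * (z2 - z1)
    b1 = r1 ** 2 - r2 ** 2 - x1 ** 2 + x2 ** 2 - z1 ** 2 + z2 ** 2
    a21, a22 = 2 * (x3 - x1), 2 * (z3 - z1)
    b2 = r1 ** 2 - r3 ** 2 - x1 ** 2 + x3 ** 2 - z1 ** 2 + z3 ** 2
    det = a11 * a22 - a12 * a21  # zero when the three LEDs are collinear
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - b1 * a21) / det


def average_position(estimates):
    """Average several (x, z) estimates, as in the RSS averaging technique."""
    xs, zs = zip(*estimates)
    return sum(xs) / len(xs), sum(zs) / len(zs)


# Hypothetical setup: three LEDs, receiver at (0.5, 1.2), noise-free powers.
leds = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
true_pos, v, m, a_r, p_t = (0.5, 1.2), 1.5, 1, 1e-4, 1.0
dists = []
for lx, lz in leds:
    d_true = math.sqrt((true_pos[0] - lx) ** 2 + (true_pos[1] - lz) ** 2 + v ** 2)
    p_r = received_power(p_t, d_true, v, m, a_r)
    dists.append(distance_from_rss(p_r, p_t, v, m, a_r))
estimate = trilaterate(leds, dists, v)
print(estimate)
```

With noisy powers, `trilaterate` would be called once per measurement and the resulting positions passed to `average_position`.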

3.2 Localization using KF in conjunction with averaging

The KF algorithm estimates the state of a linear system from a series of noisy measurements, producing estimates of the unknown variables that are more accurate than those based on a single measurement (Mai et al. 2019). Here, the KF is used to enhance the estimate of the receiver location. First, the KF is applied to samples of the measured received power taken at different times, with a spacing of approximately 1 ns between samples. This interval, chosen following (Shawky et al. 2020), yields nearly identical samples and guarantees that the receiver does not change its position while the samples are collected, which improves the result of averaging with the KF. Then, the average of the estimated powers is calculated, and the receiver position is computed from this average power estimate with the RSS method. The block diagram of the KF with averaging technique is shown in Fig. 2a. Figure 2b illustrates the procedure of using the proposed KF as in (Shawky et al. 2020).

Fig. 2 (a) Proposed KF with averaging technique. (b) Block diagram of using KF with AVG method

3.3 Kalman algorithm

As applied in (Irshad et al. 2019), the channel is modeled as an auto-regressive (AR) process in a state-space model, in which the current value depends on past values. The scheme aims to enhance the estimation accuracy. In the KF, the state vector is denoted by \(x\). This vector describes the state of the received power over the samples in the process. Given the estimate at iteration \(k-1\), \({x}_{k-1/k-1}\), the state at the next iteration \(k\), \({x}_{k/k-1}\), is calculated in the predict and measurement stages (Teruyama et al. 2013) as follows.

Predict step:

$${x}_{k/k-1}={F}_{k}{x}_{k-1/k-1}+{v}_{k}$$
(9)

where \({F}_{k}\) is the state transition matrix and \({v}_{k}\) is the white process noise.

The corresponding state covariance matrix is given in (Teruyama et al. 2013).

$${P}_{k/k-1}= {F}_{k}{P}_{k-1/k-1}{F}_{k}^{T}+{Q}_{k}$$
(10)

where \({Q}_{k}\) represents the covariance of the noise process.

Measurement step:

The updated state variable, \(x_{k/k}\), and the updated state covariance matrix, \(P_{k/k}\), are given, respectively, by

$$ x_{k/k} = x_{k/k - 1} + K_{k} \left( {z_{k} - H_{k} x_{k/k - 1} } \right) $$
(11)
$$ P_{k/k} = \left( {I - K_{k} H_{k} } \right)P_{k/k - 1} $$
(12)

where \(H_{k}\) denotes the observation model and \(K_{k}\) is the Kalman gain, given by

$$ K_{k} = P_{k/k - 1} H_{k}^{T} S_{k}^{ - 1} $$
(13)

Here, \(z_{k}\) is the measurement vector given by

$$ z_{k} = H_{k} x_{k} + w_{k} $$
(14)

where \(w_{k}\) is the measurement noise.

Also, \(S_{k}\) is the innovation covariance matrix, which relates the covariance of the state variables to the measurement vector:

$$ S_{k} = H_{k} P_{k/k - 1} H_{k}^{T} + R_{k} $$
(15)

where \(R_{k}\) is the covariance of the observation noise.
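The predict and measurement steps of Eqs. (9)–(15) can be sketched for a scalar state, here the received power, followed by the averaging of the filtered values described in Sect. 3.2. This is a minimal sketch under stated assumptions: the state is one-dimensional, \(F_k\) and \(H_k\) default to 1, and the noise covariances, initial values, and sample powers are hypothetical placeholders rather than the settings used in this work.

```python
def kalman_1d(measurements, f=1.0, h=1.0, q=1e-5, r=1e-2, x0=0.0, p0=1.0):
    """Scalar Kalman filter following Eqs. (9)-(15); returns filtered states."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict step, Eqs. (9)-(10). The zero-mean process noise v_k
        # does not shift the predicted mean; it enters only through q.
        x_pred = f * x
        p_pred = f * p * f + q
        # Measurement step.
        s = h * p_pred * h + r             # innovation covariance, Eq. (15)
        k = p_pred * h / s                 # Kalman gain, Eq. (13)
        x = x_pred + k * (z - h * x_pred)  # state update, Eq. (11)
        p = (1 - k * h) * p_pred           # covariance update, Eq. (12)
        estimates.append(x)
    return estimates


def kf_average(measurements, **kwargs):
    """Average of the KF-filtered samples (KF with averaging)."""
    estimates = kalman_1d(measurements, **kwargs)
    return sum(estimates) / len(estimates)


# Hypothetical noisy received-power samples (arbitrary units).
samples = [1.02, 0.97, 1.05, 0.99, 1.01, 0.96, 1.03, 1.00]
avg_power = kf_average(samples, x0=samples[0])
print(f"KF-averaged power: {avg_power:.4f}")
```

The averaged power would then be inserted into Eq. (8) to obtain the distance used for positioning.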

3.4 Dataset used in DLMs

Before the optimized DLMs can be used, the dataset must be prepared. The dataset used in the optimized DLMs is based on the average received power. The average RSS technique employs the average received power of the samples at each position along the track. These samples of the measured received power are taken at different times, with a spacing of approximately 1 ns between samples so that the conditions do not change appreciably from one sample to the next. The average KF algorithm, on the other hand, uses the average of the received powers estimated by the KF algorithm for the samples.

3.5 Proposed DLM-based underwater localization

The pre-trained models are trained on a large benchmark dataset to solve various problems. In our study, four optimized pre-trained models (SSD, RetinaNet, ResNet50V2, and InceptionResNetV2) are used for underwater localization. The models differ in their convolution and pooling layers.

3.5.1 ResNet50V2 and InceptionResNetV2 DLMs

ResNet50V2 (Rodrigues et al. 2022) is an upgraded version of ResNet50. This architecture is based on skip connections, which feed the activations of one layer directly to a later layer. Inception-ResNet-v2 (Sarker et al. 2021) combines the Inception architecture with residual connections. For the ResNet50V2 and InceptionResNetV2 models, 2D average pooling is used during training to calculate the average of each patch of the feature map. The activations are then flattened into a feature vector and passed through two fully connected layers: one with 128 nodes and one with two outputs corresponding to the coordinates \((x,z)\). The activations of the second fully connected layer are fed into a SoftMax layer, which calculates the probability for each coordinate \((x,z)\). The DLM parameters are listed in Table 3.

Table 3 Parameters of DNN models
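The two operations named in the model head, 2D average pooling and SoftMax, can be illustrated without any deep learning framework. The sketch below is not the Keras implementation used in this work; it assumes non-overlapping pooling patches (stride equal to the pool size), and the function names and input values are hypothetical.

```python
import math


def average_pool_2d(feature_map, size):
    """Average each non-overlapping size x size patch of a 2D feature map."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(0, rows - size + 1, size):
        pooled.append([
            sum(feature_map[i + di][j + dj]
                for di in range(size) for dj in range(size)) / size ** 2
            for j in range(0, cols - size + 1, size)
        ])
    return pooled


def softmax(values):
    """Numerically stable SoftMax: non-negative outputs that sum to 1."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 4 x 4 feature map and 2 x 2 pooling.
fmap = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
pooled = average_pool_2d(fmap, 2)
flat = [value for row in pooled for value in row]  # flattening step
print(pooled)
print(softmax([0.3, 1.2]))
```

In the actual models these operations are provided by the framework layers, with the dense layers between the flattening step and the SoftMax.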

3.5.2 SSD DLMs

The Single Shot Detector (SSD) algorithm (Wulandari et al. 2022) is a one-stage detection model that performs object localization and classification in a single neural network (NN) forward pass. The SSD is reported to be faster and easier to train, since removing the region proposals and the feature resampling stage significantly increases speed. The received signal power is fed to the network, and the 2D Cartesian coordinates \((x,z)\) are predicted by this single network. The SSD predicts offsets for default boxes of varying sizes and aspect ratios in several feature layers, and then applies a 3 × 3 convolution to each feature layer to provide box and class outputs. At the end of the network, the outputs are combined and non-maximum suppression is applied.
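Non-maximum suppression, the final SSD stage, can be sketched as the usual greedy procedure: keep the highest-scoring box and discard any remaining box that overlaps it too strongly. This is a generic illustration, not the implementation used in this work; the box format, threshold, and sample boxes are assumptions.

```python
def iou(a, b):
    """Intersection over union of boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical predictions: two near-duplicate boxes and one separate box.
boxes = [(0.0, 0.0, 2.0, 2.0), (0.1, 0.1, 2.1, 2.1), (5.0, 5.0, 6.0, 6.0)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)
```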

3.5.3 RetinaNet DLMs

ResNet-101 serves as the backbone network of RetinaNet (Wang et al. 2019), followed by two task-specific subnetworks: the classification subnet and the box regression subnet. The classification subnet is a fully convolutional network (FCN) attached to each level of the feature pyramid network (FPN); it predicts the probability of an object being present at each spatial position. Each pyramid level also has a regression subnet, which is likewise a small FCN. This subnet regresses the offset of each anchor box.

3.5.4 DLMs, ResNet50V2, InceptionResNetV2, RetinaNet, and SSD with Hyper-Parameter Optimization

As previously stated, the choice of HPs influences model performance, and determining the ideal value for each HP is difficult. We therefore use Bayesian optimization to tune the HPs of the DLMs and to determine whether this brings any benefit. The HP tuning can be applied to the Adam optimizer (Bock et al. 2017), the stochastic gradient descent (SGD) optimizer, and the RMSProp optimizer (Liu et al. 2020). For ResNet50V2, the best combination uses the Adam optimizer with a learning rate of 0.000198 and a beta_1 value of 0.788, giving a loss of 2.11. For InceptionResNetV2 and RetinaNet, the tuned SGD optimizer uses learning rates of 0.0099 and 0.00049 and momentum values of 0.981 and 0.879, respectively, giving losses of 1.77 and 1.88. For the SSD, the best combination uses the RMSProp optimizer with a learning rate of 0.0004. Based on our simulation results, the Adam optimizer outperforms the SGD optimizer on these datasets.
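To show where the tuned learning rate and beta_1 enter the optimization, the sketch below writes out the standard Adam update for a single scalar weight and applies it to a toy quadratic loss. It illustrates the role of the two HPs only; it is not the Bayesian tuning procedure, and the toy loss, the step count, and the `beta_2` and `eps` values are assumptions not taken from this work.

```python
import math


def adam_minimize(grad, w0, lr=0.000198, beta_1=0.788,
                  beta_2=0.999, eps=1e-7, steps=1000):
    """Standard Adam update for one scalar weight."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta_1 * m + (1 - beta_1) * g      # first-moment estimate
        v = beta_2 * v + (1 - beta_2) * g * g  # second-moment estimate
        m_hat = m / (1 - beta_1 ** t)          # bias corrections
        v_hat = v / (1 - beta_2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


# Toy loss (w - 3)^2 with gradient 2 * (w - 3); minimum at w = 3.
toy_grad = lambda w: 2.0 * (w - 3.0)

w_tuned = adam_minimize(toy_grad, w0=0.0)                      # reported lr and beta_1
w_fast = adam_minimize(toy_grad, w0=0.0, lr=0.05, steps=2000)  # larger lr for comparison
print(w_tuned, w_fast)
```

The learning rate scales each step, while beta_1 controls how strongly past gradients are averaged into the update direction, which is why both are natural targets for HP tuning.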

4 Results and discussion

4.1 Evaluation metrics

To demonstrate the robustness of the proposed technique, several optimized DLMs are used. In this paper, we assess the underwater localization performance of these DLMs under various strategies.

The evaluation metrics are primarily based on four quantities: the numbers of true positives \((TP)\), true negatives \((TN)\), false negatives \((FN)\), and false positives \((FP)\). The accuracy, \(AUC\), \(Pr\), \(F1-score\), \(RMSE\), and computational time are used to assess classification performance. The accuracy measures the rate of correct classification, \(Pr\) is the positive predictive value, and \(Se\) is the true positive rate. The \(F1-score\) is the harmonic mean of \(Pr\) and \(Se\) and therefore balances both. The \(AUC\) is the area beneath the entire \(ROC\) curve in two dimensions; it provides an overall performance measure across all classification thresholds. The \(RMSE\) is an error metric that gives a total error estimate; in our dataset, it is calculated as the square root of the arithmetic mean of the squared errors. All these metrics are defined in (Ghonim et al. 2021).
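The metrics above can be computed from the four counts and from the coordinate errors as sketched below, using their standard definitions. The counts and coordinate values in the example are hypothetical and are not results from this work; the function names are illustrative only.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision (Pr), sensitivity (Se), and F1-score from counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "precision": precision,
            "sensitivity": sensitivity, "f1": f1}


def rmse(y_true, y_pred):
    """Square root of the arithmetic mean of the squared errors."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical counts and coordinate values, for illustration only.
metrics = classification_metrics(tp=8, tn=9, fp=1, fn=2)
error = rmse([0.0, 0.0], [3.0, 4.0])
print(metrics, error)
```

The AUC is omitted here because it requires the full set of scores and thresholds rather than a single set of counts.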

Table 4 shows that the proposed CEAPF optimized SSD model consumes less time to train and test than the other models. Given the localization capability of the proposed optimized DLMs, however, these computation times are reasonable for underwater localization. Our implementation is based on Keras, a high-level Python library, running on a cloud GPU notebook (2 CPU cores and 13 GB RAM). It is observed that ResNet50V2 achieves the best performance with respect to computational time, as shown in Table 4.

Table 4 Time for all models to be trained and tested

Figures 3a–h and 4 illustrate the results of the optimized DLMs for the different strategies (RSS, average position, and KF position) under the two channel models, CEAPF and WDGF. Although the optimized SSD algorithm is reported to be faster and simpler to train, its accuracy is poor. The optimized RetinaNet is more accurate than the optimized SSD, but it requires more time. The optimized ResNet50V2 produces the best results in the least amount of time. According to the experimental data, the optimized ResNet50V2 based on the average KF position method in the CEAPF channel model reaches 99.99% accuracy, 99.99% AUC, 99.98% precision, 99.89% F1-score, 0.099 RMSE, and 0.43 s testing time. We emphasize that the superior performance of the optimized ResNet50V2 model is reflected in the obtained RMSE.

Fig. 3 (a) Average RSS with KF technique based on CEAPF channel (Average RSS + KF + optimized DLM). (b) Average RSS technique based on CEAPF channel model + optimized DL models (Average RSS + optimized DLM). (c) RSS technique + KF based on CEAPF channel model + optimized DL models (RSS + KF + optimized DLM). (d) RSS technique + KF based on CEAPF channel model + optimized DL models (RSS + KF + optimized DLM). (e) Average RSS technique based on WDGF channel model + optimized DL models (Average RSS + optimized DLM). (f) Average RSS technique based on WDGF channel model + optimized DL models (Average RSS + optimized DLM). (g) Average RSS technique based on WDGF channel model + optimized DL models (Average RSS + optimized DLM). (h) RSS technique based on WDGF channel model + optimized DL models (RSS + optimized DLM)

Fig. 4 Summary of the performance of our proposed optimized models

Table 5 compares the performance of our proposed optimized models and clarifies Fig. 4.

Table 5 Performance of different strategies based on DL models

Table 6 compares our proposed framework to others in the literature, demonstrating that our proposed framework outperforms others in terms of accuracy, precision, AUC, F1-score and RMSE.

Table 6 Comparison between our framework and others in the literature

5 Conclusion

This paper shows that combining the KF algorithm with the optimized DLMs based on the CEAPF channel model significantly improves the performance of our proposed framework. According to the results of our trials, the proposed framework achieves a reasonable accuracy for underwater localization. Compared with previously published work, our proposed framework outperforms many references, achieving 99.99% accuracy, 99.99% AUC, 99.98% precision, 99.89% F1-score, 0.099 RMSE, and a testing time of 0.43 s. As a result, our proposed system has high accuracy, low complexity, and a small error distance while requiring very little training time.

The obtained results show that CEAPF channel modeling with the ResNet50V2 strategy achieves the best localization accuracy among the different methods, reaching 99.99% when the average method is applied with the KF and the RSS technique; the improvement of ResNet50V2 over InceptionResNetV2, RetinaNet, and SSD is 0.56%, 1.35%, and 2.67%, respectively. With the RSS average technique, ResNet50V2 also outperforms InceptionResNetV2, RetinaNet, and SSD by about 1.1%, 2.57%, and 4%, respectively. With the RSS, KF, and DLM combination, ResNet50V2 achieves a higher accuracy than InceptionResNetV2, RetinaNet, and SSD by 1.01%, 1.8%, and 2.91%, respectively.

WDGF achieves lower accuracy than CEAPF. With the KF and the average RSS method, CEAPF improves the accuracy by approximately 8.52% for ResNet50V2 and by 9.11%, 10.18%, and 10.46% for InceptionResNetV2, RetinaNet, and SSD, respectively. With the average RSS technique, the improvement of CEAPF over WDGF is 6.27%, 7.09%, 6.96%, and 7.45% for ResNet50V2, InceptionResNetV2, RetinaNet, and SSD, respectively. With the RSS and KF combination, CEAPF channel modeling improves on WDGF by approximately 4.04%, 3.91%, 5.88%, and 5.99% for ResNet50V2, InceptionResNetV2, RetinaNet, and SSD, respectively.

We apply automatic hyper-parameter (HP) tuning based on Bayesian optimization to the ResNet50V2, InceptionResNetV2, SSD, and RetinaNet models. ResNet50V2, based on the average RSS technique combined with the KF in the CEAPF channel model, achieves 99.99% accuracy, 99.99% area under the curve (AUC), 99.98% precision, 99.89% F1-score, 0.099 RMSE, and 0.43 s testing time.