1 Introduction

Seagrasses create critically important marine ecosystems that provide food to marine animals and humans, stabilize the sea bottom, and absorb carbon dioxide from the environment. Seagrass is found in coastal areas all over the world [1]. Previous assessments of seagrass distributions from remotely sensed imagery have mostly been performed manually by domain experts [2], although various automated classification methods are now being explored [3]. Our previous work showed that deep convolutional neural network (CNN) models can effectively detect seagrass in multispectral images when trained with enough labeled data [4, 5].

Deep CNN models usually require a large amount of labeled training data to achieve competitive results. For seagrass quantification, these labels are obtained by in situ observations that are time consuming and labor intensive. Consequently, it can be difficult to collect enough labeled data to train a separate model for each location. Moreover, a deep CNN model that is well trained at one location may fail at another because the appearance and density distribution of seagrass shift from the source domain to the target domain. Our previous models degraded when directly applied to different locations for seagrass detection [4, 5].

For seagrass detection, we usually have a large amount of unlabeled data at a given new location, and it is possible for domain experts to label a limited number of samples. In this study, we propose a novel domain adaptation approach that uses both the unlabeled data and a few labeled samples to learn an effective classifier for new locations. First, we utilize an unsupervised adversarial domain adaptation approach that adapts the target domain representation to mimic the source domain representation, so that the classifier trained in the source domain may work in the target domain. This unsupervised step uses no labeled samples from the target domain. Second, we utilize a supervised approach with the contrastive semantic alignment loss to learn domain invariant representations between the source and target domains. The first step aligns the marginal distributions of the two domains, and the second step aligns class-specific distributions using a few labeled samples from the target domain. The contrastive semantic alignment loss consists of a semantic alignment loss and a class separation loss: the semantic alignment loss maps same-class samples from different domains as close together as possible, while the class separation loss pushes different-class samples from different domains as far apart as possible. The proposed domain adaptation approach optimizes the target domain embedding function so that a simple classifier can work effectively in the target domain.

Fig. 1 Diagram of the proposed domain adaptation model for seagrass detection. a Datasets from both domains, where colored samples are labeled and gray samples are unlabeled. b Unsupervised adversarial adaptation and supervised contrastive semantic alignment between the target and source domains. c The adapted model used for seagrass detection in the target domain

Contributions of our proposed approach are:

  • A novel approach that uses both unlabeled data and a few labeled samples in the target domain to learn a domain invariant embedding for domain adaptation. The approach can utilize a large amount of unlabeled data for efficient training.

  • To the best of our knowledge, this is the first successful system to generalize deep CNN models for seagrass detection from one location to another.

The paper is structured as follows: Section 2 discusses the relevant literature. Sections 3 and 4 describe the proposed method and the experimental setup. Sections 5 and 6 present results and discussion, respectively, and Section 7 concludes.

2 Related Work

2.1 Seagrass Distribution Mapping

Automated systems to map seagrass distributions in multispectral satellite images have been developed. Traganos et al. proposed a support vector machine (SVM) approach to map Mediterranean seagrass distributions in Greece utilizing Sentinel-2 satellite imagery [6, 7]. Lyons et al. utilized field survey data and multispectral image data from the QuickBird satellite for seagrass mapping in shallow coastal water [8]. Different data sources, including Landsat [9], IKONOS [10,11,12], QuickBird [13] and WorldView-2 satellite image sensors [4, 5, 14,15,16], and different machine learning models, such as decision trees, naive Bayes, SVMs [9], maximum likelihood [10, 11, 15, 16] and deep capsule networks [4, 5], have been utilized for seagrass distribution mapping. However, no model has been applied directly to new locations successfully without adaptation.

2.2 Deep Learning

Deep learning models are a subset of machine learning methods inspired by the mammalian vision system. A typical deep learning model consists of multiple layers of feature extraction units called “neurons”. During training, these neurons learn to extract useful features from data to perform classification or regression. Deep learning has been successfully applied to image classification [17, 18], image segmentation [19], image super-resolution [20,21,22], hyperspectral imaging [23], object detection [24], speech recognition [25], audio classification [26], computer-aided medical diagnosis [27, 28], medical imaging [29, 30] and cybersecurity [31,32,33]. Among deep learning models, the deep CNN is the most popular; more details are provided in a comprehensive survey by Alom et al. [34]. A deep CNN model scans the input image with a set of trained filters to search for the patterns contained in the filters. Each layer in the deep CNN model contains a number of trained filters: layers close to the input search for simple patterns such as edges with different orientations, while layers adjacent to the output match more class-specific patterns to conduct classification. This hierarchical feature extraction mechanism is key to the success of CNNs. Popular deep CNN models for vision include AlexNet [17], VGG-Net [35], ResNet [18], DenseNet [36] and Inception-v3 [37]. Deep CNNs have also been applied to seagrass detection in our previous studies [4, 5]. Deep learning models include feature extraction in the optimization loop and achieve state-of-the-art performance in many applications [34]. However, one challenge is that they require large amounts of training data to achieve competitive performance, making adaptation of deep learning models between domains difficult.

2.3 Domain Adaptation

Domain adaptation techniques can be applied when there are not enough labeled data to train a deep learning model from scratch in a new domain. In domain adaptation, a model is first trained in the source domain using a large available training dataset. A domain adaptation method is then applied to adapt the trained model to a new domain (the target domain), with or without a few labeled samples from the target domain. Tzeng et al. proposed an unsupervised domain adaptation method that used an adversarial loss to match source and target domain distributions [38]. Motiian et al. proposed a semi-supervised approach that used a Siamese architecture for domain adaptation [39]. This model learned an embedding function for source and target data in which the two domains were semantically aligned and different classes were maximally separated.

Fig. 2 Proposed semi-supervised domain adaptation procedure

3 Proposed Model

3.1 System Diagram

The diagram of the proposed domain adaptation method for seagrass detection is shown in Fig. 1. There are enough labeled data in the source domain to train a deep CNN model for seagrass detection, while only a few labeled samples are available in the target domain, as shown in Fig. 1a. The trained CNN model contains multiple convolutional layers for feature extraction and a fully connected layer for classification. The convolutional layers essentially learn an embedding function, and the fully connected layer takes its outputs for classification. Our proposed system uses two steps to adapt the embedding function trained in the source domain to the target domain, as shown in Fig. 1b. In the first step, the model uses unlabeled samples from both domains to modify the target embedding function while keeping the source embedding fixed, so that the outputs of both embedding functions have a similar distribution. In the second step, labeled samples from the source domain are paired with the few labeled samples from the target domain to align class-specific distributions between the two domains. Once the target embedding function is adapted, a simple classifier can be trained using the few labeled samples from the target domain to perform seagrass detection on the remaining target domain data (Fig. 1c).

3.2 Model Architecture

Figure 2 shows the domain adaptation procedure of the proposed method. We first train a deep CNN model in the source domain with labeled data (Fig. 2a), where the CNN model learns an embedding function, \(G_s\), named the source embedding function, and a simple classifier, \(C_s\), for seagrass detection. In the target domain (Fig. 2b), we first use unlabeled samples from both domains to adapt the target embedding function with a generative adversarial network (GAN) loss such that a discriminator cannot tell which domain an embedding comes from. This step aligns the marginal data distributions \(p(G_s(x^s))\) and \(p(G_t(x^t))\) of the source and target domains. In Fig. 2c, we utilize a few labeled samples from the target domain with a classification loss and a contrastive semantic alignment loss to further adapt the target embedding function such that the class-specific data distributions \(p(G_s(x^s)|y)\) and \(p(G_t(x^t)|y)\) of the two domains are aligned after embedding. Figure 2d illustrates the training and testing steps for class-specific alignment. We detail each step in the following subsections.

3.3 Deep CNN Model Training in Source Domain

Let \(D_{s}=\{X^{s}, Y^{s}\}\) and \(D_{t}=\{X^{t},Y^{t}\}\) denote the source and target domain datasets; we assume that only a limited number of labeled samples are available in the target domain. A source domain deep CNN model is trained with the following classification loss (Fig. 2a),

$$\begin{aligned} L_c(f_s)=E[l(f_s(X^s),Y^s)] \end{aligned}$$
(1)

where \(f_s\) is the classifier to be trained, E denotes expectation, and l denotes an appropriate classification loss function (e.g., cross-entropy).

A classifier, f, can be decomposed into two functions as \(f=C \circ G\), i.e., \(f(x)=C(G(x))\), where G is the embedding function from the input image X to the embedding space and C is the function predicting the class label from the embedding space. Thus \(f_s=C_s \circ G_s\) and \(f_t=C_t \circ G_t\) denote the deep CNN models in the source and target domains, respectively.
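To make the notation concrete, a minimal PyTorch sketch of this composition is given below; the layer sizes are placeholder assumptions (the actual architecture is specified in Sect. 5.2), and the 200-dimensional input corresponds to a flattened \(5\times 5\times 8\) patch.

```python
import torch
import torch.nn as nn

# Minimal sketch of the f = C o G decomposition; layer sizes are
# placeholders, not the paper's architecture (see Sect. 5.2).
G = nn.Sequential(nn.Flatten(), nn.Linear(200, 64), nn.ReLU())  # embedding G
C = nn.Linear(64, 5)                                            # classifier C

def f(x: torch.Tensor) -> torch.Tensor:
    """f(x) = C(G(x)): embed the patch, then predict class scores."""
    return C(G(x))

logits = f(torch.randn(4, 8, 5, 5))  # four 5x5x8 patches -> (4, 5) logits
```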

3.4 Adversarial Discriminative Domain Adaptation

By following the idea in Tzeng et al. [38], we utilize the GAN loss to adapt the embedding function \(G_s\) in the source domain to the target domain. It is assumed that we have source images \({X}^s\) with labels \({Y}^s\) drawn from the source domain distribution \(p_{s}(x, y)\), and images \({X}^t\) from the target domain without any label information. This unsupervised domain adaptation step learns a target embedding function \(G_t\) based on \(G_s\) and unlabeled data from both domains. \(G_t\) and the discriminator D in Fig. 2b are trained by min–max optimization with the GAN loss \(L_{\rm{adv}_D}(X^{s},X^{t},G_{s},G_{t})\),

$$\begin{aligned} L_{\rm{adv}_D}\left( X^{s},X^{t},G_{s},G_{t}\right) &=-E_{x^{s} \sim X^{s}}\left[ {\rm{log}}D(G_{s}(x^{s}))\right] \nonumber \\&-E_{x^{t}\sim X^{t}}\left[ {\rm{log}}\left( 1-D\left( G_{t}(x^{t})\right) \right) \right] \end{aligned}$$
(2)

where D is the discriminator used in the GAN model [40], which works as a classifier trained by the cross-entropy loss: source domain samples are labeled ‘1’ and target domain samples ‘0’. The discriminator, D, is trained by minimizing \(L_{\rm{adv}_D}\) to distinguish whether a sample belongs to the source or the target domain. The target embedding function \(G_t\) updates its parameters using the following generator loss,

$$\begin{aligned} {\rm{Min}}_{G_{t}} L_{\rm{adv}_G}\left( X^{s},X^{t},D\right) = -E_{x^{t} \sim X^{t}}\left[ {\rm{log}}D(G_{t}(x^{t}))\right] \end{aligned}$$
(3)

This is similar to the standard GAN loss: \(G_t\) modifies its weights so that target embeddings mimic source domain embeddings and fool the discriminator, D. During training, we keep \(G_s\) fixed while updating \(G_t\).
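A minimal PyTorch sketch of one adversarial update under Eqs. (2) and (3) follows. It is an illustration under our own assumptions (function signature, optimizer handling), not the authors' implementation; minimizing the binary cross-entropies below is equivalent to the losses above.

```python
import torch
import torch.nn.functional as F

def adda_step(G_s, G_t, D, opt_D, opt_Gt, x_s, x_t):
    """One adversarial adaptation step (Eqs. 2-3); G_s stays frozen."""
    # Discriminator update (Eq. 2): source embeddings labeled 1, target 0.
    with torch.no_grad():
        e_s = G_s(x_s)                       # fixed source embedding
        e_t = G_t(x_t)                       # no gradient to G_t in this step
    logit_s, logit_t = D(e_s), D(e_t)
    d_loss = F.binary_cross_entropy_with_logits(logit_s, torch.ones_like(logit_s)) \
           + F.binary_cross_entropy_with_logits(logit_t, torch.zeros_like(logit_t))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Generator update (Eq. 3): G_t tries to make D label target embeddings '1'.
    logit_g = D(G_t(x_t))
    g_loss = F.binary_cross_entropy_with_logits(logit_g, torch.ones_like(logit_g))
    opt_Gt.zero_grad(); g_loss.backward(); opt_Gt.step()
    return d_loss.item(), g_loss.item()
```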

3.5 Classification and Contrastive Semantic Alignment

If there is a distribution shift between the source and target domains, the source deep CNN model will not perform well in the target domain. We therefore utilize a few labeled samples in the target domain together with labeled samples in the source domain to jointly adapt \(G_s\) and \(G_t\) using the classification and contrastive semantic alignment (CCSA) loss proposed by Motiian et al. [39], as shown in Fig. 2c.

3.5.1 Classification Loss

We define the classification loss as

$$\begin{aligned} L_C(C \circ G)=E[l(f(X),Y)] \end{aligned}$$
(4)

This loss is minimized in the source and target domains separately, using the selected labeled samples from the corresponding domain. This step separates samples of different classes within each domain.

3.5.2 Contrastive Semantic Alignment (CSA) Loss

To align class specific embedding between source and target domains, we use the CSA loss to jointly adapt \(G_t\) and \(G_s\). The CSA loss in target domain contains two components and can be described as

$$\begin{aligned} L_{\rm{CSA}}(G_t)=L_{\rm{SA}}(G_t)+L_{\rm{CS}}(G_t) \end{aligned}$$
(5)

where \(L_{\rm{SA}}(G_t)\) is the semantic alignment loss and \(L_{\rm{CS}}(G_t)\) is a class separation loss. \(L_{\rm{SA}}(G_t)\) is computed as,

$$\begin{aligned} L_{\rm{SA}}(G_t)=\sum \limits _{a=1}^{N_c}{d(p(G_s({X}_a^s)),p(G_t({X}_a^t)))} \end{aligned}$$
(6)

where \(N_c\) is the number of class labels, and \(X_a^s=X^{s}|\{Y^s=a\}\) and \(X_a^t=X^t|\{Y^t=a\}\) are conditional random variables. d is a distance metric between the distributions of \(X_a^s\) and \(X_a^t\). The semantic alignment loss maps source and target domain samples that carry the same class label as close together as possible. However, there is no guarantee that samples from different domains with different labels will be mapped far apart in the embedding space. To overcome this, the class separation loss \(L_{\rm{CS}}(G_t)\) is computed as

$$\begin{aligned} L_{\rm{CS}}(G_t)=\sum \limits _{a,b|a \ne b}{k(p(G_s({X}_a^s)),p(G_t({X}_b^t)))} \end{aligned}$$
(7)

where k is a similarity metric that adds a penalty when the distributions of \(X_a^s\) and \(X_b^t\) are close to each other. This encourages samples with different labels from different domains to be mapped far apart in the embedding space. Figure 2d shows the working mechanism of the CSA loss.

During training, the semantic alignment loss (orange arrows) pulls same-class samples from different domains as close together as possible, the class separation loss (red dashed line) pushes different-class samples from different domains as far apart as possible, and the classification loss (blue solid line) ensures high classification accuracy in the embedding space. During testing, we use the trained target mapping function to project unseen target samples into the domain invariant space. The overall classification and contrastive semantic alignment loss is

$$\begin{aligned} L_{\rm{CCSA}}(G_t)=L_{C}(C_t \circ G_t)+L_{\rm{SA}}(G_t)+L_{\rm{CS}}(G_t) \end{aligned}$$
(8)

Equations (5)–(8) are used to optimize \(G_t\). A similar set of equations are used to optimize \(G_s\) such that both embedding functions are jointly adapted.

We paired each labeled sample in the target domain with randomly selected labeled samples in the source domain to compute the loss in Eq. (8), where d(·,·) in Eq. (6) is the Euclidean distance in the embedded space and k(·,·) in Eq. (7) is a similarity measure defined between samples.

3.6 Loss Function Computation

The semantic alignment loss and class separation loss are defined as distances or similarities between distributions. It is not easy to estimate the conditional distribution of each class given just a few labeled samples in the target domain. Following the method described in [39], we compute the semantic alignment loss as

$$\begin{aligned} d(p(G_s({X}_a^s)),p(G_t({X}_a^t)))= \sum \limits _{i,j}{d(G_s(x_i^s),G_t(x_j^t))} \end{aligned}$$
(9)

where \((x_i^s, x_j^t)\) ranges over all pairs of labeled samples in the source and target domains with \(y_i^s=y_j^t=a\). Each labeled sample in the target domain is paired with many labeled samples of the same class in the source domain, which forces target samples to be mapped as close as possible to same-class samples in the source domain. The class separation loss is calculated as

$$\begin{aligned} k(p(G_s({X}_a^s)),p(G_t({X}_b^t)))= \sum \limits _{i,j}{k(G_s(x_i^s),G_t(x_j^t))} \end{aligned}$$
(10)

where a and b denote class labels and \(a \ne b\). Each labeled sample in the target domain is paired with many labeled samples from different classes in the source domain. The distance measure, d(·,·), is defined as the Euclidean distance in the embedded space,

$$\begin{aligned} {d(G_s(x_i^s),G_t(x_j^t))}= \frac{1}{2} \left\| G_s(x_i^s)-G_t(x_j^t) \right\| \end{aligned}$$
(11)

The similarity measure, k(, ), is calculated as

$$\begin{aligned} {k(G_s(x_i^s),G_t(x_j^t))}= \frac{1}{2}\max \left( 0,m- \left\| G_s(x_i^s)-G_t(x_j^t) \right\| \right) ^2 \end{aligned}$$
(12)

Here we use the Frobenius norm, and m is a margin that specifies the desired separability in the embedding space. The combination of \(L_{\rm{SA}}(G)\) and \(L_{\rm{CS}}(G)\) is also known as the contrastive loss as defined in [39]. Note that we use the CCSA loss to jointly optimize \(G_t\) and \(G_s\).
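The pairwise computation in Eqs. (9)–(12) can be sketched as follows; this is our illustration under the definitions above, with the margin m left as a tunable assumption.

```python
import torch

def csa_loss(emb_s, emb_t, y_s, y_t, margin=1.0):
    """Contrastive semantic alignment loss over paired samples (Eqs. 9-12).

    emb_s, emb_t: (N, d) embeddings G_s(x_i^s), G_t(x_j^t) of the N pairs.
    y_s, y_t:     (N,) class labels of each pair member.
    """
    dist = torch.norm(emb_s - emb_t, dim=1)                  # ||G_s - G_t||
    same = (y_s == y_t).float()                              # same-class pairs
    l_sa = same * 0.5 * dist                                 # Eq. (11)
    l_cs = (1.0 - same) * 0.5 * torch.clamp(margin - dist, min=0) ** 2  # Eq. (12)
    return l_sa.sum() + l_cs.sum()                           # Eqs. (9)-(10)
```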

4 Experiment Setup

4.1 Datasets

We validated the proposed model on three multispectral images captured by the WorldView-2 satellite at three locations in the Florida coastal area: Saint Joseph Bay (SJB), Keeton Beach (KB) and Saint George Sound (SGS). Each image has eight bands (Coastal Blue, Blue, Green, Yellow, Red, Red Edge, NIR-1 and NIR-2) with a spatial resolution of 2 m. An experienced domain expert (a co-author of this paper) labeled regions for five classes in each image: seagrass, sea, sand, land and intertidal, shown as green, blue, cyan, yellow and magenta in Fig. 3. Figure 3d–f shows classification results from a physics model [41]. In this study, we trained a deep CNN model at one location and utilized the proposed domain adaptation model to generalize it to the other locations for seagrass detection.

Fig. 3 WorldView-2 multispectral images collected in Florida at a SJB, b KB and c SGS. Labeled region colormap: seagrass \(\rightarrow \) green, sea \(\rightarrow \) blue, sand \(\rightarrow \) cyan, land \(\rightarrow \) yellow and intertidal \(\rightarrow \) magenta. Physics model [41] classification results are shown in d SJB, e KB and f SGS

4.2 WorldView-2 Atmospheric Correction

We performed atmospheric correction on the three WorldView-2 satellite multispectral images by matching them with in situ measurements collected by a survey boat at 22 stations across the images on the same day. At each station, the following measurements were obtained by two spectroradiometer systems operating in tandem:

  • \([E_s(0^+, \lambda )]\): downwelling spectral irradiance above the sea surface (395–795 nm, 2.5 nm bandwidth),

  • \(L_{\mu }(0.65, \lambda )\): upwelling spectral radiance 0.65 m beneath the sea surface where \(\lambda \) represents wavelength [HTSRB, Satlantic Instr.],

  • \(E_{\mu }(0.21)\) and \(L_{\mu }(0.21)\): upwelling irradiance and radiance 0.21 m beneath the sea surface [HyperPro, Satlantic Instr.].

With these measurements, we calculated the following attributes:

  • Spectral upwelling diffuse attenuation coefficient,

    $$\begin{aligned} K_{L_\mu }=-\frac{1}{z}\ln \frac{L_{\mu }(0.65)}{L_{\mu }(0.21)} \end{aligned}$$
    (13)

    where z was the difference in depth between the sensors placed at 0.65 m and 0.21 m.

  • Upwelling radiance just beneath the air–water interface, \(L_{\mu }(0^-, \lambda )\), was calculated by using \(K_{L_\mu }(\lambda )\) to propagate \(L_\mu (0.21, \lambda )\) to the surface via Beer's Law [42].

  • Remote sensing reflectance \([R_{rs}(\lambda )]\) was computed as \(L_w (0^+, \lambda )/E_s(0^+, \lambda )\).

We then reduced the spectral resolution of the field measurements to match the spectral bands of the WorldView-2 images based on the published spectral response functions (www.digitalglobe.com). Finally, we performed a linear regression between the 22 in situ measurements and their corresponding WorldView-2 spectra at the same locations, which produced a gain and offset for each band that effectively remove atmospheric signals from the image.
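The per-station processing and the final band-wise regression can be sketched numerically as below. This is a hedged illustration only: the variable names and the 0.543 air–water transmittance factor used to cross the interface are our assumptions, not values given in the text.

```python
import numpy as np

Z = 0.65 - 0.21                                   # sensor depth difference z (m)

def k_lu(L_065, L_021):
    """Upwelling diffuse attenuation coefficient (Eq. 13), per band."""
    return -(1.0 / Z) * np.log(L_065 / L_021)

def remote_sensing_reflectance(L_021, L_065, E_s):
    """R_rs = L_w(0+)/E_s(0+), per band."""
    K = k_lu(L_065, L_021)
    L_0minus = L_021 * np.exp(K * 0.21)           # Beer's Law propagation to 0-
    L_w = 0.543 * L_0minus                        # assumed air-water transmittance
    return L_w / E_s

def band_gain_offset(rrs_insitu, image_spectra):
    """Linear fit rrs ~ gain * image + offset over the 22 stations (one band)."""
    gain, offset = np.polyfit(image_spectra, rrs_insitu, deg=1)
    return gain, offset
```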

4.3 Data Analysis

We compared the spectral signatures of each class in the multispectral WorldView-2 images taken at different locations. To better visualize the high-dimensional spectral information, we utilized the t-distributed stochastic neighbor embedding (t-SNE) algorithm [43] to compress high-dimensional data to 2 dimensions.
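A minimal sketch of this compression with scikit-learn is shown below; the random array is a placeholder standing in for the labeled spectra.

```python
import numpy as np
from sklearn.manifold import TSNE

X = np.random.rand(1000, 200)        # placeholder for 5*5*8 = 200-D samples
X_2d = TSNE(n_components=2, random_state=0).fit_transform(X)
print(X_2d.shape)                    # (1000, 2), ready for a scatter plot
```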

4.4 k-Fold Cross-validation (CV) for Seagrass Detection

At each of the three locations, we performed cross-validation for seagrass detection in the labeled regions. These experimental results give upper limits on domain adaptation performance. In k-fold CV, we split the data into k parts, kept one part for testing and used the remaining parts for training. We repeated this experiment k times such that each part was tested once.
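A sketch of this protocol with scikit-learn follows (placeholder data; a fresh model would be trained inside each fold):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.random.rand(300, 200)                     # placeholder features
y = np.random.randint(0, 5, size=300)            # placeholder labels
for train_idx, test_idx in KFold(n_splits=3, shuffle=True, random_state=0).split(X):
    X_tr, y_tr = X[train_idx], y[train_idx]      # train a fresh model here
    X_te, y_te = X[test_idx], y[test_idx]        # evaluate on the held-out fold
```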

4.5 Domain Adaptation Between Different Locations

In the domain adaptation experiments, each image was used as the source domain to train a deep CNN model, which was then adapted to the other two locations guided by a few labeled samples from the new locations.

4.6 Models for Comparison

4.6.1 Source-Only

The source-only model was trained with source domain samples and then applied directly to new locations for seagrass detection.

4.6.2 ADDA

Adversarial discriminative domain adaptation (ADDA) [38] adapts the source domain embedding function to the target domain based on the GAN loss (Sect. 3.4), using all unlabeled samples at the new location. The adapted embedding is then combined with the classifier trained in the source domain to detect seagrass at the new location.

4.6.3 Source + Target

We trained a deep CNN model in the source domain and used a few labeled data samples from the target domain to fine-tune the model. This is a baseline model for transfer learning.

4.6.4 CCSA

This model used the contrastive semantic alignment loss and classification loss to learn the embedding function and classification layers [39]. We used two separate embedding functions that were jointly optimized for source and target domains (Sect. 3.5).

4.6.5 Proposed Model

We first used the GAN loss to adapt the embedding function trained in the source domain. Then the CCSA loss, together with a few labeled samples from the target domain, was utilized to further adapt the model to new locations, as detailed in Sect. 3.5.

5 Results

5.1 Data Analysis

Fig. 4 Atmospherically corrected spectral signatures (means and standard deviations) in the multispectral WorldView-2 images for the different classes at a SJB, b KB and c SGS; the x-axis represents the bands and the y-axis the mean and standard deviation of spectral intensity. d t-SNE plot of all three locations for the different classes. Green, blue, cyan, yellow and magenta represent the seagrass, sea, sand, land and intertidal classes. In the t-SNE plot, a bright shade, a dark shade and an intermediate shade represent SJB, SGS and KB samples, respectively; e.g., the seagrass class uses green, dark green and bright green for the three locations, and similarly three shades of yellow, blue and cyan for the land, sea and sand classes

Figure 4 shows the atmospherically corrected means and standard deviations of the eight WorldView-2 multispectral bands in the labeled regions at the three locations. Land had the highest mean spectral magnitude, as it lies above water; the intertidal class fell between land and sand, followed in decreasing order by sand, seagrass and sea.

It is also observed that the spectral signatures of the same class at different locations have different shapes, indicating distribution shifts among locations. In Fig. 4d, we show t-SNE representations of samples from all three locations, using green, blue, cyan, yellow and magenta to represent the seagrass, sea, sand, land and intertidal classes, and three shades per color for the three locations: the brightest shade for SJB, the darkest for SGS and an intermediate shade for KB. Note that there are significant distribution shifts among locations in all classes.

5.2 Hyper-Parameter Determination

Deep CNN models take a patch from the multispectral image and predict a class label for the central pixel of the patch. A large patch may cause over-smoothing and requires more computation, whereas too small a patch may degrade performance. After some trial and error, we found that a \(5\times 5\times 8\) patch size produced the best results in the threefold CV experiment. Other parameters were determined in the same way and are listed below.
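Patch extraction around a central pixel can be sketched as follows (our illustration; border handling is omitted for brevity):

```python
import numpy as np

def extract_patch(image, row, col, size=5):
    """Return the size x size x bands patch centered at (row, col).

    Assumes (row, col) is at least size // 2 pixels from the border.
    """
    half = size // 2
    return image[row - half:row + half + 1, col - half:col + half + 1, :]

scene = np.random.rand(1000, 1000, 8)             # placeholder 8-band scene
patch = extract_patch(scene, 500, 500)            # shape (5, 5, 8)
```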

5.2.1 Embedding Functions \(G_s\) and \(G_t\) in CNN Models

Both contained two convolutional layers followed by a flatten layer. The first layer had 20 filters of size \(2\times 2\times 8\), and the second layer had 100 filters of size \(4\times 4\times 20\). All layers used the ReLU activation function.

5.2.2 Classifiers \(C_s\) and \(C_t\) in CNN Models

Both contained a fully connected layer with 84 hidden units and an output layer with 5 units using the SoftMax activation function for classification.
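A PyTorch sketch of \(G\) and \(C\) with exactly these sizes is given below; it is our reconstruction from the description above, not the authors' code. With valid convolutions, a \(5\times 5\times 8\) patch shrinks to \(4\times 4\times 20\) after the first layer and to \(1\times 1\times 100\) after the second, and the SoftMax is folded into the cross-entropy loss as is conventional in PyTorch.

```python
import torch
import torch.nn as nn

G = nn.Sequential(                                 # embedding function G_s / G_t
    nn.Conv2d(8, 20, kernel_size=2), nn.ReLU(),    # 20 filters of 2x2x8 -> 4x4x20
    nn.Conv2d(20, 100, kernel_size=4), nn.ReLU(),  # 100 filters of 4x4x20 -> 1x1x100
    nn.Flatten(),                                  # 100-D embedding
)
C = nn.Sequential(                                 # classifier C_s / C_t
    nn.Linear(100, 84), nn.ReLU(),                 # 84 hidden units
    nn.Linear(84, 5),                              # 5 classes (softmax in the loss)
)
logits = C(G(torch.randn(2, 8, 5, 5)))             # two 5x5x8 patches -> (2, 5)
```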

5.2.3 Source and Target Data Pairing

400 labeled samples from each class in the source domain were randomly selected and paired with the few labeled samples in the target domain to compute the loss function described in Sect. 3.5.
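A sketch of this pairing scheme follows; it assumes (as the text implies) that each source class has at least 400 labeled samples, and the function name is ours.

```python
import numpy as np

def make_pairs(y_s, y_t, per_class=400, seed=0):
    """Pair every labeled target sample with per_class random source samples
    of each class, yielding same-class pairs for L_SA and different-class
    pairs for L_CS."""
    rng = np.random.default_rng(seed)
    pairs = []                                     # (source index, target index)
    for j in range(len(y_t)):                      # the few labeled target samples
        for c in np.unique(y_s):                   # each source class
            idx = rng.choice(np.where(y_s == c)[0], size=per_class, replace=False)
            pairs.extend((int(i), j) for i in idx)
    return pairs                                   # feed embeddings into csa_loss
```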

5.2.4 Training Parameter Settings

We trained the source CNN models for 50 epochs with a batch size of 128, the unsupervised adversarial domain adaptation step for 300 epochs, and the CCSA step for 240 epochs in all experiments.

5.2.5 Learning Rate

We used 0.0002 as the learning rate in all experiments. No dropout layer was used.

5.3 Cross-validation

Table 1 shows the threefold CV results at the three locations, which serve as upper limits for domain adaptation. We achieved 99.99% accuracy at SJB, 99.98% at KB and 99.71% at SGS. The low variances indicate that the results are reliable.

Table 1 Threefold cross-validation results at SJB, KB and SGS

5.4 Domain Adaptation

Table 2 Classification results in target domain by different methods (All numbers are in %)

We conducted six domain adaptation experiments on the three WorldView-2 satellite images: KB \(\rightarrow \) SJB, SJB \(\rightarrow \) KB, SGS \(\rightarrow \) SJB, SJB \(\rightarrow \) SGS, SGS \(\rightarrow \) KB and KB \(\rightarrow \) SGS. A comparison of our proposed model with previous models is shown in Table 2. For each domain adaptation experiment, we implemented six scenarios: the 1- to 5-shot and 10-shot cases, where n-shot means that n labeled samples per class from the target domain are used to adapt the model. Each scenario was performed three times with randomly selected labeled samples from the target domain, and means and standard deviations are shown in Table 2. The proposed method achieved the best results in 28 out of 36 scenarios in Table 2. In the 10-shot scenario, the proposed method approached the model upper limits (the threefold CV performance). The second best model was Source+Target (fine-tuning), which achieved the best results in 4 out of 36 scenarios in Table 2.

5.5 t-SNE Plotting

Fig. 5 t-SNE plots in the embedding space after 1-shot domain adaptation in the target domain: a SJB \(\rightarrow \) KB, b KB \(\rightarrow \) SGS and c SGS \(\rightarrow \) KB. Green, blue, cyan, yellow and magenta represent seagrass, sea, sand, land and intertidal, respectively. The brightest and darkest shades represent source and target domain samples, respectively

We demonstrate how the proposed model maps samples from different domains to the embedding space by utilizing the t-SNE algorithm with the following procedure:

  1. Compress the original samples from the source and target domains (\(5\times 5\times 8 = 200\) dimensions) to 2 dimensions using the t-SNE algorithm (before adaptation),

  2. Feed the original samples from the source and target domains to the embedding functions, \(G_s\) and \(G_t\), respectively, to obtain new representations in the embedding space,

  3. Compress the new representations to 2 dimensions using the t-SNE algorithm (after adaptation),

  4. Plot the compressed data samples on a 2D plane using different colors for different classes: blue, cyan, green, yellow and magenta represent the sea, sand, seagrass, land and intertidal classes, with two different shades of the same color denoting target and source samples.

t-SNE results are shown in Fig. 5 for three domain adaptation scenarios: SJB \(\rightarrow \) KB, KB \(\rightarrow \) SGS and SGS \(\rightarrow \) KB, using 400 samples per class from the source and target domains. The proposed model achieved a better embedding for the sea and seagrass classes than the CCSA model in the SJB \(\rightarrow \) KB scenario, as shown in Fig. 5a. In Fig. 5b, c, similar trends are observed for the KB \(\rightarrow \) SGS and SGS \(\rightarrow \) KB cases: the CCSA model incorrectly mapped seagrass samples close to sea and sand samples in the embedding space. The unsupervised domain adaptation method performed poorly in all cases, as shown in Fig. 5.

5.6 Ablation Study

Table 3 Ablation study of the proposed method

Our proposed model contains two loss functions: the contrastive semantic alignment loss and the GAN loss. If we remove the GAN loss and use only the contrastive semantic alignment loss for domain adaptation, the model is equivalent to the CCSA model. If we remove the contrastive semantic alignment loss, it is equivalent to the unsupervised ADDA model. Our proposed model also jointly optimizes the source embedding function, \(G_s\), and the target embedding function, \(G_t\), in the supervised domain adaptation step. We investigated these three components in an ablation study for KB \(\rightarrow \) SGS; results are shown in Table 3. Note that ADDA does not require labeled samples from the target domain, so only one scenario was performed. The proposed model with all three components achieved the best results.

5.7 Classification Maps

The classification maps produced by our proposed model, the CCSA approach and the baseline model are shown in Fig. 6. The first row of Fig. 6 shows the baseline classification maps, in which classification models trained in the source domains were applied directly to target domain images without any adaptation; the baseline performed poorly compared to the physics model shown in Fig. 3. The second and fourth rows of Fig. 6 show classification maps produced by the CCSA model with 1-shot (Fig. 6b) and 5-shot (Fig. 6d), respectively, using only the contrastive semantic alignment loss for domain adaptation. The third and fifth rows of Fig. 6 show classification results by the proposed model with 1-shot (Fig. 6c) and 5-shot (Fig. 6e), using both the GAN loss and the contrastive semantic alignment loss for domain adaptation. The proposed model with 5-shot produced classification results comparable to the physics model shown in Fig. 3. Note that the classification maps shown here are for visualization purposes only, as the physics model has a 10% error [41].

Fig. 6 End-to-end classification maps produced by domain adaptation based on a the source model, b the CCSA model with 1-shot, c the proposed model with 1-shot, d the CCSA model with 5-shot and e the proposed model with 5-shot

6 Discussion

Our proposed approach produced the best results in 28 out of 36 domain adaptation scenarios, as shown in Table 2. For KB \(\rightarrow \) SGS and SGS \(\rightarrow \) KB, our proposed method won all scenarios. For SJB \(\rightarrow \) KB, our model achieved 98.84% accuracy using just one labeled sample per class from the target domain, much better than CCSA (71.26%), ADDA (35.76%) and Source+Target (f.t.) (84.78%). For KB \(\rightarrow \) SGS, our proposed model with one labeled sample achieved an accuracy of 93.32%, compared to CCSA (72.49%), ADDA (78.69%) and Source+Target (f.t.) (63.39%). Similar trends hold for SGS \(\rightarrow \) KB and all other 1-shot cases except SJB \(\rightarrow \) SGS, where all methods achieved similar results. On average, our proposed method won by a large margin.

As more labeled samples from the target domain were utilized, the proposed method still provided better domain adaptation, winning four or five of the six experimental scenarios in the 2-shot through 10-shot cases. On average, however, the winning margin decreased as more labeled samples were used for adaptation. In the 10-shot scenario, CCSA and the proposed method achieved similar results, close to the threefold CV results, indicating that adding more labeled samples from the target domain did not provide further benefit.

For most scenarios, the standard deviations of the proposed method were much smaller than those of the other methods. Our method first utilizes a large number of unlabeled samples in both domains to perform domain adaptation, and then uses a few labeled samples from the target domain to semantically align class-specific distributions in the embedding space. The first step aligns the marginal distributions based on a large number of unlabeled samples and acts as a regularizer for the subsequent semantic alignment. Therefore, the proposed method provides more stable performance.

Figure 5 shows t-SNE plots of data samples and embeddings in the source and target domains before and after domain adaptation. Before adaptation, the data distributions in the source and target domains are not aligned. ADDA aligned the distributions between the source and target domains, but there is no guarantee that same-class samples from different domains are mapped close together in the embedding space. With the guidance of labeled samples, CCSA and the proposed model achieve better semantic alignment: same-class samples from different domains are mapped closer together, and the proposed method does a better job than CCSA.

We performed the ablation study only for the KB \(\rightarrow \) SGS case, as shown in Table 3. All three components of the proposed model are important. With joint optimization, the proposed model became much more stable and achieved much smaller standard deviations in performance across all scenarios. With more labeled samples from the target domain, CCSA performed much better than ADDA.

Compared to the physics model classification maps in Fig. 3d–f, the classification maps produced by the proposed model with 5-shot were much better than those from the direct source domain model shown in Fig. 6a. Classification maps produced by CCSA with 5-shot (Fig. 6d) are good, but those produced by CCSA with 1-shot (Fig. 6b) are much worse. Note that the physics model results have a 10% error [41] and the classification maps are shown for visualization purposes only; for an accurate quantitative assessment of these models, see Table 2, where accuracy was computed in the labeled regions.

7 Conclusion

Automatic seagrass detection in multispectral images is an important tool for seagrass monitoring, but labeling atmospherically corrected multispectral images is labor intensive and time consuming. We developed a semi-supervised domain adaptation method for deep CNN seagrass detection models. The proposed model first uses unlabeled samples from both domains to adapt the source domain model to the target domain based on the GAN loss; it then utilizes the contrastive semantic alignment loss with a few labeled samples from the target domain to further adapt the model, jointly optimizing the source and target models in this second step. We evaluated the proposed model on three atmospherically corrected WorldView-2 multispectral images taken in Florida and achieved the best results in 28 out of 36 experimental scenarios. Future work will evaluate the proposed model against other image detection methods across broader regions such as the southeastern USA.