Salinity analysis based on multivariate nonlinear regression for web‐based visualization of oceanic data

Traditionally, the temperature-salinity (T-S) relationship has been analysed to characterize water masses, and earlier studies built regression-based prediction models to estimate salinity. The T-S characteristic, however, may change dynamically with geographic location, season, or water layer, and is quite sensitive to depth at the same location. It is therefore of interest whether including depth in the regression model could improve the prediction accuracy. In this paper, multivariate nonlinear regression (MNLR) is investigated to predict salinity from both temperature and depth. Experimental results show that depth is very effective for improving the prediction accuracy, and that a season-dependent model may achieve better performance than a season-independent model. In addition, when the analysis was conducted over 5-year ranges, the prediction accuracy was found to be significantly higher than the result for all years, which indicates that there might be long-term variation in the characteristics of the water masses. Furthermore, a 3D model and visualization scheme are proposed to explore the effect of depth on the temperature-salinity-depth (T-S-D) characteristic, and a visualization system was built accordingly. This system presents the T-S curve and 3D model according to the assigned criteria of season or multi-year range, and allows the user to view the similarity map for the given T-S-D data so as to conduct comparative studies of water masses over a wide area of ocean.
MNLR is utilized to predict the salinity according to both temperature and depth. The 3D model helps to explore the effect of depth on the water mass. The 3D visualization scheme significantly improves the analysis of water mass characteristics.


Introduction
Temperature and salinity have long been important characteristics in oceanographic research. The relationship between temperature and salinity, conventionally denoted the T-S relationship, conveys rich information about the structure and circulation of water. Every water mass has its own relationship between temperature and salinity that can be used for analysis (Emery and Dewar 1982a). T-S data have been widely explored, and several models have been built on them to estimate salinity from temperature using hydrographic observations. In addition, the T-S relationship is sensitive to depth, and therefore in earlier research the T-S characteristics were analysed individually for four layers of water mass (Chen and Wang 1998; Jan et al. 2015; Lien et al. 2015; Yang et al. 2015). In oceanography, temperature-salinity-depth (T-S-D) data were used in several studies that reported the variations of the water masses in seawater (Emery and Dewar 1982b; Troccoli and Haines 1999; Jenkins et al. 2015).
Wu and Lin, Terrestrial, Atmospheric and Oceanic Sciences (2022) 33:6
On the other hand, a prediction model of salinity is valuable for the management of ocean data. The World Ocean Database (Boyer et al. 2018), for example, contains 3.56 billion individual profile measurements. Among these data, 1.95 billion are temperature and 1.13 billion are salinity measurements, and these measurements make up 15.7 million oceanographic casts. Temperature profiles are thus more numerous than salinity profiles. If salinity can be estimated reliably through a prediction model, more salinity data would become available at lower cost for investigating water masses. In a previous study, cluster analysis was used to define water mass similarity based on T-S-D data (Hur et al. 1999), and the T-S-D relationship can be used to define a similarity function to distinguish water masses (Qi et al. 2014). Based on the above discussion, we consider whether it is possible to include depth as a parameter in a model capable of predicting salinity more accurately. Such a model could help ocean researchers to explore and interpret water masses, and could be used to identify and compare water masses, to estimate salinity for predicting climate and weather patterns, and to determine the nature of the transformation and interaction of different waters. Multivariate regression, developed in statistics, is good at modelling the relation between random variables. It is worth investigating whether and how such an approach could be applied to the prediction of the T-S-D relationship.
In this paper, the T-S-D relationship representing the characteristic of a water mass is modelled as a Multivariate Non-Linear Regression (MNLR) problem whose coefficients are estimated by minimizing the prediction error. Based on this regression model, salinity can be predicted from temperature and depth, and a lower prediction error than that of polynomial regression can be achieved. The prediction performance with respect to season, multi-year range, and water layer is also analysed and discussed.
The characterization of the thermohaline structure of an ocean region is typically based on vertical profile measurements, where the data form a discrete set of observations of temperature and salinity sampled at varying depths (Assunção et al. 2020). The additional dimension of depth could help ocean researchers to explore, interpret and compare water masses efficiently. 3D visualization can accelerate the exploration of T-S-D characteristics by adding the dimension of depth and by offering discriminative information from different perspectives. Such a visualization scheme can help to further explore and interpret earlier research issues, such as the Kuroshio intrusion into the South China Sea, by taking depth into consideration. In this paper, a three-layer architecture for 3D data visualization is proposed and used to build a web-based system that facilitates the comparative study of the T-S-D characteristics of water masses over a wide area of ocean. This system provides users with a flexible interface through which they may query, explore, and compare the temperature-salinity-depth data analytically in real time.

Temperature-salinity-depth data
To build a predictive model, a database of historical conductivity-temperature-depth (CTD) observations that span a spatial domain over a period of time is required. Taiwan, located between the tropics and the subtropics, lies on the border between the largest land mass and the largest ocean in the world, where the marine and atmospheric environments are complicated and sensitive (Chien et al. 2010). The success of an analysis method in ocean research relies highly on the quantity and quality of the available research data. For this purpose, the data provided by the Ocean Data Bank of the Ministry of Science and Technology (ODB/MOST) are used in this research. Since 1986, an oceanographic database covering a large area, called the ODB, has been built and operated by the Institute of Oceanography, National Taiwan University, and adopted to analyze the water masses around Taiwan. The data in ODB were obtained from R/V Ocean Researcher I, II, and III through decades of long-term surveys around Taiwan. ODB provides CTD data for relevant investigations in the region of the East Asian Seas. In this study, the conductivity, temperature, and depth observations cover the region of approximately 10-30°N latitude and 110-130°E longitude. The routes of the cruises of Ocean Researcher I (red), II (orange), and III (pink) for collecting the T-S-D data during 1986-2017 are shown in Fig. 1. We use annual CTD maximum depth data from 51,780 casts collected during the last three decades. All CTD data were processed by ODB/MOST with strict quality control, and converted into temperature-salinity-depth data for regression analysis. In addition, in this research the geographical area near Taiwan between 10° and 30°N latitude and 110° and 130°E longitude was divided equally into a 15′ × 15′ grid. Every CTD record was uniquely assigned to a grid according to its latitude and longitude.
A few sample records of CTD data for a grid are shown in Table 1, in which CenterID signifies the centre of the grid to which those records belong.
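The grid assignment described above can be sketched in a few lines. This is a minimal illustration, not the ODB implementation: the function names `grid_cell` and `cell_center` are hypothetical, and the choice of aligning cell edges to the south-west corner of the study area (10°N, 110°E) is an assumption, since the text does not state how cell boundaries or CenterID values are defined.

```python
import math

# Study area and cell size taken from the text; the cell origin is an assumption.
LAT_MIN, LAT_MAX = 10.0, 30.0
LON_MIN, LON_MAX = 110.0, 130.0
CELL_DEG = 0.25  # 15 arc-minutes expressed in degrees


def grid_cell(lat, lon):
    """Return the (row, col) index of the 15' x 15' cell containing a record."""
    if not (LAT_MIN <= lat < LAT_MAX and LON_MIN <= lon < LON_MAX):
        raise ValueError("record lies outside the study area")
    row = int(math.floor((lat - LAT_MIN) / CELL_DEG))
    col = int(math.floor((lon - LON_MIN) / CELL_DEG))
    return row, col


def cell_center(row, col):
    """Return the (lat, lon) of the centre of a cell (hypothetical CenterID basis)."""
    return (LAT_MIN + (row + 0.5) * CELL_DEG,
            LON_MIN + (col + 0.5) * CELL_DEG)
```

For example, `grid_cell(23.8, 122.8)` falls in row 55, column 51 under these assumptions. Because every record maps to exactly one index pair, the regression can later be run independently per cell.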

Multivariate non-linear regression
Statistical approaches, such as regression models, are effective tools for investigating the relationship between dependent and independent variables (Razi and Athappilly 2005). Among the regression models, multivariate nonlinear regression (MNLR) can flexibly model a nonlinear relationship between a dependent variable and a few explanatory variables. It has been successfully employed to model a wide range of hydrologic processes (Razi and Athappilly 2005). MNLR is derived from the multivariate linear regression model, which can be formulated as follows (Montgomery et al. 2015).
$$Y = X\beta + \varepsilon \qquad (1)$$
where $Y$ is an $N$-dimensional vector of prediction output, $X$ is an $N \times (k + 1)$ matrix, $\beta$ is a $(k + 1)$-dimensional vector of the coefficients, and $\varepsilon$ is an $N$-dimensional vector of the error terms. Notice that $X$ consists of the rows $X_i$, where $X_i$ is the augmented array for a sample in the form $[1, X_{i1}, X_{i2}, \ldots, X_{ik}]$. Furthermore, ordinary least squares (OLS) can be used to find the optimal coefficients, $\beta$, that minimize the prediction error, $e(\beta)$, for the given training samples. The prediction error is computed as
$$e(\beta) = \sum_{i=1}^{N} \left( Y_i - X_i \beta \right)^2 \qquad (2)$$
where $Y_i$ is the observed output of the sample $X_i$. The value of $\beta$ which minimizes this sum is called the OLS estimator. The function $e(\beta)$ is quadratic in $\beta$ with positive-definite Hessian, and therefore this function possesses a unique global minimum (Hayashi 2000). The optimal coefficients of $\beta$ can be solved by
$$\hat{\beta} = \left( X^{\mathsf{T}} X \right)^{-1} X^{\mathsf{T}} Y \qquad (3)$$
Multivariate linear regression as depicted above can be used to model a nonlinear relationship for the samples by performing a nonlinear transformation on either the dependent or the explanatory variables. The nonlinear model might provide a better estimate because it is unbiased and produces smaller residuals (Glantz and Slinker 1990). To apply multivariate nonlinear regression to the prediction of the temperature-salinity-depth characteristic, Eq. (1) can be modified as
$$S = f_{temp}(T) + f_{depth}(D) + f_{cross}(T, D) + \varepsilon \qquad (4)$$
where $f_{temp}$ is the regression model containing the polynomial terms for temperature $T$, $f_{depth}$ is the model containing depth $D$, and $f_{cross}$ is the model containing the cross terms of $T$ and $D$. Equation (4) can be reduced to Eq. (1) by performing variable conversion and coefficient substitution. For example, by applying the variable conversions $Y = S$, $X_1 = T$, $X_2 = T^2$, $X_3 = T^3$, $X_4 = T^4$ and $X_5 = T^5$, and by identifying the $j$-th coefficient of $f_{temp}$ with $\beta_j$ for $j = 0, 1, \ldots, 5$, Eq. (4) becomes the polynomial regression of degree five with $\beta$ being a 6-dimensional vector.
If the variable conversions $X_6 = T^2 D$ and $X_7 = T D$, and the coefficient substitutions $\gamma_1 = \beta_6$ and $\gamma_2 = \beta_7$, are further applied, the two cross terms of $T$ and $D$ ($X_6$ and $X_7$) are then included in the regression model with $\beta$ being an 8-dimensional vector. The variables and coefficients in Eqs. (5) through (7) may be mapped flexibly to the $\beta_j$ and $X_j$ in Eq. (1). Earlier research showed that the temperature-salinity relationship can be well modelled as a polynomial of degree five (Wu et al. 2014). In that case, the terms of $f_{temp}$ in Eq. (5) are kept while the terms of $f_{cross}$ and $f_{depth}$ in Eqs. (6) and (7) are dropped. A few experiments will be conducted later to investigate how the terms of the regression model influence the prediction performance.
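The reduction of the nonlinear model to an ordinary least-squares problem can be illustrated with a short NumPy sketch. It builds the 8-column design matrix described above (constant, T to T⁵, T²D, TD) and solves for the coefficients. The function names are hypothetical, and the rescaling of T and D by `t_scale` and `d_scale` is an addition made here purely for numerical conditioning; it is not described in the text and changes the coefficient values but not the predictions.

```python
import numpy as np


def design_matrix(T, D, t_scale=30.0, d_scale=1000.0):
    """Columns: 1, T, T^2, ..., T^5, T^2*D, T*D (on rescaled T and D)."""
    t = np.asarray(T, dtype=float) / t_scale  # rescaling is an assumption
    d = np.asarray(D, dtype=float) / d_scale
    cols = [np.ones_like(t)]
    cols += [t ** p for p in range(1, 6)]     # polynomial terms of temperature
    cols += [t ** 2 * d, t * d]               # cross terms of T and D
    return np.column_stack(cols)


def fit_mnlr(T, D, S):
    """Least-squares estimate of the coefficient vector for one grid."""
    X = design_matrix(T, D)
    beta, *_ = np.linalg.lstsq(X, np.asarray(S, dtype=float), rcond=None)
    return beta


def predict_salinity(beta, T, D):
    """Predicted salinity for the given temperatures and depths."""
    return design_matrix(T, D) @ beta
```

`np.linalg.lstsq` is used instead of forming the inverse in the closed-form OLS solution explicitly; both minimize the same squared error, but the former is more robust when the polynomial columns are nearly collinear.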

Root mean square error and coefficient of determination (R²)
For the MNLR analysis proposed in the previous section, the salinity $S$ is represented as a function of temperature and depth with a set of coefficients, $\beta$. The error for each sample is the difference between the observed value and the estimated value. To evaluate the model, the root mean square error (RMSE) is used to indicate how well the outcomes are predicted by the regression model. It is an indicator of the average prediction error, as defined below.
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( Y_i - \tilde{Y}_i \right)^2}$$
where $Y_i$ is the actual output value, $\tilde{Y}_i$ is the predicted value, and $N$ is the total number of samples. Additionally, to avoid scale dependency, the Normalized Root Mean Square Error (NRMSE) is used, as defined below.
$$\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{\sigma}$$
where $\sigma$ is the standard deviation of the observed outputs.
In addition, the coefficient of determination, denoted as R² here, is used to estimate the percentage of variance of the response variable that can be explained by its relationship with the explanatory variables (Glantz and Slinker 1990). It is defined as follows.
$$R^2 = 1 - \frac{SSR}{SST}, \qquad SST = \sum_{i=1}^{N} \left( Y_i - \bar{Y} \right)^2, \qquad SSR = \sum_{i=1}^{N} \left( Y_i - \tilde{Y}_i \right)^2$$
where $Y_i$ is the observed output, $\tilde{Y}_i$ is the corresponding predicted output, and $\bar{Y}$ is the mean of all the observed data. SST is the sum of squared errors between the actual output and the mean of all outputs, and SSR is the sum of squared prediction errors of the prediction model. The higher the coefficient of determination (R²), the better the prediction performance. Earlier research suggested that R² greater than 0.7 stands for a strong effect size (Moore et al. 2015). In this paper, 0.7 is selected as the threshold of R² for determining whether the regression analysis for a grid is effective and much superior to the overall mean. Furthermore, to evaluate the overall effectiveness over a wide area, MNLR is performed for every grid to obtain its R² value, and the percentage of effective grids out of all grids is computed. For a wide area of waters, the more grids with R² higher than 0.7, the more effective the regression model is.
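The evaluation metrics defined in this section can be computed as in the following sketch. The function names are hypothetical. Two points are assumptions: the standard deviation in NRMSE is taken as the population form (`ddof=0`), which the text does not specify, and a grid counts as effective only when R² is strictly greater than the threshold.

```python
import numpy as np


def rmse(y, y_pred):
    """Root mean square error between observed and predicted values."""
    y, y_pred = np.asarray(y, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y - y_pred) ** 2)))


def nrmse(y, y_pred):
    """RMSE divided by the standard deviation of the observations (ddof=0 assumed)."""
    return rmse(y, y_pred) / float(np.std(np.asarray(y, dtype=float)))


def r_squared(y, y_pred):
    """Coefficient of determination, 1 - SSR/SST."""
    y, y_pred = np.asarray(y, dtype=float), np.asarray(y_pred, dtype=float)
    ssr = np.sum((y - y_pred) ** 2)      # squared prediction errors
    sst = np.sum((y - y.mean()) ** 2)    # squared deviations from the mean
    return float(1.0 - ssr / sst)


def effective_grid_percentage(r2_values, threshold=0.7):
    """Percentage of grids whose R^2 exceeds the threshold."""
    r2 = np.asarray(r2_values, dtype=float)
    return float(100.0 * np.mean(r2 > threshold))
```

With the population standard deviation, NRMSE² equals 1 − R², so the two metrics carry the same information per grid; the percentage of effective grids is what summarizes performance over the whole area.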

Comparison of different MNLR models
The regression analysis of the T-S-D relationship is conducted on a large set of historical data near Taiwan, which requires a large storage volume and considerable computation time. To reduce resource consumption and accelerate processing, the ocean area was divided into 1504 grids (15′ × 15′), and MNLR is conducted for every grid individually. Two performance metrics, the root mean square error and the percentage of effective grids with R² higher than 0.7, are computed. Table 2 shows the experimental results of the baseline MNLR analysis. As can be observed from Table 2, the percentage of effective grids for polynomial regression, $f_{temp}(T)$, is 77.66%, which means that 1168 of the 1504 grids are effective. If the depth model, $f_{depth}$, is added to the baseline, the percentage increases slightly: 79.72% and 79.11% are obtained for $f_{temp}(T) + D$ and $f_{temp}(T) + D^2$, respectively. This implies that the linear term of depth is more useful than the quadratic term. On the other hand, when the model for the cross terms of temperature and depth, $f_{cross}$, is added to the baseline, the percentage rises more markedly, to 80.44% and 80.92% for the models $f_{temp}(T) + TD$ and $f_{temp}(T) + T^2 D$, respectively.
If both the depth model $f_{depth}$ and the cross model $f_{cross}$ are used, the highest percentage, 82.77%, is achieved.
The corresponding model contains the polynomial function of temperature of degree five, the linear term of depth, and two cross terms of temperature and depth. Although this model works well over a large area, the prediction accuracy was found to be low for a region near the Taiwan Strait, located between longitude 119°E and 122°E and latitude 23°N and 26°N. This suggests that the model might not work well for regions with very high spatiotemporal variability.
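For reference, the best-performing model can be written out term by term. The expression below is a reconstruction from the verbal description (degree-five polynomial in temperature, linear depth term, and the two cross terms $T^2 D$ and $TD$), not a formula quoted from the source. The coefficient names follow the mapping to Eq. (1) given earlier, and assigning $\beta_8$ to the depth term is an assumption about indexing only.

```latex
S = \beta_0 + \sum_{j=1}^{5} \beta_j T^{j} + \beta_6 T^{2} D + \beta_7 T D + \beta_8 D + \varepsilon
```

Under this reading, each grid is described by nine coefficients, which is what would need to be stored per grid for later prediction.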

Variations of prediction performance
In addition to the baseline analysis and the comparison of various MNLR models, it is of interest how the prediction performance varies with respect to factors such as season, multi-year range, and water layer. Some experiments are conducted in this section to explore this issue.
Seasonality
The T-S-D characteristics of water masses might change with season. The months of the four seasons in this study are defined as follows: December, January, and February for winter; March, April, and May for spring; June, July, and August for summer; and September, October, and November for autumn. Regression analyses were performed for every season, and the results are shown in Table 3. The percentages of effective grids for winter, autumn, spring and summer are 86.42%, 89.35%, 84.20% and 82.77%, respectively. The average of the season-dependent percentages is 85.69%, which is 2.92 percentage points higher than the season-independent result, 82.77%. This means that, for the large area of 1,504 grids, the season-dependent prediction model can estimate salinity effectively for more grids than the season-independent model.
Five-year Variation
It is of interest to explore the long-term changes of water masses. The MNLR analyses were conducted for every 5-year range for all the grids. The percentage of effective grids with respect to the 5-year range is displayed in Table 4. As can be seen from this table, the percentages are roughly between 85 and 91% for the 5-year ranges, which is much higher than the result for all the years (82.77%). Such results are reasonable, since the characteristics of ocean data over a longer term tend to have higher variation, which leads to higher prediction errors.
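The two stratifications above amount to grouping records by a key before fitting a separate model per group. The sketch below shows one way to derive those keys. The season mapping follows the month definitions given in the text; the function names are hypothetical, and the alignment of the 5-year bins to 1986 (the first year of the data) is an assumption, since the exact ranges of Table 4 are not given here.

```python
from collections import defaultdict

# Month-to-season mapping as defined in the text.
SEASON_OF_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def season_of_month(month):
    """Return the season name for a calendar month (1-12)."""
    return SEASON_OF_MONTH[month]


def multi_year_range(year, start=1986, span=5):
    """Return the (first, last) year of the bin containing `year` (bin origin assumed)."""
    first = start + ((year - start) // span) * span
    return first, first + span - 1


def group_records(records, key):
    """Group (year, month, T, S, D) records by a key function of (year, month)."""
    groups = defaultdict(list)
    for year, month, t, s, d in records:
        groups[key(year, month)].append((t, s, d))
    return dict(groups)
```

For instance, `group_records(records, lambda y, m: season_of_month(m))` yields one sample set per season, each of which can then be fitted independently for a grid.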
Water Layer
It is worth investigating how the T-S-D characteristics of water masses change with respect to the water layer. We followed relevant research that divided the water mass into surface waters (0-100 m), tropical waters (100-300 m), intermediate waters (300-800 m) and deep waters (800 m-bottom), and that analysed the temperature-salinity characteristics for each of the four layers (Chen and Wang 1998; Chen 2005; Mensah et al. 2014; Mensah et al. 2015; Jan et al. 2015; Lien et al. 2015; Yang et al. 2015). The surface and tropical waters have the largest property ranges, and occupy the least amount of ocean volume physically. The reverse is true of the intermediate and deep waters, which have a fairly restricted range but occupy a substantial portion of the ocean. The properties of most ocean water masses are established at the ocean's surface and are strongly influenced by fluctuations there. Table 5 displays the result of the regression analysis for the four water layers. It could be seen from

Real-time 3D model and visualization
In the previous section, the MNLR model was used to analyse the T-S-D characteristic of the water mass. The additional dimension of depth could help ocean researchers to explore, interpret and compare T-S-D characteristics efficiently. In this section, we further propose a real-time 3D Model and visualization scheme, which helps the user to explore and compare the T-S-D characteristics of water masses intuitively and flexibly, or to observe and interpret specific phenomena, such as the influence of depth and the Kuroshio intrusion into the South China Sea.

MNLR surface and 3D model
Visualization contributes to the extraction and identification of important information from large volumes of spatio-temporal data. Many studies have highlighted the value of visualization in environmental science, geology, meteorology, and hydrology (Xie et al. 2019). In this research, a visualization approach was implemented to render the T-S-D data and the corresponding regression surface in 3D space. Figure 2a shows the 3D visualization for the grid at longitude 122.75°E and latitude 23.75°N, in which the blue surface was generated from the T-S-D data of the grid, and the orange surface was generated from the MNLR model learned from the data. As can be observed from Fig. 2a, the T-S-D data and the 3D surface of the regression model are close, which visually confirms the low RMSE of the regression model. However, the high variability near the surface (wide blue area) and, to a lesser extent, at the intermediate layer (~600 m) cannot be reproduced accurately by the predictive model. This is because any given temperature is associated with a very wide range of salinity, depending on the location, time, and current conditions. In addition, rendering the 3D surface for the regression model requires a lot of computation. For fast rendering, the T-S-D data were downsized by averaging according to depth to obtain a T-S-D curve, a compact representation of the T-S-D data that is called the 3D model here and is shown in Fig. 2b. The visualization interface, which integrates the T-S-D data, the MNLR surface and the 3D model, gives researchers intuitive, flexible and informative ways to inspect and interpret the ocean data interactively. For example, the user may rotate the 3D model to view and compare the curve and the surfaces from different perspectives. The user may also explore analytically, through the interface, how the T-S-D characteristic of the water mass varies with depth.
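The downsizing step that produces the 3D model (averaging the T-S-D samples according to depth) might look like the sketch below. The function name `tsd_curve` is hypothetical, and the use of fixed-width depth bins with a default width of 10 m is an assumption; the text states only that the average is taken according to depth.

```python
import numpy as np


def tsd_curve(T, S, D, bin_width=10.0):
    """Average temperature and salinity within depth bins.

    Returns an array of rows (mean depth, mean T, mean S), ordered from
    shallow to deep, which can be drawn as a single 3D polyline.
    """
    T, S, D = (np.asarray(a, dtype=float) for a in (T, S, D))
    bins = np.floor(D / bin_width).astype(int)   # depth-bin index per sample
    rows = []
    for b in np.unique(bins):                    # np.unique returns sorted bins
        mask = bins == b
        rows.append((D[mask].mean(), T[mask].mean(), S[mask].mean()))
    return np.array(rows)
```

Reducing thousands of scattered samples to one point per depth bin is what keeps interactive rotation of the curve cheap compared with re-rendering the full regression surface.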

Season variation of 3D model
The T-S curve is one of the most important indicators of the characteristic of a water mass. Comparing T-S curves may help to analyse the factors that determine the nature of the transformation and interaction of different waters. For example, on the Kuroshio's western side, the mixing of Kuroshio and South China Sea water usually occurs in the Luzon Strait (Mensah et al. 2014). Comparing the mixing water with the tropical waters helps to analyse the interaction of water masses around Taiwan. Next, we demonstrate how the 3D model could help to investigate the variation of the T-S-D characteristic with respect to season. Figure 3a and b show the 3D Model of the water mass at longitude 121.5°E and latitude 22.5°N near the East China Sea for four seasons from two different perspectives. The viewing angle in Fig. 3a is perpendicular to the axis of depth, so the paths are quite close to the conventional T-S curves. The seasonal change of the characteristic is similar to that of the Kuroshio surface water, which has a strong seasonal thermal cycle. As can be observed from Fig. 3a, the temperature is lowest (14.2-24.4 °C) in winter and highest (23.7-30 °C) in summer. In contrast, salinity is higher (34.5-34.9 psu) in winter and lower (34.3-34.7 psu) in summer (Qi et al. 2014). This agrees with the known finding that the primary causes of seasonal change of water mass on the East China Sea continental shelf are wind stress, surface heating and cooling, precipitation and evaporation, river discharge, and the Kuroshio and its branch currents (Qi et al. 2014). In addition, the 3D model in Fig. 3a not only has expressive power comparable to that of the conventional T-S curve, but also conveys discriminative information due to the additional dimension of depth. Figure 3b displays the 3D Models for four seasons from another perspective. As can be observed from this figure, the paths of the four seasons are separated in the upper 600 m but largely coincide in the deep waters.
The salinity and temperature have high variability, which is not so obvious from the 2D view of the T-S curve. As a consequence, the 3D visualization interface is a powerful tool for querying and comparing the temperature-salinity-depth characteristics at different locations or in different seasons, which makes knowledge discovery much easier.

Comparison with typical waters
The typical waters were used to analyze the waters near Taiwan and their interactions with the surrounding tropical waters (Chen 2005; Mensah et al. 2014). In this study, two typical waters were selected for reference: the South China Sea and the Kuroshio near its origin. Figure 4a displays the T-S data of the grid at longitude 122.75°E and latitude 23.75°N (grey points), the T-S curve for the average data (red), and the reference T-S curves for the typical waters of the South China Sea (green), the Kuroshio (blue) and the North Pacific (pink). Figure 4b and c show the 3D Model for the same data from two viewing angles. As can be observed from Fig. 4b, the temperatures are higher in the upper 300 m, and the salinities can be easily distinguished. Figure 4c is the view obtained by rotating the axis of depth to the front face, which looks quite similar to the conventional T-S curves shown in Fig. 4a. In addition, the similarity between the T-S data of the grid and the T-S curve may be computed for every reference T-S curve. Figure 4a shows that the grid is similar to the Kuroshio and the North Pacific by 19.92% and 80.08%, respectively. The 3D visualization with the T-S-D data shown in Fig. 4b makes it easier to distinguish and understand the characteristics of the water masses. As can be observed from Fig. 4b, the T-S-D paths in the surface, tropical, intermediate and deep waters are closer to the curve of the North Pacific. This subtle phenomenon is not apparent in Fig. 4a, but can be observed in the 3D view of Fig. 4b.
Fig. 4 The views of 2D/3D visualization for the grid at longitude 122.75°E and latitude 23.75°N: average data (red) and the typical waters of the South China Sea (green), the Kuroshio (blue) and the North Pacific (pink). a shows the raw T-S data (grey points) and the conventional T-S curves. b is the view of the 3D Model from an upper view angle, which makes it easier to distinguish and understand the characteristics of the water masses. c is the view obtained by rotating the axis of depth to the front face, and it looks similar to the view of conventional T-S curves, as shown in a

Kuroshio intrusion into the South China Sea
In the past decades, much research has been done on the Kuroshio intrusion. The Kuroshio, carrying northwestern Pacific water, intrudes into the South China Sea through the Luzon Strait, significantly affecting the temperature, salinity, circulation, and eddy generation in the South China Sea (Nan et al. 2015). This phenomenon can be illustrated with the example in Fig. 5. Figure 5a is the similarity map for the Kuroshio example dataset. Figure 5b shows the T-S data and the T-S curve at longitude 118.5°E and latitude 20.25°N, and the T-S curves of the two typical waters. In Fig. 5b, the T-S curve of the water (red) has similarities of 67.79% and 18.38% with the typical waters of the South China Sea (orange) and the Kuroshio (brown), respectively. In earlier research, this water was interpreted as a mixture of the waters of the South China Sea and the Kuroshio (Nan et al. 2015). Figure 5c further displays the corresponding 3D Model for this water and the two typical waters. As can be observed from Fig. 5c, the red curve for this water lies between the curves for the Kuroshio and the South China Sea in the upper water, and is very close to both of them in the deep water. The 3D visualization helps to verify the earlier research, and helps the user to explore or interpret the phenomenon of Kuroshio intrusion into the South China Sea. Moreover, an early study showed that the Kuroshio intrusion into the South China Sea through the Luzon Strait occurs all year round, with greater strength in winter and summer than in spring and autumn (Qu et al. 2000). Figure 6 displays the T-S curves in four seasons for an example water mass of the South China Sea, located at longitude 118.5°E, latitude 20.25°N. As can be seen from Fig. 6, this water has higher similarity with the typical water of the Kuroshio in winter (34.14%) and summer (22.9%) than in spring (0%) and autumn (4.73%). In Fig. 7 the 3D Models of this water are depicted and compared with the 2D T-S curves in winter and summer. As can be observed from Fig. 7b and d, the Kuroshio intrusion into the South China Sea is sensitive to depth. A significant interaction effect occurs in the upper 300 m, and the effect appears less significant in the deep water. 3D visualization can help to observe and investigate such phenomena discriminatively.
Fig. 5 […] The T-S curve of the water (red) has a similarity of 67.79% with the typical water of the South China Sea (green) and a similarity of 18.38% with the typical water of the Kuroshio (blue). In previous research, this water was interpreted as a mixture of the waters of the South China Sea and the Kuroshio (Nan et al. 2015). c further displays the corresponding 3D Model for this water and the two typical waters. The red curve for this water lies between the curves for the Kuroshio and the South China Sea in the upper water, and is very close to them in the deep water
Fig. 6 Comparison of the average T-S data of the grid (red) at longitude 118.5°E, latitude 20.25°N with the T-S curves for the typical waters of the South China Sea (green) and the Kuroshio (blue). This water has higher similarity with the typical water of the Kuroshio in winter (34.14%) and summer (22.9%) than in spring (0%) and autumn (4.73%). The Kuroshio intrusion into the South China Sea is more significant in winter and summer (Nan et al. 2015)
Fig. 7 The 3D Models at longitude 118.5°E, latitude 20.25°N, compared with the 2D T-S curves in winter and summer. As can be seen from (b) to (d), the Kuroshio intrusion into the South China Sea is sensitive to depth, and a significant interaction effect occurs in the upper 300 m while the effect seems less significant in deep water. The 3D visualization could help to observe and investigate the phenomena more discriminatively
[…] and the general public (Lipsa et al. 2012). The architecture includes three layers: the data layer, the service layer and the visualization layer, as shown in Fig. 8. In the data layer, the raw data were converted to CTD data with quality control and aggregated into 15′ × 15′ squares. MNLR was then performed for all grids individually to obtain the T-S-D characteristics, which were stored as the analytical ocean data. In the service layer, the distances among the grids were computed pairwise and stored in memory in advance so that they can be accessed efficiently from the visualization layer. The detailed processes for every layer are described as follows.

Data layer
This layer provides storage of the aggregated data and the analytical ocean data for the service layer. First, the raw data in ODB were converted into a set of CTD data with quality control. The dataset contains more than 20 million CTD records, which results in a huge storage volume, poor indexing performance and long search delays. To increase the efficiency of computing the T-S-D characteristics, the primary CTD data were aggregated according to their geographical coordinates. The CTD records in ODB were distributed into 1504 grids, and the data for each grid are used to compute the corresponding T-S-D characteristic through MNLR. Data aggregation is effective for reducing the computations of MNLR, provided that the T-S-D characteristic within the region of a grid is relatively stable. The coefficients of the MNLR model, as given in Eqs. (4) through (7), are finally stored for all the grids as analytical ocean data for further access.

Service layer
This layer has two modules: analytical web service and primary web service. The analytical web service provides the services of data analysis and communication with the client applications. The primary web service provides the service of data filtering used for rendering in the visualization layer.
Fig. 8 The web-based interactive mapping and visualization tool to best present analytical ocean data in data layer, service layer and visualization layer. A web-based 3D visualization framework can be easily customized to display the analysis of oceanic forecasting data.

Analytical Web Service
This module interacts with the visualization layer and computes the similarities between water masses for the input T-S-D data. It helps the user to search and track a large number of water masses visually based on similarity. The processing steps of this service are as follows.
(1) The user selects a reference grid from the user interface and uses its T-S-D data for analysis. Every sample contains temperature, salinity and depth, which are denoted as T, S, and D, respectively.
(2) Get the MNLR coefficients computed in the data layer for every grid. Substitute the temperature and depth of each sample into the MNLR model to obtain the predicted salinity, QS(T, D).
(3) Calculate for every sample the Euclidean distance between S and QS, which is the measure of prediction error. The smaller the distance is, the closer S is to QS. The overall distance for all samples, d(S, QS), is computed following Wu et al. (2014). It is the distance between the input T-S-D data and the MNLR model for a specific grid, and can indicate the difference of T-S-D characteristic between two water masses.
(4) Encode each distance between the reference grid and any other grid with a gradient colour so as to express the similarity between the two grids.
(5) Send the gradient colour information to the visualization layer for displaying the similarity map through Google Map.
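Steps (2) to (4) can be sketched in Python as follows. This is only an illustration under stated assumptions: the overall distance is taken here to be the root of the summed squared differences between S and QS, which may differ from the exact form in Wu et al. (2014); each grid's MNLR model is represented by a hypothetical callable `predict(T, D)`; and the linear red-to-blue colour ramp, normalised by the largest distance, is an arbitrary choice rather than the system's actual colour scheme.

```python
import math


def overall_distance(samples, predict):
    """Euclidean distance between observed S and predicted QS(T, D).

    samples: iterable of (T, S, D) tuples from the reference grid.
    predict: callable (T, D) -> predicted salinity for one candidate grid
             (a stand-in for that grid's stored MNLR model).
    """
    return math.sqrt(sum((s - predict(t, d)) ** 2 for t, s, d in samples))


def distance_to_colour(dist, max_dist):
    """Map a distance to a hex colour: red = most similar, blue = least.

    The linear ramp and the two end colours are illustrative assumptions.
    """
    ratio = 0.0 if max_dist <= 0 else min(max(dist / max_dist, 0.0), 1.0)
    red = round(255 * (1.0 - ratio))
    blue = round(255 * ratio)
    return f"#{red:02x}00{blue:02x}"


def similarity_colours(samples, grid_models):
    """Return {grid_id: colour} for every candidate grid model."""
    dists = {gid: overall_distance(samples, m) for gid, m in grid_models.items()}
    max_dist = max(dists.values(), default=0.0)
    return {gid: distance_to_colour(d, max_dist) for gid, d in dists.items()}
```

Because the sketch only requires a callable per grid, any stored set of regression coefficients can be wrapped in a small function and compared against the same reference samples.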
As far as spatial representation is concerned, the GeoJSON standard proposed by Butler et al. (2016) is used. GeoJSON is a geospatial data interchange format based on JavaScript Object Notation (JSON). It defines several types of JSON objects that can be combined to represent geographic features, their properties, and their spatial extents, and these can flexibly form the various data structures required by Google Map.
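To make the GeoJSON representation concrete, the sketch below builds a `FeatureCollection` in which each grid cell is a `Polygon` feature carrying its similarity colour. The property name `colour`, the 0.25° cell size default, and the helper names are hypothetical; the actual payload exchanged between the service layer and the visualization layer is not specified in the text.

```python
import json


def grid_feature(lon, lat, colour, cell=0.25):
    """Build a GeoJSON Feature for one grid cell.

    (lon, lat) is the south-west corner; the ring is closed and listed
    counter-clockwise, with positions in [longitude, latitude] order.
    """
    ring = [
        [lon, lat],
        [lon + cell, lat],
        [lon + cell, lat + cell],
        [lon, lat + cell],
        [lon, lat],
    ]
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"colour": colour},  # hypothetical property name
    }


def feature_collection(cells):
    """cells: iterable of (lon, lat, colour) tuples."""
    return {
        "type": "FeatureCollection",
        "features": [grid_feature(lon, lat, c) for lon, lat, c in cells],
    }


geojson_text = json.dumps(feature_collection([(118.5, 20.25, "#ff0000")]))
```

On the client side, one possible way to display such a document is the Data layer of the Google Maps JavaScript API, which can load GeoJSON through `map.data.addGeoJson(...)`; whether the described system uses this route is not stated in the text.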

Primary Web Service
This module provides the service of filtering the T-S-D data for an assigned grid. The filtering criteria can be set flexibly to explore how the T-S-D characteristics of the water masses vary with respect to different factors, such as season or range of years.
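A filter of this kind might look like the following Python sketch. The sample layout `(date, T, S, D)` and the function name `filter_samples` are hypothetical, and the month-to-season mapping is an assumption (Northern Hemisphere meteorological seasons), since the season definition is not restated in this section.

```python
from datetime import date

# Assumed month-to-season mapping (Northern Hemisphere meteorological seasons);
# the paper's own season definition is not restated in this section.
SEASON_MONTHS = {
    "spring": {3, 4, 5},
    "summer": {6, 7, 8},
    "autumn": {9, 10, 11},
    "winter": {12, 1, 2},
}


def filter_samples(samples, season=None, year_range=None):
    """Filter (date, T, S, D) samples by season and/or inclusive year range.

    Passing None for a criterion disables it.
    """
    months = SEASON_MONTHS[season] if season else None
    kept = []
    for when, temp, sal, depth in samples:
        if months is not None and when.month not in months:
            continue
        if year_range is not None and not (year_range[0] <= when.year <= year_range[1]):
            continue
        kept.append((when, temp, sal, depth))
    return kept
```

For example, `filter_samples(samples, season="winter", year_range=(1985, 1989))` would keep only the winter samples of one multi-year range, which could then be passed on for drawing the T-S curve and 3D Model.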

Visualization layer
In this layer a high-level visualization interface based on Google Map is connected to the low-level data from the Analytical Web Service in the service layer. Several modules were developed to support geo-collaboration through the JavaScript API of Google Map, and to present the map with the data produced in the service layer. The map was built with the Google Map API, which accesses the two-dimensional data and presents them with a number of colour levels. This method has been used in earlier scientific studies (Epitropou et al. 2016; Liu et al. 2015). To present the water mass characteristic, both the T-S curve and the 3D T-S-D Model are drawn according to the T-S-D data obtained from the Primary Web Service. The T-S curve expresses the equation of sea water, and has long been an important characteristic in oceanography. The 3D visualization can accelerate the exploration of T-S-D characteristics with the additional dimension of depth and the discriminative information from different perspectives. A real-time 3D visualization tool, Plotly.js, was adopted because it allows developers to quickly establish 3D visualization applications and may avoid intensive programming and computations (Qin et al. 2020). The web-based interface for 3D visualization was developed with HTML5, Plotly.js, and Cascading Style Sheets (CSS), and the 3D model is rendered on HTML5 Canvas through WebGL.
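The kind of figure description that Plotly.js consumes can be sketched as plain JSON. The Python example below prepares a `scatter3d` trace for a set of (T, S, D) samples on the server side; the axis assignment, marker styling, and the helper name `tsd_figure` are assumptions made for illustration, not the layout used by the described system.

```python
import json


def tsd_figure(samples):
    """Build a Plotly-style figure description for (T, S, D) samples.

    The result is plain JSON-serialisable data using the `scatter3d`
    trace type; a browser page could pass its `data` and `layout`
    entries to Plotly.js for rendering.
    """
    temps = [t for t, _, _ in samples]
    sals = [s for _, s, _ in samples]
    depths = [d for _, _, d in samples]
    trace = {
        "type": "scatter3d",
        "mode": "markers",
        "x": sals,
        "y": temps,
        "z": depths,
        "marker": {"size": 3, "color": depths, "colorscale": "Viridis"},
    }
    layout = {
        "scene": {
            "xaxis": {"title": {"text": "Salinity"}},
            "yaxis": {"title": {"text": "Temperature (°C)"}},
            # Reverse the axis so that depth increases downward.
            "zaxis": {"title": {"text": "Depth (m)"}, "autorange": "reversed"},
        }
    }
    return {"data": [trace], "layout": layout}


figure_json = json.dumps(tsd_figure([(25.0, 34.5, 10.0), (18.0, 34.7, 200.0)]))
```

In the browser, the parsed object could be rendered with `Plotly.newPlot(container, figure.data, figure.layout)`, so the server only ships data while the WebGL drawing happens on the client.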

Visualization examples
To demonstrate the proposed approaches and evaluate the applicability and efficiency of the visualization framework, an example screenshot of searching for the Kuroshio water mass is presented in Fig. 9. The top left of Fig. 9 contains the interactive areas for inputting the reference T-S-D data and setting the filtering criteria based on season or range of years. The bottom left of Fig. 9 is the area for rendering the 3D Model of the T-S-D data in real time. On the right-hand side of Fig. 9, the similarity map is generated and displayed to compare the filtered T-S-D data with all the grids.
The example of season visualization shows the T-S relationships for the four seasons in Fig. 10. Each panel displays the T-S data of the grid at longitude 121.5°E and latitude 22.5°N (grey points) and the T-S curve for the average data (red). In winter, the water temperature drops significantly to 19.0-24.0 °C, compared with 25.0-30.0 °C in summer. The salinity in summer is lower than in winter, which suggests that the water may be mixed with South China Sea water. As can be observed from this figure, seasonal change is prominent for the Kuroshio near Taiwan. In addition, the seasonal changes in the similarity map for two reference grids, one in the Kuroshio region near Taiwan and the other in the South China Sea, are shown in Figs. 11 and 12, respectively. The coloured masks are drawn by Google Map automatically, and the visualization tool can help the understanding of the temperature-salinity relationship across space.
Multi-year ranges from 1985 to 2010 are set for aggregating the T-S-D data, with the corresponding similarity maps displayed in Fig. 13. This figure shows that the Kuroshio patterns are broadly similar over many years based on long-term observation. Note that each multi-year range spans around 5 years so that sufficient T-S-D data are aggregated for computing the T-S characteristics, because the amount of data varies drastically from year to year. The adopted technologies can efficiently display the time-varying 2D T-S curve and 3D Model in a web browser for large-volume geospatial datasets, and avoid complicated and time-consuming geo-processing tasks on the server side. In addition, automation techniques are utilized to simplify the daily output-processing tasks.

Conclusion
Taiwan is an important link on an island chain in the west Pacific, and the marine and atmospheric environments of Taiwan are complex and sensitive. The traditional T-S relationship, which uses temperature only to predict salinity, is inadequate for illuminating the change of the T-S characteristic in different water layers. In this paper, depth, as a potentially useful variable for salinity prediction of a water mass, is taken into consideration. By introducing depth into the multivariate nonlinear regression, it was verified that depth is effective for reducing the prediction error and may achieve better performance. The best obtainable model in our combinatorial experiment consists of the polynomial terms for temperature, a linear term for depth, and two cross terms of temperature and depth, for which the percentage of effective grids with R² higher than 0.7 is 82.77%.
In addition, when seasonality is investigated with the MNLR model, it is found that the season-dependent model can achieve significantly better prediction performance than the season-independent model. In near-surface waters (0-100 m), the salinity prediction has relatively low accuracy because of the very high variability. However, using the seasonal model, the derived salinity might still be usable. In the MNLR analysis for five-year ranges, the prediction performance of the T-S-D characteristic is superior to the result for all years, which implies that the water masses might have long-term variation in their T-S-D characteristics. Furthermore, surface and tropical waters are more difficult to predict due to their higher variations, while intermediate and deep waters may achieve more accurate estimates. For waters deeper than 800 or 1000 m, the variability is very weak and there has been limited research interest in the region. The MNLR model may be particularly useful east of Taiwan for the tropical and intermediate water up to 122.75°E. This is because a front exists between SCS waters and Kuroshio waters, and these waters have clearly different T-S properties (Chen 2005; Jan et al. 2015; Mensah et al. 2020). However, it was also found in our analysis that for some regions near the Taiwan Strait, roughly located between longitude 119°E and 122°E and latitude 23°N and 26°N, the prediction accuracy is relatively lower. This indicates a possible limitation of the model for regions with high variability. Moreover, a 3D model and visualization scheme were proposed in this paper to explore the effect of depth on the T-S-D characteristic. Through the 3D visualization scheme, seasonal change becomes more distinguishable, and the change occurring mainly in the upper 600 m can be observed easily.
Such a visualization scheme can help to further explore and interpret earlier research issues, such as Kuroshio intrusion into the South China Sea, by taking the depth into consideration. For example, the effect of mixing the waters from the Kuroshio and the South China Sea differs considerably between water layers; it is significant in the upper 300 m and can be observed more easily in the 3D view. The proposed 3D visualization scheme has been successfully applied to an interactive system with a multi-layer architecture. The system allows the user to view the similarity map for the input T-S-D data, and may present the corresponding T-S curve and 3D T-S-D Model according to the assigned filtering criteria, such as season or multi-year range, so as to conduct comparative studies of water masses for a wide area of ocean.