Enhanced performance of on-chip integrated biosensor using deep learning

A new approach for determining the concentration composition of a multi-element medium using a micro-ring resonator (MRR) is proposed which allows for both electrical and thermal noise removal as well as moderately higher average accuracy. This method uses two neural networks, namely a convolutional neural network (CNN) and a deep neural network (DNN). The CNN differentiates the transmission spectrum from the noise. This spectrum is used to obtain selected features before being fed into the DNN, which determines the concentration of each chemical in the analyte. Both models are trained using simulated data from a silicon-on-insulator ring resonator operating between the infrared wavelengths of λ = 1.46 μm and λ = 1.6 μm on mixtures of water, ethanol, methanol, and propanol, although the same approach can be used with other designs and substances. The CNN was trained using the MRR transmission spectra superimposed with white Gaussian noise as well as Poisson noise to mimic different noise sources, while the DNN was trained on the extracted features. The average root-mean-square error in concentration for the entire system is 0.0775% for concentrations ranging from 0.0357 to 75%, and the largest error was 0.68% concentration.


Introduction
The sensing of chemical compounds for monitoring or detection is necessary in various fields and industries, with applications ranging from analytical chemistry, medicine, and healthcare to industrial production (Wade and Bailey 2016; El Shamy et al. 2022; Wolfbeis and Narayanaswamy 2004). Integrated optical sensors offer a cost-effective and high-performance solution in these fields (Li et al. 2021), as well as the opportunity to integrate with electronics to achieve an effective lab-on-a-chip that is compact and portable (Duval et al. 2013). Furthermore, they can provide real-time monitoring capabilities, label-free detection, limited sample consumption, and resistance to electromagnetic interference.
Integrated optical sensing falls into two categories: detecting the real component n of the refractive index of the medium in refractive index sensing, or determining the imaginary component k through absorption sensing (Correa-Mena and Gonzalez 2017). Refractive index sensors may use optical devices such as Mach-Zehnder interferometers, micro-ring resonators (MRRs), or surface plasmon polaritons to detect n. They are especially useful in applications where the volume of samples is small, as n does not depend on the volume (Correa-Mena and Gonzalez 2017). However, they are not selective without additional structures, meaning that substances with the same detected n at a given wavelength cannot be distinguished. Absorption sensing, by comparison, measures k through the interaction between light and the sample. This means that it is dependent on the volume of the substance, since longer interaction lengths between light and the sample result in higher light absorption and therefore higher sensitivity (Nitkowski et al. 2008). One way to circumvent this is the use of a high quality factor micro-cavity, such as an MRR, where the light circulates multiple times through the ring and therefore has a much longer "effective" length over which it interacts with the analyte, increasing the optical path length from the micrometer range to the millimeter range (Farca et al. 2007).
In fact, MRRs have been shown to be an effective tool for a wide variety of sensing applications, from molecular weight distribution (Mordan et al. 2018) to biochemical and chemical sensing (Sun and Fan 2011). A method that originally used an MRR to obtain the imaginary component of the refractive index (Nitkowski et al. 2008) was also extended to obtain the real component (El Shamy et al. 2022), allowing both to be measured simultaneously.
In this work, we propose a machine learning based technique to provide accurate chemical concentration measurement. Machine learning has grown in popularity in recent decades given its powerful predictive capabilities, and it is no surprise that it has been widely adopted by the photonics community (Li et al. 2021; Nguyen et al. 2021). Several studies have combined the sensing abilities of MRRs with the analytical potential of machine learning models for chemical detection, but these usually focus on detection rather than concentration measurement (Lu et al. 2021; Hu et al. 2020; Malathi et al. 2023), meaning the models only perform classification.
A study by Li et al. (2021) used a neural network to decompose a multicomponent spectrum into the spectra of its components in order to determine their concentrations. They achieved a low root-mean-squared error, ranging from 0.13 to 2.28 mg/mL, with a dataset of 343 samples of mixtures of three chemicals: glucose, PEG200, and BSA. The study used the entire spectrum rather than individual features due to the absence of distinct features unique to each component, highlighting the challenges faced in feature extraction. In Sect. 3.2.1, we provide evidence to support the assertion that feature extraction directly from the spectrum is challenging, especially as the number of components increases. Similarly, Zhang et al. (2018) used a dual-ring design with a back-propagation neural network to measure binary mixtures of acetone and 15% NaCl solution, and were able to obtain an RMSE of 0.000345.
On the other hand, El Shamy et al. (2022) proposed a method for measuring the complex refractive index of a medium with an MRR operating over λ = 1.46-1.6 μm. They used this data to measure the concentration composition of mixtures of ethanol, methanol, propanol, and water using the least-squares method, and achieved 97.4% detection accuracy. However, their method assumed an idealized spectrum and did not take noise sources into account.
As with other refractive index sensors, the technology's high sensitivity can render it vulnerable to various noise sources, especially in applications that require low limits of detection (Zhou et al. 2016). Sources of potential noise vary and can occur at the detector, the laser, or in the optical materials themselves. For the detector, the main sources of noise are shot noise, Johnson noise, and dark current noise (Zhou et al. 2016; Kiyat et al. 2003), although the effect of the last is much less significant than that of the other two and is therefore often neglected. Shot noise is a result of the discrete nature of light, which causes statistical uncertainty in the arrival rate of photons, while Johnson noise is the result of thermal fluctuations affecting the load resistance of the photodetector. Frequency instability and the finite range of wavelengths of lasers can result in spectral noise, although this would be quite small in the case of intensity detection (Zhou et al. 2016).
The dominant source of noise, however, occurs in the optical material itself due to thermal equilibrium fluctuations (Kiyat et al. 2003; Gretarsson and Gretarsson 2018). This is a result of the thermo-optical effect, which causes the optical properties of a material to change in response to changes in temperature. For devices that are microns in size, these fluctuations have a significant effect, limiting the performance of some high-sensitivity instruments. In the case of ring resonators, it is especially noticeable in rings with a high Q factor (Zhou et al. 2016), as well as those made of silicon, which has a high thermo-optical coefficient. The cumulative effect of these sources of noise can cause the detected transmission spectrum to be indistinguishable from the noise, resulting in false measurements.
We build upon the work of Raghi et al., using the same device and chemicals. We use their method of determining the complex refractive index to provide multiple independent features for the model, overcoming the issues in feature extraction. Furthermore, our study benefits from a much larger and more diverse dataset, containing 6221 unique mixtures of four chemicals, which contributes to a more comprehensive analysis and robust model training. We also take real-world effects into account by working with noisy spectra.
The proposed method approaches the problem by combining the predictions of multiple models in a simple stacking ensemble, which, compared to a single model, greatly improves the generalization ability of the resulting system, giving higher predictive capability, reduced bias and variance, and a less noisy output (Zhou and Chen 2002). Using this method, we were able to obtain a root-mean-square error of 0.0775%.

Methods
Two neural networks are used to perform the chemical measurements. The reasoning for using such an ensemble was to utilize the strengths of different neural network architectures, namely the predictive power of standard deep neural networks (DNN) and the ability of convolutional neural networks (CNN) to identify local and global patterns. The CNN removes noise, and key features are then extracted from the spectrum and fed into the DNN, which outputs the concentration of each chemical. Combining these processes, the suggested algorithm is as follows, as also shown in Fig. 1:
1. Measure the output of the MRR 10 times
2. Remove noise with the CNN
3. Measure the resonance wavelengths and transmissions
4. Obtain n and k at these points
5. Use the DNN to predict results

Data generation
To train these models, a large amount of simulation data was generated. This was done with the same MRR as El Shamy et al. (2022), which is designed to operate between the wavelengths of 1.46 and 1.6 μm. Simulating the transmission spectra of the MRR at various concentrations first required the refractive index of each chemical combination. This in turn required the complex refractive index of the individual sensing media over the wavelength range 1.46-1.6 μm, which was obtained from the work of Myers et al. (2018) and Kedenburg et al. (2012). To create unique mixtures of the chemicals, it was assumed that the complex refractive index of the medium, with real part $n_{med}(\lambda_i)$ and imaginary part $k_{med}(\lambda_i)$, can be expressed as a linear combination of each chemical j's real ($n_j(\lambda_i)$) and imaginary ($k_j(\lambda_i)$) parts for x chemicals:

$$n_{med}(\lambda_i) = \sum_{j=1}^{x} w_j\, n_j(\lambda_i), \qquad k_{med}(\lambda_i) = \sum_{j=1}^{x} w_j\, k_j(\lambda_i) \tag{1}$$

where $w_j$ is the proportion of the mixture made up of chemical j. By algorithmically iterating over unique weights, it was possible to obtain 6221 unique complex refractive indexes over the target spectrum, corresponding to unique concentration combinations of the four chemicals. The resulting analytes were simulated surrounding a silicon-on-insulator waveguide with the aforementioned dimensions using the MODE Finite Difference Eigenmode solver (https://www.ansys.com/products/photonics/mode) (Fig. 2). Once the complex effective index was obtained for the waveguide in the different mixtures, the simulated data was interpolated to allow a higher resolution of 5 pm.

Fig. 1 Overview of the algorithm
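The linear mixing assumption described above can be sketched in a few lines of Python. This is a minimal illustration only: the index values, the grid step, and the function names `compositions` and `mix_index` are hypothetical and are not taken from the scripts used in this work, and the grid shown does not reproduce the 6221 mixtures of the actual dataset.

```python
import itertools
import numpy as np

def compositions(n_chemicals, steps):
    """Yield every weight vector on a grid of 1/steps whose entries sum to 1."""
    for counts in itertools.product(range(steps + 1), repeat=n_chemicals - 1):
        remainder = steps - sum(counts)
        if remainder >= 0:
            yield np.array(counts + (remainder,)) / steps

def mix_index(weights, n, k):
    """Weighted sum of per-chemical n and k.

    weights: (x,) fractions summing to 1
    n, k:    (x, n_wavelengths) real and imaginary index of each chemical
    """
    weights = np.asarray(weights, dtype=float)
    return weights @ np.asarray(n), weights @ np.asarray(k)

# Hypothetical values at two wavelengths for two chemicals (not measured data).
n = np.array([[1.32, 1.31], [1.35, 1.34]])
k = np.array([[1.0e-4, 2.0e-4], [0.5e-4, 0.8e-4]])
n_med, k_med = mix_index([0.25, 0.75], n, k)
```

Iterating `compositions(4, steps)` and passing each weight vector to `mix_index` would give one complex index spectrum per mixture, which is the role the weight iteration plays in the text.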
With accurate spectra for the behavior of the waveguide in various analytes, it became possible to convert this data to the spectra of an MRR. Surrounded by a medium, the complex effective index of the entire system, with real part $n_{eff}(\lambda_i)$ and imaginary part $k_{eff}(\lambda_i)$, is dependent on the refractive index of the medium, with the locations and transmission values of the resonances being unique to specific effective indexes. The real part of the refractive index of the medium determines the resonance wavelength $\lambda_{res}$, while the imaginary part determines the cavity losses and, as a result, the field attenuation coefficient a, thus determining the "depth" of the resonance. Assuming lossless coupling, the transmission spectrum T of the MRR can be calculated in terms of the following quantities. L is the cavity length of the ring, given by $2\pi R$, where R is the ring radius. a is the attenuation factor and $\alpha$ is the field loss coefficient, which is the sum of the intrinsic loss ($\alpha_{int}$) and the medium absorption loss ($\alpha_{abs}$). Intrinsic loss is caused by roughness on the surface of the waveguide, as well as bend losses. Bend losses are roughly inversely proportional to the radius of the ring for a set waveguide width. However, increasing the radius reduces the free spectral range (FSR) of the device, resulting in a greater number of resonances, which limits the maximum refractive index change that can be detected. Therefore, the ring radius is selected to be the minimum radius of curvature for the given waveguide width, $R_{min}$, to reduce bend losses as much as possible. The waveguide width determines the sensitivity $S_{wg}$ of the device, which in turn determines the minimum and maximum changes in n and k that can be detected. It was found that sensitivity is maximum for a fundamental quasi-transverse electric mode at a width of 270 nm. However, $\Delta n_{max} = 0.048$ for the selected chemicals over the wavelength range, which is outside the limits of detection for that width. Therefore, the less optimal width of 320 nm was selected, which corresponds to an $R_{min}$ of 7.6 μm and results in an FSR in the range of 9.85-12.13 nm. Returning to Eq. (6), r is the forward-coupling coefficient, which is a measure of how much light transmits through the waveguide as opposed to coupling to the ring cavity, and is a function of the separation between the waveguide and the ring. Critical coupling, where the light coupled into the ring resonator is completely transferred out of the ring resonator, leading to a sharp and pronounced resonance peak in the transmission spectrum, occurs when r ≈ a. To achieve this, a separation distance of 500 nm was used. The waveguide height is standard for silicon-on-insulator technology, at 220 nm. Using a script and the aforementioned formulas and dimensions, it was possible to obtain the unique spectrum for the MRR under each analyte, hence creating a dataset that could be used to train the machine learning models.
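As an illustration of how r, a, and L shape the spectrum, the sketch below evaluates the common textbook expression for an all-pass ring with lossless coupling. It is offered under stated assumptions and is not necessarily the exact expression used in this work: the constant effective index (dispersion ignored), the values of r and a, and the function name `allpass_transmission` are all hypothetical. Only the ring radius and wavelength range come from the text.

```python
import numpy as np

def allpass_transmission(wavelength, n_eff, radius, r, a):
    """Power transmission of an all-pass ring with lossless coupling.

    T = (a^2 - 2 r a cos(phi) + r^2) / (1 - 2 r a cos(phi) + (r a)^2),
    with round-trip phase phi = 2 pi n_eff L / wavelength and L = 2 pi R.
    """
    L = 2 * np.pi * radius
    phi = 2 * np.pi * n_eff * L / wavelength
    num = a**2 - 2 * r * a * np.cos(phi) + r**2
    den = 1 - 2 * r * a * np.cos(phi) + (r * a) ** 2
    return num / den

radius = 7.6e-6                                     # m, R_min quoted in the text
wavelength = np.linspace(1.46e-6, 1.60e-6, 28001)   # 5 pm steps
n_eff = 2.2                                         # assumed constant value
T_critical = allpass_transmission(wavelength, n_eff, radius, r=0.98, a=0.98)
T_detuned = allpass_transmission(wavelength, n_eff, radius, r=0.99, a=0.98)
```

With r equal to a, the numerator vanishes on resonance and the dips reach essentially zero transmission, which is the critical coupling condition described above; with r different from a, the dips become shallower.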

Machine learning results
This section describes the two models that were trained to work together in an ensemble to obtain the mixture, including model architecture, hyperparameters, training, and feature selection. The CNN is trained on spectra obtained in the aforementioned simulations with added noise, while the DNN is trained on selected features from the dataset. Two different sets of features were attempted, one including various features directly measured from the spectrum, the other using the complex refractive index at the resonances. It was found that the second dataset resulted in far less error.

CNN for noise removal
As mentioned in the introduction, MRRs with high Q factors are extremely sensitive to noise. Noise can be broadly described as either additive or multiplicative, depending on the source. The main source of noise in MRRs with high Q factors is thermal noise (Zhou et al. 2016), caused by fluctuations in thermal equilibrium, which show a Brownian noise pattern consistent with the random walk nature of the fluctuations. Brownian noise shows a strictly Gaussian distribution, and therefore additive Gaussian noise was used to model this, as well as to approximate the cumulative effect of noise from other sources (Momeni et al. 2009; Tombez et al. 2017). As for multiplicative noise, Poisson noise was applied to approximate shot noise, which has been shown to follow a Poisson distribution (Miller and Harvey 2001). The signal-to-noise ratio (SNR) was taken to be 11.8 dB, which is exaggerated as a proof of concept given the high Q factor of the ring resonator (Zhou et al. 2016), and is meant to demonstrate the ability of the network to remove noise. Given that spectral and thermal noise is dominant at $Q > 8 \times 10^4$, the intensity of the Gaussian noise was assumed to be three times stronger than that of the Poisson noise: the Gaussian noise standard deviation was taken to be 5% of the transmission and the Poisson noise was reduced to 1.6%. For the noise removal process, a small auto-encoder was designed to remove the applied noise. A convolutional neural network was selected due to its ability to recognize local and global patterns, which makes it a common architecture for noise removal, and due to its success with similar problems (Mannam et al. 2022). Furthermore, the model is able to remove both sources of noise simultaneously. The transmission data was split into groups of 130 points to allow for a more manageable model, as a model with ≈ 18,000 inputs corresponding to each point in the spectrum would not be practical and is unnecessary.
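The noise model and windowing described above could be sketched as follows. This is one possible reading, not the procedure used in this work: the 1.6% Poisson level is interpreted here as the relative standard deviation at full transmission, the Poisson noise is applied before the Gaussian noise, and the helper names `add_noise`, `split_windows`, and `snr_db` are hypothetical.

```python
import numpy as np

def add_noise(transmission, gaussian_std=0.05, poisson_rel=0.016, rng=None):
    """Apply Poisson (signal-dependent) then Gaussian (additive) noise.

    poisson_rel is read as the relative standard deviation at T = 1, so
    T = 1 corresponds to 1 / poisson_rel**2 photon counts (an assumption).
    """
    rng = np.random.default_rng() if rng is None else rng
    counts_at_unity = 1.0 / poisson_rel**2
    t = np.clip(np.asarray(transmission, dtype=float), 0.0, None)
    shot = rng.poisson(t * counts_at_unity) / counts_at_unity
    return shot + rng.normal(0.0, gaussian_std, size=t.shape)

def split_windows(spectrum, size=130):
    """Cut a 1-D spectrum into consecutive windows, dropping any remainder."""
    spectrum = np.asarray(spectrum)
    n_windows = spectrum.size // size
    return spectrum[: n_windows * size].reshape(n_windows, size)

def snr_db(clean, noisy):
    """SNR in dB from mean signal power over mean noise power."""
    clean, noisy = np.asarray(clean), np.asarray(noisy)
    return 10 * np.log10(np.mean(clean**2) / np.mean((noisy - clean) ** 2))

rng = np.random.default_rng(0)
clean = np.ones(28000)            # flat spectrum, for illustration only
noisy = add_noise(clean, rng=rng)
windows = split_windows(noisy)
```

Because the Poisson term scales with the signal while the Gaussian term does not, the SNR returned by `snr_db` depends on the spectrum it is evaluated on; the flat spectrum here is not expected to reproduce the 11.8 dB figure quoted in the text.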
The model was designed by initial trial and error until the useful hyperparameters were identified, after which a grid search was used to obtain the best model. This led to an auto-encoder of five layers. The first two are one-dimensional convolutional layers (Conv1D) made up of 130 and 16 neurons, respectively, while the third and fourth are transposed one-dimensional convolutional layers of size 16 and 130, respectively. The final layer is an output layer with a single Conv1D neuron. All layers use rectified linear unit (ReLU) activation functions except the last, which instead uses a sigmoid function. Furthermore, the kernel size was set to nine for all convolutional layers. Smaller kernel sizes also performed well, but larger ones produced progressively greater error.
During training, a Gaussian noise layer with a standard deviation of 0.05 from the TensorFlow library was used to generate a new noisy spectrum every epoch. A custom Poisson noise layer with an intensity of 0.016 was also created and added for the same purpose. The Adam optimizer and binary cross-entropy loss were used for training over 5 epochs. A train-test split was not used, as the noise provided to the network was unique every time.
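A minimal Keras sketch of a denoiser with the layer sizes, kernel size, activations, optimizer, and loss described above is given below. Several details are assumptions, since the text does not specify them: unit strides, `"same"` padding, the ordering of the two noise layers, and the internals of the `PoissonNoise` class, which is a hypothetical stand-in for the custom layer and reads the 0.016 intensity as a relative standard deviation at full transmission.

```python
import tensorflow as tf
from tensorflow.keras import layers

class PoissonNoise(layers.Layer):
    """Hypothetical stand-in for the custom Poisson noise layer (training only)."""

    def __init__(self, intensity=0.016, **kwargs):
        super().__init__(**kwargs)
        self.scale = 1.0 / intensity**2   # assumed photon count at T = 1

    def call(self, inputs, training=None):
        if not training:
            return inputs
        lam = tf.maximum(inputs, 0.0) * self.scale
        return tf.random.poisson(shape=[], lam=lam) / self.scale

def build_denoiser(window=130, kernel_size=9):
    """Five-layer convolutional auto-encoder operating on 130-point windows."""
    inputs = layers.Input(shape=(window, 1))
    x = PoissonNoise(0.016)(inputs)          # noise layers are active only in training
    x = layers.GaussianNoise(0.05)(x)
    x = layers.Conv1D(130, kernel_size, padding="same", activation="relu")(x)
    x = layers.Conv1D(16, kernel_size, padding="same", activation="relu")(x)
    x = layers.Conv1DTranspose(16, kernel_size, padding="same", activation="relu")(x)
    x = layers.Conv1DTranspose(130, kernel_size, padding="same", activation="relu")(x)
    outputs = layers.Conv1D(1, kernel_size, padding="same", activation="sigmoid")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model
```

Training would then amount to calling `model.fit(clean_windows, clean_windows, epochs=5)` on clean windows of shape `(n, 130, 1)`, with fresh noise drawn by the two noise layers on every pass.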
The results for noise removal were very promising, and the binary cross-entropy loss was found to be 0.0103. Figure 3 shows how the denoised signal compares to the original, as well as a noisy signal.
Occasionally, for some extreme random cases of the noise, the resonances would not be denoised completely. This may be impossible to avoid given the stochastic nature of the noise. To combat this, one can obtain and denoise the spectrum several times and then average the outputs, reducing the effect of noise "outliers". Successive iterations result in increasing accuracy, but the returns quickly diminish after 10 iterations, as seen in Fig. 4.
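The averaging step can be illustrated with a toy example. Everything here is a stand-in chosen for illustration: the flat "true" spectrum, the noisy `measure` function, and the `denoise` function, which simply leaves 20% of the noise behind rather than calling a trained CNN.

```python
import numpy as np

def average_repeats(denoise, measure, repeats=10):
    """Average `repeats` independently measured and denoised spectra."""
    return np.mean([denoise(measure()) for _ in range(repeats)], axis=0)

# Toy stand-ins (assumptions, not the trained model or real measurements).
rng = np.random.default_rng(1)
truth = np.full(1000, 0.8)
measure = lambda: truth + rng.normal(0.0, 0.05, truth.shape)
denoise = lambda s: truth + 0.2 * (s - truth)

single = np.sqrt(np.mean((denoise(measure()) - truth) ** 2))
averaged = np.sqrt(np.mean((average_repeats(denoise, measure) - truth) ** 2))
```

For independent residuals, the error of the average falls roughly as the inverse square root of the number of repeats, which is consistent with the diminishing returns noted above.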

DNN for chemical concentration measurement
While machine learning can, in many instances, discover features or trends in data that allow it to perform a desired task, and while feeding the entire spectrum into a network would work, the high resolution required for this device to function makes this computationally inefficient if a smaller subset of features can be used instead. A resolution of 5 pm corresponds to approximately 28,000 points in the spectrum, which would require a large neural network to analyze and, therefore, a larger number of parameters to train. Furthermore, the resonances are very narrow, and reducing the resolution to allow for a smaller dataset would result in the resonances being lost altogether. As such, it was favorable to perform feature extraction to reduce the spectrum to a manageable number of features.
Two sets of features were attempted: the first relied on data directly observed in the spectrum, and the second relied on aggregating the features into the real and imaginary parts of the refractive index.

Features measured directly from spectrum
Directly using features measured from the spectrum was attractive given that it would require less computation to process the data. Furthermore, the method has shown success in the aforementioned studies that combined machine learning and MRRs (Li et al. 2021; Zhang et al. 2018). It was therefore attempted as a comparison to the other dataset. The features selected to train the DNN were the following:
1. The transmission and location of the 14 resonance troughs
2. The full width at half maximum of the 14 resonances
3. The location of transmission "peaks" between the 14 resonances as well as their corresponding transmission values
4. The average transmission of the spectrum
These were selected with a combination of feature selection techniques as well as some trial and error. The feature correlation techniques used were Fisher's score and Pearson correlation against the concentration percentages of each chemical. These showed that some chemicals had much stronger correlations with the selected features than others. This implied that the model might be much better at determining the concentrations of some components of the medium than others. As an example, ethanol showed near perfect negative and positive correlation across many of the features, while water did not strongly correlate with any of the selected features. Training a model on these features confirmed this: the model performed well when determining the concentration of chemicals that showed high correlation with the chosen features, but was not particularly accurate when measuring those that did not, leading to a rather wide range of error values, as seen in Table 1 for chemical concentrations ranging between 0.0357 and 75%.
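The Pearson screening described above can be sketched with NumPy alone. The data below is synthetic and the function name `pearson_by_feature` is hypothetical; the example only shows how a strongly correlated feature and an uninformative one would appear in such a screening.

```python
import numpy as np

def pearson_by_feature(features, target):
    """Pearson correlation of each feature column with one target vector."""
    f = np.asarray(features, dtype=float)
    t = np.asarray(target, dtype=float)
    f_c = f - f.mean(axis=0)
    t_c = t - t.mean()
    denom = np.sqrt((f_c**2).sum(axis=0) * (t_c**2).sum())
    return (f_c * t_c[:, None]).sum(axis=0) / denom

# Synthetic example: feature 0 tracks the concentration, feature 1 is unrelated.
rng = np.random.default_rng(2)
concentration = rng.uniform(0.0, 0.75, 500)
features = np.column_stack([
    1.55 - 0.02 * concentration,   # perfectly (negatively) correlated
    rng.normal(0.0, 1.0, 500),     # uncorrelated noise
])
corr = pearson_by_feature(features, concentration)
```

Running this once per chemical gives a correlation table in which a row of values near zero, as reported for water, signals that the chosen features carry little linear information about that component. Constant feature columns would need to be removed first, since they make the denominator zero.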
Figure 5 shows the real versus measured values for five random mixtures. The RMSE is quite low for ethanol and, to a lesser extent, propanol, but is much higher for the other two chemicals, resulting in most of the error in the scatter plot occurring along the axis corresponding to methanol. As a result, it seems that this method may only be accurate for obtaining the medium concentrations of analytes with fewer chemicals, where the problem is far less complex. Using the shift in the intensity and wavelengths of the resonances, similar to Li et al. (2021), was also attempted, but this showed even less success. Therefore, new features needed to be selected to obtain higher accuracy.

Complex refractive index at resonances as features
Given the earlier success of El Shamy et al. (2022) in using the measured complex refractive index to obtain the composition of a medium, it was naturally an attractive dataset on which to train the neural network. To derive these features, we measured the change in the resonances relative to those of a known reference medium. As previously stated, a shift in the wavelength of the resonance is the result of a difference in the real part, while a change in the loss coefficient ($\Delta\alpha$), and therefore in the transmission, of the resonance is the result of a change in the imaginary component. These changes are used to obtain the change in the medium index through the sensitivity $S_{wg}$, from which the refractive index of the medium is finally obtained. Using these relations, the n and k at each resonance were obtained.
This dataset showed immediate promise during training, and the hyperparameters of the DNN were optimized over the course of several days using a Weights and Biases sweep to further improve results (https://wandb.ai/site). The resulting parameters are shown in Table 2.
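The chain from resonance change to medium index described above can be illustrated with first-order perturbation relations. These are textbook forms adopted here as assumptions, not the equations of this work: the group index `n_g` does not appear in the text, the loss coefficient is treated as a field (not power) coefficient, and the same sensitivity is applied to the real and imaginary parts. The function name and all numerical values are hypothetical.

```python
import math

def medium_index_from_resonance(lam_res, lam_ref, d_alpha, n_g, s_wg, n_ref, k_ref):
    """First-order estimate of (n, k) of the medium at one resonance.

    Assumed relations (not taken from the paper):
      d_n_eff = n_g * (lam_res - lam_ref) / lam_ref
      d_k_eff = d_alpha * lam_ref / (2 * pi)   # alpha as a field loss coefficient
      d_n_med = d_n_eff / s_wg,  d_k_med = d_k_eff / s_wg
    """
    d_n_eff = n_g * (lam_res - lam_ref) / lam_ref
    d_k_eff = d_alpha * lam_ref / (2 * math.pi)
    return n_ref + d_n_eff / s_wg, k_ref + d_k_eff / s_wg

# Hypothetical numbers chosen only to exercise the function.
n_med, k_med = medium_index_from_resonance(
    lam_res=1.5505e-6, lam_ref=1.55e-6,   # 0.5 nm red shift
    d_alpha=100.0,                        # extra field loss in 1/m
    n_g=4.0, s_wg=0.5, n_ref=1.33, k_ref=1.0e-4,
)
```

Applying such a function at each resonance yields one (n, k) pair per resonance, which is the form of feature vector the DNN is trained on. If the loss coefficient were defined for power instead of field, the factor 2π would become 4π.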
The learning rate was set to decay after 100 epochs. Early stopping based on validation loss was used to prevent overfitting, which would occur around the 1200th epoch. The activation function on the output layer was set to softmax to ensure that the sum of concentrations was constrained to equal 1. For accuracy, 10% of the data was set aside to validate results, and the model was trained with a Gaussian noise layer of 0.005% to be consistent with the error of obtaining the medium refractive index (El Shamy et al. 2022). Figure 6 shows how the loss progressed with every training epoch.
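The effect of the softmax output layer, and the way an RMSE in concentration percent can be computed from it, can be shown with a short NumPy sketch. The logits are made-up numbers rather than model output, and `rmse_percent` is a hypothetical helper that assumes concentrations are expressed as fractions between 0 and 1.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; each output row is non-negative and sums to 1."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max(axis=-1, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def rmse_percent(predicted, actual):
    """Per-chemical RMSE in percentage points for fractions in [0, 1]."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(actual, dtype=float)
    return 100 * np.sqrt(np.mean(diff**2, axis=0))

# Made-up logits for two mixtures of four chemicals (not model output).
logits = np.array([[2.0, 0.5, -1.0, 0.0], [0.0, 0.0, 0.0, 0.0]])
fractions = softmax(logits)
```

Because every row of `fractions` sums to 1 by construction, the network cannot predict a mixture whose components add up to more or less than 100%, which is the constraint the softmax layer is used to enforce.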
This dataset showed much better results, and the final neural network was much better at measuring concentration than its predecessor, rivaling the error of the linear least-squares method even with the additional noise. For initially noisy inputs, with chemical concentrations ranging between 0.0357 and 75%, the results are shown in Table 3.

Conclusion and future work
We propose a method of applying machine learning to further advance the sensing abilities of micro-ring resonators, providing the benefits of noise tolerance as well as more accurate results. The method uses two machine learning models: a deep neural network to determine chemical concentrations and a convolutional neural network to remove noise. The models were trained on mixtures of water, ethanol, propanol, and methanol, but the same process could be applied to other sets of chemicals. The real and imaginary components of the refractive index at the resonances were determined to be the most effective features.

Fig. 2 A micro-ring resonator, with primary mode

Table 1
RMSE for initial DNN

Table 2
Loss versus training epoch for both train and test data