Introduction

The emergence of nanophotonics has ushered in a transformative era in optics, enabling precise control of light-matter interactions through subwavelength structures1,2,3.

This paradigm shift has catalyzed revolutionary breakthroughs, facilitating optical devices that operate beyond the diffraction limit, with wide-ranging implications for biology2,4,5 and nanotechnology6,7,8. The 2014 Nobel Prize in Chemistry for super-resolved fluorescence microscopy9,10,11 underscores the profound impact of these advancements in nanophotonics.

Nanophotonics harnesses optical resonances and intense localized fields produced by surface plasmons via carefully designed nanostructures12. Analyzing complex nanostructures often requires sophisticated numerical techniques like the finite element method (FEM)13 and the Finite-Difference Time-Domain (FDTD) method14 supported by robust solvers15,16. However, these methods frequently demand significant computational resources, particularly in the context of inverse design, which also relies heavily on trial-and-error approaches.

The rapid progress of artificial intelligence has propelled Deep Learning (DL) to the forefront as a transformative method for addressing existing challenges17. DL utilizes complex multilayer structures of Neural Networks (NNs) to extract features at multiple scales and depths, enhancing accuracy and efficiency in regression analyses18,19. This trend has led to a surge in DL applications across numerous disciplines20,21,22.

Related works

Our literature review begins by studying guided-wave components before expanding to spectrum analysis in later phases. Early studies revealed the proficiency of NNs in predicting dispersion relations and photonic band gaps within two-dimensional photonic crystals23. Further research introduced an innovative optimization method for photonic device design utilizing an NN framework based on radial basis functions24. These studies employed NNs with a modest number of layers, generally ranging from two to three. Within two years, the domain of plasmonics began to harness the benefits of NNs. Investigations presented NN-based models capable of predicting the propagation characteristics of plasmonic nanostrip and coupled nanostrip transmission lines with exceptional accuracy and efficiency25. An innovative NN-based method markedly increased the efficiency of calculating power coupling efficiencies in photonic coupler devices26. Research demonstrated that a Multilayer Perceptron could predict these efficiencies in real time, enhancing computational speed by approximately 10⁵ times compared to the FEM.

Furthermore, the multilayer perceptron, alongside extreme learning machine NNs, has been implemented for the swift and accurate determination of dispersion relations and photonic band gaps in optimized bi- and tri-dimensional photonic crystals27. This strategy utilizes data from an electromagnetic solver to train and validate the NN models. Simultaneously, another research initiative focused on amplifying the quality factor, adopting a DL strategy to elevate the quality factors of two-dimensional photonic crystal nanocavities significantly28. This DL methodology attained quality factors exceeding those of the base cavity design tenfold and double the previously reported quality factors of 1.58 × 10⁹, achieved by optimizing air hole displacements within high-dimensional parameter spaces.

Further investigations led to the creation of an open-source deep NN model for designing polarization-insensitive subwavelength grating couplers on a silicon-on-insulator platform29. Additionally, a study introduced a novel design framework for integrated photonic circuit components utilizing NNs, specifically targeting strip waveguides and chirped Bragg gratings30. An innovative study showcased machine learning applications by employing a multilayer perceptron algorithm tailored for the efficient design of grating waveguides, mainly focusing on augmented reality applications31. Recently, a study showcased the application of DL for efficient spectrum prediction and inverse design of circular ring resonators (RRs), markedly enhancing computational efficiency and accuracy compared to traditional methods. However, challenges persist in data collection and the reliance on validation using the FDTD method. In spectroscopy, a DL model has been innovatively used as an optimization tool, enabling a single-peak high scattering effect in a multilayer nanoparticle structure32. Another seminal paper introduces a method combining convolutional NNs and recurrent NNs to extract absorption spectra from images of plasmonic structures33. The introduction of these innovative deep NN frameworks for spectral analysis exemplifies the transformative impact that artificial intelligence can have on optics, paving the way for novel applications and improved performance in optical communications and photonic device design34.

Study limitations

The reviewed studies in this section illustrate the expansive applicability of the DL methodology across various applications18,35, despite facing inherent limitations and challenges36. Such challenges encompass the necessity for voluminous training data37, the reliance of prediction accuracy on the quality and representativeness of data38, and the intricacies associated with generalizing DL models to novel or disparate scenarios38. Additional considerations, including overfitting39, issues of model interpretability40,41, computational requisites42,43, susceptibility to adversarial attacks44,45, and obstacles in inverse design46 characterized by the many-to-one dilemma47,48, underscore the imperative for continual research and innovation within this domain.

Research gap

In photonics research, the RR is universally recognized as an indispensable component for fabricating photonic integrated circuits. Despite notable advancements achieved through DL techniques in nanophotonics, a conspicuous gap persists in applying DL for spectral prediction and reverse engineering of All-Optical Plasmonic Switches (AOPS) that incorporate square RRs. This study is dedicated to investigating the capabilities of deep NN frameworks expressly devised for plasmonic resonator-based switching architectures to bridge the identified gaps in knowledge.

Objectives and contributions

This manuscript centers on the escalating interest in AOPS systems founded on surface plasmon polaritons, noted for their swift response times and compact dimensions. We leverage our team's broad experience in plasmonic device research2,49,50 as well as in DL implementation34 to address the inverse design problem for square-shaped Nonlinear Plasmonic Ring Resonators (NPRR). We specifically harness the nanophotonic configuration depicted in Fig. 1, which offers a solid framework for examining the integration of DL techniques in this field. The essence of this study intersects nanophotonics with artificial intelligence, aiming to pioneer advancements in DL's ability to predict spectral behaviors and assist in the reverse engineering of AOPS configurations based on NPRRs.

Fig. 1

The plan of the square AOPS structure. This three-dimensional schematic presents the geometric construct and parameters used in AOPS design. The cladding layer, consisting of air, and the glass base were chosen for their favorable optical properties, which are commonly used in plasmonic device fabrication.

Our chief goal is to introduce a DL-based method for spectral prediction that deepens our understanding of square resonators and facilitates the inverse design of plasmonic switches. We seek to establish a predictive model that precisely characterizes the spectral properties of these devices, using the Taguchi method, an established design-of-experiments technique, in the data generation phase.

In this work, when we use the term "inverse design," we refer to obtaining the parameters of a specific RR design. This differs from the more comprehensive nanophotonic inverse design approaches reviewed by Molesky et al.51, which aim to determine the optimal structural geometry of the entire device.

In validating the effectiveness of our proposed NPRR strategy, we explore the Kerr effect52 and appraise switch performance using the FDTD method. This procedure involves training an NN to replicate simulations accurately, enabling precise prediction of transmission spectra and identification of resonant wavelength characteristics. While our focus primarily lies on AOPS challenges, the outlined methodology bears significant potential for addressing various issues across the nanophotonics landscape.

Results and discussion

This section has three sub-sections. The first analyzes the forward model's findings, the second discusses its efficacy in aiding AOPS designers, and the final sub-section highlights the inverse model's capabilities. Moreover, the necessary source codes and data are publicly available under MIT license through GitHub and Zenodo repositories. These provide a comprehensive repository of coding resources essential for reproducing the simulation, establishing the model, and retrieving findings pertinent to the research’s primary concern. The repository contains MATLAB and Python scripts, detailed guidelines, code files for simulations, deep NN model training, and outcomes analysis.

Forward model

This sub-section delves into evaluating a unique methodology designed for analyzing the transmission spectrum within an AOPS. This work utilizes a square-shaped RR rather than a circular one, which may seem counterintuitive given the general preference for avoiding sharp corners in integrated photonics due to radiative losses. However, in plasmonic waveguides, sharp bends can provide high transmission with low bending loss, owing to the strong confinement of light by surface plasmon polaritons53. Square-shaped resonators offer higher coupling efficiency compared to circular ones due to the extended coupling section between the bus waveguide and the resonator54. Moreover, plasmonic waveguides with sharp bends have been shown to maintain high transmission and low bending loss, unlike their dielectric counterparts55. This unique property of plasmonic structures allows us to leverage the advantages of a square geometry in our design without the drawbacks typically associated with sharp corners in conventional photonic systems.

The crux of the approach hinges on a deep NN architecture, which leverages the structural parameters of the RR as its input, as depicted in Fig. 1. The investigation broadens its scope by exploring a diverse range of waveguide widths, extending from 31 to 59 nm, and gap widths that vary from 15 to 25 nm. This comprehensive exploration created a vast training dataset encompassing 18,432 unique instances, each crafted through FDTD simulations.

The generation and preprocessing of the dataset are detailed in the Methods section. Additionally, the Taguchi method was applied to refine the resolution of the input parameters. The specifics of the Taguchi method and the evaluation of the dataset's distribution are discussed in Sects. S1 and S2 of the Supplementary Information. A notable observation emerges from analyzing the computational costs and significance of the structural parameters associated with varying the width and gap of the bus, drop, and square. Despite the intuitive assumption that larger widths would significantly and equally influence the resonance characteristics, the empirical evidence suggests otherwise. The gaps within the bus and drop waveguides exhibit a more pronounced impact on these characteristics, as highlighted in Figs. S2b and S3 of the Supplementary Information. This insight led us to reduce the resolution of the three width parameters (bus, drop, and square) to half that assigned to the gap parameters Gbus and Gdrop.
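As a quick consistency check on the figures quoted above, the following sketch enumerates one full-factorial grid that matches the stated ranges (widths of 31 to 59 nm, gaps of 15 to 25 nm) and the dataset size. The step sizes (4 nm for the three widths, 2 nm for the two gaps), the parameter ordering, and the variable names are assumptions chosen because 8³ × 6² = 18,432; the text does not state the actual levels used.

```python
import itertools
import numpy as np

# Assumed levels: the widths use twice the step (half the resolution) of the gaps.
width_levels = np.arange(31, 60, 4)   # 31, 35, ..., 59 nm -> 8 levels
gap_levels = np.arange(15, 26, 2)     # 15, 17, ..., 25 nm -> 6 levels

# Hypothetical parameter order: W_bus, W_drop, W_square, G_bus, G_drop.
grid = list(itertools.product(width_levels, width_levels, width_levels,
                              gap_levels, gap_levels))
n_structures = len(grid)
print(n_structures)
```

Under these assumed levels, the count agrees with the 18,432 instances reported for the training dataset.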

The application of the Taguchi method in the data generation phase embodies a strategic approach to simulation. It ensures an optimized selection of spectra for training the NN, making the training process both efficient and effective. The method explores the multi-dimensional parameter space by constructing an orthogonal array that minimizes the required number of simulations while capturing essential variations. The process involves determining discrete levels for each input parameter, designing the orthogonal array, and performing FDTD simulations for the specified configurations. Notably, this approach allowed us to reduce the required dataset to one-sixteenth of a naive parameter sweep without significantly impacting model performance. As illustrated in Fig. S3, the computational cost for each parameter is substantially reduced. The Taguchi method's efficiency in generating a representative dataset with minimal simulations is a key contribution of our work, demonstrating how machine learning models for nanophotonic design can be trained more effectively and with lower computational overhead. The specified range of parameters enables the NN to make precise predictions regarding the spectral attributes of millions of RR structures, thereby showcasing the efficacy and robustness of the proposed approach in spectrum prediction.
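To make the idea of an orthogonal array concrete, the sketch below builds a small 9-run array for four factors at three levels each (the same structure as the Taguchi L9 array) and checks the defining property: every pair of columns contains every combination of levels equally often. This is a generic illustration, not the array used in this study; the mapping from level indices to widths is hypothetical.

```python
import itertools
from collections import Counter

def l9_orthogonal_array():
    """Nine runs for four 3-level factors (levels 0, 1, 2), built with modular arithmetic."""
    return [(a, b, (a + b) % 3, (a + 2 * b) % 3)
            for a in range(3) for b in range(3)]

def is_orthogonal(runs, n_levels):
    """True if every pair of columns contains each level pair equally often."""
    n_factors = len(runs[0])
    for i, j in itertools.combinations(range(n_factors), 2):
        counts = Counter((run[i], run[j]) for run in runs)
        if len(counts) != n_levels ** 2 or len(set(counts.values())) != 1:
            return False
    return True

runs = l9_orthogonal_array()
full_factorial = 3 ** 4   # 81 runs for an exhaustive sweep of the same factors

# Hypothetical mapping from level index to a physical width in nm.
width_values_nm = [31, 45, 59]
first_run_nm = [width_values_nm[level] for level in runs[0]]

print(len(runs), "runs instead of", full_factorial)
print("orthogonal:", is_orthogonal(runs, 3))
```

In this toy case, nine simulations stand in for an 81-point sweep while still covering each pair of parameter levels evenly, which is the kind of saving the paragraph above describes.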

After this phase, our team embarked on the NN training utilizing a dataset crafted with meticulous precision. The methodology underlying the NN’s architecture, which leverages input variables to predict the spectral response relevant to the designated AOPS, is unveiled in Fig. 2. This figure also showcases the NN’s output, namely the transmission spectrum, across a 1000–1800 nm wavelength range. The architecture chosen for this endeavor relies on a fully connected, layer-based NN, incorporating 11 optimized hidden layers and 160 neurons within the central hidden layer, resulting in 34,612 parameters. A detailed rationale for selecting this particular configuration is articulated in Sect. S3 of the Supplementary Information.

Fig. 2

The schematic of the deep NN architecture. The prediction of the spectrum of the AOPS utilizes this deep NN architecture, which takes inputs such as the geometrical specifics (like waveguide width and waveguide gap) and the wavelength of the input light. After traversing across eleven hidden layers, the NN predicts the transmission over different wavelengths, indicating the entire spectrum achieved from the AOPS device. This NN makes it possible to identify sophisticated connections between input parameters and the resulting transmission spectrum, giving a deeper understanding of the AOPS's behavior and performance.
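The structure of such a fully connected network can be sketched in a few lines of NumPy. The layer widths below are hypothetical: they respect the stated 11 hidden layers and the 160-neuron central layer, but the actual widths are given only in the Supplementary Information, so this sketch is not expected to reproduce the reported parameter count of 34,612. The six inputs (five geometric parameters plus the wavelength), the two outputs (through and drop port), and the ReLU activation are likewise assumptions.

```python
import numpy as np

def count_parameters(layer_sizes):
    """Total weights plus biases of a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

def init_mlp(layer_sizes, seed=0):
    """Random He-style weights and zero biases for each layer."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(params, x):
    """ReLU on the hidden layers, linear output layer."""
    for weights, bias in params[:-1]:
        x = np.maximum(x @ weights + bias, 0.0)
    weights, bias = params[-1]
    return x @ weights + bias

# Hypothetical sizes: 6 inputs, 11 hidden layers peaking at 160 neurons, 2 outputs.
layer_sizes = [6, 20, 40, 60, 80, 120, 160, 120, 80, 60, 40, 20, 2]
params = init_mlp(layer_sizes)

batch = np.random.default_rng(1).random((4, 6))   # four illustrative normalized inputs
prediction = forward(params, batch)               # untrained output, shape (4, 2)
print(count_parameters(layer_sizes), prediction.shape)
```

Evaluating the network once per wavelength sample and stacking the outputs yields a full predicted spectrum for a given geometry.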

After completing the training phase, we archived the NN's weights, enabling their straightforward retrieval and application. We conducted an in-depth analysis to shed light on the practical application of the DL methodology in accurately predicting the transmission spectrum. The training loss graph, depicted in Fig. 3a, elucidates the NN’s performance throughout the training process, revealing the network’s proficiency in accurately predicting transmission spectra, evidenced by a minimal validation loss value of 0.028. To evaluate the model's generalization capabilities, we calculated the loss on a held-out test dataset comprising 15% of the total data. The forward model's test loss was 0.03, which is closely aligned with the validation loss. This consistency between test and validation performance indicates that our model generalizes well to unseen data and is not overfitting. Subsequently, we redirected our attention to evaluating the network's capability to simulate spectra not encompassed within the training dataset. The network’s comprehensive adaptability was further assessed by comparing the predicted transmission and actual spectra, as illustrated in Fig. 3b,c.
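The evaluation protocol described above can be sketched as follows. Only the 15% test fraction comes from the text; the 15% validation fraction, the random shuffling, and the use of mean squared error as the loss are assumptions made for illustration.

```python
import numpy as np

def split_indices(n_samples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle sample indices and split them into train, validation, and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(test_fraction * n_samples))
    n_val = int(round(val_fraction * n_samples))
    test = order[:n_test]
    val = order[n_test:n_test + n_val]
    train = order[n_test + n_val:]
    return train, val, test

def mse(predicted, target):
    """Mean squared error between predicted and reference values."""
    predicted = np.asarray(predicted, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((predicted - target) ** 2))

train_idx, val_idx, test_idx = split_indices(18432)
print(len(train_idx), len(val_idx), len(test_idx))
```

Comparing `mse` on the validation and test subsets is the check described in the text: similar values on both suggest that the model is not overfitting.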

Fig. 3

The NN’s performance in estimating the spectrum. (a) The training loss, which shows significant drops, hints at the NN’s ability to identify trends in the data. (b) A juxtaposition of the approximated values of the NN and the actual spectrum in the through port and also the nearest training instances. (c) The corresponding comparison of the NN’s predicted spectrum and the actual spectrum in the drop port. The gray area highlights the NN’s generalization ability.

This comparison allows a close examination of the similarities and differences between the spectra. The spectra predicted by the forward model closely mirror the actual spectra for parameter values not present in the training set. Plotting the nearest samples from the training set, as shown in Fig. 3b,c, corroborates this observation: the network's prediction is not a simple interpolation or average of the nearest training spectra (refer to the gray area in Fig. 3b,c). Instead, the network generalizes, capturing underlying patterns in the relationship between the input and output data rather than merely fitting the training samples. The spectra plotted for the larger and smaller structures are the simulated spectra. It is important to note that the spectra presented in Fig. 3b,c (as well as Fig. 6b) were neither chosen at random nor selected for favorable results. Instead, these examples were deliberately chosen to represent the most challenging cases from the extremes of our training distribution. This approach allows us to evaluate our model's performance under worst-case scenarios and provides a more stringent and transparent assessment of its robustness and generalization ability than presenting only the most favorable outcomes.

As seen in Fig. 3b,c, the model's performance exhibits slightly lower accuracy in the resonant region compared to other spectral areas. This phenomenon, also observed in our previous work, can be attributed to an inherent imbalance in the training data. Each structure typically has a single dominant resonant frequency, so every data point at the resonant wavelength is accompanied by many data points representing non-resonant cases. This significant disparity in representation makes it difficult for the model to learn the characteristics of the critical resonant region accurately. As illustrated in Fig. S6 of the Supplementary Information, increasing the complexity of the NN improves the model's accuracy in predicting the resonant wavelength. However, we acknowledge that there is still room for enhancement, particularly in these challenging spectral regions.
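The size of this imbalance can be illustrated with a toy spectrum. The sketch below samples a Lorentzian dip over the 1000–1800 nm range and counts how many samples fall within its full width at half maximum. The 1 nm sampling step, the 20 nm linewidth, and the dip depth are hypothetical values chosen for illustration, not figures from the dataset.

```python
import numpy as np

def lorentzian_dip(wavelength_nm, center_nm, fwhm_nm, depth=0.9):
    """Transmission with a single Lorentzian dip (toy model of a resonance)."""
    half = fwhm_nm / 2.0
    return 1.0 - depth * half**2 / ((wavelength_nm - center_nm) ** 2 + half**2)

wavelengths_nm = np.linspace(1000.0, 1800.0, 801)   # assumed 1 nm sampling
center_nm, fwhm_nm = 1554.0, 20.0                   # linewidth is a hypothetical value
spectrum = lorentzian_dip(wavelengths_nm, center_nm, fwhm_nm)

# Samples that lie inside the resonance linewidth.
resonant = np.abs(wavelengths_nm - center_nm) <= fwhm_nm / 2.0
resonant_fraction = resonant.mean()
print(int(resonant.sum()), "of", wavelengths_nm.size, "samples are resonant")
```

With these assumed values, under 3% of the samples of each spectrum describe the resonance, so an unweighted loss is dominated by the flat, non-resonant background.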

AOPS design based on the deep NN

While the NN training does require extensive data from numerical simulations, our DL-based approach can subsequently circumvent the necessity for extended numerical method computations in the analytical process. This substantially expedites the design and optimization of the nanophotonic structures after the initial model training is complete (refer to Fig. S9). Our methodology facilitates incorporating various waveguide characteristics into the trained NN, producing outcomes within minutes. This efficiency grants us access to a vast repository of spectral responses for a wide array of structures, enabling the rapid retrieval of needed spectral data by navigating through this collection.

Our deep NN model was trained exclusively on low-intensity simulations, allowing efficient prediction of the nanophotonic structures' linear, low-intensity transmission spectra. The model is used to optimize structural parameters for a distinct peak in the drop port spectrum at a specific target wavelength, forming the basis of our AOPS design. We then employ FDTD simulations to evaluate the nonlinear, high-intensity behavior of the optimized structure. This two-step process enables efficient design and validation of the AOPS, leveraging the computational advantages of machine learning while accurately capturing the full range of optical phenomena involved in the switching mechanism.
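The first step of this two-step process, searching the surrogate's predictions for a strong drop-port peak at a target wavelength, can be sketched as a brute-force scan. Here `toy_drop_spectrum` is a hypothetical stand-in for the trained forward NN with an arbitrary dependence on the geometry, and the candidate levels are assumptions; the selected design would then be passed to an FDTD solver for the nonlinear check.

```python
import itertools
import numpy as np

def toy_drop_spectrum(params, wavelengths_nm):
    """Stand-in for the trained forward NN (hypothetical toy model)."""
    w_bus, w_drop, w_square, g_bus, g_drop = params
    center_nm = 1000.0 + 12.0 * w_square + 2.0 * (w_bus + w_drop)  # arbitrary dependence
    peak = 0.7 * np.exp(-(g_bus + g_drop - 30.0) / 40.0)           # arbitrary dependence
    half_width = 15.0
    return peak * half_width**2 / ((wavelengths_nm - center_nm) ** 2 + half_width**2)

def best_design(candidates, predict, wavelengths_nm, target_nm):
    """Return the candidate with the highest predicted drop transmission at target_nm."""
    target_index = int(np.argmin(np.abs(wavelengths_nm - target_nm)))
    scores = [predict(c, wavelengths_nm)[target_index] for c in candidates]
    best = int(np.argmax(scores))
    return candidates[best], float(scores[best])

wavelengths_nm = np.linspace(1000.0, 1800.0, 801)
widths = range(31, 60, 4)   # assumed width levels in nm
gaps = range(15, 26, 2)     # assumed gap levels in nm
candidates = list(itertools.product(widths, widths, widths, gaps, gaps))

design, score = best_design(candidates, toy_drop_spectrum, wavelengths_nm, 1554.0)
print(design, round(score, 3))
```

Because each surrogate evaluation is cheap, an exhaustive scan of this kind is feasible where the equivalent set of FDTD runs would not be.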

The choice of the most suitable switching mechanism may depend on the specific design goals of the optical switch. DL offers designers unprecedented flexibility, with its capability to generate a broad spectrum of spectra in a shortened timeframe. This flexibility allows designers to make well-informed decisions about the most appropriate structure for various applications, enhancing confidence. This part of our research mainly focuses on the observable differences between the through port and the drop port across the spectra produced by the NN, highlighting the switching features of the RR structure. The emphasis on the third telecommunications window is due to its lower optical attenuation compared to shorter wavelengths, substantially benefiting optical communication systems56. Moreover, adopting a high contrast ratio is deemed beneficial, as it increases the system’s resilience to noise and improves error detection capabilities in the operational deployment of switching devices. This advantage, in turn, enhances the operational reliability of these devices, marking a notable advancement in the field of optical communications57.

Upon determining the optimal structural parameters through DL, we performed FDTD electromagnetic simulations to minimize errors associated with the DL methodology and accurately ascertain the transmission spectra of the structure. The essential design parameters for a square NPRR and its transmission spectrum are prominently featured in Fig. 4a. Investigating the switch’s linear and nonlinear domains sheds light on its operational mechanisms. The electromagnetic simulation of the switching operation under both low and high optical intensities is vividly illustrated in Fig. 4b,c, enhancing our conceptual understanding of the process.

Fig. 4

The performance of the switching mechanism. (a) The transmission spectra of the square NPRR. The resonance occurs at 1554 nm. As the input light’s intensity increases, the resonant wavelength redshifts due to nonlinear effects. (b) The low-intensity optical field of the AOPS. Around 63% of the incoming light exits through the drop port, while 6% exits through the through port, signifying an "ON" state for the drop port and an "OFF" state for the through port. (c) The high-intensity optical field of the AOPS. Around 57% of the input light passes through the through port, while about 19% exits via the drop port, indicating an "ON" state and an "OFF" state, respectively.

Optical power is effectively coupled to the RRs within the linear operation domain when low-intensity light is introduced at the resonant wavelength, enabling transmission at the drop port (see points ① and ④ in Fig. 4). This phenomenon occurs when the resonant wavelength coincides with the linear resonant condition of the RRs. In contrast, an increase in light intensity triggers the onset of the nonlinear Kerr effect, characterized by the change in a material’s refractive index in response to the intensity of the light (see points ② and ③ in Fig. 4). The transmission spectra reveal a redshift due to the Kerr effect, causing the light to diverge from the RRs’ resonance, thus reducing power coupling efficiency. Consequently, rather than being coupled to the RRs and directed to the drop port, the light proceeds to the through port. The efficiency of light coupling to the RRs diminishes due to the resonance condition shifting away from the initial wavelength, primarily because of the redshift induced by the Kerr nonlinearity. This dynamic illustrates the Kerr nonlinear effect’s notable influence on the structure’s transmission characteristics. By modulating the light intensity, we can precisely control the resonance conditions and the power coupling to the RRs, thereby enabling enhanced manipulation of light transmission between the device's ports. Consequently, the Kerr effect modulates light transmission within this nonlinear photonic structure, offering a nuanced control mechanism for optical switching applications.
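The switching mechanism described above can be illustrated with a first-order toy model: the Kerr effect changes the index as n = n0 + n2·I, the resonance is assumed to scale with the index (Δλ/λ ≈ Δn/n), and the drop-port response is modeled as a Lorentzian. The resonant wavelength (1554 nm), the low-intensity drop transmission (63%), and the 9.6 MW/cm² intensity come from the text; the values of `n0`, `n2`, and the linewidth are hypothetical, and the model ignores the field confinement of the actual plasmonic mode.

```python
import numpy as np

def kerr_index(n0, n2_cm2_per_w, intensity_w_per_cm2):
    """Intensity-dependent refractive index n = n0 + n2 * I."""
    return n0 + n2_cm2_per_w * intensity_w_per_cm2

def shifted_resonance(lambda0_nm, n0, n):
    """First-order estimate: the resonant wavelength scales with the index."""
    return lambda0_nm * n / n0

def drop_transmission(wavelength_nm, resonance_nm, fwhm_nm, peak):
    """Lorentzian drop-port response centered on the resonance."""
    half = fwhm_nm / 2.0
    return peak * half**2 / ((wavelength_nm - resonance_nm) ** 2 + half**2)

lambda0_nm, peak = 1554.0, 0.63          # values quoted in the text
n0, n2, fwhm_nm = 1.5, 1e-9, 10.0        # hypothetical material and linewidth values
high_intensity = 9.6e6                   # 9.6 MW/cm^2 expressed in W/cm^2

n_high = kerr_index(n0, n2, high_intensity)
resonance_high_nm = shifted_resonance(lambda0_nm, n0, n_high)

t_low = drop_transmission(lambda0_nm, lambda0_nm, fwhm_nm, peak)
t_high = drop_transmission(lambda0_nm, resonance_high_nm, fwhm_nm, peak)
print(round(resonance_high_nm - lambda0_nm, 2), round(t_low, 2), round(t_high, 2))
```

In this toy model a positive `n2` moves the resonance to a longer wavelength, so light at the original 1554 nm is no longer on resonance and the drop-port transmission falls, in line with the behavior described above.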

To provide a more vivid depiction of the AOPS mechanism, we performed electromagnetic simulations using input light pulses of variable intensities on the square geometry at the resonant wavelength of 1554 nm. The optical field patterns computed under low-intensity linear and high-intensity nonlinear excitation conditions are graphically represented in Fig. 4b,c, respectively. With minimal input power, the field profile strongly indicates a robust resonant coupling of the signal into the square ring, consistent with signal propagation towards the output waveguide, as anticipated within the linear regime’s operational principles (Fig. 4b). However, as the input intensity increases, the transmission spectrum shifts toward longer wavelengths (a redshift), attributable to the refractive index modification induced by the Kerr effect.

Consequently, the pronounced field pattern shown in Fig. 4c demonstrates negligible coupling into the square ring, with the pulse primarily proceeding undisturbed through the input waveguide, corroborating the theoretical principles of nonlinear switching activation. The threshold power in Fig. 4 is 9.6 MW/cm².

Figure 4 demonstrates a clear correlation between the optical field trends and the operational functionality of the AOPS in both its "OFF" and "ON" states. This alignment underscores the interplay between input light intensity and the resultant optical field distribution within the square ring structure, highlighting the pivotal role of the Kerr effect in modulating the device’s transmission characteristics. These simulations elucidate the AOPS's dual operational states and the mechanism's efficacy in switching between transmission modes, which affirms the technological viability and adaptability of such structures in advanced optical switching applications. Enhancing contrast in the transmission spectrum is advantageous; however, the optimal spectrum can also be selected by favoring a more pronounced decrease in the transmission spectrum56,58. The inherent characteristics of the switching mechanism motivate the selection of a transmission spectrum with a sharper decline59: a spectral dip with a steeper slope enables a clearer and quicker transition during the switching operation. Detailed insights into our strategy for selecting a transmission spectrum that meets this criterion are provided in Sect. S4 of the Supplementary Information. Initially, we establish the connection between a sharp reduction in the transmission spectrum and its second derivative, as shown in Fig. S7. We then apply this relationship to the dataset generated using the DL methodology, explicitly targeting the wavelength of 1310 nm. This wavelength falls within the second telecommunications transmission window and is chosen for its minimal chromatic dispersion60,61.
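The link between the sharpness of a dip and the second derivative can be illustrated numerically. The sketch below compares two toy Lorentzian dips centered at 1310 nm and evaluates a finite-difference second derivative at the target wavelength; the linewidths, dip depth, and function names are hypothetical, and the actual selection procedure is the one given in Sect. S4 of the Supplementary Information.

```python
import numpy as np

def lorentzian_dip(wavelength_nm, center_nm, fwhm_nm, depth=0.9):
    """Through-port transmission with a single Lorentzian dip (toy model)."""
    half = fwhm_nm / 2.0
    return 1.0 - depth * half**2 / ((wavelength_nm - center_nm) ** 2 + half**2)

def dip_sharpness(spectrum, wavelengths_nm, target_nm):
    """Finite-difference second derivative of the spectrum at the target wavelength."""
    first = np.gradient(spectrum, wavelengths_nm)
    second = np.gradient(first, wavelengths_nm)
    target_index = int(np.argmin(np.abs(wavelengths_nm - target_nm)))
    return float(second[target_index])

wavelengths_nm = np.linspace(1000.0, 1800.0, 801)
target_nm = 1310.0

sharp = dip_sharpness(lorentzian_dip(wavelengths_nm, target_nm, 10.0), wavelengths_nm, target_nm)
broad = dip_sharpness(lorentzian_dip(wavelengths_nm, target_nm, 40.0), wavelengths_nm, target_nm)
print(sharp > broad)
```

At the bottom of a dip the second derivative is positive, and it grows as the dip narrows, so ranking candidate spectra by this value at the target wavelength favors the steepest transitions.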

Figure 5 displays the transmission spectra for the proposed plasmonic switching device under conditions of low and high optical input intensities. At lower intensities, where linear optical effects dominate, a distinct extinction dip is visible in the transmission spectrum at the through port, near the resonant wavelength of the plasmonic square ring (red closed circles in Fig. 5). This sharp dip indicates a strong coupling of the incident light with the resonant square ring, consistent with the switch’s theoretical model in the "ON" state. However, as the intensity increases and nonlinear effects prevail, a spectral redshift in the extinction dip occurs (blue open circles in Fig. 5). Due to this resonance shift, the incident light is detuned from the original resonant coupling condition, leading to reduced power transmission to the square ring and a transition to the "OFF" state, as depicted by the transmission profile.

Fig. 5

The efficiency of the switch structure. The square NPRR switch's transmission spectrum and the AOPS's optical field were analyzed in low and high-intensity states. The switch's performance was evaluated by examining its optimized transmission spectra, revealing a resonant wavelength at 1310 nm. Increasing the input light's intensity caused a redshift in the resonant wavelength due to nonlinear effects. The AOPS's light field was also depicted. At low intensity, around 7% of the incoming light passed through the through port, indicating an "OFF" state. Approximately 67% of the input light passed through the through port at high intensity, indicating an "ON" status. The drop port remained "OFF" in both cases with minimal transmission.

The simulation results bolster the conceptual model, demonstrating how the modulation of resonance properties dependent on intensity can enable optical switching capabilities in the designed plasmonic nanostructure. This approach validates the theoretical underpinnings of the switch’s operation and underscores the practical feasibility of employing intensity-dependent resonance modulation as a mechanism for optical switching within plasmonic nanostructures.

These examples illustrate the potential of our proposed DL methodology for identifying an optimal configuration for the plasmonic switching device. Moreover, we demonstrated the forward model's utility in optimization efforts, showcasing its value in refining design parameters. Investigating the proposed plasmonic switching device with the forward deep NN yielded insights into its operational dynamics and design considerations. The ability of the trained NN to obviate the need for extensive numerical computations highlights its practical utility in streamlining the design process. The exploration of optimal geometric parameters, guided by the network’s predictions, further underscores the efficiency and precision that DL brings to AOPS design. This discussion lays a foundation for understanding and using the forward model in real-world applications. In the following sub-section, our attention shifts to the inverse deep NN and the validation of its predictions.

Inverse deep NN and inverse design

Our research illuminates the profound efficacy of DL in adeptly navigating the complexities of inverse design challenges, a pivotal domain straddling engineering and physics disciplines. The inverse design involves identifying a desired spectral output and subsequently deducing the precise geometry capable of reproducing this output with high fidelity. Our studies highlight the remarkable success of NNs in achieving this goal, demonstrating their capacity to resolve inverse design challenges with notable precision.

At the heart of our methodology lies the strategy of selecting a random target spectrum as a benchmark, upon which the trained network is tasked with inferring the input variables necessary to engender a spectrum that closely mirrors the target. This approach facilitates the precise determination of input parameters essential for attaining the desired spectral outcome, enhancing the efficiency of the inverse design process. To ascertain the physical validity of these spectra, we derived the target spectrum from a feasible configuration within the AOPS and ensured that the spectrum originates from a physically plausible structure. Such a strategy bolsters the network’s proficiency in predicting input parameters that yield corresponding spectra, adhering to real-world constraints62, and ensuring the applicability of the designs in practical scenarios. This methodology showcases the potential of DL in transforming the landscape of inverse design and underscores its utility in streamlining the design process, thereby facilitating the development of innovative solutions within engineering and physics.

We deliberately chose a structure substantially different from those used in the network’s training to test the accuracy and reliability of the model. This process confirms the model's ability to precisely predict the desired spectrum, even for unseen configurations. Figure 6a shows the inverse model’s performance throughout training, indicating the network’s precision in predicting geometric details with a minimal validation loss value of 0.018 and test loss value of 0.019. Additionally, Fig. 6b compares the transmission spectrum of the targeted AOPS configuration with the predicted transmission spectrum derived via the DL methodology. The minimal differences between these spectra highlight the DL method’s proficiency in predicting the desired transmission spectrum.

Fig. 6

The outcomes of inverse design for the AOPS using the DL approach. (a) The training loss decreases sharply over the initial epochs. The low loss value confirms that the NN successfully learns and recognizes patterns in the data throughout the training process. (b) The capability of the inverse model to predict design parameters for the transmission spectrum of the data point furthest from the training dataset. The inset table lists the design parameters used in the investigation.

The spectra depicted in Fig. 6b were derived with the FDTD method, which is important when addressing the inverse problem. This approach mitigates potential inaccuracies associated with the forward model. The comprehensive training and validation of our inverse model enhance its credibility and demonstrate its ability to predict intricate structural designs that generate the anticipated spectra. Our methodology simplifies inverse design by circumventing the labor-intensive manual derivation and computation of inverse equations. While simpler prediction methods like nearest neighbor interpolation or linear regression might suffice for less complex systems, the intricate, nonlinear relationships inherent in nanophotonic structures necessitate more sophisticated approaches. Our DL model captures the complex interactions between multiple geometric parameters and their influence on spectral responses, particularly in high-dimensional parameter spaces like those of square resonators. The NN's multilayered architecture enables it to learn hierarchical representations of the data, modeling the nonlinear dependencies that simpler methods struggle to capture. This ability becomes increasingly valuable as the parameter space expands, offering better predictive power and generalization than traditional interpolation or regression techniques. Our model's performance in predicting spectra for configurations at the extremes of the parameter space further demonstrates its robustness and utility in nanophotonic design.

To enable a comparative analysis of recent advancements in the field, a summary of key photonic research studies is provided in Table 1. This table includes details such as the type of structure, the number of input parameters, the configuration of hidden layers, and the model accuracy as measured by various evaluation metrics. Directly comparing the results of different DL methods can be challenging due to the diversity in structures, variations in NN architectures, and the use of different evaluation metrics. For example, the table presents the performance of the models in terms of Mean Squared Error (MSE), Mean Absolute Percentage Error (MAPE), Mean Relative Error (MRE), and Mean Squared Logarithmic Error (MSLE), which are among the most commonly used evaluation metrics in this domain.
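To make the difference between these metrics concrete, the following minimal sketch computes MAPE, MRE, and MSLE for the same pair of spectra (MSE is defined in Eq. 2 of Materials and methods). The definitions below are common conventions and are assumed here; the cited studies may normalize differently, and the function names and sample values are purely illustrative.

```python
import math

def mre(y_pred, y_true):
    """Mean Relative Error: mean of |pred - true| / |true| (assumed convention)."""
    return sum(abs(p - t) / abs(t) for p, t in zip(y_pred, y_true)) / len(y_true)

def mape(y_pred, y_true):
    """Mean Absolute Percentage Error: the MRE expressed in percent."""
    return 100.0 * mre(y_pred, y_true)

def msle(y_pred, y_true):
    """Mean Squared Logarithmic Error, using log(1 + y) so that y = 0 is allowed."""
    total = sum((math.log1p(p) - math.log1p(t)) ** 2 for p, t in zip(y_pred, y_true))
    return total / len(y_true)

if __name__ == "__main__":
    # Illustrative transmittance values only; not taken from the dataset.
    y_true = [0.5, 0.8]
    y_pred = [0.45, 0.88]
    print("MRE :", mre(y_pred, y_true))
    print("MAPE:", mape(y_pred, y_true))
    print("MSLE:", msle(y_pred, y_true))
```

Because each metric weights errors differently (relative, percentage, or logarithmic), the same prediction yields values on different scales, which is why the entries of Table 1 cannot be ranked against each other directly.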

Table 1 A comparative analysis of applying DL in the design of photonic structures.

As the comparative table shows, nanophotonic structures such as plasmonic structures, and cases with a larger number of input parameters, require more complex NN architectures, which leads to higher computational costs. Nonetheless, compared with similar studies, our work achieves relatively low error rates with a less complex architecture. The table also shows that applying the Taguchi method, which significantly reduces the amount of data required for training the NN, distinguishes our work from other studies.

In interpreting our findings, it is essential to examine our methodology critically and recognize its limitations. The deep NN's success relies on the quality and diversity of the training data. Generating datasets with FDTD simulations introduces a degree of bias, and data acquisition remains a notable challenge in implementing DL strategies. Understanding the rationale behind the model's decisions is also challenging, which calls for further investigation into model explainability in the nanophotonic domain. We hope that the insights provided here will inspire specialists in the field to conduct more in-depth investigations. Inverse design by deep NN stands out not only as a time-efficient approach to nanophotonic design but also as a catalyst for exploring new possibilities in optical communication.

Conclusion

This study successfully employed DL techniques to establish a notable correlation between the spectroscopic attributes and the operational performance of plasmonic square RR. By harnessing the power of DL, our research overcomes the intricacies of inverse design, thus enhancing the functionality of AOPS based on NPRR. Moreover, this study demonstrated that the Taguchi robust design is a potent tool that can improve the quality of datasets, minimize data generation expenses, and yield substantial advantages in the data generation phase. The architecture of our NN features 11 overcomplete hidden layers, with the central layer comprising 160 neurons, and the training process extended over 1000 epochs. These parameters were carefully adjusted to achieve an ideal equilibrium between swift convergence and accurate prediction of spectral characteristics for AOPS’s spatial configurations. It is crucial to emphasize that these parameters can be adapted depending on the unique challenges encountered. In situations marked by uncertain outcomes, cautiously adjusting these parameters is recommended to maintain the DL model’s reliability and safety. The transmission spectra predicted by NNs display exceptional agreement with those obtained through FDTD simulations, highlighting the unparalleled accuracy of our DL approach. This methodology substantially reduces computational costs compared to traditional numerical solvers, offering rapid and economical spectral predictions for RR arrangements.

Furthermore, our approach tackles the challenge of inverse design, creating optimal geometries for the desired optical response spectra. Our DL model is validated using physically plausible configurations, confirming the feasibility of the proposed switches for incorporation into photonic integrated circuits. Our findings have practical relevance and demonstrate their applicability in various real-world scenarios.

The fusion of nanotechnology’s precision and artificial intelligence’s computational might and pattern recognition capabilities heralds a new era of scientific innovation. This synergy is set to drive notable breakthroughs and foster the development of pioneering applications, marking a transformative shift in scientific exploration and the potential to revolutionize various fields. As this paper concludes, we invite readers to review our accomplishments in Animation S1.

Materials and methods

This section describes the methods used in this study so that the results can be replicated, beginning with a concise overview of the theoretical underpinnings. A fuller treatment of these foundations is available in the Supplementary Information S1. Given the multifaceted nature of the study, the theoretical framework is split into two categories in Sects. S5 and S6. The first covers the mathematical equations underlying the design and modeling of AOPS, while the second covers the formulation of the deep NN strategies. Within the defined mathematical framework (Eq. S4), we integrated the wavelength of the incident light into the forward model of the NN, which enables the model to predict transmittance values at discrete wavelengths and across broadband ranges. The next sub-section describes how the data were generated and prepared, and the final sub-section outlines the steps followed when training the NN.

Generation and preprocessing of data

The study evaluates the efficacy of AOPS by selecting and adjusting specific design parameters: the waveguide widths (Wbus, Wdrop, and Wsquare) and gaps (Gbus and Gdrop). The widths range from 31 to 59 nm and the gaps from 15 to 25 nm. The training dataset, generated through FDTD simulations, is accessible under an MIT license on GitHub and Zenodo, with its composition detailed in the Supplementary Information (Sect. S2). Constructing a dataset with FEM or FDTD is time-consuming, and the computational demands are discussed in Sect. S7. To enhance the NN's performance, we adopted the Taguchi method, supported by software solvers like Qualitek-470 and Minitab71 (detailed in Sect. S1), to prioritize and select parameter resolutions for data generation; this involved statistical analysis to assess the influence of each parameter72. Despite the computational intensity, this method produced a comprehensive dataset in about a month using three 3.1 GHz 16-core computers. Once trained, the NN predicts large sets of spectra swiftly, in contrast to the lengthy data generation phase.
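As an illustration of how a Taguchi-style design reduces the number of simulations, the sketch below builds a standard L25 orthogonal array for the five geometric parameters. It assumes five evenly spaced levels per parameter, which is a hypothetical choice made for this example; the parameter resolutions actually used are those described in Sect. S1, and the function names are illustrative only.

```python
from itertools import combinations

# Parameter names follow the text; the five evenly spaced levels are an assumption.
FACTORS = ["Wbus", "Wdrop", "Wsquare", "Gbus", "Gdrop"]
N_LEVELS = 5

def l25_array():
    """L25 orthogonal array: 25 runs, 5 factors, 5 levels (modular construction)."""
    rows = []
    for i in range(N_LEVELS):
        for j in range(N_LEVELS):
            rows.append([i, j, (i + j) % 5, (i + 2 * j) % 5, (i + 3 * j) % 5])
    return rows

def level_to_nm(factor, level):
    """Map a level index to nanometres: widths 31-59 nm, gaps 15-25 nm."""
    low, high = (31.0, 59.0) if factor.startswith("W") else (15.0, 25.0)
    return low + (high - low) * level / (N_LEVELS - 1)

def design_table():
    """One dict of geometric parameters (in nm) per simulation run."""
    return [
        {name: level_to_nm(name, lvl) for name, lvl in zip(FACTORS, row)}
        for row in l25_array()
    ]

def is_orthogonal(array):
    """Every pair of columns must contain each level combination exactly once."""
    n_cols = len(array[0])
    return all(
        len({(row[a], row[b]) for row in array}) == N_LEVELS ** 2
        for a, b in combinations(range(n_cols), 2)
    )

if __name__ == "__main__":
    table = design_table()
    print(len(table), "runs instead of", N_LEVELS ** len(FACTORS), "for a full grid")
    print(table[0])
```

Under these assumptions, 25 balanced runs stand in for the 3125 runs of a full five-level grid, which conveys why an orthogonal design lowers the data generation cost.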

Procedure of training

This research utilizes the Keras library73, which has been integrated into Google’s TensorFlow74 since 2016, to execute tasks in Python75 within the Anaconda76 environment. We assessed multiple machine learning packages, leveraging Pandas77 for data preprocessing and Scikit-learn78 for model training. NumPy79 was crucial for handling matrices in developing the regression model. All relevant code is available in the GitHub repository.

The study features an NN with an overcomplete hidden layer whose dimensionality matches or exceeds the input space. We determined the neuron count in the central hidden layer from the formula \({2}^{\left({N}_{L}-1\right)/2}\), where \({N}_{L}\) is the total number of hidden layers. The architecture places the maximum number of neurons in the central layer, with neuron counts halving towards the input and output layers. It uses an odd number of hidden layers to maintain symmetry.
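The layer-sizing rule can be sketched as follows. The helper name `layer_widths` and its `scale` argument are hypothetical: `scale` stands in for the adjustment of the central layer discussed in the next paragraph, and whether the outer layers follow exactly this halving pattern is an assumption of the sketch.

```python
def layer_widths(n_hidden, scale=1):
    """Neuron counts for a symmetric stack of hidden layers.

    The central layer has scale * 2**((n_hidden - 1) / 2) neurons and the
    counts halve towards the input and output sides.
    """
    if n_hidden < 1 or n_hidden % 2 == 0:
        raise ValueError("n_hidden must be a positive odd number")
    half = (n_hidden - 1) // 2
    center = scale * 2 ** half
    # Widths on the input side, from the outermost layer up to the centre.
    rising = [center // 2 ** (half - k) for k in range(half)]
    return rising + [center] + rising[::-1]

if __name__ == "__main__":
    print(layer_widths(11))           # base rule, central layer of 2**5 neurons
    print(layer_widths(11, scale=5))  # assumed multiplier of 5
```

With 11 hidden layers the base rule gives a central layer of 32 neurons; an assumed multiplier of 5 gives 160, which matches the central-layer size reported in the Conclusion, although the exact scaling used by the authors is documented in the Supplementary Information rather than here.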

Using the Fibonacci sequence to adjust the central layer's neuron count, we explored different neuron scaling strategies. The dataset was divided into 70% training, 15% validation, and 15% testing, with a batch size of 80 and updates based on training loss-derived gradients. It is crucial to emphasize that our data division methodology was conducted on a simulation-by-simulation basis, not by individual data points. Details on hyperparameter tuning, including layer counts and training epochs, are discussed in the Supplementary Information (Sects. S3 and S7), which also addresses computational cost implications.
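Splitting on a simulation-by-simulation basis means that all wavelength samples from one FDTD run land in the same subset, so no simulation contributes to both training and evaluation. A minimal sketch of such a split is given below; the function name, the seed, and the use of integer simulation identifiers are assumptions made for illustration.

```python
import random

def split_by_simulation(sim_ids, fractions=(0.70, 0.15, 0.15), seed=0):
    """Assign whole simulations (not individual data points) to the three subsets.

    sim_ids holds one entry per data point, naming the simulation it came from.
    Returns three sets of simulation identifiers: train, validation, test.
    """
    unique = sorted(set(sim_ids))
    random.Random(seed).shuffle(unique)
    n_train = round(fractions[0] * len(unique))
    n_val = round(fractions[1] * len(unique))
    train = set(unique[:n_train])
    val = set(unique[n_train:n_train + n_val])
    test = set(unique[n_train + n_val:])
    return train, val, test

if __name__ == "__main__":
    # 100 hypothetical simulations, each contributing 3 wavelength samples.
    sim_ids = [s for s in range(100) for _ in range(3)]
    train, val, test = split_by_simulation(sim_ids)
    train_rows = [k for k, s in enumerate(sim_ids) if s in train]
    print(len(train), len(val), len(test), len(train_rows))
```

Rows are then selected by membership of their simulation identifier in each set, which keeps correlated samples from the same spectrum out of the validation and test subsets.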

The study deviates from conventional activation functions like ReLU, opting for Leaky ReLU80 to prevent vanishing gradients and enhance optimization convergence. This function is well suited to continuous value estimation tasks and therefore fits the study's objectives. The following equation gives the Leaky ReLU function’s formula34:

$$\text{Leaky ReLU}=\left\{\begin{array}{cc}\alpha x& x\le 0\\ x& x>0\end{array}\right.$$
(1)

In this equation, x is the input value, and α is a small positive constant that sets the function’s slope for negative inputs, set at 0.2 during training. For error estimation, we compared the anticipated spectral output of the proposed deep NN with the actual spectral values using MSE. The calculation of MSE follows this equation81:

$$\text{MSE}=\frac{1}{n}\sum_{i=1}^{n}{\left({y}_{\text{pred},i}-{y}_{\text{true},i}\right)}^{2}$$
(2)

where n denotes the total number of data points, ypred is the predicted value, and ytrue represents the actual value computed utilizing the FDTD technique. The Adam optimizer, known for its fast convergence compared to stochastic gradient descent, was chosen for its ability to handle nonlinear datasets and adaptively adjust learning rates for each parameter, optimizing memory use82,83. This optimizer was integral in the iterative refinement of the deep NN model’s weights and biases, aiming to minimize the MSE84.
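Equations (1) and (2) translate directly into code. The plain-Python sketch below is for illustration only and is not the Keras implementation used in the study; the function names are assumptions, while the default slope of 0.2 follows the value stated above.

```python
def leaky_relu(x, alpha=0.2):
    """Eq. (1): returns x for positive inputs and alpha * x otherwise."""
    return x if x > 0 else alpha * x

def mse(y_pred, y_true):
    """Eq. (2): mean of the squared differences between predictions and targets."""
    if len(y_pred) != len(y_true):
        raise ValueError("y_pred and y_true must have the same length")
    return sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)

if __name__ == "__main__":
    # Negative inputs are scaled by alpha instead of being zeroed as in ReLU.
    print([leaky_relu(v) for v in (-1.0, 0.0, 2.0)])
    # Illustrative values only; in the study y_true comes from FDTD simulations.
    print(mse([1.0, 2.0], [1.0, 4.0]))
```

The nonzero slope for negative inputs keeps a gradient flowing through inactive neurons, which is the property the text relies on when preferring Leaky ReLU over ReLU.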