Supervised and unsupervised learning using a fully-plastic all-optical unit of artificial intelligence based on solitonic waveguides

Software implementations of neuronal systems have proved highly effective, even though the hardware separation between the processing and memory areas in computers limits their analysis capacity. To overcome this limitation, new hardware configurations are moving towards neuromorphic models, capable of unifying the processing/memory dichotomy. Recently, integrated photonic X-junctions formed by waveguides written by spatial solitons have shown the ability to perform supervised learning. Compared to traditional technology, solitonic technology offers the advantage of realizing plastic circuitry, a typical characteristic of biological neural networks. This work extensively studies both supervised and unsupervised learning in photonic soliton X-junctions. By exploiting the plasticity of the nonlinear refractive index that underlies soliton formation, X-junctions can redirect their behaviour, forwarding data to different outputs. In this article, we extend the state of the art: starting from supervised learning, for which all possible cases are now investigated, a material sensitive to the transported signals is introduced to allow the junction to carry out unsupervised learning. In this way, the junction autonomously recognises the transported signals without the external intervention of an operator. Learning and memory now physically coincide: learning means that the junction slowly switches according to the information sent; any further unknown information will find the junction in the modified state that corresponds to the learned information and will be recognised as well (reasoning based on comparison with stored information).


Introduction
In recent years, the problem of managing and processing large quantities of data has pushed towards new methods that can guarantee high-speed computation, parallel processing and high efficiency [1][2][3][4]. To increase such efficiency, new software models have been studied in order to replicate the typical learning functions of the brain [5], the most efficient computer that has ever existed, with the lowest energy consumption. However, unlike biological neural networks, traditional computing structures follow the von Neumann architecture, i.e. the core computing functions of processing and memory are physically separated; as a consequence, information processing times become longer, making the whole system less efficient and more energy intensive. In response to these needs, hardware research has been oriented towards a neuromorphic approach in order to resolve the processing/memory dichotomy [6,7]. At the hardware level, a neuromorphic circuit aims to replicate the fundamental functional blocks of brain biology, i.e. neurons and synapses, capable of simultaneously processing and remembering.
The electronic implementation of neuromorphic hardware has some limitations, being not easily adaptable to different learning situations, and having a memory based on absorption processes which consequently increase the energy necessary for network management.
Photonic hardware could play a significant role in managing data processing and storage simultaneously [8][9][10][11][12][13][14][15][16][17][18][19][20][21][22][23][24][25][26]. The solutions proposed so far show surprising results, even though they are still characterized by the dichotomy between the processing unit and the memory unit, remaining very far from the functioning of biological neural circuits [6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22][23][24][25]. Biological memory, on the other hand, is achieved by changing the function of the synapse, strengthening or weakening pre-existing connections [25]. By exploiting the plasticity of a nonlinear refractive index, circuits based on self-assembled waveguides can both switch and store the state, taking advantage only of the charge movement induced in the host material [11][12][13][14]. Such a procedure is known as stigmergy, which means communicating information through modification of the environment [15]. It was successfully implemented in 2018 with a solitonic X-junction [16] that performed all-optical supervised learning, simulating the procedure adopted by ant colonies during their search for food. A soliton X-junction is the intersection between two waveguides self-written by two laser beams that do not diffract (spatial solitons). Its operation is therefore based on two different laser systems, one that writes the circuit and one that carries information within it. The work [..] describes the device under supervised learning, for which the information laser system is completely inert: its only function is to propagate information within the circuit following the addressing of the junction. This kind of solitonic X-junction can learn the commands of an external supervisor: in fact, the light addressing ratio at the intersection depends on the refractive index contrast present.
If the waveguides are written with similar powers, the junction will be balanced and the optical signal power within each input channel will split 50/50 on the output channels. If one of the two channels is reinforced by increasing the power or by means of a feedback from the corresponding writing beam, it will be highlighted, and the junction will unbalance in its favour (supervised learning).
Is it possible to extend the gate's behaviour by making it capable of both supervised and unsupervised learning? This paper redesigns the soliton X-junctions, introducing an innovative configuration that is also able to autonomously recognize the information propagated and consequently switch the structure (unsupervised learning). The innovation concerns the use of materials sensitive to both writing and signal beams. By doing so, the writing beams will only have the duty to create perfectly balanced junctions, exactly as it happens in biological neural tissue where preexisting connections are modified in their strength and arrangement: in the paper, it will be shown how the signal beams, by slightly modifying the refractive index contrast of the soliton waveguides within which they propagate, are able to consequently unbalance the junction and have the information recognized by the device. These new structures represent the fundamental building blocks for the implementation of learning processes, be it supervised or unsupervised.
The paper first describes the complete behaviour of supervised learning and then introduces the geometry and materials necessary to perform unsupervised learning too.

The solitonic X-junction as a photonic neuron
A biological neuron is a complex device whose functioning is not yet fully understood. However, its basic signal processing and transmission can be sketched in a relatively simple scheme according to the model shown in Fig. 1a. A neuron receives signals through input channels called dendrites and sends them to the soma, the neuron's microprocessor. Here, the signals are added together and compared with an activation function, a kind of high-pass filter. When the signal sum exceeds a certain threshold, it is transferred to the axon; otherwise, it is blocked by the soma. The axon is a simple long transmission line that transfers the information to the post-synaptic connections, which in turn transfer it to subsequent neurons. The intrinsic behaviour of a photonic X-junction is shown in Fig. 1b: the junction waveguides are formed by the refractive index variation induced by two incoherent self-confined light beams (soliton waveguides [13][14][15][16]) inside a material with saturating electro-optic nonlinearity. These transient guides are written by two green laser beams. A small angle of about 0.8°-1.0° allows them to cross each other at the junction point. A third, IR light beam is injected into one of the self-confined channels, as shown in Fig. 1c: it represents the transported information.
The input arms of the gate constitute the input dendrites. The signal travelling inside the dendrites arrives at the crossing point, the soma of the photonic neuron: here the signal is analyzed and sorted on one or the other output based on the refractive index contrast of the intersection point. This contrast can be varied during the device operation based on the output feedback (both in the supervised and unsupervised regimes). It constitutes the core of the neuron, acting simultaneously as a learning and memory centre.
The axon instead is just a long conductor, used to transfer information far away. In an integrated circuit, this is not that important: there is often a tendency to create compact devices to increase integration. If necessary, a soliton neuron, like the one presented here, can be either long (some centimetres) or very short (a few hundred microns) depending on the needs.
We have simulated the whole device behaviour using a well-tested FDTD numerical code [17][18][19][20][21] that solves the nonlinear wave equations in the Slowly Varying Envelope Approximation (SVEA). The flow chart of the solver is shown in Fig. 2.
Fig. 2 The solver code is divided into two sections: the first section writes a balanced X-junction by solving the nonlinear wave equations of two light beams within a nonlinear medium. The output of this section corresponds to the refractive index mapping of the junction, which is then used as the initial structure for the propagation of a signal beam injected into one of the two input arms. The code solves the nonlinear coupled wave equations of the three injected beams. The signal beam slightly modifies one of the junction arms. By recursive feedback, the junction slowly learns the inserted information.
The numerical code is divided into two consecutive sections, one used for writing the junction and one for the learning process.
In the writing phase, two visible light beams with transverse hyperbolic secant profiles are injected inside a virgin medium with saturating electro-optical nonlinearity. The beams propagate with small reciprocal angles with respect to the longitudinal direction of the material, to cross at the centre of it. These beams interact with the environment and evolve towards soliton regimes by modifying the refractive index. The overall propagation is set at 15 mm to improve channel contrast although it could be reduced to a few hundred microns for compactness purposes.
Since the powers of the writing beams are set equal, the result of their propagation is a perfectly balanced X-junction, in the sense that a possible third signal beam can propagate confined in one of the two channels as if it was a waveguide, and once it reaches the crossing point, it splits exactly into two equal parts that come out of the output arms.
Thus, at the end of the writing process, the code records the refractive index mapping of the balanced X-junction structure.
This map represents the starting point of the learning section. Once the junction is written, a signal beam is injected into one of the two input arms and propagates through the structure. While the junction is balanced, it splits the signal into two equal outputs. At the end of the first solution of the differential equations, a feedback is re-injected at the input together with the beams in order to slightly modify the balanced junction. In the case of supervised learning, the whole output intensity of the single channel that must be highlighted is returned to the input and summed with the input beams. In the case of unsupervised learning, the refractive index map is fed back in order to retain memory of the previous behaviour. At the end of the learning loops, the junction is unbalanced, having recognised the specific information either because it was highlighted (supervised learning) or autonomously (unsupervised learning). The final output of the solver is the map of the signal intensity, showing that the junction has indeed recognised the specific information and has switched accordingly.
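The two-phase flow described above (write a balanced junction, then iterate propagation and feedback) can be sketched as a simple control loop. The sketch below is only a toy: it replaces the FDTD/SVEA solver with a single scalar `contrast` variable and a hypothetical tanh response, so the function names, the gain value and the update rule are illustrative assumptions and not the solver used in this work. The 30% feedback fraction and the 90/10 saturation level are the values quoted in the supervised learning section.

```python
import math

def write_junction():
    """Writing phase: equal writing powers give a balanced junction (contrast 0)."""
    return 0.0

def propagate(contrast):
    """Toy propagation: map the index imbalance to the (alpha, beta) output split.

    Hypothetical stand-in for the nonlinear wave-equation solve; the tanh keeps
    the split between 10/90 and 90/10.
    """
    alpha = 0.5 + 0.4 * math.tanh(contrast)
    return alpha, 1.0 - alpha

def learn(reinforced="alpha", feedback=0.3, iterations=20):
    """Learning phase: re-inject a fraction of the chosen output at each iteration."""
    contrast = write_junction()
    history = []
    for _ in range(iterations):
        alpha, beta = propagate(contrast)
        history.append((alpha, beta))
        # The feedback deepens the index contrast of the reinforced channel only.
        sign = 1.0 if reinforced == "alpha" else -1.0
        out = alpha if reinforced == "alpha" else beta
        contrast += sign * feedback * out * 4.0  # 4.0 is an arbitrary toy gain
    return history

history = learn("alpha")
print(history[0], history[-1])
```

In this toy model the split starts at 50/50 and settles near 90/10 in favour of the reinforced channel after a few iterations, which mimics only qualitatively the behaviour reported for the real junction.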
Let us look in more detail at how these processes occur.

Supervised reinforcement learning
Supervised learning implies that an external operator decides which output is to be reinforced by feeding back the corresponding writing beam intensity, which is injected back into its own soliton channel from the output. To perform the simulation, we considered lithium niobate as the host nonlinear medium, inside which the three beams, two for soliton writing (A_1 and A_2) and one for the signal (A_3), propagate following the Helmholtz equations with saturating electro-optic nonlinearity, Eq. (1), where ε_NL is the nonlinear dielectric constant, E_bias is an external electrostatic bias necessary for photorefractive screening solitons [30] and |A_sat|^2 is the saturation intensity. Please note that only A_1 and A_2 are able to induce a nonlinear response in the material (being in the visible range), while A_3 is only sensitive to the refractive index modification produced by the former. Numerical parameters of lithium niobate were used for the simulations. The electrostatic bias was set at 36 kV/cm, while the light powers were set at 8 μW for the writing beams (intensities of the order of 2.75 × 10^5 W/m^2) and at 0.5 μW for the signal beam (intensities of the order of 0.05 × 10^5 W/m^2).
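As a quick consistency check on the quoted values, dividing each beam power by its intensity gives the effective beam cross-section that those numbers imply. The helper below performs only this arithmetic; the function name is hypothetical, and a uniform intensity over the effective area is an assumed simplification.

```python
def effective_area_um2(power_w, intensity_w_per_m2):
    """Effective beam area (in square micrometres) implied by power / intensity."""
    area_m2 = power_w / intensity_w_per_m2
    return area_m2 * 1e12  # 1 m^2 = 1e12 um^2

writing = effective_area_um2(8e-6, 2.75e5)    # writing beams: 8 uW
signal = effective_area_um2(0.5e-6, 0.05e5)   # signal beam: 0.5 uW
print(f"writing beam: {writing:.1f} um^2, signal beam: {signal:.1f} um^2")
```

The resulting areas are roughly 29 and 100 square micrometres, i.e. beams a few micrometres to about ten micrometres across. Since the intensities are given only as orders of magnitude, these figures are indicative rather than precise.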
After the writing process (during which only the writing beams were present), a balanced X-junction is obtained, as shown in Fig. 3a. Here, a signal is injected into the A input and is split exactly 50-50 between the two output channels. In the learning phase, channels α (Fig. 3b) and β (Fig. 3c) have each been fed back by re-injecting 30% of the output power into the corresponding channel. This is supervised learning because the operator decides from the outside which channel should be highlighted by the feedback; in this way, only the trajectory towards one of the two outputs is reinforced, switching the junction from the balanced configuration (Fig. 3a) to the corresponding unbalanced one (Fig. 3b-c).
Fig. 3 Supervised learning: the X-junction switches from balanced outputs without any feedback (a) to unbalanced behaviour, due to feedback on either the alpha channel (b) or the beta channel (c). The single channels are highlighted by sending the specific writing beam back into the corresponding channel from the output.
The imbalance induced in the junction acts on the propagating signals as a neural activation function, whose characteristic behaviour is described by the output powers recorded on each channel, as shown in Fig. 4 (for the case of feedback on the α output).
This process has a bistable trend: after very few feedback iterations, the junction is completely unbalanced reaching a 90/10 ratio that remains stable for the subsequent iterations, constituting a memory of the highlighted channel too. Therefore, the learning condition is reached, and the system knows the desired external information (supervised learning), memorising it through a specific modification of the environmental refractive index.

Unsupervised reinforcement learning
Unsupervised learning implies that the junction itself recognises the information transported by the signal and switches accordingly. In this case, the refractive index map is fed back, carrying the slight imbalance that the signal induces in the junction; iteration after iteration, the recurrent feedback drives the whole junction to readdress itself. As a design requirement, the originally inert signal must now interact with the nonlinearity of the host material in order to be recognised.
To allow the signal beam to interact with and modify the refractive index of the junction, we considered specific doping of the electro-optic host material capable of inducing two-step absorption and visible-light re-emission. Absorptive nonlinearity is necessary to allow the system to perform threshold recognition. One possible way to accomplish this is erbium doping. Erbium is commonly used as the active material for laser emission at 1.55 μm by pumping at 980 nm. Neither of these wavelengths is absorbed by lithium niobate, and therefore neither can excite its nonlinearity. However, erbium can undergo a nonlinear 2-step absorption of the 980 nm radiation with green re-emission, to which the lithium niobate host is instead sensitive. Such a nonlinear process has already been shown to support luminescence-induced spatial solitons [22]. Therefore, through a 2-step absorption, even an IR signal at 980 nm can modify the refractive index and give rise to the feedback necessary for switching the junction.
The 2-step population N_2step of the erbium excited level must satisfy a rate equation in which σ is the nonlinear absorption cross section, N_0 the population of the ground level, F^2 the square of the signal photon flux at 980 nm and γ the excited-level relaxation time. Owing to the nonlinear nature of the absorptive transition, the ground-level population can be considered constant, which allows the rate equation to be solved in closed form. As a consequence, the Helmholtz Eq. (1) takes on a slightly different form in this case, which also includes the contribution of the A_3 signal to the nonlinearity through an efficiency factor η for the nonlinear process. This factor depends on the material doping concentration and on the excited-level relaxation time. The evolution of the unsupervised junction is reported in Fig. 5 for a signal injected exclusively into the A input channel; the figure reports the normalized outputs versus time.
The simulation of Fig. 5 was performed with γ = 2000 (in round-trip units) and η ∝ 10^-6. As can be seen, the β output channel (blue line) initially grows more than the α channel, into which the signal was injected at the input, reaching a maximum within a time of 0.2-0.3 γ. After this maximum, the β channel begins to decrease, while the signal energy is progressively transferred to the α channel, which grows until it reaches its saturation value, very close to 100%. This behaviour is typical of the soliton junction and depends on the refractive index contrast generated in the interaction zone between the channels.
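The build-up of the 2-step population introduced above can be illustrated numerically. The sketch below assumes that the rate equation has the standard two-term form dN/dt = sigma*N0*F^2 - N/gamma (pumping minus relaxation), with a constant ground-level population N0 and N(0) = 0. This form is an assumption consistent with the quantities defined in the text, not a quotation of the original equation. Under that assumption the closed-form solution is N(t) = sigma*N0*F^2*gamma*(1 - exp(-t/gamma)), which the code compares with a forward-Euler integration. All names and numerical values are illustrative except the relaxation time of 2000 round-trip units.

```python
import math

def n2step_closed_form(t, sigma, n0, flux, gamma):
    """Assumed solution with N(0) = 0 and constant ground population n0."""
    return sigma * n0 * flux**2 * gamma * (1.0 - math.exp(-t / gamma))

def n2step_euler(t_end, sigma, n0, flux, gamma, steps=200_000):
    """Forward-Euler integration of dN/dt = sigma*n0*flux**2 - N/gamma."""
    dt = t_end / steps
    n = 0.0
    for _ in range(steps):
        n += dt * (sigma * n0 * flux**2 - n / gamma)
    return n

# Illustrative, dimensionless values; only gamma = 2000 is taken from the text.
sigma, n0, flux, gamma = 1e-3, 1.0, 1.0, 2000.0
t = 0.25 * gamma  # inside the 0.2-0.3 gamma window where the crossed output peaks
numeric = n2step_euler(t, sigma, n0, flux, gamma)
exact = n2step_closed_form(t, sigma, n0, flux, gamma)
print(numeric, exact)
```

With these assumptions the population saturates at sigma*N0*F^2*gamma on a time scale set by gamma, so a longer relaxation time gives a slower but larger index feedback.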
As shown in Fig. 6, the signal light entering channel A arrives at the junction and undergoes a kind of elastic rebound on the nonlinear index variation, coupling into the β output channel. This effect becomes more and more efficient as the feedback writes a deeper and deeper index contrast. However, when the β channel reaches its maximum intensity, the potential well written at the crossing point by the refractive index becomes so deep that the light can no longer escape from it. Consequently, the scattering losses decrease and the signal light begins to flow into the α output channel, which slowly acquires almost all of the transmitted signal power. Both the α and β channels evolve towards a stable final state, in which all the signal light comes out of a single, well-identified gate. In this way, the junction has recognized and learned the state of light, switching from the neutral 0.5/0.5 state to the recognized 1/0 state. Both channels evolve towards this final state following saturating trends: by fitting them with negative-exponential functions (sigmoid trends), the saturation time constants can be calculated as a function of the efficiency factor η, as shown in Fig. 7.
These time constants linearly depend on the relaxation time constants of the 2-step population.
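The extraction of a saturation time constant from an output trace can be sketched as follows. The example assumes a negative-exponential model P(t) = P_sat - A*exp(-t/T) with a known saturation level P_sat, and fits T by linear least squares on ln(P_sat - P). The function name and the synthetic trace are hypothetical. The trace is generated inside the example and is not simulation data from this work.

```python
import math

def fit_time_constant(times, powers, p_sat):
    """Least-squares fit of powers = p_sat - A*exp(-t/T); returns T.

    Linearises the model as ln(p_sat - P) = ln(A) - t/T and fits a straight
    line, so it assumes the saturation level p_sat is already known.
    """
    xs, ys = [], []
    for t, p in zip(times, powers):
        if p < p_sat:  # the logarithm is only defined below saturation
            xs.append(t)
            ys.append(math.log(p_sat - p))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -1.0 / slope

# Synthetic trace: output growing from 0.5 to 0.9 with time constant 400.
true_T = 400.0
times = [10.0 * i for i in range(200)]
powers = [0.9 - 0.4 * math.exp(-t / true_T) for t in times]
fitted_T = fit_time_constant(times, powers, p_sat=0.9)
print(fitted_T)
```

Repeating such a fit for traces obtained at different efficiency factors would give a curve of time constant versus efficiency factor of the kind the text attributes to Fig. 7. On real traces the initial transient, during which the crossed channel peaks, would have to be excluded from the fit.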
We have tested that the junction is able to recognize the input states and switch accordingly. In the initial state, the signal splits 50% onto each of the outputs, regardless of which channel it enters. However, the feedback allows the junction to learn where the signal arrives from and to recognize it. This means that from the initial neutral 50/50 condition, the junction switches to the 0/1 or 1/0 state, depending on the powered input. If both inputs are in the 1 state (1/1), the junction recognizes them and maintains this information at the outputs (1/1). These behaviours are highlighted in Fig. 8, in which the normalized output powers are shown as a function of the number of iterations. The simulations show that the 1-0 state in practice corresponds to a 0.9-0.1 output. That is, the gate does not switch fully to 100% and 0% but maintains a non-zero signal in the zero state.
Fig. 5 Nonlinear evolution of the two output channels for a signal injected inside the α input channel. The crossed channel is initially filled with light until the junction switches: at that point, the α output grows and the β output depletes. The overall trends behave like sigmoid transfer functions. In the transient regime, the sum of both channels exceeds 100% of the power: this anomalous behaviour is explained by a temporary decrease in the propagation losses due to a higher refractive index contrast of the junction. This transient allows more energy to be confined in the channels than is transported in a balanced junction regime.
Fig. 6 Different behaviours of the junction during the recognition and switching phase for an input 0-1. Initially the junction is perfectly balanced; due to the inclination of the beams reaching the junction, the beam opposite to the highlighted one gains power first; only at a certain point does the gate switch, due to the accumulation of nonlinearity, and the whole structure stabilizes in a 1-0 unbalanced state (actually 0.9-0.1).
This behaviour is reminiscent of TTL electronic logic, in which a value lower than 0.4 is recognized as state 0, while a value greater than 0.8 is recognized as state 1. To verify these recognition limits, we have studied the following ratios of the input signals: 0.1-1, 0.5-1 and 0.8-1. The trends for these ratios are shown in Fig. 9: both low ratios, 0.1-1 and 0.5-1, are recognized as 0-1, while the 0.8-1 ratio is correctly recognized as 1-1. The generalized recognition scheme is summarized in Table 1.
It should be pointed out here that recognition by the junction follows a differential rather than an absolute process. The recognition is indeed comparative between different inputs: through successive iterations, the refractive index contrast at the waveguide crossing grows according to the relative intensities, driving the whole structure into a new unbalanced configuration. As a consequence, the junction has both digitally recognized the signal information and memorized it for future operations.
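The comparative recognition reported above can be summarized as a small decision rule. The sketch below is only a behavioural summary of the reported results, not a model of the junction physics: the function name is hypothetical, the 0.8 threshold is taken as inclusive because the 0.8-1 ratio is reported as recognized as 1-1, and the handling of two dark inputs is an assumption, since that case is not discussed.

```python
def recognise(input_a, input_b, threshold=0.8):
    """Comparative recognition: each input is judged relative to the stronger one."""
    strongest = max(input_a, input_b)
    if strongest == 0:
        return (0, 0)  # no signal at all: a case outside those studied in the text
    return tuple(1 if x / strongest >= threshold else 0
                 for x in (input_a, input_b))

for pair in [(0.1, 1.0), (0.5, 1.0), (0.8, 1.0), (1.0, 0.0), (1.0, 1.0)]:
    print(pair, "->", recognise(*pair))
```

The rule maps the 0.1-1 and 0.5-1 ratios to 0-1 and the 0.8-1 ratio to 1-1, as reported. Because only the ratio to the stronger input matters, scaling both inputs by the same factor leaves the recognized state unchanged, which is what makes the recognition differential rather than absolute.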

Conclusions
This work showed the effectiveness of soliton X-junctions as neuromorphic processors, able to learn from either external instructions (supervised learning) or experience (unsupervised learning) or both. The plasticity of the nonlinear refractive index allows these devices to be written and addressed according to specific information. The present work has shown that supervised learning can efficiently recognise every highlighted output through readdressing of the junction. Unsupervised learning, in turn, has effectively recognized all possible input combinations, switching from the initial balanced 0.5/0.5 condition to the 1/0, 0/1 or 1/1 state as a function of the corresponding inputs.
Fig. 8 Learning dynamics of the solitonic junction: starting from the initial neutral condition 50/50, the junction recognises the input and switches accordingly.
Fig. 9 The recognition is comparative and digital: by holding one input channel (A) fixed in state 1, the other (B) is recognized as a state 0 or 1 if its power is lower or higher than 0.8. This behaviour is directly reminiscent of Boolean TTL logic, for which states 0 and 1 are defined on the basis of specific potential thresholds.
The neuromorphic device recognizes the signals by comparison between the input channels, with a TTL-like digital recognition ratio, whereby level 1 corresponds to 90% of the input signal and level 0 corresponds to 10% of the input signal. We believe that the non-zero value of the zero level is of fundamental importance for the behaviour of complex networks, where it is necessary to keep memory of the trajectories even during the "off" state, i.e. without input signals.
The transmission transfer functions of both supervised and unsupervised learning followed sigmoid-like trends, corresponding to the active response of the soma of biological neurons. Thus, solitonic X-junctions are good candidates as photonic neuromorphic circuits. Compared with other neuromorphic circuitry, solitonic X-junctions offer the great advantage of being able to exploit the plasticity of the nonlinear refractive index, a feature with enormous potential that we believe has not yet been fully explored. The possibility of creating completely plastic circuits, which can be written, modified and possibly deleted, opens the way to innovative behaviours. Networks of solitonic neurons that are able to learn complex information and recognise unknowns by comparison are currently being studied.
This work represents the first step of an extremely innovative process in the field of neuromorphic photonics, one that overcomes the dichotomy between processing and memory units and slowly approaches the functioning of biological neural circuits by replicating biological plasticity through the nonlinear plasticity of the refractive index.

Declaration
Conflict of interest The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Initially the gate is in a perfectly balanced state for which, whether it enters a 1-0 or a 0-1 state, the energy is equally distributed at 50% on each output. As time passes (feedback iterations), the gate recognizes the input state and switches accordingly. This also happens for a 1-1 input for which nothing would seem to happen: in fact in this case, the output is always in a 1-1 state. However, a recognition takes place in this case too: being able to "mark" the inputs, one would initially find their 50% division on each output channel, while at the end of the recognition, each input will exit entirely from the output that belongs to it.