Predicting Frequency Deviation of a Crystal Oscillator Based on Long Short-Term Memory Network and Transfer Learning Technique

Crystal oscillators are fundamental to an extensive range of electronic systems, spanning computers, mobile phones, and automotive electronics. Their significance is accentuated in high-precision applications such as global positioning systems (GPS) and aerospace systems, where the frequency-temperature characteristics and thermal hysteresis phenomena are of paramount importance. This study introduces a novel approach for predicting frequency deviations arising from thermal hysteresis using Long Short-Term Memory (LSTM) networks. Contrary to prior research, which predominantly utilized cubic functions to model frequency-temperature characteristics and frequently overlooked thermal hysteresis, this investigation leverages LSTM to model both time-dependent and temperature-dependent variations, offering heightened precision in predicting frequency deviations. By integrating transfer learning techniques, the model's adaptability to diverse databases is augmented, broadening its utility. Experimental evaluations with real-world data underscore the superiority of the introduced method, registering a root mean square error (RMSE) of less than 0.05 ppm, more favorable than the traditional cubic functions and all prior arts.


Introduction
Crystal resonators serve as critical components of contemporary electronic systems and are paramount in the orchestration of accurate and efficient functionalities within these circuits [1,2]. These components, although minuscule, play a pivotal role in determining the overarching performance and fidelity of the systems they inhabit. In low-demand applications like wristwatches, minor frequency deviations pose little to no threat to the system's operation.
However, when the focus shifts to industries rooted in precision, like aerospace or mobile communications, even the slightest frequency inaccuracies can lead to significant system aberrations, occasionally resulting in critical repercussions [3,4].
These frequency deviations primarily stem from fluctuations in temperature, collectively encapsulated under frequency-temperature (f-T) characteristics [5,6].
An intricate aspect of this relationship is the manifestation of thermal hysteresis. Acting as a detrimental confounding variable, thermal hysteresis [7] introduces inconsistencies in the frequency response of crystal oscillators during varying thermal regimes, thereby complicating the f-T relationship [8,9]. Consider, for instance, the precision mandated by global positioning systems (GPS). A slight aberration in the f-T characteristic could result in substantial navigational inaccuracies, potentially jeopardizing safety in critical applications.
Hence, the exigency to develop methodologies and systems that can adeptly compensate for these frequency anomalies cannot be stressed enough. Extensive research has been dedicated to the study of f-T characteristics in crystal oscillators [10]. For example, Ballato [11] characterized the static f-T properties using cubic functions. Zhou et al. [9] discussed multiple factors impacting the accuracy of oscillators and underscored temperature as the predominant factor. Islam et al. [12] employed temperature compensation techniques like linear equation fitting and lookup tables, while Tran et al. [13] designed a digitally controlled crystal oscillator (DCXO) based on cubic f-T curves. However, most of these studies rely on cubic functions to model the f-T characteristics, which, although efficient in certain scenarios, are inadequate for applications demanding high precision. These models often neglect the implications of thermal hysteresis on frequency deviations [14]. Despite its early identification, thermal hysteresis has frequently been overlooked in traditional temperature-compensated crystal oscillator (TCXO) systems [15].
Consequently, investigating f-T characteristics that account for thermal hysteresis has emerged as a significant research avenue as presented in Figure 1.
The current research endeavors to carve a niche in the field by presenting a robust analytical framework that holistically captures the nuances of f-T characteristics, especially in the context of thermal hysteresis. This study transcends the above limitations by integrating advanced analytical methods, leveraging the potential of deep learning architectures to gain nuanced insights into the intricate f-T relationship. A cornerstone of our contribution is the adept utilization of long short-term memory (LSTM) networks [16,17]. These deep learning constructs, celebrated for their prowess in discerning and deciphering temporal relationships, introduce a fresh perspective to the analysis, allowing for a meticulous interpretation of patterns that have remained obscure in traditional models. The intricacies of thermal hysteresis, previously obfuscated in conventional analyses, are unraveled with newfound clarity, bringing to light the latent variables and intricate interplays that dictate frequency deviations. Furthermore, the study introduces a paradigm shift in research methodology by championing the cause of transfer learning. By harnessing previously acquired knowledge and adapting it to new, yet somewhat similar challenges, the model achieves superior predictive accuracy even in scenarios fraught with limited data. This versatility and adaptability underscore the potential of the proposed approach in real-world applications, where datasets can be sparse, yet the demand for precision remains uncompromising. In essence, the contributions of this study are manifold. Beyond presenting a superior analytical framework, it underscores the pivotal role of innovative methodologies in understanding complex phenomena. By bridging traditional knowledge gaps and illuminating uncharted territories in crystal oscillator research, this study holds the promise of pioneering advancements that can redefine the standards of precision and reliability in electronic systems.

Data Collection
Data on temperature and frequency deviation are collected to assess the proposed method for modeling f-T characteristics. As depicted in Figure 2, both a crystal oscillator and a thermistor are contained within the same ceramic enclosure. The measurements taken include the thermistor's temperature and the crystal oscillator's frequency deviation. The initial phase of the research involved establishing an experimental system comprised of a temperature-controlled oven, a temperature sensor, a frequency measurement instrument, and a computational unit. The crystal oscillator is situated within the oven, facilitating temperature modulation. The architecture of this experimental system is represented in Figure 3. A complete set of cyclical data is gathered, capturing two peaks in accordance with the pattern shown in Figure 1.

Deep Learning Model
In the present study, data were gathered from three distinct printed circuit boards (PCBs), labeled as PCB A, PCB B, and PCB C. Each board utilizes a unique type of oscillator, and the configurations of other circuits on these boards differ as well. Table I provides a summary of the database characteristics. This research employs transfer learning methodologies during the model's training phase. By using transfer learning, the model is adeptly fine-tuned for a new PCB, thereby reducing training time and computational resources while maintaining high predictive accuracy. The initial training data, referred to as the pre-trained database, were sourced from PCB A, chosen for its wide temperature range and high rate of temperature variation. This database serves as the foundation for subsequent fine-tuning. Databases from PCB B and PCB C were employed to fine-tune the model. Figure 4 delineates the overall research framework, indicating the specific databases used for the pre-training and fine-tuning phases.
Throughout the training protocols, each database was partitioned into two subsets. Eighty percent of the data were allocated for training and the remaining 20% were set aside for testing. In the context of our model, these LSTM layers are meticulously trained to extract, learn, and remember patterns from the historical frequency-temperature data, ensuring that even subtle temporal correlations inherent in the input features are accurately captured. By retaining this historical context, the LSTM layers provide a robust foundation for predictive tasks, particularly those rife with sequential dependencies. Following the dual LSTM layers, the architecture culminates in a fully connected layer. This layer, densely interconnected, functions as the computational nexus of the model. It synthesizes the high-dimensional temporal insights gleaned from the preceding LSTM layers, processing them to compute the final prediction for frequency deviation. One of the predominant challenges, and indeed the motivation behind the architectural choices, is the inherent complexity associated with predicting frequency deviations. Crystal readout boards frequently encounter intricate temperature gradients, characterized by both gradual shifts and abrupt temperature fluctuations. These non-linear temperature dynamics, if not aptly addressed, can introduce significant predictive inaccuracies. The LSTM layers, with their inherent memory cells and gating mechanisms, excel in capturing and adapting to these complex temperature dynamics. Their capacity to remember long-term dependencies ensures that even sudden temperature changes or prolonged gradients do not confound the predictive outputs. Thus, by employing LSTM layers, our model stands equipped to dynamically adjust its predictions in alignment with the intricate temperature histories, ensuring enhanced accuracy and reliability in the frequency deviation estimations.
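To make the gating mechanism concrete, the following NumPy sketch steps a single LSTM cell through a toy feature sequence. This is an illustration only: the weights are random placeholders rather than trained parameters, and the hidden size and sequence length are arbitrary.

```python
import numpy as np

def lstm_cell_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step: input, forget, and output gates plus a candidate state."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b            # stacked pre-activations, shape (4*H,)
    i = 1.0 / (1.0 + np.exp(-z[0*H:1*H]))  # input gate: how much new info to admit
    f = 1.0 / (1.0 + np.exp(-z[1*H:2*H]))  # forget gate: how much history to keep
    o = 1.0 / (1.0 + np.exp(-z[2*H:3*H]))  # output gate
    g = np.tanh(z[3*H:4*H])                # candidate cell state
    c = f * c_prev + i * g                 # memory cell carries temperature history
    h = o * np.tanh(c)                     # hidden state passed onward
    return h, c

rng = np.random.default_rng(0)
H, D = 8, 6                                # hidden size, input features (cf. Table II)
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(20, D)):         # a toy sequence of 20 feature vectors
    h, c = lstm_cell_step(x, h, c, W, U, b)
```

The forget gate `f` is what lets the cell retain, or discard, long-range temperature context, which is the property the paragraph above relies on.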

Input Features
The frequency of a crystal oscillator is influenced by various factors, including the cut type, wafer size, vibration mode, temperature, aging, and drive level [9]. Once manufacturing is completed, the operating temperature becomes the primary factor affecting frequency. Frequency deviations are observed particularly when the operating temperature exceeds ambient conditions.
In situations with slow temperature changes, transient thermal effects may be ignored, allowing the f-T characteristic to be represented by a static equation. The static f-T characteristic is defined as [11]

Δf/f0 = Σ aᵢ (T − T0)ⁱ,  i = 1, …, n,

where T0 is a reference temperature, f0 is the nominal frequency, the aᵢ are fitting coefficients, and n is the static model order, typically set to 3. For analytical convenience, Δf = f − f0 and ΔT = T − T0 are defined, and Δf/f0 = (f − f0)/f0 is termed the relative frequency deviation. However, this static model fails to accurately represent the frequency deviation under rapid temperature changes. The inclusion of the rate of temperature change enriches the classical dynamic model [11] as

Δf/f0 = a₁ΔT + a₂ΔT² + a₃ΔT³ + ã(ΔT/Δt),

where ã weights the rate-of-change term. Thermal hysteresis is characterized herein by different frequency deviations during ascending and descending temperature phases. At a constant ΔT value of 0°C, variable frequency deviations are governed by the first-order derivative term. This term serves to represent the rate of temperature change and affects the subsequent frequency deviation. The necessity of historical temperature data for determining this rate is addressed by LSTM layers, which incorporate past information for predictive analysis. Based on the foundational research, T, ΔT, ΔT², ΔT³, ΔT/Δt, and Δ²T/Δt² are employed as input features to develop a more accurate frequency-temperature model, as presented in Table II. The thermistor adjacent to the crystal is designated by the suffix "TSX," from which temperature measurements are taken.
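As a concrete illustration, the six input features of Table II can be derived from a sampled temperature trace as follows. This is a sketch: the sampling interval, reference temperature T0 = 25 °C, and the synthetic thermal cycle are assumptions for illustration, not the measurement setup of this study.

```python
import numpy as np

def build_features(T, dt=1.0, T0=25.0):
    """Build the six Table II inputs from a temperature time series (°C)."""
    dT = T - T0                        # deviation from the reference temperature
    dT_dt = np.gradient(T, dt)         # first derivative: rate of temperature change
    d2T_dt2 = np.gradient(dT_dt, dt)   # second derivative: acceleration of change
    return np.column_stack([T, dT, dT**2, dT**3, dT_dt, d2T_dt2])

t = np.arange(0.0, 60.0, 1.0)                     # 60 samples, 1 s apart (assumed)
T = 25.0 + 20.0 * np.sin(2 * np.pi * t / 60.0)    # a toy thermal cycle
X = build_features(T)                             # feature matrix, shape (60, 6)
```

`np.gradient` uses central differences, a reasonable discrete stand-in for ΔT/Δt and Δ²T/Δt².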

Dataset Splitting and Cross Validation
To further fortify the model against overfitting, this study integrates the renowned K-fold cross-validation method [18] during the training regimen. This method, transcending the confines of traditional train-test splits, cyclically rotates data through training and validation roles, ensuring every data point is comprehensively utilized. The K-fold mechanism is visually illustrated in Figure 6, providing a clear representation of the methodology. Within the purview of K-fold cross-validation, where K is set at a value of 5, the dataset is meticulously segmented into five equal subsets. As the training process unfolds, each subset, in rotation, is earmarked for validation, while the others amalgamate to form the training set. This iterative process ensures that each data subset experiences both the crucible of training and the scrutiny of validation. Upon the culmination of these cycles, the performance of each model variant, sculpted by its unique training-validation blend, is rigorously assessed against an external test dataset. The iteration demonstrating superior performance metrics is anointed as the definitive model, primed for predictive tasks in real-world scenarios.
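The rotation described above can be sketched with a plain index split in NumPy (a minimal illustration independent of any particular library; the sample count and seed are arbitrary):

```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample validates exactly once."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

covered = []
for train_idx, val_idx in k_fold_indices(100, k=5):
    assert len(set(train_idx) & set(val_idx)) == 0   # folds never overlap
    covered.extend(val_idx)
# after all five rotations, every sample has served in validation exactly once
```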

Model Optimization
The hyperparameters relevant to the proposed model are outlined in Table III. Given that the task of predicting frequency deviation is essentially a regression problem, the mean square error (MSE) serves as the loss function for the proposed deep learning model. The formal equation for MSE calculation is

MSE = (1/N) Σᵢ (yᵢ − ŷᵢ)²,

where y is the vector of N observed values of the dependent variable and ŷ represents the corresponding predicted values. The Adam optimizer [19] is selected for this research owing to the intricate nature of the proposed model and the suboptimal convergence rates exhibited by alternative optimizers. Advantages conferred by the Adam optimizer encompass adaptive learning rates and adaptive momentum, both of which expedite the convergence process. These features collectively contribute to enhanced optimization efficiency and superior model performance.
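The loss above reduces to a one-liner; a small sketch with toy frequency deviations (the ppm values below are invented for illustration):

```python
import numpy as np

def mse(y, y_hat):
    """Mean square error between observed y and predicted y_hat."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean((y - y_hat) ** 2))

# toy frequency deviations in ppm
loss = mse([0.10, -0.05, 0.02], [0.08, -0.01, 0.02])
# RMSE, the evaluation metric used later in the paper, is simply the square root
rmse = loss ** 0.5
```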

Finetuning the Model by Transfer Learning
The f-T characteristics, defining the frequency-temperature relationship of crystal oscillators, can exhibit variance due to a myriad of factors.One such determinant is the choice of crystal itself.
Different crystals inherently possess distinct electronic properties and resonant frequencies, leading to nuanced shifts in their f-T characteristics. Further complicating the landscape is the spatial configuration of identical crystal oscillators on a PCB. Minute alterations in their placement can result in diverse electromagnetic interactions and thermal dissipation patterns, subsequently influencing their oscillatory behavior and, by extension, their f-T characteristics.
Given these inherent variations, training a model from scratch for every unique setup would be computationally intensive and practically infeasible, especially when the available data for each specific scenario might be limited. Transfer learning [20,21], as illustrated in Figure 7, is a potent deep learning strategy that capitalizes on the knowledge acquired from a previously trained model and adapts it to a new, yet somewhat related task. The overarching principle is grounded in the notion that if a model has been trained on a particular task, its learned features can act as a foundational knowledge base for a related task, obviating the need to start the learning process from ground zero.
In the context of this study, the pre-existing model, previously trained on a specific crystal oscillator configuration, serves as the foundation. When adapting to a new setup, instead of retraining the entire model, a strategic approach is employed. The first LSTM layer, having grasped the fundamental temporal patterns from the pre-trained model, is retained as non-trainable. This ensures that the foundational knowledge remains intact. The subsequent layers, however, are subjected to fine-tuning, allowing them to adapt and learn the nuances and specificities of the new oscillator configuration. This approach, while preserving the foundational knowledge from the pre-trained model, refines the upper layers to ensure they are optimally tailored to the unique characteristics of the new crystal oscillator board. The culmination is a highly specialized model, primed to deliver accurate predictions for each specific setup, without the need for exhaustive training on vast datasets. Through the adept incorporation of transfer learning, this study underscores a pragmatic approach to harnessing and adapting pre-existing knowledge, enabling the efficient development of models tailored to diverse crystal oscillator configurations.
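The freeze-and-fine-tune idea can be illustrated on a toy two-layer linear model in NumPy. This is a conceptual sketch only, under assumed random data: the study itself fine-tunes the upper LSTM and dense layers with Adam while keeping the first LSTM layer frozen, whereas here plain gradient descent updates a single trainable matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(4, 6))          # "pre-trained" first layer: kept frozen
W2 = rng.normal(size=(1, 4))          # upper layer: fine-tuned on the new board
X = rng.normal(size=(32, 6))          # toy features from the "new PCB"
y = X @ rng.normal(size=6)            # toy targets

def mse_loss(W2):
    pred = ((X @ W1.T) @ W2.T).ravel()
    return float(np.mean((pred - y) ** 2))

W1_before, loss_before = W1.copy(), mse_loss(W2)
for _ in range(200):                  # gradient descent on W2 only
    h = X @ W1.T                      # frozen feature extractor
    grad_W2 = 2 * ((h @ W2.T).ravel() - y) @ h / len(y)   # dMSE/dW2
    W2 -= 0.01 * grad_W2              # only the trainable layer is updated

frozen_unchanged = np.array_equal(W1, W1_before)   # foundation left intact
improved = mse_loss(W2) < loss_before              # fit to the new data improved
```

The design choice mirrors the text: the frozen layer supplies previously learned representations, and only the layers above it adapt to the new configuration.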

Results by Pretrained model
Having trained multiple deep learning models, it is imperative to assess their performance.
The technique of cross-validation, delineated in the preceding section, serves a dual purpose: to choose the most efficacious model and to gauge its performance. This segment of the paper delves into the performance metrics of the pre-trained model, with a particular emphasis on the root mean square error (RMSE) as the evaluative metric for the frequency deviation model. In the model selection phase, three diverse architectures were considered: the extreme learning machine (ELM) [22], the multilayer perceptron (MLP) [23], and the LSTM. The structure of the MLP model is elaborated upon in Figure 8 and Table IV, encompassing two hidden layers, each containing twenty neurons. LSTM configurations, varying from a single layer to four layers, were investigated to pinpoint the optimal setup. The performance metrics, consolidated in Table V, suggest that a dual-layer LSTM architecture holds an edge over its counterparts. Evaluations highlighted that the RMSE associated with the MLP model was reduced by 33.8% when juxtaposed with the curve-fitting model. This decrement, however, was not deemed substantial, leading to the preference for the dual-layer LSTM model, which exhibited a 51.3% diminution in RMSE relative to the curve-fitting approach. Figures 9(a), 9(b), and 9(c) depict the frequency-temperature attributes of PCB A under disparate computational methodologies: curve fitting, multilayer perceptron, and dual-layer LSTM networks. Thermal hysteresis is conspicuously apparent within the temperature intervals of -5°C to 15°C and 40°C to 60°C in these illustrations.
The traditional curve-fitting technique fell short in providing a precise estimation of thermal hysteresis.The MLP model rendered precision at diminished temperatures but faltered at elevated temperatures.Contrarily, the dual-layer LSTM framework rendered accurate prognostications across both temperature extremities, a claim substantiated by the learning curve showcased in Figure 10.

Results by Fine-Tuned Model
The subsequent analysis provides insight into the outcomes of the refined model, as detailed in Tables VI and VII.

Evaluation based on Error Distribution
In the pursuit of developing precise predictive models, evaluating their accuracy and reliability remains paramount. Analyzing the distribution of errors offers comprehensive insight into a model's efficacy, highlighting deviations between anticipated and actual results.

Performance Comparison to Other Works
This study emphasizes modeling the f-T characteristics of crystal oscillators. Table VIII provides a comparative analysis between the method introduced in this research and those outlined in prior literature. For a uniform benchmarking standard, the methods listed in Table VIII were applied using a consistent dataset, namely PCB A, as expounded in earlier sections. The preferred metric for gauging predictive accuracy across the spectrum of studies is RMSE. The analysis commences with the ELM technique, a distinct machine learning methodology tailored for the adept training of single-hidden-layer feedforward neural networks. Distinguished by its accelerated learning prowess and streamlined approach relative to conventional gradient-based optimization, ELM initializes the hidden layer weights arbitrarily and retains them as static. For ELM, the output layer weights are then fine-tuned through linear regression, with Figure 16 illustrating its modeling outcomes. However, Figure 16 also underscores ELM's limitations in modeling the thermal hysteresis phenomenon across both lower and upper temperature zones.
Following ELM, the study explores the MLP, as presented in Figure 8 and detailed in Table IV. The subsequent analysis probes the factors enhancing the performance of the methodology presented in this study relative to prior literature. A pivotal determinant is thermal hysteresis, shaped by both the prevailing temperature and its rate of change. Comparative scrutiny of input features in ELM and MLP shows that conventional models largely employ ΔT, ΔT², and ΔT³ as their input features. In a departure from this, our model encompasses the actual temperature (T) and the rate of temperature change (ΔT/Δt) as additional features. The strategic deployment of LSTM in our architecture further equips the model to discern temporal dynamics among these features. The elevated performance of our methodology stems from its enriched input feature set and the judicious architectural choice. Experimental results corroborate that our approach renders precise predictions on frequency deviations linked to thermal hysteresis. Quantitatively, our LSTM-based method marks a pronounced RMSE reduction in contrast to ELM and MLP: from 0.0987 in the ELM model to 0.0483 in the LSTM, and from 0.0656 in the MLP to 0.0483 in the LSTM. Thus, this study demonstrates superior accuracy, underscoring fewer prediction errors than prior works employing ELM [24] and MLP [25], as presented in Table VIII.
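The quoted improvements follow directly from these RMSE figures; a quick arithmetic check (RMSE values in ppm, taken from the comparison above):

```python
def rmse_reduction(baseline, proposed):
    """Relative RMSE reduction of the proposed model over a baseline."""
    return 1.0 - proposed / baseline

elm_to_lstm = rmse_reduction(0.0987, 0.0483)   # roughly a 51% reduction vs. ELM
mlp_to_lstm = rmse_reduction(0.0656, 0.0483)   # roughly a 26% reduction vs. MLP
```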

Discussion and Conclusion
In the quest to accurately model the f-T characteristics of crystal oscillators, this study harnesses the potent capabilities of a deep learning framework. By employing an LSTM architecture, recognized for its dexterity in time-series analysis, intricate f-T characteristics are modeled with unparalleled precision. A salient feature of this research is the strategic implementation of transfer learning techniques. This not only broadened the adaptability of our neural architecture across various oscillator types but also substantially enhanced model generalization.
A compelling observation from our experimental results is the exemplary performance of LSTM, especially when predicting thermal hysteresis. Experimental evaluations with real-world data underscore the preeminence of the introduced method, registering a root mean square error (RMSE) of less than 0.05 ppm, more favorable than the traditional cubic functions and all prior arts.

ΔT²  The square of the change in temperature; introducing squared terms allows for capturing non-linear relationships in the data or system behavior.
ΔT³  The cube of the change in temperature, allowing for capturing higher-order non-linearities in the model.

ΔT/Δt  The rate of change of temperature with respect to time. As the first derivative of temperature with respect to time, it indicates how quickly the temperature of the crystal is changing, which can be crucial for understanding dynamic behaviors.

Figure 5 offers a comprehensive visualization of the architecture of our proposed deep learning model. At its core, the architecture integrates two LSTM layers, transitioning into a singular, fully connected layer towards its conclusion. The selection of LSTM layers is underpinned by their renowned capability to parse and comprehend temporal relationships within data sequences.
Figures 13 and 14 display the training evolution of the LSTM framework, referencing the loss learning curves for PCB B and PCB C, respectively.
Figures 15(a), (b), and (c) display the spread of prediction errors for datasets derived from PCB A, PCB B, and PCB C, respectively. These illustrations reveal pronounced error intensities, especially under conditions marked by temperature fluctuations. Such pronounced discrepancies are traced back to the sparse experimental data available under these specific conditions, thereby compromising the model's generalization capabilities. To address this limitation, a prospective approach might entail systematically augmenting the experimental data to capture a wider range of temperature fluctuation scenarios. By enriching the dataset with these specific conditions, the model would benefit from a broader and more illustrative set of training instances. Implementing this approach could potentially bolster the model's prediction precision under such demanding conditions, consequently boosting its overarching performance.

The empirical findings underscore the dominance of the dual-layer LSTM architecture over traditional methods, showcasing a marked reduction in RMSE relative to curve-fitting and MLP models. This supremacy of LSTM is further underscored by its robust performance across diverse oscillator configurations and under multifarious temperature conditions. The rate of temperature change in the dataset, confined to a specific range, poses limitations on the extrapolative capabilities of the model. While the software-based validation and testing were thorough, they may not fully capture the challenges associated with deploying the neural network model on actual hardware in real-world scenarios. Future research should prioritize the fusion of neural architectures with the nuances of hardware platforms, aiming for models that are both accurate and compatible with real-world applications. In summation, backed by robust experimental evidence, this study has engineered a pathway in modeling the f-T characteristics of crystal oscillators through deep learning. The dual-layer LSTM model, with its stellar accuracy, sets the stage for further innovative explorations. It beckons a future where datasets are richer, models are sharper, and the gap between software prowess and hardware intricacies is seamlessly bridged.

Figure 7. Block diagram of the proposed transfer learning.

Figure 10. The model loss during the training process of PCB A using 2 LSTMs.

Figure 13. The model loss during the training process of PCB B using 2 LSTMs.

Figure 14. The model loss during the training process of PCB C using 2 LSTMs.

Figure 16. The f-T plot of PCB A in the testing stage using ELM.

Figure 17. The f-T plot of PCB A in the testing stage using MLP.

Figure 18. The f-T plot of PCB A in the testing stage using the proposed LSTM model.

Δ²T/Δt²  The second derivative of temperature with respect to time, representing the acceleration, that is, the rate of change of the temperature's rate of change. It is useful for identifying inflection points, where the temperature's rate of change is itself changing.
In the intricate realm of deep learning and predictive modeling, establishing a model's resilience and ability to generalize is of utmost importance. A central strategy in achieving this is the judicious use of data partitioning. This tactic is deployed in the current study to serve dual purposes: to ward off the perennial risk of model overfitting and to ensure that the entirety of the dataset is judiciously leveraged. Overfitting represents a scenario where a model, though performing admirably on its training data, struggles with new, unseen data. This phenomenon can severely compromise the model's applicability in real-world scenarios. To counteract this, the dataset encompassing crystal oscillator readings has been bifurcated into training and validation segments, following the widely accepted 80:20 ratio. Such a division, grounded in empirical research, offers a substantial data volume for training while reserving a significant portion for model validation, thereby ensuring a holistic model learning experience.
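As a minimal sketch, such an 80:20 partition can be implemented with a shuffled index split (the seed and sample count below are arbitrary placeholders):

```python
import numpy as np

def train_test_split_80_20(n_samples, seed=0):
    """Shuffle sample indices, then reserve the last 20% for testing."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    cut = int(0.8 * n_samples)
    return idx[:cut], idx[cut:]

train_idx, test_idx = train_test_split_80_20(1000)   # 800 train, 200 test indices
```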
The MLP is a ubiquitous variant of artificial neural networks; reference texts leverage the backpropagation algorithm for its training. Contrasting with ELM, MLP facilitates perpetual weight modifications between input and hidden layers via backpropagation. Figure 17 underscores MLP's adeptness in forecasting thermal hysteresis at milder temperatures, but highlights a diminishing efficacy at higher temperatures. The final technique under evaluation is the LSTM, as previously touched upon. Insights from Figure 18 accentuate LSTM's superior capability in prognosticating thermal hysteresis over a comprehensive temperature spectrum. In summary, ELM, despite its expedited training phase, registers suboptimal RMSE metrics and grapples with thermal hysteresis. Conversely, while MLP performs well at lower temperatures, it is less effective at higher ones.

Table I. Information of the databases used.

Table II. Symbols and descriptions for temperature-dependent features in crystal resonator modeling.

Table III. Hyperparameters of the proposed model.

Table IV. Hyperparameters of the MLP model.

Table V. Performance of different architectures in the pre-trained model with dataset PCB A.

Table VI. Performance of different architectures in the fine-tuned model with dataset PCB B.

Table VII. Performance of different architectures in the fine-tuned model with dataset PCB C.

Table VIII. Comparison of research related to modeling the f-T characteristics.