Abstract
The pollution caused by aerosols (particulate matter) has a detrimental impact on urban environments, particularly in terms of socio-economic factors and public health. Aerosol particles, ranging in size from 1 nm to 100 µm, can easily penetrate organic tissues, carrying toxic gases such as carbon monoxide, ozone, nitrogen dioxide, and sulfur dioxide, as well as minerals. Recent advancements in neural network technology, combined with deep learning techniques, have made it possible to predict surges in aerosol pollution. In this study, we introduce DAerosol-NTM, a deep learning framework that utilizes the latest developments in neural Turing machines (NTMs) to access external memory. Compared with four baseline studies that employ a multilayer perceptron (MLP), deep neural networks (DNNs), long short-term memory (LSTM), and deep LSTM (DLSTM), DAerosol-NTM significantly improves prediction accuracy by 8–31%, improves precision by 46–91%, and reduces the root mean square error (RMSE) by 24–85%. Additionally, DAerosol-NTM incorporates up to 20 years of particulate matter data in its external storage, making it the first model capable of predicting aerosol pollution surges. By analyzing the data from the previous 96 h, the optimal time interval before and after the aerosol event (TIBAAE) enables the prediction of aerosol events within the following 24 h.
Data availability
This research also uses well-known data from public repositories that can be shared upon request.
Notes
NeuroEvolution of augmenting topologies (NEAT).
Chemical transport models (CTMs).
Weather research and forecasting (WRF) model coupled with chemistry (Chem).
Operational street pollution models (OSPM).
Nested air quality prediction modelling system (NAQPMS).
Generalized additive models (GAMs).
Autoregressive integrated moving average (ARIMA).
Geographically weighted regression (GWR).
Multi-layer regression (MLR).
Support vector machine (SVM).
Artificial neural networks (ANNs).
Fuzzy logic (FL).
Random forest (RF).
References
Nakata M, Sano I, Mukai S (2015) Relation between aerosol characteristics and impact factors on climate and environment. In: International geoscience and remote sensing symposium (IGARSS), 2015-November, pp 2342–2345
Qin Y, Yin Y, Wu Z, Shi L (2010) An observational study of atmospheric Aerosol in the Shijiazhuang area. In: 2010 2nd IITA international conference on geoscience and remote sensing, IITA-GRS 2010, 2, pp 328–331
Diro AA, Chilamkurti N (2018) Distributed attack detection scheme using deep learning approach for Internet of Things. Futur Gener Comput Syst 82:761–768. https://doi.org/10.1016/j.future.2017.08.043
Zhu S, Lian X, Liu H, Hu J, Wang Y, Che J (2017) Daily air quality index forecasting with hybrid models: a case in China. Environ Pollut 231:1232–1244. https://doi.org/10.1016/j.envpol.2017.08.069
Kim S, Lee JM, Lee J, Seo J (2019) Deep-dust: predicting concentrations of fine dust in Seoul using LSTM. arXiv Preprint arXiv:1901.10106, pp 8–10
Xayasouk T, Lee HM, Lee G (2020) Air pollution prediction using long short-term memory (LSTM) and deep auto encoder (DAE) models. Sustainability. https://doi.org/10.3390/su12062570
Sharma A, Mitra A, Sharma S, Roy S (2018) Estimation of air quality index from seasonal trends using deep neural network. Int Conf Artif Neural Netw 2018:511–521. https://doi.org/10.1007/978-3-030-01424-7
Ma J, Cheng JC, Lin C, Tan Y, Zhang J (2019) Improving air quality prediction accuracy at larger temporal resolutions using deep learning and transfer learning techniques. Atmos Environ 214:116885. https://doi.org/10.1016/j.atmosenv.2019.116885
Pengfei Y, Juanjuan H, Xiaoming L, Kai Z (2018b) Industrial air pollution prediction using deep neural network. Commun Comput Inf Sci 951:173–185. https://doi.org/10.1007/978-981-13-2826-8_16
Gulcehre C, Chandar S, Cho K, Bengio Y (2016) Dynamic neural Turing machine with soft and hard addressing schemes. arXiv Preprint arXiv:1607.00036
Turing AM (1950) A quarterly review of psychology and philosophy I. Computing machinery and intelligence. Mind 59:433–460
Siegelmann HT, Sontag ED (1991) Turing computability with neural nets. Appl Math Lett 4(6):77–80. https://doi.org/10.1016/0893-9659(91)90080-F
Han W, Cha S, Ha H-J (2006) Method and apparatus for multi-layered video encoding and decoding. https://patents.google.com/patent/US20060120450A1/en
Graves A, Wayne G, Danihelka I (2014) Neural Turing machines. arXiv preprint arXiv:1410.5401, pp 1–26
Malekmohammadi Faradonbeh S, Safi-Esfahani F (2019) A review on neural Turing machine. https://arxiv.org/abs/1904.05061
Baddeley A (1996) Working memory and executive control. Philos Trans R Soc Lond Ser B Biol Sci 351(1346):1397–1404. https://doi.org/10.1098/rstb.1996.0123
Weston J, Bordes A, Chopra S, Rush AM, Van Merriënboer B, Joulin A, Mikolov T (2016). Towards AI-complete question answering: a set of prerequisite toy tasks. In: 4th International conference on learning representations, ICLR 2016—conference track proceedings. https://arxiv.org/abs/1502.05698
Graves A, Wayne G, Reynolds M, Harley T, Danihelka I, Grabska-Barwińska A, Colmenarejo SG, Grefenstette E, Ramalho T, Agapiou J, Badia AP (2016) Hybrid computing using a neural network with dynamic external memory. Nature 538(7626):471–476. https://doi.org/10.1038/nature20101
Yang G, Rush AM (2019) Lie-access neural Turing machines. In: 5th International conference on learning representations, ICLR 2017—conference track proceedings. http://arxiv.org/abs/1602.08671
Zaremba W, Sutskever I (2015) Reinforcement learning neural Turing machines—revised. arXiv Preprint arXiv:1505.00521
Greve RB, Jacobsen EJ, Risi S (2016) Evolving neural Turing machines for reward-based learning. In: GECCO 2016—proceedings of the 2016 genetic and evolutionary computation conference, pp 117–124. https://doi.org/10.1145/2908812.2908930
Stanley KO, Miikkulainen R (2002) Evolving neural networks through augmenting topologies. Evol Comput 10(2):99–127
Stein G, Gonzalez AJ, Barham C (2013) Machines that learn and teach seamlessly. IEEE Trans Learn Technol 6(4):389–402. https://doi.org/10.1109/TLT.2013.32
Zhao J, Peng G (2011) NEAT versus PSO for evolving autonomous multi-agent coordination on pursuit-evasion problem. Lecture Notes in Electrical Engineering, vol 2, 711–717
Verbancsics P, Harguess J (2013) Generative NeuroEvolution for deep learning. arXiv Preprint arXiv:1312.5355
Lin B, Zhu J (2018) Changes in urban air quality during urbanization in China. J Clean Prod 188:312–321. https://doi.org/10.1016/j.jclepro.2018.03.293
Bekkar A, Hssina B, Douzi S, Douzi K (2021) Air-pollution prediction in smart city, deep learning approach. J Big Data 8(1):1–21. https://doi.org/10.1186/s40537-021-00548-1
Akhtar A, Masood S, Gupta C, Masood A (2018) Prediction and analysis of pollution levels in Delhi using multilayer perceptron. Adv Intell Syst Comput 542:563–572. https://doi.org/10.1007/978-981-10-3223-3_54
Raturi R, Prasad JR (2018) Recognition of future air quality index using artificial neural network. Int Res J Eng Technol (IRJET) 5:2395–0056
Wang J, Zhang X, Guo Z, Lu H (2017) Developing an early-warning system for air quality prediction and assessment of cities in China. Expert Syst Appl 84:102–116. https://doi.org/10.1016/j.eswa.2017.04.059
Kök I, Şimşek MU, Özdemir S (2017) A deep learning model for air quality prediction in smart cities. In: Proceedings—2017 IEEE international conference on big data, big data 2017, 2018-Jan, 1983–1990
Beheshti S, Khosroshahi K (2017) Study of the ability of various types of artificial neural networks in predicting the amount of CO, NO2, and SO2 pollutants in the metropolitan area of Tabriz. In: Fourth international conference on planning and management
Shams R, World A (2017) Assessing the accuracy of multiple regression model in forecasting air quality index (AQI) in Tehran. Int Conf Res Civil Eng Urban Manage Environ. https://civilica.com/doc/711061/
Zangouei H, Asdaleh F (2017) Prediction of PM10 contamination in Mashhad using MLP artificial neural networks and Markov chain model. J Appl Res Geogr Sci 17(47):39–59. https://iranjournals.nlai.ir/handle/123456789/578038
Farhadi R, Hadavifar M (2016) Prediction of air pollutant concentrations in Tehran based on climatic factors using artificial neural network. In: National conference on research and technology findings in natural and agricultural ecosystems
Li X, Peng L, Hu Y, Shao J, Chi T (2016) Deep learning architecture for air quality predictions. Environ Sci Pollut Res 23(22):22408–22417. https://doi.org/10.1007/s11356-016-7812-9
Kellman P, Hansen MS (2014) T1-mapping in the heart: accuracy and precision. J Cardiovasc Magn Reson 16(1):1–20. https://doi.org/10.1186/1532-429X-16-2
Azevedo A, Santos MF (2008) DD, SEMMA and CRISP-DM: a parallel overview. In: MCCSIS'08—IADIS multi conference on computer science and information systems; proceedings of informatics 2008 and data mining 2008, pp 182–185. https://recipp.ipp.pt/handle/10400.22/136
Castillo Esparcia A, López Gómez S (2021) Public opinion about climate change in United States, partisan view and media coverage of the 2019 United Nations climate change conference (COP 25) in Madrid. Sustainability 13(7):3926. https://doi.org/10.3390/su13073926
United Nations (2018) World urbanization prospects 2018. Department of Economic and Social Affairs, World Population Prospects 2018
Hosseini V, Shahbazi H (2016) Urban air pollution in Iran. Iran Stud 49(6):1029–1046. https://doi.org/10.1080/00210862.2016.1241587
Nazmfar H, Saredeh A, Eshgi A, Feizizadeh B (2019) Vulnerability evaluation of urban buildings to various earthquake intensities: a case study of the municipal zone 9 of Tehran. Hum Ecol Risk Assess Int J 25(1–2):455–474. https://doi.org/10.1080/10807039.2018.1556086
Vallero D (2014) Fundamentals of air pollution. Academic Press
Mohammadpour R, Asaie Z, Shojaeian MR, Sadeghzadeh M (2018) A hybrid of ANN and CLA to predict rainfall. Arab J Geosci. https://doi.org/10.1007/s12517-018-3804-z
Brownlee J (2018) How to develop LSTM models for time series forecasting. Mach Learn Mastery 14:1–77
Zocca V, Spacagna G, Slater D, Roelants P (2017) Python deep learning. Packt Publishing
Brownlee J (2016) Deep learning with Python: develop deep learning models on Theano and TensorFlow using Keras in deep learning with Python
Vasilev I, Slater D, Spacagna G, Roelants P, Zocca V (2019) Python deep learning: exploring deep learning techniques and neural network architectures. Packt Publishing
Brownlee J (2020) Deep learning with Python: develop deep learning models on Theano and TensorFlow using Keras. Machine Learning Mastery
Hossain E, Shariff MAU, Hossain MS, Andersson K (2020) A novel deep learning approach to predict air quality index. In: Proceedings of international conference on trends in computational and cognitive engineering, pp 367–381
Jamal A, Nodehi RN (2017) Predicting air quality index based on meteorological data: a comparison of regression analysis, artificial neural networks and decision tree. J Air Pollut Health 2(1)
Wu Q, Lin H (2019) A novel optimal-hybrid model for daily air quality index prediction considering air pollutant factors. Sci Total Environ 683:808–821. https://doi.org/10.1016/j.scitotenv.2019.05.288
Battan LJ (1979) Fundamentals of meteorology. Fundam Meteorol. https://doi.org/10.1007/978-3-030-52655-9
Jassim MS, Coskuner G (2017) Assessment of spatial variations of particulate matter (PM10 and PM2.5) in Bahrain identified by air quality index (AQI). Arab J Geosci 10(1):1–14. https://doi.org/10.1007/s12517-016-2808-9
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
Kim P (2017) Machine learning. MATLAB Deep Learning 130:1–18. https://doi.org/10.1007/978-1-4842-2845-6_1
Boloukian B, Safi-Esfahan F (2020) Recognition of words from brain-generated signals of speech-impaired people: application of autoencoders as a neural Turing machine controller in deep neural networks. Neural Netw 121:186–207. https://doi.org/10.1016/j.neunet.2019.07.012
Gers FA, Schraudolph NN, Schmidhuber J (2003) Learning precise timing with LSTM recurrent networks. J Mach Learn Res 3(1):115–143. https://doi.org/10.1162/153244303768966139
Mohammadi M, Al-Fuqaha A, Guizani M, Oh JS (2018) Semisupervised deep reinforcement learning in support of IoT and Smart City services. IEEE Internet Things J 5(2):624–635. https://doi.org/10.1109/JIOT.2017.2712560
Lipton ZC, Berkowitz J, Elkan C (2015) A critical review of recurrent neural networks for sequence learning. arXiv Preprint arXiv:1506.00019
Yao K, Peng B, Zhang Y, Yu D, Zweig G, Shi Y (2014) Spoken language understanding using long short-term memory neural networks. In: 2014 IEEE Spoken Language Technology Workshop (SLT), pp. 189–194
Gulcehre C, Chandar S, Cho K, Bengio Y (2018) Dynamic neural Turing machine with continuous and discrete addressing schemes. Neural Comput 30(4):857–884. https://doi.org/10.1162/NECO_a_01060
Graves A, Mohamed A, Hinton G (2013) Speech recognition with deep recurrent neural networks. In: 2013 IEEE international conference on acoustics, speech and signal processing, pp 6645–6649
Ji S, Xu W, Yang M, Yu K (2013) 3D Convolutional neural networks for human action recognition. IEEE Trans Pattern Anal Mach Intell 35(1):221–231. https://doi.org/10.1109/TPAMI.2012.59
Liu Y, Zheng H, Feng X, Chen Z (2017) Short-term traffic flow prediction with Conv-LSTM. In: 2017 9th international conference on wireless communications and signal processing, WCSP 2017—proceedings, 2017-Jan, 1–6
Faradonbe SM, Safi-Esfahani F (2020) A classifier task based on neural Turing machine and particle swarm algorithm. Neurocomputing 396:133–152. https://doi.org/10.1016/j.neucom.2018.07.097
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
There are no conflicts of interest to disclose for this submission titled “DAerosol-NTM: Applying deep learning and neural Turing machine in aerosol prediction.”
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A: research concepts
This section addresses the theoretical and technical nuances and specialized terms used in the study and in the development of DAerosol-NTM, including the AQI, deep learning (DL), the deep short-term neural network (DSTNN), and the neural Turing machine. For the materials, methods, and results of this paper, please see the sections following this “Research concepts” section.
1.1 AQI index
AQI is an air quality indicator that reflects and evaluates the air quality status. Although the AQI scale is continuous, different descriptive categories have been implemented to ease public communication, as given in Table
One crucial point about the AQI is its calculation method. The concentrations of six reported pollutant parameters (SO2, NO2, PM2.5, PM10, O3, and CO) are measured independently, and the highest resulting pollutant value is taken as the AQI value [4]. For this reason, the AQI is driven by a single parameter, so the accurate prediction of all six pollutants is essential. In most urban areas, PM2.5 and PM10 are the leading drivers of the AQI calculation and the dominant cause of pollution and erosion.
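Since the AQI is determined by the single worst pollutant, its calculation can be sketched as follows. This is a minimal illustration, not the paper's implementation; the PM2.5 breakpoint table below is abbreviated and the values are for demonstration only.

```python
# Hypothetical, abbreviated breakpoint table for PM2.5 (µg/m³) used only for
# illustration: (conc_low, conc_high, aqi_low, aqi_high) per category.
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
]

def sub_index(conc, breakpoints):
    """Linearly interpolate one pollutant concentration onto the AQI scale."""
    for c_lo, c_hi, a_lo, a_hi in breakpoints:
        if c_lo <= conc <= c_hi:
            return round(a_lo + (a_hi - a_lo) * (conc - c_lo) / (c_hi - c_lo))
    raise ValueError("concentration outside the breakpoint table")

def overall_aqi(sub_indices):
    """The AQI is driven by a single parameter: the maximum sub-index."""
    return max(sub_indices)
```

A concentration of 35.4 µg/m³ PM2.5 maps to a sub-index of 100, and if the six pollutant sub-indices were, say, 42, 100, and 73 for the dominant three, the reported AQI would be 100.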
1.2 Deep learning
Deep learning is a form of machine learning that functions more like the human brain, in a deeper and more advanced form; it belongs to the broader family of machine learning methods based on artificial neural networks. It is an essential element of data science: it receives raw inputs and extracts high-level features across several layers, supporting statistical and forecasting modeling. Deep learning helps data scientists collect, analyze, and interpret large amounts of data, making the process faster and easier. Owing to their complexity and high learning capacity, deep learning models process data more accurately and quickly, especially on big data in research fields such as image processing, pattern recognition, and computer vision [55]. Deep learning is a powerful machine learning method that provides approximation, classification, and prediction capabilities [56,57,58,59].
See Fig. 17.
1.3 Deep short-term neural network
The LSTM architecture [55] outperforms conventional neural networks on tasks with long-term dependencies [58], and it is the building block of the deep short-term neural network. Thus, the shorter the output sequence of such a network, the longer the input sequence it can handle [61]. The deep LSTM architecture used in Refs. [7, 63] can be described by (\(\sigma (x)\), \(h_{t}^{l}\), \(i_{t}^{l}\), \(f_{t}^{l}\), \(s_{t}^{l}\), \(o_{t}^{l}\)), where \(l\) denotes the layer index, following Eqs. 4–9. Equation 10 defines the Softmax activation function for a simple N-dimensional problem.
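The transition equations referenced above (Eqs. 4–9) are not reproduced in this appendix; a standard reconstruction for layer \(l\), consistent with the listed symbols, is given below. The weight matrices \(W\), \(U\) and biases \(b\) are generic names for illustration, not the paper's notation.

```latex
\begin{align}
i_{t}^{l} &= \sigma\left(W_{i} x_{t}^{l} + U_{i} h_{t-1}^{l} + b_{i}\right), &
f_{t}^{l} &= \sigma\left(W_{f} x_{t}^{l} + U_{f} h_{t-1}^{l} + b_{f}\right), \\
o_{t}^{l} &= \sigma\left(W_{o} x_{t}^{l} + U_{o} h_{t-1}^{l} + b_{o}\right), &
\tilde{s}_{t}^{l} &= \tanh\left(W_{s} x_{t}^{l} + U_{s} h_{t-1}^{l} + b_{s}\right), \\
s_{t}^{l} &= f_{t}^{l} \odot s_{t-1}^{l} + i_{t}^{l} \odot \tilde{s}_{t}^{l}, &
h_{t}^{l} &= o_{t}^{l} \odot \tanh\left(s_{t}^{l}\right),
\end{align}
```

with the \(N\)-dimensional Softmax of Eq. 10 taking the standard form \(\mathrm{softmax}(z)_{j} = e^{z_{j}} / \sum_{k=1}^{N} e^{z_{k}}\) for \(j = 1, \dots, N\).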
1.4 CNN-DLSTM
Classical CNN architectures are the best choice when the network inputs are 2-D or 3-D tensors such as images or videos [64]. Since LSTM architectures are better suited to 1-D data, a variant of LSTM called convolutional LSTM (ConvLSTM) [65] was designed. In this architecture, a convolution operation replaces the matrix multiplication at each gate of the classical LSTM, so the cell preserves the input dimensionality of the data instead of working on a 1-D vector. The ConvLSTM architecture combines the capabilities of CNN and LSTM neural networks and is typically applied to 2-D spatiotemporal data such as satellite images. In the first part of this model, convolutional layers extract the essential features of the input data, and the results are flattened into a 1-D tensor so that they can serve as input to the second part of the model (the LSTM). Finally, before the data pass through the last hidden layer, they are reshaped to the original form of the input data. The architecture of CNN-LSTM is shown in Fig. 18.
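The CNN-to-LSTM data flow can be sketched in NumPy as follows. This is a simplified single-channel, single-kernel illustration with random weights, not the actual CNN-DLSTM configuration of the paper: each frame passes through a 2-D convolution, is flattened into a 1-D feature vector, and is fed into one LSTM step.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_valid(img, kernel):
    """Minimal 2-D 'valid' convolution used as the feature extractor."""
    kh, kw = kernel.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def lstm_step(x, h, s, W, U, b):
    """One LSTM step over the flattened convolutional features.
    Gate pre-activations are packed as [input, forget, output, candidate]."""
    z = W @ x + U @ h + b
    n = h.size
    i, f, o, g = z[:n], z[n:2 * n], z[2 * n:3 * n], z[3 * n:]
    s = sigmoid(f) * s + sigmoid(i) * np.tanh(g)   # cell state update
    h = sigmoid(o) * np.tanh(s)                    # hidden state output
    return h, s

rng = np.random.default_rng(0)
frames = rng.normal(size=(4, 8, 8))          # 4 timesteps of 8x8 spatial data
kernel = rng.normal(size=(3, 3))
hidden = 5
feat_dim = 6 * 6                             # (8-3+1)^2 after 'valid' conv
W = rng.normal(size=(4 * hidden, feat_dim)) * 0.1
U = rng.normal(size=(4 * hidden, hidden)) * 0.1
b = np.zeros(4 * hidden)

h, s = np.zeros(hidden), np.zeros(hidden)
for frame in frames:
    features = conv2d_valid(frame, kernel).ravel()  # CNN part: extract, flatten
    h, s = lstm_step(features, h, s, W, U, b)       # LSTM part: temporal model
```

The final hidden state `h` summarizes the whole frame sequence and would feed the output layer in a complete model.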
1.5 Neural Turing machine
The neural Turing machine is a method derived from the Turing machine and neural networks. The model consists of recurrent neural networks (RNNs) [60] with an addressable external memory, which gives the recurrent network the ability to perform algorithmic tasks such as sorting, copying, and N-gram modelling. Generally speaking, a memory bank, a controller, and read and write heads are the main components of this method. The controller's job is to receive data from the outside world and generate outputs during each update cycle. In addition, the neural Turing machine guides the read and write heads directly over the external memory, which is organized like a tape [57, 66]. Figure 19 shows the overall structure of the neural Turing machine in section (A) and its components in section (B) [10, 62].
1.5.1 The actions that the controller performs through the heads
The controller supports five main operations: read, write, erase, move to the next memory cell, and move to the previous one. During each update cycle, the network controller receives inputs from the external environment and emits outputs in response. It also reads from and writes to a memory matrix via parallel read and write heads. The dotted line in Fig. 19A marks the division between the NTM circuit and the outside world. Every component of the architecture is differentiable, which makes the network straightforward to train with gradient descent. This is achieved by defining “blurry” read and write operations that interact to a greater or lesser degree with all memory elements (rather than addressing a single element, as a conventional Turing machine or digital computer does). A “focus” mechanism determines the degree of blur.
Each operation drives the read/write heads to interact with a small portion of memory and ignore the rest. Because interaction with memory is sparse, the NTM can store data without interference. The outputs of the heads determine which memory locations receive the most attention: they define a normalized weighting over the rows of the memory matrix (which correspond to memory locations). Each weighting establishes the degree to which a head reads or writes at each location, so a head can focus sharply on a single memory location or weakly on several. Figure 20 displays the conceptual model of the controller's actions through the heads.
1.5.2 Reading and writing
The read and write operations are normalized weighting functions over the memory locations, similar to attention mechanisms. These weightings define a continuous distribution over the memory locations to make the operation differentiable. The reading operation is a simple linear combination of the memory locations:
Figure 21 shows the read and write operations of the NTM [66] in sections (A) and (B). Figure 21A shows the elements of the weighting wt(i), which satisfy the constraints of Eq. 11:
Mt denotes the contents of the N × M memory matrix at time t, where N is the number of memory locations and M is the size of the vector stored at each location. wt is the vector of weightings over the N locations emitted by a read head at time t; all weightings are normalized.
The read vector is a weighted convex combination of the memory locations. The length-M read vector rt returned by the head is defined as a combination of the row vectors Mt(i) in memory, as shown in Eq. 12, and is differentiable with respect to both the memory and the weighting.
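The read operation of Eq. 12 can be sketched in a few lines of NumPy; the memory contents and weighting here are arbitrary illustrative values.

```python
import numpy as np

# Memory: N locations, each an M-dimensional row vector.
N, M = 6, 4
Mt = np.arange(N * M, dtype=float).reshape(N, M)

# Normalized read weighting over the N locations (the constraint of Eq. 11).
w = np.array([0.0, 0.7, 0.3, 0.0, 0.0, 0.0])
assert np.isclose(w.sum(), 1.0)

# Read vector (Eq. 12): convex combination of rows, r_t = sum_i w_t(i) M_t(i).
r = w @ Mt
```

With these values the head reads mostly from location 1 and a little from location 2; because the combination is linear, gradients flow through both `w` and `Mt`.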
Figure 21B shows that the writing operation is a composition of erasing and adding at the memory locations. The write head outputs both an erase vector (e) and an add vector (a). Writing to memory then proceeds by first erasing at the locations selected by the write weighting vector and then adding at those same locations. Again, notice that erasing and writing locations in different proportions keeps the operation differentiable; parts of memory are erased according to the weighting vector, as shown in Eq. 13:
The multiplication against each memory location is element-wise. The elements of a memory location are therefore zeroed only where both the weighting and the erase element equal 1; if either the weighting or the erase element is zero, the memory is left unchanged. When there are multiple write heads, the erasures can be performed in any order. New information is then added at the locations defined by the weightings, as shown in Eq. 14:
Applying the erase and add operations in sequence produces the final memory content at time t. Because both erase and add are differentiable, the composite write operation is differentiable as well. Since addition is commutative, the order in which the additions of multiple heads are applied is irrelevant. Note that both the erase and add vectors have M independent components, which allows fine-grained control over which elements are modified at each memory location.
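The erase-then-add sequence of Eqs. 13 and 14 can be sketched as follows (illustrative values; a fully focused weighting is used so the effect is easy to trace).

```python
import numpy as np

N, M = 4, 3
Mt = np.ones((N, M))                 # memory before the write

w = np.array([0.0, 1.0, 0.0, 0.0])   # write weighting, focused on location 1
e = np.array([1.0, 1.0, 0.0])        # erase vector (Eq. 13)
a = np.array([5.0, 6.0, 7.0])        # add vector (Eq. 14)

# Erase phase: each location i becomes M(i) * (1 - w(i) e), element-wise.
M_erased = Mt * (1.0 - np.outer(w, e))
# Add phase: each location i becomes M'(i) + w(i) a.
M_new = M_erased + np.outer(w, a)
```

Location 1 has its first two elements erased and the add vector written on top, while every other location is untouched, exactly the sparse, interference-free behaviour described above.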
1.5.3 Addressing mechanism
The previous section presented the equations for reading and writing but did not explain how the weightings are produced. They are created by combining two addressing mechanisms with complementary features. The first mechanism is content-based addressing, which focuses on locations according to the similarity between their current values and the values emitted by the controller, similar to the content addressing of Hopfield networks. The advantage of content-based addressing is that retrieval is simple: the controller only needs to produce an approximation of part of the stored data, which is then compared against memory to recover the exact stored value. However, content-based addressing cannot solve every problem. In some tasks, the content of a variable is arbitrary [57], yet the variable still needs a recognizable name or address. Arithmetic problems are of this kind: the variables x and y can take any values, but the procedure \(f(x,y) = x \times y\) must still be well defined. A controller can take the values of x and y, store them at different addresses, retrieve them, and then run a multiplication algorithm. In this case, the variables are addressed by location, not by content. This form of addressing is called location-based addressing. Content-based addressing is more general than location-based addressing, since the content of a memory location can itself encode location information. In some generalization experiments, however, it is necessary to provide location-based addressing as a primitive operation, so both mechanisms are used together. The flow diagram of the addressing mechanism indicates the sequence of operations used to construct a weight vector while reading and writing [11] (see Fig. 22).
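The content-based stage of this pipeline can be sketched as follows. This is a minimal illustration of cosine-similarity addressing with a sharpening factor β; the interpolation, shift, and sharpening stages of the full NTM addressing pipeline are omitted.

```python
import numpy as np

def content_weighting(memory, key, beta):
    """Content-based addressing: cosine similarity between the key emitted
    by the controller and each memory row, scaled by beta and normalized
    with a softmax into a weighting over locations."""
    sim = memory @ key / (
        np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8
    )
    z = np.exp(beta * sim)
    return z / z.sum()

memory = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [0.7, 0.7]])
key = np.array([1.0, 0.0])          # partial/approximate query from the controller
w = content_weighting(memory, key, beta=10.0)
```

With a large β the weighting concentrates almost entirely on location 0, whose content best matches the key; a small β spreads the focus, which is the "blur" mechanism described earlier.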
1.5.4 Network controller
The NTM structure described above has several free parameters: the memory size, the number of read/write heads, and the range of allowable location shifts. Perhaps the most important choice is the type of neural network used as the controller, in particular whether it is recurrent or a feedforward neural network (FNN). A recurrent controller such as an LSTM has an internal memory that complements the larger memory matrix. If the controller is likened to the central processing unit of a digital computer (albeit with adaptive rather than predefined instructions) and the memory matrix to RAM, then the hidden activations of the recurrent controller resemble the processor's registers: they allow the controller to combine information across the various time steps of an operation.
A feedforward controller, on the other hand, can mimic an RNN by reading from and writing to the same memory location at each step. In addition, feedforward controllers often make network operation more transparent, because the read and write patterns on the memory matrix are usually easier to interpret than the internal state of an RNN. A limitation of a feedforward controller, however, is that the number of concurrent read/write heads constrains the type of computation the NTM can perform: with a single read head, only one transformation of a single memory vector can be performed per time step; with two read heads, binary vector transformations become possible, and so on. Recurrent controllers can store read vectors from previous steps internally, so they do not suffer from this limitation.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Asaei-Moamam, ZS., Safi-Esfahani, F., Mirjalili, S. et al. DAerosol-NTM: applying deep learning and neural Turing machine in aerosol prediction. Neural Comput & Applic 35, 24123–24159 (2023). https://doi.org/10.1007/s00521-023-08868-4