
DAerosol-NTM: applying deep learning and neural Turing machine in aerosol prediction

  • Original Article
  • Published in: Neural Computing and Applications

Abstract

The pollution caused by aerosols (particulate matter) has a detrimental impact on urban environments, particularly in terms of socio-economic factors and public health. Aerosol particles, ranging in size from 1 nm to 100 µm, can easily penetrate organic tissue, carrying toxic gaseous compounds such as carbon monoxide, ozone, nitrogen dioxide, and sulfur dioxide, as well as mineral matter. Recent advancements in neural network technology, combined with deep learning techniques, have made it possible to predict surges in aerosol pollution. In this study, we introduce DAerosol-NTM, a deep learning framework that utilizes the latest developments in neural Turing machines (NTMs) to access external memory. Compared with four baseline studies that employ multilayer perceptron (MLP), deep neural networks (DNNs), long short-term memory (LSTM), and deep LSTM (DLSTM), DAerosol-NTM significantly improves prediction accuracy by 8–31% and precision by 46–91% and reduces the root mean square error (RMSE) by 24–85%. Additionally, DAerosol-NTM incorporates up to 20 years of particulate matter data in its external memory, making it the first model capable of predicting aerosol pollution surges. Using the optimal time interval before and after an aerosol event (TIBAAE), the model analyzes the data from the previous 96 h to predict aerosol events within the following 24 h.


Data availability

This research uses well-known data from public repositories, which can be shared upon request.

Notes

  1. Neuroevolution of augmenting topologies (NEAT).

  2. Chemical transport models (CTMs).

  3. Weather research and forecasting (WRF) model coupled with chemistry (Chem).

  4. Operational street pollution models (OSPM).

  5. Nested air quality prediction modelling system (NAQPMS).

  6. Generalized additive models (GAMs).

  7. Autoregressive integrated moving average (ARIMA).

  8. Geographically weighted regression (GWR).

  9. Multiple linear regression (MLR).

  10. Support vector machine (SVM).

  11. Artificial neural networks (ANNs).

  12. Fuzzy logic (FL).

  13. Random forest (RF).

References

  1. Nakata M, Sano I, Mukai S (2015) Relation between aerosol characteristics and impact factors on climate and environment. In: International geoscience and remote sensing symposium (IGARSS), 2015-November, pp 2342–2345

  2. Qin Y, Yin Y, Wu Z, Shi L (2010) An observational study of atmospheric Aerosol in the Shijiazhuang area. In: 2010 2nd IITA international conference on geoscience and remote sensing, IITA-GRS 2010, 2, pp 328–331

  3. Diro AA, Chilamkurti N (2018) Distributed attack detection scheme using deep learning approach for Internet of Things. Futur Gener Comput Syst 82:761–768. https://doi.org/10.1016/j.future.2017.08.043

  4. Zhu S, Lian X, Liu H, Hu J, Wang Y, Che J (2017) Daily air quality index forecasting with hybrid models: a case in China. Environ Pollut 231:1232–1244. https://doi.org/10.1016/j.envpol.2017.08.069

  5. Kim S, Lee JM, Lee J, Seo J (2019) Deep-dust: predicting concentrations of fine dust in Seoul using LSTM. arXiv Preprint arXiv:1901.10106, pp 8–10

  6. Xayasouk T, Lee HM, Lee G (2020) Air pollution prediction using long short-term memory (LSTM) and deep auto encoder (DAE) models. Sustainability. https://doi.org/10.3390/su12062570

  7. Sharma A, Mitra A, Sharma S, Roy S (2018) Estimation of air quality index from seasonal trends using deep neural network. Int Conf Artif Neural Netw 2018:511–521. https://doi.org/10.1007/978-3-030-01424-7

  8. Ma J, Cheng JC, Lin C, Tan Y, Zhang J (2019) Improving air quality prediction accuracy at larger temporal resolutions using deep learning and transfer learning techniques. Atmos Environ 214:116885. https://doi.org/10.1016/j.atmosenv.2019.116885

  9. Pengfei Y, Juanjuan H, Xiaoming L, Kai Z (2018b) Industrial air pollution prediction using deep neural network. Commun Comput Inf Sci 951:173–185. https://doi.org/10.1007/978-981-13-2826-8_16

  10. Gulcehre C, Chandar S, Cho K, Bengio Y (2016) Dynamic neural Turing machine with soft and hard addressing schemes. arXiv Preprint arXiv:1607.00036

  11. Turing AM (1950) Computing machinery and intelligence. Mind 59(236):433–460

  12. Siegelmann HT, Sontag ED (1991) Turing computability with neural nets. Appl Math Lett 4(6):77–80. https://doi.org/10.1016/0893-9659(91)90080-F

  13. Han W, Cha S, Ha H-J (2006) Method and apparatus for multi-layered video encoding and decoding. https://patents.google.com/patent/US20060120450A1/en

  14. Graves A, Wayne G, Danihelka I (2014) Neural Turing machines. arXiv preprint arXiv:1410.5401, pp 1–26

  15. Malekmohammadi Faradonbeh S, Safi-Esfahani F (2019) A review on neural Turing machine. https://arxiv.org/abs/1904.05061

  16. Baddeley A (1996) Working memory and executive control. Philos Trans R Soc Lond Ser B Biol Sci 351(1346):1397–1404. https://doi.org/10.1098/rstb.1996.0123

  17. Weston J, Bordes A, Chopra S, Rush AM, Van Merriënboer B, Joulin A, Mikolov T (2016). Towards AI-complete question answering: a set of prerequisite toy tasks. In: 4th International conference on learning representations, ICLR 2016—conference track proceedings. https://arxiv.org/abs/1502.05698

  18. Graves A, Wayne G, Reynolds M, Harley T, Danihelka I, Grabska-Barwińska A, Colmenarejo SG, Grefenstette E, Ramalho T, Agapiou J, Badia AP (2016) Hybrid computing using a neural network with dynamic external memory. Nature 538(7626):471–476. https://doi.org/10.1038/nature20101

  19. Yang G, Rush AM (2019) Lie-access neural Turing machines. In: 5th International conference on learning representations, ICLR 2017—conference track proceedings. http://arxiv.org/abs/1602.08671

  20. Zaremba W, Sutskever I (2015) Reinforcement learning neural Turing machines—revised. arXiv Preprint arXiv:1505.00521

  21. Greve RB, Jacobsen EJ, Risi S (2016) Evolving neural Turing machines for reward-based learning. In: GECCO 2016—proceedings of the 2016 genetic and evolutionary computation conference, pp 117–124. https://doi.org/10.1145/2908812.2908930

  22. Stanley KO, Miikkulainen R (2002) Evolving neural networks through augmenting topologies. Evol Comput 10(2):99–127

  23. Stein G, Gonzalez AJ, Barham C (2013) Machines that learn and teach seamlessly. IEEE Trans Learn Technol 6(4):389–402. https://doi.org/10.1109/TLT.2013.32

  24. Zhao J, Peng G (2011) NEAT versus PSO for evolving autonomous multi-agent coordination on pursuit-evasion problem. Lecture Notes in Electrical Engineering, vol 2, 711–717

  25. Verbancsics P, Harguess J (2013) Generative NeuroEvolution for deep learning. arXiv Preprint arXiv:1312.5355

  26. Lin B, Zhu J (2018) Changes in urban air quality during urbanization in China. J Clean Prod 188:312–321. https://doi.org/10.1016/j.jclepro.2018.03.293

  27. Bekkar A, Hssina B, Douzi S, Douzi K (2021) Air-pollution prediction in smart city, deep learning approach. J Big Data 8(1):1–21. https://doi.org/10.1186/s40537-021-00548-1

  28. Akhtar A, Masood S, Gupta C, Masood A (2018) Prediction and analysis of pollution levels in Delhi using multilayer perceptron. Adv Intell Syst Comput 542:563–572. https://doi.org/10.1007/978-981-10-3223-3_54

  29. Raturi R, Prasad JR (2018) Recognition of future air quality index using artificial neural network. Int Res J Eng Technol (IRJET) 5:2395–0056

  30. Wang J, Zhang X, Guo Z, Lu H (2017) Developing an early-warning system for air quality prediction and assessment of cities in China. Expert Syst Appl 84:102–116. https://doi.org/10.1016/j.eswa.2017.04.059

  31. Kök I, Şimşek MU, Özdemir S (2017) A deep learning model for air quality prediction in smart cities. In: Proceedings—2017 IEEE international conference on big data, big data 2017, 2018-Jan, 1983–1990

  32. Beheshti S, Khosroshahi K (2017) Study of the ability of various types of artificial neural networks in predicting the amount of CO, NO2, and SO2 pollutants in the metropolitan area of Tabriz. In: Fourth international conference on planning and management

  33. Shams R, World A (2017) Assessing the accuracy of multiple regression model in forecasting air quality index (AQI) in Tehran. Int Conf Res Civil Eng Urban Manage Environ. https://civilica.com/doc/711061/

  34. Zangouei H, Asdaleh F (2017) Prediction of PM10 contamination in Mashhad using MLP artificial neural networks and Markov chain model. J Appl Res Geogr Sci 17(47):39–59. https://iranjournals.nlai.ir/handle/123456789/578038

  35. Farhadi R, Hadavifar M (2016) Prediction of air pollutant concentrations in Tehran based on climatic factors using artificial neural network. In: National conference on research and technology findings in natural and agricultural ecosystems

  36. Li X, Peng L, Hu Y, Shao J, Chi T (2016) Deep learning architecture for air quality predictions. Environ Sci Pollut Res 23(22):22408–22417. https://doi.org/10.1007/s11356-016-7812-9

  37. Kellman P, Hansen MS (2014) T1-mapping in the heart: accuracy and precision. J Cardiovasc Magn Reson 16(1):1–20. https://doi.org/10.1186/1532-429X-16-2

  38. Azevedo A, Santos MF (2008) DD, SEMMA and CRISP-DM: a parallel overview. In: MCCSIS'08—IADIS multi conference on computer science and information systems; proceedings of informatics 2008 and data mining 2008, pp 182–185. https://recipp.ipp.pt/handle/10400.22/136

  39. Castillo Esparcia A, López Gómez S (2021) Public opinion about climate change in United States, partisan view and media coverage of the 2019 United Nations climate change conference (COP 25) in Madrid. Sustainability 13(7):3926. https://doi.org/10.3390/su13073926

  40. United Nations (2018) World urbanization prospects 2018. In: Department of economic and social affairs. World Population Prospects 2018

  41. Hosseini V, Shahbazi H (2016) Urban air pollution in Iran. Iran Stud 49(6):1029–1046. https://doi.org/10.1080/00210862.2016.1241587

  42. Nazmfar H, Saredeh A, Eshgi A, Feizizadeh B (2019) Vulnerability evaluation of urban buildings to various earthquake intensities: a case study of the municipal zone 9 of Tehran. Hum Ecol Risk Assess Int J 25(1–2):455–474. https://doi.org/10.1080/10807039.2018.1556086

  43. Vallero D (2014) Fundamentals of air pollution. Academic Press

  44. Mohammadpour R, Asaie Z, Shojaeian MR, Sadeghzadeh M (2018) A hybrid of ANN and CLA to predict rainfall. Arab J Geosci. https://doi.org/10.1007/s12517-018-3804-z

  45. Brownlee J (2018) How to develop LSTM models for time series forecasting. Mach Learn Mastery 14:1–77

  46. Zocca V, Spacagna G, Slater D, Roelants P (2017) Python deep learning. Packt Publishing

  47. Brownlee J (2016) Deep learning with Python: develop deep learning models on Theano and TensorFlow using Keras in deep learning with Python

  48. Vasilev I, Slater D, Spacagna G, Roelants P, Zocca V (2019) Python deep learning: exploring deep learning techniques and neural network architectures. Packt Publishing

  49. Brownlee J (2020) Deep learning with Python: develop deep learning models on Theano and TensorFlow using Keras. Machine Learning Mastery

  50. Hossain E, Shariff MAU, Hossain MS, Andersson K (2020) A novel deep learning approach to predict air quality index. In: Proceedings of international conference on trends in computational and cognitive engineering, pp 367–381

  51. Jamal A, Nodehi RN (2017) Predicting air quality index based on meteorological data: a comparison of regression analysis, artificial neural networks and decision tree. J Air Pollut Health 2(1)

  52. Wu Q, Lin H (2019) A novel optimal-hybrid model for daily air quality index prediction considering air pollutant factors. Sci Total Environ 683:808–821. https://doi.org/10.1016/j.scitotenv.2019.05.288

  53. Battan LJ (1979) Fundamentals of meteorology. https://doi.org/10.1007/978-3-030-52655-9

  54. Jassim MS, Coskuner G (2017) Assessment of spatial variations of particulate matter (PM10 and PM2.5) in Bahrain identified by air quality index (AQI). Arab J Geosci 10(1):1–14. https://doi.org/10.1007/s12517-016-2808-9

  55. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780

  56. Kim P (2017) Machine learning. MATLAB Deep Learning 130:1–18. https://doi.org/10.1007/978-1-4842-2845-6_1

  57. Boloukian B, Safi-Esfahani F (2020) Recognition of words from brain-generated signals of speech-impaired people: application of autoencoders as a neural Turing machine controller in deep neural networks. Neural Netw 121:186–207. https://doi.org/10.1016/j.neunet.2019.07.012

  58. Gers FA, Schraudolph NN, Schmidhuber J (2003) Learning precise timing with LSTM recurrent networks. J Mach Learn Res 3(1):115–143. https://doi.org/10.1162/153244303768966139

  59. Mohammadi M, Al-Fuqaha A, Guizani M, Oh JS (2018) Semisupervised deep reinforcement learning in support of IoT and Smart City services. IEEE Internet Things J 5(2):624–635. https://doi.org/10.1109/JIOT.2017.2712560

  60. Lipton ZC, Berkowitz J, Elkan C (2015) A critical review of recurrent neural networks for sequence learning. arXiv Preprint arXiv:1506.00019

  61. Yao K, Peng B, Zhang Y, Yu D, Zweig G, Shi Y (2014) Spoken language understanding using long short-term memory neural networks. In: 2014 IEEE Spoken Language Technology Workshop (SLT), pp. 189–194

  62. Gulcehre C, Chandar S, Cho K, Bengio Y (2018) Dynamic neural Turing machine with continuous and discrete addressing schemes. Neural Comput 30(4):857–884. https://doi.org/10.1162/NECO_a_01060

  63. Graves A, Mohamed A, Hinton G (2013) Speech recognition with deep recurrent neural networks. In: 2013 IEEE international conference on acoustics, speech and signal processing, pp 6645–6649

  64. Ji S, Xu W, Yang M, Yu K (2013) 3D Convolutional neural networks for human action recognition. IEEE Trans Pattern Anal Mach Intell 35(1):221–231. https://doi.org/10.1109/TPAMI.2012.59

  65. Liu Y, Zheng H, Feng X, Chen Z (2017) Short-term traffic flow prediction with Conv-LSTM. In: 2017 9th international conference on wireless communications and signal processing, WCSP 2017—proceedings, 2017-Jan, 1–6

  66. Faradonbe SM, Safi-Esfahani F (2020) A classifier task based on neural Turing machine and particle swarm algorithm. Neurocomputing 396:133–152. https://doi.org/10.1016/j.neucom.2018.07.097


Author information

Corresponding author

Correspondence to Faramarz Safi-Esfahani.

Ethics declarations

Conflict of interest

There are no conflicts of interest to disclose for this submission titled “DAerosol-NTM: Applying deep learning and neural Turing machine in aerosol prediction.”

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: research concepts

This section addresses the theoretical and technical nuances, the specialized terms used in the study, and the building blocks of DAerosol-NTM, including the AQI, deep learning (DL), deep short-term neural networks (DSTNN), and the neural Turing machine (NTM). For direct remarks on the materials and methods and on the results of this paper, please see the main body of the article preceding this appendix.

1.1 AQI index

AQI is an air quality indicator that reflects and evaluates the air quality status. Although the AQI scale is continuous, different descriptive categories have been implemented to ease public communication, as given in Table 25 [3, 54].

Table 25 Different levels of AQI [54]

One crucial factor to note about AQI is its calculation method. Concentrations are independently measured for six reported pollutant parameters (SO2, NO2, PM2.5, PM10, O3, and CO), each is converted to a sub-index, and the highest sub-index is taken as the AQI value [4]. For this reason, the calculation of AQI is driven by a single parameter at any given time, so accurate prediction of all six pollutants is essential. In most urban areas, PM2.5 and PM10 are the leading drivers of AQI calculations and the dominant causes of pollution episodes.
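As a concrete illustration of this single-parameter rule, the short Python sketch below converts a pollutant concentration to its sub-index by piecewise-linear interpolation between breakpoints and takes the maximum sub-index as the AQI. The PM2.5 breakpoints shown are the pre-2024 US EPA values and are used here only as an illustrative assumption; the exact breakpoint tables used in the paper may differ.

```python
# Pre-2024 US EPA PM2.5 breakpoints (illustrative): (C_lo, C_hi, I_lo, I_hi).
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50), (12.1, 35.4, 51, 100), (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200), (150.5, 250.4, 201, 300), (250.5, 500.4, 301, 500),
]

def sub_index(conc, breakpoints):
    """Piecewise-linear interpolation of a concentration to a sub-index."""
    for c_lo, c_hi, i_lo, i_hi in breakpoints:
        if c_lo <= conc <= c_hi:
            return round((i_hi - i_lo) / (c_hi - c_lo) * (conc - c_lo) + i_lo)
    raise ValueError("concentration outside the breakpoint table")

def aqi(sub_indices):
    """The AQI is driven by the single worst pollutant sub-index."""
    return max(sub_indices)

# PM2.5 at 40 µg/m³ dominates two other (already computed) sub-indices.
print(aqi([sub_index(40.0, PM25_BREAKPOINTS), 42, 55]))  # -> 112
```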

1.2 Deep learning

Deep learning is a form of machine learning, but it functions more like the human brain, in a deeper and more advanced form. In other words, deep learning belongs to the broader family of machine learning methods based on artificial neural networks. This type of learning is an essential element of data science: it receives raw inputs and extracts high-level features across several layers, supporting statistical and forecasting modeling. Deep learning helps data scientists collect, analyze, and interpret large amounts of data and, in general, makes the process faster and easier. Owing to their complexity and high capacity to learn, deep learning models process data more accurately and more quickly, especially on big data in research fields such as image processing, pattern recognition, and computer vision [55]. Deep learning is a powerful machine learning method that provides approximation, classification, and prediction capabilities [56,57,58,59].

See Fig. 17.

Fig. 17 The difference between machine learning and deep learning [6, 56]

1.3 Deep short-term neural network

The LSTM architecture [55] handles tasks with long-term dependencies better than conventional neural networks [58], and it underlies the deep short-term neural network. In sequence modeling, the length of the network's output sequence may differ from the length of its input sequence [61]. The deep LSTM architecture used in Refs. [7, 63] can be described by (\(\sigma (x)\), \(h_{t}^{l}\), \(i_{t}^{l}\), \(f_{t}^{l}\), \(s_{t}^{l}\), \(o_{t}^{l}\)), where \(l\) represents the layer index, as given in Eqs. 4–9. Equation 10 defines the N-dimensional simplex onto which the softmax activation function maps.

$$\sigma (x) = \frac{1}{1 + \exp ( - x)}$$
(4)
$$h_{t}^{l} = o_{t}^{l} \tanh (s_{t}^{l} )$$
(5)
$$i_{t}^{l} = \sigma (W_{i}^{l} [X_{t} ;h_{t - 1}^{l} ;h_{t}^{l - 1} ] + b_{i}^{l} )$$
(6)
$$f_{t}^{l} = \sigma (W_{f}^{l} [X_{t} ;h_{t - 1}^{l} ;h_{t}^{l - 1} ] + b_{f}^{l} )$$
(7)
$$s_{t}^{l} = f_{t}^{l} s_{t - 1}^{l} + i_{t}^{l} \tanh (W_{s}^{l} [X_{t} ;h_{t - 1}^{l} ;h_{t}^{l - 1} ] + b_{s}^{l} )$$
(8)
$$o_{t}^{l} = \sigma (W_{o}^{l} [X_{t} ;h_{t - 1}^{l} ;h_{t}^{l - 1} ] + b_{o}^{l} )$$
(9)
$$S_{N} = \left\{ {a \in {\mathbb{R}}^{N} :a_{i} \in [0,1],\;\sum\nolimits_{i = 1}^{N} {a_{i} } = 1} \right\}$$
(10)
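To ground Eqs. 4–10, here is a minimal NumPy sketch of a single deep-LSTM update step. The weight shapes, initialization, and the toy usage at the end are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def sigmoid(x):                                   # Eq. 4
    return 1.0 / (1.0 + np.exp(-x))

def softmax(a):                                   # maps logits onto the simplex S_N of Eq. 10
    e = np.exp(a - a.max())
    return e / e.sum()

def lstm_step(x_t, h_prev, h_below, s_prev, W, b):
    """One update of layer l, following Eqs. 5-9. W and b hold the gate
    parameters (W_i, W_f, W_s, W_o) and (b_i, b_f, b_s, b_o)."""
    z = np.concatenate([x_t, h_prev, h_below])    # [X_t; h_{t-1}^l; h_t^{l-1}]
    i = sigmoid(W["i"] @ z + b["i"])              # Eq. 6: input gate
    f = sigmoid(W["f"] @ z + b["f"])              # Eq. 7: forget gate
    s = f * s_prev + i * np.tanh(W["s"] @ z + b["s"])  # Eq. 8: cell state
    o = sigmoid(W["o"] @ z + b["o"])              # Eq. 9: output gate
    h = o * np.tanh(s)                            # Eq. 5: hidden state
    return h, s

# Toy usage: hidden size 4, input size 3, small random weights.
rng = np.random.default_rng(0)
H, X = 4, 3
W = {g: rng.normal(scale=0.1, size=(H, X + 2 * H)) for g in "ifso"}
b = {g: np.zeros(H) for g in "ifso"}
h, s = lstm_step(rng.normal(size=X), np.zeros(H), np.zeros(H), np.zeros(H), W, b)
```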

1.4 CNN-DLSTM

The classical CNN architecture is the best choice when the inputs are 2-D or 3-D tensors, such as images or videos [64]. Since LSTM architectures are better adapted to 1-D data, a variant of LSTM called convolutional LSTM (ConvLSTM) [65] was designed. In this architecture, the input dimensionality of the data is preserved inside the LSTM cell instead of being collapsed to a 1-D vector: a convolution operation replaces the matrix multiplication at each gate of the classical LSTM. The ConvLSTM architecture combines the capabilities of CNN and LSTM neural networks and is normally developed for 2-D spatiotemporal data such as satellite images. In the first part of the CNN-LSTM model, convolutional layers extract the essential features of the input data, and the results are flattened into a 1-D tensor so that they can be used as input for the second part of the model (the LSTM). Finally, before the data pass through the last hidden layer, the information is reshaped into the original form of the input data. The architecture of CNN-LSTM is shown in Fig. 18.

Fig. 18 Architecture of the CNN-LSTM model [27]
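The following tf.keras sketch shows one way to assemble this CNN-LSTM pattern. The layer sizes and the 96-hour-history/24-hour-horizon framing are assumptions borrowed from the abstract for illustration, not the exact configuration of [27].

```python
import tensorflow as tf
from tensorflow.keras import layers, models

n_steps, n_features, horizon = 96, 6, 24  # e.g. 96 h of six pollutant series

model = models.Sequential([
    layers.Input(shape=(n_steps, n_features)),
    layers.Conv1D(32, kernel_size=3, activation="relu"),  # feature extraction
    layers.MaxPooling1D(pool_size=2),
    layers.Conv1D(64, kernel_size=3, activation="relu"),
    layers.LSTM(64),            # temporal modeling of the convolved features
    layers.Dense(horizon),      # predict the next 24 hourly values
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```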

1.5 Neural Turing machine

The neural Turing machine is derived from the Turing machine and neural networks. The model consists of recurrent neural networks (RNNs) [60] coupled with an addressable external memory, which gives the network the ability to perform algorithmic tasks such as sorting, copying, and N-gram modeling. Generally speaking, a memory bank, a controller, and read and write heads are the main components of this method. The controller's job is to receive data from the outside and generate outputs during each update cycle. In addition, the neural Turing machine directs the read and write heads over the external memory in the form of a tape [57, 66]. Figure 19 shows the structure of the neural Turing machine in general in section (A) and its components in section (B) [10, 62].

Fig. 19 Structure of the neural Turing machine (A) in general and (B) in part [10, 62]

1.5.1 The actions that the controller performs through the heads

The controller performs five main operations: read, write, erase, move to the next memory cell, and move to the previous one. During each update cycle, the controller receives inputs from the external environment and emits outputs in response. The network also reads from and writes to a memory matrix using parallel read and write heads. The dotted line in Fig. 19A marks the division between the NTM circuit and the outside world. Every component of the architecture is differentiable, which makes the network straightforward to train with gradient descent. This is achieved by defining "blurry" reading and writing operations that interact to a greater or lesser degree with all memory elements (instead of addressing a single element, as a conventional Turing machine or digital computer does). A "focus" mechanism determines the degree of blur.

Each operation makes the read/write heads interact strongly with a small portion of memory and ignore the rest. Because the interaction with memory is sparse and fragmented, the NTM can store data without interference. The outputs of the heads determine which memory locations receive the most attention; these outputs define a normalized weighting over the rows of the memory matrix (the memory locations). Each weighting specifies the degree to which the head reads from or writes to each location, so a head can focus sharply on a single memory location or weakly on several. Figure 20 displays the conceptual model of the controller's actions through the heads.

Fig. 20 How to access external memory [66]

1.5.2 Reading and writing

The read and write operations are normalized weighting functions over the memory locations, similar to attention mechanisms. These weightings define a continuous distribution over the memory locations, which makes the operations differentiable. The reading operation is a simple linear combination of the memory locations. Figure 21 shows the read and write operations in NTM [66], in sections (A) and (B).

Fig. 21 Read and write operations in NTM [66]

In Fig. 21A, \(w_{t}(i)\) denotes the \(i\)th element of the weighting vector \(w_{t}\), subject to the constraints of Eq. 11:

$$\sum\limits_{i} {w_{t} (i)} = 1,\quad 0 \le w_{t} (i) \le 1,\;\forall i$$
(11)

\(M_{t}\) denotes the contents of the \(N \times M\) memory matrix at time \(t\), where \(N\) is the number of memory locations and \(M\) is the size of the vector stored at each location. \(w_{t}\) is the normalized vector of weightings over the \(N\) locations emitted by a read head at time \(t\).

The read vector is a weighted convex combination of the memory locations: the length-\(M\) read vector \(r_{t}\) returned by the head is defined as a combination of the row vectors \(M_{t}(i)\) of memory, as shown in Eq. 12, which is differentiable with respect to both the memory and the weighting.

$$r_{t} \leftarrow \sum\limits_{i} {w_{t} (i)M_{t} (i)}$$
(12)

Figure 21B shows that the writing operation is a composition of erasing and adding at the memory locations. The write head outputs both an erase vector \(e_{t}\) and an add vector \(a_{t}\). Writing to memory proceeds by first erasing at the locations defined by the write weighting vector and then adding at the locations specified by the same weighting. Erasing and writing at different locations in different proportions keeps the operation differentiable. Parts of memory are erased according to the weighting vector, as shown in Eq. 13:

$$M^{\prime}_{t} (i) \leftarrow M_{t - 1} (i)\left[ {{\mathbf{1}} - w_{t} (i)e_{t} } \right]$$
(13)

The multiplication against the memory location acts pointwise: an element of a memory location is zeroed only where both the weighting and the erase element are 1; if either is zero, that element of memory remains unchanged. When there are multiple write heads, the erasures can be performed in any order, since multiplication is commutative. New information is then added at the locations defined by the weightings, as shown in Eq. 14:

$$M_{t} (i) \leftarrow M^{\prime}_{t} (i) + w_{t} (i)a_{t}$$
(14)

The combined erase and add operations of all the write heads produce the final memory content at time \(t\). Because both erase and add are differentiable, the composite write operation is differentiable as well. Note that both the erase and add vectors have \(M\) independent components, which allows fine-grained control over which elements in each memory location are modified; and since the adds from multiple heads can be applied in any order, writing with several heads remains well defined.
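A compact NumPy sketch of the memory access defined by Eqs. 11–14 follows. The memory size, weighting, and head outputs are illustrative values, not parameters from the paper.

```python
import numpy as np

def read(memory, w):
    """Eq. 12: r_t <- sum_i w_t(i) M_t(i), a convex combination of rows."""
    return w @ memory                                 # shape (M,)

def write(memory, w, erase, add):
    """Eqs. 13-14: pointwise erase, then add, both modulated by w."""
    m_tilde = memory * (1.0 - np.outer(w, erase))     # Eq. 13: erase phase
    return m_tilde + np.outer(w, add)                 # Eq. 14: add phase

N, M = 8, 4                                           # N locations, M-vectors
memory = np.zeros((N, M))
w = np.zeros(N); w[2] = 0.9; w[3] = 0.1               # sums to 1, per Eq. 11
memory = write(memory, w, erase=np.ones(M), add=np.array([1., 2., 3., 4.]))
print(read(memory, w))                                # mostly the written vector
```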

1.5.3 Addressing mechanism

The previous section presented the equations for reading and writing but did not explain how the weightings are produced. They are created by combining two addressing mechanisms with complementary strengths. The first is content-based addressing, which focuses attention on locations based on the similarity between their current values and values emitted by the controller, in the spirit of the content-addressable memory of Hopfield networks. The advantage of content-based addressing is that retrieval is simple: the controller only needs to produce an approximation of part of the stored data, which is then compared against memory to retrieve the exact stored value. However, content-based addressing is not suitable for solving all problems. In some tasks the content of a variable is arbitrary [57], yet the variable still needs a recognizable name or address. Consider the following computational task: the variables x and y can take any values, but the procedure \(f(x,y) = x \times y\) must still be carried out. A controller can take the values of x and y, store them at different addresses, later retrieve them, and perform a multiplication algorithm. In this case, the variables are addressed by location, not by content. This form of addressing is called location-based addressing. Content-based addressing is strictly more general than location-based addressing, since the content of a memory location can include location information within it. In experiments, however, providing location-based addressing as a primitive operation proved necessary for some forms of generalization, so both mechanisms are used together. The flow diagram of the addressing mechanism indicates the sequence of operations used to construct a weighting vector while reading and writing [11] (see Fig. 22).

Fig. 22 Flow diagram of the addressing mechanism [10, 62]
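The sketch below walks through one plausible rendering of this pipeline (content focus, interpolation with the previous weighting, convolutional shift, sharpening), following the standard NTM formulation. The parameter names (key k, strength beta, gate g, shift distribution s, sharpening gamma) and all values are assumptions for illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cosine(u, v, eps=1e-8):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + eps)

def address(memory, k, beta, g, s, gamma, w_prev):
    """Produce a read/write weighting from content and location cues."""
    # 1) Content-based focusing: similarity of each memory row to the key k,
    #    sharpened by the key strength beta (Hopfield-style retrieval).
    w_c = softmax(beta * np.array([cosine(row, k) for row in memory]))
    # 2) Interpolation: gate g in [0, 1] blends content focus with the
    #    previous weighting (g = 0 ignores content entirely).
    w_g = g * w_c + (1.0 - g) * w_prev
    # 3) Location-based focusing: circular convolution with a length-N
    #    distribution s over shifts rotates the focus along memory.
    N = len(w_g)
    w_s = np.array([sum(w_g[j] * s[(i - j) % N] for j in range(N))
                    for i in range(N)])
    # 4) Sharpening with gamma >= 1 counteracts the blur introduced by s.
    w = w_s ** gamma
    return w / w.sum()

# Toy usage: a key equal to row 5 should focus the weighting on location 5.
N, M = 8, 4
memory = np.random.default_rng(1).normal(size=(N, M))
w = address(memory, k=memory[5], beta=5.0, g=1.0,
            s=np.eye(N)[0], gamma=2.0, w_prev=np.full(N, 1.0 / N))
print(w.argmax())  # -> 5
```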

1.5.4 Network controller

The NTM structure described in this section has several free parameters: the memory size, the number of read/write heads, and the range of allowable location shifts. Perhaps the most important architectural choice is the type of neural network used as the controller, in particular whether it is recurrent or feedforward (FNN). A recurrent controller such as an LSTM has its own internal memory that complements the larger memory matrix. If the controller is compared to the central processing unit of a digital computer (albeit with adaptive instructions instead of predefined ones) and the memory matrix to RAM, then the hidden activations of the recurrent controller are akin to the processor's registers: they allow the controller to mix information across multiple time steps of operation.

On the other hand, a feedforward controller can mimic a recurrent network by reading from and writing to the same memory location at each step. In addition, feedforward controllers often lend more transparency to the network's operation, because the read and write patterns on the memory matrix are usually easier to interpret than the internal state of an RNN. However, one limitation of a feedforward controller is that the number of concurrent read/write heads imposes a bottleneck on the type of computation the NTM can perform: with a single read head, it can perform only a unary transform on a single memory vector at each time step; with two read heads, it can perform binary vector transforms, and so on. Recurrent controllers can internally store read vectors from previous time steps, so they do not suffer from this limitation.
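To make the contrast concrete, here is a minimal, assumed interface for the two controller types: both map the external input and the previous read vector to head parameters, but only the recurrent controller keeps hidden state, its "registers", across time steps. The class and parameter names are hypothetical.

```python
import numpy as np

class FeedforwardController:
    """Stateless: its view of the past is limited to what the read
    heads bring back from external memory at each step."""
    def __init__(self, W):
        self.W = W                       # single projection, no state

    def step(self, x, r):
        return np.tanh(self.W @ np.concatenate([x, r]))   # head parameters

class RecurrentController:
    """Stateful: the hidden vector h acts like processor registers,
    mixing information across time steps alongside external memory."""
    def __init__(self, W, hidden_size):
        self.W = W
        self.h = np.zeros(hidden_size)   # internal memory

    def step(self, x, r):
        z = np.concatenate([x, r, self.h])
        self.h = np.tanh(self.W @ z)     # persists across time steps
        return self.h
```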

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Asaei-Moamam, ZS., Safi-Esfahani, F., Mirjalili, S. et al. DAerosol-NTM: applying deep learning and neural Turing machine in aerosol prediction. Neural Comput & Applic 35, 24123–24159 (2023). https://doi.org/10.1007/s00521-023-08868-4

