
Part of the book series: Studies in Computational Intelligence ((SCI,volume 1126))


Abstract

In Asset Management, maintenance work is mainly reported through Work Orders (WOs), technical documents that specify the asset to be repaired. The work is described in free text, with no imposed structure or fixed vocabulary, which makes the descriptions difficult to analyse automatically. This challenge grows as the number and variety of assets increase. This study applies Natural Language Processing (NLP) to automate the processing of work order descriptions, since NLP algorithms can condense large amounts of text into concise summaries. To better understand the text corpus and its underlying semantic patterns, and to communicate the results effectively, two well-known Word Embedding (WE) models, Word2Vec and FastText, are used to capture the semantic and syntactic relationships between words. By reducing the dimensionality of the encoded vectors, the corpus can be explored interactively as a 3D vector space. The results show that FastText outperforms Word2Vec in capturing semantic information, which enables machine learning algorithms that summarise a work order using a small set of predefined words.
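The pipeline the abstract describes, embedding words and then reducing the vectors to three dimensions for interactive visualisation, can be sketched with a small PCA-by-SVD example. The vocabulary and the 50-dimensional vectors below are illustrative stand-ins for trained Word2Vec/FastText embeddings, not data from the study, and PCA is only one of the common reduction techniques (t-SNE and UMAP are alternatives):

```python
import numpy as np

# Illustrative stand-in for trained embeddings: hypothetical 50-dimensional
# vectors for a few maintenance-vocabulary terms (random values, not real
# Word2Vec/FastText output).
rng = np.random.default_rng(0)
vocab = ["pump", "motor", "seal", "bearing", "leak", "vibration"]
emb = rng.normal(size=(len(vocab), 50))

# PCA via SVD: centre the vectors, then project onto the top three
# principal components to obtain coordinates for a 3D scatter plot.
centred = emb - emb.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
coords3d = centred @ vt[:3].T  # one (x, y, z) point per word

print(coords3d.shape)  # (6, 3)
```

Each word now maps to a 3D point that can be plotted and rotated interactively; in practice the interaction layer is often a tool such as Plotly or TensorBoard's embedding projector.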


Notes

  1. Available at https://www.ncbi.nlm.nih.gov/pmc/.

  2. Available at https://opennlp.apache.org/.

  3. Available at https://stanfordnlp.github.io/CoreNLP/.

  4. Available at https://www.nltk.org/.

  5. Available at https://spacy.io/.

  6. Available at https://stanfordnlp.github.io/stanza/.

  7. A subfamily is a domain-specific second-level classification.


Acknowledgements

This work was supported by the REV@CONSTRUCTION mobiliser project, under the grant LISBOA-01-0247-FEDER-046123 from ANI - National Innovation Agency, and by NOVA LINCS (UIDB/04516/2020) and LASIGE (UIDB/00408/2020) with financial support from FCT - Fundação para a Ciência e a Tecnologia, through national funds. This work contributes to the Strategic Research Plan of the Centre for Marine Technology and Ocean Engineering (CENTEC), which is financed by FCT under contract UIDB/UIDP/00134/2020.

Author information


Corresponding author

Correspondence to Pedro Santos.



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter


Cite this chapter

Santos, P., Pato, M.P.M., Datia, N., Sobral, J. (2024). How NLP and Visual Analytics Can Improve Asset Management. In: Kovalerchuk, B., Nazemi, K., Andonie, R., Datia, N., Bannissi, E. (eds) Artificial Intelligence and Visualization: Advancing Visual Knowledge Discovery. Studies in Computational Intelligence, vol 1126. Springer, Cham. https://doi.org/10.1007/978-3-031-46549-9_15

