Abstract
In asset management, maintenance work is mainly reported through Work Orders (WO): technical documents that specify the asset to be repaired. The work is described in free text, with no imposed structure or fixed vocabulary, which makes it difficult to analyse automatically. This challenge grows as the number and variety of assets increase. This study applies Natural Language Processing (NLP) to automate the processing of work order descriptions, since NLP algorithms can condense large amounts of text into concise summaries. To explore the text corpus and better understand its underlying semantic patterns, two well-known Word Embedding (WE) models, Word2Vec and fastText, are used to capture the semantic and syntactic relationships between words. By reducing the dimensionality of the encoded vectors, the embeddings can be explored interactively in a 3D vector space. The results show that fastText outperforms Word2Vec in capturing semantic information, which enables machine learning algorithms that summarise a work order using a small set of predefined words.
Notes
1. Available at https://www.ncbi.nlm.nih.gov/pmc/.
2. Available at https://opennlp.apache.org/.
3. Available at https://stanfordnlp.github.io/CoreNLP/.
4. Available at https://www.nltk.org/.
5. Available at https://spacy.io/.
6. Available at https://stanfordnlp.github.io/stanza/.
7. A subfamily is a domain-specific second-level classification.
Acknowledgements
This work was supported by the REV@CONSTRUCTION mobiliser project, under the grant LISBOA-01-0247-FEDER-046123 from ANI - National Innovation Agency, and by NOVA LINCS (UIDB/04516/2020) and LASIGE (UIDB/00408/2020) with financial support from FCT (Fundação para a Ciência e a Tecnologia), through national funds. This work contributes to the Strategic Research Plan of the Centre for Marine Technology and Ocean Engineering (CENTEC), which is financed by FCT under contract UIDB/UIDP/00134/2020.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this chapter
Santos, P., Pato, M.P.M., Datia, N., Sobral, J. (2024). How NLP and Visual Analytics Can Improve Asset Management. In: Kovalerchuk, B., Nazemi, K., Andonie, R., Datia, N., Bannissi, E. (eds) Artificial Intelligence and Visualization: Advancing Visual Knowledge Discovery. Studies in Computational Intelligence, vol 1126. Springer, Cham. https://doi.org/10.1007/978-3-031-46549-9_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46548-2
Online ISBN: 978-3-031-46549-9
eBook Packages: Intelligent Technologies and Robotics (R0)