
Part of the book series: Studies in Computational Intelligence ((SCI,volume 1126))


Abstract

In Asset Management, maintenance work is mainly reported through Work Orders (WOs), technical documents that specify the asset to be repaired. The work is described in free text, with no imposed structure or fixed vocabulary, which makes the descriptions difficult to analyse automatically. This challenge grows as the number and variety of assets increase. This study applies Natural Language Processing (NLP) to automate the processing of work order descriptions, since NLP algorithms can condense large amounts of text into concise summaries. To better understand the text corpus and its underlying semantic patterns, and to communicate the results effectively, two well-known Word Embedding (WE) models, Word2Vec and FastText, are used to capture the semantic and syntactic relationships between words. By reducing the dimensionality of the encoded vectors, the corpus can be explored interactively as a 3D vector space. The results show that FastText outperforms Word2Vec in capturing semantic information, which enables machine learning algorithms that summarise a work order using a small set of predefined words.
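The pipeline the abstract describes, embedding words and then reducing the vectors to three dimensions for interactive visualisation, can be sketched with a small PCA-by-SVD example. The vocabulary and the 50-dimensional vectors below are illustrative stand-ins for trained Word2Vec/FastText embeddings, not data from the study, and PCA is only one of the common reduction techniques (t-SNE and UMAP are alternatives):

```python
import numpy as np

# Illustrative stand-in for trained embeddings: hypothetical 50-dimensional
# vectors for a few maintenance-vocabulary terms (random values, not real
# Word2Vec/FastText output).
rng = np.random.default_rng(0)
vocab = ["pump", "motor", "seal", "bearing", "leak", "vibration"]
emb = rng.normal(size=(len(vocab), 50))

# PCA via SVD: centre the vectors, then project onto the top three
# principal components to obtain coordinates for a 3D scatter plot.
centred = emb - emb.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
coords3d = centred @ vt[:3].T  # one (x, y, z) point per word

print(coords3d.shape)  # (6, 3)
```

Each word now maps to a 3D point that can be plotted and rotated interactively; in practice the interaction layer is often a tool such as Plotly or TensorBoard's embedding projector.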


Notes

  1. Available at https://www.ncbi.nlm.nih.gov/pmc/.

  2. Available at https://opennlp.apache.org/.

  3. Available at https://stanfordnlp.github.io/CoreNLP/.

  4. Available at https://www.nltk.org/.

  5. Available at https://spacy.io/.

  6. Available at https://stanfordnlp.github.io/stanza/.

  7. A subfamily is a domain-specific second-level classification.


Acknowledgements

This work was supported by the REV@CONSTRUCTION mobiliser project, under the grant LISBOA-01-0247-FEDER-046123 from ANI - National Innovation Agency, and by NOVA LINCS (UIDB/04516/2020) and LASIGE (UIDB/00408/2020) with financial support from FCT - Fundação para a Ciência e a Tecnologia, through national funds. This work contributes to the Strategic Research Plan of the Centre for Marine Technology and Ocean Engineering (CENTEC), which is financed by FCT under contract UIDB/UIDP/00134/2020.

Author information


Corresponding author

Correspondence to Pedro Santos.



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter


Cite this chapter

Santos, P., Pato, M.P.M., Datia, N., Sobral, J. (2024). How NLP and Visual Analytics Can Improve Asset Management. In: Kovalerchuk, B., Nazemi, K., Andonie, R., Datia, N., Bannissi, E. (eds) Artificial Intelligence and Visualization: Advancing Visual Knowledge Discovery. Studies in Computational Intelligence, vol 1126. Springer, Cham. https://doi.org/10.1007/978-3-031-46549-9_15

