Introduction

The era of globalization has come under question following the pandemic caused by the outbreak of the coronavirus disease in 2019 (Pla-Barber et al. 2021). The shipping industry greatly depends on the global character of the economy (Wallace 2000), and is highly affected by any situation that pertains to globalization. Participants in various industries, especially in the maritime sector, have been called to make the right operational decisions, under unprecedented and uncertain circumstances, to survive in a highly volatile market that calls for sustainable operations. Hence, the wider use of global communications and the internet to continue business operations and facilitate alignment among market participants in the shipping industry has become crucial (Trimmer et al. 2011).

Shipping is a highly demanding industry in which executives face significant challenges when making decisions that affect operations. Such decisions are bound by the environmental variables of uncertainty, limited information, and time restrictions. In effect, uncertainty characterizes most of the day-to-day business in the shipping industry (Wallace 2020), leading to higher inefficiencies and costs (Danielson and Ekenberg 2020; Trimmer et al. 2011). Information, the economy’s main attribute, can largely be supported by modern digital information networks (Aiello et al. 2020). The emergence of the COVID-19 pandemic has obliged organizations in all industries to abandon outdated and traditional approaches and adopt new operational directions assisted (and sometimes governed) by digital technology (Min 2022; Davidson et al. 2021). Such operational strategies involve technological innovation, associated with digital transformation, which offers not only massive productivity improvements but also higher organizational competitiveness, owing to real-time data access (Donthu and Gustafsson 2020; Vo and Tran 2021; Guo et al. 2020). Nowadays, information technology can contribute to more efficient operations, thus rendering organizations capable of withstanding international competition (Kauffman et al. 2013; Polasky et al. 2011; Yánez et al. 2020).

The Internet of Things (IoT henceforth) represents a technology that supports the dissemination of information to market participants. This new digital information technology offers shipping practitioners the opportunity to collect real-time data to support business operations based on actual evidence and facts, instead of relying on past decisions and intuition. To the best of our knowledge, only a few studies examine holistically the impact of IoT technology on the shipping operations sector (cf. Danielson and Ekenberg 2020; Xiao et al. 2021). The existing studies focus specifically on smart ships, smart ports, container tracking, and/or logistic channels (Knieps and Bauer 2022; Li 2020; Zhang and Chen 2020). Thus, it would be interesting, not only for practitioners but for academics as well, to review the existing literature and explore how the novel IoT technology could complement the new maritime business reality in addressing the existing economic and communication difficulties (Wallace 2000). The scope of this research is to utilize a novel methodology to practically demonstrate a more holistic view of the impact of IoT as embedded in the threads of shipping operations. More specifically, this research undertakes a text-mining-based review, utilizing an innovative machine learning methodology, to investigate the potential benefits (and/or drawbacks) that the new digital technology can offer with reference to the improvement of operations in the shipping industry (Danielson and Ekenberg 2020).

Literature review

Due to the scope and specialization of the topic, systematic literature review protocols have been followed (Footnote 1) in reviewing the extant literature analysed in the present section; this analysis points out the contemporary nature of the topic and uncovers the research gap, namely the holistic mapping of the impact of IoT in shipping, which is addressed through the research question of the present work. The research gap and emerging research question are framed in the context of a shipping industry that, especially in the post-COVID reality (Stavroulakis et al. 2021a), strives for sustainability (Stavroulakis et al. 2023a), through green shipping (Koutsouradi et al. 2022), to deal with an array of current issues and mandates, inclusive of optimizing efficiency and cost-effectiveness, minimization of environmental footprint, alternative fuels (Stavroulakis et al. 2024), and gender parity (Stavroulakis et al. 2023b), among others. Contemporary maritime affairs have transcended their confines and concern not a locality, region, or the industry itself, but are consolidated in competitive constructs that are indicative pillars of sustainability, coined as maritime clusters (cf. Koliousis et al. 2017, 2018a, 2018b, 2019; Stavroulakis and Papadimitriou 2016, 2017, 2022; Stavroulakis et al. 2021b, 2020, 2019; Tsioumas et al. 2023). Clusters are one example of the industry modernizing itself and claiming its stake in sustainability. Another example of this venture is digitalization, a broad term referring to the transition from a bureaucratic approach to shipping to the utilization of technology for effectiveness and streamlined operations. Digitalization in shipping has indicative spillovers to human resource management (Theotokas et al. 2024), strategic management (Ichimura et al. 2022a, b), blockchain (Yang 2019), cargo (Raza et al. 2023) and traffic flows (Calabria et al. 2017), Arctic shipping (Vicentiy 2021), the supply chain (Ahmed and Rios 2022; Feibert et al. 2017), standardization and certification (MacKinnon et al. 2023), and environmental impact (Agarwala et al. 2021; Bui and Perera 2019; Pavlinović et al. 2023; Pu and Lam 2021a, b).

Digitalization in the shipping industry is utilized as a vessel for improved strategic planning (Seo et al. 2023) and decision making, higher efficiency and optimization of cost-effectiveness (Bui and Perera 2020), with an indicative impact towards cybersecurity (Poyhonen and Lehto 2023; Wu et al. 2023) and the minimization of risk (Istomin et al. 2022). At the same time, digitalization in shipping can assure the emergence of smart services and operations, in the form of smart fairways (Heikkilä et al. 2024), smart terminal systems (Simola et al. 2023), ICT solutions (Fiorini and Gupta 2021), smart product-service-software systems–PSS, and agile smart services development (Tardo et al. 2022). Digitalization in shipping is crucial for the extended digital logistics platform (Yang and Lin 2023) with important spillovers to port networks (Maydanova et al. 2019), the port community system (Tsiulin et al. 2020a, b), as well as the aspect of port digitalization (Brunila et al. 2021), inclusive of container booking and slot allocation (Mandal et al. 2024), and green and smart ports (Philipp et al. 2021). Within this framework, the concept of IoT can play a governing role (Abusohyon and Tonelli 2021) and should be viewed as a decisive and promising approach toward the effective and sustainable digitalization of the shipping industry.

Internet of Things

During the last few years, digital information technology has steadily expanded and been adopted by various industries, thus paving the way to the fourth industrial revolution (Guo et al. 2020). Not surprisingly, both scholars and practitioners are nowadays interested in Industry 4.0 technologies (Zhang and Chen 2020). The novel IoT technology involves fifth generation (5G) networks, which embed smart physical network systems (Knieps and Bauer 2022; Li 2020). The ‘brain of IoT’ consists of elements such as big data, cloud computing (which assists big data analysis), and artificial intelligence (Han et al. 2021). In the context of the maritime shipping industry, IoT involves numerous sensors, satellites, ocean stations, buoys, and aerial remote sensing, able to collect large amounts of useful information for intelligent management and monitoring (Han et al. 2021; Kanagachidambaresan et al. 2020). IoT combines technologies such as sensing, automation, telecommunications, computers, and smart control, in a single platform (Munim et al. 2020; Singh et al. 2020). Thus, useful maritime big data can be generated, comprising geological, navigation, marine buoy monitoring, and navigation weather features (Han et al. 2021). IoT can also support unmanned operations such as unmanned cargo ships, underwater vehicles, and surface crafts (Han et al. 2021). These innovative technologies can significantly contribute to human resource issues as well as to the streamlining of tangible resources, as pertaining to the digitalization of several sectors, such as navigational weather forecasting, ship design, ship manufacturing, maritime transportation, and maritime management, among others (Han et al. 2021). Furthermore, the so-called ‘digitized port’ is a promising concept that can offer the possibility for real-time communication, monitoring, and data sharing between the commercial, supply, and logistics entities (Cil et al. 2022).

The main objective of Industry 4.0 is to improve efficiency and highlight the value of informed decisions (Aiello et al. 2020). IoT technology supports objects’ intercommunication regardless of time or space constraints, utilizing audible and visual learning systems, e-mail, and SMS technologies (Cil et al. 2022; Xiao et al. 2021). Smart shipping specifically refers to a system that connects ports and ships to help stakeholders make decisions with greater efficiency, minimized costs and smaller ecological footprints, based on smart waterways, smart ports, smart ships, and maritime intelligence (Xiao et al. 2021). IoT technology, collecting real-time information about the location, humidity, temperature, vibration, and noise of objects, contributes to the optimization of time and resources, rendering human intervention unnecessary (Cil et al. 2022; Aiello et al. 2020). Furthermore, IoT technology combines the crew, officers ashore and electronic objects to create a collaborative environment not only for individuals onboard but also for those working in the shore facilities (Ichimura et al. 2022a, b; Dai et al. 2020). IoT currently represents the fourth industrial revolution’s cornerstone, transforming shipping business operations into simpler, more efficient tasks, enhancing the quality of services (Nižetić et al. 2020; Lvovich et al. 2019; Brous et al. 2019) and promoting sustainable business models (Del Giudice et al. 2022). It is worth mentioning that numerous case studies have been employed successfully not only in shipping (Footnote 2; e.g., navigation, real time kinematic, differential GPS, monitoring, safety, laser systems, cf. Durlik et al. 2023), but in the transportation and logistics sectors as well (Footnote 3).

IoT innovation and opportunities for the shipping industry

IoT technology has a wide range of applications in the maritime sector, providing numerous data that can, for instance, refer to a vessel’s position, cargo condition, and port congestion (Xiao et al. 2021; Katranas et al. 2020). The digital metamorphosis assumes digital conversion of physical resources and their incorporation in the worldwide Internet network (Min 2022; Fu et al. 2020; Madhok 2021). IoT technology is able to associate physical business operations with decision-making algorithms (Dai et al. 2020), linking physical objects (such as mobile phones, machines, and smart sensors) to individuals and offices onshore (Parola et al. 2021). Consequently, ‘Shipping 4.0’ technologies can contribute to risk management, involving cyber-physical systems, cloud computing, as well as sophisticated tracking and tracing techniques able to gradually substitute perilous manual tasks with intelligent autonomous practices (Sepehri et al. 2021).

IoT can support safety at sea, by providing real time information (Ichimura et al. 2022a, b). The fact that IoT technology can combine data from Automatic Identification System (AIS) base stations, wireless phones, radars, and shipborne terminals, enables the creation of an integrated ship traffic management system (Han et al. 2021). IoT technology’s combination with the geographic information system, spatial information processing and wireless sensor networks can provide accurate and instant data which allows intelligent control of the marine environment and safe navigation of vessels (Han et al. 2021). In particular, the modern tracking and tracing systems based on IoT technology are supplemented by sensors and augmented reality (AR), providing autonomous ship guidance, able to support safe navigation, and collision/accident minimization (Sepehri et al. 2021). At the same time, IoT can contribute to collision avoidance and accidents’ prevention through accurate and timely detection of obstacles as well as other vessels (Ichimura et al. 2022a, b).

Additionally, IoT can link ships with shore facilities via digital entities, thus improving operations such as anomalies’ detection in vessel tracking, prediction of vessel’s estimated time of arrival, traffic hot spots, peculiar vessel behaviour and illegal bunkering, among others (Munim et al. 2020). Furthermore, the 6G communication innovation enables instant dissemination of ship energy consumption data (Deng et al. 2021). IoT allows accurate consumption and weather condition measurements, thus contributing to optimal fuel consumption (Hiekata et al. 2021), and vessel speed management (Munim et al. 2020). Besides that, IoT can be used to effectively measure toxic blooms in coastal waters, thus predicting vessel propulsion failure (Munim et al. 2020), in addition to the added environmental benefit of mapping eutrophication hotspots and thus actively assisting the industry’s efforts towards sustainable operations.

IoT supports the so called ‘Ocean Cloud Warehouse’ system, able to record the kind, quantity, and release time signature of pollutants automatically, while dispatching integrated pollutants to a suitable treatment agency (Han et al. 2021). Simultaneously, IoT can improve the hazardous waste treatment process, by connecting all treatment stages, thus strengthening marine pollution prevention (Han et al. 2021). IoT can moreover contribute to ship designing, general assembly, construction operations, and ship protection (Han et al. 2021). On the one hand, IoT technology can be utilized to instantly disseminate equipment data to marine equipment manufacturers, thus decreasing the necessary time and cost of procurement (Hiekata et al. 2021). On the other hand, IoT can support remote maintenance from onshore and render emergency stops and deviations on voyage redundant (Hiekata et al. 2021).

At this point, we should underscore the fact that IoT technology enables constant monitoring of cargo movement and operation. IoT data can be utilized to coordinate load control functions and controlling objects, such as rudders and thrusters, thus eschewing excessive loading and vessel’s age deterioration (Hiekata et al. 2021). The fact that IoT technology enables constant monitoring of cargo handling equipment (such as cranes), facilitates loading equipment maintenance and repair operations (Hiekata et al. 2021). Since IoT remotely monitors critical equipment onboard, it also allows the collaboration between workers and robots for cargo operations (Ichimura et al. 2022a, b). IoT cargo handling and operation monitoring can be further linked to logistic centres, thus preventing accidents (Parola et al. 2021). Consequently, IoT can coordinate cargo operations, transportation, freight forwarding, customs clearance, and warehousing, as well as distribution activities (Parola et al. 2021). Apparently, such an accurate kind of management contributes to maritime operations efficiency (Hiekata et al. 2021).

IoT can also be applied in ship survey automation, by linking virtual and physical objects and taking advantage of augmented reality (AR) and virtual reality (VR) technologies (Ichimura et al. 2022a, b). Alternatively, IoT can render the field of bills of lading smoother, by uniting port community systems to IoT data and thus effectively mitigating issues such as double spending, duplicated documents, masqueraded information, fraud, and hacking, among others (Irannezhad and Faroqi 2021).

Referring specifically to the container shipping sector, Cold Chain Logistics based on IoT technology permits not only real-time containerized product monitoring, but also instant detection of abnormal events associated with refrigerated products. Such information can be disseminated to the relevant parties in the supply chain, thus improving decision-making (Cil et al. 2022). Furthermore, container location monitoring can contribute to efficient logistics management, and carbon emissions’ reduction (Choi et al. 2018). IoT assisted container tracking systems can also support smoother cross-border procedures, and better monitoring of international container movements (Choi et al. 2018). Additionally, IoT offers the option of linking port terminals, distribution centres, intermodal terminals, and dry ports, thus enhancing not only productivity but also working conditions and firm strategies (Parola et al. 2021). As a result, the Internet of Ships (IoS) can upgrade not only the managerial but also the maintenance performance of shipping companies, all the while decreasing operational costs (Zhang et al. 2021b). In other words, IoT can advance cost reduction and operational efficiency, supporting effective decision-making and stakeholder relationship management (SRM), while decreasing the possibility of human error and the amount (and cost) of human resources required (Parola et al. 2021; Li 2020).

One would be remiss if, among all the innovative applications and approaches of IoT, the data security aspect was left out of the narrative. Indeed, data security in shipping and maritime affairs is of the utmost importance and must be factored into the framework of any IoT application, as it has the potential to reverse implicit as well as expected benefits and streamlining, turning these into threats and risk for a wide array of (digitally) interconnected companies. Currently, several data security applications have been proposed with reference to the shipping industry (cf. Mahmood et al. 2023; Zhang et al. 2022b). Solutions and applications range from ship detection (Arifin et al. 2011) and sensor networks (Sun and Niu 2020) to document handling (Tsiulin et al. 2020a, b) and privacy protection (Han and Yang 2021); the clear conclusion is that any implementation of IoT should include data security mandates as a fundamental prerequisite.

By contrast, shipping smart data refers to the outcome of digitalization technology, which involves the interconnection of smart objects, permitting the linking of physical operations to a virtual information environment (Aiello et al. 2020; Cil et al. 2022). With reference to a possible implementation of smart objects in the shipping industry, the advent of smart ships represents a promising evolution (Acanfora and Balsano 2020; Dai et al. 2020).

The internet of ships focuses on intelligent ships, which can connect people-to-vessels, ships-to-ships, ships-to-shores, and ships-to-cargoes, utilizing a combination of sensing, positioning, and tracking IoT data (Han et al. 2021). The so-called ‘smart ships’ operate with computerized systems, and can automate tasks onboard, such as monitoring of navigational systems and engines remotely (Ichimura et al. 2022a, b; Li 2020; Brous et al. 2019). Additionally, such smart ships enable remote ship manipulation and unmanned decision-making based on a fully integrated arrangement of sensors (Ichimura et al. 2022a, b; Acanfora and Balsano 2020; Uslu et al. 2019). Unmanned cargo ships permit the vessel’s master to remotely manage ship sailing around the globe, thus supporting the application of intelligent navigation (Han et al. 2021; Nikghadam et al. 2021; Guo et al. 2020). Munim et al. (2020) specifically suggest that the error rate of autonomous ships is lower than that of human-operated vessels.

Similarly, the ‘smart port’ [using automation and innovative technologies including Artificial Intelligence (AI), Big Data, Internet of Things (IoT) and Blockchain to improve its performance] and ‘smart container’ [shipping container used in freight and logistics integrated with Internet of Things (IoT) technologies, sensors, GPS tracking and solar panels] provide new opportunities for more efficient shipping operations, that could significantly decrease service time delays (Nikghadam et al. 2021; Nižetić et al., 2020; Singh et al. 2020). Meanwhile, the extant literature suggests that the new IoT technology generates valuable operational data for further statistical analysis, thus creating additional economic value for shipping companies (Guo et al. 2020). Overall, IoT technology offers the possibility for integrated networking and effectiveness regarding improved shipping operations (Nižetić et al., 2020). In this context, shipping managers are expected to invest around $2.5 million in IoT technology soon (Xiao et al. 2021).

Following the discussion of the extant literature in the field (with its main points summarized in Table 1) of the IoT technology and its applications in the shipping industry, the research question of this work can be formulated as the mapping of the holistic application of IoT in shipping. The research question is addressed with the utilization of a novel methodology (a machine learning approach to perform a text mining-based literature review) that results in a conceptual framework with reference to the topography of the applications of IoT in shipping.

Table 1 IoT applications in the shipping sector

Methods

Research methodology

Given the wealth of information sources, a trend analysis of publications is important in assessing the development, implementation, and effectiveness of the IoT technology (Chen and Ho 2021). This study adopted a text mining method to address its research objective and reveal information from academic articles relevant to the IoT technology’s contribution to more effective operations in the shipping industry, namely increased operational and decision-making efficiency (Hirata et al. 2020; Galati and Bigliardi 2019; Chen and Ho 2021). Maritime research has lately adopted innovative methods to produce theoretical insights (Hirata et al. 2020). The use of machine learning, and especially topic modelling, has started to become popular in the field of management studies (Hirata et al. 2020). Specifically, topic modelling has become an important branch of text analysis, utilized to identify the primary topics from a large corpus of documents (González-Santos et al. 2021). Topic modelling presents the statistical relationship that exists between each document in the corpus and each identified topic, and between each document and the dominant words of each topic (González-Santos et al. 2021).

The collected text data were analysed utilizing machine learning models in order to address the research objective. The extant literature suggests that a very large unstructured dataset can be difficult to analyse (Sharma and Sharma 2022). We adopted the Natural Language Processing (NLP) technique, as the use of algorithms contributes to transforming text data into a structured representation, which can be easily processed by computers (Hirata et al. 2020). NLP is a subfield of artificial intelligence, which enables computers to interpret natural human languages (Hirata et al. 2020). In the Python version 3 environment and using the open-source application Jupyter Notebook version 6.4.0, this study analyses a document corpus, which comprises many academic papers, relevant to the utilization of IoT technology in the shipping industry. However, not all information in the corpus was useful for this study (González-Santos et al. 2021). In this respect, NLP tools have been utilized to process and statistically analyse this large volume of unstructured data and produce new knowledge (Piris et al. 2021).

The NLP method is organized into three basic pillars: (i) data preprocessing, (ii) the machine learning model, and (iii) the visualization of the findings (Hirata et al. 2020). To facilitate our analysis, Python’s Natural Language Toolkit (NLTK) was utilized, together with the following libraries: spaCy, NLTK PorterStemmer, NLTK Stopwords, and NLTK WordNetLemmatizer for the data preprocessing; the gensim models library for the machine learning model; and the matplotlib and sklearn libraries for the visualization of the analysis results. The unstructured text data were first pre-processed to produce word tokens and lemmas (González-Santos et al. 2021). During the data preprocessing, each separate word in the unstructured text data was converted into a token that computers can read and analyse (González-Santos et al. 2021). The statistical analysis of these tokens led to the extraction of a predetermined number of underlying topics existing in the document corpus and to the identification of the dominant tokens, listed in semantic groups (González-Santos et al. 2021). The complete procedural sequence can be found in Fig. 1.

Fig. 1 Flowchart of processing steps (Source: Authors)

Data preprocessing

Initially, a text corpus was generated for further analysis. In this research, the corpus comprised a collection of academic papers retrieved from the Scopus database (Del Giudice et al. 2022). The Scopus search combined the following parameters: “Internet of Things AND shipping AND maritime” (Du et al. 2021; Mustak et al. 2021). The search covered the period from January 2010 to June 2021, since these dates represent a suitable time frame for our analysis (Hirata et al. 2020). The search output was filtered to include only academic articles; abstracts and conclusions were checked by the researchers to ensure that they were relevant to the research scope. All articles that contained both maritime/shipping and IoT/Internet of Things in their abstract or title were included in the dataset. Research suggests that a reasonable word selection strategy can enhance the readability and interpretability of LDA model analysis results (Du et al. 2021). The filtering produced 228 academic articles, exclusively in English, which were relevant to our study (Mustak et al. 2021). Given the large dimensionality of the corpus, we analysed only the abstracts and conclusions (Du et al. 2021; Mustak et al. 2021).
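The inclusion rule described above (a maritime/shipping term together with an IoT term in the title or abstract) can be sketched as a simple keyword filter. This is a minimal illustration rather than the script used in the study: the record fields `title` and `abstract`, the helper `matches_criteria`, the regular expressions, and the sample records are all hypothetical.

```python
import re

# Term groups taken from the inclusion rule in the text; the regular
# expressions are an assumption about how that rule might be implemented.
DOMAIN_TERMS = re.compile(r"\b(maritime|shipping)\b", re.IGNORECASE)
IOT_TERMS = re.compile(r"\b(iot|internet of things)\b", re.IGNORECASE)


def matches_criteria(record):
    """Return True if title plus abstract mention both term groups."""
    text = f"{record.get('title', '')} {record.get('abstract', '')}"
    return bool(DOMAIN_TERMS.search(text)) and bool(IOT_TERMS.search(text))


# Hypothetical records standing in for a bibliographic database export.
records = [
    {"title": "IoT in maritime logistics", "abstract": "Sensors on vessels."},
    {"title": "Port economics", "abstract": "A study of shipping rates."},
    {"title": "Smart homes", "abstract": "Internet of Things for buildings."},
]

selected = [r for r in records if matches_criteria(r)]
print([r["title"] for r in selected])
```

Only the first sample record satisfies both conditions; the second lacks an IoT term and the third lacks a maritime/shipping term.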

Turning to the descriptive statistics, those for the 'YEAR' column summarize the distribution of publication years for the academic articles in the dataset (Mcauliffe and Blei 2007). Specifically, they show how many distinct years are represented, which year is the most common, and how many articles were published in that year (Mcauliffe and Blei 2007). The results (Table 2) show 218 non-null entries in the 'YEAR' column, meaning that 218 articles carry information on their year of publication. These entries span 15 unique values, that is, 15 different publication years. The most frequently occurring publication year (the 'top' value) is 2020, with a frequency of 63, meaning that 63 articles in our dataset were published in 2020.

Table 2 Descriptive statistics: year

The descriptive statistics for the 'JOURNAL' column summarize the distribution of journals in our dataset (Mcauliffe and Blei 2007). They describe how many unique journals there are, which one appears most frequently, and how many times it occurs. According to the results (Table 3), there are 219 non-null entries in the 'JOURNAL' column, meaning that 219 articles carry information on the journal in which they were published. These entries span 185 unique values, that is, 185 different journals. The most frequently occurring journal (the 'top' value) is "Maritime Policy & Management", which appears 11 times.

Table 3 Descriptive Statistics: journal
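The four quantities reported for the 'YEAR' and 'JOURNAL' columns (non-null count, number of unique values, most frequent value, and its frequency) can be reproduced with a few lines of standard-library Python. The sketch below is illustrative only: the helper `describe_categorical` and the toy `years` list are hypothetical, and the field names simply mirror the count/unique/top/freq labels used in the text (the same labels that the pandas `describe()` method reports for non-numeric columns).

```python
from collections import Counter


def describe_categorical(values):
    """Summarise a column as count, unique, top and freq, ignoring missing entries."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    top, freq = counts.most_common(1)[0] if counts else (None, 0)
    return {"count": len(present), "unique": len(counts), "top": top, "freq": freq}


# Hypothetical 'YEAR' column; None marks an article with no recorded year.
years = [2020, 2019, 2020, None, 2021, 2020, 2018]
summary = describe_categorical(years)
print(summary)
```

Missing entries are excluded from `count`, which is why a column can report fewer non-null entries than there are articles in the dataset.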

Tokenization

Machines can conduct text mining efficiently, as they can analyse enormous amounts of data and produce precise findings (Hirata et al. 2020). Data preprocessing techniques, such as stemming and tokenization, transform the text data into a data frame that machines can easily read and analyse (Chintalapudi et al. 2021). In our case, every document in the corpus was transformed into plain text (González-Santos et al. 2021). Then, every single word in the plain text was treated as a token, to be stored in the data structure for further analysis (González-Santos et al. 2021). This data transformation is significant, as it enables the machines to understand the text data by analysing word sequences (Hirata et al. 2020). All kinds of symbols (e.g., parentheses, asterisks, punctuation marks, and hyphens) were removed during this stage of plain text preprocessing, because such symbols have no contribution at all to the interpretation of the plain text (González-Santos et al. 2021). All words in uppercase were converted to lowercase. In this way, all words in the plain text could be processed uniformly (González-Santos et al. 2021). This research utilized the Python script of the NLTK Tokenizer package (Hirata et al. 2020).
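The tokenization steps described above (lowercasing, symbol removal, and splitting into word tokens) can be sketched as follows. This is a simplified stand-in built on the standard library, not the NLTK tokenizer used in the study; the function name `tokenize` and the choice to keep digits while dropping every other non-letter character are assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text, replace symbols with spaces, and split on whitespace."""
    lowered = text.lower()
    # Assumption: the corpus is English, so only a-z, digits and whitespace are kept.
    cleaned = re.sub(r"[^a-z0-9\s]", " ", lowered)
    return cleaned.split()


sample = "Smart ports (IoT-enabled) reduce delays; see Fig. 2*."
tokens = tokenize(sample)
print(tokens)
```

Note that a hyphenated term such as "IoT-enabled" is split into two tokens under this rule, which matches the removal of hyphens mentioned in the text.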

Stop words

Following tokenization, the data preprocessing continued with the removal of stop words from the plain text (Hirata et al. 2020). The term “stop words” refers to the most common English words, which do not play any meaningful role in the interpretation of the topics in the corpus (González-Santos et al. 2021). Examples of stop words are articles and prepositions. Several other words not included in the English stop word list (e.g., “academic” or “paper”) were also filtered out from the plain text, as they add little to the interpretation of the text data (González-Santos et al. 2021).
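Stop-word removal of this kind amounts to filtering tokens against a set. The sketch below assumes a deliberately tiny English stop list and adds the two corpus-specific words mentioned in the text; the names `ENGLISH_STOP_WORDS`, `CUSTOM_STOP_WORDS`, and `remove_stop_words` are hypothetical, and a real pipeline would rely on a complete list such as the one shipped with NLTK.

```python
# A tiny illustrative stop list; a real pipeline would use a full English list.
ENGLISH_STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "this"}
# Corpus-specific additions, following the examples given in the text.
CUSTOM_STOP_WORDS = {"academic", "paper"}


def remove_stop_words(tokens, stop_words=ENGLISH_STOP_WORDS | CUSTOM_STOP_WORDS):
    """Keep only the tokens that are not in the stop-word set."""
    return [t for t in tokens if t not in stop_words]


tokens = ["this", "paper", "reviews", "the", "impact", "of", "iot", "in", "shipping"]
filtered = remove_stop_words(tokens)
print(filtered)
```

Using a set keeps each membership check constant-time, which matters once the corpus grows to hundreds of abstracts.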

Stemming

Besides the tokenization and stop words tasks, the stemming technique was utilized in the research corpus. The beginning or end of the words was removed, thus reducing the inflected words to their word root (Chintalapudi et al. 2021). Consequently, all words that belonged to the same stem group were treated as a single token to reduce the total number of tokens and thus facilitate the computer analysis (González-Santos et al. 2021). We adopted the Porter Stemmer algorithm of the NLTK package (Hirata et al. 2020). Figure 2 illustrates the first 10 documents in the processed data.

Fig. 2 The first 10 documents in the dataset (Source: Authors)
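The effect of stemming on the size of the vocabulary can be illustrated with a crude suffix-stripping rule. The sketch below is explicitly not the Porter algorithm adopted in the study, which applies several ordered rewrite phases; the suffix list, the doubled-consonant rule, and the function `naive_stem` are simplifying assumptions chosen only to show how inflected forms collapse onto one stem.

```python
from collections import defaultdict

# Suffixes are tried in this order. This is a deliberately crude rule set
# and NOT the Porter algorithm.
SUFFIXES = ("ations", "ation", "ing", "ed", "es", "s")


def naive_stem(token, min_stem_len=3):
    """Strip the first matching suffix if enough of the word remains."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem_len:
            stem = token[: -len(suffix)]
            # Undo consonant doubling (shipp -> ship), except for l, s and z.
            if suffix in ("ing", "ed") and stem[-1] == stem[-2] and stem[-1] not in "lsz":
                stem = stem[:-1]
            return stem
    return token


tokens = ["ship", "ships", "shipping", "shipped", "port", "ports"]
groups = defaultdict(list)
for tok in tokens:
    groups[naive_stem(tok)].append(tok)

print(dict(groups))
```

Six surface forms reduce to two stems here. With NLTK installed, `PorterStemmer().stem(word)` would be the call corresponding to the stemmer named in the text.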

Machine learning model

After the data preprocessing, the data was prepared for the use of algorithms, to find the association among different word groups in the corpus and, in this way, identify the most important existing topics (Hirata et al. 2020). The text data frame was classified by training a machine learning algorithm model, namely the Latent Dirichlet Allocation (LDA) model in the Python environment, utilizing the Jupyter Notebook (Hirata et al. 2020). Sharma and Sharma (2022) underscore the popularity of Latent Dirichlet Allocation (LDA) as a topic modelling technique.
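Topic models such as LDA typically consume each document as a vector of word counts rather than as raw token lists. The text does not spell out this step, so the following bag-of-words conversion is an assumption about what preparing the data for the algorithm involves; the helpers `build_vocabulary` and `to_bag_of_words` and the two toy documents are hypothetical. In gensim, the `Dictionary` class and its `doc2bow` method play the analogous role.

```python
from collections import Counter


def build_vocabulary(documents):
    """Assign an integer id to every distinct token, in sorted order."""
    vocab = sorted({tok for doc in documents for tok in doc})
    return {tok: idx for idx, tok in enumerate(vocab)}


def to_bag_of_words(doc, token_to_id):
    """Represent one document as sorted (token_id, count) pairs."""
    counts = Counter(token_to_id[tok] for tok in doc if tok in token_to_id)
    return sorted(counts.items())


# Hypothetical pre-processed documents (already tokenized and stemmed).
documents = [
    ["ship", "sensor", "data", "ship"],
    ["port", "sensor", "contain"],
]
token_to_id = build_vocabulary(documents)
bows = [to_bag_of_words(doc, token_to_id) for doc in documents]
print(token_to_id)
print(bows)
```

Word order is discarded in this representation; only how often each token occurs in each document is retained.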

The LDA method was utilized to conduct the learning and classification functions on the already processed data. The method was also used to cluster the co-occurring tokens and thus give prominence to the dominant latent topics in the research corpus (Mustak et al. 2021). LDA is an efficient, unsupervised method that analyses text data probabilistically (Zhou et al. 2021). LDA models observe only the words in the corpus documents and infer the topics that maximize the posterior probability given the observed corpus. In other words, for a fixed number of topics k, the LDA model learns the most probable assignment of words to topics.
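For reference, the generative model underlying LDA can be written in its standard textbook form as follows. The notation is an assumption introduced here and does not come from the study: \theta_d is the topic mixture of document d over the k topics, z_{d,n} is the topic assigned to the n-th word w_{d,n} of that document, \beta_j is the word distribution of topic j, and \alpha is the Dirichlet prior.

```latex
\begin{align}
  \theta_d &\sim \mathrm{Dirichlet}(\alpha), \\
  z_{d,n} \mid \theta_d &\sim \mathrm{Multinomial}(\theta_d), \\
  w_{d,n} \mid z_{d,n}, \beta &\sim \mathrm{Multinomial}(\beta_{z_{d,n}}), \\
  p(\theta_d, \mathbf{z}_d, \mathbf{w}_d \mid \alpha, \beta)
    &= p(\theta_d \mid \alpha) \prod_{n=1}^{N_d}
       p(z_{d,n} \mid \theta_d)\, p(w_{d,n} \mid z_{d,n}, \beta).
\end{align}
```

Only the words \mathbf{w}_d are observed; training approximates the posterior over \theta_d and \mathbf{z}_d given those words.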

Regarding the number of topics, we have identified four topics, based on the analysis of the perplexity and coherence scores of our model (Hasan et al. 2021). The descriptive statistics of the model that we built are the following:

  • Parameters: num_topics = 4, update_every = 1, passes = 5, alpha = symmetric

  • Perplexity: −7.466162260159726, Coherence Score: 0.35287973999111755

Regarding perplexity, the runner-up model (the first model, with num_topics = 3 and alpha = auto) has a slightly lower perplexity (log-scale score of −7.20) than the selected model (the second model, with num_topics = 4 and alpha = symmetric; −7.47). Lower perplexity values generally indicate better model performance.
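Because a perplexity cannot itself be negative, the reported figures are presumably log-scale scores. The sketch below assumes they are per-word likelihood bounds of the kind returned by gensim's `LdaModel.log_perplexity`, for which perplexity is 2 raised to the negative bound. The source does not name the library; this reading is suggested only by the parameter names, and the function name `perplexity_from_bound` is hypothetical.

```python
def perplexity_from_bound(per_word_bound):
    """Convert a per-word log-likelihood bound (base 2) into a perplexity."""
    return 2.0 ** (-per_word_bound)

# Scores quoted in the text for the two candidate models.
scores = {
    "3 topics, alpha=auto": -7.20,
    "4 topics, alpha=symmetric": -7.47,
}
perplexities = {name: perplexity_from_bound(b) for name, b in scores.items()}
for name, value in perplexities.items():
    print(f"{name}: perplexity ~ {value:.1f}")
```

Under this assumption, the score closer to zero (−7.20) corresponds to the lower perplexity, which is consistent with the comparison made in the text.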

Turning to the coherence score, the second model (num_topics = 4, alpha = symmetric) has a slightly higher coherence score (0.353) than the first model (num_topics = 3, alpha = auto; 0.346). A higher coherence score is generally desirable. Overall, the two models are close in performance. Thus, it seems that the optimal number of topics, which balances model complexity with the ability to explain the data, is four (Hasan et al. 2021). Given this, the method reveals the underlying topics in the research corpus and uncovers the documents most pertinent to each of the identified topics (Mustak et al. 2021). The LDA method rests on the assumption that documents are distributions over latent topics and that topics are distributions over words. It therefore reveals not only the dominant topics but also the dominant words per topic in the corpus (Hirata et al. 2020). Consequently, the main objective of LDA is to quantify the probability distribution of the words in a given text data frame (Hirata et al. 2020). In this research, the predetermined number of topics used for the model training was four (Piris et al. 2021).

Visualization

After fitting the machine learning model to the text data, this study used the pyLDAvis package to visualize the LDA findings. pyLDAvis created an inter-topic distance map of the four most important topics identified in the text data (Mustak et al. 2021), each represented on the map by a circle. The distances between the circles show the similarity among the four underlying topics, whereas the size of each circle signifies the prevalence of each topic; that is, the largest circle represents the dominant topic (Mustak et al. 2021). Additionally, the visual representation of the results contains a bar chart, on the right side of the inter-topic map, presenting the top 30 most relevant terms for each topic (Mustak et al. 2021).
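The distances on such a map are derived from the topics' word distributions. By default, pyLDAvis is understood to base them on the Jensen-Shannon divergence before projecting the topics onto two dimensions. The sketch below computes that divergence with the standard library only; the topic names and probabilities are invented for illustration and are not results of this study.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats; skips zero entries of p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded above by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Toy topic-word distributions over a four-word vocabulary (invented numbers).
topics = {
    "topic_a": [0.70, 0.20, 0.05, 0.05],
    "topic_b": [0.65, 0.25, 0.05, 0.05],
    "topic_c": [0.05, 0.05, 0.20, 0.70],
}
names = list(topics)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(a, b, round(js_divergence(topics[a], topics[b]), 4))
```

Topics with similar word distributions yield a small divergence and would be drawn close together, whereas topics with little shared vocabulary would be drawn far apart.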

To strengthen the topic modelling methodology, this research adopted the t-distributed Stochastic Neighbour Embedding (t-SNE) technique to enhance the visualization of the machine learning model findings (Mustak et al. 2021). t-SNE is a statistical method for visualizing high-dimensional data by presenting each data point in a two-dimensional map. Specifically, t-SNE is a nonlinear dimensionality reduction technique that embeds high-dimensional data in a low-dimensional space (Mustak et al. 2021). It does so by converting the pairwise distances between data points into similarity probabilities and arranging the points in the reduced space so that these similarities are preserved as closely as possible. Hence, similar data points are placed nearby, and dissimilar ones are modelled as distant points.
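In its standard formulation, t-SNE can be summarized by the expressions below. The symbols are assumptions introduced here rather than notation from the study: x_i are the N high-dimensional points, y_i their two-dimensional images, and \sigma_i a per-point bandwidth set through the perplexity parameter.

```latex
\begin{align}
  p_{j \mid i} &= \frac{\exp\!\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
                      {\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
  \qquad p_{ij} = \frac{p_{j \mid i} + p_{i \mid j}}{2N}, \\
  q_{ij} &= \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
                 {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}, \\
  C &= \mathrm{KL}(P \parallel Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}.
\end{align}
```

The map positions y_i are found by minimizing C with gradient descent, so that pairs with high similarity p_{ij} end up close together in the plot.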

Results and discussions

Data collected was analysed with several different methods and presented with alternative visualization tools (cf. Figure 1) to investigate in depth the validity of the research objectives and extract credible conclusions. In this context, following training of the LDA model in the processed data, four principal topics were revealed out of which the top 10 keywords per topic were initially identified, accompanied by their relevant weights, as presented in Table 4.

Table 4 The dominant topics and top 10 keywords per each topic

The results clearly present four topics relevant to the IoT technology. To facilitate comprehension of the fundamental themes, we have labelled the topics generated by the LDA model as follows: a) the digitalization of the shipping industry, b) its application to data handling and decision-making, c) the introduction of smart services, and d) the integration of port maritime networks (Table 5). Referring to the first topic (Topic “0”), the results of the LDA model suggest “blockchain” (weight = 0.031) to be the most important term. Meanwhile, the term “shipping” (weight = 0.029) has a similar weight to the terms “blockchain” and “technology” (weight = 0.026), suggesting a clear connection between this novel “digital” (weight = 0.019) technology and the shipping “industry” (weight = 0.014).

Table 5 Patent analysis: the dominant topics and top 10 keywords per each topic

The second topic (Topic “1”) involves a bag of words related to the “facilitation” (weight = 0.019) of “data” (weight = 0.073) “handling” (weight = 0.010) and “decision-making” (weight = 0.015). Moreover, this topic refers to the creation of a “framework” (weight = 0.013), which may “influence” (weight = 0.009) “decisions” (weight = 0.009). Concerning the third topic (Topic “2”), the terms “data” (weight = 0.021) and “information” (weight = 0.021) carry the same weight, whereas the terms “technology” (weight = 0.019) and “shipping” (weight = 0.012) are weighted similarly to the terms “smart” (weight = 0.016) and “services” (weight = 0.017).

The last topic (Topic “3”) focuses on terms such as “port” (weight = 0.025) and “maritime” (weight = 0.019) in connection with terms such as “network” (weight = 0.015), “integration” (weight = 0.015), “system” (weight = 0.009), and the “international” (weight = 0.008) “flow” (weight = 0.008) and “sharing” (weight = 0.007) of information.

To provide a better interpretation of the four identified topics and the top 10 keywords related to them, this study explored the most representative sentences for each of the four dominant topics (Table 6). Specifically, the most indicative sentence (contribution of 0.98) for the first topic (Topic “0”) refers to the development of the digital transformation of the maritime sector. Similarly, the most exemplary sentence (contribution of 0.99) for the third topic (Topic “2”) relates to autonomous maritime systems, which can collect and manage various types of information. At this point, we should emphasize that the topic percentage contribution of the dominant topic (i.e., Topic “0,” as per Figs. 3 and 5) in the relevant document is greater than 98%. This suggests an efficient training of the machine learning model in this study (Hirata et al. 2020). Consequently, the preliminary research findings suggest the potential of the IoT as a developing technology, which may process information automatically and facilitate shipping operations, implying the validity of both research objectives. The contribution of the IoT technology to the digitalization of the shipping industry, as well as to the transformation of maritime systems into autonomous ones, may not only advance efficiency in the shipping industry but also improve decision-making processes in the shipping sector.

Table 6 The most representative sentences for each topic percentage contribution of the dominant
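The percentage contribution of the dominant topic can be read off each document's topic distribution. The sketch below shows one way to select, for every topic, the document in which that topic dominates most strongly. The distributions are invented for illustration, and the function name `dominant_topic` is hypothetical.

```python
def dominant_topic(topic_probs):
    """Return (topic_id, contribution) for the most probable topic."""
    topic_id = max(range(len(topic_probs)), key=lambda k: topic_probs[k])
    return topic_id, topic_probs[topic_id]

# Invented per-document topic distributions over four topics (rows sum to 1).
doc_topics = [
    [0.98, 0.01, 0.005, 0.005],
    [0.10, 0.15, 0.70, 0.05],
    [0.02, 0.03, 0.05, 0.90],
]

# For each topic, keep the document where it dominates most strongly.
best = {}
for doc_id, probs in enumerate(doc_topics):
    topic_id, share = dominant_topic(probs)
    if topic_id not in best or share > best[topic_id][1]:
        best[topic_id] = (doc_id, share)

for topic_id, (doc_id, share) in sorted(best.items()):
    print(f"Topic {topic_id}: document {doc_id}, contribution {share:.0%}")
```

A topic that never dominates any document (the second topic in this toy data) simply has no entry, which is worth checking before tabulating representative texts.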
Fig. 3 Word cloud (Source: Authors)

Additionally, we have conducted a patent data analysis. In the context of natural language processing (NLP) and machine learning, patent analysis in LDA topic modelling refers to the application of Latent Dirichlet Allocation (LDA) to analyse a collection of patents (Zhang et al. 2021a). Patent analysis involves examining patents to extract meaningful insights, patterns, or trends, categorizing patents into different technology domains, and identifying emerging technologies (Zhang et al. 2021a). Our analysis revealed three topics (Fig. 3):

Topic 0: Management and Control in Radio Systems.

This topic comprises approximately 29% of the overall content. It focuses on the management and control aspects of radio systems, with terms such as "management" (2.9%), "system" (2.8%), "radio" (2.8%), "control" (2.0%), "service" (1.9%), "measurement" (1.7%), "automate" (1.3%), "resource" (1.3%), and "energy" (1.3%).

Topic 1: Data Handling and Internet of Things (IoT).

This topic constitutes approximately 52% of the overall content. It revolves around data handling and IoT-related concepts, with terms like "data" (5.2%), "method" (3.5%), "system" (3.4%), "things" (3.4%), "base" (2.6%), "devices" (2.5%), "internet" (2.4%), "signal" (2.1%), "detection" (1.7%), and "equipment" (1.2%).

Topic 2: Network Communication and Wireless Devices.

This topic represents approximately 19% of the overall content. It is centred on network communication and wireless devices, with terms such as "method" (8.4%), "system" (8.1%), "network" (5.8%), "apparatus" (4.2%), "communication" (3.4%), "wireless" (3.1%), "device" (2.4%), "information" (1.9%), "transmission" (1.5%), and "access" (1.5%).

The topics revealed by our initial analysis shed light on the potential of IoT technology to drive digital transformation within the shipping industry, enhance decision-making processes, introduce smart services, and improve connectivity within port networks (Acanfora and Balsano, 2020; Dai et al. 2020). Overall, both the exploration of IoT-related topics in the shipping industry and the analysis of patent data using LDA topic modelling provide valuable insights into the current trends, challenges, and opportunities within the maritime sector, particularly in relation to digitalization and technological advancements. These findings suggest the potential of IoT technology to revolutionize shipping operations, improve efficiency, and enhance decision-making processes in the maritime industry (Brous et al., 2019; Ichimura et al. 2022a, b; Munim et al. 2020; Parola et al. 2021).

An alternative visualization method that enhances understanding of the topics’ significance and their underlying context is the construction of a word cloud. The aforementioned top 10 keywords for each of the four dominant topics are presented as word clouds in Fig. 3. The keywords’ importance is represented by their relative size, where larger terms within the word clouds are considered to have more weight (i.e., more importance) than smaller terms.

The keywords’ importance is further emphasized by their relative frequency of appearance. To illustrate this, the distribution of document word counts per dominant topic is plotted in Fig. 4, which shows that the first (Topic “0”) and third (Topic “2”) topics have the highest document word counts. More specifically, these two topics relate to the utilization of the new digital blockchain technology in the shipping industry (Topic “0”) and the exploitation of new smart systems to support the dissemination of shipping data and information (Topic “2”).

Fig. 4 Distribution of document word counts per dominant topic (Source: Authors)

To further validate the LDA model results, the word count and weight of the four topics’ keywords are plotted (Hirata et al. 2020). Specifically, Fig. 5 suggests that in the first topic (Topic “0”), the terms “blockchain” and “technology” are weighted as much as the term “shipping.” Moreover, in the second topic (Topic “1”), the terms “data,” “facilitate,” and “decision-making” are highly weighted, although they have a lower frequency of occurrence in the documents of the research corpus. Additionally, the most important terms of the third topic (Topic “2”) are “data” and “information,” which arise together with terms such as “smart” “system” and “shipping” “industry.” Finally, we may observe that the most significant terms of the last topic (Topic “3”) are “maritime” and “services,” whereas the terms “integration” of “international” “network” and “system” “flow” appear but have lower importance and frequency of appearance.

Fig. 5 Word count and importance of topic keywords (Source: Authors)

The relative importance of the four topics, as well as their interrelation, is presented with an inter-topic distance map generated with the pyLDAvis package. Figure 6 presents the four most important topics of the research corpus as circles, the size of which indicates their relative importance. The location on the map represents the relative interconnection of the topics. It should be noted that pyLDAvis numbers the topics 1 through 4, corresponding to Topics 0 to 3 as presented in the preceding and subsequent figures. Figure 6 illustrates that none of the four topics overlap, indicating their relative independence; as such, they represent distinct subjects. Furthermore, the top 30 keywords for each topic are depicted in the accompanying bar chart, indicating the significance of each keyword to the corresponding topic. Contrary to the results of Fig. 4, Fig. 6 suggests that the first two (i.e., Topics “1” and “2”) are the principal topics, with the first (Topic “1”) being the dominant one, while the effect of Topic 2 is diminished. Moreover, the distance between the first two topics is wider, meaning that their similarity is rather low, whereas the last two topics (Topics “3” and “4”) are close to each other. This proximity between the last two topics represents their contextual similarity. However, Topics “3” and “4” are smaller in size; that is, they have less weight and therefore less importance.

Fig. 6 a Inter-topic distance map, and top-30 most relevant terms of topic 1, b Inter-topic distance map, and top-30 most relevant terms of topic 2, c Inter-topic distance map, and top-30 most relevant terms of topic 3, and d Inter-topic distance (Source: Authors)

Referring to the dominant topic (Topic “1”, Fig. 6a), the most significant of the top 30 terms are “data,” “smart,” “information,” and “technology,” followed by “shipping,” “industry,” “efficiency,” and “improvement” of “application”. The leading keywords identified within Topic 1 highlight the importance of novel smart technologies and information sources for enhancing the efficiency of the shipping industry. The keywords identified in the second most significant topic (Topic 2, Fig. 6b) indicate the importance of “blockchain” and “digital” “technologies” and the “processes” of the “maritime” “industry”, “supply” “chains” and “logistics”. Further, the analysis of Topics 3 and 4 (Fig. 6c, d) emphasizes the necessity of “integration” of “smart” “systems” “networks” in “port” “maritime” “services”, as well as the significance of “data” “facilitation” and “handling” in “decision-making”. The findings of this analysis further validate both of our research objectives.

Finally, the document distribution per dominant topic was further analysed and visualized via the t-SNE statistical method (Fig. 7). t-SNE is a dimensionality reduction technique, which enables the visualization of high-dimensional datasets (Flexa et al. 2021). Utilizing a probabilistic method, t-SNE visualizes individual data items as points in a two-dimensional scatterplot (Flexa et al. 2021). Thus, similar data points are positioned close to one another, eventually forming thematic clusters. In this respect, Fig. 7 presents a coloured distribution of each of the four dominant topics identified in the text data. Once more, we observe that the first topic (Blue) is dominant, followed by the third (Green) and second (Orange) ones. The t-SNE results are in accordance with the inter-topic distance map (Fig. 6), as four thematic clusters are formed in both visualization analyses, featuring no overlap and thus indicating contextual independence, with the fourth cluster having the least significance.

Fig. 7 t-SNE clustering of four LDA topics (Source: Authors)

Research implications

The adoption of the IoT technology will enable the shipping participants to reap the fruits of digital information technology (Aiello et al. 2020). The research results can be summarized in four perspectives. First, “blockchain” seems to be related to “shipping”, “blockchain” “technology”, “digital” and “industry”. Secondly, the terms “facilitation”, “data” and “handling” relate to “decision-making”, “framework”, “influence” and “decisions”. Thirdly, “data” and “information” are related to “technology”, “shipping”, “smart” and “services”. Finally, the terms “port” and “maritime” are associated with “network”, “integration”, “system”, “international”, “flow” and “sharing”. The analyses clearly identify four independent areas showcasing the effect of IoT technologies on the maritime industry. The first area is Blockchain-enabled Maritime Logistics. This topic revolves around the integration of blockchain technology within the shipping industry, focusing on its application in logistics processes spanning from supply chain management to maritime operations. It explores how digitalization and blockchain facilitate efficient and transparent business processes, confirming findings in the literature (cf. Ni and Irannezhad 2024; Carlan et al. 2022; Gerakoudi 2022; Wagner and Wisnicki 2019; Shirani 2018).

Secondly, the Data-driven Decision Framework is identified. This topic emphasizes the importance of data handling and decision-making frameworks within the shipping industry. It examines how data facilitates informed decision-making, the establishment of robust decision frameworks, and the influence of data clusters on strategic decisions and operational efficiency, which is frequently validated by researchers (Gavalas et al. 2022; Bousedkis et al. 2021; Thiess and Muller 2018). Thirdly, the analyses point to the topic of Smart Technology Integration in Shipping Systems. This topic explores the integration of smart technologies within shipping systems, emphasizing the interconnectedness of data, information, and technology. It delves into the role of smart systems in enhancing efficiency, optimizing ship operations, and leveraging the IoT to improve industry practices. Applications of such systems are evaluated and adopted to improve operational performance (Noto et al. 2023; Aslam et al. 2023; Alop 2019).

Finally, the topic of International Port Network Integration is identified. This topic focuses on the integration of smart port systems and international maritime services. It examines the flow of information and services within port networks, emphasizing the importance of integration, collaboration, and data sharing for enhancing international maritime operations and logistics (Wu et al. 2024; Jeevan and Roso 2019; Crainic et al. 2015). The research implications can be outlined from four standpoints. First, the research findings imply that the fourth industrial revolution will largely affect shipping operations by creating a digital industry, namely, an interconnected shipping environment able to process information automatically and facilitate operations (Aiello et al. 2020).

Additionally, the research results suggest that this novel technology can improve business operations in the shipping industry by connecting the market participants on a global and real-time basis, and in this way facilitate the handling of data and support effective decision-making (Xiao et al. 2021). Decision-making will gain a real-time trait because of the interoperability among the various systems (Aiello et al. 2020). Next, the utilization of smart devices can enable the dissemination of integrated data and information, thus offering the potential for smart services in the shipping industry. As a result, the shipping sector is likely to be transformed into a digital industry, where autonomous maritime systems will function, able to process information and data in an automated way. Finally, the IoT technology offers the possibility of an integrated port and maritime network, characterized by efficient flow of operations and dissemination of information.

Referring to the practical implications of the research findings, we may suggest that IoT can play a role of paramount importance in shipping operations (Brous et al., 2019). The innovative IoT technology obliterates the traditional way of management and allows shipowners an automated and “scientific” control of their fleet (Li 2020). In this respect, IoT enables remote vessel inspection to safeguard compliance with various standards (Xiao et al. 2021). Consequently, IoT technology contributes to the minimization of inefficiencies, risks, and expenses in the shipping industry, thus radically altering how shipping companies function (Xiao et al. 2021). Additionally, the recent measures imposed against the COVID-19 virus have created a new social-distancing reality, which stimulates the utilization of human-less technologies in operations (Rahman et al. 2021). In this context, a percentage of shipping operations can be automated following the adoption and implementation of this technology, and thus business processes can be reshaped (Brous et al., 2019).

The new digital information technology may compensate for the lack of personal contact and prove to be an indispensable business tool for efficient shipping operations (Vo and Tran 2021). Moreover, IoT technology can contribute toward energy efficiency in parallel with the improvement of shipping operations (Cil et al. 2022). Furthermore, intelligence received from IoT technology allows the manager not only to monitor and control shipping operations but also to make successful predictions related to organizational operations (Singh et al. 2020). The fact that this new technology can serve as a predictive data tool enables a considerable improvement in the quality of services and facilitates the minimization of operational costs (Cil et al. 2022; Aheleroff et al. 2020).

Turning to the theoretical implications, this research utilizes an innovative technique for data mining, namely a machine learning approach. In this context, our study provides a holistic view of the benefits and limitations arising from the adoption of IoT technology in maritime operations. Unlike the existing studies, each of which focuses on a specific IoT operational application, our research explores the holistic impact of IoT technology adoption on various operations of the shipping industry, thus contributing to the existing knowledge.

Conclusion

Referring to the limitations of the IoT, adopting the IoT technology is not an easy task for firms (Uslu et al. 2019). Hence, shipping practitioners who adopt the new technology must embed data collected from such digital technologies in their business operations (Brous et al., 2019). However, the data collected should be integrated and analysed to be converted into useful input for operational decisions (Dai et al. 2020). The adoption of big data analytics in the maritime sector is inhibited, unless talented market participants exist in the industry, competent in quantitative analysis (Zhang and Lam 2019). Specifically, such a digital system will provide the companies with a very large amount of data that must be processed to take full advantage of the IoT technology and improve shipping operations (Aiello et al. 2020). Consequently, identifying trends and patterns in the data received from digital information technology can be an issue for shipping companies (Aiello et al. 2020), since the number of trained employees competent in IoT technology is currently insufficient (Cil et al. 2022).

Shipping market participants should acquire a sufficient understanding of digital systems and develop the necessary analytic skills to support the extraction of meaningful information from data received (Aiello et al. 2020). Overall, the industry practitioners should familiarize themselves with statistical analysis and network science to distil data collected and generate meaningful information for their firms to improve shipping operations (Brous et al., 2019). Moreover, some concerns regarding the security of data arise (Yánez et al. 2020). Indeed, the leaking of data constitutes a challenge to the new digital technology (Henesey et al. 2020). Perhaps blockchain technology could resolve data security concerns and render this novel digital technology more trustworthy for individuals in the shipping industry (Yánez et al. 2020). Additionally, storing collected data represents one more challenge for shipping companies (Dai et al. 2020). Another problematic factor is the wireless networks’ quality and operating power, which constitutes a major concern regarding the wider implementation of the IoT technology in the shipping industry (Nižetić et al., 2020). One last issue regarding the successful adoption of the IoT is that this new technology requires interoperability between the various objects that operate in the industry (Uslu et al. 2019). Specifically, interoperability advantages could support a smooth and successful transition to autonomous ways of operation in the shipping industry (Kurt and Aymelek 2020). Presently, the necessary infrastructure supporting such a digital system is lacking (Uslu et al. 2019). Hence, this technology must be standardized before it becomes fully operational (Brous et al., 2019).

In conclusion, one of the key lessons is that shipping practitioners should first consider the pros and cons of the new digital technology and the willingness of the various parties to participate in such a system before proceeding with its adoption and implementation (Henesey et al. 2020). IoT technology is becoming progressively more popular in the shipping industry, thanks to its ability to support the safety of operations, transparency, and cost reduction (Cil et al. 2022). The post-COVID-19 pandemic era entails a new reality that offers fertile soil for further development and application of innovative information technologies in the shipping industry (Barnes 2020). As a matter of fact, the standard way of conducting shipping business has changed. Many industries have rapidly adopted IoT technology because of its tremendous potential to share information instantly and securely (Nižetić et al., 2020). Such a technology provides business intelligence, which can improve the business operations, organizational profitability, and competitiveness of a firm (Aiello et al. 2020). More than various other industries, shipping necessitates the adoption of new technologies that can support such a complex environment of expensive business operations, involving a large number of parties. The wider use of smart ports, ships, and spare parts also seems to be a promising field for the shipping industry. Industry 4.0 represents a revolutionary torrent, able to change the characteristics of operations in many industries (Galati and Bigliardi 2019). There is a great tendency for the IoT technology to have a huge economic effect and be adopted by several industries (Cil et al. 2022).

However, addressing IoT technology issues is of paramount importance before its wide application in the shipping industry (Pu and Lam 2021a, b). To achieve a wider application within the shipping industry, this ground-breaking digital technology requires a high degree of standardization to enable interoperability among the individual company systems. It also requires familiarization of the market participants so that they can reap the benefits of the new technology and improve shipping operations.

Future directions

This study investigates in a holistic way the shipping operations in connection with the utilization of new digital technology. Regarding the limitations of this study, the examination of further applications and uses of IoT technology in the shipping industry, such as the value of smart ships and smart spare parts, would be of interest to market participants and may constitute the basis of future research. This research complements the extant literature by utilizing a machine learning natural language processing model to analyse the general impact of the IoT technology on shipping operations and decision-making. To the best of our knowledge, prior studies have used bibliometric analyses but have never adopted a machine learning approach to this research subject. A further limitation is that the research corpus was limited to 228 academic papers collected from the Scopus database. This may introduce bias into the research findings.

Referring to future research, extending the machine learning analysis to a larger corpus would strengthen the findings. Training alternative machine learning models to crosscheck the findings of this study would also be useful. Finally, exploring the areas of concern regarding the wider application of the IoT technology in the maritime shipping sector would contribute to finding sustainable solutions to accelerate its adoption.