Analytics of high average-utility patterns in the industrial internet of things

Recently, revealing information from a database that is more valuable than quantity (frequency) values alone has become an essential research field. High average-utility itemset mining (HAUIM) was suggested to reveal useful patterns with the average-utility measure for pattern analytics and evaluation. HAUIM provides a fairer assessment than generic high-utility itemset mining because it removes the influence of itemset length on utility. Several high-performance HAUIM algorithms have been proposed to gain knowledge from databases. However, most existing works do not consider uncertainty, which is one of the characteristics of data gathered from IoT equipment. In this work, an efficient HAUIM algorithm for handling uncertain databases in IoTs is presented. Two upper-bound values are estimated to prune the search space early, which greatly alleviates the limitations of pattern mining in IoTs. Experimental evaluations against existing algorithms show that the designed approach efficiently reveals high average-utility itemsets from uncertain data.


Introduction
Sensor networks and the Internet of things (IoTs) are expanding rapidly, and smart devices such as mobile phones, sensors, actuators, and RFID tags are widely used in different industrial applications and domains [1,2]. To discover meaningful and useful information for specific applications, and thus more effective knowledge for decision-making, it is necessary to consider all relevant factors, such as weight, interestingness, utility, or frequency. Several studies have been carried out to mine high-utility itemsets from databases, improving both the mining efficiency and the effectiveness of the discovered knowledge in different applications [8]. In addition, security issues [9,10] in pattern analytics for IoT environments [11] have also emerged as an important and meaningful topic in recent years.
However, high-utility itemset mining (HUIM) has a significant weakness in pattern-mining tasks: the utility of an itemset tends to increase with its length. Consequently, if an item(set) is a high-utility pattern, any combination containing it is very likely to be considered a high-utility pattern as well (e.g., in market-basket analysis, combining caviar with an apple or a pear also yields an itemset with very high utility). This is inappropriate for pattern evaluation, particularly when patterns (itemsets) are very long. High average-utility itemset mining (HAUIM) [12] was presented to address this limitation by dividing the utility of a pattern by its length, which gives a better and fairer evaluation. Several works [13,14] have been presented, but most of them only handle precise datasets. When the collected data contain uncertainty, such as probability values in industrial WSN or IoT applications, existing works cannot handle this situation, and the discovered knowledge may be irrelevant to decision-making.
Compared to traditional pattern-mining applications, IoT environments involve more uncertainty. Even though techniques for enhancing the reliability of IoT equipment have also developed recently, the values collected from IoT equipment cannot be considered precise. For example, a temperature sensor cannot detect a precise room temperature, since its reading depends on where in the room it is placed. Thus, the pattern-mining process should not ignore this limitation when establishing knowledge for decision-making; uncertainty should be involved in pattern analytics for IoT environments. Potential high average-utility itemsets (PHAUIs) can reveal useful knowledge (patterns) from IoT environments. For instance, suppose a sensor is designed to detect various natural environment values. To understand their relationship with temperature, PHAUI mining can be applied to discover the valuable patterns. Traditional potential high-utility itemset (PHUI) or HUI mining tends to generate longer patterns, which may produce meaningless information; the designed PHAUI mining solves this limitation. Thus, in this work we present APHAUIM, a new framework that uses a level-wise approach to efficiently discover PHAUIs from the uncertain databases collected by WSNs or IoTs. This model is suitable for industrial WSNs or IoTs and further pattern-mining tasks. The main contributions are summarized below.

1. An Apriori-based potential high average-utility itemset mining algorithm (APHAUIM), which follows a level-wise model, is introduced to efficiently discover the desired potential high average-utility itemsets from uncertain datasets used in industrial IoTs.
2. To further reduce the search space for discovering patterns that satisfy the average-utility and uncertainty constraints, two effective upper bounds are used. These upper bounds serve as pruning strategies that shrink the search space and allow the necessary information to be discovered more efficiently.
3. Experimental results indicate that the designed APHAUIM reveals more significant and non-redundant patterns when uncertainty is taken into account, which makes it more applicable and suitable for industrial WSNs or IoTs.

Related work
With the rapid growth of computer techniques [15], analyzing data obtained from various devices, such as IoTs or WSNs, is not a simple process. Pattern mining is one technique that reveals the relationships among items in different domains and applications [16,17]. The discovered information is presented as useful knowledge that exposes implicit and potential information for strategic decision-making. The first generic algorithm for discovering correlations among itemsets in a database is referred to as association rule mining (ARM); it finds the set of frequent itemsets in the first stage using a minimum support threshold and then derives the association rules in the second stage using a minimum confidence threshold. However, it only deals with precise datasets [18], whereas quantities and uncertainty exist in real-world domains and industries and are ignored by generic ARM models. A quantitative factor [19] or interestingness [20] can also be added as an alternative constraint to frequent itemset mining (FIM) or ARM to reveal more information. The above algorithms handle precise datasets and do not perform well on uncertain databases. In industrial applications such as manufacturing, the collected or received data may contain uncertainty, and suitable knowledge for decision-making cannot be discovered unless the probability or uncertainty is incorporated into the pattern-mining tasks. Chui et al. [21] then presented an algorithm to discover frequent patterns based on expected support. Bernecker et al. [22] presented a model for probabilistic frequent pattern mining from uncertain databases. Further studies on uncertain data mining can be found in [23].
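To make the expected-support idea behind Chui et al. [21] concrete, the Python sketch below computes expected support on a toy attribute-uncertain database, assuming each item carries its own existential probability and that items occur independently. The data and the name `expected_support` are illustrative assumptions, and this per-item model differs from the per-record probability p(T) used in the PHAUIM definitions.

```python
from math import prod

# Hypothetical attribute-uncertain database: item -> existential probability
DB = [
    {"A": 0.9, "B": 0.5},
    {"A": 0.4, "C": 0.8},
    {"A": 1.0, "B": 0.6, "C": 0.5},
]


def expected_support(itemset, db):
    """Sum, over transactions containing every item, of the product of the
    items' probabilities (independence between items is assumed)."""
    return sum(
        prod(t[x] for x in itemset)
        for t in db
        if all(x in t for x in itemset)
    )


print(round(expected_support({"A"}, DB), 2))        # 2.3
print(round(expected_support({"A", "B"}, DB), 2))   # 1.05
```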
In addition to traditional FIM or ARM, the utility-oriented concept [24] has become an emerging and important task in recent years, since it reveals more information for decision-making by taking the unit profit and the quantity of items into account. Compared to support-based constraints, the utility-oriented concept is more appropriate for real-world applications and domains, and high-utility itemset mining (HUIM) [25] was then studied and implemented to better reveal HUIs for decision-making. Yao et al. [26] formulated the utility problem in terms of internal utility (quantity) and external utility (unit profit) to mine HUIs. However, this work does not handle the so-called "combinatorial explosion", so the search space for discovering the required patterns is huge. To address this limitation, Liu et al. [27] presented the transaction-weighted utility (TWU) concept, which uses the total utility of the transactions containing a pattern as an upper bound on that pattern's utility. In addition, by keeping only the high transaction-weighted utilization itemsets (HTWUIs) under the transaction-weighted downward closure (TWDC) property, the search space can be greatly reduced and the performance efficiently improved compared to the generic models. Ahmed et al. [28] then presented an incremental approach called IHUP to update the discovered HUIs when transactions are inserted into a dynamic database. Tseng et al. then introduced the UP-growth+ method, which uses a condensed tree structure to compress the original database and a variety of pruning techniques to shrink the search space for mining the required HUIs [29]. Liu et al. [25] later proposed the HUI-Miner approach, which uses a compact utility-list structure for HUIM. Several extensions of HUIM regarding different topics and constraints are discussed in [30][31][32].
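The TWU bound and the TWDC property can be illustrated with a small Python sketch. The toy database, unit-profit table, threshold `MIN_UTIL`, and function names are hypothetical; the sketch only shows why an item whose TWU falls below the threshold can be discarded together with all of its supersets.

```python
# Toy quantitative database (hypothetical): item -> purchase quantity
DB = [
    {"A": 1, "C": 2},
    {"A": 2, "B": 1, "C": 1},
    {"B": 3, "D": 1},
]
# External utility (unit profit) of each item, also hypothetical
PROFIT = {"A": 5, "B": 2, "C": 1, "D": 10}


def contains(transaction, itemset):
    return all(item in transaction for item in itemset)


def tu(transaction):
    """Transaction utility: sum of quantity * unit profit over the transaction."""
    return sum(q * PROFIT[item] for item, q in transaction.items())


def utility(itemset, db):
    """Actual utility of an itemset, summed over the transactions containing it."""
    return sum(
        sum(t[item] * PROFIT[item] for item in itemset)
        for t in db
        if contains(t, itemset)
    )


def twu(itemset, db):
    """Transaction-weighted utility: sum of tu(T) over transactions containing it.
    A superset occurs in a subset of these transactions, so neither its TWU nor
    its utility can exceed this value."""
    return sum(tu(t) for t in db if contains(t, itemset))


MIN_UTIL = 18  # hypothetical absolute minimum utility threshold
for item in sorted(PROFIT):
    w = twu({item}, DB)
    verdict = "keep" if w >= MIN_UTIL else "prune it and all supersets"
    print(f"{item}: TWU={w} -> {verdict}")
```

Here only D has TWU = 16 < 18, so every itemset containing D is skipped without computing its exact utility.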
A key problem of HUIM is that the utility value grows with the length of the itemset. This is because a pattern can still be called a HUI even if some of its items contribute very little utility. To provide a better alternative for utility-based pattern mining, high average-utility itemset mining (HAUIM) [12,13] was proposed; it evaluates a pattern by its average utility, that is, its utility divided by its length. The first two-phase algorithm, TPAU [12], was presented to reveal the high average-utility itemsets (HAUIs) in a level-wise manner. It estimates an upper-bound value on patterns using the average-utility upper bound (auub), which shrinks the search space for finding HAUIs. A projection-based algorithm called PAI [33] was presented to speed up mining by introducing a projection mechanism. A tree-based algorithm known as HAUP-growth [13] was also investigated to find the collection of HAUIs quickly using a compact tree structure. This structure avoids the multiple database scans required by the level-wise approach, and an array attached to each tree node further reduces the computational cost. An average-utility-list (AU-list) structure [14] was further studied to reveal HAUIs methodically based on a condensed linked-list structure, together with two pruning strategies that diminish the search space. Several extensions [34] were also presented and studied. However, the above works do not consider the uncertainty of data collected in WSNs or IoTs. Thus, all events and activities are treated with the same weight and importance regardless of how reliable they are, which is inappropriate for industrial and real-world applications.
In some industrial tasks and applications, the manufacturing schedule can be arranged and organized using the knowledge discovered from WSNs or IoTs [7]. Pattern analytics and mining play an important role in obtaining up-to-date information for the manufacturing industry [7]. For example, a record in a database can be considered a set of sensor readings received from WSNs or IoTs (e.g., temperature). Those values may indicate abnormality or risk depending on location or geographic factors. To better analyze the collected data in industrial applications [16], uncertainty is one of the major factors that should be considered in pattern analytics and knowledge evaluation. Several works have incorporated the uncertainty constraint into pattern-mining tasks, for example, combining FIM and uncertainty as uncertain frequent pattern mining [35,36]; combining sequential pattern mining (SPM) and uncertainty as uncertain sequential pattern mining [37]; or integrating HUIM and uncertainty as uncertain HUIM [38,39].
The relevant utility-oriented pattern-mining algorithms are summarized in Table 1. Previous works seldom discussed applications in IoT environments. In [38,39], Lin et al. introduced the uncertainty concept into the utility pattern-mining field. Uncertainty is one of the major characteristics of IoT data, and revealing valuable knowledge or information from IoT data is not a trivial task. Compared with the high-utility itemset (pattern) concept, high average-utility itemsets provide more precise and concise patterns. It is therefore crucial to develop effective strategies for extracting knowledge from IoT data. Up to now, few works have considered the uncertainty factor, especially in high average-utility pattern mining. In addition, traditional utility pattern-mining models do not consider the average concept, so the generated patterns are sometimes too long and become meaningless. Thus, this paper proposes a potential high average-utility itemset mining algorithm (which reveals high average-utility itemsets in an uncertain database) that can be adapted to IoT environments.

Preliminary and problem statement
The first part of this section provides the foundational definitions of potential high average-utility itemset mining (PHAUIM). The second part presents the problem statement of PHAUIM, and the last part covers the downward closure property of PHAUIM. Note that most previous utility research is based on transaction databases in trading-market environments. To stay consistent with the earlier definitions in utility pattern mining, we follow their transaction and profit (utility) concepts. However, in an IoT environment, a transaction can be regarded as a record collected from sensors, and the utility can be regarded as the specific attribute value detected by the sensors in that record. The same applies to an item in sensor environments. Thus, an itemset is a combination of attributes in a record of IoT data. Moreover, to simplify the content, the foundational definitions from previous high average-utility itemset mining are not repeated here; we refer the reader to the previous average-utility works [12,13]. These foundational definitions include the average utility of an itemset (denoted as au(X), where X is an itemset), the high average-utility itemset (HAUI), the average-utility upper bound (auub), and the high average-utility upper-bound itemset (HAUUBI). The other definitions used in this work are provided in the following parts.
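Because these foundational measures are only referenced here, the following Python sketch shows one common way to compute them, assuming the TPAU-style formulation [12]: au(X) divides the utility of X by |X|, and auub(X) sums, over the transactions containing X, the largest single-item utility of each transaction. The toy data and function names are illustrative assumptions.

```python
# Toy database (hypothetical): item -> utility (quantity * unit profit already applied)
DB = [
    {"A": 5, "C": 2},
    {"A": 10, "B": 2, "C": 1},
    {"B": 6, "D": 10},
]


def contains(transaction, itemset):
    return all(item in transaction for item in itemset)


def au(itemset, db):
    """Average utility: utility of the itemset divided by its length,
    accumulated over the transactions that contain it."""
    return sum(
        sum(t[item] for item in itemset) / len(itemset)
        for t in db
        if contains(t, itemset)
    )


def tmu(transaction):
    """Maximal utility of any single item in the transaction."""
    return max(transaction.values())


def auub(itemset, db):
    """Average-utility upper bound: sum of tmu(T) over transactions containing it.
    An average never exceeds the maximum, so au(Y) <= auub(Y) <= auub(X)
    for every superset Y of X."""
    return sum(tmu(t) for t in db if contains(t, itemset))


for X in ({"A"}, {"A", "B"}, {"A", "C"}):
    print(sorted(X), "au =", au(X, DB), "auub =", auub(X, DB))
```

In the first transaction, A (utility 5) and C (utility 2) both receive the same auub contribution of 5, which is the looseness of auub discussed below.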

Definition 1 (Itemset and Transaction Probability)
The probability p(X, T) of the itemset X in the transaction T is set to the probability p(T) of the transaction T, as given below:
$p(X, T) = p(T)$

Definition 2 (Itemset Potential Probability in a Database)
The potential probability Pro(X) of the itemset X in the database D is defined below:
$Pro(X) = \sum_{X \subseteq T \wedge T \in D} p(X, T)$

Definition 3 (High Probability Itemset, HPI)
An itemset X in the database D is a high probability itemset (HPI) iff the following condition is satisfied:
where μ is the predefined minimum potential probability threshold.
Unlike HAUIs, HPIs satisfy the downward closure property directly and need no upper bound: every record containing a superset Y of X also contains X, so Pro(Y) ≤ Pro(X). Thus, if the itemset X is not an HPI, none of the supersets of X can be an HPI, and the property can be applied directly in an Apriori-based algorithm.
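The potential probability and the HPI test can be sketched in Python as follows. The sketch assumes that Pro(X) sums p(T) over the records containing X, following Definitions 1 and 2, and that the HPI threshold has the form μ × |D|; the latter is an assumption borrowed from earlier potential high-utility itemset work and may differ from the exact condition of Definition 3. Data and function names are hypothetical.

```python
# Hypothetical uncertain database: each record is (set of items, probability p(T))
DB = [
    ({"A", "B"}, 0.9),
    ({"A", "C"}, 0.7),
    ({"A", "B", "C"}, 0.4),
    ({"B", "D"}, 0.6),
]


def pro(itemset, db):
    """Potential probability Pro(X): sum of p(X, T) = p(T) over records containing X."""
    return sum(p for items, p in db if itemset <= items)


def is_hpi(itemset, db, mu):
    # Assumption: the minimum potential probability is mu * |D|, as in earlier
    # potential high-utility itemset (PHUI) work; the exact form may differ.
    return pro(itemset, db) >= mu * len(db)


MU = 0.3  # hypothetical threshold, so mu * |D| = 1.2
for X in ({"A"}, {"C"}, {"A", "B"}, {"A", "C"}):
    print(sorted(X), round(pro(X, DB), 2), is_hpi(X, DB, MU))
```

{C} is not an HPI, so by the argument above neither {A, C} nor any other superset of {C} needs to be examined.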

Definition 4 (Potential High Average Utility Itemset, PHAUI)
An itemset X in the database D is a potential high average-utility itemset (PHAUI) iff it is both an HAUI and an HPI.
This work introduces the concept of PHAUIs, and the proposed method performs an Apriori-based process to reveal all PHAUIs in a database. To preserve the downward closure property when revealing PHAUIs, a superset of PHAUIs, named potential high average-utility upper-bound itemsets (PHAUUBIs), is introduced below.

Definition 5 (Potential High Average Utility Upper Bound Itemsets, PHAUUBI)
An itemset X in the database D is a potential high average-utility upper-bound itemset (PHAUUBI) iff it is both an HAUUBI and an HPI.
Based on the above definitions, the set of PHAUUBIs is a superset of the set of PHAUIs and also satisfies the downward closure property. Therefore, it is used to produce candidate PHAUIs in order to narrow down the search space. However, the traditional auub is a very large and loose value, so the search space for finding the promising HAUIs is very large. This is because the generic auub does not take the itemset's own utility into account. For example, suppose two itemsets A and B appear in the same transaction; under the auub model, A and B receive the same auub value in this transaction regardless of their own utilities. Thus, a partial upper bound called pub is designed to solve this limitation, as explained below.
Definition 6 (Active Item) Let ai and ais denote the active item and the active itemset, respectively, in the designed level-wise algorithm for mining HAUIs. In the generic Apriori-like approach, the k-itemsets are used to generate the (k+1)-itemsets. Hence, if an item or itemset does not appear in any of the k-itemsets, it cannot be involved in generating the (k+1)-itemsets. Most Apriori-like algorithms generate n-itemsets from (n−1)-itemsets in this way, which relies on the downward closure property of the pattern-mining process.
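The generic Apriori-like step described above, which builds (k+1)-itemsets from k-itemsets, can be sketched in Python as a join followed by a subset check. This is the textbook apriori-gen procedure rather than the authors' exact code; the function name `apriori_gen` is illustrative, and itemsets are kept as sorted tuples so that the join condition is a simple prefix comparison.

```python
from itertools import combinations


def apriori_gen(level_k):
    """Generate (k+1)-candidates from a collection of k-itemsets.

    Join: merge two k-itemsets that share their first k-1 items (sorted order).
    Prune: drop a candidate if any of its k-subsets is missing from level_k,
    which is valid whenever the pruning measure is downward closed."""
    level = {tuple(sorted(x)) for x in level_k}
    candidates = set()
    for a, b in combinations(sorted(level), 2):
        if a[:-1] == b[:-1]:               # same prefix, so the pair is joinable
            cand = a + (b[-1],)            # a < b, hence a[-1] < b[-1]
            if all(sub in level for sub in combinations(cand, len(cand) - 1)):
                candidates.add(cand)
    return sorted(candidates)


print(apriori_gen([{"A", "B"}, {"A", "C"}, {"B", "C"}, {"B", "D"}]))  # [('A', 'B', 'C')]
```

{B, C, D} is dropped because its subset {C, D} is missing from the input level.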
When estimating the set of PHAUUBIs of length n, an active item (ai) is an item that appears in at least one of the PHAUUBIs of length (n−1). The active itemset for the candidate PHAUUBIs of length n, denoted $ais_n$, is then defined as follows:
$ais_n = \bigcup_{X \in PHAUUBIs_{n-1}} X$,
where $PHAUUBIs_{n-1}$ denotes the set of PHAUUBIs of length (n−1).
For instance, suppose the itemsets {A,B}, {A,C}, and {B,C} were discovered in the last iteration. In this situation, the active itemset for the next generation of PHAUUBIs is $ais_3$ = {A,B,C}, of length 3.
Thus, the active items can be used in the proposed model as the set ais, which preserves the downward closure property of the PHAUUBIs: if an itemset is a PHAUUBI, every subset of it is also a PHAUUBI.
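A minimal Python sketch of the active itemset follows: $ais_n$ is the union of the items of all PHAUUBIs of length n−1. The helper `project`, which keeps only the active items of a record (the intersection of ais and $I_t$ used in the later definitions), is a hypothetical illustration of how ais can be applied; function names and data are assumptions.

```python
def active_itemset(prev_level):
    """ais_n: every item that occurs in at least one PHAUUBI of length n-1."""
    ais = set()
    for itemset in prev_level:
        ais.update(itemset)
    return ais


def project(transaction, ais):
    """Drop inactive items from a record (item -> utility); by the downward
    closure property they cannot belong to any candidate of length n."""
    return {item: u for item, u in transaction.items() if item in ais}


prev = [{"A", "B"}, {"A", "C"}, {"B", "C"}]      # example from the text
ais3 = active_itemset(prev)
print(sorted(ais3))                              # ['A', 'B', 'C']
print(project({"A": 5, "C": 2, "E": 9}, ais3))   # {'A': 5, 'C': 2}
```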
Definition 7 (Remaining Maximal Utility) Let rmua denote the remaining maximal utility of the active itemset in a transaction in the proposed algorithm. Suppose an itemset $I = \{i_1, i_2, \ldots, i_n\}$ and a transaction $T = \{I_t, U_t\}$, where $I_t = \{i_{t1}, i_{t2}, \ldots, i_{tm}\}$ is the set of items in this record (the purchased products in a market-basket transaction) and $U_t = \{u_t(i_{t1}) = u_{t1}, u_{t2}, \ldots, u_{tm}\}$ is the corresponding utility of each item in the transaction. Let ais be the active itemset. The rmua of an itemset i in a transaction t is denoted as rmua(i, t) and defined below:
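The exact formula of rmua(i, t) is not given in this text, so the following Python sketch rests on an assumption: it reads rmua(i, t) as the largest utility among the active items of t that are not already in i. The function name and the toy record are hypothetical, and the authors' definition may differ.

```python
def rmua(itemset, transaction, ais):
    """Hypothetical reading of rmua(i, t): the largest utility among the active
    items of the transaction that are not already in the itemset (0 if none)."""
    rest = [u for item, u in transaction.items() if item in ais and item not in itemset]
    return max(rest, default=0)


T = {"A": 5, "B": 2, "C": 8, "E": 9}   # record: item -> utility (toy data)
AIS = {"A", "B", "C"}                  # E is not active, so it is ignored
print(rmua({"A"}, T, AIS))             # 8
print(rmua({"A", "C"}, T, AIS))        # 2
print(rmua({"A", "B", "C"}, T, AIS))   # 0
```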

Definition 8 (Partial Maximal Utility Upper Bound)
Let the partial maximal utility upper bound be denoted as pub. Suppose that an itemset is denoted as i and a database is denoted as D. Let $pb_i$ be the pub value of i in D, defined below:

Definition 9 (High Partial Upper Bound Itemset, HPUBI) Let a high partial upper bound itemset be denoted as pubi, and let the predefined threshold be ε × TU, where TU is the total utility of the database. An itemset i is a pubi if it satisfies the following condition:
$pb_i \geq \varepsilon \times TU$
Using the proposed Lemma 2, the constructed algorithm retains the downward closure property for pattern generation in the Apriori-like approach. As a result, the collection of pubis can be used to generate the next level of super-itemsets, which greatly reduces both the search space and the computational cost.
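The threshold test of Definition 9 can be sketched in Python as below. TU is computed as the sum of all transaction utilities; the pub values passed to `is_hpubi` are made-up inputs, since the sketch does not attempt to reproduce the pub formula. Function names and data are illustrative assumptions.

```python
# Hypothetical database: item -> utility (quantity * unit profit already applied)
DB = [
    {"A": 5, "C": 2},
    {"A": 10, "B": 2, "C": 1},
    {"B": 6, "D": 10},
]


def total_utility(db):
    """TU: sum of all transaction utilities in the database."""
    return sum(sum(t.values()) for t in db)


def is_hpubi(pb_value, db, epsilon):
    """An itemset is an HPUBI (pubi) when its pub value pb_i reaches epsilon * TU."""
    return pb_value >= epsilon * total_utility(db)


EPS = 0.25
print(total_utility(DB), EPS * total_utility(DB))   # 36 9.0
print(is_hpubi(10, DB, EPS), is_hpubi(8, DB, EPS))  # True False
```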
Since pubi can be used to reduce the search space in the candidate-generation step, a subset of pubi called the lead partial upper bound itemset (lead-pubi) is also introduced; it is further utilized in the designed algorithm to reduce the search space for candidate generation. For this purpose, an order of items must first be predefined.
Definition 10 (Lead Partial Upper Bound Itemset, lead-pubi) Suppose that a predefined item order l is given and a candidate itemset $I_c = \{i_1, i_2, \ldots, i_n\}$ follows the order l. For candidate generation in the level-wise manner, this candidate must be generated from subsets that are pubis, and the set of pubis is defined below:
In this case, the first element of this set is one of the lead-pubis.
Based on the above definition, an upper bound smaller than the pub of a candidate can be defined as follows. This upper bound, based on the lead remaining maximal utility of the active itemset, can be used to further reduce the search space.

Definition 11 (Lead Remaining Maximal Utility of Active Itemset, lrmua)
Suppose that an itemset is $I = \{i_1, i_2, \ldots, i_n\}$ and a transaction in the database is $T = \{I_t, U_t\}$, where $I_t = \{i_{t1}, i_{t2}, \ldots, i_{tm}\}$ is the set of items in the transaction and $U_t = \{u_t(i_{t1}) = u_{t1}, u_{t2}, \ldots, u_{tm}\}$ is the corresponding utility of each item. Based on the above definitions and properties, let ais be the active itemset and $L = \langle i_{l_1}, i_{l_2}, \ldots, i_{l_p} \rangle$ a predefined item order. Suppose that $i_n = i_{l_w}$, and let $s = \{i_{l_{w+1}}, i_{l_{w+2}}, \ldots, i_{l_p}\}$. Then the lrmua of i in T is denoted as lrmua(i, T) and defined below:
Definition 12 (Lead Partial Maximal Utility Upper Bound in a Transaction)
As above, let ais be the active itemset and $L = \langle i_{l_1}, i_{l_2}, \ldots, i_{l_p} \rangle$ the predefined item order. The lead-pub of i in T is denoted as $lpb^T_i$ and defined below:
where m is the number of items in $(ais \cap I_t \cap s) \setminus i$.
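The following Python sketch illustrates why the lead-based bound is tighter, under an assumed reading of Definitions 11 and 12: s contains the items ranked after the last item of i in the order L, lrmua(i, T) is the largest utility among the active items of T that lie in s, and m counts those items. It compares lrmua with an unrestricted reading of rmua from Definition 7; both readings, the item order, and the toy record are hypothetical.

```python
def rmua(itemset, transaction, ais):
    """Assumed reading of Definition 7: max utility of active items not in the itemset."""
    rest = [u for x, u in transaction.items() if x in ais and x not in itemset]
    return max(rest, default=0)


def lead_remaining(itemset, transaction, ais, order):
    """Assumed reading of Definitions 11-12.

    s holds the items ranked after the last item of `itemset` in `order`.
    Returns (lrmua, m): the largest utility among the active items of the
    transaction that lie in s, and m = |(ais & I_t & s) minus itemset|."""
    rank = {item: k for k, item in enumerate(order)}
    last = max(rank[x] for x in itemset)
    s = {x for x in order if rank[x] > last}
    tail = [u for x, u in transaction.items() if x in ais and x in s and x not in itemset]
    return max(tail, default=0), len(tail)


ORDER = ["A", "B", "C", "D"]          # hypothetical predefined item order L
T = {"A": 5, "B": 2, "C": 8, "D": 3}  # record: item -> utility (toy data)
AIS = set(ORDER)
for X in ({"B"}, {"C"}):
    print(sorted(X), lead_remaining(X, T, AIS, ORDER), rmua(X, T, AIS))
```

Because lrmua takes the maximum over a subset of the items considered by rmua, it can never exceed rmua; for {C} it drops from 5 to 3.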

Definition 13 (Lead Partial Maximal Utility Upper Bound for an Itemset)
Suppose that an itemset is denoted as i and a database is denoted as D. The lead-pub of i in D can thus be denoted as lpb i , which is defined below:

Definition 14 (Lead High Partial Upper Bound Itemset, LHPUBI)
Suppose that the predefined threshold is ε × TU. An itemset i is a lead-pubi if it satisfies the following condition:
$lpb_i \geq \varepsilon \times TU$
In short, lead-pubi is defined analogously to pubi, and it satisfies the same downward closure property. However, the set of lead-pubis contains far fewer candidates than the set of pubis, since lead-pubi ⊂ pubi. Thus, lead-pubi can be used to further reduce the search space of potential candidates. Based on the two upper-bound values defined above, two kinds of candidate itemsets are used in the designed level-wise (Apriori-like) model; they are defined below.

Definition 15 (Potential High Partial Upper Bound Itemset, PPUBI)
The itemset X in the database D is a potential high partial upper-bound itemset (PPUBI) iff it is both an HPUBI and an HPI.

Definition 16 (Potential Lead High Partial Upper Bound Itemset, PLPUBI)
The itemset X in the database D is a potential lead high partial upper-bound itemset (PLPUBI) iff it is both an LHPUBI and an HPI.
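Definitions 15 and 16 combine an upper-bound test with the probability test. The Python sketch below applies both to precomputed values; the per-itemset numbers (pb, lpb, Pro), the thresholds, and the function name `classify` are all made up for illustration. Since lpb ≤ pb for each itemset here, every PLPUBI is also a PPUBI.

```python
def classify(stats, eps_tu, min_pro):
    """stats maps an itemset to (pb, lpb, pro). Returns (PPUBIs, PLPUBIs):
    PPUBI  = HPUBI and HPI  (pb  >= eps_tu and pro >= min_pro)
    PLPUBI = LHPUBI and HPI (lpb >= eps_tu and pro >= min_pro)"""
    ppubis = {x for x, (pb, _, pro) in stats.items() if pb >= eps_tu and pro >= min_pro}
    plpubis = {x for x, (_, lpb, pro) in stats.items() if lpb >= eps_tu and pro >= min_pro}
    return ppubis, plpubis


# Made-up bound values, chosen so that lpb <= pb for every itemset
STATS = {
    ("A", "B"): (12.0, 10.0, 1.5),
    ("A", "C"): (11.0, 7.0, 1.4),
    ("B", "C"): (13.0, 9.5, 0.8),
}
ppubis, plpubis = classify(STATS, eps_tu=9.0, min_pro=1.2)
print(sorted(ppubis))     # [('A', 'B'), ('A', 'C')]
print(sorted(plpubis))    # [('A', 'B')]
print(plpubis <= ppubis)  # True
```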

Problem statement
The goal of PHAUIM is to reveal all PHAUIs in an uncertain dataset. Given an input dataset D, a unit profit table p for the items, a predefined utility threshold ε, and a predefined potential probability threshold μ, the designed algorithm outputs the set of potential high average-utility itemsets (PHAUIs) in the uncertain database D.

Designed Apriori-based PHAUIM, APHAUIM
The developed APHAUIM algorithm is described as follows. APHAUIM uses two tighter upper bounds, pub and lead-pub, that preserve the downward closure property. Based on these two upper bounds, the algorithm maintains two sets of itemsets, called phps and plhps, which preserve the downward closure property. Using these two maintained sets, the algorithm efficiently reveals all satisfied PHAUIs while guaranteeing completeness and correctness. The pseudo-code of the developed algorithm is given in Algorithm 1, and all acronyms used in it are listed in Table 3 in the Appendix. For the same reason mentioned in the definition section, we also use concepts from trading-market applications: the pseudo-code uses terms from transaction databases such as "transaction" and "itemset". For instance, a transaction can also be mapped to a record in an IoT database.
The proposed method (Algorithm 1) is a standard Apriori-based process for finding all PHAUIs in a database. In Algorithm 1, APHAUIM first calculates the maximal utility of each transaction and the utility and probability information of each item in a loop (lines 2 to 9). Then the method finds the PHAUIs with one item and the initial PPUBIs and PLPUBIs in lines 15. Figure 1 shows a simple flowchart of the proposed method. The method generates potential lead high partial upper bound itemsets (PLPUBIs) and potential high partial upper bound itemsets (PPUBIs) to obtain candidate itemsets in an Apriori-like manner. The process then checks whether each candidate itemset is a potential high average-utility itemset (PHAUI), and also generates the PLPUBIs and PPUBIs for the next round. If the set of PLPUBIs is empty, the process terminates and outputs the discovered PHAUIs.
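To illustrate the overall level-wise flow, the Python sketch below mines PHAUIs from a toy uncertain database. It is a simplified stand-in and not the authors' APHAUIM: it prunes with the looser TPAU bound auub together with Pro(X) instead of pub and lead-pub, and it assumes the thresholds ε × TU for average utility and μ × |D| for potential probability. All names and data are hypothetical. Pruning stays complete because both auub and Pro(X) are anti-monotone and au(X) ≤ auub(X).

```python
from itertools import combinations

# Hypothetical uncertain database: (item -> utility, existence probability p(T))
DB = [
    ({"A": 5, "C": 2}, 0.9),
    ({"A": 10, "B": 2, "C": 1}, 0.7),
    ({"B": 6, "D": 10}, 0.5),
    ({"A": 4, "B": 4, "D": 2}, 0.8),
]


def mine_phauis(db, epsilon, mu):
    """Simplified level-wise PHAUI miner: returns {itemset: (au, Pro)}.

    Pruning uses auub (TPAU-style) plus Pro(X), not the paper's pub/lead-pub."""
    tu = sum(sum(t.values()) for t, _ in db)   # total utility TU
    min_au = epsilon * tu                       # assumed form: epsilon * TU
    min_pro = mu * len(db)                      # assumed form: mu * |D|
    tmu = [max(t.values()) for t, _ in db]      # maximal item utility per record

    def stats(itemset):
        au = auub = pro = 0.0
        for (t, p), m in zip(db, tmu):
            if all(x in t for x in itemset):
                au += sum(t[x] for x in itemset) / len(itemset)
                auub += m
                pro += p
        return au, auub, pro

    result = {}
    level = [(x,) for x in sorted({x for t, _ in db for x in t})]
    while level:                                    # stop when no candidates remain
        survivors = []
        for cand in level:
            au, auub, pro = stats(cand)
            if auub >= min_au and pro >= min_pro:   # PHAUUBI: worth extending
                survivors.append(cand)
                if au >= min_au:                    # PHAUI: HAUI and HPI
                    result[cand] = (au, pro)
        # Apriori join on a shared prefix, then drop candidates with a pruned subset.
        kept = set(survivors)
        level = []
        for a, b in combinations(survivors, 2):
            if a[:-1] == b[:-1]:
                cand = a + (b[-1],)
                if all(s in kept for s in combinations(cand, len(cand) - 1)):
                    level.append(cand)
    return result


phauis = mine_phauis(DB, epsilon=0.2, mu=0.35)
for itemset in sorted(phauis):
    au, pro = phauis[itemset]
    print(itemset, round(au, 2), round(pro, 2))
```

With ε = 0.2 and μ = 0.35, the sketch returns ('A',), ('B',), and ('A', 'B'). The item D has au = 12, so it is a HAUI, but its potential probability of 1.3 is below μ × |D| = 1.4, so it is excluded; this is the overestimation effect discussed in the experiments.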

Experimental evaluation
In this section, the proposed APHAUIM variants are evaluated on how efficiently they reveal PHAUIs. APHAUIM(I) denotes the proposed method with lead-pub, and APHAUIM denotes the proposed method without lead-pub. The experiments also report the results of the previous AHAUIM (Apriori-based HAUIM) [12] for revealing HAUIs. The performance of APHAUIM and AHAUIM is compared on four standard datasets that have been widely used in utility-oriented pattern-mining tasks [40]. Table 2 lists the properties of the four datasets used in the experimental evaluation. All algorithms were run on a Mac with an Apple M1 processor and 8 GB of main memory under macOS Big Sur. In addition, all algorithms were implemented in Java and will be released publicly for further use.

Runtime
In this section, the runtimes of the proposed APHAUIM and the previous AHAUIM under different minimum average-utility thresholds on the four datasets from SPMF [40] are evaluated and compared. The results are shown in Fig. 2. The algorithms that apply lead-pub usually perform better than those without lead-pub. However, the influence of lead-pub is more conspicuous when revealing HAUIs than when revealing PHAUIs. Applying lead-pub also incurs the computational cost of calculating one more upper bound. Because there are fewer candidate itemsets for PHAUIs, the benefit of lead-pub is less obvious, especially in a small and sparse database. The runtime of APHAUIM(I) is even longer than that of APHAUIM on the foodmart database with a minimum average-utility threshold of 0.0012. However, in most situations, lead-pub is a useful technique for reducing the computational cost of the whole process. Thus, the designed model efficiently mines the required information (i.e., potential high average-utility itemsets) from uncertain databases.

Number of candidates
In this section, the numbers of candidates (for PHAUIs and HAUIs) generated by the proposed APHAUIM and the previous AHAUIM under different minimum average-utility thresholds on the four datasets are compared and evaluated. The results are shown in Fig. 3. The first observation concerns the different numbers of candidates for PHAUIs and HAUIs. A PHAUI mining algorithm has one more constraint (the potential probability) than traditional HAUI mining algorithms. Therefore, the number of candidates in PHAUI mining is significantly smaller than that in HAUI mining. Note that the proposed method exploits the downward closure property of the probability value to generate the candidate itemsets in the Apriori process. The second observation shows the power of lead-pub. In both PHAUIM and HAUIM, lead-pub consistently reduces the number of candidates in the mining process, which helps reduce the computation time. Moreover, lead-pub is more effective under a looser (smaller) minimum average-utility threshold. The Apriori-based algorithms without lead-pub produce a massive number of candidate itemsets when the threshold is small, whereas lead-pub effectively suppresses the growth of candidate itemsets as the threshold decreases. Thus, it is necessary to apply lead-pub when a mining process uses a small threshold.

Discovered PHAUIs or HAUIs
In this section, the numbers of PHAUIs and HAUIs discovered under different minimum average-utility thresholds on the four datasets are compared in Fig. 4. The experimental results show that applying a traditional HAUI mining algorithm to an uncertain database generates many overestimated itemsets. If the average probability of the whole uncertain database is low, the revealed itemsets are of little use, and the truly usable knowledge is hidden among a huge number of itemsets. That is, if the level of uncertainty in a database is very high, the previous HAUI algorithms cannot discover effective and valuable knowledge. By incorporating probability, the proposed method can precisely identify interesting itemsets under uncertainty. Especially for a large minimum average-utility threshold, the difference between the numbers of HAUI and PHAUI candidates exceeds ten million, and this large gap shows the effectiveness of the designed model.
Generally, the designed model effectively handles mining tasks with utility and uncertainty constraints, which makes it very applicable to WSN or IoT scenarios. Also, the discovered patterns are meaningful in WSNs or IoTs, since the uncertainty factor should be considered and incorporated while evaluating the

Memory usage
This experimental section shows the memory usage of the compared algorithms. In a Java environment, it is very hard to measure the precise memory usage of a running process. Here we monitored the maximal memory usage of the Java virtual machine while each algorithm was running. The results are shown in Fig. 5.
The results show that, owing to the smaller search space, the PHAUI-based algorithms use less memory than the HAUI-based approaches. However, the influence of the proposed lead-pub on memory usage is not clear. Even though lead-pub effectively reduces the number of candidates, it still incurs extra computation and requires more memory. Normally, a mining process that needs more runtime also needs more memory to obtain the final results.

Scalability
Following the results of the previous section, this section discusses the scalability of the proposed APHAUIM. Figures 6 and 7 show the runtimes and numbers of candidates of the compared algorithms for different database sizes.
The experimental results show that, if lead-pub is not considered, a larger or denser database should logically require more time to reveal HAUIs or PHAUIs. However, neither factor showed a strong relationship with runtime in the experiments: a database with more candidates can still take less time than one with fewer candidates. In Fig. 6, the average transaction length is clearly more strongly related to runtime than the other factors. This is because the scanning process of Apriori takes longer on longer transactions, which contain more items to check against each candidate and can support longer itemsets that require additional level-wise passes. Both the HAUI and PHAUI methods share this characteristic.
In addition, the proposed lead-pub is expected to perform best in most situations. Note that if a database has a short average transaction length and a low density (for example, the foodmart dataset), even a well-defined pruning strategy brings little benefit. In such a database, the gap between the number of ordinary patterns and the number of HAUIs (PHAUIs) is huge, so a more precise pruning method usually cannot reduce the search space much. Fortunately, this type of database does not require many computational resources in most cases.
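The link between average transaction length and the number of level-wise passes can be seen in a small Apriori sketch. The code below is a hypothetical illustration that prunes with the standard average-utility upper bound from the HAUIM literature, auub(X) = sum of tmu(T) over the transactions containing X, where tmu(T) is the largest single-item utility in T. It is not the paper's lead-pub bound and ignores probabilities; the toy databases and function names are assumptions.

```python
from itertools import combinations

def tmu(transaction):
    """Transaction-maximum utility: the largest single-item utility in it."""
    return max(transaction.values())

def apriori_haui(db, min_au):
    """Level-wise HAUI mining pruned by auub; returns (haui, levels, candidates)."""
    level = [(i,) for i in sorted({i for t in db for i in t})]
    found, levels, candidates = [], 0, 0
    while level:
        levels += 1          # one level-wise pass (naively re-scans db per candidate)
        candidates += len(level)
        promising = []
        for x in level:
            covering = [t for t in db if set(x) <= t.keys()]
            # auub(X) >= au(X), and auub never grows for supersets of X
            if sum(tmu(t) for t in covering) < min_au:
                continue     # prune X and, by downward closure, all its supersets
            promising.append(x)
            au = sum(sum(t[i] for i in x) / len(x) for t in covering)
            if au >= min_au:
                found.append(x)
        # Apriori join: merge sorted k-itemsets sharing their first k-1 items,
        # keeping a (k+1)-candidate only if every k-subset is promising.
        kept = set(promising)
        level = [
            x + (y[-1],)
            for x, y in combinations(promising, 2)
            if x[:-1] == y[:-1]
            and all(s in kept for s in combinations(x + (y[-1],), len(x)))
        ]
    return found, levels, candidates

# Hypothetical toy databases: same items, short vs long transactions.
short_db = [{"a": 2, "b": 3}, {"c": 4, "d": 1}]
long_db = [{"a": 2, "b": 3, "c": 4, "d": 1}, {"a": 1, "b": 2, "c": 3, "d": 4}]
print(apriori_haui(short_db, 2))  # 2 levels, 10 candidates
print(apriori_haui(long_db, 2))   # 4 levels, 15 candidates
```

With the same items and the same number of transactions, the longer transactions force two extra level-wise passes, which matches the observation that average length drives runtime more than size or density alone.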
Finally, because the larger databases are simple duplications of the original database, the numbers of candidates remain fixed. Figure 7 shows this characteristic and confirms that the proposed method works well on large-scale databases.
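Why duplication leaves the mined patterns unchanged can be checked with a short sketch. It assumes, as an illustration only, that the minimum average-utility threshold is a ratio of the database's total utility and the probability threshold is a ratio of |D|; this section does not restate the paper's exact threshold definitions. Under that assumption, duplicating the database k times scales every au(X), every Pro(X), and both thresholds by k, so the pattern set, and with it the set of candidates a level-wise miner examines, stays the same. The data and the name `mine_relative` are hypothetical.

```python
from itertools import combinations

def mine_relative(db, min_au_ratio, min_pro_ratio):
    """Brute-force PHAUI check with thresholds relative to database totals (assumed)."""
    total_utility = sum(sum(t.values()) for t, _ in db)
    items = sorted({i for t, _ in db for i in t})
    out = []
    for k in range(1, len(items) + 1):
        for x in combinations(items, k):
            cover = [(t, p) for t, p in db if set(x) <= t.keys()]
            au = sum(sum(t[i] for i in x) / len(x) for t, _ in cover)
            pro = sum(p for _, p in cover)
            if au >= min_au_ratio * total_utility and pro >= min_pro_ratio * len(db):
                out.append(x)
    return out

# Hypothetical base database: (item -> utility, existence probability).
base = [
    ({"a": 5, "b": 2}, 0.9),
    ({"b": 3, "c": 7}, 0.4),
    ({"a": 1, "c": 2}, 0.8),
    ({"b": 4, "c": 6}, 0.9),
]
# Duplicate the database 1x, 10x and 100x; the result is the same each time.
results = {k: mine_relative(base * k, 0.25, 0.25) for k in (1, 10, 100)}
print(results)  # every value is [('b',), ('c',), ('b', 'c')]
```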

Conclusion
In recent decades, the data collected from wireless sensor networks (WSNs) and the Internet of Things (IoTs) have increased dramatically; thus, the uncertainty of the data must be considered in pattern-mining tasks, especially in industrial applications and domains. However, most existing pattern-mining algorithms, especially those for HAUIM, cannot handle data collected with uncertainty, which makes them inappropriate in real situations. In this work, a new model called Apriori-based potential high average-utility itemset mining (APHAUIM) is presented to reveal potential high average-utility patterns from an uncertain database. An experimental evaluation then demonstrates the effectiveness and efficiency of the developed algorithms compared with the generic HAUIM approach, and shows that the designed APHAUIM with two upper bounds (pruning strategies) performs acceptably on uncertain IoT databases.
In the future, compressed data structures will be explored to overcome the limitations of the level-wise process in pattern mining; in particular, tree or list structures could be used to improve mining performance. Online or stream mining tasks in real industrial settings can also be considered in further studies. Besides, it is also an interesting topic to develop efficient algorithms for dynamic environments that handle transaction insertion, deletion, and modification in the databases.
Funding Open access funding provided by Western Norway University Of Applied Sciences.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.