Asymptotic Time Complexity of Identification of Basic-level

This paper focuses on cognitive computing approaches to identifying the basic-level in a hierarchical structure. In particular, it provides asymptotic time complexities of the identification of the basic-level for five basic-levelness measures, i.e., feature-possession, category utility, category attentional slip, category’s cue validity with global threshold, and category’s cue validity with feature-possession. Asymptotic time complexities were analytically determined for each basic-levelness measure separately. First, the time complexity of auxiliary measures (i.e., measures utilized by basic-levelness measures) was determined. Second, the time complexity of the identification of the basic-level was determined. Finally, an optimization of the identification was proposed. The identification of the basic-level requires polynomial time. In particular, category attentional slip and category’s cue validity with feature-possession require an additional iteration through all objects, which increases the time complexity.


Introduction
Categorization is a concept derived from psycholinguistics. According to Eleanor Rosch, a category is a group of objects considered equivalent [1], i.e., objects are in the same category since they are similar to each other in terms of their features. It is worth emphasizing that having a category does not mean learning a list of all members belonging to the category, but it is strictly related to gaining knowledge about what could be a member of it [2]. In other words, a person has a mechanism which allows classifying objects to a category, e.g., having a category "tree" means that somebody seeing a tree knows that it is a tree without the necessity of seeing all existing trees located in the external world.
Mariusz Mulka, mariusz.mulka5@gmail.com, Faculty of Information and Communication Technology, Wrocław University of Science and Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland
Categories are organized into a hierarchical structure having vertical and horizontal dimensions [3]. The vertical dimension concerns the inclusiveness of categories, e.g., swivel chairs are chairs, seats, and furniture. The horizontal dimension is a segmentation of categories performed at the same hierarchy level, e.g., chairs, tables, wardrobes, etc. In general, there are three types of hierarchy levels, namely, subordinate, basic, and superordinate levels [1]. These levels differ in terms of informativeness and abstractness. Subordinate categories (e.g., a swivel chair) belong to subordinate levels. Objects belonging to these categories share many common features (i.e., informativeness is high). In addition, they tend to be detailed, specific, and distinct representations (i.e., abstractness is low). In contrast, superordinate categories (e.g., a piece of furniture) belong to superordinate levels. Objects belonging to these categories share few common features (i.e., informativeness is low). Moreover, they tend to be very broad and general representations (i.e., abstractness is high). Finally, the basic-level is a special hierarchy level that strikes a balance between abstractness and informativeness (e.g., a chair).
The balance between abstractness and informativeness is related to the fact that the basic-level provides the most cognitively useful distinction between categories [4]. Basic-level categories decompose the perception of the external world into very informative categories and provide fundamental building blocks of cognition since they tend to maximize the contrast with other basic-level categories, while maintaining high internal consistency. Hence, people are generally faster and more accurate at naming or categorizing objects at the basic-level (e.g., a chair) compared to superordinate (e.g., a piece of furniture) or subordinate (e.g., a swivel chair) categories [5]. Being a core building block of human cognition, basic-level categories are also crucial for artificial cognitive systems, which simulate the functions that enable humans to perform semantic parsing, rather than implementing semantic parsing as a machine-oriented mechanism [7]. Such systems can assist humans in better understanding complex situations and making smarter decisions, since they utilize the computational power of machines and human-like approaches to take advantage of the vast amounts of available data [8].
Psycholinguists propose measures which can be applied in an artificial cognitive system to extract basic-level categories. These measures are called basic-levelness measures, and they are utilized in a hierarchical structure to identify which hierarchy level is the basic-level. In particular, the following basic-levelness measures are found: (a) feature-possession (FP) [9], (b) category utility (CU) [10], (c) category attentional slip (CAS) [11], (d) category's cue validity (CCV) [1]. Further, two modifications of the measure proposed by Rosch are found in the literature: (a) category's cue validity with global threshold (CCVGT) [12], (b) category's cue validity with feature-possession (CCVFP) [13].
In recent years, the interest in the extraction of the basic-level categories has increased in computer science literature. Namely, approaches can be found where knowledge is organized in a hierarchical structure, and then the basic-level is identified [14][15][16]. The identification of the basic-level utilizes basic-levelness measures to identify the basic-level in an already defined hierarchical structure. Hence, the identification of the basic-level can be considered as a subtask of the extraction of basic-level categories.
Controlling the time complexity is an essential part of computer programming. It is worth emphasizing that organizing knowledge in a hierarchical structure is well-defined in computer science in terms of the (asymptotic) time complexity (see [17,18]), whereas the time complexity of the identification of the basic-level is neglected. Hence, the time complexity required to identify the basic-level for each basic-levelness measure is determined in this paper. In addition, an optimization of the identification of the basic-level is proposed for each basic-levelness measure separately. In conclusion, the technical novelty of this paper is the investigation of the time complexity of identifying the basic-level, which fills the research gap.
The rest of this paper is organized as follows. First, Sect. 2 contains the definition of a hierarchical structure where the identification of the basic-level can be performed. Then, basic-levelness measures are formally defined in Sect. 3, while the (asymptotic) time complexities of the identification of the basic-level are determined for each basic-levelness measure separately in Sect. 4. Finally, the discussion is performed in Sect. 5, while concluding remarks are given in Sect. 6.

Objects
To analytically determine the time complexity of the identification of the basic-level, a static and finite set of objects $O$ is further assumed. Objects can exhibit only a predefined set of binary features $F = \{f_1, f_2, \ldots, f_M\}$, where $M = |F|$. Furthermore, the relation between an object and features is given by an information function $\gamma : (O \times F) \to \{0, 1\}$ (in this paper, an information function $\gamma(o, f)$ is equivalently written as $\gamma_{o,f}$). In particular, $\gamma_{o,f} = 1$ represents a case where "an object $o$ exhibits a feature $f$", and $\gamma_{o,f} = 0$ otherwise.
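The objects, the binary features, and the information function can be held in a very small data structure. The following is a minimal sketch; the names `OBJECTS`, `FEATURES`, `EXHIBITS`, and `gamma`, as well as the toy data, are illustrative assumptions and do not come from the paper.

```python
from typing import Dict, List, Set

# Hypothetical toy data: four objects and three binary features.
OBJECTS: List[str] = ["o1", "o2", "o3", "o4"]
FEATURES: List[str] = ["f1", "f2", "f3"]

# For each object, the set of features it exhibits (i.e., gamma(o, f) = 1).
EXHIBITS: Dict[str, Set[str]] = {
    "o1": {"f1", "f2"},
    "o2": {"f1"},
    "o3": {"f2", "f3"},
    "o4": {"f3"},
}


def gamma(o: str, f: str) -> int:
    """Information function: 1 if object o exhibits feature f, 0 otherwise."""
    return 1 if f in EXHIBITS[o] else 0
```

Storing a set of features per object keeps a single lookup of the information function at constant expected time, which is the unit cost assumed when counting steps.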

Hierarchy of Clusters
Psycholinguistic experiments related to the identification of the basic-level have been performed using hierarchical structures containing categories, where each object is a member of all categories in a specific branch of the hierarchical structure (e.g., [5, 19-22]). In the case of [5, 19], the root of the hierarchical structure was not explicitly defined. In contrast, the root was explicitly defined in the remaining three papers, and such a hierarchical structure is called a hierarchy of categories [13].
As aforementioned, the identification of the basic-level is performed in a hierarchical structure. However, it is worth noting that in this process, basic-levelness measures refer to categories that are considered groups of objects (i.e., clusters). Hence, the complex structure of a category is irrelevant for determining the time complexity of identifying the basic-level.
Therefore, for the purposes of the analytical research conducted in this paper, two formal definitions are presented as computational auxiliary structures: a cluster (see Definition 1) and a hierarchy of clusters (see Definition 2).
Definition 1 A cluster $X_i$ is defined as a non-empty subset of objects, i.e., $X_i \subseteq O \wedge X_i \neq \emptyset$.
A hierarchical structure utilized in this paper is analogous to a hierarchy of categories defined in paper [13]; the only difference is that it contains clusters instead of categories. A hierarchy of clusters $\mathbb{X}$ consists of $r_\rho$ hierarchy levels, i.e., $\mathbb{X} = \{\mathcal{X}^1, \mathcal{X}^2, \ldots, \mathcal{X}^{r_\rho}\}$, where each hierarchy level is a partition of a set of objects $O$ (see Definition 2).

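Definition 2 The lowest hierarchy level is a set $\mathcal{X}^1 = \{X^1_1, X^1_2, \ldots, X^1_{|O|}\}$ of single-object clusters, i.e., $X^1_k = \{o_k\}$. Each subsequent hierarchy level is a set $\mathcal{X}^{r+1} = \{X^{r+1}_1, X^{r+1}_2, \ldots, X^{r+1}_{i_{r+1}}\}$, where $i_{r+1}$ is the number of clusters and each cluster $X^{r+1}_k$ fulfills the following properties:
$$X^{r+1}_i \cap X^{r+1}_j = \emptyset \ \text{ for } i \neq j, \qquad X^{r+1}_i = \bigcup_{X \in Z^r_i} X,$$
where $Z^r_i \subseteq \mathcal{X}^r$ is a subset of clusters used to create a new cluster $X^{r+1}_i$. In addition, a hierarchy level $\mathcal{X}^{r+1}$ fulfills the following properties:
$$\bigcup_{k=1}^{i_{r+1}} X^{r+1}_k = O, \qquad \exists\, i : |Z^r_i| \geq 2.$$
Therefore, first, two clusters from the same hierarchy level $r+1$ cannot contain the same object. Second, clusters from a hierarchy level $r+1$ are prepared by merging clusters from a hierarchy level $r$ (in particular, a cluster from a hierarchy level $r+1$ can be identical to a cluster from a hierarchy level $r$, i.e., both contain the same objects). Third, a hierarchy level $\mathcal{X}^{r+1}$ is a partition of all objects. Fourth, it is crucial that at least one new cluster from a hierarchy level $r+1$ is created by merging at least two clusters from a hierarchy level $r$.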
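The four properties listed above can be checked mechanically. The following is a minimal sketch, assuming that a hierarchy is given as a list of levels ordered from the lowest to the highest and that each cluster is a `frozenset` of object names; the function names `is_partition` and `is_hierarchy_of_clusters` are hypothetical and are not taken from the paper.

```python
from typing import FrozenSet, List, Set

Cluster = FrozenSet[str]
Level = List[Cluster]


def is_partition(level: Level, objects: Set[str]) -> bool:
    """True if clusters are non-empty, pairwise disjoint, and cover all objects."""
    if any(len(cluster) == 0 for cluster in level):
        return False
    covered = set().union(*level)
    # Equal union and equal total size together imply pairwise disjointness.
    return covered == objects and sum(len(c) for c in level) == len(objects)


def is_hierarchy_of_clusters(levels: List[Level], objects: Set[str]) -> bool:
    """Check every pair of consecutive levels against the four properties."""
    if not levels or not all(is_partition(level, objects) for level in levels):
        return False
    for lower, upper in zip(levels, levels[1:]):
        merged_at_least_two = False
        for cluster in upper:
            # Subclusters from the lower level that are merged into this cluster.
            parts = [sub for sub in lower if sub <= cluster]
            if set().union(*parts) != set(cluster):
                return False
            if len(parts) >= 2:
                merged_at_least_two = True
        if not merged_at_least_two:
            return False
    return True
```

The subset search used to recover the merged subclusters is chosen for brevity, not speed; an implementation that records which clusters were merged while building the hierarchy would avoid it.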
It is worth noting that the definition of a hierarchy of clusters is consistent with the hierarchical structures used in psycholinguistic experiments related to the identification of the basic-level (e.g., [5, 19-22]). In particular, each object is assigned to exactly one cluster (or a category) at each hierarchy level, meaning that each hierarchy level is a partition of all objects.
Moreover, soft clustering [23] is not considered in this paper, because psycholinguists assume that an object belongs to exactly one basic-level category [24], which is consistent with hard clustering [23].
The definition of a hierarchy of clusters presented in this paper is consistent with the definition in computer science literature [12]. However, Definition 2 (which is analogous to a hierarchy of categories presented in [13]) extends the definition in [12] by allowing the number of hierarchy levels to differ from the number of objects. This is consistent with psycholinguistic research, unlike the definition in [12], which requires each hierarchy level to contain one fewer cluster than the level below it.

Additional Terminology
In this paper, the relation between clusters $X^r_k \subseteq X^{r+1}_l$ is referred to as follows: (a) a cluster $X^r_k$ is a subcluster (child) of a cluster $X^{r+1}_l$, (b) a cluster $X^{r+1}_l$ is a supercluster (parent) of a cluster $X^r_k$. Clusters at particular hierarchy levels are referred to using commonly recognized terms [25], namely: (a) the highest hierarchy level contains a single cluster which is called the root (i.e., it is a cluster containing all objects), (b) a cluster without subclusters is called a leaf (i.e., all clusters at the hierarchy level $r = 1$ are leaves).
A hierarchy of clusters contains up to $|O|$ hierarchy levels. Moreover, a hierarchy of clusters contains up to $(2|O| - 1)$ unique clusters (i.e., clusters containing a unique subset of objects). This results from the fact that the lowest hierarchy level contains $|O|$ unique clusters, and different pairs of clusters can be merged up to $(|O| - 1)$ times in order to create a unique cluster (since each merge decreases the number of possible merges by 1). Merging more than two clusters at once in order to create a new unique cluster decreases the maximal number of unique clusters. Consequently, these clusters (i.e., $|O|$ clusters from the lowest hierarchy level and up to $(|O| - 1)$ clusters merged from at least two clusters) can be stored to perform further optimization of the identification of the basic-level (see Sect. 4), which requires space at least linear in $|O|$.
In conclusion, it is further assumed that a hierarchy of clusters has already been prepared, so that the analysis can focus solely on the identification of the basic-level.
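The bound on the number of unique clusters can be illustrated by collecting each distinct cluster once. This is a minimal sketch, assuming that levels are lists of `frozenset` clusters ordered from the lowest level upwards; the function name `unique_clusters` is a hypothetical helper.

```python
from typing import FrozenSet, List

Cluster = FrozenSet[str]


def unique_clusters(levels: List[List[Cluster]]) -> List[Cluster]:
    """Return each distinct cluster once, in order of first appearance."""
    seen = set()
    ordered: List[Cluster] = []
    for level in levels:
        for cluster in level:
            if cluster not in seen:  # identical clusters repeated on higher levels are skipped
                seen.add(cluster)
                ordered.append(cluster)
    return ordered
```

For four objects merged pairwise, the sketch yields 4 leaves plus 3 merged clusters, i.e., 7 clusters, which equals 2|O| - 1; merging all four leaves in one step yields only 5.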

Identification of the Basic-level
The identification of the basic-level is performed in an already prepared hierarchy of clusters using basic-levelness measures. In the psycholinguistic literature, a few basic-levelness measures are found. In particular, the following basic-levelness measures can be utilized: (a) feature-possession (FP) proposed by Jones [9], (b) category utility (CU) proposed by Corter and Gluck [10], (c) category attentional slip (CAS) proposed by Gosselin and Schyns [11], (d) category's cue validity with global threshold (CCVGT) proposed by Katarzyniak et al. [12], (e) category's cue validity with feature-possession (CCVFP) proposed by Mulka and Lorkiewicz [13]. The formal definitions of these measures are presented in [13] and are included in this paper for completeness. It is worth noting that category's cue validity [1] is not considered, as it requires defining a feature-category relation (see Sect. 3.2).
This section first presents auxiliary measures and then basic-levelness measures. This order results from the fact that psycholinguists utilize auxiliary measures to define basic-levelness measures [3,10]. In addition, presenting the definitions of auxiliary measures separately simplifies the determination of the time complexity of basic-levelness measures.

Auxiliary Measures
There are five auxiliary measures utilized by at least one basic-levelness measure, namely: (a) probability of a feature, (b) probability of a cluster, (c) cue validity, (d) category validity, (e) collocation.

Probability of a feature
The probability of a feature (denoted as $P(f)$) [10] determines how probable it is that any object exhibits a feature $f$. It is calculated as follows:
$$P(f) = \frac{\sum_{o \in O} \gamma_{o,f}}{|O|}.$$
The probability of a feature $P(f)$ increases with the frequency with which a feature $f$ occurs among objects. In general, the probability of a feature is equal to 0 when no objects exhibit a feature $f$, and the value of 1 occurs when all objects exhibit a feature $f$. However, in this paper, it is further assumed that each feature is exhibited by at least one object, hence $P(f) \in (0, 1]$. This auxiliary measure is utilized only by one basic-levelness measure, namely CU.

Probability of a cluster
The probability of a cluster (denoted as $P(X^r_k)$) [10] determines how probable it is that an object belongs to a cluster $X^r_k$. It is calculated as follows:
$$P(X^r_k) = \frac{|X^r_k|}{|O|}.$$
The probability of a cluster $P(X^r_k)$ increases with the number of objects belonging to a cluster $X^r_k$. In particular, the probability of a cluster cannot be equal to 0 since a cluster has to contain at least one object, while it is equal to 1 when all objects belong to this cluster. Due to the fact that each hierarchy level is a partition of the set $O$, the sum of all probabilities of clusters from a specific hierarchy level is equal to 1.
This auxiliary measure is utilized only by one basic-levelness measure, namely, CU.

Cue validity
Cue validity (denoted as $P(X^r_k|f)$) [3] informs how good a predictor a feature $f$ is for determining a cluster $X^r_k$, i.e., assuming that an object exhibits a feature $f$, it informs how likely it is that the object belongs to the cluster. It is calculated as follows:
$$P(X^r_k|f) = \frac{\sum_{o \in X^r_k} \gamma_{o,f}}{\sum_{o \in O} \gamma_{o,f}},$$
where $\sum_{o \in O} \gamma_{o,f}$ denotes the sum of occurrences of a feature $f$ in objects $O$, whilst $\sum_{o \in X^r_k} \gamma_{o,f}$ denotes the sum of occurrences of a feature $f$ in all objects from a cluster $X^r_k$. The value of cue validity $P(X^r_k|f)$ increases with the frequency with which a feature $f$ is associated with a cluster $X^r_k$ and decreases with the frequency with which a feature $f$ is related to clusters other than $X^r_k$. In particular, the cue validity's value of 0 occurs when no objects from a cluster $X^r_k$ exhibit a feature $f$, and the value of 1 occurs when at least one object from a cluster $X^r_k$ exhibits a feature $f$ and the feature $f$ is not exhibited by any object from the remaining clusters.
Four basic-levelness measures (FP, CU, CCVGT, CCVFP) utilize this auxiliary measure.

Category validity
Category validity (denoted as $P(f|X^r_k)$) [10] is the probability that an object belonging to a cluster has a feature $f$. It is calculated as follows:
$$P(f|X^r_k) = \frac{\sum_{o \in X^r_k} \gamma_{o,f}}{|X^r_k|},$$
where $\sum_{o \in X^r_k} \gamma_{o,f}$ denotes the sum of occurrences of a feature $f$ in all objects from a cluster $X^r_k$, whilst $|X^r_k|$ denotes the total number of objects in a cluster $X^r_k$. The value of category validity $P(f|X^r_k)$ increases as the frequency of a feature $f$ in a cluster $X^r_k$ increases. In particular, category validity is equal to 0 when no object from a cluster $X^r_k$ exhibits a feature $f$, while it is equal to 1 when all objects from a cluster $X^r_k$ exhibit a feature $f$. This auxiliary measure is utilized by all considered basic-levelness measures.

Collocation
Collocation (denoted as $Col(X^r_k, f)$) [9] is defined based on cue validity and category validity, and it represents a trade-off between a feature's spread within a cluster and its ability to predict the cluster. It is calculated as follows:
$$Col(X^r_k, f) = P(X^r_k|f)\, P(f|X^r_k),$$
so collocation is a product of cue and category validities of a cluster $X^r_k$ and a feature $f$.
Collocation is equal to 0 when no objects from a cluster $X^r_k$ exhibit a feature $f$, and the value of 1 occurs when cue and category validities are equal to 1 for a specific feature and cluster.
Maximal collocations are denoted as $Col_{max}(f)$ and they are calculated as follows:
$$Col_{max}(f) = \max_{X \in \bigcup_{\mathcal{X}^r \in \mathbb{X}} \mathcal{X}^r} Col(X, f).$$
Hence, maximal collocations are established for all features, using all clusters in a hierarchy of clusters. This auxiliary measure is utilized only by two basic-levelness measures, namely, FP, CCVFP.
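The five auxiliary measures follow directly from counting feature occurrences. The following is a minimal, unoptimized sketch; the function names, the `exhibits` dictionary (object name mapped to its set of features), and the representation of clusters as `frozenset` are illustrative assumptions, and each feature is assumed to be exhibited by at least one object.

```python
from typing import Dict, FrozenSet, List, Sequence, Set

Cluster = FrozenSet[str]
Exhibits = Dict[str, Set[str]]


def p_feature(f: str, objects: Sequence[str], exhibits: Exhibits) -> float:
    """Probability of a feature: share of all objects exhibiting f."""
    return sum(f in exhibits[o] for o in objects) / len(objects)


def p_cluster(cluster: Cluster, objects: Sequence[str]) -> float:
    """Probability of a cluster: share of all objects belonging to it."""
    return len(cluster) / len(objects)


def cue_validity(cluster: Cluster, f: str, objects: Sequence[str], exhibits: Exhibits) -> float:
    """P(cluster | f): occurrences of f inside the cluster over all occurrences of f."""
    total = sum(f in exhibits[o] for o in objects)  # assumed to be greater than 0
    return sum(f in exhibits[o] for o in cluster) / total


def category_validity(cluster: Cluster, f: str, exhibits: Exhibits) -> float:
    """P(f | cluster): occurrences of f inside the cluster over the cluster size."""
    return sum(f in exhibits[o] for o in cluster) / len(cluster)


def collocation(cluster: Cluster, f: str, objects: Sequence[str], exhibits: Exhibits) -> float:
    """Product of cue validity and category validity."""
    return cue_validity(cluster, f, objects, exhibits) * category_validity(cluster, f, exhibits)


def col_max(f: str, levels: List[List[Cluster]], objects: Sequence[str], exhibits: Exhibits) -> float:
    """Maximal collocation of f over all clusters of all hierarchy levels."""
    return max(collocation(c, f, objects, exhibits) for level in levels for c in level)
```

Every call recounts feature occurrences from scratch, so the sketch only mirrors the definitions; it says nothing about how the counts should be cached.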

Basic-Levelness Measures
As aforementioned, there are five basic-levelness measures for which the time complexity of the identification of the basic-level is determined in this paper, namely (a) feature-possession, (b) category utility, (c) category attentional slip, (d) category's cue validity with global threshold, (e) category's cue validity with feature-possession.

Feature-possession
Jones [9] proposes a basic-levelness measure called feature-possession. It captures a certain trade-off between cue validity and category validity on a category level. Furthermore, the author argues that the basic-level is a hierarchy level at which the average feature-possession is maximal. It reflects the basic-level property of assigning the largest number of features at the basic-level [1].
Feature-possession for a hierarchy level X r is defined based on collocation (see Definition 3).
Definition 3 Feature-possession $FP(\mathcal{X}^r)$ of a hierarchy level $\mathcal{X}^r$ is given as an average
$$FP(\mathcal{X}^r) = \frac{1}{|\mathcal{X}^r|} \sum_{X^r_k \in \mathcal{X}^r} FP^*(X^r_k).$$
Feature-possession $FP^*(X^r_k)$ of a cluster $X^r_k$ is given as
$$FP^*(X^r_k) = \sum_{f \in F} \theta\big(Col(X^r_k, f), Col_{max}(f)\big),$$
where function $\theta(x, y)$ ($x, y \in [0, 1]$) is used to determine whether collocation is equal to the maximum collocation, so it is defined as follows:
$$\theta(x, y) = \begin{cases} 1 & \text{if } x = y, \\ 0 & \text{otherwise.} \end{cases}$$
Hence, a feature $f$ belongs to a cluster $X^r_k$ if and only if $Col(X^r_k, f) = Col_{max}(f)$.
The basic-level $\mathcal{X}^b \in \mathbb{X}$ is a hierarchy level for which the value of feature-possession is maximized, namely $\mathcal{X}^b = \arg\max_{\mathcal{X}^r \in \mathbb{X}} FP(\mathcal{X}^r)$. It is not defined which hierarchy level should be identified as the basic-level if more than one hierarchy level maximizes such a basic-levelness measure.

Category utility
Corter and Gluck [10] propose a basic-levelness measure called category utility. A category is useful to the extent that it can be expected to improve people's ability to: (a) accurately predict features of a member of such a category, (b) efficiently communicate information to others about features of members of such a category. They argue that a category which is optimal for one of these purposes also tends to be optimal for the other one. In particular, category utility captures the overall categories' predictability and informativeness of features within a cluster (see Definition 4).
Definition 4 Category utility $CU(\mathcal{X}^r)$ of a hierarchy level $\mathcal{X}^r$ is given as an average
$$CU(\mathcal{X}^r) = \frac{1}{|\mathcal{X}^r|} \sum_{X^r_k \in \mathcal{X}^r} CU^*(X^r_k).$$
Category utility $CU^*(X^r_k)$ of a cluster is given as
$$CU^*(X^r_k) = P(X^r_k) \sum_{f \in F} \left[ P(f|X^r_k)^2 - P(f)^2 \right],$$
where category validity is calculated as $P(f|X^r_k) = \frac{\sum_{o \in X^r_k} \gamma_{o,f}}{|X^r_k|}$, probability of a feature $P(f)$ is calculated as $P(f) = \frac{\sum_{o \in O} \gamma_{o,f}}{|O|}$, and probability of a cluster $P(X^r_k)$ is calculated as $P(X^r_k) = \frac{|X^r_k|}{|O|}$.
The basic-level $\mathcal{X}^b \in \mathbb{X}$ is the hierarchy level for which the value of category utility is maximized, namely $\mathcal{X}^b = \arg\max_{\mathcal{X}^r \in \mathbb{X}} CU(\mathcal{X}^r)$. It is not defined which hierarchy level should be identified as the basic-level if more than one hierarchy level maximizes such a basic-levelness measure.

Category attentional slip
Gosselin and Schyns [11] define a basic-levelness measure called category attentional slip. There are two fundamental determinants of basic-levelness, namely, the cardinality of redundant tests and the length of the optimal strategy needed to establish categories. The former is related to the distinctiveness of features among categories, i.e., how many different tests (understood as a verification whether an object exhibits a certain feature) can be performed to determine the placement of a new object in a hierarchical structure; it defines how features of a category characterize an object. The latter is related to the length of the optimal strategy to determine a category, i.e., how many decisions, starting from the root of a hierarchy, should be performed to establish a specific category.
Category attentional slip (see Definition 5) captures the aforementioned notions of cardinality and optimal strategy length. In particular, it is related to the number of tests required to determine the category by an ideal categorizer.
Definition 5 Category attentional slip $CAS(\mathcal{X}^r)$ of a hierarchy level $\mathcal{X}^r$ is given as an average
$$CAS(\mathcal{X}^r) = \frac{1}{|\mathcal{X}^r|} \sum_{X^r_k \in \mathcal{X}^r} CAS^*(X^r_k).$$
Category attentional slip for a cluster, $CAS^*(X^r_k)$, is defined (see [13]) in terms of the probability $q^r_k$ of a relevant test, which is equal to
$$q^r_k = \frac{\sum_{f \in F} \theta\big(P(f|X^r_k), 1\big)}{|F|} - \sum_{X^j_l \in \Upsilon^r_k \setminus \{X^r_k\}} q^j_l,$$
and $\theta(x, y)$ is calculated as follows:
$$\theta(x, y) = \begin{cases} 1 & \text{if } x = y, \\ 0 & \text{otherwise,} \end{cases}$$
where $\Upsilon^r_k = \{X^j_l : X^r_k \subseteq X^j_l\}$ is a set containing a cluster $X^r_k$ and all its superclusters. In addition, $p \in (0, 1)$ is the probability that attention randomly slips to one feature and $(p - p q^r_k)$ is the probability of an irrelevant test.
To calculate the probability of a relevant test $q^r_k$ it is enough to focus on features that are common to all objects in a cluster, i.e., $\frac{\sum_{f \in F} \theta(P(f|X^r_k), 1)}{|F|}$, and subtract all features that are common to any of its superclusters, i.e., $\sum_{X^j_l \in \Upsilon^r_k \setminus \{X^r_k\}} q^j_l$. Unlike other basic-levelness measures, the basic-level $\mathcal{X}^b \in \mathbb{X}$ is the hierarchy level for which category attentional slip is minimized, namely $\mathcal{X}^b = \arg\min_{\mathcal{X}^r \in \mathbb{X}} CAS(\mathcal{X}^r)$. It is not specified which hierarchy level should be identified as the basic-level if more than one hierarchy level has a minimal value of such a basic-levelness measure.

Category's cue validity
Rosch et al. [1] propose that it is possible to determine cue validity for an entire category. It is done by adding up the cue validities of all features of a category, hence a category's cue validity is no longer a probabilistic concept (so, its value may exceed 1). In addition, Rosch noticed that a category with a large value of category's cue validity is by definition distinguishable from those with a low value of this parameter. Hence, the hierarchy level that maximizes the average category's cue validity is the basic-level.
Murphy [5] pointed out that such a definition of a category's cue validity will always be maximal at the most general or inclusive hierarchy level (i.e., at the root). However, it should be noted that Murphy assumed that categories contain all features of all subordinate categories, whereas Rosch has a less restrictive understanding of feature-category relations, i.e., superordinate categories have fewer features as compared with the basic-level. To determine which features are shared by members of a category, two approaches are found in the literature: category's cue validity with global threshold [12], and category's cue validity with feature-possession [13].
Katarzyniak et al. [12] proposed category's cue validity with global threshold (see Definition 6). Such a basic-levelness measure utilizes a global threshold (i.e., a borderline value) to determine which features fall into a cluster.

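Definition 6 Category's cue validity with global threshold $CCVGT(\mathcal{X}^r, \delta)$ of a hierarchy level $\mathcal{X}^r$ and a global threshold $\delta \in [0, 1]$ is given as an average
$$CCVGT(\mathcal{X}^r, \delta) = \frac{1}{|\mathcal{X}^r|} \sum_{X^r_k \in \mathcal{X}^r} CCVGT^*(X^r_k, \delta).$$
Category's cue validity with global threshold for a single cluster $CCVGT^*(X^r_k, \delta)$ is given as
$$CCVGT^*(X^r_k, \delta) = \sum_{i=1}^{M} \psi^k_i\, P(X^r_k|f_i),$$
where $P(f_i|X^r_k)$ is category validity, $P(X^r_k|f_i)$ is cue validity, and $\psi^k_i$ is calculated as follows:
$$\psi^k_i = \begin{cases} 1 & \text{if } P(f_i|X^r_k) \geq \delta, \\ 0 & \text{otherwise.} \end{cases}$$
It determines whether a cluster $X^r_k$ exhibits a feature $f_i$ with probability greater than or equal to $\delta$.
The basic-level $\mathcal{X}^b \in \mathbb{X}$ is a hierarchy level for which category's cue validity with global threshold is maximized, i.e., $\mathcal{X}^b = \arg\max_{\mathcal{X}^r \in \mathbb{X}} CCVGT(\mathcal{X}^r, \delta)$. It is not specified which hierarchy level should be identified as the basic-level if more than one hierarchy level maximizes such a basic-levelness measure.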
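The role of the global threshold can be illustrated on a toy hierarchy. The sketch below assumes that the per-cluster value sums the cue validities of exactly those features whose category validity reaches the threshold, as the text describes; the function names, the `exhibits` dictionary, and the toy data in the usage note are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Set

Cluster = FrozenSet[str]
Exhibits = Dict[str, Set[str]]


def cue_validity(cluster: Cluster, f: str, objects: Sequence[str], exhibits: Exhibits) -> float:
    """P(cluster | f); every feature is assumed to occur in at least one object."""
    total = sum(f in exhibits[o] for o in objects)
    return sum(f in exhibits[o] for o in cluster) / total


def category_validity(cluster: Cluster, f: str, exhibits: Exhibits) -> float:
    """P(f | cluster)."""
    return sum(f in exhibits[o] for o in cluster) / len(cluster)


def ccvgt_cluster(cluster: Cluster, delta: float, objects: Sequence[str],
                  features: Sequence[str], exhibits: Exhibits) -> float:
    """Sum of cue validities over features whose category validity is at least delta."""
    return sum(
        cue_validity(cluster, f, objects, exhibits)
        for f in features
        if category_validity(cluster, f, exhibits) >= delta
    )


def ccvgt_level(level: List[Cluster], delta: float, objects: Sequence[str],
                features: Sequence[str], exhibits: Exhibits) -> float:
    """Average of the per-cluster values over one hierarchy level."""
    return sum(ccvgt_cluster(c, delta, objects, features, exhibits) for c in level) / len(level)
```

With objects o1 to o4 exhibiting {f1, f2}, {f1}, {f2, f3}, {f3}, the level {o1, o2}, {o3, o4} scores 1.0 and the root scores 0.0 for a threshold of 0.6, whereas for a threshold of 0 the root scores 3.0 and the two-cluster level only 1.5, which matches the behaviour attributed to Murphy's reading.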
The introduced measure takes into account Rosch's remarks, i.e., with a sufficiently large δ value (e.g., at least greater than 0.5) at higher hierarchy levels some features are not taken into account since the value of P( f i |X r k ) could be smaller than δ (e.g., if more than 50% of objects do not exhibit a feature). It is worth noting that Murphy's understanding of feature-category relations occurs when the global threshold is equal to 0.
A disadvantage of this basic-levelness measure is the need to determine the global threshold. Too low values of this threshold may result in maximizing cue validity at the root of a hierarchy of clusters (similarly as in Murphy's comment [5]), while too high values of the global threshold may prevent the hierarchy level that should be the basic-level from being correctly identified (since too many features could be left out).
Mulka and Lorkiewicz [13] proposed category's cue validity with feature-possession. It assumes that a cluster exhibits a feature $f$ if and only if its or any of its superclusters' collocation is equal to $Col_{max}(f)$ (see Definition 7).

Definition 7 Category's cue validity with feature-possession $CCVFP(\mathcal{X}^r)$ of a hierarchy level $\mathcal{X}^r$ is given as an average
$$CCVFP(\mathcal{X}^r) = \frac{1}{|\mathcal{X}^r|} \sum_{X^r_k \in \mathcal{X}^r} CCVFP^*(X^r_k).$$
Category's cue validity with feature-possession for a single cluster $CCVFP^*(X^r_k)$ is given as
$$CCVFP^*(X^r_k) = \sum_{f \in F} \theta\big(Col^r_k(f), Col_{max}(f)\big)\, P(X^r_k|f),$$
where the maximal collocation is calculated using the formula $Col_{max}(f) = \max_{X_j \in \bigcup_{\mathcal{X}^r \in \mathbb{X}} \mathcal{X}^r} P(f|X_j) P(X_j|f)$. In addition, collocation $Col^r_k(f)$, which uses a cluster and all its superclusters, is calculated using the following formula:
$$Col^r_k(f) = \max_{X^j_l \in \Upsilon^r_k} Col(X^j_l, f),$$
where $\Upsilon^r_k$ is a set containing a cluster $X^r_k$ and all its superclusters, i.e., $\Upsilon^r_k = \{X^j_l : X^r_k \subseteq X^j_l\}$. In addition, the function $\theta$ determines whether the maximal collocation is equal to any of these collocations (see Eq. 9).
The basic-level $\mathcal{X}^b \in \mathbb{X}$ is a hierarchy level for which category's cue validity with feature-possession is maximized, i.e., $\mathcal{X}^b = \arg\max_{\mathcal{X}^r \in \mathbb{X}} CCVFP(\mathcal{X}^r)$. It is not specified which hierarchy level should be identified as the basic-level if more than one hierarchy level maximizes such a basic-levelness measure.
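All measures in this section share one pattern: a per-cluster value is averaged over each hierarchy level, and the level with the extreme average is selected. The sketch below illustrates that pattern with category utility in the standard form given by Corter and Gluck, i.e., the cluster probability times the summed differences of squared category validity and squared feature probability. The function names are hypothetical, and returning the first extreme level on a tie is an assumption of this sketch, since ties are left undefined.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Set

Cluster = FrozenSet[str]
Exhibits = Dict[str, Set[str]]


def category_utility(cluster: Cluster, objects: Sequence[str],
                     features: Sequence[str], exhibits: Exhibits) -> float:
    """Per-cluster category utility (standard Corter and Gluck form)."""
    p_cluster = len(cluster) / len(objects)
    total = 0.0
    for f in features:
        cat_validity = sum(f in exhibits[o] for o in cluster) / len(cluster)
        p_feature = sum(f in exhibits[o] for o in objects) / len(objects)
        total += cat_validity ** 2 - p_feature ** 2
    return p_cluster * total


def level_values(levels: List[List[Cluster]],
                 cluster_measure: Callable[[Cluster], float]) -> List[float]:
    """Average the per-cluster measure over every hierarchy level."""
    return [sum(cluster_measure(c) for c in level) / len(level) for level in levels]


def identify_basic_level(levels: List[List[Cluster]],
                         cluster_measure: Callable[[Cluster], float],
                         minimize: bool = False) -> int:
    """Index of the level with the extreme average; ties go to the first such level."""
    values = level_values(levels, cluster_measure)
    best = min(values) if minimize else max(values)
    return values.index(best)
```

Passing `minimize=True` covers category attentional slip, which is minimized instead of maximized. On the toy data used in the test, the level consisting of {o1, o2} and {o3, o4} obtains 0.25, against 0.1875 for the leaves and 0.0 for the root.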

Asymptotic Time Complexity
Basic-levelness measures are calculated for each hierarchy level to identify the basic-level. Controlling complexity is an essential part of computer programming, hence the time complexity of the identification of the basic-level is determined for each basic-levelness measure separately. In addition, a proposed approach for identifying the basic-level is defined for each basic-levelness measure in this section.

Auxiliary Measures
To determine the time complexity of the identification of the basic-level, it is crucial to know how auxiliary measures influence the overall time complexity. In addition, this section contains an approach for the calculation of auxiliary measures that lowers the required time complexity.

Probability of a Feature
Determining for original approach. The probability of a feature is calculated only when CU is utilized for the identification of the basic-level. The time complexity required to calculate all probabilities of features is presented in Theorem 1.

Theorem 1 Calculation of probabilities of features requires O(| O || F |) time.
Proof First, the calculation of the probability of a feature P( f ) requires | O | time since it is necessary to determine how many objects exhibit a feature f . Second, it can be noted that there are exactly | F | features. Therefore, the calculation of all these probabilities requires O(| O || F |) time.
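The counting argument of the proof can be mirrored by two nested loops. This is a minimal sketch; the function name `feature_probabilities` and the `exhibits` dictionary (object name mapped to its set of features) are assumptions made for illustration.

```python
from typing import Dict, Sequence, Set


def feature_probabilities(objects: Sequence[str], features: Sequence[str],
                          exhibits: Dict[str, Set[str]]) -> Dict[str, float]:
    """Compute P(f) for every feature with |F| * |O| membership checks."""
    probabilities: Dict[str, float] = {}
    for f in features:              # |F| iterations
        count = 0
        for o in objects:           # |O| iterations per feature
            if f in exhibits[o]:
                count += 1
        probabilities[f] = count / len(objects)
    return probabilities
```

Each membership check takes constant expected time with a hash-based set, so the body of the inner loop runs |O||F| times in total.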

Probability of a Cluster
Determining for original approach. The probability of a cluster is calculated only when CU is utilized for the identification of the basic-level. The time complexity required to calculate all probabilities of clusters is presented in Theorem 2.
Determining for proposed approach. Due to the fact that $P(X^r_k) = P(X^{r+1}_k)$ for identical clusters ($X^r_k = X^{r+1}_k$), the calculation of the probabilities of clusters can be performed only for unique clusters (see Theorem 3). Note that unique clusters can be stored once the hierarchy of clusters is prepared.

Theorem 3 Calculation of probabilities of clusters for only unique clusters requires O(| O |) time.
Proof First, the lowest hierarchy level contains exactly $|O|$ clusters for which the probability of a cluster is equal to $\frac{1}{|O|}$. Second, a hierarchy of clusters contains up to $(|O| - 1)$ clusters which contain more than one subcluster. Hence, probabilities of clusters can be calculated for leaves (i.e., $|O|$ times), and then only for up to $(|O| - 1)$ clusters containing more than one subcluster. The calculation of these probabilities requires $O(2(|O| - 1)) = O(|O|)$ time since the probability of a supercluster is a sum of probabilities of its subclusters. Note that $2(|O| - 1)$ is the maximal number of clusters used to calculate these probabilities (it occurs when a hierarchy of clusters contains exactly $|O|$ hierarchy levels). Therefore, the overall time complexity is linear.
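The bottom-up summation used in the proof can be sketched as follows. The sketch assumes that the merge structure is recorded when the hierarchy is prepared: leaves are numbered from 0 to |O| - 1, and `merges` lists every newly created cluster together with the identifiers of its subclusters, in creation order. This representation and the function name are assumptions, not the paper's data structure.

```python
from fractions import Fraction
from typing import Dict, List, Sequence, Tuple


def unique_cluster_probabilities(
    n_objects: int, merges: Sequence[Tuple[int, List[int]]]
) -> Tuple[Dict[int, Fraction], int]:
    """Return cluster probabilities and the number of subcluster terms that were summed."""
    # Each leaf holds exactly one object, hence probability 1 / |O|.
    prob: Dict[int, Fraction] = {leaf: Fraction(1, n_objects) for leaf in range(n_objects)}
    terms = 0
    for new_id, children in merges:
        # Probability of a supercluster is the sum of its subclusters' probabilities.
        prob[new_id] = sum((prob[c] for c in children), Fraction(0))
        terms += len(children)
    return prob, terms
```

For four objects merged pairwise, `merges = [(4, [0, 1]), (5, [2, 3]), (6, [4, 5])]` uses 6 terms, i.e., 2(|O| - 1), and the root obtains probability 1.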

Cue Validity
Determining for original approach. Cue validity is calculated if the basic-levelness measures FP, CCVGT, and CCVFP are utilized for the identification of the basic-level. The time complexity required to calculate all cue validities is presented in Theorem 4.

Theorem 4 Calculation of cue validities for all clusters (in a hierarchy of clusters) and features requires $O(|O|^2 |F|)$ time.
Proof Let us consider the calculation of cue validity for a single feature $f$ and all clusters $X \in \bigcup_{\mathcal{X}^i \in \mathbb{X}} \mathcal{X}^i$. First, a hierarchy of clusters contains up to $|O|$ hierarchy levels. Second, at each hierarchy level an object belongs to exactly one cluster, hence determining the exhibition of a feature $f$ is performed up to $|O|$ times per hierarchy level. Finally, cue validity $P(X|f)$ for all clusters and a feature $f$ is calculated in $|O|^2$ time. Therefore, the calculation of cue validity for all features and all clusters requires $O(|O|^2 |F|)$ time.
Determining for proposed approach. Due to the fact that $P(X^r_k|f) = P(X^{r+1}_k|f)$ for identical clusters ($X^r_k = X^{r+1}_k$), the calculation of cue validities can be performed only for unique clusters (see Theorem 5).

Theorem 5 Calculation of cue validities for all unique clusters (in a hierarchy of clusters) and all features requires O(| O || F |) time.
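Proof First, cue validity for a cluster $X^{r+1}_k$ and a feature $f$ is a sum of cue validities of its subclusters $X^r_i$ and a feature $f$, i.e., $P(X^{r+1}_k|f) = \sum_{X^r_i \in Z^r_k} P(X^r_i|f)$. It results from the definition of cue validity, namely
$$P(X^{r+1}_k|f) = \frac{\sum_{o \in X^{r+1}_k} \gamma_{o,f}}{\sum_{o \in O} \gamma_{o,f}} = \frac{\sum_{X^r_i \in Z^r_k} \sum_{o \in X^r_i} \gamma_{o,f}}{\sum_{o \in O} \gamma_{o,f}} = \sum_{X^r_i \in Z^r_k} P(X^r_i|f).$$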
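The additivity of cue validity over subclusters allows a bottom-up computation. The following is a minimal sketch under assumed conventions: leaves are numbered from 0 to |O| - 1, `leaf_features[i]` is the set of features of the object in leaf `i`, and `merges` lists each new cluster with the identifiers of its subclusters. These names are hypothetical, and each feature is assumed to be exhibited by at least one object.

```python
from fractions import Fraction
from typing import Dict, List, Sequence, Set, Tuple


def cue_validities_bottom_up(
    leaf_features: Sequence[Set[str]],
    features: Sequence[str],
    merges: Sequence[Tuple[int, List[int]]],
) -> Dict[int, Dict[str, Fraction]]:
    """Cue validity of every unique cluster, obtained by summing over subclusters."""
    # Total occurrences of each feature among all objects: |O| * |F| checks.
    totals = {f: sum(1 for fs in leaf_features if f in fs) for f in features}
    cue: Dict[int, Dict[str, Fraction]] = {}
    for leaf, fs in enumerate(leaf_features):
        cue[leaf] = {f: Fraction(int(f in fs), totals[f]) for f in features}
    for new_id, children in merges:
        # Cue validity of a merged cluster is the sum over its subclusters.
        cue[new_id] = {
            f: sum((cue[c][f] for c in children), Fraction(0)) for f in features
        }
    return cue
```

Since the merges reference at most 2(|O| - 1) subclusters in total, the number of additions per feature stays linear in |O|.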

Category Validity
Determining for original approach. Category validity is always calculated. It results from the fact that all basic-levelness measures presented in this paper utilize category validity. The time complexity required to calculate all category validities is presented in Theorem 6.

Theorem 6 Calculation of category validities for all clusters and features requires $O(|O|^2 |F|)$ time.
Proof The proof is analogous to that for cue validity (see Theorem 4).

Determining for proposed approach Due to the fact that $P(f \mid X^r_k) = P(f \mid X^{r+1}_k)$ for identical clusters ($X^r_k = X^{r+1}_k$), the calculation of category validities can be performed only for unique clusters (see Theorem 7).

Theorem 7 Calculation of category validities for all unique clusters (in a hierarchy of clusters) and all features requires O(| O || F |) time.
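Proof First, category validity for a cluster $X^{r+1}_k$ can be calculated based on category validities of its subclusters, i.e., $P(f \mid X^{r+1}_k) = \frac{1}{|X^{r+1}_k|} \sum_{X^r_i \subseteq X^{r+1}_k} |X^r_i| \, P(f \mid X^r_i)$, since the subclusters are disjoint and $|X^r_i| \, P(f \mid X^r_i)$ is the number of objects in $X^r_i$ exhibiting $f$. Second, it can be noted that a hierarchy of clusters contains up to $(|O| - 1)$ clusters containing more than one subcluster. Therefore, category validity (for a single feature) has to be calculated for all leaves (i.e., $|O|$ times) and for up to $(|O| - 1)$ clusters containing more than one subcluster, which for all features requires $O(|O||F|)$ time.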
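A minimal sketch of this calculation is given below. It assumes that category validity $P(f \mid X)$ is the fraction of objects in $X$ exhibiting $f$, so that the value for a supercluster is the size-weighted average of the values of its subclusters; the data layout and the function names are hypothetical.

```python
def post_order(children, root):
    """List clusters so that every subcluster precedes its supercluster."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))
    return order[::-1]


def category_validities(children, root, object_features, features):
    """Category validity P(f | X) for all unique clusters and features."""
    size, cat = {}, {}
    for node in post_order(children, root):
        subs = children.get(node, [])
        if subs:
            size[node] = sum(size[s] for s in subs)
            # Size-weighted average over disjoint subclusters (assumed
            # definition: fraction of objects in X exhibiting f).
            cat[node] = {
                f: sum(size[s] * cat[s][f] for s in subs) / size[node]
                for f in features
            }
        else:
            # Leaf: a single object, named like the cluster itself.
            size[node] = 1
            cat[node] = {f: (1.0 if f in object_features[node] else 0.0)
                         for f in features}
    return cat


# Hypothetical data: three objects and three features.
children = {"root": ["ab", "c"], "ab": ["a", "b"]}
object_features = {"a": {"wings", "beak"}, "b": {"wings"}, "c": {"fur"}}
features = ["wings", "beak", "fur"]
cat = category_validities(children, "root", object_features, features)
```

Only unique clusters are visited, and no object set is scanned again once the leaves are processed.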

Collocation
Determining for original approach Collocation is calculated when FP or CCVFP is utilized for the identification of the basic-level. The time complexity required to calculate all collocations is presented in Theorem 8.

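Theorem 8 Calculation of collocation for all clusters and features requires $O(|O|^2 |F|)$ time.

Determining for proposed approach Due to the fact that $Col(X^r_k, f) = Col(X^{r+1}_k, f)$ for identical clusters ($X^r_k = X^{r+1}_k$), the calculation of collocations can be performed only for unique clusters (see Theorem 9).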

Theorem 9 Calculation of collocations for all unique clusters (in a hierarchy of clusters) and all features requires O(| O || F |) time.
Proof Collocation for a cluster X r +1

Basic-Levelness Measures
The time complexity determined for the auxiliary measures is further utilized to determine the time complexity of basic-levelness measures. In addition, this section contains an approach for the calculation of basic-levelness measures which lowers the required time complexity.

Feature-Possession
Determining for original approach Feature-possession can be used for the identification of the basic-level and it requires polynomial time (see Theorem 10). And this is exactly the desired formulation given in Theorem 10.
Determining for proposed approach The value of $FP^*(X^{r+1}_k)$ depends only on collocations. Knowing that the two collocations $Col(X^r_k, f)$ and $Col(X^{r+1}_k, f)$ are equal to each other for identical clusters ($X^r_k = X^{r+1}_k$), it is clear that (a) if a feature $f$ is assigned to a cluster $X^r_k$, it will be assigned to the cluster $X^{r+1}_k$, and (b) if a feature $f$ is not assigned to a cluster $X^r_k$, it will not be assigned to the cluster $X^{r+1}_k$. Therefore, the identification of the basic-level can be optimized (see Theorem 11) based on the auxiliary measures calculated for unique clusters and the iterative calculation of $FP(X^r)$ (see Lemma 1).

Lemma 1 (Theorem 11) $FP(X^{r+1})$ can be calculated as follows: where $FP^r$ is equal to $\sum_{X^r_k \in X^r \setminus Y^r} FP^*(X^r_k)$. It is calculated only for the clusters $X^r \setminus Y^r$ which are used to create new unique superclusters $X^{r+1} \setminus Y^{r+1}$ (i.e., merged from at least two subclusters), while the set $Y^{r+1}$ contains clusters containing only a single subcluster (all of these subclusters are in the set $Y^r$).
Proof Note that FP(X r ) is calculated as follows: The value of X r i ∈Y r FP * (X r i ) can be used to calculate the value of FP(X r +1 ), namely Note that FP r +1 is equal to FP r , so it is further calculated as or, equivalently (after substitution) And this is exactly the desired formulation given in Lemma 1.
And this is exactly the desired formulation given in Theorem 11.
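The idea of the iterative calculation can be illustrated with a small sketch. Note the assumption, which the surviving text does not confirm: the score of a hierarchy level is taken here to be the plain sum of the per-cluster scores. Under that assumption, moving from level $r$ to level $r+1$ only requires removing the scores of the clusters that were merged and adding the scores of the newly created superclusters. All names and numbers are hypothetical.

```python
def next_level_score(prev_level_score, merged_away_scores, new_cluster_scores):
    """Update a level score after one merging step.

    ASSUMPTION (not confirmed by the text): the level score is the plain
    sum of the per-cluster scores of that level.
    """
    return prev_level_score - sum(merged_away_scores) + sum(new_cluster_scores)


# Hypothetical per-cluster scores at level r.
level_r = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}
score_r = sum(level_r.values())

# Level r+1: clusters "a" and "b" are merged into "ab";
# "c" and "d" are carried over unchanged.
level_r1 = {"ab": 2.5, "c": 3.0, "d": 4.0}

incremental = next_level_score(
    score_r,
    [level_r["a"], level_r["b"]],  # clusters merged away
    [level_r1["ab"]],              # newly created supercluster
)
full = sum(level_r1.values())      # recomputation from scratch
```

The incremental update touches only the merged clusters, whereas the full recomputation iterates over every cluster of the level.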

Category Utility
Determining for original approach Category utility can be used for the identification of the basic-level and it requires polynomial time (see Theorem 12).
And this is exactly the desired formulation given in Theorem 12.
Determining for proposed approach The value of $CU^*(X^{r+1}_k)$ depends on the probability of a feature, the probability of a cluster, and category validity. Due to the fact that these values are equal to each other for identical clusters ($X^r_k = X^{r+1}_k$), the identification of the basic-level can be optimized (see Theorem 13) based on auxiliary measures calculated for unique clusters and the iterative calculation of $CU(X^r)$ (see Lemma 2).

Lemma 2 (Theorem 13) $CU(X^{r+1})$ can be calculated as follows: where $CU^r$ is equal to $\sum_{X^r_k \in X^r \setminus Y^r} CU^*(X^r_k)$. It is calculated only for the clusters $X^r \setminus Y^r$ which are used to create new unique superclusters $X^{r+1} \setminus Y^{r+1}$ (i.e., merged from at least two subclusters).
Proof The proof is analogous to that for feature-possession (see Lemma 1).

Third, $CU(X^{r+1})$ can be calculated based on $CU(X^r)$ (see Lemma 2). If there are only a few clusters in the set $X^r \setminus Y^r$, the calculation of $CU(X^{r+1})$ requires $O(1)$ time. If there are many clusters in the set $X^r \setminus Y^r$ (e.g., $|O|$), then the calculation of $CU(X^{r+1})$ requires $O(|O|)$ time. However, the more clusters there are in $X^r \setminus Y^r$, the fewer hierarchy levels there are in the hierarchy of clusters, so the time complexity does not increase. Therefore, the calculation of $CU(X^r)$ for all hierarchy levels requires $O(|O||F|)$ time. Fourth, the identification of a hierarchy level maximizing CU requires $O(|O|)$ time (since it is the maximal number of hierarchy levels). Concluding, the time complexity of the identification of the basic-level by CU can be as much as $O(|O||F|)$.

Theorem 13 The identification of the basic-level by CU requires $O(|O||F|)$ time.
And this is exactly the desired formulation given in Theorem 13.

Category Attentional Slip
Determining for original approach Category attentional slip can be used for the identification of the basic-level and it requires polynomial time (see Theorem 14).

Theorem 14 The identification of the basic-level by CAS requires $O(|O|^3 |F|)$ time.
Proof First, the calculation of all category validities requires $O(|O|^2 |F|)$ time (see Theorem 6). Second, determining the probability of the relevant test $q^r_k$ for a single cluster requires $O(|O||F|)$ time since for each feature $f$ it is checked whether its category validity is equal to 1 and, in addition, whether any supercluster has such a feature assigned. A bottom-up calculation of all probabilities of the relevant test requires $O(|O|^3 |F|)$ time since for each cluster it is checked whether any supercluster has assigned any of the $|F|$ features. However, determining the probabilities of the relevant test $q^r_k$ using a top-down approach reduces the time complexity. Determining the probability of the relevant test $q^{r_\rho}_k$ for the root requires $O(|F|)$ time. Then, determining $q^r_k$ for a hierarchy level $r$ based on hierarchy level $(r + 1)$ requires $O(|F|)$ time and it is performed $|X^r|$ times.
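The top-down determination of assigned features can be sketched as below. The assignment rule is taken from the description above (a feature is assigned to a cluster when its category validity equals 1 and no supercluster already has it); the data layout, the function name, and the example values are assumptions, and the step that turns the assigned features into the probability $q^r_k$ is deliberately left out because its formula is not given here.

```python
def assign_features_top_down(children, root, category_validity, features):
    """Assign each feature at most once on every root-to-leaf branch."""
    assigned = {}
    # Each stack entry carries the features already taken by superclusters,
    # so no cluster has to look upwards again (O(|F|) work per cluster).
    stack = [(root, frozenset())]
    while stack:
        node, taken = stack.pop()
        # Exact comparison with 1.0 assumes validities are stored exactly.
        own = {f for f in features
               if f not in taken and category_validity[node][f] == 1.0}
        assigned[node] = own
        below = taken | own
        for sub in children.get(node, []):
            stack.append((sub, below))
    return assigned


# Hypothetical hierarchy; "c2" has the single subcluster "c", i.e., the
# two clusters contain the same objects at consecutive hierarchy levels.
children = {"root": ["ab", "c2"], "ab": ["a", "b"], "c2": ["c"]}
features = ["alive", "wings", "beak", "fur"]
category_validity = {
    "root": {"alive": 1.0, "wings": 0.5, "beak": 0.25, "fur": 0.25},
    "ab": {"alive": 1.0, "wings": 1.0, "beak": 0.5, "fur": 0.0},
    "c2": {"alive": 1.0, "wings": 0.0, "beak": 0.0, "fur": 1.0},
    "a": {"alive": 1.0, "wings": 1.0, "beak": 1.0, "fur": 0.0},
    "b": {"alive": 1.0, "wings": 1.0, "beak": 0.0, "fur": 0.0},
    "c": {"alive": 1.0, "wings": 0.0, "beak": 0.0, "fur": 1.0},
}
assigned = assign_features_top_down(children, "root", category_validity, features)
```

In this example the identical clusters "c2" and "c" receive different feature sets, which is the reason why values computed for one of them cannot simply be reused for the other.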
And this is exactly the desired formulation given in Theorem 14. However, with the top-down approach, the time complexity of the identification of the basic-level by CAS can be as much as $O(|O|^2 |F|)$, since the probabilities of the relevant test are then determined in $O(|F|)$ time per cluster.

Determining for proposed approach Note that features assigned to any cluster are not included in any of its subclusters. Hence, the probabilities of the relevant test could be different, i.e., $q^r_k \neq q^{r+1}_k$, for two identical clusters $X^r_k = X^{r+1}_k$. Therefore, the identification of the basic-level cannot be optimized in the same way as for the previous basic-levelness measures. However, CAS contains a sum of series which can be calculated analytically. To calculate the value of this basic-levelness measure, it is necessary to calculate the value of the sum of series $\sum_{i=1}^{+\infty} i (p - p q_k)^{i-1}$. It can be equivalently written as $\sum_{i=1}^{+\infty} i x^{i-1}$ for $x = p - p q_k$. In addition, $x \in [0, 1)$ since $x = p(1 - q_k)$, $p \in (0, 1)$, and $q_k \in [0, 1]$. Furthermore, such a sum of series is convergent (see Lemma 3).

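Lemma 3 (Theorem 15) The sum of series $\sum_{i=1}^{+\infty} i x^{i-1}$ is convergent for $|x| < 1$.
Proof The ratio test can be used to determine whether the sum of the series is convergent. According to the ratio test, a sum of series $\sum_{i=1}^{+\infty} a_i$ is convergent when $\lim_{i \to \infty} \left| \frac{a_{i+1}}{a_i} \right| < 1$. Let us calculate the following limit: $\lim_{i \to \infty} \left| \frac{(i+1) x^{i}}{i x^{i-1}} \right| = \lim_{i \to \infty} \frac{i+1}{i} |x| = |x|$. Due to the fact that $|x| < 1$, the sum of series $\sum_{i=1}^{+\infty} i x^{i-1}$ is convergent. Knowing that the series $\sum_{i=1}^{+\infty} i x^{i-1}$ is convergent, the calculation can be simplified (see Lemma 4).

Lemma 4 (Theorem 15) For $|x| < 1$, $\sum_{i=1}^{+\infty} i x^{i-1} = \frac{1}{(1-x)^2}$.
Proof Let $W(x) = \sum_{i=1}^{+\infty} i x^{i-1} = 1 + 2x + 3x^2 + \ldots$, so that $x W(x) = x + 2x^2 + 3x^3 + \ldots$. Then, the difference between $W(x)$ and $x W(x)$ is equal to $W(x) - x W(x) = 1 + x + x^2 + \ldots$; note that it is a geometric series (and $|x| < 1$), so its sum is equal to $\frac{1}{1-x}$. Therefore, $(1 - x) W(x) = \frac{1}{1-x}$, i.e., $W(x) = \frac{1}{(1-x)^2}$. And this is exactly the desired formulation given in Lemma 4. Knowing that the calculation of the sum of series can be simplified makes it possible to optimize the calculation of CAS (see Theorem 15).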

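Theorem 15 Calculation of $CAS(X^r)$ can be simplified by replacing the sum of series $\sum_{i=1}^{+\infty} i (p - p q_k)^{i-1}$ with $\frac{1}{(1 - p + p q_k)^2}$.
Proof Note that $CAS(X^r)$ contains the sum of series $\sum_{i=1}^{+\infty} i (p - p q_k)^{i-1}$. The calculation of this sum can be performed as follows (since $x = p - p q_k \in [0, 1)$, see Lemma 4): $\sum_{i=1}^{+\infty} i (p - p q_k)^{i-1} = \frac{1}{(1 - (p - p q_k))^2} = \frac{1}{(1 - p + p q_k)^2}$. Substituting this value into $CAS(X^r)$ gives the simplified form. And this is exactly the desired formulation given in Theorem 15.

Calculation of the original version of this basic-levelness measure (for a single cluster) requires linear time, $O(n_s)$, where $n_s$ is the number of steps used for calculating the sum of series, whereas the simplified version has $O(1)$ time complexity, assuming that $q^r_k$ is already determined.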
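The difference between the two ways of evaluating the series can be checked numerically. The sketch below covers only the series $\sum_{i \geq 1} i x^{i-1}$ with $x = p - p q_k$, not the complete CAS formula; the function names and the values of `p` and `q_k` are chosen arbitrarily for illustration.

```python
def series_truncated(x, n_steps):
    """Approximate sum_{i>=1} i * x**(i-1) with n_steps terms: O(n_steps)."""
    return sum(i * x ** (i - 1) for i in range(1, n_steps + 1))


def series_closed_form(x):
    """Exact value 1 / (1 - x)**2 of the same series: O(1)."""
    if not 0.0 <= x < 1.0:
        raise ValueError("closed form requires 0 <= x < 1")
    return 1.0 / (1.0 - x) ** 2


# Hypothetical values of p and q_k.
p, q_k = 0.5, 0.25
x = p - p * q_k                     # x = p(1 - q_k) = 0.375

approx = series_truncated(x, 200)   # linear in the number of steps
exact = series_closed_form(x)       # constant time
```

With these values both results agree to within floating-point precision, while the closed form avoids the loop over `n_steps` terms.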

Category's Cue Validity with Global Threshold
Determining for original approach Category's cue validity with global threshold can be utilized for the identification of the basic-level and it requires polynomial time (see Theorem 16).
And this is exactly the desired formulation given in Theorem 16.
Determining for proposed approach The value of $CCVGT^*(X^{r+1}_k)$ depends on cue and category validities. Knowing that these values are equal to each other for identical clusters ($X^r_k = X^{r+1}_k$), the identification of the basic-level can be optimized (see Theorem 17) based on auxiliary measures calculated for unique clusters and the iterative calculation of $CCVGT(X^r)$ (see Lemma 5).

Category's Cue Validity with Feature-Possession
When the cue and category validities are already calculated, the calculation of $CCVFP^*(X^r_k)$ requires determining which features are assigned to any supercluster. A bottom-up calculation of all $CCVFP^*(X^r_k)$ requires $O(|O|^3 |F|)$ time. It results from the fact that for each cluster it is checked whether any supercluster has assigned any of the $|F|$ features. However, determining $CCVFP^*(X^r_k)$ using a top-down approach reduces the time complexity. Determining $CCVFP(X^{r_\rho})$ for the root of a hierarchy of clusters requires $O(|F|)$ time. Then, $CCVFP(X^{r_\rho - 1})$ and the subsequent lower hierarchy levels are calculated based on the features determined for the level above. And this is exactly the desired formulation given in Theorem 18. However, the usage of the top-down approach reduces the time complexity of the identification of the basic-level by CCVFP.

Determining for proposed approach The value of $CCVFP^*(X^{r+1}_k)$ depends on cue and category validities. Knowing that these values are equal to each other for identical clusters ($X^r_k = X^{r+1}_k$), the identification of the basic-level can be optimized (see Theorem 19) based on auxiliary measures calculated for unique clusters and the iterative calculation of $CCVFP(X^r)$ (see Lemma 6).
Lemma 6 (Theorem 19) $CCVFP(X^{r+1})$ can be calculated as follows: where $CCVFP^r$ is equal to $\sum_{X^r_k \in X^r \setminus Y^r} CCVFP^*(X^r_k)$. It is calculated only for the clusters $X^r \setminus Y^r$ which are used to create new unique superclusters $X^{r+1} \setminus Y^{r+1}$ (i.e., merged from at least two subclusters).
Proof The proof can be analogously performed as in the case of feature-possession (see Lemma 1).

Theorem 19 The time complexity of the identification of the basic-level by CCVFP can be as much as O(| O | 2 ) when the calculation of CCVFP(X r ) is iteratively performed.
Proof First, the calculation of cue and category validities for all unique clusters requires O(| O || F |) time (see Theorems 5,7). Second, a hierarchy of clusters contains up to (2 | O | −1) unique clusters. Hence, CCVFP * (X r k ) has to be calculated up to (2 | O | −1) times. Determining for all unique clusters which features should be taken into account can be performed using top-down approach. Namely, first features are determined for the root of a hierarchy of clusters which requires O(| F |) time. Then, it is determined for its unique subclusters. The procedure is repeated until reaching the lowest hierarchy level. It can be noted that the assignment of features And this is exactly the desired formulation given in Theorem 19.

Discussion
Let us consider the original approach to the identification of the basic-level. CAS and CCVFP require a higher time complexity than the remaining three basic-levelness measures. These basic-levelness measures determine the category-feature relation based on not only the cluster but also all its superclusters. In particular, in the case of CAS a feature can be assigned only once in a branch from a leaf to the root, while in the case of CCVFP a cluster contains features of its superclusters. Both approaches for determining feature-category relations require an additional loop through all objects, which causes an overall increase of the time complexity.
The time complexity of the identification of the basic-level is strictly related to the time complexity of the calculation of auxiliary measures. It results from the fact that each basic-levelness measure uses at least one auxiliary measure (i.e., category validity is utilized by all considered basic-levelness measures). The original approach for the calculation of category validity requires $O(|O|^2 |F|)$ time, hence it seems to be the lower bound (i.e., $\Omega(|O|^2 |F|)$) for the time complexity of the identification of the basic-level.
However, in this paper, optimizations of the calculation of auxiliary and basic-levelness measures were performed. Agglomerative clustering algorithms [26] were an inspiration for the performed optimizations, namely, agglomerative clustering algorithms utilize a proximity measure to determine which clusters should be merged. After merging clusters, only the necessary proximities are updated, since not all proximities were changed. Based on this remark, the calculation of auxiliary and basic-levelness measures was proposed. Namely, calculations are performed only for clusters which were merged since for the remaining clusters the values of auxiliary and basic-levelness measures do not change (except for CAS). In the case of CAS, if a parent cluster exhibits a specific feature, its subclusters cannot exhibit such a feature (regardless of whether they contain the same set of objects).
The performed optimizations were beneficial: the time complexity of the identification of the basic-level becomes insignificant in comparison with a more complex process, namely, the extraction of basic-level categories. Let us note that the identification of the basic-level assumes that a hierarchical structure is already defined, while approaches coping with the extraction of basic-level categories organize knowledge in a hierarchical structure, and then identify the basic-level (see [14][15][16]). It is worth emphasizing that most agglomerative clustering algorithms require at least quadratic time with respect to the number of objects (see [17,18]). Therefore, the usage of the proposed approach of identifying the basic-level by FP, CU, CCVGT, CCVFP does not have a significant impact on the overall time complexity (considering the extraction of basic-level categories) since their time complexities are lower than or equal to the time complexity required for preparing a hierarchical structure using a hierarchical clustering technique. However, the usage of CAS (even with the proposed approach) might increase the overall time complexity.

Summary
This paper concerned the categorization performed by an artificial cognitive system, focusing on a subtask of the process of extracting basic-level categories, namely the identification of the basic-level in an already defined hierarchical structure. The identification of the basic-level utilizes measures proposed by psycholinguists (called basic-levelness measures) which were analytically studied in this paper.
The asymptotic time complexity of the identification of the basic-level was determined for 5 basic-levelness measures, namely: (a) feature-possession, (b) category utility, (c) category attentional slip, (d) category's cue validity with global threshold, and (e) category's cue validity with feature-possession. Summarized findings are presented in Table 1.
The identification of the basic-level by the three basic-levelness measures FP, CU, and CCVGT requires $O(|O|^2 |F|)$ time since these measures refer only to one hierarchy level, while the identification of the basic-level by CAS and CCVFP requires $O(|O|^3 |F|)$ time since these measures refer to higher hierarchy levels too.
However, the identification of the basic-level was optimized in this paper. The performed optimizations were beneficial: for 4 basic-levelness measures (i.e., all except CAS), the time complexity of the identification of the basic-level is not significant in comparison with a more complex process, namely, the extraction of basic-level categories.

Conflict of interest
The author declares that he has no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.