Co-occurrence pattern mining based on a biological approximation scoring matrix

Guo, Dan; Yuan, Ermao; Hu, Xuegang; Wu, Xindong

doi:10.1007/s10044-017-0609-8


Theoretical Advances
Published: 28 February 2017

Volume 21, pages 977–996, (2018)

Pattern Analysis and Applications

Dan Guo¹, Ermao Yuan¹, Xuegang Hu¹ & Xindong Wu²


Abstract

Mining co-occurrence frequent patterns from multiple sequences is a hot topic in bioinformatics. Many seemingly disorganized constituents appear repeatedly under different biological scoring matrices, such as PAM250 and BLOSUM62; these constituents are regarded as hidden frequent patterns (FPs). A hidden FP that involves both gaps and flexible approximation operations (replacement, deletion or insertion) makes its true occurrences harder to discover. To discover co-occurrence FPs (Co-FPs) effectively under these conditions, we design a mining algorithm, co-fp-miner, with the following steps: (1) a biological approximation scoring matrix is designed to discover the various deformations of a single FP; (2) a data-driven intersection tactic generates candidate Co-FPs; (3) a deterministic Apriori-like rule prunes unnecessary Co-FPs; and (4) a backtracking matching scheme validates the true Co-FPs. The co-fp-miner algorithm is a unified framework for both exact and approximate mining on multiple sequences. Experiments on DNA and protein sequences demonstrate that co-fp-miner outperforms its peers in terms of solutions, running time and memory consumption.


Notes

The paper follows this deletion rule: when a sub-pattern of pattern P is deleted, its preceding gap usually no longer matters for preserving the constraints, so the gap is deleted with no operation cost. If the first sub-pattern is deleted, its following gap becomes meaningless instead and is likewise discarded with no operation cost. Deleting the sub-pattern itself is counted as an approximation cost.

References

[1] Han J, Cheng H, Xin D, Yan X (2007) Frequent pattern mining: current status and future directions. Data Min Knowl Discov 15:55–86
[2] Chen G, Wu XD, Zhu XQ, Arslan AN, He Y (2006) Efficient string matching with wildcards and length constraints. Knowl Inf Syst 10:399–419
[3] Ding B, Lo D, Han J, Khoo S (2005) Efficient mining of closed repetitive gapped subsequences from a sequence database. In: IEEE 25th international conference on data engineering, pp 1024–1035
[4] Xie F, Wu XD, Hu XG, Gao J, Guo D, Fei Y, Hua E (2010) Sequential pattern mining with wildcards. In: 22nd IEEE international conference on tools with artificial intelligence, pp 241–247
[5] Yang QX, Yuan SS, Zhao L et al (2003) Faster algorithm of string comparison. Pattern Anal Appl 6(2):122–133
[6] Chen YC, Weng JTY, Hui LA (2016) A novel algorithm for mining closed temporal patterns from interval-based data. Knowl Inf Syst 46(1):151–183
[7] Silva A, Antunes C (2016) Constrained pattern mining in the new era. Knowl Inf Syst 47(3):489–516
[8] Oates T, Cohen PR (1996) Searching for structure in multiple streams of data. In: Proceedings of the 13th international conference on machine learning, pp 346–354
[9] Notredame C (2002) Recent progress in multiple sequence alignment: a survey. Pharmacogenomics 3(1):131–144
[10] Mathkour H, Ahmad M (2009) A pattern matching technique for multiple sequences alignment with GAP consideration. In: International conference on signal acquisition and processing, pp 123–127
[11] Yao D, Jiang M, You X et al (2015) An algorithm of multiple sequence alignment based on consensus sequence searched by simulated annealing and star alignment. In: International symposium on bioelectronics and bioinformatics, pp 3–6
[12] Ni B, Wong MH, Lam CFD et al (2014) Applying Agrep to r-NSA to solve multiple sequences approximate matching. Int J Data Min Bioinform 9(4):358–385
[13] Kouzinopoulos CS, Michailidis PD, Margaritis KG (2011) Experimental results on multiple pattern matching algorithms for biological sequences. Bioinformatics 274–277
[14] Li Y, Patel JM, Terrell A (2012) WHAM: a high-throughput sequence alignment method. ACM Trans Database Syst 37(4):28
[15] Besharati A et al (2014) Multiple sequence alignment using biological features classification. In: International congress on technology, communication and knowledge, pp 1–5
[16] Zhan Q, Ye Y, Lam TW et al (2015) Improving multiple sequence alignment by using better guide trees. BMC Bioinform 16(5):1
[17] Altschul SF, Gish W, Miller W et al (1990) Basic local alignment search tool. J Mol Biol 215(3):403–410
[18] Han J, Pei J, Mortazavi-Asl B, Chen Q, Dayal U, Hsu MC (2000) FreeSpan: frequent pattern-projected sequential pattern mining. In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining, pp 355–359
[19] He D, Zhu XQ, Wu XD (2011) Mining approximate repeating patterns from sequence data with gap constraints. Comput Intell 27(3):336–362
[20] Boeva V, Regnier M, Papatsenko D et al (2006) Short fuzzy tandem repeats in genomic sequences, identification, and possible role in regulation of gene expression. Bioinformatics 22(6):676–684
[21] Navarro G, Raffinot M (2002) Flexible pattern matching in strings: practical on-line search algorithms for texts and biological sequences. Cambridge University Press, Cambridge
[22] Zhang M, Kao B, Cheung DW et al (2007) Mining periodic patterns with gap requirement from sequences. ACM Trans Knowl Discov Data 1(2):7
[23] Bille P, Gortz I, Vildhoj H, Wind D (2012) String matching with variable length gaps. Theor Comput Sci 443:25–34
[24] Zhang JY, Yang CH (2013) Pattern matching with wildcard gaps based on cross list. In: Proceedings of the 6th international symposium on computational intelligence and design, pp 154–156
[25] Pasquier C, Sanhes J, Flouvat F et al (2016) Frequent pattern mining in attributed trees: algorithms and applications. Knowl Inf Syst 46(3):491–514
[26] Wang JZ, Huang JL, Chen YC (2016) On efficiently mining high utility sequential patterns. Knowl Inf Syst 49(2):597–627
[27] Gouda K, Zaki M (2001) Efficiently mining maximal frequent itemsets. In: Proceedings of the IEEE international conference on data mining (ICDM), pp 163–170
[28] Hong XL, Wu XD, Hu XG, Liu YL, Gao J, Wu GQ (2009) BPBM: an algorithm for string matching with wildcards and length constraints. In: International conference on rough sets, fuzzy sets, data mining and granular computing, pp 518–525
[29] Hu H, Wang H, Li J et al (2016) An efficient pruning strategy for approximate string matching over suffix tree. Knowl Inf Syst 49(1):121–141
[30] Kum HC, Pei J, Wang W et al (2003) ApproxMAP: approximate mining of consensus sequential patterns. In: Proceedings of the 2003 SIAM international conference on data mining. Society for Industrial and Applied Mathematics, pp 311–315
[31] Chen C, Yan X, Zhu F et al (2007) gApprox: mining frequent approximate patterns from a massive network. In: Seventh IEEE international conference on data mining. IEEE, pp 445–450
[32] Manber U, Baeza-Yates R (1991) An algorithm for string matching with a sequence of don’t cares. Inf Process Lett 37(3):133–136
[33] Huang CW, Lee WS, Hsieh SY (2011) An improved heuristic algorithm for finding motif signals in DNA sequences. IEEE/ACM Trans Comput Biol Bioinform 8(4):959–975
[34] Machanick P, Bailey TL (2011) MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics 27(12):1696–1697
[35] Felicioli C, Marangoni R (2012) BpMatch: an efficient algorithm for a segmental analysis of genomic sequences. IEEE/ACM Trans Comput Biol Bioinform 9(4):1120–1127
[36] Wong AK, Lee ESA (2014) Aligning and clustering patterns to reveal the protein functionality of sequences. IEEE/ACM Trans Comput Biol Bioinform 11(3):548–560
[37] Freire JM, Dias SA, Flores L, Veiga AS, Castanho MA (2015) Mining viral proteins for antimicrobial and cell-penetrating drug delivery peptides. Bioinformatics 31(14):2252–2256
[38] Vijaya PA, Murty MN, Subramanian DK (2006) Efficient median based clustering and classification techniques for protein sequences. Pattern Anal Appl 9(2):243–255
[39] Floratou A, Tata S, Patel JM (2011) Efficient and accurate discovery of patterns in sequence data sets. IEEE Trans Knowl Data Eng 23(8):1154–1168
[40] Wang K, Xu Y, Yu JX (2004) Scalable sequential pattern mining for biological sequences. In: Proceedings of the thirteenth ACM international conference on information and knowledge management. ACM, pp 178–187
[41] Zhang J, Wang Y, Zhang C et al (2016) Mining contiguous sequential generators in biological sequences. IEEE/ACM Trans Comput Biol Bioinform 13(5):855–867
[42] Durian B, Holub J, Peltola H, Tarhio J (2009) Tuning BNDM with q-grams. In: Proceedings of the meeting on algorithm engineering and experiments, pp 29–37
[43] Prasad R, Agarwal S (2007) Optimal shift-or string matching algorithm for multiple patterns. In: Proceedings of the international conference on computer science and applications, pp 263–266
[44] Kandhan R, Teletia N, Patel JM (2010) SigMatch: fast and scalable multi-pattern matching. Proc VLDB Endow 3(1–2):1173–1184
[45] Wang XD, Liu JX, Xu Y et al (2015) A survey of multiple sequence alignment techniques. In: International conference on intelligent computing. Springer International Publishing, pp 529–538
[46] Prasad R, Agarwal S, Yadav I et al (2010) A fast bit-parallel multi-patterns string matching algorithm for biological sequences. In: Proceedings of the international symposium on biocomputing, pp 46
[47] Zhu H, He Z, Jia Y (2015) A novel approach to multiple sequence alignment using multi-objective evolutionary algorithm based on decomposition. IEEE J Biomed Health Inform 20(2):717–727
[48] https://www.cs.us.es/~fran/students/julian/index.html
[49] Research Collaboratory for Structural Bioinformatics (RCSB): Protein Data Bank. http://www.rcsb.org/pdb/home/home.do
[50] http://www.ncbi.nlm.nih.gov
[51] GenBank, yeast (Saccharomyces cerevisiae). http://www.ncbi.nlm.nih.gov/genbank
[52] Nature Reviews Microbiology Article (2006) Dataset. http://www.psort.org/dataset/


Author information

Authors and Affiliations

School of Computer and Information, Hefei University of Technology, Hefei, China
Dan Guo, Ermao Yuan & Xuegang Hu
School of Computing and Informatics, University of Louisiana, Lafayette, LA, USA
Xindong Wu


Corresponding author

Correspondence to Dan Guo.

Additional information

This work was supported by the National Natural Science Foundation of China (NSFC) under Grants 61305062 and 61229301, and National 973 Program of China under Grant 2013CB329604.

Appendices

Appendix 1: Measurement of the large number $N_{l,op_{max}}$

When $op_{max}=0$, Zhang et al. [22] derived the following upper bound on the number of exact occurrences, $N_{l,op_{max}}=N_l$:

$$N_l=\left[ L-(l-1)\left(\frac{M+N}{2}+1\right)\right] W^{l-1}\tag{5}$$

where l is the length of pattern P, L is the length of the subject sequence S, and $W=M-N+1$ is the number of admissible lengths of gap $\phi ^M_N$, which matches between N and M wildcards. Formula (5) requires the maximal span of the pattern, $L_{P_{Max}}=l+(l-1)M$, to be smaller than L. When $L_{P_{Max}}+op_{max} \ll L$, we extend $N_l$ to an upper bound $N_{l,op_{max}}$ on the number of approximate occurrences, where $op_{max}$ is the operation threshold defined in Definition 3 in Sect. 3.
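Formula (5) is easy to evaluate directly. The sketch below is a minimal illustration, assuming a single uniform gap $\phi^M_N$ on every position; the function name `n_exact` and the parameter names `gap_min`/`gap_max` (for N and M) are hypothetical and not taken from the paper. It also rejects inputs that violate the span condition $l+(l-1)M<L$.

```python
def n_exact(L: int, l: int, gap_min: int, gap_max: int) -> float:
    """Formula (5): upper bound N_l on the exact occurrences of a length-l
    pattern whose l-1 gaps are all phi^M_N (M = gap_max, N = gap_min)
    in a subject sequence of length L."""
    max_span = l + (l - 1) * gap_max          # L_{P_Max}
    if max_span >= L:
        raise ValueError("formula (5) requires l + (l - 1) * M < L")
    W = gap_max - gap_min + 1                 # number of admissible gap lengths
    avg_gap = (gap_max + gap_min) / 2         # (M + N) / 2
    return (L - (l - 1) * (avg_gap + 1)) * W ** (l - 1)


print(n_exact(L=1000, l=3, gap_min=0, gap_max=2))  # 8964.0
```

In the example, W = 3 and the start-position term is 1000 - 2 * (1 + 1) = 996, so the bound is 996 * 3^2.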

When all gaps of P are the same gap $\phi ^M_N$, we consider three cases:

(1) Number by exact matching and replacement $(N_{E/R})$:

Following the conclusion of Zhang et al. [22], $N_l$ upper-bounds the number of distinct length-l offset queues with gap $\phi ^M_N$ in sequence S. An occurrence obtained by exact matching or replacement still has length l and the same gap $\phi ^M_N$. Hence, $N_{E/R}=N_l$.

(2) Incremental number by insertion ($N_I$):

By Definition 3 in Sect. 3, an inserted position does not appear in any occurrence, so the real occurrence still has length l, but its gap changes. Some occurrences produced by insertion overlap those already counted in $N_{E/R}$, and some do not. Counting the increment is therefore equivalent to counting the length-l offset matching-position queues whose gaps exceed the gap's upper limit M.

Suppose i insertion operations are applied at j distinct positions of an occurrence (occ); then $j\le i \le op_{max}$.

First, the number of ways to insert i wildcards into an occ is $\sum ^i_{j=1}Y_{i,j}\cdot C^j_{l-1}$, where $Y_{i,j}$ is the number of ways to divide the i wildcards into j non-empty groups, and $C^j_{l-1}$ is the number of ways to select the j receiving gaps from P's l-1 gaps. $Y_{i,j}$ counts the compositions of i into j positive parts, so by the stars-and-bars argument $Y_{i,j}=C^{j-1}_{i-1}$, which is the jth entry of the ith row of Pascal's triangle (both counted from 1).
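The closed form for $Y_{i,j}$ can be checked by brute force. The sketch below (helper names `y_brute` and `insertion_ways` are hypothetical) enumerates every tuple of non-empty group sizes and compares the count with $C^{j-1}_{i-1}$, then evaluates the insertion count $\sum_j Y_{i,j}\cdot C^j_{l-1}$.

```python
from itertools import product
from math import comb


def y_brute(i: int, j: int) -> int:
    """Y_{i,j}: ways to divide i wildcards into j non-empty groups,
    counted by enumerating every tuple of group sizes."""
    return sum(1 for sizes in product(range(1, i + 1), repeat=j)
               if sum(sizes) == i)


def insertion_ways(i: int, l: int) -> int:
    """Ways to insert i wildcards into an occ of a length-l pattern:
    sum over j of Y_{i,j} * C(l-1, j), using Y_{i,j} = C(i-1, j-1)."""
    return sum(comb(i - 1, j - 1) * comb(l - 1, j) for j in range(1, i + 1))


# Check the closed form Y_{i,j} = C(i-1, j-1) for small i.
for i in range(1, 7):
    for j in range(1, i + 1):
        assert y_brute(i, j) == comb(i - 1, j - 1)

print(insertion_ways(2, 4))  # 6: both wildcards in one of 3 gaps (3), or in 2 of them (3)
```

As a cross-check, Vandermonde's identity collapses the sum to $C^{i}_{l+i-2}$, the number of ways to choose i gaps from the l-1 gaps with repetition allowed.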

Second, with the j positions fixed, we bound how many occurrences can be selected and extended into new occurrences. A selected occurrence must have at least one gap (possibly more) at the upper limit M. Fixing P's j chosen gaps at M leaves the remaining $l-1-j$ gaps flexible. Following Zhang et al.'s conclusion [22], the number of selected occurrences is therefore no more than $N_{l-j}$, the bound for a length-$(l-j)$ pattern with $l-j-1$ flexible gaps. Thus, for i insertion operations, the increment is at most $\sum ^{i}_{j=1}Y_{i,j}\cdot C^j_{l-1}\cdot N_{l-j}$.

Therefore, summing over all insertion counts allowed by the virtual operation threshold ($1\le i\le op_{max}$) gives $N_{I}=\sum ^{op_{max}}_{i=1}\sum ^i_{j=1}\left[ Y_{i,j}\cdot C^j_{l-1}\cdot N_{l-j}\right] =\sum ^{op_{max}}_{i=1}\sum ^i_{j=1}\left[ C_{i-1}^{j-1}\cdot C^j_{l-1}\cdot N_{l-j}\right]$.
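Expanding the double sum for a small threshold makes its structure concrete. For example, with $op_{max}=2$ the index pairs $(i,j)=(1,1),(2,1),(2,2)$ give:

$$\begin{aligned} N_I\big|_{op_{max}=2} &= C^{0}_{0}C^{1}_{l-1}N_{l-1}+C^{0}_{1}C^{1}_{l-1}N_{l-1}+C^{1}_{1}C^{2}_{l-1}N_{l-2}\\ &= 2(l-1)N_{l-1}+\frac{(l-1)(l-2)}{2}N_{l-2}. \end{aligned}$$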

(3) Incremental number by deletion ($N_D$):

The deletion operation is divided into two situations as follows:

$\textcircled {\small {1}}$ Delete sub-patterns. Under $op_{max}$, an incremental occurrence of pattern P can have length $l-op_{max}$, $l-op_{max}+1$, $\ldots$, or $l-1$, with the same gap $\phi ^{M}_N$. Applying Zhang et al.'s conclusion [22] to each of these pattern lengths, the number of new occurrences is no more than $\sum ^{l-1}_{l'=l-op_{max}}N_{l'}=\sum ^{op_{max}}_{i=1}N_{l-i}$.

$\textcircled {\small {2}}$ Delete wildcards within gaps. If a deletion operation is applied to a gap, the occurrence length is still l, but the gap changes. This case mirrors the incremental number by insertion, except that the gaps shrink: a selected occurrence must have at least one gap (possibly more) at the lower limit N. The total number of new occurrences is no more than $\sum ^{op_{max}}_{i=1}\sum ^{i}_{j=1}\left[ C^{j-1}_{i-1}\cdot C^j_{l-1}\cdot N_{l-j}\right]$.

Conclusion:

The unified large number is $N_{l,op_{max}}=N_{E/R}+N_I+N_D$. When $op_{max}=0$, the task reduces to exact mining and we set $N_I=N_D=0$. Therefore, $N_{l,op_{max}}$ is given by

$$\begin{aligned}&N_{l,op_{max}}=\begin{cases}N_l, & \text{if } op_{max}=0,\\ N_l+\sum\limits^{op_{max}}_{i=1}N_{l-i}+2\sum\limits^{op_{max}}_{i=1}\sum\limits^{i}_{j=1}\left[ C^{j-1}_{i-1}\cdot C^{j}_{l-1}\cdot N_{l-j}\right], & \text{otherwise},\end{cases}\\ &\text{s.t.}\quad j\le i\le op_{max};\quad N_{l-i}=0\ \text{whenever}\ l-i\le 0.\end{aligned}\tag{6}$$

By convention, $N_{l-i}=0$ whenever $l-i\le 0$.
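The sketch below evaluates formula (6) for a uniform gap as $N_{E/R}+N_I+N_D$, one term per case above, so each contribution can be inspected separately. It is illustrative only; `n_exact`, `n_approx`, `gap_min` and `gap_max` are hypothetical names, and the span condition of formula (5) is assumed to hold rather than checked.

```python
from math import comb


def n_exact(L: int, l: int, gap_min: int, gap_max: int) -> float:
    """Formula (5); by the stipulated rule, N_{l'} = 0 when l' <= 0."""
    if l <= 0:
        return 0.0
    W = gap_max - gap_min + 1
    return (L - (l - 1) * ((gap_max + gap_min) / 2 + 1)) * W ** (l - 1)


def n_approx(L: int, l: int, gap_min: int, gap_max: int, op_max: int) -> float:
    """Formula (6): N_{l,op_max} = N_{E/R} + N_I + N_D for a uniform gap."""
    n_er = n_exact(L, l, gap_min, gap_max)            # exact matching + replacement
    if op_max == 0:
        return n_er                                    # exact mining: N_I = N_D = 0
    # N_I: i insertions spread over j gaps -> C(i-1, j-1) * C(l-1, j) * N_{l-j}
    n_ins = sum(comb(i - 1, j - 1) * comb(l - 1, j) * n_exact(L, l - j, gap_min, gap_max)
                for i in range(1, op_max + 1) for j in range(1, i + 1))
    # N_D: deleted sub-patterns (sum of N_{l-i}) plus wildcard deletions,
    # whose bound has the same form as the insertion term
    n_del = sum(n_exact(L, l - i, gap_min, gap_max) for i in range(1, op_max + 1)) + n_ins
    return n_er + n_ins + n_del


print(n_approx(L=100, l=3, gap_min=0, gap_max=1, op_max=1))  # 1373.0
```

Here $N_3=388$, $N_2=197$ and the double sum is $2\cdot 197=394$, so the total is $388+197+2\cdot 394=1373$.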

For different gaps $\{\phi ^{M_i}_{N_i}\}$ $(1\le i<l)$: if we set $W=\max_{1\le i\le l-1}\{M_i-N_i+1\}$, then $N_{l,op_{max}}$ still satisfies formula (6). Formula (7) states it more precisely:

$$\begin{aligned}&N_{l,op_{max}}=\begin{cases}N_l, & \text{if } op_{max}=0,\\ N_l+\sum\limits^{op_{max}}_{i=1}N_{l-i}+2\sum\limits^{op_{max}}_{i=1}\sum\limits^{i}_{j=1}\left[ C^{j-1}_{i-1}\cdot C^{j}_{l-1}\cdot N_{l-j}\right], & \text{otherwise},\end{cases}\\ &\text{s.t.}\quad N_l=\left[ L-(l-1)\left( \frac{M_{max}+N_{max}}{2}+1\right) \right] W^{l-1};\quad j\le i\le op_{max};\quad N_{l-i}=0\ \text{whenever}\ l-i\le 0.\end{aligned}\tag{7}$$

where $M_{max}$ and $N_{max}$ denote the maximum and minimum local length of gaps, respectively.

Appendix 2: Theorem proof

Lemma 1

Given a length-n pattern $P=p_1\phi ^{M_1}_{N_1}p_2\ldots \phi ^{M_{n-1}}_{N_{n-1}}p_n$ and any of its length-l sub-patterns $Q=p_k\phi ^{M_{k}}_{N_{k}}p_{k+1}\phi ^{M_{k+1}}_{N_{k+1}}\ldots p_{k+l-1}$ $(1\le k\le n-l+1,\ 0<l<n)$, we have $sup(Q)\ge sup(P)/w^*$, where

$$w^*=\begin{cases}\prod _{i\in [1,k]\cup [k+l,n-1]}(M_i-N_i+1), & \text{if } op_{max}=0,\\ \prod _{i\in [1,k]\cup [k+l,n-1]}(M_i-N_i+2\times op_{max}+3), & \text{otherwise}.\end{cases}\tag{8}$$

Proof

As shown in Fig. 5 in Sect. 4, ${\mathcal {M}}_{i,j}$ can be traced back to ${\mathcal {M}}_{i-1, 0}$, ${\mathcal {M}}_{i-1,s}$, ${\mathcal {M}}_{i-1,s'}$, ${\mathcal {M}}_{i-1,s''}$, ${\mathcal {M}}_{i,j-1}$, and ${\mathcal {M}}_{i-1,j}$.

(1) The paths from ${\mathcal {M}}_{i-1,0}$, ${\mathcal {M}}_{i-1,s}$, ${\mathcal {M}}_{i-1,s'}$, and ${\mathcal {M}}_{i-1,s''}$ are replacement paths. Occs (an occ is an occurrence of pattern P in sequence S) generated along replacement paths do not overlap one another.
(2) The insertion path from ${\mathcal {M}}_{i,j-1}$ to ${\mathcal {M}}_{i,j}$ is neglected, because an insertion after $p_i$ is a virtual operation: by Definition 3 in Sect. 3, any occ produced along this path can overlap an occ at the (j-1)th position of sequence S.
(3) Finally, the path from ${\mathcal {M}}_{i-1,j}$ to ${\mathcal {M}}_{i,j}$ is a deletion path. Deletion is also a virtual operation, but it changes the lengths of the occs it generates, so these occs do not overlap any other occ.

Therefore, under the virtual operation threshold $op_{max}$, sub-pattern $P_{k\ldots i-1}$ can be expanded to sub-pattern $P_{k\ldots i}$ in no more than $w'=\left[ (j-N_{i-1}-1+op_{max})-(j-M_{i-1}-1-op_{max})+1\right] +1+1=M_{i-1}-N_{i-1}+2op_{max}+3$ ways, where $P_{k\ldots i}= p_k\phi ^{M_{k}}_{N_{k}}p_{k+1}\phi ^{M_{k+1}}_{N_{k+1}}\ldots p_i$.

By the same argument, converting Q into P requires expanding Q by $\{ p_1, p_2, \ldots , p_{k-1} \}$ on the left and by $\{ p_{k+l}, \ldots , p_n \}$ on the right, one character at a time. Therefore, $sup(Q)\ge sup(P)/w^*$. $\square$

For exact mining ($op_{max}=0$), the deletion and insertion paths are discarded, so $w'= \left[ (j-N_{i-1}-1)-(j-M_{i-1}-1)+1\right] =M_{i-1}-N_{i-1}+1$.

Lemma 1’

For a uniform gap $\phi ^M_N$, Lemma 1 becomes $sup(Q) \ge sup(P)/w^{n-l}$, where

$$w=\begin{cases}W=M-N+1, & \text{if } op_{max}=0,\\ M-N+2\times op_{max}+3, & \text{otherwise}.\end{cases}\tag{9}$$

Proof

For a uniform gap $\phi ^M_N$, each of the $n-l$ factors of $w^*$ equals w, so $w^*=w^{n-l}$. Lemma 1’ is thus a special case of Lemma 1. $\square$
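For a uniform gap, the bound in Lemma 1’ reduces to a one-line computation. The sketch below reads the second case of formula (9) as $M-N+2\times op_{max}+3$, consistent with Lemma 1; the names `gap_factor` and `sup_lower_bound` are hypothetical helpers for illustration.

```python
def gap_factor(gap_min: int, gap_max: int, op_max: int) -> int:
    """w in formula (9) for a uniform gap phi^M_N (M = gap_max, N = gap_min)."""
    if op_max == 0:
        return gap_max - gap_min + 1                  # W = M - N + 1
    return gap_max - gap_min + 2 * op_max + 3         # M - N + 2*op_max + 3


def sup_lower_bound(sup_p: float, n: int, l: int,
                    gap_min: int, gap_max: int, op_max: int) -> float:
    """Lemma 1': sup(Q) >= sup(P) / w^(n-l) for a length-l sub-pattern Q
    of a length-n pattern P."""
    return sup_p / gap_factor(gap_min, gap_max, op_max) ** (n - l)


print(sup_lower_bound(sup_p=90, n=5, l=3, gap_min=0, gap_max=2, op_max=0))  # 10.0
```

With $\phi^2_0$ and exact mining, w = 3, so any length-3 sub-pattern of a length-5 pattern with support 90 has support at least 90 / 9.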

Theorem 1

(Apriori-like) Given the length-n pattern P, its length-l sub-pattern Q and $w^*$ as in Lemma 1, if Q satisfies $Freq(Q)<\eta _{n-l,op_{max}}\times \rho$, then P is not a frequent pattern, where $\eta _{n-l,op_{max}}=\displaystyle \frac{N_{n,op_{max}}}{N_{l,op_{max}}\times w^*}.$

Proof

We argue by contradiction.

Suppose P is a frequent pattern. By Lemma 1, $\rho \le Freq(P)=\displaystyle \frac{sup(P)}{N_{n,op_{max}}} \le \displaystyle \frac{sup(Q)\times w^*}{N_{n,op_{max}}}$. Thus, $sup(Q)\ge N_{n,op_{max}}\times \rho /w^*$.

Then, $Freq(Q)=\displaystyle \frac{sup(Q)}{N_{l,op_{max}}}\ge \displaystyle \frac{N_{n,op_{max}}\times \rho }{N_{l,op_{max}}\times w^*}=\eta _{n-l,op_{max}}\times \rho$, which contradicts the hypothesis $Freq(Q)<\eta _{n-l,op_{max}}\times \rho$.

Hence, P is not a frequent pattern. $\square$
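Theorem 1 turns into a constant-time pruning test once the bounds are known. The sketch below reads the denominator of $\eta_{n-l,op_{max}}$ as $N_{l,op_{max}}\times w^*$, as in the proof; the function names are hypothetical, and the precomputed values of $N_{n,op_{max}}$, $N_{l,op_{max}}$ and $w^*$ are assumed inputs rather than part of the paper's interface.

```python
def eta(n_n_op: float, n_l_op: float, w_star: float) -> float:
    """eta_{n-l,op_max} = N_{n,op_max} / (N_{l,op_max} * w*) from Theorem 1."""
    return n_n_op / (n_l_op * w_star)


def theorem1_prunes(sup_q: float, n_n_op: float, n_l_op: float,
                    w_star: float, rho: float) -> bool:
    """True if Freq(Q) = sup(Q) / N_{l,op_max} < eta * rho, in which case
    the length-n pattern P containing Q cannot reach frequency rho."""
    freq_q = sup_q / n_l_op
    return freq_q < eta(n_n_op, n_l_op, w_star) * rho


print(theorem1_prunes(sup_q=10, n_n_op=1000, n_l_op=200, w_star=5, rho=0.1))  # True
print(theorem1_prunes(sup_q=30, n_n_op=1000, n_l_op=200, w_star=5, rho=0.1))  # False
```

Equivalently, P is pruned whenever $sup(Q)<N_{n,op_{max}}\times\rho/w^*$; with the numbers above that threshold is 20.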


About this article

Cite this article

Guo, D., Yuan, E., Hu, X. et al. Co-occurrence pattern mining based on a biological approximation scoring matrix. Pattern Anal Applic 21, 977–996 (2018). https://doi.org/10.1007/s10044-017-0609-8


Received: 07 September 2016
Accepted: 06 February 2017
Published: 28 February 2017
Issue Date: November 2018
DOI: https://doi.org/10.1007/s10044-017-0609-8
