The pattern frequency distribution theory: a mathematic establishment toward rational and reliable pattern mining

  • Regular Paper
  • International Journal of Data Science and Analytics

Abstract

In big data science, classic frequent pattern mining is fundamental to a wide range of pattern mining applications. Extensive research over nearly 30 years has nevertheless failed to produce a reliable mining approach. One of the main reasons is the lack of study of the underlying pattern frequency distribution theory. With an emphasis on mining reliability and methodological change, this paper supplies the missing theory, which consists of a set of findings on the properties of the frequency distribution. The primary property is that the frequency distribution curves from different pattern generation modes are quasi-concave and ultimately become bell-shaped over large datasets. All the findings are derived with no exogenous input, only rigorous mathematical proofs, and every classic pattern mining approach should observe them. The paper thus lays a solid block of the theoretical foundation for rational and ultimately reliable pattern mining. Moreover, the findings prompt rethinking and new conceptions not only in pattern mining but also in set theory and combinatorics. Given this and the purely mathematical nature of the explorations presented, the contributions of this study may extend beyond pattern mining to data science in general, or even more broadly.

Notes

  1. A real number sequence \((s_n)\) is a Cauchy sequence if \(\forall \epsilon > 0, \exists t \in \mathbb{N}\) such that \(|s_n - s_m| < \epsilon\) whenever \(m, n \ge t\). The Cauchy criterion states that every Cauchy sequence of real numbers is convergent and, conversely, every convergent sequence of real numbers is a Cauchy sequence [59,60,61].

References

  1. Fard, M.J.S., Namin, P.A.: Review of apriori based frequent itemset mining solutions on big data. In: 6th International Conference on Web Research (ICWR), pp. 157–164 (2020). https://doi.org/10.1109/ICWR49608.2020.9122295

  2. Gupta, M.K., Chandra, P.: A comprehensive survey of data mining. Int. J. Inf. Technol. 12, 1243–1257 (2020). https://doi.org/10.1007/s41870-020-00427-7

  3. Alangari, N., Alturki, R.: Association rule mining in higher education: A case study of computer science students. In: Mehmood, R., See, S., Katib, I., Chlamtac, I. (eds.) Smart Infrastructure and Applications (2020). Springer, Cham. https://doi.org/10.1007/978-3-030-13705-2_13

  4. Liu, Y., Man, Y., Cui, J.: Research on alarm causality filtering based on association mining. In: Zu, Q., Tang, Y., Mladenović, V. (eds.) Human Centered Computing. HCC 2020. Lecture Notes in Computer Science, vol. 12634 (2021). Springer, Cham. https://doi.org/10.1007/978-3-030-70626-5_47

  5. Zhao, S.: Mining medical causality for diagnosis assistance. In: WSDM \(^{\prime }17\): Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, p. 841 (2017). https://doi.org/10.1145/3018661.3022752

  6. Wang, T., Tian, X., Yu, M., et al.: Stage division and pattern discovery of complex patient care processes. J. Syst. Sci. Complex. 30, 1136–1159 (2017). https://doi.org/10.1007/s11424-017-5302-x

  7. Tóth, K., Kósa, I., Vathy-Fogarassy, A.: Frequent treatment sequence mining from medical databases. Stud. Health Technol. Inform. 236, 211–218 (2017). https://doi.org/10.3233/978-1-61499-759-7-211

  8. Malik, M.M., Abdallah, S., Ala’raj, M.: Data mining and predictive analytics applications for the delivery of healthcare services: a systematic literature review. Ann. Oper. Res. 270, 287–312 (2018). https://doi.org/10.1007/s10479-016-2393-z

  9. Lakshmanna, K., Khare, N.: Mining DNA sequence patterns with constraints using hybridization of firefly and group search optimization. J. Intell. Syst. 27(3), 349–362 (2018). https://doi.org/10.1515/jisys-2016-0111

  10. Wang, Q., Davis, D.N., Ren, J.: Mining frequent biological sequences based on bitmap without candidate sequence generation. Comput. Biol. Med. 69, 152–157 (2016). https://doi.org/10.1016/j.compbiomed.2015.12.016

  11. Medina-Franco, J.L., Sánchez-Cruz, N., López-López, E., et al.: Progress on open chemoinformatic tools for expanding and exploring the chemical space. J. Comput. Aided Mol. Des. (2021). https://doi.org/10.1007/s10822-021-00399-1

  12. Carrera, G.V.S.M., da Ponte, M.N., Rebelo, L.P.N.: Cover feature: chemoinformatic approaches to predict the viscosities of ionic liquids and ionic liquid-containing systems. ChemPhysChem 20(21), 2720–2720 (2019). https://doi.org/10.1002/cphc.201900978

  13. Peña-Guerrero, J., Nguewa, P.A., García-Sosa, A.T.: Machine learning, artificial intelligence, and data science breaking into drug design and neglected diseases. WIREs Comput. Mol. Sci. 11(5), e1513 (2021). https://doi.org/10.1002/wcms.1513

  14. Hoadley, K.A., Yau, C., Hinoue, T., et al.: Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer. Cell 173(2), 291-304.e6 (2018). https://doi.org/10.1016/j.cell.2018.03.022

  15. Schrider, D.R., Kern, A.D.: Supervised machine learning for population genetics: a new paradigm. Trends Genet. 34(4), 301–12 (2018). https://doi.org/10.1016/j.tig.2017.12.005

  16. Wilson, C.M., Li, K., Yu, X., et al.: Multiple-kernel learning for genomic data mining and prediction. BMC Bioinform. 20, 426 (2019). https://doi.org/10.1186/s12859-019-2992-1

  17. Grzenda, M., Gomes, H.M., Bifet, A.: Delayed labelling evaluation for data streams. Data Min. Knowl. Disc. 34(5), 1237–1266 (2019). https://doi.org/10.1007/s10618-019-00654-y

  18. Kawabata, K., Matsubara, Y., Sakurai, Y.: Automatic sequential pattern mining in data streams. In: CIKM \(^{\prime }19\): Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 1733–1742 (2019). https://doi.org/10.1145/3357384.3358002

  19. Bhogadhi, V., Chandak, M.B.: A review of frequent pattern mining algorithms for uncertain data. In: Bi, Y., Kapoor, S., Bhatia, R. (eds.) Proceedings of SAI Intelligent Systems Conference (IntelliSys) 2016. IntelliSys 2016. Lecture Notes in Networks and Systems, vol. 16. Springer, Cham. https://doi.org/10.1007/978-3-319-56991-8_73

  20. Wu, D., Ren, J., Sheng, L.: Uncertain maximal frequent subgraph mining algorithm based on adjacency matrix and weight. Int. J. Mach. Learn. Cyber. 9, 1445–1455 (2018). https://doi.org/10.1007/s13042-017-0655-y

  21. Wang, L.: Heterogeneous data and big data analytics. Autom. Control Inf. Sci. 3(1), 8–15 (2017). https://doi.org/10.12691/acis-3-1-3

  22. Saxena, K., Patil, A., Sunkle, S., Kulkarni, V.: Mining heterogeneous data for formulation design. In: International Conference on Data Mining Workshops (ICDMW), pp. 589–596 (2020). https://doi.org/10.1109/ICDMW51313.2020.00084

  23. Wang, T., Desai, B.C.: On the appropriate pattern frequentness measure and pattern generation mode: a critical review. In: IDEAS \(^{\prime }19\): Proceedings of the 23rd International Database Applications & Engineering Symposium, Article No.: 32 (1–15) (2019). https://doi.org/10.1145/3331076.3331125

  24. Tijms, H.: Understanding Probability. Cambridge University Press, Cambridge (2004)

  25. Gut, A.: Probability: A Graduate Course. Springer, Berlin (2005)

  26. Al-Rifai, S.S., Shaban, A.M., et al.: Paper review on data mining, components, and big data. In: International Congress on Human–Computer Interaction, Optimization and Robotic Applications (HORA), pp. 1–4 (2020). https://doi.org/10.1109/HORA49412.2020.9152919

  27. Gan, W., Lin, J.C., Fournier-Viger, P., Chao, H.C., Yu, P.S.: A survey of parallel sequential pattern mining. ACM Trans. Knowl. Discov. Data 13(3), 1–34 (2019). https://doi.org/10.1145/3314107

  28. Kirchgessner, M., Leroy, V., Amer-Yahia, S. et al.: Testing interestingness measures in practice: a large-scale analysis of buying patterns. In: IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 547–556 (2016). https://doi.org/10.1109/DSAA.2016.53

  29. Lin, J.C.W., Gan, W., Fournier-Viger, P., et al.: Weighted frequent itemset mining over uncertain databases. Appl. Intell. 44, 232–250 (2016). https://doi.org/10.1007/s10489-015-0703-9

  30. Sharmila, S., Vijayarani, S.: Comparative analysis of frequent closed itemset mining algorithms. Int. J. Res. Eng. Appl. Manag. (2018). https://doi.org/10.18231/2454-9150.2018.0616

  31. van Leeuwen, M., Ukkonen, A.: Fast estimation of the pattern frequency spectrum. In: Calders T., Esposito F., Hüllermeier E., Meo R. (eds.) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2014. Lecture Notes in Computer Science, vol. 8725 (2014). Springer, Berlin. https://doi.org/10.1007/978-3-662-44851-9_8

  32. Geerts, F., Goethals, B., Den Bussche, J.V.: Tight upper bounds on the number of candidate patterns. ACM Trans. Database Syst. 30(2), 333–363 (2005). https://doi.org/10.1145/1071610.1071611

  33. Shenoy, P., Haritsa, J.R., Sudarshan, S., et al.: Turbo-charging vertical mining of large databases. ACM SIGMOD Rec. 29(2), 22–23 (2000)

  34. Truong, T., Duong, H., Le, B., Fournier-Viger, P.: Efficient vertical mining of high average-utility itemsets based on novel upper-bounds. IEEE Trans. Knowl. Data Eng. 31(2), 301–314 (2019). https://doi.org/10.1109/TKDE.2018.2833478

  35. Allenby, R.B.J.T., Slomson, A.: How to Count: An Introduction to Combinatorics. Discrete Mathematics and Its Applications, 2nd edn., pp. 51–60. CRC Press, Boca Raton (2010)

  36. Goethals, B., Zaki, M.J.: Advances in frequent itemset mining implementations: report on FIMI’03. ACM SIGKDD Explor. Newsl. 6(1), 109–117 (2003). https://doi.org/10.1145/1007730.1007744

  37. Avriel, M., Diewert, W.E., Schaible, S., Zang, I.: Generalized Concavity. Plenum Press, New York (1988)

  38. Horn, R.A., Johnson, C.R.: Matrix Analysis, 2nd edn. Cambridge University Press, Cambridge (2013)

  39. Hazewinkel, M. (ed.): Symmetric Matrix. Encyclopedia of Mathematics. Springer, Berlin (2001)

  40. Shores, T.S.: Applied Linear Algebra and Matrix Analysis. Springer, Berlin (2007). https://doi.org/10.1007/978-0-387-48947-6

  41. Rechtschaffen, E.: Real roots of cubics: explicit formula for quasi-solutions. Math. Gaz. 92, 268–276 (2008). https://doi.org/10.1017/S0025557200183147

  42. Wadsworth, G.P.: Introduction to Probability and Random Variables. McGraw-Hill, New York (1960)

  43. Ugarte, M.D., Militino, A.F., Arnholt, A.T.: Probability and Statistics with R, 2nd edn. CRC Press, Boca Raton (2016)

  44. Riordan, J.: Moment recurrence relations for binomial, Poisson and hypergeometric frequency distributions. Ann. Math. Stat. 8(2), 103–111 (1937)

  45. Cameron, A.C., Trivedi, P.K.: Regression analysis of count data. J. Am. Stat. Assoc. (1998). https://doi.org/10.1017/CBO9780511814365

  46. Patel, J.K., Read, C.B.: Handbook of the Normal Distribution, 2nd edn. CRC Press, Boca Raton (1996)

  47. Kunen, K.: Set Theory. College Publications, Beverly Hills (2011)

  48. Rodych, V.: Wittgenstein’s critique of set theory. South. J. Philos. 38(2), 281–319 (2000). https://doi.org/10.1111/j.2041-6962.2000.tb00902.x

  49. Paine, J.: Set-theoretic comparative methods: less distinctive than claimed. Comp. Political Stud. (2015). https://doi.org/10.1177/0010414014564851

  50. Perez, J.A.: Addressing mathematical inconsistency: Cantor and Gödel refuted. arXiv:1002.4433v1 [math.GM] (2010)

  51. Machover, M.: Set Theory, Logic and Their Limitations. Cambridge University Press, Cambridge (1996)

  52. Darling, D. J.: The Universal Book of Mathematics. Wiley, London, p. 106 (2004)

  53. Stephen and Penny: how to show a non empty set is a subset of every set. mathcentral.uregina.ca: http://mathcentral.uregina.ca/QQ/database/QQ.09.06/narayana1.html. Accessed June 2020

  54. Wikipedia: Empty Set. https://en.wikipedia.org/wiki/Empty_set. Accessed July 2020

  55. Hurley, P.J.: A Concise Introduction to Logic, 12th edn. Cengage Learning, Boston (2015)

  56. Ganter, B., Stumme, G., Wille, R. (eds.) Formal Concept Analysis: Foundations and Applications. Lecture Notes in Artificial Intelligence, No. 3626. Springer (2005). https://doi.org/10.1007/978-3-540-31881-1

  57. Bona, M.: Combinatorics of Permutations, 2nd edn. CRC Press, Boca Raton (2012)

  58. Ferreirós, J.: Labyrinth of Thought: A History of Set Theory and Its Role in Mathematical Thought. Birkhäuser, Basel (2007). https://doi.org/10.1007/978-3-7643-8350-3

  59. William, W.: An Introduction to Analysis, p. 188. Prentice Hall, Upper Saddle River (2010)

  60. Krause, H.: Completing perfect complexes. Math. Z. 296, 1387–1427 (2020). https://doi.org/10.1007/s00209-020-02490-z

  61. Dawkins, P.: Convergence/divergence of series, section 4-4, tutorial. http://tutorial.math.lamar.edu/Classes/CalcII/ConvergenceOfSeries.aspx. Accessed Sept 2018

  62. Ayestaran, F.: Interactive implementation of pascal triangle in SQL. http://pascaltriangle.ayestaran.co.uk/. Accessed Feb 2016

  63. Frequent Itemset Mining Dataset Repository. http://fimi.cs.helsinki.fi/data/. Accessed July 2009

  64. Bárány, I., Vu, V.: Central limit theorems for Gaussian polytopes. Ann. Probab. arXiv:math/0610192v1 [math.CO] (2007)

  65. Knuth, D.E.: Two thousand years of combinatorics. In: Wilson, R., Watkins, J.J. (eds.) Combinatorics: Ancient and Modern, pp. 7–37. Oxford University Press, Oxford (2013). https://doi.org/10.1093/acprof:oso/9780199656592.003.0001

Acknowledgements

Special thanks go to Dr. Bipin C. Desai for his help and advice during the early stages of writing this paper.

Author information

Corresponding author

Correspondence to Tongyuan Wang.

Ethics declarations

Conflict of interest

This is the independent work of the sole author, who has no conflict of interest with any other party.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: The detailed empirical results from the “Retail” dataset

Total number of elements n = 16470. Total number of tuples u = 88162. Longest pattern length \(\alpha \) = 76.

The \(g_i\) distribution:

\(3016\,\,5516\,\,6919\,\,7210\,\,6814\,\,6163\,\,5746\,\,5143\,\,4660\,\,4086\,\,3751\,\,3285\,\,2866\,\,2620\,\,2310\,\,2115\,\,1874\,\,1645\,\,1469\,\,1290\,\,1205\,\,981\,\,887\,\,819\,\,684\,\,586\,\,582\,\,472\,\,480\,\,355\,\,310\,\,303\,\,272\,\,234\,\,194\,\,136\,\,153\,\,123\,\,115\,\,112\,\,76\,\,66\,\,71\,\,60\,\,50\,\,44\,\,37\,\,37\,\,33\,\,22\,\,24\,\,21\,\,21\,\,10\,\,11\,\,10\,\,9\,\,11\,\,4\,\,9\,\,7\,\,4\,\,5\,\,2\,\,2\,\,5\,\,3\,\,3\,\,0\,\,0\,\,1\,\,0\,\,1\,\,1\,\,0\,\,1\)

The \(H_k\) Series:

\((1)\,\,908576\,\,(2)\,\,7164335\,\,(3)\,\,52502539\,\,(4)\,\,366817927\,\,(5)\,\,2447321444\,\,(6)\,\,15534598332\,\,(7)\,\,93307736462\,\,(8)\,\,527550301625\,\,(9)\,\,2796416534241\,\,(10)\,\,13863139450195\,\,(11)\,\,64204046715896\,\,(12)\,\,277757200264229\,\,(13)\,\,1.12312584494064E\!+\!15\,\,(14)\,\,4.24904654295735E\!+\!15\,\,(15)\,\,1.5058885990449E\!+\!16\,\,(16)\,\,5.00625023811958E\!+\!16\,\,(17)\,\,1.56327472119759E\!+\!17\,\,(18)\,\,4.59121121980175E\!+\!17\,\,(19)\,\,1.26976301242088E\!+\!18\,\,(20)\,\,3.31067777753649E\!+\!18\,\,(21)\,\,8.14636975894685E\!+\!18\,\,(22)\,\,1.8935525717633E\!+\!19\,\,(23)\,\,4.16131440789675E\!+\!19\,\,(24)\,\,8.65288386954046E\!+\!19\,\,(25)\,\,1.70361679656958E\!+\!20\,\,(26)\,\,3.17787016348576E\!+\!20\,\,(27)\,\,5.61949830792223E\!+\!20\,\,(28)\,\,9.4248316680112E\!+\!20\,\,(29)\,\,1.49988185541216E\!+\!21\,\,(30)\,\,2.26577690430042E\!+\!21\,\,(31)\,\,3.2501290738274E\!+\!21\,\,(32)\,\,4.42825143223911E\!+\!21\,\,(33)\,\,5.73212226482143E\!+\!21\,\,(34)\,\,7.05069716806149E\!+\!21\,\,(35)\,\,8.24219004470137E\!+\!21\,\,(36)\,\,9.15769085268638E\!+\!21\,\,(37)\,\,9.67116815427317E\!+\!21\,\,(38)\,\,9.70768639718059E\!+\!21\,\,(39)\,\,9.26120299243453E\!+\!21\,\,(40)\,\,8.39616377302037E\!+\!21\,\,(41)\,\,7.23232941850291E\!+\!21\,\,(42)\,\,5.91772992838759E\!+\!21\,\,(43)\,\,4.59811839985694E\!+\!21\,\,(44)\,\,3.39150713519183E\!+\!21\,\,(45)\,\,2.37356853011541E\!+\!21\,\,(46)\,\,1.57537354482505E\!+\!21\,\,(47)\,\,9.91013257283473E\!+\!20\,\,(48)\,\,5.9046404820075E\!+\!20\,\,(49)\,\,3.32957641309953E\!+\!20\,\,(50)\,\,1.77535631861495E\!+\!20\,\,(51)\,\,8.94240979386786E\!+\!19\,\,(52)\,\,4.25025368313273E\!+\!19\,\,(53)\,\,1.90382440924376E\!+\!19\,\,(54)\,\,8.0257650807726E\!+\!18\,\,(55)\,\,3.17919252128806E\!+\!18\,\,(56)\,\,1.18129875755773E\!+\!18\,\,(57)\,\,4.10926627354641E\!+\!17\,\,(58)\,\,1.33528383786941E\!+\!17\,\,(59)\,\,4.04304582589712E\!+\!16\,\,(60)\,\,1.13749193542012E\!+\!16\,\,(61)\,\,2.96417842579272E\!+\!15\,\,(62)\,\,712836643924391\,\,(63)\,\,157536096738717\,\,(64)\,\,31838940145963\,\,(65)\,\,5851270021055\,\,(66)\,\,971240578752\,\,(67)\,\,144437457258\,\,(68)\,\,19056211853\,\,(69)\,\,2203389079\,\,(70)\,\,219831833\,\,(71)\,\,18542293\,\,(72)\,\,1285749\,\,(73)\,\,70375\,\,(74)\,\,2851\,\,(75)\,\,76\,\,(76)\,\,1\)
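The \(H_k\) values can be recomputed from the \(g_i\) distribution. The following is a minimal sketch, assuming that \(g_i\) counts the tuples of length \(i\) and that \(H_k = \sum_i g_i \binom{i}{k}\). This formula is inferred from the listed numbers (it reproduces the printed integers) rather than quoted from the body of the paper, and the names `g`, `pattern_counts`, `peak` and `w0` are illustrative only.

```python
from math import comb

# g[i-1]: number of tuples of length i in "Retail"
# (76 entries, one per length up to alpha = 76, summing to u = 88162).
g = [3016, 5516, 6919, 7210, 6814, 6163, 5746, 5143, 4660, 4086,
     3751, 3285, 2866, 2620, 2310, 2115, 1874, 1645, 1469, 1290,
     1205, 981, 887, 819, 684, 586, 582, 472, 480, 355,
     310, 303, 272, 234, 194, 136, 153, 123, 115, 112,
     76, 66, 71, 60, 50, 44, 37, 37, 33, 22,
     24, 21, 21, 10, 11, 10, 9, 11, 4, 9,
     7, 4, 5, 2, 2, 5, 3, 3, 0, 0,
     1, 0, 1, 1, 0, 1]


def pattern_counts(lengths):
    """Assumed formula: H_k = sum_i g_i * C(i, k) for k = 1..len(lengths)."""
    n = len(lengths)
    return {
        k: sum(gi * comb(i, k) for i, gi in enumerate(lengths, start=1))
        for k in range(1, n + 1)
    }


H = pattern_counts(g)
peak = max(H, key=H.get)   # index k at which H_k is largest
w0 = sum(H.values())       # cumulative frequency over all lengths

print("u =", sum(g), " H_1 =", H[1], " H_76 =", H[76])
print("peak at k =", peak, " w0 =", float(w0))
```

Because Python integers are exact, the sketch also yields the full digits of the entries that the listing prints in floating-point notation.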

The \(H_k\) Quasi Concavity: + Increase, - decrease, = equality.

\(1<>\,\,2+\,\,3+\,\,4+\,\,5+\,\,6+\,\,7+\,\,8+\,\,9+\,\,10+\,\,11+\,\,12+\,\,13+\,\,14+\,\,15+\,\,16+\,\,17+\,\,18+\,\,19+\,\,20+\,\,21+\,\,22+\,\,23+\,\,24+\,\,25+\,\,26+\,\,27+\,\,28+\,\,29+\,\,30+\,\,31+\,\,32+\,\,33+\,\,34+\,\,35+\,\,36+\,\,37+\,\,38+\,\,39-\,\,40-\,\,41-\,\,42-\,\,43-\,\,44-\,\,45-\,\,46-\,\,47-\,\,48-\,\,49-\,\,50-\,\,51-\,\,52-\,\,53-\,\,54-\,\,55-\,\,56-\,\,57-\,\,58-\,\,59-\,\,60-\,\,61-\,\,62-\,\,63-\,\,64-\,\,65-\,\,66-\,\,67-\,\,68-\,\,69-\,\,70-\,\,71-\,\,72-\,\,73-\,\,74-\,\,75-\,\,76-\)

The \(H_k\) Genuine Concavity: (\(H_k - (H_{k-1} + H_{k+1})/2 \ge 0?\))

Theoretical concavity domain = [33, 43]; exact = [33, 43], as detailed below:

\(1<>\,\,2:(-19541222.5)\,\,3:(-134488592)\,\,4:(-883094064.5)\,\,5:(-5503386685.5)\,\,6:(-32342930621)\,\,7:(-178234713516.5)\,\,8:(-917311833726.5)\,\,9:(-4398928341669)\,\,10:(-19637092174873.5)\,\,11:(-81606123141316)\,\,12:(-315907745564039)\,\,13:(-1.14027602667015E\!+\!15)\,\,14:(-3.84195937473746E\!+\!15)\,\,15:(-1.20968884716276E\!+\!16)\,\,16:(-3.56306766739083E\!+\!16)\,\,17:(-9.8264340060926E\!+\!16)\,\,18:(-2.53924120290146E\!+\!17)\,\,19:(-6.15136437337449E\!+\!17)\,\,20:(-1.39738860814738E\!+\!18)\,\,21:(-2.97673198863789E\!+\!18)\,\,22:(-5.94423120132421E\!+\!18)\,\,23:(-1.11190381275513E\!+\!19)\,\,24:(-1.94585731725583E\!+\!19)\,\,25:(-3.1796247865032E\!+\!19)\,\,26:(-4.83687388760149E\!+\!19)\,\,27:(-6.81852607826247E\!+\!19)\,\,28:(-8.84326763010704E\!+\!19)\,\,29:(-1.04248180138614E\!+\!20)\,\,30:(-1.09228560319354E\!+\!20)\,\,31:(-9.68850944423698E\!+\!19)\,\,32:(-6.28742370853074E\!+\!19)\,\,33:(-7.35203532886717E\!+\!18)\,\,34:(6.35410133000913E\!+\!19)\,\,35:(1.37996034327435E\!+\!20)\,\,36:(2.01011753199109E\!+\!20)\,\,37:(2.38479529339683E\!+\!20)\,\,38:(2.41500823826744E\!+\!20)\,\,39:(2.09277907334049E\!+\!20)\,\,40:(1.49397567551648E\!+\!20)\,\,41:(7.53825677989307E\!+\!19)\,\,42:(2.50601920766935E\!+\!18)\,\,43:(-5.65001319327722E\!+\!19)\,\,44:(-9.43363297943486E\!+\!19)\,\,45:(-1.09871809893028E\!+\!20)\,\,46:(-1.06917348874388E\!+\!20)\,\,47:(-9.19055392294298E\!+\!19)\,\,48:(-7.1521401095963E\!+\!19)\,\,49:(-5.10421987211691E\!+\!19)\,\,50:(-3.36552377628212E\!+\!19)\,\,51:(-2.05949864077324E\!+\!19)\,\,52:(-1.17286341842308E\!+\!19)\,\,53:(-6.22590686361234E\!+\!18)\,\,54:(-3.08295322609023E\!+\!18)\,\,55:(-1.4243393978771E\!+\!18)\,\,56:(-6.13760816763625E\!+\!17)\,\,57:(-2.46486943317692E\!+\!17)\,\,58:(-9.21501590198654E\!+\!16)\,\,59:(-3.20211933115998E\!+\!16)\,\,60:(-1.03223989881807E\!+\!16)\,\,61:(-3.07969957327008E\!+\!15)\,\,62:(-848020617341325)\,\,63:(-214801695296460)\,\,64:(-49854743233923)\,\,65:(-10553820341302.5)\,\,66:(-2026613160404.5)\,\,67:(-350710938044.5)\,\,68:(-54264211315.5)\,\,69:(-7434632764)\,\,70:(-891133853)\,\,71:(-92016498)\,\,72:(-8020585)\,\,73:(-573925)\,\,74:(-32374.5)\,\,75:(-1350)\)
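The concavity test \(H_k - (H_{k-1} + H_{k+1})/2 \ge 0\) can be spot-checked directly on the entries that the listing prints as exact integers. The sketch below is illustrative only: the dictionary `H` holds a subset of the printed values, and `concavity_gap` is a hypothetical helper name.

```python
# Exact H_k values copied from the listing (head and tail only,
# where the entries are printed as plain integers).
H = {1: 908576, 2: 7164335, 3: 52502539, 4: 366817927, 5: 2447321444,
     72: 1285749, 73: 70375, 74: 2851, 75: 76, 76: 1}


def concavity_gap(series, k):
    """Return series[k] - (series[k-1] + series[k+1]) / 2.

    A non-negative gap means the series is concave at index k.
    """
    return series[k] - (series[k - 1] + series[k + 1]) / 2


# Evaluate the gap wherever both neighbours are available.
gaps = {k: concavity_gap(H, k) for k in H if k - 1 in H and k + 1 in H}

for k, gap in sorted(gaps.items()):
    print(k, gap, "concave" if gap >= 0 else "convex")
```

All six gaps are negative, in agreement with the listing: the head and tail of the curve are convex, and the concave part lies in the middle of the range.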

The \(R_k\) Series:

\(0.210272925251529\,\,0.297094051410328\,\,0.382831246282321\,\,0.463316718039126\,\,0.536416236147605\,\,0.600644667263741\,\,0.65552176787858\,\,0.701570924935985\,\,0.739920275015639\,\,0.771879594163558\,\,0.798676329287781\,\,0.82134661868907\,\,0.840718401553246\,\,0.857434446016042\,\,0.871986691039956\,\,0.884749696089944\,\,0.896009184273812\,\,0.905984998255989\,\,0.914848923844976\,\,0.922738141516824\,\,0.929765099200748\,\,0.936024549137197\,\,0.941598411268332\,\,0.94655903067817\,\,0.950971295213339\,\,0.954893979352975\,\,0.958380588604126\,\,0.961479900011198\,\,0.964236331011297\,\,0.96669022077285\,\,0.968878073679284\,\,0.970832791317424\,\,0.972583904574083\,\,0.974157808823637\,\,0.975578000710482\,\,0.976865313167724\,\,0.978038144975912\,\,0.979112681618661\,\,0.980103104973037\,\,0.981021790211479\,\,0.981879489047814\,\,0.98268549907259\,\,0.983447819379726\,\,0.984173293000019\,\,0.984867736850611\,\,0.985536060009938\,\,0.98618237115967\,\,0.986810076020608\,\,0.987421965565713\,\,0.988020295733908\,\,0.988606859302998\,\,0.989183050515657\,\,0.989749922993555\,\,0.990308241423762\,\,0.990858527459561\,\,0.991401100244955\,\,0.991936111947395\,\,0.992463578665651\,\,0.992983407067619\,\,0.993495417104725\,\,0.993999361143785\,\,0.994494939852474\,\,0.994981815169571\,\,0.995459620685058\,\,0.995927969747206\,\,0.996386461603664\,\,0.996834685870954\,\,0.997272225611996\,\,0.997698659284314\,\,0.998113561803096\,\,0.99851650494359\,\,0.998907057287231\,\,0.999284783895796\,\,0.999649245878639\,\,1\)

The \(R_k\) Monotonic: + Increase, - decrease, = equality.

\(=\,\,1+\,\,2+\,\,3+\,\,4+\,\,5+\,\,6+\,\,7+\,\,8+\,\,9+\,\,10+\,\,11+\,\,12+\,\,13+\,\,14+\,\,15+\,\,16+\,\,17+\,\,18+\,\,19+\,\,20+\,\,21+\,\,22+\,\,23+\,\,24+\,\,25+\,\,26+\,\,27+\,\,28+\,\,29+\,\,30+\,\,31+\,\,32+\,\,33+\,\,34+\,\,35+\,\,36+\,\,37+\,\,38+\,\,39+\,\,40+\,\,41+\,\,42+\,\,43+\,\,44+\,\,45+\,\,46+\,\,47+\,\,48+\,\,49+\,\,50+\,\,51+\,\,52+\,\,53+\,\,54+\,\,55+\,\,56+\,\,57+\,\,58+\,\,59+\,\,60+\,\,61+\,\,62+\,\,63+\,\,64+\,\,65+\,\,66+\,\,67+\,\,68+\,\,69+\,\,70+\,\,71+\,\,72+\,\,73+\,\,74+\,\,75+\)
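The definition of \(R_k\) is not reproduced in this appendix. The listed values are, however, consistent with the normalised ratio \(R_k = \frac{(k+1)\,H_{k+1}}{(\alpha - k)\,H_k}\) for \(k = 1, \dots, \alpha - 1\). That formula is an inference from the numbers and is stated here as an assumption, not as the paper's definition. A minimal sketch using only \(H_k\) entries printed as exact integers (`ALPHA`, `H` and `ratio` are illustrative names):

```python
from fractions import Fraction

ALPHA = 76  # longest pattern length reported for "Retail"

# Exact H_k values copied from the H_k listing (head and tail only).
H = {1: 908576, 2: 7164335, 3: 52502539, 4: 366817927,
     73: 70375, 74: 2851, 75: 76, 76: 1}


def ratio(k):
    """Assumed form: R_k = (k + 1) * H_{k+1} / ((ALPHA - k) * H_k)."""
    return Fraction((k + 1) * H[k + 1], (ALPHA - k) * H[k])


for k in (1, 2, 3, 73, 74, 75):
    print(k, float(ratio(k)))
```

Under this assumption the final ratio is exactly 1, and the computed values increase with \(k\) at both ends of the range, in line with the monotonicity record above.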

The accumulative frequency \(w_0\) = 1.08160582031538E+23

The sum of odd length pattern frequencies \(H_{ odd}\) = 5.4080291015769E+22

The sum of even length pattern frequencies \(H_{ even}\) = 5.4080291015769E+22

The \(h_k\) Series:

\((1)\,\,88162\,\,(2)\,\,820414\,\,(3)\,\,6343921\,\,(4)\,\,46158618\,\,(5)\,\,320659309\,\,(6)\,\,2126662135\,\,(7)\,\,13407936197\,\,(8)\,\,79899800265\,\,(9)\,\,447650501360\,\,(10)\,\,2348766032881\,\,(11)\,\,11514373417314\,\,(12)\,\,52689673298582\,\,(13)\,\,225067526965647\,\,(14)\,\,898058317974992\,\,(15)\,\,3.35098822498236E\!+\!15\,\,(16)\,\,1.17078977654666E\!+\!16\,\,(17)\,\,3.83546046157292E\!+\!16\,\,(18)\,\,1.1797286750403E\!+\!17\,\,(19)\,\,3.41148254476145E\!+\!17\,\,(20)\,\,9.28614757944738E\!+\!17\,\,(21)\,\,2.38206301959175E\!+\!18\,\,(22)\,\,5.7643067393551E\!+\!18\,\,(23)\,\,1.31712189782779E\!+\!19\,\,(24)\,\,2.84419251006897E\!+\!19\,\,(25)\,\,5.8086913594715E\!+\!19\,\,(26)\,\,1.12274766062243E\!+\!20\,\,(27)\,\,2.05512250286333E\!+\!20\,\,(28)\,\,3.56437580505891E\!+\!20\,\,(29)\,\,5.86045586295229E\!+\!20\,\,(30)\,\,9.13836269116928E\!+\!20\,\,(31)\,\,1.35194063518349E\!+\!21\,\,(32)\,\,1.8981884386439E\!+\!21\,\,(33)\,\,2.53006299359521E\!+\!21\,\,(34)\,\,3.20205927122623E\!+\!21\,\,(35)\,\,3.84863789683527E\!+\!21\,\,(36)\,\,4.3935521478661E\!+\!21\,\,(37)\,\,4.76413870482027E\!+\!21\,\,(38)\,\,4.90702944945289E\!+\!21\,\,(39)\,\,4.80065694772769E\!+\!21\,\,(40)\,\,4.46054604470683E\!+\!21\,\,(41)\,\,3.93561772831354E\!+\!21\,\,(42)\,\,3.29671169018937E\!+\!21\,\,(43)\,\,2.62101823819822E\!+\!21\,\,(44)\,\,1.97710016165872E\!+\!21\,\,(45)\,\,1.41440697353311E\!+\!21\,\,(46)\,\,9.59161556582305E\!+\!20\,\,(47)\,\,6.1621198824275E\!+\!20\,\,(48)\,\,3.74801269040722E\!+\!20\,\,(49)\,\,2.15662779160028E\!+\!20\,\,(50)\,\,1.17294862149926E\!+\!20\,\,(51)\,\,6.02407697115692E\!+\!19\,\,(52)\,\,2.91833282271095E\!+\!19\,\,(53)\,\,1.33192086042178E\!+\!19\,\,(54)\,\,5.71903548821977E\!+\!18\,\,(55)\,\,2.30672959255284E\!+\!18\,\,(56)\,\,8.72462928735225E\!+\!17\,\,(57)\,\,3.08835828822501E\!+\!17\,\,(58)\,\,1.02090798532141E\!+\!17\,\,(59)\,\,3.14375852548001E\!+\!16\,\,(60)\,\,8.99287300417102E\!+\!15\,\,(61)\,\,2.38204635003018E\!+\!15\,\,(62)\,\,582132075762540\,\,(63)\,\,130704568161851\,\,(64)\,\,26831528576866\,\,(65)\,\,5007411569097\,\,(66)\,\,843858451958\,\,(67)\,\,127382126794\,\,(68)\,\,17055330464\,\,(69)\,\,2000881389\,\,(70)\,\,202507690\,\,(71)\,\,17324143\,\,(72)\,\,1218150\,\,(73)\,\,67599\,\,(74)\,\,2776\,\,(75)\,\,75\,\,(76)\,\,1\)
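The \(h_k\) and \(H_k\) listings can be cross-checked against each other: the printed integers satisfy \(H_k = h_k + h_{k+1}\). This is what Pascal's rule \(\binom{i}{k} = \binom{i-1}{k-1} + \binom{i-1}{k}\) would give if \(H_k = \sum_i g_i \binom{i}{k}\) and \(h_k = \sum_i g_i \binom{i-1}{k-1}\). Both formulas are assumptions inferred from the listed numbers, not definitions quoted from the paper. A small sketch of the check, using only entries printed as exact integers:

```python
# Exact values copied from the h_k and H_k listings (head and tail only).
h = {1: 88162, 2: 820414, 3: 6343921, 4: 46158618, 5: 320659309,
     71: 17324143, 72: 1218150, 73: 67599, 74: 2776, 75: 75, 76: 1}
H = {1: 908576, 2: 7164335, 3: 52502539, 4: 366817927,
     71: 18542293, 72: 1285749, 73: 70375, 74: 2851, 75: 76}

# Indices k at which the relation H_k = h_k + h_{k+1} fails.
mismatches = [k for k in sorted(H) if h[k] + h[k + 1] != H[k]]

print("checked:", sorted(H))
print("mismatches:", mismatches)
```

An empty `mismatches` list means the two series agree at every index checked. The same relation is a convenient way to spot digits that were split or dropped in either listing.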

The \(h_k\) Quasi Concavity: + Increase, - decrease, = equality.

\(1<>\,\,2+\,\,3+\,\,4+\,\,5+\,\,6+\,\,7+\,\,8+\,\,9+\,\,10+\,\,11+\,\,12+\,\,13+\,\,14+\,\,15+\,\,16+\,\,17+\,\,18+\,\,19+\,\,20+\,\,21+\,\,22+\,\,23+\,\,24+\,\,25+\,\,26+\,\,27+\,\,28+\,\,29+\,\,30+\,\,31+\,\,32+\,\,33+\,\,34+\,\,35+\,\,36+\,\,37+\,\,38+\,\,39-\,\,40-\,\,41-\,\,42-\,\,43-\,\,44-\,\,45-\,\,46-\,\,47-\,\,48-\,\,49-\,\,50-\,\,51-\,\,52-\,\,53-\,\,54-\,\,55-\,\,56-\,\,57-\,\,58-\,\,59-\,\,60-\,\,61-\,\,62-\,\,63-\,\,64-\,\,65-\,\,66-\,\,67-\,\,68-\,\,69-\,\,70-\,\,71-\,\,72-\,\,73-\,\,74-\,\,75-\,\,76-\)

The \(h_k\) Genuine Concavity: (\(h_k - (h_{k-1} + h_{k+1})/2 \ge 0?\))

Concavity domain = [33, 43], as detailed below:

\(1<>\,\,2:(-2395627.5)\,\,3:(-17145595)\,\,4:(-117342997)\,\,5:(-765751067.5)\,\,6:(-4737635618)\,\,7:(-27605295003)\,\,8:(-150629418513.5)\,\,9:(-766682415213)\,\,10:(-3632245926456)\,\,11:(-16004846248417.5)\,\,12:(-65601276892898.5)\,\,13:(-250306468671140)\,\,14:(-889969557999009)\,\,15:(-2.95198981673845E\!+\!15)\,\,16:(-9.14489865488914E\!+\!15)\,\,17:(-2.64857780190192E\!+\!16)\,\,18:(-7.17785620419068E\!+\!16)\,\,19:(-1.8214555824824E\!+\!17)\,\,20:(-4.3299087908921E\!+\!17)\,\,21:(-9.64397729058168E\!+\!17)\,\,22:(-2.01233425957972E\!+\!18)\,\,23:(-3.93189694174449E\!+\!18)\,\,24:(-7.18714118580676E\!+\!18)\,\,25:(-1.22714319867515E\!+\!19)\,\,26:(-1.95248158782805E\!+\!19)\,\,27:(-2.88439229977343E\!+\!19)\,\,28:(-3.93413377848903E\!+\!19)\,\,29:(-4.90913385161801E\!+\!19)\,\,30:(-5.51568416224334E\!+\!19)\,\,31:(-5.407171869692E\!+\!19)\,\,32:(-4.28133757454501E\!+\!19)\,\,33:(-2.00608613398568E\!+\!19)\,\,34:(1.27088260109896E\!+\!19)\,\,35:(5.08321872891022E\!+\!19)\,\,36:(8.71638470383339E\!+\!19)\,\,37:(1.13847906160774E\!+\!20)\,\,38:(1.2463162317891E\!+\!20)\,\,39:(1.16869200647833E\!+\!20)\,\,40:(9.24087066862158E\!+\!19)\,\,41:(5.69888608654323E\!+\!19)\,\,42:(1.83937069334968E\!+\!19)\,\,43:(-1.5887687725827E\!+\!19)\,\,44:(-4.06124442069455E\!+\!19)\,\,45:(-5.37238855874031E\!+\!19)\,\,46:(-5.6147924305625E\!+\!19)\,\,47:(-5.07694245687632E\!+\!19)\,\,48:(-4.11361146606667E\!+\!19)\,\,49:(-3.03852864352963E\!+\!19)\,\,50:(-2.06569122858727E\!+\!19)\,\,51:(-1.29983254769484E\!+\!19)\,\,52:(-7.59666093078404E\!+\!18)\,\,53:(-4.13197325344678E\!+\!18)\,\,54:(-2.09393361016557E\!+\!18)\,\,55:(-9.8901961592466E\!+\!17)\,\,56:(-4.35319781952443E\!+\!17)\,\,57:(-1.78441034811182E\!+\!17)\,\,58:(-6.80459085065098E\!+\!16)\,\,59:(-2.41042505133557E\!+\!16)\,\,60:(-7.91694279824414E\!+\!15)\,\,61:(-2.40545618993661E\!+\!15)\,\,62:(-674243383333473)\,\,63:(-173777234007852)\,\,64:(-41024461288608)\,\,65:(-8830281945315)\,\,66:(-1723538395987.5)\,\,67:(-303074764417)\,\,68:(-47636173627.5)\,\,69:(-6628037688)\,\,70:(-806595076)\,\,71:(-84538777)\,\,72:(-7477721)\,\,73:(-542864)\,\,74:(-31061)\,\,75:(-1313.5)\)

Cite this article

Wang, T. The pattern frequency distribution theory: a mathematic establishment toward rational and reliable pattern mining. Int J Data Sci Anal 16, 43–83 (2023). https://doi.org/10.1007/s41060-022-00340-1

  • DOI: https://doi.org/10.1007/s41060-022-00340-1
