Journal of Intelligent Information Systems, Volume 46, Issue 1, pp 99–120

Policy-based memoization for ILP-based concept discovery systems

  • Alev Mutlu
  • Pinar Karagoz
Article

Abstract

Inductive Logic Programming (ILP)-based concept discovery systems aim to find patterns that describe a target relation in terms of other relations provided as background knowledge. Such systems usually work within a first-order logic framework, build large search spaces, and have long running times. Memoization has been widely incorporated into concept discovery systems to improve their running times. One problem that memoization brings to such systems is memory overhead, which may become a bottleneck. In this work we propose policies that decide what types of concept descriptors to store in memo tables and for how long to keep them. The proposed policies have been implemented as extensions to a concept discovery system called Tabular CRIS wEF, and the resulting system is named Policy-based Tabular CRIS. The effects of the proposed policies are evaluated on several datasets. The experimental results show that the proposed policies greatly reduce memory consumption while preserving the benefits introduced by memoization.
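
The abstract does not spell out the policies themselves, so the following is only a minimal sketch of the general idea of a memo table governed by two rules: one deciding what to store and one deciding how long to keep it. The class name `PolicyMemoTable`, the `admit` callback, the `max_age` parameter, and the example clauses are all hypothetical and are not taken from Tabular CRIS wEF or Policy-based Tabular CRIS.

```python
from collections import OrderedDict


class PolicyMemoTable:
    """Illustrative memo table with an admission rule and a bounded lifetime.

    Both rules are assumptions made for this sketch, not the paper's policies.
    """

    def __init__(self, admit, max_age):
        self.admit = admit        # callable(key, value) -> bool: what to store
        self.max_age = max_age    # how many search steps an entry is kept
        self.step = 0             # current search step (e.g. refinement level)
        self.entries = OrderedDict()  # key -> (value, step at which it was stored)

    def store(self, key, value):
        """Store value under key only if the admission rule accepts it."""
        if not self.admit(key, value):
            return False
        self.entries[key] = (value, self.step)
        self.entries.move_to_end(key)  # keep entries ordered by storage step
        return True

    def lookup(self, key):
        """Return the memoized value, or None on a miss."""
        hit = self.entries.get(key)
        return None if hit is None else hit[0]

    def advance(self):
        """Move to the next search step and evict entries that are too old."""
        self.step += 1
        while self.entries:
            oldest_key = next(iter(self.entries))
            _, stored_at = self.entries[oldest_key]
            if self.step - stored_at < self.max_age:
                break
            del self.entries[oldest_key]


# Usage: memoize only descriptors with non-zero coverage, for two steps.
table = PolicyMemoTable(admit=lambda key, coverage: coverage > 0, max_age=2)
table.store("p(A,B) :- q(A,C)", 5)   # admitted
table.store("p(A,B) :- r(A,B)", 0)   # rejected by the admission rule
table.advance()
print(table.lookup("p(A,B) :- q(A,C)"))  # still cached after one step
table.advance()
print(table.lookup("p(A,B) :- q(A,C)"))  # evicted after two steps
```

Rejecting entries at insertion time and expiring them by age both bound the table size, which is the trade-off the abstract describes: less memory, at the cost of recomputing anything that was not kept.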

Keywords

Multi-relational data mining; Inductive logic programming; Concept discovery; Memoization; Memory consumption; Scalability

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Computer Engineering Department, Selcuk University, Konya, Turkey
  2. Computer Engineering Department, Middle East Technical University, Ankara, Turkey