Logical Languages for Data Mining

Giannotti, Fosca; Manco, Giuseppe; Wijsen, Jef

doi:10.1007/978-3-642-18690-5_9

Abstract

Data mining focuses on the development of methods and algorithms for such tasks as classification, clustering, rule induction, and discovery of associations. In the database field, the view of data mining as advanced querying has recently stimulated much research into the development of data mining query languages. In the field of machine learning, inductive logic programming has broadened its scope toward extending standard data mining tasks from the usual attribute-value setting to a multirelational setting. After a concise description of data mining, the contribution of logic to both fields is discussed. At the end, we indicate the potential use of logic for unifying different existing data mining formalisms.

Author information

Authors and Affiliations

ISTI, Institute of CNR, Via Moruzzi 1, 56124, Pisa, Italy
Fosca Giannotti
ICAR, Institute of CNR, Via Bucci 41c, 87036, Rende (CS), Italy
Giuseppe Manco
Institut d’Informatique, University of Mons-Hainaut, Avenue du Champ de Mars 6, B-7000, Mons, Belgium
Jef Wijsen

Editor information

Editors and Affiliations

Dept. of Computer Science and Engineering University at Buffalo, State University of New York, 201 Bell Hall, Buffalo, NY, 14260, USA
Jan Chomicki
School of Computer Science and Engineering, University of New South Wales, Sydney, NSW, 2052, Australia
Ron van der Meyden
Fakultät für Informatik, Otto-von-Guericke-Universität Magdeburg, Postfach 4120, 39016, Magdeburg, Germany
Gunter Saake

About this chapter

Cite this chapter

Giannotti, F., Manco, G., Wijsen, J. (2004). Logical Languages for Data Mining. In: Chomicki, J., van der Meyden, R., Saake, G. (eds) Logics for Emerging Applications of Databases. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-18690-5_9

DOI: https://doi.org/10.1007/978-3-642-18690-5_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-62248-9
Online ISBN: 978-3-642-18690-5
eBook Packages: Springer Book Archive
