Discovering Knowledge from Multi-relational Data Based on Information Retrieval Theory

Alfred, Rayner

doi:10.1007/978-3-642-03348-3_39

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5678)

Included in the following conference series:

International Conference on Advanced Data Mining and Applications

Abstract

Although the TF-IDF weighted frequency matrix (vector space model) has been widely studied and used in document clustering and document categorisation, there has been no attempt to extend this application to relational data that contain one-to-many associations between records. This paper explains the rationale for using TF-IDF (term frequency-inverse document frequency), an attribute-weighting technique borrowed from Information Retrieval theory, to summarise datasets stored in a multi-relational setting with one-to-many relationships. A novel data summarisation algorithm based on TF-IDF, called Dynamic Aggregation of Relational Attributes (DARA), is introduced. The DARA algorithm applies clustering techniques to summarise these datasets. The experimental results show that the DARA algorithm finds solutions with much greater accuracy.
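To make the TF-IDF idea concrete, here is a minimal Python sketch built on assumptions that do not come from the paper. Each target record is treated as a "document," and the attribute values of the records linked to it through a one-to-many association are treated as its "terms." The function names `tfidf_vectors` and `cosine` and the toy data are hypothetical. The weighting is the textbook TF-IDF scheme, with tf as a relative count and idf as log(N/df). This is not the author's DARA implementation, and the clustering step that DARA performs is not shown.

```python
import math
from collections import Counter


def tfidf_vectors(bags):
    """Map each target record id to a sparse TF-IDF vector (dict term -> weight).

    bags: dict of record id -> list of attribute values ("terms") gathered
    from the records linked to it through a one-to-many association.
    (Hypothetical representation, chosen for illustration.)
    """
    n = len(bags)
    # Document frequency: number of target records whose bag contains the term.
    df = Counter()
    for terms in bags.values():
        df.update(set(terms))
    vectors = {}
    for rid, terms in bags.items():
        tf = Counter(terms)
        total = len(terms)
        # tf = relative count within the bag; idf = log(N / df).
        vectors[rid] = {
            term: (count / total) * math.log(n / df[term])
            for term, count in tf.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


if __name__ == "__main__":
    # Toy one-to-many data: each customer links to several purchase records.
    bags = {
        "c1": ["book", "book", "music"],
        "c2": ["book", "music"],
        "c3": ["garden", "tools", "garden"],
    }
    vecs = tfidf_vectors(bags)
    print(round(cosine(vecs["c1"], vecs["c2"]), 4))  # similar purchase profiles
    print(cosine(vecs["c1"], vecs["c3"]))  # no shared attribute values
```

A value shared by every target record receives an IDF of log(1) = 0 and so carries no weight. This is what lets TF-IDF separate records by their distinguishing values. The resulting vectors could then be passed to a clustering algorithm of one's choice.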

Author information

Authors and Affiliations

Center for Artificial Intelligence, Universiti Malaysia Sabah, Locked Bag 2073, 88999, Kota Kinabalu, Sabah, Malaysia
Rayner Alfred

Editor information

Editors and Affiliations

Knowledge Science & Engineering Institute, School of Education Technology, Beijing Normal University, Xinjiekouwai Ave. 19, 100875, Beijing, China
Ronghuai Huang
The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
Qiang Yang
School of Computing Science, Simon Fraser University, 8888 University Drive, V5A 1S6, Burnaby, BC, Canada
Jian Pei
Faculty of Economics, University of Porto, Rua Dr. Roberto Frias, 4200-465, Porto, Portugal
João Gama
School of Information, Zhongguancun, Renmin University, 100872, Beijing, China
Xiaofeng Meng
School of Information Technology and Electrical Engineering, The University of Queensland, 4072, St. Lucia, Queensland, Australia
Xue Li

About this paper

Cite this paper

Alfred, R. (2009). Discovering Knowledge from Multi-relational Data Based on Information Retrieval Theory. In: Huang, R., Yang, Q., Pei, J., Gama, J., Meng, X., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2009. Lecture Notes in Computer Science, vol 5678. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03348-3_39

DOI: https://doi.org/10.1007/978-3-642-03348-3_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03347-6
Online ISBN: 978-3-642-03348-3
eBook Packages: Computer Science, Computer Science (R0)