Skip to main content

SQL-Based KDD with Infobright’s RDBMS: Attributes, Reducts, Trees

  • Conference paper
Rough Sets and Intelligent Systems Paradigms

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 8537))

Abstract

We present a framework for KDD process implemented using SQL procedures, consisting of constructing new attributes, finding rough set-based reducts and inducing decision trees. We focus particularly on attribute reduction, which is important especially for high-dimensional data sets. The main technical contribution of this paper is a complete framework for calculating short reducts using SQL queries on data stored in a relational form, without a need of any external tools generating or modifying their syntax. A case study of large real-world data is presented. The paper also recalls some other examples of SQL-based data mining implementations. The experimental results are based on the usage of Infobright’s analytic RDBMS, whose performance characteristics perfectly fit the requirements of presented algorithms.

The second author was partly supported by Polish National Science Centre (NCN) grants DEC-2011/01/B/ST6/03867 and DEC-2012/05/B/ST6/03215, and by National Centre for Research and Development (NCBiR) grant SP/I/1/77065/10 by the strategic scientific research and experimental development program: “Interdisciplinary System for Interactive Scientific and Scientific-Technical Information”.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Apanowicz, C., Eastwood, V., Ślęzak, D., Synak, P., Wojna, A., Wojnarski, M., Wróblewski, J.: Method and system for data compression in a relational database. US Patent 8,700,579 (2014)

    Google Scholar 

  2. Bae, S.-H., Choi, J.Y., Qiu, J., Fox, G.C.: High Performance Dimension Reduction and Visualization for Large High-dimensional Data Analysis. In: Proc. of HPDC, pp. 203–214 (2010)

    Google Scholar 

  3. Bazan, J.G., Nguyen, H.S., Nguyen, S.H., Synak, P., Wróblewski, J.: Rough Set Algorithms in Classification Problem. In: Rough Set Methods and Applications, pp. 49–88. Physica-Verlag (2000)

    Google Scholar 

  4. Dean, J., Ghemawat, S.: MapReduce: Simplified Data Processing on Large Clusters. In: OSDI, pp. 137–150 (2004)

    Google Scholar 

  5. Hu, X., Han, J., Lin, T.Y.: A New Rough Sets Model Based on Database Systems. Fundamenta Informaticae 59(2-3), 135–152 (2003)

    MathSciNet  MATH  Google Scholar 

  6. Janusz, A., Nguyen, H.S., Ślęzak, D., Stawicki, S., Krasuski, A.: JRS’2012 Data Mining Competition: Topical Classification of Biomedical Research Papers. In: Yao, J., Yang, Y., Słowiński, R., Greco, S., Li, H., Mitra, S., Polkowski, L. (eds.) RSCTC 2012. LNCS, vol. 7413, pp. 422–431. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  7. Janusz, A., Ślęzak, D.: Rough Set Methods for Attribute Clustering and Selection. Applied Artificial Intelligence 28(3), 220–242 (2014)

    Article  Google Scholar 

  8. Kowalski, M., Ślęzak, D., Synak, P.: Approximate Assistance for Correlated Subqueries. In: Proc. of FedCSIS, pp. 1455–1462 (2013)

    Google Scholar 

  9. Kowalski, M., Ślęzak, D., Toppin, G., Wojna, A.: Injecting Domain Knowledge into RDBMS – Compression of Alphanumeric Data Attributes. In: Kryszkiewicz, M., Rybinski, H., Skowron, A., Raś, Z.W. (eds.) ISMIS 2011. LNCS, vol. 6804, pp. 386–395. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  10. Kowalski, M., Stawicki, S.: SQL-Based Heuristics for Selected KDD Tasks over Large Data Sets. In: Proc. of FedCSIS, pp. 303–310 (2012)

    Google Scholar 

  11. Kuncheva, L.I.: Combining Pattern Classifiers: Methods and Algorithms. Wiley (2004)

    Google Scholar 

  12. Liu, H., Motoda, H. (eds.): Feature extraction, construction and selection – a data mining perspective. Kluwer Academic Publishers, Dordrecht (1998)

    MATH  Google Scholar 

  13. Liu, H., Motoda, H.: Computational Methods of Feature Selection. Chapman & Hall/CRC (2008)

    Google Scholar 

  14. Nguyen, H.S., Nguyen, S.H.: Fast split selection method and its application in decision tree construction from large databases. Int. J. Hybrid Intell. Syst. 2(2), 149–160 (2005)

    Article  Google Scholar 

  15. Nguyen, H.S., Ślęzak, D.: Approximate reducts and association rules. In: Zhong, N., Skowron, A., Ohsuga, S. (eds.) RSFDGrC 1999. LNCS (LNAI), vol. 1711, pp. 137–145. Springer, Heidelberg (1999)

    Chapter  Google Scholar 

  16. Pawlak, Z., Skowron, A.: Rudiments of Rough Sets. Information Sciences 177(1), 3–27 (2007)

    Article  MathSciNet  Google Scholar 

  17. Rahman, M.M., Ślęzak, D., Wróblewski, J.: Parallel Island Model for Attribute Reduction. In: Pal, S.K., Bandyopadhyay, S., Biswas, S. (eds.) PReMI 2005. LNCS, vol. 3776, pp. 714–719. Springer, Heidelberg (2005)

    Chapter  Google Scholar 

  18. Sarawagi, S., Thomas, S., Agrawal, R.: Integrating Association Rule Mining with Relational Database Systems: Alternatives and Implications. Data Min. Knowl. Discov. 4(2/3), 89–125 (2000)

    Article  Google Scholar 

  19. Ślęzak, D., Kowalski, M.: Towards approximate SQL – infobright’s approach. In: Szczuka, M., Kryszkiewicz, M., Ramanna, S., Jensen, R., Hu, Q. (eds.) RSCTC 2010. LNCS, vol. 6086, pp. 630–639. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  20. Ślęzak, D., Stencel, K., Nguyen, H.S.: (No)SQL Platform for Scalable Semantic Processing of Fast Growing Document Repositories. ERCIM News 2012(90) (2012)

    Google Scholar 

  21. Ślęzak, D., Synak, P., Wojna, A., Wróblewski, J.: Two Database Related Interpretations of Rough Approximations: Data Organization and Query Execution. Fundamenta Informaticae 127(1-4), 445–459 (2013)

    Google Scholar 

  22. Ślęzak, D., Wróblewski, J., Eastwood, V., Synak, P.: Brighthouse: An Analytic Data Warehouse for Ad-hoc Queries. PVLDB 1(2), 1337–1345 (2008)

    Google Scholar 

  23. Świeboda, W., Nguyen, H.S.: Rough Set Methods for Large and Spare Data in EAV Format. In: Proc. of RIVF, pp. 1–6 (2012)

    Google Scholar 

  24. Szczuka, M.S., Wojdyłło, P.: Neuro-wavelet classifiers for EEG signals based on rough set methods. Neurocomputing 36(1-4), 103–122 (2001)

    Article  Google Scholar 

  25. Widz, S., Ślęzak, D.: Rough Set Based Decision Support – Models Easy to Interpret. In: Selected Methods and Applications of Rough Sets in Management and Engineering, pp. 95–112. Springer (2012)

    Google Scholar 

  26. Widz, S., Ślęzak, D.: Granular attribute selection: A case study of rough set approach to MRI segmentation. In: Maji, P., Ghosh, A., Murty, M.N., Ghosh, K., Pal, S.K. (eds.) PReMI 2013. LNCS, vol. 8251, pp. 47–52. Springer, Heidelberg (2013)

    Chapter  Google Scholar 

  27. Wojnarski, M., et al.: RSCTC’2010 Discovery Challenge: Mining DNA Microarray Data for Medical Diagnosis and Treatment. In: Szczuka, M., Kryszkiewicz, M., Ramanna, S., Jensen, R., Hu, Q. (eds.) RSCTC 2010. LNCS, vol. 6086, pp. 4–19. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  28. Wróblewski, J.: Analyzing relational databases using rough set based methods. In: Proc. of IPMU, vol. 1, pp. 256–262 (2000)

    Google Scholar 

  29. Wróblewski, J.: Pairwise Cores in Information Systems. In: Proc. of RSFDGrC, vol. 1, pp. 166–175 (2005)

    Google Scholar 

  30. Zhang, J., Li, T., Ruan, D., Gao, Z., Zhao, C.: A parallel method for computing rough set approximations. Information Sciences 194, 209–223 (2012)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Wróblewski, J., Stawicki, S. (2014). SQL-Based KDD with Infobright’s RDBMS: Attributes, Reducts, Trees. In: Kryszkiewicz, M., Cornelis, C., Ciucci, D., Medina-Moreno, J., Motoda, H., Raś, Z.W. (eds) Rough Sets and Intelligent Systems Paradigms. Lecture Notes in Computer Science(), vol 8537. Springer, Cham. https://doi.org/10.1007/978-3-319-08729-0_3

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-08729-0_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08728-3

  • Online ISBN: 978-3-319-08729-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics