A Document-Based Data Warehousing Approach for Large Scale Data Mining

  • Conference paper
Pervasive Computing and the Networked World (ICPCA/SWS 2012)

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 7719)

Abstract

Data mining techniques are widely applied, and data warehousing plays an important role in the process. Scalability and efficiency have always been key issues in data warehousing. With the explosive growth of data, data warehousing now faces tough challenges on both fronts, and traditional methods are reaching a bottleneck. In this paper, we present a document-based data warehousing approach in which the ETL process is carried out with the MapReduce framework and the data warehouse is built on a distributed, document-oriented database. A case study demonstrates the details of the entire process. Compared with RDBMS-based data warehousing, our approach shows better scalability, flexibility and efficiency.
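
The idea of a MapReduce-style ETL step that emits documents can be illustrated with a minimal in-memory sketch. It is not the authors' implementation: the input format (`user_id,item_id,clicks`), the function names `map_record`, `reduce_user` and `run_etl`, and the output document layout are all assumptions made for illustration, and no Hadoop or document-database API is used.

```python
import json
from collections import defaultdict

# Hypothetical raw log lines in the form "user_id,item_id,clicks".
# This data and its format are illustrative, not taken from the paper.
RAW_LINES = [
    "u1,i1,3",
    "u2,i1,1",
    "u1,i2,2",
    "u1,i1,4",
]


def map_record(line):
    """Extract and transform: parse one raw line into a (key, value) pair."""
    user_id, item_id, clicks = line.strip().split(",")
    return user_id, {"item": item_id, "clicks": int(clicks)}


def reduce_user(user_id, values):
    """Fold all values for one key into a single nested document."""
    per_item = defaultdict(int)
    for value in values:
        per_item[value["item"]] += value["clicks"]
    return {
        "_id": user_id,
        "items": [{"item": k, "clicks": per_item[k]} for k in sorted(per_item)],
        "total_clicks": sum(per_item.values()),
    }


def run_etl(lines):
    """Simulate the shuffle phase in memory, then reduce each key group."""
    groups = defaultdict(list)
    for line in lines:
        key, value = map_record(line)
        groups[key].append(value)
    return [reduce_user(key, groups[key]) for key in sorted(groups)]


if __name__ == "__main__":
    # Each printed line is one JSON document ready to be loaded into a
    # document-oriented store.
    for doc in run_etl(RAW_LINES):
        print(json.dumps(doc))
```

Each output document nests the per-item detail under its user, so a query for one user needs no join. That is the kind of flexibility a document model can offer over a normalized relational schema, under the assumptions stated above.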

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chai, H., Wu, G., Zhao, Y. (2013). A Document-Based Data Warehousing Approach for Large Scale Data Mining. In: Zu, Q., Hu, B., Elçi, A. (eds) Pervasive Computing and the Networked World. ICPCA/SWS 2012. Lecture Notes in Computer Science, vol 7719. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37015-1_7

  • DOI: https://doi.org/10.1007/978-3-642-37015-1_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37014-4

  • Online ISBN: 978-3-642-37015-1

  • eBook Packages: Computer Science, Computer Science (R0)
