A Hybrid Framework for Query Processing and Data Analytics on Spark

  • Conference paper
Web and Big Data (APWeb-WAIM 2018)

Abstract

In this paper, we propose a hybrid framework for query processing and data analytics over large-scale data on Spark, supporting multi-paradigm processing (including SQL, OLAP, data mining, and machine learning) in distributed environments. The framework features a three-layer data processing module and a workflow module that controls it. We demonstrate the strengths of our framework by applying it to real-world traffic scenarios.

Notes

  1. https://en.zuche.com/

Acknowledgements

We would like to thank CAR Inc. for providing datasets for scientific research. This work is supported by the Key Technology R&D Program of Tianjin (16YFZCGX00210), the National Key R&D Program of China (2016YFB1000603, 2017YFC0908401), and the National Natural Science Foundation of China (61672377).

Author information

Corresponding author

Correspondence to Xiaowang Zhang.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Chen, H., Zhang, X., Zhang, J., Feng, Z. (2018). A Hybrid Framework for Query Processing and Data Analytics on Spark. In: U, L., Xie, H. (eds) Web and Big Data. APWeb-WAIM 2018. Lecture Notes in Computer Science, vol 11268. Springer, Cham. https://doi.org/10.1007/978-3-030-01298-4_15

  • DOI: https://doi.org/10.1007/978-3-030-01298-4_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-01297-7

  • Online ISBN: 978-3-030-01298-4

  • eBook Packages: Computer Science, Computer Science (R0)
