A Hybrid Framework for Query Processing and Data Analytics on Spark

Chen, Haokun; Zhang, Xiaowang; Zhang, Jiahui; Feng, Zhiyong

doi:10.1007/978-3-030-01298-4_15

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11268)

Included in the following conference series:

Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data

Abstract

In this paper, we propose a hybrid framework for query processing and data analytics over large-scale data on Spark that supports multi-paradigm processing (including SQL, OLAP, data mining, and machine learning) in distributed environments. The framework features a three-layer data processing module and a workflow module that controls it. We demonstrate the strengths of our framework by applying it to real-world traffic scenarios.

Notes

1. https://en.zuche.com/

Acknowledgements

We would like to thank CAR Inc. for providing datasets for scientific research. This work is supported by the Key Technology R&D Program of Tianjin (16YFZCGX00210), the National Key R&D Program of China (2016YFB1000603, 2017YFC0908401), and the National Natural Science Foundation of China (61672377).

Author information

Authors and Affiliations

School of Computer Science and Technology, Tianjin University, Tianjin, 300350, People’s Republic of China
Haokun Chen, Xiaowang Zhang & Jiahui Zhang
School of Computer Software, Tianjin University, Tianjin, 300350, People’s Republic of China
Zhiyong Feng
Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin, 300350, People’s Republic of China
Haokun Chen, Xiaowang Zhang, Jiahui Zhang & Zhiyong Feng

Corresponding author

Correspondence to Xiaowang Zhang.

Editor information

Editors and Affiliations

University of Macau, Macao, China
Leong Hou U
Education University of Hong Kong, Hong Kong, China
Haoran Xie

About this paper

Cite this paper

Chen, H., Zhang, X., Zhang, J., Feng, Z. (2018). A Hybrid Framework for Query Processing and Data Analytics on Spark. In: U, L., Xie, H. (eds) Web and Big Data. APWeb-WAIM 2018. Lecture Notes in Computer Science, vol 11268. Springer, Cham. https://doi.org/10.1007/978-3-030-01298-4_15

DOI: https://doi.org/10.1007/978-3-030-01298-4_15
Published: 21 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01297-7
Online ISBN: 978-3-030-01298-4
eBook Packages: Computer Science, Computer Science (R0)
