Scale Out Parallel and Distributed CDR Stream Analytics

  • Conference paper
Data Management in Grid and Peer-to-Peer Systems (Globe 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6265)

Abstract

In the era of information explosion, huge amounts of data are generated continuously by various sensing devices. These data are often too low-level for analytics purposes and too massive to load into data warehouses for filtering and summarization with reasonable latency. Distributed stream analytics with multilevel abstraction is the key to solving this problem.

We advocate a distributed infrastructure for CDR (Call Detail Record) stream analytics in telecommunication networks, in which stream processing is integrated into the database engine and carried out as continuous querying; the computation model is based on a network-distributed (rather than clustered) Map-Reduce scheme. We propose a window-based cooperation mechanism that keeps multiple engines synchronized and cooperating on the data falling within a common window boundary, defined by time, cardinality, etc. This mechanism allows the engines to cooperate window by window without centralized coordination. We further propose a quantization mechanism that integrates the discretization and abstraction of continuous-valued data, for efficient and incremental data reduction and, in turn, reduced network data movement. These mechanisms play key roles in scaling out CDR stream analytics.
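The two mechanisms described above can be illustrated with a minimal Python sketch. It is not the authors' PostgreSQL-integrated implementation: the record layout (caller, timestamp, duration), the 60-second window, the 30-second quantization step, and all function names are assumptions made for illustration only. Each map-side engine derives the window id from the record timestamp alone, so engines agree on window boundaries without a central coordinator, and each engine ships only quantized partial counts to the reduce side.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 60   # assumed time-based window length
DURATION_STEP = 30    # assumed quantization step for call duration, in seconds


def window_id(timestamp, window_seconds=WINDOW_SECONDS):
    """Derive the window id from the timestamp alone, so every engine agrees."""
    return timestamp // window_seconds


def quantize(duration, step=DURATION_STEP):
    """Map a continuous call duration to the lower bound of its bucket."""
    return int(duration // step) * step


def map_engine(cdrs):
    """Map side: count calls per window and per (caller, duration bucket)."""
    partial = defaultdict(Counter)
    for caller, timestamp, duration in cdrs:
        partial[window_id(timestamp)][(caller, quantize(duration))] += 1
    return partial


def reduce_window(partials, wid):
    """Reduce side: merge the partial counts all map engines hold for one window."""
    merged = Counter()
    for partial in partials:
        merged.update(partial.get(wid, Counter()))
    return merged


# Hypothetical CDRs as (caller, timestamp in seconds, duration in seconds).
engine_a = map_engine([("alice", 5, 12.4), ("alice", 42, 95.0), ("bob", 61, 31.0)])
engine_b = map_engine([("alice", 17, 20.1), ("bob", 70, 44.9)])

window0 = reduce_window([engine_a, engine_b], 0)
window1 = reduce_window([engine_a, engine_b], 1)
```

In this sketch the five raw records collapse to three quantized counts before any data would cross the network, which is the kind of data-movement reduction the abstract attributes to quantization.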

The proposed approach has been integrated into the PostgreSQL engine.

Our preliminary experiments reveal its merit for large-scale distributed stream processing.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, Q., Hsu, M. (2010). Scale Out Parallel and Distributed CDR Stream Analytics. In: Hameurlain, A., Morvan, F., Tjoa, A.M. (eds) Data Management in Grid and Peer-to-Peer Systems. Globe 2010. Lecture Notes in Computer Science, vol 6265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15108-8_11

  • DOI: https://doi.org/10.1007/978-3-642-15108-8_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15107-1

  • Online ISBN: 978-3-642-15108-8

  • eBook Packages: Computer Science, Computer Science (R0)
