Ubiq: A Scalable and Fault-Tolerant Log Processing Infrastructure

  • Conference paper
Real-Time Business Intelligence and Analytics (BIRTE 2015, BIRTE 2016, BIRTE 2017)

Abstract

Most of today’s Internet applications generate vast amounts of data (typically in the form of event logs) that need to be processed and analyzed for detailed reporting, for enhancing user experience, and for increasing monetization. In this paper, we describe the architecture of Ubiq, a geographically distributed framework for processing continuously growing log files in real time with high scalability, high availability, and low latency. The Ubiq framework fully tolerates infrastructure degradation and data center-level outages without any manual intervention. It also guarantees exactly-once semantics for application pipelines that process logs as collections of multiple events. Ubiq has been in production for Google’s advertising system for many years and has served as a critical log processing framework for several dozen pipelines. Our production deployment demonstrates linear scalability with machine resources, extremely high availability even with underlying infrastructure failures, and an end-to-end latency of under a minute.
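The exactly-once guarantee mentioned in the abstract can be illustrated with a minimal sketch. It is not Ubiq's implementation, and the names `WorkUnit` and `TransactionalStore` are hypothetical. It assumes that a batch of events is identified by a log file name plus a byte-offset range, and that a single in-process lock stands in for the consistent, replicated storage a geographically distributed system would actually need. The idea shown is only the general one: a batch's results are committed together with a record of the range it covered, so a retried batch cannot be applied twice.

```python
from __future__ import annotations

from dataclasses import dataclass
from threading import Lock


@dataclass(frozen=True)
class WorkUnit:
    """Hypothetical batch: the events stored in `log_file` at byte offsets [start, end)."""
    log_file: str
    start: int
    end: int

    def __post_init__(self) -> None:
        if not 0 <= self.start < self.end:
            raise ValueError("a work unit must cover a non-empty byte range")


class TransactionalStore:
    """Hypothetical stand-in for a consistent, replicated output store.

    It records which byte ranges of each log file have already been committed
    and applies a batch's results in the same atomic step as that record, so a
    retried or duplicated batch is rejected instead of being counted twice.
    """

    def __init__(self) -> None:
        self._committed: dict[str, list[tuple[int, int]]] = {}
        self.output: list[str] = []
        self._lock = Lock()  # models the atomicity a real replicated store must provide

    def commit(self, unit: WorkUnit, results: list[str]) -> bool:
        """Commit `results` for `unit` exactly once; return False for a duplicate."""
        with self._lock:
            ranges = self._committed.setdefault(unit.log_file, [])
            # Any overlap with an already-committed range means some of these
            # events were applied before, so the whole batch is dropped.
            if any(unit.start < end and start < unit.end for start, end in ranges):
                return False
            ranges.append((unit.start, unit.end))
            self.output.extend(results)  # results become visible with the metadata
            return True


if __name__ == "__main__":
    store = TransactionalStore()
    batch = WorkUnit("clicks.log", 0, 4096)
    print(store.commit(batch, ["ad1", "ad2"]))  # True: first attempt commits
    print(store.commit(batch, ["ad1", "ad2"]))  # False: retry is ignored
    print(store.output)                         # ['ad1', 'ad2']
```

Committing the results and the batch's offset range in one atomic step is what makes retries safe under these assumptions: a worker that fails before the commit can have its batch reprocessed, while a retry issued after a successful commit is rejected.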

Author information

Correspondence to Manpreet Singh.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Basker, V. et al. (2019). Ubiq: A Scalable and Fault-Tolerant Log Processing Infrastructure. In: Castellanos, M., Chrysanthis, P., Pelechrinis, K. (eds) Real-Time Business Intelligence and Analytics. BIRTE 2015, BIRTE 2016, BIRTE 2017. Lecture Notes in Business Information Processing, vol 337. Springer, Cham. https://doi.org/10.1007/978-3-030-24124-7_10

  • DOI: https://doi.org/10.1007/978-3-030-24124-7_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-24123-0

  • Online ISBN: 978-3-030-24124-7

  • eBook Packages: Computer Science, Computer Science (R0)
