Abstract
Apache Storm is a scalable, fault-tolerant, distributed real-time stream-processing framework widely used in big data applications. For distributed data-sensitive applications, low-latency, high-throughput communication modules have a critical impact on overall system performance. Apache Storm currently uses Netty, an asynchronous server/client framework based on the TCP/IP protocol stack, as its communication component. The TCP/IP protocol stack has inherent performance flaws caused by frequent memory copying and context switching. As a result, the Netty component not only limits the performance of Storm but also increases the CPU load in the IPoIB (IP over InfiniBand) communication mode. In this paper, we introduce two new implementations of the Apache Storm communication component based on RDMA (remote direct memory access) technology. A performance evaluation on Mellanox QDR cards (40 Gbps) shows that our implementations achieve a speedup of up to 5\(\times\) compared with IPoIB and up to 10\(\times\) compared with Gigabit Ethernet. Our implementations also significantly reduce the CPU load and increase the throughput of the system.
Acknowledgements
We are thankful to the reviewers for evaluating this study and providing valuable feedback.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The work is supported by the National Key Research and Development Program of China (Grant No. 2017YFB0202002).
About this article
Cite this article
Zhang, Z., Liu, Z., Jiang, Q. et al. RDMA-Based Apache Storm for High-Performance Stream Data Processing. Int J Parallel Prog 49, 671–684 (2021). https://doi.org/10.1007/s10766-021-00696-0
DOI: https://doi.org/10.1007/s10766-021-00696-0