Service Oriented Computing and Applications, Volume 13, Issue 4, pp 287–295

Hydrological stream data pipeline framework based on IoTDB

  • YuanSheng Lou
  • Yu Qin
  • Feng Ye
  • Peng Zhang
  • Yong Chen
SPECIAL ISSUE PAPER

Abstract

With the growing volume of hydrological data in the Chuhe river basin, traditional relational databases can no longer meet users' needs: they struggle to achieve low latency and high throughput in the real-time transmission of hydrological data, and queries over large volumes of annual water-level data take a long time or even crash the system. To address this problem, this paper proposes a stream data pipeline framework based on the time-series database IoTDB and Kafka, which can serve researchers working on hydrological early warning and anomaly detection. Using hydrological sensor data from the Chuhe river, processing scenarios for sensor stream data are defined, and IoTDB is compared with other NoSQL databases (HBase, MongoDB, RiakTS and Redis) in each scenario. The performance and workload of each NoSQL database within the data pipeline are tested. Finally, the pipeline is connected to the Flink real-time stream data processing platform and compared with other data pipelines. The experimental results show that the stream data pipeline composed of IoTDB, Kafka and Flink stands out in data acquisition, transmission, incremental query and data analysis.
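The flow described in the abstract (sensor readings pass through a message queue into a time-series store, which is then polled with incremental queries) can be illustrated with a minimal, self-contained sketch. This is not the authors' implementation and does not use the IoTDB, Kafka or Flink APIs: the class TimeSeriesStore, the function run_pipeline and the sample water-level values are all hypothetical stand-ins chosen for illustration.

```python
import bisect
from collections import deque


class TimeSeriesStore:
    """In-memory stand-in for a time-series database (hypothetical, not the IoTDB API)."""

    def __init__(self):
        self._timestamps = []  # kept sorted in ascending order
        self._values = []

    def insert(self, timestamp, value):
        # Insert at the position that preserves timestamp order,
        # so late-arriving points still end up in the right place.
        i = bisect.bisect_right(self._timestamps, timestamp)
        self._timestamps.insert(i, timestamp)
        self._values.insert(i, value)

    def query_since(self, last_seen):
        # Incremental query: return only points strictly newer than last_seen.
        i = bisect.bisect_right(self._timestamps, last_seen)
        return list(zip(self._timestamps[i:], self._values[i:]))


def run_pipeline(readings, store):
    """Move readings through a queue (stand-in for a message broker) into the store."""
    queue = deque()
    for reading in readings:
        queue.append(reading)            # producer side
    while queue:
        timestamp, level = queue.popleft()  # consumer side
        store.insert(timestamp, level)


# Made-up (timestamp, water level) pairs, not data from the paper.
store = TimeSeriesStore()
run_pipeline([(1, 5.20), (2, 5.35), (3, 5.31)], store)
first_batch = store.query_since(0)

# A later reading arrives; the next poll fetches only what is new.
run_pipeline([(4, 5.60)], store)
new_points = store.query_since(first_batch[-1][0])
```

In this sketch the consumer remembers the last timestamp it has seen and asks only for newer points, which is the general idea behind an incremental query; a real deployment would replace the deque with a Kafka topic and the store with a time-series database.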

Keywords

IoTDB, NoSQL, Kafka, Stream data, Data pipeline, Real-time processing

Notes

Acknowledgements

This work was partly supported by the 2018 Jiangsu Province Key Research and Development Program (Modern Agriculture) Project under Grant No. 20195013812, the 2017 Jiangsu Province Postdoctoral Research Funding Project under Grant No. 1701020C, the 2017 Six Talent Peaks Endorsement Project of Jiangsu under Grant No. XYDXX-078, and the Fundamental Research Funds for the Central Universities under Grant No. 2013B01814.

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. College of Computer and Information, Hohai University, Nanjing, China
  2. Jiangsu Water Resources Department, Nanjing, China
  3. Postdoctoral Centre, Nanjing Longyuan Micro-Electronic Company, Nanjing, China
