A Variable-Length Network Encoding Protocol for Big Genomic Data

  • Mohammed Aledhari
  • Mohamed S. Hefeida
  • Fahad Saeed
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9674)

Abstract

Modern genomic studies use high-throughput instruments that produce data at an astonishing rate. The big genomic datasets generated by next-generation sequencing (NGS) machines can easily reach the petascale, creating storage, analysis and transmission problems for large-scale systems biology studies. Traditional networking protocols are oblivious to the data being transmitted and are designed for general-purpose data transfer. In this paper we present a novel data-aware network transfer protocol for efficiently transferring big genomic data. Our protocol exploits the limited alphabet of DNA nucleotides and is built on the Hypertext Transfer Protocol (HTTP) framework. Our results show that the proposed technique improves transmission performance by up to 84 times compared with standard HTTP encoding schemes. We also show that the performance of the resulting protocol, called VTTP, on a single machine is comparable to that of the BitTorrent protocol running on 10 machines.
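The abstract says the protocol exploits the small DNA alphabet through a variable-length encoding but does not spell out the scheme here. As an illustration only, the Python sketch below uses Huffman coding, one classic variable-length prefix code, to map nucleotides to short bit strings and pack them into bytes. The function names (`build_prefix_code`, `encode`, `decode`) and the choice of Huffman coding are assumptions for this example, not the authors' VTTP implementation, and the HTTP transport layer is omitted.

```python
import heapq
from collections import Counter
from itertools import count
from typing import Dict, Tuple

# Illustrative sketch only: Huffman coding is used here as one example of a
# variable-length prefix code; it is not claimed to be the VTTP encoding.


def build_prefix_code(seq: str) -> Dict[str, str]:
    """Build a Huffman prefix code from the symbol frequencies in seq."""
    freqs = Counter(seq)
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freqs)): "0"}
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing 0 and 1.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def encode(seq: str, code: Dict[str, str]) -> Tuple[bytes, int]:
    """Concatenate codewords, pad to a byte boundary, return (bytes, bit count)."""
    bits = "".join(code[s] for s in seq)
    padded = bits + "0" * (-len(bits) % 8)
    data = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return data, len(bits)


def decode(data: bytes, nbits: int, code: Dict[str, str]) -> str:
    """Read bits left to right; emit a symbol as soon as a codeword matches."""
    reverse = {c: s for s, c in code.items()}
    bits = "".join(f"{b:08b}" for b in data)[:nbits]
    out, cur = [], ""
    for bit in bits:
        cur += bit
        if cur in reverse:  # safe because no codeword prefixes another
            out.append(reverse[cur])
            cur = ""
    return "".join(out)


if __name__ == "__main__":
    seq = "AAAAAAAACCGT"
    code = build_prefix_code(seq)
    data, nbits = encode(seq, code)
    assert decode(data, nbits, code) == seq
    print(f"{len(seq)} ASCII bytes -> {len(data)} encoded bytes ({nbits} bits)")
```

Running the script prints `12 ASCII bytes -> 3 encoded bytes (18 bits)`. A sequence made only of A, C, G and T already fits in 2 bits per base with a fixed-length code, a fourfold saving over 8-bit ASCII; because Huffman coding is optimal among prefix codes, it never needs more than those 2 bits per base on average and can go below them when base frequencies are skewed, as in this toy input.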

Keywords

Network protocol · Big Data · Genomics · HTTP

Notes

Acknowledgment

This work was supported in part by the grant NSF CCF-1464268.

Copyright information

© IFIP International Federation for Information Processing 2016

Authors and Affiliations

  • Mohammed Aledhari (1)
  • Mohamed S. Hefeida (2)
  • Fahad Saeed (1)
  1. Western Michigan University, Kalamazoo, USA
  2. American University of the Middle East, Eqaila, Kuwait