High Performance Network Architectures for Data Intensive Computing

Handbook of Data Intensive Computing

Data Intensive Computing is characterized by problems where data is the primary challenge, whether in its complexity, its size, or the rate of its acquisition. The hardware platform required for a data intensive computing environment consists of tens, and sometimes hundreds, of thousands of compute nodes with their corresponding networking and storage subsystems, power distribution and conditioning equipment, and extensive cooling systems. An essential requirement for processing exploding volumes of data is to move processing and analysis to the data, where possible, rather than moving the data to processing and analysis [1]. It is also critical to maximize parallelism over the data and the efficiency of data movement between discrete devices in a network.

References

  1. C. Tanasescu and T. Reed, “Data Intensive Computing: How SGI Altix ICE and Intel Xeon Processor 5500 Series Help Sustain HPC Efficiency Amid Explosive Data Growth,” http://www.sgi.com/pdfs/4154.pdf, 2009

  2. J. McKendrick, “Size of the data universe: 1.2 zettabytes and growing fast,” ZDNet, May 2010

  3. Y. Chen, D. Pavlov and J. F. Canny, “Large-Scale Behavioral Targeting,” ACM KDD’09, Paris, France, July 2009

  4. D. Newman, A. Asuncion, P. Smyth and M. Welling, “Distributed Algorithms for Topic Models,” Journal of Machine Learning Research, Aug. 2009

  5. R. Shankar and G. Narendra, “MapReduce Programming with Apache Hadoop,” JavaWorld.com, Sept. 2008

  6. LexisNexis Risk Solutions, “LexisNexis HPCC: ECL Programmers Guide,” http://www.lexisnexis.com/risk/about/guides/programmers-guide.pdf, 2010

  7. LexisNexis Risk Solutions, “High-Performance Cluster Computing,” http://www.lexisnexis.com/government/solutions/literature/hpcc-das.pdf, 2010

  8. G. Harrison, “10 Things You Should Know About NoSQL Databases,” http://www.techrepublic.com/downloads/10-things-you-should-know-about-nosql-databases, Aug 2010

  9. C. Strauch, “NoSQL Databases,” Stuttgart Media University, Feb 2011

  10. HP, “HP Superdome 2: the Ultimate Mission-critical Platform,” http://www.compaq.com/cpq-storage/pdfs/4AA1--7762ENW.pdf, June 2010

  11. J. Baker, C. Bond, J. C. Corbett et al., “Megastore: Providing Scalable, Highly Available Storage for Interactive Services,” 5th Biennial Conference on Innovative Data Systems Research (CIDR’11), January 2011

  12. M. Diehl, “Database Replication with MySQL,” Linux Journal, May 2010

  13. L. Barroso and U. Hölzle, “The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines,” 2009

  14. J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” OSDI’04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, Dec. 2004

  15. Cisco Systems, “Data Center Design – IP Network Infrastructure,” http://www.cisco.com/en/US/docs/solutions/Enterprise/Data_Center/DC_3_0/DC-3_0_IPInfra.pdf, Oct. 2009

  16. M. Arregoces and M. Portolani, “Data Center Fundamentals,” Cisco Press, 2004

  17. B. Hedlund, “Top of Rack vs End of Row Data Center Designs,” http://bradhedlund.com/2009/04/05/top-of-rack-vs-end-of-row-data-center-designs/, April 2009

  18. A. Greenberg, J. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel and S. Sengupta, “VL2: A Scalable and Flexible Data Center Network,” ACM SIGCOMM, Barcelona, Spain, Aug. 2009

  19. A. Greenberg, P. Lahiri, D. A. Maltz, P. Patel and S. Sengupta, “Towards a Next Generation Data Center Architecture: Scalability and Commoditization,” PRESTO Workshop at SIGCOMM, 2008

  20. S. M. Rumble, D. Ongaro, R. Stutsman, M. Rosenblum and J. K. Ousterhout, “It’s Time for Low Latency,” Proceedings of the 13th Workshop on Hot Topics in Operating Systems (HotOS 2011).

  21. J. Dean, “Designs, lessons and advice from building large distributed systems,” Keynote talk: The 3rd ACM SIGOPS International Workshop on Large Scale Distributed Systems and Middleware (October 2009)

  22. S. Amershi, J. Fogarty, A. Kapoor and D. Tan, “Effective End-User Interaction with Machine Learning,” AAAI-11, Nectar, 2011

  23. R. Perlman, Interconnections, Second Edition, Addison-Wesley, 2000.

  24. M. Al-Fares, A. Loukissas, and A. Vahdat, “A Scalable, Commodity Data Center Network Architecture,” in Proceedings of SIGCOMM, 2008

  25. S. Mahapatra and X. Yuan, “Load Balancing Mechanisms in Data Center Networks,” 7th Int. Conf. & Expo on Emerging Technologies for a Smarter World (CEWIT), Sept. 2010. (invited)

  26. W. J. Dally and B. Towles, Principles and Practices of Interconnection Networks, Morgan Kaufmann Publishers, 2004

  27. G. Lin and N. Pippenger, “Parallel Algorithms for Routing in Non-Blocking Networks,” Math. System Theory, Vol. 27 pp. 29–40, 1994

  28. L. G. Valiant, “A scheme for fast parallel communication,” SIAM Journal on Computing, Vol. 11, No. 2, pp. 350–361, 1982

  29. J. Touch and R. Perlman, “Transparent Interconnection of Lots of Links (TRILL): Problem and Applicability Statement,” RFC 5556, IETF, May 2009

  30. R. Perlman, D. Eastlake, S. Switches, D. G. Dutt, S. Gai and A. Ghanwani, “RBridges: Base Protocol Specification,” Internet-Draft, IETF, Mar. 2010 (http://tools.ietf.org/html/draft-ietf-trill-rbridge-protocol-16).

  31. P. Ashwood-Smith, “Shortest Path Bridging IEEE 802.1aq Overview & Applications,” in UK Network Operators Forum, Sept. 2010 (http://www.uknof.org.uk/uknof17/Ashwood_Smith-SPB.pdf).

  32. D. Eastlake, P. Ashwood-Smith, S. Keesara, and P. Unbehagen, “The Great Debate: TRILL Versus 802.1aq (SPB),” in North American Network Operators’ Group (NANOG) Meeting 50, Oct. 2010 (http://www.nanog.org/meetings/nanog50/presentations/Monday/NANOG50.Talk63.NANOG50_TRILL-SPB-Debate-Roisman.pdf).

  33. Cisco Systems, “Data Center Interconnect: Layer 2 Extension between Remote Data,” http://www.cisco.com/en/US/prod/collateral/switches/ps5718/ps708/white_paper_c11_493718.pdf, July 2009

  34. Cisco Systems, “Cisco Overlay Transport Virtualization Technology Introduction and Deployment Considerations,” http://www.cisco.com/en/US/docs/solutions/Enterprise/Data_Center/DCI/whitepaper/DCI3_OTV_Intro_WP.pdf, Jan. 2011

  35. R. Moskowitz and P. Nikander, “Host Identity Protocol (HIP) Architecture,” RFC 4423, IETF, May 2006 (http://www.ietf.org/rfc/rfc4423.txt).

  36. D. Farinacci, V. Fuller, D. Meyer and D. Lewis, “Locator/ID Separation Protocol (LISP),” draft-farinacci-lisp-12, IETF, Mar. 2009 (http://tools.ietf.org/html/draft-farinacci-lisp-12).

  37. D. Ferrucci, E. Brown, J. Chu-Carroll et al., “Building Watson: An Overview of the DeepQA Project,” Association for the Advancement of Artificial Intelligence, Fall 2010

  38. J. Qiu, J. Ekanayake, T. Gunarathne et al., “Data Intensive Computing for Bioinformatics,” Bloomington, IN, Indiana University, December 2009

  39. J. Shafer, S. Rixner and A.L. Cox, “Datacenter Storage Architecture for MapReduce Applications,” Workshop on Architectural Concerns in Large Datacenters (ACLD 2009), Austin, TX, June 2009.

Corresponding author

Correspondence to Geng Lin.

Copyright information

© 2011 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Lin, G., Liu, E. (2011). High Performance Network Architectures for Data Intensive Computing. In: Furht, B., Escalante, A. (eds) Handbook of Data Intensive Computing. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1415-5_1

  • DOI: https://doi.org/10.1007/978-1-4614-1415-5_1

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-1414-8

  • Online ISBN: 978-1-4614-1415-5

  • eBook Packages: Computer Science, Computer Science (R0)
