CS260 Seminar
Winter 2010
High Performance Data Center Networking
I. Summary
This seminar is about networking issues that arise in the
context of supporting tightly-coupled, high-performance clusters
located in a single machine room, in contrast to loosely-coupled
distributed systems or general data services over the public Internet.
Historically, this subject has developed from two different background
areas. First, supercomputer centers have moved away from big
special-purpose vectorized computers (i.e., single instruction,
multiple data [SIMD] architectures, such as the Cray-1) to large groups
of commodity processors connected by a LAN (i.e., multiple
instruction, multiple data [MIMD] architectures, such as a Beowulf
cluster). Second, the interfaces on high-performance disk drives began to
migrate from parallel ribbon cables (such as SCSI) to serial optical cables
(such as Fibre Channel). Because serial optical cables could be much longer
than parallel ribbon cables without degrading the signal, this migration led
to the idea of creating Storage Area Networks (SANs), which allow disks to be
housed in a different physical location from the computer system to which
they are logically attached.
Originally, SANs were constructed from dedicated Fibre Channel equipment,
running in parallel to (but not connected with) the organization's existing
Local Area Network (which most probably was Ethernet-based). More recently,
however, there has been interest in building SANs from commodity Ethernet
equipment, both to save on hardware and to reduce operational, maintenance,
and training costs by standardizing on a single technology.
Ethernet was never designed to serve as a high-performance interconnect
within a tightly-coupled computer system, so a number of technical issues
have come to light, and work is currently underway to address them. Our goal
is to learn:
1. the performance requirements for this type of application;
2. the capabilities of the various networking technologies that were, are, or
may in the future be used for this application; and
3. the published data (case studies, simulations, analytical models, etc.)
relevant to point (1) or (2), or to the application of some specific
technology to this problem domain.
II. Presentation Topics
Each of you will need to make two different class presentations (each about
20-30 minutes in length) on papers of your choosing related to the goals of
the class. As soon as you have chosen a paper, send me a link and/or a copy
for my approval. We will then fit them into a schedule of presentations.
Here is a list of possible topics:
- Principles of operation and performance studies of various networking
technologies when used in this application. Networking technologies other
than Ethernet include Myrinet, InfiniBand, Fibre Channel, and others that
were primarily developed for high-end supercomputer systems at the national
labs. (The published rankings of the top 500 supercomputers are a good place
to look for ideas.) What makes these technologies special, and what evidence
is there to support those claims? How well would they fit into the data
center model (as opposed to the cost-is-no-object supercomputer world)? Here
is an interesting paper about how networking technologies in the top 500
supercomputers have been changing over time.
- Performance studies. We already looked at one paper about TCP/IP over
Ethernet from Carnegie Mellon, which does NOT include anything about BCN.
What happens when we add BCN or a similar technology to Ethernet? What about
hacking TCP itself? Recalling that the IBM Tech Report we looked at in week 3
claimed that high utilization of the network was a specific goal (and that
over-provisioning was not a viable solution), what is known about workloads
and network utilization in real, operating clusters? More importantly, since
these networks are intended to take the place of traditional computer-system
interconnects like PCIe, what is known and/or expected about their
utilization levels in a well-designed, functioning computer system?
- Many sources, both for and against the "Data Center Ethernet" movement,
have argued the need for lossless data transfer over the network. If this is
true, it is worth remembering that congestion control, flow control, and
fairness in networks have a long and rich history in the networking
literature. What else is there beyond Ethernet PAUSE and TCP congestion
window manipulation? In particular, what evidence is there to support the
idea that BCN will work? More generally, if we look beyond Ethernet,
InfiniBand claims that alternate-path routing (which it supports, but which
is not allowed in switched Ethernet systems because of the Spanning Tree
algorithm) is an important feature, whereas over 30 years ago Harry Rudin
argued that all types of dynamic, adaptive routing algorithms were
fundamentally not of significant value. Meanwhile, Myrinet features a novel
sub-packet-level "flit" (a.k.a. "flow control unit") based hop-by-hop flow
control method which is potentially subject to deadlocks without strict
controls over the paths (see the sketch after this list).
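
To make the deadlock risk concrete, here is a minimal, hypothetical sketch
(in Python; it is not Myrinet-specific and is not taken from any of the
papers above) of why hop-by-hop, credit-based flow control can deadlock when
paths are unrestricted. If every switch along a cycle holds a flit and waits
for buffer space (a credit) from its downstream neighbour, no buffer can ever
drain; spotting this condition amounts to cycle detection on the "waits-for"
graph, and routing restrictions (e.g., dimension-order or up*/down* routing)
work by keeping that graph acyclic.

    # Toy sketch only: four switches A-D in a ring, each with a single full
    # output buffer toward its next hop. Not a model of any real switch.

    def has_circular_wait(waits_for):
        """Return True if the 'waits-for' relation among switches has a cycle."""
        visited, on_stack = set(), set()

        def dfs(node):
            visited.add(node)
            on_stack.add(node)
            for nxt in waits_for.get(node, ()):
                if nxt in on_stack:
                    return True                 # circular wait: deadlock possible
                if nxt not in visited and dfs(nxt):
                    return True
            on_stack.discard(node)
            return False

        return any(dfs(n) for n in list(waits_for) if n not in visited)

    # Every switch holds a flit and waits for a credit from the next switch
    # in the ring, so no buffer can ever drain: A -> B -> C -> D -> A.
    ring = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["A"]}
    print(has_circular_wait(ring))        # True

    # Forbidding the D -> A turn (a crude stand-in for restricted routing)
    # breaks the cycle, so the buffers can drain and the deadlock disappears.
    restricted = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}
    print(has_circular_wait(restricted))  # False

The function name and the four-switch ring are illustrative assumptions only;
real interconnects track credits per virtual channel and per link, but the
circular-wait argument is the same.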