CS260 Seminar
Winter 2010
High Performance Data Center Networking
I. Summary
This seminar is about networking issues that arise in the
context of supporting tightly-coupled, high-performance clusters
located in a single machine room, in contrast to loosely-coupled
distributed systems or general data services over the public Internet.
Historically, this subject has developed from two different background
areas. First, supercomputer centers have moved away from big
special-purpose vectorized computers (i.e., single instruction,
multiple data [SIMD] architectures, such as the Cray-1) to large groups
of commodity processors connected by a LAN (i.e., multiple
instruction, multiple data [MIMD] architectures, such as a Beowulf
cluster). Second, the interfaces on high-performance disk drives began to
migrate from parallel ribbon cables (such as SCSI) to serial optical cables
(such as Fibre Channel). Because serial optical cables could be much longer
than parallel ribbon cables without degrading the signal, this migration led
to the idea of creating Storage Area Networks (SANs), which allow disks to be
housed in a different physical location from the computer system to which
they are logically attached.
Originally, SANs were constructed from dedicated Fibre Channel equipment,
running in parallel to (but not connected with) the organization's existing
Local Area Network (which most probably was Ethernet-based). More recently,
however, there has been interest in building SANs from commodity Ethernet
equipment, both to save on hardware and to reduce operational, maintenance,
and training costs by standardizing on a single technology.
Ethernet was never designed to serve as a high-performance interconnect
within a tightly-coupled computer system, so a number of technical issues
have come to light, and work is currently underway to address them. Our goal
is to learn:
1. the performance requirements for this type of application;
2. the capabilities of the various networking technologies that were, are, or
may in the future be used for this application; and
3. the published data (case studies, simulations, analytical models, etc.)
relevant to point (1) or (2), or to the application of some specific
technology to this problem domain.
II. Presentation Topics
Each of you will need to make two different class presentations (each about
20-30 minutes in length) on papers of your choosing related to the goals of
the class. As soon as you have chosen a paper, send me a link and/or a copy
for my approval. We will then fit them into a schedule of presentations.
Here is a list of possible topics:
- Principles of operation and performance studies of various networking
technologies when used in this application. Networking technologies other
than Ethernet include Myrinet, InfiniBand, Fibre Channel, and others that
were primarily developed for high-end supercomputer systems at the national
labs. (The published rankings of the top 500 supercomputers are a good place
to look for ideas.) What makes these technologies special, and what evidence
is there to support those claims? How well would they fit into the data
center model (as opposed to the cost-is-no-object supercomputer world)? Here
is an interesting paper about how networking technologies in the top 500
supercomputers have been changing over time.
- Performance studies. We already looked at one paper about TCP/IP over
Ethernet from Carnegie Mellon, which does NOT include anything about BCN.
What happens when we add BCN or a similar technology to Ethernet? What about
hacking TCP itself? Recalling that the IBM Tech Report we looked at in week 3
claimed that high utilization of the network was a specific goal (and that
over-provisioning was not a viable solution), what is known about workloads
and network utilization in real, operating clusters? More importantly, since
these networks are intended to take the place of traditional computer-system
interconnects like PCIe, what is known and/or expected about their
utilization levels in a well-designed, functioning computer system?
- Many sources, both for and against the "Data Center Ethernet" movement,
have argued the need for lossless data transfer over the network. If this is
true, it is worth remembering that congestion control, flow control, and
fairness in networks have a long and rich history in the networking
literature. What else is there beyond Ethernet PAUSE and TCP congestion
window manipulation? In particular, what evidence is there to support the
idea that BCN will work? More generally, if we look beyond Ethernet,
InfiniBand claims that alternate-path routing (which it supports, but which
is not allowed in switched Ethernet systems because of the Spanning Tree
algorithm) is an important feature, whereas over 30 years ago Harry Rudin
argued that all types of dynamic, adaptive routing algorithms were
fundamentally not of significant value. Meanwhile, Myrinet features a novel
sub-packet-level "flit" (a.k.a. "flow control unit") based hop-by-hop flow
control method which is potentially subject to deadlocks without strict
controls over the paths (see the sketch after this list).
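
To make the deadlock risk concrete, here is a minimal, hypothetical sketch
(in Python; it is not Myrinet-specific and is not taken from any of the
papers above) of why hop-by-hop, credit-based flow control can deadlock when
paths are unrestricted. If every switch along a cycle holds a flit and waits
for buffer space (a credit) from its downstream neighbour, no buffer can ever
drain; spotting this condition amounts to cycle detection on the "waits-for"
graph, and routing restrictions (e.g., dimension-order or up*/down* routing)
work by keeping that graph acyclic.

    # Toy sketch only: four switches A-D in a ring, each with a single full
    # output buffer toward its next hop. Not a model of any real switch.

    def has_circular_wait(waits_for):
        """Return True if the 'waits-for' relation among switches has a cycle."""
        visited, on_stack = set(), set()

        def dfs(node):
            visited.add(node)
            on_stack.add(node)
            for nxt in waits_for.get(node, ()):
                if nxt in on_stack:
                    return True                 # circular wait: deadlock possible
                if nxt not in visited and dfs(nxt):
                    return True
            on_stack.discard(node)
            return False

        return any(dfs(n) for n in list(waits_for) if n not in visited)

    # Every switch holds a flit and waits for a credit from the next switch
    # in the ring, so no buffer can ever drain: A -> B -> C -> D -> A.
    ring = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["A"]}
    print(has_circular_wait(ring))        # True

    # Forbidding the D -> A turn (a crude stand-in for restricted routing)
    # breaks the cycle, so the buffers can drain and the deadlock disappears.
    restricted = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}
    print(has_circular_wait(restricted))  # False

The function name and the four-switch ring are illustrative assumptions only;
real interconnects track credits per virtual channel and per link, but the
circular-wait argument is the same.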