CS204 Fall 2013

Group Project

Due: 11:59pm Thursday December 12, 2013

I. Summary

The project will be a non-trivial, original piece of work, produced by an individual student or by two students working together as a team. The project may be structured in a variety of ways; see the sample topics below.

The end result should be at the level of a decent workshop paper. I will offer several ideas for projects related to issues we have been covering in lecture.  However, you are welcome to select a topic of your own choosing if you prefer. Please let me know ASAP when you have chosen a topic so I can offer suggestions on how to proceed before you (attempt to) do all the work.

II. Give me your own work -- do not copy from other sources

The purpose of the project is for you to show me that you understand the concepts and techniques covered in the class, and that you can apply them in ways that go beyond simply "plugging numbers into formulas" or retelling what someone else has written. In particular, do not take material directly from other sources (e.g., written descriptions, figures, computer code, experimental data, etc.) and use it as if it were your own work! That is called "plagiarism" and it is an academic offence for which I give a failing grade.

Almost every year I catch someone who turned in a "survey paper" that was copied word-for-word from another document (or perhaps pieced together from sections copied from 2 or 3 documents). That is just as wrong as copying someone else's computer program. Moreover, keeping the same structure while rewriting each sentence using different words with the same meaning is still wrong -- like changing the variable names, or reordering the function definitions, to hide a program's origin. It is fine to include a few specific items taken directly from other sources -- a short definition, a well-done figure, etc. -- as long as you state the source of the item and provide a reference. Anything longer than a few sentences should be rewritten and reorganized using your own words (yes, even if you are not a native speaker of English!) and possibly your own terminology and/or mathematical notation. (Developing your own terminology is surprisingly important when trying to compare related papers from different authors, since their terminologies are often unrelated and/or conflicting.) More importantly, your goal should be to explain each author's approach, specific contribution, point of view, etc., in a way that highlights the differences and similarities among the authors.

III. Here are Some Sample Topics

(1) High Performance TCP Connections over Wide Area Networks

TCP's standard sliding-window mechanism creates a major performance bottleneck on "Long-Fat Networks" (LFNs). Long ago, a 1.5 Mbps connection through a geostationary satellite link would have been considered an LFN, because its 0.5 second minimum RTT demands an advertised window size of at least 100 Kbytes to avoid starvation, whereas 64 Kbytes is the largest value that can fit in the 16-bit field in the TCP header. RFC 1323 offered some relief from the problem by defining the window-scaling option, protection against sequence number wrap-around, and recommendations for RTT calculation. Today, however, there is a demand for multi-gigabit per second connection speeds between supercomputer centers located on different continents -- which is far beyond the capabilities of TCP's additive-increase, multiplicative-decrease congestion control algorithm. Therefore, let's ignore the issue of fitting a very large advertised window and/or byte-level sequence number into a TCP header and focus on adapting TCP's additive-increase, multiplicative-decrease (AIMD) congestion control algorithm to work well on LFNs.
For concreteness, assume the target is to support a continuous transmission rate of about 1 Gigabit/sec (i.e., 10^9 bps) across a trans-continental path (i.e., the RTT is 100 msec). In this case, the TCP sender would need to use a congestion window size of about 8,300 packets (based on an Ethernet-like 1,500 byte MSS) in order to maintain the target rate. Consider one of the following sub-topics:
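As a sanity check on these numbers, the bandwidth-delay product can be computed directly (a quick back-of-the-envelope sketch; all constants come from the paragraph above):

```python
# Back-of-the-envelope bandwidth-delay product for the target LFN path.
RATE_BPS = 10**9        # target rate: 1 Gbit/s
RTT_SEC = 0.100         # trans-continental RTT: 100 msec
MSS_BYTES = 1500        # Ethernet-like MSS

bdp_bytes = RATE_BPS * RTT_SEC / 8      # bytes "in flight" to fill the pipe
cwnd_packets = bdp_bytes / MSS_BYTES    # congestion window, in segments

print(f"{bdp_bytes / 1e6:.1f} MB in flight = {cwnd_packets:.0f} segments")
```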

(1a) Notice that AIMD's linear growth (one MSS per RTT) is proportionately very small as the size of the congestion window increases, so the penalty (in lost throughput) for each packet loss is very severe for LFNs. Therefore, evaluate the benefits of adding the following RAID-inspired form of error correction to the TCP stream. Following each block of B consecutive TCP segments, assume the TCP transmitter sends a special "error correction segment" consisting of the bitwise exclusive-OR of the entire block to the TCP receiver. Thus, if at least B out of the B+1 segments associated with this block reach the receiver, it can reconstruct a single lost TCP segment. However, two or more losses from the same block cannot be fixed. Furthermore, transmitting the special error correction segments reduces the throughput of the TCP connection by a factor of B/(B+1), whether or not there were any lost segments. Evaluate the effectiveness of this error-correction method on the throughput for the target TCP flow. Estimate the optimal value of B as a function of the probability of loss, P, between 10^-3 and 10^-6.
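Before running any simulations, a crude analytical estimate of the optimal B is possible. A data segment is unrecoverable only if it is lost and at least one of the other B segments in its block is also lost; feeding that residual loss rate into the well-known square-root (Mathis-style) TCP throughput approximation, and capping the window at the path's bandwidth-delay product, gives a first-cut model. The Mathis constant and the ~8,333-segment cap are assumptions layered on top of the assignment, so treat this only as a starting point:

```python
import math

W_CAP = 8333                # path bandwidth-delay product, in MSS-sized segments
MATHIS_K = math.sqrt(1.5)   # constant from the Mathis et al. throughput model

def goodput(B, p):
    """Useful segments per RTT for block size B and raw loss probability p.

    A data segment is unrecoverable when it is lost AND at least one of
    the other B segments in its block (B-1 data + 1 parity) is also lost.
    """
    p_eff = p * (1.0 - (1.0 - p) ** B)           # residual loss after FEC
    mathis_window = MATHIS_K / math.sqrt(p_eff)  # loss-limited window size
    return (B / (B + 1.0)) * min(W_CAP, mathis_window)

def best_B(p, B_max=50000):
    """Scan block sizes and return the one maximizing the goodput proxy."""
    return max(range(1, B_max + 1), key=lambda B: goodput(B, p))
```

Under this model the optimal B shrinks as the loss rate grows: heavy loss favors small blocks (maximum protection), while very light loss favors large blocks (minimum parity overhead).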

(1b) This time consider how to modify the linear growth in AIMD's congestion avoidance phase to better handle a very large congestion window. For example, you could try increasing the congestion window by more than one MSS per RTT -- say, by its square root or logarithm. Find a function that produces a similar ratio between the rates of decrease (-50%) and increase as AIMD would for small window sizes (say 4-32 MSS). Evaluate what happens if two flows using your modified version of TCP meet at a common bottleneck.
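One quick way to compare candidate growth functions is to count how many RTTs each needs to climb from w/2 back to w after a multiplicative decrease (a toy deterministic model that ignores slow start and queueing effects):

```python
def rtts_to_recover(w_max, grow):
    """RTTs needed to climb from w_max/2 back to w_max, adding
    grow(w) segments per RTT (grow = lambda w: 1.0 is standard AIMD)."""
    w, rtts = w_max / 2.0, 0
    while w < w_max:
        w += grow(w)
        rtts += 1
    return rtts

standard = rtts_to_recover(8333, lambda w: 1.0)        # one MSS per RTT
sqrt_growth = rtts_to_recover(8333, lambda w: w ** 0.5)
```

At a 100 msec RTT, standard one-MSS growth takes roughly 4,167 RTTs -- about seven minutes -- to recover the target window, which quantifies how severe the loss penalty is on an LFN.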

(1c) Finally, consider the option of changing AIMD's unit of window adjustment from a single maximum-sized segment (MSS) to a stream of packets. Unlike part (1b), this time we assume that the TCP sender uses AIMD to adjust the gap between consecutive packets while transmitting a continuous stream, i.e., double the gap after each loss, otherwise reduce the gap linearly.
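A minimal sketch of such a gap-paced sender (the initial gap, the linear decrement, and the minimum gap below are illustrative constants, not values fixed by the assignment):

```python
MSS_BITS = 1500 * 8  # bits per maximum-sized segment

def average_rate(losses, gap0=1e-4, delta=1e-7, gap_min=1.2e-5):
    """Average send rate (bit/s) of a gap-paced AIMD sender.

    `losses` is a per-packet loss indicator sequence; on a loss the
    inter-packet gap doubles, otherwise it shrinks linearly by `delta`
    seconds, never going below `gap_min` seconds.
    """
    gap, elapsed = gap0, 0.0
    for lost in losses:
        elapsed += gap
        gap = gap * 2.0 if lost else max(gap_min, gap - delta)
    return len(losses) * MSS_BITS / elapsed

clean = average_rate([False] * 10_000)                       # loss-free run
lossy = average_rate([i % 200 == 0 for i in range(10_000)])  # periodic loss
```

Note one asymmetry worth exploring: doubling the gap exactly halves the rate, but shrinking the gap linearly makes the rate grow faster than linearly as the gap approaches its minimum.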

(2) Modify TCP to handle the exchange of "direct access/block oriented" data (like a disk file) rather than "sequential" data (like a magnetic tape).

Recall that a TCP session transfers a sequence of bytes [a, a+1, ..., a+n]  -- reliably -- from Alice to Bob, where the parameters  a and n are determined by Alice's SYN and FIN packets. Bob only releases the data to his client in sequential order, and Bob's advertised window restricts the maximum amount by which Alice's transmissions can deviate from this strict ordering. However, this reliable byte-stream abstraction is not appropriate for every application. For example, in a live video chat application, Bob may prefer a momentary gap in the picture over a long pause in the event of a packet loss. More generally, suppose the data represents a decision tree, and after seeing node 1 Bob knows he wants to jump directly to the subtree rooted at node 50. Why not let him skip the intermediate nodes if he already knows the information is useless to him?

Thus, consider the alternative of abstracting the data transfer as a sequence of reads by Bob from a direct-access file at Alice. At connection establishment, Alice would announce both the first and last sequence numbers in the file. (Later on, Alice could update these values if the attributes of the file changed -- for example, a live video stream where new images are constantly being added and old ones are thrown away.) Similarly, instead of acknowledging new data as it is received, Bob would respond to Alice with updated "wish lists" of content from the file that he wants Alice to send. Clearly, Bob can adjust the size of his "wish list" to apply flow control, similar to the "advertised window" in TCP acknowledgements. However, since Bob may choose not to ask for data to be sent sequentially (and he is also free to change his mind, and remove items from his wish list that he never received), Alice cannot identify lost packets from the way in which Bob changes his "wish list". Therefore, Bob must provide some other feedback mechanism beyond the "wish list" so that Alice can use a TCP-like congestion-control algorithm.
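One natural way to represent Bob's state is a sorted list of disjoint byte ranges; the sketch below is purely illustrative (the class and method names are not part of the assignment):

```python
class WishList:
    """Bob's outstanding requests: disjoint, sorted [start, end) byte ranges."""

    def __init__(self):
        self.ranges = []

    def want(self, start, end):
        """Request bytes [start, end), merging with any overlapping range."""
        ivals = sorted(self.ranges + [(start, end)])
        merged = [ivals[0]]
        for s, e in ivals[1:]:
            last_s, last_e = merged[-1]
            if s <= last_e:                      # overlaps or touches previous
                merged[-1] = (last_s, max(last_e, e))
            else:
                merged.append((s, e))
        self.ranges = merged

    def received(self, start, end):
        """Drop bytes [start, end) once they arrive (or are no longer wanted)."""
        kept = []
        for s, e in self.ranges:
            if s < start:                        # piece to the left survives
                kept.append((s, min(e, start)))
            if end < e:                          # piece to the right survives
                kept.append((max(s, end), e))
        self.ranges = kept

    def outstanding(self):
        """Total bytes still wanted -- the flow-control 'window' Bob exposes."""
        return sum(e - s for s, e in self.ranges)

wl = WishList()
wl.want(0, 150)        # Bob asks for the first 150 bytes of the file
wl.received(20, 30)    # bytes [20, 30) arrive out of order
# wl.ranges is now [(0, 20), (30, 150)]
```

A shrinking wish list tells Alice what Bob no longer needs, but -- as noted above -- it cannot distinguish "received" from "changed my mind", which is exactly why a separate congestion feedback signal is required.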

Modify the TCP header fields that are carried in connection-establishment packets, data packets, and acknowledgements to support the "direct access" model described above. Show how your modified header fields would handle a variety of situations, including: (a) normal TCP-like sequential, reliable, flow-controlled data transfers, (b) real-time unreliable flow-controlled multimedia streaming, and (c) receiver-controlled non-sequential access to a file (e.g., Bob carrying out a binary search over a large file stored at Alice, without transferring the whole data set).

(4) Fun with Network Trace Data.

For many years, researchers have been studying the statistical properties of network traffic collected by a "sniffer" attached to a single link in the Internet core. As we will see in lecture, there has been great interest in understanding the packet inter-arrival time distributions, i.e., Poisson vs. long-range dependence. However, the properties of the packet size distribution have largely been ignored, other than recognizing that sizes tend to be bimodal (e.g., data segments vs. ACKs). In this project, you will look at publicly available sources of network traces, such as the WIDE project, to determine the degree (if any) and nature of the statistical correlation in the lengths of sequentially transmitted packets. More specifically, let f(x) = Pr[packet size is x bytes] be the unconditional probability density function for packet sizes (for example, see the summary page for a typical WIDE dataset). In general, classical analytical models in queueing theory must assume that the size of each packet is chosen independently from f(x), without regard for the sizes of any other packets. Another possibility, inspired by the study of long-range dependence in interarrival times, would be to assume that the sizes of the i-th and j-th packets in the sequence are correlated in a way that depends on their separation, |i-j|. A third possibility might be that the packet size distribution is related to its location relative to the start of a busy period, i.e., if x_i and x_j represent the sizes of the i-th and j-th packets (if any) transmitted during a single busy period, then how are their respective packet size distributions f_i(x_i) and f_j(x_j) related?
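For the second possibility, the standard first measurement is the sample autocorrelation of the size sequence at lag k = |i-j|. A minimal sketch (pure Python; parsing an actual WIDE trace is left out):

```python
def size_autocorr(sizes, k):
    """Sample autocorrelation of a packet-size sequence at lag k."""
    n = len(sizes)
    mean = sum(sizes) / n
    var = sum((x - mean) ** 2 for x in sizes) / n
    cov = sum((sizes[i] - mean) * (sizes[i + k] - mean)
              for i in range(n - k)) / (n - k)
    return cov / var

# A strictly alternating ACK/data pattern (e.g., 40-byte ACKs interleaved
# with 1500-byte segments) is perfectly anti-correlated at lag 1 and
# perfectly correlated at lag 2:
trace = [40, 1500] * 500
```

Plotting this statistic over a wide range of lags is also how long-range dependence shows up in the interarrival-time literature: independent sizes decay to zero quickly, while long-range-dependent ones decay very slowly.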

(5) Data Center Networking

In recent years, there has been considerable interest in how to design specialized networks for large data centers. Start with the idea that a data center is a large cluster with thousands (perhaps hundreds of thousands) of "nodes" (rack-mount PCs, file servers, etc) connected by a fast local area network (such as 1G or 10G Ethernet). Each of those rack-mount PCs may further serve as the platform for hosting tens, hundreds, or even more virtual machines. Most of those virtual machines are configured to serve some specific task. However, the number of active copies of each VM configuration, and their placement on physical nodes, may change dynamically as a function of workload. In addition, the data center may provide "cloud computing" services to various outside clients, each of which has service level agreements that may require the data center to dedicate certain resources to that client while isolating it from everyone else. Therefore, in the ideal case, a data center network should be able to move lots of data quickly from any subset of nodes to any other subset of nodes. (For example, search-engine query processing is based on the MapReduce paradigm, which follows a broadcast/convergecast data exchange pattern over a multi-level tree structure. Others have suggested "striping" a large file across multiple file servers to reduce a client's access time, which may lead to network overload due to "incast" traffic.)

Orthogonal to the transport-layer issues that arise in high-throughput, low-latency networking, data center networking must also overcome the issues of scaling, in which (what could be viewed as) a single local area network might need to support a million nodes (including virtual machines, not all of which may be active at the same time). For example, suppose we try to build such a network using commodity layer-2 bridges, each of which has fifty 1G Ethernet ports (say) connected by an internal non-blocking interconnection fabric.
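To get a feel for the scale, consider one standard way of wiring k-port switches into a non-blocking fabric -- the 3-level fat-tree of Al-Fares et al. (an assumption here, since the assignment does not fix a topology) -- which supports k^3/4 hosts:

```python
def fat_tree_hosts(k):
    """Hosts supported by a 3-level fat-tree built from k-port switches."""
    return k ** 3 // 4

def fat_tree_switches(k):
    """Switch count: (k/2)^2 core switches plus k pods of k switches each
    (k/2 edge + k/2 aggregation per pod) = 5k^2/4 total."""
    return 5 * k ** 2 // 4
```

With k = 50 this yields 31,250 hosts from 3,125 switches, so reaching a million endpoints requires roughly 32 such fabrics (or a deeper hierarchy), which illustrates why flat layer-2 designs struggle at this scale.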


(6) Modify TCP to solve the "incast" congestion problem in data center networks

The phenomenon of network overload due to "incast" traffic is basically a self-inflicted distributed denial of service attack, in which a client invites a large number of file servers to send it large volumes of data simultaneously. The right way to solve this problem is NOT to reduce TCP timeout values, but to allow the application on the TCP client that is requesting the data to instruct the file servers to SLOW DOWN their transmission rate so it does not overwhelm the switch buffer. For years there have been proprietary algorithms for lowering TCP transmission rates to accommodate a flow where the sender is much faster than the receiver. When dialup connections to the Internet were common, web servers would voluntarily reduce their transmission rate whenever they detected a slow connection. Conversely, Packeteer has developed external "traffic shaper" devices that can adjust the throughput of TCP flows passing through them. Thus, there is nothing in TCP to prevent a sender from reducing its transmission rate. However, there is no way for the receiver to REQUEST a lower transmission rate other than reducing its advertised window. In a high performance local area network, even an advertised window of 1 MSS doesn't cause much of a slowdown. Thus, we must find a mechanism to allow the TCP session to negotiate slower speeds.
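The claim about a 1-MSS advertised window is easy to quantify: a window of W bytes caps the sender's rate at W/RTT, and intra-data-center RTTs are tiny (the 100 microsecond RTT below is an illustrative assumption, not a value from the assignment):

```python
MSS_BYTES = 1500
RTT_SEC = 100e-6    # illustrative intra-data-center round-trip time

# Window-limited rate with the advertised window squeezed to a single MSS:
rate_bps = MSS_BYTES * 8 / RTT_SEC      # about 120 Mbit/s
```

Even throttled to a single segment per round trip, the sender still pushes over 100 Mbit/s, so window-based receiver flow control cannot express the slow rates an incast victim actually needs.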