Department of Computer Science and Engineering
CS 179i: Project in Computer Science / Networks
Spring 2014
Suggested Topic: To design and implement yet another new network
performance monitoring tool
- Background on network performance monitoring tools
- Here is a detailed tutorial that explains three different classes of
monitoring tools based on:
- raw packet capture (via tcpdump);
- polling the packet switch via SNMP to read its internal
counters (e.g., MRTG); and
- processing flow records exported by the packet switch
(e.g., Cisco NetFlow).
- In recent years, a new method for controlling packet
switches called OpenFlow has been developed, which creates a new
opportunity for monitoring network traffic.
- OpenFlow separates the "data plane" (i.e., the path from
input port to output port followed by each packet) from the "control
plane" (i.e., the forwarding tables and control policies which
determine the fate of a packet -- give it a forwarding path or throw it
away)
- This functional separation allows the "control plane" to
be moved to an external server, which handles the control functions of
multiple packet switches in the same area. This neighborhood controller
has more information than an individual packet switch, which allows it
to make coordinated forwarding decisions across multiple switches to
improve their overall efficiency.
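- In order for the neighborhood controller to do its job,
each OpenFlow switch must send control messages to the controller that
announce the start and end of every TCP session (in OpenFlow terms, a
PacketIn message when a new flow has no matching table entry, and a
FlowRemoved message when that flow's entry expires).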
- This paper recently proposed the idea of passively listening to those
OpenFlow control messages to learn about the traffic on a link without
adding overhead to the packet switch for answering SNMP polling
messages or exporting flow records. However, the authors' "zero
overhead method" (ZOM) has two related drawbacks:
- Information about each TCP flow (starting time, ending
time, total bytes sent) arrives only in the OpenFlow end-of-session
control message. Thus, the ZOM algorithm cannot operate in real time:
whenever a session ends, it must go back to that session's starting
time and update the traffic volumes over the session's whole lifetime.
This time delay is quite serious because a significant number of TCP
sessions have a very long duration and carry a very large volume of
traffic (i.e., they are "elephants" mixed into a population of "mice").
- Network traffic is famous for its "burstiness" (i.e.,
rapidly varying peaks and valleys, rather than steady flow rates).
Since ZOM spreads the traffic volume for each TCP session evenly
between its starting and ending times, it is likely to underestimate
the burstiness of the actual traffic on the link.
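To make the second drawback concrete, here is a minimal Python sketch of the even-spreading step as it is described above. It is not the paper's implementation: the function names `spread_evenly` and `bin_packets`, the `(start, end, bytes)` flow tuple, and the fixed-width time bins are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical record layouts, chosen for this sketch only.
FlowRecord = Tuple[float, float, int]   # (start_time, end_time, total_bytes)
PacketRecord = Tuple[float, int]        # (timestamp, size_in_bytes)


def spread_evenly(flows: List[FlowRecord], bin_width: float,
                  num_bins: int) -> List[float]:
    """Bytes per time bin when each flow's bytes are spread evenly
    between its starting and ending times."""
    bins = [0.0] * num_bins
    for start, end, total_bytes in flows:
        duration = end - start
        if duration <= 0:
            # Zero-length flow: credit everything to the bin holding `start`.
            idx = int(start // bin_width)
            if 0 <= idx < num_bins:
                bins[idx] += total_bytes
            continue
        rate = total_bytes / duration  # assumed constant over the lifetime
        first = max(0, int(start // bin_width))
        last = min(num_bins - 1, int(end // bin_width))
        for idx in range(first, last + 1):
            bin_start = idx * bin_width
            bin_end = bin_start + bin_width
            overlap = min(end, bin_end) - max(start, bin_start)
            if overlap > 0:
                bins[idx] += rate * overlap
    return bins


def bin_packets(packets: List[PacketRecord], bin_width: float,
                num_bins: int) -> List[float]:
    """Bytes per time bin computed from the individual packets."""
    bins = [0.0] * num_bins
    for timestamp, size in packets:
        idx = int(timestamp // bin_width)
        if 0 <= idx < num_bins:
            bins[idx] += size
    return bins
```

For a single 4000-byte flow that lasts 4 seconds, but whose packets actually arrive as 3000 bytes at t = 0.1 s and 1000 bytes at t = 3.5 s, `spread_evenly` with one-second bins reports 1000 bytes in every bin, while `bin_packets` reports 3000, 0, 0 and 1000. The peak is understated by a factor of three.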
- What work needs to be done.
- We don't have access to any OpenFlow-compatible network
hardware, but we do have access to a virtually unlimited number of raw
network trace files from the WIDE project.
- These trace files are in tcpdump (pcap) format. Another tool
called Wireshark reads the same format and is much easier to use for
browsing them.
- The trace consists of one record per packet, including a
timestamp, source/dest IP address, and source/dest port numbers (these
addresses are anonymized to protect the privacy of people using the
network)
- You can write a program (say "dump2flows") to read a
dump file and convert it to a series of OpenFlow start/end flow
records, to provide the input data for your network monitoring tool
- You can also write a slightly different program that reads
the dump file and reduces the record for each packet into a format that
contains only timestamp, size, and flow#. This second conversion
program will be useful later to compare how much the ZOM algorithm
smooths the traffic.
- Implement the ZOM algorithm, to convert the output of your
dump2flows program into some graphs of traffic volumes on this link.
You might want to use an output format similar to MRTG's, as shown by
this example. (By the way, the designer of MRTG has moved on to a new
project called RRDtool, which has some interesting features.)
- Now let's see if you can improve on the ZOM algorithm by
adding an estimate for the "invisible" traffic (i.e., traffic from
TCP sessions that have started but not yet ended). It should be quite
simple for your program to keep a running average of the transmission
rate per TCP session, computed from the sessions that have already
ended. With this value, you can add
(average rate * number of invisible flows) to the current traffic rate
generated by ZOM. Note that long-lasting TCP flows are likely to be
"elephants" (since most of the "mice" would have already ended), so a
possible enhancement would be to increase the average rate per
invisible flow as the age of that flow increases.
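The two conversion programs described above could share one pass over the trace. The following Python sketch is one possible shape for that pass, under several assumptions: packets have already been parsed out of the dump file into the hypothetical `Packet` tuples shown here (reading the pcap file itself is left out), a flow is identified by its one-directional source/destination address and port numbers, and a flow is treated as ended once it has been idle for longer than an `idle_timeout` that you would need to choose yourself.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Packet(NamedTuple):      # hypothetical parsed trace record
    timestamp: float
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    size: int


class Flow(NamedTuple):        # hypothetical start/end flow record
    flow_id: int
    start: float
    end: float
    total_bytes: int


def dump2flows(packets: Iterable[Packet], idle_timeout: float = 60.0):
    """Group time-ordered packets into flows.

    Returns (flows, reduced), where `reduced` holds one
    (timestamp, size, flow_id) tuple per packet.
    """
    # key -> [flow_id, start, last_seen, total_bytes]
    active: Dict[tuple, list] = {}
    flows: List[Flow] = []
    reduced: List[Tuple[float, int, int]] = []
    next_id = 0
    for pkt in packets:
        key = (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port)
        state = active.get(key)
        # Assumption: a long silence means the old flow has ended and
        # this packet belongs to a new flow with the same addresses.
        if state is not None and pkt.timestamp - state[2] > idle_timeout:
            flows.append(Flow(*state))
            state = None
        if state is None:
            state = [next_id, pkt.timestamp, pkt.timestamp, 0]
            active[key] = state
            next_id += 1
        state[2] = pkt.timestamp
        state[3] += pkt.size
        reduced.append((pkt.timestamp, pkt.size, state[0]))
    # End of trace: close every flow that is still open.
    for state in active.values():
        flows.append(Flow(*state))
    flows.sort(key=lambda f: f.end)   # end records arrive in end-time order
    return flows, reduced
```

Here each flow's `end` is the timestamp of its last packet. A real switch would report the end somewhat later, after its own timeout expires, so you may want to experiment with how that choice affects your graphs.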
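The invisible-traffic correction in the last item might be sketched in Python as follows. Everything here is an assumption for illustration: the names `average_rate` and `estimate_rate`, the `(start, end, bytes)` flow tuple, the choice to average as total bytes over total duration of the ended flows, and the optional `age_weight` function used to try the age-based enhancement.

```python
from typing import Callable, List, Tuple

Flow = Tuple[float, float, int]   # (start_time, end_time, total_bytes)


def average_rate(ended: List[Flow]) -> float:
    """Bytes per second over all ended flows (0.0 if they span no time)."""
    total_bytes = sum(b for _, _, b in ended)
    total_time = sum(e - s for s, e, _ in ended)
    return total_bytes / total_time if total_time > 0 else 0.0


def estimate_rate(flows: List[Flow], now: float, zom_rate: float,
                  age_weight: Callable[[float], float] = lambda age: 1.0
                  ) -> float:
    """ZOM's rate at time `now` plus an estimate for invisible flows.

    `flows` is the full offline list; for flows still running at `now`
    only the start time is used, as it would be in a live monitor.
    """
    ended = [f for f in flows if f[1] <= now]
    invisible = [f for f in flows if f[0] <= now < f[1]]
    avg = average_rate(ended)
    # With the default weight this is (average rate * number of
    # invisible flows); a growing age_weight boosts older flows.
    extra = sum(avg * age_weight(now - start) for start, _, _ in invisible)
    return zom_rate + extra
```

With the default weight the function adds exactly the average rate times the number of invisible flows. Passing something like `age_weight=lambda age: 1.0 + age` is one arbitrary way to try the enhancement; the right shape for that function is something to determine from the traces.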