Due: 11:59pm Sunday December 14, 2008
The project will be a non-trivial original piece of work, which is produced by an individual student or by 2 students working together as a team. The project could be structured in any of the following ways:
The end result should be at the level of a decent workshop paper. I will offer several ideas for projects related to traffic characterization issues we have been covering in lecture. However, you are welcome to select a topic of your own choosing if you prefer. Please let me know ASAP when you have chosen a topic so I can offer suggestions on how to proceed before you (attempt to) do all the work.
Develop a mechanism for dynamically allocating Ethernet MAC addresses to solve the growing problem of (mis)use in which large numbers of MAC addresses are assigned to virtual devices or other software-based entities. (For now, let's call this DMATS == Dynamic MAC Address Translation System until someone comes up with a better acronym.)
Each individual piece of equipment connected to an IEEE 802.3 Ethernet or IEEE 802.11 WiFi network is assigned a unique 48-bit MAC address by its manufacturer. In this way, all 802 LANs world-wide allow "plug-and-play" connection of any compatible piece of equipment without configuring any of the equipment to prevent address conflicts. (In contrast to this globally-unique addressing approach, USB networks, for example, use 7-bit locally-assigned addresses, so each USB device must execute an address assignment protocol whenever it joins any network, and no USB network can support more than 127 devices no matter how hard you work at the configuration.)
Although the number of distinct 48-bit MAC addresses (2^48, roughly 2.8 × 10^14) is very large compared to the world-wide total number of physical devices equipped with Ethernet interfaces that is likely to be manufactured within the foreseeable future, it is clearly not sufficient to provide a unique identifier to every logical entity in large software-based systems, such as individual transactions handled by credit-card processing centers, individual files (or disk blocks) on network-accessible file servers, etc.
Therefore, the IEEE has established the following rules for using Ethernet MAC addresses [with emphasis added by me]:
The use of 48-bit identifiers has been extended to serve as protocol identifiers to identify protocol designs and design revisions of protocols operating between instances of physical equipment, where there are expected to be far fewer such protocols identified than there are items of addressable physical equipment.
With the exception of such protocol identifiers, EUI-48 identifiers are intended to identify items of real physical equipment or parts of such equipment such as separable subsystems or individually addressable ports. The expected use should not exceed one EUI-48 identifier per hardware subsystem or at most a very low number of EUI-48 identifier per physical instances of such equipment (e.g. groups of ports as in IEEE Std 802.3ad, for link aggregation). Allocation of a single EUI-48 bit identifier to identify or permit addressing of a fixed and permanent function associated with a real item of physical equipment occurs for the lifetime of that equipment or an indefinite period of use.
In particular any application that called for subdivision of the available number space, for block allocation to physical equipment without an identifiable physical instance per EUI-48 identifier, or for encoding functional capabilities within significant bits or bit patterns of the identifier, has the potential to rapidly exhaust the address space. To reduce the prospect of exhaustion, new applications and proposed extensions to current applications with significant volume expectations are STRONGLY encouraged to make use of EUI-64, rather than the EUI-48, to identify hardware instances.
Unfortunately, several rapidly-growing application areas have recently appeared in which a single physical device consumes large numbers of Ethernet MAC addresses, and in some cases those addresses are used as an encoding to request specific functional capabilities from the underlying device; both practices are specifically excluded by the rules established by the IEEE. For example, companies like VMware are aggressively promoting server virtualization as a means to simplify Systems Administration, by partitioning complex multipurpose monolithic servers into a large number of simple, single-purpose application servers, each of which is just a software image executing on top of a virtual machine emulator hosted by some large physical computer(s). In this case, each pre-configured virtual server needs a unique MAC address, since outside applications must be able to communicate with the appropriate server even if it is just a software instance. Moreover, each virtual server could be suspended or resumed at any time (without a “reboot” of its local copy of the operating system), and we do not know in advance which one(s) will be executing at the same time. In addition, the underlying physical computer must have separate MAC addresses for each of its actual network interface cards, and provide a virtual bridge for directing network traffic to/from the appropriate virtual server. In a different example, the designer of a distributed packet-filtering gateway wanted the flexibility to shift the traffic, one VLAN at a time, from one physical cluster member to another, which required the gateway cluster to have a pool of virtual MAC addresses large enough to accommodate all 4096 possible VLAN identifiers multiplied by the total number of physical network cards belonging to all cluster members!
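To get a feel for the scale of the gateway example, the arithmetic below sizes the virtual-MAC pool for a hypothetical cluster of 8 members with 4 network cards each (these cluster dimensions are assumptions for illustration, not figures from the original design), and compares it with the 2^24 addresses available to a manufacturer under a single 24-bit OUI.

```python
# Sizing sketch for the VLAN-migration gateway example.
# The cluster dimensions used below are hypothetical, chosen only for illustration.

VLAN_IDS = 4096          # 12-bit VLAN identifier field, as counted in the text
OUI_BLOCK = 2 ** 24      # addresses available under one 24-bit OUI


def vlan_mac_pool_size(members: int, nics_per_member: int, vlan_ids: int = VLAN_IDS) -> int:
    """Virtual MACs needed so that every VLAN can be shifted to any physical NIC."""
    return vlan_ids * members * nics_per_member


pool = vlan_mac_pool_size(members=8, nics_per_member=4)
print(f"virtual MACs for one cluster: {pool:,}")            # 131,072
print(f"such clusters per OUI block:  {OUI_BLOCK // pool}")  # 128
```

In other words, under these assumptions a manufacturer's entire OUI block would be used up by just 128 such gateway clusters.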
Clearly, if large-scale deployment of MAC-address-intensive applications such as these is to be supported, some method must be developed to allow them to work within a finite pool of MAC addresses that may be reused in disjoint locations. (Think about the local-use IP addresses of the form 10.x.y.z and 192.168.x.y, which can be used by anyone inside a private network as long as they are kept off the global Internet.) Therefore, let’s focus on a method for dynamic MAC address allocation for the virtual server environment.
II.a Virtualization Example
Let C be the physical computer that is hosting the virtualization software, and let v[1], v[2], ..., v[n] be the set of virtual hosts that are being emulated on top of C. Clearly, physical computer C has a physical network card with pre-assigned MAC address M(C), and also an IP address I(C) for Internet access. Note that I(C) could have been statically assigned to C, or dynamically assigned through a request to the DHCP server, say D, which is responsible for C’s domain.
Currently, the software image for virtual machine v[k] must be pre-configured with its own unique MAC address M(v[k]), but v[k] can wait until it starts executing to request a dynamically-assigned IP address from D. In this case, v[k] sends a DHCP request packet to D, with M(v[k]) as the MAC layer return address, so that D knows where to send the response and who just acquired the lease for I(v[k]).
Note, however, that v[k] doesn't have a physical network card, so the virtualization software on C needs to emulate a layer-2 virtual bridge, say B, with separate ports for each of its active virtual machines. Without loss of generality, let's use the same index “k” to identify a particular virtual machine and its associated port B[k] on the virtual bridge. Therefore, we recognize that the virtualization software on C already has at least one locally-unique identifier for each virtual machine that has nothing to do with its pre-configured MAC address. Now let’s consider some possible approaches for eliminating the need to pre-configure each instance of a virtual machine with a unique MAC address M(v[k]). The experts in virtual machine technology with whom I discussed this concept expect to be able to checkpoint the image of any virtual machine AFTER it has gone through the initial reboot and configuration and is already running its normal workload, and later resume execution after an arbitrary delay. Therefore, a solution that requires the operating system of the virtual machine to go through a full reboot and discovery of the current hardware configuration each time the virtualization software on C decides to start running v[k] is not acceptable. However, you may assume that the hosts requesting a MAC address have a full-duplex connection, rather than shared access through a broadcast medium, so the location of the requestor can be uniquely determined down to the level of a single port on the closest switch.
II.b Approach 1: Adding Dynamic MAC address allocation to DHCP, the existing protocol for Dynamic IP address allocation
DHCP servers for distributing IP-layer network addresses are already a standard feature in most networks. Why not just add the capability of distributing MAC-layer address / IP address pairs to an existing DHCP server such as D? In this case, when the virtualization software on C wants to start running one of its virtual hosts v[k], some entity on C would simply generate a DHCP lease request for an address pair on behalf of v[k]. The DHCP server D would respond by offering to lease the address pair to physical host C for the purpose of “sub-leasing” them to its virtual machine v[k]. Since D currently maintains a log of all current leases as a table of MAC-address / IP address pairs, the structure of this log would need to be expanded from pairs to triples, using the combination of C's unique MAC address and the sub-lease index k to represent the “key” for determining the current lease holder for a MAC / IP address pair. (However, the primary lease of I(C) for C’s own use, based on its physical MAC address M(C), would continue as usual.) The big open question in this approach is: which entity on C should be responsible for generating the DHCP lease request for the address pair needed by virtual machine v[k]?
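One way to picture the expanded lease log is the sketch below, assuming a simple in-memory table. The class and method names, the address pools, and the value used for M(C) (taken from the IANA documentation MAC range) are all hypothetical. Leases are keyed by the pair (M(C), k), so a renewal returns the same address pair and any leased MAC can be traced back to its sub-lease holder. The sketch deliberately leaves open the question of which entity on C issues the request, and it does not model C's own primary lease.

```python
from ipaddress import IPv4Network


class SubLeaseTable:
    """Hypothetical DHCP lease log extended from pairs to triples.

    Each (MAC, IP) pair is leased under the key (M(C), k): the physical
    host's own MAC address plus the sub-lease index k of its virtual machine.
    """

    def __init__(self, mac_pool, ip_pool):
        self.free = list(zip(mac_pool, ip_pool))  # unleased (MAC, IP) pairs
        self.leases = {}                          # (M(C), k) -> (MAC, IP)

    def request(self, host_mac, k):
        key = (host_mac, k)
        if key in self.leases:                    # renewal keeps the same pair
            return self.leases[key]
        if not self.free:
            raise RuntimeError("address-pair pool exhausted")
        self.leases[key] = self.free.pop(0)
        return self.leases[key]

    def release(self, host_mac, k):
        """Return the pair leased to (host_mac, k) to the free pool."""
        self.free.append(self.leases.pop((host_mac, k)))

    def holder_of(self, mac):
        """Return the (M(C), k) key currently holding a MAC address, or None."""
        for key, (leased_mac, _ip) in self.leases.items():
            if leased_mac == mac:
                return key
        return None


# Hypothetical pools: locally administered MACs paired with a private /28 subnet.
macs = [f"02:00:00:00:00:{i:02x}" for i in range(1, 15)]
ips = [str(ip) for ip in IPv4Network("10.0.0.0/28").hosts()]
table = SubLeaseTable(macs, ips)

M_C = "00:00:5e:00:53:01"      # stand-in for M(C)
print(table.request(M_C, 1))   # ('02:00:00:00:00:01', '10.0.0.1')
print(table.request(M_C, 2))   # ('02:00:00:00:00:02', '10.0.0.2')
print(table.holder_of("02:00:00:00:00:02"))  # ('00:00:5e:00:53:01', 2)
```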
II.c Approach 2: Adding a Dynamic MAC address allocation function to the Bridges
The alternative is to pre-configure all virtual machines running on C with the same fixed MAC address. Any conveniently-available address will do, but I suggest that either M0 (some global dummy constant) or M(C) from C's actual network card would be a good choice. Thereafter, we give the virtual bridge B the responsibility for giving v[k] its own MAC address.
Assuming virtual machines can use ordinary DHCP to get themselves a dynamically assigned IP address (which is really critical, or else the whole concept falls apart immediately!!), then as soon as v[k] starts executing it will discover that its current lease for an IP address (if it has one) has expired and will request a replacement in the usual way. However, the virtual bridge B intercepts the DHCP dialog between v[k] and D to modify the request to include the newly-minted option to get an address pair, and later to delete the MAC address M(v[k]) from the response, while allowing the IP address I(v[k]) to reach the virtual machine as usual.
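The bridge's two interception steps could look like the sketch below, which represents DHCP messages as plain dictionaries. The field names chaddr (client hardware address) and yiaddr ("your" IP address) come from DHCP, but the pair-request option name, its value (the port index k), the msg_type key, the assigned_mac reply field, and the dummy address M0 are hypothetical stand-ins; standard DHCP has no such option.

```python
PAIR_OPTION = "mac_ip_pair_request"   # hypothetical new DHCP option
M0 = "02:00:00:00:00:00"              # hypothetical shared dummy MAC for all VMs


def intercept_request(request: dict, k: int) -> dict:
    """Bridge B adds the pair-request option before forwarding v[k]'s request to D.

    The port index k is carried in the option so D can tell apart VMs that all
    present the same chaddr (M0); this keying is an assumption of the sketch.
    """
    modified = dict(request)
    modified[PAIR_OPTION] = k
    return modified


def intercept_reply(reply: dict, bindings: dict, k: int) -> dict:
    """Bridge B records M(v[k]) for port k and strips it from D's reply,
    so v[k] sees only its IP address I(v[k]) and keeps using M0."""
    forwarded = dict(reply)
    bindings[k] = forwarded.pop("assigned_mac")
    return forwarded


bindings = {}
req = intercept_request({"msg_type": "DHCPREQUEST", "chaddr": M0}, k=7)
reply = {"msg_type": "DHCPACK", "chaddr": M0, "yiaddr": "10.0.0.7",
         "assigned_mac": "02:00:00:00:01:07"}
print(req[PAIR_OPTION])                         # 7
print(intercept_reply(reply, bindings, k=7))    # reply without assigned_mac
print(bindings)                                 # {7: '02:00:00:00:01:07'}
```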
Thereafter, the virtual bridge B does layer-2 Network Address Translation (NAT) for each of its active virtual machines. Normally, Network Address Translation is done in layer-3 routers to allow traffic from multiple hosts, each of which has a locally-assigned address on a private subnet, to share a single global IP address on the "public" Internet. In this situation, however, I'm suggesting that NAT operates in the reverse direction, so that traffic belonging to multiple virtual machines, all of which are pre-configured to believe that they are the unique owner of the same MAC address, is demultiplexed by the layer-2 NAT bridge onto a distinct MAC address for each virtual machine when it reaches the "public" Enterprise LAN domain. Thus:
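A minimal sketch of the layer-2 NAT table in bridge B follows, assuming every virtual machine is configured with the same hypothetical dummy address M0 and that the port-to-MAC bindings were learned from the intercepted DHCP replies. The Frame type and method names are illustrative. It rewrites only the Ethernet header addresses; a working bridge would also have to rewrite MAC addresses carried inside payloads (for example, the sender hardware address in ARP packets) and handle broadcast and multicast frames, both of which are omitted here.

```python
from dataclasses import dataclass, replace

M0 = "02:00:00:00:00:00"   # hypothetical MAC shared by every virtual machine


@dataclass(frozen=True)
class Frame:
    src: str
    dst: str
    payload: bytes = b""


class L2NatBridge:
    """Reverse NAT: many VMs that all believe they own M0 appear on the LAN
    under distinct MAC addresses, one per virtual bridge port k."""

    def __init__(self):
        self.port_to_mac = {}   # k -> M(v[k])
        self.mac_to_port = {}   # M(v[k]) -> k

    def bind(self, k: int, mac: str) -> None:
        self.port_to_mac[k] = mac
        self.mac_to_port[mac] = k

    def outbound(self, k: int, frame: Frame) -> Frame:
        """VM on port k -> enterprise LAN: replace source M0 with M(v[k])."""
        if frame.src == M0:
            return replace(frame, src=self.port_to_mac[k])
        return frame

    def inbound(self, frame: Frame):
        """LAN -> VM: map destination M(v[k]) back to M0 and pick port k.

        Returns (k, rewritten_frame), or None if the frame is not for a bound VM.
        """
        k = self.mac_to_port.get(frame.dst)
        if k is None:
            return None
        return k, replace(frame, dst=M0)


bridge = L2NatBridge()
bridge.bind(1, "02:00:00:00:01:01")
bridge.bind(2, "02:00:00:00:01:02")
out = bridge.outbound(2, Frame(src=M0, dst="00:00:5e:00:53:01"))
print(out.src)   # 02:00:00:00:01:02
print(bridge.inbound(Frame(src="00:00:5e:00:53:01", dst="02:00:00:00:01:01")))
```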
II.d Approach 3: Handle the Dynamic MAC address allocation function entirely within the Bridges, using Spanning Tree to decide who is in charge and when to allocate a MAC address (or block of MAC addresses) to a specific Bridge.
Starting from the previous approach, why not let the Layer-2 Bridges run their own MAC address allocation function that is not connected to DHCP in any way (although the design of the protocol might be copied if that makes sense)? In this case, virtual bridge B, or any other physical bridge P, could apply layer-2 NAT to the traffic coming from any port if the source/destination of that traffic is the reserved dummy MAC address. Furthermore, the bridge could generalize this scheme to include all MAC addresses that have the "local assignment bit" set but whose value does not fit the structured dynamic address assignment plan for the network domain controlled by the set of bridges running spanning tree. In other words, if a random node tries to connect to this network using a randomly-chosen identity from the range of locally-assigned MAC addresses, the bridge it encounters, say P, will use layer-2 NAT to move it to a different locally-assigned MAC address that the spanning tree root has granted to P for its use.
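The test a bridge P might apply can be sketched as follows. The U/L ("local assignment") bit and the I/G (group) bit are the two low-order bits of the first octet of a MAC address; everything else, including the idea that the spanning tree root grants P a block defined by a fixed prefix, the particular prefix 02:00:00:01, and the class and method names, is an assumption made for illustration. Translations are keyed by (port, MAC) so that several nodes presenting the same dummy address on different ports each receive their own address.

```python
LOCAL_BIT = 0x02   # U/L bit in the first octet: set => locally administered
GROUP_BIT = 0x01   # I/G bit in the first octet: set => group (multicast) address


def mac_to_int(mac: str) -> int:
    return int(mac.replace(":", "").replace("-", ""), 16)


def int_to_mac(value: int) -> str:
    return ":".join(f"{(value >> shift) & 0xFF:02x}" for shift in range(40, -1, -8))


def is_local_unicast(mac: str) -> bool:
    first_octet = mac_to_int(mac) >> 40
    return bool(first_octet & LOCAL_BIT) and not (first_octet & GROUP_BIT)


class GrantedBlock:
    """Hypothetical block of locally administered MACs granted to bridge P by the
    spanning tree root: all addresses sharing the top prefix_len bits of prefix."""

    def __init__(self, prefix: int, prefix_len: int):
        self.prefix = prefix
        self.host_bits = 48 - prefix_len
        self.next_host = 1
        self.nat = {}   # (port, foreign MAC) -> MAC drawn from this block

    def fits_plan(self, mac: str) -> bool:
        return mac_to_int(mac) >> self.host_bits == self.prefix >> self.host_bits

    def translate(self, port: int, mac: str) -> str:
        """Source address P should forward in place of `mac` seen on `port`."""
        if not is_local_unicast(mac) or self.fits_plan(mac):
            return mac   # globally unique, multicast, or already planned: no NAT
        key = (port, mac)
        if key not in self.nat:
            if self.next_host >= 1 << self.host_bits:
                raise RuntimeError("granted block exhausted; ask the root for more")
            self.nat[key] = int_to_mac(self.prefix | self.next_host)
            self.next_host += 1
        return self.nat[key]


block = GrantedBlock(prefix=0x020000010000, prefix_len=32)
print(block.translate(3, "00:00:5e:00:53:01"))   # globally unique: unchanged
print(block.translate(3, "06:12:34:56:78:9a"))   # 02:00:00:01:00:01
print(block.translate(4, "06:12:34:56:78:9a"))   # 02:00:00:01:00:02
```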
III. Alternate Project Topic suggestions
III.a Is there a method for determining who is actively using Bogon IP addresses?
According to Internet hacker tradition, bogons were the unit of measurement for the "bogosity" (absurdity, impossibility, etc.) of some entity or situation. Thus, a bogon is an IP packet found traveling on the public Internet that claims (through its source address field) to have originated from a network number that is currently unallocated and reserved for future use, and hence not available for legitimate use by any host. (Note the distinction from a martian, or "packet from Mars", which carries legitimate addresses but was found traveling through some location that should have been impossible to reach.) Normally, bogons are used to hide the identity of the guilty party in denial-of-service attacks, where the sender does not need or want to receive a response from the victim for its malicious request. However, recent studies of Peer-to-Peer file sharing patterns show that a substantial number of users are successfully completing file exchanges through a connection hidden behind a bogon IP address. In the early days of the Internet (when memory was expensive and before the widespread adoption of BGP), many core IP routers had incomplete routing tables, and relied on a "default" route to a more-knowledgeable and/or more-centrally located neighbor to deal with messages addressed to an unknown destination address. Obviously, it was important for the directed graph of all routers' default routes to be acyclic to prevent looping among lost packets! Moreover, a dishonest network administrator could quietly start using bogon IPs for sending traffic to any destination whose default route would eventually direct any replies back to her network. However, BGP route updates allow core IP routers to build a routing table complete enough to function without the need for "default" routes, and many routers have a policy of applying "bogon filtering" to their traffic. In this case, how are P2P users able to get replies sent to their bogon IP address? What tools might allow us to track down the location of these bogon IP users?
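As a starting point for the "tools" question, the core test a bogon filter applies is a prefix-membership check on the source address, sketched below with Python's standard ipaddress module. The two prefixes are placeholders chosen because they are reserved (240.0.0.0/4 is reserved for future use); a real study would load the current list of unallocated prefixes, which changes as the registries hand out address space. The function names are hypothetical.

```python
from ipaddress import ip_address, ip_network

# Placeholder bogon list: a real one must track current IANA/RIR allocations.
BOGON_PREFIXES = [ip_network("0.0.0.0/8"), ip_network("240.0.0.0/4")]


def is_bogon(src: str) -> bool:
    """True if the source address lies in a prefix on the bogon list."""
    addr = ip_address(src)
    return any(addr in net for net in BOGON_PREFIXES)


def bogon_sources(packets):
    """Distinct bogon source addresses, in first-seen order.

    packets: iterable of (src, dst) address strings, e.g. parsed from a trace.
    """
    seen = []
    for src, _dst in packets:
        if is_bogon(src) and src not in seen:
            seen.append(src)
    return seen


trace = [("241.1.2.3", "192.0.2.10"), ("192.0.2.20", "192.0.2.10"),
         ("241.1.2.3", "192.0.2.11")]
print(bogon_sources(trace))   # ['241.1.2.3']
```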
III.b Use packet traces to estimate the packet waiting times inside the router by analysing the interdeparture times in trace files representing traffic leaving the router.
The last part of Assignment 2 provides an introduction to this topic. Given only a sequence of packet departures from a port on a router (or other packet switch), what can we say about the queueing delays experienced by those packets inside the router? Clearly, if the link was idle just before a particular "tagged" packet was transmitted over the output link, then it didn't need to wait in the queue before being transmitted. Conversely, if it was the Jth packet sent during a particular link busy period, then its waiting time can be no larger than the time from the start of that busy period until its own transmission. As I mentioned in class, queue inferencing is a more general technique for estimating the waiting times that customers experience in a queue by looking only at the sequence of departures. The difference here is that queue inferencing calculates an estimate for the arrival time of each packet served in a link busy period, rather than just assuming the worst case in which they all arrived at the beginning. The technique originated in the operations research literature, with application to the placement of automated teller machines by banks that only had access to the log of customer transactions at the ATM. It was later adapted to network monitoring by Manjunath and myself in this paper. The implementation of the algorithm is a bit messy, but it could be applied to any network trace, such as the ones at the WIDE Project. If you are interested in this, I would be happy to talk to you offline.
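The simple worst-case bound described above (not the queue-inferencing estimate, which is more involved) can be sketched as follows. It assumes each trace record gives the time a packet started transmission and its length in bytes, and that the output link rate is known; the function name and the tolerance parameter, which absorbs limited timestamp precision, are hypothetical.

```python
def waiting_time_bounds(starts, lengths, rate_bps, tol=1e-9):
    """Upper bounds on queueing delay from an output-port departure trace.

    starts   : times (s) at which each packet began transmission, ascending
    lengths  : packet lengths in bytes
    rate_bps : output link rate in bits per second
    tol      : slack (s) for deciding that the link stayed busy between packets

    Returns one (busy_period_index, bound) tuple per packet. A packet that
    starts a busy period found the link idle, so its bound is 0; the Jth packet
    of a busy period waited at most (its start - start of that busy period).
    """
    bounds = []
    period = -1
    busy_start = 0.0
    prev_end = float("-inf")
    for t, size in zip(starts, lengths):
        if t > prev_end + tol:              # link was idle: a new busy period begins
            period += 1
            busy_start = t
        bounds.append((period, t - busy_start))
        prev_end = t + 8 * size / rate_bps  # when this packet's last bit leaves
    return bounds


# Toy trace: 1000-byte packets on an 8 kb/s link take exactly 1 s each.
print(waiting_time_bounds([0.0, 1.0, 2.0, 5.0, 6.0], [1000] * 5, 8000))
# [(0, 0.0), (0, 1.0), (0, 2.0), (1, 0.0), (1, 1.0)]
```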
III.d Investigate the timestamping problem in SONET traces
Recall from problem set 2 that inter-packet spacings in the CAIDA traces were difficult to determine because of the periodic insertion of 260 bytes of SONET overhead. Since the SONET frame has a fixed duration (125 microseconds), why not break the data into fixed-length segments equal to the SONET frame duration and try to identify a common location within all those segments where the inter-packet spacing is always large enough to allow room for the SONET overhead? Since 260 bytes (2,080 bits) occupies an OC-48 link (2.488 Gb/s) for only about 0.84 microseconds, these gaps will be difficult to identify without looking at a lot of data. On the other hand, life will be much easier if we can find some trace files with more than 6 decimal digits of precision in the timestamps (i.e., finer than 1-microsecond resolution).
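The folding idea could be prototyped as below: take each packet's transmission interval modulo the 125-microsecond frame period and look for phase bins that no packet ever occupies. Timestamps and durations are assumed to be integer nanoseconds (which presumes finer-than-microsecond timestamps), the trace clock is assumed not to drift relative to the SONET clock, and the 100 ns bin width and function name are arbitrary choices for this sketch.

```python
FRAME_NS = 125_000   # SONET frame period: 125 microseconds


def idle_phase_bins(packets, period_ns=FRAME_NS, bin_ns=100):
    """Phase bins within the frame period that no packet transmission ever covers.

    packets: iterable of (start_ns, duration_ns) integer pairs.
    Bins that stay idle across many frames are candidate locations of the
    periodic SONET overhead.
    """
    nbins = period_ns // bin_ns
    busy = [False] * nbins
    for start, duration in packets:
        first = start // bin_ns
        last = (start + duration - 1) // bin_ns   # last bin the packet touches
        last = min(last, first + nbins - 1)       # a long packet covers every bin once
        for b in range(first, last + 1):
            busy[b % nbins] = True
    return [b for b in range(nbins) if not busy[b]]


# Synthetic trace: in every frame the link is busy except during [1000, 2000) ns.
trace = []
for frame in range(3):
    base = frame * FRAME_NS
    trace += [(base, 1000), (base + 2000, 123_000)]
idle = idle_phase_bins(trace, bin_ns=100)
print(idle[0] * 100, (idle[-1] + 1) * 100)   # 1000 2000
```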
III.e Something of your own choosing that is relevant to the subjects covered in this course.
Email your topic suggestions to the instructor for approval before getting started.