METHODOLOGY |
This project is among the first to study the Internet topology and its effect on traffic. The study of the topology of the Internet has been neglected by the community. The PIs have revived the interest in this area with their pioneer work [Faloutsos et al. 1999]. They showed that despite its apparent randomness, the Internet topology follows some surprisingly simple laws. Namely, topological properties can be described by power-laws. As a result, the topology can be characterized concisely using simple numbers: the exponents of these power-laws. Having such an accurate topological model will help us discover how the topology affects the Internet traffic. The benefit from this work will be that researchers will be able to: a) use simulation results to speculate the performance of the network, b) design efficient protocols, and c) troubleshoot anomalous behavior.
Motivation. Modeling the topology of the Internet is a very challenging task due to its size and its constant growth. The Internet is a large collection of heterogeneous components interconnected in an arbitrary way. There have been several measurement studies of its topology, but there does not exist a satisfactory model of the topology. The absence of this knowledge is one of the reasons "Why we don't know how to model the Internet" [Paxson and Floyd, 1997]. An important part of the problem is the inability to generate realistic graphs for simulation purposes. Furthermore, the interpretation of the results, measured or simulated, becomes more difficult when the topology is not considered.
The scope of this project is twofold. The first part will analyze the Internet topology and derive simple yet accurate models. This part will provide the models that should be used for fast and realistic simulations. In addition, it will provide an understanding of the evolution of the network and enable predictions. The second part will quantify the effect of the topology on the traffic and the end-to-end performance. The analysis of traffic data is a challenging data-mining task due to each volume and intricate interdependencies. The work will provide mechanisms and tools to to interpret the observed network behavior.
The initial work has laid the groundwork for a systematic understanding of the topology. However, there are still a lot of open questions that this project addresses.
Topology Analysis and Generation of Realistic Graphs:
This part of the work develops an Internet model and methods and tools
to generate realistic graphs for simulation purposes. Despite the
recent discoveries, there does not exist a complete comprehensive
model for the Internet topology yet. We do not have a set of
properties that could act as a necessary and sufficient condition of
the realism of a model. We need to develop a comprehensive model and
then devise methods to generate realistic graphs. The evolution of
the topology can also provides significant intuition on the nature of
the network and it can help estimate and predict future of the
Internet. Note that our study uses the extensive measured topological
data collected by NLANR, which is the data repository with the longest
time span available.
The Effect of Topology on Traffic:
This project studies the effect of topology on traffic aspects such as
congestion and end-to-end delay. There is very limited work on traffic
analysis in conjunction with topological properties. The work has two
components: a) we need to study the traffic, and b) identify the
effect of topology. In the first part, we need to collect the
appropriate data and develop models for its behavior. In this part,
the work will be leveraged from the work of the other research
participants. In the second part, we need to identify which of the
traffic properties are the result of topology. For both parts, the
size and the type of the data requires novel techniques of low
complexity. One of the co-PIs is an expert in data-mining and brings a
large array of novel techniques. He has developed a novel technique to
approximate the density of the network (neighborhood size), which is
computationally very expensive.