[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[RRG] recent progress in topology research



Below is my review of few recent results in topology research. I
select the papers to discuss based mostly on my current interests. I
split the review into the following two parts: papers from
networking and from statistical physics.

I. Networking

1. The best SIGCOMM paper award winning
http://www.sigcomm.org/sigcomm2004/papers/p599-li.pdf is definitely
worth reading. The authors employ degree-preserving rewiring to show
that that HOT-like router-level topologies are not the maximally
random graphs in the set of simple connected graphs having a given
degree sequence. Furthermore, graphs in this set can have different
properties, so that a given degree distribution is not a constraint
restrictive enough. Note that the first experiments of this kind for
AS-level topologies were performed in
http://arxiv.org/abs/cond-mat/0205379 (but see also II.B  and II.C
below).

2. The Netdimes project http://www.netdimes.org/ implementing the
idea of traceroute@home has produced first public results in
http://arxiv.org/abs/cs.NI/0506099 The idea is to find 'hidden
tangential links', i.e. peering links between small ISPs, by having
sources of traceroute measurements placed at the periphery of the
network -- namely, at the PCs of the project participants: anyone
can download the agent and become a participant. These tangential
links can hardly be discovered by any other means. The first results
seem to confirm findings in II.A.1 below of estimates of
(in)accuracies of resource-limited traceroute-like explorations of
different network topologies.

3. http://arxiv.org/abs/cs.NI/0502085 delivers the best available
generator of simple connected random graphs with power-law degree
sequences. It improves on
http://www.cc.gatech.edu/~gantsich/TopologyGenerators/TheMCSimulationMethodForGeneratingConnectedPLRG.pdf

II. Statistical physics

A) Sampling biases of traceroute-like topology measurements

1. http://arxiv.org/abs/cs.NI/0412007 (there are also two journal
versions: for Theoretical Computer Science and Physical Review E)
essentially dismisses fears
http://www.ieee-infocom.org/2003/papers/09_01.PDF that the real
Internet topology might be qualitatively different than the measured
one. The authors employ the 'mean field' approximation (supported
with simulations) to analyze as to what degrees a wide range of the
most commonly used statistical characteristics of
traceroute-measured topologies can differ from the same
characteristics of the real topologies under probe. Roughly, the
main finding is that these differences cannot be large. In
principle, the Internet has a chance to be a classical Erdos-Renyi
(ER) random graph in reality, but then its average node degree must
be of the order of the measured maximum degree. The latter means
that the overwhelming majority of nodes in the real topology must
have degrees very close to the observed maximum degree since the
node degree distribution in ER graphs is Poissonian, narrowly
centered around its average. We know for sure that 90% of ASs in the
Internet do not have degree of around 2000, therefore the real
Internet can only be quantitatively, not qualitatively, different
from the measured one. The authors also show then that the more
fat-tailed the betweenness distribution -- the betweenness
distribution in the Internet is fat-tailed
http://www.caida.org/analysis/topology/as_topo_comparisons/ -- the
higher the quantitative accuracy of traceroute-like probing. Those
who are interested in quick reading should go directly to the
conclusions section of the paper containing a nice summary of
implications. One of them, mentioned at the very end, questions a
potential value of distributed traceroute@home-like explorations
with large numbers of sources (cf. I.2 above).

2. A more rigorous approach to the same problem yields a smaller
number of useful results: http://arxiv.org/abs/cond-mat/0503087
derives only the degree distributions seen by traceroute-like
measurements with one source only. The paper confirms that an ER
graph can produce a power-law distribution.

B) Equilibrium vs. non-equilibrium network models (aka top-down vs.
bottom-up)

There are two approaches to modeling complex networks. The
equilibrium (top-down) approach is to construct an ensemble of
static random graphs reproducing certain properties of observed
networks and then to derive their other properties by the standard
methods. The classic example of this approach is PLRG. The
non-equilibrium approach (bottom-up) tries to mimic the actual
dynamics of network growth: if this dynamics is accurately captured,
then the modeling algorithm, when let to run to produce a network of
the required size, will output the topology coinciding with the
observations. The classic examples of this approach are the
Barabasi-Albert (BA) and HOT models. In reality, the ultimate truth
is clearly behind the more ambitious non-equilibrium approach. In
practice however, the non-equilibrium approach is much more
involved, both theoretically and methodologically, and as such it
has so far generated a much smaller number of practically useful
results.

1. Equilibrium ensemble construction procedures and their analysis

a) http://arxiv.org/abs/cond-mat/0405566 might look, for a
networking community reader, too involved mathematically, but one is
encouraged to check at least Sections I and II. At the very
beginning, the authors point out that the bottom-up approach in
complex network studies is akin to kinetic theory of gases in
physics: while trying to be more realistic, it quickly becomes
over-complicated, intractable, and eventually unproductive. The
authors then discuss how the standard tools of equilibrium
statistical mechanics can be directly applied to analysis of random
graph ensembles.

b) Note that the first consistent set of coherent definitions
followed by in-depth analysis appeared much earlier in
http://arxiv.org/abs/cond-mat/0204111

c) However, the 'Polish group' has accumulated probably the largest
number of results in this area. A couple of recent highlights
include the papers analyzing properties of both equilibrium,
http://arxiv.org/abs/cond-mat/0502124 , as well as non-equilibrium,
http://arxiv.org/abs/cond-mat/0503548 , networks (the authors call
them 'homogeneous' and 'causal', respectively). Their work resulted
also in a public release of the graph generator software producing
ensembles of random graphs with prescribed expected degree sequences
http://arxiv.org/abs/cond-mat/0506330 Interestingly, all the last
papers, starting 2001, of Andre Krzywicki, who is very distinguished
http://qcd.th.u-psud.fr/page_perso/Krzywicki/ , are dedicated to the
subject of equilibrium statistical mechanics of complex networks
http://arxiv.org/find/cond-mat/1/au:+Krzywicki_A/0/1/0/all/0/1

2. HOT FKP revisited

a) http://arxiv.org/abs/cond-mat/0407643 analytically solves a
slight modification of the HOT FKP model, shows lack of 'clean'
power-laws in it because of unavoidable degree distribution
degeneracy---condensation of "stubs on hubs", which can be easily
reproduced in experiments. The authors go further to show that, in
the large network limit, power laws are necessarily absent in any
HOT/FKP model modification involving intervertex distance
optimization. The power laws can partially appear though but only in
a form of pre-asymptotic finite-size effects. That is, the situation
is almost exactly the same as with the PFP model
http://arxiv.org/abs/cs.NI/0402011 most accurately reproducing
Internet topology properties among all the non-equilibrium models.

b) The subset of the same findings (i.e., degree distribution
degeneracy) had been previously proven in
http://research.microsoft.com/~borgs/Papers/FKPdgrees.pdf

3. Hidden variables http://arxiv.org/abs/cond-mat/0306072
contribution highlights:

a) develops the recently introduced formalism of hidden variables,
which is an alternative to the BA preferential attachment: every
node is assigned a tag drawn from some distribution and connection
between node pairs are formed according to some probability function
of tag values at the nodes. These hidden variables can be related to
expected node degrees, or it can something completely different,
e.g. ISP brand name...

b) proposes an algorithm to construct maximally random graphs with
the prescribed expected degree correlations (i.e., the joint degree
distribution). http://arxiv.org/abs/cond-mat/0308336v1 independently
proposes the same algorithm.

c) demonstrates that non-equilibrium growing network models can be
*formally* mapped to equilibrium models with one of the hidden
variables being... time!

C) Correlations

All real networks, including the Internet, are growing, and all
growing networks necessarily exhibit various types of correlations
which significantly complicate topology analysis. One of the central
problem in this area is to try to differentiate between 'forced' and
'natural' correlations: forced correlations are those due to
constraints imposed by a combination of other network properties
(e.g., degree correlations due to network's having a given degree
sequence, being simple and connected), while natural correlations
are those appearing in addition to forced ones, 'inside' the forced
constraints. Thus, ability to differentiate between forced and
natural correlations is critical for these two purposes: 1) only the
natural correlations can tell us how 'far away' a real network is
from its maximally random equilibrium counterpart respecting the
given set of forced constraints (cf. II.B and I.1 above); and 2)
only the natural correlation can provide some additional information
about the actual laws of the network evolution.

1. http://arxiv.org/abs/cond-mat/0408110 and somewhat earlier
http://arxiv.org/abs/cond-mat/0303327 address the problem of to
which extent certain forms of degree distribution force degree
correlations. The first paper even suggests an algorithm producing
uncorrelated networks even if the original degree distribution
necessarily forces degree correlations. Of course, the resulting
degree distribution is slightly different from the original, but
this difference is minimized.

2. http://arxiv.org/abs/cond-mat/0409686 suggests another definition
of local clustering decoupled from node degree correlations. The
definition seems to be useful, although it is not commonly accepted.

3. http://arxiv.org/abs/cond-mat/0308444 addresses the problem of to
which extent degree correlations force clustering. Surprisingly, we
have recently found the formulas from this paper, even though
derived for pseudographs, are in an extremely good agreement with
observation of clustering in maximally random simple connected
graphs with the joint degree distributions observed in the AS-level
Internet topologies.

D) Distance distributions

The performance parameters (such as stretch) of compact routing
algorithms depend mostly on (if not only on) the specifics of the
graph's distance distribution. Therefore, knowledge of analytical
properties of distance distributions in scale-free networks are
critical for moving forward with providing any analytic estimates of
the performance of compact routing applied to these networks.

1. http://arxiv.org/abs/cond-mat/0411160 empirically discovers that
the average distance between nodes of degree k and k' is
~a-b\log(k*k') in a *very wide* class of graphs. The authors also
try to provide some preliminary explanation of the discovered
effect.

2. http://arxiv.org/abs/cond-mat/0407098 uses random walks to
calculate distance distribution in uncorrelated networks. The
results closely match simulations.

3. http://arxiv.org/abs/cond-mat/0411526 is the first paper to
analytically derive the average distance from a node of degree k in
*correlated* networks. The results are obtained only for
deterministic network models though.
--
dima.
http://www.caida.org/~dima/



--
to unsubscribe send a message to rrg-request@psg.com with the
word 'unsubscribe' in a single line as the message text body.
archive: <http://psg.com/lists/rrg/> & ftp://psg.com/pub/lists/rrg