More comments on architecture document, sect 4



draft-green-cdnp-gen-arch-01.txt, Section 4:

The overview of distribution talks about the process of publishing CONTENT
from an ORIGIN to the SURROGATES of a single CDN. Once again, we have the
question of whether or not this is peering -- there's no second CDN in
sight. Do we really want to say that every PUBLISHER putting content into a
CDN is dealing with a peering process, and that there is a DISTRIBUTION CPG
on each side?  That seems kind of heavy and complex to me.  It's also
inconsistent with other parts of the documents, which imply that there is an
interface between ORIGIN and DISTRIBUTION SYSTEM that involves no peering.

An exchange facility consisting of co-located DISTRIBUTION CPGs seems
interesting and important enough to merit more than a single sentence. We
should say a little more about whether we want to support such a deployment,
and perhaps why or why not.  (Of course, first we'd have to agree on those
things.)

> Replication advertisement may take place in a layer 5 model similar
> to the way BGP is used today at layer 3. DISTRIBUTION CPGs could
> take care of exterior content replication between content providers
> and CDNs, while at the same time performing content replication
> interior to their networks in an independent manner. If this model
> is used then the internal structure of the networks is hidden and
> the only knowledge of other networks is the locations of
> DISTRIBUTION CPGs.

I find the layer-5/layer-3 etc. stuff mostly unhelpful. It seems to me that
this paragraph is saying the same thing several times, i.e. that we're tying
together black-box networks at the CPGs.  That point has already been made
earlier in the document in slightly different ways.

Perhaps this model can be abstracted out to earlier in the document, and
then we could refer back to that description for all three kinds of peering
(since it does seem to me that the same basic constraints apply in each
case).

Should "advertisement" above be ADVERTISEMENT?

> Hierarchical caching, where
> SURROGATEs, upon getting a cache miss, retrieve CONTENT from a cache
> higher up the chain, represents the pull model.

I'm not comfortable with bringing hierarchical caching in at this point. It
seems to me that push vs. pull is something that can be explained, and then
there are scaling techniques involving hierarchies that can be applied to
either push or pull.

My previous experiences suggest that push vs. pull is fairly tricky to
define accurately, so we may want to invest a little more in pinning down
what we mean (or why we care).
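
One possible reading of the distinction, offered only as a rough sketch: in
"pull" the receiving node initiates the transfer when a request misses, and
in "push" the sending node initiates it before any request arrives. The
`Node` class and its method names below are hypothetical and do not come from
the draft. The same parent/child hierarchy serves both modes, which is the
sense in which hierarchy is a scaling technique separate from push vs. pull.
The sketch ignores the subtleties that make a precise definition hard.

```python
class Node:
    """Hypothetical content store with an optional parent (illustration only)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.store = {}
        if parent is not None:
            parent.children.append(self)

    def pull(self, key):
        """Receiver-initiated: on a miss, fetch from the parent and keep a copy."""
        if key not in self.store:
            if self.parent is None:
                raise KeyError(key)  # no parent left to ask
            self.store[key] = self.parent.pull(key)
        return self.store[key]

    def push(self, key, value):
        """Sender-initiated: copy to this node and every descendant up front."""
        self.store[key] = value
        for child in self.children:
            child.push(key, value)


# Same three-level hierarchy, used in both modes.
origin = Node("origin")
middle = Node("middle", parent=origin)
edge = Node("edge", parent=middle)

origin.store["a"] = "content-a"
pulled = edge.pull("a")          # the miss at edge travels up through middle
origin.push("b", "content-b")    # the copy travels down before any request
```

After the pull, `middle` also holds a copy of "a" because the miss passed
through it. After the push, `edge` holds "b" without ever having asked for it.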

> On one hand it may be
> possible to do this transparently with no DISTRIBUTION CPGs on the
> transit network. On the other hand it may be desirable for the
> transit network to have DISTRIBUTION CPGs.

I don't really know what the "transit network" is at this point, although
I'd guess that it's intended to be the "network between".

Even with that guess, I basically don't know what this pair of sentences is
trying to tell me.

> Replication of CONTINUOUS MEDIA takes place in a different model
> from content which has a fixed length CONTENT DATA UNIT, especially
> in the case of live streaming data. Replication in this case
> typically takes the form of splitting the live streaming data at
> various points in the network.

This explains why "live" is different, but not why CONTINUOUS MEDIA is
different. And I didn't think that CONTINUOUS MEDIA was distinguished by
having a variable-length CONTENT DATA UNIT, I thought it was distinguished
by having a constrained playout time.  So I'm confused at this point.

> In the CDN peering system
> DISTRIBUTION CPGs could perform this function. In this sense the
> collection of DISTRIBUTION CPGs would constitute an application
> layer multicast overlay network.

Much additional explanation required before this makes sense, I suspect.
The antecedent of "this function" is probably "splitting" but it's not
completely obvious.

> The three main components of distribution are replication, signaling
> and advertising. Each of these is utilized between DISTRIBUTION CPGs
> belonging to content providers and CDNs.  They may also be used
> between CDNs.

I find the introduction of DISTRIBUTION CPGs here to be somewhat confusing.
It seems like the point is to explain the high-level connections and roles,
in which case the point to be made is that these activities can take place
either between a PUBLISHER and a CDN or between one CDN and another CDN.
CPGs seem to belong at the next level of more detailed explanation.

> The final goal of replication involves moving the content from an
> ORIGIN server to SURROGATE delivery servers. The immediate goal in
> CDN peering is moving the content between DISTRIBUTION CPGs.

I think this distinction between the "overall goal" and the "immediate goal"
is a good one, and we should have a fairly exact parallel in the paragraphs
for signalling and advertisement that follow this one.

One thing that's unclear to me is whether we believe that there will be a
single distribution-peering-protocol spoken between DISTRIBUTION CPGs, or
whether there are distinct replication, signalling, and advertisement
protocols.  I think we need to tighten up our position on that, even if what
we say is that it's an issue for the WG to decide.

> Specific problems in content signaling needing further investigation
> include:

Missing from the list is the fundamental question of what kind of content
signal is supported and what kind is not. We need some more careful
definition of what we think can be put into a content signal beyond the
example of freshness.

> 1.  How do we represent a collection of meta-data in a concise and
> compressed manner?

Oddly, this question is not asked about signalling, even though it's
signalling where the term "meta-data" is used earlier. We need to be
consistent about whether both signalling and advertisement use meta-data and
whether they share representation issues (and why or why not).

> 4.  How distributed of an approach should be used for this problem?

It's not clear what this question means.  Probably we need to have more
explanatory text about the possible approaches.

> 5.  How do we prevent looping?

It's not clear what this question means. We need to have at least some
explanation of how loops could arise and why they would cause problems.
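
As one illustration of the kind of explanation that might help (an assumption
about what the question intends, not something stated in the draft): if three
CDNs peer in a cycle and each re-advertises whatever it hears to its other
peers, a single advertisement can circulate forever. The sketch below
simulates that, then applies a path-vector check in the style of BGP, where a
receiver drops any advertisement whose recorded path already contains itself.
The CDN names, the `propagate` function, and the message cap are all
hypothetical.

```python
from collections import deque

# Hypothetical peering topology: three CDNs peered in a cycle.
PEERS = {
    "cdn-a": ["cdn-b", "cdn-c"],
    "cdn-b": ["cdn-a", "cdn-c"],
    "cdn-c": ["cdn-a", "cdn-b"],
}


def propagate(origin, peers, loop_check, max_messages=50):
    """Flood one advertisement from `origin`; return how many messages were sent.

    Each message records the path of CDNs it has traversed. With `loop_check`
    enabled, a receiver drops an advertisement whose path already contains the
    receiver. Without it, forwarding continues until `max_messages` is reached.
    """
    queue = deque((origin, peer, (origin,)) for peer in peers[origin])
    sent = 0
    while queue and sent < max_messages:
        sender, receiver, path = queue.popleft()
        sent += 1
        if loop_check and receiver in path:
            continue  # receiver has already seen this advertisement: drop it
        new_path = path + (receiver,)
        for nxt in peers[receiver]:
            if nxt != sender:  # never echo straight back to the sender
                queue.append((receiver, nxt, new_path))
    return sent


with_check = propagate("cdn-a", PEERS, loop_check=True)
without_check = propagate("cdn-a", PEERS, loop_check=False)
```

With the check, the flood dies out after a handful of messages. Without it,
the advertisement stops only because of the artificial cap, which is the
problem: wasted signalling traffic and state that never settles.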

> It is possible that when fetching content as opposed to
> pushing content that sessions between replication peering systems
> may be directed by the redirection system.

This seems convoluted.  Maybe we could make this a couple of sentences and
explain the issue or concern a little more.  I'm also not completely
convinced that this is a "requirement" -- it seems more like a "problem,"
but I may be misunderstanding it.

> 3. Scalable distribution of the signals on a large scale.

I think that we have to start being quantitative about what we mean by
"large scale".  Are we thinking in terms of the number of content items, the
number of networks, or the number of signals?  For each of those, how big is
"large scale"?

> 3.  A well-known state machine.

As a requirement, this is a little obscure and needs some more explanation.

> 4.  Use of TCP or SCTP (because soft-state protocols will not scale).

There's not nearly enough information provided to support this statement. It
might be true, but it's impossible to tell from what's currently in the
document. The claim just sort of parachutes into the document at this point.

> 5.  Well-known error codes to diagnose protocols between different
> networks.

Hard to tell what this means and whether a given design is achieving it or
not.

> 6.  Capability negotiation.

What sort of capabilities?  This seems to be coming out of the blue again.

> 7.  Ability to represent policy.

What sort of policy?  Again, nothing previous has prepared us for this
requirement.