
Re: Shim6 failure recovery after garbage collection



:: One of the more comprehensible objections to shim6 that was raised at NANOG
:: 35 was from large content providers who currently serve many thousands of
:: simultaneous clients through load balancers or other content-aggregation
:: devices (the kind of devices which switch connections to origin servers
:: without having to store any locally).
:: 
:: I don't remember the precise number of simultaneous sessions the devices were
:: intended to be capable of serving, but it was a lot.
:: 
:: The observation was that with the amount of (server, client) state being held
:: on those devices, adding what might be an average of (say) 2x128 bits + misc
:: overhead per session might present scaling difficulties.

A single WSM-6 Foundry SI450 can handle 15M sessions in the state machine. 
Assuming an overhead of, say, 320 bits per session * 15M sessions, we come 
up with approx 600MB of extra RAM added to those devices (and that's on 
the low side). Multiply that out across the *hundreds* of these devices a 
large content provider would have, and it's not a small cost (depending on 
the vendor, that memory is not general-purpose DRAM, and could be *very* 
expensive). Now, that is only extra memory to do nothing but hold other 
locators.
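
Back-of-envelope, assuming the 320 bits/session and 15M sessions above (the 
device count is just a hypothetical round number standing in for "hundreds"):

    # rough arithmetic for the SLB memory cost; inputs are this
    # thread's assumptions, not vendor specifications
    sessions_per_device = 15_000_000   # sessions in the state machine
    bits_per_session    = 320          # ~2x128-bit locators + misc overhead

    extra_bytes = sessions_per_device * bits_per_session // 8
    print(extra_bytes / 2**20, "MB extra per device")   # ~572MB, call it 600MB

    devices = 200                      # hypothetical "hundreds" of SLBs
    print(devices * extra_bytes / 2**30, "GB of extra RAM fleet-wide")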

On the web server side, it's not unheard of for a single webserver to 
handle 10-20k active, concurrent connections, with another 20k or so 
in various *_WAIT states. Adding an extra 40-byte per-session overhead 
per server is really not that bad (800KB of RAM/server), although I have 
no idea what that overhead does to the kernel queues...
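
The same sketch for the server side, assuming ~40 bytes of shim6 state per 
session (which is where the 800KB figure comes from):

    # per-server locator overhead; 40 bytes/session is an assumption
    # (2x128-bit locators plus a little bookkeeping)
    active_sessions   = 20_000   # active, concurrent connections
    bytes_per_session = 40

    print(active_sessions * bytes_per_session / 1024, "KB per server")  # ~800KB
    # another ~20k sessions in *_WAIT states would roughly double this
    # if they also have to carry shim6 state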

The thing is, those numbers only account for holding *locators*. When you 
start talking about holding onto other things (like, say, reachability 
state or performance (RTT) state), the memory utilization starts to climb 
further, although it is still manageable for the servers (but rapidly gets 
more and more expensive on the SLBs). When you then start talking about 
also holding some sort of TE state (because TE is a requirement), and you 
need to add the routing table into the equation, *now* it gets downright 
nasty. 10-20MB per server for shim6 overhead is minor, but add in 200+MB 
of routing state, and it's a non-starter.
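
Roughly where that lands per host, using the figures above (the 100k host 
count is the one used later in this message, and the fleet total is just 
that multiplied out):

    # why the routing/TE state dwarfs the shim6 locator overhead
    shim6_overhead_mb = 15        # ~10-20MB/server of locator + reachability/RTT state
    routing_state_mb  = 200       # 200+MB of routing/TE state per host
    hosts             = 100_000   # "100k+ hosts"

    per_host_mb = shim6_overhead_mb + routing_state_mb
    print(per_host_mb, "MB per host")
    print(per_host_mb * hosts / 1_000_000, "TB across the fleet")  # ~21.5TB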

Also, this whole conversation only covers memory overhead; what about 
other overhead? Would the server have to do any sort of failure 
detection, and how many cycles would that consume? Would the server have 
to do any sort of path optimization, and how many cycles would that 
consume? How do I get TE state to all of my 100k+ hosts? How many cycles 
would all of my hosts consume if one of my peers bounces and, instead of 
10-20 routers processing that change, all 100k of my hosts have to be 
updated with that information? Etc...
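
That last point is really just fan-out: the same routing event that a 
handful of border routers absorb today would have to be pushed to every 
host (a sketch, using the numbers above):

    # one peer bouncing: routers vs. hosts that must process the change
    routers_today = 20        # 10-20 routers handle the flap today
    shim6_hosts   = 100_000   # hosts needing updated TE/locator state
    print("update fan-out increase: roughly %dx" % (shim6_hosts // routers_today))  # ~5000x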

-igor