Tracing where it started

David Diaz <techlist@smoton.net> writes:

With the rapid onset of an attack such as the one Saturday morning, models
I have show that not only would the spare capacity have been utilized
quickly, but that in a tiered (colored) customer system, the lower
service level customers (lead colored, silver, etc.) would have had

Do your models also take into account that people's capital
structure may not allow them the luxury of leaving multiple OC-X
ports wired up and sitting idle, waiting for a surge?

One thing I found somewhat interesting about the "dynamic" allocation
of resources type of infrastructure was the fact that my capacity
planning is on the order of weeks, while the exchanges assume
something on the order of minutes. I don't have enough capital sitting
around that I can afford to deploy and hook up a bunch of OC-x ports
to an exchange and then sit there waiting for them to maybe be used
sometime in the future.

So perhaps the thought of an optical exchange running out of resources
might be a bit of overkill at this stage?

/vijay

Actually, I think that was the point of the dynamic provisioning ability. The UNI 1.0 protocol, and ODSI before it, were meant to allow the routers to provision their own capacity. The real-world tests that were done actually worked, although I believe the results are still under NDA.

The point was to provision or reprovision capacity as needed. Without getting into the argument about whether this is a good idea, the point was to "pay" for what you used when you used it. The biggest technical factor was "how the heck do you bill it."

If a customer goes from their normal OC3 ---> OC12 for 4 hours, three times in a month... what do you bill them for? Do you take it down to the DS0/minute level and just multiply, or do you do a flat rate, or a per-upgrade charge?
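To make the DS0/minute option concrete, here is a back-of-the-envelope sketch of that burst; the $/DS0-minute rate is invented, and the channel counts are just the standard hierarchy (24 DS0s per DS1, 28 DS1s per DS3/STS-1):

# Rough sketch of billing the OC3 -> OC12 burst as DS0-minutes.
# The rate is made up; the channel counts are standard.
DS0_PER_OC3  = 3 * 28 * 24            # 2016 DS0s in an OC3
DS0_PER_OC12 = 12 * 28 * 24           # 8064 DS0s in an OC12

extra_ds0 = DS0_PER_OC12 - DS0_PER_OC3   # capacity added during the burst
burst_min = 4 * 60                       # each burst lasts 4 hours
bursts    = 3                            # three bursts in the month
rate_per_ds0_min = 0.0001                # hypothetical $/DS0-minute

bill = extra_ds0 * burst_min * bursts * rate_per_ds0_min
print(f"{extra_ds0 * burst_min * bursts:,} DS0-minutes -> ${bill:,.2f}")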

The point was you could bump up on the fly as needed, capacity willing, then down. The obvious factor is having enough spare capacity in the bucket. This should not be an issue within the 4 walls of a colo. If it's a beyond-the-4-walls play, then there should be spare capacity available that normally serves as redundancy in the mesh.

The other interesting factor is that now you have sort of a TDMA arrangement going on (a very loose analogy), in that your day can theoretically be divided into three time zones.

In the zone:
8am - 4pm ----- Business users, financial backbones, etc.
4pm - 12am ---- Home users: DSL, cable, peer-to-peer
12am - 8am ---- Remote backup services, foreign users, etc.

Some of the same capacity can be reused based on peer needs.
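Just to make the reuse idea concrete, a trivial sketch of that schedule as a lookup (the hours and user classes come from the list above; everything else is illustrative):

# The "TDMA-ish" reuse above as a lookup: the same shared capacity serves
# a different class of peer depending on the hour. Purely illustrative.
SCHEDULE = [
    (8, 16,  "business users, financial backbones"),
    (16, 24, "home users: DSL, cable, peer-to-peer"),
    (0, 8,   "remote backup services, foreign users"),
]

def who_gets_the_capacity(hour):
    for start, end, users in SCHEDULE:
        if start <= hour < end:
            return users

for h in (9, 20, 3):
    print(f"{h:02d}:00 -> {who_gets_the_capacity(h)}")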

This sort of addresses the "how do I design my backbone" argument, where engineers have to decide whether to build for peak load and provide max QoS but also the highest-cost backbone, or to build for average sustained utilization. This way you can theoretically get the best of both worlds, as long as the billing goes along with it.
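A toy cost comparison of the two build philosophies, with every number invented purely for illustration:

# Toy model: build-for-peak vs. build-for-average plus on-demand bursts.
# All figures are invented for illustration.
avg_gbps, peak_gbps   = 2.0, 9.5      # average vs. peak demand
cost_per_gbps_month   = 40.0          # permanently provisioned capacity
burst_cost_per_gbps_h = 0.12          # premium rate for on-demand capacity
burst_hours           = 3 * 30        # roughly three busy hours a day

build_for_peak = peak_gbps * cost_per_gbps_month
build_for_avg  = (avg_gbps * cost_per_gbps_month
                  + (peak_gbps - avg_gbps) * burst_cost_per_gbps_h * burst_hours)

print(f"build for peak        : ${build_for_peak:7.2f}/mo")
print(f"build for avg + burst : ${build_for_avg:7.2f}/mo")

Whether the second line actually comes out lower depends entirely on the burst rate, which is exactly the billing question above.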

You are right, this is a future play. But I thought it was interesting to ask, from the perspective of "what if all this technology were enabled today," what effect the mSQL worm would have had. Would some of these technologies have exacerbated the problems we saw? I am trying to get better feedback on the future issues; so far some of the offline comments and perspectives have been helpful and insightful, as well as yours...

Dave

David Diaz <techlist@smoton.net> writes:

was to "pay" for what you used when you used it. The biggest
technical factor was "how the heck do you bill it."

Actually, I'd think the biggest technical factor would be the trained
monkey that would sit at the switch and do OIR of line cards on the
router as appropriate and reroute patches.

If a customer goes from their normal OC3 ---> OC12 for 4 hours, three
times in a month... what do you bill them for? Do you take it down to
the DS0/minute level and just multiply, or do you do a flat rate, or a
per-upgrade charge?

Does this include the monkey cost as the monkey switches the ports
around? (Well, technically you can get software-switchable OC3/OC12
ports, but substitute 48/192 and go from there.)

The point was you could bump up on the fly as needed, capacity
willing, then down. The obvious factor is having enough spare
capacity in the bucket. This should not be an issue within the 4

And the monkey. I really don't have enough capital sitting around to
leave a spare port idle for the 4 hours a day I need it.

This sort of addresses the "how do I design my backbone" argument,
where engineers have to decide whether to build for peak load and
provide max QoS but also the highest-cost backbone, or to build for
average sustained utilization. This way you can theoretically get the
best of both worlds, as long as the billing goes along with it.

I don't plan to be buying service from anyone who is building to
average sustained utilization. My traffic tends to be bursty.

/vijay

David Diaz <techlist@smoton.net> writes:

was to "pay" for what you used when you used it. The biggest
technical factor was "how the heck do you bill it."

Actually, I'd think the biggest technical factor would be the trained
monkey that would sit at the switch and do OIR of line cards on the
router as appropriate and reroute patches.

If a customer goes from their normal OC3 ---> OC12 for 4 hours, three
times in a month... what do you bill them for? Do you take it down to
the DS0/minute level and just multiply, or do you do a flat rate, or a
per-upgrade charge?

Does this include the monkey cost as the monkey switches the ports
around? (Well, technically you can get software-switchable OC3/OC12
ports, but substitute 48/192 and go from there.)

No monkeys. I was referring to the protocols that people have been working on that automatically "reprovision" on the fly. The very simplistic view (and this can be within your own network) is:

Router A ---> Optical box/mesh ---> Router B

Router A determines it needs to upgrade from OC3 to OC12 and sends a request and AUTH password ---> Optical mesh ---> Router B acks, says "OK, I do have capacity and your AUTH password is verified" ---> OC12 ---> Optical ---> OC12 ---> Router B
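A very stripped-down sketch of that handshake logic; this is not the actual UNI 1.0 or ODSI message format, just the request/auth/capacity-check flow described above, with all names and numbers invented:

# Simplified model of the router-driven upgrade handshake above.
# Not the real UNI 1.0 messages, just the logic of the exchange.
MESH_SPARE_OC3_EQUIV = 16                 # spare capacity in the mesh, in OC3 units
VALID_AUTH = {"routerA": "s3cret"}        # hypothetical AUTH passwords

def request_upgrade(router, auth_pwd, current_oc, wanted_oc):
    """Router asks the optical mesh to regroom its circuit to a bigger pipe."""
    global MESH_SPARE_OC3_EQUIV
    if VALID_AUTH.get(router) != auth_pwd:
        return "NAK: auth failed"
    needed = (wanted_oc - current_oc) // 3          # extra OC3-equivalents
    if needed > MESH_SPARE_OC3_EQUIV:
        return "NAK: no spare capacity in the mesh"
    MESH_SPARE_OC3_EQUIV -= needed
    return f"ACK: circuit regroomed OC{current_oc} -> OC{wanted_oc}"

print(request_upgrade("routerA", "s3cret", 3, 12))   # ACK
print(request_upgrade("routerA", "wrong", 3, 12))    # NAK: auth failed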

Actually there are different ways to do this; it goes beyond what I was asking here, but I would be happy to expand on it. You can actually, on day one, have an OC48 handing off 1310nm to the optical switch. The switch could then provision OC3s and OC12s off that. The switches I'm speaking of do virtual concatenation, so they can slice and dice the pipe. Nothing says you have to use the whole thing on day one.

Actually, that was sort of the point for a lot of the people who were interested. They could have an OC48 and run 2 x OC12s off of it going to two different locations/peers. If peers number 3 and 4 show up at the mesh/box, then it's a simple point-and-click to provision them as soon as they are hot.
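The capacity bookkeeping behind that "slice and dice" is simple; here is a sketch of it (no real virtual-concatenation signalling, just the accounting):

# One OC48 handoff, sub-circuits provisioned out of it as peers show up.
class OpticalHandoff:
    def __init__(self, size_oc=48):
        self.free_oc = size_oc
        self.circuits = {}                        # peer -> circuit size

    def provision(self, peer, size_oc):
        if size_oc > self.free_oc:
            raise ValueError(f"only OC{self.free_oc} left on this handoff")
        self.free_oc -= size_oc
        self.circuits[peer] = size_oc
        return f"{peer}: OC{size_oc} lit, OC{self.free_oc} still dark"

port = OpticalHandoff(48)
print(port.provision("peer1", 12))
print(port.provision("peer2", 12))
print(port.provision("peer3", 3))    # point and click when peer 3 shows up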

Sidenote: as far as monkeys go, you don't need a monkey, since the protocol is theoretically doing it on the fly from layer 3 down to layer 1. Not to mention that CNM (customer network management) exists, which allows customers to actually have READ and WRITE privileges on their "owned" circuits, so your own monkeys could do it with point and click. The neat thing about using this as a wholesale carrier is the ability to take an OC192, sell an OC48 to a customer, have that customer sell an OC12 off of that, and so on. Everyone would have their own password that allows them to view their circuit and those "below" them, but not those above. It's off topic but interesting.
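That visibility rule ("your circuit and those below you, not those above") is just a walk up a parent chain. A small sketch, with the circuit names and data model invented:

# Each party sees its own circuit plus everything carved out beneath it.
parents = {                      # circuit -> the circuit it was carved from
    "OC192-trunk": None,
    "OC48-custA": "OC192-trunk",
    "OC12-custA-reseller": "OC48-custA",
}

def visible_to(owner_circuit):
    """Circuits a holder of owner_circuit may view/manage via CNM."""
    out = []
    for circuit in parents:
        c = circuit
        while c is not None:           # walk upward toward the trunk
            if c == owner_circuit:
                out.append(circuit)    # owner is this circuit or an ancestor
                break
            c = parents[c]
    return out

print(visible_to("OC48-custA"))            # itself plus the OC12 sold off it
print(visible_to("OC12-custA-reseller"))   # only its own circuit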

My posted comment was about whether this kind of layer 3 to layer 1 integration/communication would have exacerbated the mSQL worm, since it might have had more ability to grab larger peering pipes.

One last thought, on the "leaving spare capacity" comment. If you mean that OC48 ports on your router are much more expensive than OC12 ports, and that therefore, with 20 peers, buying 20 x OC48 ports when you usually run an average of an OC12 to each is cost prohibitive, I can understand that. 1) How do you do it today with those peers, since you don't like the average sustained model? 2) What if you had 20 x OC12 ports but one spare OC48 port that would dynamically make layer 1 connections to whichever peer needed that capacity at that moment? (Forgetting the BGP config issue on the layer 3 side for the moment.) Would this be an improvement? Basically a hot-spare OC48 that could replace any of the OC12s on the fly.
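A sketch of that hot-spare idea, ignoring the BGP side as noted; the peer names and demand figures are invented:

# One spare OC48 port, cross-connected at layer 1 to whichever of the
# 20 OC12 peers is currently bursting past its pipe.
spare_assigned_to = None

def demand_spike(peer, demand_in_oc12s):
    """Patch the spare OC48 toward a peer whose demand exceeds its OC12."""
    global spare_assigned_to
    if demand_in_oc12s <= 1:
        return f"{peer}: fits in its OC12, no action"
    if spare_assigned_to not in (None, peer):
        return f"{peer}: spare OC48 busy (patched to {spare_assigned_to})"
    spare_assigned_to = peer
    return f"{peer}: spare OC48 cross-connected"

print(demand_spike("peer7", 3))   # gets the spare
print(demand_spike("peer2", 2))   # blocked until the spare is released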

dave

Well, the problem with optical bandwidth on demand is that you will have to pay for the network even when it isn't being used. Basically you have three billing principles: pay per usage, pay for the service, or a mix of the two. With all of these models you still need to distribute the cost over the bandwidth, and in the worst case this will end up being higher per unit of transferred data.
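In numbers (all of them invented): the fixed cost of keeping the capacity lit gets spread over whatever traffic actually moves, so the emptier the on-demand network runs, the more each transferred gigabyte costs:

# Fixed network cost spread over actual traffic at different utilizations.
fixed_cost_month = 100_000.0              # cost of keeping the capacity lit
for utilization in (0.8, 0.4, 0.1):
    gb_moved = 1_000_000 * utilization    # GB actually transferred that month
    print(f"{utilization:4.0%} utilized -> ${fixed_cost_month / gb_moved:.3f}/GB")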

- kurtis -

Well, the feedback on-list, and the extensive feedback off-list, was great. The respondents seem to feel that, because of the rapid onset of the attack, a dynamically allocated optical exchange might have exacerbated the problem. But that is also the benefit: it allows flexible bandwidth with a nonblocking backplane, so backbones with a critical event such as a webcast have the capacity they need when they need it. A common shared-backplane architecture might provide a natural bottleneck; one can also see that as a possible growth problem the rest of the time.

Respondents strayed away from the specific subject of the dynamics of the optical exchange under an mSQL-type attack and went into the pros and cons. The number one topic: billing.

Billing was also the biggest challenge in implementing the technology. Once the ability was there, and the real-world tests showed the technology was actually functional, no one was exactly sure of the business algorithm to charge by. Most commentators were concerned about losing billing control: that a peer (possibly under attack) might actually cause fees to be assessed to your own backbone. It must be understood that your network has to give approval for this to happen. And if you have CNM (customer network management) enabled, and even running on a screen in your NOC, you are aware immediately when it happens. Without that, you have each peer locked down to whatever size pipe you have chosen.

On the billing, it might be a flat rate with the ability to "burst" to a higher capacity. Perhaps that would be a flat charge, or would allow you to burst for a certain number of hours, etc. No one has a clear picture. The simplest answer is probably to do, as was mentioned, something similar to IP billing: bill to the 95th percentile. It seems fair. Use a multiplier of DS0s per hour x $ and go with that. You might even lock it down so that at a certain dollar figure, no more bursting is allowed. I do not like tying billing to network control that way, but it would seem that CFOs would demand some kind of ceiling limit.
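A sketch of what that scheme could look like: sample the circuit hourly, bill the 95th percentile at a per-DS0-hour rate, and use a dollar ceiling as a stand-in for the "no more bursting past this figure" rule. The rate, ceiling, and usage pattern are all invented:

# 95th-percentile billing on hourly DS0 samples, with a dollar ceiling.
hours_in_month = 30 * 24
samples_ds0 = [2016] * (hours_in_month - 12) + [8064] * 12   # 3 bursts x 4 hrs

def percentile_95(values):
    ordered = sorted(values)
    return ordered[int(0.95 * len(ordered)) - 1]   # drop the top 5% of samples

rate_per_ds0_hour = 0.002        # made-up $
ceiling = 15_000.0               # CFO-imposed cap

billable = percentile_95(samples_ds0)   # the short bursts fall outside it
bill = min(billable * hours_in_month * rate_per_ds0_hour, ceiling)
print(f"95th percentile = {billable} DS0s -> ${bill:,.2f}")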

As far as oscillation between protection schemes in different layers goes, this has been a problem with things like IP-over-ATM networks. It should not be a problem here, and there has been a lot of testing. It is true the possibility for thrashing is there, but probably not at sub-50ms layers; we have that now over SONET private peering circuits. Even in a metro-wide optical exchange scheme, with the two farthest points on the mesh being ~100 miles apart, reroute time was 16ms. Those are real-world numbers from when we were testing the network and breaking routes.
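The usual way to keep the layers from fighting is a hold-off: the higher layer waits longer than the lower layer needs to restore, so only one of them reacts to a given failure. A toy illustration; the 16ms and 50ms figures are from above, the layer-3 hold-off value is invented:

# Which layer reacts to an outage of a given length.
OPTICAL_RESTORE_MS = 16     # measured mesh reroute time quoted above
SONET_PROTECT_MS   = 50     # classic SONET protection budget
IP_HOLDOFF_MS      = 150    # hypothetical layer-3 hold-off before rerouting

def who_reacts(outage_ms):
    if outage_ms <= max(OPTICAL_RESTORE_MS, SONET_PROTECT_MS):
        return "optical/SONET layer restores; IP never notices"
    if outage_ms <= IP_HOLDOFF_MS:
        return "outage ends inside the IP hold-off; no layer-3 reroute"
    return "IP reroutes (lower-layer restoration failed or was too slow)"

for outage in (16, 60, 400):
    print(f"{outage:3d} ms outage: {who_reacts(outage)}")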

There were some discussions of rule sets, but no conclusions. Filters should probably be left to the backbones, with very little control at the optical layer (IX). The only rule sets might relate to service levels or billing.

David