What Worked - What Didn't

As the New York stock market re-opens, and some things are returning
to normal, I'd like to look at how well the Internet performed last
week.

At the Oakland NANOG I'd like to give a presentation about what worked
and what didn't work on the Internet during the last week. I would
like to gather what details I can from both small and large providers
in New York, the rest of the USA, and even overseas about what they
saw, what problems they experienced, and what things worked.

You can send me private mail if you wish, with or without attribution.
This is a personal effort, not associated with my employer.

Oakland NANOG is several weeks away, so I don't expect an immediate
response. I expect many ISPs will be conducting their own internal
reviews. But if you could, please consider responding. I'm looking
for input from small, medium and large providers. Thank you.

A few questions, all related to the time between Sept 11 and 17:

1. Briefly, who are you, and generally where were your operations
    located?

2. What worked?

3. What didn't work?

4. Did you activate your emergency response plan?

5. Were you required to do anything different operationally? Did you
    make preventive operational changes?

6. Were any infrastructure administration functions impaired, such
    as DNS registration, routing registry, address delegation?

7. Were you able to communicate NOC-to-NOC when needed?

8. Were any means of communication nonfunctional or impaired
    (direct-dial telephone, toll-free telephone, pager, e-mail, fax)
    when you attempted to communicate with other NOCs?

9. Did you ask for or receive a request for mutual aid from any other
    providers? Was it provided?

10. Within the limits of safety and rescue efforts, were you able to
    gain access to your physical facilities?

11. Did hoaxes or rumors impact your operations?

12. Do you have any recommendations for how Internet providers could
    have responded differently?


Sean;

   Multicasting worked. It handled a big traffic spike without a hiccup.

                                 Regards
                                 Marshall Eubanks

T.M. Eubanks
Multicast Technologies, Inc
10301 Democracy Lane, Suite 410
Fairfax, Virginia 22030
Phone : 703-293-9624 Fax : 703-293-9609
e-mail : tme@multicasttech.com
http://www.on-the-i.com

Test your network for multicast : http://www.multicasttech.com/mt/
Check the status of multicast in real time : http://www.multicasttech.com/

The big lessons seem to be these...

1) The Internet, as currently constituted, makes a lousy news propagation
method for large audiences. The one-to-many model in unicast IP puts too
large a load on the source (a back-of-envelope sketch follows point 3).
Good multicast (which we don't have yet) may fix this. Until that
happens, TV is still a better broadcast news medium. Mechanisms like
Akamai's Edgesuite are a pretty good solution until that occurs, as they
distribute the load pattern from a "one to many" to a "many to many"
model.

2) The Internet is superior to circuit switched services for one to one
communications during this sort of condition. Fast busies were the order of
the day in NYC and DC for the PSTN and cell phone networks. Instant
Messenger services, IRC and email were more reliable than the telephone
network by several orders of magnitude.

3) Since the transient from normal conditions was server-limited, there were
not any significant network congestion issues. The next time a major event
like this happens (and, of course, there will be a next time), news sites
may be better prepared, which could cause the next transient from normal
conditions to be network-limited.
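(A rough back-of-envelope version of the load argument in point 1, as a
small Python sketch. The bitrate and audience size are invented purely
for illustration; only the scaling matters.)

    # Source bandwidth needed to reach N viewers, unicast vs. multicast.
    # Both figures below are assumptions, not measurements.
    stream_kbps = 300            # assumed per-viewer stream bitrate
    viewers = 1_000_000          # assumed audience during a major event

    unicast_load_kbps = stream_kbps * viewers   # source sends one copy per viewer
    multicast_load_kbps = stream_kbps           # source sends one copy, replicated in-network

    print(f"unicast source load:   {unicast_load_kbps / 1e6:.0f} Gbps")
    print(f"multicast source load: {multicast_load_kbps} kbps")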

The big winners were cable TV, email, packet networks and IM applications.
The big losers were cell phones, circuit switching, PSTN, and
non-Akamaized news sites.

(My apologies if this post is perceived to be on-topic, operational, or has
anything to do with internetworking. We will now return to our regularly
scheduled, off-topic posts.)

- Daniel Golding
  Sockeye Networks

One comment on this: email-based news seemed to work VERY well - both very
focused news (such as operational material on nanog) and more general
news (I found CNN's "breaking news" email list to be very informative - in
fact, I first heard about the initial airliner crash via that list).

Miles

Daniel Golding wrote:

> 1) The Internet, as currently constituted, makes a lousy news propagation
> method for large audiences. The one-to-many model in unicast IP puts too
> large a load on the source. Good multicast (which we don't have yet) may
> fix this. Until that happens, TV is still a better broadcast news
> medium. Mechanisms like Akamai's Edgesuite are a pretty good solution until
> that occurs, as they distribute the load pattern from a "one to many" to a
> "many to many" model.

Akamai did not work well Tuesday morning, at least for me. I do not know whether their servers
were overloaded, or couldn't get content from the source, but they did NOT work
well as seen from here.

Washingtonpost.com, for example, loaded ONCE for me before about 3:00 PM EDT, and I
know that site is Akamaized.

                                                               Contrarily Yours
                                                               Marshall Eubanks


Washingtonpost.com kept alternating between Akamaized and not Akamaized in my experience; I'm guessing that it takes some time for content to replicate across Akamai servers, so in the meantime they put the new content up locally, and once it was on all the Akamai servers, changed their links to the Akamaized URL. For some reason, though, it seemed that _all_ the links changed from Akamaized to non-Akamaized and back, and not just the new ones. It made for a rather ... odd situation.

Vivien

hmm. I don't work for Akamai, so I can't presume to speak for them, but...

I specified Edgesuite, rather than simply akamizing the links. I think that
moving ALL content, rather than just some linked content, to distributed
servers makes a big difference.

- Dan

But it would depend on how far the Akamaization had been taken. Typical use (Freeflow) would be for all the graphics to sit on the Akamai surrogates - that still means that you have to pull the initial HTML "glue" from the (overloaded) origin server.

I guess the future will show whether fast-moving news sites will choose to use the full Edgesuite service (in the case of Akamai; let's not forget there are other CDNs out there), which would also deliver the initial HTML.

>Washingtonpost.com kept alternating between Akamaized and not Akamaized in
>my experience; I'm guessing that it takes some time for content to replicate
>across Akamai servers, so in the meantime they put the new content up
>locally, and once it was on all the Akamai servers changed their links to
>the Akamaized URL. For some reason though, it seemed that _all_ the links
>changed from Akamaized or not Akamaized and back and so on, and not just the
>new ones. It made for a rather ... odd situation.

The customer controls whether an image, site, stream, or anything else is "Akamaized". And content is not replicated to any Akamai server until an end user "mapped" to that server requests it.

So, when a customer changes from a standard URL to an Akamaized URL, there is no wait time for the data to be pushed to all servers. The very first user asking for that content will be mapped to the nearest Akamai server, which will then pull the data down and give it to the user, saving a copy on its hard drive. Subsequent users will get the data directly from the hard drive.

This is a strictly technical post on how Akamai works. Akamai has absolutely no control over whether a content provider uses Akamai's system to distribute all, some, or none of their content.
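(To make the pull-through behavior described above concrete, here is a
minimal Python sketch of a cache-on-demand edge server. It is
illustrative only: the class and URL are made up, and this is not
Akamai's code.)

    # Minimal sketch of a pull-through ("cache on demand") edge cache.
    # Nothing is pre-populated; an object is fetched from the origin the
    # first time some user asks for it, then served locally afterwards.
    import urllib.request

    class EdgeCache:
        def __init__(self):
            self.store = {}              # stands in for the server's disk

        def get(self, origin_url):
            if origin_url not in self.store:
                # First request: pull the object from the origin once.
                with urllib.request.urlopen(origin_url) as resp:
                    self.store[origin_url] = resp.read()
            # Later requests are served from the local copy; the origin
            # is not contacted again.
            return self.store[origin_url]

    # cache = EdgeCache()
    # body = cache.get("http://origin.example.com/page.html")  # hypothetical URL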

> The big winners were cable TV, email, packet networks and IM applications.
> The big losers were cell phones, circuit switching, PSTN, and
> non-Akamaized news sites.

no one went after the comms infrastructure. when they do, i suspect that
we will find the internet is extremely vulnerable. how many folk even
have md5 auth turned on their bgp peering sessions? what naivete!

randy
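(For anyone who hasn't looked at the mechanism Randy mentions: the TCP
MD5 signature option from RFC 2385 puts a per-segment digest in the TCP
options, computed over the pseudo-header, the TCP header with the
checksum zeroed and options excluded, the payload, and the shared key;
segments whose digest doesn't verify are silently dropped. A rough
Python sketch of the digest computation, simplified and not any vendor's
implementation:)

    # Sketch of the RFC 2385 TCP MD5 signature computation (IPv4 case).
    # Real routers do this per segment inside the TCP stack.
    import hashlib
    import socket
    import struct

    def tcp_md5_digest(src_ip, dst_ip, tcp_header_no_opts, payload, key):
        """Digest over: pseudo-header, TCP header (checksum zeroed, options
        excluded), segment data, then the shared key, per RFC 2385."""
        seg_len = len(tcp_header_no_opts) + len(payload)
        pseudo = struct.pack("!4s4sBBH",
                             socket.inet_aton(src_ip),
                             socket.inet_aton(dst_ip),
                             0, 6, seg_len)     # zero pad, protocol 6 (TCP), length
        return hashlib.md5(pseudo + tcp_header_no_opts + payload + key).digest()

    # A receiver sharing the key recomputes this and discards any segment
    # (including a forged RST) whose 16-byte digest does not match.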

>I specified Edgesuite, rather than simply akamizing the links. I think that
>moving ALL content, rather than just some linked content to distributed
>servers makes a big difference.

Again, a strictly technical post:

EdgeSuite does serve the entire page, and while it is possible that "moving ALL content" might take longer than just moving images, I (personally) believe that would perform better than Akamaizing only images during times of peak congestion.

EdgeSuite, much like FreeFlow, does not pre-populate servers. It requests content that has been requested of it. So when a user goes to an EdgeSuited site, they are sent to the nearest Akamai server. That Akamai server requests the HTML as well as individual objects, saves them to the hard drive, and serves them to the user. If no user requests a page, it will not be fetched.

So the first user may not experience a large performance increase, but they might; we have other behind-the-scenes tricks which sometimes help. Either way, they should not see a performance decrease. And all subsequent users should see a substantial performance increase.

The origin server sees at most one request per region of Akamai servers (usually fewer). With FreeFlow, the origin server has to serve HTML to *every* user, and only the large files (images, PDFs, other static content - whatever they tell us to deliver) are served by Akamai.
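(A toy comparison of the origin load under the two models just
described. The user and region counts are assumptions; only the shape
of the difference matters.)

    # Origin HTML load: FreeFlow-style (origin serves the HTML "glue" to
    # every user) vs. EdgeSuite-style (roughly one fetch per region of
    # edge servers).  Both figures are invented for illustration.
    users = 2_000_000        # assumed page views during the event
    regions = 1_000          # assumed number of edge-server regions

    freeflow_origin_requests = users                 # every user hits the origin for HTML
    edgesuite_origin_requests = min(users, regions)  # upper bound: one per region

    print("FreeFlow origin HTML requests: ", freeflow_origin_requests)
    print("EdgeSuite origin HTML requests:", edgesuite_origin_requests, "(upper bound)")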

>no one went after the comms infrastructure. when they do, i suspect that
>we will find the internet is extremely vulnerable. how many folk even
>have md5 auth turned on their bgp peering sessions? what naivete!

If someone can splice into my point-to-point OC system, fake being the router on the other end, and keep my peer from calling me and asking what happened, well, then I have MUCH bigger things to worry about than whether my BGP session is valid. (And he probably has the capability to do whatever he wants, no matter how hard I try to stop him.)

As for public peering points, the ARP resolution would cause problems, and either I or my peer would notice pretty darned quickly. But only a small percentage of the traffic on the 'Net goes over public peering points these days anyway.

Not sure where else anyone could use MD5 on their BGP. Maybe I missed something?

You *do* do ingress and egress filtering of your own addresses, and have checked
that your router does in fact use cryptographically challenging sequence
numbers, right?

And even if you don't, using MD5 is not *that* expensive (or shouldn't be),
and provides security in depth.

Unfortunately, I'll bet there's a LOT of routers that don't have filtering
in place, don't have good sequence numbers, and don't use MD5. Enough said...

>> If someone can splice into my point-to-point OC system, fake being the
>> router on the other end, and keep my peer from calling me and asking what
>
>You *do* do ingress and egress filtering of your own addresses, and have
>checked
>that your router does in fact use cryptographically challenging sequence
>numbers, right?

I do not do anything. I Am Not An ISP. :)

But when I did run a network, I did *NOT* ingress filter on my own address space. I ran networks with multi-homed clients. If I did not allow my own address space to be announced to me, I would not have been able to talk to my multi-homed downstreams if their link to me was down. When a link to your upstream is down and you cannot send mail to noc@ through your second upstream, you tend to get a new upstream pretty quick.

I *ABSOLUTELY* believe in filtering customer announcements into my backbone. Been a big proponent of it for many years. Search the archives.
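(A minimal sketch of what filtering customer announcements amounts to:
check each prefix a customer announces against the blocks you have on
file for them. The customer names and prefixes below are hypothetical.)

    # Accept a customer's BGP announcement only if the prefix falls inside
    # one of the blocks registered for that customer.  All data is made up.
    from ipaddress import ip_network

    registered = {
        "customer-a": [ip_network("192.0.2.0/24")],
        "customer-b": [ip_network("198.51.100.0/24"), ip_network("203.0.113.0/24")],
    }

    def accept_announcement(customer, prefix):
        net = ip_network(prefix)
        return any(net.subnet_of(block) for block in registered.get(customer, []))

    print(accept_announcement("customer-a", "192.0.2.0/25"))     # True: inside their block
    print(accept_announcement("customer-a", "198.51.100.0/24"))  # False: not theirs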

As for "cryptographically challenging sequence numbers", well, no, I have not inspected the code on any cisco or Juniper routers lately. Whatever sequence numbers they use are the sequence numbers they use, and I ain't gonna hack the code to change it.

>And even if you don't, using MD5 is not *that* expensive (or shouldn't be),
>and provides security in depth.

I do not *think* it would tax the CPU too much, but it has been at least 3 years since I have done it. IIRC, the CPU overhead was near nil.

And it only provides security for the BGP session, not "in depth". I am not saying that is a bad thing, just mentioning the limitation.

>Unfortunately, I'll bet there's a LOT of routers that don't have filtering
>in place, don't have good sequence numbers, and don't use MD5. Enough said...

Actually, I am still not certain why it was said at all. There are far, far more difficult hurdles to overcome when spoofing a BGP session between major carriers than the sequence numbers. And most people notice when a major peer goes down very, very quickly, MD5 or not.

In fact, I would wager that the misdirected traffic due to the added configuration complexity (yes, one line, but trust me, it can be a bitch if you forget the line, or forget the password) would far outweigh any savings you got from stopping attacks.

But no way to tell for certain, since this type of attack is practically unheard of. (Or perhaps that is a way to tell? :)

Yes, very. The #coverage channel on slashnet had folks watching/listening
to various conventional media, as well as monitoring international news
sites, and posting updates and links via moderators. A tremendous amount
of info came in that way, and usually scooped any individual media station.

I'd guess that setting up an IRC net for nanog-type operational traffic
would be very helpful. Equally helpful would be gatewaying that net
via packet radio on amateur frequencies. "Commercial" traffic is
prohibited, but in a disaster this kind of thing would be equivalent
to health-and-welfare traffic.

In fact, now that I recall, SANS was asking for amateur radio
operators to send in contact info in June or July. They were talking
about putting together a non-internet communications network to be
used in case of serious virus/DoS/etc. slams on the net. It doesn't
take a rocket scientist to see that they're thinking about InfoWar-type
scenarios. I don't know if the project was abandoned or if it got
complexified into something more formal and thus slowed down. We
never heard back from them.

Ham Radio Operators?
The threat to critical Internet resources from distributed denial of
service attack tools continues to increase. An effective emergency
communications network may be of great value if damage is done to both the
Internet and to phone systems. SANS is looking for ham and packet radio
operators who are willing to take a leadership role to help establish and
maintain an emergency communication channel. If you are qualified and
interested please send an email telling us about your ham radio and
computer security activities. Send it to info@sans.org with Emergency
Communications Network in the subject line.

It would be worth bringing back FidoNet or similar in parallel
with packet radio networks. A lot of packet radio is BBS-based,
and doesn't necessarily network between BBS's. I'm pretty new to
packet, so go check out some of the packet links on http://www.tapr.org/
(Tucson Amateur Packet Radio), one of the best sites on the net
for packet stuff. These folks have been real pioneers in it.

If folks are interested in discussing this (packet nanog for emergencies,
and/or irc comm net ditto) more, I'd be happy to host or set up a mailing
list for it.

SRC

PS- And whether it was officially sanctioned or not, hats off to whoever put
CNN's closed-caption feed onto IRC as well. Low-bandwidth news w/o the
talking heads.

All US long-distance telephony infrastructure can be effectively disabled
by a couple dozen or so backhoes digging in the right places. Even
competing carriers often share cables.

--vadim

Gee, the only major ISP that uses MD5 for peering links is Verio. That what
you were looking for, Randy? :)

Seriously, BGP session hijacking is the least of our worries. If you want to
hit internet infrastructure, the points of weakness are obvious and
physical. Car bombs at a dozen sites that we all know so well would be
enough to seriously degrade internet communications, particularly if they
were detonated near the fiber entrance facilities.

This underscores the concerns previously raised by some about major
internet carriers colocating their private peering in common facilities.
Looks a little riskier now, yes?

- Daniel Golding

> Gee, the only major ISP that uses MD5 for peering links is Verio.
i believe that statement to be false

randy

> Maybe I missed something?

Only all the well-documented attacks (including DoS).
Think about sending RSTs to the BGP port (and other random
ports) on your routers.

>> Maybe I missed something?
>
>Only all the well documented attacks (including DoS).
>Think about sending RST to BGP port (and other random
>ports) on your routers.

I was under the impression that MD5 would not stop an RST attack. Is that incorrect?

And if you filtered on source IP for all your downstreams, this would solve that problem. (Unless the attacker was a major carrier, in which case he may very well be in possession of your MD5 passphrase.)