China prefix hijack

Just half an hour ago China Telecom hijacked one of our prefixes:

Your prefix: X.Y.Z.0/19:
Prefix Description: NETNAME
Update time: 2010-04-08 15:58 (UTC)
Detected by #peers: 1
Detected prefix: X.Y.Z.0/19
Announced by: AS23724 (CHINANET-IDC-BJ-AP IDC, China Telecommunications Corporation)
Upstream AS: AS4134 (CHINANET-BACKBONE No.31,Jin-rong Street)
ASpath: 39792 4134 23724 23724
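The check behind an alert like the one above can be sketched roughly as follows: compare the origin AS of each observed announcement against the origin(s) registered as legitimate for the prefix. This is an illustrative sketch only; the expected origin ASN 64496 below is a documentation placeholder, not the prefix holder's real ASN.

```python
# Hypothetical sketch of a BGPmon-style origin check. By convention the
# origin AS is the last element of the AS path as received.

EXPECTED_ORIGINS = {"X.Y.Z.0/19": {64496}}  # prefix -> legitimate origin ASNs (placeholder)

def origin_mismatch(prefix, as_path):
    """Return the unexpected origin ASN, or None if the origin is expected
    (or the prefix is not being monitored)."""
    origin = as_path[-1]
    expected = EXPECTED_ORIGINS.get(prefix)
    if expected is not None and origin not in expected:
        return origin
    return None

# The path from the alert: 39792 4134 23724 23724
print(origin_mismatch("X.Y.Z.0/19", [39792, 4134, 23724, 23724]))  # 23724
```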

Luckily the impact must have been limited, as only one BGPmon peer saw it. Has anyone else noticed it?

I think so, yeah.

AS 23724 is now announcing 63.218.188.0/22 which is historically announced
by ASes: 3491.
Time: Thu Apr 8 16:55:02 2010 GMT
Observed path: 812 174 4134 23724 23724

Sorry, I'm not seeing an announcement for X.Y.Z.0/19 here.

Hi Grzegorz,

.-- My secret spy satellite informs me that at 08/04/10 9:33 AM Grzegorz Janoszka wrote:

Just half an hour ago China Telecom hijacked one of our prefixes:

Your prefix: X.Y.Z.0/19:
Prefix Description: NETNAME
Update time: 2010-04-08 15:58 (UTC)
Detected by #peers: 1
Detected prefix: X.Y.Z.0/19
Announced by: AS23724 (CHINANET-IDC-BJ-AP IDC, China Telecommunications
Corporation)
Upstream AS: AS4134 (CHINANET-BACKBONE No.31,Jin-rong Street)
ASpath: 39792 4134 23724 23724

Luckily the impact must have been limited, as only one BGPmon peer saw it. Has anyone else noticed it?

Yes, many prefixes have been 'impacted' by this, including prefixes for websites such as dell.com and cnn.com.

The event has been detected globally by peers in Russia, the USA, Japan and Brazil.
However, not all individual prefix 'hijacks' were detected globally; many were seen by only one or two peers, in one or two countries, though some by more.

The common part in the ASpath is
4134 23724

Which are:
AS4134 CHINANET-BACKBONE No.31,Jin-rong Street
AS23724 CHINANET-IDC-BJ-AP IDC, China Telecommunications Corporation
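The common tail can be pulled out of the observed paths mechanically, for example like this (a rough sketch; the two paths are the ones quoted in this thread, and prepends are collapsed first so "4134 23724 23724" reduces to "4134 23724"):

```python
def collapse_prepends(path):
    """Drop consecutive duplicate ASNs (i.e., AS path prepending)."""
    out = []
    for asn in path:
        if not out or out[-1] != asn:
            out.append(asn)
    return out

def common_suffix(paths):
    """Longest common suffix across all (prepend-collapsed) paths."""
    rev = [list(reversed(collapse_prepends(p))) for p in paths]
    suffix = []
    for elems in zip(*rev):
        if all(e == elems[0] for e in elems):
            suffix.append(elems[0])
        else:
            break
    return list(reversed(suffix))

paths = [
    [39792, 4134, 23724, 23724],      # path from the first alert
    [812, 174, 4134, 23724, 23724],   # path from the 63.218.188.0/22 alert
]
print(common_suffix(paths))  # [4134, 23724]
```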

ASNs peering with AS4134 seem to have picked this up and propagated it to their customers.
Some of these ASNs include:
AS9002 RETN-AS ReTN.net Autonomous System
AS12956 TELEFONICA Telefonica Backbone Autonomous System
AS209 ASN-QWEST - Qwest Communications Company, LLC
AS3320 DTAG Deutsche Telekom AG
AS3356 LEVEL3 Level 3 Communications
AS7018 ATT-INTERNET4 - AT&T WorldNet Services

All RIS peers that detected this were behind (as transit or peer) one of those ASNs.

Most 'alerts' have now been cleared; they typically lasted a few minutes.

Cheers,
  Andree

Hi,

We received BGPmon notifications for all of our prefixes as well. Not sure if it's relevant, but this is also announced upstream from us by 3491. Example:

Just wondering if this was a "Fat fingered" mistake or intentional...

-J

If it was a mistake, I hope he fares a bit better than his counterparts
in other Chinese industries...

Hello,

Just a note of confirmation that 23724 originated as many as 31847
prefixes during an 18 minute window starting around 15:54 UTC.
They were prepending their own AS, and this is several orders of
magnitude more prefixes than they normally originate.
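The kind of check that flags such an event can be sketched very simply: compare the number of prefixes an AS currently originates against its normal baseline. The threshold factor and the baseline count below are assumptions for illustration, not Renesys's actual method or data.

```python
def origination_anomaly(asn, current_count, baseline_count, factor=10):
    """True if an AS originates `factor`x (or more) its usual prefix count."""
    return current_count >= factor * max(baseline_count, 1)

# Assuming AS23724 normally originates on the order of a few dozen
# prefixes, originating 31847 in one window is flagged immediately:
print(origination_anomaly(23724, 31847, 40))  # True
```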

-Martin

Hi, team.

Joe wrote:

Just wondering if this was a "Fat fingered" mistake or intentional...

I'm thinking "oops."

Looking only for prefixes with the aspath " 4134 23724 23724 <end>," and
only on 2010-04-08 UTC, we see 15210 prefixes announced.

Of those, 9598 are allocated to CN, 11017 are allocated by APNIC, and
4193 are neither (LACNIC, AFRINIC, RIPE, ARIN).

The prefixes are almost sequential as well, with a few gaps. The range
begins with a prefix in 8/8 and ends with a prefix in 222/8.
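The selection described above can be sketched like this (the update rows are invented two-line samples, not the real table dump): keep only announcements whose AS path ends " 4134 23724 23724", then tally by the country from RIR records.

```python
from collections import Counter

def matches(path):
    """Path ends with origin 23724, prepended once, via 4134."""
    return path[-3:] == [4134, 23724, 23724]

# (prefix, as_path, RIR country code) -- illustrative rows only
updates = [
    ("8.7.198.0/24",     [812, 174, 4134, 23724, 23724], "US"),
    ("63.218.188.0/22",  [812, 174, 4134, 23724],        "US"),  # no prepend: excluded
    ("222.73.0.0/16",    [39792, 4134, 23724, 23724],    "CN"),
]

hits = [(p, cc) for p, path, cc in updates if matches(path)]
by_country = Counter(cc for _, cc in hits)
print(len(hits), dict(by_country))  # 2 {'US': 1, 'CN': 1}
```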

There is a wide range of networks of every sort. Nations involved,
according to RIR records, include:

   Prefixes Country Code
   9444 CN
   2192 US
    758 AU
    448 CO
    186 RU
    139 ID
    131 TH
    122 JP
    103 KR
    101 EC
     96 BR
     91 IN
     84 AR
     [ ... ]

So I'm leaning towards "big oops." I'll see if we can find someone at
AS23724 to ask, and perhaps assist if needed.

Thanks,
Rob.

Interestingly, they re-originated these prefixes - as opposed to
simply leaking them, which means origin AS-based filters (e.g., as
provided by the current RPKI and SIDR work) would have prevented
this (however, origin AS-based filters would NOT have prevented the
i-root incident a couple weeks back). Most of the incidents we see
of this sort with a large number of prefixes are traditional leaks
with path preservation - so that does make one raise an eyebrow.
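A minimal sketch of origin validation in the spirit of the RPKI/SIDR work mentioned above: a ROA-like table maps prefixes to authorized origin ASNs, and a route is invalid when its origin is not authorized. The table entry is invented for illustration, and real validation also handles covering prefixes and max-length, which this sketch omits.

```python
ROAS = {"63.218.188.0/22": {3491}}  # prefix -> authorized origin ASNs (illustrative)

def validate_origin(prefix, as_path):
    """Return 'valid', 'invalid', or 'unknown' (no covering ROA).
    Simplified: exact-prefix match only."""
    authorized = ROAS.get(prefix)
    if authorized is None:
        return "unknown"
    return "valid" if as_path[-1] in authorized else "invalid"

# The re-origination by AS23724 fails the check...
print(validate_origin("63.218.188.0/22", [812, 174, 4134, 23724, 23724]))  # invalid
# ...but a path-preserving leak keeps the true origin and still passes,
# which is why origin validation alone would not catch that class of incident.
print(validate_origin("63.218.188.0/22", [812, 174, 4134, 3491]))  # valid
```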

Of course, even gross "max prefix" policies would have also helped
here to some extent, to at least limit the scope of this incident
to a much smaller number of prefixes.
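A rough sketch of such a "max prefix" guard: stop accepting (in practice, tear down the session) once a neighbor advertises more prefixes than a configured ceiling. The limit of 500 below is an arbitrary example value.

```python
def accept_updates(prefix_stream, max_prefixes):
    """Accept prefixes until the ceiling is crossed.
    Returns (accepted_prefixes, limit_tripped)."""
    accepted = []
    for prefix in prefix_stream:
        if len(accepted) >= max_prefixes:
            return accepted, True  # a real router would tear the session down here
        accepted.append(prefix)
    return accepted, False

# A normally-quiet session capped at 500 prefixes would have cut off the
# ~31k leaked routes almost immediately:
_, tripped = accept_updates(range(31847), 500)
print(tripped)  # True
```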

One might well observe that RFC 1998-esque policies that employ
LOCAL_PREF to prefer prefixes from customers over like prefixes
from peers mean that ALL ISPs employing such policies in that
transit service hierarchy will ignore AS path length when making
BGP best-path decisions (i.e., if a leaking Chinese provider were
a transit customer of a large U.S. provider and were given BGP
preference as a result, then all of that U.S. ISP's customers would
end up using the Chinese path rather than a path learned locally
in the U.S. from a peer). Perhaps it's time to rethink applying
such policies ubiquitously across peers and customers, or to at
least be more selective in their application.
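The effect of such a policy can be made concrete with a toy best-path comparison: BGP compares LOCAL_PREF before AS path length, so a customer-learned route (high LOCAL_PREF) wins even with a much longer path. The attribute values below are illustrative, and real BGP has many more tie-breakers than the two shown.

```python
def best_path(routes):
    """Pick the best route: highest LOCAL_PREF first, then shortest AS path.
    (Only these two steps of the decision process are modeled.)"""
    return max(routes, key=lambda r: (r["local_pref"], -len(r["as_path"])))

routes = [
    # short path learned from a settlement-free peer
    {"via": "peer",     "local_pref": 100, "as_path": [3491]},
    # long leaked path learned from a transit customer
    {"via": "customer", "local_pref": 200, "as_path": [4134, 23724, 23724]},
]
print(best_path(routes)["via"])  # customer
```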

Just one more incident to illustrate how fragile the routing system
is, and how broken the current "routing by rumor" model continues to
be.

-danny

I also see some of this from France.

Regarding this incident/error, even though tools like BGPmon, watchmy.net and
others did exactly what they should, I'm asking myself whether there are
other public tools which can help.

The CIDR Report lists Chinanet as the biggest announcer (though that may
have been the case previously as well):
97074688 Largest address span announced by an AS (/32s)
   AS4134: CHINANET-BACKBONE No.31,Jin-rong Street
on http://www.cidr-report.org/as2.0/
Same stats from http://www.ris.ripe.net/dashboard/4134
I'm not sure either of them is real-time.

There is also a "hole" in
http://www.cymru.com/BGP/bgp_prefixes.html

So, how has each of you assessed the impact of this on your network? How
could we check where the routes' propagation stopped?
Thanks to Renesys and Team Cymru for the stats on how many
prefixes/countries were affected.

I hope most Tier 1 operators have rules to filter overly large changes in
announcements, to avoid the YouTube/Pakistan Telecom effect or the i-root
incident mentioned previously.

thanks
Best regards,

  Jul

Hi Jul, list

.-- My secret spy satellite informs me that at 08/04/10 1:57 PM jul wrote:

So, how has each of you assessed the impact of this on your network? How
could we check where the routes' propagation stopped?
Thanks to Renesys and Team Cymru for the stats on how many
prefixes/countries were affected.

Some additional information such as a list of all prefixes affected, geographical impact & some more information regarding this incident can be found here:
http://bgpmon.net/blog/?p=282

Cheers,
  Andree

"Martin A. Brown" <mabrown@renesys.com> writes:

Just a note of confirmation that 23724 originated as many as 31847
prefixes during an 18 minute window starting around 15:54 UTC.
They were prepending their own AS, and this is several orders of
magnitude more prefixes than they normally originate.

a couple of times when CIX still existed, i fscked up badly and sent
a full deaggregated table to some peers. (the perils of running BGP3
and BGP4 on the same router, plus on-the-job-training for yours truly.)

it's also happened a half dozen times by fat fingers other than mine.

then there are the people who advertise 0/0, and all the people who accept
0/0. historians may say of the nanog@ m/l archives, "much hilarity ensued."

are we all freaking out especially much because this is coming from china
today, and we suppose there must be some kind of geopolitical intent
because china-vs-google's been in the news a lot lately?

i'm more inclined to blame the heavy solar wind this month and to assume
that chinanet's routers don't use ECC on the RAM containing their RIBs and
that chinanet's router jockeys are in quite a sweat about this bad publicity.

There's been a fair amount of speculation that at least some of these incidents may be related to censorship mechanisms, and a further tendency to conflate them, rather than looking more closely at the dynamics of each occurrence.

Paul Vixie <vixie@isc.org> writes:

i'm more inclined to blame the heavy solar wind this month and to assume
that chinanet's routers don't use ECC on the RAM containing their RIBs and
that chinanet's router jockeys are in quite a sweat about this bad publicity.
--
Paul Vixie
KI6YSY

That is likely to be an increasing problem in upcoming months/years.
Solar cycle 24 started in August '09; we're ramping up on the way out
of a more serious than usual sunspot minimum.

We've seen great increases in CPU and memory speeds as well as disk
densities since the last maximum (March 2000). Speccing ECC memory is
a reasonable start, but this sort of thing has been a problem in the
past (anyone remember the Sun UltraSPARC CPUs that had problems last
time around?) and will no doubt bite us again.

Rob Seastrom, AI4UC

That is likely to be an increasing problem in upcoming months/years.
Solar cycle 24 started in August '09; we're ramping up on the way out
of a more serious than usual sunspot minimum.

I wonder what kind of buildings are less susceptible to these kinds
of problems. And is there a good way to test your data centre
to come up with some kind of vulnerability rating?

Would a Faraday cage be sufficient to protect against cosmic-ray bit-flipping,
and how could you retrofit a Faraday cage onto a rack or two of gear?

--Michael Dillon

Scientists build neutrino detectors in mines 8,000 feet underground because
that much rock provides *partial* shielding against cosmic rays causing
spurious detection events.

Fortunately, the sun emits almost no cosmic rays.

It does however spew a lot of less energetic particles that will cause
single-bit upsets in electronic gear. Time to double-check that all your
gear has ECC ram - the problem with the UltraSparc CPUs last time was that
they had some cache chips built by IBM. IBM said "Use these chips in an
ECC config", but Sun didn't. The ions hit, and the resulting bit-flips
crashed the machines. Incidentally, Sun sued IBM over that, and the judge
basically said "Well, IBM *told* you not to do that up front. Suit dismissed".

One of the other big issues will be noise on satellite and microwave links
screwing your S/N ratio.

The one that scares me? Induced currents on long runs of copper. You get a
200-300 mile 765 kV transmission line, and a solar flare hits, the Earth's
magnetic field gets dented, so the field lines move relative to the stationary
copper cable, and suddenly you have several thousand extra amps popping out one
end of that cable. Ka-blam. The big danger there is that many substations are
not designed for that - so it would basically *permanently* destroy that
substation and they'd get to replace it. And of course, that's a several-weeks
repair even if it's the only one - and in that sort of case, there will be
*dozens* of step-down transformers blown up the same afternoon.

How long can you run on diesel? ;-)

I'm more inclined to believe that it would be a solar conjunction, actually. The scenario would be that they lost track of their bird and started tracking the sun. Since we all know that old Sol is an excellent originating point of radiated noise, surely with that much noise, and a solid lock on it, the odds of its random noise being something decipherable are much more acceptable than normal.

We've seen great increases in CPU and memory speeds as well as disk
densities since the last maximum (March 2000). Speccing ECC memory is
a reasonable start, but this sort of thing has been a problem in the
past (anyone remember the Sun UltraSPARC CPUs that had problems last
time around?) and will no doubt bite us again.

Sun's problem had an easy solution - and it's exactly the one you've
mentioned - ECC.

The issue with the UltraSPARC II's was that they had enough redundancy to
detect a problem (Parity), but not enough to correct the problem (ECC). They
also (initially) had a very abrupt handling of such errors - they would
basically panic and restart.

From the UltraSPARC IIIs they fixed this problem by sticking with parity in
the L1 cache (write-through, so if you get a parity error you can just dump
the cache and re-read from memory or a higher cache), but using ECC on the
L2 and higher (write-back) caches. The memory and all datapaths were
already protected with ECC in everything but the low-end systems.
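To make the parity-vs-ECC distinction concrete, here is a toy Hamming(7,4) sketch (not Sun's actual scheme): a parity bit can only say "something flipped", while the Hamming syndrome pinpoints, and therefore corrects, the flipped bit.

```python
def parity(bits):
    """Detects an odd number of flips; corrects none."""
    return sum(bits) % 2

def hamming74_encode(d):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]

def hamming74_correct(c):
    """Return the codeword with any single-bit error corrected."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the flipped bit
    if syndrome:
        c = c[:]
        c[syndrome - 1] ^= 1
    return c

word = hamming74_encode([1, 0, 1, 1])
hit = word[:]
hit[4] ^= 1                              # a single-event upset
print(parity(hit) != parity(word))       # True: parity notices something...
print(hamming74_correct(hit) == word)    # True: ...but ECC actually repairs it
```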

It does raise a very interesting question though - how many systems are you
running that don't use ECC _everywhere_? (CPU, memory and datapath)

Unlike many years ago, today parity memory is basically non-existent, which
means if you're not using ECC then you're probably suffering relatively
regular single-bit errors without knowing it. In network devices that's
less of an issue as you can normally rely on higher-level protocols to
detect/correct the errors, but if you're not using ECC in your servers then
you're asking for (silent) trouble...

  Scott.

The topic of sunspots is certainly familiar from long ago. We had a 7513
that crashed unexpectedly; upon a review of the available data, it was
determined that a parity error had occurred. I can't remember the exact
error, as it was several years ago, but upon a quick search this article
seems familiar:

http://www.ciscopress.info/en/US/products/hw/switches/ps700/products_tech_note09186a00801b42bf.shtml

Search on cosmic radiation and/or SEU within.

-Joe