Science vs. bullshit

Lightning talk follow-up, because I want to make sure there was no miscommunication. A two-sentence comment at the mic while 400+ of your not-so-close friends are watching does not a rational discussion make.

The talk in question:

    < >

The disagreement is whether Renesys can reliably find out how many transit providers an AS has. Remember, we are discussing transit providers here, not peers.

My point is that if an AS has _transit_, then it must be visible in the global table (assuming a reasonably large set of vantage points), or it would not be transit. Of course, this is not perfect, but it is a pretty close approximation for fitting curves over tens of thousands of ASes. So cases like "I have two transit providers, and one buys transit from the other" are few and not relevant to fitting curves. (It also means you are an idiot, or in a corner of the Internet where you should probably be considered as having only one provider.)

Majdi has pointed out other corner cases where transit is not visible through systems like Renesys. For instance, announcing prefixes to Provider 2 with a community that local-prefs the announcement below peer routes. That means only one transit provider is visible in BGP data.

There were several reasons some of us did not think edge cases like this were important. For instance, Renesys keeps -every- update ever, so if Provider 1 ever flaps, Renesys will see Provider 2. Also, when counting providers, a "backup path" may not be relevant, since no packets take that path.

More importantly, I thought the point of the talk was to show that the table was growing during the recession and people were still getting more providers. The result is a curve, not a hard-and-fast number. Corner cases like the one above are barely noise, so the curve is still valid.

It is true that finding peering edges with things like route-views is problematic at best, so finding ASes with one transit provider plus peering might be problematic. But since I do not think that was the point of the talk, I do not consider that a problem.

If anyone who still thinks the problems with finding transit edges somehow make the talk 'bullshit' could clarify their position, I would be grateful.

Strictly speaking, with the subject of "Science vs bullshit", you and msa
have named a hypothesis, no? Can either of you think of a way to disprove
that, and if so, where's your data? :-)


Randy's right that it can be somewhat difficult to agree on a single
methodology for generating accurate assessments of how many transit
providers a particular network uses at a particular moment in time.
There are at least two knobs to turn: how long you integrate updates (we
like to use at least 24 hours of continuous time in order to flush out
backup routes, but it's sensible to look for weeks or longer to get the real
rarities to show their heads), and how much peer diversity you require in
order to call a provider relationship 'globally visible transit' for a given
prefix (I used 50+ peers as a rough rule of thumb, but you can pick
lower/higher numbers and get arguably meaningful answers). It's like
asking, "how big is the global routing table .. REALLY?" Depends on how
you count.
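
To make the two knobs concrete, here is a minimal sketch of the counting
step. It is not the Renesys method. It assumes the observations have
already been gathered over whatever integration window you chose, treats
the AS next to the origin as the candidate provider, and applies a
peer-diversity threshold. The function names, the tuple layout, and the
ASNs in the example are all hypothetical.

```python
from collections import defaultdict


def collapse_prepends(as_path):
    """Drop consecutive duplicate ASNs, i.e. AS-path prepending."""
    out = []
    for asn in as_path:
        if not out or out[-1] != asn:
            out.append(asn)
    return out


def visible_upstreams(observations, min_peers=50):
    """Return {prefix: set of upstream ASNs seen by >= min_peers peers}.

    observations: iterable of (peer_id, prefix, as_path), where as_path is
    a list of ASNs ending at the origin. The caller picks the integration
    window by choosing which updates to pass in (assumed, not enforced).
    """
    seen_by = defaultdict(set)  # (prefix, upstream) -> distinct peer ids
    for peer_id, prefix, as_path in observations:
        path = collapse_prepends(as_path)
        if len(path) < 2:
            continue  # path is only the origin; no upstream to record
        upstream = path[-2]  # AS adjacent to the origin
        seen_by[(prefix, upstream)].add(peer_id)

    result = defaultdict(set)
    for (prefix, upstream), peers in seen_by.items():
        if len(peers) >= min_peers:  # the peer-diversity knob
            result[prefix].add(upstream)
    return dict(result)
```

Turning min_peers down lets rarely seen adjacencies (backup paths, and
possibly peers rather than transit) count as providers; turning it up
drops them. That is the "depends on how you count" part.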

The thing about the data I presented, however, is that it is _differential_
... it says "set your knobs, look at four days over four years, and let's
see if the migration among populations seems consistent." In fact, the
recurrence is pretty stable -- the same percentage of people in "diversity
class X" tend to end up in "diversity class Y" twelve months later, over
multiple years, with small changes that we can identify as trends. This
gives confidence that the knobs are set in such a way that they are
achieving some meaningful classification of the prefix population.
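
As a rough illustration of that differential comparison, the sketch below
computes, for two snapshots twelve months apart, what fraction of the
prefixes in each class ended up in each other class. It assumes a
"diversity class" is simply a label per prefix (for example, the number
of visible providers); the talk's actual class definitions are not given
here, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def transition_fractions(before, after):
    """Return {class_from: {class_to: fraction}} between two snapshots.

    before, after: dicts mapping prefix -> class label. Only prefixes
    present in both snapshots are counted.
    """
    counts = defaultdict(Counter)
    for prefix, cls_from in before.items():
        if prefix in after:
            counts[cls_from][after[prefix]] += 1

    return {
        cls_from: {cls_to: n / sum(row.values()) for cls_to, n in row.items()}
        for cls_from, row in counts.items()
    }
```

If these fractions look about the same for each consecutive pair of
years, that is the stability being described; a slow drift in one cell
is what would show up as a trend.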

To Patrick's point, the shape of the curve tells us useful things, even if
the precise boundaries among diversity classes can be drawn in subtly
different ways.

And that's exactly why we look for techniques that can give information
about trends (for example, my point that some dual-homed ASNs appear to be
postponing their decision to attain higher degrees of multihoming) even in
the presence of some classification uncertainty at the single-prefix level.

I'm glad to have sparked so much excitement with a 10-minute talk. Imagine
if I had dragged it out to 30 minutes!

cheers, ---jim

> The thing about the data I presented, however, is that it is _differential_
> ... it says "set your knobs, look at four days over four years, and let's
> see if the migration among populations seems consistent."

as we discussed this morning, this has the problem of not knowing how
much of the change is in the lens through which you are looking and how
much is in that at which you are looking.

bgp is way too damned good at information hiding.