Someone Please Help Me Understand

Ok, I'm trying to learn, so bear with me.

We are an ISP in Indianapolis with full routes from 3 different
providers, HE.Net in Columbus, OH being one. We also peer at 2
peering exchanges, including EquinixIX in Chicago. The problem is that
Instagram and Facebook (same company, I know) seem very slow for our
customers.

This is where I need a way to troubleshoot/understand more. I ran a
traceroute to the IP that is serving the pictures. It resolves to
the FBCDN servers in Dallas, and the trace shows packet loss once it
hits Dallas, with round-trip times around 100 ms.

Tracing route to instagram-p3-shv-01-dfw1.fbcdn.net [31.13.66.52]
over a maximum of 30 hops:

  1    4 ms    3 ms    4 ms  10.7.0.1
  2   20 ms   43 ms   42 ms  inmtvlobs-rtr-01.dynamic.pdsconnect.me [192.69.57.1]
  3   25 ms   47 ms   29 ms  inmtvlmwt-rtr-01.infrastructure.pdsconnect.me [192.69.48.162]
  4   46 ms   32 ms   58 ms  inindyhen-core1.infrastructure.pdsconnect.me [192.69.48.193]
  5   36 ms   53 ms   51 ms  ge2-4.core1.cmh1.he.net [184.105.32.1]
  6   47 ms   41 ms   75 ms  10ge1-2.core1.chi1.he.net [184.105.222.165]
  7   57 ms   57 ms   53 ms  100ge14-1.core2.chi1.he.net [184.105.81.97]
  8   57 ms   73 ms   84 ms  100ge12-1.core1.mci3.he.net [184.105.81.209]
  9   75 ms   73 ms  102 ms  10ge15-6.core1.dal1.he.net [184.105.222.10]
 10   93 ms  103 ms   92 ms  eqix-da1.facebook.com [206.223.118.176]
 11  102 ms  101 ms    *     psw01c.dfw1.tfbnw.net [173.252.65.196]
 12   92 ms   97 ms  105 ms  msw1aq.01.dfw1.tfbnw.net [204.15.21.89]
 13  110 ms    *      98 ms  instagram-p3-shv-01-dfw1.fbcdn.net [31.13.66.52]
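
A trace like this can be summarized per hop with a short script. The following is a minimal Python sketch, assuming Windows tracert-style output with each hop on a single line. The function name `summarize` and the regular expression are illustrative assumptions, not part of any existing tool; the sample hops are copied from the trace above.

```python
import re

# A few hops copied from the trace above, one hop per line.
SAMPLE = """\
  9 75 ms 73 ms 102 ms 10ge15-6.core1.dal1.he.net [184.105.222.10]
 10 93 ms 103 ms 92 ms eqix-da1.facebook.com [206.223.118.176]
 11 102 ms 101 ms * psw01c.dfw1.tfbnw.net [173.252.65.196]
 13 110 ms * 98 ms instagram-p3-shv-01-dfw1.fbcdn.net [31.13.66.52]
"""

# Hop number, then exactly three probes ("<n> ms" or "*"), then the host text.
HOP_RE = re.compile(r"^\s*(\d+)\s+((?:(?:<?\d+\s+ms|\*)\s+){3})(.*)$")
PROBE_RE = re.compile(r"<?(\d+)\s+ms|(\*)")


def summarize(trace_text):
    """Return a list of (hop, host, avg_ms, lost_probes) tuples."""
    rows = []
    for line in trace_text.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue  # skip headers and blank lines
        probes = PROBE_RE.findall(match.group(2))
        times = [int(ms) for ms, star in probes if ms]
        lost = sum(1 for ms, star in probes if star)
        avg = sum(times) / len(times) if times else None
        rows.append((int(match.group(1)), match.group(3).strip(), avg, lost))
    return rows


for hop, host, avg, lost in summarize(SAMPLE):
    print(hop, host, avg, f"{lost}/3 lost")
```

Keep in mind that a lost probe at a single intermediate hop is often just that router rate-limiting ICMP replies; loss that persists through to the final hop is the more meaningful signal.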

Since I am peered with the route servers at EquinixIX Chicago, shouldn't
the data be coming from there, or at least hit their routers? My
trace shows HE to Chicago, then to Dallas. How does FB decide which
IP the content is served from, and is there anything I can do as a
provider? If it is DNS, I can obviously clear the cache to see if it
gets new IPs. If I'm not getting FB peering IPs in Chicago, do I need
to peer directly? Should I get Facebook involved?

Eric Rogers

PDS Connect

(317) 831-3000 x200

Are you using locally resolving DNS servers? I don't know how FB determines where your content comes from, but some CDNs test the performance to your resolving DNS server.

Yes, we have our own BIND 9 caching servers in different geographic locations, directly on fiber and using SSD drives. They seem very quick in GRC's DNS Benchmark tests.

Eric Rogers

What I meant by that (which wouldn't be the problem if you have them on-net) is that some CDNs run performance tests against your resolving DNS server to determine which node is the best one to serve you from. That applies to other CDNs; I don't know how FB determines it.

Are you seeing any routes to 32934 from your Chicago Equinix connection?
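
One way to answer that question from a routing-table export is to filter on the origin AS, which is the last ASN in the AS path, and group the matches by the session they were learned on. This is only a sketch: the dictionary layout, the session names, and the function name `routes_by_session` are assumptions for illustration, and the prefixes and transit ASNs are documentation values rather than real announcements.

```python
FACEBOOK_ASN = 32934  # the ASN asked about above


def routes_by_session(routes, origin_asn):
    """Group prefixes originated by origin_asn by the session they came from."""
    found = {}
    for route in routes:
        path = route["as_path"]
        # The origin AS is the right-most entry in the AS path.
        if path and path[-1] == origin_asn:
            found.setdefault(route["session"], []).append(route["prefix"])
    return found


# Hypothetical table entries (documentation prefixes and ASNs).
routes = [
    {"prefix": "192.0.2.0/24", "as_path": [64500, 32934], "session": "transit"},
    {"prefix": "198.51.100.0/24", "as_path": [64500, 64501], "session": "transit"},
    {"prefix": "203.0.113.0/24", "as_path": [32934], "session": "equinix-chicago"},
]

seen = routes_by_session(routes, FACEBOOK_ASN)
for session, prefixes in seen.items():
    print(session, prefixes)
```

If the exchange session never shows up in the result, traffic toward that ASN has no choice but to leave via transit.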

What the performance testing measures varies, but latency is a big factor. Again, that doesn't seem to be the problem in your case.
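
To make the latency idea concrete, here is a toy Python sketch of a CDN picking the node with the lowest median RTT to a resolver. This is not how Facebook or any particular CDN is known to work; the node names, the measurements, and the function `pick_node` are all made up for illustration.

```python
from statistics import median


def pick_node(rtt_samples_ms):
    """Return the node name with the lowest median RTT, or None if no data."""
    medians = {
        node: median(samples)
        for node, samples in rtt_samples_ms.items()
        if samples  # skip nodes with no successful probes
    }
    if not medians:
        return None
    return min(medians, key=medians.get)


# Made-up RTT measurements (ms) from three CDN nodes to one resolver.
samples = {"chicago": [12, 14, 13], "dallas": [38, 41, 37], "unreachable": []}
print(pick_node(samples))  # prints "chicago"
```

The point of the sketch is that the decision is keyed on the resolver, not on the end user, so a resolver that looks closer to Dallas can pull every customer behind it to Dallas.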

Facebook does not have an open peering policy.
https://www.facebook.com/peering/
https://www.peeringdb.com/asn/32934

So you need to get in touch with them to peer, assuming you
meet their requirement (50 Mbps of traffic exchanged between you and
them). The peering relationship has to be established before you can use it.

And even if you peer, there is no guarantee you get your "big" data from
the nearest site. You may get the HTML from the nearest site, but their
"logic" may decide that your images or videos will be served from some
other site because of how they decide this (a database of DNS servers,
etc.). When they generate the HTML for your customers, they would include
URLs that point to distant servers, at which point no amount of peering
will change that. You need to find out why the content provider's logic
doesn't feed your customers URLs for the nearest CDN node.
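
To see which servers a page actually points customers at, the image and video URLs in the HTML can be tallied by hostname. The sketch below uses only the Python standard library; the class name `AssetHosts`, the helper `asset_hosts`, the URL paths, and the `media.example.net` host are hypothetical, while the fbcdn.net hostname is the one from the trace earlier in the thread.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


class AssetHosts(HTMLParser):
    """Count the hostnames referenced by img, video, and source tags."""

    def __init__(self):
        super().__init__()
        self.hosts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag not in ("img", "video", "source"):
            return
        for name, value in attrs:
            if name == "src" and value:
                host = urlparse(value).hostname
                if host:  # relative URLs have no hostname
                    self.hosts[host] += 1


def asset_hosts(html):
    parser = AssetHosts()
    parser.feed(html)
    parser.close()
    return parser.hosts


# Hypothetical page fragment; only the fbcdn.net hostname is from the trace.
SAMPLE_HTML = """
<html><body>
  <img src="https://instagram-p3-shv-01-dfw1.fbcdn.net/hypothetical/a.jpg">
  <img src="https://instagram-p3-shv-01-dfw1.fbcdn.net/hypothetical/b.jpg">
  <video src="https://media.example.net/hypothetical/c.mp4"></video>
  <img src="/relative/logo.png">
</body></html>
"""

for host, count in asset_hosts(SAMPLE_HTML).most_common():
    print(count, host)
```

If the hostnames embedded in the page already name a distant site, the choice was made when the HTML was generated, before any routing decision on your side.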

Hi Eric,

With this type of connectivity you have to pay attention to Traffic Engineering...

And when I say, traffic engineering, I mean both ways.. how you are sending traffic to them
along with how they are sending traffic to you... (sometimes a bit more challenging to do).

I will give you two specific examples, just to illustrate the point...

We are located on the East Coast. We have IP transit to the Cogent network via one intermediary ASN.
We also have IP transit with GTT and Hibernia Networks.
We also have direct peering on multiple peering fabrics.

1st case...
We have our outbound traffic engineered to prefer direct routes, e.g. when sending traffic to Cogent, we send
it out via the intermediary ASN to Cogent.
However, when traffic comes back from Cogent, they see our prefixes via the intermediary ASN as well as via Hibernia Networks.
Since Hibernia Networks is a lower ASN, they prefer that route.
So, one can say, no big deal, except that Hibernia Networks connects to Cogent on the West Coast! So our return traffic goes
from the East Coast to the West Coast and then back to the East Coast.
So one can easily say... Houston, we have a problem!

2nd case...
We are peered with some networks at Telx TIE via one of our (intermediary) ASNs. So while we can send traffic over to that network via our ASN, that network sees our prefixes via our (intermediary) ASN as well as via Hibernia. Hibernia being a lower ASN, they send traffic back to us via Hibernia.

In both cases we use communities to take corrective action....

Moral of the story: just because you have multiple peers, and peer with folks on the peering fabric, the default configuration of BGP will not AUTOMAGICALLY optimize the paths in your favor.

And thus the condition you describe will be the result...

Faisal Imtiaz
Snappy Internet & Telecom
7266 SW 48 Street
Miami, FL 33155
Tel: 305 663 5518 x 232

Help-desk: (305)663-5518 Option 2 or Email: Support@Snappytelecom.net

Thanks Faisal,

I appreciate the time you took and the detail you provided. I did try prepending on our HE connection, thinking it was an issue via HE. We started going out Level3, and that also went to Dallas with nearly the same packet loss. I don't know what the return path is/was, but through another provider it also showed major packet loss. That leads me to believe that FB is/was having issues in Dallas. Maybe on their peering port?

I have since found out they don't peer through the route servers, but only directly through the exchanges (direct peering relationship). I have submitted a peering request to FB and also submitted a request to their NOC to look at the packet loss and why we are getting Dallas IPs. I have not received a response to either.

I can use the community strings to manipulate how we announce our routes, but won't DNS ultimately tell the browser which IP to get the data from?

I am not trying to publicly shame or air dirty laundry, I am just trying to understand the situation more. CDNs bring a whole new level I have yet to comprehend with multicast DNS and GeoIP responses...

Eric Rogers

Eric,
There is no simple, cut-and-dried way of troubleshooting such a situation, other than looking at the problem in multiple different ways.
It also helps to be able to compare test results with another nearby network.

It is also not uncommon to have to shut down a peer rather than prepend when troubleshooting.
One has to keep in mind that many IP transit networks set local pref on customer routes, which nullifies (ignores) AS prepends.
Each provider is different. HE does not have a published set of communities, and thus effectively does not allow its customers to do any significant traffic
engineering (anyone from HE, if I am wrong, please feel free to correct me).
Level3 by default overrides any AS prepends with local pref, but does allow its customers to use communities to override those settings.
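
Why local pref nullifies prepends can be shown with a deliberately simplified model of BGP best-path selection: highest local pref wins first, and AS-path length only breaks ties. Real BGP has more tie-breakers after these two (origin, MED, and so on). The class `Path`, the function `best_path`, the local-pref values, and the ASNs below are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    learned_from: str
    local_pref: int
    as_path: tuple


def best_path(paths):
    """Simplified selection: highest local pref, then shortest AS path."""
    return min(paths, key=lambda p: (-p.local_pref, len(p.as_path)))


CUSTOMER_AS = 64496  # hypothetical customer ASN (documentation range)

# The customer prepends three times on its direct session with the provider,
# hoping the provider will prefer the shorter path it hears from a peer.
via_customer = Path("customer session", 200, (CUSTOMER_AS,) * 4)
via_peer = Path("peer session", 100, (64500, CUSTOMER_AS))

# Local pref is compared first, so the prepends are ignored here.
print(best_path([via_customer, via_peer]).learned_from)

# Only when local pref is equal do the prepends take effect.
flat = [Path("customer session", 100, (CUSTOMER_AS,) * 4), via_peer]
print(best_path(flat).learned_from)
```

This is why a community that lowers the provider's local pref, where the provider offers one, can work when prepending alone does not.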

> I am not trying to publicly shame or air dirty laundry, I am just trying to
> understand the situation more. CDNs bring a whole new level I have yet to
> comprehend with multicast DNS and GeoIP responses...

Understood, I have been there, so I can relate. NANOG is a great place to learn, even when asking dumb questions. Folks here have been very supportive in explaining. Every now and then one sees a sarcastic reply, but overall I cannot say anyone has ever treated me in a condescending manner.

My humble suggestion is that you start with the simple stuff first, i.e. BGP traffic engineering, before trying to wrap your head around multicast DNS and GeoIP responses. I often find the answer to complex issues in the simple stuff, which often gets overlooked!

:)

Faisal Imtiaz