A BGP issue?

I run a small network on a mission base in the Amazon jungle which is fed by a satellite internet connection. We had an outage from Feb 25th to the 28th where we had no connectivity for email, http/s, or ftp. Skype would indicate it was connected, but even chatting failed. Basically everything stopped working except for ICMP. I could ping everywhere just fine. I started doing traceroutes and they were all very odd, most not reaching their destination and some hopping all over creation before dying. But if I did traceroute with ICMP it worked fine. Does this indicate our upstream (Bantel.net) had a BGP issue? Bantel blamed Hughesnet, which is the service they resell. I'm wondering what kind of problem would let ping work fine but not any of the other protocols. If the problem was my own provider, it also seems odd that I could traceroute via UDP part way to a destination before it failed. Thanks.

If this is the wrong forum for this post I'm sorry and please just hit delete. If this is the wrong forum but you'd be kind enough to share your expertise please reply off-list. Thanks!

Here are some examples of the traceroutes I saved during the outage.

Using UDP:

Gregs-MacBook-Pro:~ GregIhnen$ traceroute metaconi.com
traceroute to metaconi.com (70.32.39.205), 64 hops max, 52 byte packets
1 192.168.7.1 (192.168.7.1) 1541.165 ms 25.665 ms 39.211 ms
2 * * *
3 192.168.14.254 (192.168.14.254) 625.710 ms 860.264 ms 694.238 ms
4 192.168.180.5 (192.168.180.5) 645.666 ms 757.161 ms 664.785 ms
5 10.254.253.158 (10.254.253.158) 738.661 ms 801.487 ms 728.139 ms
6 fe11-0-5.miami1.mia.seabone.net (195.22.199.77) 726.884 ms 733.989 ms 647.736 ms
7 te3-4.miami7.mia.seabone.net (195.22.199.97) 740.233 ms 694.619 ms 685.464 ms
8 206.111.1.161.ptr.us.xo.net (206.111.1.161) 639.077 ms 741.495 ms 679.880 ms
9 te-4-1-0.rar3.miami-fl.us.xo.net (207.88.12.161) 650.312 ms 612.386 ms 660.452 ms
10 te-3-2-0.rar3.atlanta-ga.us.xo.net (207.88.12.5) 787.079 ms 725.495 ms 685.068 ms
11 te-11-0-0.rar3.washington-dc.us.xo.net (207.88.12.10) 760.002 ms 828.076 ms 702.041 ms
12 ae0d0.mcr2.chicago-il.us.xo.net (216.156.0.166) 719.324 ms 641.274 ms 689.997 ms
13 ae1d0.mcr1.chicago-il.us.xo.net (216.156.1.81) 669.613 ms 813.794 ms 737.211 ms
14 edge1.chi1.ubiquityservers.com (216.55.8.30) 729.875 ms 751.481 ms 730.088 ms
15 * * *
16 * * *
17 * * *
18 * * *
19 * * *
20 * * *
21 * * *
22 * * *
23 * * *
24 * * *
25 * * *
26 * * *
27 * *

Now here it is again doing traceroute via ICMP:

Gregs-MacBook-Pro:~ GregIhnen$ traceroute -I metaconi.com
traceroute to metaconi.com (70.32.39.205), 64 hops max, 72 byte packets
1 192.168.7.1 (192.168.7.1) 5.254 ms 3.059 ms 2.578 ms
2 * * *
3 192.168.14.254 (192.168.14.254) 1511.146 ms 711.304 ms 822.967 ms
4 192.168.180.5 (192.168.180.5) 712.672 ms 821.990 ms 713.009 ms
5 10.254.253.158 (10.254.253.158) 823.244 ms 711.764 ms 823.219 ms
6 fe11-0-5.miami1.mia.seabone.net (195.22.199.77) 712.640 ms 613.306 ms 614.429 ms
7 te3-4.miami7.mia.seabone.net (195.22.199.97) 823.232 ms 711.881 ms 823.166 ms
8 206.111.1.161.ptr.us.xo.net (206.111.1.161) 712.765 ms 822.398 ms 712.531 ms
9 te-4-1-0.rar3.miami-fl.us.xo.net (207.88.12.161) 822.809 ms 920.831 ms 712.399 ms
10 te-3-2-0.rar3.atlanta-ga.us.xo.net (207.88.12.5) 823.288 ms 711.478 ms 822.887 ms
11 te-11-0-0.rar3.washington-dc.us.xo.net (207.88.12.10) 712.705 ms 822.287 ms 712.713 ms
12 * ae0d0.mcr2.chicago-il.us.xo.net (216.156.0.166) 738.656 ms 919.752 ms
13 ae1d0.mcr1.chicago-il.us.xo.net (216.156.1.81) 921.381 ms 920.884 ms 1228.683 ms
14 edge1.chi1.ubiquityservers.com (216.55.8.30) 921.560 ms 920.482 ms 921.634 ms
15 relativity.mrk.com (70.32.39.205) 880.318 ms 753.150 ms 823.285 ms
Gregs-MacBook-Pro:~ GregIhnen$

Here's an example of a UDP traceroute going all over creation:

Gregs-MacBook-Pro:~ GregIhnen$ traceroute skype.com
traceroute to skype.com (78.141.177.7), 64 hops max, 52 byte packets
1 192.168.7.1 (192.168.7.1) 18.939 ms 4.596 ms 27.124 ms
2 * * *
3 192.168.14.254 (192.168.14.254) 724.034 ms 704.520 ms 823.886 ms
4 192.168.180.5 (192.168.180.5) 711.962 ms 704.606 ms 823.208 ms
5 10.254.253.158 (10.254.253.158) 712.622 ms 912.870 ms 921.471 ms
6 fe11-0-5.miami1.mia.seabone.net (195.22.199.77) 712.642 ms 822.307 ms 712.720 ms
7 * te9-1.ccr01.mia03.atlas.cogentco.com (154.54.11.37) 3692.277 ms 702.345 ms
8 te9-1.ccr01.mia03.atlas.cogentco.com (154.54.11.37) 823.172 ms 920.050 ms 921.612 ms
9 te8-2.ccr01.mia01.atlas.cogentco.com (154.54.28.245) 921.681 ms
    te8-7.ccr02.mia01.atlas.cogentco.com (154.54.1.185) 703.270 ms
    te8-2.ccr02.mia01.atlas.cogentco.com (154.54.2.153) 730.152 ms
10 te0-0-0-5.ccr21.atl01.atlas.cogentco.com (154.54.30.33) 797.769 ms
    te2-1.ccr02.atl01.atlas.cogentco.com (154.54.3.25) 913.513 ms
    te0-1-0-4.ccr21.atl01.atlas.cogentco.com (154.54.24.161) 782.095 ms
11 te0-4-0-7.ccr21.dca01.atlas.cogentco.com (154.54.42.189) 814.870 ms
    te0-5-0-7.ccr22.dca01.atlas.cogentco.com (154.54.42.201) 815.878 ms
    te0-2-0-3.ccr21.dca01.atlas.cogentco.com (154.54.24.9) 912.453 ms
12 te0-5-0-6.ccr22.jfk02.atlas.cogentco.com (154.54.42.30) 913.183 ms
    te0-2-0-2.ccr21.jfk02.atlas.cogentco.com (154.54.26.186) 913.078 ms
    te0-4-0-7.ccr22.jfk02.atlas.cogentco.com (154.54.41.14) 913.268 ms
13 te2-8.ccr02.lon02.atlas.cogentco.com (154.54.30.22) 833.515 ms
    te0-4-0-7.mpd22.jfk02.atlas.cogentco.com (154.54.41.30) 702.568 ms
    te0-4-0-7.ccr22.bos01.atlas.cogentco.com (154.54.44.50) 815.549 ms
14 te0-3-0-2.ccr21.bos01.atlas.cogentco.com (154.54.44.14) 1010.769 ms
    te0-1-0-5.ccr21.bos01.atlas.cogentco.com (154.54.44.30) 913.070 ms
    te4-8.ccr01.lon01.atlas.cogentco.com (130.117.0.186) 913.076 ms
15 te7-3.mpd02.lon01.atlas.cogentco.com (154.54.30.130) 913.495 ms
    te4-4.mpd02.lon01.atlas.cogentco.com (130.117.1.134) 831.442 ms
    te1-1.mpd02.lon01.atlas.cogentco.com (154.54.5.162) 811.198 ms
16 te0-0-0-0.mpd21.par01.atlas.cogentco.com (130.117.2.5) 913.200 ms 912.636 ms
    te1-4.ccr01.bru01.atlas.cogentco.com (130.117.51.106) 921.496 ms
17 te0-0-0-0.mpd21.par01.atlas.cogentco.com (130.117.2.5) 913.344 ms 920.902 ms
    te1-4.ccr01.bru01.atlas.cogentco.com (130.117.51.106) 921.368 ms
18 149.6.134.86 (149.6.134.86) 920.406 ms 1014.226 ms *
19 213.166.61.194 (213.166.61.194) 959.442 ms 828.583 ms 920.909 ms
20 213.135.247.42 (213.135.247.42) 1019.971 ms
    78.141.177.7 (78.141.177.7) 988.848 ms
    213.135.247.42 (213.135.247.42) 1011.307 ms
Gregs-MacBook-Pro:~ GregIhnen$

UDP traceroute worked to Google, though I still couldn't load their pages:

Gregs-MacBook-Pro:~ GregIhnen$ traceroute google.com
traceroute to google.com (74.125.229.114), 64 hops max, 52 byte packets
1 192.168.7.1 (192.168.7.1) 5.023 ms 3.971 ms 6.712 ms
2 * * *
3 192.168.14.254 (192.168.14.254) 1471.985 ms 643.770 ms 785.534 ms
4 192.168.180.5 (192.168.180.5) 712.715 ms 704.409 ms 626.374 ms
5 10.254.253.158 (10.254.253.158) 808.083 ms 647.244 ms 619.878 ms
6 fe11-0-5.miami1.mia.seabone.net (195.22.199.77) 776.534 ms 711.870 ms 640.372 ms
7 * te7-1.miami7.mia.seabone.net (195.22.199.109) 810.819 ms 731.713 ms
8 google.miami7.mia.seabone.net (89.221.41.18) 712.471 ms
    google.miami7.mia.seabone.net (89.221.41.74) 703.638 ms
    google.miami7.mia.seabone.net (89.221.41.18) 704.814 ms
9 209.85.253.116 (209.85.253.116) 822.336 ms 761.621 ms
    209.85.253.74 (209.85.253.74) 709.705 ms
10 209.85.254.180 (209.85.254.180) 705.062 ms 702.723 ms 824.497 ms
11 74.125.229.114 (74.125.229.114) 712.196 ms 704.509 ms 596.077 ms
Gregs-MacBook-Pro:~ GregIhnen$

Honestly, I would rate this as one of the most on-topic posts in a while.

BGP only handles reachability, not higher level protocols. (Of course, you can h4x0r anything to do just about anything, but we are talking the general case here.)

If you can ping, BGP is working. If you can ping and cannot use TCP, then something other than BGP is at fault.

I've seen strange things like someone enabling TCP compression (common on very small or very expensive links) on one side but not the other, which then allowed ICMP and UDP but not TCP. It is a great way to annoy someone. "See, I can ping, it must be your side!"

Have you tried TCP traceroute? Or telnetting to port 80?

+1.
When you have http working I suggest running:
http://netalyzr.icsi.berkeley.edu/index.html
to give you a benchmark of what your connection can do in the way of protocols.

Regards,
Hank

Greg - you may want to try doing pings with large packets. You may have an
MTU mismatch or some other problem with a link which lets small ICMP pings
through but mangles or discards large packets.

--vadim
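
The large-packet ping test suggested above can be sketched as follows. This is a minimal sketch, not something posted in the thread: the helper names are hypothetical, the 28-byte overhead assumes IPv4 with no IP options (20-byte IP header plus 8-byte ICMP header), and the flags assume the stock ping on macOS (-D for Don't Fragment) or Linux iputils (-M do). Check your own man page before relying on them.

```python
import platform

IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo header


def icmp_payload_for_mtu(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def ping_command(host, mtu, system=None):
    """Build a ping command sending three DF-marked echo requests of `mtu` bytes each."""
    system = system or platform.system()
    size = str(icmp_payload_for_mtu(mtu))
    if system == "Darwin":
        # Assumption: macOS ping, where -D sets the Don't Fragment bit.
        return ["ping", "-c", "3", "-D", "-s", size, host]
    # Assumption: Linux iputils ping, where "-M do" forbids fragmentation.
    return ["ping", "-c", "3", "-M", "do", "-s", size, host]


# Print commands for a few candidate packet sizes (nothing is sent here).
for mtu in (576, 1000, 1400, 1500):
    print(" ".join(ping_command("metaconi.com", mtu)))
```

If the small sizes get replies and the larger ones silently time out, that would point toward an MTU problem somewhere on the path rather than a routing problem.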

I have seen this kind of problem in our customers' networks and it is
mostly related to MTU/MSS. Network reachability is fine but applications do
not work. Set the MSS on the LAN interface to around 100 bytes less
than the MTU on the WAN interface.

-Thanks,
Viral.
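
The MSS rule of thumb above can be worked through with a little arithmetic. This is only a sketch under stated assumptions: the function names are hypothetical, the header sizes assume IPv4 and TCP with no options, and the 1500-byte WAN MTU in the example is an assumed value, not one reported in the thread.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def max_mss(mtu):
    """Largest TCP segment payload that fits in one packet of size `mtu`."""
    return mtu - IPV4_HEADER - TCP_HEADER


def clamped_mss(wan_mtu, margin=100):
    """MSS per the rule of thumb above: about 100 bytes below the WAN MTU."""
    return wan_mtu - margin


wan_mtu = 1500  # assumed example value
print("theoretical max MSS:", max_mss(wan_mtu))
print("conservative clamp: ", clamped_mss(wan_mtu))
```

With these assumptions the clamp sits 60 bytes below the theoretical maximum, which leaves some room for tunnel or encapsulation overhead on the WAN side.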

Patrick,

  Thank you very much! Thank you to everyone else who replied.

  I did try TCP traceroute and it failed too. I didn't have a machine to telnet to on port 80 but I did try an ssh tunnel on port 9999 and it failed too.

  From what everyone is saying it sounds like it was the satellite internet provider's compression scheme that was having trouble or some kind of an MTU issue.

  What I don't understand is why when using traceroute UDP/TCP/GRE I could get replies from some routers but not all routers to the destination, and why some routes were bizarre. If it was a failure of the sat internet provider's compression scheme or an MTU issue wouldn't traceroute UDP/TCP/GRE fail completely? What could have happened to my packets that would make them go only part way or go the wrong way?

  According to our satellite internet service provider Bantel the outage was system wide.

Thanks again!
Greg

  I did try TCP traceroute and it failed too. I didn't have a machine to telnet to on port 80 but I did try an ssh tunnel on port 9999 and it failed too.

Sure you do. Any web server will allow you to telnet to port 80.

  TiggerBook-Air3:~ patrick$ telnet www.yahoo.com 80
  Trying 67.195.160.76...
  Connected to any-fp.wa1.b.yahoo.com.
  Escape character is '^]'.
  GET GET
  <HEAD><TITLE>Not Found</TITLE></HEAD>
  <BODY BGCOLOR="white" FGCOLOR="black">
  <FONT FACE="Helvetica,Arial"><B>
   Your requested URL was not found.</B></FONT>
  
  <!-- default "Not Found" response (404) -->
  </BODY>
  Connection closed by foreign host.

[In case it wasn't clear, I typed "GET GET" myself, just to have the web server respond with something.]
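
For anyone who would rather script the same check than type into telnet, here is a minimal sketch. The function name and the 10-second default timeout are assumptions of mine (chosen with satellite round-trip times in mind); socket.create_connection is the standard-library call.

```python
import socket


def tcp_port_open(host, port, timeout=10.0):
    """Return True if a TCP handshake to host:port completes within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures alike.
        return False


# Example use (requires network access):
#   for host in ("www.yahoo.com", "google.com", "metaconi.com"):
#       print(host, tcp_port_open(host, 80))
```

Note that a successful handshake only shows that small TCP packets get through in both directions. A page load could still stall if larger data packets are being dropped or mangled.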

  From what everyone is saying it sounds like it was the satellite internet provider's compression scheme that was having trouble or some kind of an MTU issue.

  What I don't understand is why when using traceroute UDP/TCP/GRE I could get replies from some routers but not all routers to the destination, and why some routes were bizarre. If it was a failure of the sat internet provider's compression scheme or an MTU issue wouldn't traceroute UDP/TCP/GRE fail completely? What could have happened to my packets that would make them go only part way or go the wrong way?

It was likely not MTU if you can traceroute to some places, but not others. Traceroute doesn't send or receive big packets.

And I didn't really see anything terribly unusual in the traces you sent, other than some not completing. If you are talking about the Cogent one, with many routers per hop, that's just standard load balancing.
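
To illustrate why load balancing produces several routers per hop, here is a toy sketch of per-flow hashing. It is an assumption-laden illustration, not a description of Cogent's routers: real equipment uses vendor-specific hash functions (SHA-256 is only a stand-in), the source address is a documentation address, and the three next hops are simply the addresses seen at hop 9 of the skype.com trace. It relies on the fact that classic UDP traceroute uses a different destination port for each probe, starting at 33434.

```python
import hashlib


def pick_next_hop(src, dst, proto, sport, dport, next_hops):
    """Choose a next hop by hashing the flow's 5-tuple (toy stand-in for ECMP)."""
    key = f"{src}|{dst}|{proto}|{sport}|{dport}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


# Illustrative next hops, taken from hop 9 of the skype.com trace.
hops = ["154.54.28.245", "154.54.1.185", "154.54.2.153"]

# Each traceroute probe uses a new destination port, so it looks like a new flow.
for probe in range(6):
    dport = 33434 + probe
    print(dport, pick_next_hop("192.0.2.10", "78.141.177.7", "udp", 50000, dport, hops))
```

Because each probe hashes as a separate flow, consecutive probes can land on different parallel routers, which would show up in traceroute output as several addresses listed under one hop number.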