network issue on ec2 classic us-east-1??

Hi,

Over the last 6 hours I have had over 100 EC2-Classic instances in
us-east-1 fail their instance health checks; a reboot via the console
fixes them. Logs on the hosts point to a loss of all network
connectivity. Is anyone else experiencing something like this?

I reached out to AWS support but haven't gotten anywhere with that yet.

-Grant

Grant,

We have been having issues for a few weeks now with instances that randomly stop getting their IP from DHCP. Did you see any DHCP errors?

Regards,

Dovid

Hi Dovid and Grant,

  We have been experiencing exactly the same issue: our instances
randomly stop getting their DHCP reservation and then drop offline. A
simple reboot in the AWS console usually sorts it, but as yet we do not
know the root cause.

Regards,
Neil

On 1/15/16, 1:31 PM, "NANOG on behalf of Dovid Bender"

Neil / Dovid,

How long ago did your issues start? Symptoms are the same, but the issue
for me started early this morning at an alarming rate.

-Grant

Hi Grant,
  We saw the first confirmed issue last week. So far we have only had 2
confirmed cases, the one last week and one this morning, but it's possible
there have been others.

Neil

Gotcha, thanks for the info.
I am at 128 instances and counting in the last 8 hours.

-Grant

Could this be residual from yesterday's incident? AWS claims it's been
resolved, though.

[RESOLVED] Instance Connectivity
3:13 PM PST We are investigating connectivity issues for some instances in
the US-EAST-1 Region.
3:33 PM PST We can confirm connectivity issues when using public IP
addresses for some instances within the EC2-Classic network in the
US-EAST-1 Region. Connectivity between instances when using private IP
addresses is not affected. We continue to work on resolution.
4:00 PM PST We continue to work on resolving the connectivity issues when
using public IP addresses for some instances within the EC2-Classic network
in the US-EAST-1 Region. For instances with an associated Elastic IP
address (EIP), we have confirmed that re-associating the EIP address will
restore connectivity. For instances using EC2 provided public IP addresses,
associating a new EIP address will restore connectivity.
6:19 PM PST We continue to work on resolving public IP address
connectivity for some EC2-Classic instances in the US-EAST-1 Region. We
have started to see recovery for some of the affected instances and
continue to work towards full recovery.
7:11 PM PST Between 2:26 PM and 7:10 PM PST we experienced connectivity
issues when using public IP addresses for some instances within the EC2
Classic network in the US-EAST-1 Region. Connectivity between instances
using the private IP address was not affected. The issue has been resolved
and the service is operating normally.

Thanks to all who replied on and off list!

tl;dr: dhclient died and the instances gave up their IPs.
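
For anyone who wants to catch this before the health checks do, here is a
minimal sketch of a check for a dead dhclient. It assumes a Linux host where
each process exposes its command name in /proc/<pid>/comm; the function name
process_running and the proc_root parameter are made up for illustration and
are not from any tool mentioned in this thread.

```python
import os


def process_running(name, proc_root="/proc"):
    """Return True if any process under proc_root reports `name` in its comm file."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return False  # no procfs at this path (e.g. not a Linux host)
    for entry in entries:
        if not entry.isdigit():
            continue  # only numeric directory names are PIDs
        try:
            with open(os.path.join(proc_root, entry, "comm")) as fh:
                if fh.read().strip() == name:
                    return True
        except OSError:
            continue  # process exited, or entry has no readable comm file
    return False


if __name__ == "__main__":
    print("dhclient running:", process_running("dhclient"))
```

Run from cron or a monitoring agent, a False result here would have flagged
the affected instances while they still held their lease.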

Turns out this one was inadvertently my fault. I got bit by a bug in an
old version of NetworkManager. Something triggered an update of a package
on some of my instances, which led to this bug showing up.

The bug appears in versions of NetworkManager prior to
NetworkManager-1.0.0-14.git2015012
https://bugzilla.redhat.com/show_bug.cgi?id=1285974
https://bugzilla.redhat.com/show_bug.cgi?id=1136836
https://rhn.redhat.com/errata/RHBA-2015-0311.html
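
A rough sketch of how one might flag hosts still on an affected build,
assuming the package string looks like the output of `rpm -q NetworkManager`
(name-version-release). It compares only the dotted version and the leading
integer of the release (1.0.0 and 14, as quoted above); this is a
simplification and not RPM's full version comparison. The names parse_nm,
predates_fix and FIXED are hypothetical.

```python
import re

# Version 1.0.0, release 14, taken from the package name quoted in the thread.
FIXED = ((1, 0, 0), 14)


def parse_nm(pkg):
    """Split 'NetworkManager-<version>-<release>...' into ((major, minor, ...), release_int)."""
    m = re.match(r"NetworkManager-(\d+(?:\.\d+)*)-(\d+)", pkg)
    if m is None:
        raise ValueError("unrecognised package string: %r" % pkg)
    version = tuple(int(part) for part in m.group(1).split("."))
    return version, int(m.group(2))


def predates_fix(pkg):
    """True if pkg sorts before FIXED under the simplified comparison above."""
    return parse_nm(pkg) < FIXED
```

Feeding it the string reported by each instance gives a quick list of hosts
to update; anything it cannot parse raises ValueError rather than guessing.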

Thanks!
Grant

Sorry for the delayed reply. It's been going on for about two weeks. The last few days have been OK, but unless we know it's been fixed we will keep looking.

How has it been for you the last few days?

Regards,

Dovid