Re: puck not responding

We’re getting rocked by storms here in Michigan, could be related.

[Brief version of what happened, as far as I can reconstruct it]

I was alerted ~4am US/E yesterday about the issue. This machine has been generously hosted by my previous employer for quite some time; funnily enough, it has been 7 years almost to the day since I started my current employment.

The IPMI was not responsive, and the machine was located in 350 Cermak, on a floor that was not impacted by the heat/cold event.

I have been meaning, off and on, to move things, but never quite had the motivation to tackle the task. Yesterday forced my hand.

Once I confirmed that we could get the machine out of the colocation facility (thank you again NTT) I drove from Michigan to Chicago, got lunch and picked up the machine and headed back to the colocation that I have in Michigan at the 123Net/DetroitIX site.

Once I had a console on it, I determined that this old machine had a few things that had been gradually updated and upgraded over time, and not all the filesystem options were set correctly. After setting some tune2fs options and updating fstab to ensure everything was fully migrated from ext2 -> ext4, the system booted without issues.
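As a rough illustration of the fstab side of such a migration (not the actual procedure used on puck), a small Python sketch can list fstab entries still declared as an older ext type. The function name `legacy_ext_entries` and the sample fstab contents are hypothetical, and the sketch only reads text; it does not change any filesystem.

```python
def legacy_ext_entries(fstab_text, wanted="ext4"):
    """Return (device, mountpoint, fstype) for ext* entries not declared as `wanted`."""
    found = []
    for line in fstab_text.splitlines():
        # Drop comments and surrounding whitespace before splitting into fields.
        line = line.split("#", 1)[0].strip()
        fields = line.split()
        if len(fields) < 3:
            continue  # blank, comment-only, or malformed line
        device, mountpoint, fstype = fields[:3]
        if fstype.startswith("ext") and fstype != wanted:
            found.append((device, mountpoint, fstype))
    return found


# Hypothetical fstab contents, for illustration only.
sample = """\
# device   mount  type  options   dump pass
/dev/sda1  /      ext4  defaults  0 1
/dev/sda2  /var   ext2  defaults  0 2
/dev/sda3  none   swap  sw        0 0
"""

for entry in legacy_ext_entries(sample):
    print(entry)
```

Any entry this reports would still need its on-disk features checked separately; the declared type in fstab and the features enabled on the filesystem can disagree.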

Afterwards I determined that there is still a hardware-related problem, so I am going to move it to new hardware later today, schedule permitting, as I want to go onsite and make sure that the I/O is performant.

Feb 28 22:09:05 kernel: Memory: 32816872K/33544380K available (20480K kernel code, 3276K rwdata, 14748K rodata, 4588K init, 4892K bss, 727248K reserved, 0K cma-reserved)
Feb 29 00:20:07 kernel: Memory: 16326408K/16767164K available (20480K kernel code, 3276K rwdata, 14748K rodata, 4588K init, 4892K bss, 440496K reserved, 0K cma-reserved)

Not a great thing when nobody is onsite, the machine needs to be power cycled, and the amount of memory changes between boots.
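For anyone wanting to catch this kind of change automatically, here is a minimal Python sketch that compares the total memory reported in two kernel boot lines like the ones above. The function names and the 5% tolerance are assumptions chosen for illustration; the sample strings are shortened copies of the two log lines.

```python
import re

# Matches the "Memory: <available>K/<total>K available" part of a kernel boot line.
MEMORY_RE = re.compile(r"Memory: (\d+)K/(\d+)K available")


def total_memory_kib(log_line):
    """Return the total memory in KiB reported in a kernel boot line, or None."""
    match = MEMORY_RE.search(log_line)
    return int(match.group(2)) if match else None


def memory_shrank(before_line, after_line, tolerance=0.05):
    """True if the later boot reports noticeably less total memory than the earlier one."""
    before = total_memory_kib(before_line)
    after = total_memory_kib(after_line)
    if before is None or after is None:
        raise ValueError("no 'Memory:' figure found in one of the lines")
    return after < before * (1 - tolerance)


# Shortened copies of the two boot lines quoted above.
boot_1 = "Feb 28 22:09:05 kernel: Memory: 32816872K/33544380K available"
boot_2 = "Feb 29 00:20:07 kernel: Memory: 16326408K/16767164K available"

print(total_memory_kib(boot_1), total_memory_kib(boot_2))
print(memory_shrank(boot_1, boot_2))
```

Here the total drops from roughly 32 GiB to roughly 16 GiB, which is the pattern you would expect if half the memory stopped being detected.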

If you are seeing any other issues, do let me know. I did move the IPv4 space but have renumbered for v6, so if you use my free secondary DNS service with your own vanity name, you will need to update your AAAA records.

If you are seeing any reachability issues let me know, there should be ROA and other objects in place for things.

Sorry everyone got this email. I feel bad; it’s like when Warren asked the list some personal details :)

- Jared

(Even more details: changing disk images from qcow -> qcow2 and other things like ext2 -> ext3/4 over all the years as the machine has gone from Linux -> FreeBSD -> Linux again and other things is always a fun way to keep bringing your legacy around with you, it’s good overall)

On behalf of cisco-nsp and outages - we salute you.

-Hank

Apparently some of the most important email lists, Outages, etc, are
being kept online by 1 person's Unix/Linux server.

Thank you greatly for your service

Regards,

There are other people who have access etc., but the hardware is quite old (the last substantive refresh was in 2011) and it has served its purpose well.

Obligatory xkcd: Dependency

Yeah, that's cool. It reminds me of the good old internet of the '90s and early 2000s.

Anyway, if that list is so important, maybe it's time to run
it with 1+N redundancy (master-slave topology)? It's all MTAs,
so it's pretty easy: all you need is to sync data from the master to the
slaves via push (best, because it's nearly instant). Slave down? Nothing
really happens. Master down? The next slave takes over, and you either
bring the master back online or nominate any of the slaves as the new master.
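A minimal Python sketch of the takeover rule described above, assuming a hypothetical ordered map of node names to health status. It covers only the "who is master now" decision, not the data sync between MTAs or protection against two nodes both believing they are master.

```python
def pick_master(nodes, current_master):
    """Choose the master from `nodes`, a dict of name -> is_alive in priority order.

    The current master is kept while it is alive; otherwise the first live
    node in priority order takes over. Returns None if every node is down.
    """
    if nodes.get(current_master):
        return current_master
    for name, alive in nodes.items():
        if alive:
            return name
    return None


# Hypothetical node names, for illustration only.
cluster = {"mx1": False, "mx2": True, "mx3": True}
print(pick_master(cluster, "mx1"))
```

Keeping the current master while it is healthy avoids flapping back and forth when a previously failed node returns.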

If it wasn’t for how clunky they are with email sites, I’d suggest moving to a cloud somewhere. But …

-George

George Herbert <george.herbert@gmail.com> writes:

> If it wasn’t for how clunky they are with email sites, I’d suggest
> moving to a cloud somewhere. But …

I believe statistics point in favour of the single puck.nether.net
host....

BTW, for anyone else taking advantage of the excellent secondary service
provided by puck: You might want to update your AXFR ACLs. It seems the
IPv6 address has changed.

I must admit that such transfer failures go unnoticed due to the large
volume of unwanted requests. So I appreciate the extra effort sending
an email warning when a zone is disabled.

Thanks for running all these high quality services!

Bjørn

> George Herbert <george.herbert@gmail.com> writes:
>
>> If it wasn’t for how clunky they are with email sites, I’d suggest
>> moving to a cloud somewhere. But …
>
> I believe statistics point in favour of the single puck.nether.net
> host....
>
> BTW, for anyone else taking advantage of the excellent secondary service
> provided by puck: You might want to update your AXFR ACLs. It seems the
> IPv6 address has changed.
>
> I must admit that such transfer failures go unnoticed due to the large
> volume of unwanted requests. So I appreciate the extra effort sending
> an email warning when a zone is disabled.

  Yes, I'm notifying people now and have updated the FAQ/docs
page. I also said there that I would notify people if the geography of
the machine changed and it has.

  I still need to get my upstreams to notify all their upstreams
to permit packets as there's one provider that does uRPF in the mix, so
I have blocked their routes for now.

> Thanks for running all these high quality services!

It's the sustained community efforts that have allowed technology to
improve to the point where auto-updates and many other things work
without trouble. Sadly I had to do a bit of physical moving of things,
but the machine should now have a ~10g uplink, and if I can find the
right 100g device that I'm happy with, I'm in a better position to
update/upgrade it now compared to a week ago.

  - Jared