Quakecon: Network Operations Center tour

Non-work but work-related information: many NANOG geeks might be interested
in this video tour of the Quakecon NOC. As any ISP operator knows,
gamers complain about problems faster than any NMS, so you have to
admire the bravery of any NOC in the middle of a gaming convention floor.

What Powers Quakecon | Network Operations Center Tour
https://www.youtube.com/watch?v=mOv62lBdlXU

Highlights:
  "happy and blinking"
  "two firewalls for the two AT&T 1-gig links, and two spares doing ....."

Catalyst 6500s

Also, the 3750 on top of the services rack is funny... because it's empty.

It would have been more interesting to see:

-- a network weather map
-- the ELK implementation
-- actual cache statistics (historically Steam/game downloads are not
cacheable)

Thanks for the share though Sean!

* mianosm@gmail.com (Steven Miano) [Sun 02 Aug 2015, 03:52 CEST]:

> It would have been more interesting to see:
>
> -- a network weather map
> -- the ELK implementation
> -- actual cache statistics (historically Steam/game downloads are not
> cacheable)

Not quite true according to http://blog.multiplay.co.uk/2014/04/lancache-dynamically-caching-game-installs-at-lans-using-nginx/
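The approach in the linked post boils down to an nginx reverse proxy that the LAN's DNS points Steam content hostnames at, caching the HTTP download chunks on local disk. The sketch below follows that idea; the hostnames, paths, and sizes are illustrative assumptions, not a drop-in config:

```nginx
# Rough sketch of the lancache idea (assumed names/paths, not drop-in):
# the LAN resolver answers *.cs.steampowered.com with this box's address,
# and nginx caches the HTTP download chunks locally.
proxy_cache_path /cache/steam levels=2:2 keys_zone=steam:500m
                 inactive=200d max_size=1000g;

server {
    listen 80;
    server_name *.cs.steampowered.com;   # Steam content hosts
    resolver 8.8.8.8;                    # resolve upstream via real DNS,
                                         # not the LAN's poisoned view

    location /depot/ {
        proxy_pass http://$host$request_uri;   # fetch upstream on a miss
        proxy_cache steam;
        proxy_cache_valid 200 206 200d;
        # The same depot chunk is served from many hostnames; key on
        # the URI alone so one cached copy satisfies all of them.
        proxy_cache_key "$uri";
    }
}
```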

Also, 2 Gbps for 4,400 people? Pretty lackluster compared to European events. 30C3 had 100 Gbps to the conference building. And no NAT: every host got real IP addresses (IPv4 + IPv6).

  -- Niels.

> Also, 2 Gbps for 4,400 people? Pretty lackluster compared to European
> events. 30C3 had 100 Gbps to the conference building. And no NAT:
> every host got real IP addresses (IPv4 + IPv6).

ietf, >1k people, easily fits in 10g, but tries to have two for
redundancy. also no nat, no firewall, and even ipv6. but absorbing or
combatting scans and other attacks causes complexity one would prefer to
avoid. in praha, there was even a tkip attack, or so it is believed;
tkip was turned off.

the quakecon net was explained very poorly. what in particular provides
game-quality latency, or lack thereof? with only 2g, i guess i can
understand the cache. decent bandwidth would reduce complexity. and
the network is flat?

randy

* randy@psg.com (Randy Bush) [Sun 02 Aug 2015, 13:37 CEST]:

> ietf, >1k people, easily fits in 10g, but tries to have two for redundancy. also no nat, no firewall, and even ipv6. but absorbing or combatting scans and other attacks causes complexity one would prefer to avoid. in praha, there was even a tkip attack, or so it is believed; tkip was turned off.

Didn't the IETF already deprecate TKIP?

> the quakecon net was explained very poorly. what in particular provides game-quality latency, or lack thereof? with only 2g, i guess i can understand the cache. decent bandwidth would reduce complexity. and the network is flat?

Cabling up 4,400 ports does take a lot of effort, though.

The QuakeCon video was typical for a server guy talking about the network: a focus on the network periphery, i.e. some servers supporting the network. I guess a tale of punching down 300-odd patch panels is not that captivating to everybody out there.

  -- Niels.

Steam moved to HTTP streaming a few years ago for exactly that reason.

Quakecon is essentially a giant LAN party: Bring Your Own Computer (BYOC). People haul big gaming rigs to Quakecon and compete on the LAN. There isn't that much Internet traffic; there is only 100 Mbps wired to
each gaming station.

I'm not a Quake fanatic, so I don't know what the important network
metrics are for a good gaming experience. But I assume the important
metrics are local, and they install a big central server complex in the
center of the room. I'm assuming the critical lag is between the central
servers and the competitors, not the Internet; otherwise they could
have all stayed home and played in their basements across the
Internet. Latency is probably more important than bulk bandwidth.
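A quick back-of-envelope check supports that: Quake-style games send small, frequent state updates, so per-station game traffic is trivial next to a 100 Mbps port and latency is the only metric left to fight over. The snapshot rate and packet size below are assumed ballpark figures, not anything from the video:

```python
# Back-of-envelope: per-station bandwidth for Quake-style game traffic.
# Both constants are assumed ballpark figures for illustration.
UPDATE_RATE_HZ = 30   # assumed server snapshot rate
PACKET_BYTES = 300    # assumed payload + UDP/IP overhead per update

def station_kbps(rate_hz: int, pkt_bytes: int) -> float:
    """Sustained one-way game traffic per gaming station, in kbit/s."""
    return rate_hz * pkt_bytes * 8 / 1000

bw = station_kbps(UPDATE_RATE_HZ, PACKET_BYTES)
print(f"per-station game traffic: {bw:.0f} kbit/s")        # 72 kbit/s
print(f"fraction of a 100 Mbps port: {bw / 100_000:.4%}")  # well under 0.1%
```

Even with generous assumptions, the wired port never comes close to being the bottleneck for gameplay itself.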

Cool stuff!

For reference, here is the blog of the tech crew at the world's second-largest LAN party, The Gathering:
http://technical.gathering.org/

A few highlights:
* Over 12,000 Gigabit ports, 500 10-Gigabit ports, and 50 40-Gigabit ports (not all utilized, of course).
* Gigabit to all participants.
* Dual-stack public IPv4 and IPv6 to all participants.
* 30Gbit internet connection (upgradeable if needed).
* Zero-touch provisioning of all edge switches.

Most of the NMS and provisioning systems are made in-house and are available on github (https://github.com/tech-server/) and all configuration files are released to the public after each event on ftp://ftp.gathering.org (seems to be down at the moment).
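Zero-touch provisioning of edge switches usually amounts to: the switch boots, DHCPs, is pointed at a config server, and fetches a config generated from its position in the floor plan. The real Gathering tooling is in the github.com/tech-server repos linked above; the template and numbering scheme below are made up purely to illustrate the idea:

```python
# Hypothetical sketch of per-switch config generation for zero-touch
# provisioning. The template, hostnames, and VLAN scheme are invented
# for illustration; the real tooling lives in the tech-server repos.
EDGE_TEMPLATE = """hostname edge-{row}-{num}
vlan {vlan}
interface range ge-0/0/0 to ge-0/0/47
  switchport access vlan {vlan}
"""

def render_edge_config(row: int, num: int) -> str:
    """One access VLAN per table row; numbering is purely illustrative."""
    return EDGE_TEMPLATE.format(row=row, num=num, vlan=100 + row)

# A booting switch identified as row 3, switch 1 would fetch:
print(render_edge_config(row=3, num=1))
```

The payoff is that plugging in thousands of edge switches becomes a cabling exercise rather than a configuration one.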

I find this hard to believe.

:-)

I was hoping for more 'how the network is built' (flat? segmented? any
security protections so competitors can't kill off their competition?)
and ideally some discussion of why the decisions made a difference
(what tradeoffs were made, and why?).

It would be interesting to learn whether they saw any DDoS attacks or cheating attempts during competitive play, or even casual non-competitive play amongst attendees.

> any security protections so competitors can't kill off their
> competition?)
>
> It would be interesting to learn whether they saw any DDoS attacks or
> cheating attempts during competitive play, or even casual
> non-competitive play amongst attendees.

I wonder if that would be a reason for the relatively anemic 1 Gb Internet
pipe -- making sure that a DDoS couldn't push enough packets through to
inconvenience the LAN party.

(Disclaimer: $DAYJOB did the audio/visual/lighting for QuakeCon but we had
nothing to do with the network and I was utterly uninvolved in any way,
so my speculation is based on no information obtained from outside my own
skull.)

While increasing bandwidth is not a viable DDoS defense tactic, decreasing it isn't one, either.

While increasing bandwidth to the endpoint isn't viable, wouldn't
increasing the edge bandwidth out to the ISP be a step in the right
direction?

I would assume this would be a start on the problem if the attacks were
volumetric.

Once the bandwidth is there, you can look at mitigation before traffic
reaches the endpoint -- in this case the computers on the floor
(assuming no NAT).

It's completely reasonable when the world at large is only secondary to the local, on-net operations.

I recently wrapped up an event with 1,300 players on gigabit connections where we had a single 5-gig uplink. We never saturated the link, peaking at 3.92 Gbps for a few minutes. Bandwidth usage peaks on the first day and settles down after that (the event ran an entire weekend, starting on Friday). If I recall correctly, the average was around 2 Gbps.

We did not have a Steam/web cache, and I expect one would have reduced the actual bandwidth usage even further.
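Working those figures through, the per-seat average is surprisingly small. A quick sketch using only the numbers quoted in the post above:

```python
# Per-seat averages from the figures quoted above: 1,300 players,
# 3.92 Gbps peak and ~2 Gbps average on a single 5 Gbps uplink.
PLAYERS = 1300
PEAK_GBPS = 3.92
AVG_GBPS = 2.0
UPLINK_GBPS = 5.0

avg_per_seat_mbps = AVG_GBPS * 1000 / PLAYERS
peak_headroom = 1 - PEAK_GBPS / UPLINK_GBPS

print(f"average per seat: {avg_per_seat_mbps:.2f} Mbps")  # ~1.54 Mbps
print(f"headroom at peak: {peak_headroom:.0%}")           # ~22%
```

Roughly 1.5 Mbps per gigabit-connected seat, with headroom to spare even at peak -- consistent with the observation that the local LAN, not the uplink, is where the action is.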

It has nothing to do with DDoS.

In a world of 430 Gb/sec reflection/amplification DDoS attacks, not really.

;>

Just increasing bandwidth has never been a viable DDoS defense tactic, due to the extreme asymmetry of resource ratios in favor of the attackers.
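The asymmetry is easy to quantify: reflection/amplification lets a modest spoofed-source send rate multiply into far more traffic than any access pipe upgrade can absorb. The amplification factors below are published ballpark figures (from US-CERT's advisory on UDP-based amplification attacks); treat them as illustrative:

```python
# Why "just add bandwidth" loses: published ballpark bandwidth
# amplification factors for common UDP reflection vectors
# (US-CERT TA14-017A; illustrative, not exact).
AMPLIFICATION = {
    "dns": 54.0,      # open resolvers, large ANY/TXT responses
    "ntp": 556.9,     # monlist
    "chargen": 358.8,
}

def reflected_gbps(attacker_gbps: float, protocol: str) -> float:
    """Traffic arriving at the victim for a given spoofed send rate."""
    return attacker_gbps * AMPLIFICATION[protocol]

for proto in AMPLIFICATION:
    print(f"1 Gbps spoofed via {proto}: "
          f"{reflected_gbps(1.0, proto):.0f} Gbps at the victim")
```

One gigabit of spoofed NTP queries can become hundreds of gigabits at the victim, which is why provisioning your way out of the problem never works.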

I was involved in delivering 1 GigE to Dreamhack in 2001, which at the time (if I remember correctly) had 4,500 computers that participants brought with them.

These events nowadays tend to use 5-20 gigabit/s for that number of people, so 2x1GE is just not enough. Already in 2001, that GigE was fully loaded after 1-2 days.

It most certainly does. If the core of the mission is local LAN play and your Internet connection fills up.... who gives a shit? The games play on. If your 500 megabit corporate connection gets a 20 terabit DDoS, your RDP session to the finance department will continue to hum along just fine.