Earlier this month I attended CLAW 2019, the third Crisis Management Workshop for the GÉANT Community. The event was held at the Poznan Supercomputing and Networking Centre in Poland – not the easiest place to get to from the UK, but lovely once you’re there.
My role at Jisc is Head of Delivery, but I also act as a Major Incident Manager, part of our process for dealing with major network incidents. This blog post highlights some of my learnings from CLAW 2019 – how Crisis Management is done at other NRENs, how that differs from what Jisc does, and what improvements I can take back to the Jisc MI team.
Worth also noting that we’re constantly reviewing and updating our processes anyway, in light of incidents that occur and feedback we receive, and some of the things discussed in this post will feed into that ongoing cycle of continual improvement.
Before I start, however, I couldn’t forgive myself if I didn’t mention that the trip started with my first ever flight on lesser-known Hungarian airline Wizz Air, and also that I stayed in an official Euro 2012 hotel in Poznan 🙂
So the event itself…
It was split over 2 days – day 1 was a mix of presentations and a short (3 hours!) practical exercise, and day 2 was a much more in-depth (6 hours!!) practical exercise. I only attended the first day due to other commitments back at base on the second day, so what follows is based on the first day of the event only.
First thing to note is that what Jisc calls Major Incident Management, everyone else seems to call Crisis Management, so for the purposes of this blog I’ll standardise on Crisis Management.
The contents of day 1 of the event can be split into 2 areas:
- Presentations from other NRENs about Crisis Management;
- A practical exercise in dealing with a crisis.
First the presentations…
The standout talk for me was by Anna Wilson from HEAnet. Anna presented on ‘Real Life Crisis: Network Outage During 9/11’, a fascinating look back at how the events of 18 years ago impacted the internet and global NREN connectivity. The talk also looked at the shape of the internet in general in 2001, which in itself was eye-opening, and ended with some reflections on the lasting impact of 9/11 and how it helped shape networking as we see it today. I was so impressed with Anna’s talk that I immediately tapped her up for a repeat performance at Jisc’s Networkshop conference next year, an offer she kindly accepted.
The remainder of the talks on day 1 of the event were from members of other NRENs across Europe, talking about their own approach to Crisis Management. All talks were interesting and informative in their own right, and it’s hard to summarise or pick out any highlights as I found all the content useful. Perhaps the one thing that did become clear though was that everyone who presented had a similar approach to Crisis Management, and they all differed from the way Jisc does it.
Jisc MI structures and processes were born out of a DDoS attack 3 years ago, where the scale of the attack prompted a surge of incoming calls that swamped the Jisc Service Desk. As a result, the MI processes we’ve developed have been almost entirely focused on Jisc’s approach to comms during an MI – managing calls into Jisc, and coordinating outbound comms to a variety of stakeholders. Aside from comms, one of the other principles key to Jisc’s approach is to leave the teams responsible for fixing the problem to fix the problem. Let them focus on what they’re supposed to be doing, rather than bringing them into additional structures and meetings tasked with ‘managing’ the situation. This principle also extends to the people managing the people fixing the problem (engineering team leaders, for example).
All the other NRENs I spoke to and watched present of course have a strong focus on comms as well, but also on how to deal with stressful situations, how to manage priorities in a crisis, what to focus on, making clear decisions, etc. Jisc has chosen MI managers based on who is deemed best equipped to deal with such situations, rather than proactively developing people to be better prepared to act in such a way when required to. Most notable, however, was the involvement of a wide variety of groups in the process – as above, Jisc considers functions like engineering and security as inputs into the MI process, whereas other NRENs consider them part of the process – in the meetings, sharing information, and supporting decision making. Food for thought, and points I’ll definitely take back to the Jisc team.
On to the group exercise…
It was hosted by Wouter Beijersbergen van Henegouwen, an external consultant specialising in Crisis Management. The scenario was based on a fictional NREN in a fictional country that had experienced a fictional data leak. The scenario consisted of 5 roles, each of which had its own set of information on the incident, and the exercise was to conduct a crisis meeting to ascertain what had happened and agree a course of action. With each role drip-feeding various bits of information during the meeting, it had a genuine ‘real life’ feel to it, which is so often hard to recreate in a simulation environment. My role was ‘manager’, meaning I knew less about the incident than most and had to chair the meeting whilst trying to piece together the series of events that had led to the crisis. Good fun, and a really useful exercise to take part in. I’ll be taking the format and supporting scenario information back to Jisc to feed into our next major incident workshop.
So overall a really good event, despite only experiencing half of it. I’ll definitely be attending again in future.
The journey home was very nearly derailed as I was forced to run through Warsaw airport to catch my connecting flight back to Heathrow, but thankfully time was just about on my side and my own personal crisis was averted with seconds to spare.