Federated roaming during COVID-19 isolation

Govroam and eduroam are services aimed at facilitating network roaming access for staff and students visiting participating organisations. Our current national response to the pandemic is to radically restrict travel and work for non-keyworkers, so we'd predict that Jisc's roaming services would be barely used at present. The situation seems to be a little more nuanced than that.

The impact on govroam does follow the predicted pattern, with day-for-day comparisons with this time last year showing an 80% reduction in roaming authentications even without allowing for the growth of the service over that period. However, our view of govroam usage is limited to the actions of users who roam outside their local region, such that their authentication traffic is routed by Jisc’s central servers. Under normal circumstances, we estimate that about 80% of govroam roaming happens within regional federations, with staff traveling locally to neighbouring organisations. So we can’t yet draw any firm conclusions on how govroam usage has changed at the local level.

The picture for eduroam is a lot clearer, as all UK roaming authentication traffic passes through the Jisc core. Based on growth rates from the last couple of years, we would have expected eduroam traffic to have increased this March compared with last year by around 10%, but what we are actually seeing is a drop compared to last year of some 36%. Whilst this is a significant reduction, it’s by no means evidence of eduroam lying idle during the present crisis. We have still seen some 100 million authentications across 1.2 million unique devices in March this year.
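
As a rough illustration of what those figures imply (using the approximate 10% expected growth and 36% drop quoted above), the shortfall against the level we would have expected is larger than the raw year-on-year drop suggests:

        # Illustrative only: comparing observed eduroam volume against the level
        # the usual growth trend would have predicted. Figures are the approximate
        # percentages quoted above, not exact measurements.
        last_year = 1.0          # normalised March 2019 authentication volume
        expected_growth = 0.10   # ~10% year-on-year growth expected
        observed_drop = 0.36     # ~36% drop actually seen against last year

        expected = last_year * (1 + expected_growth)   # 1.10
        observed = last_year * (1 - observed_drop)     # 0.64

        shortfall = 1 - observed / expected            # ~0.42
        print(f"Observed traffic is ~{shortfall:.0%} below the expected level")

In other words, eduroam usage is running at a little under 60% of the level the growth trend would have predicted.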

So what is driving all this roaming activity in the education sector? Online resources suggest that the majority of academic organisations are delivering teaching content online, campuses are closed to students, and non-pandemic research activities are greatly curtailed. It's unlikely we'll get a clear picture until the dust has settled from the coronavirus crisis. However, it's worth noting that alongside the extensive govroam presence in the NHS, there is also an eduroam presence in most teaching hospitals. We wouldn't be surprised if it turns out that the combination of medical research and relocated NHS staff accounts for the majority of this ongoing roaming activity.

In the meantime, your federated roaming team continues to keep the machinery turning: if you have a roaming requirement or query, please get in touch – particularly if it is regarding a service that may assist the keyworkers that are keeping society running in its current reduced scope.

A brief blip in the DNS

At around 5pm on Wednesday 25th March we received a few reports that some sites on Janet were having issues with DNS resolution.

This was caused by the affected sites' own resolvers still performing DNSSEC Lookaside Validation (DLV), combined with a failure by the operators of the DLV zone that broke the DNSSEC signing of that zone.

In summary, if you're using BIND as a DNS resolver and you have 'dnssec-lookaside auto' or 'dnssec-lookaside yes' in your named.conf file, remove that line. It refers to an obsolete feature that is not required in any circumstances and, as was shown yesterday, can break.

As a reminder, there was a time before the root zone of the DNS was signed with DNSSEC, but when zones lower down in the hierarchy were signed (or there were other holes between the root and a signed zone). To create a chain of trust, operators of a signed zone could place the Delegation Signer (DS) record in the DLV (as a DLV record), operated by ISC, and DNS resolvers could be configured to look in parallel to the main DNS hierarchy (look-aside) to check the signatures were correct.

As deployment of DNSSEC grew, the need for this became less, and eventually the DLV was replaced with a signed empty zone about two and a half years ago.

No default BIND configurations distributed by ISC ever had DLV enabled, but some operating system vendors had enabled it in their own configurations, and these may have been left in place following upgrades.

If it is enabled, the BIND configuration file named.conf will have the following line:


        dnssec-lookaside auto;

Or


        dnssec-lookaside yes;

Delete that line!

Deleting it will not break anything now. Not deleting it will break something later, as ISC would like to remove even the empty signed zone.
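
If you'd rather check for the directive programmatically than by eye, a minimal sketch along the following lines will flag any offending lines. The default path /etc/named.conf is an assumption here; adjust it for your distribution (Debian and Ubuntu, for example, typically use /etc/bind/named.conf).

        #!/usr/bin/env python3
        # Minimal sketch: report any dnssec-lookaside directives in a BIND config.
        import re
        import sys

        conf = sys.argv[1] if len(sys.argv) > 1 else "/etc/named.conf"

        with open(conf) as f:
            for lineno, line in enumerate(f, start=1):
                stripped = line.strip()
                # Skip whole-line comments (BIND also accepts /* ... */ blocks,
                # which this simple check does not handle).
                if stripped.startswith(("//", "#")):
                    continue
                if re.search(r"\bdnssec-lookaside\b", stripped):
                    print(f"{conf}:{lineno}: {stripped}  <-- remove this line")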

Quoting, ordering and delivery – what’s changed?

Following recent government advice on limiting movement to essential activities only, and further restrictions imposed by numerous parties on delivery of new infrastructure, this blog post aims to highlight the environment we’re now working in, and how it affects the quoting, ordering and delivery of Janet connectivity services.

Note that the majority of the information in this blog applies to the Janet IP and Intersite connectivity services, where the installation of physical infrastructure is required. The Netpath and Microsoft Azure ExpressRoute services can often be provisioned as virtual connectivity over existing physical infrastructure, so might be exempt from the restrictions detailed below.

The Global Connect service is largely based on connectivity in other countries so isn’t affected by the restrictions we’re seeing in the UK, but may still be affected by similar constraints in other countries.

For context, some of the specific restrictions we’re seeing are as follows:

  • Data centres and customer sites are limiting access to essential work only, which is largely regarded as fixing faults rather than delivery of new infrastructure. We’re also seeing a lot of customer sites on total lockdown with no access being permitted for any reason.
  • Openreach has announced a number of different measures over the past few days, which to our understanding equate to the following:
    • No new orders are being processed until at least June 2020.
    • All existing orders will be completed as much as possible, but any work required in customer sites is likely to be postponed until further notice.
    • Faults will continue to be fixed but, at customer sites, only if it's safe to do so.
    • The exception to all of the above is around delivery of infrastructure to ‘critical services’, such as the NHS, which will take priority over everything else.
  • We’ve placed a number of restrictions on Janet engineers regarding their method and duration of travel.

There may be edge cases where underlying infrastructure required to fulfil a customer connectivity requirement is already ‘in the ground’ and the service can be delivered via soft config changes rather than any extensive engineering activity in the field. If this applies to you, well done, you’re one of the lucky ones, but for the vast majority of customers and requirements the conditions described below will apply.

Quoting

You can still request quotes for connectivity services, and we can still provide pricing, that hasn’t changed. So if you have a requirement for connectivity and would like a quote, or if you just need some costs for budgetary purposes, please ask either your Account Manager or our Quotes Team (connect@ja.net) who’ll be happy to assist.

Ordering

Once we’ve provided you with a quote, you can still place an order with us for the relevant service(s). No change there.

Delivery

This is where things have changed. Due to restrictions imposed by Openreach (our main partner for circuit delivery) on installation of new services, and the tightening of permissions to access Janet network points of presence and customer sites, delivery of new connectivity services is going to take a lot longer than usual. As a guide, delivery of services quoted and ordered this month (March 2020) shouldn’t be expected until October 2020, and could be later than that if further restrictions are imposed.

UPDATE 27/03/2020: Regarding orders currently 'in flight' (i.e. placed before restrictions were applied), we're still seeing delivery activities being undertaken by Openreach and other suppliers, and in some cases circuits are being completed and handed over to us. In that situation, we'll then proceed with bringing new connections into service, as long as all work required at both the Janet PoP end and the customer end can be done via 'remote hands' by the PoP host and the customer, with Janet engineers carrying out all support and config tasks remotely.

 

Networking FAQ

Members and Customers have been asking how Jisc is managing, and adapting usage of, the Janet Network and other key services in light of the coronavirus pandemic. Here's an FAQ with our response as of 23rd March 2020.

As operations and service delivery change to meet the rapidly changing needs of our Members and Customers, we will add to this FAQ or publish longer, more detailed breakout posts.

 

Q: Has Janet got enough bandwidth?

Answer by James Blessing, Deputy director of network architecture

Yes, there is over 3Tbit/s of connected capacity at the edge of Janet to various peers, of which 500Gbit/s is for global transit[1]. In the first few days of the working-from-home directive we have seen a drop of about a third in the traffic on the Janet backbone, but there has been a shift in how that traffic is spread across the individual peers. We continue to monitor our individual peers, moving traffic to alternative paths if any link shows signs of congestion, and arranging additional capacity where necessary.

[1] The Internet is a network of networks: to reach other autonomous networks, an operator needs either to connect directly to every other network via dedicated peerings, or to contract with a larger network to 'transit' traffic to the networks it cannot reach directly. Jisc has built the Janet network to peer with as many other networks as possible, and then uses two transit providers to reach the rest of the world.
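
For illustration, the kind of check involved is conceptually simple: compare each peer's current traffic with its connected capacity and flag anything approaching congestion. The peer names, capacities and the 80% threshold below are invented for the example rather than Jisc operational figures.

        # Toy headroom check across peering and transit links (figures invented).
        peers = {
            "example-pni-1":   {"capacity_gbps": 100, "traffic_gbps": 85},
            "example-ixp-lan": {"capacity_gbps": 200, "traffic_gbps": 90},
            "example-transit": {"capacity_gbps": 100, "traffic_gbps": 40},
        }

        THRESHOLD = 0.80  # arbitrary: flag links running above 80% utilisation

        for name, link in peers.items():
            utilisation = link["traffic_gbps"] / link["capacity_gbps"]
            if utilisation > THRESHOLD:
                print(f"{name}: {utilisation:.0%} utilised -> move traffic or add capacity")
            else:
                print(f"{name}: {utilisation:.0%} utilised -> ok")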

 

Q: If my organisation’s Janet connection bandwidth needs upgrading will that still be done?

Answer by Neil Shewry, Head of delivery

If you've got an upgrade or new service on order with us already, we'll continue with the work required to deliver it, subject to the various suppliers involved continuing to deliver their components (which, at the time of writing, they all are). One issue may be site access, as we are seeing data centres and customer sites restrict access to essential work only. That said, we'll do all we can to deliver your service as quickly as we can.

If you want/need an upgrade, but haven’t yet approached us about it, then please come and talk to us. We’ll be happy to work with you to design and cost a solution to meet your requirements, and we’ll happily place orders on our suppliers for the infrastructure required. What we can’t guarantee is exactly how long it will take to deliver your service, as based on the restrictions mentioned above, access to the necessary locations on our network might be difficult. That said, we’ll do all we can to deliver your service as quickly as we can.

All of this boils down to us continuing to work in a BAU environment as much as we can, with the caveat that things will probably just take a little bit longer than usual.

UPDATE 24/03/2020: Given the latest restrictions imposed by the government, by our suppliers, and internally on our own engineers, the delivery of all new/upgraded services will almost certainly be delayed. We'll endeavour to deliver services as quickly as possible, and will advise on expected lead times once we have the relevant information to hand.

 

Q: What effect has the COVID-19 situation had on the Janet Access Programme?

Answer by Neil Shewry, Head of delivery

To date the situation hasn't had a huge impact – networks are still being designed, orders are still being placed, and infrastructure is still being delivered. Projects continue. That said, we are seeing signs of access restrictions being imposed at data centres and at customer sites, places our engineers need to get into to deliver and enable new infrastructure. If telecoms engineers are accepted as 'key workers' under the current list published by the government, that may help them get back to work, and similarly help IT managers argue the case for careful and considered access to their sites. Even with this exception-led approach to allowing access, it is likely that Janet access projects will be delayed, but by how much we can't currently say.

UPDATE 24/03/2020: Given the latest restrictions imposed by the government, by our suppliers, and internally on our own engineers, the Janet Access Programme will almost certainly be delayed. We’re busy re-planning and re-forecasting, and will share revised plans as soon as they’re available.

 

Q: How has working from home changed the traffic on the Janet Network?

Answer by Rob Evans, Chief Network Architect

As many of us move from working at a college or university to working at home, the ways that data flows across the Janet network are changing. Read a detailed post on the impact on the Janet Network and how traffic is proactively monitored and managed:

https://shapingthefutureofjanet.jiscinvolve.org/wp/2020/03/20/how-working-from-home-has-changed-the-traffic-on-the-janet-network-and-what-happened-on-thursday/

 

Q: What has been the ISPs' response to the sudden shift to remote working/study, and the demands that online delivery will place on staff and students' home broadband? Will ISPs treat home broadband connections as they do leased lines for business, with QoS etc.?

Answer by Rob Evans, Chief Network Architect & Mark Clark, Subject Specialist: infrastructure

It's worth noting that most homeworking activities, even a multi-party Zoom conference, use far less bandwidth than a Netflix stream (I've just been on an hour-long, 10-party Zoom meeting with an average download of 2Mbps; watching Netflix last night was 6Mbps).

This might be interesting from BT: https://newsroom.bt.com/the-facts-about-our-network-and-coronavirus/

The impact on most domestic ISPs will be minimal: the peak traffic period for domestic providers is usually 18:00–20:00, outside normal office working hours, and is driven mostly by streaming services, gaming and the like. Zoom, Skype et al don't really have a huge bandwidth requirement, comparatively speaking.

An issue closer to home is Wi-Fi in built-up areas and blocks of flats, where there is competing interference from neighbours' wireless routers, microwave ovens and the like.

Teams, Zoom and similar calls are sensitive to the packet loss and jitter caused by this Wi-Fi interference, so it's advisable to use an ethernet cable directly from your laptop to your provider's router; that usually helps performance considerably. It's also worth remembering that the bandwidth quoted by ISPs is the download speed; the upload speed is usually much slower, often around 10% of the quoted figure, and upload speed matters if you are on a conference. Try a broadband checker (such as https://www.speedtest.net/) to test that out. There have been some noticeable issues, or at least a step up in traffic, at a national level where ISPs connect together (peering); these are monitored closely and are being addressed.
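
As a rough, illustrative sanity check based on the figures above (the ~10% upload ratio and the per-call upload requirement are ballpark assumptions, not measurements), you can estimate whether a quoted download speed leaves enough upload headroom for a video call:

        # Rough illustration only: estimate upload headroom for a video call from
        # a quoted download speed, using the ~10% upload ratio mentioned above.
        def call_headroom(quoted_download_mbps, call_upload_mbps=1.5,
                          upload_fraction=0.10):
            # call_upload_mbps is an assumed per-call requirement, not a standard
            upload_mbps = quoted_download_mbps * upload_fraction
            return upload_mbps, upload_mbps >= call_upload_mbps

        for quoted in (10, 35, 67):  # example advertised download speeds
            upload, ok = call_headroom(quoted)
            verdict = "should be fine" if ok else "may struggle: run a speed test"
            print(f"{quoted} Mbit/s quoted -> ~{upload:.1f} Mbit/s up: {verdict}")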

I don’t think there is going to be a mass deployment of QoS, because it’s difficult to know what traffic to differentiate, but it’s also going to be difficult to roll anything like that out to “N” million lines.

Q: How are Jisc's federated roaming services, eduroam and govroam, affected?

Answer by Mark O’Leary, Head of Network access

With many of us working from home, the power of federated roaming to facilitate network access when visiting other sites might seem less relevant. However, to meet the demands of the current situation, many key workers, particularly in the NHS, are finding themselves redistributed to points of need. In many cases govroam and eduroam are part of this dynamic solution.

Jisc's network access team is geographically distributed and includes homeworkers under normal circumstances, so the current situation is simply an extension of practices we are already familiar with, and both services are fully operational.

Q: Will my circuit fault still get fixed?

Answer by Neil Shewry, Head of delivery

The short answer is yes, it will; it just may take a bit longer than usual. Openreach has recently announced MBORC (Matters Beyond Our Reasonable Control) status, which means they'll still fix faults, but can't currently commit to their usual SLA.

We’re seeing telecoms supplier work being split into a number of different categories at the moment, based on how important it is in the context of COVID-19, as follows:

  1. Blue Light services
  2. Critical National Infrastructure
  3. Welfare customers
  4. COVID-19 at risk
  5. Customers with no service
  6. Customers with significantly degraded service
  7. Customers with intermittent service
  8. Other repair and existing / new provisioning jobs

Fixing faults in our community starts at level 5 – ‘Customers with no service’.

So fear not: faults will still get fixed. They may just take longer than usual, and as always the precise time to resolution depends on the severity of the fault and the work required to fix it.

How working from home has changed the traffic on the Janet Network (and what happened on Thursday)

As many of us move from working at a college or university to working at home, the ways that data flows across the Janet network are changing.

Up until this week, the largest traffic flows across Janet were inbound from GÉANT and the major content providers towards our members and customers.  The Janet network has direct connections (peerings) with the larger domestic broadband providers, plus many peerings with smaller providers and content providers across the LINX, the largest Internet Exchange Point (IXP) in the UK.

On Tuesday evening the Prime Minister told the country we should consider working from home where possible.  That was reflected in the traffic we saw on the network.  Between Wednesday 11th and Wednesday 18th March, we had shed about a third of our incoming traffic to Janet.

That's understandable.  There are fewer people taking their laptops onto campus and accessing services that are hosted off Janet, such as Google, YouTube, Office 365, etc.  In addition, an increasing number of services provided by our members are provisioned in cloud providers, so when they're accessed from home, the traffic uses the broadband provider's own peerings with the cloud providers and doesn't touch Janet.

So, with all of that, there’s plenty of spare capacity on the backbone, yes?

Well, yes there is, but along with an overall reduction in traffic coming into the network we saw a marked increase in traffic outbound to the larger domestic service providers such as BT and Virgin Media, as people accessed resources still hosted on campus, or used VPNs that tunnelled all their traffic to their institution.  This wasn't a marginal increase either: our Private Network Interconnects (PNIs) with the broadband providers, which had happily been sitting at less than 50% utilisation at any point before (including evenings, weekends and bank holidays), started to congest.

That required some manual intervention from the Janet NOC to move traffic from the PNIs to the LINX, on which we had more spare capacity, whilst we started the process to add more capacity.

Then, on Thursday, we started getting complaints about poor performance.  None of the PNIs or IXP connections were overloaded, and we couldn’t see what might have been causing it.  On top of that, some members said that when they changed provider the problem disappeared, which suggested it might have been a problem with one of our peers.

Our engineers, who in turn were working from home via a number of providers — BT, Virgin Media, Andrews & Arnold— hadn’t noticed problems on their own connections either.

However, reports kept coming in from our members, and we started getting a couple of reports via providers that suggested our connection to the LINX might have been at fault.  The interface was reporting no errors, and double-checking the configuration to ensure anti-DDoS measures weren’t catching the wrong traffic didn’t reveal anything either.  We then logged onto the LINX’s stats portal and that showed that they had been receiving errors from us over the last couple of days, but the error rate had jumped drastically over the preceding few hours.

This is a 100 gigabit ethernet interface which uses four lanes of 25Gbps to provide 100GE, and it appeared that one of the four laser diodes in the transceiver (a device about the size of a matchbox) had dropped its output power compared to the other three.
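
For illustration only, the tell-tale sign is easy to spot once you compare per-lane transmit power readings; the dBm figures and the 2 dB margin below are invented rather than the actual readings from the faulty optic.

        # Toy sketch: flag a 100GE lane whose Tx power has dropped well below its
        # peers. Values are invented for illustration.
        lane_tx_power_dbm = [-1.2, -1.4, -1.3, -6.8]  # four 25G lanes of one optic
        MARGIN_DB = 2.0                               # arbitrary deviation margin

        for i, power in enumerate(lane_tx_power_dbm):
            others = sorted(p for j, p in enumerate(lane_tx_power_dbm) if j != i)
            median_others = others[len(others) // 2]
            if power < median_others - MARGIN_DB:
                print(f"Lane {i}: {power} dBm is more than {MARGIN_DB} dB below "
                      f"its peers ({median_others} dBm) -- suspect the transceiver")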

Usually we would then contact our maintenance providers to swap out the part, but fortunately there was an unused transceiver in one of the other routers in the PoP, and an engineer that could be there in 20 minutes (as opposed to a four hour SLA for the maintenance provider).  We swapped the transceiver and immediately got reports that the problems had eased.

There is no congestion on the Janet network or our external peerings at the moment, but as we settle into what is likely to be the ‘new normal’ for the foreseeable future, we are still in the process of adding capacity, and when that is done we’ll remove the manual steering of traffic towards the LINX.

An anecdotal observation on the traffic levels over the past couple of days is that instead of us all working through lunchtime, which seems to happen when we’re in the office, we currently see a drop off in traffic between 12pm and 2pm whilst you are running errands or having a proper lunch hour — or at least accessing fewer resources on Janet!

The next thing to watch is going to be what happens when the schools (largely) close next week; we'll be keeping a close eye on it.

Wi-Fi 6E – the Wi-Fi refresh we all need?

As I write on February 3rd, Ofcom has just launched a consultation around the release of a block of 6GHz spectrum for wireless comms. It seems a good moment to talk about Wi-Fi 6E.

 

Before we start, let’s be clear what we’re talking about. Most of us are comfortable discussing the zoo of 802.11 standards, and this blog will largely be about 802.11ax. But the Wi-Fi Alliance, presumably aiming to assist the domestic consumer, has introduced a revised nomenclature that groups Wi-Fi technology into generations:

  • Wi-Fi 1* – 802.11b (1999)
  • Wi-Fi 2* – 802.11a (1999)
  • Wi-Fi 3* – 802.11g (2003)
  • Wi-Fi 4 – 802.11n (2009)
  • Wi-Fi 5 – 802.11ac (2014)
  • Wi-Fi 6 – 802.11ax (2019)

* Legacy tech isn’t being actively rebranded.

Unfortunately, this cleaned-up scheme is already causing some confusion. As we'll discuss below, the true benefits of 802.11ax can only be realised in uncongested spectrum, but Wi-Fi 6 co-exists in the 2.4GHz and 5GHz bands with legacy devices and non-Wi-Fi applications. A new category has therefore been introduced: Wi-Fi 6E, the version of 802.11ax implemented on hardware that can operate in the newly released, and therefore pristine, 6GHz spectrum.

 

Space to grow

In 20 years, Wi-Fi has done an amazing job. It manages to carry more data than any other access technology, utilising only a tiny 600MHz total slice of the available spectrum. But Cisco forecasts that by 2022, Wi-Fi will carry 51% of global IP traffic over some 549 million hotspots worldwide (Cisco’s annual Mobile Visual Networking Index (VNI)), and more spectrum must be made available to accommodate this continuing growth.

It's not just traffic volume that drives the case for releasing spectrum. To realise the benefits of the most recent generations of Wi-Fi technology, wider channel bands must be allocated. However, the congestion caused by unrelated technologies and legacy standards occupying the same unlicensed spectrum is a problem. Add in dynamic frequency selection (DFS) technology intervening to protect licensed infrastructure applications like air traffic control radar, and the result is that even if your Wi-Fi 6 hardware is technically capable of operating on a 160MHz channel in the 5GHz ISM band, it will very rarely find sufficient free, contiguous space to actually do so. This effect hobbles your hardware, and in many cases an upgrade from Wi-Fi 5 to Wi-Fi 6 will show little if any benefit at the single-device level. Wi-Fi 6 does handle larger groups of devices much better, so there is still a net benefit, but only when the new protocol capabilities are deployed in 'empty' spectrum that can allocate them high-bandwidth channels does anything approach the theoretical maximum performance that makes Wi-Fi 6 a game changer. This is where Wi-Fi 6E, designed to run in 6GHz spectrum reserved for it, delivers.
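
To see why, it's enough to model the 5GHz band as a row of 20MHz channels and look for eight contiguous free ones (8 x 20MHz = 160MHz). The sketch below is a toy model: the DFS list reflects typical UK/EU rules, the 'busy' channels are invented, and real 160MHz channels must also sit on fixed centre frequencies, which the sketch ignores.

        # Toy model: can we find a contiguous 160MHz block in the 5GHz band?
        CHANNELS = [36, 40, 44, 48, 52, 56, 60, 64,
                    100, 104, 108, 112, 116, 120, 124, 128,
                    132, 136, 140, 144]
        DFS = {ch for ch in CHANNELS if 52 <= ch <= 144}  # typical UK/EU DFS range
        BUSY = {36, 40, 100}  # invented: channels already used by neighbours

        def free_160mhz_blocks(avoid_dfs=True, width=8):
            blocks = []
            for start in range(len(CHANNELS) - width + 1):
                block = CHANNELS[start:start + width]
                contiguous = all(b - a == 4 for a, b in zip(block, block[1:]))
                clear = not any(ch in BUSY or (avoid_dfs and ch in DFS)
                                for ch in block)
                if contiguous and clear:
                    blocks.append(block)
            return blocks

        print("160MHz blocks avoiding DFS:", free_160mhz_blocks(avoid_dfs=True))
        print("160MHz blocks allowing DFS:", free_160mhz_blocks(avoid_dfs=False))

Under these toy assumptions there is no DFS-free 160MHz block at all, which is essentially the position Wi-Fi 6 finds itself in within today's 5GHz band.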

Fortunately, Ofcom has just announced consultation around proposals to release 500MHz of the 6GHz band exclusively for Wi-Fi 6E use in the UK; this corresponds with the lower end of the block being proposed in the US and matches that under consideration by the EU. These frequencies will be released for low power use to protect existing license holders.

Source: Ofcom

 

What’s so different?

The feature set of Wi-Fi 6E is likely to be identical to Wi-Fi 6, but those features will work to their full potential when given the bandwidth to breathe. The key features that will drive a better connectivity experience for your users are:

  • WPA3 (Wi-Fi Protected Access v3): WPA3 is more resistant to brute-force attacks than previous iterations. Wi-Fi 6 is the first generation to make WPA3 mandatory.
  • MU-MIMO (multi-user, multiple input, multiple output): The current generation Wi-Fi 5 struggles a little with anything more than 4 connected devices. It can handle 4 at once, but any additional ones have to wait in a queue to pass traffic. Wi-Fi 6 can handle 8 simultaneous flows, so there’s less queuing and lower latency.
  • OFDMA (orthogonal frequency division multiple access): this may be the really disruptive technology. Unlike previous generations of Wi-Fi, which leave a channel open to a single device while its transmission completes, locking other devices out until they get a turn, OFDMA divides each channel into smaller sub-channels called resource units, so up to 30 devices can share a channel simultaneously instead of taking turns (a crude numerical illustration follows after this list).
  • TWT (Target Wake Time): by agreeing a schedule with the router for when any given device will attempt to communicate, devices can power down their antennae in between these periods, conserving power and reducing latency. This also makes Wi-Fi 6 great for Internet of Things (IoT) applications, where the power budget is limited.
  • No DFS: your devices are free to use the channels without external constraints designed to minimise interference with local licensed services.
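
To make the OFDMA point concrete, here is a deliberately crude model (all numbers invented): every access to the channel pays a fixed overhead for contention and preamble, so serialising many small frames pays that overhead over and over, whereas OFDMA lets devices pay it roughly once, in parallel, on smaller resource units.

        # Crude illustration of OFDMA versus taking turns on a channel.
        OVERHEAD_US = 100      # assumed fixed per-access cost, microseconds
        PAYLOAD_BITS = 12_000  # a small frame (~1.5 kB)
        CHANNEL_MBPS = 600     # assumed aggregate channel rate (Mbit/s = bits/us)

        def serial_finish_us(n):
            # devices take turns, each paying overhead plus full-rate payload time
            return n * (OVERHEAD_US + PAYLOAD_BITS / CHANNEL_MBPS)

        def ofdma_finish_us(n):
            # devices transmit at once, each on 1/n of the channel
            return OVERHEAD_US + PAYLOAD_BITS / (CHANNEL_MBPS / n)

        for n in (1, 8, 30):
            print(f"{n:2d} devices: serial ~{serial_finish_us(n):6.0f} us, "
                  f"OFDMA ~{ofdma_finish_us(n):6.0f} us")

The payload takes the same total airtime either way; the saving comes from paying the per-access overhead once rather than thirty times, which is what users experience as lower latency.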

 

What about 5G?

5G has received the vast majority of the hype, but Wi-Fi 6/6E is going to make a similar impact. We are seeing research hinting that Wi-Fi 6 may be just as disruptive, and likely to be quicker to market than a fully usable and pervasive 5G option. One thing is sure: the marketing hype and early city-centre deployments of 5G will create expectations among the student population that we may have to look to Wi-Fi 6, and 6E in particular, to fulfil in the medium term.

Source: https://www.zdnet.com/article/cisco-rolls-out-wi-fi-6-networking-stack-bets-the-standard-will-enable-as-much-as-5g/

 

What does all this mean for education?

We are all probably responding to similar drivers on campus: higher bandwidth demands from applications, more mobile devices per user, and higher user densities. Wi-Fi 6E addresses each of these areas to some degree, so seems a strategic next step in upcoming wireless refresh exercises.

A particular strength is likely to be improved performance in high-density usage areas like lecture theatres and other teaching venues. This is already a particular challenge for wireless engineers, often overcome by stacking multiple access points on different channels in the same space. The combination of wider channel bandwidth, MU-MIMO and OFDMA should ease the limits on simultaneous users in a single space, and help with the dreaded "everyone trying to log into the same system at the same point in a lecture" problem that often brings networks crashing to a halt.

So, if you have a wireless refresh in your immediate future, give serious consideration to the 6GHz version of Wi-Fi 6 / 802.11ax – and keep a wary lookout for pre-certification chipsets that may still be on the market.

 

Capturing Requirements for Jisc Frameworks

I’ve procured and administered a handful of procurement frameworks on Jisc’s behalf over the years, and a lot of the language and accepted practices of public procurement are familiar to me (drummed in by colleagues in our excellent procurement unit). However, I’ve been struck recently that for a number of our members, when we invite them to complete an invitation to tender (ITT) for one of our frameworks, such as the dynamic purchasing system for procuring public wifi services, it may be the first time they’ve had to write in this particular idiom. Hence this short guide.

The purpose of an ITT is to describe exactly what you want to buy and to elicit responses from the supplier that give you enough information to judge whether they will supply a good solution and be a good delivery partner.

Background

In your 'background' section, you must set the context, and you have to keep reminding yourself that the bidders don't know things that you take for granted. If later in the ITT you are going to ask them to describe how their solution will integrate with your network, you have to provide them with some details and diagrams of that network here. This will probably need to be suitably edited to remove unnecessary detail that might prompt a security concern, such as the IP addresses of routers. Similarly, if you are going to ask them to confirm that their solution has sufficient capacity, you need to offer some usage projections for the life of the solution based on your experience of your campus. Basically, for every question you ask in subsequent sections, consider whether there is any background information you hold that would make it easier for bidders to answer fully, and add it here. The more relevant detail you provide, the fewer clarification questions you may get from the bidders, and ultimately the better the solution design you may be offered.

The bulk of your ITT consists of statements for the bidder to respond to, which come in two flavours: mandatory requirements and informational requirements.

Mandatory Requirements

These are the nuclear option in your ITT. If you ask for something in a mandatory requirement (MR), the bidder must be able to provide exactly what you asked for. If they can’t, you are forced to throw the rest of their bid away and they cannot win the contract. Mandatory requirements are therefore graded purely on a pass or fail basis; you can’t assess how well they do whatever it is you asked for, or compare between bidders and decide which does that thing better or with more features. The language is typically direct and forceful to express this: The bidder must confirm that the proposed solution is capable of <X>.

If you write a too-narrow MR, or make assumptions about what kind of solution they might offer and phrase it with that in mind, you may find yourself forced to reject an otherwise excellent solution. For example, if you are sourcing a network system that will be used by minors, you might be concerned that it has to be firewalled from undesirable content on the internet. You might use an MR to specify that the bidder’s solution must integrate with your existing firewall (which you would have described in your background section, knowing you’d ask this question), or you might perhaps have a requirement such as the solution must be capable of implementing the site blacklist as published on the PREVENT website at a known URL. But you should avoid, for example, assuming that the firewall would be implemented as a router access control list just because that’s an approach that you are familiar with and describing it as such in an MR, because that might exclude an arguably better solution based on alternate technologies.

Generally, you are better off pairing an MR that describes at a high level the functionality that must be present with an informational requirement (IR) that gives you the opportunity to ask for further details.

Informational Requirements

IRs will typically form the bulk of your ITT. They give you the chance to ask for details of approach and implementation, and require you to offer a marking scheme so you can indicate how well the bidder's response meets the requirement you set. Those marks will help guide your eventual purchasing decision, and can also be used to give an indication of the relative importance you assign to different aspects of the solution. You should make your IRs as specific as you can, to avoid bidders going off on a tangent and providing information you don't need, but keep each IR focused on the area it addresses; it's seldom helpful to mix questions about multiple facets of the bidder's proposal in a single IR.

Taking the firewall example above, you might end up with:

  • MR1 The bidder must confirm that their solution is capable of implementing firewalling rules.
  • IR1 The bidder shall describe, using diagrams where relevant:
    • how the proposed firewall solution would implement the blacklist as published by PREVENT at <URL> (5 marks);
    • what logs will be held (and for how long) of firewall operations (3 marks); and
    • the mechanism(s) by which the customer can change firewall parameters in real time (5 marks).

It’s a quirk of procurement rules that you must give maximum marks within your marking scheme to an answer that fully meets the requirement that you set out; you can’t leave ‘headroom’ marks for an even better answer that does everything you asked for plus even more that you didn’t mention.

It’s not just structure

This blog addresses the formalism of structuring the ITT language; it doesn’t really tell you what you should enquire about. How will the solution evolve with time to accommodate changing needs? How does the bidder see GDPR duties being divided with the customer? Could you introduce charging next year if you wanted to? All I can suggest is trying to anticipate the headaches that a future version of you might wish you’d avoided at this stage.

For expert advice on our frameworks, you can always speak to our team at procurement@jisc.ac.uk.

Summary

The key ingredients of a well-structured ITT that gives you the best possible chance of getting good bids in for consideration should include:

  1. a background statement that provides all the relevant detail a bidder might need about what you are seeking to purchase;
  2. a limited and highly selective handful of MRs that address only the most vital essentials of a solution that you can’t live without and are phrased in a general way open to a range of approaches;
  3. a comprehensive set of IRs that address every different facet of the proposed solution that you want information on (alongside their marking scheme to allow bidders to judge their relative importance to you).

Thoughts on Crisis Management

Earlier this month I attended CLAW 2019, the third Crisis Management Workshop for the GÉANT Community. The event was held at the Poznan Supercomputing and Networking Centre in Poland – not the easiest place to get to from the UK, but lovely once you're there.

My role at Jisc is Head of Delivery, but I also act as a Major Incident Manager, part of our process for dealing with major network incidents. This blog post highlights some of my learnings from CLAW 2019 – how Crisis Management is done at other NRENs, how that differs from what Jisc does, and what improvements I can take back to the Jisc MI team.

Worth also noting that we’re constantly reviewing and updating our processes anyway, in light of incidents that occur and feedback we receive, and some of the things discussed in this post will feed into that ongoing cycle of continual improvement.

Before I start however, I couldn’t forgive myself if I didn’t mention that the trip started with my first ever flight on lesser known Hungarian airline WizzAir, and also that I stayed in an official Euro 2012 hotel in Poznan 🙂

So the event itself…..

It was split over 2 days – day 1 was a mix of presentations and a short (3 hours!) practical exercise, and day 2 was a much more in-depth (6 hours!!) practical exercise. I only attended the first day due to other commitments back at base on the second day, so what follows is based on the first day of the event only.

First thing to note is that what Jisc calls Major Incident Management, everyone else seems to call Crisis Management, so for the purposes of this blog I'll standardise on Crisis Management.

The contents of day 1 of the event can be split into 2 areas:

  1. Presentations from other NRENs about Crisis Management;
  2. A practical exercise in dealing with a crisis.

First the presentations….

The standout talk for me was by Anna Wilson from HEAnet. Anna presented on 'Real Life Crisis: Network Outage During 9/11', a fascinating look back at how the events of 18 years ago impacted the internet and global NREN connectivity. The talk also looked at the shape of the internet in general in 2001, which in itself was eye-opening, and ended with some reflections on the lasting impact of 9/11 and how it helped shape networking as we see it today. I was so impressed with Anna's talk that I immediately tapped her up for a repeat performance at Jisc's Networkshop conference next year, an offer she kindly accepted.

The remainder of the talks on day 1 were from members of other NRENs across Europe, talking about their own approach to Crisis Management. All were interesting and informative in their own right, and it's hard to summarise or pick out highlights as I found all of the content useful. Perhaps the one thing that did become clear, though, was that everyone who presented had a similar approach to Crisis Management, and they all differed from the way Jisc does it.

Jisc's MI structures and processes were borne out of a DDoS attack 3 years ago, where the scale of the attack prompted a surge of incoming calls that swamped the Jisc Service Desk. As a result, the MI processes we've developed have been almost entirely focused on Jisc's approach to comms during an MI – managing calls into Jisc, and coordinating outbound comms to a variety of stakeholders. Aside from comms, one of the other principles key to Jisc's approach is to leave the teams responsible for fixing the problem to fix the problem: let them focus on what they're supposed to be doing, rather than bringing them into additional structures and meetings tasked with 'managing' the situation. This principle also extends to the people managing the people fixing the problem (engineering team leaders, for example).

All of the other NRENs I spoke to and watched present of course have a strong focus on comms as well, but also on how to deal with stressful situations, how to manage priorities in a crisis, what to focus on, making clear decisions, and so on. Jisc has chosen MI managers based on who is deemed best equipped to deal with such situations, rather than proactively developing people to be better prepared to act in such a way when required. Most notable, however, was the involvement of a wide variety of groups in the process – as above, Jisc treats functions like engineering and security as inputs into the MI process, whereas other NRENs consider them part of the process: in the meetings, sharing information, and supporting decision making. Food for thought, and points I'll definitely take back to the Jisc team.

Onto the group exercise….

It was hosted by Wouter Beijersbergen van Henegouwen, an external consultant specialising in Crisis Management. The scenario was based on a fictional NREN in a fictional country that had experienced a fictional data leak. The scenario consisted of 5 roles, each of which had its own set of information on the incident, and the exercise was to conduct a crisis meeting to ascertain what had happened and agree a course of action. With each role drip feeding various bits of information during the meeting, it had a genuine ‘real life’ feel to it which is so often hard to recreate in a simulation environment. My role was ‘manager’, meaning I knew less about the incident than most and had to chair the meeting whilst trying to piece together the series of events that had led to the crisis. Good fun, and a really useful exercise to take part in. I’ll be taking the format and supporting scenario information back to Jisc to feed into our next major incident workshop.

So overall a really good event, despite only experiencing half of it. I’ll definitely be attending again in future.

The journey home was very nearly derailed as I was forced to run through Warsaw airport to catch my connecting flight back to Heathrow, but thankfully time was just about on my side and my own personal crisis was averted with seconds to spare.

To 100Gbit/s, and beyond!

The capacity of the Janet network has always seen massive growth – in my time at Jisc it’s gone from the 10Gbits/s SuperJanet4 network in 2006, to the 100Gbits/s SuperJanet5 network in 2011, to the 600Gbits/s Janet6 network that operates today. We also made a bit of noise last year when we upgraded the core of the network to 400Gbit/s, which was a complex and time consuming piece of work, but ultimately one that put us at the forefront of R&E networking globally.

So, in summary, we’re almost constantly upgrading Janet. But why do we do it? In simple terms, this is why:

Traffic on the network just keeps rising, thanks to all of our lovely members and customers doing more and more exciting things that require more and more bandwidth.

It used to be the case (no more than 5 years ago) that even our biggest users were connected to Janet at 10Gbit/s and that was plenty. Over the past few years, however, we've seen a step change in network requirements, and the number of 100Gbit/s-connected customers is on the rise! This post looks at those big users, when they made the leap to 100Gbit/s, and what they're doing with it.

First up was the Science and Technology Facilities Council (STFC) Rutherford Appleton Laboratory (RAL). STFC is a world-leading multi-disciplinary science organisation. Its research seeks to understand the Universe from the largest astronomical scales to the tiniest constituents of matter, and creates impact on a very tangible, human scale. RAL's 40Gbit/s of Janet connectivity was bursting at the seams by Q3 2018, and adding yet more 10Gbit/s channels was no longer the most efficient way to increase capacity, so 100Gbit/s upgrades were implemented. RAL is one of the five largest computing centres that make up the WLCG collaboration, a group of ~200 universities and research institutes around the world providing computing for the Large Hadron Collider (LHC) experiments. RAL archives around 12% of the LHC data produced at CERN. Whilst RAL has dedicated connectivity to CERN for the purposes of receiving this data (which is expected to be upgraded to 100Gbit/s by 2021 to meet the LHC Run 3 requirements), it then uses its Janet IP connectivity to share that data with the rest of the UK, hence the need for high-bandwidth Janet connections.
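
As a back-of-the-envelope illustration of what these upgrades mean for moving research data around (the 500 TB dataset below is hypothetical, and real transfers rarely fill a link end to end):

        # Back-of-the-envelope: time to move a hypothetical 500 TB dataset,
        # assuming the link can be filled completely (real transfers rarely do).
        DATASET_TB = 500

        def transfer_hours(link_gbps, dataset_tb=DATASET_TB):
            bits = dataset_tb * 8e12          # decimal terabytes -> bits
            return bits / (link_gbps * 1e9) / 3600

        for gbps in (10, 40, 100):
            print(f"{gbps:3d} Gbit/s: ~{transfer_hours(gbps):5.1f} hours")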

Next was Imperial College London, which made the leap and upgraded its Janet connectivity to 100Gbit/s in Q1 2019, implementing 100Gbit/s connectivity to the Jisc Shared Data Centre in Slough at the same time. Imperial's previous 20Gbit/s of Janet connectivity was filling up, and behind the scenes there was a lot of throttling back of the particle physics researchers to avoid flooding the connections completely. On upgrading to 100Gbit/s all limits were removed, and the traffic graph below shows what happened.

(image taken from a presentation given by Imperial College London at Networkshop47)

Then came the University of Edinburgh, specifically its Advanced Computing Facility (ACF) on the outskirts of Edinburgh – the high-performance computing data centre of EPCC, housing a range of supercomputers. As of Q1 2019, connectivity via the University of Edinburgh's 10Gbit/s Janet IP connections was no longer suitable given the increase in traffic to/from the ACF. Dedicated 100Gbit/s connections were successfully deployed from the ACF directly onto the Janet backbone in July 2019, relieving the pressure on the University and on the Janet regional network in Edinburgh.

Finally, the European Bioinformatics Institute (EBI) on the Hinxton Campus south of Cambridge, which is part of the European Molecular Biology Laboratory (EMBL), Europe's flagship laboratory for the life sciences. No prizes for guessing that this customer generates, processes and transmits an enormous amount of data. Work is underway to upgrade raw bandwidth to the site from N x 10Gbit/s to 100Gbit/s (whilst also retaining the N x 10Gbit/s). On top of that, 100Gbit/s connectivity will be provided from the Hinxton Campus to EMBL-EBI's new data centre to further support its activities.

 

So, what next?

Well, we're in discussion with a number of other institutions about upgrading from N x 10Gbit/s to 100Gbit/s, all of which we expect to come to fruition over the next 12 months. The traffic growth speaks for itself, and there's never really any option other than to keep upgrading.

We’ll also continue to upgrade the Janet backbone to cope with the steady flow of upgrades at the edge, in units of 100Gbits/s and 400Gbits/s where appropriate.

Finally, we continue to crunch the numbers, run the reports and predict the future growth, so that we’re always ahead of the game in terms of knowing how and where the traffic levels are increasing. We also work closely with our optical and routing hardware vendors to understand the next generation of products they’re working on, as well as monitoring wider industry activities.