24 Mar

Data in the fight against pandemics

How is data influencing the decisions made to fight the Covid-19 pandemic?
Anthony Payne

Core decision makers are relying on experts to collect and interpret hundreds (or thousands) of data sources, define core metrics to focus on, and deploy algorithms, models, predictions, simulations, etc. to determine the impacts of various measures. Read more to find out how this data is enabling our decision makers to create scenarios, plan, advise and make the most complex data-driven decisions in the world today.


Although individuals around the globe tend to have their own views and opinions about how to tackle a pandemic, in reality, they likely lack the data necessary for decision making. I have no doubt you have heard statements similar to the following (amongst many others):

  • Lockdowns are the best solution - they stop the spread of viruses and disease.
  • People should continue with life - the quicker the virus spreads, the faster this is over.
  • We need a balanced approach - let's flatten the curve.

Without sounding insensitive, each of these approaches has potential advantages and disadvantages for various groups of people. In an extreme over-simplification of the process that decision makers use, there are two somewhat conflicting objectives that must somehow be balanced against each other:

  • Minimise economic and social disruption
  • Minimise hospitalisation and fatality rates

The simple answer is that it is impossible to fully optimise both at the same time. Instead, core decision makers rely on experts to collect and interpret hundreds (or thousands) of data sources, define core metrics to focus on, and deploy algorithms, models, predictions, and other research to determine the impacts of various measures.


It is difficult to grasp the extent of the data sources that experts consider when dealing with a pandemic. One of the core data-driven pandemic advisory groups in the UK is the Scientific Pandemic Influenza Group on Modelling (SPI-M). SPI-M advises the UK government on scientific matters to help shape the government's response to influenza pandemics and human infectious disease threats.

SPI-M has a long-standing publication titled SPI-M Modelling Summary, which is a regularly updated, publicly available paper describing how SPI-M models influenza pandemics (find it here). This paper was most recently updated in November 2018 - which seems a long time ago now - but if you find the time to read through it, much of it will strike you as familiar today.

At the core of this paper, which existed long before the ongoing COVID-19 pandemic, there is a descriptive listing of all the data sources required for general pandemic modelling.

I cannot repeat them all here: there are four single-spaced pages of data requirements, written as simple descriptive bullet points, each of which may pull from dozens to hundreds of data sources. For example, one of the approximately 20 broad data categories comes from the National Pandemic Flu Service (somewhat like the 111 service, with a focus on influenza), and includes:

  • Lists of all positive identifications of pandemic influenza
  • Age group, sex, risk group
  • Geographical region, district, postcode
  • Number of patients referred on to GP and their complications
  • Children referred to GPs and assessed, along with above information
  • Delay from symptoms to treatment
  • Virology sample/confirmation

The data above, which accounts for a tiny fraction of the data drawn on when modelling pandemics, draws on many sources, comes in structured and unstructured forms, includes free text, and joins to external data repositories.
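To make the mix of structured fields and free text a little more concrete, here is a minimal Python sketch of what one record in such a feed might look like, along with one simple summary an analyst could compute from it. The class name, field names, and sample values are all hypothetical, chosen only to mirror the bullet points above; the real National Pandemic Flu Service schema is not described in this post.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class FluServiceRecord:
    """Hypothetical record layout; not the real National Pandemic Flu Service schema."""
    age_group: str
    sex: str
    risk_group: Optional[str]
    region: str
    postcode_district: str
    referred_to_gp: bool
    days_symptoms_to_treatment: Optional[int]  # None when the delay is unknown
    virology_confirmed: Optional[bool]         # None when no sample was taken
    notes: str = ""                            # unstructured free text


def median_delay_by_region(records):
    """Median delay (in days) from symptoms to treatment, grouped by region."""
    delays = defaultdict(list)
    for rec in records:
        # Skip records where the delay was never recorded.
        if rec.days_symptoms_to_treatment is not None:
            delays[rec.region].append(rec.days_symptoms_to_treatment)
    return {region: median(values) for region, values in delays.items()}


# Made-up sample data, for illustration only.
SAMPLE_RECORDS = [
    FluServiceRecord("25-44", "F", None, "North West", "M1", False, 2, True),
    FluServiceRecord("65+", "M", "respiratory", "North West", "L3", True, 3, True,
                     notes="referred to GP with breathing difficulty"),
    FluServiceRecord("5-14", "F", None, "London", "SE1", True, 1, None),
    FluServiceRecord("45-64", "M", None, "London", "N7", False, None, False),
]

print(median_delay_by_region(SAMPLE_RECORDS))
```

Even in this toy version, missing values and free-text notes need handling before any modelling can start, which hints at the effort involved across hundreds of real sources.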

In addition to raw data sources, modelling and decision making are based on existing surveys and literature. In a nutshell, there are hundreds of academic publications that are relevant to pandemic modelling, and each of these draws on further data sources. To put some backing behind SPI-M's massive experience and data-driven approach, I note that UK-affiliated academics authored 24.3% of all epidemiological publications about Western European, Baltic, and Balkan countries between 1993 and 2012 (source here).


The simple answer is all of the above, plus a lot more.

SPI-M, SPI-B (Scientific Pandemic Influenza Group on Behaviours), and NERVTAG (New and Emerging Respiratory Virus Threats Advisory Group) are the three core expert groups advising SAGE (Scientific Advisory Group for Emergencies) on the COVID-19 response.

Each of these organisations draws from its own pool of literature and use-case-specific data sources. Additionally, data is being collected in real time and at mass scale from within the country and across the globe to keep up to date in a rapidly changing environment. I personally find it difficult to comprehend the quantity of data and breadth of sources being used to inform government decisions during the ongoing pandemic.

Messages, advice, and measures are all delivered publicly by a small number of individuals - the Prime Minister, Chief Medical Officer, and Chief Scientific Advisor, to name a few. It may appear that these few individuals are making decisions with little empirical evidence behind them (because who really believes it when the government says it is using science?), but this really isn't the case (at least in the UK).

There are thousands of dedicated, hard-working scientists, analysts, engineers, developers, doctors, researchers, and academics who have collectively fed into some of the most complex data-driven decisions faced by the world today. Although there is always political input (such as setting the balance between health and economy), decisions are grounded in expert handling of data.

Whether or not you believe the government is prioritising correctly, keep in mind that data is at the centre of fighting this pandemic. The plot below was created by advisory experts (link to pdf here) to give an indication of the shape of the COVID-19 pandemic curve based on different measures. Essentially, the advisors simulate the outcomes of each potential scenario, and it is then up to the government to make the decision on which method best fits their priorities (albeit with many other data feeds and experts further contributing to that decision). Keep in mind that all the curves you've seen in popular media are generalised for high-level summary reports - the analysis and real curves that underpin them are based on the complex data sources previously listed, and include real numbers, estimates, timescales, and likelihoods.

[Figure: shape of the COVID-19 pandemic curve under different measures]
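As a rough illustration of what "simulating the outcomes of each potential scenario" can mean, here is a minimal Python sketch of a textbook SIR (susceptible, infected, recovered) model run under three hypothetical transmission rates. This is not the model used by SPI-M or any other advisory group, and every parameter value below is an assumption chosen only to show how reducing transmission lowers and delays the peak.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Daily-step SIR model; returns the number of people infected on each day.

    beta  = assumed transmission rate per day (lowered by interventions)
    gamma = assumed recovery rate per day (1 / infectious period in days)
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    infected = [i]
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        infected.append(i)
    return infected


# Hypothetical values for illustration only, not estimates for COVID-19.
GAMMA = 0.10
SCENARIOS = {
    "no measures": 0.30,
    "moderate measures": 0.20,
    "strict measures": 0.15,
}

for name, beta in SCENARIOS.items():
    curve = simulate_sir(1_000_000, 10, beta, GAMMA, 500)
    peak = max(curve)
    print(f"{name}: peak of about {peak:,.0f} infected on day {curve.index(peak)}")
```

In this toy model, a lower transmission rate gives a smaller, later peak, which is the intuition behind "flattening the curve". Real advisory models layer age structure, geography, behaviour, and uncertainty on top of this basic idea.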


Another core resource for informing the public about decision-making in this pandemic is SAGE's Scientific evidence supporting the government response to COVID-19 (available here). Most of the linked papers and articles are advisory documents produced by the various advisory boards previously mentioned.

If you read through it, you will find that things you thought were 'off the cuff', 'unimportant', or 'not based on evidence' are often driven by extensive literature review, surveys, and other data sources.

As an example, a publication from 6 March discusses the term 'isolation' and notes that evidence on how the public perceives it is needed before it is used widely. Research into existing data and resources is likely how the term 'shielding' was developed by the government's advisory boards to positively reinforce the fact that this is a protective measure to save the most vulnerable individuals, rather than a scheme to cut them off from society.

Another great example of a subtle recommendation is that quarantine should be prescribed as 'voluntary' and should be spun as 'an act of altruistic civic duty'. This is suggested to increase the compliance rate of quarantine measures and reduce the stigma around quarantined households. This is also based on expert views, survey data, and the vast pool of scientific literature.

Even these examples of seemingly insignificant terminology recommendations draw from dozens of data sources, research pieces, and survey responses to create the best possible outcome in line with government and public health targets.


There are hundreds of other ways data is being used to fight the pandemic, including (but certainly not limited to) the following:

  • research into medication
  • comparing genetic variants of the virus to track outbreaks
  • treatment and vaccine development
  • economic modelling and predictions
  • measurement of unforeseen long-term impacts of various measures


If you find yourself questioning the decisions being made (which is a healthy and encouraged activity), I invite you to research and understand why they were made. There is an immense pool of data feeding into every decision. Even with a full understanding of the breadth and scope of data, classifying decisions as right or wrong is exceedingly difficult; without that understanding, it is impossible.
