B3737 - The CIVIC project: Predicting Covid-19 Impact on Vulnerable Individuals and Communities via Health and Loyalty Card data - 19/03/2021
Diseases such as COVID-19 have not been defeated, and there remains an urgent need for improved modelling of disease incidence to support future early-warning systems, better prevalence estimation, and an understanding of long-term impacts on vulnerable communities. Finding data to underpin such analyses, however, is extremely difficult. Much COVID-19 incidence goes unreported, and even the largest studies cover only tiny fractions of the population and are biased towards specific demographics. Generalized studies have to rely on tip-of-the-iceberg medical statistics (ONS, NHS), fine-grained but unsustainable self-reporting technologies (e.g. KCL’s COVID Symptom Study app), or broad-brush behavioural data (e.g. Google COVID-19 mobility reports, social media data). Given the lack of widespread adoption of track-and-trace systems in the UK, and (understandably) declining public engagement with self-reporting initiatives, new approaches are urgently required.
A solution is available, however. Epidemics such as COVID-19 are now well recognized as being driven as much by behavioural factors as by clinical ones, and those behavioural factors are richly embedded in the mass, anonymized retail transaction logs held (untapped) in the datasets of CIVIC's private-sector partners. Through the interrogation of such data (e.g. health purchases from the UK's leading health retailer), CIVIC will address key epidemiological knowledge gaps by: determining dynamic estimates of untested COVID-19 cases via fine-grained pharmacy and self-medication datasets; advancing AI and modelling knowledge by triangulating behavioural features embedded in >1.5 billion loyalty-card logs that can act as predictors of future outbreaks; and advancing the state of the art in the identification and mapping of key vulnerable communities across the UK (Olio, Fareshare, ONS).
In such modelling, "ground-truth" labelling of target (dependent) variables is crucial. In Stage 1 of CIVIC, analysis will occur at the aggregated Lower Layer Super Output Area (LSOA) geographical level, with data points labelled with (a) time series of recorded incidence in each LSOA (NHS) and (b) time series of relevant 111-related activity. In Stage 2, however, CIVIC will also investigate the potential of "data-linkage", an approach that could underpin finer-grained, individual-level analyses at scale. Such processes must embed privacy-by-design, and be underpinned by informed and participatory consent, if they are ever to contribute to public health.