B2021 - Using linked health and administrative data to reduce bias in observational research - 06/06/2013
Aim
The overarching aim is to examine how linked health and administrative data can be used to avoid bias in prospective cohort studies, using the Avon Longitudinal Study of Parents and Children (ALSPAC) as an exemplar. This aim will be addressed using simulation studies and by examining three questions of epidemiological importance:
a) Is breastfeeding associated with IQ at age 15? Linkage to education data (GCSE results) will be used to examine the missingness mechanism for IQ, and may be used in imputation of the missing values.
b) Is smoking in the early teenage years associated with educational attainment at age 16? Data on smoking from the young people's GP records will be used to examine missing data patterns in self-reported smoking and to investigate misclassification. GCSE results from linked educational data will be used as the outcome in this analysis.
c) Is maternal smoking in pregnancy associated with depression at age 17? As with teenage smoking in (b), linked data held within GP records will be used to address the objectives below in relation to this outcome.
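The role of linked data in question (a) can be illustrated with a minimal simulation sketch. It is not the project's method and uses no ALSPAC data: the function names, the sample size and every parameter value are invented for illustration. It assumes that IQ is more likely to be measured for young people with higher GCSE scores, and that GCSE scores are available for everyone through linkage. It uses a single regression imputation to estimate mean IQ, which is a simplification of full multiple imputation.

```python
import math
import random
from statistics import fmean


def simulate_cohort(n=20000, seed=1):
    """Return (gcse, iq, iq_observed) tuples; all parameters are invented."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        gcse = rng.gauss(0.0, 1.0)  # standardised GCSE score, known for everyone via linkage
        iq = 100.0 + 10.0 * gcse + rng.gauss(0.0, 10.0)
        # Assumed mechanism: higher attainers are more likely to have IQ measured.
        p_observed = 1.0 / (1.0 + math.exp(-(0.5 + gcse)))
        rows.append((gcse, iq, rng.random() < p_observed))
    return rows


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def compare_estimates(rows):
    """Mean IQ from full data, complete cases only, and after imputing from GCSE."""
    full_mean = fmean(iq for _, iq, _ in rows)
    complete = [(gcse, iq) for gcse, iq, seen in rows if seen]
    complete_case_mean = fmean(iq for _, iq in complete)
    # Fit IQ on GCSE among those with IQ measured, then predict for the rest.
    intercept, slope = fit_line([g for g, _ in complete], [iq for _, iq in complete])
    imputed_mean = fmean(iq if seen else intercept + slope * g for g, iq, seen in rows)
    return full_mean, complete_case_mean, imputed_mean


full_mean, complete_case_mean, imputed_mean = compare_estimates(simulate_cohort())
print(f"full {full_mean:.1f}  complete-case {complete_case_mean:.1f}  imputed {imputed_mean:.1f}")
```

Under these assumptions the complete-case mean overstates mean IQ, while the estimate that uses the linked GCSE scores lies close to the full-data mean. If missingness also depended on IQ itself after accounting for GCSE results, some bias would be expected to remain.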
Objectives
1. To develop methods for using linked health and administrative data to examine patterns of missing data and model missingness mechanisms in longitudinal studies such as ALSPAC, focussing in particular on outcomes and exposures that are likely to be MNAR (missing not at random).
2. To incorporate linked health and administrative data in multiple imputation models to explore biases introduced by missing data in exposures or outcomes in observational studies.
3. To compare data in ALSPAC to equivalent outcomes recorded in linked electronic primary care records (GP data) to investigate misclassification in the self-reported outcomes and, in particular, to identify whether these are subject to differential or non-differential misclassification.
4. To develop methods to use both linked data and self-reported data to minimise the impact of measurement error on analyses in observational studies.
As one of the exemplars involves obtaining data on depression from electronic patient records, a further objective is:
5. To devise new algorithms, or modify existing ones, for defining depression from electronic GP data using the information contained within Read codes, and to use these algorithms to estimate the prevalence of depression among ALSPAC teenagers.
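As an illustration of objective 3, the sketch below checks whether misclassification of a self-reported measure differs across strata of another variable. It is a hypothetical example: the counts are invented, the function names are not from any existing package, and the GP record is treated as the reference standard purely for illustration, although GP records are themselves imperfect.

```python
from collections import Counter


def sensitivity_specificity(pairs):
    """pairs: iterable of (self_report, reference) booleans."""
    counts = Counter(pairs)
    true_pos, false_neg = counts[(True, True)], counts[(False, True)]
    true_neg, false_pos = counts[(False, False)], counts[(True, False)]
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


def by_stratum(records):
    """records: iterable of (self_report, reference, stratum) tuples."""
    grouped = {}
    for self_report, reference, stratum in records:
        grouped.setdefault(stratum, []).append((self_report, reference))
    return {stratum: sensitivity_specificity(p) for stratum, p in grouped.items()}


def expand(stratum, true_pos, false_neg, false_pos, true_neg):
    """Turn the four cell counts of a 2x2 table into individual records."""
    return (
        [(True, True, stratum)] * true_pos
        + [(False, True, stratum)] * false_neg
        + [(True, False, stratum)] * false_pos
        + [(False, False, stratum)] * true_neg
    )


# Invented counts: self-reported smoking against GP-recorded smoking,
# split by (hypothetical) lower and higher educational attainment.
records = expand("lower attainment", 30, 20, 5, 145) + expand("higher attainment", 18, 2, 4, 176)

results = by_stratum(records)
for stratum, (sensitivity, specificity) in results.items():
    print(f"{stratum}: sensitivity {sensitivity:.2f}, specificity {specificity:.2f}")
```

Similar sensitivity and specificity in every stratum would be consistent with non-differential misclassification. A marked difference, such as the difference in sensitivity built into these invented counts, would suggest differential misclassification.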
Exposure variables
Breastfeeding, smoking in pregnancy, early teenage smoking (at 12/13 years) - from ALSPAC and linked GP records
Outcome variables
IQ at 15 years, GCSE results (linked data), depression at 17 years - from ALSPAC and linked GP records
Confounding variables
Maternal and paternal education, family occupational social class, housing tenure, family adversity index (and its individual components), family income, maternal and paternal smoking, maternal and paternal pre- and postnatal depression, parental conflict, parental marital status, maternal age at birth, maternal alcohol intake in pregnancy, family composition.