B1016 - Exploring missing data mechanisms in drugs and alcohol data MAR or MNAR - 02/07/2010

B number: 
B1016
Principal applicant name: 
Dr Ian White (MRC Biostatistics Unit, University of Cambridge, UK)
Co-applicants: 
Prof Kate Tilling (University of Bristol, UK), Dan Jackson (MRC Biostatistics Unit, University of Cambridge, UK)
Title of project: 
Exploring missing data mechanisms in drugs and alcohol data: MAR or MNAR?
Proposal summary: 

BACKGROUND

Missing data are a major concern in ALSPAC and other longitudinal studies. Many analyses assume that data are missing at random (MAR), but this assumption is hard to verify from the observed data and is often not very plausible.
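
For reference, the usual definitions of MAR and its alternative, missing not at random (MNAR), can be written as below. The notation is introduced here for illustration and is not taken from the proposal: R is the indicator of which values are observed, Y_obs and Y_mis are the observed and missing parts of the outcome data, and X denotes fully observed covariates.

```latex
\begin{align*}
\text{MAR:}  \quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}, X) = P(R \mid Y_{\text{obs}}, X), \\
\text{MNAR:} \quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}, X) \neq P(R \mid Y_{\text{obs}}, X)
                     \quad \text{for some value of } Y_{\text{mis}}.
\end{align*}
```

Because Y_mis is by definition never seen, the observed data alone cannot distinguish the two cases, which is why additional sources of information are needed.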

ALSPAC has sources of data that offer the opportunity to learn more about the missing data mechanism, and in particular about whether MAR is true:

1. Repeated attempts to contact participants at each wave.

2. External sources of data from schools and SATs.

Repeated attempts have previously been used to learn about missing data mechanisms, but not in longitudinal data sets.
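
As a rough illustration of why repeated attempts carry information about the mechanism, the sketch below simulates a single wave under a deliberately simple assumption: each contact attempt succeeds with a fixed probability that may depend on the outcome. This is not the model proposed here; the function names, the 30% prevalence, the three attempts and the parameter `beta` are all hypothetical choices made for the example.

```python
import math
import random
from statistics import mean


def simulate_attempts(n=20000, max_attempts=3, beta=0.0, seed=1):
    """Simulate a binary outcome y and the attempt at which each person responds.

    beta = 0 : response does not depend on y.
    beta < 0 : people with y = 1 are harder to reach at every attempt,
               so the missingness depends on the outcome itself (MNAR).
    Returns a list of (y, attempt); attempt is None for never-responders.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        y = 1 if rng.random() < 0.3 else 0        # hypothetical 30% prevalence
        p = 1 / (1 + math.exp(-beta * y))         # per-attempt response probability
        attempt = None
        for k in range(1, max_attempts + 1):
            if rng.random() < p:
                attempt = k
                break
        records.append((y, attempt))
    return records


def prevalence_by_attempt(records, max_attempts=3):
    """Observed prevalence of y among those who responded at each attempt."""
    return [
        mean(y for y, a in records if a == k)
        for k in range(1, max_attempts + 1)
    ]


if __name__ == "__main__":
    print("beta =  0:", prevalence_by_attempt(simulate_attempts(beta=0.0)))
    print("beta = -1:", prevalence_by_attempt(simulate_attempts(beta=-1.0)))
```

Under these assumptions the prevalence is flat across attempts when `beta` is zero and rises with the attempt number when `beta` is negative, so a trend across attempts in real data would be a hint, though not proof, that response depends on the outcome.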

OBJECTIVES

1. To explore the missing data mechanism for key variables on drugs, alcohol and conduct disorder.

2. To develop appropriate statistical methods, in particular for handling repeated attempts data at multiple waves.

DATA

Main outcomes: drug use, alcohol use and conduct disorder derived from clinic visits at ages 11, 13, 15 and 17, and questionnaires at intermediate ages.

Related outcomes: schools data at years 6 and 11; SATs data at ages 11 and 14.

Auxiliary variables: the number of letters sent before each clinic visit took place (invitation, reminder, last chance) and before each questionnaire was returned (initial, reminder). Ideally we would like to use further information on whether "defaulters" were phoned and on the dates of letters, but we understand this is not currently available.

Covariates: sex, socio-economic status and ethnicity, among others.

PLAN OF ANALYSIS

1. Descriptive statistics, including associations of the main outcomes with the related outcomes and the auxiliary variables.

2. Using related outcomes: assuming the main outcomes are MAR when the related outcomes are included in the model, explore the missing data mechanism when the related outcomes are not in the model. This could be done using multiple imputation.

3. Using repeated attempts: develop and fit models for repeated attempts at multiple waves. Hence explore developmental trajectories (perhaps using latent class models).
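
The logic of step 2 can be sketched with a small simulation. Everything in it is an assumption made for illustration: a binary related outcome `z` that is always observed (standing in for, say, a schools or SATs measure), a binary main outcome `y`, the chosen probabilities, and a simple stratified multiple imputation with Beta(1, 1) priors rather than whatever imputation model the analysis would actually use.

```python
import random
from statistics import mean


def simulate(n=20000, seed=2):
    """Hypothetical data as (z, y, observed) tuples.

    y is more common when z = 1, and y is more often missing when z = 1,
    so y is MAR given z, yet its missingness is associated with y once z
    is left out of the model.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        y = 1 if rng.random() < (0.5 if z else 0.1) else 0
        observed = rng.random() < (0.4 if z else 0.9)
        data.append((z, y, observed))
    return data


def complete_case(data):
    """Prevalence of y among those with y observed."""
    return mean(y for _, y, obs in data if obs)


def missing_rate_by_y(data):
    """Missingness rate by true y (only computable in a simulation)."""
    return {v: mean(0 if obs else 1 for _, y, obs in data if y == v) for v in (0, 1)}


def mi_estimate(data, m=20, seed=3):
    """Multiple imputation of y within strata of z, pooled by averaging."""
    rng = random.Random(seed)
    # counts[z] = [observed y = 0, observed y = 1] within each stratum of z.
    counts = {0: [0, 0], 1: [0, 0]}
    for z, y, obs in data:
        if obs:
            counts[z][y] += 1
    estimates = []
    for _ in range(m):
        # Draw each stratum prevalence from its Beta posterior, then impute.
        p = {z: rng.betavariate(1 + c[1], 1 + c[0]) for z, c in counts.items()}
        completed = [y if obs else int(rng.random() < p[z]) for z, y, obs in data]
        estimates.append(mean(completed))
    return mean(estimates)


if __name__ == "__main__":
    data = simulate()
    print("missingness rate by y:", missing_rate_by_y(data))
    print("complete-case prevalence:", complete_case(data))
    print("MI prevalence using z:", mi_estimate(data))
```

In this set-up the true prevalence is 0.3 by construction. The complete-case estimate falls below it, while the estimate that uses `z` should sit close to it, and the gap between the two is one way of describing how far the mechanism departs from MAR when the related outcome is omitted.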

Paul Clarke and Michael Spratt are doing a related project focussing on repeated attempts at a single time point, and we will work with them to ensure that the two projects complement each other. We will work with Matt Hickman (drugs and alcohol strand) and Glyn Lewis ("risky behaviours" strand) to ensure that this project fits well with their strands.

PROPOSED OUTPUTS

1. A statistical paper describing the models for repeated attempts at multiple waves.

2. An epidemiological paper describing what has been learned about missing data in ALSPAC.

Date proposal received: 
Friday, 2 July, 2010
Date proposal approved: 
Friday, 2 July, 2010
Keywords: 
Primary keyword: