B3011 - Investigating the performance of multiple imputation with increasing proportions of missingness - 11/12/2017

B number: 
B3011
Principal applicant name: 
Jon Heron | Bristol Medical School (PHS) (United Kingdom)
Co-applicants: 
Mr Paul Madley-Dowd, Dr Rachael Hughes, Professor Kate Tilling
Title of project: 
Investigating the performance of multiple imputation with increasing proportions of missingness
Proposal summary: 

Missing data is a common problem in epidemiology, where participant dropout can substantially reduce the sample size of initially large cohorts. One method of dealing with missing data is multiple imputation (MI), in which several copies of the dataset are created and the missing values in each copy are replaced using an imputation model. An analysis model is then fitted to each imputed dataset, and the point estimates of the model parameters, together with their variances, are combined using Rubin’s Rules. Variables included in the imputation model but not in the analysis model are known as auxiliary variables.
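
As an illustration of the pooling step, the following is a minimal sketch of Rubin’s Rules for a single parameter. The function name `pool_rubin` and the numbers in the example are made up for illustration and are not taken from the project.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool m point estimates and their variances (squared standard errors).

    Returns the pooled estimate, the within-imputation variance,
    the between-imputation variance and the total variance.
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    if m < 2 or u.size != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = q.mean()                     # pooled point estimate
    within = u.mean()                    # average of the per-dataset variances
    between = q.var(ddof=1)              # variance of the estimates across datasets
    total = within + (1 + 1 / m) * between
    return q_bar, within, between, total

# Hypothetical estimates and variances from m = 5 imputed datasets
est = [0.50, 0.54, 0.47, 0.52, 0.49]
var = [0.010, 0.011, 0.009, 0.010, 0.010]
q_bar, within, between, total = pool_rubin(est, var)
print(q_bar, within, between, total)
```

The total variance adds a (1 + 1/m) multiple of the between-imputation variance to the within-imputation variance, so a finite number of imputations inflates the pooled standard error slightly.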

A common question among researchers and reviewers is what proportion of missing data warrants the use of MI. A lower threshold of 5% missingness has been suggested as the point below which MI provides negligible benefit. At the opposite end, some reviewers suggest an upper threshold of 50% missingness, above which MI should not be attempted.

The fraction of missing information (FMI) quantifies the loss of information due to missingness while accounting for the information retained by other variables in the dataset. It can be thought of as the fraction of the total variance of an MI estimate that is attributable to between-imputation variance. The FMI takes values between 0 and 1, with lower values being preferable.
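
The variance-ratio interpretation above can be sketched as follows. This is the simple large-sample form, assuming the within-imputation variance `within`, the between-imputation variance `between` and the number of imputations `m` are already available; the function name is hypothetical, and statistical software may report a version with an additional degrees-of-freedom adjustment.

```python
def fraction_between(within, between, m):
    """Share of the total MI variance that comes from between-imputation variance."""
    if m < 2:
        raise ValueError("need at least two imputations")
    inflated_between = (1 + 1 / m) * between
    total = within + inflated_between
    return inflated_between / total

# Hypothetical inputs: within = 0.010, between = 0.00073, m = 5 imputations
fmi_approx = fraction_between(0.010, 0.00073, 5)
print(round(fmi_approx, 3))
```

When the imputed datasets give identical estimates the ratio is 0, and it approaches 1 as the between-imputation variance dominates.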

In a simulation study, the proportion of missing data in a multivariate normal dataset was increased under a missing completely at random (MCAR) mechanism. Multiple imputation was then applied and its performance compared with complete case analysis. Imputation models with varying amounts of auxiliary information were compared in terms of the bias and precision of parameter estimates, confidence interval coverage and FMI.
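
A minimal sketch of one arm of such a simulation is given below. It covers only the missing completely at random deletion and the complete case analysis, not the imputation step. The sample size, correlation, number of replicates, missingness proportions and the choice of a regression slope as the target parameter are all assumptions made for illustration and are not the settings used in the study.

```python
import numpy as np

rng = np.random.default_rng(2017)

def simulate_complete_case(prop_missing, n=1000, rho=0.5, n_reps=200):
    """Return (bias, empirical SE) of the complete case slope of y on x."""
    slopes = np.empty(n_reps)
    for r in range(n_reps):
        # Bivariate normal data with unit variances and correlation rho,
        # so the true slope of y on x equals rho
        x = rng.standard_normal(n)
        y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)
        # Missing completely at random: deletion is independent of x and y
        observed = rng.random(n) >= prop_missing
        # Complete case analysis: fit the line to the observed rows only
        slopes[r] = np.polyfit(x[observed], y[observed], 1)[0]
    return slopes.mean() - rho, slopes.std(ddof=1)

results = {p: simulate_complete_case(p) for p in (0.1, 0.5, 0.9)}
for p, (bias, se) in results.items():
    print(f"missing={p:.0%}  bias={bias:+.4f}  empirical SE={se:.4f}")
```

Under this mechanism the complete case estimate would be expected to stay close to unbiased while its standard error grows as more rows are deleted, which is the baseline against which an imputation model with auxiliary variables could be compared.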

An empirical example, using ALSPAC data, will now be used to support the findings of the simulation study.

Date proposal received: 
Tuesday, 5 December, 2017
Date proposal approved: 
Wednesday, 6 December, 2017
Keywords: 
Statistics/methodology, Cognitive impairment, Statistical methods, Methods - e.g. cross cohort analysis, data mining, Mendelian randomisation, etc.