B4301 - Guidelines for releasing de-identified synthesised data from the Avon Longitudinal Study of Parents and Children - 12/04/2023
The Avon Longitudinal Study of Parents and Children (ALSPAC) is a prospective birth cohort based in and around Bristol, UK. Maintaining data security and participant anonymity and confidentiality are key principles for the study, which is why data access is restricted to bona fide researchers and ALSPAC data are not openly available. Although there are valid reasons for restricting data availability, this position is somewhat in conflict with emerging best scientific practices, which encourage making data openly available to facilitate reproducible and replicable open scientific research.
Given the rich nature of the resource, ALSPAC data may also be valuable as an educational tool, for example for teaching methods such as longitudinal modelling or approaches to modelling missing data. To aid these efforts, we want to assess methods for generating synthesised ALSPAC data and making them openly available. These synthesised datasets are modelled on the original ALSPAC data, thus maintaining variable distributions and relations among variables, while at the same time preserving participant anonymity and confidentiality. We will explore how data can be synthesised using the ‘synthpop’ package in the R statistical programming language, and aim to present a list of guidelines which all researchers wishing to release such synthesised ALSPAC data must follow.